Skip to content

SQL: tuple IN subquery #6326

@philrz

Description

@philrz

SuperDB does not yet support a "tuple IN subquery", e.g., a query that contains WHERE ... IN ... in which the left hand side is a tuple and the right hand side is a subquery.

Details

Repro is with super commit ecdb021.

Here's a working example in Postgres.

$ psql postgres
psql (17.6 (Homebrew))
Type "help" for help.

postgres=# CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    job_title VARCHAR(50),
    office VARCHAR(50)
);
CREATE TABLE

postgres=# INSERT INTO employees (employee_id, name, department_id, job_title, office) VALUES
(1, 'Alice Chen', 10, 'Manager', 'Sales'),
(2, 'Bob Smith', 10, 'Manager', 'Marketing'),
(3, 'Carol Jones', 20, 'Analyst', 'Sales'),
(4, 'David Lee', 15, 'Director', 'Engineering'),
(5, 'Eve Williams', 20, 'Analyst', 'Marketing'),
(6, 'Frank Brown', 30, 'Developer', 'Engineering'),
(7, 'Grace Davis', 25, 'Consultant', 'Marketing');
INSERT 0 7

postgres=# SELECT employee_id, name, department_id, job_title
FROM employees
WHERE (department_id, job_title) IN (
    SELECT department_id, job_title
    FROM employees
    WHERE office = 'Sales'
);
 employee_id |     name     | department_id | job_title 
-------------+--------------+---------------+-----------
           1 | Alice Chen   |            10 | Manager
           2 | Bob Smith    |            10 | Manager
           3 | Carol Jones  |            20 | Analyst
           5 | Eve Williams |            20 | Analyst
(4 rows)

When run in super the same example returns no output.

$ super -version
Version: ecdb02110

$ super -f parquet -o employees.parquet -c "
VALUES
(1, 'Alice Chen', 10, 'Manager', 'Sales'),
(2, 'Bob Smith', 10, 'Manager', 'Marketing'),
(3, 'Carol Jones', 20, 'Analyst', 'Sales'),
(4, 'David Lee', 15, 'Director', 'Engineering'),
(5, 'Eve Williams', 20, 'Analyst', 'Marketing'),
(6, 'Frank Brown', 30, 'Developer', 'Engineering'),
(7, 'Grace Davis', 25, 'Consultant', 'Marketing')
| values {employee_id:c0, name:c1, department_id:c2, job_title:c3, office:c4}" && super -c "
SELECT employee_id, name, department_id, job_title
FROM employees.parquet
WHERE (department_id, job_title) IN (
    SELECT department_id, job_title
    FROM employees.parquet
    WHERE office = 'Sales'
);"

[no output]

@mccanne explained that this is currently happening because the tuple on the left hand side of the IN turns into records of the format {c0: ..., c1: ..., ...} and comparing records for equality currently requires the types to match, so the specific field names returned by the subquery will cause a mismatch.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions