Skip to content

exclude_using_columns might wrongly retain columns from a projected join input #20989

@aalexandrov

Description

@aalexandrov

Describe the bug

Currently, the exclude_using_columns called from expand_wildcard doesn't consider the filtering semantics of semi- and antijoins when expanding wildcards on top of joins defined via USING(<columns>) syntax.

From each set of columns equated by a USING(<column>) expression, the code currently (1) sorts the set entries, and (2) retains only the first entry from each set.

Because of that, the columns surviving the exclude_using_columns call might be wrongly chosen from the filtering side if the table qualifier from that side is sorted before the filtered side qualifier.

To Reproduce

For example, given this schema of two identical tables:

create table s(x1 int, x2 int, x3 int);
create table t(x1 int, x2 int, x3 int);

One would expect that the schema of queries where the s and t names are swapped will be identical. However, currently this is not the case:

-- Q1 schema: x1 int, x2 int, x3 int (because s < t)
select * from s left semi join t using (x1);

-- Q2 schema: x2 int, x3 int (x1 is wrongly excluded because t < s)
select * from t left semi join s using (x1);

Expected behavior

I expect the schemas of Q1 and Q2 to be identical.

Additional context

I have a fix for that, for which I will open a PR shortly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions