
[Bug][Lineage] Collect tables referenced in filter conditions for lineage analysis #7206

@lyne7-sc


Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

When a SQL query contains a subquery in the WHERE clause, the tables referenced within the subquery are not included in the extracted upstream table lineage.

For example,

insert overwrite v2_catalog.db.tb3
select *
from v2_catalog.db.tb1 t1
where exists (select 1 from v2_catalog.db.tb2 t2 where t2.col1 = t1.col1);

The current result is:

Lineage(
        List("v2_catalog.db.tb1"),
        List("v2_catalog.db.tb3"),
        List(
          ("v2_catalog.db.tb3.col1", Set("v2_catalog.db.tb1.col1")),
          ("v2_catalog.db.tb3.col2", Set("v2_catalog.db.tb1.col2")),
          ("v2_catalog.db.tb3.col3", Set("v2_catalog.db.tb1.col3"))))

The output omits table v2_catalog.db.tb2, which is referenced in the filter condition.

I therefore propose adding a new configuration option that controls whether tables referenced in filter conditions are collected as lineage input tables.
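To make the proposal concrete, here is a minimal sketch of the intended behavior on a toy plan model. It does not use Spark's Catalyst classes or the lineage plugin's actual code: the types `Plan`, `Expr`, `Exists`, the function `collectInputTables`, and the flag `includeFilterTables` (standing in for the proposed configuration) are all hypothetical names chosen for illustration.

```scala
// Toy model only: every name here is hypothetical, not Kyuubi or Spark API.
object LineageSketch {
  sealed trait Expr
  // A predicate that touches no other table, e.g. `t2.col1 = t1.col1`.
  case object PlainPredicate extends Expr
  // A predicate wrapping a subquery, e.g. `EXISTS (SELECT 1 FROM tb2 ...)`.
  final case class Exists(subquery: Plan) extends Expr

  sealed trait Plan
  final case class Table(name: String) extends Plan
  final case class Filter(condition: Expr, child: Plan) extends Plan
  final case class Project(child: Plan) extends Plan

  // Collects input tables. `includeFilterTables` stands in for the proposed
  // configuration switch; when false, filter conditions are ignored.
  def collectInputTables(plan: Plan, includeFilterTables: Boolean): List[String] =
    plan match {
      case Table(name) => List(name)
      case Project(child) => collectInputTables(child, includeFilterTables)
      case Filter(condition, child) =>
        val fromChild = collectInputTables(child, includeFilterTables)
        val fromCondition = condition match {
          case Exists(subquery) if includeFilterTables =>
            collectInputTables(subquery, includeFilterTables)
          case _ => Nil
        }
        (fromChild ++ fromCondition).distinct
    }

  // The SELECT part of the example query, reduced to its shape.
  val query: Plan = Project(
    Filter(
      Exists(Project(Filter(PlainPredicate, Table("v2_catalog.db.tb2")))),
      Table("v2_catalog.db.tb1")))
}
```

In this sketch, `LineageSketch.collectInputTables(LineageSketch.query, includeFilterTables = false)` yields only `v2_catalog.db.tb1`, which mirrors the current result, while passing `true` also yields `v2_catalog.db.tb2`. Keeping the behavior behind a switch would let existing users retain the current output.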

Affects Version(s)

1.11.0

Kyuubi Server Log Output

Kyuubi Engine Log Output

Kyuubi Server Configurations

Kyuubi Engine Configurations

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
