
fix(snowflake): use schema-qualified pagination markers for SHOW VIEWS/STREAMS #14108


Closed
wants to merge 1 commit into from


sgomezvillamor
Contributor

Summary

  • Fix pagination markers in get_views_for_database and get_streams_for_database to use schema-qualified names
  • Add comments explaining the need for schema-qualified pagination markers

Problem

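Snowflake's SHOW VIEWS and SHOW STREAMS commands paginate with a LIMIT ... FROM '<name>' clause. That clause matches its marker against object names and accepts partial names. When the same view or stream name exists in more than one schema of the same database, a name-only marker does not identify a unique row. This leads to: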

  • Inconsistent pagination results
  • Missing views/streams during ingestion
  • Ambiguous cursor positioning
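The following is a purely hypothetical Python model of marker-based paging that makes the failure mode concrete. It assumes two things: rows come back sorted by (schema_name, object_name), and the FROM marker is compared against the object name alone, resuming after the first matching row. These are simplifying assumptions for illustration, not Snowflake's documented algorithm. `emulate_show_page` and `paginate_by_name` are made-up names.

```python
from typing import List, Optional, Tuple

# (schema_name, object_name); hypothetical row shape for this sketch
Row = Tuple[str, str]


def emulate_show_page(rows: List[Row], limit: int, marker: Optional[str]) -> List[Row]:
    """Hypothetical model of `SHOW VIEWS IN DATABASE ... LIMIT n FROM 'marker'`.

    Assumptions: results are ordered by (schema, name), and the marker is
    matched against the object name only, resuming after the FIRST match.
    """
    ordered = sorted(rows)
    start = 0
    if marker is not None:
        matches = [i for i, (_, name) in enumerate(ordered) if name == marker]
        if not matches:
            return []  # assumption for the sketch: an unmatched marker yields nothing
        start = matches[0] + 1
    return ordered[start:start + limit]


def paginate_by_name(rows: List[Row], limit: int, max_pages: int = 10) -> List[Row]:
    """Page through `rows`, using the bare object name of the last row as the marker."""
    fetched: List[Row] = []
    marker: Optional[str] = None
    for _ in range(max_pages):  # guard against a cursor that never advances
        page = emulate_show_page(rows, limit, marker)
        fetched.extend(page)
        if len(page) < limit:
            break
        marker = page[-1][1]  # name-only marker, as before this PR
    return fetched


rows = [("A", "V1"), ("A", "V2"), ("B", "V1"), ("B", "V2")]
fetched = paginate_by_name(rows, limit=3)
print(len(fetched), len(set(fetched)))  # 8 4
```

Under these assumptions, the second page restarts at A.V2 because the marker "V1" first matches A.V1, so eight rows are fetched but only four are distinct. With limit=2, the cursor keeps returning the last page (B.V1, B.V2) until the max_pages guard stops the loop.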

Solution

Changed the pagination markers from the bare object name to the schema-qualified schema_name.object_name form:

  • view_pagination_marker = view_name → view_pagination_marker = f"{schema_name}.{view_name}"
  • stream_pagination_marker = stream_name → stream_pagination_marker = f"{schema_name}.{stream_name}"

This aligns with Snowflake's lexicographic ordering (database, schema, object name) and eliminates ambiguity when objects with the same name exist across different schemas.
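Here is a minimal sketch of how the marker change feeds into the query text. `next_marker` and `show_objects_query` are hypothetical helpers, not the actual DataHub functions. The sketch assumes the SHOW <kind> IN DATABASE <db> LIMIT <n> FROM '<marker>' form of the statement, and it doubles any single quotes inside the marker so the marker stays a valid string literal.

```python
from typing import Optional


def next_marker(schema_name: str, object_name: str) -> str:
    """Schema-qualified pagination marker, as proposed in this PR."""
    return f"{schema_name}.{object_name}"


def show_objects_query(kind: str, db_name: str, limit: int, marker: Optional[str] = None) -> str:
    """Build a paginated SHOW VIEWS / SHOW STREAMS statement (hypothetical helper)."""
    if kind not in ("VIEWS", "STREAMS"):
        raise ValueError(f"unsupported object kind: {kind}")
    from_clause = ""
    if marker is not None:
        # Double any single quotes so the marker stays a valid SQL string literal.
        escaped = marker.replace("'", "''")
        from_clause = f" FROM '{escaped}'"
    return f"SHOW {kind} IN DATABASE {db_name} LIMIT {limit}{from_clause};"


print(show_objects_query("VIEWS", "ANALYTICS", 1000, next_marker("SALES", "DAILY_V")))
# SHOW VIEWS IN DATABASE ANALYTICS LIMIT 1000 FROM 'SALES.DAILY_V';
```

This only changes the string passed to FROM. Whether the server resolves a schema-qualified marker the intended way is not something the sketch can show.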

Test plan

  • Existing unit tests should continue to pass
  • Integration tests with Snowflake environments containing views/streams with duplicate names across schemas should now work reliably
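An integration check for duplicate names across schemas could assert on (schema, name) pairs rather than on names alone. The sketch below is hedged: `pagination_problems` is a hypothetical helper, and it leaves out how the fetched pairs are collected from a live Snowflake account.

```python
from collections import Counter
from typing import List, Set, Tuple

Key = Tuple[str, str]  # (schema_name, object_name)


def pagination_problems(expected: Set[Key], fetched: List[Key]) -> List[str]:
    """Describe duplicates, gaps, and extras in a paginated listing (hypothetical helper)."""
    counts = Counter(fetched)
    seen = set(counts)
    problems: List[str] = []
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    missing = sorted(expected - seen)
    unexpected = sorted(seen - expected)
    if duplicates:
        problems.append(f"duplicates: {duplicates}")
    if missing:
        problems.append(f"missing: {missing}")
    if unexpected:
        problems.append(f"unexpected: {unexpected}")
    return problems


expected = {("A", "V1"), ("A", "V2"), ("B", "V1"), ("B", "V2")}
print(pagination_problems(expected, [("A", "V1"), ("A", "V2"), ("A", "V2")]))
# ["duplicates: [('A', 'V2')]", "missing: [('B', 'V1'), ('B', 'V2')]"]
```

A listing that passes returns an empty list. Keying on the pair means two views named V1 in schemas A and B count as distinct objects.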

🤖 Generated with Claude Code


codecov bot commented Jul 16, 2025

❌ 6 Tests Failed:

Tests completed: 6092 | Failed: 6 | Passed: 6086 | Skipped: 71
❄️ Flaky tests (3):
tests.lineage.test_lineage_sdk::test_filtered_column_level_lineage

Flake rate in main: 7.27% (Passed 51 times, Failed 4 times)

Stack Traces | 0.019s run time
test_client = <datahub.sdk.main_client.DataHubClient object at 0x7f3fee552c90>
test_datasets = {'downstream1': Dataset('urn:li:dataset:(urn:li:dataPlatform:snowflake,test_lineage_downstream_001,PROD)'), 'downstrea...ream_003,PROD)'), 'upstream': Dataset('urn:li:dataset:(urn:li:dataPlatform:snowflake,test_lineage_upstream_001,PROD)')}

    def test_filtered_column_level_lineage(
        test_client: DataHubClient, test_datasets: Dict[str, Dataset]
    ):
        filtered_column_lineage_results = test_client.lineage.get_lineage(
            source_urn=str(test_datasets["upstream"].urn),
            source_column="id",
            direction="downstream",
            max_hops=3,
            filter=F.and_(F.platform("mysql"), F.entity_type("dataset")),
        )
    
>       assert len(filtered_column_lineage_results) == 1
E       assert 0 == 1
E        +  where 0 = len([])

tests/lineage/test_lineage_sdk.py:185: AssertionError
tests.lineage.test_lineage_sdk::test_column_level_lineage_from_schema_field

Flake rate in main: 7.27% (Passed 51 times, Failed 4 times)

Stack Traces | 0.021s run time
test_client = <datahub.sdk.main_client.DataHubClient object at 0x7f3fee552c90>
test_datasets = {'downstream1': Dataset('urn:li:dataset:(urn:li:dataPlatform:snowflake,test_lineage_downstream_001,PROD)'), 'downstrea...ream_003,PROD)'), 'upstream': Dataset('urn:li:dataset:(urn:li:dataPlatform:snowflake,test_lineage_upstream_001,PROD)')}

    def test_column_level_lineage_from_schema_field(
        test_client: DataHubClient, test_datasets: Dict[str, Dataset]
    ):
        source_schema_field = SchemaFieldUrn(test_datasets["upstream"].urn, "id")
        column_lineage_results = test_client.lineage.get_lineage(
            source_urn=str(source_schema_field), direction="downstream", max_hops=3
        )
    
>       assert len(column_lineage_results) == 3
E       assert 0 == 3
E        +  where 0 = len([])

tests/lineage/test_lineage_sdk.py:203: AssertionError
tests.lineage.test_lineage_sdk::test_table_level_lineage

Flake rate in main: 7.27% (Passed 51 times, Failed 4 times)

Stack Traces | 0.154s run time
test_client = <datahub.sdk.main_client.DataHubClient object at 0x7f3fee552c90>
test_datasets = {'downstream1': Dataset('urn:li:dataset:(urn:li:dataPlatform:snowflake,test_lineage_downstream_001,PROD)'), 'downstrea...ream_003,PROD)'), 'upstream': Dataset('urn:li:dataset:(urn:li:dataPlatform:snowflake,test_lineage_upstream_001,PROD)')}

    def test_table_level_lineage(
        test_client: DataHubClient, test_datasets: Dict[str, Dataset]
    ):
        table_lineage_results = test_client.lineage.get_lineage(
            source_urn=str(test_datasets["upstream"].urn),
            direction="downstream",
            max_hops=3,
        )
    
>       assert len(table_lineage_results) == 3
E       assert 0 == 3
E        +  where 0 = len([])

tests/lineage/test_lineage_sdk.py:109: AssertionError



alwaysmeticulous bot commented Jul 16, 2025

✅ Meticulous spotted 0 visual differences across 1453 screens tested.

Meticulous evaluated ~8 hours of user flows against your PR.

Last updated for commit 73d6486.

@sgomezvillamor
Contributor Author

Some integration tests show that this still does not work as expected: the cursor continues to behave inconsistently during pagination. Closing.

Labels
ingestion PR or Issue related to the ingestion of metadata
1 participant