-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[FLINK-38444][table] Join unique keys should not contain null values #27090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
assertEquals( | ||
toBitSet(Array(1), Array(1, 5), Array(1, 5, 6)), | ||
mq.getUpsertKeys(logicalLeftJoinOnUniqueKeys).toSet) | ||
assertEquals(toBitSet(Array(1)), mq.getUpsertKeys(logicalLeftJoinOnUniqueKeys).toSet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am i missing something, I do not see tests for nulls on each side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above is the test: 5, and 6 were columns coming from the right side of the join. So they can't be included in unique keys adn that's why "Array(1, 5), Array(1, 5, 6)" were remove as possible unique keys.
The same for all the other tests that were updated, they test the new behaviour
CI is failling on unrelated e2e tests issue happening across PRs |
Hi, @gustavodemorais. I'm wondering why we can't have null columns as part of unique keys? From what I see in the SQL standard, it seems that unique keys can contain null columns.
|
Hey @xuyangzhong, thanks for the comment. I was at Flink Forward and just reading your reply now. That's a good catch. The issue is that we are using the unique keys as upsert keys and in my understand we can't allow upsert keys to contain null values and that's what I was trying to fix here. It leads to a runtime error since a sink expects the upsert key not to contain a null value which can happen in a case of a left join. https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/metadata/FlinkRelMdUpsertKeys.scala#L352 We could also have additional logic in getJoinUpsertKeys to remove such keys. Do you have other suggestions? |
What is the purpose of the change
The planner currently considers the union of both the unique and upsert keys from the left and from the right to be a valid resulting upsert key. That's true for inner joins but for left/right/full joins that leads to a resulting unique key that contains columns that can be null, which is not valid.
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation