[SPARK-52040][PYTHON][SQL][CONNECT] ResolveLateralColumnAliasReference should retain the plan id #50831
Could you resolve the conflicts, @zhengruifeng?
+1, LGTM.
SparkSessionE2ESuite and AdaptiveQueryExecSuite failed; could you rerun the tests?
LGTM, thank you!
The k8s failure is unrelated. Thanks, merging to master/4.0!
ResolveLateralColumnAliasReference should retain the plan id.

Bug fix.

Before:
```
In [1]: from pyspark.sql import functions as sf

In [2]: df1 = spark.range(10).select((sf.col("id") + sf.lit(1)).alias("x"), (sf.col("x") + sf.lit(1)).alias("y"))

In [3]: df2 = spark.range(10).select(sf.col("id").alias("x"))

In [4]: df1.join(df2, df1.x == df2.x).select(df1.y)
Out[4]: 25/05/08 16:38:28 ERROR ErrorUtils: Spark Connect RPC error during: analyze. UserId: ruifeng.zheng. SessionId: af3deba7-1e48-49fd-adad-2046a72ed341.
org.apache.spark.sql.AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "y". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
    at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotResolveDataFrameColumn(QueryCompilationErrors.scala:4147)
    at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.resolveDataFrameColumn(ColumnResolutionHelper.scala:562)
    at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.tryResolveDataFrameColumns(ColumnResolutionHelper.scala:537)
```

After:
```
In [1]: from pyspark.sql import functions as sf

In [2]: df1 = spark.range(10).select((sf.col("id") + sf.lit(1)).alias("x"), (sf.col("x") + sf.lit(1)).alias("y"))

In [3]: df2 = spark.range(10).select(sf.col("id").alias("x"))

In [4]: df1.join(df2, df1.x == df2.x).select(df1.y).show()
+---+
|  y|
+---+
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
| 10|
+---+
```

Closes #50831 from zhengruifeng/fix_lca.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 688281a)
Signed-off-by: Wenchen Fan <[email protected]>
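As a sanity check on the "after" output, the query's result can be recomputed in plain Python without Spark. This is a hypothetical model, not Spark code: lists of dicts stand in for the DataFrames and a nested loop stands in for the inner join. It shows why exactly nine rows with values 2 through 10 are expected: `df1.x` runs from 1 to 10 while `df2.x` runs from 0 to 9, so only `x` values 1 to 9 find a match.

```python
# Plain-Python model of the reproduction query; no Spark involved.
df1 = []
for id_ in range(10):          # spark.range(10) yields id = 0..9
    x = id_ + 1                # (col("id") + lit(1)).alias("x")
    y = x + 1                  # (col("x") + lit(1)).alias("y"): lateral reference to x
    df1.append({"x": x, "y": y})

df2 = [{"x": id_} for id_ in range(10)]  # col("id").alias("x")

# Inner join on df1.x == df2.x, then keep only df1.y.
result = [r1["y"] for r1 in df1 for r2 in df2 if r1["x"] == r2["x"]]
print(result)  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Spark does not guarantee row order for a join, so only the set of values, not their order, should be compared against the `show()` output.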
What changes were proposed in this pull request?
The analyzer rule `ResolveLateralColumnAliasReference` should retain the plan id when it rewrites a plan that contains lateral column alias references.
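To illustrate why retaining the plan id matters, here is a toy Python model. It assumes, in simplified form, that Spark Connect tags plan nodes with a plan id and resolves a DataFrame column reference such as `df1.y` by searching for the node that carries that id, which is roughly what the `resolveDataFrameColumn` frame in the stack trace is doing. The `Project` class and the functions `split_lateral_aliases` and `resolve_by_plan_id` are hypothetical names; the real rule is Scala code in the Catalyst analyzer. The model only shows that a rewrite which builds replacement nodes must carry over the original node's plan id, or the reference can no longer be resolved.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    """Hypothetical stand-in for a logical Project node."""
    output: list[str]                                   # columns this node produces
    child: Optional[Project] = None
    tags: dict[str, int] = field(default_factory=dict)  # e.g. {"plan_id": 1}


def split_lateral_aliases(node: Project, lateral_aliases: list[str],
                          keep_plan_id: bool) -> Project:
    """Hypothetical rewrite: compute the lateral aliases in a new inner Project,
    then compute the original output in a new outer Project on top of it."""
    inner = Project(output=list(lateral_aliases), child=node.child)
    outer = Project(output=list(node.output), child=inner)
    if keep_plan_id:
        # The fix, in spirit: the replacement node keeps the original plan id.
        outer.tags.update(node.tags)
    return outer


def resolve_by_plan_id(root: Project, plan_id: int, column: str) -> str:
    """Walk down the tree looking for the node tagged with plan_id that produces column."""
    node: Optional[Project] = root
    while node is not None:
        if node.tags.get("plan_id") == plan_id and column in node.output:
            return f"resolved {column!r} at plan {plan_id}"
        node = node.child
    raise LookupError(f"Cannot resolve dataframe column {column!r}")


# df1's select list [x, y], where y refers to the lateral alias x.
original = Project(output=["x", "y"], tags={"plan_id": 1})

fixed = split_lateral_aliases(original, lateral_aliases=["x"], keep_plan_id=True)
print(resolve_by_plan_id(fixed, plan_id=1, column="y"))  # resolved 'y' at plan 1

broken = split_lateral_aliases(original, lateral_aliases=["x"], keep_plan_id=False)
try:
    resolve_by_plan_id(broken, plan_id=1, column="y")
except LookupError as err:
    print(err)  # Cannot resolve dataframe column 'y'
```

The "broken" variant mirrors the behavior before this change: the rewritten plan is logically equivalent, but no node carries the plan id that `df1.y` points to.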
Why are the changes needed?
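Bug fix.
Before: in the reproduction above, selecting `df1.y` after the join fails during analysis with `[CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "y"`, where `y` is defined through a lateral column alias reference to `x`.
After: the same query succeeds and returns the values 2 through 10.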
Does this PR introduce any user-facing change?
Yes, the query above works after this change.
How was this patch tested?
Added a test.
Was this patch authored or co-authored using generative AI tooling?
No.