fix: CometScanExec on Spark 3.5.2 #915
Conversation
Oof this breaks a lot of explain plan comparison tests. If this change is ok I can try to work on updating them
Is the change in the
And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)
Are you referring to the compilation error or the change in explain output?
Agreed. I thought about doing that here but I wasn't sure the best way to go about updating the diff for the Spark SQL tests
I meant
@@ -141,8 +141,30 @@ case class CometScanExec(
     if (wrapped == null) Map.empty else wrapped.metadata

   override def verboseStringWithOperatorId(): String = {
-    getTagValue(QueryPlan.OP_ID_TAG).foreach(id => wrapped.setTagValue(QueryPlan.OP_ID_TAG, id))
-    wrapped.verboseStringWithOperatorId()
+    val metadataStr = metadata.toSeq.sorted
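The hunk above is cut off at the new `metadataStr` line. For context, here is a minimal sketch of what a copied, non-delegating implementation could look like, modelled on `DataSourceScanExec.verboseStringWithOperatorId` in Spark 3.5 and assuming `CometScanExec` has access to `metadata`, `output`, `formattedNodeName`, the protected `redact` helper, and `org.apache.spark.sql.execution.ExplainUtils`. This is a sketch, not the exact code in this PR:

```scala
  // Sketch only: format this node's own metadata instead of delegating to the
  // wrapped FileSourceScanExec, since OP_ID_TAG no longer exists in Spark 3.5.2+
  // and the operator ID cannot be passed along manually.
  override def verboseStringWithOperatorId(): String = {
    val metadataStr = metadata.toSeq.sorted
      .filterNot { case (_, value) => value.isEmpty || value == "[]" }
      .map { case (key, value) => s"$key: ${redact(value)}" }

    s"""
       |$formattedNodeName
       |${ExplainUtils.generateFieldString("Output", output)}
       |${metadataStr.mkString("\n")}
       |""".stripMargin
  }
```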
Since the original issue was that OP_ID_TAG has been removed, is it not sufficient to just remove the offending line?
Just removing the setting of OP_ID_TAG leads to a formatted explain of:
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet (1)
(unknown) Scan parquet
Output [20]: [_1#0, _2#1, _3#2, _4#3, _5#4L, _6#5, _7#6, _8#7, _9#8, _10#9, _11#10L, _12#11, _13#12, _14#13, _15#14, _16#15, _17#16, _18#17, _19#18, _20#19]
Batched: true
Location: InMemoryFileIndex [file:/Users/abinford/projects/arrow-datafusion-comet/spark/target/tmp/spark-c0b82b7c-3de1-431b-96bf-56cb37f3a463/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>
(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#0, _2#1, _3#2, _4#3, _5#4L, _6#5, _7#6, _8#7, _9#8, _10#9, _11#10L, _12#11, _13#12, _14#13, _15#14, _16#15, _17#16, _18#17, _19#18, _20#19]
which is the whole reason it was added in the first place: to fix the unknown operator ID (bc35fa5).
Plus, from what I could see, every other operator is prefixed with Comet in the formatted explain, so it's weird for Scan to be the one thing that doesn't actually match up to the physical plan.
I see, that makes sense.
On an unrelated note, I wonder why CometScanExec extends DataSourceScanExec instead of FileSourceScanExec (if it had, we would have gotten the verboseString for free).
lgtm
LGTM. Thanks @Kimahriman
@Kimahriman would you be able to rebase this PR so that we can merge it?
Oof, 95 conflicts I would have to manually resolve; let me just regenerate these plans all again tonight.
Ok, it wasn't too bad to find/replace and fix the issues again; we'll see if I messed anything up in the CI.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@             Coverage Diff              @@
##              main     #915      +/-   ##
============================================
- Coverage     34.16%   34.05%   -0.12%
+ Complexity      880      879       -1
============================================
  Files           112      112
  Lines         43286    43301      +15
  Branches       9572     9578       +6
============================================
- Hits          14789    14745      -44
- Misses        25478    25518      +40
- Partials       3019     3038      +19

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Done. I think the CI failures are unrelated; looks like failures downloading dependencies in the Hive tests.
Thanks. I am re-running the failed jobs now.
Which issue does this PR close?
Closes #912
Rationale for this change
Fixes CometScanExec running on Spark 3.5.2+. Currently it will fail with a runtime exception, and will fail to compile if specifying 3.5.2 with -Dspark.version.
This is because OP_ID_TAG is removed in Spark 3.5.2+, and the operator ID tracking is replaced with a separate internal map of plan -> ID, so there's no way to manually pass the ID on to a delegating plan. Instead, this PR simply copies the implementation of DataSourceScanExec's method.
What changes are included in this PR?
The only effect of the change is the verbose string output for CometScanExec. Instead of delegating to the underlying DataSourceScanExec, the implementation is copied over. This is in line with other Comet operators that implement their own verbose string, and it makes more sense in the formatted explain since the operator names line up.
Before:
After:
How are these changes tested?
Manually verified by building with Spark 3.5.2: ./mvnw clean package -Pspark-3.5 -Dspark.version=3.5.2 -DskipTests.
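As a rough illustration of that kind of manual check (a sketch under assumptions, not taken from this PR: the plugin class org.apache.spark.CometPlugin and the spark.comet.enabled config come from the Comet docs, and the freshly built comet jar would need to be on the classpath, e.g. via --jars), one could inspect the formatted explain of a simple Parquet scan:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical spark-shell style check: config keys and plugin class are assumptions
// based on the Comet documentation and may need adjusting for a given setup.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.plugins", "org.apache.spark.CometPlugin")
  .config("spark.comet.enabled", "true")
  .getOrCreate()

// Write and re-read a small Parquet file so the scan goes through CometScanExec.
spark.range(10).write.mode("overwrite").parquet("/tmp/comet-explain-check.parquet")

// With this fix, the formatted explain should show "CometScan parquet" with a real
// operator ID instead of "(unknown) Scan parquet".
spark.read.parquet("/tmp/comet-explain-check.parquet").explain("formatted")
```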