
[#2750] feat(spark): Support Spark 4.1 #2751

Merged
zuston merged 3 commits into apache:master from LuciferYang:spark-4.1-support
May 27, 2026
@LuciferYang LuciferYang commented May 24, 2026

What changes were proposed in this pull request?

Add a spark4.1 Maven profile that reuses the existing client-spark/spark4 module to compile and run against Spark 4.1.1.

API shims

Spark 4.1 introduced two binary-incompatible additions to APIs called from this module:

  • org.apache.spark.scheduler.MapStatus.apply — gained a trailing checksumVal: Long parameter.
  • org.apache.spark.util.collection.ExternalSorter — constructor gained a trailing rowBasedChecksums: Array[RowBasedChecksum] parameter.

Scala default parameters do not surface to Java callers, so a single Spark4Compat.java cannot satisfy both 4.0.x and 4.1.x signatures. Two parallel source roots ship a matching Spark4Compat:

  • src/main/java-spark4_0/ — 3-arg mapStatus, 5-arg ExternalSorter ctor (current Spark 4.0 shape).
  • src/main/java-spark4_1/ — 4-arg mapStatus (checksumVal = 0L), 6-arg ExternalSorter ctor (empty RowBasedChecksum[]).

build-helper-maven-plugin selects the source root via ${spark4.compat.source.dir}; the existing spark4 profile keeps the 4.0 default, the new spark4.1 profile overrides it to the 4.1 variant.

RssShuffleReader now calls Spark4Compat.newExternalSorter(...) and RssShuffleWriter calls Spark4Compat.mapStatus(...) instead of constructing those Spark types directly.
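The shim idea can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `FakeMapStatus41` is a stand-in for the Spark 4.1 shape of `MapStatus.apply` (not the real Spark class), and `CompatSketch41` only assumes that the shim presents one stable method to callers while supplying the extra trailing argument itself. The real `Spark4Compat` signatures are not shown in this description.

```java
import java.util.Arrays;

// Stand-in for the Spark 4.1 factory shape: four parameters, the last one
// being the checksum. This is NOT the real org.apache.spark class.
final class FakeMapStatus41 {
    final String location;
    final long[] sizes;
    final long mapTaskId;
    final long checksumVal;

    private FakeMapStatus41(String location, long[] sizes, long mapTaskId, long checksumVal) {
        this.location = location;
        this.sizes = sizes;
        this.mapTaskId = mapTaskId;
        this.checksumVal = checksumVal;
    }

    static FakeMapStatus41 apply(String location, long[] sizes, long mapTaskId, long checksumVal) {
        return new FakeMapStatus41(location, sizes, mapTaskId, checksumVal);
    }
}

// Hypothetical 4.1-flavoured shim. Java cannot rely on a Scala default value,
// so the shim passes the trailing argument (0L) explicitly.
final class CompatSketch41 {
    private CompatSketch41() {}

    static FakeMapStatus41 mapStatus(String location, long[] sizes, long mapTaskId) {
        return FakeMapStatus41.apply(location, sizes, mapTaskId, 0L);
    }
}

final class CompatSketchDemo {
    public static void main(String[] args) {
        FakeMapStatus41 status =
            CompatSketch41.mapStatus("host-1:19999", new long[] {10L, 20L}, 7L);
        System.out.println(
            status.mapTaskId + " " + status.checksumVal + " " + Arrays.toString(status.sizes));
    }
}
```

A 4.0-flavoured sibling would expose the same `mapStatus` method and simply call the shorter factory, which is why callers such as the reader and writer do not need to know which Spark line they were compiled against.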

The client-spark/extension (Spark UI) module already uses scala-jakarta under the spark4 profile; Spark 4.1's WebUIPage.render keeps the same jakarta.servlet.http.HttpServletRequest signature, so the existing source root works for spark4.1 as well — only a spark4.1 profile body that mirrors the spark4 one is added.

spark4 and spark4.1 are mutually exclusive at build time: they share a Maven coordinate but produce incompatible bytecode compiled against different Spark minor versions (4.0.x vs. 4.1.x). Run mvn clean when switching profiles locally.

Runtime dependency adjustments under -Pspark4.1

Spark 4.1.1 bumped several runtime libraries; the spark4.1 profile pins matching versions and the code makes the necessary lifecycle adjustments:

  • Netty 4.1.118 → 4.2.7.Final. Spark 4.1's NettyUtils.createEventLoop references io.netty.channel.nio.NioIoHandler (new in Netty 4.2). Netty 4.2 also tightens PooledByteBuf.nioBuffer() to require refCnt > 0. Response decoders that wrap the frame buffer in a NettyManagedBuffer body now call retain() explicitly, so the body's reference count is independent of the frame's. TransportFrameDecoder.shouldRelease is simplified to always return true, and TransportFrameDecoderTest is updated accordingly. Netty 4.1 silently tolerated the previous shape; 4.2 throws IllegalReferenceCountException.
  • Jetty 9.3.24 → 9.4.53.v20231009. Aligns with Spark 4.1's bundled Jetty. JettyServer.createThreadPool is already source-compatible with both lines.
  • Jackson 2.18.2 → 2.20.0. Spark 4.1.1 ships jackson-module-scala_2.13:2.20.0, which validates the classpath jackson-databind version on registration and rejects anything outside [2.20.0, 2.21.0). Without the bump, RDDOperationScope's static initializer (ObjectMapper.registerModule(DefaultScalaModule)) throws JsonMappingException, surfacing as ExceptionInInitializerError on the first SQL/RDD-touching test and NoClassDefFoundError on every later one.
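The failure pattern in the Jackson bullet (ExceptionInInitializerError first, NoClassDefFoundError afterwards) is standard JVM behaviour for a class whose static initializer throws. A minimal sketch, using a hypothetical `BrokenInit` class in place of RDDOperationScope and a simulated exception in place of the Jackson version check:

```java
// Hypothetical class whose static initializer always fails, standing in for
// a class that registers a module with a mismatched library version.
final class BrokenInit {
    static final int VALUE = compute();

    private static int compute() {
        throw new IllegalStateException("simulated version mismatch");
    }

    static int value() {
        return VALUE;
    }
}

final class InitFailureDemo {
    // Touches BrokenInit and reports which Throwable the JVM raised.
    static String touch() {
        try {
            return String.valueOf(BrokenInit.value());
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: initialization is attempted and fails
        System.out.println(touch()); // later use: the class is already marked as erroneous
    }
}
```

Only the first access carries the original cause, so when diagnosing a run like this it helps to look at the earliest failing test rather than the later NoClassDefFoundError reports.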

These adjustments live entirely inside the spark4.1 profile and the response-decoder ref-count fix; the default build and -Pspark4 (Spark 4.0.2) paths are unchanged.

Why are the changes needed?

Following #1805 (Spark 4.0.2 support, now closed), users on Spark 4.1 currently cannot link Uniffle's Spark 4 client because of the API additions above. Closes #2750.

Does this PR introduce any user-facing change?

No (build / packaging only). A new -Pspark4.1 build flag is available; default builds and the existing -Pspark4 flag are unchanged.

How was this patch tested?

  • Local: mvn clean package -Pspark4.1 -DskipTests against Spark 4.1.1 — passes.
  • Local: mvn clean package -Pspark4 -DskipTests against Spark 4.0.2 — still passes.
  • Verified the resulting Spark4Compat.class differs across the two profiles (different bytecode size / arity), confirming the multi-source-root selection works.
  • Verified API signatures via javap -p on Spark 4.0.2 / 4.1.1 jars from Maven Central match the shim implementations.
  • Local: mvn -Pspark4.1 -pl integration-test/spark4 -am test; AQERepartitionTest and MapSideCombineTest pass after the Netty refCnt and Jackson alignment fixes.
  • Local: TransportFrameDecoderTest passes against Netty 4.2 (-Pspark4.1) and continues to pass on the default Netty 4.1 line.
  • CI: parallel.yml now runs -Pspark4.1 (java-version 17) in the matrix; sequential.yml adds an Execute -Pspark4.1 step gated on java-version 17. Latest run on this branch (26386496434) is all-green across the full 45-job matrix.

@LuciferYang LuciferYang marked this pull request as draft May 24, 2026 17:34

github-actions Bot commented May 24, 2026

Test Results

 3 617 files  +  216   3 617 suites  +216   7h 47m 48s ⏱️ + 24m 45s
 1 263 tests ±    0   1 252 ✅ ±    0  11 💤 ± 0  0 ❌ ±0 
18 026 runs  +1 096  17 989 ✅ +1 085  37 💤 +11  0 ❌ ±0 

Results for commit dc3719a. ± Comparison against base commit c81ef85.

♻️ This comment has been updated with latest results.

codecov-commenter commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.77%. Comparing base (c81ef85) to head (1fd752e).

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #2751   +/-   ##
=========================================
  Coverage     51.76%   51.77%           
+ Complexity     3982     3979    -3     
=========================================
  Files           600      600           
  Lines         33228    33228           
  Branches       3141     3141           
=========================================
+ Hits          17202    17203    +1     
+ Misses        14910    14908    -2     
- Partials       1116     1117    +1     

Add a `spark4.1` Maven profile that reuses the existing `client-spark/spark4`
module to compile and run against Spark 4.1.1.

Spark 4.1 introduced two binary-incompatible additions to APIs called from
this module:
- `MapStatus.apply` gained a trailing `checksumVal: Long` parameter.
- `ExternalSorter`'s constructor gained a trailing
  `rowBasedChecksums: Array[RowBasedChecksum]` parameter.

Scala default parameters do not surface to Java callers, so a single
`Spark4Compat.java` cannot satisfy both 4.0.x and 4.1.x signatures. Two
parallel source roots ship a matching `Spark4Compat`:
- `src/main/java-spark4_0/` — 3-arg `mapStatus`, 5-arg `ExternalSorter` ctor
- `src/main/java-spark4_1/` — 4-arg `mapStatus` (checksumVal=0L),
  6-arg `ExternalSorter` ctor (empty `RowBasedChecksum[]`)

`build-helper-maven-plugin` selects the source root via
`${spark4.compat.source.dir}`; the `spark4` profile keeps the 4.0 default,
the `spark4.1` profile overrides it to the 4.1 variant.

`RssShuffleReader` now calls `Spark4Compat.newExternalSorter(...)` and
`RssShuffleWriter` calls `Spark4Compat.mapStatus(...)`.

The `client-spark/extension` (Spark UI) module already uses
`scala-jakarta` under the `spark4` profile; Spark 4.1's `WebUIPage.render`
keeps the same `jakarta.servlet.http.HttpServletRequest` signature, so the
existing source root works for `spark4.1` as well.

CI:
- parallel.yml: add `profile: spark4.1, java-version: 17` to the matrix.
- sequential.yml: add an `Execute -Pspark4.1` step gated on java-version 17.
…g into body

Response decoders that wrap the frame ByteBuf into a NettyManagedBuffer
body relied on TransportFrameDecoder.shouldRelease() returning false to
transfer ownership of the single ref count from the frame to the body.
Netty 4.2's stricter PooledByteBuf lifecycle exposes this fragility:
RepartitionWithHadoopHybridStorageRssTest fails with
IllegalReferenceCountException: refCnt: 0 from
NettyManagedBuffer.nioByteBuffer when running under Spark 4.1 (which
ships with netty 4.2.7.Final).

Make ownership explicit: each body-wrapping decoder now calls
byteBuf.retain() so the body has its own independent ref count, and
TransportFrameDecoder.shouldRelease() always returns true. The frame
buffer's lifetime is owned by the decoder loop; the body's lifetime is
owned by the response handler.

Updated TransportFrameDecoderTest to match the new contract: every
decoded message expects shouldRelease() == true, and body-bearing
messages release the frame buffer first, then the body.
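The ownership contract described above can be modelled without Netty. The sketch below is a toy, single-threaded model: `CountedBuffer`, `BodySketch`, and `DecoderSketch` are hypothetical names, not Netty or Uniffle classes, and the counter only imitates the retain/release rules.

```java
// Toy reference-counted buffer; it is NOT Netty's ByteBuf and is not thread-safe.
final class CountedBuffer {
    private int refCnt = 1; // the creator holds the first reference

    CountedBuffer retain() {
        if (refCnt <= 0) {
            throw new IllegalStateException("refCnt: 0");
        }
        refCnt++;
        return this;
    }

    void release() {
        if (refCnt <= 0) {
            throw new IllegalStateException("refCnt: 0");
        }
        refCnt--;
    }

    // Reading is only legal while at least one owner holds a reference.
    String read() {
        if (refCnt <= 0) {
            throw new IllegalStateException("refCnt: 0");
        }
        return "payload";
    }

    int refCnt() {
        return refCnt;
    }
}

// Hypothetical message body that wraps a buffer it has been handed.
final class BodySketch {
    private final CountedBuffer buf;

    BodySketch(CountedBuffer buf) {
        this.buf = buf;
    }

    String read() {
        return buf.read();
    }

    void release() {
        buf.release();
    }
}

final class DecoderSketch {
    // The decoder retains before wrapping, so the body owns its own count.
    static BodySketch decode(CountedBuffer frame) {
        return new BodySketch(frame.retain());
    }

    static int run() {
        CountedBuffer frame = new CountedBuffer(); // refCnt 1, owned by the decoder loop
        BodySketch body = decode(frame);           // refCnt 2
        frame.release();                           // decoder loop always releases: refCnt 1
        body.read();                               // still valid, the body holds a reference
        body.release();                            // response handler is done: refCnt 0
        return frame.refCnt();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Without the `retain()` in `decode`, the frame release would drop the count to zero and the later `body.read()` would throw, which mirrors the failure mode reported for Netty 4.2.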

Verified locally:
  mvn -Pspark4.1 -pl integration-test/spark-common,integration-test/spark4 \
      -Dtest=RepartitionWithHadoopHybridStorageRssTest test
  Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
….1.1

Spark 4.1.1 ships jackson-module-scala_2.13:2.20.0, which validates the
classpath jackson-databind version on registration and rejects anything
outside [2.20.0, 2.21.0). The spark4.1 profile previously pinned
jackson-databind/jackson-core to 2.18.2, so RDDOperationScope's static
initializer (which calls ObjectMapper.registerModule(DefaultScalaModule))
threw JsonMappingException, surfacing as ExceptionInInitializerError on
the first test (AQERepartitionTest) and NoClassDefFoundError on every
later test that touches RDDOperationScope (e.g. MapSideCombineTest via
SparkContext.parallelize).

Bump jackson.version to 2.20.0 so databind/core/module-scala stay in
lockstep under -Pspark4.1.
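To make the half-open range concrete, here is a small sketch of a `[lower, upper)` version check. It is an illustration only: `VersionRangeSketch` is a hypothetical helper that handles plain `major.minor.patch` strings, and it is not how jackson-module-scala implements its own validation.

```java
final class VersionRangeSketch {
    private VersionRangeSketch() {}

    // Parses "major.minor.patch" into three integers.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    // Compares two parsed versions component by component.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // True when lowerInclusive <= version < upperExclusive.
    static boolean inRange(String version, String lowerInclusive, String upperExclusive) {
        int[] v = parse(version);
        return compare(v, parse(lowerInclusive)) >= 0 && compare(v, parse(upperExclusive)) < 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("2.18.2", "2.20.0", "2.21.0")); // false
        System.out.println(inRange("2.20.0", "2.20.0", "2.21.0")); // true
    }
}
```

Under this reading, the previously pinned 2.18.2 falls below the lower bound, while 2.20.0 sits exactly on it and is accepted.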
@LuciferYang LuciferYang marked this pull request as ready for review May 25, 2026 08:53
@LuciferYang

@roryqi @zuston

@zuston zuston left a comment

lgtm. left a minor comment

@zuston zuston merged commit 7f08785 into apache:master May 27, 2026
45 checks passed

zuston commented May 27, 2026

merged! thanks @LuciferYang

@LuciferYang

Thank you @zuston


Development

Successfully merging this pull request may close these issues.

[Improvement] Support Spark 4.1

3 participants