
[Feature] [Spark] Support Spark 4.0 (preview) #3940

Open
YannByron opened this issue Aug 12, 2024 · 7 comments

Labels
enhancement New feature or request

Comments

@YannByron
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Support Spark 4.0 (preview1)

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
YannByron added the enhancement (New feature or request) label Aug 12, 2024
@YannByron
Contributor Author

@ulysses-you we can discuss this here.

@ulysses-you
Contributor

Thank you @YannByron for the guidance.

I looked at Spark 4.0.0-preview; the main challenge is Scala 2.13. Other changes, such as JDK 17 and interface changes, are not big issues.

As far as I can see, the Spark community paid a huge cost to support Scala 2.13 and drop Scala 2.12, and even now there are some performance regressions due to Scala 2.13, so I think it will affect Paimon significantly.

Personally, I prefer to copy paimon-spark-common into a new module paimon-spark-4.0, so that we do not need to touch the code for previous Spark versions. We can then focus on supporting Spark 4.0.0 and higher versions (and may create paimon-spark-4-common if necessary).

cc @JingsongLi, what do you think?

@awol2005ex

You can use "com.thoughtworks.enableIf" to support multiple Scala versions.
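For context, a minimal, hypothetical sketch of how the enableIf macro annotation can be used (the class and method names below are invented for illustration, not taken from the Paimon codebase): the annotated definition is kept or dropped at compile time based on the condition, so Scala-version-specific variants can live in a single source file. Macro annotations typically also require the macro paradise compiler plugin on Scala 2.12 or -Ymacro-annotations on Scala 2.13.

```scala
import com.thoughtworks.enableIf

// Hypothetical compatibility shim: only the variant matching the Scala
// version used for compilation survives; the other is removed by the macro.
class CollectionCompat {

  // Kept only when compiling against Scala 2.13.x
  @enableIf(scala.util.Properties.versionNumberString.startsWith("2.13"))
  def toImmutableSeq[A](it: Iterable[A]): Seq[A] = it.iterator.to(Seq)

  // Kept only when compiling against Scala 2.12.x
  @enableIf(scala.util.Properties.versionNumberString.startsWith("2.12"))
  def toImmutableSeq[A](it: Iterable[A]): Seq[A] = it.toSeq
}
```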

@JingsongLi
Contributor

Hi @ulysses-you @YannByron , I would like to ask whether paimon-spark-4-common and paimon-spark-common can reuse most of the code. I believe Spark 3 has very long-term support, and we also need to support Spark 4. If we end up copying a lot of code in this process, it will result in maintaining two separate codebases, which can be very costly. Therefore, my concern is whether we can reuse a significant portion of the code.

@YannByron
Contributor Author

Maybe we can make paimon-spark-common support both the scala.version and spark.version properties (Scala 2.12 and Spark 3.5.2 by default), so that paimon-spark-common is compatible with both Spark 3.5 and 4.x. Then we can provide a profile in the top-level pom to compile paimon-spark.

This approach doesn't allow compiling both Spark 3.x and Spark 4.x at the same time, and we would have to modify things like CI. But it avoids copying code and allows more reuse.

Meanwhile, paimon-spark3-common and paimon-spark4-common can easily be derived from paimon-spark-common later if required.

@JingsongLi @ulysses-you WDYT~
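To make this concrete, here is a rough sketch of what such a profile in the top-level pom might look like; the property names, version numbers, and profile id are assumptions for the sake of the example, not the actual Paimon build configuration.

```xml
<!-- Illustrative fragment of a top-level pom.xml, not the real Paimon build file. -->
<properties>
  <!-- Defaults: build paimon-spark-common against Spark 3.5 on Scala 2.12 -->
  <spark.version>3.5.2</spark.version>
  <scala.version>2.12.18</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<profiles>
  <profile>
    <!-- Activated with `mvn -Pspark4` to compile the same sources against Spark 4 / Scala 2.13 -->
    <id>spark4</id>
    <properties>
      <spark.version>4.0.0-preview1</spark.version>
      <scala.version>2.13.14</scala.version>
      <scala.binary.version>2.13</scala.binary.version>
    </properties>
  </profile>
</profiles>
```

With a layout like this, a default build targets Spark 3.5 / Scala 2.12, and a separate invocation with the profile rebuilds the same sources against Spark 4 / Scala 2.13; as noted above, only one of the two can be built per invocation, so CI would need separate jobs.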

@ulysses-you
Contributor

The main issue with reusing the module, to me, is that we need to compile the Spark module twice for the different Scala versions. But I'm +1 on @YannByron's approach if you are fine with it.

@JingsongLi
Contributor

@YannByron This approach is just like Flink with its two Scala versions. I am OK with it~
