[SPARK-54002][DEPLOY] Support integrating BeeLine with Connect JDBC driver #52706
base: master
Conversation
SPARK-53934 has landed on master.
### What changes were proposed in this pull request?

This is the initial implementation of the Connect JDBC driver. In detail, this PR implements the essential JDBC interfaces listed below.

- `java.sql.Driver`
- `java.sql.Connection`
- `java.sql.Statement`
- `java.sql.ResultSet`
- `java.sql.ResultSetMetaData`
- `java.sql.DatabaseMetaData`

As a first step, this PR only supports `NullType`, `BooleanType`, `ByteType`, `ShortType`, `IntegerType`, `LongType`, `FloatType`, `DoubleType`, and `StringType`.

### Why are the changes needed?

This implements the feature proposed in [SPIP: JDBC Driver for Spark Connect](https://issues.apache.org/jira/browse/SPARK-53484).

### Does this PR introduce _any_ user-facing change?

It's a new feature.

### How was this patch tested?

New UTs are added. I have also cross-verified BeeLine cases with SPARK-54002 (#52706).

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52705 from pan3793/SPARK-53934.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
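To make the listed interfaces concrete, here is a minimal sketch of the standard `java.sql.Driver` registration/`acceptsURL` pattern that such a driver follows. The class name `ConnectDriverSketch` and the `jdbc:sc://` URL prefix are illustrative assumptions based on this discussion, not the PR's actual implementation.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Toy driver illustrating the JDBC registration pattern; the real Connect
// JDBC driver's class names and URL grammar may differ.
class ConnectDriverSketch implements Driver {
    // Assumed URL scheme, matching the "jdbc:sc://..." URLs in this thread.
    private static final String URL_PREFIX = "jdbc:sc://";

    static {
        try {
            // Standard JDBC self-registration so DriverManager can find us.
            DriverManager.registerDriver(new ConnectDriverSketch());
        } catch (SQLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(URL_PREFIX);
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // JDBC contract: return null for URLs we don't handle
        }
        // A real implementation would open a Spark Connect session here.
        throw new SQLException("sketch only: no real connection logic");
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override public int getMajorVersion() { return 0; }
    @Override public int getMinorVersion() { return 1; }
    @Override public boolean jdbcCompliant() { return false; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
}
```

Because the driver self-registers, a client such as BeeLine only needs the driver jar on its classpath and a matching URL for `DriverManager` to route the connection.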
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, I believe this PR is a little beyond the scope of SPARK-53484 (SPIP: JDBC Driver for Spark Connect), because we didn't sign off on moving away from `spark-sql-core_*.jar` for this additional module.
This PR modifies the classpath for `bin/beeline` - excluding `spark-sql-core_*.jar` and adding `jars/connect-repl/*.jar`, making it the same as `bin/spark-connect-shell`.
Specifically, I worry that this part blindly pushes users away from the classic code path.
- if (isRemote) {
+ if (isRemote || isBeeLine) {
Please make BeeLine work with the existing classpath by default. The new code path should be applied additionally via configuration or environment variables, @pan3793.
@dongjoon-hyun In general, I agree with your concerns and the idea of having a switch for the classpath assembly, but I think we can have a different default behavior, for the following reasons:
Given the above reasons, I think we can change the classpath as proposed by this PR by default and have an internal switch to restore the old behavior. Or, if we are very conservative, we can provide an opt-in switch (e.g., an env var). also cc @LuciferYang
Please note that Apache Spark's default distribution is the non-Spark-Connect version. The majority of users are not using Connect at all, so nothing here hurts the existing classic user experience.
As I mentioned, the only point on which we disagree is the following.
If this PR contains any removal of classic Spark behavior, I unfortunately need to cast a veto here.
if (isRemote && "1".equals(getenv("SPARK_SCALA_SHELL")) && project.equals("sql/core")) {
  continue;
}
if (isBeeLine && project.equals("sql/core")) {
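The launcher condition quoted above can be read as a small predicate over the build module and runtime mode. Here is a hedged sketch of that logic with a hypothetical helper name; the real code lives in Spark's launcher and iterates over modules while building the classpath.

```java
// Hypothetical helper mirroring the launcher condition quoted above:
// decide whether a module's jars should be skipped from the classpath.
class ClasspathFilterSketch {
    // scalaShellEnv stands in for getenv("SPARK_SCALA_SHELL").
    static boolean skipProject(String project, boolean isRemote,
                               boolean isBeeLine, String scalaShellEnv) {
        // Remote Scala shell: drop sql/core in favor of the Connect jars.
        if (isRemote && "1".equals(scalaShellEnv) && project.equals("sql/core")) {
            return true;
        }
        // BeeLine with the Connect JDBC driver: likewise drop sql/core.
        if (isBeeLine && project.equals("sql/core")) {
            return true;
        }
        return false;
    }
}
```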
Is it possible to detect the BeeLine-with-Connect-JDBC-driver case and apply special handling only for that specific scenario?
@LuciferYang short answer - it's possible only in some cases; details are explained at #52706 (comment)
@dongjoon-hyun @LuciferYang I have updated the PR to keep the existing logic by default. Now, for the Spark classic (default) distribution, the proposed classpath changes only take effect when `SPARK_CONNECT_BEELINE=1` is set explicitly.
### What changes were proposed in this pull request?

This PR modifies the classpath for `bin/beeline` - excluding `spark-sql-core_*.jar`, `spark-connect_*.jar`, etc., and adding `jars/connect-repl/*.jar`, making it the same as `bin/spark-connect-shell`.

Note: BeeLine itself only requires Hive jars and a few third-party utility jars to run, so excluding some `spark-*.jar`s won't break BeeLine's existing capability of connecting to the Thrift Server.

To ensure no change to classic Spark behavior, for the Spark classic (default) distribution, the above changes only take effect when `SPARK_CONNECT_BEELINE=1` is set explicitly. For convenience, this is enabled by default for the Spark Connect distribution.

### Why are the changes needed?

It's a new feature: it allows users to use BeeLine as a SQL CLI to connect to the Spark Connect server.

### Does this PR introduce any user-facing change?

No, this feature must be enabled by setting `SPARK_CONNECT_BEELINE=1` explicitly for the classic (default) Spark distribution.
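The opt-in semantics described here can be sketched as a small decision function. The method and parameter names below are illustrative assumptions, not the PR's actual code; they only encode the stated behavior (classic requires explicit `SPARK_CONNECT_BEELINE=1`, Connect distribution enables it by default).

```java
// Sketch of the opt-in gate described above (names are illustrative).
class BeeLineModeSketch {
    // envValue stands in for getenv("SPARK_CONNECT_BEELINE").
    static boolean useConnectClasspath(boolean isConnectDistribution, String envValue) {
        if (isConnectDistribution) {
            return true; // enabled by default for the Spark Connect distribution
        }
        return "1".equals(envValue); // classic distribution: explicit opt-in only
    }
}
```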
### How was this patch tested?

Launch a Connect Server first; in my case, the Connect Server (v4.1.0-preview2) runs at `sc://localhost:15002`. To ensure the changes won't break the Thrift Server use case, also launch a Thrift Server at `thrift://localhost:10000`.

#### Testing for dev mode

Building

Without setting `SPARK_CONNECT_BEELINE=1`, it fails as expected with `No known driver to handle "jdbc:sc://localhost:15002"`.

With `SPARK_CONNECT_BEELINE=1` set, it works as expected.

Also, test with the Thrift Server to ensure no impact on existing functionality. It works as expected both with and without `SPARK_CONNECT_BEELINE=1`.

#### Testing for Spark distribution

Spark classic distribution

Spark connect distribution

### Was this patch authored or co-authored using generative AI tooling?

No.