
[SPARK-49489][SQL][HIVE] HMS client respects hive.thrift.client.max.message.size #50022

Draft
wants to merge 2 commits into base: master

Conversation

@pan3793 (Member) commented Feb 20, 2025

What changes were proposed in this pull request?

Port HIVE-26633 for Spark HMS client.
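
For context (not part of this PR's diff): HIVE-26633 raises the limit by mutating the TConfiguration attached to the Thrift client's transport. Below is a minimal Scala sketch of that idea, assuming the real HiveMetaStoreClient has already been unwrapped from its proxies and the configured size has already been resolved to bytes; the private field name "transport" comes from Hive 2.3.x sources, and the method name is hypothetical, so the PR's actual code may differ.

import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
import org.apache.thrift.transport.TTransport

// Sketch only: apply an already-resolved byte limit to the TConfiguration of the
// HMS client's underlying Thrift transport. The actual PR may unwrap the client
// and refresh the transport's cached limits differently.
def setMaxThriftMessageSize(realMsc: HiveMetaStoreClient, maxBytes: Int): Unit = {
  if (maxBytes > 0) {
    // "transport" is the private TTransport field of HiveMetaStoreClient in Hive 2.3.x.
    val transportField = classOf[HiveMetaStoreClient].getDeclaredField("transport")
    transportField.setAccessible(true)
    val transport = transportField.get(realMsc).asInstanceOf[TTransport]
    // Some transports carry no TConfiguration; skip silently in that case.
    Option(transport.getConfiguration).foreach(_.setMaxMessageSize(maxBytes))
  }
}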

Why are the changes needed?

See discussion in #46468 (comment)

Does this PR introduce any user-facing change?

No, it fixes the bug introduced by an unreleased change, namely SPARK-47018.

How was this patch tested?

This must be tested manually, as the existing Spark UTs do not cover remote HMS cases.

I constructed a test case on a test Hadoop cluster with a remote HMS.

First, create a table with a large number of partitions.

$ spark-sql --num-executors=6 --executor-cores=4 --executor-memory=1g \
    --conf spark.hive.exec.dynamic.partition.mode=nonstrict \
    --conf spark.hive.exec.max.dynamic.partitions=1000000
spark-sql (default)> CREATE TABLE p PARTITIONED BY (year, month, day) STORED AS PARQUET AS
SELECT /*+ REPARTITION(200) */ * FROM (
  (SELECT CAST(id AS STRING) AS year FROM range(2000, 2100)) JOIN
  (SELECT CAST(id AS STRING) AS month FROM range(1, 13)) JOIN
  (SELECT CAST(id AS STRING) AS day FROM range(1, 31)) JOIN
  (SELECT 'this is some data' AS data)
);
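
(That CTAS yields 100 × 12 × 30 = 36,000 partitions, from years 2000-2099, months 1-12, and days 1-30, so a partition-name listing from the HMS easily exceeds a small Thrift message limit.)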

Then tune hive.thrift.client.max.message.size and run a query that triggers a getPartitions Thrift call. For example, when it is set to 1mb, the query throws TTransportException: MaxMessageSize reached, and the exception disappears after raising the value.

$ spark-sql --conf spark.hive.thrift.client.max.message.size=1mb
spark-sql (default)> SHOW PARTITIONS p;
...
2025-02-20 15:18:49 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. listPartitionNames
org.apache.thrift.transport.TTransportException: MaxMessageSize reached
	at org.apache.thrift.transport.TEndpointTransport.checkReadBytesAvailable(TEndpointTransport.java:81) ~[libthrift-0.16.0.jar:0.16.0]
	at org.apache.thrift.protocol.TProtocol.checkReadBytesAvailable(TProtocol.java:67) ~[libthrift-0.16.0.jar:0.16.0]
	at org.apache.thrift.protocol.TBinaryProtocol.readListBegin(TBinaryProtocol.java:297) ~[libthrift-0.16.0.jar:0.16.0]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_names_result$get_partition_names_resultStandardScheme.read(ThriftHiveMetastore.java) ~[hive-metastore-2.3.10.jar:2.3.10]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_names_result$get_partition_names_resultStandardScheme.read(ThriftHiveMetastore.java) ~[hive-metastore-2.3.10.jar:2.3.10]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_names_result.read(ThriftHiveMetastore.java) ~[hive-metastore-2.3.10.jar:2.3.10]
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88) ~[libthrift-0.16.0.jar:0.16.0]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition_names(ThriftHiveMetastore.java:2458) ~[hive-metastore-2.3.10.jar:2.3.10]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition_names(ThriftHiveMetastore.java:2443) ~[hive-metastore-2.3.10.jar:2.3.10]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionNames(HiveMetaStoreClient.java:1487) ~[hive-metastore-2.3.10.jar:2.3.10]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173) ~[hive-metastore-2.3.10.jar:2.3.10]
	at jdk.proxy2/jdk.proxy2.$Proxy54.listPartitionNames(Unknown Source) ~[?:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2349) ~[hive-metastore-2.3.10.jar:2.3.10]
	at jdk.proxy2/jdk.proxy2.$Proxy54.listPartitionNames(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionNames(Hive.java:2461) ~[hive-exec-2.3.10-core.jar:2.3.10]
	at org.apache.spark.sql.hive.client.Shim_v2_0.getPartitionNames(HiveShim.scala:976) ~[spark-hive_2.13-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
...
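
For example (the value below is only illustrative, not taken from the PR), raising the limit lets the same query complete:

$ spark-sql --conf spark.hive.thrift.client.max.message.size=128mb
spark-sql (default)> SHOW PARTITIONS p;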

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label on Feb 20, 2025
@pan3793 (Member, Author) commented Feb 20, 2025

@shrprasa @wangyum @Madhukar525722 I just got it working; it would be great if you could test it against your internal cases. I will polish the code based on the GHA results and your feedback.

@shrprasa (Contributor)

Thanks @pan3793. We will test and update.

case proxy if JdkProxy.isProxyClass(proxy.getClass) =>
  JdkProxy.getInvocationHandler(proxy) match {
    case syncHandler if syncHandler.getClass.getName.endsWith("SynchronizedHandler") =>
      val realMscField = SparkClassUtils.classForName(
A reviewer (Contributor) commented:

Perhaps we can directly use syncHandler.getClass().getDeclaredField("client") here, and there should be no practical difference in effect.

@LuciferYang (Contributor) commented Feb 20, 2025:

Then, perhaps we can try to refactor a bit here, maybe like

JdkProxy.getInvocationHandler(proxy) match {
  case syncHandler if syncHandler.getClass.getName.endsWith("SynchronizedHandler") =>
    val realMsc = getFieldValue(syncHandler, "client").asInstanceOf[IMetaStoreClient]
    configureMaxThriftMessageSize(hiveConf, realMsc)

  case retryHandler: RetryingMetaStoreClient =>
    val realMsc = getFieldValue(retryHandler, "base").asInstanceOf[IMetaStoreClient]
    configureMaxThriftMessageSize(hiveConf, realMsc)

  case _ => // do nothing
}

private def getFieldValue(obj: AnyRef, fieldName: String): AnyRef = {
  val field = obj.getClass.getDeclaredField(fieldName)
  field.setAccessible(true)
  field.get(obj)
}

hmm... Perhaps the exception type caught in the catch block needs to be changed.
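
For what it's worth, here is a hedged sketch of one way the catch block could look after such a refactor: getDeclaredField and Field.get throw NoSuchFieldException and IllegalAccessException rather than ClassNotFoundException, so catching their common parent ReflectiveOperationException is one option. The surrounding try and the log message are assumptions, not the PR's actual code.

try {
  JdkProxy.getInvocationHandler(proxy) match {
    // ... cases as suggested above ...
    case _ => // do nothing
  }
} catch {
  // NoSuchFieldException, IllegalAccessException and ClassNotFoundException all
  // extend ReflectiveOperationException, so one case covers the reflection failures.
  case e: ReflectiveOperationException =>
    logWarning("Failed to apply hive.thrift.client.max.message.size to the HMS client", e)
}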
