Replies: 2 comments 4 replies
-
Do the two Hive clusters share the same HDFS cluster, or do they use separate HDFS clusters? If they use separate clusters, you could add the HDFS-related configurations to the Hive catalog properties with the appropriate configuration prefix.
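A sketch of what that could look like, assuming the prefix meant here is Gravitino's `gravitino.bypass.` passthrough for Hive catalog properties (the metastore URI and NameNode hosts below are placeholders; `drmcluster` is the logical HA name from the error further down):

```properties
# Hypothetical Hive catalog properties: forward the remote cluster's
# HDFS HA settings to the Hadoop client via the bypass prefix.
metastore.uris=thrift://metastore-host:9083
gravitino.bypass.dfs.nameservices=drmcluster
gravitino.bypass.dfs.ha.namenodes.drmcluster=nn1,nn2
gravitino.bypass.dfs.namenode.rpc-address.drmcluster.nn1=nn1-host:8020
gravitino.bypass.dfs.namenode.rpc-address.drmcluster.nn2=nn2-host:8020
gravitino.bypass.dfs.client.failover.proxy.provider.drmcluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
```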
-
The problem has been solved: the Hadoop client configuration that Spark uses needs to declare the logical nameservice names of both High Availability (HA) clusters so that either one can be resolved.
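For anyone hitting the same thing, a minimal sketch of that fix done through `spark.hadoop.*` properties (everything below is illustrative except `drmcluster`, the logical name from the stack trace; equivalent entries can live in the client's `hdfs-site.xml` instead):

```properties
# spark-defaults.conf (or --conf flags): declare BOTH HA logical
# nameservices so the Hadoop client can resolve either of them.
spark.hadoop.dfs.nameservices=cluster1ns,drmcluster
# Remote HA cluster, logical name "drmcluster"
spark.hadoop.dfs.ha.namenodes.drmcluster=nn1,nn2
spark.hadoop.dfs.namenode.rpc-address.drmcluster.nn1=remote-nn1.example.com:8020
spark.hadoop.dfs.namenode.rpc-address.drmcluster.nn2=remote-nn2.example.com:8020
spark.hadoop.dfs.client.failover.proxy.provider.drmcluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
# ...plus the matching dfs.ha.namenodes / rpc-address / proxy.provider
# entries for the local nameservice (cluster1ns here).
```

The `UnknownHostException: drmcluster` goes away once the client knows `drmcluster` is a logical nameservice rather than a hostname to look up in DNS.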
-
I configured Hive catalogs for two clusters (cluster1 and cluster2) in a metalake, and the metadata of the Hive tables from both clusters can be queried in the web UI. How do I configure these two Hive catalogs so that I can query the data of both Hive clusters in Spark?
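For context, a sketch of how the Spark side is typically wired up for this (the URI and metalake values are placeholders; the plugin class and property names follow the Gravitino Spark connector docs as I understand them):

```properties
# spark-defaults.conf: load the Gravitino Spark connector so each
# catalog in the metalake is exposed as a Spark catalog by name.
spark.plugins=org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin
spark.sql.gravitino.uri=http://gravitino-server:8090
spark.sql.gravitino.metalake=my_metalake
```

With that in place a table is addressed as `catalog.db.table`, e.g. `select * from hive103.jt.hive_students`.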
I got the following error when querying the Hive data of the other cluster in Spark:
```
25/03/05 18:47:27 ERROR SparkSQLDriver: Failed in [select * from hive103.jt.hive_students]
java.lang.IllegalArgumentException: java.net.UnknownHostException: drmcluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:466)
    at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:134)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:374)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:308)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initDFSClient(DistributedFileSystem.java:202)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:187)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.ancestorIsMetadataDirectory(FileStreamSink.scala:104)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.$anonfun$rootPaths$1(InMemoryFileIndex.scala:60)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.$anonfun$rootPaths$1$adapted(InMemoryFileIndex.scala:60)
    at scala.collection.TraversableLike.noneIn$1(TraversableLike.scala:319)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:385)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
    at scala.collection.TraversableLike.filterNot(TraversableLike.scala:403)
    at scala.collection.TraversableLike.filterNot$(TraversableLike.scala:403)
    at scala.collection.AbstractTraversable.filterNot(Traversable.scala:108)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(InMemoryFileIndex.scala:60)
    at org.apache.kyuubi.spark.connector.hive.read.HiveInMemoryFileIndex.<init>(HiveFileIndex.scala:154)
    at org.apache.kyuubi.spark.connector.hive.read.HiveCatalogFileIndex.filterPartitions(HiveFileIndex.scala:109)
    at org.apache.kyuubi.spark.connector.hive.read.HiveCatalogFileIndex.listHiveFiles(HiveFileIndex.scala:59)
    at org.apache.kyuubi.spark.connector.hive.read.HiveScan.partitions(HiveScan.scala:79)
    at org.apache.spark.sql.execution.datasources.v2.FileScan.planInputPartitions(FileScan.scala:179)
    at org.apache.spark.sql.execution.datasources.v2.FileScan.planInputPartitions$(FileScan.scala:178)
    at org.apache.kyuubi.spark.connector.hive.read.HiveScan.planInputPartitions(HiveScan.scala:41)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputPartitions$lzycompute(BatchScanExec.scala:54)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputPartitions(BatchScanExec.scala:54)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:142)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:141)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:36)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:143)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:69)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
    at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:69)
    at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:459)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:145)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:145)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:138)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:158)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:158)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:106)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:984)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:191)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1072)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1081)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: drmcluster
    ... 102 more
```