FAQ #2481

pan3793 · 2022-04-26T03:42:19Z

pan3793
Apr 26, 2022
Collaborator

Where should I ask questions/report bugs?

We encourage users to ask question in Mailing List and GitHub Discussions, and report bugs in GitHub Issues

Are there IM discussion groups for Kyuubi?

Yes, please join our WeChat group via Official Accounts(公众号) "apachekyuubi", or Slack Channel.
We encourage people to discuss in English, but Chinese is fine too.

Why did my question get no responses?

First of all, Apache Kyuubi is a community-driven open-source project, many PMC members/Committers/Contributors/Users are active in the community to answer questions. But please be clear, they are all volunteers, providing free community support without any SLA guarantee.

Please describe your question as clearly as enough, with sufficient context. How To Ask Questions The Smart Way or 提问的智慧 is a good principle for asking questions.

Some NEGATIVE examples won't get active responses.

Kyuubi does not work on my machine, is there a bug?
(Kyuubi 运行不起来，是不是有 bug？)

It is true that some errors have obvious characteristics, but the causes of most errors are complex and require sufficient context to analyze. The Hadoop ecosystem has a long history of over a decade, it is composed of a bunch of components and is quite complex today, the "DOES NOT WORK" has countless possibilities.

Have you checked Docs - Quick Start to make sure your environment meets the Kyuubi setup requirements?

Have you read/analyzed the error messages and stacktrace before asking?

Have you provided enough information for volunteers to help dig out? e.g.

components' version

configurations

stacktrace

logs(of each components, with context, not only the error snippets)

How to reproduce the issue?

Changed the code by self, or compiled by self, but does not mention that when asking questions.
(自行修改源码、自行编译，但提问时未告知)

It absolutely misleads the volunteer who helps you dig out the issue, and wastes both time.

If your self-compiled version does not work as expected, please try the Official Released version first to quickly identify if the issue is caused by compiling.

If you changed the code, similar to the above case, please try the Official Released version first to quickly identify if the issue is caused by your change.

Please provide your Baseline Version, Compiling Command, and "What's Your Change" with the question.

pan3793 · 2023-02-10T02:04:55Z

pan3793
Feb 10, 2023
Collaborator Author

Is Kyuubi a computing engine?

No. Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on lakehouses.

If you are familiar w/ Web Service stacks, the relationship between Kyuubi and Computing Engine(Spark, Flink, etc) just like Nginx and Web Service.

Does Kyuubi support the feature XXX of Spark, Flink, or other computing engines?

As a gateway, Kyuubi transparently supports most of the engine features, please read the engine's documentation to enable those features to verify if they work before asking on Kyuubi community.

Does Kyuubi support CDH / HDP?

Yes.

Integrate Kyuubi w/ CDH5 (Chinese/中文)

Integrate Kyuubi w/ CDH6 (Chinese/中文)

Integrate Kyuubi w/ Ambari (Chinese/中文)

A third-party Kyuubi Parcel for CDH/CDP platform

How to build Kyuubi?

We recommend using the official pre-built binary tarball for most users, but if you want to try unreleased features or apply the custom patches, the recommended build command for binary tarball is
./build/dist --tgz --web-ui --spark-provided --flink-provided --hive-provided
PS: --web-ui is added since 1.8. It was tested on macOS and Linux, for Windows users, please use WSL2. For other components like authz, please refer to the developer documentation.

Does Kyuubi HA work in Active-Standby mode?

No. Kyuubi HA works in Active-Active mode.

How does Kyuubi HA work?

It's a client-side load-balancing mode. After the Kyuubi Server is started, it will register itself in Zookeeper, and the client can select one of the Kyuubi Servers to connect to. If something is wrong with one of the Kyuubi Servers, the client is responsible for catching the exception and finding another healthy Kyuubi Server to reconnect and recover the session state(e.g. variables in session scope).

What is the client's strategy for choosing the Kyuubi server?

It depends on the client's implementation. Hive JDBC Driver and Kyuubi Hive JDBC Driver use random strategy.

kyuubi-ctl's output looks not good.

If it looks like

                       Zookeeper service nodes
???????????????????????????????????????????????????????????????????????
? Namespace           ? Host               ? Port  ? Version          ?
???????????????????????????????????????????????????????????????????????
? /spark-thrfitserver ? kyuubi1.jd.xxx.org ? 20008 ? 1.6.1-incubating ?
???????????????????????????????????????????????????????????????????????
? /spark-thrfitserver ? kyuubi2.jd.xxx.org ? 20008 ? 1.6.1-incubating ?
???????????????????????????????????????????????????????????????????????
2 row(s)

try export LC_ALL=C.UTF-8(debian) or export LC_ALL=en_US.UTF-8(CentOS) to recover.

HUE failed to run SQL w/ Error operating EXECUTE_STATEMENT: ... Broken pipe(Write failed)

It is because HUE are trying to connect a broken session, HUE does not auto recreate session in this case(at least in CDH6 shipped versions), please click the button on the HUE top area to recreate the session. See more discussion in #2877

Superset can not show table name list correctly, all table names are "default".

There is a known issue apache/superset#16509 when using dropbox/PyHive to connect Kyuubi/Spark. A forked acryldata/PyHive fixed the table name issue, see detail at apache/superset#20167.

Got error org.apache.kyuubi.KyuubiSQLException: Invalid SessionHandle

It's a kind of general error, just like NullPointerException. It indicates that the client uses an invalid(expired or never existed) session handle (identified by a uuid) to access Kyuubi Server. Usually, it may be caused by the following cases:

Kyuubi Server will close idle connection (kyuubi.session.idle.timeout, default is 6 hours), if you are using JDBC Connection Pool like Hikari, the connection obtained from the pool may hold an expired session handle, please do health check before using to avoid such issue.

Deploy multiple Kyuubi instances and access the REST API via Load Balancer like nginx. The HTTP requests may be forwarded to different Kyuubi instances by the Load Balancer, currently, Kyuubi does not share the session state across multiple instances, then Invalid SessionHandle will happen if you create a session in one instance and then run a query in another instance. See detail at org.apache.kyuubi.KyuubiSQLException: Invalid SessionHandle when multiple Kyuubi server is running #3413

How does one Kyuubi instance support multiple versions/types of the engine(Spark or Flink or other engines)?

Kyuubi runs a shell command to launch an engine instance if necessary. For the Spark engine, it calls $SPARK_HOME/bin/spark-submit ... to launch a Spark application as an engine instance, so users can overwrite kyuubi.engine.type and the related environment variables to use a different engine client.
jdbc:hive2://kyuubi:10009/default;#spark.app.name=my-app;kyuubi.engine.type=SPARK_SQL;kyuubi.engineEnv.SPARK_HOME=/opt/spark-3.2

Use GUI tools like DBeaver/DataGrip to access Kyuubi, refreshing table list is extremely slow.

You may find bunches of GetTables requests take a long time in the Server/Engine's log. They were triggered by GUI tools in the background to get table/column metadata for display or query completion. Unfortunately, the Kyuubi Spark engine uses an inefficient way to retrieve table properties(e.g. COMMENT) which results in the whole process time too long. You can skip the procedure by enabling kyuubi.operation.getTables.ignoreTableProperties if such metadata does not matter to you.

Periodically error message in the Kyuubi server's log.

ERROR KyuubiTBinaryFrontendHandler-pool: Thread-186 org.apache.thrift.server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.ISaslTransportException: No data or no sasl data in the stream
        at org.apache.thrift.transport.TSaslSeerTransport$Factorv.getTransport(TSas]ServerTransport.java:219)
        ...

It is typically caused by accessing the Kyuubi THRIFT_BINARY endpoint (the default port is 10009) with a different protocol, e.g. using curl to send an HTTP request to access THRIFT_BINARY port for health checking.

When using interactive Python language in Spark on YARN cluster mode, facing the following errors

Caused by: java.lang.IllegalStateException: SPARK_HOME not found!
	at org.apache.kyuubi.engine.spark.operation.ExecutePython$.$anonfun$defaultSparkHome$6(ExecutePython.scala:336) ~[__app__.jar:?]
	at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
	at org.apache.kyuubi.engine.spark.operation.ExecutePython$.defaultSparkHome(ExecutePython.scala:336) ~[__app__.jar:?]
	at org.apache.kyuubi.engine.spark.operation.ExecutePython$.$anonfun$createSessionPythonWorker$6(ExecutePython.scala:279) ~[__app__.jar:?]
        ...

An additional configuration spark.yarn.isPython=true should be provided, then spark-submit will upload and distribute the Python artifacts.

When deploying Kyuubi on K8s, and using ingress to access the Kyuubi server, the client may get connection broken occasionally.

Unexpected end of file when reading from HS2 server.
The root cause might be too many concurrent connections.
Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)

By default, BeeLine/JDBC/PyHive uses the THRIFT-BINARY protocol which is a TCP-based protocol, while the nginx-ingress has a known issue on reloading - reconfiguring causes nginx reload thus all long live sockets will close in time. See more details and workaround at kubernetes/ingress-nginx#2461

I want to use Kyuubi to execute queries which may produce tons of records, does Kyuubi have optimation for such cases?

Specific to the Spark SQL engine, yes. For details, please take a look at Solution for Large Query Results.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FAQ #2481

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

FAQ #2481

pan3793 Apr 26, 2022 Collaborator

Replies: 1 comment

pan3793 Feb 10, 2023 Collaborator Author

pan3793
Apr 26, 2022
Collaborator

pan3793
Feb 10, 2023
Collaborator Author