Stuck at Pending state #25

Open · mihdih opened this issue Apr 8, 2019 · 7 comments

@mihdih commented Apr 8, 2019

Hi guys! First off, thank you for sharing your awesome project. I have a quick question, though: what are the required components for a basic paragraph/job to run, say a Pi example? Do I also need to run the RSS chart, or is the zeppelin-with-spark chart enough?

As stated, I can't seem to successfully run a paragraph using the chart under the master branch. I was just testing a simple Pi job. There was no relevant info in the zeppelin-server log; it was simply stuck in the PENDING state.

Observation:

Everything works fine when using Spark 2.4, the one under the chart_upgrade_2.4 branch.

@dshirish (Contributor) commented Apr 8, 2019

If you just want to use Zeppelin, then the zeppelin-with-spark chart alone is enough; the RSS chart is not necessary. Make sure that you have the appropriate permissions, as mentioned here.
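
For reference, a quick way to verify those permissions is kubectl auth can-i, impersonating the service account the Zeppelin server pod runs under (the namespace and account name below are assumptions; substitute your own):

    # Can the service account create driver/executor pods?
    kubectl auth can-i create pods --as=system:serviceaccount:default:default

    # Spark on Kubernetes also creates a service for the driver
    kubectl auth can-i create services --as=system:serviceaccount:default:default

If either command prints "no", the interpreter's Spark pods can't be created, and a paragraph could well sit in PENDING.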

Individual charts in chart_upgrade_2.4 should work, but the umbrella chart is yet to be updated.
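
For completeness, installing just that chart would look something like this with Helm 2, which was current at the time (the release name and chart path are assumptions based on the repo layout):

    # Install only the zeppelin-with-spark chart from a local checkout
    helm install --name zeppelin ./charts/zeppelin-with-spark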

@mihdih (Author) commented Apr 8, 2019

Thanks @dshirish for the quick reply. Actually, I did that already; I even made the role cluster-admin just to be sure, but it didn't work :(. I'm kind of stuck, as there was no info in the logs. I know this because there was no additional log output when I ran the job; it's the same log as when I initially spun up the Zeppelin server.
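
For the record, the binding was created along these lines (the binding name, namespace, and service account are assumptions; adjust them to your deployment):

    # Bind cluster-admin to the service account the zeppelin-server pod uses
    kubectl create clusterrolebinding zeppelin-admin \
      --clusterrole=cluster-admin \
      --serviceaccount=default:default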

@mihdih (Author) commented Apr 8, 2019

Additional observation:

I've tried to manually submit a job from inside the zeppelin-server pod, both in cluster and in client mode, and it was a success. So I guess the permissions are fine.

Is there a required argument or additional option that I need to add to SPARK_SUBMIT_OPTIONS, perhaps? The default doesn't seem to work.
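
For reference, the manual submission was along these lines (the Spark 2.4-style Kubernetes options, image name, and example jar path are assumptions; adjust them for your image and Spark version):

    # SparkPi in cluster mode, run from inside the zeppelin-server pod
    spark-submit \
      --master k8s://https://kubernetes.default.svc \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.instances=1 \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar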

@mihdih (Author) commented Apr 24, 2019

Hi @dshirish, good day! We've got an additional observation; hope you can help.

I found this in the interpreter log, and it seems like it got stuck at this line:

 INFO [2019-04-24 00:06:35,346] ({Thread-0} RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server on port 33461

Info on the platform where it is running (baremetal):

Kubernetes: 1.13.2
Docker: 18.6.1
OS: Ubuntu 16.04
Kernel: 4.15.x

Also, it worked when I tried running it on GKE (a different cluster) with k8s version 1.11.7. Based on the logs, the expected next lines after the point where it got stuck are the following:

 INFO [2019-04-23 21:10:03,901] ({Thread-0} RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server on port 36659
 INFO [2019-04-23 21:10:04,585] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter
 INFO [2019-04-23 21:10:04,618] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
 INFO [2019-04-23 21:10:04,624] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter
 INFO [2019-04-23 21:10:04,642] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter
 INFO [2019-04-23 21:10:04,650] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.SparkRInterpreter
 INFO [2019-04-23 21:10:04,750] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1556053804744 started by scheduler org.apache.zeppelin.spark.SparkInterpreter808800579
 ...

So I'm not sure if this is just a version-incompatibility problem or perhaps something about the platform itself, since these are baremetal servers.
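
In case it helps with debugging: when the log stops right after "Starting remote interpreter server on port NNNNN", one thing worth checking is whether the zeppelin-server process can actually reach that port, since the server connects back to the interpreter's Thrift server after it starts. A minimal connectivity check, assuming the pod name and IP below (and that nc is available in the image):

    # Test TCP connectivity from the zeppelin-server pod to the interpreter port
    kubectl exec -it <zeppelin-server-pod> -- nc -vz <interpreter-ip> 33461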

@dshirish (Contributor) commented:
Are you using the chart_upgrade_2.4 branch? We haven't tried the charts on baremetal servers.

@mihdih (Author) commented Apr 25, 2019

@dshirish sorry for the late response. No, I used the one in the master branch, where the Spark version is 2.2.0.

@mihdih (Author) commented May 19, 2019

By the way, we finally found the problem that's causing the behaviour. It turns out that the interpreter launched in the backend was actually the snappydata interpreter, even though spark was explicitly set. We tried setting the %spark handle as well, and it still didn't help.

Example:

root      9196  0.0  0.0  19784  3440 ?        S    22:39   0:00 /bin/bash /zeppelin/bin/interpreter.sh -d /zeppelin/interpreter/snappydata -p 32793 -l /zeppelin/local-repo/2EAMAN95E
root      9207  0.0  0.0  19788  2000 ?        S    22:39   0:00 /bin/bash /zeppelin/bin/interpreter.sh -d /zeppelin/interpreter/snappydata -p 32793 -l /zeppelin/local-repo/2EAMAN95E

Now I'm not sure if this is just the default interpreter, or if there's a config that overrides the user-set interpreter, or if it's indeed a bug. I only tested it with spark, though, as that's what we will be using.
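
For anyone who wants to check the same thing, the process list above came from something like this (the label selector is an assumption; adjust it to match your deployment):

    # Show which interpreter the zeppelin-server pod actually launched;
    # the -d flag in the output is the interpreter directory that was used
    kubectl exec $(kubectl get pod -l app=zeppelin -o name) -- ps aux | grep interpreter.sh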
