Stuck at Pending state #25

Open · mihdih opened this issue Apr 8, 2019 · 7 comments

@mihdih commented Apr 8, 2019

Hi guys! First off, thank you for sharing your awesome project. I have a quick question, though: what are the required components for a basic paragraph/job to run, say a Pi example? Do I also need to run the RSS chart, or is the zeppelin-with-spark chart enough?

As stated, I can't seem to successfully run a paragraph using the chart under the master branch. I was just testing a simple Pi job. There was no relevant info in the zeppelin-server log; it was simply stuck in the PENDING state.

Observation:

Everything works fine when using Spark 2.4, the one under the chart_upgrade_2.4 branch.

@dshirish (Contributor) commented Apr 8, 2019

If you just want to use Zeppelin, then the zeppelin-with-spark chart alone is enough; the RSS chart is not necessary. Make sure that you have the appropriate permissions, as mentioned here.
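
For reference, a quick way to verify those permissions is kubectl auth can-i, impersonating the service account the Zeppelin server pod runs under (the namespace and account name below are assumptions; substitute your own):

    # Can the service account create driver/executor pods?
    kubectl auth can-i create pods --as=system:serviceaccount:default:default

    # Spark on Kubernetes also creates a service for the driver
    kubectl auth can-i create services --as=system:serviceaccount:default:default

If either command prints "no", the interpreter's Spark pods can't be created, and a paragraph could well sit in PENDING.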

Individual charts in chart_upgrade_2.4 should work, but the umbrella chart is yet to be updated.
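
For completeness, installing just that chart would look something like this with Helm 2, which was current at the time (the release name and chart path are assumptions based on the repo layout):

    # Install only the zeppelin-with-spark chart from a local checkout
    helm install --name zeppelin ./charts/zeppelin-with-spark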

@mihdih (Author) commented Apr 8, 2019

Thanks @dshirish for the quick reply. Actually, I did that already; I even made the role cluster-admin just to be sure, but it didn't work :(. I'm kind of stuck, as there was no info in the logs. I know this because there was no additional log output when I ran the job; it's the same log as when I initially spun up the Zeppelin server.
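
For the record, the binding was created along these lines (the binding name, namespace, and service account are assumptions; adjust them to your deployment):

    # Bind cluster-admin to the service account the zeppelin-server pod uses
    kubectl create clusterrolebinding zeppelin-admin \
      --clusterrole=cluster-admin \
      --serviceaccount=default:default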

@mihdih (Author) commented Apr 8, 2019

Additional observation:

I've tried to manually submit a job from inside the zeppelin-server pod, both in cluster and in client mode, and it was a success. So I guess the permissions are fine.

Is there a required argument or additional option that I need to add to SPARK_SUBMIT_OPTIONS, perhaps? The default doesn't seem to work.
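
For reference, the manual submission was along these lines (the Spark 2.4-style Kubernetes options, image name, and example jar path are assumptions; adjust them for your image and Spark version):

    # SparkPi in cluster mode, run from inside the zeppelin-server pod
    spark-submit \
      --master k8s://https://kubernetes.default.svc \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.instances=1 \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar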

@mihdih (Author) commented Apr 24, 2019

Hi @dshirish, good day! We've got an additional observation; hope you can help.

I found this in the interpreter log, and it seems like it got stuck at this line:

 INFO [2019-04-24 00:06:35,346] ({Thread-0} RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server on port 33461

Info on the platform where it is running (baremetal):

Kubernetes: 1.13.2
Docker: 18.6.1
OS: Ubuntu 16.04
Kernel: 4.15.x

Also, it worked when I tried running it on GKE (a different cluster) with k8s version 1.11.7. Based on the logs, the expected next lines after the point where it got stuck are the following:

 INFO [2019-04-23 21:10:03,901] ({Thread-0} RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server on port 36659
 INFO [2019-04-23 21:10:04,585] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter
 INFO [2019-04-23 21:10:04,618] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
 INFO [2019-04-23 21:10:04,624] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter
 INFO [2019-04-23 21:10:04,642] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter
 INFO [2019-04-23 21:10:04,650] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:198) - Instantiate interpreter org.apache.zeppelin.spark.SparkRInterpreter
 INFO [2019-04-23 21:10:04,750] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1556053804744 started by scheduler org.apache.zeppelin.spark.SparkInterpreter808800579
 ...

So I'm not sure if this is just a version-incompatibility problem or perhaps something about the platform itself, since these are baremetal servers.
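
In case it helps with debugging: when the log stops right after "Starting remote interpreter server on port NNNNN", one thing worth checking is whether the zeppelin-server process can actually reach that port, since the server connects back to the interpreter's Thrift server after it starts. A minimal connectivity check, assuming the pod name and IP below (and that nc is available in the image):

    # Test TCP connectivity from the zeppelin-server pod to the interpreter port
    kubectl exec -it <zeppelin-server-pod> -- nc -vz <interpreter-ip> 33461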

@dshirish (Contributor) commented:
Are you using the chart_upgrade_2.4 branch? We haven't tried the charts on baremetal servers.

@mihdih (Author) commented Apr 25, 2019

@dshirish sorry for the late response. No, I used the one in the master branch, where the Spark version is 2.2.0.

@mihdih (Author) commented May 19, 2019

By the way, we finally found the problem that's causing the behaviour. It turns out that the interpreter launched in the backend was actually the snappydata interpreter, even though spark was explicitly set. We tried setting the %spark handle as well, and it still didn't help.

Example:

root      9196  0.0  0.0  19784  3440 ?        S    22:39   0:00 /bin/bash /zeppelin/bin/interpreter.sh -d /zeppelin/interpreter/snappydata -p 32793 -l /zeppelin/local-repo/2EAMAN95E
root      9207  0.0  0.0  19788  2000 ?        S    22:39   0:00 /bin/bash /zeppelin/bin/interpreter.sh -d /zeppelin/interpreter/snappydata -p 32793 -l /zeppelin/local-repo/2EAMAN95E

Now I'm not sure if this is just the default interpreter, or if there's a config that overrides the user-set interpreter, or if it's indeed a bug. I only tested it with spark, though, as that's what we will be using.
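
For anyone who wants to check the same thing, the process list above came from something like this (the label selector is an assumption; adjust it to match your deployment):

    # Show which interpreter the zeppelin-server pod actually launched;
    # the -d flag in the output is the interpreter directory that was used
    kubectl exec $(kubectl get pod -l app=zeppelin -o name) -- ps aux | grep interpreter.sh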
