Data transport problem #24

Open
jtlz2 opened this issue Mar 13, 2019 · 1 comment

Comments


jtlz2 commented Mar 13, 2019

Having deployed using your charts, and after running a hello-world pi calculation, I am trying to execute some simple commands in Jupyter, based on https://github.com/jadianes/spark-py-notebooks/tree/master/nb1-rdd-creation

Note that the kernel has to be set manually to python2, since it defaults to python3.
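One reason: urllib.urlretrieve in the code below exists only in Python 2. Under the default python3 kernel, a roughly equivalent download (a sketch, using the relocated urllib.request.urlretrieve) would be:

from urllib.request import urlretrieve  # Python 3 location of urlretrieve

# Fetch the 10% KDD Cup '99 sample to the notebook pod's local disk
urlretrieve("http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz",
            "kddcup.data_10_percent.gz")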

from pyspark.sql import SparkSession
import urllib

spark = SparkSession\
      .builder\
      .appName("PythonPi")\
      .config("spark.app.name", "spark-pi")\
      .config("spark.executor.instances", "2")\
      .getOrCreate()

# Download the dataset to the notebook pod's local filesystem (Python 2 urllib)
f = urllib.urlretrieve("http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz", "kddcup.data_10_percent.gz")

sc = spark.sparkContext

# Relative path: resolved against the driver's working directory as a
# file:// URL, which each executor then looks up on its own local disk
data_file = "./kddcup.data_10_percent.gz"
raw_data = sc.textFile(data_file)

# The next line raises an error:
raw_data.count()
Py4JJavaErrorTraceback (most recent call last)
[...]
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 4 times, most recent failure: Lost task 0.3 in stage 24.0 (TID 52, 10.2.0.25, executor 1): java.io.FileNotFoundException: File file:/home/jovyan/kddcup.data_10_percent.gz does not exist

How do I make the data available to all Spark workers in the k8s cluster?

dshirish (Contributor) commented

To make the data file available to the executors as well, you can keep it on an HDFS-compatible file system (for example S3, GCS, or HDFS) and use the appropriate URI in the sc.textFile() call.
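For example, here is a minimal sketch of the same read against an object store, assuming the file has been uploaded to a hypothetical bucket named my-bucket and that the cluster image ships the matching Hadoop connector (hadoop-aws for s3a://, or the GCS connector for gs://):

from pyspark.sql import SparkSession

spark = SparkSession\
      .builder\
      .appName("PythonPi")\
      .config("spark.executor.instances", "2")\
      .getOrCreate()
sc = spark.sparkContext

# Every executor resolves this URI against the shared object store,
# so the file no longer has to exist on each pod's local filesystem.
# "my-bucket" is a placeholder for your own bucket name.
raw_data = sc.textFile("s3a://my-bucket/kddcup.data_10_percent.gz")
print(raw_data.count())

The original failure happens because "./kddcup.data_10_percent.gz" is resolved as a local file:// path on every node, but the download only ran on the notebook pod; a shared store (or a volume mounted into all pods) removes that asymmetry.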
