SciSpark Dockerfile based on dylanmei/docker-zeppelin. Updates include:
- stand-alone configuration of HDFS
- installation of Anaconda Python
- Spark reconfigured for Anaconda Python
This image contains:
- Spark 1.6.1
- Hadoop 2.6.3
- PySpark support with Anaconda2 4.0.0 (Python 2)
The image is available on Docker Hub as pymonger/scispark-zeppelin.
To build a SciSpark Zeppelin image using the Dockerfile:
VERSION=v0.2_001
docker build --rm --force-rm -t pymonger/scispark-zeppelin:${VERSION} -f Dockerfile .
To run the built SciSpark Zeppelin image in a container:
docker run --rm -p 8080:8080 pymonger/scispark-zeppelin:${VERSION}
Zeppelin will be running at http://${YOUR_DOCKER_HOST}:8080.
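Zeppelin may take some time to start answering after the container launches. The sketch below polls the URL until it responds. The helper name wait_for_url and its defaults (30 attempts, 2 seconds apart) are assumptions for illustration, not part of the image; it also assumes curl is installed on the host.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Note: plain POSIX sh has no local variables, so these names are global.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -m 5: cap each try at 5s
    if curl -sf -m 5 -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}

# Example (run it in a second terminal, since docker run above stays in the foreground):
# wait_for_url "http://${YOUR_DOCKER_HOST}:8080"
```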
To install data into HDFS or preload notebooks into Zeppelin, customize the Dockerfile and rerun the build instructions above. Remember to increment VERSION before each rebuild.
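Incrementing the version by hand is easy to forget, so it can be scripted. This is a minimal sketch that assumes VERSION always has the form PREFIX_NNN, as in v0.2_001. The helper name bump_version is hypothetical and not part of this repository.

```shell
# Hypothetical helper: increment the zero-padded counter after the last
# underscore, keeping its width (v0.2_001 becomes v0.2_002).
bump_version() {
  prefix=${1%_*}     # text before the last underscore, e.g. v0.2
  counter=${1##*_}   # digits after the last underscore, e.g. 001
  width=${#counter}
  # Strip leading zeros so the shell does not parse the counter as octal.
  stripped=$(printf '%s' "$counter" | sed 's/^0*//')
  next=$(( ${stripped:-0} + 1 ))
  printf "%s_%0${width}d\n" "$prefix" "$next"
}

# Example rebuild with a bumped version:
# VERSION=$(bump_version "$VERSION")
# docker build --rm --force-rm -t pymonger/scispark-zeppelin:${VERSION} -f Dockerfile .
```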