# spark-on-k8s

![spark-on-k8s overview](Screen-Shot-2020-07-27-at-05-03-34.png)

In this repo, I want to show you a couple of different ways to work with Spark outside of a Hadoop cluster. Kubernetes clusters are becoming more and more common in companies of all sizes, and using their power to run Spark is attractive. With this in mind, I'd like to invite you to join me on a learning journey in search of broader options for doing Big Data.

You're about to face three ways of running Spark on containers, summarized in the table below:

I deeply hope you have fun with this experience and come out more confident about stepping outside of your traditional Hadoop cluster :)

## How to set everything up

Click HERE to follow the step-by-step guide :)

## So far...

| Mode | Status |
| --- | --- |
| K8s: Spark-Submit | OK |
| GCP/spark-on-k8s-operator | OK (currently in Beta) |
| Docker: Jupyter PySpark | OK |
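
To give a concrete feel for running Spark against Kubernetes, here is a minimal PySpark sketch (my own illustration, not code from this repo). The API server URL, container image, and namespace are placeholders you'd replace with your cluster's values:

```python
from pyspark.sql import SparkSession

# Placeholder values -- adjust for your own cluster and registry.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")  # hypothetical API server URL
    .appName("spark-on-k8s-demo")
    .config("spark.kubernetes.container.image", "my-registry/spark-py:3.0.0")  # hypothetical image
    .config("spark.kubernetes.namespace", "spark")  # hypothetical namespace
    .config("spark.executor.instances", "2")
    .getOrCreate()
)

# Quick smoke test: distribute a range across the executors and count it.
print(spark.range(1_000_000).count())
spark.stop()
```

The same session-building pattern works from a Jupyter PySpark container; only the master URL and image configs change between the modes in the table above.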

## Architecture

![Architecture diagram](Screen-Shot-2020-07-27-at-04-43-52.png)