# spark-on-k8s

![spark-on-k8s overview](Screen-Shot-2020-07-27-at-05-03-34.png)

In this repo, I want to show you a couple of different ways to work with Spark outside of a Hadoop cluster. Kubernetes clusters are becoming more and more common in companies of all sizes, and using their power to run Spark is attractive. With this in mind, I'd like to invite you to join me on a learning journey in search of broader options for doing Big Data.

You're about to face three ways of running Spark on containers, summarized in the table below:

I deeply hope you have fun with this experience and come out more confident about stepping outside of your traditional Hadoop cluster :)

## How to set everything up

Click HERE to follow the step-by-step guide :)

## So far...

| Mode | Status |
| --- | --- |
| K8s: Spark-Submit | OK |
| GCP/spark-on-k8s-operator | OK (currently in Beta) |
| Docker: Jupyter PySpark | OK |
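
To give a concrete feel for running Spark against Kubernetes, here is a minimal PySpark sketch (my own illustration, not code from this repo). The API server URL, container image, and namespace are placeholders you'd replace with your cluster's values:

```python
from pyspark.sql import SparkSession

# Placeholder values -- adjust for your own cluster and registry.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")  # hypothetical API server URL
    .appName("spark-on-k8s-demo")
    .config("spark.kubernetes.container.image", "my-registry/spark-py:3.0.0")  # hypothetical image
    .config("spark.kubernetes.namespace", "spark")  # hypothetical namespace
    .config("spark.executor.instances", "2")
    .getOrCreate()
)

# Quick smoke test: distribute a range across the executors and count it.
print(spark.range(1_000_000).count())
spark.stop()
```

The same session-building pattern works from a Jupyter PySpark container; only the master URL and image configs change between the modes in the table above.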

## Architecture

![Architecture diagram](Screen-Shot-2020-07-27-at-04-43-52.png)