Exploring Ray for data processing pipelines in k8s
Ray is an open-source unified framework for scaling AI and Python applications. It provides the compute layer for parallel processing so that you don’t need to be a distributed systems expert.
The Ray on Kubernetes docs provide a good starting point. Follow the RayCluster Quickstart to learn how to deploy the KubeRay operator and the RayCluster custom resource using Helm.
Warning
If you are using a local Rancher instance of k8s, you may need to increase its CPU and memory resource limits. We recommend giving Rancher 50-75% of the available cores and 50% of the memory.
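The rule of thumb above can be turned into concrete numbers for a particular machine. Below is a minimal sketch, assuming a Linux or macOS host (it reads physical memory through os.sysconf, which Windows does not provide). The function names are hypothetical, and the printed figures are only the 50-75% / 50% guideline applied to your hardware, not values read from Rancher itself.

```python
import os


def recommended_rancher_resources(total_cores: int, total_mem_gib: float) -> dict:
    """Apply the guideline: 50-75% of the cores and 50% of the memory."""
    return {
        # Truncate to whole cores, but never recommend fewer than one.
        "cpus_min": max(1, int(total_cores * 0.50)),
        "cpus_max": max(1, int(total_cores * 0.75)),
        "memory_gib": round(total_mem_gib * 0.50, 1),
    }


def host_resources() -> tuple[int, float]:
    """Detect host cores and physical memory in GiB (Linux/macOS sysconf only)."""
    cores = os.cpu_count() or 1
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, mem_bytes / 1024**3


if __name__ == "__main__":
    cores, mem_gib = host_resources()
    print(recommended_rancher_resources(cores, mem_gib))
```

On an 8-core, 32 GiB machine this suggests 4 to 6 CPUs and 16 GiB of memory for Rancher.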
First, create a local Python environment with Ray installed (defined in environment.yml), and activate it:
mamba env create -f environment.yml
mamba activate ray-exploration
Then, use the ray job submit command to submit jobs:
Note
These examples assume you have port-forwarded the KubeRay head service as described in the RayCluster Quickstart.
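Before submitting anything, it can help to confirm that the port-forward is actually up. This is a minimal sketch using only the standard library. It assumes the Ray dashboard (which also serves the Jobs API) is forwarded to http://localhost:8265, as in the commands in this section; the helper name dashboard_reachable is hypothetical.

```python
import urllib.request

# Assumed address: the port-forwarded KubeRay head service (dashboard / Jobs API).
DASHBOARD_URL = "http://localhost:8265"


def dashboard_reachable(url: str = DASHBOARD_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to url gets a 2xx response within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # connection refused, DNS failures, timeouts, and non-2xx statuses.
        return False


if __name__ == "__main__":
    if dashboard_reachable():
        print(f"Ray dashboard is reachable at {DASHBOARD_URL}")
    else:
        print(f"Nothing answering at {DASHBOARD_URL}; is the port-forward running?")
```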
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print('hello world')"
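The same hello-world job can also be submitted from Python with the Ray Jobs SDK (ray.job_submission.JobSubmissionClient) instead of the CLI. A minimal sketch, assuming the same port-forwarded address; submit_and_wait and its polling interval are illustrative choices, not part of Ray.

```python
import time

# Assumed address: the port-forwarded KubeRay head service.
RAY_ADDRESS = "http://localhost:8265"
HELLO_ENTRYPOINT = "python -c \"import ray; ray.init(); print('hello world')\""


def submit_and_wait(address: str, entrypoint: str, poll_s: float = 1.0) -> str:
    """Submit a job via the Ray Jobs SDK, wait for it to finish, return its logs."""
    # Imported inside the function so this module loads without Ray installed.
    from ray.job_submission import JobStatus, JobSubmissionClient

    client = JobSubmissionClient(address)
    job_id = client.submit_job(entrypoint=entrypoint)

    # Poll until the job reaches a terminal state.
    terminal = {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.STOPPED}
    while (status := client.get_job_status(job_id)) not in terminal:
        time.sleep(poll_s)

    print(f"job {job_id} finished with status {status}")
    return client.get_job_logs(job_id)


if __name__ == "__main__":
    print(submit_and_wait(RAY_ADDRESS, HELLO_ENTRYPOINT))
```

By default the CLI streams logs while the job runs; this sketch instead polls the job status and fetches the complete logs once the job has finished, which is simpler to embed in a larger script.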
You can submit a Python script that uses Ray like this:
ray job submit --address http://localhost:8265 --working-dir ./ -- python ray_task_example.py
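The contents of ray_task_example.py are not shown here. As an illustration only, a script of that shape might fan work out as Ray tasks and then combine the results. The helper names and the sum-of-squares workload below are hypothetical placeholders for real pipeline steps.

```python
"""Hypothetical ray_task_example.py: fan out a transform with Ray tasks, then reduce."""


def transform(batch: list[int]) -> int:
    """Placeholder per-batch work: the sum of squares of the batch."""
    return sum(x * x for x in batch)


def make_batches(items: list[int], batch_size: int) -> list[list[int]]:
    """Split items into consecutive batches of at most batch_size elements."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def main() -> None:
    # Imported here so the helpers above can be tested without Ray installed.
    import ray

    # Inside a submitted job, ray.init() with no address attaches to the cluster.
    ray.init()

    remote_transform = ray.remote(transform)
    batches = make_batches(list(range(1_000)), batch_size=100)

    # .remote() schedules each task and immediately returns an ObjectRef.
    refs = [remote_transform.remote(b) for b in batches]

    # ray.get blocks until every task has finished and returns their results.
    partials = ray.get(refs)
    print("sum of squares:", sum(partials))


if __name__ == "__main__":
    main()
```

Because transform is an ordinary function wrapped with ray.remote, it can be unit-tested locally without a cluster; only main() needs Ray.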