FATE on Spark: Leverage the External Cluster
As mentioned in FATE on Spark, FATE supports using Spark, HDFS, and Pulsar as its computing, storage, and federation engines respectively. This document describes how to leverage these external services with KubeFATE.
The following figure describes the overall architecture of the deployment; all components except Spark and HDFS run on Kubernetes.
The external services used in this document are:
- Spark 2.4.1
- HDFS 2.7
To use an external HDFS, a user only needs to specify the correct "namenode" address in the "python" component of the cluster's deployment file. For example:
python:
  ...
  hdfs:
    name_node: hdfs://192.168.0.1:9000
    path_prefix:
Here, 192.168.0.1:9000 is the namenode's address.
In an existing cluster, a user can update "python-config", the configmap of the "python" component, to switch to another HDFS cluster; however, the pod must be restarted for the update to take effect.
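For example, the update can be applied with kubectl as in the following sketch, assuming the cluster is deployed in the namespace "fate-9999" and the python component's deployment is named "python" (adjust both names to the actual deployment):

# Edit the HDFS name_node address in the configmap data
kubectl -n fate-9999 edit configmap python-config

# Restart the python pod so it picks up the new configuration
kubectl -n fate-9999 rollout restart deployment python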
Using an external Pulsar is similar to using an external HDFS; a user only needs to make adjustments in two parts:
First, a user needs to specify the correct "pulsar" address in the "python" component of the cluster's deployment file. For example, if Pulsar is listening on 192.168.0.2:6650:
python:
  ...
  pulsar:
    host: 192.168.0.2
    port: 6650
    sslPort: 6651
    mng_port: 8080
Second, the route table of the "pulsar" component needs to be updated. Here is an example, assuming the local party ID is "9999"; the other parties remain unchanged:
pulsar:
  route_table:
    9999:
      host: 192.168.0.2
      port: 6650
      sslPort: 6651
    10000:
      host: 192.168.220.12
      port: 6650
      sslPort: 6651
For a deployed FATE cluster to connect to another Pulsar cluster, a user needs to update the "python-config" and "pulsar-route-table" configmaps respectively.
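For example, a minimal sketch of such an update with kubectl, again assuming the namespace "fate-9999" and a python deployment named "python" (adjust to the actual environment):

# Point the python component at the new Pulsar broker
kubectl -n fate-9999 edit configmap python-config

# Update the Pulsar route table with the new party addresses
kubectl -n fate-9999 edit configmap pulsar-route-table

# Restart the python pod so the changes take effect
# (other pods that mount the route table may also need a restart)
kubectl -n fate-9999 rollout restart deployment python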
Connecting to an external Spark cluster is relatively more complicated and involves three parts:
It is recommended to use a separate python environment to install FATE's dependencies and then refer to this environment when submitting the spark application. For how to set up the python environment and install the dependencies, a user can refer to the blog Execute Federated Learning Tasks With Spark in FATE for more details.
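As a reference, a minimal sketch of preparing such an environment on the Spark nodes is shown below. The paths are only examples (they should match the pysparkPython setting used later), and the location of FATE's requirements file may differ in the actual release:

# Create a dedicated virtual environment on every Spark worker node
python3 -m venv /data/projects/python/venv
source /data/projects/python/venv/bin/activate

# Install FATE's python dependencies into the environment
pip install -U pip
pip install -r /data/projects/fate/python/requirements.txt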
Once the spark application is submitted and the executor is created, the executor connects back to the driver for further processing. However, since the driver runs on Kubernetes while the Spark cluster runs outside of it, some ports need to be opened in Kubernetes so that the executor can communicate with the driver normally.
Every spark application requires at least 2 ports: one for the driver and another for the block manager. If a user needs to run multiple spark applications/FATE jobs at the same time, these two ports should be opened for each job in advance.
For example, assuming a user has a FATE cluster called "9999", whether the cluster is deployed or not, they can use kubectl apply to open the relevant ports for the spark driver with the following "service" definition:
apiVersion: v1
kind: Service
metadata:
  name: spark-driver
  namespace: fate-9999
spec:
  ports:
    - name: driver-1
      port: 9000
      protocol: TCP
    - name: driver-2
      port: 9001
      protocol: TCP
    - name: block-manager-1
      port: 8000
      protocol: TCP
    - name: block-manager-2
      port: 8001
      protocol: TCP
  selector:
    fateMoudle: python
    name: fate-9999
    partyId: "9999"
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
This service opens two pairs of ports and thus can support two jobs running simultaneously. A user can replace the LoadBalancer service type with NodePort if needed.
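For example, assuming the definition above is saved to a file named spark-driver-service.yaml (the filename is arbitrary), it can be applied and checked as follows:

# Create the service that exposes the driver and block manager ports
kubectl apply -f spark-driver-service.yaml

# Check the service and the external address assigned by the load balancer
kubectl -n fate-9999 get service spark-driver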
Assuming the LB's address is "192.168.0.3", a user should specify the spark driver's configuration in the python component of the deployment file as in the following example:
python:
  spark:
    driverHost: 192.168.0.3
    driverStartPort: 9000
    blockManagerStartPort: 8000
    pysparkPython: /data/projects/python/venv/bin/python
Alternatively, a user can update the "spark-defaults.conf" part of the "python-config" configmap accordingly.
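For reference, a sketch of the corresponding "spark-defaults.conf" entries, assuming the values above map to the standard Spark properties (192.168.0.3 is the example LB address from above):

# Make executors in the external Spark cluster connect back to the exposed driver address
spark.driver.host        192.168.0.3
spark.driver.port        9000
spark.blockManager.port  8000
spark.pyspark.python     /data/projects/python/venv/bin/python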
A user can use the following steps to verify the configuration, making adjustments to fit the actual environment:
- log in to the python pod
- cd to /data/projects/fate/examples/toy_example
- append the following content to the job_parameters of "toy_example_conf.json":
"spark_run": {
"num-executors": 1,
"executor-cores": 1,
"executor-memory": "2G",
"total-executor-cores": 1
}
- run the example with the following command:
python run_toy_example.py -b 2 9999 9999 1
If it returns success, the configuration is working as intended.