Analyzing flight delay and weather data using Elyra, Kubeflow Pipelines and KFServing

This repository contains a set of Python scripts and Jupyter notebooks that analyze and predict flight delays. The datasets are hosted on the IBM Developer Data Asset Exchange.

We use Elyra to create a pipeline that can be executed locally or using a Kubeflow Pipelines runtime. This pipeline:

Loads the datasets
Pre-processes the datasets
Performs data merging and feature extraction
Analyzes and visualizes the processed dataset
Trains and evaluates machine learning models for predicting delayed flights, using features about flights as well as related weather features
Optionally deploys the trained model to Kubeflow Serving

Configuring the local development environment

It's highly recommended to create a dedicated and consistent Python environment for running the notebooks in this repository:

Install Anaconda or Miniconda
Navigate to your local copy of this repository.
Create an Anaconda environment from the YAML file in the repository:
```
$ conda env create -f flight-delays-env.yaml
```
Activate the new environment:
```
$ conda activate flight-delays-env
```
If running JupyterLab and Elyra for the first time, build the extensions:
```
$ jupyter lab build
```
Launch JupyterLab:
```
$ jupyter lab
```

Configuring a Kubeflow Pipeline runtime

Elyra's Notebook pipeline visual editor currently supports running these pipelines in a Kubeflow Pipelines runtime. If required, follow the Kubeflow Pipelines installation instructions to set up a local deployment of KFP.

After installing your Kubeflow Pipelines runtime, use the command below to register it with Elyra, replacing the bracketed placeholders with the values for your deployment.

```
$ elyra-metadata install runtimes --replace=true \
       --schema_name=kfp \
       --name=kfp_runtime \
       --display_name="Kubeflow Pipeline Runtime" \
       --api_endpoint=http://[host]:[api port]/pipeline \
       --cos_endpoint=http://[host]:[cos port] \
       --cos_username=[cos username] \
       --cos_password=[cos password] \
       --cos_bucket=flights
```

Note: The cloud object storage endpoint above assumes a local MinIO object storage, but other cloud-based object storage services can be configured and used instead.

If you use the default MinIO storage set up by the local Kubeflow installation instructions above, the arguments should be --cos_endpoint=http://minio-service:9000, --cos_username=minio, --cos_password=minio123. The API endpoint for a local Kubeflow Pipelines deployment would then be --api_endpoint=http://127.0.0.1:31380/pipeline.
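As a rough illustration, the default values listed above can be slotted into the runtime registration command with a few lines of Python. This is only a sketch: the helper name `build_runtime_command` is hypothetical and not part of Elyra, and the script just assembles and prints the command rather than running it.

```python
import shlex


def build_runtime_command(api_endpoint, cos_endpoint, cos_username,
                          cos_password, cos_bucket="flights",
                          name="kfp_runtime"):
    """Return the argument list for registering a KFP runtime with Elyra.

    Hypothetical helper: it mirrors the flags of the elyra-metadata
    command shown above and does not execute anything.
    """
    return [
        "elyra-metadata", "install", "runtimes", "--replace=true",
        "--schema_name=kfp",
        f"--name={name}",
        "--display_name=Kubeflow Pipeline Runtime",
        f"--api_endpoint={api_endpoint}",
        f"--cos_endpoint={cos_endpoint}",
        f"--cos_username={cos_username}",
        f"--cos_password={cos_password}",
        f"--cos_bucket={cos_bucket}",
    ]


# Default values for a local Kubeflow Pipelines deployment with MinIO,
# taken from the paragraph above.
command = build_runtime_command(
    api_endpoint="http://127.0.0.1:31380/pipeline",
    cos_endpoint="http://minio-service:9000",
    cos_username="minio",
    cos_password="minio123",
)

# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(command))
```

Building the command as a list and quoting it with `shlex.join` keeps the display name, which contains spaces, intact when the line is pasted into a shell.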

Don't forget to set up port-forwarding for the KFP ML Pipelines API service and the MinIO service as per the above instructions.
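Before submitting a pipeline, it can help to confirm that the forwarded ports actually accept connections. The following is a minimal sketch using only the Python standard library. The function name `endpoint_reachable` is hypothetical, and the MinIO address `http://127.0.0.1:9000` is an assumption about how the port was forwarded, so adjust both URLs to your setup.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        # Not a URL with a host part (for example, the scheme is missing).
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    endpoints = {
        # API endpoint for local Kubeflow Pipelines, as given above.
        "KFP API": "http://127.0.0.1:31380/pipeline",
        # Assumed local address of the port-forwarded MinIO service.
        "MinIO": "http://127.0.0.1:9000",
    }
    for label, url in endpoints.items():
        status = "reachable" if endpoint_reachable(url) else "NOT reachable"
        print(f"{label:8s} {url} -> {status}")
```

This only checks that something is listening on the port. It does not verify credentials or that the service behind the port is the expected one.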

Elyra Notebook pipelines

Elyra provides a visual editor for building Notebook-based AI pipelines, simplifying the conversion of multiple notebooks into batch jobs or workflows. By leveraging cloud-based resources to run their experiments faster, data scientists, machine learning engineers, and AI developers become more productive and can spend more of their time applying their technical skills.

Running the Elyra pipeline

The Elyra pipeline flight_delays.pipeline, located in the pipelines directory, can be run by clicking the play button as shown in the image above. The submit dialog asks for two inputs: a name for the pipeline and the runtime to use while executing it.

The list of available runtimes comes from the registered Kubeflow Pipelines runtimes documented above and includes a Run in-place locally option for local execution.

Local execution

If running locally, the notebooks are executed and updated in-place. You can track the progress in the terminal screen where you ran jupyter lab. The downloaded and processed datasets will be available locally in notebooks/data in this case.
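After a local run, a quick way to confirm that the datasets were downloaded and processed is to list the files under notebooks/data. The snippet below is a small sketch with a hypothetical helper name (`list_datasets`). It assumes it is run from the root of the repository and makes no assumption about which files the notebooks produce.

```python
from pathlib import Path


def list_datasets(data_dir="notebooks/data"):
    """Return (relative path, size in bytes) for every file under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet, or the working directory is wrong.
        return []
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


# Print one line per file, with sizes right-aligned for readability.
for name, size in list_datasets():
    print(f"{size:>14,d}  {name}")
```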

Kubeflow Pipelines execution

After submitting the pipeline to Kubeflow Pipelines, Elyra will show a dialog with a direct link to where the experiment is being executed on Kubeflow Pipelines.

The user can access the pipelines, and their respective experiment runs, via the api_endpoint of the Kubeflow Pipelines runtime (e.g. http://[host]:[port]/pipeline).

The outputs of the executed experiments are then available in the associated object storage. The executed notebooks are available as native .ipynb notebooks and also in HTML format, which makes the results easier to view and share.

Running the Elyra pipeline with model deployment to Kubeflow Serving

Please follow the instructions for running the pipeline flight_delays_with_deployment.pipeline, which adds a node at the end of the pipeline for deploying the model to KFServing.

References

Find more project details on Elyra's GitHub or by watching the Elyra demo.
