# Basic MLOps pipeline for Santander Customer Transaction Prediction
This repo aims to demonstrate MLOps skills while solving a classification problem from Kaggle. To learn more about the problem statement, refer here.

Before diving into the code, please go to the deployment section and deploy the app using Docker Compose to get a better understanding of it. You can deploy it on your own easily and (possibly) free of charge in the cloud. Scroll down to Docker Playground Cloud Deployment in the Deployment section.
## Repo Structure

- `notebooks` - Contains EDA (exploratory data analysis) and model development steps, including all preprocessing and evaluation. Once preprocessing steps are defined and a model is selected, the final code is moved to `model_training/train.py` for CI/CD.
- `src` - Contains frontend and backend services along with the champion model training script.
  - `backend` - REST API endpoints developed using FastAPI.
  - `frontend` - Basic frontend app developed using Streamlit.
  - `model_training` - Depending on model/data size and model training time, this could be executed on a locally hosted runner.
    - `train_boilerplate` - Boilerplate code for all steps to approach a classification problem.
    - `train` - Final selected model and preprocessing code to be executed in the train/predict pipeline.
- `docker-compose` - Compose file which starts the backend and frontend services to run the application.
There are two GitHub workflows, one each for managing the frontend and backend services, described below:

- `frontend_container` - Monitors file changes in `src/frontend/`; any change to `.py` files here will trigger this workflow, which rebuilds the frontend container image for the application and pushes it to Docker Hub. More details here.
- `train_model` - Monitors file changes in `src/backend/` and `src/model_training/train.py`; any change here will trigger this workflow, which trains the model described in `train.py`, packs it into a Docker container with the backend service for the application, and pushes it to Docker Hub. More details here.
- Updating containers on the remote server is done via polling, implemented here. This script is set up as a `CRON` job with a time interval of `300s`. It compares the local and remote container hashes and deploys the updated container in case of a mismatch.
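The polling check above can be sketched roughly as follows. This is an assumption about how such a check might work (using `docker inspect` to read the local image digest), not the repo's actual script:

```python
# Hypothetical sketch of the cron-driven polling check: compare the digest of
# the locally running image with the latest digest on the registry, and
# redeploy when they differ. The repo's real implementation may differ.
import subprocess

def local_repo_digest(image: str) -> str:
    """Return the repo digest of a locally pulled image via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", image],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def digest_of(repo_digest: str) -> str:
    # "user/backend@sha256:abc..." -> "sha256:abc..."
    return repo_digest.split("@", 1)[-1]

def needs_update(local_digest: str, remote_digest: str) -> bool:
    # Mismatching digests mean the registry has a newer image
    return local_digest != remote_digest
```

Scheduled every 300s via cron, a mismatch would trigger `docker compose pull && docker compose up -d`.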
## Deployment

### Local Deployment
- Install Docker. Instructions available here. Make sure Docker is up and running before proceeding.
- Install Git. Instructions here.
- Clone the repo and run Compose:
  ```shell
  git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
  docker compose --profile app up
  ```

  `--profile app` will start both the frontend and backend services on ports `localhost:8080` and `localhost:8000`.
### Docker Playground Cloud Deployment
- Navigate to docker playground.
- Log in using your Docker account. Click Start. This will direct you to a new page.
- Click `Add New Instance` on the left pane, then run the following commands in the terminal:

  ```shell
  git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
  docker compose --profile app up
  ```

- This will open up port `8000` for backend endpoints and `8080` for the frontend.
- To access the application, click on the port numbers next to the `OPEN PORT` button to visit the frontend/backend service.
## Future Improvements

- Use an event-driven approach instead of polling for deployment.
- Data validation checks on uploaded files.
- Data versioning - DVC.
- Experiment and artifact tracking - MLflow, WandB.
- Better methods to save and load models, like joblib; don't pickle.
- Serverless on-demand architecture.
- Run the backend server with `gunicorn` instead of `uvicorn`. More tips here.
- If using a deep learning model, try quantization and converting the model to ONNX format for better inference speed and lower memory usage.
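The joblib suggestion above can be sketched with a toy model (model type, data, and filename are illustrative; the repo's champion model will differ):

```python
# Persisting a scikit-learn model with joblib instead of raw pickle
# (joblib handles large numpy arrays inside estimators more efficiently).
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data, purely for illustration
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(model, path)        # save
restored = joblib.load(path)    # load back for serving
```

The restored estimator produces identical predictions, which is what the backend service would rely on at inference time.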
