# Basic MLOps pipeline for Santander Customer Transaction Prediction
This repo aims to demonstrate MLOps skills while solving a classification problem from Kaggle. To learn more about the problem statement, refer here.

Before diving into the code, please go to the deployment section and deploy the app using Docker Compose to get a better understanding of it. You can deploy it on your own easily and (possibly) free of charge in the cloud. Scroll down to Docker Playground Cloud Deployment in the Deployment section.
## Repo Structure

- `notebooks` - Contains EDA (exploratory data analysis) and model development steps, including all preprocessing and evaluation. Once preprocessing steps are defined and a model is selected, the final code is moved to `model_training/train.py` for CI/CD.
- `src` - Contains frontend and backend services along with the champion model training script.
  - `backend` - REST API endpoints developed using FastAPI.
  - `frontend` - Basic frontend app developed using Streamlit.
  - `model_training` - Depending on model/data size and model training time, this could be executed on a locally hosted runner.
    - `train_boilerplate` - Boilerplate code for all steps to approach a classification problem.
    - `train` - Final selected model and preprocessing code to be executed in the train/predict pipeline.
- `docker-compose` - Compose file which starts the backend and frontend services to run the application.
There are two GitHub workflows, one each for managing the frontend and backend services, described below:

- `frontend_container` - Monitors file changes in `src/frontend/`; any change to `.py` files here will trigger this workflow, which rebuilds the frontend container image for the application and pushes it to Docker Hub. More details here.
- `train_model` - Monitors file changes in `src/backend/` and `src/model_training/train.py`; any change here will trigger this workflow, which trains the model described in `train.py`, packs it into a Docker container with the backend service for the application, and pushes it to Docker Hub. More details here.
- Updating containers on the remote server is done via polling, implemented here. This script is set up as a `CRON` job with a time interval of `300s`. It compares the local and remote container hashes and deploys the updated container in case of a mismatch.
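The polling check above can be sketched roughly as follows. This is an assumption about how such a check might work (using `docker inspect` to read the local image digest), not the repo's actual script:

```python
# Hypothetical sketch of the cron-driven polling check: compare the digest of
# the locally running image with the latest digest on the registry, and
# redeploy when they differ. The repo's real implementation may differ.
import subprocess

def local_repo_digest(image: str) -> str:
    """Return the repo digest of a locally pulled image via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", image],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def digest_of(repo_digest: str) -> str:
    # "user/backend@sha256:abc..." -> "sha256:abc..."
    return repo_digest.split("@", 1)[-1]

def needs_update(local_digest: str, remote_digest: str) -> bool:
    # Mismatching digests mean the registry has a newer image
    return local_digest != remote_digest
```

Scheduled every 300s via cron, a mismatch would trigger `docker compose pull && docker compose up -d`.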
## Deployment

### Local Deployment
- Install Docker. Instructions available here. Make sure Docker is up and running before proceeding.
- Install Git. Instructions here.
- Clone the repo and run Compose:
  ```shell
  git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
  docker compose --profile app up
  ```

  `--profile app` will start both the frontend and backend services on ports `localhost:8080` and `localhost:8000`.
### Docker Playground Cloud Deployment
- Navigate to docker playground.
- Log in using your Docker account. Click Start. This will direct you to a new page.
- Click `Add New Instance` on the left pane, then run the following commands in the terminal:

  ```shell
  git clone https://github.com/uditmanav17/assessments.git && cd ./assessments
  docker compose --profile app up
  ```

- This will open up port `8000` for backend endpoints and `8080` for the frontend.
- To access the application, click on the port numbers next to the `OPEN PORT` button to visit the frontend/backend service.
## Future Improvements

- Use an event-driven approach instead of polling for deployment.
- Data validation checks on uploaded files.
- Data versioning - DVC.
- Experiment and artifact tracking - MLflow, WandB.
- Better methods to save and load models, like joblib; don't pickle.
- Serverless on-demand architecture.
- Run the backend server with `gunicorn` instead of `uvicorn`. More tips here.
- If using a deep learning model, try quantization and converting the model to ONNX format for better inference speed and lower memory usage.
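The joblib suggestion above can be sketched with a toy model (model type, data, and filename are illustrative; the repo's champion model will differ):

```python
# Persisting a scikit-learn model with joblib instead of raw pickle
# (joblib handles large numpy arrays inside estimators more efficiently).
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data, purely for illustration
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(model, path)        # save
restored = joblib.load(path)    # load back for serving
```

The restored estimator produces identical predictions, which is what the backend service would rely on at inference time.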
