Project details
- Phase 1: Implementation of the Sentiment Analysis Model
- Model: Use a pre-trained model for sentiment analysis capable of classifying social media texts into positive, neutral, or negative sentiment. Use this model: https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest
- Dataset: Use public datasets containing texts and their respective sentiment labels.
- Phase 2: Creation of the CI/CD Pipeline
- CI/CD Pipeline: Develop an automated pipeline for model training, integration testing, and deployment of the application on Hugging Face.
- Phase 3: Deployment and Continuous Monitoring
- Deployment on Hugging Face (optional): Deploy the sentiment analysis model, together with its data and application, to Hugging Face to ease integration and scalability.
- Monitoring System: Set up a monitoring system to continuously evaluate the model’s performance and the detected sentiment.
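The Cardiff NLP model scores each text against three classes (negative, neutral, positive). As a rough illustration of that classification step only, here is a softmax-and-argmax sketch over made-up logits; the real model is loaded through the `transformers` library, and the numbers below are illustrative, not actual model output:

```python
import math

# Class order used by the Cardiff NLP sentiment model.
LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return the predicted label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Illustrative logits (NOT real model output):
label, prob = classify([-1.2, 0.3, 2.1])
```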
```
MLops_project/
├── .github/workflows/
│   └── CI_CD.yml              # Continuous integration and Hugging Face deploy pipeline
├── deploy/
│   └── deploy_hf.py           # Script to deploy the model to Hugging Face
├── model_app/
│   ├── model_inference.py     # Inference and related APIs
│   ├── model_training.py      # Model training
│   └── model_utility.py       # Utility functions for model_app
├── monitoring/
│   ├── metrics.py             # Base structure of the metrics
│   └── prometheus.yml         # Prometheus config
├── tests/
│   ├── test_integration.py    # Integration tests
│   └── test_model_training.py # Training tests (utilities only)
├── my_latest_model/           # Contains the latest created model
├── my_model_versions/         # Contains each model version (the latest, '/model_v_{newer_version}', is the same one inside 'my_latest_model')
├── docker-compose.yml         # Docker Compose config
├── Dockerfile                 # Docker image config
├── requirements.txt
└── pytest.ini                 # File needed by pytest
```
- Tested on macOS (UNIX).
- Python version 3.12 required.
- When the app starts (via Docker), `model_inference.py` checks for existing models (if the app has never been used before, or every previous model was deleted, the default model is used and the needed folders are created); then uvicorn serves the inference API (`/predict`).
- At this point, predictions and evaluations can be made by calling `/predict`.
- When the project is updated through a commit on GitHub's `main` branch, the pipeline file `CI_CD.yml` trains the model, runs the integration tests (`test_integration.py`) and, if these steps succeed, optionally deploys the model to Hugging Face.
- While the app is running, the connection with Grafana made through Prometheus (pages reachable via the 'Ports' section in your terminal) lets you visually monitor the model's performance and the detected sentiment (more in the Grafana instructions).
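The startup check described above can be sketched as follows; the folder name and the fallback behaviour are assumptions for illustration, not the exact code in `model_inference.py`:

```python
import os

# Fallback when no locally trained model exists yet (first run).
DEFAULT_MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"

def resolve_model_path(model_dir="my_latest_model"):
    """Reuse a previously trained model if one exists; otherwise create
    the needed folder and fall back to the default pre-trained model."""
    if os.path.isdir(model_dir) and os.listdir(model_dir):
        return model_dir                       # a trained model is already present
    os.makedirs(model_dir, exist_ok=True)      # first run, or models were deleted
    return DEFAULT_MODEL
```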
- In order to save resources (for example on GitHub Codespaces), these values have been set as small as reasonably possible:
  - `raw_datasets = load_dataset(DATASET, "sentiment", split={"train": "train[:100]", "test": "test[:100]", "validation": "validation[:100]"})`; increase the slice sizes for a bigger dataset.
  - For the same reason, `per_device_train_batch_size=4` and `per_device_eval_batch_size=4` inside `TrainingArguments` in `model_app/model_training.py`; ideally raise them to at least 8, or 32 if possible (best option), but not higher.
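To make the dataset knob explicit, the slicing syntax can be parameterized. `make_splits` below is a hypothetical helper (the project passes the split dict to `load_dataset` directly), shown only to clarify what to change:

```python
def make_splits(n_train=100, n_test=100, n_val=100):
    """Build the `split` argument for datasets.load_dataset; raise the
    counts to train and evaluate on a larger slice of the dataset."""
    return {
        "train": f"train[:{n_train}]",
        "test": f"test[:{n_test}]",
        "validation": f"validation[:{n_val}]",
    }

# Current (resource-saving) defaults:
splits = make_splits()
# A larger run might use:
bigger = make_splits(n_train=1000, n_test=500, n_val=500)
```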
- Create an environment (not mandatory but strongly suggested) and activate it.
- Run `docker compose up --build -d`. What it does:
  - `docker compose up` reads `docker-compose.yml` and starts all the services
  - `--build` builds a new image
  - `-d` (optional) runs the containers in the background, so the terminal is not flooded with logs.

NOTE: to enable the deploy on Hugging Face, go to the deploy instructions.
- Create an environment: `python3.12 -m venv <name_your_venv>`; activate it with `source <name_your_venv>/bin/activate`; deactivate with `deactivate`.
- Run the app: `docker compose up --build -d`; everything will start properly.
- In case of any issue with Docker, run `docker compose logs <service_name>` to debug that service.
- In case of space limitations (for example on GitHub Codespaces), run these two commands to clean up Docker (WARNING: this can delete data, use it carefully):
  `docker system prune -af && docker builder prune -af`
Just run `pytest` in the terminal to execute all the tests inside the `tests` folder.
Deployment on Hugging Face is optional; it is executed only if the GitHub variable `RUN_DEPLOY` is set to 'true'. To set it, go to GitHub repo > Settings > Secrets and variables > Actions > Variables and create the variable `RUN_DEPLOY` with value `true`.
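As a sketch (the actual job and step names in `CI_CD.yml` may differ), the deploy job can be gated on that repository variable like this:

```yaml
deploy:
  needs: [test]                           # run only after the tests succeed
  if: ${{ vars.RUN_DEPLOY == 'true' }}    # repository variable set in the GitHub UI
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: python deploy/deploy_hf.py     # assumed deploy entry point
```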
Go to the Ports tab (the ports MUST be public) > open port 3000 in your browser (default credentials: user `admin`, password `admin`), set up a new Prometheus data source (`http://prometheus:9090`) and link it to a new dashboard.
- Examples of queries that can be run in Grafana: `sentiment_positive_total`, `sentiment_negative_total`, `sentiment_requests_total`, `model_accuracy`, `model_f1`, etc.
- Visual examples (panel types): classic, gauge.
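The metric names above suggest definitions along these lines in `monitoring/metrics.py`. This is a hedged sketch using the `prometheus_client` library, not the project's actual file; the example updates at the bottom mimic what the inference code might do after a prediction:

```python
from prometheus_client import Counter, Gauge, generate_latest

# Counters grow monotonically with each prediction request.
sentiment_requests_total = Counter(
    "sentiment_requests_total", "Total prediction requests served")
sentiment_positive_total = Counter(
    "sentiment_positive_total", "Predictions labeled positive")
sentiment_negative_total = Counter(
    "sentiment_negative_total", "Predictions labeled negative")

# Gauges hold the latest evaluation scores.
model_accuracy = Gauge("model_accuracy", "Latest evaluation accuracy")
model_f1 = Gauge("model_f1", "Latest evaluation F1 score")

# Example updates (illustrative values):
sentiment_requests_total.inc()
sentiment_positive_total.inc()
model_accuracy.set(0.91)

print(generate_latest().decode())  # Prometheus text exposition format
```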
Go to the Ports tab > open port 9090 > 'Status' > 'Target health' to verify the Prometheus targets.
Go to https://{codespace_name_or_localhost}-8000.app.github.dev/docs to check the API.
```shell
curl -X POST \
  "https://{codespace_name_or_localhost}-8000.app.github.dev/predict" \
  -H "Content-Type: application/json" \
  -d '{
        "text": "Hello, this is a fantastic test",
        "label": "positive"
      }'
```