A complete MLOps pipeline on Azure Machine Learning that takes raw data through to a live prediction API.
Raw Data (CSV)
│
▼
┌─────────────────────────────────────────────────────┐
│ Azure ML Pipeline │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ │
│ │ 1. Data │──→│ 2. Train │──→│ 3. Eval │ │
│ │ Prep │ │ Model │ │ Metrics │ │
│ └──────────┘ └──────────┘ └───────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ train.csv model.pkl results.json │
│ test.csv scaler.pkl │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ 4. Register │ │
│ │ in Model │ │
│ │ Registry │ │
│ └─────────────┘ │
│ │ │
└──────────────────────│───────────────────────────────┘
▼
┌─────────────────┐
│ 5. Deploy as │
│ Managed Online │
│ Endpoint (API) │
└─────────────────┘
│
▼
POST /score → Predictions
| Service | Purpose | DP-100 Topic |
|---|---|---|
| Azure ML Workspace | Central hub for all ML resources | Workspace management |
| Compute Cluster | Scalable training compute | Compute management |
| ML Pipelines (SDK v2) | Orchestrate multi-step workflows | Pipeline design |
| MLflow | Experiment tracking & logging | Experiment management |
| Model Registry | Version & catalog trained models | Model management |
| Managed Endpoints | Deploy models as REST APIs | Model deployment |
| Environments | Reproducible Python environments | Environment management |
Follow SETUP_GUIDE.md for step-by-step Azure resource creation.
cd project2-azure-ml-pipeline
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Login to Azure
az login
# Configure credentials
cp .env.example .env
# Edit .env with your subscription ID, resource group, workspace namepython src/run_pipeline.py runThis submits the pipeline to Azure ML. Watch it in ML Studio!
python src/run_pipeline.py deploypython src/run_pipeline.py testpython src/run_pipeline.py cleanupproject2-azure-ml-pipeline/
├── SETUP_GUIDE.md # Step-by-step Azure Portal instructions
├── .env.example # Template for workspace config
├── requirements.txt # Python dependencies
├── src/
│ ├── config.py # Azure ML workspace connection
│ └── run_pipeline.py # Pipeline definition + deployment
├── components/ # Scripts that run IN Azure ML
│ ├── 01_data_prep.py # Data cleaning & train/test split
│ ├── 02_train.py # Model training with MLflow
│ ├── 03_evaluate.py # Evaluation metrics
│ ├── 04_register.py # Model registration
│ └── score.py # Scoring script for deployment
└── data/
└── heart_disease.csv # Generated dataset
| Step | Script | Input | Output | Purpose |
|---|---|---|---|---|
| 1 | 01_data_prep.py |
Raw CSV | train.csv, test.csv | Clean & split data |
| 2 | 02_train.py |
train.csv | model.pkl, scaler.pkl | Train classifier |
| 3 | 03_evaluate.py |
model.pkl, test.csv | metrics JSON | Evaluate performance |
| 4 | 04_register.py |
model.pkl, metrics | Registry entry | Version the model |
| 5 | score.py |
API request | Prediction | Serve predictions |
- Azure ML Workspace provisioning and management
- ML Pipeline design with Azure ML SDK v2
- Compute cluster management (auto-scaling, auto-shutdown)
- MLflow experiment tracking (parameters, metrics, artifacts)
- Model Registry for versioning and governance
- Managed Online Endpoints for model serving
- Environment management with conda specifications
- Classification pipeline (data prep → train → evaluate → deploy)
- Production patterns: scoring scripts, input validation, error handling
python src/run_pipeline.py cleanupCompute clusters with min_nodes=0 automatically scale down. Endpoints DO NOT. They charge 24/7 until deleted.
MIT