Skip to content

designer-coderajay/azure-ml-pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Azure ML End-to-End Pipeline

A complete MLOps pipeline on Azure Machine Learning that takes raw data through to a live prediction API.

Architecture

Raw Data (CSV)
     │
     ▼
┌─────────────────────────────────────────────────────┐
│              Azure ML Pipeline                       │
│                                                      │
│  ┌──────────┐   ┌──────────┐   ┌───────────┐       │
│  │ 1. Data  │──→│ 2. Train │──→│ 3. Eval   │       │
│  │   Prep   │   │  Model   │   │  Metrics  │       │
│  └──────────┘   └──────────┘   └───────────┘       │
│       │              │              │                │
│       ▼              ▼              ▼                │
│   train.csv      model.pkl    results.json          │
│   test.csv       scaler.pkl                         │
│                      │                               │
│                      ▼                               │
│              ┌─────────────┐                         │
│              │ 4. Register │                         │
│              │   in Model  │                         │
│              │   Registry  │                         │
│              └─────────────┘                         │
│                      │                               │
└──────────────────────│───────────────────────────────┘
                       ▼
              ┌─────────────────┐
              │ 5. Deploy as    │
              │ Managed Online  │
              │ Endpoint (API)  │
              └─────────────────┘
                       │
                       ▼
              POST /score → Predictions

Azure Services Used

Service Purpose DP-100 Topic
Azure ML Workspace Central hub for all ML resources Workspace management
Compute Cluster Scalable training compute Compute management
ML Pipelines (SDK v2) Orchestrate multi-step workflows Pipeline design
MLflow Experiment tracking & logging Experiment management
Model Registry Version & catalog trained models Model management
Managed Endpoints Deploy models as REST APIs Model deployment
Environments Reproducible Python environments Environment management

Quick Start

1. Azure Setup

Follow SETUP_GUIDE.md for step-by-step Azure resource creation.

2. Local Setup

cd project2-azure-ml-pipeline
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Login to Azure
az login

# Configure credentials
cp .env.example .env
# Edit .env with your subscription ID, resource group, workspace name

3. Run Pipeline

python src/run_pipeline.py run

This submits the pipeline to Azure ML. Watch it in ML Studio!

4. Deploy Model

python src/run_pipeline.py deploy

5. Test Endpoint

python src/run_pipeline.py test

6. Cleanup (stop billing!)

python src/run_pipeline.py cleanup

Project Structure

project2-azure-ml-pipeline/
├── SETUP_GUIDE.md              # Step-by-step Azure Portal instructions
├── .env.example                # Template for workspace config
├── requirements.txt            # Python dependencies
├── src/
│   ├── config.py               # Azure ML workspace connection
│   └── run_pipeline.py         # Pipeline definition + deployment
├── components/                 # Scripts that run IN Azure ML
│   ├── 01_data_prep.py         # Data cleaning & train/test split
│   ├── 02_train.py             # Model training with MLflow
│   ├── 03_evaluate.py          # Evaluation metrics
│   ├── 04_register.py          # Model registration
│   └── score.py                # Scoring script for deployment
└── data/
    └── heart_disease.csv       # Generated dataset

Pipeline Components Explained

Step Script Input Output Purpose
1 01_data_prep.py Raw CSV train.csv, test.csv Clean & split data
2 02_train.py train.csv model.pkl, scaler.pkl Train classifier
3 03_evaluate.py model.pkl, test.csv metrics JSON Evaluate performance
4 04_register.py model.pkl, metrics Registry entry Version the model
5 score.py API request Prediction Serve predictions

Skills Demonstrated

  • Azure ML Workspace provisioning and management
  • ML Pipeline design with Azure ML SDK v2
  • Compute cluster management (auto-scaling, auto-shutdown)
  • MLflow experiment tracking (parameters, metrics, artifacts)
  • Model Registry for versioning and governance
  • Managed Online Endpoints for model serving
  • Environment management with conda specifications
  • Classification pipeline (data prep → train → evaluate → deploy)
  • Production patterns: scoring scripts, input validation, error handling

Cost Management

⚠️ Always delete endpoints when not in use!

python src/run_pipeline.py cleanup

Compute clusters with min_nodes=0 automatically scale down. Endpoints DO NOT. They charge 24/7 until deleted.

License

MIT

About

End-to-end ML pipeline on Azure Machine Learning for heart disease prediction. Features 4-step automated workflow (data prep, training, evaluation, registration), MLflow experiment tracking, and managed endpoint deployment. Built with Azure ML SDK v2, scikit-learn, and auto-scaling compute clusters.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages