Azure ML End-to-End Pipeline

A complete MLOps pipeline on Azure Machine Learning that takes raw data through to a live prediction API.

Architecture

Raw Data (CSV)
     │
     ▼
┌─────────────────────────────────────────────────────┐
│              Azure ML Pipeline                       │
│                                                      │
│  ┌──────────┐   ┌──────────┐   ┌───────────┐       │
│  │ 1. Data  │──→│ 2. Train │──→│ 3. Eval   │       │
│  │   Prep   │   │  Model   │   │  Metrics  │       │
│  └──────────┘   └──────────┘   └───────────┘       │
│       │              │              │                │
│       ▼              ▼              ▼                │
│   train.csv      model.pkl    results.json          │
│   test.csv       scaler.pkl                         │
│                      │                               │
│                      ▼                               │
│              ┌─────────────┐                         │
│              │ 4. Register │                         │
│              │   in Model  │                         │
│              │   Registry  │                         │
│              └─────────────┘                         │
│                      │                               │
└──────────────────────│───────────────────────────────┘
                       ▼
              ┌─────────────────┐
              │ 5. Deploy as    │
              │ Managed Online  │
              │ Endpoint (API)  │
              └─────────────────┘
                       │
                       ▼
              POST /score → Predictions

Azure Services Used

Service	Purpose	DP-100 Topic
Azure ML Workspace	Central hub for all ML resources	Workspace management
Compute Cluster	Scalable training compute	Compute management
ML Pipelines (SDK v2)	Orchestrate multi-step workflows	Pipeline design
MLflow	Experiment tracking & logging	Experiment management
Model Registry	Version & catalog trained models	Model management
Managed Endpoints	Deploy models as REST APIs	Model deployment
Environments	Reproducible Python environments	Environment management

Quick Start

1. Azure Setup

Follow SETUP_GUIDE.md for step-by-step Azure resource creation.

2. Local Setup

cd project2-azure-ml-pipeline
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Login to Azure
az login

# Configure credentials
cp .env.example .env
# Edit .env with your subscription ID, resource group, workspace name

3. Run Pipeline

python src/run_pipeline.py run

This submits the pipeline to Azure ML. Watch it in ML Studio!

4. Deploy Model

python src/run_pipeline.py deploy

5. Test Endpoint

python src/run_pipeline.py test

6. Cleanup (stop billing!)

python src/run_pipeline.py cleanup

Project Structure

project2-azure-ml-pipeline/
├── SETUP_GUIDE.md              # Step-by-step Azure Portal instructions
├── .env.example                # Template for workspace config
├── requirements.txt            # Python dependencies
├── src/
│   ├── config.py               # Azure ML workspace connection
│   └── run_pipeline.py         # Pipeline definition + deployment
├── components/                 # Scripts that run IN Azure ML
│   ├── 01_data_prep.py         # Data cleaning & train/test split
│   ├── 02_train.py             # Model training with MLflow
│   ├── 03_evaluate.py          # Evaluation metrics
│   ├── 04_register.py          # Model registration
│   └── score.py                # Scoring script for deployment
└── data/
    └── heart_disease.csv       # Generated dataset

Pipeline Components Explained

Step	Script	Input	Output	Purpose
1	`01_data_prep.py`	Raw CSV	train.csv, test.csv	Clean & split data
2	`02_train.py`	train.csv	model.pkl, scaler.pkl	Train classifier
3	`03_evaluate.py`	model.pkl, test.csv	metrics JSON	Evaluate performance
4	`04_register.py`	model.pkl, metrics	Registry entry	Version the model
5	`score.py`	API request	Prediction	Serve predictions

Skills Demonstrated

Azure ML Workspace provisioning and management
ML Pipeline design with Azure ML SDK v2
Compute cluster management (auto-scaling, auto-shutdown)
MLflow experiment tracking (parameters, metrics, artifacts)
Model Registry for versioning and governance
Managed Online Endpoints for model serving
Environment management with conda specifications
Classification pipeline (data prep → train → evaluate → deploy)
Production patterns: scoring scripts, input validation, error handling

Cost Management

⚠️ Always delete endpoints when not in use!

python src/run_pipeline.py cleanup

Compute clusters with min_nodes=0 automatically scale down. Endpoints DO NOT. They charge 24/7 until deleted.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
artifacts		artifacts
components		components
src		src
trained_model		trained_model
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SETUP_GUIDE.md		SETUP_GUIDE.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Azure ML End-to-End Pipeline

Architecture

Azure Services Used

Quick Start

1. Azure Setup

2. Local Setup

3. Run Pipeline

4. Deploy Model

5. Test Endpoint

6. Cleanup (stop billing!)

Project Structure

Pipeline Components Explained

Skills Demonstrated

Cost Management

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Azure ML End-to-End Pipeline

Architecture

Azure Services Used

Quick Start

1. Azure Setup

2. Local Setup

3. Run Pipeline

4. Deploy Model

5. Test Endpoint

6. Cleanup (stop billing!)

Project Structure

Pipeline Components Explained

Skills Demonstrated

Cost Management

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages