A practical, hands-on tutorial demonstrating MLflow for metrics tracking and experiment management using a bike sharing dataset.
This project shows how to track machine learning experiments with MLflow on a bike sharing demand prediction dataset. It emphasizes:
- 📊 Experiment Tracking: Learn to track metrics, parameters, and artifacts
- 🔄 Model Versioning: Understand model registry and versioning concepts
- 📈 Metrics Visualization: Compare experiments and visualize results
- 🛠️ Best Practices: Apply MLOps principles in practice
- MLflow 3.1 Integration: Uses MLflow 3.1 for experiment tracking and the model registry
- Enhanced Model Registry: Improved model versioning and lifecycle management
- Real Dataset: Uses bike sharing demand dataset for practical learning
- Jupyter Notebooks: Interactive tutorials and examples
- Model Training: Example ML models with proper logging
- Metrics Tracking: Comprehensive metrics and parameter logging
- Artifact Management: Model and data artifact storage with enhanced metadata
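As a rough illustration of the metrics and parameter logging listed above, the sketch below computes two regression metrics and records them in an MLflow run. The experiment name `bike-sharing-demo`, the helper names `regression_metrics` and `log_run`, and the parameter values in the usage note are assumptions made for this example; they are not taken from the project's `src/` code.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE and RMSE for a set of predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


def log_run(params: Dict[str, object], metrics: Dict[str, float], run_name: str = "baseline") -> str:
    """Log one set of parameters and metrics as an MLflow run and return its run ID."""
    # Imported here so regression_metrics can be used without MLflow installed.
    import mlflow

    # Hypothetical experiment name; it is created on first use if it does not exist.
    mlflow.set_experiment("bike-sharing-demo")
    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        return run.info.run_id
```

A call such as `log_run({"model": "example", "n_estimators": 100}, regression_metrics(y_true, y_pred))` would then create a run that appears in the MLflow UI with those parameters and metrics attached.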
mlflow-1-metrics-tracking/
├── data/ # Data files
├── dev/ # Development files
│ └── images/ # Development images
├── docs/ # Documentation and images
├── mlartifacts/ # MLflow artifacts storage
├── mlruns/ # MLflow runs metadata
├── models/ # Trained models storage
├── notebooks/ # Jupyter notebooks for tutorials
│ └── mlruns/ # Notebook-specific MLflow runs
├── src/ # Source code
├── README.md # This file
├── requirements-dev.txt # Development dependencies
└── requirements.txt # Project dependencies
git clone [YOUR_REPO_URL]
cd mlflow-1-metrics-tracking
make setup # Create virtual environment
make install-all # Install dependencies
# Load bike sharing dataset
python src/load_data.py
# Check that MLflow is installed correctly
mlflow --version
Start the MLflow tracking server:
mlflow server --host 0.0.0.0 --port 5001
Then navigate to http://localhost:5001 in your browser to:
- 📊 View Experiments: Compare different model runs with MLflow 3.1's enhanced UI
- 📈 Analyze Metrics: Visualize training metrics over time with improved charts
- 🔍 Inspect Artifacts: Download models and other artifacts with better metadata
- 📋 Compare Runs: Side-by-side comparison of experiments with advanced filtering
- 🏷️ Model Registry: Manage model versions and their lifecycle (MLflow 2.9 and later deprecate stages in favor of aliases and tags)
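The run comparison available in the UI can also be done from Python. The following is a minimal sketch, assuming the tracking server from the command above is running on port 5001 and that runs logged a metric named `rmse`; the helper names `order_clause` and `top_runs` and the metric name are hypothetical.

```python
from typing import List

# Matches the host and port of the `mlflow server` command above.
TRACKING_URI = "http://localhost:5001"


def order_clause(metric: str, lower_is_better: bool = True) -> str:
    """Build an MLflow order_by clause such as 'metrics.rmse ASC'."""
    direction = "ASC" if lower_is_better else "DESC"
    return f"metrics.{metric} {direction}"


def top_runs(experiment_name: str, metric: str = "rmse",
             lower_is_better: bool = True, limit: int = 5):
    """Return the best runs of an experiment as a pandas DataFrame."""
    # Imported here so order_clause can be used without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(TRACKING_URI)
    order_by: List[str] = [order_clause(metric, lower_is_better)]
    return mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=order_by,
        max_results=limit,
    )
```

The returned DataFrame has one row per run, with logged values in columns prefixed `metrics.` and `params.`, which makes it easy to sort or plot results outside the UI.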
Launch JupyterLab to access the interactive tutorials in the notebooks/ directory:
jupyter lab
The notebooks will guide you through:
- Data Exploration: Understanding the bike sharing dataset
- MLflow 3.1 Setup: Setting up tracking and logging with latest features
- Experiment Tracking: Logging parameters, metrics, and artifacts with enhanced metadata
- Model Comparison: Comparing different models and hyperparameters using improved UI
- Model Registry: Managing model versions and their lifecycle with the MLflow 3.1 model registry
- Advanced Features: Exploring MLflow 3.1's new capabilities and improvements
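To make the model and hyperparameter comparison step more concrete, here is one possible sketch of a small sweep that logs each configuration as a nested child run. It is not the notebooks' actual code: `param_grid`, `run_sweep`, and the `train_and_score` callable (which should train a model and return a dictionary of metrics) are hypothetical names introduced for this example.

```python
import itertools
from typing import Callable, Dict, Iterable, List


def param_grid(space: Dict[str, Iterable]) -> List[Dict[str, object]]:
    """Expand a search space into every combination of parameter values."""
    keys = list(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def run_sweep(space: Dict[str, Iterable],
              train_and_score: Callable[[Dict[str, object]], Dict[str, float]],
              parent_name: str = "sweep") -> None:
    """Log one nested MLflow run per parameter combination."""
    # Imported here so param_grid can be used without MLflow installed.
    import mlflow

    with mlflow.start_run(run_name=parent_name):
        for params in param_grid(space):
            # nested=True groups each child run under the parent in the UI.
            with mlflow.start_run(nested=True):
                mlflow.log_params(params)
                mlflow.log_metrics(train_and_score(params))
```

Grouping the children under a single parent run keeps a sweep together in the UI, so the configurations can be selected and compared side by side.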
- Fork the repository
- Create a feature branch:
git checkout -b feature/improvement
- Make your changes
- Test your changes: Ensure notebooks run correctly
- Commit your changes:
git commit -m 'Add improvement'
- Push to branch:
git push origin feature/improvement
- Open a Pull Request
- Follow Python best practices and PEP 8
- Add type hints to functions
- Document new features in notebooks
- Test MLflow logging functionality
- Update the README when adding new features
- Dataset: The bike sharing dataset is from Kaggle Bike Sharing Demand
- Research: Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg
- Data Source: UCI Machine Learning Repository - Bike Sharing Dataset
- Integration Example: Based on the mlflow_monitoring integration example from Evidently AI
Happy learning with MLflow! 🎉