A complete educational journey through classical Machine Learning, built from scratch, line by line.
Understand the math. Build the code. Train the mind.
⭐ Star this repo • 🍴 Fork it • 📖 Documentation • 🚀 Get Started
- 🎯 Overview
- ✨ Features
- 🛠️ Tech Stack
- 📁 Repository Structure
- 🚀 Quick Start
- 🧮 Algorithms Implemented
- 📊 Dataset Collection
- 📸 Visualizations
- 📈 Learning Path
- 🤝 Contributing
- 👨‍💻 Author
- 📄 License
- ⭐ Support
Welcome to Machine Learning Algorithms: your comprehensive playground for mastering classical ML! This repository features hand-crafted implementations of every major algorithm, from the ground up.
- Learn by Building: Every algorithm implemented from scratch using pure NumPy
- Compare & Contrast: Side-by-side comparisons with industry-standard Scikit-Learn
- Visual Learning: Beautiful plots and visualizations that bring theory to life
- Real-World Applications: Applied examples on diverse datasets
- Educational Focus: Clear documentation, math explanations, and code comments
💡 Perfect for students, developers, data scientists, and AI enthusiasts who want to truly understand Machine Learning from first principles.
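To give a feel for the approach, here is a minimal sketch of the "from scratch" style used throughout the notebooks: linear regression trained with batch gradient descent in pure NumPy. It is illustrative only, not the repo's exact code.

```python
import numpy as np

# Illustrative sketch (not the repo's exact code): linear regression
# fitted by batch gradient descent on the mean squared error.
def fit_linear_regression(X, y, lr=0.1, epochs=1000):
    Xb = np.c_[np.ones(len(X)), X]                # prepend a bias column
    w = np.zeros(Xb.shape[1])                     # start from zero weights
    for _ in range(epochs):
        grad = 2 / len(Xb) * Xb.T @ (Xb @ w - y)  # gradient of the MSE
        w -= lr * grad                            # one gradient descent step
    return w

# Toy usage: recover y = 2x + 1 from noisy samples
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.05, size=100)
print(fit_linear_regression(X, y))                # approx. [1.0, 2.0] (bias, slope)
```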
| Category | Technologies |
|---|---|
| Language | Python 3.9+ |
| Core Libraries | NumPy |
| ML Framework | Scikit-Learn |
| Visualization | Matplotlib |
| Environment | Jupyter Notebook |
| Version Control | Git |
```
Machine-learning-Algorithm/
│
├── 📂 Supervised Learning
│   ├── 📂 Regression
│   │   ├── LinearRegression/        # Simple & Multiple Linear Regression
│   │   ├── PolynomialRegression/    # Polynomial Regression
│   │   └── GradientDescent/         # Batch, Mini-Batch, Stochastic GD
│   │
│   ├── 📂 Classification
│   │   ├── LogisticRegression/      # Binary & Multi-class Classification
│   │   ├── KNN/                     # K-Nearest Neighbors
│   │   ├── NaiveBayes/              # Gaussian, Multinomial, Bernoulli
│   │   ├── SupportVectorMachines/   # SVM with Kernel Tricks
│   │   ├── DecisionTrees/           # CART Algorithm
│   │   └── NeuralNetworks/          # Perceptron & MLP
│   │
│   └── 📂 Ensemble Methods
│       ├── RandomForest/            # Random Forest Classifier & Regressor
│       ├── Bagging/                 # Bootstrap Aggregating
│       ├── AdaBoost/                # Adaptive Boosting
│       ├── GradientBoosting/        # Gradient Boosting Machines
│       └── XGBoost/                 # Extreme Gradient Boosting
│
├── 📂 Unsupervised Learning
│   ├── 📂 Clustering
│   │   ├── K-Means-clustering/      # K-Means from Scratch
│   │   ├── HierarchicalClustering/  # Agglomerative & Divisive
│   │   └── DBSCAN/                  # Density-Based Clustering
│   │
│   └── 📂 Dimensionality Reduction
│       └── PCA/                     # Principal Component Analysis
│
├── 📂 DataSets/                     # Curated Real-World Datasets
│   ├── iris.csv                     # Classification Dataset
│   ├── heart.csv                    # Healthcare Dataset
│   ├── Social_Network_Ads.csv       # Marketing Dataset
│   ├── ipl-matches.csv              # Sports Analytics
│   ├── zomato.csv                   # Restaurant Data
│   └── student_clustering.csv       # Educational Data
│
├── 📂 Visualizations/               # Plots & Charts
├── 📄 requirements.txt              # Python Dependencies
├── 📄 CONTRIBUTING.md               # Contribution Guidelines
├── 📄 LICENSE                       # MIT License
└── 📄 README.md                     # You are here!
```
- Python 3.9 or higher
- pip package manager
- Git
**Clone the repository**

```bash
git clone https://github.com/Nitin-Prata/Machine-learning-Algorithm.git
cd Machine-learning-Algorithm
```

**Create a virtual environment — Windows (PowerShell)**

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

**Create a virtual environment — macOS / Linux**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Install dependencies and launch Jupyter**

```bash
pip install -r requirements.txt
jupyter notebook
```

Your browser will open automatically at http://localhost:8888
- Navigate to any algorithm folder (e.g., `LinearRegression/`)
- Open the Jupyter Notebook (`.ipynb` file)
- Run cells sequentially to see the implementation
- Experiment with parameters and datasets
- Compare the scratch implementation with Scikit-Learn (see the sketch below)
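For the last step, here is a hedged sketch of what such a comparison looks like (the exact code differs per notebook): a from-scratch normal-equation fit in NumPy next to Scikit-Learn's `LinearRegression` on the same synthetic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(0, 0.1, size=200)

# From scratch: solve the normal equation (X'X) w = X'y with a bias column
Xb = np.c_[np.ones(len(X)), X]
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Reference implementation from Scikit-Learn
model = LinearRegression().fit(X, y)

print("scratch:", w)                              # [bias, w1, w2, w3]
print("sklearn:", model.intercept_, model.coef_)  # should agree closely
```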
📈 Regression Algorithms (5)

| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Linear Regression | ✅ | ✅ | 📓 |
| Multiple Linear Regression | ✅ | ✅ | 📓 |
| Polynomial Regression | ✅ | ✅ | 📓 |
| Ridge Regression | ✅ | ✅ | 📓 |
| Lasso Regression | ✅ | ✅ | 📓 |
🎯 Classification Algorithms (8)

| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Logistic Regression | ✅ | ✅ | 📓 |
| K-Nearest Neighbors (KNN) | ✅ | ✅ | 📓 |
| Naive Bayes | ✅ | ✅ | 📓 |
| Support Vector Machine (SVM) | ✅ | ✅ | 📓 |
| Decision Trees | ✅ | ✅ | 📓 |
| Random Forest | ✅ | ✅ | 📓 |
| Neural Networks (MLP) | ✅ | ✅ | 📓 |
| Softmax Regression | ✅ | ✅ | 📓 |
🌳 Ensemble Methods (5)

| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Random Forest | ✅ | ✅ | 📓 |
| Bagging | ✅ | ✅ | 📓 |
| AdaBoost | ✅ | ✅ | 📓 |
| Gradient Boosting | ✅ | ✅ | 📓 |
| XGBoost | ✅ | ✅ | 📓 |
🔍 Clustering Algorithms (3)

| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| K-Means Clustering | ✅ | ✅ | 📓 |
| Hierarchical Clustering | ✅ | ✅ | 📓 |
| DBSCAN | ✅ | ✅ | 📓 |
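As a taste of what the clustering notebooks build, here is a minimal K-Means sketch, assuming Euclidean distance and random initialisation (the repo's version adds more detail):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    # Final assignment against the converged centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return labels, centroids
```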
📉 Dimensionality Reduction (1)

| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Principal Component Analysis (PCA) | ✅ | ✅ | 📓 |
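The core of PCA fits in a few lines. One sketch, via SVD of the centred data (one of several equivalent routes, and an assumption about how it might be written rather than the repo's exact code):

```python
import numpy as np

def pca_transform(X, k):
    Xc = X - X.mean(axis=0)                 # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                    # project onto the top-k principal directions
```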
⚡ Optimization Algorithms (3)

| Algorithm | From Scratch | Notebook |
|---|---|---|
| Batch Gradient Descent | ✅ | 📓 |
| Mini-Batch Gradient Descent | ✅ | 📓 |
| Stochastic Gradient Descent | ✅ | 📓 |
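The three variants differ only in how much data each update sees. A hedged sketch for MSE linear regression (illustrative, not the repo's code): `batch_size=len(X)` gives batch GD, `batch_size=1` gives stochastic GD, and anything in between gives mini-batch GD.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.01, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.c_[np.ones(len(X)), X]           # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(Xb))     # reshuffle the data every epoch
        for start in range(0, len(Xb), batch_size):
            idx = order[start:start + batch_size]
            Xi, yi = Xb[idx], y[idx]
            w -= lr * 2 / len(idx) * Xi.T @ (Xi @ w - yi)  # MSE gradient step
    return w
```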
| Category | Count | Implementation Status |
|---|---|---|
| Regression | 5 | ✅ Complete |
| Classification | 8 | ✅ Complete |
| Ensemble Methods | 5 | ✅ Complete |
| Clustering | 3 | ✅ Complete |
| Dimensionality Reduction | 1 | ✅ Complete |
| Optimization | 3 | ✅ Complete |
| TOTAL | 25 | ✅ Complete |
All datasets are curated, cleaned, and ready to use in the /DataSets folder.
| Dataset | Size (rows) | Features | Use Case | Domain |
|---|---|---|---|---|
| `iris.csv` | 150 | 4 | Multi-class Classification | Botany |
| `heart.csv` | 303 | 13 | Binary Classification | Healthcare |
| `Social_Network_Ads.csv` | 400 | 4 | Marketing Classification | Business |
| `ipl-matches.csv` | 756 | 18 | Regression & Analysis | Sports |
| `zomato.csv` | 9551 | 21 | Clustering | Food Industry |
| `student_clustering.csv` | 2000 | 7 | Clustering | Education |
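Each dataset loads in one line; for example (assuming pandas, which the notebooks rely on via `requirements.txt`, and the repo root as your working directory):

```python
import pandas as pd

iris = pd.read_csv("DataSets/iris.csv")
print(iris.shape)   # expect 150 rows: 4 feature columns plus the label
print(iris.head())
```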
Decision boundaries, loss curves, clustering plots, and feature importance visualizations are included in each notebook.
- **Week 1-2: Linear & Logistic Regression**
  - Start with `LinearRegression/`
  - Move to `LogisticRegression/`
  - Understand cost functions and gradient descent
- **Week 3: Classification Basics**
  - Explore `KNN/`
  - Study `NaiveBayes/`
  - Practice on the iris dataset
- **Week 4: Tree-Based Methods**
  - Learn `DecisionTrees/`
  - Build intuition with visualizations
- **Week 5-6: Advanced Classification**
  - Master `SupportVectorMachines/`
  - Understand kernel tricks
  - Implement `NeuralNetworks/`
- **Week 7: Ensemble Methods**
  - Study `RandomForest/`
  - Compare with `Bagging/`
  - Understand bootstrap aggregating
- **Week 8: Clustering**
  - Implement `K-Means-clustering/`
  - Explore `HierarchicalClustering/`
  - Try `DBSCAN/`
- **Week 9-10: Boosting Algorithms**
  - Deep dive into `AdaBoost/`
  - Master `GradientBoosting/`
  - Optimize with `XGBoost/`
- **Week 11: Dimensionality Reduction**
  - Understand `PCA/`
  - Apply it to high-dimensional data
- **Week 12: Optimization Techniques**
  - Compare gradient descent variants
  - Implement custom optimizers
  - Tune hyperparameters
After completing this repository, you will:
- ✅ Understand the mathematical foundations of ML algorithms
- ✅ Implement algorithms from scratch using NumPy
- ✅ Debug and optimize ML code effectively
- ✅ Compare custom implementations with Scikit-Learn
- ✅ Visualize model behavior and decision boundaries
- ✅ Apply algorithms to real-world datasets
- ✅ Choose the right algorithm for specific problems
- ✅ Tune hyperparameters for optimal performance
Contributions are always welcome! Here's how you can help:
- 🐛 Report bugs and issues
- 💡 Suggest new algorithms to implement
- 📝 Improve documentation
- 🎨 Add visualizations
- 📊 Contribute new datasets
- ✨ Optimize existing code
- 🧪 Add unit tests
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Please read CONTRIBUTING.md for detailed guidelines.
🎓 B.Tech in Computer Science (AI) | India 🇮🇳
💼 Machine Learning, AI Education & Open Source
"Learn the math. Build the code. Train the mind." 🧠
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Nitin Pratap Singh
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files...
If you find this repository helpful, please consider:
| Action | Why? |
|---|---|
| ⭐ Star this repository | Show appreciation & help others discover it |
| 🍴 Fork it | Create your own version & experiment |
| 👀 Watch | Get notified of updates |
| 💬 Share | Help the ML community learn |
| 🐛 Report Issues | Help improve the project |
| 🤝 Contribute | Make it even better |
Special thanks to:
- Andrew Ng for his legendary Machine Learning course that inspired this project
- CampusX for their exceptional ML tutorials and educational content
- Scikit-Learn team for the amazing library
- NumPy contributors for the numerical computing foundation
- The open-source community for inspiration
- You for taking the time to explore this repository!
- Pattern Recognition and Machine Learning by Christopher Bishop
- The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
- Machine Learning Yearning by Andrew Ng
Machine Learning is not just about using libraries; it's about understanding the principles that make those libraries work. This repository bridges the gap between theory and practice, empowering you to not just use ML, but to truly understand it.