ugoarzur/ml-playground

Machine Learning Playground

A word on projects

Projects are split under ./ml/projects/<project-name>. All projects share the same venv, and each one has a README.md file that briefly explains it.

  • houses: a machine learning project that predicts house prices from a well-known, prepared dataset from sklearn.
  • titanic: a data manipulation project that creates a heatmap of Titanic disaster survivors.

With the code

This project uses uv as its package manager, so make sure it is installed on your computer.

If you prefer JupyterLab, it is supported too.

uv sync
jupyter lab

Linting & Formatting

This project uses Ruff for linting and formatting (configured in pyproject.toml).

# Check for lint errors
uv run ruff check .

# Auto-fix what can be fixed
uv run ruff check . --fix

# Format code
uv run ruff format .

# Check formatting without modifying files
uv run ruff format . --check

Repository Architecture

ml-playground/
├── backend/                     # API to serve models
│   ├── api/
│   │   ├── routes/
│   │   │   ├── titanic.py      # POST /predict/titanic
│   │   │   └── houses.py       # POST /predict/houses
│   │   └── main.py             # FastAPI app
│   └── requirements.txt
├── frontend/                    # Web interface (in the future)
│   ├── src/
│   ├── package.json
│   └── tsconfig.json
├── ml/                          # ML code
│   ├── shared/                  # Shared code between projects
│   │   ├── visualization/       # Plot
│   │   ├── metrics/             # Metrics
│   │   └── preprocessing/       # Preprocessing data
│   ├── projects/
│   │   ├── titanic/
│   │   │   ├── data/            # loaders and stored data (csv, etc)
│   │   │   ├── features/        # PolynomialFeatures
│   │   │   ├── models/          # regression, classification, etc
│   │   │   ├── trained_models/  # .pkl or .joblib
│   │   │   └── experiments/     # Configs, metrics history, compare
│   │   └── houses/
│   │       └── ...
│   ├── jupyter/                 # Notebooks
│   └── tests/
├── assets/                      # Raw Datasets, images, etc.
└── docker-compose.yml           # Orchestration (optional)

Explanations

Projects Structure

data/ : Preprocessed data storage and project-specific loaders

  • Loaders for CSV files and datasets
  • Raw data
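As a rough idea of what a loader in data/ could look like, here is a minimal sketch using only the standard library. The function name load_csv_rows and the column names in the usage note are hypothetical; the actual loaders in this repository may rely on pandas or sklearn instead.

```python
import csv
from pathlib import Path


def load_csv_rows(path: Path) -> list:
    """Read a CSV file that has a header row into a list of dicts.

    Every value is returned as a string; converting columns to numbers
    is left to the preprocessing step.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Called on a file whose header is `age,survived`, it returns one dict per row, keyed by those column names.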

features/ : Transformation

  • Feature engineering (PolynomialFeatures, encoders, scalers)
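To make the feature engineering step more concrete, here is a small sketch of what PolynomialFeatures does to a single row. The input values are made up for illustration and do not come from either project.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, a=2 and b=3 (illustrative values only).
X = np.array([[2.0, 3.0]])

# degree=2 expands [a, b] into [1, a, b, a^2, a*b, b^2].
poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)

print(X_poly)
```

The expanded matrix has six columns for two input features, which is why polynomial expansion is usually kept to low degrees.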

models/ : Blueprints, design

  • Architecture definitions (RandomForest, XGBoost, etc.)
  • Training code and hyperparameter tuning

trained_models/ : The final product

  • Serialized trained models (.pkl, .joblib)
  • Ready for production
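A minimal sketch of how a trained model could be written to and read back from trained_models/ as a .pkl file. The helper names save_model and load_model are hypothetical, not taken from this repository. For .joblib files, joblib.dump and joblib.load play the same role. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def save_model(model, path: Path) -> None:
    """Serialize any picklable object (e.g. a fitted estimator) to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)


def load_model(path: Path):
    """Load an object previously written by save_model."""
    with path.open("rb") as handle:
        return pickle.load(handle)
```

A typical call would be `save_model(fitted_estimator, Path("trained_models/model.pkl"))`, where both the variable and the file name are placeholders.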

experiments/ : Comparison step

  • Configurations
  • Metrics history
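One simple way to keep configurations and a metrics history side by side is a JSON Lines file with one record per run. The sketch below is an assumption about how experiments/ could be used, not the repository's actual format; log_run, best_run and the record layout are hypothetical, and best_run assumes that a higher metric value is better.

```python
import json
from pathlib import Path


def log_run(history_path: Path, config: dict, metrics: dict) -> None:
    """Append one experiment (its config and its metrics) to the history file."""
    record = {"config": config, "metrics": metrics}
    with history_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def best_run(history_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value for the given metric."""
    with history_path.open(encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

Because each run is a single appended line, the history stays easy to diff and to load later for comparison.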

Shared Structure

  • metrics/: metrics for training
  • pipelines/: pipelines for composing models and reproducing them
  • preprocessing/: data manipulation
  • visualization/: plotting

Resources

Term | Description
train_test_split | splits a dataset into a training set and a testing set
GridSearchCV | tries every combination in a grid of hyperparameters, with cross-validation, to find the best-performing one
MinMaxScaler | rescales each feature to a fixed range (0 to 1 by default)
KNeighborsClassifier | classifier implementing the k-nearest neighbors vote
matplotlib | a general-purpose plotting library, used here to visualize data and results
seaborn | a statistical visualization library built on top of matplotlib
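The scikit-learn terms above fit together into one small workflow. The sketch below uses the iris dataset bundled with sklearn purely as a stand-in, since it needs no download; the parameter grid and split size are arbitrary choices for illustration and are not taken from the projects in this repository.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Stand-in dataset: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale features to [0, 1] before the distance-based classifier.
pipeline = Pipeline(
    [
        ("scaler", MinMaxScaler()),
        ("knn", KNeighborsClassifier()),
    ]
)

# Try several neighbor counts with 5-fold cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid={"knn__n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Putting the scaler inside the pipeline means it is fitted only on the training folds during the grid search, which avoids leaking information from the validation folds.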

Machine learning - Scikit-Learn

About

A playground to play with ML concepts
