Projects live under `./ml/projects/<project-name>`.
Each project shares the same venv, and each project has a README.md that briefly explains it.
- houses: a machine learning project to predict house prices based on a well-known, prepared dataset from sklearn.
- titanic: a data manipulation project to create a heatmap of Titanic catastrophe survivors.
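To illustrate the titanic idea, here is a minimal sketch of building a survival-rate matrix (the kind of table a heatmap displays) from a tiny inline sample; the column names and data are illustrative only, not the project's real dataset:

```python
# Illustrative sketch: survival rate per (sex, pclass) cell,
# computed from a tiny hand-made sample of Titanic-style records.
import pandas as pd

df = pd.DataFrame({
    "sex":      ["female", "female", "male", "male", "male", "female"],
    "pclass":   [1, 3, 1, 3, 3, 1],
    "survived": [1, 1, 0, 0, 1, 1],
})

# Mean survival per group -- this matrix is what a heatmap would render,
# e.g. with seaborn: sns.heatmap(rates, annot=True)
rates = df.pivot_table(values="survived", index="sex", columns="pclass", aggfunc="mean")
print(rates)
```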
This project uses uv as its package manager, so make sure you have it installed.
If you prefer JupyterLab, it is supported too.
```sh
uv sync
jupyter lab
```

This project uses Ruff for linting and formatting (configured in `pyproject.toml`).
```sh
# Check for lint errors
uv run ruff check .

# Auto-fix what can be fixed
uv run ruff check . --fix

# Format code
uv run ruff format .

# Check formatting without modifying files
uv run ruff format . --check
```

```
ml-playground/
├── backend/                  # API to serve models
│   ├── api/
│   │   ├── routes/
│   │   │   ├── titanic.py    # POST /predict/titanic
│   │   │   └── houses.py     # POST /predict/houses
│   │   └── main.py           # FastAPI app
│   └── requirements.txt
├── frontend/                 # Web interface (planned)
│   ├── src/
│   ├── package.json
│   └── tsconfig.json
├── ml/                       # ML code
│   ├── shared/               # Code shared between projects
│   │   ├── visualization/    # Plotting
│   │   ├── metrics/          # Metrics
│   │   └── preprocessing/    # Data preprocessing
│   ├── projects/
│   │   ├── titanic/
│   │   │   ├── data/             # Loaders and stored data (CSV, etc.)
│   │   │   ├── features/         # PolynomialFeatures
│   │   │   ├── models/           # Regression, classification, etc.
│   │   │   ├── trained_models/   # .pkl or .joblib
│   │   │   └── experiments/      # Configs, metrics history, comparisons
│   │   └── houses/
│   │       └── ...
│   ├── jupyter/              # Notebooks
│   └── tests/
├── assets/                   # Raw datasets, images, etc.
└── docker-compose.yml        # Orchestration (optional)
```
data/: preprocessed data storage and specific loaders
- Loaders for CSV files and datasets
- Raw data
features/: transformation
- Feature engineering (PolynomialFeatures, encoders, scalers)
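As a minimal sketch of what this layer does, here is PolynomialFeatures expanding two raw features into polynomial terms (the input values are illustrative only):

```python
# Expand 2 raw features into degree-2 polynomial terms:
# columns become x0, x1, x0^2, x0*x1, x1^2
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly)  # 2 rows, 5 columns
```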
models/: blueprints and design
- Architecture definitions (RandomForest, XGBoost, etc.)
- Training code and hyperparameter tuning
trained_models/: the final product
- Serialized trained models (.pkl or .joblib)
- Ready for production
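A minimal sketch of the serialization step, assuming joblib is available; the model, tiny dataset, and file name here are illustrative only:

```python
# Fit a small model, write it to disk, and reload it for inference.
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)      # serialize the fitted estimator
restored = joblib.load(path)  # reload it later (e.g. from the API)
print(restored.predict([[2.5]]))
```

The same pattern works with pickle, but joblib is usually preferred for sklearn estimators because it handles large numpy arrays efficiently.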
experiments/: comparison step
- Configurations
- Metrics history
- metrics/: metrics for training
- pipelines/: pipelines for composing models and reproducing them
- preprocessing/: data manipulation
- visualization/: plotting
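As a sketch of the pipelines idea, here is a sklearn Pipeline composing a scaler and a model so preprocessing and training run as one reproducible step (the data and step names are illustrative only):

```python
# Compose scaling + a classifier into a single reproducible estimator.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0.0], [10.0], [20.0], [30.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("scale", MinMaxScaler()),              # scale features to [0, 1]
    ("knn", KNeighborsClassifier(n_neighbors=1)),
])
pipe.fit(X, y)          # scaler and model are fit together
print(pipe.predict([[25.0]]))
```

Because the scaler is inside the pipeline, it is fit only on the training data, which avoids leaking test-set statistics into preprocessing.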
| term | description |
|---|---|
| train_test_split | splits a dataset into a training set and a test set |
| GridSearchCV | tries multiple hyperparameter combinations to find the best-performing one |
| MinMaxScaler | scales each feature to a given range (by default 0 to 1) |
| KNeighborsClassifier | classifier implementing the k-nearest neighbors vote |
| matplotlib | a general-purpose plotting and visualization library |
| seaborn | a statistical visualization library built on top of matplotlib |
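The first four terms above can be tied together in a short sketch; the data here is synthetic so the example is self-contained:

```python
# Split -> scale -> grid-search a KNN classifier on toy data.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # simple linear labeling rule

# Hold out a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on training data only, then apply it to both sets
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Try several k values via cross-validation and keep the best
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5]}, cv=3)
grid.fit(X_train_s, y_train)
print(grid.best_params_, grid.score(X_test_s, y_test))
```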