# 📘 End-to-End Insurance Risk Analytics & Predictive Modeling

A complete, modular, production-ready machine learning pipeline for
insurance analytics.

---

## Project Overview

This project implements a **fully modular end-to-end ML pipeline** for
insurance risk analytics and predictive modeling. It supports real-world
insurance business applications such as:

- Analyzing historical policies, claims, and exposures
- Performing EDA and anomaly detection
- Conducting hypothesis tests to validate key risk drivers
- Building models for claim probability, claim severity, and premium optimization
- End-to-end reproducible ML pipeline with CI/CD support
- Integrated reporting, logging, and versioning
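The hypothesis-testing step (e.g. "does claim frequency differ across customer segments?") can be sketched with a chi-square test of independence. The data, segment labels, and column names below are synthetic placeholders, not the project's actual schema:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic example: segment is generated independently of claim occurrence.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "segment": rng.choice(["A", "B", "C"], size=1000),
    "has_claim": rng.choice([0, 1], size=1000, p=[0.9, 0.1]),
})

# Contingency table of segment vs. claim occurrence
table = pd.crosstab(df["segment"], df["has_claim"])
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
```

A small p-value would be evidence that claim frequency varies by segment; on real data the same pattern applies with the actual risk-driver columns.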

---

## Business Objective

**AlphaCare Insurance Solutions (ACIS)** aims to:

- Identify **low-risk customer segments**
- Optimize **premium pricing** while maximizing profitability
- Understand **factors contributing to claims**
- Support **actuarial and underwriting decisions**
- Enhance customer retention with targeted strategies

---

## Full Project Folder Structure

```
End-to-End-Insurance-Risk-Analytics-Predictive-Modeling/
├── .github/
│   └── workflows/                 # CI/CD pipelines (tests, linting, DVC)
├── configs/
│   ├── data.yaml                  # Dataset configuration
│   ├── dvc_remote.yaml            # DVC remote configuration
│   ├── logs.yaml                  # Logging settings
│   └── modeling.yaml              # ML model configurations
├── data/
│   ├── raw/                       # Original data (untouched)
│   └── processed/                 # Cleaned & feature-engineered data
├── docs/                          # Documentation & reports
├── notebooks/
│   ├── analysis/
│   │   ├── hypothesis_tests.ipynb
│   │   └── model_building.ipynb
│   └── exploration/
│       ├── data_overview.ipynb
│       └── eda.ipynb
├── scripts/
│   ├── __init__.py
│   ├── clean_data.py
│   ├── run_eda_pipeline.py
│   ├── run_hypothesis_tests.py
│   └── train_models.py
├── src/
│   └── insurance_analytics/
│       ├── __init__.py
│       ├── core/
│       │   ├── __init__.py
│       │   ├── config.py
│       │   ├── logger.py
│       │   ├── registry.py
│       │   └── scheduler.py
│       ├── eda/
│       │   ├── __init__.py
│       │   ├── exploration.py
│       │   └── visualization.py
│       ├── models/
│       │   ├── __init__.py
│       │   ├── evaluation.py
│       │   ├── interpretability.py
│       │   ├── linear_regression.py
│       │   ├── random_forest.py
│       │   └── xgboost_model.py
│       ├── preprocessing/
│       │   ├── __init__.py
│       │   ├── cleaner.py
│       │   └── feature_engineering.py
│       ├── utils/
│       │   ├── __init__.py
│       │   ├── io_utils.py
│       │   ├── metrics.py
│       │   ├── project_root.py
│       │   ├── system.py
│       │   └── validation.py
│       └── viz/
│           ├── __init__.py
│           └── plots.py
├── tests/
│   ├── integration/
│   │   ├── __init__.py
│   │   ├── test_dvc_integration.py
│   │   ├── test_eda_pipeline.py
│   │   ├── test_full_pipeline.py
│   │   └── test_model_pipeline.py
│   └── unit/
│       ├── __init__.py
│       ├── test_cleaners.py
│       ├── test_features.py
│       ├── test_hypothesis.py
│       ├── test_loaders.py
│       ├── test_models.py
│       └── test_registry.py
├── .gitignore
├── README.md
└── requirements.txt
```

---

## How to Run the Project

### Create a Virtual Environment

```bash
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux
```

### Install Dependencies
```bash
pip install -r requirements.txt
```

### Run Data Cleaning

```bash
python scripts/clean_data.py
```
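The README does not show what `clean_data.py` does internally; the sketch below illustrates the kind of steps such a script typically performs. The `premium` column and the cleaning rules are hypothetical, not the project's actual schema:

```python
import pandas as pd

def clean_policies(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass: dedupe, coerce types, drop bad rows."""
    out = df.drop_duplicates()
    # Coerce premium to numeric; unparseable values become NaN and are dropped
    out = out.assign(premium=pd.to_numeric(out["premium"], errors="coerce"))
    out = out.dropna(subset=["premium"])
    # Negative premiums are treated as data-entry errors
    return out[out["premium"] >= 0]

raw = pd.DataFrame({"premium": ["100", "bad", "-5", "250", "250"]})
clean = clean_policies(raw)
print(clean)  # rows with premiums 100.0 and 250.0 remain
```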

### Run EDA Pipeline

```bash
python scripts/run_eda_pipeline.py
```

### Train Machine Learning Models

```bash
python scripts/train_models.py
```
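The internals of `train_models.py` are not shown here; as a rough sketch, a claim-severity regressor could be fit like this. The data is synthetic and the feature semantics are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))  # stand-ins for e.g. vehicle age, sum insured, ...
y = 1000 + 300 * X[:, 0] + rng.normal(scale=50, size=500)  # claim amount

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE: {mae:.1f}")
```

The repository's `models/` package also includes linear regression and XGBoost variants; the fit/evaluate pattern is the same.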

### Use Jupyter Notebooks

```bash
jupyter notebook
```

---

## Key Features

- ✔ Modular ML architecture
- ✔ Clear data/configs/scripts separation
- ✔ DVC versioning
- ✔ CI-ready workflows
- ✔ Logging & validation utilities
- ✔ Interpretability (SHAP, feature importance)
- ✔ Reproducible experiments
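The interpretability feature can be illustrated with scikit-learn's built-in feature importances; a SHAP analysis would follow the same fit-then-explain pattern. The data and feature names below are synthetic, with the target constructed to depend only on `sum_insured`:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
X = pd.DataFrame({
    "vehicle_age": rng.uniform(0, 20, 300),
    "sum_insured": rng.uniform(1e4, 1e5, 300),
    "driver_age": rng.uniform(18, 80, 300),
})
# Target depends only on sum_insured, so it should dominate the ranking
y = 0.02 * X["sum_insured"] + rng.normal(scale=50, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
ranking = pd.Series(model.feature_importances_, index=X.columns)
print(ranking.sort_values(ascending=False))
```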

---

## Reports

- `docs/interim_report.md`: insights gathered during project development
- `docs/final_report.md`: final results, visualizations, and model performance

---

## Testing

```bash
pytest
```
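A unit test in `tests/unit/` would look roughly like the sketch below. The actual cleaner API is not shown in this README, so a stand-in function is defined inline:

```python
import pandas as pd

def drop_negative_premiums(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a cleaner step: remove rows with negative premiums."""
    return df[df["premium"] >= 0].reset_index(drop=True)

def test_drop_negative_premiums():
    df = pd.DataFrame({"premium": [100.0, -5.0, 250.0]})
    out = drop_negative_premiums(df)
    assert list(out["premium"]) == [100.0, 250.0]

test_drop_negative_premiums()
```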

---

## Version Control

## Configuration

`configs/data.yaml` (added in this change):

```yaml
data:
  data_dir: "data"
  raw_dir: "data/raw"
  processed_dir: "data/processed"

logs:
  logs_dir: "logs"

reports:
  reports_dir: "reports"
  plots_dir: "reports/plots"

models:
  models_dir: "src/insurance_analytics/models"

artifacts:
  artifacts_dir: "artifacts"
```
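The repository's `core/config.py` presumably wraps config loading, but its exact API is not shown here; a plain PyYAML loader covers the essentials:

```python
import yaml

# Inline sample standing in for configs/data.yaml; in a real run you would
# load the file, e.g. yaml.safe_load(open("configs/data.yaml"))
SAMPLE = """
data:
  raw_dir: data/raw
  processed_dir: data/processed
logs:
  logs_dir: logs
"""

cfg = yaml.safe_load(SAMPLE)
print(cfg["data"]["raw_dir"])
```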