Health Insurance Premium Predictor

Live Demo & API (If the URLs stop working, please restart them)

🚀 Live Testing Link: Health Insurance Premium Predictor
🖥 Backend API: Insurance Predictor API
🎮 Play with Data: Interactive Data Exploration

This project covers the full machine learning pipeline for predicting health insurance premiums: exploratory data analysis, several regression models, and an interactive UI.

📌 Project Journey & Contribution Guidelines

If you wish to contribute, follow this step-by-step guide.

🔍 1. Data Analysis & Preprocessing

Conducted exploratory data analysis (EDA) using scatter plots, box plots, 3D visualizations, KDE plots, heatmaps, and more.
Interactive data visualizations are available here.
Feature Engineering:
- One-hot encoding for categorical variables (sex, smoker, region).
- Log transformation of the target variable charges using np.log1p().
Dataset Cleaning: Handled missing values and outliers, and applied feature scaling.
Libraries Used: pandas, numpy, seaborn, matplotlib, scikit-learn, optuna, streamlit, gradio, huggingface
Detailed analysis is available in insurance_data_analysis.ipynb.
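
As a rough illustration of the two feature engineering steps listed above, here is a minimal sketch. It is not taken from the notebook: the sample rows are made up, and the column names age, bmi, and children are assumptions (only sex, smoker, region, and charges are named in this README).

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; the real data lives in insurance.csv.
df = pd.DataFrame({
    "age": [19, 45, 33],                 # assumed column name
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 30.1, 22.7],           # assumed column name
    "children": [0, 2, 1],               # assumed column name
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northeast", "northwest"],
    "charges": [16884.92, 8240.59, 21984.47],
})

# One-hot encode the three categorical columns.
features = pd.get_dummies(
    df.drop(columns="charges"), columns=["sex", "smoker", "region"]
)

# Log-transform the target; log1p(x) = log(1 + x) stays finite at x = 0.
target = np.log1p(df["charges"])

# A prediction made on the log scale maps back to currency with expm1.
recovered = np.expm1(target)
```

Because the models are fit on log1p(charges), any prediction has to go through np.expm1() before it can be read as a premium.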

🤖 2. Model Selection & Performance Evaluation

A. Preprocessing & Model Testing

Tested multiple models: XGBoost, Decision Tree, Random Forest, Linear Regression, Polynomial Regression.

B. Linear Regression

Trained after preprocessing, yielding:

Mean Squared Error (MSE): 0.1746
Scaled MSE (MSE/2): 0.0873
Mean Absolute Error (MAE): 0.2685
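
For reference, the three metrics reported throughout this section can be computed as in the sketch below. The function name regression_metrics is hypothetical and the notebooks may well use scikit-learn's metric functions instead; the point is only to show that "Scaled MSE" is the MSE divided by 2.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, scaled MSE (MSE / 2), and MAE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"mse": mse, "scaled_mse": mse / 2, "mae": mae}


# Toy values, not project data.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```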

C. Polynomial Regression (Degree = 2)

After testing multiple degrees, degree = 2 gave the best results:

MSE: 0.1207
Scaled MSE: 0.0604
MAE: 0.2049

Saved in insurance_parametric_regression.ipynb.
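
To make "degree = 2" concrete: each input row is expanded with all squares and pairwise products before a linear model is fit. The sketch below shows that expansion in plain Python; the helper name degree2_features is hypothetical, and the notebook presumably relies on scikit-learn's PolynomialFeatures (the saved Final_Poly_Transformer.pkl suggests as much) rather than anything hand-written.

```python
from itertools import combinations_with_replacement


def degree2_features(row):
    """Return the original values followed by every square and pairwise product."""
    index_pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in index_pairs]


# For inputs [a, b] the result is [a, b, a*a, a*b, b*b].
expanded = degree2_features([2.0, 3.0])
```

With k input features this yields k + k(k + 1)/2 columns, which is why higher degrees grow quickly and tend to overfit on a small dataset.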

D. Decision Tree & Random Forest

Hyperparameter tuning applied using Optuna.

Decision Tree Results:

MSE: 0.1464
Scaled MSE: 0.0732
MAE: 0.2188

Random Forest Results:

MSE: 0.1302
Scaled MSE: 0.0651
MAE: 0.2073

(See insurance_Decision_tree.ipynb for details.)

E. Final Model: XGBoost (Optimized with Optuna)

MSE: 0.1278
Scaled MSE: 0.0639
MAE: 0.2022

Final model saved in XGBoost_insurance_model.ipynb.

⚙️ 3. MLOps & Deployment

A. Backend API (FastAPI & Gradio)

Receives user input, loads the trained models via joblib, and returns predictions.
Uses FastAPI and Gradio for deployment.
Deployed on Hugging Face Spaces.

Backend Features:

✅ FastAPI Setup: Handles API requests and CORS.
✅ Model Loading: Supports XGBoost, Decision Tree, Random Forest, Linear Regression, and Polynomial Regression.
✅ Data Preprocessing: One-hot encoding of categorical variables.
✅ Prediction Handling: Accepts input, preprocesses, selects the model, and returns predictions.
✅ Endpoints:

- / - Root endpoint (Welcome message).
- /predict - Accepts data and returns predictions.
- /health - Health check endpoint.

✅ Logging & Error Handling: Ensures smooth debugging.
✅ Cross-Origin Compatibility: Allows frontend to communicate via CORS.
✅ Deployment: Hosted on Hugging Face Spaces.

Backend file: prediction_handler.py
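
The request flow described above (accept input, one-hot encode, select a model, return a prediction) could be sketched roughly as follows. This is not the code in prediction_handler.py: the column order, payload keys, function names, and the default model name are hypothetical, and models are represented as plain callables rather than objects loaded with joblib. The expm1 step is also an assumption, based on the target having been log-transformed during training.

```python
import math

# Hypothetical column order; the real one must match the training data exactly.
FEATURE_ORDER = [
    "age", "bmi", "children",
    "sex_female", "sex_male",
    "smoker_no", "smoker_yes",
    "region_northeast", "region_northwest",
    "region_southeast", "region_southwest",
]


def encode_request(payload):
    """Turn one request dict into a feature vector in FEATURE_ORDER."""
    row = {name: 0.0 for name in FEATURE_ORDER}
    for key in ("age", "bmi", "children"):
        row[key] = float(payload[key])
    for key in ("sex", "smoker", "region"):
        column = f"{key}_{payload[key]}"
        if column not in row:
            raise ValueError(f"unknown {key}: {payload[key]!r}")
        row[column] = 1.0
    return [row[name] for name in FEATURE_ORDER]


def predict_premium(payload, models):
    """Pick the requested model and undo the log1p transform on its output."""
    name = payload.get("model", "XGBoost")  # assumed default
    if name not in models:
        raise ValueError(f"unknown model: {name!r}")
    log_prediction = models[name](encode_request(payload))
    return math.expm1(log_prediction)


# Usage with a stub model that always predicts log1p(1000).
example = {"age": 40, "bmi": 26.0, "children": 1,
           "sex": "male", "smoker": "yes", "region": "southeast"}
vector = encode_request(example)
premium = predict_premium(dict(example, model="stub"),
                          {"stub": lambda v: math.log1p(1000.0)})
```

Fixing the column order in one place matters because a one-hot encoded single request will otherwise lack the columns for categories it does not contain.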

💻 4. Frontend (Streamlit UI & Gradio)

Developed an interactive chatbot-style UI for Health Insurance Premium Prediction.

A. Technologies Used

Streamlit - Web framework.
Gradio - Interactive UI framework.
HTML, CSS, JavaScript - UI enhancements.
Fetch API - Calls backend FastAPI.
ML Models: Supports XGBoost, Decision Tree, Random Forest, Linear Regression, and Polynomial Regression.

B. Features

🎨 Modern UI: Clean and intuitive design with custom styling.
📊 Dropdown for Model Selection: Users dynamically select models.
✅ Validation Checks: Ensures valid age (18-100), BMI range, and categorical values.
⏳ Loading Animation & User Interaction: Enhances user experience.
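
The validation checks might look something like this minimal sketch. Only the age bounds (18-100) come from this README; the BMI limits, the allowed category values, and the function name validate_input are assumptions and may differ from what front_end.py actually enforces.

```python
BMI_RANGE = (10.0, 60.0)  # assumed limits; the README does not state them

# Assumed category values; check them against insurance.csv.
ALLOWED_VALUES = {
    "sex": {"male", "female"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "northwest", "southeast", "southwest"},
}


def validate_input(data):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []

    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or not 18 <= age <= 100:
        errors.append("age must be an integer between 18 and 100")

    bmi = data.get("bmi")
    if (not isinstance(bmi, (int, float)) or isinstance(bmi, bool)
            or not BMI_RANGE[0] <= bmi <= BMI_RANGE[1]):
        errors.append(f"bmi must be between {BMI_RANGE[0]} and {BMI_RANGE[1]}")

    for field, allowed in ALLOWED_VALUES.items():
        if data.get(field) not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")

    return errors


ok = validate_input({"age": 30, "bmi": 24.5, "sex": "female",
                     "smoker": "no", "region": "southeast"})
bad = validate_input({"age": 17, "bmi": 24.5, "sex": "female",
                      "smoker": "maybe", "region": "southeast"})
```

Returning every error at once, rather than stopping at the first, lets the UI highlight all invalid fields in a single pass.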

Frontend file: front_end.py

🔮 5. Future Improvements

🔹 Hyperparameter tuning with Bayesian Optimization
🔹 Integration with cloud platforms (AWS/GCP) for scalable deployment
🔹 Mobile-friendly UI enhancements
🔹 Adding more ML models and explainable AI (SHAP, LIME)

📢 Want to contribute? Follow the structure in the respective .ipynb files and reach out via the repository issues section!

📝 Final Note:
For a better understanding of the project workflow, open and explore each file. I have added detailed comments in every script and notebook to guide you through each step, from data preprocessing to model deployment. These comments will help you follow the logic behind every decision and modification. 🚀

📜 License & Acknowledgments

This project is open-source and licensed under the MIT License.

Special thanks to:

Hugging Face Spaces for hosting.
Optuna for hyperparameter tuning.
Streamlit & FastAPI for interactive UI and API.
Scikit-learn, Pandas, Numpy, Matplotlib, Seaborn for data processing and visualization.
Community Contributors for improvements.

💡 Stay tuned for updates! 🚀
