
Healthcare analytics project using XGBoost to predict hospital readmissions with 0.701 AUC-ROC, built with Python, Pandas, and Scikit-learn. Topics: healthcare-analytics, machine-learning, xgboost, data-science, predictive-modeling.


KyleSDeveloper/Healthcare_Analytics_Simulation


πŸ₯ Predicting Hospital Readmissions with Machine Learning

This project predicts hospital readmission for patients in the UCI Diabetes 130-US Hospitals dataset. A multiclass XGBoost model assigns each patient encounter to one of three outcomes:

  • 0 → Not readmitted
  • 1 → Readmitted within 30 days
  • 2 → Readmitted after 30 days

🎯 Project Goals

  • Clean and preprocess a real-world healthcare dataset
  • Perform exploratory data analysis (EDA)
  • Engineer relevant features
  • Handle class imbalance
  • Build a multiclass classification model using XGBoost
  • Tune hyperparameters using Optuna
  • Evaluate and interpret model performance
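
The cleaning and feature-engineering goals above might look roughly like the following pandas sketch. It is not the contents of src/preprocessing.py: it assumes the standard UCI column names (encounter_id, patient_nbr, weight, race, number_outpatient, number_emergency, number_inpatient, readmitted) and that missing values are recorded as "?". The helper names, the 50% missing-value cutoff, and the total_visits feature are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

# Identifier columns in the UCI file: they label rows, they do not describe patients.
ID_COLUMNS = ["encounter_id", "patient_nbr"]
# Prior-visit counts, summed into one hypothetical engineered feature.
VISIT_COLUMNS = ["number_outpatient", "number_emergency", "number_inpatient"]


def basic_clean(df: pd.DataFrame, missing_threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical cleaning step (not the project's src/preprocessing.py)."""
    # The UCI file marks missing values with "?"; turn them into real NaNs.
    out = df.replace("?", np.nan)
    # Drop columns whose share of missing values exceeds the threshold.
    out = out.loc[:, out.isna().mean() <= missing_threshold]
    # Remove identifiers so the model cannot key on specific encounters or patients.
    out = out.drop(columns=[c for c in ID_COLUMNS if c in out.columns])
    # Example engineered feature: total number of prior visits of any type.
    if all(c in out.columns for c in VISIT_COLUMNS):
        out["total_visits"] = out[VISIT_COLUMNS].sum(axis=1)
    return out


def one_hot_features(df: pd.DataFrame, target: str = "readmitted") -> pd.DataFrame:
    """One-hot encode every non-numeric feature column; the target is excluded."""
    features = df.drop(columns=[target])
    categorical = [
        c for c in features.columns if not pd.api.types.is_numeric_dtype(features[c])
    ]
    return pd.get_dummies(features, columns=categorical)
```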

🧠 Machine Learning Approach

✅ Techniques Used

  • XGBoost multiclass classification
  • Sample weighting to address class imbalance
  • Hyperparameter tuning with Optuna
  • Performance evaluation with classification reports and confusion matrices
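
Sample weighting can be illustrated with the common "balanced" heuristic, which gives each row the weight n_samples / (n_classes × count of its class), so every class carries the same total weight. The README does not say which weighting scheme the project used, so this NumPy sketch, including the helper name balanced_sample_weights, is an assumption.

```python
import numpy as np


def balanced_sample_weights(y) -> np.ndarray:
    """Per-row weights: n_samples / (n_classes * count of that row's class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)
    # np.unique returns sorted classes, so searchsorted maps each label to its class index.
    return class_weight[np.searchsorted(classes, y)]
```

The weights would then be passed at fit time, for example XGBClassifier(objective="multi:softprob").fit(X_train, y_train, sample_weight=balanced_sample_weights(y_train)). scikit-learn's compute_sample_weight("balanced", y) in sklearn.utils.class_weight produces the same values.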

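🧪 Label Encoding Logic

| Label | Raw value | Meaning                   |
|-------|-----------|---------------------------|
| 0     | NO        | Not readmitted            |
| 1     | <30       | Readmitted within 30 days |
| 2     | >30       | Readmitted after 30 days  |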
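
The encoding maps the raw readmitted column of the UCI file, whose values are "NO", "<30" and ">30", onto integer labels. A minimal pandas sketch, assuming those raw strings; the helper name encode_readmitted is hypothetical:

```python
import pandas as pd

# Assumed raw values of the UCI `readmitted` column and their integer labels.
READMIT_LABELS = {"NO": 0, "<30": 1, ">30": 2}


def encode_readmitted(raw: pd.Series) -> pd.Series:
    """Map raw readmission strings to 0/1/2, failing on anything unexpected."""
    encoded = raw.map(READMIT_LABELS)
    if encoded.isna().any():
        unknown = sorted(raw[encoded.isna()].unique())
        raise ValueError(f"Unexpected readmitted values: {unknown}")
    return encoded.astype(int)
```

Failing loudly on unexpected values is a deliberate choice: a silent NaN label would otherwise surface later as a confusing training error.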

📊 Results

| Metric         | Value                      |
|----------------|----------------------------|
| Accuracy       | 52.0%                      |
| Macro F1 score | 0.45                       |
| Class 1 recall | 31.0%                      |
| Model          | Tuned XGBoost (via Optuna) |

  • Optuna tuning improved macro F1 and recall for class 1 (early readmission).
  • Class imbalance was addressed using sample weighting.
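
All three reported metrics can be derived from the confusion matrix. The NumPy sketch below (helper name hypothetical, labels assumed to be 0, 1, 2 as above) shows how; in practice scikit-learn's accuracy_score, f1_score with average="macro", recall_score with average=None, and confusion_matrix compute the same quantities. It illustrates the definitions only and does not reproduce the project's numbers.

```python
import numpy as np


def classification_summary(y_true, y_pred, n_classes: int = 3) -> dict:
    """Accuracy, macro F1, class-1 recall and confusion matrix (rows = true, cols = predicted)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair

    tp = np.diag(cm)            # correct predictions per class
    actual = cm.sum(axis=1)     # true members of each class
    predicted = cm.sum(axis=0)  # predictions made for each class
    zeros = np.zeros(n_classes)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "macro_f1": f1.mean(),       # unweighted mean, so rare classes count equally
        "class1_recall": recall[1],  # share of <30-day readmissions that were caught
        "confusion_matrix": cm,
    }
```

Because macro F1 averages the per-class scores without weighting by class size, a model that ignores the rare class 1 is penalized even when its overall accuracy looks acceptable.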

πŸ“ Project Structure

Healthcare_Analytics_Simulation/
├── data/                                    # Raw and sample data (not tracked in Git)
├── notebooks/
│   ├── 01_EDA.ipynb                         # Exploratory data analysis
│   ├── 02_Modeling_XGBoost.ipynb            # Initial modeling attempts
│   └── 03_Hyperparameter_Tuning_Optuna.ipynb
├── models/
│   └── best_xgb_model.json                  # Trained model
├── src/
│   ├── preprocessing.py                     # Feature engineering and encoding
│   └── train_model.py                       # Model training script
├── requirements.txt
├── README.md
└── .gitignore


βš™οΈ How to Run

1. Clone the repo

git clone https://github.com/KyleSDeveloper/Healthcare_Analytics_Simulation.git
cd Healthcare_Analytics_Simulation

2. Install dependencies

pip install -r requirements.txt

3. Run notebooks

Open the notebooks in Jupyter or VS Code to explore and reproduce the results.

📦 Requirements

  • Python 3.12+
  • XGBoost
  • Scikit-learn
  • Optuna
  • Pandas, NumPy, Matplotlib


📌 Key Learnings

  • How to handle class imbalance in multiclass problems
  • How to tune hyperparameters using Optuna
  • How to balance precision/recall trade-offs in clinical data
  • How to structure and document ML projects for recruiters

📜 License

This project is for educational and portfolio purposes only and is not intended for clinical use.

🙋‍♂️ Author

Kyle Spengler

  • 📧 [email protected]
  • 🌐 LinkedIn
  • 💻 GitHub
