This project predicts hospital readmission for patients in the UCI Diabetes 130-US Hospitals dataset. It trains a multiclass classifier that assigns each patient to one of three readmission outcomes:
- 0 → Not readmitted
- 1 → Readmitted within 30 days
- 2 → Readmitted after 30 days

Objectives:
- Clean and preprocess a real-world healthcare dataset
- Perform exploratory data analysis (EDA)
- Engineer relevant features
- Handle class imbalance
- Build a multiclass classification model using XGBoost
- Tune hyperparameters using Optuna
- Evaluate and interpret model performance

Techniques used:
- XGBoost multiclass classification
- Sample weighting to address class imbalance
- Hyperparameter tuning with Optuna
- Performance evaluation with classification reports and confusion matrices
| Label | Meaning |
|---|---|
| 0 | Not readmitted |
| 1 | Readmitted within 30 days |
| 2 | Readmitted after 30 days |
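
The label encoding in the table can be sketched as a small mapping. This assumes the raw target is the dataset's `readmitted` column with the string values `NO`, `<30`, and `>30`; the names `READMITTED_LABELS` and `encode_readmitted` are illustrative and not taken from this repository.

```python
# Illustrative mapping from raw `readmitted` strings to the integer labels above.
READMITTED_LABELS = {"NO": 0, "<30": 1, ">30": 2}


def encode_readmitted(values):
    """Map raw readmission strings to class labels 0, 1, 2."""
    try:
        return [READMITTED_LABELS[v] for v in values]
    except KeyError as exc:
        # Fail loudly on unexpected values instead of silently mislabeling rows.
        raise ValueError(f"Unexpected readmitted value: {exc.args[0]!r}") from exc
```

With a pandas DataFrame, the same mapping could be applied as `df["readmitted"].map(READMITTED_LABELS)`.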
| Metric | Value |
|---|---|
| Accuracy | 52.0% |
| Macro F1 Score | 0.45 |
| Class 1 Recall | 31.0% |
| Model | Tuned XGBoost (via Optuna) |
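
For readers unfamiliar with the metrics in the table, the sketch below shows how accuracy, per-class recall, and macro F1 are derived from a confusion matrix. It is written in plain Python to make the definitions explicit; the function names are hypothetical, and in practice scikit-learn's `classification_report` and `confusion_matrix` report the same quantities.

```python
def build_confusion_matrix(y_true, y_pred, n_classes=3):
    # Rows are true labels, columns are predicted labels.
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def summarize(y_true, y_pred, n_classes=3):
    cm = build_confusion_matrix(y_true, y_pred, n_classes)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total

    recalls, f1_scores = [], []
    for k in range(n_classes):
        tp = cm[k][k]
        actual = sum(cm[k])                     # samples whose true label is k
        predicted = sum(row[k] for row in cm)   # samples predicted as k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    # Macro F1 is the unweighted mean of per-class F1, so rare classes count equally.
    return {
        "accuracy": accuracy,
        "recall": recalls,
        "macro_f1": sum(f1_scores) / n_classes,
    }
```

Because macro F1 weights every class equally, it can sit well below accuracy when the minority class (here, class 1) is hard to predict.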
- Optuna tuning improved macro F1 and recall for class 1 (early readmission)
- Class imbalance was addressed using sample weighting
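
The README does not state which weighting scheme was used. As one plausible reading, the sketch below computes inverse-frequency ("balanced") weights, the same formula scikit-learn uses for `compute_sample_weight("balanced", y)`. The function name `balanced_sample_weights` is hypothetical.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per sample: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    class_weight = {
        label: n_samples / (n_classes * count) for label, count in counts.items()
    }
    # Rare classes get larger weights; the weights sum to n_samples.
    return [class_weight[label] for label in y]
```

The resulting list can be passed to XGBoost's scikit-learn interface as `model.fit(X, y, sample_weight=weights)`, so that misclassifying a rare class costs more during training.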
    Healthcare_Analytics_Simulation/
    ├── data/                                    # Raw and sample data (not tracked in Git)
    ├── notebooks/
    │   ├── 01_EDA.ipynb                         # Exploratory data analysis
    │   ├── 02_Modeling_XGBoost.ipynb            # Initial modeling attempts
    │   └── 03_Hyperparameter_Tuning_Optuna.ipynb
    ├── models/
    │   └── best_xgb_model.json                  # Trained model
    ├── src/
    │   ├── preprocessing.py                     # Feature engineering and encoding
    │   └── train_model.py                       # Model training script
    ├── requirements.txt
    ├── README.md
    └── .gitignore
1. Clone the repository
git clone https://github.com/your-username/Healthcare_Analytics_Simulation.git
cd Healthcare_Analytics_Simulation
2. Install dependencies
pip install -r requirements.txt
3. Run notebooks
Open the notebooks in Jupyter or VS Code to explore and reproduce the results.

Requirements
- Python 3.12+
- XGBoost
- Scikit-learn
- Optuna
- Pandas, NumPy, Matplotlib

Install all requirements:
pip install -r requirements.txt

Key Learnings
- How to handle class imbalance in multiclass problems
- How to tune hyperparameters using Optuna
- How to balance precision/recall tradeoffs in clinical data
- How to structure and document ML projects for recruiters

License
This project is for educational and portfolio purposes only. Not intended for clinical use.

Author
Kyle Spengler
- Email: [email protected]
- LinkedIn
- GitHub