This repository showcases the implementations for two interconnected coursework tasks. We dive deep into stroke data analytics using a simulated dataset of 172,000 anonymous patient records. The focus? Analyzing cardiovascular risk factors like age, hypertension, smoking, glucose levels, and lifestyle habits to empower clinicians in preventing fatalities.
The dataset (`data.csv` in `/shared/`) packs 20 features such as age, hypertension, heart_disease, avg_glucose_level, bmi, smoking_status, stroke, and more. Both tasks champion ethical AI use, inclusivity in health tech, and sustainability in data-driven healthcare.
- Objectives: Build three modules sans high-level libraries (no Pandas/NumPy, pure file I/O!) using core Python basics.
- `dataset_module.py`: Loads CSV into a nested dictionary.
- `query_module.py`: Crunches stats (mean, median, mode) for stroke queries, e.g., average age for smokers with hypertension, or dietary habits by stroke outcome; persists outputs to CSV.
- `ui_module.py`: Interactive text-based menu for queries, weaving in prior modules.
- Main Entry: Fire up `task1/main.ipynb` in Jupyter for the demo.
- Key Learning: Iteration loops, string wizardry, and custom data structures for domain-specific software.
- Extensions: Streamlit.
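
The Task 1 idea of loading a CSV into a nested dictionary and computing mean, median, and mode without Pandas or NumPy can be sketched roughly as follows. This is a minimal illustration, not the code in `dataset_module.py` or `query_module.py`: the function names, the `id` key column, and the sample values (such as `smokes`) are assumptions made for the example, and the naive comma split does not handle quoted fields.

```python
def load_records(lines, key_field="id"):
    """Parse CSV lines into a nested dict: {record_id: {column: value}}.

    `key_field` is a hypothetical identifier column; values stay as strings.
    """
    rows = iter(lines)
    header = next(rows).strip().split(",")
    records = {}
    for line in rows:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = line.split(",")  # naive split: quoted commas are not handled
        row = dict(zip(header, values))
        records[row[key_field]] = row
    return records


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    best = max(counts.values())
    # on ties, return the smallest value so the result is deterministic
    return min(v for v, c in counts.items() if c == best)


def ages_where(records, **conditions):
    """Collect ages of records whose columns equal the given string values."""
    return [
        float(row["age"])
        for row in records.values()
        if all(row.get(col) == val for col, val in conditions.items())
    ]


# Made-up sample rows standing in for data.csv
sample = [
    "id,age,hypertension,smoking_status,stroke",
    "1,67,1,smokes,1",
    "2,54,1,smokes,0",
    "3,54,0,never smoked,0",
    "4,80,1,smokes,1",
]
records = load_records(sample)
ages = ages_where(records, hypertension="1", smoking_status="smokes")
print(mean(ages), median(ages), mode(ages))
```

Because `load_records` accepts any iterable of lines, an open file object can be passed in directly, e.g. `with open("data.csv") as f: records = load_records(f)`.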
- Objectives: Refactor with OOP flair; unleash EDA with libraries; craft predictive models.
- `load2.py`: OOP-savvy loading + cleaning.
- `eda2.py`: Tackles missing data, descriptive stats (mean, SD, skewness), visualizations (bar/pie/box/scatter plots via Matplotlib/Seaborn), class balancing (e.g., SMOTE for stroke skew), and train-test split.
- `ui2.py`: Upgraded UI for EDA/ML insights.
- ML Magic: Feature engineering (e.g., BMI buckets); trains 3 classifiers (Naive Bayes, Random Forest & XGBoost) per target (`chronic_stress`, `physical_activity`, `income_level`, `stroke`) using Scikit-learn. Evaluates via confusion matrices and precision/recall/accuracy; visualizes model showdowns.
- Main Entry: Launch `task2/main2.ipynb` for the full pipeline.
- Key Learning: OOP encapsulation/inheritance, ML ethics, and performance deep-dives.
- Extensions: Simple Tkinter GUI for predictions.
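
To make two of the Task 2 steps concrete, here is a small library-free sketch of BMI bucketing and of how accuracy, precision, and recall fall out of confusion-matrix counts. It is illustrative only and not taken from the notebooks: the function names are hypothetical, the bucket cut-offs are the standard WHO adult BMI ranges (the project may use different ones), and the labels are made up. In the actual pipeline these metrics would more likely come from Scikit-learn's `confusion_matrix`, `accuracy_score`, `precision_score`, and `recall_score`.

```python
def bmi_bucket(bmi):
    """Map a BMI value to a category (WHO adult cut-offs, assumed here)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary target."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up, imbalanced labels: 2 stroke cases out of 10 patients
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# A model that always predicts "no stroke" matches the accuracy
# of the model above while finding none of the stroke cases.
always_negative = classification_metrics(y_true, [0] * len(y_true))

print(bmi_bucket(27.3), metrics, always_negative)
```

The always-negative baseline is why accuracy alone is a weak yardstick for a skewed target like `stroke`, and why recall and class balancing matter.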
- Python 3.8+; Jupyter Notebooks.
- Task 1: Core Python (file I/O, dicts/lists).
- Task 2: Pandas, NumPy, Matplotlib/Seaborn, Scikit-learn, Imbalanced-learn.
- Clone the Repo: `git clone https://github.com/yourusername/stroke-data-analytics-projects.git`.
- Task 1 Demo: `cd Task1 && jupyter notebook main.ipynb`.
- Task 2 Pipeline: `cd Task2 && jupyter notebook main2.ipynb`.
- Reports & Insights: Flip through `/task1/Report.pdf` and `/task2/PCP2_Akshen_Report.pdf` for designs, pseudocode, and reflections.
These projects sharpened my modular Python chops, from gritty low-level data wrangling to slick ML deployment, and sparked critical thinking about healthcare biases (e.g., urban-rural stroke gaps). Hurdles? Task 1's manual stats grind, conquered with smart loops; Task 2's class imbalances, tackled with class balancing (e.g., SMOTE). On the pro front, it mirrors real data scientist gigs, stressing clean code and ethical AI. Next time? Wire in real-time APIs for live alerts.
MIT License. Fork away, just give a shout-out!
Author: Akshen Dhami ([email protected]) & https://www.linkedin.com/in/akshen-dhami22.