
🚀 Stroke Data Analytics Projects: Python-powered insights into cardiovascular risks using 172K patient records, from procedural querying to ML predictions! 💉📊


Akshen22/Stroke-Data-Analytics


Stroke Data Analytics 🧠💻

📋 Overview

This repository showcases the implementations for two interconnected coursework tasks 🎓. We dive deep into stroke data analytics using a simulated dataset of 172,000 anonymized patient records 📈. The focus? Analyzing cardiovascular risk factors such as age 👴, hypertension 🩸, smoking 🚬, glucose levels 🍯, and lifestyle habits 🏃‍♂️ to help clinicians prevent fatalities ⚕️.

The dataset (data.csv in /shared/) packs 20 features such as age, hypertension, heart_disease, avg_glucose_level, bmi, smoking_status, stroke, and more 🌐. Both tasks champion ethical AI use 🤝, inclusivity in health tech ♿, and sustainability in data-driven healthcare 🌍.

📂 Task 1: Procedural Data Loading & Querying 🔍

  • Objectives: Build three modules using only core Python (no Pandas/NumPy, just pure file I/O! 🚫).
    • dataset_module.py: Loads the CSV into a nested dictionary 📚.
    • query_module.py: Crunches stats (mean, median, mode) for stroke queries, e.g., average age of smokers with hypertension 🧮 or dietary habits by stroke outcome 🍎, and persists the outputs to CSV 💾.
    • ui_module.py: Interactive text-based menu for queries that ties the other two modules together 🔗.
  • Main Entry: Fire up task1/main.ipynb in Jupyter for the demo 🎬.
  • Key Learning: Iteration loops 🔄, string wizardry ✂️, and custom data structures for domain-specific software 🏆.
  • Extensions: Streamlit.
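To make the Task 1 flow concrete, here is a minimal stdlib-only sketch, not the repo's actual code: it loads a CSV into a nested dictionary with plain file I/O, computes mean, median, and mode by hand for the example query (average age of smokers with hypertension), and persists the results to CSV. The function names (load_dataset, ages_for, save_stats), the "smokes" label, the "0"/"1" encoding of hypertension, and the sample rows are all assumptions for illustration; the naive comma split also assumes no quoted fields.

```python
import os
import tempfile


def load_dataset(path):
    """Load a CSV into a nested dict {row_id: {column: value}} using plain file I/O."""
    data = {}
    with open(path, "r", encoding="utf-8") as f:
        header = f.readline().strip().split(",")
        for row_id, line in enumerate(f):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Naive split: assumes no quoted fields containing commas (assumption)
            data[row_id] = dict(zip(header, line.split(",")))
    return data


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # average the two middle values


def mode(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # dicts keep insertion order and max() returns the first maximum,
    # so ties resolve to the value seen first
    return max(counts, key=counts.get)


def ages_for(data, smoking_status, hypertension):
    """Ages of patients matching a smoking status and a hypertension flag ("0"/"1")."""
    return [
        float(r["age"])
        for r in data.values()
        if r["smoking_status"] == smoking_status and r["hypertension"] == hypertension
    ]


def save_stats(path, query, stats):
    """Persist {statistic: value} results for one query as CSV rows."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("query,statistic,value\n")
        for name, value in stats.items():
            f.write(f"{query},{name},{value}\n")


# Demo on a tiny hypothetical sample standing in for data.csv
sample = (
    "age,hypertension,smoking_status,stroke\n"
    "67,1,smokes,1\n"
    "45,1,smokes,0\n"
    "45,0,never smoked,0\n"
    "58,1,smokes,0\n"
    "45,1,smokes,0\n"
)
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "data.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write(sample)

records = load_dataset(csv_path)
ages = ages_for(records, "smokes", "1")
stats = {"mean": mean(ages), "median": median(ages), "mode": mode(ages)}
out_path = os.path.join(workdir, "smokers_hypertension_age.csv")
save_stats(out_path, "smokers_with_hypertension_age", stats)
print(stats)  # {'mean': 53.75, 'median': 51.5, 'mode': 45.0}
```

Keeping each statistic in its own small function mirrors the module split above: the query layer only filters and aggregates, so a text menu can call these helpers without knowing how the file was parsed.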

📊 Task 2: OOP, EDA, & ML Predictions 🤖

  • Objectives: Refactor with OOP flair 🏗️; unleash EDA with libraries; craft predictive models.
    • load2.py: OOP-based loading and cleaning 🧹.
    • eda2.py: Tackles missing data 🔍, descriptive stats (mean, SD, skewness) 📉, visualizations (bar/pie/box/scatter plots via Matplotlib/Seaborn 🎨), class balancing (e.g., SMOTE for the skewed stroke label ⚖️), and the train-test split 🎯.
    • ui2.py: Upgraded UI for EDA/ML insights 🖥️.
    • ML Magic: Feature engineering (e.g., BMI buckets 📏); trains 3 classifiers (Naive Bayes, Random Forest, and XGBoost) per target (chronic_stress 😰, physical_activity 🏋️, income_level 💰, stroke 🧠) using Scikit-learn and the XGBoost library. Evaluates via confusion matrices 🗺️ and precision/recall/accuracy 🎯; visualizes model showdowns 📈.
  • Main Entry: Launch task2/main2.ipynb for the full pipeline 🚀.
  • Key Learning: OOP encapsulation/inheritance 🛡️, ML ethics ⚖️, and performance deep-dives 📚.
  • Extensions: Simple Tkinter GUI for predictions 🎨.
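A compact sketch of the Task 2 flow, under stated assumptions and not the repo's code: a hypothetical StrokeData class encapsulates cleaning (median-imputing missing bmi) and feature engineering (WHO-style BMI buckets via pd.cut), followed by a stratified train-test split and a comparison of Naive Bayes and Random Forest on accuracy, precision, recall, and confusion matrices. It runs on synthetic data rather than data.csv, and it swaps in `class_weight="balanced"` for class balancing instead of SMOTE; with imbalanced-learn installed, `SMOTE(random_state=42).fit_resample(X_train, y_train)` would oversample the training split only. XGBoost is left out so the sketch needs only pandas, NumPy, and scikit-learn; `xgboost.XGBClassifier` could be added to the `models` dict if installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


class StrokeData:
    """Hypothetical OOP loader/cleaner in the spirit of load2.py."""

    def __init__(self, df):
        self._df = df.copy()  # encapsulate a private copy of the raw frame

    def clean(self):
        # Median imputation for missing BMI: one common choice (assumption)
        self._df["bmi"] = self._df["bmi"].fillna(self._df["bmi"].median())
        return self

    def add_bmi_bucket(self):
        # Assumed WHO-style cut points: [0, 18.5), [18.5, 25), [25, 30), [30, inf)
        self._df["bmi_bucket"] = pd.cut(
            self._df["bmi"], bins=[0, 18.5, 25, 30, np.inf], labels=[0, 1, 2, 3], right=False
        ).astype(int)
        return self

    @property
    def frame(self):
        return self._df


# Synthetic stand-in for the real dataset (hypothetical values)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "hypertension": rng.integers(0, 2, n),
    "avg_glucose_level": rng.normal(100, 25, n),
    "bmi": rng.normal(28, 6, n).clip(12, 60),
})
df.loc[rng.choice(n, 50, replace=False), "bmi"] = np.nan  # inject missing values
# Skewed target: older, hypertensive patients are more likely to have a stroke
risk = 0.02 + 0.1 * df["hypertension"] + 0.002 * (df["age"] - 18)
df["stroke"] = (rng.random(n) < risk).astype(int)

data = StrokeData(df).clean().add_bmi_bucket().frame
features = ["age", "hypertension", "avg_glucose_level", "bmi_bucket"]
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["stroke"], test_size=0.2, stratify=data["stroke"], random_state=42
)

models = {
    "Naive Bayes": GaussianNB(),
    # class_weight="balanced" stands in for SMOTE here (swapped-in technique)
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "confusion": confusion_matrix(y_test, pred, labels=[0, 1]),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items() if k != "confusion"})
```

Stratifying the split keeps the rare stroke class at the same proportion in both halves, which matters for skewed targets like this one; recall on the stroke class is usually the metric to watch, since accuracy alone can look high while missing most positives.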

๐Ÿ› ๏ธ Technologies Stack

  • Python 3.8+ 🐍; Jupyter Notebooks 📓.
  • Task 1: Core Python (file I/O 📁, dicts/lists 🗂️).
  • Task 2: Pandas 🐼, NumPy 🔢, Matplotlib/Seaborn 📊, Scikit-learn 🤖, XGBoost, Imbalanced-learn ⚖️.

🚀 How to Run (Step-by-Step) 🕹️

  1. Clone the Repo: git clone https://github.com/Akshen22/Stroke-Data-Analytics.git && cd Stroke-Data-Analytics 📥.
  2. Task 1 Demo (from the repo root): cd Task1 && jupyter notebook main.ipynb 🔄.
  3. Task 2 Pipeline (from the repo root): cd Task2 && jupyter notebook main2.ipynb 🎯.
  4. Reports & Insights: Flip through /task1/Report.pdf 📄 and /task2/PCP2_Akshen_Report.pdf for designs, pseudocode, and reflections 💭.

💭 Reflections & Takeaways 🌟

These projects sharpened my modular Python chops 🛠️, from gritty low-level data wrangling to slick ML deployment 🚀, and prompted critical thinking about healthcare biases (e.g., urban-rural stroke gaps 🏙️🌾). Hurdles? Task 1's manual stats grind, conquered with smart loops 🔄, and Task 2's class imbalances, which pushed me toward more robust models 💪. On the pro front, it mirrors real data scientist gigs, stressing clean code 🧹 and ethical AI 🤝. Next time? Wire in real-time APIs for live alerts 📡.

📜 License

MIT License: fork away, just give a shout-out! 🎉

Author: Akshen Dhami ([email protected]), LinkedIn: https://www.linkedin.com/in/akshen-dhami22.
