End-to-end data analytics and machine learning pipeline analyzing global COVID-19 trends, mortality risk, and patient outcomes
This project performs a comprehensive data analytics and machine learning analysis on global COVID-19 data to understand the spread, severity, and outcomes of the pandemic across countries and regions.
Using Python, the project integrates:
- Exploratory Data Analysis (EDA)
- ETL (Extract, Transform, Load) pipelines
- Statistical analysis
- Supervised machine learning models
- Geographic and time-series visualizations
The analysis is designed to reflect real-world public-health and data-science workflows, emphasizing interpretability, scalability, and analytical rigor.
During global health crises, policymakers and healthcare systems rely on data to:
- Monitor disease spread
- Predict mortality risk
- Allocate healthcare resources
- Assess recovery trends
- Compare regional impacts
This project demonstrates how data analytics and machine learning can support evidence-based decision-making during large-scale public health emergencies.
Project objectives:
- Analyze global COVID-19 trends over time
- Identify relationships between confirmed cases, deaths, recoveries, and active cases
- Engineer meaningful health metrics (fatality & recovery rates)
- Predict COVID-19 mortality using machine learning
- Classify outbreak outcomes (recovery-dominant vs death-dominant)
- Build a clean, reusable ETL pipeline
- Apply regression and classification models
- Evaluate model performance using multiple metrics
- Visualize global and temporal patterns effectively
Dataset overview:
- ~49,000 records across multiple countries
- Daily global COVID-19 reporting (2020)
- Publicly available open data sources
Key fields:
- Province / State
- Country / Region
- WHO Region
- Latitude & Longitude
- Date
- Confirmed Cases
- Deaths
- Recovered
- Active Cases
Derived metrics:
- Case Fatality Rate
- Recovery Rate
ETL steps:
- Loaded raw CSV data into Pandas DataFrames
- Converted date fields for time-series analysis
- Handled missing geographic values
- Removed duplicate records
- Filtered invalid zero-case records
- Engineered health indicators:
  - Case Fatality Rate
  - Recovery Rate
- Saved transformed datasets for reproducible downstream analysis
This ETL workflow ensures data quality, consistency, and reusability.
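The transform stage above can be sketched as follows. The inline DataFrame is an illustrative stand-in for the raw CSV extract, and the `Active = Confirmed - Deaths - Recovered` derivation is an assumption about how the active-case column is built:

```python
import pandas as pd

# Hypothetical raw extract; the real pipeline reads the project's CSV files.
raw = pd.DataFrame({
    "Country/Region": ["A", "A", "B", "B", "B"],
    "Date": ["2020-03-01", "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-03"],
    "Confirmed": [100, 100, 0, 50, 80],
    "Deaths": [5, 5, 0, 2, 4],
    "Recovered": [20, 20, 0, 10, 30],
})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])   # enable time-series operations
    df = df.drop_duplicates()                 # remove duplicate records
    df = df[df["Confirmed"] > 0]              # filter invalid zero-case records
    # Derived health indicators (active-case formula is an assumption)
    df["Active"] = df["Confirmed"] - df["Deaths"] - df["Recovered"]
    df["Case Fatality Rate"] = df["Deaths"] / df["Confirmed"] * 100
    df["Recovery Rate"] = df["Recovered"] / df["Confirmed"] * 100
    return df.reset_index(drop=True)

clean = transform(raw)
print(len(clean))  # duplicate and zero-case rows have been dropped
```

In the real workflow the result would then be written out (e.g. with `to_csv`) for reproducible downstream analysis.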
Key analytical steps:
- Statistical summaries to detect outliers and skewness
- Global geospatial visualizations using latitude/longitude
- WHO-region-based comparisons
- Time-series trend analysis for confirmed, active, recovered, and death cases
- Correlation analysis between confirmed cases and mortality
Visualizations include:
- Global scatter maps
- Time-series line plots
- Regional bar charts
- Regression plots for severity analysis
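The correlation step above can be sketched with toy daily totals (the numbers are illustrative, not the project's real data):

```python
import pandas as pd

# Toy daily global totals standing in for the aggregated dataset (assumption).
daily = pd.DataFrame({
    "Date": pd.date_range("2020-03-01", periods=5),
    "Confirmed": [100, 250, 600, 1200, 2000],
    "Deaths": [2, 6, 15, 33, 55],
    "Recovered": [10, 40, 120, 300, 600],
})
daily["Active"] = daily["Confirmed"] - daily["Deaths"] - daily["Recovered"]

# Pearson correlation between confirmed cases and mortality
corr = daily["Confirmed"].corr(daily["Deaths"])
print(round(corr, 3))
```

The same DataFrame feeds the time-series line plots directly (`ax.plot(daily["Date"], daily["Confirmed"])` with Matplotlib), and the latitude/longitude columns of the full dataset drive the scatter maps.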
Regression target: predict the number of deaths
Models implemented:
- Random Forest Regressor
- Support Vector Machine (SVM)
Evaluation:
- Mean Squared Error (MSE)
- R² Score
- Feature importance analysis
✔ Random Forest demonstrated strong predictive performance and interpretability
✔ Feature importance revealed active and confirmed cases as key mortality drivers
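A minimal sketch of the regression setup, using synthetic features in place of the real dataset (the feature construction and coefficients below are assumptions made purely for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (Confirmed, Active, Recovered) features.
rng = np.random.default_rng(0)
n = 500
confirmed = rng.integers(100, 100_000, n).astype(float)
active = confirmed * rng.uniform(0.2, 0.6, n)
recovered = confirmed - active
# Mortality loosely driven by confirmed and active cases (assumption).
deaths = 0.02 * confirmed + 0.01 * active + rng.normal(0, 50, n)

X = np.column_stack([confirmed, active, recovered])
X_train, X_test, y_train, y_test = train_test_split(X, deaths, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, pred))
print("R2:", r2_score(y_test, pred))
print("Feature importances:", model.feature_importances_)
```

Swapping in `sklearn.svm.SVR` for the SVM comparison requires only changing the estimator line; the metric calls stay the same.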
Binary classification of outbreak outcomes:
- Recovery-Dominant (1)
- Death-Dominant (0)
Model:
- Random Forest Classifier
Metrics:
- Precision
- Recall
- F1-Score
- Confusion Matrix
✔ Achieved high overall accuracy
✔ Strong performance in identifying recovery-dominant scenarios
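The classification setup can be sketched the same way; the two-feature synthetic data and the recovery-vs-death labeling rule here are assumptions for illustration, not the project's actual feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic recovered/death counts (illustrative only).
rng = np.random.default_rng(1)
n = 400
recovered = rng.uniform(0, 1000, n)
deaths = rng.uniform(0, 1000, n)
X = np.column_stack([recovered, deaths])
y = (recovered > deaths).astype(int)  # 1 = recovery-dominant, 0 = death-dominant

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(confusion_matrix(y_test, clf.predict(X_test)))
print(classification_report(y_test, clf.predict(X_test)))
```

`classification_report` bundles the precision, recall, and F1 metrics listed above into one per-class summary.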
Key findings:
- Active case volume is a strong predictor of mortality
- Confirmed cases strongly correlate with deaths
- Recovery trends vary significantly by WHO region
- Pandemic severity peaks mid-timeline rather than rising uniformly
- Tree-based models outperform kernel-based methods for this dataset
Tech stack:
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Jupyter Notebook
- Basemap (geospatial visualization)