This project guides students through using Bayes' Rule, conditional probability, and Bayesian networks to reason about disease diagnosis from real-world-style medical data.
You'll explore how variables like symptoms, test results, and vaccination status affect medical diagnoses. The project includes:
- A synthetic medical dataset (`.csv`)
- A Jupyter notebook for guided analysis
- Exercises on joint/conditional probability and Bayes nets
- Visualizations of probabilistic dependencies
To install the required dependencies (including pgmpy, pygraphviz, and networkx), run:

    conda env create -f environment.yml
    conda activate cs3600-medical-analysis-env

- Estimating probabilities from real-world-style data
- Computing joint and conditional probability tables
- Applying Bayes’ Rule for probabilistic inference
- Building and interpreting Bayesian networks (Bayes nets)
- Constructing and using conditional probability distributions (CPDs) in table form
- Identifying (in)dependencies between variables
- Making inferences with partial or uncertain information
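
As a minimal worked sketch of the Bayes' Rule objective above, consider a single binary test. The numbers used here (1% prevalence, 90% sensitivity, 5% false-positive rate) are hypothetical and are not taken from the project dataset, and `posterior_given_positive` is an illustrative helper name, not part of the notebook.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' Rule.

    prior:               P(disease)
    sensitivity:         P(positive | disease)
    false_positive_rate: P(positive | no disease)
    """
    # P(positive) via the law of total probability.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' Rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
    return sensitivity * prior / p_positive


# Hypothetical inputs, chosen only for illustration.
p = posterior_given_positive(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")  # prints 0.154
```

With these assumed numbers the posterior is only about 15%, because the disease is rare and most positive results come from the much larger healthy group.
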
- Explore the dataset. Understand the variables: diagnosis, symptoms, test results, vaccination status.
- Compute joint probabilities. Use pandas to create tables and uncover relationships between variables.
- Use Bayes’ Rule to update beliefs. Calculate the probability of a diagnosis given symptoms or test outcomes.
- Build a Bayesian Network from the data. Identify parent/child relationships and draw the graph structure.
- Estimate CPDs. Use grouped statistics to fill in tables like $P(\text{Symptom} \mid \text{Diagnosis})$.
- Perform inference using the Bayes net. Reason about unseen variables using observed evidence.
- Reflect and extend. Propose improvements, add variables, or try real-world scenarios.
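
The steps above can be sketched end to end on a tiny hand-made table. Everything in this sketch is an assumption for illustration: the column names (`diagnosis`, `fever`, `test`), the eight rows, and the network structure (diagnosis as the only parent of the other two variables) are hypothetical and do not come from the project dataset. Inference is done by direct enumeration in pandas rather than with pgmpy, to keep the example short.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns and values may differ.
df = pd.DataFrame({
    "diagnosis": ["flu", "flu", "flu", "flu",
                  "healthy", "healthy", "healthy", "healthy"],
    "fever":     ["yes", "yes", "yes", "no",
                  "no", "no", "yes", "no"],
    "test":      ["positive", "positive", "negative", "positive",
                  "negative", "negative", "negative", "positive"],
})

# Joint probability table P(diagnosis, fever): counts divided by the grand total.
joint = pd.crosstab(df["diagnosis"], df["fever"], normalize="all")

# Prior P(diagnosis) and CPDs P(child | diagnosis); each CPD row sums to 1.
prior = df["diagnosis"].value_counts(normalize=True)
cpd_fever = pd.crosstab(df["diagnosis"], df["fever"], normalize="index")
cpd_test = pd.crosstab(df["diagnosis"], df["test"], normalize="index")


def posterior(fever, test):
    """Return P(diagnosis | fever, test) as a Series indexed by diagnosis.

    Assumes the structure fever <- diagnosis -> test, so the two observed
    variables are conditionally independent given the diagnosis.
    """
    # Unnormalized posterior: P(diagnosis) * P(fever | diagnosis) * P(test | diagnosis)
    unnormalized = prior * cpd_fever[fever] * cpd_test[test]
    return unnormalized / unnormalized.sum()


print(joint)
print(posterior(fever="yes", test="positive"))
```

With these made-up counts, observing both a fever and a positive test raises the probability of flu from the prior of 0.5 to 0.9.
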
To run the notebook and analysis scripts, you need:
- Python ≥ 3.10
- Conda (Anaconda or Miniconda recommended)
- The following Python libraries (installed via `environment.yml`):
  - pandas
  - matplotlib
  - networkx
  - pgmpy
  - graphviz
  - pygraphviz
  - notebook
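
One way to confirm that the activated environment provides these libraries is a short check like the sketch below. It assumes each import name matches the package name listed above; `missing_modules` and `REQUIRED_MODULES` are illustrative names, not part of the project.

```python
from importlib.util import find_spec

# Import names assumed to match the package names listed in this README.
# "graphviz" is left out because, in a conda environment, it may refer to the
# Graphviz system binaries rather than an importable Python module.
REQUIRED_MODULES = ["pandas", "matplotlib", "networkx", "pgmpy", "pygraphviz", "notebook"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All required modules found.")
```
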