Assessment repository for CMP7005 — Programming for Data Analysis, part of the MSc Data Science programme (Semester 1, 2024–25).
This module builds core programming skills for data science: Python for data manipulation, exploratory data analysis, statistical computing, and interactive data applications.
- Python — pandas, NumPy, SciPy, Matplotlib, Seaborn
- Streamlit — interactive web applications for data presentation
- Jupyter Notebooks — exploratory analysis and documented workflows
- Git & GitHub — version control and collaborative development
End-to-end data analysis project applying Python programming skills to a real-world dataset, culminating in an interactive Streamlit dashboard.
The full PRAC1 project is hosted in a dedicated repository: air-quality-analysis — Beijing Air Quality Analysis using multi-station PRSA data with a Streamlit dashboard.
Presentation of the data analysis project, communicating findings and methodology to a non-technical audience.
| File | Description |
|---|---|
| st20310831_CMP7005_PRES1.pptx | Presentation slides for PRES1 |
Programming for Data Analysis covers the core toolkit used across the data science workflow. Key topics include:
- Python fundamentals and data structures for analytics
- Data cleaning, transformation, and feature engineering with pandas
- Exploratory data analysis (EDA) workflows
- Statistical analysis and hypothesis testing
- Building and deploying interactive dashboards with Streamlit
- Best practices in reproducible research and version control
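As a rough illustration of the cleaning, feature engineering, and EDA topics above, here is a minimal pandas sketch. It is not taken from the assessed project: the station labels, the column names (`station`, `timestamp`, `pm25`), the readings, and the helper `clean_and_summarise` are all hypothetical and exist only to show the shape of a typical workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings for two made-up stations.
# The values are illustrative only and do not come from any real dataset.
raw = pd.DataFrame(
    {
        "station": ["A", "A", "A", "B", "B", "B"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 00:00",
                "2024-01-01 01:00",
                "2024-01-01 02:00",
                "2024-01-01 00:00",
                "2024-01-01 01:00",
                "2024-01-01 02:00",
            ]
        ),
        "pm25": [35.0, np.nan, 41.0, 80.0, 76.0, np.nan],
    }
)


def clean_and_summarise(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Fill gaps per station, derive an hour feature, and summarise by station."""
    out = df.sort_values(["station", "timestamp"]).copy()

    # Cleaning: fill missing readings within each station
    # (forward fill first, then backward fill for any leading gaps).
    out["pm25"] = out.groupby("station")["pm25"].transform(
        lambda s: s.ffill().bfill()
    )

    # Feature engineering: extract the hour of day from the timestamp.
    out["hour"] = out["timestamp"].dt.hour

    # EDA: per-station mean and maximum of the cleaned readings.
    summary = out.groupby("station")["pm25"].agg(["mean", "max"])
    return out, summary


cleaned, summary = clean_and_summarise(raw)
print(summary)
```

Forward and backward filling is only one of several ways to handle missing values; interpolation or dropping rows may suit a given dataset better, and the choice should be justified in the analysis.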
Student ID: st20310831 | MSc Data Science | 2024–25