Beginner-friendly project for exploring the CORD-19 metadata.csv dataset and creating an interactive Streamlit app.
Framework_Assignment/
├── metadata.csv       # dataset (downloaded separately)
├── notebook.ipynb     # Jupyter notebook with step-by-step analysis
├── app.py             # Streamlit app
├── requirements.txt   # dependencies
└── README.md          # this file
- Clone the repository or download it.

      git clone <your-repo-url>
      cd Framework_Assignment

- (Optional) Create a virtual environment:

      python -m venv venv
      source venv/bin/activate # macOS/Linux
      venv\Scripts\activate # Windows

- Install required packages:

      pip install -r requirements.txt

- Open Jupyter Notebook:

      jupyter notebook

- Run through notebook.ipynb step by step to:
  - Load and clean the metadata
  - Explore basic statistics
  - Create visualizations
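
The notebook steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes pandas is installed, that the CSV has columns named title, abstract, publish_time and journal, and it uses a tiny inline sample (SAMPLE_CSV, a made-up stand-in) so it runs without the real file. The helper name load_and_clean is hypothetical.

```python
import io

import pandas as pd

# Made-up stand-in for metadata.csv so the sketch runs on its own.
# Column names are assumptions about the dataset layout.
SAMPLE_CSV = """title,abstract,publish_time,journal
Coronavirus transmission in households,Some abstract,2020-03-15,The Lancet
,Row with a missing title,2020-04-01,BMJ
Vaccine response study,,2021-01-20,The Lancet
Early SARS report,Older abstract,2003,Virology
"""


def load_and_clean(source, nrows=None):
    """Load the CSV, drop rows without a title, and derive an integer year."""
    df = pd.read_csv(source, nrows=nrows)
    df = df.dropna(subset=["title"]).copy()
    # publish_time may be a full date or a bare year, so take the first
    # four characters and let anything non-numeric become NaN.
    df["year"] = pd.to_numeric(
        df["publish_time"].astype(str).str[:4], errors="coerce"
    )
    df = df.dropna(subset=["year"])
    df["year"] = df["year"].astype(int)
    df["abstract"] = df["abstract"].fillna("")
    return df


df = load_and_clean(io.StringIO(SAMPLE_CSV))

# Basic statistics: shape, publications per year, most common journals.
print(df.shape)
print(df["year"].value_counts().sort_index())
print(df["journal"].value_counts().head(10))
```

With the real dataset, pass "metadata.csv" instead of the StringIO object. In a notebook, appending .plot(kind="bar") to either value_counts() result would draw the corresponding chart, provided matplotlib is installed.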
- Make sure you have metadata.csv in the same folder as app.py.
- Run the app:

      streamlit run app.py

- The app will open in your browser. Use the sidebar to set the year range and the number of rows to load.
- A Jupyter Notebook showing:
  - Basic exploration of the dataset
  - Data cleaning steps
  - Visualizations (publications by year, top journals, word frequencies)
- A working Streamlit app to interactively explore results
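
Of the visualizations listed, word frequencies are the least obvious to compute, so here is one possible approach using only the standard library. The stopword set, the sample titles, and the function name top_words are illustrative assumptions; the notebook may count words differently.

```python
import re
from collections import Counter

# Small illustrative stopword set; a real analysis would likely use a longer list.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "on", "with"}


def top_words(titles, n=10):
    """Return the n most frequent non-stopword words across the titles."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep alphabetic runs only, so "COVID-19" yields "covid".
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Made-up titles standing in for the dataset's title column.
titles = [
    "Coronavirus transmission in households",
    "A study of coronavirus vaccine response",
    "Vaccine response in the elderly",
]
print(top_words(titles, 3))
```

The resulting (word, count) pairs can be plotted as a bar chart or passed to a word cloud as a frequency dictionary.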
- If the dataset is too large, load a sample with nrows=10000.
- Word cloud is optional and requires the wordcloud package.
- Push this repo to GitHub as Framework_Assignment and submit the repo URL for your assignment.
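
The first two notes can be illustrated with a short sketch. It assumes pandas is installed; read_sample and make_wordcloud are hypothetical helper names, and the inline CSV is made-up data used only so the example runs without the real file.

```python
import importlib.util
import io

import pandas as pd


def read_sample(source, nrows=10000):
    """Read at most nrows rows instead of the whole file."""
    return pd.read_csv(source, nrows=nrows)


def make_wordcloud(text):
    """Build a word cloud if the optional package is installed, else return None."""
    if importlib.util.find_spec("wordcloud") is None:
        return None
    from wordcloud import WordCloud  # optional dependency

    return WordCloud(width=800, height=400, background_color="white").generate(text)


# Made-up three-row CSV; with the real data, pass "metadata.csv" instead.
csv_text = "title,journal\nPaper A,BMJ\nPaper B,Virology\nPaper C,BMJ\n"
sample = read_sample(io.StringIO(csv_text), nrows=2)
print(len(sample))
```

Because make_wordcloud returns None when the package is missing, the rest of the analysis can proceed without it.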