This repository contains my personal projects exploring data analysis using Python, covering:
- Data Loading & Cleaning
- Exploratory Data Analysis (EDA)
- Understanding dataset structure and distributions
- Identifying trends, patterns, and anomalies
- Comparing metrics across categories and time
Python can be downloaded and installed from the official website (python.org). It is the main programming language used to run all analysis in this repository.
What it does:
- Executes Python scripts
- Runs data analysis code
- Allows usage of data libraries such as pandas and NumPy
The code editor is where you write and run the scripts. There are many editors available, such as PyCharm and Jupyter Notebook, but I personally prefer VS Code because it requires less RAM than full-fledged IDEs like the JetBrains products.
This project mainly uses NumPy and pandas. NumPy is a numerical computing library that powers pandas internally. Installing pandas also pulls in NumPy as a dependency, so you can install both by running this in the terminal:
python -m pip install pandas
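To confirm the installation worked, a quick check along these lines can be run from the editor. It is only a minimal sketch: it prints whichever versions happen to be installed and shows that a pandas Series hands back a NumPy array. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Print the installed versions (the exact numbers depend on your environment).
print("pandas:", pd.__version__)
print("NumPy:", np.__version__)

# A pandas Series stores its values in a NumPy array under the hood.
sample = pd.Series([1, 2, 3])
values = sample.to_numpy()
print(type(values))
```

If both imports succeed without an `ImportError`, the libraries are ready to use.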
Description
This project analyzes the NYC Citywide Payroll dataset to explore payroll distribution across agencies, job titles, and fiscal years. The dataset contains over 2 million records, making it suitable for practicing large-scale data cleaning and analysis using Python.
Key Analysis Performed
- Cleaning salary fields stored as text with currency symbols
- Creating a Total Pay metric from multiple payroll components
- Identifying high-paying agencies and job titles
- Analyzing overtime dependency across agencies
- Exploring payroll trends over time
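The steps above can be sketched roughly as follows. This is an illustration, not the project's actual code: the column names (`agency`, `fiscal_year`, `regular_gross_paid`, `total_ot_paid`, `total_other_pay`) and the three sample rows are made-up assumptions, and the real dataset's headers and values may differ.

```python
import pandas as pd

# Tiny made-up sample; column names and values are illustrative assumptions,
# not rows from the real dataset.
df = pd.DataFrame({
    "agency": ["A", "A", "B"],
    "fiscal_year": [2020, 2021, 2021],
    "regular_gross_paid": ["$50,000.00", "$52,000.00", "$40,000.00"],
    "total_ot_paid": ["$5,000.00", "$0.00", "$10,000.00"],
    "total_other_pay": ["$1,000.00", "$500.00", "$0.00"],
})

pay_cols = ["regular_gross_paid", "total_ot_paid", "total_other_pay"]


def to_number(series):
    """Strip '$' and ',' from text amounts and convert them to floats."""
    cleaned = series.str.replace(r"[$,]", "", regex=True)
    # Values that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


# Clean salary fields stored as text with currency symbols.
for col in pay_cols:
    df[col] = to_number(df[col])

# Create a Total Pay metric from the individual payroll components.
df["total_pay"] = df[pay_cols].sum(axis=1)

# Overtime dependency: share of each agency's total pay that is overtime.
by_agency = df.groupby("agency")[["total_ot_paid", "total_pay"]].sum()
by_agency["ot_share"] = by_agency["total_ot_paid"] / by_agency["total_pay"]

# Payroll trend over time: total pay per fiscal year.
by_year = df.groupby("fiscal_year")["total_pay"].sum()

print(by_agency)
print(by_year)
```

On the full dataset the same pattern applies after loading the CSV with `pd.read_csv`. Note that this sketch does not handle other text formats, such as negative amounts written in parentheses.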
Project Link: View Detailed Project Here
Dataset Source: NYC Citywide Payroll Dataset (Kaggle)