
This demonstrates how Python can be used alongside SQL and Excel to perform exploratory data analysis, handle large datasets, and automate analytical workflows using pandas.


giomusyaffa/Python


Python Portfolio 🐍

Welcome to my Python data analysis portfolio 👋

This repository contains my personal projects exploring data analysis with Python.

Objectives 🎯

  • Data Loading & Cleaning
  • Exploratory Data Analysis (EDA)
  • Understanding dataset structure and distributions
  • Identifying trends, patterns, and anomalies
  • Comparing metrics across categories and time

Setup 🛠️

Step 1: Install Python

You can download and install Python from the official website. It is the main programming language used to run all of the analysis in this repository.

What it does:

  • Executes Python scripts
  • Runs data analysis code
  • Allows usage of data libraries such as pandas and NumPy

Step 2: Install Visual Studio Code (VS Code)

This is where you write and run the scripts. There are many code editors out there, such as PyCharm and Jupyter Notebook, but I personally prefer VS Code because it requires less RAM than full-fledged IDEs like the JetBrains products.

Step 3: Install Required Libraries

This project mainly uses NumPy and pandas. NumPy is a numerical computing library that powers pandas internally. Install them by running the following command in the terminal (installing pandas also installs NumPy as a dependency):

python -m pip install pandas
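To confirm the installation worked, a small check like the one below can be run from VS Code or the terminal. This is only a sketch: the helper name `installed_versions` is made up for illustration and is not part of this repository.

```python
import importlib


def installed_versions(packages=("pandas", "numpy")):
    """Return {package: version string, or None if it cannot be imported}."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Fall back to "unknown" for modules without a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If either library shows up as "not installed", re-run the install command with the same Python interpreter that VS Code is using.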

Projects 📂

1. NYC Citywide Payroll Data 💸

Description

This project analyzes the NYC Citywide Payroll dataset to explore payroll distribution across agencies, job titles, and fiscal years. The dataset contains over 2 million records, making it suitable for practicing large-scale data cleaning and analysis using Python.
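With a file of this size, one way to keep memory use down is to read the CSV in chunks and aggregate as you go. The sketch below is an illustration, not the code used in the project: the function name `rows_per_year` and the column name `Fiscal Year` are assumptions, so adjust them to the actual headers of the downloaded file.

```python
import pandas as pd


def rows_per_year(csv_source, year_col="Fiscal Year", chunksize=100_000):
    """Count records per fiscal year without loading the whole file at once.

    csv_source can be a file path or any file-like object accepted by
    pandas.read_csv. The column name is an assumption about the dataset.
    """
    partial_counts = [
        chunk[year_col].value_counts()
        for chunk in pd.read_csv(csv_source, usecols=[year_col], chunksize=chunksize)
    ]
    # Combine the per-chunk counts into one total per year.
    return pd.concat(partial_counts).groupby(level=0).sum().sort_index()
```

Reading only the columns you need (`usecols`) and processing in chunks are both standard `pandas.read_csv` options. For a file that fits comfortably in memory, a plain `pd.read_csv(path)` is simpler.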

Key Analysis Performed

  • Cleaning salary fields stored as text with currency symbols
  • Creating a Total Pay metric from multiple payroll components
  • Identifying high-paying agencies and job titles
  • Analyzing overtime dependency across agencies
  • Exploring payroll trends over time
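The first few steps above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`Agency Name`, `Regular Gross Paid`, `Total OT Paid`, `Total Other Pay`) are assumed, the helper names are hypothetical, and the sample rows are made up purely to show the shape of the data.

```python
import pandas as pd

# Assumed column names; check them against the real dataset's headers.
PAY_COLUMNS = ["Regular Gross Paid", "Total OT Paid", "Total Other Pay"]


def clean_currency(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' from text values and convert them to floats."""
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True).str.strip()
    # Values that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


def add_total_pay(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric pay columns and a 'Total Pay' column."""
    out = df.copy()
    for col in PAY_COLUMNS:
        out[col] = clean_currency(out[col])
    out["Total Pay"] = out[PAY_COLUMNS].sum(axis=1)
    return out


def overtime_share_by_agency(df: pd.DataFrame) -> pd.Series:
    """Overtime as a fraction of total pay per agency, highest first."""
    totals = df.groupby("Agency Name")[["Total OT Paid", "Total Pay"]].sum()
    return (totals["Total OT Paid"] / totals["Total Pay"]).sort_values(ascending=False)


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "Agency Name": ["A", "A", "B"],
    "Regular Gross Paid": ["$1,000.00", "$2,000.00", "$500.00"],
    "Total OT Paid": ["$100.00", "$0.00", "$500.00"],
    "Total Other Pay": ["$0.00", "$50.00", "$0.00"],
})
paid = add_total_pay(sample)
share = overtime_share_by_agency(paid)
```

Here `share` ranks agencies by how much of their payroll comes from overtime, which is one simple way to express "overtime dependency". Other definitions, such as overtime per employee, would work too.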

Project Link: 🔗 View Detailed Project Here

Dataset Source: 🔗 NYC Citywide Payroll Dataset (Kaggle)
