ynakoo/DataMining

Electricity Consumption Data Analysis

Project Overview

This project analyzes electricity consumption data to inform resource allocation and supply planning. The analysis covers demand patterns, regional variation, demand forecasting, and areas at risk of under-supply.

Files in this Project

  1. electricity_analysis.ipynb - Main Jupyter notebook with complete analysis
  2. Yearly_Demand_Profile_converted (1).csv - Hourly electricity demand data for 2025
  3. cleaned_electricity_data.csv - State-wise electricity data (2015-2025)
  4. electricity_clean.csv - Monthly aggregated data (2024-2025)

Exploratory Questions Addressed

  1. How does electricity usage vary by region, time of day, and season?

    • Hourly demand patterns
    • Day of week analysis
    • Seasonal variations
    • State-wise regional comparison
  2. Are there recurring demand spikes that could stress the grid?

    • Spike detection using percentile analysis
    • Temporal distribution of spikes
    • Peak hour identification
  3. How do different regions differ in consumption?

    • State categorization by consumption levels
    • Supply efficiency analysis
    • Regional consumption patterns
  4. Can you forecast electricity demand for the next 5–10 years?

    • CAGR-based forecasting
    • State-level projections
    • National demand forecast
  5. Which areas are at risk of under-supply?

    • Shortage trend analysis
    • High-risk state identification
    • Future supply gap projections
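
As an illustration of the hourly-pattern question in item 1, the sketch below averages demand by hour of day. It is a minimal example rather than the notebook's code: the function name `mean_demand_by_hour`, the `(timestamp, demand)` record layout, and the sample values are assumptions, since the column names of the CSV files are not documented here.

```python
from collections import defaultdict
from datetime import datetime


def mean_demand_by_hour(records):
    """Average demand per hour of day from (timestamp, demand) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, demand in records:
        totals[timestamp.hour] += demand
        counts[timestamp.hour] += 1
    # Return hours in ascending order so the profile is easy to plot.
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Made-up readings for illustration only (not taken from the datasets).
sample = [
    (datetime(2025, 1, 1, 0), 100.0),
    (datetime(2025, 1, 2, 0), 120.0),
    (datetime(2025, 1, 1, 18), 200.0),
]
profile = mean_demand_by_hour(sample)
```

The same grouping idea extends to day of week (`timestamp.weekday()`) or month (`timestamp.month`) for the weekly and seasonal views.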

Running the Notebook

Local Environment

  1. Ensure you have Python 3.x installed
  2. Install required packages:
    pip install numpy pandas matplotlib seaborn jupyter
  3. Open the notebook:
    jupyter notebook electricity_analysis.ipynb
  4. Run all cells sequentially

Kaggle Notebook

To run this analysis on Kaggle:

  1. Create a new Kaggle notebook

    • Go to Kaggle
    • Click "Code" → "New Notebook"
  2. Upload the datasets

    • Click "Add Data" → "Upload"
    • Upload all three CSV files:
      • Yearly_Demand_Profile_converted (1).csv
      • cleaned_electricity_data.csv
      • electricity_clean.csv
  3. Update file paths in the notebook

    • In the "Data Loading" section, change the file paths from:
      hourly_data = pd.read_csv('Yearly_Demand_Profile_converted (1).csv')
      state_data = pd.read_csv('cleaned_electricity_data.csv')
      monthly_data = pd.read_csv('electricity_clean.csv')
    • To:
      hourly_data = pd.read_csv('/kaggle/input/your-dataset-name/Yearly_Demand_Profile_converted (1).csv')
      state_data = pd.read_csv('/kaggle/input/your-dataset-name/cleaned_electricity_data.csv')
      monthly_data = pd.read_csv('/kaggle/input/your-dataset-name/electricity_clean.csv')
    • Replace your-dataset-name with the dataset name Kaggle assigns when you upload the files
  4. Copy the notebook code

    • Copy all cells from electricity_analysis.ipynb
    • Paste into your Kaggle notebook
  5. Run the analysis

    • Click "Run All" or execute cells sequentially
    • All required libraries (numpy, pandas, matplotlib, seaborn) are pre-installed on Kaggle
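
One way to avoid editing paths by hand is a small helper that picks the Kaggle input directory when it exists and falls back to the local file otherwise. This is a sketch under assumptions: `data_path` and `KAGGLE_DATASET` are hypothetical names that do not appear in the notebook, and `your-dataset-name` remains a placeholder for the name Kaggle assigns.

```python
from pathlib import Path

# Placeholder: replace with the dataset name Kaggle assigns on upload.
KAGGLE_DATASET = "your-dataset-name"


def data_path(filename, kaggle_root="/kaggle/input", dataset=KAGGLE_DATASET):
    """Return the Kaggle input path if that directory exists, else a local path."""
    kaggle_dir = Path(kaggle_root) / dataset
    if kaggle_dir.is_dir():
        return kaggle_dir / filename
    return Path(filename)


# Resolves to the local file unless /kaggle/input/<dataset> is present.
hourly_csv = data_path("Yearly_Demand_Profile_converted (1).csv")
```

The returned path can be passed directly to `pd.read_csv`, so the same "Data Loading" cell would work in both environments.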

Key Features

Analysis Components

  • Temporal Analysis: Hourly, daily, and seasonal demand patterns
  • Regional Analysis: State-wise consumption and efficiency metrics
  • Spike Detection: Identification of grid stress periods
  • Forecasting: 5-year and 10-year demand projections using CAGR
  • Risk Assessment: Under-supply risk identification and quantification
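
The CAGR-based projection mentioned above can be sketched as follows. This is an illustration of the general technique, not the notebook's implementation: the function names `cagr` and `project_demand` and the sample figures are hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two observations `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project_demand(last_value, rate, horizon_years):
    """Extrapolate demand `horizon_years` ahead at a constant annual rate."""
    return last_value * (1 + rate) ** horizon_years


# Made-up figures for illustration only (not taken from the datasets).
rate = cagr(start_value=1000.0, end_value=1500.0, years=10)
forecast_5y = project_demand(1500.0, rate, 5)
forecast_10y = project_demand(1500.0, rate, 10)
```

A CAGR projection assumes growth continues at a constant rate, so it is best read as a baseline scenario rather than a prediction with error bounds.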

Visualizations

  • Time series plots
  • Bar charts and horizontal bar charts
  • Pie charts for distribution analysis
  • Multi-panel comparison plots
  • Trend lines with confidence intervals

Statistical Methods

  • Descriptive statistics
  • Percentile-based spike detection
  • Compound Annual Growth Rate (CAGR) calculation
  • Supply efficiency metrics
  • Shortage volatility analysis
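
Percentile-based spike detection can be sketched like this: compute a high percentile of the demand series and flag every reading above it. The 95th-percentile cutoff, the function names, and the sample readings are assumptions made for illustration; the threshold the notebook uses is not stated here. The `percentile` helper uses linear interpolation between the closest ranks.

```python
import math


def percentile(values, q):
    """Percentile (q in 0..100) using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def detect_spikes(demand, q=95):
    """Return the threshold and the indices of readings strictly above it."""
    threshold = percentile(demand, q)
    spikes = [i for i, value in enumerate(demand) if value > threshold]
    return threshold, spikes


# Made-up hourly readings for illustration only (not taken from the datasets).
readings = [10, 12, 11, 13, 12, 50, 11, 12, 13, 12,
            11, 60, 12, 11, 10, 12, 13, 11, 12, 11]
threshold, spike_indices = detect_spikes(readings)
```

Grouping the flagged indices by hour of day or by month then shows when spikes cluster, which is the basis for peak-hour identification.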

Key Insights

The analysis provides actionable insights on:

  1. Peak Demand Periods: Identifies critical hours requiring grid reinforcement
  2. Regional Priorities: Highlights states needing infrastructure investment
  3. Growth Projections: Quantifies future capacity requirements
  4. Supply Risks: Pinpoints areas vulnerable to under-supply
  5. Investment Recommendations: Data-driven guidance for resource allocation

Dependencies

  • numpy: Numerical computations
  • pandas: Data manipulation and analysis
  • matplotlib: Basic plotting
  • seaborn: Advanced visualizations
  • datetime: Date/time handling (part of the Python standard library; no installation needed)

The third-party packages are standard Python data science libraries and are pre-installed in Kaggle environments.

Output

The notebook generates:

  • 15+ visualizations
  • Statistical summaries
  • Forecast tables
  • Risk assessment reports
  • Actionable recommendations

Notes

  • The notebook runs on Kaggle once the file paths are updated; no other modifications are needed
  • All visualizations are inline and will display in the notebook
  • The analysis is fully reproducible with the provided datasets

Author

Data Mining Project - Electricity Consumption Analysis for Resource Allocation

