A Machine Learning Approach to Energy Forecasting, Electricity Demand: Delhi, India

Can we accurately predict electricity demand 5 minutes ahead to help grid operators make better decisions? This project explores that question using a comprehensive dataset of Delhi's electricity consumption patterns from 2021 to 2024.


Project Goals

  1. Develop an accurate short-term forecasting model for electricity demand.
  2. Engineer meaningful features from raw time series data.
  3. Analyze demand patterns across different time scales (hourly, daily, monthly, seasonal).
  4. Understand weather impacts on electricity consumption.
  5. Create a production-ready model with robust evaluation.

Data Preprocessing
  • Converted datetime to index with proper formatting
  • Handled missing values (540 in wdir, 2 in moving_avg_3)
  • Removed irrelevant columns
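
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column name `datetime`, the forward-fill strategy for missing values, and the choice of which columns count as irrelevant are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Index by timestamp, fill gaps, and drop unused columns (illustrative)."""
    df = raw.copy()
    # Parse the timestamp column and use it as a sorted DatetimeIndex.
    df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%d %H:%M:%S")
    df = df.set_index("datetime").sort_index()
    # Assumption: gaps are forward-filled, then back-filled for leading NaNs
    # (for example the first rows of a moving average).
    cols = ["wdir", "moving_avg_3"]
    df[cols] = df[cols].ffill().bfill()
    # Assumption: the pre-split calendar columns are the "irrelevant" ones.
    calendar_cols = ["year", "month", "day", "hour", "minute"]
    return df.drop(columns=calendar_cols, errors="ignore")

# Tiny made-up sample, only to show the shape of the input and output.
raw = pd.DataFrame({
    "datetime": ["2021-01-01 00:00:00", "2021-01-01 00:05:00", "2021-01-01 00:10:00"],
    "Power demand": [2000.0, 2010.0, 2020.0],
    "wdir": [90.0, np.nan, 110.0],
    "moving_avg_3": [np.nan, np.nan, 2010.0],
    "minute": [0, 5, 10],
})
clean = preprocess(raw)
```

Forward-filling is only one reasonable option here; with 540 gaps in wind direction, interpolation or dropping the affected rows would also be defensible.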

Project Walkthrough

    This project uses a four-year dataset (2021-2024) of electricity demand in Delhi, India, recorded at 5-minute intervals and combined with key weather parameters (temperature, humidity, pressure, wind) to build a short-term forecasting model. Through feature engineering, I transformed the raw time series into predictive signals: temporal features (hour of day, day of week, month, quarter, weekend indicator), lag variables (24-hour and 168-hour demand lags), rolling statistics (24-period moving average and standard deviation), and Indian holiday calendar data. The final XGBoost model was trained on 314,928 samples and tested on 78,326 out-of-time samples, reaching an RMSE of 191.6 kW and an MAE of 88.9 kW. It captures daily demand cycles, weekly patterns, seasonal variations, holiday effects, and weather-driven fluctuations, which suggests it could support grid stability through better load balancing and operational decision-making.
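
To make the lag and rolling features concrete, here is a small sketch of how they could be built with pandas. It assumes a gap-free 5-minute series, so that 24 hours equals 288 rows, and the output column names (`lag_24h`, `roll_mean_24`, and so on) are hypothetical. Shifting the series by one step before computing rolling statistics is my own precaution against leaking the current value into its own features; the notebook may handle this differently.

```python
import numpy as np
import pandas as pd

STEPS_PER_HOUR = 12  # 5-minute data gives 12 observations per hour

def add_lag_and_rolling_features(df: pd.DataFrame, target: str = "Power demand") -> pd.DataFrame:
    out = df.copy()
    # Demand 24 hours and 168 hours (one week) earlier.
    out["lag_24h"] = out[target].shift(24 * STEPS_PER_HOUR)
    out["lag_168h"] = out[target].shift(168 * STEPS_PER_HOUR)
    # 24-period rolling statistics, computed on the series shifted by one
    # step so each row only sees demand that was already observed.
    past = out[target].shift(1)
    out["roll_mean_24"] = past.rolling(window=24).mean()
    out["roll_std_24"] = past.rolling(window=24).std()
    # Rows without a full week of history cannot have all features.
    return out.dropna()

# Synthetic ramp, just long enough to cover the one-week lag.
idx = pd.date_range("2021-01-01", periods=2100, freq="5min")
demo = pd.DataFrame({"Power demand": np.arange(2100, dtype=float)}, index=idx)
feats = add_lag_and_rolling_features(demo)
```

Dropping the rows that lack a full week of history is one way a small number of samples can disappear between the raw dataset and the modelling table.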

    Data Description

    The dataset, sourced from Kaggle, contains 393,440 records of electricity demand in Delhi at 5-minute intervals from January 2021 to December 2024. It includes a timestamp column in YYYY-MM-DD HH:mm:ss format along with the target variable, Power demand (measured in kW). Weather features such as temperature (°C), dew point (°C), relative humidity (%), wind direction (degrees), wind speed (m/s), and atmospheric pressure (hPa) are also provided. For ease of time-series analysis, the dataset breaks down the timestamp into individual year, month, day, hour, and minute components. Additionally, a 3-time-step moving average of power demand is included as a pre-engineered feature. This rich combination of high-frequency demand data and corresponding weather variables makes the dataset ideal for forecasting models and analyzing energy consumption patterns.

    Feature Engineering
  • dayofweek (Monday=1, Sunday=7)
  • dayofyear (day number 1-366)
  • quarter (1-4)
  • weekofyear (counted so that January 1 falls in week 1)
  • is_weekend (binary indicator)
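
The calendar features listed above could be derived from a `DatetimeIndex` along these lines. This is a sketch under assumptions: pandas numbers weekdays from Monday=0, so one is added to match the Monday=1 convention, and the "intuitive" week number is interpreted here as a simple count of 7-day blocks from January 1 rather than the ISO week.

```python
import pandas as pd

def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    idx = out.index
    out["dayofweek"] = idx.dayofweek + 1  # pandas uses Monday=0; shift to Monday=1
    out["dayofyear"] = idx.dayofyear      # 1-366
    out["quarter"] = idx.quarter          # 1-4
    # Assumed meaning of "intuitive" weeks: Jan 1-7 is week 1, Jan 8-14 is week 2, ...
    out["weekofyear"] = (idx.dayofyear - 1) // 7 + 1
    out["is_weekend"] = (idx.dayofweek >= 5).astype(int)  # Saturday or Sunday
    return out

# 2021-01-01 was a Friday; its ISO week is 53, but the intuitive week is 1.
idx = pd.to_datetime(["2021-01-01 00:00:00", "2021-01-03 12:00:00", "2021-01-04 18:30:00"])
demo = pd.DataFrame({"Power demand": [2000.0, 1800.0, 2100.0]}, index=idx)
cal = add_calendar_features(demo)
```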

    Below is the Jupyter Notebook for the project.

    Jupyter Notebook: Delhi Electricity Demand Forecasting



    Technical Implementation & Results

    The model was developed with Python's data science ecosystem: pandas, numpy, matplotlib, and seaborn for EDA; scikit-learn for evaluation; XGBoost for prediction; and joblib for model persistence. Key insights: lag features dominate feature importance, time of day shows clear cyclical patterns, seasonality impacts demand, and temperature correlates positively with consumption. Visual validation confirms that the model captures both short-term fluctuations and long-term trends, making it suitable for real-world grid applications.
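
For readers unfamiliar with out-of-time evaluation, the sketch below shows the general idea: split at a cutoff timestamp so that every test row is later than every training row, then score predictions with RMSE and MAE. The cutoff value and the helper names are hypothetical, and a naive persistence forecast (the previous observation) stands in for the XGBoost model so the example stays dependency-light.

```python
import numpy as np
import pandas as pd

def time_split(df: pd.DataFrame, cutoff: str):
    """Chronological split: rows before `cutoff` train, the rest test."""
    ts = pd.Timestamp(cutoff)
    return df.loc[df.index < ts], df.loc[df.index >= ts]

def rmse(y_true, y_pred) -> float:
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred) -> float:
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

# Made-up demand values on a 5-minute grid.
idx = pd.date_range("2024-01-01", periods=6, freq="5min")
demo = pd.DataFrame({"Power demand": [100.0, 110.0, 120.0, 130.0, 150.0, 140.0]}, index=idx)
train, test = time_split(demo, "2024-01-01 00:20:00")

# Stand-in forecast: the value observed 5 minutes earlier (not the real model).
pred = demo["Power demand"].shift(1).loc[test.index]
test_rmse = rmse(test["Power demand"], pred)
test_mae = mae(test["Power demand"], pred)
```

A persistence baseline like this is also a useful yardstick: at a 5-minute horizon it is hard to beat, so comparing a trained model's RMSE and MAE against it shows how much the engineered features actually contribute.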

    Skills Demonstrated

    This project showcases expertise in time series analysis and feature engineering, including creating lag variables, rolling statistics, and temporal indicators; building and optimizing XGBoost models with hyperparameter tuning; performing comprehensive exploratory data analysis and correlation studies; applying proper time-based train/test validation and model evaluation using RMSE and MAE metrics; leveraging the full Python data science stack (pandas, numpy, matplotlib, seaborn, scikit-learn, XGBoost); and translating technical findings into actionable insights for grid stability and energy infrastructure applications.

    Future Improvements
  • Deploy as API for real-time predictions
  • Incorporate weather forecasts (not just historical)
  • Experiment with deep learning (LSTM, Transformers)
  • Add economic factors (holiday spending, industrial activity)
  • Ensemble methods combining multiple algorithms

Let's Connect!

    Thank you for checking out my portfolio! I hope you enjoyed exploring my projects. You can also explore more of my data visualizations on Tableau Public. Feel free to view some of my brand identity development projects on Behance.

    Tikhala :-)