This is a load forecasting tutorial. To generate or purchase enough electricity to cover demand, we need to predict the load on the electric grid for the next 72 hours. The first decision is which device level on the grid to forecast. The lowest level is the meter (a residential electric meter), so one option is to forecast the load on each meter and roll those forecasts up to the substation. The other option is to forecast from the substation load alone, which works but is not as accurate or fine-grained as the meter-level approach.
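Rolling meter forecasts up to the substation is a group-and-sum once every meter has a forecast. Here is a minimal sketch with pandas, assuming a hypothetical table with `substation_id`, `meter_id`, `ds`, and `yhat` columns; the column names and values are made up for illustration and do not come from the tutorial's data.

```python
import pandas as pd

# Hypothetical per-meter forecasts: two substations, three meters, two timestamps.
meter_forecasts = pd.DataFrame({
    "substation_id": ["S1", "S1", "S1", "S1", "S2", "S2"],
    "meter_id": ["M1", "M1", "M2", "M2", "M3", "M3"],
    "ds": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:05"] * 3),
    "yhat": [1.2, 1.4, 0.8, 0.9, 2.0, 2.5],
})


def roll_up_to_substation(forecasts: pd.DataFrame) -> pd.DataFrame:
    """Sum the meter-level point forecasts per substation and timestamp."""
    return (
        forecasts.groupby(["substation_id", "ds"], as_index=False)["yhat"]
        .sum()
        .rename(columns={"yhat": "substation_yhat"})
    )


substation_forecast = roll_up_to_substation(meter_forecasts)
print(substation_forecast)
```

Summing works for the point forecast (`yhat`). The `yhat_lower` and `yhat_upper` bounds generally do not add up the same way, so they are left out of this sketch.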
This tutorial predicts the next 72 hours of load for a single meter, which runs comfortably on my laptop. Later I will show you how to run the same process on thousands or millions of meters using a distributed execution engine like Apache Spark. The code will not change much; it just gets wrapped in a PySpark UDF. For now, let's stick to one meter.
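To give a feel for why the code barely changes, here is a rough sketch of the shape a per-meter function could take. It is not the Spark version promised above: it runs locally with pandas, the `meter_id` column and the function name `forecast_one_meter` are assumptions, and the model is a placeholder that repeats the meter's mean load where the Prophet fit and predict steps would go.

```python
import pandas as pd

PERIODS = 12 * 72  # 72 hours of 5-minute readings


def forecast_one_meter(meter_history: pd.DataFrame) -> pd.DataFrame:
    """Take one meter's history (meter_id, ds, y) and return its forecast.

    Placeholder model: repeats the meter's mean load. The Prophet
    fit/predict steps from this tutorial would replace it.
    """
    meter_id = meter_history["meter_id"].iloc[0]
    last_ts = meter_history["ds"].max()
    future_ds = pd.date_range(
        start=last_ts + pd.Timedelta(minutes=5), periods=PERIODS, freq="5min"
    )
    return pd.DataFrame({
        "meter_id": meter_id,
        "ds": future_ds,
        "yhat": meter_history["y"].mean(),
    })


# Hypothetical history for two meters, three 5-minute readings each.
history = pd.DataFrame({
    "meter_id": ["M1"] * 3 + ["M2"] * 3,
    "ds": list(pd.date_range("2024-01-01", periods=3, freq="5min")) * 2,
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Local equivalent of running the function once per meter.
forecasts = pd.concat(
    [forecast_one_meter(group) for _, group in history.groupby("meter_id")],
    ignore_index=True,
)
print(forecasts.groupby("meter_id")["yhat"].agg(["count", "first"]))
```

A function that takes one group's pandas DataFrame and returns a pandas DataFrame is the shape that PySpark's `groupBy(...).applyInPandas(func, schema)` works with, which is one possible way to do the wrapping; the output schema would have to be supplied by the caller.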
The key to accurate load forecasting is not your coding ability or which language or framework you use. The key is your domain knowledge of the subject you are forecasting. This is an evolving tutorial, so more data and info will be added as we move forward.
import pandas as pd
from prophet import Prophet
from datetime import datetime
import numpy as np
import util


def nfl_sunday(ds):
    # 1 on Sundays from September through January, else 0
    date = pd.to_datetime(ds)
    if date.weekday() == 6 and (date.month > 8 or date.month < 2):
        return 1
    else:
        return 0


def school(ds):
    # 1 on weekdays (Mon-Fri) from August through April, else 0
    date = pd.to_datetime(ds)
    if date.weekday() < 5 and (date.month > 7 or date.month < 5):
        return 1
    else:
        return 0


def spring(ds):
    date = pd.to_datetime(ds)
    if date.month in (3, 4, 5):
        return 1
    else:
        return 0


def summer(ds):
    date = pd.to_datetime(ds)
    if date.month in (6, 7, 8):
        return 1
    else:
        return 0


def fall(ds):
    date = pd.to_datetime(ds)
    if date.month in (9, 10, 11):
        return 1
    else:
        return 0


def winter(ds):
    date = pd.to_datetime(ds)
    if date.month in (12, 1, 2):
        return 1
    else:
        return 0


# flat, linear, or logistic for growth
model_type = 'flat'
history: pd.DataFrame = util.get_voltage_data()
row_cnt: int = len(history.index)
# we want to leave the last 864 rows for testing the model
history = history.head(row_cnt - 864)
# add the calendar features as columns on the history dataframe
history['nfl_sunday'] = history['ds'].apply(nfl_sunday)
history['school'] = history['ds'].apply(school)
history['spring'] = history['ds'].apply(spring)
history['summer'] = history['ds'].apply(summer)
history['fall'] = history['ds'].apply(fall)
history['winter'] = history['ds'].apply(winter)
m = Prophet(growth=model_type)
# the calendar flags are extra regressors, not Fourier seasonalities
m.add_regressor('nfl_sunday')
m.add_regressor('school')
m.add_regressor('spring')
m.add_regressor('summer')
m.add_regressor('fall')
m.add_regressor('winter')
m.add_country_holidays(country_name='US')
m.fit(history)
# we are working with 5 minute interval data which is 12 readings an hour
# and we need 72 hour forecast for a total of 864 readings
future = m.make_future_dataframe(periods=12 * 72, freq='5min', include_history=False)
future['nfl_sunday'] = future['ds'].apply(nfl_sunday)
future['school'] = future['ds'].apply(school)
future['spring'] = future['ds'].apply(spring)
future['summer'] = future['ds'].apply(summer)
future['fall'] = future['ds'].apply(fall)
future['winter'] = future['ds'].apply(winter)
forecast = m.predict(future)[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
pd.set_option('display.max_rows', None)
print(forecast.to_string())
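
The script holds back the last 864 rows for testing but stops at printing the forecast. One way to use that holdout is to join actuals and predictions on `ds` and compute MAE and MAPE. The sketch below uses made-up data standing in for `util.get_voltage_data()` and for the Prophet output. The names `HOLDOUT_ROWS`, `full_history`, and `score_forecast` are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

HOLDOUT_ROWS = 864  # 72 hours of 5-minute readings

# Synthetic stand-in for the tutorial's data (values are made up).
ds = pd.date_range("2024-01-01", periods=2000, freq="5min")
full_history = pd.DataFrame({"ds": ds, "y": 100 + 10 * np.sin(np.arange(2000) / 50)})

# Keep the tail instead of discarding it, so it can be scored later.
train = full_history.iloc[:-HOLDOUT_ROWS]
holdout = full_history.iloc[-HOLDOUT_ROWS:]

# Stand-in for m.predict(future): a forecast that is always 2 units too high.
forecast = pd.DataFrame({"ds": holdout["ds"].values, "yhat": holdout["y"].values + 2.0})


def score_forecast(actual: pd.DataFrame, predicted: pd.DataFrame) -> dict:
    """Join actuals and predictions on the timestamp, then compute MAE and MAPE."""
    joined = actual.merge(predicted, on="ds", how="inner")
    abs_err = (joined["y"] - joined["yhat"]).abs()
    return {
        "rows": len(joined),
        "mae": abs_err.mean(),
        "mape_pct": (abs_err / joined["y"].abs()).mean() * 100,
    }


scores = score_forecast(holdout, forecast)
print(scores)
```

MAPE divides by the actual reading, so it is only meaningful when the readings stay away from zero. MAE has no such restriction and is in the same units as the data.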