Forecasting with Facebook's Prophet Framework

This is a load forecasting tutorial. We need to predict the load on the electric grid for the next 72 hours so we can generate or purchase enough electricity to cover it. To do this, we first need to pick the level of the grid at which we will make our estimates. The lowest level is the meter (a residential electric meter): forecast the load on each meter and roll that data up to the substation. Alternatively, we could estimate from the substation load alone, which works but is less accurate and less fine-grained than forecasting at the meter level.
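Rolling meter forecasts up to the substation is a group-and-sum over timestamps. Below is a minimal pandas sketch, assuming a table of per-meter forecasts with hypothetical `meter_id`, `substation_id`, `ds`, and `yhat` columns and toy values (not real data). Point forecasts can be summed directly; prediction intervals (`yhat_lower`, `yhat_upper`) cannot simply be added together, so the sketch sums `yhat` only.

```python
import pandas as pd

# Hypothetical per-meter forecasts; column names and values are illustrative only.
meter_forecasts = pd.DataFrame({
    "meter_id": ["m1", "m2", "m3", "m1", "m2", "m3"],
    "substation_id": ["s1", "s1", "s2", "s1", "s1", "s2"],
    "ds": pd.to_datetime(["2024-01-01 00:00"] * 3 + ["2024-01-01 00:05"] * 3),
    "yhat": [1.2, 0.8, 2.0, 1.1, 0.9, 2.1],
})

# Sum the meter-level point forecasts for each substation and timestamp.
substation_forecast = (
    meter_forecasts
    .groupby(["substation_id", "ds"], as_index=False)["yhat"]
    .sum()
)
print(substation_forecast)
```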

This tutorial predicts the next 72 hours of load for a single meter, a job small enough to run on my laptop. I will later show you how to run this process on thousands or millions of meters using a distributed execution engine like Apache Spark. The code will not change much; it is mostly wrapped in a PySpark UDF. For now, let's stick to one meter.
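The per-meter pattern can be sketched as a function that takes one meter's history as a pandas DataFrame and returns that meter's forecast. Run locally, it is applied group by group; PySpark's `groupBy(...).applyInPandas(func, schema)` accepts a function of the same DataFrame-in, DataFrame-out shape. This is a minimal sketch under assumptions: the `meter_id` column, `forecast_one_meter` name, and readings are hypothetical, and the model inside is a seasonal-naive stand-in (repeat the last 72 hours) rather than Prophet, used only to keep the example small.

```python
import numpy as np
import pandas as pd

READINGS_PER_HOUR = 12             # 5-minute interval data
HORIZON = READINGS_PER_HOUR * 72   # 864 readings = 72 hours

def forecast_one_meter(pdf: pd.DataFrame) -> pd.DataFrame:
    """Forecast the next 72 hours for ONE meter (hypothetical helper).

    Stand-in model: seasonal naive (repeat the last 72 hours of readings).
    In the tutorial, this body would build and fit a Prophet model instead.
    """
    pdf = pdf.sort_values("ds")
    future_ds = pd.date_range(
        start=pdf["ds"].iloc[-1] + pd.Timedelta(minutes=5),
        periods=HORIZON,
        freq="5min",
    )
    return pd.DataFrame({
        "meter_id": pdf["meter_id"].iloc[0],
        "ds": future_ds,
        "yhat": pdf["y"].tail(HORIZON).to_numpy(),
    })

# Hypothetical readings: two meters, four days of random 5-minute load.
ds = pd.date_range("2024-01-01", periods=4 * 24 * READINGS_PER_HOUR, freq="5min")
rng = np.random.default_rng(0)
readings = pd.concat(
    [pd.DataFrame({"meter_id": m, "ds": ds, "y": rng.random(len(ds))})
     for m in ("m1", "m2")],
    ignore_index=True,
)

# Local equivalent of a grouped apply: one call per meter, results stacked.
forecasts = pd.concat(
    [forecast_one_meter(group) for _, group in readings.groupby("meter_id")],
    ignore_index=True,
)
print(forecasts.groupby("meter_id").size())
```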

The key to accurate load forecasting is not your coding ability or which language or framework you use. The key is your domain knowledge of the subject you are forecasting.

The best thing you can do as a data scientist is to increase your domain knowledge by learning all you can about electricity usage, particularly the scenarios that affect load (human electricity usage patterns).

What causes humans to consume more or less electricity?
Which appliances use the most electricity, and when are they used?
What is the effect of seasons on electricity usage?
Can we correlate usage with hours of the day?
Look at the min/max load for each hour, day, month, and year, and think about why each one is a peak or a valley.
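The min/max exploration in the last question can be done with a few pandas group-bys. A minimal sketch, assuming Prophet-style `ds` (timestamp) and `y` (load) columns; the random data below is synthetic and only there to make the example run, so swap in your meter's history. Here "day" is read as day of the week; group by `ds.dt.date` instead if you want each calendar day.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one meter's 5-minute load history (not real data).
ds = pd.date_range("2023-01-01", "2023-12-31 23:55", freq="5min")
rng = np.random.default_rng(42)
load = pd.DataFrame({"ds": ds, "y": rng.random(len(ds))})

# Min and max load for each hour of the day, day of the week, month, and year.
keys = {
    "hour": load["ds"].dt.hour,
    "weekday": load["ds"].dt.dayofweek,  # 0 = Monday, 6 = Sunday
    "month": load["ds"].dt.month,
    "year": load["ds"].dt.year,
}
profiles = {name: load.groupby(key)["y"].agg(["min", "max"]) for name, key in keys.items()}

# When did the single highest reading (the peak) happen?
peak_time = load.loc[load["y"].idxmax(), "ds"]
print(profiles["hour"].head())
print("peak at", peak_time)
```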

Answers to these questions should drive your approach to constructing the features for your model.

In the code below, I have a function for NFL Sundays, including Super Bowl Sunday, the yearly American football championship game in February. Many people change their habits that day and stay home or gather somewhere to watch the game. If you look at the data, you will see a difference in usage on that day. You might think that is irrelevant, or that the playoffs should be added. That is where your creativity comes into play. Test whether I'm wrong, and let me know. It may have no effect and be a waste of time/CPU cycles. It's up to you.
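To test the Super Bowl idea against your own data, one simple check is to compare the average daily load on the flagged Sundays with the average on every other Sunday. A minimal sketch, assuming Prophet-style `ds`/`y` columns; `flagged_vs_other_sundays` is a hypothetical helper, and the demo data is synthetic, with extra load injected on 2023-02-12 (the Super Bowl date that year) purely to show the function picking it up. A gap in means is only a first look, not a significance test.

```python
import numpy as np
import pandas as pd

def flagged_vs_other_sundays(load: pd.DataFrame, flagged_dates) -> pd.Series:
    """Mean daily load on flagged dates vs. all other Sundays.

    load: Prophet-style frame with 'ds' (timestamp) and 'y' (load) columns.
    flagged_dates: dates to test, e.g. the Super Bowl Sundays in your data.
    """
    daily = load.set_index("ds")["y"].resample("D").sum()
    sundays = daily[daily.index.dayofweek == 6]
    is_flagged = sundays.index.isin(pd.to_datetime(list(flagged_dates)))
    return pd.Series({
        "flagged_mean": sundays[is_flagged].mean(),
        "other_sunday_mean": sundays[~is_flagged].mean(),
    })

# Synthetic demo: flat load of 1.0 per reading, plus 0.5 extra on one Sunday.
ds = pd.date_range("2023-01-01", "2023-03-31 23:55", freq="5min")
y = np.ones(len(ds))
y[ds.normalize() == pd.Timestamp("2023-02-12")] += 0.5
load = pd.DataFrame({"ds": ds, "y": y})

print(flagged_vs_other_sundays(load, ["2023-02-12"]))
```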

Please visit our GitHub repo for the complete code or to contribute:
Load Forecast

This is an evolving tutorial, so more data and info will be added as we move forward.

Imports

import pandas as pd
from prophet import Prophet
from datetime import datetime
import numpy as np
import util  # project-local helper module; provides get_voltage_data()

Functions

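These functions flag each timestamp with a 0/1 indicator (is it a football Sunday? is school in session? which season is it?), grouping related days together so the model can learn their effect on load. This is where we use our creativity, domain knowledge, etc., to create scenarios that would affect the forecast.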

Tell the model about NFL Sundays, including Super Bowl Sunday.

def nfl_sunday(ds):
    date = pd.to_datetime(ds)
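    # weekday() == 6 is Sunday; the season runs September through February,
    # so February is included to cover Super Bowl Sunday
    if date.weekday() == 6 and (date.month > 8 or date.month < 3):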
        return 1
    else:
        return 0

Tell the model about the school year. Whether students are at home or at school changes electricity usage.

def school(ds):
    date = pd.to_datetime(ds)
    # weekday() is 0 (Monday) through 6 (Sunday); school days are Monday-Friday
    if date.weekday() < 5 and (date.month > 7 or date.month < 5):
        return 1
    else:
        return 0

Tell the model about seasons. Summer and winter spike usage, while spring and fall usage is much lower.

def spring(ds):
    date = pd.to_datetime(ds)
    if date.month in (3, 4, 5):
        return 1
    else:
        return 0

def summer(ds):
    date = pd.to_datetime(ds)
    if date.month in (6, 7, 8):
        return 1
    else:
        return 0

def fall(ds):
    date = pd.to_datetime(ds)
    if date.month in (9, 10, 11):
        return 1
    else:
        return 0

def winter(ds):
    date = pd.to_datetime(ds)
    if date.month in (12, 1, 2):
        return 1
    else:
        return 0
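The four season functions can also be written as one month-to-season lookup, which makes it easy to check that every timestamp lands in exactly one season. A minimal sketch; `SEASON_BY_MONTH` and `season_flags` are hypothetical names, and the month groupings copy the functions above. Because exactly one flag is 1 at every timestamp, the four columns always sum to 1 and overlap with the model's baseline level; dropping one season and treating it as the baseline is a common way to avoid that redundancy.

```python
import pandas as pd

# Month -> season, matching the spring/summer/fall/winter functions above.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

def season_flags(ds: pd.Series) -> pd.DataFrame:
    """Return one 0/1 column per season for a Series of timestamps."""
    season = pd.to_datetime(ds).dt.month.map(SEASON_BY_MONTH)
    return pd.DataFrame({
        name: (season == name).astype(int)
        for name in ("spring", "summer", "fall", "winter")
    })

# One mid-month timestamp for each month of 2024.
sample = pd.Series(pd.to_datetime([f"2024-{m:02d}-15" for m in range(1, 13)]))
flags = season_flags(sample)
print(flags.sum(axis=1).unique())  # every row should have exactly one season set
```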

Data Setup

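# growth type for Prophet: 'flat', 'linear', or 'logistic' (logistic also needs a 'cap' column)
model_type = 'flat'

# Prophet expects a 'ds' (timestamp) column and a 'y' (load) column
history: pd.DataFrame = util.get_voltage_data()
row_cnt: int = len(history.index)

# hold out the last 864 rows (72 hours of 5-minute readings) for testing the model
test: pd.DataFrame = history.tail(864).copy()
history = history.head(row_cnt - 864).copy()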

# add one 0/1 indicator column per function to the dataframe
history['nfl_sunday'] = history['ds'].apply(nfl_sunday)
history['school'] = history['ds'].apply(school)
history['spring'] = history['ds'].apply(spring)
history['summer'] = history['ds'].apply(summer)
history['fall'] = history['ds'].apply(fall)
history['winter'] = history['ds'].apply(winter)
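Calling `.apply` with a Python function runs once per row, which gets slow on 5-minute data (288 rows per day per meter). A vectorized sketch using pandas `.dt` accessors builds the same kind of 0/1 columns in one pass. `add_indicator_columns` is a hypothetical helper, and its rules are stated assumptions: football Sundays run September through February, school days are Monday through Friday from August through April, and seasons follow the month groupings above. Adjust the rules if yours differ.

```python
import pandas as pd

SEASON_MONTHS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "fall": (9, 10, 11),
    "winter": (12, 1, 2),
}

def add_indicator_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add 0/1 indicator columns computed from the 'ds' column, vectorized."""
    out = df.copy()
    ds = pd.to_datetime(out["ds"])
    weekday, month = ds.dt.weekday, ds.dt.month  # weekday: 0 = Monday, 6 = Sunday
    # Assumed rule: football Sundays from September through February.
    out["nfl_sunday"] = ((weekday == 6) & ((month > 8) | (month < 3))).astype(int)
    # Assumed rule: school days are Monday-Friday, August through April.
    out["school"] = ((weekday < 5) & ((month > 7) | (month < 5))).astype(int)
    for name, months in SEASON_MONTHS.items():
        out[name] = month.isin(months).astype(int)
    return out

sample = pd.DataFrame({"ds": pd.to_datetime([
    "2024-02-11", "2024-07-07", "2024-09-10", "2024-09-14", "2024-06-12",
])})
print(add_indicator_columns(sample))
```

In the tutorial's flow, `history = add_indicator_columns(history)` could stand in for the six `.apply` lines, and the same call works on the future dataframe.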

Model Setup

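# the 0/1 columns are extra regressors, not seasonalities: add_seasonality()
# also requires a period and a Fourier order, while add_regressor() only needs the column name
m = Prophet(growth=model_type)
m.add_regressor('nfl_sunday')
m.add_regressor('school')
m.add_regressor('spring')
m.add_regressor('summer')
m.add_regressor('fall')
m.add_regressor('winter')
m.add_country_holidays(country_name='US')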

Train the model

m.fit(history)

Create Future Dataframe

# we are working with 5-minute interval data, which is 12 readings an hour,
# and we need a 72-hour forecast, for a total of 864 readings
# ('5min' is the pandas frequency alias for minutes)
future = m.make_future_dataframe(periods=12*72, freq='5min', include_history=False)
future['nfl_sunday'] = future['ds'].apply(nfl_sunday)
future['school'] = future['ds'].apply(school)
future['spring'] = future['ds'].apply(spring)
future['summer'] = future['ds'].apply(summer)
future['fall'] = future['ds'].apply(fall)
future['winter'] = future['ds'].apply(winter)
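A quick check of the horizon arithmetic: 12 readings an hour for 72 hours is 864 five-minute steps. This sketch builds the same kind of future timestamps directly with `pd.date_range` and the `'5min'` alias, assuming a hypothetical last training timestamp.

```python
import pandas as pd

READINGS_PER_HOUR = 12   # 5-minute interval data
HOURS = 72

# Hypothetical last timestamp in the training history.
last_ds = pd.Timestamp("2024-01-31 23:55")

future_ds = pd.date_range(
    start=last_ds + pd.Timedelta(minutes=5),
    periods=READINGS_PER_HOUR * HOURS,
    freq="5min",
)
print(len(future_ds))                # 864
print(future_ds[0], future_ds[-1])   # 2024-02-01 00:00:00 2024-02-03 23:55:00
```

The last timestamp is 71 hours 55 minutes after the first, because 864 readings cover 72 hours starting at the first reading.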

Execute Model

forecast = m.predict(future)[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]

View Results

pd.set_option('display.max_rows', None)
print(forecast.to_string())
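The last 864 rows (72 hours) were set aside for testing, so the forecast can be scored against them. A minimal sketch, assuming the held-out rows were kept in a frame with `ds` and `y` columns (for example `history.tail(864)` taken before trimming `history`); `evaluate` is a hypothetical helper, and the toy frames only exercise it. It reports MAE, RMSE, and how often the actual value fell inside Prophet's `yhat_lower`/`yhat_upper` interval. MAPE is left out because meter load can be zero.

```python
import numpy as np
import pandas as pd

def evaluate(test: pd.DataFrame, forecast: pd.DataFrame) -> dict:
    """Score a forecast against held-out actuals, matched on timestamp.

    test: held-out rows with 'ds' and 'y'.
    forecast: Prophet output with 'ds', 'yhat', 'yhat_lower', 'yhat_upper'.
    """
    merged = test[["ds", "y"]].merge(forecast, on="ds", how="inner")
    err = merged["y"] - merged["yhat"]
    inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
    return {
        "n": len(merged),
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "interval_coverage": float(inside.mean()),
    }

# Toy frames to exercise the function (not real results).
ds = pd.date_range("2024-02-01", periods=3, freq="5min")
toy_test = pd.DataFrame({"ds": ds, "y": [1.0, 2.0, 3.0]})
toy_forecast = pd.DataFrame({
    "ds": ds,
    "yhat": [1.5, 2.0, 2.0],
    "yhat_lower": [1.0, 1.0, 1.0],
    "yhat_upper": [2.0, 3.0, 2.5],
})
print(evaluate(toy_test, toy_forecast))
```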