Vovk’s Aggregating Algorithm is a method for prediction with expert advice. In each period every expert makes a prediction, and a loss function scores those predictions. Each expert’s weight is then decayed through an exponentially weighted scheme based on its loss, so better-performing experts gain relatively more weight over time.
Suppose we have $N$ experts. We initialize the algorithm with equal weights $w_{i,0} = 1/N$ and update the weights at each period according to:

$$w_{i,t} = w_{i,t-1} \, e^{-\eta \, \ell\left(\hat{y}_{i,t},\, y_t\right)}$$

where $\eta$ is a learning rate and $\ell\left(\hat{y}_{i,t}, y_t\right)$ is the loss for expert $i$’s prediction in period $t$, e.g. a squared loss $\ell\left(\hat{y}, y\right) = \left(\hat{y} - y\right)^2$. The weights can be normalized to sum to one, and the ensemble prediction combines each expert’s prediction using these normalized weights:

$$\hat{y}_t = \sum_{i=1}^{N} \frac{w_{i,t-1}}{\sum_{j=1}^{N} w_{j,t-1}} \, \hat{y}_{i,t}$$
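To make the update concrete, here is a minimal NumPy sketch of a single weight update and ensemble prediction (the `aggregate_update` helper is illustrative, not part of PyFlux):

```python
import numpy as np

def aggregate_update(weights, predictions, outcome, eta=1.0):
    # Exponentially decay each expert's weight by its squared loss
    losses = (predictions - outcome) ** 2
    new_weights = weights * np.exp(-eta * losses)
    return new_weights / new_weights.sum()  # normalize to sum to one

w = np.ones(3) / 3                    # three experts, equal initial weights
y_hat = np.array([0.5, 0.2, 0.9])     # the experts' predictions this period
w = aggregate_update(w, y_hat, outcome=0.4)
ensemble_prediction = w @ y_hat       # weighted combination for the next period
```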
First we import dependencies:
```python
import numpy as np
import pandas as pd
import pyflux as pf
import matplotlib.pyplot as plt
%matplotlib inline
```
We will estimate some models on US GDP growth data:
```python
growthdata = pd.read_csv('http://www.pyflux.com/notebooks/GDPC1.csv')

# Log-differenced real GDP, dropping the first 149 observations
USgrowth = pd.DataFrame(np.diff(np.log(growthdata['VALUE']))[149:len(growthdata['VALUE'])])
USgrowth.index = pd.to_datetime(growthdata['DATE'].values[1+149:len(growthdata)])
USgrowth.columns = ['US Real GDP Growth']

plt.figure(figsize=(15,5))
plt.plot(USgrowth)
plt.ylabel('Real GDP Growth')
plt.title('US Real GDP Growth');
```
We will use the following models:
```python
model1 = pf.ARIMA(data=USgrowth, ar=4, ma=0)          # AR(4) model
model2 = pf.ARIMA(data=USgrowth, ar=8, ma=0)          # AR(8) model
model3 = pf.LLEV(data=USgrowth)                       # Gaussian local level model
model4 = pf.GASLLEV(data=USgrowth, family=pf.GASt())  # Student-t local level model
model5 = pf.GPNARX(data=USgrowth, ar=1, kernel=pf.SquaredExponential())  # GP-NARX with 1 lag
model6 = pf.GPNARX(data=USgrowth, ar=2, kernel=pf.SquaredExponential())  # GP-NARX with 2 lags
```
We set up the Aggregate model with a squared loss penalty and add the models:
```python
mix = pf.Aggregate(learning_rate=1.0, loss_type='squared')
mix.add_model(model1)
mix.add_model(model2)
mix.add_model(model3)
mix.add_model(model4)
mix.add_model(model5)
mix.add_model(model6)
```
To find a good learning rate we can use tune_learning_rate. For this example, we are going to look at our model performance over 40 periods (10 years):
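Mirroring the call used for mix2 later in this post, the tuning step looks like this:

```python
mix.tune_learning_rate(h=40)  # tune over the final 40 periods
mix.learning_rate             # inspect the selected learning rate
```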
Let’s run the mixture on the last 40 datapoints (quarters) of the data and view the evolution of the model weights through plot_weights:
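A sketch of those two steps; the h argument follows the run call shown later in this post, and figsize is an assumed plotting option matching the earlier figure:

```python
mix.run(h=40)                            # run the aggregating algorithm over the last 40 periods
mix.plot_weights(h=40, figsize=(15, 5))  # plot how each model's weight evolves
```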
The best ensemble loss was achieved with a high learning rate, likely because a subset of the models was harming ensemble performance (a high learning rate quickly drives the weights of poor performers towards zero). Before the financial crisis, the Gaussian local level model was a strong performer; following the crisis, the Student-t local level model dominates.
We can get the summary results for the ensemble and the model constituents with summary:
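Assuming summary accepts the same h argument as run, the call would be:

```python
mix.summary(h=40)
```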
| Model | Loss |
|---|---|
| t GAS LLM | 0.000043 |
We can see that the GP models are not performing well, which is why a high learning rate was preferred. Let’s see what happens if we exclude these models.
```python
mix2 = pf.Aggregate(learning_rate=1.0, loss_type='squared')
mix2.add_model(model1)
mix2.add_model(model2)
mix2.add_model(model3)
mix2.add_model(model4)
mix2.tune_learning_rate(h=40)
mix2.learning_rate
```
Now an extremely low learning rate is preferred, which suggests an equal-weights solution. Let’s plot the weights:
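Again assuming plot_weights accepts h and figsize:

```python
mix2.plot_weights(h=40, figsize=(15, 5))
```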
Let’s summarize again:
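As before (with the same assumed h argument):

```python
mix2.summary(h=40)
```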
| Model | Loss |
|---|---|
| t GAS LLM | 0.000043 |
As we can see, the ensemble performs better than any individual model. We can view the in-sample ensemble predictions with predict_is:
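Assuming predict_is accepts an h argument like the other methods:

```python
mix2.predict_is(h=40)  # in-sample ensemble predictions over the last 40 periods
```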
We can predict out of sample with the ensemble using predict:
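For example, forecasting a year ahead (h=4 quarters is an illustrative choice):

```python
mix2.predict(h=4)  # ensemble forecasts for the next 4 quarters
```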
If we want the latest weights in an array, we can use the run method directly:
```python
latest_weights, _, _ = mix2.run(h=40)
latest_weights = latest_weights[-1, :]  # weights at the final period
print(latest_weights)
```
```
[ 0.24999999 0.24999996 0.25000001 0.25000003]
```