## Theory

Vovk’s Aggregating Algorithm is a method for prediction with expert advice. In each period every expert makes a prediction, and a loss function measures how good that prediction was. Each expert’s weight is then discounted exponentially in its loss, so better-performing experts gain relatively more weight over time.

Suppose we have $k=1,\dots,K$ experts. We initialize the algorithm with equal weights $w_{k,0}$ and update the weights at each period $t$ according to:

$w_{k,t}=w_{k, t-1}\exp\left(-\eta{L_{k, t-1}}\right)$

where $\eta$ is a learning rate and $L_{k, t-1}$ is the loss incurred by expert $k$’s prediction at $t-1$, e.g. a squared loss. The weights are normalized to sum to one, and the ensemble prediction is the weighted combination of the experts’ predictions.
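The update and normalization above can be sketched in a few lines of NumPy. This is a minimal illustration of the scheme, not PyFlux code; the function name `aggregate_predict` is made up for this sketch, and weights are kept in log space for numerical stability:

```python
import numpy as np

def aggregate_predict(expert_preds, outcomes, eta=1.0):
    """Illustrative sketch of the aggregating update with squared loss.

    expert_preds: (T, K) array of expert predictions per period.
    outcomes: (T,) array of realized values.
    Returns the ensemble predictions and the last normalized weights used."""
    T, K = expert_preds.shape
    log_w = np.zeros(K)                  # equal initial weights
    ensemble = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                     # normalize weights to sum to one
        ensemble[t] = w @ expert_preds[t]
        loss = (expert_preds[t] - outcomes[t]) ** 2  # squared loss per expert
        log_w -= eta * loss              # w_k <- w_k * exp(-eta * L_k)
    return ensemble, w
```

With this update, an expert whose cumulative loss grows faster than the others sees its weight shrink geometrically toward zero.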

## PyFlux

First we import dependencies:

import numpy as np
import pyflux as pf
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


We will estimate some models on US GDP growth data:

growthdata = pd.read_csv('http://www.pyflux.com/notebooks/GDPC1.csv')
USgrowth = pd.DataFrame(np.diff(np.log(growthdata['VALUE']))[149:len(growthdata['VALUE'])])
USgrowth.index = pd.to_datetime(growthdata['DATE'].values[1+149:len(growthdata)])
USgrowth.columns = ['US Real GDP Growth']
plt.figure(figsize=(15,5))
plt.plot(USgrowth)
plt.ylabel('Real GDP Growth')
plt.title('US Real GDP Growth');


We will use the following models:

model1 = pf.ARIMA(data=USgrowth, ar=4, ma=0)
model2 = pf.ARIMA(data=USgrowth, ar=8, ma=0)
model3 = pf.LLEV(data=USgrowth)
model4 = pf.GASLLEV(data=USgrowth, family=pf.GASt())
model5 = pf.GPNARX(data=USgrowth, ar=1, kernel=pf.SquaredExponential())
model6 = pf.GPNARX(data=USgrowth, ar=2, kernel=pf.SquaredExponential())


We set up the Aggregate model with a squared loss penalty and add the models:

mix = pf.Aggregate(learning_rate=1.0, loss_type='squared')
mix.add_model(model1)
mix.add_model(model2)
mix.add_model(model3)
mix.add_model(model4)
mix.add_model(model5)
mix.add_model(model6)


To find a good learning rate we can use tune_learning_rate. For this example, we evaluate model performance over the final 40 periods (10 years of quarterly data):

mix.tune_learning_rate(40)
mix.learning_rate

100000.0


Let’s run the mixture on the last 40 datapoints (quarters) of the data and view the weight evolution with plot_weights:

mix.plot_weights(h=40, figsize=(15,5))


The best ensemble loss was achieved with a high learning rate, most likely because a subset of models was harming ensemble performance (a high learning rate drives their weights to zero). Before the financial crisis, the Gaussian local level model was a strong performer; following the crisis, the Student t local level model dominates.
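The effect of the learning rate follows directly from the update rule: $\eta$ multiplies the cumulative loss gap between experts, so a large $\eta$ turns even a small per-period gap into a decisive weight difference. A toy calculation with made-up loss numbers (chosen to resemble the magnitudes in the summary table, not taken from the models above) illustrates this:

```python
import numpy as np

# Two experts over 40 periods; expert 2 incurs a slightly larger squared loss
# each period. These losses are illustrative, not model output.
losses = np.tile([0.00004, 0.00006], (40, 1))

def final_weights(losses, eta):
    """Normalized weights after exponentially weighting cumulative losses."""
    log_w = -eta * losses.sum(axis=0)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

print(final_weights(losses, eta=1.0))       # tiny gap x small eta -> near-equal weights
print(final_weights(losses, eta=100000.0))  # large eta -> weight piles onto the better expert
```

With $\eta = 1$ the cumulative loss gap of $0.0008$ barely moves the weights off $[0.5, 0.5]$, while with $\eta = 10^5$ essentially all weight lands on the better expert.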

We can get the summary results for the ensemble and the model constituents with summary:

mix.summary(h=40)

Squared Loss
Ensemble 0.000044
ARIMA(4,0,0) 0.000046
ARIMA(8,0,0) 0.000049
LLEV 0.000044
t GAS LLM 0.000043
GPNARX(1) 0.000059
GPNARX(2) 0.000055

We can see that the GP models are not performing well, which is why a high learning rate hyperparameter was preferred. Let’s see what happens if we exclude these models.

mix2 = pf.Aggregate(learning_rate=1.0, loss_type='squared')
mix2.add_model(model1)
mix2.add_model(model2)
mix2.add_model(model3)
mix2.add_model(model4)
mix2.tune_learning_rate(h=40)
mix2.learning_rate

0.001


Now an extremely low learning rate is preferred, which amounts to an equal-weights solution. Let’s plot the weights:

mix2.plot_weights(h=40, figsize=(15,5))


Let’s summarize again:

mix2.summary(h=40)

Squared Loss
Ensemble 0.000042
ARIMA(4,0,0) 0.000046
ARIMA(8,0,0) 0.000049
LLEV 0.000044
t GAS LLM 0.000043

As we can see, the ensemble performs better than any individual model. We can view the in-sample ensemble predictions with predict_is:

mix2.predict_is(h=5)

0
2014-10-01 0.008900
2015-01-01 0.006043
2015-04-01 0.004974
2015-07-01 0.006713
2015-10-01 0.006267

We can predict out of sample with the ensemble using predict:

mix2.predict(h=5)

0
2016-01-04 0.004939
2016-04-05 0.005529
2016-07-04 0.005337
2016-10-03 0.005396
2017-01-03 0.005649

If we want the latest weights in an array, we can use the run method directly:

latest_weights, _, _ = mix2.run(h=40)
latest_weights = latest_weights[-1]
print(latest_weights)

[ 0.24999999  0.24999996  0.25000001  0.25000003]