Theory


A Gaussian Process Nonlinear Autoregressive model (GP-NARX) generalizes the linear autoregressive model. We place a Gaussian process prior over the lagged regressors, choose a kernel and proceed from there. More formally:

y_{t} \mid f_{t}  \sim p\left(y_{t} \mid f_{t}\right)

f_{t} = f\left(y_{t-1},y_{t-2},...\right) = f\left(Y_{t-1}\right)

f\left(Y_{t-1}\right) \sim GP\left(m_{f}\left(Y_{t-1}\right),{\kappa\left(Y_{t-1},Y^{'}_{t-1}\right)}\right)


Predictions (and their variance) for new data Y^{*}_{t-1} can be computed from the properties of the multivariate normal distribution:

\mu_{*} = K^{T}_{*}K^{-1}y

\Sigma_{*} = K_{**} - K^{T}_{*}K^{-1}K_{*}


For the GP-NARX model we assume there is an unknown function plus additive Gaussian noise \epsilon, so the covariance matrix K is actually equal to K_{old} + \sigma^{2}{I} – this prevents perfect interpolation. Computationally, it is common practice to add a small jitter to the covariance matrix K to prevent eigenvalues from dying off quickly.

PyFlux

The kernel options for GP-NARX are as follows:

import numpy as np
import pyflux as pf
from pandas.io.data import DataReader
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

supported = pd.DataFrame(['SquaredExponential()','OrnsteinUhlenbeck()','RationalQuadratic()','Periodic()'])
supported.columns = ['PyFlux Kernel']
supported.index  =['Squared Exponential','Ornstein Uhlenbeck','Rational Quadratic','Periodic']
supported
PyFlux Kernel
Squared Exponential SquaredExponential()
Ornstein Uhlenbeck OrnsteinUhlenbeck()
Rational Quadratic RationalQuadratic()
Periodic Periodic()

We’ll look at building a GP-NARX model for US real GDP growth, with data from the St. Louis Fed:

growthdata = pd.read_csv('http://www.pyflux.com/notebooks/GDPC1.csv')
USgrowth = pd.DataFrame(np.diff(np.log(growthdata['VALUE']))[149:len(growthdata['VALUE'])])
USgrowth.index = pd.to_datetime(growthdata['DATE'].values[1+149:len(growthdata)])
USgrowth.columns = ['US Real GDP Growth']
plt.figure(figsize=(15,5))
plt.plot(USgrowth)
plt.ylabel('Real GDP Growth')
plt.title('US Real GDP Growth');
png

We’ll use a Squared Exponential kernel and four lags to account for the fact it is quarterly data:

model = pf.GPNARX(USgrowth,ar=4,kernel=pf.SquaredExponential())
x = model.fit()
x.summary()
GPNARX(4)                                                                                                 
======================================================= =================================================
Dependent Variable: US Real GDP Growth                  Method: MLE                                       
Start Date: 1985-07-01 00:00:00                         Log Likelihood: -159.1182                         
End Date: 2015-10-01 00:00:00                           AIC: 324.2365                                     
Number of observations: 122                             BIC: 332.6485                                     
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== ========================
Noise Sigma^2                            1.1352                                                           
l                                        3.4602                                                           
tau                                      1.3509                                                           
=========================================================================================================

We can plot the fit of the model and see how it smooths the time series using plot_fit:

model.plot_fit(figsize=((15,10)))
png

We can plot the latent variables for the GP using plot_z:

model.plot_z(figsize=((15,5)))
png

We can predict the next 2 years of growth using plot_predict:

model.plot_predict(h=8,past_values=20,figsize=(15,5))
png

We can get the predictions in DataFrame format using predict:

model.predict(h=8)
US Real GDP Growth
2016-01-07 0.004924
2016-04-06 0.005348
2016-07-06 0.005210
2016-10-06 0.005211
2017-01-06 0.005248
2017-04-06 0.005273
2017-07-06 0.005273
2017-10-06 0.005277