Theory

A Gaussian Process Nonlinear Autoregressive model (GP-NARX) generalizes the linear autoregressive model. We place a Gaussian process prior over the lagged regressors, choose a kernel and proceed from there. More formally:

$y_{t} \mid f_{t} \sim p\left(y_{t} \mid f_{t}\right)$

$f_{t} = f\left(y_{t-1},y_{t-2},...\right) = f\left(Y_{t-1}\right)$

$f\left(Y_{t-1}\right) \sim GP\left(m_{f}\left(Y_{t-1}\right),{\kappa\left(Y_{t-1},Y^{'}_{t-1}\right)}\right)$

Predictions (and their variance) for new data $Y^{*}_{t-1}$ can be computed from the properties of the multivariate normal distribution:

$\mu_{*} = K^{T}_{*}K^{-1}y$

$\Sigma_{*} = K_{**} - K^{T}_{*}K^{-1}K_{*}$

For the GP-NARX model we assume there is an unknown function plus additive Gaussian noise $\epsilon$, so the covariance matrix K is actually equal to $K_{old} + \sigma^{2}{I}$ – this prevents perfect interpolation. Computationally, it is common practice to add a small jitter to the covariance matrix K to prevent eigenvalues from dying off quickly.

PyFlux

The kernel options for GP-NARX are as follows:

import numpy as np
import pyflux as pf
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

supported.columns = ['PyFlux Kernel']
supported

PyFlux Kernel
Squared Exponential SquaredExponential()
Ornstein Uhlenbeck OrnsteinUhlenbeck()
Periodic Periodic()

We’ll look at building a GP-NARX model for US real GDP growth, with data from the St. Louis Fed:

growthdata = pd.read_csv('http://www.pyflux.com/notebooks/GDPC1.csv')
USgrowth = pd.DataFrame(np.diff(np.log(growthdata['VALUE']))[149:len(growthdata['VALUE'])])
USgrowth.index = pd.to_datetime(growthdata['DATE'].values[1+149:len(growthdata)])
USgrowth.columns = ['US Real GDP Growth']
plt.figure(figsize=(15,5))
plt.plot(USgrowth)
plt.ylabel('Real GDP Growth')
plt.title('US Real GDP Growth');


We’ll use a Squared Exponential kernel and four lags to account for the fact it is quarterly data:

model = pf.GPNARX(USgrowth,ar=4,kernel=pf.SquaredExponential())
x = model.fit()
x.summary()

GPNARX(4)
======================================================= =================================================
Dependent Variable: US Real GDP Growth                  Method: MLE
Start Date: 1985-07-01 00:00:00                         Log Likelihood: -159.1182
End Date: 2015-10-01 00:00:00                           AIC: 324.2365
Number of observations: 122                             BIC: 332.6485
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Noise Sigma^2                            1.1352
l                                        3.4602
tau                                      1.3509
=========================================================================================================


We can plot the fit of the model and see how it smooths the time series using plot_fit:

model.plot_fit(figsize=((15,10)))


We can plot the latent variables for the GP using plot_z:

model.plot_z(figsize=((15,5)))


We can predict the next 2 years of growth using plot_predict:

model.plot_predict(h=8,past_values=20,figsize=(15,5))


We can get the predictions in DataFrame format using predict:

model.predict(h=8)

US Real GDP Growth
2016-01-07 0.004924
2016-04-06 0.005348
2016-07-06 0.005210
2016-10-06 0.005211
2017-01-06 0.005248
2017-04-06 0.005273
2017-07-06 0.005273
2017-10-06 0.005277