## Theory

Gaussian state space models – often called structural time series or unobserved component models – provide a way to decompose a time series into several distinct components. These components can be extracted in closed form using the Kalman filter and smoother if the errors are jointly Gaussian, and parameters can be estimated via the prediction error decomposition and Maximum Likelihood. One classic univariate structural time series model is the local level model. We can write this as a combination of a time-varying level and an irregular term:

$y_{t} = \mu_{t} + \epsilon_{t}$

$\mu_{t} = \mu_{t-1} + \eta_{t}$

$\epsilon_{t} \sim N\left(0,\sigma_{\epsilon}^{2}\right)$

$\eta_{t} \sim N\left(0,\sigma_{\eta}^{2}\right)$
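To make the closed-form extraction concrete, here is a minimal Kalman filter for the local level model written with plain NumPy — a sketch of the recursion any state space library runs internally, not PyFlux's own code. The diffuse initial variance `p0` is an assumption for illustration:

```python
import numpy as np

def kalman_filter_local_level(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e7):
    """Filtered estimates of the level mu_t in the local level model."""
    n = len(y)
    a = np.empty(n)           # filtered state means E[mu_t | y_1..y_t]
    p = np.empty(n)           # filtered state variances
    a_pred, p_pred = a0, p0   # diffuse prior on the initial level
    for t in range(n):
        # One-step prediction error and its variance
        v = y[t] - a_pred
        f = p_pred + sigma2_eps
        # Kalman gain and measurement update
        k = p_pred / f
        a[t] = a_pred + k * v
        p[t] = p_pred * (1 - k)
        # Time update: the level is a random walk
        a_pred, p_pred = a[t], p[t] + sigma2_eta
    return a, p
```

The one-step errors `v` and their variances `f` are exactly the quantities that enter the Gaussian log-likelihood in the prediction error decomposition, which is what gets maximized during estimation.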

Another univariate model is a local linear trend model which allows the levels to have a time-varying trend:

$y_{t} = \mu_{t} + \epsilon_{t}$

$\mu_{t} = \beta_{t-1} + \mu_{t-1} + \eta_{t}$

$\beta_{t} = \beta_{t-1} + \zeta_{t}$

$\epsilon_{t} \sim N\left(0,\sigma_{\epsilon}^{2}\right)$

$\eta_{t} \sim N\left(0,\sigma_{\eta}^{2}\right)$

$\zeta_{t} \sim N\left(0,\sigma_{\zeta}^{2}\right)$
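Simulating from these three equations makes the roles of the disturbances easy to see. A minimal NumPy sketch (the function name and variance arguments are illustrative, not part of PyFlux):

```python
import numpy as np

def simulate_llt(n, sigma2_eps, sigma2_eta, sigma2_zeta, seed=None):
    """Draw one path of length n from the local linear trend model."""
    rng = np.random.default_rng(seed)
    mu, beta = 0.0, 0.0  # initial level and trend
    y = np.empty(n)
    for t in range(n):
        # mu_t = beta_{t-1} + mu_{t-1} + eta_t  (uses beta before it updates)
        mu = mu + beta + rng.normal(0, np.sqrt(sigma2_eta))
        # beta_t = beta_{t-1} + zeta_t
        beta = beta + rng.normal(0, np.sqrt(sigma2_zeta))
        # y_t = mu_t + eps_t
        y[t] = mu + rng.normal(0, np.sqrt(sigma2_eps))
    return y
```

Setting `sigma2_zeta = 0` freezes the trend at its initial value, collapsing the model to a local level with constant drift.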

We can also support dynamic linear regression in the state space framework:

$y_{t} = \boldsymbol{x}_{t}^{'}\boldsymbol{\beta}_{t} + \epsilon_{t}$

$\boldsymbol{\beta}_{t} = \boldsymbol{\beta}_{t-1} + \boldsymbol{\eta}_{t}$

$\epsilon_{t} \sim N\left(0,\sigma_{\epsilon}^{2}\right)$

$\boldsymbol{\eta}_{t} \sim N\left(\boldsymbol{0},\Sigma_{\eta}\right)$
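Here $\boldsymbol{x}_{t}^{'}$ plays the role of a time-varying observation matrix, while the regression coefficients follow random walks. A single-regressor simulation sketch (function name and variances are illustrative):

```python
import numpy as np

def simulate_dynreg(x, sigma2_eps, sigma2_eta, seed=None):
    """Simulate y_t = x_t * beta_t + eps_t with a random-walk beta_t."""
    rng = np.random.default_rng(seed)
    # Random-walk coefficient: cumulative sum of Gaussian increments
    beta = np.cumsum(rng.normal(0, np.sqrt(sigma2_eta), len(x)))
    y = x * beta + rng.normal(0, np.sqrt(sigma2_eps), len(x))
    return y, beta
```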

## PyFlux

### Local Level Model

We’ll first estimate a local level model on river discharge data from the river Nile:

```python
import numpy as np
import pyflux as pf
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

nile = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/Nile.csv')
nile.index = pd.to_datetime(nile['time'].values, format='%Y')
plt.figure(figsize=(15,5))
plt.plot(nile.index, nile['Nile'])
plt.ylabel('Discharge Volume')
plt.title('Nile River Discharge')
plt.show()
```


We can fit a local level model using the LLEV command:

```python
model = pf.LLEV(data=nile, target='Nile')
x = model.fit()
x.summary()
```

```
LLEV
======================================================= =================================================
Dependent Variable: Nile                                Method: MLE
Start Date: 1871-01-01 00:00:00                         Log Likelihood: -641.5238
End Date: 1970-01-01 00:00:00                           AIC: 1287.0476
Number of observations: 100                             BIC: 1292.258
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Sigma^2 irregular                        15098.5722
Sigma^2 level                            1469.11317
=========================================================================================================
```


Let’s have a look at the smoothed fit – the model adapts to the lower level at the beginning of the 20th century:

```python
model.plot_fit(figsize=(15,10))
```


We can use the Durbin and Koopman simulation smoother to simulate draws from the local level state, using simulation_smoother:

```python
plt.figure(figsize=(15,5))
for i in range(10):
    plt.plot(model.index, model.simulation_smoother(
        model.latent_variables.get_z_values())[0][0:model.index.shape[0]])
plt.show()
```


We can perform rolling in-sample prediction using plot_predict_is, which re-estimates the model at each step using only the information available up to that point:

```python
model.plot_predict_is(h=20, figsize=(15,5))
```
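Conceptually, rolling in-sample prediction holds out the last h points and forecasts each one using only earlier data. A model-agnostic sketch, where `forecast_fn` is a hypothetical stand-in for any one-step forecaster:

```python
import numpy as np

def rolling_one_step(y, h, forecast_fn):
    """Forecast each of the last h observations using only the data before it."""
    n = len(y)
    preds = np.empty(h)
    for i, t in enumerate(range(n - h, n)):
        preds[i] = forecast_fn(y[:t])  # re-fit/forecast on y[0..t-1] only
    return preds
```

For a local level model the one-step forecast is the filtered level; a crude random-walk stand-in is `forecast_fn=lambda hist: hist[-1]`.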


We can view out-of-sample predictions using plot_predict:

```python
model.plot_predict(h=5, figsize=(15,5))
```

And output the predictions in DataFrame format using predict. Since the level follows a random walk, the h-step-ahead forecast is flat at the last estimated level:

```python
model.predict(h=5)
```

```
                  Nile
1970-12-31  798.368959
1971-12-31  798.368959
1972-12-30  798.368959
1973-12-31  798.368959
1974-12-31  798.368959
```

### Local Linear Trend Model

For this example, we’ll use data on logged US Real GDP.

```python
growthdata = pd.read_csv('http://www.pyflux.com/notebooks/GDPC1.csv')
USgrowth = pd.DataFrame(np.log(growthdata['VALUE']))
USgrowth.index = pd.to_datetime(growthdata['DATE'])
USgrowth.columns = ['Logged US Real GDP']
plt.figure(figsize=(15,5))
plt.plot(USgrowth.index, USgrowth)
plt.ylabel('Real GDP')
plt.title('US Real GDP')
plt.show()
```


We can fit a local linear trend model using the LLT command:

```python
model = pf.LLT(data=USgrowth)
x = model.fit()
x.summary()
```

```
LLT
======================================================= =================================================
Dependent Variable: Logged US Real GDP                  Method: MLE
Start Date: 1947-01-01 00:00:00                         Log Likelihood: 877.3334
End Date: 2015-10-01 00:00:00                           AIC: -1748.6667
Number of observations: 276                             BIC: -1737.8055
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Sigma^2 irregular                        3.306e-07
Sigma^2 level                            5.41832e-0
Sigma^2 trend                            1.55224e-0
=========================================================================================================
```


We can plot the fit of the model and view the various components using plot_fit:

```python
model.plot_fit(figsize=(15,10))
```


The trend picks out the underlying local growth rate in the series. Together with the local level this constitutes the smoothed series. We can use the simulation smoother to simulate draws from the states, using simulation_smoother. We draw from the local trend state:

```python
plt.figure(figsize=(15,5))
for i in range(10):
    plt.plot(model.index, model.simulation_smoother(
        model.latent_variables.get_z_values())[1][0:model.index.shape[0]])
plt.show()
```


We can perform rolling in-sample prediction using plot_predict_is (or just predict_is if we are after a DataFrame):

```python
model.plot_predict_is(h=15, figsize=(15,5))
```


Let’s predict the next 5 years of growth with this model using plot_predict:

```python
model.plot_predict(h=20, past_values=17*4, figsize=(15,6))
```


Let’s get these predictions in DataFrame form using predict so we can convert into growth rates:

```python
model.predict(h=20)
```

```
            Logged US Real GDP
DATE
2016-01-15            9.719219
2016-04-14            9.724158
2016-07-14            9.729097
2016-10-14            9.734036
2017-01-14            9.738975
2017-04-15            9.743914
2017-07-15            9.748853
2017-10-15            9.753792
2018-01-15            9.758731
2018-04-15            9.763670
2018-07-15            9.768609
2018-10-15            9.773548
2019-01-15            9.778487
2019-04-15            9.783426
2019-07-15            9.788365
2019-10-15            9.793304
2020-01-15            9.798243
2020-04-14            9.803182
2020-07-14            9.808121
2020-10-14            9.813060
```

```python
predictions = model.predict(h=20)['Logged US Real GDP'].values
growth_rates = np.exp(np.diff(predictions)) - 1
print("Average growth rate for this period is")
print(str(round(100*np.mean(growth_rates), 3)) + "%")
```

```
Average growth rate for this period is
0.495%
```


### Dynamic Linear Regression

In constructing portfolios, we are often after the $\beta$ of a stock which can be used to construct the systematic component of returns. But this may not be a static quantity. If we assume that returns are normally-distributed then we can use a dynamic linear regression model using the Kalman filter and smoothing algorithm to track its evolution. First let’s get some data on excess returns. We’ll look at Amazon stock (AMZN) and use the S&P500 as ‘the market’.

```python
# pandas.io.data has been removed from pandas; the same DataReader API
# now lives in the separate pandas-datareader package
from pandas_datareader.data import DataReader
from datetime import datetime

a = DataReader('AMZN', 'yahoo', datetime(2012,1,1), datetime(2016,6,1))
a_returns = pd.DataFrame(np.diff(np.log(a['Adj Close'].values)))
a_returns.index = a.index.values[1:a.index.values.shape[0]]
a_returns.columns = ["Amazon Returns"]

spy = DataReader('SPY', 'yahoo', datetime(2012,1,1), datetime(2016,6,1))
spy_returns = pd.DataFrame(np.diff(np.log(spy['Adj Close'].values)))
spy_returns.index = spy.index.values[1:spy.index.values.shape[0]]
spy_returns.columns = ['S&P500 Returns']

one_mon = DataReader('DGS1MO', 'fred', datetime(2012,1,1), datetime(2016,6,1))
one_day = np.log(1 + one_mon)/365

returns = pd.concat([one_day, a_returns, spy_returns], axis=1).dropna()
excess_m = returns["Amazon Returns"].values - returns['DGS1MO'].values
excess_spy = returns["S&P500 Returns"].values - returns['DGS1MO'].values
final_returns = pd.DataFrame(np.transpose([excess_m, excess_spy, returns['DGS1MO'].values]))
final_returns.columns = ["Amazon", "SP500", "Risk-free rate"]
final_returns.index = returns.index

plt.figure(figsize=(15,5))
plt.title("Excess Returns")
x = plt.plot(final_returns)
plt.legend(iter(x), final_returns.columns)
```


Now let’s perform a dynamic linear regression using the DynReg command:

```python
model3 = pf.DynReg('Amazon ~ SP500', data=final_returns)
model3.X_names = ['alpha','beta_{AMZN}']
x = model3.fit()
x.summary()
```

```
Dynamic Linear Regression
======================================================= =================================================
Dependent Variable: Amazon                              Method: MLE
Start Date: 2012-01-04 00:00:00                         Log Likelihood: 2871.5419
End Date: 2016-06-01 00:00:00                           AIC: -5737.0838
Number of observations: 1101                            BIC: -5722.0719
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Sigma^2 irregular                        0.0003
Sigma^2 1                                0.0
Sigma^2 SP500                            0.0024
=========================================================================================================
```


Let’s view the plot of the fit and the various components:

```python
model3.plot_fit(figsize=(15,15))
```


The third plot shows $\beta_{AMZN}$. Following the burn-in period, the $\beta$ hovered just above $1$ in 2013, then moved strongly with market performance in 2014. More recently it has settled back to hover just above $1$. The fourth plot shows the remaining residual component of returns (not including $\alpha$).