## Theory

We are interested in problems of the form:

$p\left(y_{t}\mid{\theta}_{t}\right)$

$\theta_{t} = f\left(\alpha_{t}\right)$

$\alpha_{t} = \alpha_{t-1} + \eta_{t}$

$\eta_{t} \sim N\left(0,\Sigma\right)$

The state equation stays in the same Markovian form as the Gaussian state space model set-up. What differs is that we now have a potentially non-Gaussian measurement density. How do we estimate states in this setup? There are a number of approaches, including:

• MCMC

• Monte Carlo Maximum Likelihood (with importance sampling)

• Particle Filtering (Sequential Monte Carlo)

• Variational Inference

It is worth noting that most of these approaches only return smoothed estimates not filtered estimates, but this is often not a problem for the kind of problems we have to tackle. Currently PyFlux uses mean-field BBVI; the advantage of this approach is that it is quick; the disadvantage is that it does not fully account for the dependence between states. A future update will have a structured field alternative. To speed up BBVI, we use starting values from an approximate Gaussian distribution. See Durbin and Koopman (1997) for details.

## PyFlux

### Measurement Density Options

import pandas as pd
supported = pd.DataFrame(['.Poisson','.t','.Skewt','.Laplace','.Exponential'])
supported.columns = ['Density suffix']
supported.index  = ['Poisson','t','Skew t','Laplace','Exponential']
supported

Density suffix
Poisson .Poisson
t .t
Skew t .Skewt
Laplace .Laplace
Exponential .Exponential

### Poisson Measurement Density

#### Local Level Model

For fun, and since it’s topical, we’ll apply a Poisson local level model to count data on the number of goals the football team Leicester have scored since they rejoined the Premier League. Each index represents a match they have played.

import numpy as np
import pyflux as pf
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

leicester.columns= ["Time","Goals","Season2"]
plt.figure(figsize=(15,5))
plt.plot(leicester["Goals"])
plt.ylabel('Goals Scored')
plt.title('Leicester Goals Since Joining EPL');
plt.show()


We can fit a Poisson local level model using the NLLEV command:

model = pf.NLLEV.Poisson(data=leicester, target='Goals')
x = model.fit(iterations=5000)
x.summary()

10% done : ELBO is -107.599165657
20% done : ELBO is -127.571498111
30% done : ELBO is -136.25857363
40% done : ELBO is -137.626516299
50% done : ELBO is -137.539662707
60% done : ELBO is -137.321490055
70% done : ELBO is -137.518451697
80% done : ELBO is -137.311382466
90% done : ELBO is -136.3580387
100% done : ELBO is -137.346927749

Final model ELBO is -135.76799195
Poisson Local Level Model
======================================================= =================================================
Dependent Variable: Goals                               Method: BBVI
Start Date: 0                                           Unnormalized Log Posterior: -56.8409
End Date: 74                                            AIC: 115.681720125
Number of observations: 75                              BIC: 117.999208239
=========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval
======================================== ================== ================== ==========================
Sigma^2 level                            0.0406             0.0406             (0.0353 | 0.0467)
=========================================================================================================


We can plot the evolution parameter with plot_z:

model.latent_variables.plot_z()


Let’s see what the extracted smoothed states look like with plot_fit:

model.plot_fit(figsize=(15,10))


We can predict forward using plot_predict:

model.plot_predict(h=5,figsize=(15,5))


Or get the results in DataFrame form through predict:

model.predict(h=5)

Goals
75 2.118221
76 2.118221
77 2.118221
78 2.118221
79 2.118221

#### Local Linear Trend Model

A local linear trend model can be estimated through NLLT:

model = pf.NLLT.Poisson(data=leicester,target='Goals')
x = model.fit(iterations=5000)
x.summary()

10% done : ELBO is -27837.965202
20% done : ELBO is -10667.1947315
30% done : ELBO is -5150.42573307
40% done : ELBO is -2567.54029949
50% done : ELBO is -1291.29282788
60% done : ELBO is -578.99494029
70% done : ELBO is -251.124996408
80% done : ELBO is -100.355592594
90% done : ELBO is -49.3752685727
100% done : ELBO is -13.9899801048

Final model ELBO is 46.2333499244
Poisson Local Linear Trend Model
======================================================= ================================================
Dependent Variable: Goals                               Method: BBVI
Start Date: 0                                           Unnormalized Log Posterior: 235.942
End Date: 74                                            AIC: -467.884097447
Number of observations: 75                              BIC: -463.24912122
========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval
======================================== ================== ================== =========================
Sigma^2 level                            0.1738             0.1739             (0.1539 | 0.197)
Sigma^2 trend                            0.0                0.0                (0.0 | 0.0)
========================================================================================================

model.plot_parameters([0])
model.plot_parameters([1])

model.plot_fit(figsize=(15,10))


#### Dynamic Regression

Requires a pandas DataFrame. For example:

leicester.head()

Time Goals Season2
0 1 2 0
1 2 0 0
2 3 1 0
3 4 1 0
4 5 5 0
model3 = pf.NDynReg.Poisson('Goals ~ Season2',data=leicester)
x = model3.fit(iterations=5000)
x.summary()

10% done : ELBO is 7.68437342113
20% done : ELBO is -37.2596159045
30% done : ELBO is -80.9049171942
40% done : ELBO is -115.349685828
50% done : ELBO is -136.781660657
60% done : ELBO is -150.194331799
70% done : ELBO is -155.130949598
80% done : ELBO is -156.705336514
90% done : ELBO is -157.660105928
100% done : ELBO is -157.938139032

Final model ELBO is -155.428497791
Poisson Dynamic Regression Model
======================================================= ================================================
Dependent Variable: Goals                               Method: BBVI
Start Date: 0                                           Unnormalized Log Posterior: 81.3366
End Date: 74                                            AIC: -158.673199832
Number of observations: 75                              BIC: -154.038223605
========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval
======================================== ================== ================== =========================
Sigma^2 1                                0.0313             0.0313             (0.0277 | 0.0354)
Sigma^2 Season2                          0.0047             0.0047             (0.0044 | 0.0051)
========================================================================================================

model3.plot_fit(figsize=(15,10),intervals=False)


### t-distribution measurement density

#### Local level model

We will look at returns for Amazon over the past year and filter the level. This will give us an idea of the underlying return signal in the stock. It should also be more robust than a normal-distributed solution (which would overreact to tail events and give a misleading impression of the underlying trend).

from pandas.io.data import DataReader
from datetime import datetime

a = DataReader('AMZN',  'yahoo', datetime(2015,6,1), datetime(2016,6,1))
a_returns.index = a.index.values[1:a.index.values.shape[0]]
a_returns.columns = ["Amazon Returns"]

spy = DataReader('SPY',  'yahoo', datetime(2015,6,1), datetime(2016,6,1))
spy_returns.index = spy.index.values[1:spy.index.values.shape[0]]
spy_returns.columns = ['S&P500 Returns']

one_day = np.log(1+one_mon)/365

returns = pd.concat([one_day,a_returns,spy_returns],axis=1).dropna()
excess_m = returns["Amazon Returns"].values - returns['DGS1MO'].values
excess_spy = returns["S&P500 Returns"].values - returns['DGS1MO'].values
final_returns = pd.DataFrame(np.transpose([excess_m,excess_spy, returns['DGS1MO'].values]))
final_returns.columns=["Amazon","SP500","Risk-free rate"]
final_returns.index = returns.index

plt.figure(figsize=(15,5))
plt.title("Excess Returns")
x = plt.plot(final_returns);
plt.legend(iter(x), final_returns.columns);

model2 = pf.NLLEV.t(data=final_returns,target='Amazon')
x = model2.fit(iterations=5000,start_diffuse=True)
x.summary()

10% done : ELBO is -148973.562165
20% done : ELBO is -159178.382816
30% done : ELBO is -100447.935917
40% done : ELBO is -61444.6477437
50% done : ELBO is -71745.2739519
60% done : ELBO is -92677.0988902
70% done : ELBO is -74587.6400881
80% done : ELBO is -31413.4770567
90% done : ELBO is -57717.0437659
100% done : ELBO is -55206.9771588

Final model ELBO is 2166.4457664
t-distributed Local Level Model
======================================================= =================================================
Dependent Variable: Amazon                              Method: BBVI
Start Date: 2015-06-02 00:00:00                         Unnormalized Log Posterior: 2537.1172
End Date: 2016-06-01 00:00:00                           AIC: -5068.23435883
Number of observations: 251                             BIC: -5057.65800001
=========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval
======================================== ================== ================== =========================
Sigma^2 level                            0.0                0.0                (0.0 | 0.0)
Scale                                    0.0135             0.0135             (0.0108 | 0.0167)
v                                        3.1679             3.1674             (2.9005 | 3.4553)
=========================================================================================================


We can check the model fit through plot_fit:

model2.plot_fit(figsize=((15,10)))


The signal is positive over the period. Let’s forecast returns with predict – since this is a local level model, this isn’t particularly interesting:

model2.predict(h=5)

Amazon
2016-05-30 0.008965
2016-05-31 0.008965
2016-06-01 0.008965
2016-06-05 0.008965
2016-06-06 0.008965