Theory

We are interested in problems of the form:

p\left(y_{t}\mid{\theta}_{t}\right)

\theta_{t} = f\left(\alpha_{t}\right)

\alpha_{t} =  \alpha_{t-1} + \eta_{t}

\eta_{t} \sim N\left(0,\Sigma\right)


The state equation stays in the same Markovian form as the Gaussian state space model set-up. What differs is that we now have a potentially non-Gaussian measurement density. How do we estimate states in this setup? There are a number of approaches, including:

  • MCMC

  • Monte Carlo Maximum Likelihood (with importance sampling)

  • Particle Filtering (Sequential Monte Carlo)

  • Variational Inference

It is worth noting that most of these approaches only return smoothed estimates not filtered estimates, but this is often not a problem for the kind of problems we have to tackle. Currently PyFlux uses mean-field BBVI; the advantage of this approach is that it is quick; the disadvantage is that it does not fully account for the dependence between states. A future update will have a structured field alternative. To speed up BBVI, we use starting values from an approximate Gaussian distribution. See Durbin and Koopman (1997) for details.


PyFlux

Measurement Density Options

import pandas as pd
supported = pd.DataFrame(['.Poisson','.t','.Skewt','.Laplace','.Exponential'])
supported.columns = ['Density suffix']
supported.index  = ['Poisson','t','Skew t','Laplace','Exponential']
supported
Density suffix
Poisson .Poisson
t .t
Skew t .Skewt
Laplace .Laplace
Exponential .Exponential

Poisson Measurement Density

Local Level Model

For fun, and since it’s topical, we’ll apply a Poisson local level model to count data on the number of goals the football team Leicester have scored since they rejoined the Premier League. Each index represents a match they have played.

import numpy as np
import pyflux as pf
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

leicester = pd.read_csv('http://www.pyflux.com/notebooks/leicester_goals_scored.csv')
leicester.columns= ["Time","Goals","Season2"]
plt.figure(figsize=(15,5))
plt.plot(leicester["Goals"])
plt.ylabel('Goals Scored')
plt.title('Leicester Goals Since Joining EPL');
plt.show()
png

We can fit a Poisson local level model using the NLLEV command:

model = pf.NLLEV.Poisson(data=leicester, target='Goals')
x = model.fit(iterations=5000)
x.summary()
10% done : ELBO is -107.599165657
20% done : ELBO is -127.571498111
30% done : ELBO is -136.25857363
40% done : ELBO is -137.626516299
50% done : ELBO is -137.539662707
60% done : ELBO is -137.321490055
70% done : ELBO is -137.518451697
80% done : ELBO is -137.311382466
90% done : ELBO is -136.3580387
100% done : ELBO is -137.346927749

Final model ELBO is -135.76799195
Poisson Local Level Model                                                                                 
======================================================= =================================================
Dependent Variable: Goals                               Method: BBVI                                      
Start Date: 0                                           Unnormalized Log Posterior: -56.8409              
End Date: 74                                            AIC: 115.681720125                                
Number of observations: 75                              BIC: 117.999208239                                
=========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval 
======================================== ================== ================== ==========================
Sigma^2 level                            0.0406             0.0406             (0.0353 | 0.0467)        
=========================================================================================================

We can plot the evolution parameter with plot_z:

model.latent_variables.plot_z()
png

Let’s see what the extracted smoothed states look like with plot_fit:

model.plot_fit(figsize=(15,10))
png
We can predict forward using plot_predict:

model.plot_predict(h=5,figsize=(15,5))
png
Or get the results in DataFrame form through predict:

model.predict(h=5)
Goals
75 2.118221
76 2.118221
77 2.118221
78 2.118221
79 2.118221


Local Linear Trend Model

A local linear trend model can be estimated through NLLT:

model = pf.NLLT.Poisson(data=leicester,target='Goals')
x = model.fit(iterations=5000)
x.summary()
10% done : ELBO is -27837.965202
20% done : ELBO is -10667.1947315
30% done : ELBO is -5150.42573307
40% done : ELBO is -2567.54029949
50% done : ELBO is -1291.29282788
60% done : ELBO is -578.99494029
70% done : ELBO is -251.124996408
80% done : ELBO is -100.355592594
90% done : ELBO is -49.3752685727
100% done : ELBO is -13.9899801048

Final model ELBO is 46.2333499244
Poisson Local Linear Trend Model                                                                          
======================================================= ================================================
Dependent Variable: Goals                               Method: BBVI                                      
Start Date: 0                                           Unnormalized Log Posterior: 235.942               
End Date: 74                                            AIC: -467.884097447                               
Number of observations: 75                              BIC: -463.24912122                                
========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval 
======================================== ================== ================== =========================
Sigma^2 level                            0.1738             0.1739             (0.1539 | 0.197)         
Sigma^2 trend                            0.0                0.0                (0.0 | 0.0)              
========================================================================================================
model.plot_parameters([0])
model.plot_parameters([1])
png png
model.plot_fit(figsize=(15,10))
png

Dynamic Regression

Requires a pandas DataFrame. For example:

leicester.head()
Time Goals Season2
0 1 2 0
1 2 0 0
2 3 1 0
3 4 1 0
4 5 5 0
model3 = pf.NDynReg.Poisson('Goals ~ Season2',data=leicester)
x = model3.fit(iterations=5000)
x.summary()
10% done : ELBO is 7.68437342113
20% done : ELBO is -37.2596159045
30% done : ELBO is -80.9049171942
40% done : ELBO is -115.349685828
50% done : ELBO is -136.781660657
60% done : ELBO is -150.194331799
70% done : ELBO is -155.130949598
80% done : ELBO is -156.705336514
90% done : ELBO is -157.660105928
100% done : ELBO is -157.938139032

Final model ELBO is -155.428497791
Poisson Dynamic Regression Model                                                                          
======================================================= ================================================
Dependent Variable: Goals                               Method: BBVI                                      
Start Date: 0                                           Unnormalized Log Posterior: 81.3366               
End Date: 74                                            AIC: -158.673199832                               
Number of observations: 75                              BIC: -154.038223605                               
========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval 
======================================== ================== ================== =========================
Sigma^2 1                                0.0313             0.0313             (0.0277 | 0.0354)        
Sigma^2 Season2                          0.0047             0.0047             (0.0044 | 0.0051)        
========================================================================================================
model3.plot_fit(figsize=(15,10),intervals=False)
png

t-distribution measurement density

Local level model

We will look at returns for Amazon over the past year and filter the level. This will give us an idea of the underlying return signal in the stock. It should also be more robust than a normal-distributed solution (which would overreact to tail events and give a misleading impression of the underlying trend).

from pandas.io.data import DataReader
from datetime import datetime

a = DataReader('AMZN',  'yahoo', datetime(2015,6,1), datetime(2016,6,1))
a_returns = pd.DataFrame(np.diff(np.log(a['Adj Close'].values)))
a_returns.index = a.index.values[1:a.index.values.shape[0]]
a_returns.columns = ["Amazon Returns"]

spy = DataReader('SPY',  'yahoo', datetime(2015,6,1), datetime(2016,6,1))
spy_returns = pd.DataFrame(np.diff(np.log(spy['Adj Close'].values)))
spy_returns.index = spy.index.values[1:spy.index.values.shape[0]]
spy_returns.columns = ['S&P500 Returns']

one_mon = DataReader('DGS1MO', 'fred',datetime(2015,6,1), datetime(2016,6,1))
one_day = np.log(1+one_mon)/365

returns = pd.concat([one_day,a_returns,spy_returns],axis=1).dropna()
excess_m = returns["Amazon Returns"].values - returns['DGS1MO'].values
excess_spy = returns["S&P500 Returns"].values - returns['DGS1MO'].values
final_returns = pd.DataFrame(np.transpose([excess_m,excess_spy, returns['DGS1MO'].values]))
final_returns.columns=["Amazon","SP500","Risk-free rate"]
final_returns.index = returns.index

plt.figure(figsize=(15,5))
plt.title("Excess Returns")
x = plt.plot(final_returns);
plt.legend(iter(x), final_returns.columns);
png
model2 = pf.NLLEV.t(data=final_returns,target='Amazon')
x = model2.fit(iterations=5000,start_diffuse=True)
x.summary()
10% done : ELBO is -148973.562165
20% done : ELBO is -159178.382816
30% done : ELBO is -100447.935917
40% done : ELBO is -61444.6477437
50% done : ELBO is -71745.2739519
60% done : ELBO is -92677.0988902
70% done : ELBO is -74587.6400881
80% done : ELBO is -31413.4770567
90% done : ELBO is -57717.0437659
100% done : ELBO is -55206.9771588

Final model ELBO is 2166.4457664
t-distributed Local Level Model                                                                           
======================================================= =================================================
Dependent Variable: Amazon                              Method: BBVI                                      
Start Date: 2015-06-02 00:00:00                         Unnormalized Log Posterior: 2537.1172             
End Date: 2016-06-01 00:00:00                           AIC: -5068.23435883                               
Number of observations: 251                             BIC: -5057.65800001                               
=========================================================================================================
Latent Variable                          Median             Mean               95% Credibility Interval 
======================================== ================== ================== =========================
Sigma^2 level                            0.0                0.0                (0.0 | 0.0)              
Scale                                    0.0135             0.0135             (0.0108 | 0.0167)        
v                                        3.1679             3.1674             (2.9005 | 3.4553)        
=========================================================================================================

We can check the model fit through plot_fit:

model2.plot_fit(figsize=((15,10)))
png

The signal is positive over the period. Let’s forecast returns with predict – since this is a local level model, this isn’t particularly interesting:

model2.predict(h=5)
Amazon
2016-05-30 0.008965
2016-05-31 0.008965
2016-06-01 0.008965
2016-06-05 0.008965
2016-06-06 0.008965