Theory


ARIMAX models extend the ARIMA model type by allowing for the inclusion of exogenous variables X:


\Delta^{D}y_{t} = \sum^{p}_{i=1}\phi_{i}\Delta^{D}y_{t-i} + \sum^{q}_{j=1}\theta_{j}\epsilon_{t-j} + \sum^{M}_{m=1}\beta_{m}{X_{m,t}} + \epsilon_{t}

\epsilon_{t} \sim N\left(0,\sigma^{2}\right)

PyFlux


We will combine ARIMA dynamics with intervention analysis for monthly UK driver death data. There are two interventions we are interested in: the 1974 oil crisis and the introduction of the seatbelt law in 1983. We will model the effects of these events as structural breaks.

import numpy as np
import pandas as pd
import pyflux as pf
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline 

data = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/MASS/drivers.csv")
data.index = data['time'];
data.loc[(data['time']>=1983.05), 'seat_belt'] = 1;
data.loc[(data['time']<1983.05), 'seat_belt'] = 0;
data.loc[(data['time']>=1974.00), 'oil_crisis'] = 1;
data.loc[(data['time']<1974.00), 'oil_crisis'] = 0;
plt.figure(figsize=(15,5));
plt.plot(data.index,data['drivers']);
plt.ylabel('Driver Deaths');
plt.title('Deaths of Car Drivers in Great Britain 1969-84');
plt.plot();
png

The breaks can be included via patsy notation. Below we estimate a point mass estimate z^{MLE} of the latent variables:

model = pf.ARIMAX(data=data,formula='drivers~1+seat_belt+oil_crisis',ar=1,ma=1)
x = model.fit()
x.summary()
ARIMAX(1,0,1)                                                                                             
======================================================= =================================================
Dependent Variable: drivers                             Method: MLE                                       
Start Date: 1969.08333333                               Log Likelihood: -1278.7616                        
End Date: 1984.91666667                                 AIC: 2569.5232                                    
Number of observations: 191                             BIC: 2589.0368                                    
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== =========================
AR(1)                                    0.5002     0.0933     5.3607   0.0      (0.3173 | 0.6831)        
MA(1)                                    0.1713     0.0991     1.7275   0.0841   (-0.023 | 0.3656)        
Beta 1                                   946.585    176.9439   5.3496   0.0      (599.7749 | 1293.3952)   
Beta seat_belt                           -57.8924   57.8211    -1.0012  0.3167   (-171.2217 | 55.437)     
Beta oil_crisis                          -151.673   44.119     -3.4378  0.0006   (-238.1462 | -65.1998)   
Sigma                                    195.6042                                                         
=========================================================================================================

We can plot the fit of the model with plot_fit:

model.plot_fit(figsize=(15,5))
png

To forecast forward we need exogenous variables for future dates. Since the interventions carry forward, we can just use a slice of the existing dataframe and use plot_predict:

model.plot_predict(h=10,oos_data=data.iloc[-12:],past_values=100,figsize=(15,5))
png