Theory


Maximum likelihood is the primary classical approach to statistical inference. Suppose we have time series data y and we model each datapoint with the same observation density p\left(y_{t}\mid{\theta}\right), where \theta denotes the parameters. If we also assume each datapoint y_{t} is independently distributed, then the combined i.i.d. assumption allows us to write the likelihood of the data as:

P\left(y\mid{\theta}\right) = L\left(\theta\mid{y}\right) = \prod^{T}_{t=1}p\left(y_{t}\mid{\theta}\right)
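
As a concrete illustration, if each observation were modelled with a Gaussian density and \theta = \left(\mu, \sigma^{2}\right), this product would be:

L\left(\theta\mid{y}\right) = \prod^{T}_{t=1}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(y_{t}-\mu\right)^{2}}{2\sigma^{2}}\right)

which multiplies together T terms that are each typically much smaller than one.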


This gives us a quantity to maximize with respect to \theta. In practice we maximize the log of this quantity instead: the product of many densities, each often much smaller than one, quickly underflows floating point precision, whereas the log turns the product into a numerically stable sum. Because the logarithm is monotonically increasing, it preserves the ordering of likelihood values, and hence the location of the maximum in \theta:

\arg \max_{\theta} \sum^{T}_{t=1}\log p\left(y_{t}\mid{\theta}\right)
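
To see why this matters numerically, the following sketch (an illustration of the point rather than PyFlux code, assuming a standard normal observation density) compares the raw likelihood product with the log-likelihood sum on a few thousand simulated points:

import numpy as np
from scipy.stats import norm

np.random.seed(0)
y = np.random.normal(size=4000)  # simulated i.i.d. data

# Raw likelihood: a product of thousands of values below one underflows to 0.0
likelihood = np.prod(norm.pdf(y, loc=0.0, scale=1.0))

# Log-likelihood: the same information as a numerically stable sum
log_likelihood = np.sum(norm.logpdf(y, loc=0.0, scale=1.0))

print(likelihood)      # 0.0 due to underflow
print(log_likelihood)  # a finite value, roughly -5.7e3 for this seed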


The value \theta^{MLE} that maximizes this quantity is the maximum likelihood estimate. From a Bayesian perspective, MLE can be viewed as a modal approximation to the posterior under uniform (flat) priors on the latent variables.
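
For models without a closed-form solution, the maximization is carried out numerically. As a minimal sketch of that idea (my own toy example, not PyFlux internals), the snippet below recovers the MLE of a Gaussian mean and scale by minimizing the negative log-likelihood with BFGS from scipy.optimize:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

np.random.seed(1)
y = np.random.normal(loc=0.5, scale=2.0, size=1000)

def neg_log_likelihood(theta):
    mu, log_sigma = theta  # optimize log(sigma) so that sigma stays positive
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method='BFGS')
mu_mle, sigma_mle = result.x[0], np.exp(result.x[1])
print(mu_mle, sigma_mle)  # close to the sample mean and (biased) sample standard deviation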


PyFlux


In PyFlux, the default optimization routine for maximum likelihood is BFGS from the scipy.optimize library. The ARIMA model class is a good example of where we can use MLE effectively. First we need some data:

import numpy as np
import pyflux as pf
from pandas_datareader.data import DataReader  # pandas.io.data has moved to the pandas_datareader package
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline

# Download Microsoft prices and plot the differenced log adjusted close (daily log returns)
msft = DataReader('MSFT', 'yahoo', datetime(2000, 1, 1), datetime(2016, 3, 10))
msft['Adj Close'] = np.log(msft['Adj Close'].values)
plt.figure(figsize=(15, 5))
plt.plot(msft.index[1:], np.diff(msft['Adj Close'].values))
plt.ylabel('Returns')
plt.title('Microsoft Returns')
plt.show()
[Plot: Microsoft Returns, differenced log Adj Close]

We can define a model and then use the MLE fit option for that model class:

model = pf.ARIMA(data=msft, ar=1, ma=1, integ=1, target='Adj Close')
x = model.fit("MLE")
x.summary()
ARIMA(1,1,1)                                                                                              
======================================================= =================================================
Dependent Variable: Differenced Adj Close               Method: MLE                                       
Start Date: 2000-01-04 00:00:00                         Log Likelihood: 10139.2764                        
End Date: 2016-03-10 00:00:00                           AIC: -20270.5529                                  
Number of observations: 4070                            BIC: -20245.3073                                  
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0001     0.0003     0.2356   0.8137   (-0.0005 | 0.0007)       
AR(1)                                    -0.0208    0.3736     -0.0557  0.9556   (-0.753 | 0.7114)        
MA(1)                                    -0.0208    0.3849     -0.0541  0.9568   (-0.7753 | 0.7337)       
Sigma                                    0.02                                                             
=========================================================================================================

The output gives us standard model and parameter summary results. If we run another model, this time an ARIMA(2,1,2), we can compare on the basis of information criteria such as AIC and BIC:

model2 = pf.ARIMA(data=msft, ar=2, ma=2, integ=1, target='Adj Close')
x2 = model2.fit("MLE")
x2.summary()
ARIMA(2,1,2)                                                                                              
======================================================= =================================================
Dependent Variable: Differenced Adj Close               Method: MLE                                       
Start Date: 2000-01-05 00:00:00                         Log Likelihood: 10141.3223                        
End Date: 2016-03-10 00:00:00                           AIC: -20270.6446                                  
Number of observations: 4069                            BIC: -20232.7777                                  
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0        0.0002     0.2718   0.7858   (-0.0003 | 0.0004)       
AR(1)                                    0.0016     0.6415     0.0025   0.998    (-1.2556 | 1.2589)       
AR(2)                                    0.4093     0.4265     0.9596   0.3372   (-0.4267 | 1.2453)       
MA(1)                                    -0.0421    0.6449     -0.0653  0.9479   (-1.3061 | 1.2219)       
MA(2)                                    -0.4455    0.4579     -0.9729  0.3306   (-1.3431 | 0.452)        
Sigma                                    0.02                                                             
=========================================================================================================
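
The information criteria in these summaries follow the usual definitions, AIC = 2k - 2\log L and BIC = k\log T - 2\log L, where k is the number of estimated latent variables and T the number of observations. A quick check against the reported figures (my own arithmetic, using the values from the tables above):

import numpy as np

# ARIMA(1,1,1): 4 latent variables (constant, AR(1), MA(1), sigma), T = 4070
print(2 * 4 - 2 * 10139.2764, 4 * np.log(4070) - 2 * 10139.2764)  # approx. -20270.55, -20245.31

# ARIMA(2,1,2): 6 latent variables, T = 4069
print(2 * 6 - 2 * 10141.3223, 6 * np.log(4069) - 2 * 10141.3223)  # approx. -20270.64, -20232.78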
Of these two models, the ARIMA(2,1,2) is (marginally) preferred if we evaluate according to the AIC criterion, while the BIC, which penalizes the additional parameters more heavily, favours the simpler ARIMA(1,1,1).
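
Once a model has been chosen, the PyFlux model object also exposes plotting and forecasting helpers. A brief usage sketch (method names as documented in PyFlux; the forecast horizon and figure size are arbitrary choices):

model2.plot_fit(figsize=(15, 5))            # fitted values against the differenced series
model2.plot_predict(h=20, figsize=(15, 5))  # forecast 20 steps ahead with prediction intervals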