## Theory

Maximum likelihood is the primary classical approach to statistical inference. Suppose we have some time series data $y$ and we describe each datapoint with the same observation density $p\left(y_{t}\mid{\theta}\right)$, where $\theta$ represents the parameterisation. If we also assume each datapoint $y_{t}$ is independently distributed, then the combined i.i.d. assumption allows us to write the likelihood of the data as:

$P\left(y\mid{\theta}\right) = L\left(\theta\mid{y}\right) = \prod^{T}_{t=1}p\left(y_{t}\mid{\theta}\right)$

This gives us a quantity to maximize with respect to $\theta$. In practice, however, we take the logarithm and maximize that instead: on a computer, multiplying together many probabilities below one quickly underflows floating-point precision, whereas the equivalent sum of logs stays comfortably within range. Because the logarithm is monotonically increasing, it preserves the ordering (and hence the maximizer) of the likelihood in its argument $\theta$:

$\arg \max_{\theta} \sum^{T}_{t=1}\log p\left(y_{t}\mid{\theta}\right)$
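The underflow motivation can be seen directly. A minimal sketch using NumPy: the raw product of a few thousand modest density values collapses to zero in double precision, while the equivalent sum of logs is perfectly stable.

```python
import numpy as np

# 2000 i.i.d. density evaluations of 0.1 each: the product is 1e-2000,
# far below the smallest positive float64 (~1e-308), so it underflows to 0.
densities = np.full(2000, 0.1)
raw_product = np.prod(densities)
log_likelihood = np.sum(np.log(densities))  # = 2000 * log(0.1), finite and exact
```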

The value $\theta^{MLE}$ that maximizes this quantity is the maximum likelihood estimate. Alternatively, from a Bayesian perspective, we can view the MLE as a modal approximation to a posterior with uniform priors on the latent variables.
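As a self-contained illustration, independent of any particular library, we can maximize a Gaussian log-likelihood numerically with SciPy. The data here are simulated and the variable names are made up for the example; optimizing over $\log\sigma$ rather than $\sigma$ is a common trick to keep the scale parameter positive without constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data: 1000 draws from a Gaussian with mean 2.0 and std 0.5.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=0.5, size=1000)

def neg_log_likelihood(theta, y):
    mu, log_sigma = theta              # optimize log(sigma) so sigma stays > 0
    sigma = np.exp(log_sigma)
    # Negative Gaussian log-likelihood, summed over all datapoints
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (y - mu)**2 / (2 * sigma**2))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]),
                  args=(y,), method="L-BFGS-B")
mu_mle, sigma_mle = result.x[0], np.exp(result.x[1])
```

The recovered `mu_mle` and `sigma_mle` land close to the true generating values of 2.0 and 0.5, up to sampling error.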

## PyFlux

In PyFlux, the default optimization routine for maximum likelihood is BFGS from SciPy's `scipy.optimize` module. The ARIMA model class is a good example of where we can use MLE effectively. First we need some data:

```python
import numpy as np
import pyflux as pf
from pandas_datareader.data import DataReader
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline

msft = DataReader('MSFT', 'yahoo', datetime(2000,1,1), datetime(2016,3,10))
plt.figure(figsize=(15,5))
plt.plot(np.diff(msft['Adj Close'].values))  # plot the differenced series
plt.ylabel('Returns')
plt.title('Microsoft Returns')
plt.show()
```


We can define a model and then use the MLE fit option for that model class:

```python
model = pf.ARIMA(data=msft, ar=1, ma=1, integ=1, target='Adj Close')
x = model.fit("MLE")
x.summary()
```

```
ARIMA(1,1,1)
======================================================= =================================================
Dependent Variable: Differenced Adj Close               Method: MLE
Start Date: 2000-01-04 00:00:00                         Log Likelihood: 10139.2764
End Date: 2016-03-10 00:00:00                           AIC: -20270.5529
Number of observations: 4070                            BIC: -20245.3073
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0001     0.0003     0.2356   0.8137   (-0.0005 | 0.0007)
AR(1)                                    -0.0208    0.3736     -0.0557  0.9556   (-0.753 | 0.7114)
MA(1)                                    -0.0208    0.3849     -0.0541  0.9568   (-0.7753 | 0.7337)
Sigma                                    0.02
=========================================================================================================
```
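The z-scores, p-values, and confidence intervals in the summary follow directly from each estimate and its standard error under a normal approximation. Reproducing the AR(1) row by hand (values copied from the printed summary, so small rounding differences are expected):

```python
import math

# AR(1) point estimate and standard error, as printed in the summary table
estimate, std_error = -0.0208, 0.3736

z = estimate / std_error                                      # z-score
# Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
# 95% confidence interval: estimate +/- 1.96 standard errors
ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
```

These reproduce the table's z of -0.0557, p-value of 0.9556, and interval of roughly (-0.753, 0.711).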


The output gives us standard model and parameter summary results. If we run another model, this time an ARIMA(2,1,2), we can compare on the basis of information criteria such as AIC and BIC:

```python
model2 = pf.ARIMA(data=msft, ar=2, ma=2, integ=1, target='Adj Close')
x2 = model2.fit("MLE")
x2.summary()
```

```
ARIMA(2,1,2)
======================================================= =================================================
Dependent Variable: Differenced Adj Close               Method: MLE
Start Date: 2000-01-05 00:00:00                         Log Likelihood: 10141.3223
End Date: 2016-03-10 00:00:00                           AIC: -20270.6446
Number of observations: 4069                            BIC: -20232.7777
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0        0.0002     0.2718   0.7858   (-0.0003 | 0.0004)
AR(1)                                    0.0016     0.6415     0.0025   0.998    (-1.2556 | 1.2589)
AR(2)                                    0.4093     0.4265     0.9596   0.3372   (-0.4267 | 1.2453)
MA(1)                                    -0.0421    0.6449     -0.0653  0.9479   (-1.3061 | 1.2219)
MA(2)                                    -0.4455    0.4579     -0.9729  0.3306   (-1.3431 | 0.452)
Sigma                                    0.02
=========================================================================================================
```

For these two models, we prefer the ARIMA(2,1,2) if we evaluate according to the AIC criterion, since its AIC of -20270.64 is (marginally) lower than the ARIMA(1,1,1)'s -20270.55. Note that BIC, which penalizes additional parameters more heavily, instead favours the simpler ARIMA(1,1,1).
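Both criteria can be reproduced from the reported log-likelihoods via $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k\log T - 2\log L$, where $k$ is the number of estimated latent variables (four for the ARIMA(1,1,1): constant, AR(1), MA(1), sigma; six for the ARIMA(2,1,2)) and $T$ is the number of observations:

```python
import math

def aic(k, log_lik):
    # Akaike information criterion: 2k - 2*log-likelihood
    return 2 * k - 2 * log_lik

def bic(k, log_lik, n):
    # Bayesian information criterion: k*log(n) - 2*log-likelihood
    return k * math.log(n) - 2 * log_lik

# Log-likelihoods and observation counts from the two summaries above
aic1 = aic(4, 10139.2764)         # ARIMA(1,1,1)
bic1 = bic(4, 10139.2764, 4070)
aic2 = aic(6, 10141.3223)         # ARIMA(2,1,2)
bic2 = bic(6, 10141.3223, 4069)
```

These recover the printed values: AIC narrowly prefers the richer model (`aic2 < aic1`), while BIC's stronger parameter penalty reverses the ranking (`bic1 < bic2`).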