## Theory

A penalized maximum likelihood (PML) estimate – also known as a maximum a posteriori (MAP) estimate – $z^{PML}$ is a modal approximation to the posterior. Specifically, we maximize the log posterior:

$z^{PML} = \arg \max_{z} \log p\left(z\mid{y}\right) = \arg \max_{z} \left[\log p\left(y\mid{z}\right) + \log p\left({z}\right)\right]$

where the evidence term $\log p\left(y\right)$ is dropped because it does not depend on $z$. When the prior $p\left(z\right)$ is uniform, the maximum likelihood estimate coincides with the PML estimate: $z^{MLE} = z^{PML}$. Otherwise, the prior is informative to some degree and pulls the PML estimate away from the MLE. PML is not a fully Bayesian approach and has a number of limitations:

• It only produces point estimates and does not provide a full picture of parameter uncertainty.

• It implicitly assumes a 0/1 loss function, whereas fully Bayesian methods that report the posterior mean or median are optimal under more general loss functions – squared loss for the mean, absolute loss for the median (see the sketch after this list).

• It may be a misleading estimate if the posterior is multimodal (and may even fail to find the highest mode due to poor initialization).
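
To make the loss-function point concrete, here is a minimal sketch using a right-skewed Gamma distribution purely as a stand-in posterior (an illustrative assumption, not output from any model): the mode, mean and median – the optima under 0/1, squared and absolute loss – can differ substantially.

from scipy import stats

# Stand-in posterior: a right-skewed Gamma(2, 1) distribution
posterior = stats.gamma(a=2.0, scale=1.0)

mode = (2.0 - 1.0) * 1.0     # Gamma mode = (shape - 1) * scale, here 1.0
mean = posterior.mean()      # squared-loss optimum, here 2.0
median = posterior.median()  # absolute-loss optimum, roughly 1.68

print(mode, mean, median)    # the modal (0/1-loss) estimate understates both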

Nevertheless, PML estimates can still be useful; for example, we can use $z^{PML}$ as a starting point for an MCMC algorithm to shorten the burn-in time.
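
Before turning to PyFlux, here is a minimal standalone sketch of the objective above for a Gaussian mean with a $N(0, \tau^2)$ prior (the data, $\tau$ and function names are illustrative assumptions): the informative prior pulls the PML estimate towards the prior mean of zero, while the MLE is simply the sample mean.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=20)    # simulated data with true mean 2.0
tau = 0.5                                      # informative prior standard deviation

def neg_log_posterior(mu):
    log_lik = -0.5 * np.sum((y - mu) ** 2)     # Gaussian log-likelihood (sigma = 1)
    log_prior = -0.5 * (mu / tau) ** 2         # N(0, tau^2) log-prior
    return -(log_lik + log_prior)

z_mle = y.mean()                               # MLE: the sample mean
z_pml = minimize_scalar(neg_log_posterior).x   # PML: maximizes the penalized objective

print(z_mle, z_pml)                            # z_pml is shrunk towards zero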

## PyFlux

PML estimation is straightforward in PyFlux. We start with some data:

import numpy as np
import pyflux as pf
from pandas_datareader.data import DataReader
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline

fb = DataReader('FB', 'yahoo', datetime(2000,1,1), datetime(2016,3,10))
plt.figure(figsize=(15,5))
plt.plot(np.diff(fb['Open'].values))
plt.ylabel('Returns')
plt.show()


Let’s define an ARIMA(1,1,1) model:

model = pf.ARIMA(data=fb, ar=1, ma=1, integ=1, target='Open')
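
Here integ=1 means the model is fit to the first difference of the series (writing $\Delta y_t = y_t - y_{t-1}$), so the ARIMA(1,1,1) specification is

$\Delta y_t = c + \phi_1 \Delta y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N\left(0, \sigma^2\right)$

where $c$, $\phi_1$, $\theta_1$ and $\sigma$ correspond to the Constant, AR(1), MA(1) and Sigma latent variables below.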


We can view our prior structure:

print(model.latent_variables)

Index    Latent Variable           Prior           Prior Latent Vars         V.I. Dist  Transform
======== ========================= =============== ========================= ========== ==========
0        Constant                  Normal          mu0: 0, sigma0: 3         Normal     None
1        AR(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None
2        MA(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None
3        Sigma                     Uniform         n/a (non-informative)     Normal     exp


Let’s place a non-informative prior over the constant:

model.latent_variables.adjust_prior(0, pf.Uniform())
print(model.latent_variables)

Index    Latent Variable           Prior           Prior Latent Vars         V.I. Dist  Transform
======== ========================= =============== ========================= ========== ==========
0        Constant                  Uniform         n/a (non-informative)     Normal     None
1        AR(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None
2        MA(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None
3        Sigma                     Uniform         n/a (non-informative)     Normal     exp


We can obtain $z^{PML}$ by using the PML fit option:

x = model.fit("PML")
x.summary()

ARIMA(1,1,1)
======================================================= =================================================
Dependent Variable: Differenced Open                    Method: PML
Start Date: 2012-05-21 00:00:00                         Unnormalized Log Posterior: -1743.2084
End Date: 2016-03-10 00:00:00                           AIC: 3494.4167
Number of observations: 956                             BIC: 3513.8677
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Constant                                 0.035      0.0202     1.7281   0.084    (-0.0047 | 0.0746)
AR(1)                                    0.5755     0.1488     3.8667   0.0001   (0.2838 | 0.8673)
MA(1)                                    -0.6467    0.1542     -4.1926  0.0      (-0.949 | -0.3443)
Sigma                                    1.4984
=========================================================================================================


Let’s compare that to the maximum likelihood estimate $z^{MLE}$:

x2 = model.fit("MLE")
x2.summary()

ARIMA(1,1,1)
======================================================= =================================================
Dependent Variable: Differenced Open                    Method: MLE
Start Date: 2012-05-21 00:00:00                         Log Likelihood: -1742.8856
End Date: 2016-03-10 00:00:00                           AIC: 3493.7712
Number of observations: 956                             BIC: 3513.2222
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0299     0.0152     1.9657   0.0493   (0.0001 | 0.0596)
AR(1)                                    0.6462     0.092      7.0246   0.0      (0.4659 | 0.8266)
MA(1)                                    -0.7193    0.093      -7.7307  0.0      (-0.9017 | -0.537)
Sigma                                    1.4981
=========================================================================================================


As we can see, our prior choice has shrunk the AR(1) and MA(1) estimates towards zero (the mean of the priors). What happens when we use non-informative priors for all our parameters?

model.latent_variables.adjust_prior(1, pf.Uniform())
model.latent_variables.adjust_prior(2, pf.Uniform())
x3 = model.fit("PML")
x3.summary()

ARIMA(1,1,1)
======================================================= =================================================
Dependent Variable: Differenced Open                    Method: PML
Start Date: 2012-05-21 00:00:00                         Unnormalized Log Posterior: -1742.8856
End Date: 2016-03-10 00:00:00                           AIC: 3493.7712
Number of observations: 956                             BIC: 3513.2222
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0299     0.0152     1.9657   0.0493   (0.0001 | 0.0596)
AR(1)                                    0.6462     0.092      7.0246   0.0      (0.4659 | 0.8266)
MA(1)                                    -0.7193    0.093      -7.7307  0.0      (-0.9017 | -0.537)
Sigma                                    1.4981
=========================================================================================================


As we can see, with non-informative priors over every latent variable the log prior term is constant, so $z^{PML} = z^{MLE}$.
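
As noted in the theory section, a PML estimate makes a natural starting point for MCMC. Below is a sketch of how a follow-up sampling run might look in PyFlux using its Metropolis-Hastings option; the nsims value here is an arbitrary illustrative choice:

x4 = model.fit("M-H", nsims=20000)   # Metropolis-Hastings sampling
x4.summary()
model.plot_z(figsize=(15,5))         # plot the latent variable posteriors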