Theory

A penalized maximum likelihood (PML) estimate z^{PML} – also known as a MAP estimate – is a modal approximation to the posterior. Specifically, we maximize the log posterior:

z^{PML} = \arg \max_{z} \log p\left(z\mid{y}\right) = \arg \max_{z} \left[ \log p\left(y\mid{z}\right) + \log p\left({z}\right) \right]

where the evidence term \log p\left(y\right) has been dropped because it is constant with respect to z.


When the prior p\left(z\right) is uniform (flat), the penalty contributes nothing and the PML estimate equals the maximum likelihood estimate: z^{PML} = z^{MLE}. Otherwise, the prior is informative to some degree and pulls the PML estimate away from the MLE. PML is not a fully Bayesian approach and has a number of limitations:

  • It only produces point estimates and does not provide a full picture of parameter uncertainty.

  • It implicitly corresponds to a 0/1 loss function, whereas fully Bayesian summaries such as the posterior mean and median are optimal under more commonly used loss functions (squared loss for the mean, absolute loss for the median).

  • It may be a misleading estimate if the posterior is multimodal (and may even fail to find the highest mode due to poor initialization).

Nevertheless, PML estimates can still be useful; for example, we can use z^{PML} as a starting point for an MCMC algorithm to shorten the burn-in time.
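
To make the objective concrete, here is a minimal sketch in plain NumPy/SciPy (not PyFlux) of a PML estimate for the mean of a normal distribution under a Normal(0, 1) prior; the toy data and the helper name neg_log_posterior are made up for illustration:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

np.random.seed(0)
y = np.random.normal(loc=2.0, scale=1.0, size=20)  # toy observations

prior_mu, prior_sigma = 0.0, 1.0  # Normal(0, 1) prior on the mean z

def neg_log_posterior(z):
    # -[log p(y|z) + log p(z)]; the evidence log p(y) is dropped as it is constant in z
    log_lik = norm.logpdf(y, loc=z, scale=1.0).sum()
    log_prior = norm.logpdf(z, loc=prior_mu, scale=prior_sigma)
    return -(log_lik + log_prior)

z_pml = minimize_scalar(neg_log_posterior).x
z_mle = y.mean()  # the MLE of a normal mean is the sample average
print(z_pml, z_mle)  # the prior shrinks z_pml towards 0 relative to z_mle

With a flat prior the log_prior term would be constant in z and the two estimates would coincide, which is exactly what the PyFlux example below demonstrates.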


PyFlux


PML estimation is straightforward in PyFlux. We start with some data:

import numpy as np
import pyflux as pf
from pandas.io.data import DataReader  # removed in newer pandas; use: from pandas_datareader.data import DataReader
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline 

fb = DataReader('FB', 'yahoo', datetime(2000,1,1), datetime(2016,3,10))
fb['Logged Adj Close'] = np.log(fb['Adj Close'].values)
plt.figure(figsize=(15,5))
plt.plot(fb.index[1:], np.diff(fb['Logged Adj Close'].values))
plt.ylabel('Returns')
plt.title('Facebook Returns')
plt.show()
[Plot: Facebook Returns (daily log returns)]

Let’s define an ARIMA(1,1,1) model:

model = pf.ARIMA(data=fb, ar=1, ma=1, integ=1)  # one AR lag, one MA lag, first-differenced data

We can view our prior structure:

print(model.latent_variables)
Index    Latent Variable           Prior           Prior Latent Vars         V.I. Dist  Transform 
======== ========================= =============== ========================= ========== ==========
0        Constant                  Normal          mu0: 0, sigma0: 3         Normal     None      
1        AR(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None      
2        MA(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None      
3        Sigma                     Uniform         n/a (non-informative)     Normal     exp       

Let’s place a non-informative prior over the constant:

model.latent_variables.adjust_prior(0,pf.Uniform())
print(model.latent_variables)
Index    Latent Variable           Prior           Prior Latent Vars         V.I. Dist  Transform 
======== ========================= =============== ========================= ========== ==========
0        Constant                  Uniform         n/a (non-informative)     Normal     None      
1        AR(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None      
2        MA(1)                     Normal          mu0: 0, sigma0: 0.5       Normal     None      
3        Sigma                     Uniform         n/a (non-informative)     Normal     exp       

We can obtain z^{PML} by using the PML fit option:

x = model.fit("PML")
x.summary()
ARIMA(1,1,1)                                                                                              
======================================================= =================================================
Dependent Variable: Differenced Open                    Method: PML                                       
Start Date: 2012-05-21 00:00:00                         Unnormalized Log Posterior: -1743.2084            
End Date: 2016-03-10 00:00:00                           AIC: 3494.4167                                    
Number of observations: 956                             BIC: 3513.8677                                    
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== ========================
Constant                                 0.035      0.0202     1.7281   0.084    (-0.0047 | 0.0746)       
AR(1)                                    0.5755     0.1488     3.8667   0.0001   (0.2838 | 0.8673)        
MA(1)                                    -0.6467    0.1542     -4.1926  0.0      (-0.949 | -0.3443)       
Sigma                                    1.4984                                                           
=========================================================================================================
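
We could also visualize these estimates. A quick sketch using PyFlux's plot_z method (the exact arguments are an assumption; check the PyFlux documentation):

# Plot each latent variable's estimate with an approximate confidence interval
model.plot_z(figsize=(15,5))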

Let’s compare that to the maximum likelihood estimate z^{MLE}:

x2 = model.fit("MLE")
x2.summary()
ARIMA(1,1,1)                                                                                              
======================================================= =================================================
Dependent Variable: Differenced Open                    Method: MLE                                       
Start Date: 2012-05-21 00:00:00                         Log Likelihood: -1742.8856                        
End Date: 2016-03-10 00:00:00                           AIC: 3493.7712                                    
Number of observations: 956                             BIC: 3513.2222                                    
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0299     0.0152     1.9657   0.0493   (0.0001 | 0.0596)        
AR(1)                                    0.6462     0.092      7.0246   0.0      (0.4659 | 0.8266)        
MA(1)                                    -0.7193    0.093      -7.7307  0.0      (-0.9017 | -0.537)       
Sigma                                    1.4981                                                           
=========================================================================================================

As we can see, our prior choice has shrunk the AR(1) and MA(1) estimates towards zero (the mean of the priors). What happens when we use non-informative priors for all our parameters?

model.latent_variables.adjust_prior(1,pf.Uniform())
model.latent_variables.adjust_prior(2,pf.Uniform())
x3 = model.fit("PML")
x3.summary()
ARIMA(1,1,1)                                                                                              
======================================================= =================================================
Dependent Variable: Differenced Open                    Method: PML                                       
Start Date: 2012-05-21 00:00:00                         Unnormalized Log Posterior: -1742.8856            
End Date: 2016-03-10 00:00:00                           AIC: 3493.7712                                    
Number of observations: 956                             BIC: 3513.2222                                    
=========================================================================================================
Latent Variable                          Estimate   Std Error  z        P>|z|    95% C.I.                 
======================================== ========== ========== ======== ======== ========================
Constant                                 0.0299     0.0152     1.9657   0.0493   (0.0001 | 0.0596)        
AR(1)                                    0.6462     0.092      7.0246   0.0      (0.4659 | 0.8266)        
MA(1)                                    -0.7193    0.093      -7.7307  0.0      (-0.9017 | -0.537)       
Sigma                                    1.4981                                                           
=========================================================================================================

As we can see, z^{PML} = z^{MLE}: with non-informative (flat) priors on every latent variable, the penalty term is constant and maximizing the log posterior is the same as maximizing the log likelihood.
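
Finally, connecting back to the theory section: a modal estimate like z^{PML} makes a natural warm start for full Bayesian inference. A minimal sketch, assuming the 'M-H' (Metropolis-Hastings) fit method and its nsims argument behave as described in the PyFlux documentation:

# Sample the full posterior rather than a point estimate; per the theory section,
# starting the chain near a mode helps shorten the burn-in period.
x4 = model.fit('M-H', nsims=20000)
x4.summary()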