PyFlux 0.2.0 Preview


The next release will likely be the biggest one so far, for a couple of reasons.

Refactoring

I have sought to make the code follow PEP guidelines more closely. Private methods and attributes are now clearly labelled.

Additionally, I have changed the method structure to include a few core methods:

  • predict – outputs a pd.DataFrame of predictions
  • plot_predict – plots predictions
  • predict_is – performs a dynamic one-step-ahead in-sample evaluation (each prediction is ‘out of sample’ with respect to the data used to fit it, but existing data is reused as a means of cross-validation)
  • plot_predict_is – plots the one-step-ahead predictions against realized data.

‘predict’ now makes it easier for users to feed the results of the models into their own applications, while ‘predict_is’ provides a means of comparing different model types.
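
As a rough illustration of how these methods fit together (the model choice, lag orders and fit arguments below are my own illustrative assumptions, not a prescribed workflow):

    import numpy as np
    import pandas as pd
    import pyflux as pf

    # Illustrative data: a random-walk series wrapped in a DataFrame
    np.random.seed(1)
    df = pd.DataFrame({'series': np.cumsum(np.random.randn(200))})

    # Constructor and fit arguments here (lag orders, target, 'MLE') are assumptions
    model = pf.ARIMA(data=df, ar=2, ma=2, target='series')
    model.fit('MLE')

    predictions = model.predict(h=5)      # pd.DataFrame of h-step-ahead predictions
    model.plot_predict(h=5)               # plots those predictions

    in_sample = model.predict_is(h=20)    # rolling one-step-ahead in-sample predictions
    model.plot_predict_is(h=20)           # plotted against the realized data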

Speed ups

  • GARCH – 1.52x
  • EGARCH – 1.37x
  • ARIMA – 1.45x
  • GAS – 1.46x
  • GPNARX – 1.58x

I have optimized the existing models, which has led to speed-ups in the range of 1.4-1.6x. This was done purely through code refactoring and strict use of optimized NumPy functions.
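
To give a flavour of the kind of change involved (a generic example, not PyFlux’s actual internals), replacing Python-level loops with calls into optimized NumPy routines typically yields exactly this sort of constant-factor speed-up:

    import numpy as np

    def squared_residuals_loop(y, mu):
        # Python-level loop: clear, but slow for long series
        out = np.empty(len(y))
        for t in range(len(y)):
            out[t] = (y[t] - mu[t]) ** 2
        return out

    def squared_residuals_vectorized(y, mu):
        # The same computation expressed with optimized NumPy functions
        return np.square(y - mu)

    y = np.random.randn(100000)
    mu = np.zeros_like(y)
    assert np.allclose(squared_residuals_loop(y, mu),
                       squared_residuals_vectorized(y, mu))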

Bug Fixes

Reviewing the code identified two bugs:

  • A bug with date_indices when outputting predictions – now fixed
  • A bug with GARCH predictions and the parameters chosen – now fixed

New Features

I have implemented and tested a Student-t GAS model. This works really well for financial data and will be available in the next release.
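
For readers unfamiliar with GAS models, the idea in its simplest form is a score-driven update. The recursion below is the textbook Student-t location filter, not PyFlux’s implementation, and the parameter names are purely illustrative:

    import numpy as np

    def t_gas_location_filter(y, omega, A, B, sigma, nu):
        """Score-driven (GAS) location filter with Student-t observations.

        mu[t+1] = omega + A * score_t + B * mu[t], where the score
        (nu + 1) * resid / (nu * sigma**2 + resid**2) down-weights large
        residuals, which is why Student-t GAS suits heavy-tailed financial data.
        Assumes |B| < 1 so the unconditional level is well defined.
        """
        mu = np.zeros(len(y) + 1)
        mu[0] = omega / (1.0 - B)          # start at the implied unconditional level
        for t in range(len(y)):
            resid = y[t] - mu[t]
            score = (nu + 1.0) * resid / (nu * sigma ** 2 + resid ** 2)
            mu[t + 1] = omega + A * score + B * mu[t]
        return mu[:-1]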

I have also implemented smoothed predictions for the state space model options, as well as the ability to use Koopman’s simulation smoother to simulate samples from estimated models.
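
One standard algorithm for this kind of sampling is the Durbin and Koopman (2002) mean-correction simulation smoother. The sketch below is a generic version assuming hypothetical simulate and kalman_smooth helpers, not PyFlux’s code:

    import numpy as np

    def simulation_smoother_draw(y, params, simulate, kalman_smooth, rng):
        """One draw from p(states | y) via the mean-correction simulation smoother.

        `simulate(params, n, rng)` and `kalman_smooth(y, params)` are
        hypothetical helpers: the first draws (states, observations) from the
        model unconditionally, the second returns smoothed state means.
        """
        n = len(y)
        alpha_plus, y_plus = simulate(params, n, rng)    # unconditional simulation
        alpha_hat = kalman_smooth(y, params)             # E[states | actual data]
        alpha_hat_plus = kalman_smooth(y_plus, params)   # E[states | simulated data]
        # Mean correction: combine the three pieces into a conditional draw
        return alpha_hat + (alpha_plus - alpha_hat_plus)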

The main new model class will be non-linear state space models. I need to think of an efficient way to implement this with OOP, but it is something I will work on in the evenings next week.
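
Purely as an illustration of one possible OOP shape (not a design PyFlux has committed to), the non-linear pieces could be isolated behind an abstract base class that concrete models fill in:

    from abc import ABC, abstractmethod

    class NonLinearStateSpace(ABC):
        """Hypothetical base class: subclasses supply the non-linear equations."""

        @abstractmethod
        def transition(self, state, t):
            """State equation: mean of state_{t+1} given state_t."""

        @abstractmethod
        def measurement(self, state, t):
            """Observation equation: mean of y_t given state_t."""

    class LogisticGrowthSSM(NonLinearStateSpace):
        # Toy example: logistic state dynamics with an identity measurement
        def __init__(self, rate):
            self.rate = rate

        def transition(self, state, t):
            return state + self.rate * state * (1.0 - state)

        def measurement(self, state, t):
            return state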

Things that might be worked on in this release

VARs currently only work well with OLS estimation and are unreliable with other methods. The problem is threefold:

  • Specifying the covariance matrix so it is non-singular – a Cholesky decomposition is the foolproof way. I just need to think of a good way to integrate this into the current setup where a user can enter priors for *each* element of the covariance matrix.
  • Slow matrix creation – there is likely a better way to compute some of the covariance matrices using inbuilt NumPy functions. Need to dedicate a bit of time to think about this.
  • The loglikelihood function – I simply need to implement a custom multivariate loglikelihood function to deliver significant speed-ups. This is actually where the bulk of the speed issues lie, and it should be straightforward to solve (a rough sketch of this and the Cholesky idea follows this list).
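
Here is a generic sketch of both ideas (illustrative NumPy code, not PyFlux’s): an unconstrained Cholesky parameterization that always yields a valid covariance, and a multivariate normal loglikelihood computed directly from that factor, avoiding an explicit matrix inverse:

    import numpy as np

    def chol_from_unconstrained(params, k):
        """Map an unconstrained vector of length k*(k+1)/2 to a lower-triangular
        Cholesky factor L with a positive diagonal, so L @ L.T is always a
        valid (non-singular) covariance matrix."""
        L = np.zeros((k, k))
        L[np.tril_indices(k)] = params
        L[np.diag_indices(k)] = np.exp(np.diag(L))   # exponentiate the diagonal
        return L

    def mvn_loglik_from_chol(resids, L):
        """Multivariate normal loglikelihood of residuals (T x k) given the
        Cholesky factor L of the covariance."""
        T, k = resids.shape
        z = np.linalg.solve(L, resids.T)             # solve L z = resid for every row
        quad = np.sum(z * z)                         # sum of quadratic forms
        logdet = 2.0 * np.sum(np.log(np.diag(L)))    # log|Sigma| from the factor
        return -0.5 * (T * k * np.log(2.0 * np.pi) + T * logdet + quad)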

Summary

In my mind, with this release, the core alpha version is complete. There are of course more new model types I would like to implement, but this represents the core of what I’d want from a time series library. Future releases will build on this core, providing new variants of existing models and new options for manipulating and evaluating them.

It is also worth noting that, now that the core is complete, I am going to dedicate more of my free time to other projects. So, to manage expectations: expect the pace of updates to be slower after this release. As I said in my previous post, I am also going to advertise the library more broadly, which will hopefully open up the project to more developers who can help drive it forward.

Thanks and stay tuned for the next release.

 
