There are many great implementations of **unit-root** tests in statistical packages such as R and SAS. For various reasons I need to implement this in Java. Probably the best known test of this sort is the Augmented Dickey-Fuller test. It is most often used to determine whether a process is non-stationary, or to reject that in favor of stationarity.

Loosely speaking, to say a process is stationary implies that all of its moments are largely time invariant. So for example the mean and variance of a stationary process would be (largely) constant across time.

We know that AR(1) processes with AR coefficient α = 1 have unbounded variance that grows with time. This can be seen by evaluating its variance:
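A sketch of the standard derivation, for the random-walk case y_t = y_{t−1} + ε_t with y_0 = 0 and i.i.d. noise of variance σ²:

```latex
y_t = y_{t-1} + \varepsilon_t = y_0 + \sum_{i=1}^{t}\varepsilon_i
\qquad\Rightarrow\qquad
\mathrm{Var}(y_t) = \sum_{i=1}^{t}\mathrm{Var}(\varepsilon_i) = t\,\sigma^2
```

which grows without bound as t increases.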

The ADF test therefore tests the hypothesis that a given series, expressed as an AR(p) process, has a coefficient of 1 (i.e. a unit root). The ADF expands upon the original Dickey-Fuller test to allow for additional AR lags:
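In its simplest form (no drift or trend terms) the ADF regression can be written as below; this is the standard formulation, with β = α − 1 relative to the AR coefficient α, though the exact specification may include constant and trend terms:

```latex
\Delta y_t = \beta\, y_{t-1} + \sum_{i=1}^{p}\gamma_i\,\Delta y_{t-i} + \varepsilon_t
```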

The null hypothesis is that β = 0 (non-stationary process) and the alternative is β < 0 (the process may be stationary). Note that β = 0 in this differenced equation is equivalent to α = 1 in the non-differenced AR(1) equation.

Test statistics (i.e. the scaled difference between the observed and null-hypothesis values) in hypothesis testing often follow a Student's t distribution. In the case of the Dickey-Fuller test the distribution is not Student's t, which presents a problem. To calculate the p-value for the ADF hypothesis, one must refer to special Dickey-Fuller tables, as there is no known closed-form method to calculate it (that I am aware of).

The p-value, we know, is the probability of obtaining a result at least as extreme (far from the hypothesis value) as the one obtained in our sample. Basically this involves calculating:
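For a left-tailed test such as the Dickey-Fuller, this is:

```latex
p = P\!\left(T \le t_{\mathrm{obs}} \,\middle|\, H_0 : \beta = 0\right)
```

where T follows the Dickey-Fuller distribution and t_obs is the computed test statistic.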

Then to determine the p-value for the ADF, we need to determine the distribution of values around hypothesis β = 0. The tables provide precalculated mappings between test values and the cumulative probability.

**Problem**

The Dickey-Fuller tables are sparse, particularly at the tails. For my problem I was interested in the relative “stationarity” of many different series. For this, I needed to calculate the Dickey-Fuller pdf on my own with a Monte Carlo simulation.

Here is one configuration of the pdf for the Dickey-Fuller hypothesis (produced from a Monte Carlo simulation):
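A minimal sketch of what such a simulation might look like in Java (class and method names are mine, not from the post's library). It simulates random walks and collects the DF t-statistic from the no-lag, no-constant regression Δy_t = β·y_{t−1} + ε_t:

```java
import java.util.Arrays;
import java.util.Random;

/** Monte Carlo sketch of the Dickey-Fuller t-statistic distribution
 *  (no-constant, no-lag case). Names are illustrative only. */
public class DickeyFullerMC {

    /** t-statistic of beta in the regression dy[t] = beta * y[t-1] + e[t]. */
    static double dfStatistic(double[] y) {
        int n = y.length;
        double sxy = 0, sxx = 0;
        for (int t = 1; t < n; t++) {
            double dy = y[t] - y[t - 1];
            sxy += y[t - 1] * dy;
            sxx += y[t - 1] * y[t - 1];
        }
        double beta = sxy / sxx;               // OLS slope, no intercept
        double sse = 0;
        for (int t = 1; t < n; t++) {
            double e = (y[t] - y[t - 1]) - beta * y[t - 1];
            sse += e * e;
        }
        double s2 = sse / (n - 2);             // residual variance
        return beta / Math.sqrt(s2 / sxx);     // beta / se(beta)
    }

    /** Simulate `sims` random walks of length n; return sorted DF stats. */
    static double[] simulate(int sims, int n, long seed) {
        Random rng = new Random(seed);
        double[] stats = new double[sims];
        for (int s = 0; s < sims; s++) {
            double[] y = new double[n];
            for (int t = 1; t < n; t++)
                y[t] = y[t - 1] + rng.nextGaussian();  // unit-root process
            stats[s] = dfStatistic(y);
        }
        Arrays.sort(stats);
        return stats;
    }

    public static void main(String[] args) {
        double[] stats = simulate(20000, 100, 42L);
        // empirical 1% / 5% / 10% quantiles of the simulated distribution
        System.out.printf("1%%: %.2f  5%%: %.2f  10%%: %.2f%n",
            stats[200], stats[1000], stats[2000]);
    }
}
```

The sorted statistics give the empirical cdf directly; a kernel density over them yields the pdf shown above.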

Of course a Monte Carlo simulation is not practical on each evaluation of the ADF test. The results of the MC simulation generate a high-resolution density function as a series of (x, y) points. I could generate distributions for a number of configurations and store a very large 3D table of function values, but this is impractical and unnecessary.

I started thinking about an approximating function for the distribution. Aside from finding a functional form for the distribution through trial and error, a general automated approach is to use splines. A natural spline will pass through every knot and is piecewise continuous.

Splines, however, do not reduce the dimensionality of the problem. A natural spline requires the complete set of knot points to reconstitute the function. I then started thinking about knot reduction techniques. With knot reduction the idea is: what is the minimum set of knots required to reproduce the target function within some (small) error range. So if my function currently has 2000 (x,y) points, perhaps I could reduce it to ~10 points?

There have been many papers written on this topic, often involving very complex algorithms. As I wanted to keep this simple, I decided to come up with a “poor man’s algorithm” for knot reduction.

**Algorithm**

The algorithm starts with a spline consisting of the 2 end-points and gradually adds knots at specific positions, iteratively reducing the error between the “reduced spline” and the target function. New knots are chosen at the point of maximum distance from the target function.

The algorithm is not guaranteed to be optimal in that it may use more knot points than the smallest possible reproducing set. However, it is very fast and often close to optimal, depending on the function. The algorithm is as follows:

- start with a spline consisting of 2 knots (the start and end points)
- loop while R^2 < (1 – eps):
  - add a knot at the point with maximum distance from the target function
  - compute the new spline and its R^2 fit metric
- at the end of the loop, the set of knots matches the function within epsilon
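The loop above can be sketched as follows. For brevity this sketch uses piecewise-linear interpolation between knots rather than the Hermite cubic spline of the post, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Greedy knot-reduction sketch. Uses piecewise-linear interpolation
 *  between knots for brevity; the post's version uses a cubic spline. */
public class KnotReducer {

    /** Returns the indices (into x) of the reduced knot set. */
    static List<Integer> reduce(double[] x, double[] y, double eps) {
        TreeSet<Integer> knots = new TreeSet<>();
        knots.add(0);                       // start with the two end-points
        knots.add(x.length - 1);
        while (rSquared(x, y, knots) < 1.0 - eps) {
            int worst = -1;
            double worstErr = -1.0;
            for (int i = 0; i < x.length; i++) {  // point farthest from fit
                double e = Math.abs(y[i] - eval(x, y, knots, x[i]));
                if (e > worstErr) { worstErr = e; worst = i; }
            }
            if (!knots.add(worst)) break;   // no new knot -> cannot improve
        }
        return new ArrayList<>(knots);
    }

    /** Piecewise-linear evaluation through the current knot subset. */
    static double eval(double[] x, double[] y, TreeSet<Integer> knots, double xq) {
        int prev = knots.first();
        for (int k : knots) {
            if (x[k] >= xq && k != prev) {  // found the bracketing segment
                double t = (xq - x[prev]) / (x[k] - x[prev]);
                return y[prev] + t * (y[k] - y[prev]);
            }
            prev = k;
        }
        return y[prev];                     // xq at or beyond the last knot
    }

    /** R^2 of the reduced-knot fit against the full sampled function. */
    static double rSquared(double[] x, double[] y, TreeSet<Integer> knots) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < y.length; i++) {
            double f = eval(x, y, knots, x[i]);
            ssRes += (y[i] - f) * (y[i] - f);
            ssTot += (y[i] - mean) * (y[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```

Swapping `eval` for a cubic-spline evaluator recovers the post's version; the greedy loop itself is unchanged.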

The effectiveness of the approach can be enhanced by using a better basis function than the Hermite cubic and/or by adjusting tensions in addition to knots.

Here is an example of the algorithm run to epsilon = 1e-8. I’ve slowed this down to 1 iteration per second so you can see how it works. In practice this takes a few microseconds:

It’s relatively straightforward to integrate R with Java through JRI, in which case you can use all the unit-root and cointegration tests that exist there.

Yes, indeed I do use rJava extensively. I tend to use it in the R-to-Java direction though, as opposed to Java calling R. I could be wrong, but I believe rJava does not allow a call sequence like R -> Java -> R (I mean running a Java method with .jcall and that method in turn calling an R function). I would like to be able to run all model exploration from within R. I believe that to use Java -> R, the Java process needs to be standalone. I’d love to be mistaken on this issue.

For this reason I tend to write stuff that needs horsepower (such as monte-carlo evaluation, particle filters, etc.) in Java and often call from R.

If you know of a better way, that would be cool.

Another issue, BTW, is that the resolution of the p-value in the ADF implementation in tseries is poor. So if I want to distinguish between, say, 0.025 and 0.01 or 0.001, it does poorly. This could be addressed by increasing the resolution of the table in the function.

You’re right. There is no R -> Java -> R functionality implemented there, although it’s certainly possible with a combination of rJava and JRI.

There are other ADF implementations in R beyond tseries (see urca and fUnitRoots), although I’m not sure how they compare.

yes. I’ve sometimes thought of hacking rJava to allow it. I’m not sure if there is an issue with recurrent entry into the R evaluation or why it was left out.

Yes, aware of the packages, thanks. I’ve used them a few times in the past.

For better or worse, I thought the ADF is something I would like in my JVM toolset.

Short of an R compiler, I’ve been thinking of writing something to allow on-the-fly specification / evaluation of Java, Jython, or other JVM-language code within R. Basically it would initially send the code over for compilation and then load it into the JVM for evaluation … I could then write something complex that needs performance, inlined in R code.

It is not all that hard to do, just requires some spare time.

R/Java/JNI/rJava …

I do all my backtesting/coding in java as well. Would you share your Dickey Fuller Java test?

Great read — would you mind sharing your Dickey Fuller implementation?

Great blog, with some really insightful articles. I am currently looking at implementing some cointegration tests for which R is too slow. Were you able to separate the above knot-reduction code from your main libraries so you could share it?

if so i’d be very interested to take a look at what you did.

btw, a bit off-topic, but I know you moved to F#/mono so I am just wondering how you are getting on. if you find time it would be great to see an update post on your experiences. also – did you see that BlueMountain released an R Type Provider for F#? I’ve not tried it yet but looks very interesting!

I’ll send you some old source (in Java); I can’t send you the recent stuff, as I sold the rights when I joined a fund. I’ll send it to your email address above.

With regard to F# / mono, I am using a combination of F# and C#. F# is nice and works well with mono. At this point, with LLVM, mono is very fast. So quite happy with it.

With regard to R and F#/C#, I have an R package that allows me to call into C# / F# from R (but not vice versa). I’ll have to take a peek at the package you mention.

Do you use F#?