Cointegration Models

A colleague asked if I could help develop a multi-factor cointegration model for the Canadian bond market on daily or more frequent sampling, based on a variety of market data and fundamental factors. I had not developed a model like this before and was skeptical that I could produce a useful result short of some man-years of research.

To my surprise, I found a very high probability model with 95% R-squared values and very high significance in a variety of tests. I now have a variety of models based on it, depending on all or some of the below:

• US 3m rates
• US 2y swap rates
• S&P 500
• S&P / TSE Composite
• Shanghai Composite index (SSE 300)
• Momentum
• CAD/USD fx rate
• CAD 5Y liquid bond
• Surprise Index

With two-variable cointegration, one is simply trading mean reversion on the spread between one security and another. With multivariate cointegration, one trades a long or short basket against the cointegrating security.
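As a rough illustration of the two-variable case, one can trade a z-scored spread against entry/exit bands. The hedge ratio, window length, and thresholds below are illustrative choices, not parameters from the model above:

```python
import numpy as np

def spread_zscore(y, x, hedge_ratio, window=60):
    """Rolling z-score of the spread y - b*x; mean reversion is traded on this."""
    spread = np.asarray(y, dtype=float) - hedge_ratio * np.asarray(x, dtype=float)
    z = np.full(len(spread), np.nan)
    for t in range(window, len(spread)):
        w = spread[t - window:t]
        z[t] = (spread[t] - w.mean()) / w.std()
    return z

def signal(z, entry=2.0, exit=0.5):
    """Fade the spread when stretched beyond `entry`, flatten inside `exit`.
    Thresholds are hypothetical, for illustration only."""
    sig, pos = np.zeros(len(z)), 0
    for t, zt in enumerate(z):
        if np.isnan(zt):
            continue
        if pos == 0 and abs(zt) > entry:
            pos = -np.sign(zt)        # short a rich spread, long a cheap one
        elif pos != 0 and abs(zt) < exit:
            pos = 0                   # spread has reverted; flatten
        sig[t] = pos
    return sig
```

The multivariate case works the same way, except the "spread" is the cointegrating combination of the whole basket.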


37 responses to “Cointegration Models”

1. sp

This is a few years old but I recently came across your blog. First off – great blog. I find the topics you cover valuable, although some of the math is over my head.

In any case, I have focused the past few months on multivariate cointegration to develop basket trading models using daily data. I eventually want to test on higher frequencies. What I have learned is that I can find highly cointegrated baskets at 1% using the Johansen VECM approach, which are very profitable in-sample but not so out of sample. The conventional wisdom in this space seems to be that baskets are more stable than pairs. That was my intuition at the beginning but I have had a difficult time finding baskets that remain robust without significant changes in weights. I would welcome your thoughts on this and how you go about managing the dynamic nature of the mean and variance of the spreads.

I also wanted to ask you whether the variables listed above are all endogenous in your VECM or whether you are using some as exogenous variables. I have tinkered with this in R (using ca.jo()) but haven’t spent too much time ironing out details of the model specification. I guess one nice aspect of ca.jo() is that you can throw in a bunch of variables that you ‘think’ are cointegrated and let R do the work. I am surprised that the Shanghai Composite index played a role in your VECM though. Interesting.

• tr8dr

I have since left the bank where I was doing this, so to be honest I have not really followed up on the analysis; it was preliminary and came just before I left. I'm largely focused on HF FX at this point.

It is very easy to find spurious cointegration or correlation where there is no true causal relationship. VECM is also a simplistic model in that it relies on AR(p)-type behavior across asset returns, whereas the structure of the interrelationships is often more complex.

As for the Shanghai index, China is the #2 trading partner with Canada, but a very distant second, with only one fifth the imports relative to the US. Perhaps with China being the largest trading partner (by imports) with the US, the impact China has on the US also affects the fiscal environment in Canada. I have not put much time into the analysis; I'd have to bring in the latest data and take another look.

2. sheng

A few years slipped away before I found your blog. Since you said VECM is a simplistic model and you were working on high frequency data, I have a few questions at hand. Cointegration testing gets more difficult as the frequency of the data rises, since variance becomes heavily influenced by microstructure. What are some of the techniques you prefer for modeling in that space? How much did you dive into the research there? Taking a step back, if the goal is to do cointegration in the minute space (say 3 min closing prices), what do you suggest? Thanks a lot for the response.

• tr8dr

To be honest I have not been looking at cointegration in the context of high frequency recently. Short term causality and lead-lag models are definitely interesting and something I plan to explore later.

What has been more valuable for me in HF has been analysis of order flow / order book, trade imbalance and things like that. The cross-asset dimension is something I will pursue more seriously later.

• sheng

What do you consider a good success ratio for a HF indicator like that? Order flow = depth? Trade imbalance = Lee-Ready? Any other things you are looking at currently?

3. tr8dr

Choosing the right features from the orderbook, I can detect momentum movements with a high degree of accuracy.

If we consider 10 second samples of the orderbook as they relate to the price series and assign labels of { -1, 0, +1 } for +/- momentum or neutral, the start, sustainment, and end of the momentum periods are labelled correctly with fairly low noise, out of sample.

The confusion matrix shows a 75% accuracy (but in fact, during momentum, it must be closer to 90+%). The rest of the error is the occasional mislabeling of a sample, here and there, as being in momentum.

The key is not really classification, but understanding the dynamics that are important and generating the appropriate features for them.
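A minimal sketch of generating { -1, 0, +1 } labels from a price series using a forward-return rule. The horizon and threshold are hypothetical; the author's actual labeling scheme is not described:

```python
import numpy as np

def momentum_labels(prices, horizon=6, threshold=0.0005):
    """Label each sample +1/-1 if the forward return over `horizon`
    samples exceeds +/- `threshold`, else 0 (neutral).
    Both parameters are illustrative, not from the post."""
    p = np.asarray(prices, dtype=float)
    labels = np.zeros(len(p), dtype=int)
    fwd = np.full(len(p), np.nan)
    fwd[:-horizon] = p[horizon:] / p[:-horizon] - 1.0
    labels[fwd > threshold] = 1
    labels[fwd < -threshold] = -1
    return labels
```

Labels like these would then be used as the training targets for whatever classifier consumes the orderbook features.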

With regard to Buy/Sell imbalance, Lee-Ready does not classify very well in my experience. You will want to rank each trade on a continuum from buyer aggressiveness to seller aggressiveness. Hidden trades are also very important and need to be treated with some bias. Hidden trades come from hidden orders, become visible only when traded, and have no corresponding order in the book.

Again, I have found BSI to be very accurate for momentum detection, but one needs to get the classification of trades right.
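One simple way to place a trade on a buyer/seller aggressiveness continuum is to score its price relative to the prevailing quote. This is just one interpretation of the idea, not the author's actual method:

```python
def aggressiveness(trade_price, bid, ask):
    """Score a trade on a continuum from -1 (seller-aggressive, at or
    through the bid) to +1 (buyer-aggressive, at or through the ask).
    A spread-relative score, clipped to [-1, 1]; illustrative only."""
    mid = 0.5 * (bid + ask)
    half_spread = 0.5 * (ask - bid)
    if half_spread <= 0:
        return 0.0                      # crossed/locked market: no signal
    score = (trade_price - mid) / half_spread
    return max(-1.0, min(1.0, score))
```

Unlike Lee-Ready's binary tick rule, a continuum like this lets mid-spread and hidden executions carry intermediate weight.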

• sheng

My problem might be slightly different since only snapshots and BBOs are available for the market that I am working with (unfortunately). My previous solution was to use a proxy to detect the shift in volume profile. However, this approach, as expected, lags in detecting momentum formation. Of course, I can attack the problem from a different angle and construct the liquidity changes on the bid and ask and classify all order events into additions and cancellations. Do you see any merit in this approach? I would imagine the "features" you mentioned were something like a cancellation followed by an aggressor, etc.?

I agree with you on the hidden trades. They need to be treated carefully, especially when they occur with size.

I am thinking that if I compare each volume traded with its bid and ask at t (and also changes from t-1), I should be able to classify whether it is buyer initiated or seller initiated. Is ranking the % buy aggressiveness at t against the continuum of the past T (e.g. 20 sec) enough to capture the change? Any other techniques here?

• tr8dr

Yes, some markets do not provide order-level transactions (the CME for example). The CME in particular does provide book-level changes, so one can try to back out probable activity. I have yet to test how well this level of information works. If you only have snapshots, you probably can't do much in terms of momentum detection from the orderbook alone.

With regard to B/S imbalance, if you get the trade classification right and look at the sum of buyers vs sellers over a window, such as you suggested, you will get a raw (though noisy) signal. Plotting the buyer imbalance as blue dots, seller as red dots, and grey for neutral, for example overlaying prices, you should see that downward momentum moves have a lot of reds with greys (and maybe some spurious blues), and vice versa.

The next stage would be to assign the optimal labels to samples and try to classify. Usually one can clean up very nicely here.
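The windowed imbalance and red/grey/blue coloring described above might be sketched as follows; the window length and neutral band are illustrative parameters:

```python
import numpy as np

def rolling_imbalance(signed_volume, window=20):
    """Trailing-window sum of signed trade volume (buys positive, sells
    negative), normalized by the total volume traded in the window."""
    sv = np.asarray(signed_volume, dtype=float)
    out = np.zeros(len(sv))
    for t in range(len(sv)):
        w = sv[max(0, t - window + 1):t + 1]
        total = np.abs(w).sum()
        out[t] = w.sum() / total if total > 0 else 0.0
    return out

def color(imb, band=0.3):
    """Map imbalance to the blue/grey/red scheme described above.
    The neutral band of 0.3 is a hypothetical choice."""
    return np.where(imb > band, "blue", np.where(imb < -band, "red", "grey"))
```

Overlaying these colors on the price series is a quick visual check that imbalance is leading (or at least coincident with) momentum moves.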

• sheng

Would it make sense to pick an arbitrary time window to collect the buyer/seller imbalance? The selection here seems to have a profound impact on the labeling in the next step. For instance, say you have some blue dots with scattered red and grey dots in between with a time window of 1 sec. The same situation might yield all blue dots with a time window of 10 sec. Of course, one can pick a middle ground to adapt with a rolling time window, but that does not seem to provide much usefulness beyond smoothing its polarity.

A drastic change in polarity will have a large impact on the next-stage labeling. Simple arithmetic can be applied here to count all the blue and red dots and compute the ratio. But this tweaking very much depends on the dots you feed into the algorithm, and is prone to changes in the time window.

Last but not least, another layer of issue is that one also has to consider the movement of price in reference to the momentum. Price grinding up in the presence of up momentum versus price going down with the same up momentum yields different information. So I wonder what kind of success rate we should target?

• tr8dr

Yes, the window over which you observe this does make a difference in the following ways:

– lag (larger windows lead to larger lag as one would expect)
– smaller windows present more noise

The interesting thing in equities is that I found that buy/sell imbalance leads price movement significantly. Hence a rolling window used in determining imbalance is often in sync with price movements. This may only work out with a HF data source and in equities. Depending on the dynamics and granularity of trading, it may be much less effective. For example, in the FX market, trading size is very large, hence the # of trades is much smaller than in US equities and the time gap between trades is much larger. B/S imbalance is much harder to detect in FX for this reason.

Finally, as you point out, you can use price movement as a confirmation / boost to your B/S imbalance signal (indeed I do this). The price confirmation cleans up the noise very nicely. There is still noise, but with a further classification step can be very accurate.

• sheng

I am looking at emerging market data. The format is different but I found an algorithm to sort things out.

Fair point about the rolling window. It looks like there might be multiple ways to contrast buy-aggressive vs. sell-aggressive trades other than summing up each and taking an upper and lower percentile. The question here is whether simple summing would do better than some sort of weighting or more sophisticated arithmetic. This directly changes the firing of the indicator and is very significant.

There is another arbitrary point where one needs to eliminate signals generated due to a lack of volume, i.e. when you get almost no trades in a quiet period, the aggressiveness indicator might be biased by a single small trade that tilts the aggressiveness dramatically either way. One way, of course, is to force a volume threshold that ignores such signals altogether. I see at least 2 problems with this approach: 1. an arbitrary threshold (optimization here will probably result in over-fitting); 2. the insignificant volume is mostly due to the selection of a small rolling window, so this problem is confluent with the previous one.

The next issue is the detection of price movement as confirmation of the B/S imbalance. Is there any fancy math one can adopt here? One attempt is to look at the price movement in a fixed time window after each signal fires to track the confirmation. But this is not an elegant solution and may bias the results.

• tr8dr

Well, aside from simple windows / summations of this data, the more appropriate approach is to use a Hawkes or another self-exciting process to model and smooth out the behavior.
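A minimal sketch of the conditional intensity of a univariate Hawkes process with an exponential kernel, which is the usual form of the self-exciting process mentioned above (the parameters are illustrative):

```python
import math

def hawkes_intensity(event_times, t, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity lambda(t) = mu + sum_i alpha*exp(-beta*(t - t_i))
    over past events t_i < t. mu is the baseline rate; each event excites
    the process and decays at rate beta. Requires alpha < beta for a
    stationary process. Parameter values here are hypothetical."""
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
```

Evaluating this intensity on a grid gives a smoothed view of activity that decays gracefully through gaps between trades, rather than dropping to zero the way a hard rolling window does.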

• sheng

While Hawkes might be useful to model the behavior, it would not be as useful in predicting the next market move (agree?). It seems more a way to explain what has happened than to predict what's going to happen next. Maybe we can use a multi-class SVM here. One point of confusion: should I classify price movement into (+, 0, -), or are you really talking about classifying the derivative of B/S imbalance? If the former is the case, can we still include price change as a feature in the SVM?

• tr8dr

A self-exciting process is indeed partially retrospective. However, it can serve to smooth the short gaps in activity going forward. I did find Hawkes to be a good fit in this regard.

However, if one has a lot of activity (as in equities), one does not really need the "smoothing" provided by Hawkes. With more sporadic events, such as one would get in a less liquid market, something like the Hawkes process will give you a better view of the intermediate periods between events.

As for classification, you will want at least 3 classes, right? Because there are stages where the market does not have a strong direction {0} and other periods where it does {+1, -1}.

In terms of what classifier you use, SVM, FDA+MARS, or other classifiers can be used. The most important thing is determining what your features should be and having an accurate way of generating the “perfect” labels across your price series (for training).
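A toy sketch of the 3-class setup with an RBF-kernel SVM, using scikit-learn. The features here are random stand-ins, not real orderbook features, and the label rule is synthetic:

```python
import numpy as np
from sklearn.svm import SVC

# synthetic 3-class problem standing in for {-1, 0, +1} momentum labels;
# in practice the features would be engineered from the orderbook and
# the labels generated from the price series
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] > 0.5, 1, np.where(X[:, 0] < -0.5, -1, 0))

# train on the first 200 samples, evaluate out of sample on the rest
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[:200], y[:200])
accuracy = (clf.predict(X[200:]) == y[200:]).mean()
```

As the comment says, the classifier choice matters far less than the feature construction and the quality of the training labels.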

• sheng

Thanks for your replies so far. I now suspect the selection of a particular kernel will largely impact the practicality of the SVM machinery. Would you agree with that? Gaussian seems to be a starting point, but I see no clear advantage to it other than a lack of confidence in other kernels. A side question: which part of the technique explicitly takes care of the temporal clustering of the observations? Or does one have to include "time" as an "attribute" to express that?

4. sheng

Are you familiar with the Matthews correlation coefficient?

• tr8dr

Never used it. I guess the problem for its use in this context is that B/S is really a 3-class system and not 2: {+, 0, -}.

5. If you are referring to SVM, then a Gaussian kernel is fine. One can even consider a blend of Gaussian kernels. However, more important than the details of the SVM kernel and parameters is what you choose for features.

As for the clustering, etc, you should include a number of lagged samples in your feature set aside from the most recent sample.

• sheng

lagged sample = attributes that belong to a prior observation? How many would you include for HF data? (5 sec of data when you try to predict the next 10 sec?)

• yes, by lagged I mean attributes of prior observations. This helps provide continuity over gaps in activity. You can use a feature information technique or brute-force trial and error to determine the appropriate # of lags and which features are relevant. As for how many lags you need and the period of the samples, that will depend on the market you are looking at.
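Including lagged attributes can be sketched as stacking each sample with its prior rows; the number of lags below is a hypothetical choice:

```python
import numpy as np

def add_lags(features, n_lags=3):
    """Stack each feature row with its `n_lags` prior rows so the
    classifier sees recent history, giving continuity over gaps in
    activity. Input shape (n, d) -> output (n - n_lags, d*(n_lags+1))."""
    f = np.asarray(features, dtype=float)
    # slice k rows back for k = 0..n_lags, aligned so each output row is
    # [current, lag-1, lag-2, ...]
    rows = [f[n_lags - k : len(f) - k] for k in range(n_lags + 1)]
    return np.hstack(rows)
```

The first `n_lags` samples are dropped since they have no complete history; the matching label vector must be trimmed the same way.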

6. sheng

I am sensing that I might have done something terribly wrong, so PLEASE correct me if it seems stupid. Currently, I have labelled my observations as a continuous string of "1"s, "0"s, or "-1"s at different times. The autocorrelation dictates that the same value is likely to be followed by the same value until it changes to another value, after which the other value is likely to follow. Therefore, I would think only 1 prior attribute, or at most 2, is sufficient to describe such a pattern. Am I wrong here?

• The correct labels of course have this aspect of continuity, but your raw features may drop out for more than one period. You’ll have to observe the data and model accordingly.

• sheng

Aha. I think in this case I no longer need to sum up the previous 20 secs of buy/sell aggression explicitly. Rather, I would want to include ~20 prior attributes for the machine to learn from itself. Correct?

• sheng

what is feature information technique?

7. S

OK. I have attempted to include lagged attributes in the current classification, but this seems to worsen the predictive power. I think it's probably due to the high variance in the attributes themselves. I was curious what you think of modeling the effect of extraordinary volume relative to the bid and ask liquidity. Would you simply create an attribute that indicates such an event happened, or would you try to model such behavior with some arithmetic? Right now, I am trying to capture liquidity spill-over coupled with size. BTW, do you have a private interest in new markets? Let me know. Thanks in advance.

8. Ludo

tr8dr,

Do you use colocation to trade HFT?

For you, what is the best venue (FXAll / Hotspot / Currenex) for trading the orderbook?

Do you aggregate the orderbook across all venues or just one?

Regards
Ludo.

• tr8dr

Yes, for HFT, or even medium frequency intra-day, you are better off co-locating. Depending on the exchange, it may not be super expensive to colocate with them in their data center. If you are targeting the US equity markets and not doing ultra-HFT, then I would suggest colocating with secondary providers such as Lime Brokerage (their latency to the exchange is maybe 50 usec, i.e. sub-ms).

For FX, Hotspot and FXAll do not charge for cross-connects to their PoP (point of presence). Currenex does, however ($3K / mo). If you do a lot of volume with Currenex, they can drop the fees.

As for FX and orderbooks, each venue has somewhat different information. In terms of volume, the order is Currenex, Hotspot, FXAll. I like Hotspot the best of the lot for various reasons. That said, Currenex customers tend to be more transparent in terms of their order stream (using cancel/replace instead of place and cancel). This extra information can be quite useful.

As for aggregating orderbooks, I have not applied our orderbook momentum model across the aggregate, as the signal is already very well behaved. It is likely that the composite orderbook would allow for a cleaner read, since it is effectively averaging out the noise.

• Ludo

tr8dr, thanks for this information.

Once the model is finished, with the right classification and the right features, how long does the model take to predict in the online environment?

For the learning, do you use your own software or MATLAB/Weka/Encog?

9. Ludo

tr8dr,

Did this paper inspire your work? http://www.academia.edu/283065/Multiple_Kernel_Learning_on_the_Limit_Order_Book

What do you think of it?

Can you explain the symbol ||Vt||1 to me?

thanks.
Regards
Ludo.

• tr8dr

No, I had not seen this paper before. From what I can tell they have taken a very different approach. I think they are missing some important sources of information in the orderbook. Just looking at size @ price is not likely to work well.

The most important thing is not the ML approach taken (such as kernels), but the construction of the features.

10. LinuXPoWa

tr8dr,
Don't you want to start a new thread about which orderbook features are good (or not) for predicting direction of movement?
Thx

• tr8dr

It would be an interesting topic, but alas I cannot really talk about it due to proprietary issues with the fund I am with.

• LinuXPoWa

OK, I understand.

Can you just tell me if you use ML for entry, or for both (entry/exit)? Or do you use a fixed exit in pips?