In the past I created clusters on assets to help identify relationships across them. The resulting graphs were useful in identifying assets for use in mean-reverting portfolios. A difficulty in the approach was always accurately measuring the strength of the relationship between asset pairs. I had looked at Granger-causality, which is fairly limited in that it expects asset relationships to follow a VECM-like model, and eventually settled on weighting across a number of techniques as an approximation.

Determining causality for more general relationships requires a very different approach, one that allows Y ← f(X) + ε, where f may be an (unknown) non-linear function of X. I came across an interesting paper, “Distinguishing cause from effect using observational data: methods and benchmarks” (link), which builds on work examining the asymmetry in the “complexity” of p(Y|X) versus p(X|Y): if Y ← X (X causes Y), p(Y|X) will tend to have lower complexity than p(X|Y).

The paper provides results on two methods, Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI), with ANM generally doing better than IGCI across a variety of scenarios.

The algorithm for ANM in python-like pseudo-code (taken from the paper):

    def causality(xy: DataFrame, complexity: FunctionType, complexity_threshold: float):
        n = nrows(xy)
        xy = normalize(xy)

        ## split into training and testing halves (these should be disjoint)
        training = sample(xy, n/2)
        testing = sample(xy, n/2)

        ## regress Y <- X and X <- Y on training set
        thetaY = GaussianProcess.regress(training.y, training.x)
        thetaX = GaussianProcess.regress(training.x, training.y)

        ## predict Y' <- X' and X' <- Y' & compute residuals on testing set
        eY = testing.y - GaussianProcess.predict(testing.x, thetaY)
        eX = testing.x - GaussianProcess.predict(testing.y, thetaX)

        ## determine complexity of each proposition on variable and residuals
        Cxy = complexity(testing.x, eY)
        Cyx = complexity(testing.y, eX)

        ## determine whether X -> Y, Y -> X, or indeterminate
        if (Cyx - Cxy) > complexity_threshold:
            return X_causes_Y
        elif (Cxy - Cyx) > complexity_threshold:
            return Y_causes_X
        else:
            return None

The paper recommended using a score based on the Hilbert-Schmidt Independence Criterion (HSIC) as the complexity test, i.e. a measure of how strongly the residuals depend on the input variable. I will have to code the above up and see how it does on a broad set of assets. The paper did test against the CauseEffectPairs (CEP) benchmark, which includes some known financial relationships.
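As a placeholder until I do this properly, here is a minimal numpy-only sketch of the procedure. Several things in it are my assumptions rather than the paper's: the function names `hsic` and `anm_direction` are made up for illustration, a polynomial fit stands in for the Gaussian process regression (purely to avoid extra dependencies), the HSIC kernel is Gaussian with a median-heuristic bandwidth, and the raw HSIC value is used as the score instead of a p-value.

```python
import numpy as np

def _gram(v):
    """Gaussian kernel matrix for a 1-D sample (median-heuristic bandwidth, an assumption)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(a, b):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(_gram(a) @ H @ _gram(b) @ H) / (n - 1) ** 2

def anm_direction(x, y, threshold=0.0, degree=5, seed=0):
    """Hypothetical ANM sketch; polynomial regression replaces the GP regression."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()

    # disjoint training / testing halves
    idx = np.random.default_rng(seed).permutation(len(x))
    train, test = idx[: len(x) // 2], idx[len(x) // 2:]

    def residuals(inp, out):
        # fit out <- inp on the training half, compute residuals on the testing half
        coef = np.polyfit(inp[train], out[train], degree)
        return out[test] - np.polyval(coef, inp[test])

    c_xy = hsic(x[test], residuals(x, y))  # dependence score for the proposition X -> Y
    c_yx = hsic(y[test], residuals(y, x))  # dependence score for the proposition Y -> X

    if c_yx - c_xy > threshold:
        return "X_causes_Y"
    if c_xy - c_yx > threshold:
        return "Y_causes_X"
    return None

# usage on synthetic data where X drives Y through a non-linear function
rng = np.random.default_rng(42)
x = rng.uniform(-2.0, 2.0, 400)
y = x ** 3 + rng.normal(0.0, 1.0, 400)
print(anm_direction(x, y))
```

The polynomial regressor is the weakest part of this sketch: a poorly fitting regression leaves structure in the residuals in both directions, which can blur the comparison. A proper implementation would use GP regression as in the pseudo-code, and a non-zero threshold would need calibrating on real data.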

### Addendum

Am quite busy now with another investigation, but will revisit this with a proper implementation.

Was going to do something similar but I fear the noise levels may be too high

Could well be. However, 3 of the tests in the CEP suite are on financial returns. I suspect these are daily returns, so somewhat smoother than intra-day. I would not expect to get a good read on intra-day data, as leaders and laggards can switch fairly easily.

I should implement this and see how well it does. Quite busy at the moment so will have to get to it later. The post was more of a reminder for me to investigate more thoroughly later …
