Monthly Archives: February 2010

Model Parsimony

Given the complexity of the markets, it is easy to fall into the trap of overfitting when creating a parametric model of some sort.    There are a variety of approaches to avoiding this, some heuristic and others theoretical, such as:

  1. cross-validation (in-sample, out-of-sample)
  2. likelihood-weighted information criteria

I want to briefly look at information criteria today. The Kullback-Leibler divergence is probably the seminal work in this area. Since then there have been a number of measures, such as the Akaike Information Criterion, Hannan-Quinn, etc. More recently (well, 2000), Hamparsum Bozdogan developed another measure (ICOMP), which has more appeal for me in terms of what it captures. Conceptually, the measure weighs the following:

  1. likelihood of the model given the parameters (lack or degree of fit)
  2. the complexity of model parameters (lack of parsimony)
  3. the complexity of model errors (profusion of complexity)

A simplified form of the measure is as follows:

where Σθ and Σε are the parameter and residual covariance matrices, respectively, and the λ’s are the eigenvalues of each respective matrix.
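For reference, the building block of ICOMP is Bozdogan's maximal information complexity C1 of a covariance matrix, which I take to be the ingredient applied to Σθ and Σε. Written in terms of the eigenvalues λ_1, …, λ_s of an s×s covariance matrix Σ (my notation), it compares the arithmetic and geometric means of the eigenvalues:

```latex
C_1(\Sigma) = \frac{s}{2}\log\frac{\operatorname{tr}(\Sigma)}{s} - \frac{1}{2}\log\lvert\Sigma\rvert
            = \frac{s}{2}\log\frac{\bar{\lambda}_a}{\bar{\lambda}_g},
\qquad
\bar{\lambda}_a = \frac{1}{s}\sum_{j=1}^{s}\lambda_j,
\quad
\bar{\lambda}_g = \Big(\prod_{j=1}^{s}\lambda_j\Big)^{1/s}
```

Because the arithmetic mean is never smaller than the geometric mean, C1 ≥ 0, with equality exactly when all eigenvalues are equal.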

We look to minimize the above function (effectively maximizing likelihood against the countervailing complexity measures):

His paper explains the measure of complexity as follows:

Complexity of a system (of any type) is a measure of the degree of interdependency between the whole system and a simple enumerative composition of its subsystems or parts.

The contribution of the complexity of the model covariance structure is that it provides a numerical measure to assess parameter redundancy and stability uniquely all in one measure. When the parameters are stable, this implies that the covariance matrix should be approximately a diagonal matrix.

In general, large values of complexity indicate a high interaction between the variables, and a low degree of complexity represents less interaction between the variables. The minimum of Complexity(Σ) corresponds to the least complex structure. In other words:

Complexity(Σ) → 0 as Σ → I

This establishes a plausible relation between information-theoretic complexity and computational effort. Furthermore, what this means is that the identity matrix is the least complex matrix. To put it in statistical terms, orthogonal designs, or linear models with no collinearity, are the least complex, or most informative, and the identity matrix is the only matrix for which the complexity vanishes. Otherwise, Complexity(Σ) > 0, necessarily.
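A quick numerical check of this behavior, assuming the eigenvalue form C1(Σ) = (s/2) log(arithmetic mean / geometric mean of the eigenvalues). One caveat it makes visible: with this form C1 also vanishes for any scalar multiple of I (e.g. 4I), so "the only matrix for which the complexity vanishes" holds up to scale. The function name c1_complexity is mine.

```python
import numpy as np


def c1_complexity(cov):
    """C1(Sigma) = (s/2) * log(arithmetic mean / geometric mean of eigenvalues)."""
    eig = np.linalg.eigvalsh(np.asarray(cov, dtype=float))  # symmetric: real eigenvalues
    if np.any(eig <= 0):
        raise ValueError("covariance matrix must be positive definite")
    log_arith = np.log(eig.mean())
    log_geom = np.log(eig).mean()  # log of the geometric mean
    return 0.5 * eig.size * (log_arith - log_geom)


if __name__ == "__main__":
    correlated = np.array([[1.0, 0.9, 0.0],
                           [0.9, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
    for name, m in [("I", np.eye(3)), ("4I", 4.0 * np.eye(3)), ("correlated", correlated)]:
        print(name, c1_complexity(m))
```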

Why bother? Well, the most commonly used criterion (the AIC) does not adequately capture complexity and is known to be biased for some model systems. The approach also has greater intuitive appeal.
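To see the two criteria side by side, here is a sketch for Gaussian linear regression. It assumes Bozdogan's ICOMP(IFIM) form, ICOMP = −2 log L + 2 C1(F̂⁻¹), with the estimated inverse Fisher information block-diagonal in (β, σ²). This is my reading of the regression case, not necessarily the simplified form above, and the polynomial data are synthetic.

```python
import numpy as np


def c1_complexity(cov):
    """Bozdogan's C1: (s/2) * log(arithmetic / geometric mean of eigenvalues)."""
    eig = np.linalg.eigvalsh(cov)
    return 0.5 * eig.size * (np.log(eig.mean()) - np.log(eig).mean())


def aic_and_icomp(X, y):
    """AIC and ICOMP(IFIM) for a Gaussian linear regression y = X beta + e."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # ML estimate of the error variance
    neg2loglik = n * np.log(2 * np.pi) + n * np.log(sigma2) + n
    aic = neg2loglik + 2 * (k + 1)  # k coefficients plus sigma^2
    # Estimated inverse Fisher information, block-diagonal in (beta, sigma^2).
    inv_fisher = np.zeros((k + 1, k + 1))
    inv_fisher[:k, :k] = sigma2 * np.linalg.inv(X.T @ X)
    inv_fisher[k, k] = 2 * sigma2**2 / n
    icomp = neg2loglik + 2 * c1_complexity(inv_fisher)
    return aic, icomp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 100)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)  # true model is linear
    for degree in range(5):
        X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
        aic, icomp = aic_and_icomp(X, y)
        print(f"degree {degree}: AIC={aic:8.2f}  ICOMP={icomp:8.2f}")
```

Polynomial columns on [0, 1] are strongly collinear, so the C1 term tends to grow with degree even where the fit barely improves; AIC's flat 2(k+1) penalty does not see that interaction.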

Filed under strategies

Impulse Response

This is just a quick note on deriving an impulse response function for a VECM system. Basically, we want to get the system into a form where we can take the partial derivatives at various lags. Starting with a simplified VECM:

Convert this into a form expressed in terms of X instead of ΔX:
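A sketch of this conversion, assuming the usual VECM form ΔX_t = ΠX_{t−1} + Σ_{i=1}^{p−1} Γ_i ΔX_{t−i} + ε_t with Π = αβ′ (my notation). Expanding ΔX_t = X_t − X_{t−1} gives the levels VAR X_t = Σ_{i=1}^{p} A_i X_{t−i} + ε_t with A_1 = I + Π + Γ_1, A_i = Γ_i − Γ_{i−1} for 1 < i < p, and A_p = −Γ_{p−1}:

```python
import numpy as np


def vecm_to_var(pi, gammas):
    """Convert VECM coefficients to levels-VAR coefficients [A_1, ..., A_p].

    pi:     (m, m) long-run matrix, alpha @ beta.T
    gammas: list of (m, m) short-run matrices Gamma_1 .. Gamma_{p-1} (may be empty)
    """
    pi = np.asarray(pi, dtype=float)
    m = pi.shape[0]
    gammas = [np.asarray(g, dtype=float) for g in gammas]
    p = len(gammas) + 1
    if p == 1:  # no lagged differences: X_t = (I + Pi) X_{t-1} + e_t
        return [np.eye(m) + pi]
    A = [np.eye(m) + pi + gammas[0]]
    for i in range(1, p - 1):  # A_{i+1} = Gamma_{i+1} - Gamma_i
        A.append(gammas[i] - gammas[i - 1])
    A.append(-gammas[-1])  # A_p = -Gamma_{p-1}
    return A


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pi = np.array([[-0.2], [0.1]]) @ np.array([[1.0, -1.0]])  # alpha @ beta.T
    gammas = [0.1 * rng.standard_normal((2, 2)), 0.05 * rng.standard_normal((2, 2))]
    x1, x2, x3 = rng.standard_normal((3, 2))  # X_{t-1}, X_{t-2}, X_{t-3}
    vecm_next = x1 + pi @ x1 + gammas[0] @ (x1 - x2) + gammas[1] @ (x2 - x3)
    A = vecm_to_var(pi, gammas)
    print(np.allclose(A[0] @ x1 + A[1] @ x2 + A[2] @ x3, vecm_next))
```

The check at the bottom confirms that both forms produce the same X_t from the same history.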

We change variables to simplify the form:

Via Pesaran and Shin (1996) we transform this into the following recursive expression:

We determine the partial derivative ∂vi / ∂vk (i.e. the impact of a change in the kth variable on the ith) after n time periods (t+n) to be:

where Si is a selection vector with 1 at the ith position and 0 elsewhere.
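A sketch of the impulse-response computation, assuming the system is already in levels-VAR form X_t = Σ A_i X_{t−i} + ε_t. It uses the textbook moving-average recursion Ψ_0 = I, Ψ_n = Σ_{i=1}^{min(n,p)} A_i Ψ_{n−i}, which may not match the exact form of the Pesaran and Shin recursion above, and reads off S_i′ Ψ_n U S_k with U the lower-triangular Cholesky factor of Σ. Indexing irf[n, i, k] plays the role of the selection vectors; the function names are mine.

```python
import numpy as np


def ma_coefficients(A, horizon):
    """Psi_0 .. Psi_horizon for X_t = sum_i A_i X_{t-i} + e_t."""
    m = A[0].shape[0]
    psi = [np.eye(m)]
    for n in range(1, horizon + 1):
        acc = np.zeros((m, m))
        for i in range(1, min(n, len(A)) + 1):
            acc += A[i - 1] @ psi[n - i]
        psi.append(acc)
    return psi


def orthogonalized_irf(A, sigma, horizon):
    """irf[n, i, k]: response of variable i at t+n to a one-s.d. orthogonal shock k."""
    U = np.linalg.cholesky(sigma)  # lower triangular, U @ U.T == sigma
    return np.stack([p @ U for p in ma_coefficients(A, horizon)])


if __name__ == "__main__":
    A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
    sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
    irf = orthogonalized_irf([A1], sigma, horizon=5)
    print(irf[:, 0, 1])  # response of variable 0 to shock 1 over 0..5 periods
```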

Normally the Cholesky decomposition is used to orthogonalize the covariance (U U’ = Σ); however, other decompositions can be used, providing different measures of response, such as the Bernanke-Sims approach.
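The ordering dependence of the Cholesky approach is easy to demonstrate: because U is lower triangular, the shock attached to the first variable moves every variable on impact, while later shocks cannot move earlier variables on impact. The sketch below (impact_matrix is a hypothetical helper; the covariance is illustrative) computes the impact responses under two orderings and reports both in the original variable order:

```python
import numpy as np


def impact_matrix(sigma, order):
    """Cholesky impact responses for a given variable ordering.

    Entry [i, k] is the on-impact response of variable i to the orthogonal shock
    attached to variable k, both indexed in the original variable order.
    """
    sigma = np.asarray(sigma, dtype=float)
    P = np.eye(len(order))[list(order)]  # row j of P selects variable order[j]
    U = np.linalg.cholesky(P @ sigma @ P.T)  # lower triangular in the new ordering
    return P.T @ U @ P  # map rows and columns back to the original order


if __name__ == "__main__":
    sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
    print(impact_matrix(sigma, [0, 1]))
    print(impact_matrix(sigma, [1, 0]))
```

Both results are valid factors of Σ (each times its transpose gives Σ back), which is why the choice of decomposition, or ordering, has to be justified separately.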

Filed under strategies

Chinese Bubble

Just a quick note. I wanted to point to this presentation on the Chinese bubble: “China the Mother of All Black Swans”. His conclusions on commodity prices and interest rates are right on the mark, stating:

  1. interest rates will go up
    Ok, this is a no-brainer for a number of reasons, one being that US interest rates are about as low as they can go. Also, given the large amount of new debt issuance and increasing stockpiles in China, Japan, and elsewhere, I would expect reduced appetite at current interest levels. That said, in uncertain markets, where else should China or Japan park their $ inflows?
  2. commodity prices will revert
    Well, most commodity prices had already reverted to historical levels in 2008 during the market crisis. This can be seen in a number of commodity indices. That said, there are specific commodities that are at premiums to historical levels (ones that China has been buying).

Here we can see that commodities, as a broad index, have fallen back to historical levels and stayed there since 2009; however, industrial commodities have increased in price by 30%:

I’m going to ignore gold at the moment, as I don’t believe China is the major driver behind its rise (that said, gold would appear to be ripe for a huge reset, similar to the buildup and reset in the 80s). Copper, on the other hand, has risen 50% since 2009, a rise that can be attributed to demand from China:

Apparently China is buying copper not only for internal demand, but also as a way to invest its huge inflow of dollars rather than investing entirely in US Treasuries. So we would really need to see significantly less US consumption of Chinese products, or a change in policy around stockpiling industrial commodities, before these commodities revert to historical levels.

Filed under strategies

Network Model

I’ve been thinking about the relationships amongst a network of assets. Supposing I have a network of hundreds of assets, what sort of measurements can be made that allow for statements about the future state with a measurable degree of confidence?

Here are some “standard” approaches to looking at the relationship between assets:

  1. Covariance
    Covariance is a linear measure: scaled by the variance of one series it is literally the slope of the least-squares regression line through the paired data, and the correlation is that same slope for standardized data. It has issues with lagged series, assumes linearity, and fully characterizes dependence only for elliptical distributions.
  2. Cointegration
    Cointegration is a measure of the relationship between series using an autoregressive error-correction model. It avoids many of the issues of covariance; however, like covariance, it is not sufficient to uniquely specify the joint distribution. The VECM and the Johansen method give robust estimates of this.
  3. Distribution Estimating Models
    Models that estimate the distribution (i.e. provide the most probable price movement in the next sample period based on an estimated high-dimensional distribution); these work well on certain portfolios.
  4. SDEs
    SDEs that impose a structure / mechanism of price movement, implying future price movements. These models often combine points 1 and 3. Like most other people in this space, I have a collection of these, some better than others.
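A quick check of the covariance point on synthetic data: the least-squares slope of y on x equals cov(x, y)/var(x), and fitting the same line to standardized series gives the correlation. The data-generating slope of 0.7 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 0.7 * x + rng.normal(scale=0.5, size=500)

cov_xy = np.cov(x, y)[0, 1]                  # sample covariance (ddof=1)
slope_ols = np.polyfit(x, y, 1)[0]           # least-squares slope of y on x
slope_from_cov = cov_xy / np.var(x, ddof=1)  # same slope from the covariance

zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
corr = np.corrcoef(x, y)[0, 1]
slope_std = np.polyfit(zx, zy, 1)[0]         # slope on standardized data

print(slope_ols, slope_from_cov)  # equal up to rounding
print(slope_std, corr)            # equal up to rounding
```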

Given that I already have models in categories 3 and 4, I am interested in a new model based on cointegration: not cointegration for pairs trading, but using the strong error-correcting relationships in a network of assets to determine likely next-period moves.

Amongst a number of approaches for determining error-correcting relationships, I have found the eigenvectors implied by the Johansen maximum likelihood estimate of the VECM to be the most stable compared with the alternatives:

  1. heuristic zero crossings maximization
  2. beta estimates from rolling OLS regressions
  3. Various Ornstein-Uhlenbeck models (though with a particle filter the degree of noise can be reduced significantly)
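For reference, a numpy-only sketch of the Johansen eigenproblem whose eigenvectors are referred to above. Assumptions: an unrestricted constant, k_ar_diff lagged differences, and eigenvectors normalized so that v′S11v = 1; the function name is mine (statsmodels provides coint_johansen in statsmodels.tsa.vector_ar.vecm for the full test statistics). The steps are the standard ones: partial out the constant and lagged differences from ΔX_t and X_{t−1}, form the moment matrices S00, S01, S11, and solve |λS11 − S10 S00⁻¹ S01| = 0.

```python
import numpy as np


def johansen_eigenvectors(X, k_ar_diff=1):
    """Johansen ML eigenvalues/eigenvectors for a VECM with an unrestricted constant.

    X: (N, m) array of price levels. Returns (eigvals, eigvecs) sorted by
    decreasing eigenvalue; column j of eigvecs is a candidate cointegrating vector.
    """
    X = np.asarray(X, dtype=float)
    dX = np.diff(X, axis=0)
    k = k_ar_diff
    dep = dX[k:]   # Delta X_t
    lev = X[k:-1]  # X_{t-1}
    T = dep.shape[0]
    # Regressors to partial out: a constant plus k lagged differences.
    cols = [np.ones((T, 1))]
    cols += [dX[k - i: dX.shape[0] - i] for i in range(1, k + 1)]
    Z = np.hstack(cols)
    r0 = dep - Z @ np.linalg.lstsq(Z, dep, rcond=None)[0]
    r1 = lev - Z @ np.linalg.lstsq(Z, lev, rcond=None)[0]
    s00 = r0.T @ r0 / T
    s01 = r0.T @ r1 / T
    s11 = r1.T @ r1 / T
    # Solve |lam * S11 - S10 S00^-1 S01| = 0 by whitening with S11 = L L'.
    L_inv = np.linalg.inv(np.linalg.cholesky(s11))
    C = L_inv @ s01.T @ np.linalg.solve(s00, s01) @ L_inv.T
    eigvals, u = np.linalg.eigh((C + C.T) / 2.0)
    eigvecs = L_inv.T @ u  # satisfies eigvecs' S11 eigvecs = I
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 2000
    w = np.cumsum(rng.normal(size=n))  # shared random walk
    X = np.column_stack([w + rng.normal(size=n), 0.5 * w + rng.normal(size=n)])
    vals, vecs = johansen_eigenvectors(X, k_ar_diff=1)
    print(vals)
    print(vecs[:, 0] / vecs[0, 0])  # roughly proportional to (1, -2)
```

In the simulated pair x1 = w + e1, x2 = 0.5w + e2, the combination x1 − 2x2 is stationary, so the leading eigenvector should come out close to proportional to (1, −2).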

I’m not going to state what I am doing right now, but may write up parts of it along the way.

Filed under strategies

Adaptive Regressor

Regression is an important tool in trading (witness the number of traders that rely on moving averages of various sorts).    I don’t directly use regressors to generate trading signals, but I do find them useful in denoising signal output.

Aside from the obvious issue of relying on the past to predict the future, there are other issues with regressors:

  1. lag: denoising necessarily involves averaging of some sort, resulting in lag relative to the underlier
  2. parameterization:  what parameter settings bring out the features of interest

The simplest regressors are ARMA-based FIR or IIR filters. Lag is easy to quantify as phase delay in those systems and harder in others. Rather than focusing on lag, I want to consider the parameterization.
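For the exponential MA the lag can be computed directly: the filter y_t = αx_t + (1−α)y_{t−1} has a low-frequency group delay of (1−α)/α samples, which shows up as the steady-state lag when the input is a linear ramp (with the common convention α = 2/(N+1) this is (N−1)/2 samples). A minimal check in pure Python; α = 0.2 is an arbitrary choice:

```python
def ema(xs, alpha):
    """Exponential MA: y_t = alpha * x_t + (1 - alpha) * y_{t-1}, seeded with x_0."""
    out = []
    y = xs[0]
    for x in xs:
        y = alpha * x + (1.0 - alpha) * y
        out.append(y)
    return out


alpha = 0.2
ramp = [float(t) for t in range(200)]
smoothed = ema(ramp, alpha)
lag = ramp[-1] - smoothed[-1]  # steady-state lag on a linear ramp, in samples
print(lag, (1.0 - alpha) / alpha)
```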

Parameterization
To illustrate the problem of parameterization, consider a simple exponential MA in two market scenarios:

  1. market with strong trends
    Long windows mask tradeable market movements.   A shorter window (or “tau”) is needed to capture market movements of interest.
  2. market trading sideways
    A short-windowed MA oscillates on small movements. A long window is needed to reduce or eliminate noise that is not tradeable.

While I don’t use MAs for trade entry, the general problem of adapting a regressor to features of interest is important.

Penalized Least Squares
The penalized least-squares spline is known to be  the “best linear unbiased predictor”  for series that can be modeled by:

where f(x) is typically built from polynomial basis functions (often a high-dimensional basis). Characteristic of the penalized family of splines is the balance between least-squares fit and a curvature penalty:
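For reference, the standard forms of the model and the penalized criterion, in my notation, are:

```latex
y_i = f(x_i) + \varepsilon_i, \qquad
\hat{f} = \arg\min_{f} \; \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int f''(x)^2 \, dx
```

Setting λ = 0 removes the penalty, while λ → ∞ forces f″ → 0, i.e. the straight-line least-squares fit.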

This minimization can be cast as a matrix system using the basis design matrix. I’m not going to go into this here, but you can find many papers on this. The formulation is straightforward, but it is very easy to run into numerical instabilities with naive solutions (trust me, I’ve tried), so the best bet is to use one of the tried and tested implementations (such as DeBoor’s).
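Rather than a B-spline basis solved with DeBoor’s algorithms, the sketch below uses the Whittaker smoother (discrete penalized least squares), a close discrete analog: the curvature integral is replaced by squared second differences, giving the linear system (I + λD′D)z = y. It is dense and O(n³), so it only suits short series; a banded or sparse solver is the usual fix.

```python
import numpy as np


def whittaker_smooth(y, lam):
    """Minimize ||y - z||^2 + lam * ||D z||^2, with D the second-difference operator."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second differences
    A = np.eye(n) + lam * (D.T @ D)      # normal equations of the penalized problem
    return np.linalg.solve(A, y)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    t = np.arange(300)
    y = np.sin(t / 20.0) + rng.normal(scale=0.3, size=t.size)
    for lam in (1.0, 10.0, 100.0, 1000.0):
        z = whittaker_smooth(y, lam)
        roughness = np.sum(np.diff(z, n=2) ** 2)  # discrete curvature penalty
        print(f"lambda={lam:7.1f}  roughness={roughness:.5f}")
```

Larger λ lowers the roughness term monotonically, which is exactly the knob that needs choosing.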

Ok, the problem with the above is that the parameter λ is a free variable (i.e. an input into the minimization).   λ allows us to control the degree of curvature or oscillating behavior.   Here is the same series with 4 different levels of λ (underlier in black):

Flexibility is great.  Now how do I choose λ appropriately?   And how do I define appropriate?

Criteria
As mentioned above, an incorrect choice of regression parameters results in a regressor that is either too noisy or misses features.

Now, before I explain the criteria (heuristics, really) that I came up with, let me point to some literature tackling the general concept: Tatyana Krivobokova, Ciprian M. Crainiceanu, and Goran Kauermann, “Fast Adaptive Penalized Splines” (2007). Their approach produces an evolving λ, one for each of the truncated basis functions through time, chosen so as to reduce the local error while keeping enough error to be optimally cross-validated.

Though the above is interesting, and indeed produces some amazing results for certain data sets, the “smoothness criteria” are fundamentally different from what I am looking for.

I settled on the following criteria:

  1. the amplitudes between min/maxima in the spline must meet some minimum amplitude-time
  2. the energy of the spline must be “close” to the energy of the original series

The rationale for the first point is that we do not want small oscillations in the spline (their presence signals that we need to tune for less noise). The second point tunes in the other direction: if the spline is too stiff and misses many features, its energy will be too low relative to the original series.

Algorithm
The two above criteria break down into:

  1. the integral between a maximum and the adjacent minimum ≥ threshold
  2. the integral of f(x)², where f(x) is the spline, is close to the same integral for the original series
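A sketch of the energy check for the second criterion. What counts as “close” is not specified, so the 0.8 floor below is a hypothetical choice, as is demeaning both series first so the price level does not dominate the ratio; energy is taken as the discrete sum of squares.

```python
import numpy as np


def energy_ratio(spline_values, series, demean=True):
    """Ratio of discrete energy sum(f^2) of the smooth to that of the original series."""
    f = np.asarray(spline_values, dtype=float)
    y = np.asarray(series, dtype=float)
    if demean:  # assumption: compare fluctuations, not the price level
        f = f - f.mean()
        y = y - y.mean()
    return float(np.sum(f**2) / np.sum(y**2))


def energy_close_enough(spline_values, series, min_ratio=0.8):
    """Hypothetical pass/fail check: a too-stiff smooth loses too much energy."""
    return energy_ratio(spline_values, series) >= min_ratio
```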

As I did not see an easy way of building these into a system of equations, I took the “poor man’s algorithm” approach, namely:

  1. binary-style search between low and high values for λ
  2. if amplitude/area < threshold, choose a higher λ, else a lower one
  3. repeat until some granularity is reached

Works well!
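A sketch of the search under several assumptions: the Whittaker smoother stands in for a true penalized spline, the “area” of a swing is the discrete integral of |f(t) − f(t_start)| between consecutive extrema, and the search runs on log λ with a geometric midpoint. Only the first (amplitude/area) criterion is applied, as in the steps above; the energy check could be added as a second condition inside ok. All names and default bounds are hypothetical.

```python
import numpy as np


def whittaker_smooth(y, lam):
    """Discrete penalized least squares: solve (I + lam * D'D) z = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)


def min_swing_area(z):
    """Smallest area swept between consecutive interior extrema of z.

    Area of a swing from extremum a to extremum b: sum of |z[k] - z[a]| for
    k in [a, b] (unit sample spacing). Returns inf if there is no complete swing.
    """
    z = np.asarray(z, dtype=float)
    d = np.diff(z)
    ext = [i for i in range(1, z.size - 1) if d[i - 1] * d[i] < 0]
    if len(ext) < 2:
        return np.inf
    return min(np.abs(z[a:b + 1] - z[a]).sum() for a, b in zip(ext, ext[1:]))


def choose_lambda(y, threshold, lam_lo=1e-2, lam_hi=1e7, rel_tol=0.01):
    """Binary-style search on log(lambda) for a small lambda meeting the threshold."""

    def ok(lam):
        return min_swing_area(whittaker_smooth(y, lam)) >= threshold

    if ok(lam_lo):
        return lam_lo
    if not ok(lam_hi):
        raise ValueError("lam_hi is not stiff enough to meet the threshold")
    while lam_hi / lam_lo > 1.0 + rel_tol:
        mid = np.sqrt(lam_lo * lam_hi)  # geometric midpoint
        if ok(mid):
            lam_hi = mid  # swings large enough: try a lower lambda
        else:
            lam_lo = mid  # swings too small: need a higher lambda
    return lam_hi


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    t = np.arange(200)
    y = 0.05 * t + np.sin(2 * np.pi * t / 50) + rng.normal(scale=0.3, size=t.size)
    lam = choose_lambda(y, threshold=5.0)
    print(f"chosen lambda: {lam:.3g}")
```

Because lam_hi is only ever replaced by a value that passes the check, the returned λ always satisfies the threshold, even if the criterion is not perfectly monotone in λ.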

Filed under strategies

Commissions

It is quite frustrating that commission arrangements are so opaque. Basically, professional trading firms have to negotiate with the venues (if they are big enough to go direct) or with their prime broker.

I was previously on the “sell side”, so I have a pretty good idea of what commissions were in the FX & Rates markets. I don’t have much idea about this on the equity side, though.

I have a new strategy in the equities markets and am trying to determine what sort of commissions would be involved. It would be good to know what the sell side and hedge funds “see” in terms of fees as a point of reference. What sort of upside would one have in terms of commissions should one structure as a well-capitalized fund?

So I did a bit of digging on the web.   The only information out there that I can find is for retail.   On the retail side (I use IB at the moment):

I’ve run across a number of articles like this, indicating commission costs of ~ 2 cents / share for institutional trading (this must be a typo or I am misunderstanding the article).

I’m sure fees on the equity venues are much less than the bid/ask spread (which for liquid issues is 1 cent or less). In fact, some venues provide rebates (pay you) if you are providing liquidity rather than aggressing. Now, for a buy-side firm with a prime-brokerage arrangement, whose goal is not equities market making, the costs must be quite a bit higher than going direct. I would guess they are still less than 1/100th of retail.

In any case, I am trying to get a better handle on what the true costs are for different situations. If anyone has some indicative #s for commissions on the buy side, it would be appreciated.

Filed under strategies