Given the complexity of the markets, it is easy to fall into the trap of overfitting when building a parametric model. There are a variety of approaches to avoiding this, some heuristic and others theoretical, such as:

- cross-validation (in-sample, out-of-sample)
- likelihood weighted information criteria

I want to briefly look at information criteria today. The Kullback-Leibler divergence is probably the seminal work in this area. Since then there have been a number of measures such as the Akaike Information Criterion (AIC), Hannan-Quinn, etc. More recently (well, 2000) Hamparsum Bozdogan developed another measure (ICOMP) which has more appeal for me in terms of what it captures. Conceptually the measure weighs the following:

- likelihood of the model given the parameters (lack or degree of fit)
- the complexity of model parameters (lack of parsimony)
- the complexity of model errors (profusion of complexity)

A simplified form of the measure is as follows:

ICOMP = -2 log L(θ̂) + 2 [ C(Σθ) + C(Σε) ]

where Σθ and Σε are the parameter and residual covariance matrices respectively, and C(·) is the complexity measure built from the λ’s, the eigenvalues of each respective matrix.
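To make the complexity term concrete, here is a minimal Python sketch (the function name `c1_complexity` is my own, not from the post) that computes C(Σ) from the eigenvalues of a covariance matrix — the log of the ratio of the arithmetic to the geometric mean of the eigenvalues, scaled by p/2 — and confirms that it vanishes at the identity:

```python
import numpy as np

def c1_complexity(sigma: np.ndarray) -> float:
    """C(Sigma) = (p/2) * log( mean(lambda) / geomean(lambda) )
    where lambda are the eigenvalues of the covariance matrix."""
    lam = np.linalg.eigvalsh(sigma)        # eigenvalues of a symmetric matrix
    p = len(lam)
    arith = lam.mean()
    geom = np.exp(np.log(lam).mean())      # geometric mean via logs, for stability
    return 0.5 * p * np.log(arith / geom)

# At the identity, all eigenvalues are 1, so AM == GM and the complexity is 0
print(c1_complexity(np.eye(4)))            # 0.0

# Correlation spreads the eigenvalues apart, so the complexity grows
corr = np.array([[1.0, 0.9],
                 [0.9, 1.0]])
print(c1_complexity(corr) > 0)             # True
```

The AM/GM ratio equals 1 exactly when all eigenvalues are equal, which for a covariance matrix normalized this way happens only at (a multiple of) the identity — matching the Σ → I property quoted below.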

We look to minimize the above function, effectively maximizing likelihood against the countervailing complexity penalties.

The measure of complexity is explained as follows in his paper:

Complexity of a system (of any type) is a measure of the degree of interdependency between the whole system and a simple enumerative composition of its subsystems or parts.

The contribution of the complexity of the model covariance structure is that it provides a numerical measure to assess parameter redundancy and stability uniquely, all in one measure. When the parameters are stable, this implies that the covariance matrix should be approximately a diagonal matrix. In general, large values of complexity indicate a high interaction between the variables, and a low degree of complexity represents less interaction between the variables. The minimum of Complexity(Σ) corresponds to the least complex structure. In other words:

Complexity(Σ) → 0 as Σ → I

This establishes a plausible relation between information-theoretic complexity and computational effort. Furthermore, what this means is that the identity matrix is the least complex matrix. To put it in statistical terms, orthogonal designs, or linear models with no collinearity, are the least complex, or most informative, and the identity matrix is the only matrix for which the complexity vanishes. Otherwise, Complexity(Σ) > 0, necessarily.

Why bother? Well, the most commonly used criterion (the AIC) does not adequately capture complexity and is known to be biased for some model systems. The ICOMP approach also has greater intuitive appeal.

Thanks! Any RSS available for your posts?

I think all wordpress blogs have rss feeds. I tried https://tr8dr.wordpress.com/rss and it directed me to https://tr8dr.wordpress.com/feed/rss/

Hi, I’ve been poring over some of your thoughts, and have to say I’m fascinated by many of your observations and the breadth of your interests.

I’d be very interested if you could show an example:

for instance, fitting an ARIMA model to an equity time series, where this complexity measure would provide a more robust fit than, say, AIC. Have you done any practical comparisons to see which gave the better generalization (in-sample, and hopefully on some test data as well)?

Thanks and look forward to reading more of your ideas.

Hi, just saw your blog, looks nice, adding it to my list. With regard to this penalty approach versus AIC, I have actually applied it to VECM estimation. I found that for VECM estimation AIC consistently under-penalized, leading to a model with more parameters; I had reason to believe the parameterization (in this case the lag) should be smaller. The ICOMP approach tightened the estimate.

Not sure if I’ll have time to put it together, but I may post an addendum detailing a specific application. In short, for a linear system Y = X B + E, where Y (n×p), X (n×q), B (q×p), E (n×p), ICOMP reduces to:

np log(2π) + n log |Σ| + np + 2 [ (n+q) C(Σ) + p C(inv (X’X)) ]

where:

C(Σ) = (p/2) log( (1/p Sum[λ]) / Product[λ]^(1/p) )

and Σ is the covariance matrix of your model residuals, with λ its eigenvalues.
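As a sketch of how that linear-system expression might be computed — the helper names `icomp_linear` and `c1_complexity` are mine, and the least-squares fit of B is a standard assumption rather than something spelled out in the comment:

```python
import numpy as np

def c1_complexity(sigma):
    """C(Sigma) = (p/2) * log( mean(lambda) / geomean(lambda) )."""
    lam = np.linalg.eigvalsh(sigma)
    return 0.5 * len(lam) * np.log(lam.mean() / np.exp(np.log(lam).mean()))

def icomp_linear(Y, X):
    """ICOMP for Y = X B + E, per the expression above:
    np log(2 pi) + n log|Sigma| + np + 2 [ (n+q) C(Sigma) + p C(inv(X'X)) ]."""
    n, p = Y.shape
    q = X.shape[1]
    B = np.linalg.lstsq(X, Y, rcond=None)[0]   # least-squares estimate (q x p)
    E = Y - X @ B                              # residuals (n x p)
    sigma = (E.T @ E) / n                      # ML residual covariance (p x p)
    xtx_inv = np.linalg.inv(X.T @ X)
    _, logdet = np.linalg.slogdet(sigma)
    return (n * p * np.log(2 * np.pi) + n * logdet + n * p
            + 2 * ((n + q) * c1_complexity(sigma)
                   + p * c1_complexity(xtx_inv)))

# Illustrative data only: 3 regressors, 2 response columns
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
B_true = np.array([[1.0, 0.2], [0.5, -0.3], [-0.25, 0.4]])
Y = X @ B_true + 0.1 * rng.normal(size=(200, 2))
print(icomp_linear(Y, X))
```

As with AIC, one would compute this for each candidate parameterization (e.g. each lag order) and pick the minimum.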

Why stick with a single model when the model building purpose is prediction?

Consider Bayesian Model Averaging to get optimal forecasts, or any other version of the idea. Finance people are starting to use it, from what I hear…
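One lightweight approximation in this spirit is to convert information-criterion values into model weights via w_i ∝ exp(-Δ_i/2) (Akaike-style weights, a common stand-in for BMA posterior model probabilities). A minimal sketch — the function name and the example numbers are purely illustrative:

```python
import numpy as np

def ic_weights(ic_values):
    """Akaike-style model weights: w_i proportional to exp(-(IC_i - min IC)/2)."""
    ic = np.asarray(ic_values, dtype=float)
    delta = ic - ic.min()          # differences from the best (lowest) IC
    w = np.exp(-0.5 * delta)
    return w / w.sum()             # normalize to a probability vector

# Combined forecast = weighted average of each candidate model's forecast
weights = ic_weights([1020.3, 1018.1, 1025.7])   # hypothetical IC values
forecasts = np.array([0.012, 0.015, 0.009])      # hypothetical model forecasts
print(weights @ forecasts)
```

The same weighting works with ICOMP values in place of AIC, since only differences from the minimum matter.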