Model Parsimony

Given the complexity of markets, it is easy to fall into the trap of overfitting when building a parametric model. There are a variety of approaches to avoiding this, some heuristic and others theoretical, such as:

  1. cross-validation (in-sample, out-of-sample)
  2. likelihood-weighted information criteria

I want to briefly look at information criteria today. The Kullback-Leibler divergence is probably the seminal work in this area. Since then there have been a number of measures, such as the Akaike Information Criterion, Hannan-Quinn, etc. More recently (well, 2000) Hamparsum Bozdogan developed another measure (ICOMP), which has more appeal for me in terms of what it captures. Conceptually the measure weighs the following:

  1. likelihood of the model given the parameters (lack or degree of fit)
  2. the complexity of model parameters (lack of parsimony)
  3. the complexity of model errors (profusion of complexity)

A simplified form of the measure is as follows:

where Σθ and Σε are the parameter and residual covariance matrices, respectively, and the λ’s are the eigenvalues of each respective matrix.
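One form consistent with the three components listed above is sketched here. It assumes Bozdogan's first-order complexity C_1 and equal weights on the two complexity terms; both are assumptions for illustration, not necessarily the exact expression the post displayed.

```latex
\mathrm{ICOMP} = -2 \log L(\hat{\theta})
               + 2\, C_1(\Sigma_\theta)
               + 2\, C_1(\Sigma_\varepsilon),
\qquad
C_1(\Sigma) = \frac{s}{2} \log \frac{\bar{\lambda}_a}{\bar{\lambda}_g},
\qquad
\bar{\lambda}_a = \frac{1}{s} \sum_{i=1}^{s} \lambda_i,
\quad
\bar{\lambda}_g = \Bigl( \prod_{i=1}^{s} \lambda_i \Bigr)^{1/s}
```

Here s is the dimension (rank) of Σ, so C_1 compares the arithmetic and geometric means of the eigenvalues of each covariance matrix.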

We look to minimize the above function, effectively maximizing likelihood against the countervailing complexity measures.

His paper explains the measure of complexity as follows:

Complexity of a system (of any type) is a measure of the degree of interdependency between the whole system and a simple enumerative composition of its subsystems or parts.

The contribution of the complexity of the model covariance structure is that it provides a numerical measure to assess parameter redundancy and stability uniquely all in one measure. When the parameters are stable, this implies that the covariance matrix should be approximately a diagonal matrix.

In general, large values of complexity indicate a high interaction between the variables, and a low degree of complexity represents less interaction between the variables. The minimum of Complexity(Σ) corresponds to the least complex structure. In other words:

Complexity(Σ) → 0 as Σ → I

This establishes a plausible relation between information-theoretic complexity and computational effort. Furthermore, what this means is that the identity matrix is the least complex matrix. To put it in statistical terms, orthogonal designs, or linear models with no colinearity, are the least complex, or most informative, and the identity matrix is the only matrix for which the complexity vanishes. Otherwise, Complexity(Σ) > 0, necessarily.
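The limiting behaviour quoted above can be checked numerically. The sketch below assumes Bozdogan's first-order form, C1(Σ) = (s/2) log(arithmetic mean of eigenvalues / geometric mean of eigenvalues); the function name `c1_complexity` is hypothetical, and the paper defines other variants that this does not cover.

```python
import numpy as np


def c1_complexity(sigma):
    """First-order complexity of a symmetric positive-definite matrix.

    Assumed form: (s / 2) * log(arithmetic mean / geometric mean of eigenvalues).
    """
    sigma = np.asarray(sigma, dtype=float)
    lam = np.linalg.eigvalsh(sigma)  # eigenvalues of a symmetric matrix
    if np.any(lam <= 0):
        raise ValueError("matrix must be positive definite")
    s = lam.size
    # log(arithmetic mean) - log(geometric mean), scaled by s / 2
    return 0.5 * s * (np.log(lam.mean()) - np.log(lam).mean())


identity = np.eye(3)
correlated = np.array([[1.0, 0.9],
                       [0.9, 1.0]])

print(c1_complexity(identity))    # approximately 0
print(c1_complexity(correlated))  # about 0.83
```

Under this particular form the value is zero for the identity (and for any scalar multiple of it), and it grows as the correlation between variables increases.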

Why bother? Well, the most commonly used criterion (the AIC) does not adequately capture complexity and is known to be biased for some model systems. The ICOMP approach also has greater intuitive appeal.
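For reference, AIC in its standard form penalizes only the number of estimated parameters k:

```latex
\mathrm{AIC} = -2 \log L(\hat{\theta}) + 2k
```

Two models with the same k and the same maximized likelihood therefore receive the same AIC, regardless of how strongly their parameter estimates or residuals are correlated.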




5 responses to “Model Parsimony”

  1. Spyrous

    Thanks! Any RSS available for your posts?

  2. Hi, I’ve been poring over some of your thoughts, and I have to say I’m fascinated by many of your observations and the breadth of your interests.

    I’d be very interested if you could show an example:
    for instance, fitting an ARIMA model to an equity time series, where this complexity measure provides a more robust fit than, say, AIC. Have you done any practical comparisons to see which gave the better generalization (in sample, and hopefully on some test data as well)?

    Thanks and look forward to reading more of your ideas.

    • tr8dr

      Hi, just saw your blog, looks nice, adding it to my list. With regard to this penalty approach versus AIC, I have actually applied this to VECM estimation. I found that for VECM estimation AIC consistently under-penalized, leading to a model with more parameters. I had reason to believe that the parameterization (in this case the lag) should be smaller. The ICOMP approach tightened the estimate.

      Not sure if I’ll have time to put one together, but I may add an addendum detailing a specific application. In short, for a linear system Y = X B + E, where Y is (n x p), X is (n x q), B is (q x p), and E is (n x p), the criterion reduces to:

      np log(2π) + n log |Σ| + np + 2 [ (n+q) C(Σ) + p C(inv(X’X)) ]

      where

      C(Σ) = (1/p) Sum[λ] / Product[λ]^(1/p)

      and Σ is the covariance matrix of your model residuals.
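The expression in this reply can be sketched in code as follows. This is a minimal illustration, not the author's implementation: the function names `eig_ratio_complexity` and `icomp_linear` are hypothetical, B is estimated by ordinary least squares, Σ is taken to be the maximum-likelihood residual covariance E'E / n, and C is implemented exactly as written (arithmetic over geometric mean of the eigenvalues), using each matrix's own dimension in place of p. Note that C written this way equals 1, not 0, for the identity matrix.

```python
import numpy as np


def eig_ratio_complexity(sigma):
    """C as written in the reply: mean(eigenvalues) / geometric mean(eigenvalues)."""
    lam = np.linalg.eigvalsh(np.asarray(sigma, dtype=float))
    if np.any(lam <= 0):
        raise ValueError("matrix must be positive definite")
    return lam.mean() / np.exp(np.log(lam).mean())


def icomp_linear(Y, X):
    """Criterion for Y = X B + E following the formula in the reply.

    Assumptions: B is fitted by least squares and Sigma = E'E / n.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]  # treat a vector response as n x 1
    n, p = Y.shape
    q = X.shape[1]

    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # q x p coefficient matrix
    E = Y - X @ B                              # n x p residuals
    sigma = (E.T @ E) / n                      # p x p residual covariance

    _, logdet = np.linalg.slogdet(sigma)
    lack_of_fit = n * p * np.log(2 * np.pi) + n * logdet + n * p
    penalty = 2.0 * ((n + q) * eig_ratio_complexity(sigma)
                     + p * eig_ratio_complexity(np.linalg.inv(X.T @ X)))
    return lack_of_fit + penalty


# Illustrative comparison on simulated data (made up for this sketch).
rng = np.random.default_rng(0)
n = 200
X_true = rng.normal(size=(n, 2))
Y = X_true @ np.array([[1.0, -0.5], [0.3, 2.0]]) + rng.normal(scale=0.5, size=(n, 2))
X_padded = np.hstack([X_true, rng.normal(size=(n, 3))])  # adds irrelevant regressors

print("true regressors only:", icomp_linear(Y, X_true))
print("with 3 noise columns:", icomp_linear(Y, X_padded))
```

The candidate with the smaller value would be preferred; how the two compare depends on the simulated draw, so no particular outcome is claimed here.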

  3. Why stick with a single model when the model building purpose is prediction?

    Consider Bayesian Model Averaging to get optimal forecasts or any other version of the idea. Finance people are starting to use it as I hear…
