Monthly Archives: December 2009

Equity Clusters 2

Correlations between daily returns sometimes show surprising relationships.  Looking at relationships on 5-day cumulative returns produced clusters more in line with the traditional view.

This is not surprising, as weekly returns carry less “noise” than daily returns.    A more robust measure than correlation is needed to determine whether relationships seen at daily or higher frequencies are bona fide.

Filed under strategies

Equity Clusters

I am putting together some portfolios to be auto-traded using a dynamic portfolio asset allocation algo.   I had put together a maximum spanning tree (shown in a previous posting) to observe relationships between securities.

In this iteration I have gone further:

  1. Heatmap colors to indicate average volatility levels for a given security relative to others
  2. Maximum spanning tree clusters to reveal which diversification group a given asset belongs to
  3. Edge thickness indicates strength of relationship

I am interested in both the diversity and the volatility profile of the asset pool.   I will pick the majority of assets from the set with mid-range volatility (oranges), as opposed to low vol (reds), and high vol (yellows).    Classifying the asset set into clusters based on correlations provides an automated way of observing the diversification group the asset belongs to.

Algorithm
The algorithm is loosely as follows:

  1. calculate lower-triangular correlation matrix of returns for, say, S&P 500 stocks
  2. sort in descending order by correlation
  3. set up graph structure
  4. loop through correlations selecting pairs of assets
    1. if neither in graph add as new cluster pair
    2. if one in graph and other not, attach new to existing
    3. if both in graph but size of clusters < min cluster size, merge clusters
  5. repeat until all assets accounted for and all clusters have size >= min cluster size
  6. annotate & plot
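
A minimal sketch of steps 1 to 5 above, using only the Python standard library. The function names, the dict-of-lists input shape, and the synthetic demo data are assumptions made for illustration, and the merge rule is one reading of step 4.3 (merge whenever either of the two clusters is still below the minimum size). Plotting and annotation are left out.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation of two equal-length return series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_clusters(returns, min_size=3):
    """Greedy clustering over asset pairs sorted by descending correlation.

    returns: dict mapping asset name -> list of returns (assumed input shape).
    Returns (clusters, edges); edges are the pairs that were added to the graph.
    """
    pairs = sorted(
        ((pearson(returns[a], returns[b]), a, b)
         for a, b in combinations(sorted(returns), 2)),
        reverse=True,
    )
    cluster_of, members, edges, next_id = {}, {}, [], 0
    for rho, a, b in pairs:
        in_a, in_b = a in cluster_of, b in cluster_of
        if not in_a and not in_b:            # neither in graph: new cluster pair
            cluster_of[a] = cluster_of[b] = next_id
            members[next_id] = {a, b}
            next_id += 1
        elif in_a != in_b:                   # one in graph: attach the new asset
            old, new = (a, b) if in_a else (b, a)
            cluster_of[new] = cluster_of[old]
            members[cluster_of[old]].add(new)
        else:                                # both in graph
            ca, cb = cluster_of[a], cluster_of[b]
            if ca == cb or min(len(members[ca]), len(members[cb])) >= min_size:
                continue                     # nothing to merge
            for name in members[cb]:         # assumption: merge undersized clusters
                cluster_of[name] = ca
            members[ca] |= members.pop(cb)
        edges.append((a, b, rho))
        done = len(cluster_of) == len(returns)
        if done and all(len(m) >= min_size for m in members.values()):
            break
    return list(members.values()), edges


def demo_returns(seed=0, n=250):
    """Synthetic data: two groups of three assets, each tracking a common factor."""
    rng = random.Random(seed)
    base = {g: [rng.gauss(0, 1) for _ in range(n)] for g in "AB"}
    return {f"{g}{i}": [v + rng.gauss(0, 0.1) for v in base[g]]
            for g in "AB" for i in (1, 2, 3)}


clusters, edges = correlation_clusters(demo_returns(), min_size=3)
```

On the synthetic data the two factor groups come back as two clusters. On real returns the result will depend heavily on the minimum cluster size and on the return horizon.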

Clusters (daily returns)
Here are the clusters the algorithm produced:

Of course this can be applied to any asset set.  I think it is a useful visualization, though there are many other dimensions of interest.

Filed under strategies

End of a Decade

It is tempting to say that the last decade has been an interesting time to live in on Wall Street.   But that would belie the observation that the 80s and the 90s also came with dramatic changes to the markets and for practitioners.

I was not on the street during the 80s, but the 80s were really the start of the acceleration towards model- and technology-driven trading.   The 70s brought us the famous Black-Scholes model, the 80s a variety of synthetic instruments such as swaps, swaptions, CMOs, etc.    The 80s had not yet fully embraced technology, though, as the 90s would, with the ultimate step-up coming in the 2000s.

The 90s
Arguably without technology we would not have gotten much farther than basic option modeling.   The 90s saw the rise of quantitative modeling and financial application of technology unlike any time previous.   To be a quant or quantitative developer in those days was exciting and rewarding.

I remember joining Lehman Brothers in the early 90s.  Swaps and swaptions, though about 10 years old at that point, were still traded on the back of HP calculators or maybe Lotus 1-2-3 spreadsheets.   Spreads were wide, in the tens of basis points; today they are fractions of a basis point.

With the full embrace of technology in investment banks it was a matter of time before we saw an explosion in complexity in exotic derivatives.   Over time many of these exotics would become mainstream “vanilla” products.   The interest rate markets moved in the direction of more volume in vanillas and more complexity in exotics.

Of course the equity markets also built out derivative products, but more interestingly they were “quietly” building increasing sophistication on the “program trading” front (as it was called in those days).   This was mostly unique to the equities markets, whereas the interest rate and FX markets continued to be largely OTC.

Program Trading
Program trading is the grandfather of all we know as “Algo Trading” today.    The NYSE introduced DOT in the 70s (followed by SuperDOT in the 80s) as a means to provide automated clearing and semi-automated execution (order routing) to the manned floor.

Program trading facilitated basic execution algos, index arbitrage, and proprietary strategies.    Index arbitrage involved buying or selling index futures against executing a basket of the same or similar components.    The game then, as now, was speed.

In some of the foreign equity markets electronic execution was prohibited, throttled, or limited in various ways.   For instance I remember that the Japanese MOF would not allow electronic execution to the exchange or at least required keying of orders by humans placed in Osaka, perhaps partially to protect such jobs, but maybe also to protect companies with less sophistication.

Morgan Stanley, Goldman, and Lehman had each, individually, hacked the serial lines from such terminals so as to simulate typing orders.   Technically they employed people at the exchange to lend some credibility to the idea that they were following the rules, but it was pretty much common knowledge that this was going on.   They just happened to have very fast typists ;)

In the 90s the “equity guys” pioneered many of the execution strategies that we use today, on the back of the growing program trading business.

The 90s (and perhaps late 80s) saw a number of (now) well known hedge funds spawn from this environment such as D.E. Shaw, Citadel, Renaissance Technologies, etc.

The 2000s
Thinking about it, the 2000s encompass a period with a number of large failures in the market:

  • the end of the internet bubble
  • the explosion and implosion of credit markets
  • the failure of major Wall Street firms and life-support for the designated survivors

Much has been written on the above, so I would like to focus on innovation and future direction.     Here are some thoughts on what has characterized the last 10 years:

  1. commoditization of derivatives
  2. program trading (now “algo trading”) crossover into other asset classes
  3. automated market making
  4. automated trading via statistical or rule driven strategies
  5. faster, faster, faster;)

What I really read from the above is that the days of the instinct driven prop-trader or market maker are numbered.   A raft of traders is being replaced by teams of quants / traders, usually with more background on the quant / CS side than time spent trading.

The new setup is often a group of quant / dev / traders that develop trading strategies and a much smaller number of “execution traders” that manage the strategies day to day.    Now *that* is really exciting, but perhaps shows my bias being in the former group.

I think the big days of derivatives are over (mostly).   Regulations and standardization will push for more commoditization, automated clearing, and eventual exchanges.   The new areas of innovation in the medium term for derivatives need to be in risk management IMO.

The Next Decade
It would be hard to predict the next decade in the markets given the rapid pace and surprises of the last 3 decades.    The markets are a function not only of the practitioners but of global events, regulation, governments, and seen or unforeseen technological advancements.

Here are some predictions:

  1. quantum computing goes mainstream in automated trading
  2. fewer traders more quants
  3. algo dominates all asset classes
  4. hard to find job as derivatives quant

My strategies are market ambivalent; however, there are some changes in progress:

  1. US$ devaluation, possible move to another cross currency (maybe in 20 yrs)
  2. Dominance of China in market and world politics

Wishful thinking:

  1. Dramatic changes on Wall Street in terms of Risk Practices
  2. Wall Street takes long term investment focus
  3. Transportation economy largely moves to electric by end of decade (more like 20yrs).  End of oil domination.
  4. Reset of compensation and fees on Wall Street; Incentivize more CS / Physics / Mathematicians to do one and the same ;)

Beyond
We note that the street is moving increasingly into, say, automated market making.   Now firms engage in this for their own benefit, but it serves an important function in providing liquidity to the market, a useful function beyond speculation.

A real AI will be achieved, but doubtful in the next couple of decades.  It may take another 50 years or more to achieve.  I suspect the first AI will be achieved as a neural mapping of a human brain to a machine.   Whether it is achieved this way or through an evolution of our machine learning and knowledge algorithms, accelerated with quantum computing, is immaterial.

Such a development will have dramatic consequences not only for mankind but also for the markets.

The financial services as they stand eventually will be coordinated by an AI.   An AI will be developed by a research organization and be deployed at some point in the market.  More than one may be deployed, at which point the game will become so tight that in the end the AIs will effectively be running the market.

Beyond market making, broader responsibilities could be given to manage money supplies, handle capital allocation / investment (now via loans, public stock offerings, etc).    Anything a human can do could eventually be done more efficiently and at orders of magnitude lower cost by an AI.

This would effectively put a whole industry out of business.   Would it not be better for great minds to be deployed in the pursuit of knowledge, sciences, etc?    Today it is hard to make a living that way.   I can only hope that there would be more balance in the direction of the sciences and progressive commerce.

Filed under strategies

Asset Selection

I have a portfolio-based strategy that uses a probabilistic model to determine the “optimal” portfolio allocation vector for each trading period.    This can be applied intra-day, daily, or across longer ranges of time. The dynamics and composition of the portfolio may change depending on holding period.

The algorithm uses an online learning and optimization approach.   Because this is computationally expensive, I want to do some up-front analysis to determine the composition of the portfolio, i.e. what set of assets will provide a good degree of balance and information for weighting decisions.

So for instance, if we chose the S&P 500 stocks as the pool from which to select, which of these stocks should be selected?   There are many approaches to this and it really depends on the characteristics of your strategy.   A “buy & hold” style mutual fund manager is going to pick based on fundamentals, with a view to growth over a medium-to-long-term period and to keeping portfolio volatility within some limit.

In the case of this strategy, I am market neutral and look to have both appreciating and declining assets (provided there is sufficient liquidity).   However, as with buy & hold style portfolio managers I am also looking to provide a certain amount of diversification and balance in the portfolio.

Maximum Spanning Trees
A nice way to look at the basic relationships amongst assets is via a maximum spanning tree on the correlations between assets.   The idea is that such a graph spans all assets but each asset connects to only its strongest relationship.

One starts with a “seed” asset, determines its strongest relationship with another asset from the set, and traverses recursively.   Additionally, one only includes assets with relationships above a given correlation threshold, weeding out assets with weak relationships.
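
The seed-and-grow procedure can be sketched as a Prim-style traversal. This is a minimal illustration, not the code behind the charts: the function names and the nested-dict correlation layout are assumptions, the correlation values in the demo are made up, and the threshold is applied to absolute correlation.

```python
def symmetric(pairs):
    """Build a symmetric lookup corr[a][b] from (a, b, rho) triples."""
    corr = {}
    for a, b, rho in pairs:
        corr.setdefault(a, {})[b] = rho
        corr.setdefault(b, {})[a] = rho
    return corr


def max_spanning_tree(corr, seed, threshold=0.4):
    """Grow a tree from `seed`, always adding the strongest remaining link.

    Links with |rho| below `threshold` are ignored, so weakly related
    assets never join the tree. Returns edges as (in_tree, new, rho).
    """
    in_tree = {seed}
    edges = []
    while True:
        best = None
        for a in in_tree:
            for b, rho in corr[a].items():
                if b in in_tree or abs(rho) < threshold:
                    continue
                if best is None or abs(rho) > abs(best[2]):
                    best = (a, b, rho)
        if best is None:          # nothing left above the threshold
            break
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Made-up correlations, purely to exercise the function.
demo_corr = symmetric([
    ("EUR", "CHF", 0.9), ("EUR", "GBP", 0.7), ("CHF", "GBP", 0.6),
    ("EUR", "BRL", 0.2), ("CHF", "BRL", 0.1), ("GBP", "BRL", 0.3),
])
tree = max_spanning_tree(demo_corr, seed="EUR", threshold=0.4)
```

With the threshold in place the result spans only the assets reachable from the seed through sufficiently strong links, so in the demo BRL is left out.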

10 Years in FX
With a small number of assets, a full graph is reasonably intelligible, though it carries less information than a spanning tree in terms of relationship strength.

Here is a spanning tree showing the relationships for a selection of currencies on daily returns (with a correlation threshold of 0.4).  In the case of daily returns one does not see strong connections between European currencies and South American currencies:

The same analysis on weekly cumulative returns shows stronger relationships:

10 Years of the S&P 500
Here are the spanning relationships for the daily returns on S&P 500 stocks with absolute correlations > 0.7:


Filed under strategies

Time Dilation

Many measures work best in a homoscedastic volatility regime.   This is not a big secret.    Most regressors, the simplest of which are the ever popular moving averages, are especially biased in the context of a heteroscedastic series.

Probably the best way of normalizing a heteroscedastic series into one with near constant variance is to observe the following.   If we assume our process is roughly an SDE with normally distributed innovations (or alternatively a Hurst exponent close to 1/2), we know that:

As a rough measure, we can remove much of the vol of vol by scaling our time axis in proportion to the variance.   I use a duration based local volatility measure with smoothing or alternatively for daily data an EWMA based evaluation of:

We can then change measure:

where ψ(t) is a smoothing / scaling function.   An example of such a scaling (with the red curve in the upper pane indicating the degree of scale from the baseline):
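
To make the idea of scaling the time axis in proportion to variance concrete, here is a rough sketch for daily data. It assumes a plain EWMA of squared returns as the local variance and advances a "dilated clock" by variance divided by average variance; the function names, the value of alpha, and the normalisation are illustrative assumptions and stand in for, rather than reproduce, the ψ(t) used above.

```python
def ewma_variance(returns, alpha=0.06):
    """EWMA of squared returns; alpha is an assumed smoothing weight."""
    var = returns[0] ** 2
    out = []
    for r in returns:
        var = alpha * r * r + (1 - alpha) * var
        out.append(var)
    return out


def dilated_clock(returns, alpha=0.06):
    """Cumulative time where each observation advances the clock by
    (local variance / average variance) instead of by one unit."""
    var = ewma_variance(returns, alpha)
    mean_var = sum(var) / len(var)
    if mean_var == 0:
        raise ValueError("series has no variance to scale by")
    clock, t = [], 0.0
    for v in var:
        t += v / mean_var
        clock.append(t)
    return clock


# Quiet regime followed by a regime with ten times the volatility.
demo = [0.001, -0.001] * 50 + [0.01, -0.01] * 50
clock = dilated_clock(demo)
```

Resampling the series at equal steps of this clock takes more samples from volatile stretches and fewer from quiet ones, which is what pushes the resampled series towards constant variance. In the demo almost all of the dilated time elapses during the second, louder half.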

Filed under technical-analysis, volatility

CRP = CR*P?

Procrastinating on a hard problem, I decided to take a brief diversion to look at Constant Rebalanced Portfolios and Universal Portfolios, lured by Max Dama’s post on UP (Universal Portfolios).   I had read papers on these in the past but never explored them empirically.

CRP is the underpinning of Universal Portfolios, so I will focus on CRPs.   Simply stated, the CRP approach allocates a fixed % of capital to each asset in the portfolio.   The portfolio is rebalanced each period, as some assets will have disproportionately increased or decreased in value.   As Max points out, this is basically a mean-reversion scheme.

Unfortunately, this is a “blind” mean-reversion scheme in the sense that there is no measure of the likelihood of mean-reversion or the period over which it will take place.   The implicit assumption is that mean-reversion will occur between rebalancing periods.    The more worrying aspect is that money in “winning” assets will be diverted to “losing” assets (where by “losing” I refer to assets that trend downward with little MR in the upward direction).

Examples
The classic (and absurd) example where CRP does phenomenally well is of a pair of assets where one asset is constant and the other appreciates and depreciates on alternating periods (in this case up 25% and then back down to its prior level, repeatedly).   Needless to say, this provides exponential growth (I could only plot the first 100 days without obscuring the other detail):
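
The arithmetic behind this example can be checked with a short sketch. It assumes a 50/50 portfolio of a constant asset and an asset that gains 25% and then returns to its previous price (a 20% fall), with no transaction costs; the function names and weights are illustrative. Note that a symmetric +25% / -25% pair would not work: each cycle would multiply a 50/50 CRP by 1.125 × 0.875 ≈ 0.984.

```python
def crp_wealth(price_relatives, weights):
    """Wealth path of a constant rebalanced portfolio.

    price_relatives: one tuple of gross returns per period (1.25 = +25%).
    weights: fixed fractions restored at the start of every period.
    """
    wealth, path = 1.0, [1.0]
    for x in price_relatives:
        wealth *= sum(w * xi for w, xi in zip(weights, x))
        path.append(wealth)
    return path


def buy_and_hold_wealth(price_relatives, weights):
    """Wealth path when the initial allocation is never rebalanced."""
    holdings = list(weights)
    path = [sum(holdings)]
    for x in price_relatives:
        holdings = [h * xi for h, xi in zip(holdings, x)]
        path.append(sum(holdings))
    return path


# Constant asset plus an asset alternating +25% and back to its prior price.
periods = [(1.0, 1.25), (1.0, 0.8)] * 50
crp = crp_wealth(periods, (0.5, 0.5))
bah = buy_and_hold_wealth(periods, (0.5, 0.5))

# Symmetric +25% / -25% moves for comparison: the CRP shrinks.
crp_symmetric = crp_wealth([(1.0, 1.25), (1.0, 0.75)] * 50, (0.5, 0.5))
```

Each two-period cycle multiplies the CRP by 1.125 × 0.9 = 1.0125 while buy & hold ends where it started, which is the whole effect: the growth comes purely from rebalancing into an asset that is guaranteed to revert.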

Empirical Tests
I did not find that CRP did much better than the equivalent Buy & Hold portfolio with the same weightings.  Indeed, depending on transaction costs, it could do significantly worse.     There are undoubtedly some asset sets that would do much better than Buy & Hold over specific periods, but they would be few and far between given the fragility of the CRP assumptions.

Making the Concept Work
The CRP concept is one of moving money away from assets we expect will mean-revert (in the negative direction) and increasing money in assets that will mean-revert (in the positive direction).   This is done in a rigid fashion and makes no observation as to whether a devaluing asset is likely to mean-revert in the next period.

It had occurred to me that modifying the rebalancing to take into account the likelihood of mean-reversion or more generally, the likelihood of appreciation or depreciation in the next period would be a better guide in rebalancing the portfolio.    Of course, the performance of such a scheme depends on the degree of accuracy of such measures.

Not surprisingly, there has been work in this area, for instance the ANTICOR algorithm described by Borodin, El-Yaniv, and Gogan in “Can We Learn to Beat the Best Stock”.    Their approach is to use the autocorrelation and cross-correlation of prior periods to adjust the portfolio weighting in a CRP portfolio.

The fragility in this approach is two-fold:

  1. relies on fixed windows as a means to determine correlation or anti-correlation
    I would expect that mean-reversion cycle periods would differ for different assets.   That said, they need a way to compare across assets, so may be a reasonable compromise.
  2. correlation is a coarse measure
    There are other measures that may be more effective in determining future mean-reversion or direction.

That said, the approach is parsimonious and seems to have performed quite well in the empirical tests.

Pattern / Sequence Learning Approaches
There have been a number of papers describing approaches where the choice of weightings for the portfolio in the next period is determined by finding prior patterns in past data that match the current local pattern.   That is, the prior K returns are converted to a series of symbols and compared to historical sequences.   One attempts to locate the approximate sequence one or more times in past history and observe the optimal portfolio weights that maximized cumulative return in the past.    This is repeated with varying length and discretization “fuzziness”.

The observed weights are blended based on the performance of the past weightings and degree of match with the current sequence.    The authors point to excellent results on empirical data.   It is surprising that there is a reasonable amount of information in the daily returns; I had thought they would be more dominated by noise.

One Final Note
Although I have been “disparaging” CRP, it can be shown that some weighting of CRP will yield the optimal portfolio provided returns are i.i.d.   Therein lies the rub.

Filed under machine-learning, portfolio

Advances in Computing

I have been watching advances in computing power for some time, since university really.   I did research in parallel algorithms and architecture in my first position at the university and later applied it practically on Wall Street.   In those days super-expensive machines like the Intel Hypercube, Paragon, and many other architectures were the backbone of the HPC community.

HPC (High Performance Computing) roughly breaks down into 4 categories:

  1. Big iron supercomputers (MIMD generally)
  2. Distributed computing (these days advertised as Cloud Computing)
  3. The emerging SIMD GPU based solutions
  4. Quantum Computing (not really here yet for the mainstream)

In the machine learning and optimisation world there are massive problems, some of which are not practically computable on von Neumann architectures, as their runtime would be astronomical.    An (absurd) example of such a problem would be to simulate a large number of monkeys typing on typewriters, stopping when one produces the works of Shakespeare.   The number of monkeys required to produce such a work on average is astronomical.     This seems like an absurd problem, but is comparable to the GP / GA approach.

Then of course there are numerous problems with high dimensionality and/or with polynomial order complexity.

Supercomputing on the Cheap
The FASTRA team at the University of Antwerp has put together an inexpensive multi-teraflop machine with 7 gaming cards.  Check out their video.

Unfortunately the “easy” part of these sorts of solutions is the hardware.  The problem is the (often) great expense of developing one’s models in a SIMD framework so that they can be applied to the GPU architecture.    Although there is now standardization on the low-level C variant used to program GPUs, there are significant differences between models of GPU, so that even if you manage to write a correct SIMD program, you may have to rearrange it for a specific GPU implementation.   (I guess this is not all that different from my experiences with big-iron parallel architectures of the past).

One could have a team devoted to parallelization, tuning, and retuning / reworking for the new GPUs that are out periodically.   Very time consuming!

For my work, the problems that would map well are particle filters and Monte Carlo based models, each of which has obvious fine-grained parallel operations.

Quantum Computing
The other notable announcement this week was Google’s use of quantum computing to solve pattern recognition problems.   I have not done the leg-work to fully understand the algorithms in quantum computing, but broadly it seems to be a matter of framing one’s problems statistically as path integration problems (i.e., expectations), where quantum computing allows the paths to be explored simultaneously.


Filed under HPC, machine-learning

Learning a Sequence

I had been looking at predicting durations (or the intensity) to model price behavior and variance estimation.    As mentioned previously, the prevalent ACD models in the literature do poorly.   Before moving on to another topic I wanted to revisit this, with an idea for a future approach.

Here is a sample of durations for a high-frequency price series:

9.30, 0.26, 0.28, 4.21, 0.04, 0.21, 3.23, 0.04, 2.28, ...

I decided that rather than trying to regress for specific durations, where there are (theoretically) an infinite number of possible values, I would transform the series into a set of symbols so that there are a finite number of states, say:

S1, S2, S3, ...

where S1 might represent durations in [0, 0.25], S8 durations in [3, 3.5], etc.   The sequence of states for the above durations might look like:

12 → 1 → 1 → 9 → 1 → 1 → 8 → 1 → 7 → 6 → 6 → 6 ...

This turned out to be useful.
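
A minimal sketch of the symbol mapping, assuming hand-picked bin edges. The edges are hypothetical (chosen only so that S1 covers durations below 0.25 and S8 covers 3 to 3.5), so the labels it produces need not match the illustrative sequence above.

```python
from bisect import bisect_right

# Hypothetical upper bin edges in seconds; 11 edges give 12 states.
EDGES = [0.25, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 7.5]


def to_state(duration, edges=EDGES):
    """Map a duration to a 1-based state index (S1, S2, ...)."""
    return bisect_right(edges, duration) + 1


durations = [9.30, 0.26, 0.28, 4.21, 0.04, 0.21, 3.23, 0.04, 2.28]
states = [to_state(d) for d in durations]
# states == [12, 2, 2, 10, 1, 1, 8, 1, 6]
```

Non-uniform edges like these keep the short durations, where most observations fall, from collapsing into a single state.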

SVM
An SVM with a radial basis kernel did a much better job of predicting the next symbol (duration) in a sequence than the ACD models.   It still did not reach a suitable level of prediction, however.

The problem with SVM and related approaches in general is that you need a problem that can easily be separated in a high-dimensional linear vector space.  A big part of this is finding the kernel that will map your (usually) non-linear vectors into a linearly separable space.    Also, SVM is arguably better suited to binary classification than to multiclass classification.

ANNs
In theory, an ANN with enough neurons can asymptotically approximate any function.  There are many problems in arriving at a general solution though:

  1. Calibration
    Standard techniques of backpropagation (essentially gradient descent) solve for a local optimum, which depends on the starting configuration.   A global optimum can be found with meta-heuristic approaches such as GAs, however, at significant computational cost.
  2. Overfitting
    It is very difficult to come up with networks that generalize.   Part of the success in doing this involves choosing training sets and configurations carefully.

Nevertheless, this may be an approach worth exploring.

Probabilistic Graph Models
As our duration pattern is essentially a transition from one state to the next, modeling it as a probabilistic finite state machine appeals.  The idea with such an approach would be:

  1. empirically observe all chains of length ≤ some maximum
  2. determine the frequency of chains
  3. factorize into the smallest graph that reproduces those chains within some error
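
Steps 1 and 2 amount to counting n-grams over the state sequence. A minimal sketch, with a hypothetical function name and the illustrative state sequence from earlier in the post as input (step 3, the factorization into a small graph, is not attempted here):

```python
from collections import Counter


def chain_counts(states, max_len=3):
    """Count every contiguous chain of states of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(states) - n + 1):
            counts[tuple(states[i:i + n])] += 1
    return counts


sequence = [12, 1, 1, 9, 1, 1, 8, 1, 7, 6, 6, 6]
counts = chain_counts(sequence, max_len=3)
# counts[(1, 1)] == 2 and counts[(6, 6, 6)] == 1
```

The number of distinct chains grows quickly with the maximum length, so in practice rare chains would probably need to be pruned before any graph is fitted.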

The chains, for instance:

A first approach to this problem is to consider whether it can be modeled as a Markovian state system.  It is, however, doubtful that the states {S1, S2, S3, …} can be modeled in a strictly Markovian setting without the use of additional states.

For instance, is P(S1|S2) the same as P(S1|S2, {prior states})?   The duration data shows dependence beyond the immediate prior state.    Therefore we have to expect that P(S1|S2, {S5,S1}) will differ from P(S1|S2, {S2,S3}), whereas in a Markovian model the probability of S1 is conditioned purely on the prior state.
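
One simple way to probe this is to estimate the conditional probabilities by counting, once with one state of context and once with two, and see whether the extra context changes the answer. The sketch below uses a hypothetical helper and a toy sequence built so that first-order context is uninformative while second-order context is decisive; it is a diagnostic, not a formal test.

```python
from collections import Counter, defaultdict


def conditional_probs(states, order):
    """Estimate P(next | previous `order` states) from observed counts."""
    context_counts = defaultdict(Counter)
    for i in range(order, len(states)):
        context = tuple(states[i - order:i])
        context_counts[context][states[i]] += 1
    return {
        context: {s: c / sum(nxt.values()) for s, c in nxt.items()}
        for context, nxt in context_counts.items()
    }


toy = list("ABAC" * 10)          # after A comes B or C, alternating
first_order = conditional_probs(toy, order=1)
second_order = conditional_probs(toy, order=2)
# first_order[("A",)]["B"] == 0.5, second_order[("C", "A")]["B"] == 1.0
```

If the second-order estimates differ materially from the first-order ones on real durations, and the counts behind them are large enough to trust, a strictly first-order Markov model over the observed states is too coarse.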

Such a Markovian system might look like:

The HMM (Hidden Markov Model) addresses this limitation by assuming that there is a hidden Markovian process (usually with more states than the observed state system).   One can easily prove that an HMM of infinite size can exactly model all possible state chains (sequences) amongst a finite set of states.   Of course we are interested in a much smaller model that can reproduce most of the observed chains with limited error.
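
As a small illustration of how an HMM assigns a probability to an observed chain, here is the standard forward recursion in plain Python. The three hidden states, two observed symbols, and all probabilities are made-up values; a practical version would rescale or work in logs to avoid underflow on long sequences.

```python
from itertools import product


def forward_likelihood(obs, start, trans, emit):
    """P(obs) under an HMM via the forward algorithm.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from hidden state i to j
    emit[i][o]  : probability that hidden state i shows observed symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)


# Made-up model: 3 hidden states behind 2 observed symbols.
START = [0.5, 0.3, 0.2]
TRANS = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
EMIT = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

# Sanity check: likelihoods of all length-3 chains sum to one.
total = sum(
    forward_likelihood(seq, START, TRANS, EMIT)
    for seq in product(range(2), repeat=3)
)
```

Comparing these model likelihoods with the empirical chain frequencies is one way to judge how well a small hidden structure reproduces the observed chains.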

Here is a sample structure, where the black lines are edges between hidden states and the red edges indicate correspondence between hidden state and observed state.   The red edges are not traversed:

Aliasing Issues
Remember that we have arbitrarily subdivided durations (which are continuous) into N discrete states.   The idea was that the difference between say 0.25 seconds and 0.22 seconds is not important for our purposes.   One would think that less granular states will allow for  easier modeling of the state sequence.

The problem is that we are dividing these discretely.   We run into an aliasing problem where a specific duration partially belongs to the set represented by S(i) and S(i+1).   For instance for a sequence of length 3 we have 4 possible true state paths, each with associated probability.   Without compensating for aliasing we see the states (naively):

With aliasing we have the following possibilities:

As our path length approaches N, we will have 2^(N-1) possible paths.  One possible implementation of this is to train with the M highest probability paths.
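
A rough sketch of that idea follows. It assumes a hypothetical linear membership rule (a duration within `width` of a bin edge shares its weight with the neighbouring state, up to an even split at the edge itself), the same hypothetical bin edges as in the earlier discretization sketch, and brute-force enumeration, which is only workable for short windows.

```python
from bisect import bisect_right
from itertools import product

EDGES = [0.25, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 7.5]  # hypothetical


def soft_states(duration, edges=EDGES, width=0.05):
    """Return [(state, weight)] with weight shared across a nearby bin edge."""
    k = bisect_right(edges, duration)          # 0-based bin index
    weights = {k + 1: 1.0}
    if k > 0 and duration - edges[k - 1] < width:         # near lower edge
        share = 0.5 * (1 - (duration - edges[k - 1]) / width)
        weights[k] = share
        weights[k + 1] -= share
    if k < len(edges) and edges[k] - duration < width:    # near upper edge
        share = 0.5 * (1 - (edges[k] - duration) / width)
        weights[k + 2] = share
        weights[k + 1] -= share
    return sorted(weights.items())


def top_paths(durations, edges=EDGES, m=3, width=0.05):
    """Enumerate candidate state paths and keep the m most probable."""
    options = [soft_states(d, edges, width) for d in durations]
    paths = []
    for combo in product(*options):
        prob = 1.0
        for _, w in combo:
            prob *= w
        paths.append((prob, tuple(s for s, _ in combo)))
    paths.sort(reverse=True)
    return paths[:m]


# Two of these three durations sit close to the 0.25 edge: 4 candidate paths.
paths = top_paths([0.26, 4.21, 0.28], m=4)
```

Each returned entry is a (probability, path) pair, so the M retained paths can be fed to training with their probabilities as weights.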

Fuzzy HMM
Aliasing is a kind of fuzzy set membership.   There are a number of reasons why we should consider fuzzy state membership:

  1. The data may be noisy, obscuring the pattern
  2. Discretisation error (aliasing)

Not surprisingly, other people have thought of fuzzy state membership in the context of HMM.   There are multiple fuzzy HMM models.   To be investigated …

Filed under machine-learning, volatility

Mean in the context of Mean-Reversion

I want a running mean estimator that acts as a mode through mean reversion cycles of target amplitude or frequency.   The key characteristics should be:

  • adaptation to local volatility
    • determination of diffusion related squared return
    • determination of jump related squared return
    • determination as to how much of the jump should be absorbed into the mean
  • model of mean reversion
    • calibrated to a desired long-run rate of reversion
    • allowance for changes in reversion constant and reversion to long run
  • model of mean
    • autoregressive
    • innovations scaled by sigma term (with MR component and jumps removed)
  • recursive backward estimation of ML
    • implicitly decide how innovation is distributed amongst mean, mean-reversion, and noise

A SDE-based Approach
The model is an expanded variant of the familiar Ornstein-Uhlenbeck process, with specialized mean-reversion, mean, and volatility processes.   It also attempts to correct for jumps.    Let’s start with the following SDEs (in continuous time):

Variance
There are many approaches to modeling volatility (all with issues).   Initially I had thought to use a predictive model based on:

  • intensity process (based on “first exit” duration)
    This is a very complex process.  First approximations have been to use ACD, a family of AR models for duration.   ACD models perform very poorly on HF data however.    It seems that a Markov chain model recognizing the patterns will be most appropriate.
  • amplitude process
    The amplitudes of squared returns seem to follow a largely AR process.   This seems fairly well behaved.

Before fully committing to a complex volatility model, I thought it makes sense to first try a non-predictive measure of realized variance.  I will use:

The choice of α determines the degree of smoothing with previous values based on how local (and noisy) we want this function to be.   For example, here is the estimate with a smoothing factor of 60 and a threshold of 3e-5:

Discretising
Using Ito’s lemma we discretise the processes as follows:

Simplifying the volatility term in S(t), we first determine the variance of the SDE:

We reorganize as follows:

Putting it together
We can now model this discretely as a state-space based filter, searching for parameters that fit an a-posteriori idealized view of the mode and mean-reversion process.   Post-parameterization, the process can be used in real-time to provide an estimate of the mode.
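
For reference, here is the plain Ornstein-Uhlenbeck building block on its own, dX = θ(μ − X)dt + σ dW, using its exact discretization and a lag-one regression to recover the parameters. This is the textbook process only, with constant mean, reversion rate, and volatility and no jumps; it is not the expanded model or the filter described above, and all names and parameter values are illustrative.

```python
import math
import random


def simulate_ou(x0, mu, theta, sigma, dt, n, rng=None):
    """Simulate n steps of an OU process using its exact discretization."""
    rng = rng or random.Random(0)
    decay = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1 - decay ** 2) / (2 * theta))
    path = [x0]
    for _ in range(n):
        path.append(mu + (path[-1] - mu) * decay + step_sd * rng.gauss(0, 1))
    return path


def estimate_ou(path, dt):
    """Recover (theta, mu) from the regression X[t+1] = a + b * X[t].

    Assumes the fitted slope b lies strictly between 0 and 1.
    """
    x, y = path[:-1], path[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return -math.log(b) / dt, a / (1 - b)


path = simulate_ou(x0=0.0, mu=0.0, theta=2.0, sigma=0.5, dt=0.1, n=20000,
                   rng=random.Random(1))
theta_hat, mu_hat = estimate_ou(path, dt=0.1)
```

With time-varying mean, reversion, and volatility this simple regression no longer applies, which is where the state-space filter comes in.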

Final Notes
As you may have seen I took a (useful) 2-3 week diversion before coming back to the SDE based approach.   This is not a final model by any means, but I think it is a solid starting point.    The purpose of the above is to serve as one of a number of factors in a multi-factor strategy that I want to optimize further.

Filed under mean, state-space-models, statistics, stochatistic, volatility