Pricing & Regime (part 1)

High frequency trading tends to fall into the following categories:

  1. some form of prop market making (probably largest %)
  2. gaming other players or microstructure
  3. short-period arbitrage
  4. execution algos

One might also run medium-frequency strategies that pursue:
  1. trend (if there is sufficient momentum)
  2. cycles (follow the pivots for high amplitude price cycles)
  3. longer-period arbitrage

Focusing on market making and trend/cycle following, it is important to understand both the price regime (the gross characteristics of price movement) and the pricing function (a view on expected forward price).

We might like to have an indication of regime to determine whether we should be market making, following a trend, following cycles, etc.

In particular, with market making, there will be periods in the market where market making is “dangerous” (i.e. is almost guaranteed to lose money).   Aside from the obvious danger in gapping prices around news or other events, the main danger is in offering during periods of strong momentum.

There will be a strong order selection bias such that when the market is going up (down) you receive buying (selling) orderflow almost exclusively, and end up with short inventory against an appreciating (depreciating) price.

Hence, there will be periods during a trading day that are “good” for market making (i.e. when there is balanced order flow and very gradual price drift) and periods which are better suited for momentum or high-amplitude cycle following strategies.

Pricing provides us with an expected mean and its derivatives over some forward period, as well as an estimate of noise and mean-reversion (MR) activity.  This information is invaluable in determining appropriate market making offering prices and in following high-amplitude price activity.

We would like to have the following:
  1. A pricing function with (often) good accuracy over some forward period, tailored to the requirements of the regime
  2. A notion of price regime
  3. Other metrics such as upper and lower noise around the price function (mean), period and amplitudes of monotonic segments, etc.
In the context of market making we are looking for a monotonic and almost linear price function that represents a mean or a mode through price noise.  Calibrated in this fashion, one has a convenient view on the forward price function and on the upper and lower bounds of prices we are willing to offer at.  This, combined with inventory, can be used to dynamically bias the offerings.
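As a toy illustration of how these pieces might combine (a sketch only; the skew rule, names, and parameters are invented here, not a description of the actual system):

```python
import math

def make_quotes(forward_mean, noise_band, inventory, max_inventory, tick=0.01):
    """Illustrative market-making quotes: noise bounds around the forward
    mean, skewed against inventory (long inventory pushes both quotes down)."""
    skew = -(inventory / max_inventory) * noise_band
    bid = forward_mean - noise_band + skew
    ask = forward_mean + noise_band + skew
    # round conservatively to the venue tick
    bid = math.floor(bid / tick) * tick
    ask = math.ceil(ask / tick) * tick
    return bid, ask
```

With zero inventory the quotes straddle the forward mean symmetrically; as inventory builds, both quotes drift so that subsequent fills tend to unwind the position.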

In the context of momentum or cycling we are looking for a pricing function that follows the momentum or cycles with appropriate curvature and is reasonably predictive (or retrospective with low-lag) with respect to pivot points.

Here is an example of a pricing function (the green line) during a momentum period (over the course of some hours), encountering a disturbance and change in regime to a cycling mode thereafter:


Here is the continuation of the series, with the pricing function in a momentum / cycling mode:

Note that the above is smoothed a posteriori, so the trajectory of the filter has more noise as it progresses tick to tick.

History Of Approaches
I’ve used many approaches in the past in modeling the price function:
  1. variety of filters from the signal processing space
  2. signal decompositions and partial reconstitutions
  3. a number of different stochastic state systems with an EKF (extended Kalman filter)
  4. direct calibration of least squares splines with various heuristic rules

My current approach is #4, but I would like to find a good solution using an SDE in a Bayesian framework, as it is more elegant and flexible.  My past experience with EKFs was that they are hard to parameterize (via the state and measurement covariances) and can easily become unstable in the presence of unexpected noise.

A few months ago Max Dama posted a link to a paper describing a TV-VAR model and, of more interest, an alternative approach to describing the Kalman filter evolution rate with discount factors.  The discounting approach solves one of the problems in parameterizing Kalman filters.
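The gist of the discounting idea, as I read it, in a 1-D local-level filter (a sketch; `delta` and `v_obs` are illustrative parameters, not taken from the paper): rather than specifying a process noise covariance, the prior variance is simply inflated by 1/delta at each step.

```python
def discount_kalman(ys, v_obs, delta=0.98, m0=0.0, c0=1.0):
    """1-D local-level Kalman filter where the evolution variance is set
    implicitly by a discount factor rather than an explicit process Q."""
    m, c = m0, c0
    estimates = []
    for y in ys:
        r = c / delta           # prior variance: posterior inflated by 1/delta
        q = r + v_obs           # one-step forecast variance
        k = r / q               # Kalman gain
        m = m + k * (y - m)     # posterior mean
        c = k * v_obs           # posterior variance (= r * v_obs / q)
        estimates.append(m)
    return estimates
```

A delta close to 1 yields a slowly adapting mean; smaller deltas track the data more aggressively, which replaces the usual fiddly tuning of the state covariance.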

That, together with recent discussions with some other quants (hi Sasha), motivated me to reexamine the SDE approach.   I'll be discussing an approach in some detail in the next posts.


Filed under strategies

Detecting Price Direction in Order Flow

As indicated in a previous post, high-frequency market data is typically in the form of order updates, for example:

This can be used to reconstruct the orderbook at any given instant, but can also be used to analyze movements within the orderbook.   The most primitive approaches look at the incidence of size being removed from, or hitting, a given price level.    More sophisticated approaches try to determine order streams, to see where an order is being moved within the book.

Given these, we can produce a variety of measures that add information to our trading decisions.   Here is an example from USD/JPY on a downward move of 10 pips with strong momentum.   The mid price is in the upper pane and the cumulative bid and ask movements (derived from order data) are in the lower pane:

One will note that the ask flows are more aggressive than the bid flows.   The above is not yet terribly useful, as the cumulative flows more or less follow the price path during a period of momentum.
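The cumulative flow measure in the lower pane can be sketched as follows (a toy event shape; real feeds carry many more fields):

```python
def cumulative_flows(events):
    """Running cumulative bid and ask flow derived from order events.

    events: (side, op, size) tuples where side is "bid"/"ask" and op is
    "new"/"delete"; an insertion adds size, a deletion subtracts it.
    """
    bid_flow, ask_flow = 0.0, 0.0
    bids, asks = [], []
    for side, op, size in events:
        signed = size if op == "new" else -size
        if side == "bid":
            bid_flow += signed
        else:
            ask_flow += signed
        bids.append(bid_flow)
        asks.append(ask_flow)
    return bids, asks
```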

More interesting is to look at what is happening in the book:

The reds represent areas of high cancellation and blues high insertion (I amplify this by scaling by the size of movement from one region to another).    You’ll notice the following:

  1. Seller Book
    There is high movement into the inside (or more aggressive) areas of the book, in line with the fact that the price has downward momentum.   There are vertically “trailing” deletions from the outside price levels as traders move their prices en masse to the inside.
  2. Buyer Book
    Buyers / market makers are risk averse and are moving their orders deeper into the book, away from the inside.  Again we see a trail of deletions in the inside-most levels and strong movements into a couple of deeper bands.

Sideways Movement
What does “sideways” or non-directional movement look like then?

It looks a lot more random, with some upward and downward directional patterns in sequence.

Cleaning It Up
The activity can be denoised to produce a signal that is much easier to read:


The approach is particularly useful for understanding movements over a short to medium timeframe.   However, long-running price movements often have periods of “indecision” where the flow and price flounder for a time.   Ultimately this sort of approach needs to be evaluated on multiple windows to accommodate various timeframes.
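The post does not say how the denoising is done; a simple moving average is one obvious option (a sketch, not the actual method used):

```python
def denoise(xs, window=5):
    """Trailing moving-average smoothing of a noisy activity signal."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)   # trailing window, truncated at the start
        seg = xs[lo:i + 1]
        out.append(sum(seg) / len(seg))
    return out
```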

(Sorry, this is mostly pretty pictures without much explanation.  It should provide some ideas though.)




Ponzi-Like Bidding Scheme

Just for fun:  there is a retail bidding site QuiBids which has received a bit of press in the US recently.   The attraction is that they offer products (such as iPads) for, say, an average price of $30-50 (whatever the highest bid is).    One may wonder how this works.

Most auctions are “penny auctions”, meaning that each bid placed on an item increments the price by $0.01.   Prospective buyers pay for each bid they put forward on an item; in fact, the company charges $0.60 / bid.   So with an end-auction price of, say, $50, the iPad has brought in $3000, because there were 5000 bids to achieve that price.  Wow, these guys are making a lot of money in terms of % markup!
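A quick check of the arithmetic (a toy calculation; QuiBids' actual fee schedule may differ):

```python
def bid_fee_revenue(final_price, bid_increment=0.01, bid_fee=0.60):
    """Bid-fee revenue for a penny auction: each cent of the final price
    corresponds to one paid bid."""
    n_bids = round(final_price / bid_increment)
    return n_bids * bid_fee
```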

The bidding session gets prolonged by ~10 seconds each time a bid is made in the last seconds.   Basically, it seems to me that the winner is the person in the pool of bidders who has not exhausted their prepaid bids or otherwise given up.

Since one can see the bidder with the top bid in real time as the end-game bidding occurs, perhaps one could game it: observe the timeseries of bids each participant has placed and come up with a view on the likelihood that they have reached their bidding limit.   At that point one would start testing by putting in a bid and determining who is left.

If the gaming commission understood what they are doing here, I suspect they would have to get a gambling license.    Interesting “not-quite-a-scam” business concept.   I have to think that eventually most participants, not having won anything, will abandon the site.



Bipartite Matching

For very low-level order analysis we try to determine continuity from one order to the next, treating them as a stream of orders across a period.   This is applicable to orders generated by a given trader (algo) that is more or less continually in the market.

In most cases an order near the top of the book has a short lifetime, but an algo will maintain continuity through resubmission into the order book at a different price level.   This stream of orders is often masked in various ways, but often has discernible patterns, depending on the sophistication of (or consideration towards) the masking in the algo.

Going from order deletions to new order insertions we want to attempt to match up the old and new orders (where applicable), to determine the continuity of an order stream.   We therefore want to map from an old-set to a new-set:

However, in order to determine this, we have to explore all possible mappings, with weights on the edges:

The optimal mapping will be the one that finds the set of one-to-one edges with maximal total weight.   This is a problem in graph theory and optimisation theory.   It can be expressed rather neatly as a linear constraint problem:

maximize Σij Wij Xij,   subject to   Σj Xij ≤ 1 for each i,   Σi Xij ≤ 1 for each j,   Xij ∈ {0, 1}

where Xij is a matrix of 0s and 1s, 1 indicating an edge and 0 no edge, and Wij is a corresponding matrix of weights for each edge.   Integer (or integral) linear programming is classified as NP-hard and does not have an efficient general algorithm.    We will be evaluating these on a relatively small # of orders each time, however, so the cost of a brute-force solution is not bad.

The real trick is in determining the weights, representing the degree of similarity.  Going further, one may generate a tree of these matchings (which spans out exponentially, but is trimmed regularly).   Some decisions on prior matches will be wrong and should be reevaluated holistically.

Is it worth the effort?   Depends on what information you are interested in and what your game is.   We do a mixture ranging from ultra-high frequency to medium frequency.   This is one sub-problem of a bigger picture.


I should mention that the worst-case brute-force approach involves examining combined weights for each permutation of edges.  This is an O(n!) algorithm that quickly becomes uncomputable.   For the specific bipartite matching problem, rather than the general integer linear programming problem, there are much more efficient solutions.   One of the simplest (though not the most efficient) is the “Hungarian algorithm”.   A reader pointed out a paper for an “online” approach to matching as well.
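For a handful of orders, the brute-force search is trivial to write down (a sketch; `weights[i][j]` is whatever similarity measure you choose between old order i and new order j):

```python
from itertools import permutations

def best_matching(weights):
    """Brute-force maximum-weight bipartite matching.

    weights is a square matrix (equal old/new counts assumed for simplicity).
    Returns (best_total, assignment) where assignment[i] = j.
    O(n!) -- only viable for a small number of orders.
    """
    n = len(weights)
    best_total, best_assign = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_assign = total, perm
    return best_total, best_assign
```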



What you don’t know …

Sometimes there are precipitous drops (or ascents) in price over a very short period in the market.   Over such a short period, momentum analysis from prices or order flow is not going to allow one to avoid becoming a casualty, unless you have up-front information or lower-latency access to the event source than most participants.

However, when it comes to information impacting intra-day FX, it is disproportionately known to a small # of players:

  1. large market makers (such as banks in the FX market) have access to significant flow and therefore can have a read on supply and demand for a given asset.
  2. banks also know who their trading counterparties (customers) are and whether a customer is “smart”,  tending to trade ahead of movements.  They also have a good view on how much size is traded over what period.  This is valuable information that they can piggyback on.

The small HF hedge fund participates on the anonymous venues without access to either of these.    Now, a fund may have better models, but it still lacks a direct view of the market flows as indicated above.    Whether it be large-flow houses or smart hedge funds, some smaller % of orders represents “smart” or in-the-know money, and the larger % will tend to be more reactive.

A Drop

Here is an example of a ~10 pip drop that occurred over a period of 200ms:

The bottom pane contains order-streams for a few bid-side orders of interest in this discussion.   dp is expressed as distance from the inside price (for the bid side, a larger dp means a lower order price).   By order-streams, I am referring to implicit cancel/replace activity by a trader (algo).

Notice that the orders have been moved deeper into the book (as seen by a spike in dp) as a defensive move against a falling price.  Looking more closely, this set of orders actually made the move before the drop (aha, a “smart” algo at work here).

Notice that four of the order streams (green through blue) moved orders deeper into the book ~60ms before the drop, and the other colors a millisecond or two before the drop, but in time to avoid a fill.


If you don't have early access to information, look for a participant who does and use their behavior appropriately.   This is easier said than done, however: detecting participants (i.e. order-streams) and determining whether they are “smart” or “dumb” is not straightforward.   I will touch on some of the related problems in a later post, but unfortunately cannot discuss everything in detail.



HF Simulation

Simulation for high-frequency strategies is, at best, a decent way to take one's strategy through its paces.   It cannot present an accurate view on profitability; at best it can examine the strategy's performance under various synthetic assumptions.

Naturally our simulator must reflect:

  1. typical order event latency (perhaps with some jitter)
  2. typical market event latency
  3. accurate orderbook
  4. view on trade fills
  5. view on impact (icing on the cake, perhaps not necessary)

The Data

There is just not enough information presented in market data to model with complete accuracy, nor can we predict with certainty the effects of our own trades on subsequent market events.   With access to a HF feed, we receive events such as:

We are able to see most individual order activity.  Primarily, the operations are New, Update, and Delete order.    The exceptions are:

  1. hidden orders
  2. orders that immediately cross
  3. IOC orders (immediate or cancel)

What is shown or not shown may vary from venue to venue.   With the above we can reconstruct the orderbook as a set of price levels on the bid and another set on the ask.  Each price level represents a queue of orders against which executions occur on a FIFO basis.
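A minimal sketch of such a reconstruction (the event shapes and class names are invented for illustration; a production book would also handle Update and partial fills):

```python
from collections import deque

class OrderBook:
    """Toy orderbook: price levels map to FIFO queues of (order_id, size)."""
    def __init__(self):
        self.levels = {"bid": {}, "ask": {}}
        self.index = {}                      # order_id -> (side, price)

    def new(self, side, order_id, price, size):
        self.levels[side].setdefault(price, deque()).append((order_id, size))
        self.index[order_id] = (side, price)

    def delete(self, order_id):
        side, price = self.index.pop(order_id)
        q = self.levels[side][price]
        self.levels[side][price] = deque(o for o in q if o[0] != order_id)
        if not self.levels[side][price]:
            del self.levels[side][price]     # drop the now-empty level

    def best(self, side):
        prices = self.levels[side]
        if not prices:
            return None
        return max(prices) if side == "bid" else min(prices)
```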

The orderbook (abbreviated) might look something like this:

A simulator using historical data needs to reconstruct the orderbook, maintaining proper FIFO behavior.   The two biggest problems in simulation are:

  1. determining when / if a trade happened
  2. determining market impact of strategy trades on historical data

On the various FX venues, trades are shown selectively and, at that, with significant lag relative to the actual event.   Order (New or Update) events that would have crossed are also not shown.   This leaves us to guess when a trade is likely to have happened.   We can execute our strategy under various scenarios of trade fill.

Possible Trade Events

Let’s look at some possible signals that a trade may have occurred:

  1. all orders on our level are removed
    1. due to trade or cancellation?
  2. inside order level(s) removed
    1. due to trade(s) or cancellation?
  3. one of the above + our level has moved within epsilon of crossing
  4. our level is now crossing

None of these except (4) can indicate a trade with certainty; however, making no assumptions about the first three scenarios would leave us only with aggressive crossing or waiting for price levels to collide.
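These signals could be folded into a single fill-likelihood heuristic; a toy sketch (the weights here are invented for illustration, not calibrated):

```python
def fill_likelihood(level_cleared, inside_levels_removed, near_cross, crossing):
    """Heuristic probability that our resting order was filled, combining
    the four signals above.  Weights are arbitrary placeholders."""
    if crossing:
        return 1.0                 # signal 4: the only certain case
    p = 0.0
    if level_cleared:
        p += 0.4                   # signal 1: all orders on our level removed
    if inside_levels_removed:
        p += 0.3                   # signal 2: inside level(s) taken out
    if near_cross:
        p += 0.2                   # signal 3: our level within epsilon of crossing
    return min(p, 0.95)            # never certain without an actual cross
```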

All orders on our level removed

Consider the above orderbook and the transformed one below (where the removed orders are in red):

All orders on our order level of 82.709 on the ask have been removed in the historical events (except the synthetic order from the strategy we are testing).   The likelihood that a trade happened is higher if the # of orders on the level and/or the maximum size over the life of the level is high.   With only one order on the level removed, it may well have been a cancellation.

In this particular case we see the order on our level and the first order on the next level removed.   This strengthens a view that a trade may have occurred.

We can assign a probability of trade based on evidence like this (though if using a random / probability approach, one's results will differ with every run).

Deeper Inside Level removed

Consider the orderbook below:

Our order is on the inside level and the level(s) above it are either cancelled or taken out.  This is an even stronger indication of a trade if more than one level is taken out.   Levels can disappear quickly from a number of possible stimuli:

  1. traded: a large buying algorithm buys in size (this happens quite often during the day)
  2. moved: level primarily led by market makers, where activity elsewhere demands changing the offering
  3. risk aversion: pending event or volatility cause pullback

#2 is perhaps harder to gauge, but #1 and #3 are relatively easy to see in retrospect.

Order Moves Close to Crossing

We may have a passive order that drifts towards crossing (or is placed very close to crossing):

We see that our order is 2/10ths of a pip away from the cross.  On some venues there may be inside-of-inside hidden orders (for venues that don't require partial exposure of size).  In those cases, a move that tight is likely to get executed right away.   Market makers with inventory may find a cross of 2/10ths appealing as well.

Other Trade Signals?

We did not look at what was happening on the other side of the book, but one could possibly correlate activity on the other side with activity in the above scenarios to determine a stronger likelihood of a trade.   I'd appreciate ideas.



Orderbook Profile

I haven't posted for quite a while as I have been busy getting a new high-frequency venture off the ground.   We started trading 2 weeks ago, so I will start posting periodically again.

There is some modeling crossover from the medium/daily-frequency space to the high-frequency space, but of course each of these areas has its own characteristics.

The highest profit-per-trade strategies focus on short-term cross-asset relationships, mean reversion, and long-run mean prediction.   These strategies have holding periods anywhere from 5 to 20 minutes generally.  That said, there is a whole world of opportunity at the higher frequencies.    In some cases the profit after transaction cost may only be a couple of 10ths of a basis point, but with a very high win/loss ratio.

For some of the simpler HF strategies, interpreting the order book and the flows in and out is key.   Here is a sample order book profile showing the longevity of orders in the book versus initial positioning relative to the inside price (as of just before placement).  +pips are outside of the inside price and -pips are better than inside (i.e. better than the best bid or ask, as appropriate).
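Computing such a profile from order records is mechanical; a sketch (the record shape and bucket edges here are invented for illustration):

```python
def profile(orders, pip=0.0001):
    """Bin orders by (lifetime bucket, pips from inside at placement).

    orders: (place_time, cancel_time, price_offset) tuples, where price_offset
    is the signed distance from the inside price at placement.
    Returns a {(lifetime_bucket, pips): count} histogram.
    """
    def lifetime_bucket(dt):
        for label, limit in [("<1ms", 0.001), ("<1s", 1.0), ("<15min", 900.0)]:
            if dt < limit:
                return label
        return ">=15min"

    counts = {}
    for t0, t1, offset in orders:
        key = (lifetime_bucket(t1 - t0), round(offset / pip))
        counts[key] = counts.get(key, 0) + 1
    return counts
```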

Clearly, for this sample, most orders live in the 15 sec - 15 min range, from the inside to inside + 3 pips.   More interesting is to look at the higher-frequency activity in a subset of this:

We can use this to see where most of the activity is occurring.   With a bit more analysis one can get an idea of what other algorithms are doing and their frequency in the market.    Seeing how this changes with market time and regime is also telling.

In the above density graph it is easy to see that +5/10ths of a pip is a popular target for HF algorithms in this sample, for order durations of 1ms and larger.  There is also something interesting going on with orders < 1ms (basically a place followed closely by a cancel).   One can then trace through order streams against targets like these and get a view on the other games in town.



Languages Again

I do a lot of my production / performance-oriented work in Java (for better or worse) because it is so portable and is very close to the performance of C++ if you know what you are doing.   I would much prefer to be using F# / C#, as they are far superior languages in my view (though C# owes its heritage to Java more than any other language).

The problem is that I need to deploy on Linux (and for that matter, my dev platform of choice is OS X).  Here are some performance #s for 500 million iterations of the nbody problem on a Core i7 920 Linux box:

The nbody problem uses sqrt when calculating the Euclidean distance between bodies.  I began to wonder whether much of the difference is in the implementation of the library math functions, so I did an implementation that uses a (slow) numerical algorithm to calculate the sqrt, common across the languages.   In that way we have an apples-to-apples comparison.   The C++ vs Java numbers are now:

The difference is now 1.9%.   In fact, the additional 10 seconds of difference may be Java startup + JIT compilation.   This resonates with other comparisons I have done in the past.

I have code in Java that outperforms equivalent C++ code.  For instance, the JIT compiler is clever enough to avoid virtual function invocation when it can determine, at runtime, that it is avoidable.   The same cannot always be done as effectively with static compilation.    (Virtual function calls are ~3x more expensive than non-virtual calls on the Intel architecture.)

The JVM is just the best VM around (even faster than the MS .NET VM).   I would love to see the JIT compilation technology in the JVM applied to a cross-platform .NET VM, such as Mono.   Alternatively, if the JVM were expanded to include value types, a properly implemented generic type system, and some functional features, one could imagine targeting IL for the JVM.

My ideal would be to be able to cross-compile C# and F# to JVM bytecode (much in the way one can map Java bytecode to the CLR with ikvm).   The .NET libraries would be a much smaller problem given that the Mono implementation already exists.    The Mono guys have their hands full, and seem to have focused more on breadth than on ultra-high performance.

I suppose the .NET and Java communities are mostly distinct, so there are probably few people interested in seeing F# targeted to the JVM.   I can only hope that either the Mono VM continues to improve its performance and GC, or a new subsuming VM fills the gap where the JVM once ruled.



Minimum Covariance Determinant

Outliers are very common in noisy data.  Unfortunately, outliers can bias estimates of the moments.   In this case I am interested in the covariance of a data set.   In addition to being prominent in stochastic equations and portfolio analysis (the usual stuff in finance), covariance is also used in linear projection and prediction.

Geometrically, the covariance matrix specifies an ellipsoid that circumscribes the primary dimensions of the data in N-space (N being the dimension of the data).  Outlier data stretches the ellipsoid along the axis of the outlier relative to the mean.   Here is a diagram from a paper by Hubert and Verboven that illustrates this nicely:

For many applications we are interested in the “robust” covariance as illustrated in the smaller ellipse.

Illustrative Brute Force Algorithm
We want to find the covariance matrix of minimum volume encompassing some % of the data set.  Let us assume we have a set of observations X = {x1, x2, … xN} of size N.  Further, let us assume 10% of the samples are outliers.   One could construct a brute-force algorithm as follows:

  1. determine all unique sample subsets of size h = 0.9N
  2. for each subset S
    1. compute the covariance matrix Cs of S
    2. compute Vs = det(Cs)
  3. choose the subset where Vs is minimal

Geometrically, the determinant measures the volume of the N-dimensional ellipsoid implied by the covariance matrix.  Minimizing the ellipsoid is equivalent to minimizing the volume.

A Better Algorithm
The brute-force algorithm is very inefficient (or simply not computable), as one has to evaluate an astronomically growing number of possible subsets.
Rousseeuw developed a much faster algorithm called FAST-MCD.

Suppose we take a random sample of size h.   We can evaluate the similarity between data points in the full set and our randomly sampled subset; in particular, the Mahalanobis distance is used.  Let M be the mean of the random subset and S be the standard covariance of the random subset; then d(x) = sqrt( (x - M)' S⁻¹ (x - M) ).

The algorithm is then as follows:

  1. choose a random subset H0 of X, with size h
  2. repeat
    1. Determine covariance S and mean M of the subset H0
    2. Determine distances d(Xi) for all Xi relative to H0 using the Mahalanobis distance
    3. Choose the h smallest distances and create a new subset H1
    4. repeat with H0 ← H1, until H0 and H1 are equal or the determinant reaches 0
  3. Repeat from (1) K times (maybe 500) and choose the selection with the smallest volume (determinant).

The random sampling works because it can be proven that det(S(H1)) ≤ det(S(H0)) in the iterative portion (i.e. we are iteratively reducing the volume), so the set self-selects the tightest representative volume of points.

The reevaluation of the random samples can be cut short at the 2nd or 3rd iteration based on volume, and then a smaller set of random starts can be fully evaluated.

There are other tricks that can be used to subdivide the problem when N is very large.
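A minimal sketch of the iteration described above (not the author's implementation; the restart count and iteration cap are arbitrary choices):

```python
import numpy as np

def fast_mcd(X, h, n_starts=50, seed=0):
    """Sketch of the FAST-MCD concentration step (after Rousseeuw).

    X: (N, d) data; h: subset size.  Returns the indices of the h points
    whose covariance has minimum determinant over the random restarts.
    """
    rng = np.random.default_rng(seed)
    N = len(X)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(N, size=h, replace=False)
        for _ in range(100):                    # iterate until self-consistent
            m = X[idx].mean(axis=0)
            S = np.cov(X[idx], rowvar=False)
            # squared Mahalanobis distance of every point to the subset
            d2 = np.einsum("ij,jk,ik->i", X - m, np.linalg.inv(S), X - m)
            new_idx = np.argsort(d2)[:h]        # keep the h closest points
            if set(new_idx) == set(idx):
                break
            idx = new_idx
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if det < best_det:
            best_det, best_idx = det, idx
    return best_idx
```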

I implemented the algorithm this morning.   It works nicely.   Here is an example of normally distributed 2-dimensional data with outliers (the red indicates points that were selected as optimal for the robust covariance):


Filed under strategies

Embedding & Whitening

A reader asked why we need to normalize differential entropy to remove autocorrelation (and cross-correlation).   I should have said: the whitening is trying to establish the difference (or, in this case, ratio) between the entropy of the series with autocorrelations / cross-correlations removed and the entropy of the series with the correlations intact.

In fact, we are trying to isolate the correlations because this is where the mutual information is; the rest is the ambient noise and variance of the data set.

With the embedding vector we are trying to find a vector series that maximizes correlations.   Ok, so why do we need the normalization?

In searching for the appropriate embedding dimension and lag, we vary these quantities.  Increasing the embedding dimension, for instance, adds new variables to the system.

So, for instance, let us say we have a system of 2 variables A and B, with a timeseries of {{A0,B0}, {A1,B1}, …} pairs.   Our embedding vector starts at dimension 2: { A[t], B[t] }.   Increasing the dimension, we have { A[t], B[t], A[t-1], B[t-1] }, and so on.
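Constructing these embedding vectors is mechanical; a small sketch (here `dim` counts lagged copies of the pair, so `dim=2` gives the 4-dimensional vector above):

```python
def embed(series, dim, lag=1):
    """Delay-embed a multivariate series.

    series: list of tuples, e.g. [(A0, B0), (A1, B1), ...].  Each output
    vector concatenates the current observation with dim-1 lagged ones:
    (A[t], B[t], A[t-lag], B[t-lag], ...).
    """
    out = []
    for t in range((dim - 1) * lag, len(series)):
        vec = []
        for k in range(dim):
            vec.extend(series[t - k * lag])
        out.append(tuple(vec))
    return out
```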

Having added A[t-1] and B[t-1], we may have increased the correlation between neighbors, but we cannot be certain of this if we just observe the change in entropy without a relative measure for the same system variables.   We may have increased or decreased entropy just by adding new dimensions, in addition to the correlation effect.

The goal of normalization is then to determine the contribution to entropy introduced by the new variables absent any correlations, and compare that to the possible entropy reduction produced by increased correlation between neighbors.
