# Monthly Archives: July 2011

## Unscented Transform & Filters

The unscented transform (and filter) was developed as a practical alternative to MC sampling for estimating the conditional distributions of hidden-state systems. We want to determine the hidden state X[t] given the observations y[1:t], i.e. the distribution p(x[t] | y[1:t]).

The sequential MC approach determines this distribution p(x[t] | y[1:t]) via the recursion:

p(x[t] | y[1:t]) ∝ p(y[t] | x[t]) ∫ p(x[t] | x[t-1]) p(x[t-1] | y[1:t-1]) dx[t-1]

This works for us, since we know the observations y[1:t], and we can form a view on the likelihood of a state X given Y by projecting samples of X into Y via the function g(.). There are various approaches to determining the likelihood and to reducing the number of particles (samples) required to arrive at a robust integration (such as SIR).

The basic MC particle sampling algorithm is as follows:

1. generate N samples X[t],i from f(X[t-1],i) + noise, i.e. each particle is transformed from X[t-1] to X[t] via the state function f(.)
2. determine particle weights as p(y[t] | X[t],i) / normalizing constant, where p(y[t] | X[t],i) is computed by applying g(.) to each particle and evaluating the likelihood of the observed y[t]
3. some resampling and kernel-weighted selection may occur, depending on the algorithm
4. the mean is then the probability-weighted average of the X[t],i (as indicated in the above integral)
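As a minimal sketch (not a production implementation), the steps above might look like the following for a scalar state with Gaussian noise; the function names and parameter values here are illustrative only:

```python
import numpy as np

def particle_filter_step(particles, y_t, f, g, state_noise_std, obs_noise_std, rng):
    """One bootstrap particle-filter step (sketch): propagate, weight, resample."""
    n = len(particles)
    # 1. propagate each particle from X[t-1] to X[t] via f(.) plus state noise
    proposed = f(particles) + rng.normal(0.0, state_noise_std, n)
    # 2. weight by the observation likelihood p(y_t | x_t), via g(.)
    residuals = y_t - g(proposed)
    weights = np.exp(-0.5 * (residuals / obs_noise_std) ** 2)
    weights /= weights.sum()                      # normalizing constant
    # 3. resample (SIR) to avoid weight degeneracy
    idx = rng.choice(n, size=n, p=weights)
    resampled = proposed[idx]
    # 4. the state estimate is the probability-weighted mean
    estimate = np.dot(weights, proposed)
    return resampled, estimate

# toy usage: random-walk state observed through an identity mapping
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, 500)
f = lambda x: x            # illustrative state evolution
g = lambda x: x            # illustrative observation function
particles, est = particle_filter_step(particles, 0.5, f, g, 0.1, 0.5, rng)
```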

I’ve implemented and used a number of variations of these in the past. MC approaches are unavoidable for complex distributions (for instance, multi-modal ones). That said, particle filters can be orders of magnitude slower than the Kalman family of filters.

Unscented Transform
Enter the unscented transform. Much as with the SMC approach, the unscented transform uses sampling to determine the mean and higher-order moments of the distribution p(x[t] | y[1:t]). Instead of generating numerous random samples, however, the unscented approach takes samples at specific points around the current mean.

The unscented approach also avoids approximating the non-linearity of the system functions, instead tackling the simpler problem of approximating the distribution. 2d + 1 sample (or sigma) points are determined around the mean and transformed through the non-linear functions to arrive at a view of the distribution of the state vector.

As illustrated in Julier and Uhlmann’s 1997 seminal paper:

The sigma points determined around the mean (S_i, with associated weights W_i) are transformed through the non-linear system function, allowing us to determine the moments of the distribution, with mean μ and covariance P (higher-order moments can be computed as well):

μ = Σ_i W_i g(S_i)
P = Σ_i W_i (g(S_i) − μ)(g(S_i) − μ)^T

It then becomes a question of how to select these sigma points S_i around the mean X[t]. We know that we want to choose S_i = X[t] + E_i, where each vector E_i provides an appropriate spread in a given dimension, along the orientation of the distribution. If these points are well chosen, they will provide enough information to reconstruct the moments of the distribution.

Under the assumption that X[t] (pre-transform) is elliptically distributed, it turns out that the column vectors of the Cholesky decomposition of the covariance matrix are colinear with the axes of the distribution. The sigma points and weights are determined and scaled as follows:

S_0 = X̄, with W_0 = κ / (d + κ)
S_i = X̄ + (√((d + κ)P))_i, with W_i = 1 / (2(d + κ))
S_{i+d} = X̄ − (√((d + κ)P))_i, with W_{i+d} = 1 / (2(d + κ))
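A minimal sketch of the sigma-point construction and moment recovery, using the κ-scaled Cholesky form (function and variable names are illustrative):

```python
import numpy as np

def unscented_transform(mean, cov, fn, kappa=1.0):
    """Sketch of the unscented transform: build 2d+1 sigma points from the
    Cholesky factor of the covariance, push them through fn, then recover
    the transformed mean and covariance from the weighted points."""
    d = len(mean)
    # columns of the scaled Cholesky factor are colinear with the
    # axes of the elliptical distribution
    L = np.linalg.cholesky((d + kappa) * cov)
    sigma = np.vstack([mean, mean + L.T, mean - L.T])     # 2d+1 points
    weights = np.full(2 * d + 1, 1.0 / (2 * (d + kappa)))
    weights[0] = kappa / (d + kappa)
    # transform through the non-linear function and recover moments
    y = np.array([fn(s) for s in sigma])
    mu = weights @ y
    diff = y - mu
    P = (weights[:, None] * diff).T @ diff
    return mu, P

# sanity check: an identity transform returns the original moments
mu, P = unscented_transform(np.array([1.0, 2.0]),
                            np.array([[1.0, 0.3], [0.3, 2.0]]),
                            lambda s: s)
```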

For more detail, read Julier and Uhlmann’s paper.

Why the UKF?
The typical implementation of the EKF uses a linear (and sometimes quadratic) approximation in the update of the distributions. This can fail spectacularly (or just present significant error) unless your system is well behaved through 1st-order dynamics. The UKF approach also does not require calculation of Jacobian or Hessian matrices, which for some problems may be extremely difficult or impossible to provide.

Next Step
I am particularly interested in the forward-backward unscented filter with smoothing. I have found that smoothing on the priors (not just the immediate t-1) provides a better forecast for the current and next periods. I will write more about this next.

Filed under strategies

## Thinking About State Space Filters

I have not used stochastic state-based systems for a couple of years, but have decided to revisit them. I had previously implemented a number of systems with both the EKF and three variants of the particle filter, but encountered various issues.

In particular, I found the parameterization of the evolution via the covariance matrices to be opaque and problematic for some systems. Both the EKF and the particle filters were also subject to numerical instability if the mean of the state distribution or the observed samples shifted significantly, with likelihoods approaching 0.

I’ve decided to revisit state space filters with a different filtering approach, with the aim of arriving at a new approach to multi-regime pricing.

I’ll start with an introduction. The traditional state space model is a discrete (or discretized continuous) system represented by a measurement equation and a state evolution equation:

x[t] = f(x[t-1]) + ε_x
y[t] = g(x[t]) + ε_y

Where:

1. X is an n-dimensional vector representing the state of the system at timestep t
2. Y is an m-dimensional vector representing the measurement at timestep t (for example, a price in our price series)
3. f(.) is our state evolution function, mapping the (t-1)th state to the t-th state
4. g(.) is our state-to-measurement mapping
5. ε_x is our state noise / error ~ N(0, Σx)
6. ε_y is our observation noise / error ~ N(0, Σy)
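For concreteness, here is a sketch simulating this model with illustrative (made-up) choices of f(.), g(.) and noise levels:

```python
import numpy as np

# Simulate the generic state-space model: a mean-reverting state observed
# through a non-linear measurement. All parameter values are illustrative.
rng = np.random.default_rng(42)

def f(x):                      # state evolution: AR(1)-style mean reversion
    return 0.95 * x

def g(x):                      # non-linear state-to-measurement mapping
    return np.exp(x)

sigma_x, sigma_y = 0.1, 0.05   # std devs of eps_x and eps_y
x, states, observations = 0.0, [], []
for t in range(100):
    x = f(x) + rng.normal(0.0, sigma_x)     # x[t] = f(x[t-1]) + eps_x
    y = g(x) + rng.normal(0.0, sigma_y)     # y[t] = g(x[t]) + eps_y
    states.append(x)
    observations.append(y)
```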

Kalman Family of Filters
The Kalman family of filters uses Bayesian logic to implement a linear error-correction model, such that errors in the estimation of y[t] propagate back to the evolving state x[t] (or more precisely, an evolving distribution of x[t]).

The traditional Kalman filter assumes linear functions f(.) and g(.), usually expressed as matrices:

x[t] = A x[t-1] + ε_x
y[t] = H x[t] + ε_y

We are interested in estimating the state x[t] given the noisy observed data y[t], hence we are interested in the pdf p(x[t] | y[1:t]). Given that we will have a lagged (t-1) view on this pdf via the recursion inherent in filtering, we first need to be able to compute p(x[t] | y[1:t-1]), and then using that, p(x[t] | y[1:t]):

p(x[t] | y[1:t-1]) = ∫ p(x[t] | x[t-1]) p(x[t-1] | y[1:t-1]) dx[t-1]
p(x[t] | y[1:t]) ∝ p(y[t] | x[t]) p(x[t] | y[1:t-1])

Hence, as pointed out in (Hartikainen 2008), the above linear state equations map to the distributions:

p(x[t] | x[t-1]) = N(x[t] | A x[t-1], Σx)
p(y[t] | x[t]) = N(y[t] | H x[t], Σy)
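In the linear-Gaussian case, the recursion above reduces to closed-form predict and update steps; a minimal sketch (parameter values are illustrative only):

```python
import numpy as np

def kalman_step(m, P, y, A, H, Q, R):
    """One predict/update cycle of the linear Kalman filter (sketch).
    m, P: posterior mean/covariance of x[t-1] given y[1:t-1];
    Q, R: state and measurement noise covariances (Sigma_x, Sigma_y)."""
    # predict: p(x[t] | y[1:t-1]) = N(A m, A P A' + Q)
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # update: fold in y[t] to obtain p(x[t] | y[1:t])
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    m_post = m_pred + K @ (y - H @ m_pred)
    P_post = P_pred - K @ S @ K.T
    return m_post, P_post

# toy usage: 1-d random walk observed directly
A = H = np.eye(1)
Q = np.array([[0.01]])
R = np.array([[0.1]])
m, P = np.zeros(1), np.eye(1)
for y in [0.2, 0.4, 0.35, 0.5]:
    m, P = kalman_step(m, P, np.array([y]), A, H, Q, R)
```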

I won’t go into the math for the Kalman filter, as I would like to discuss its issues and move on to more sophisticated approaches. The three main issues I have with the Kalman filter:

1. can only express the evolution of linear state functions
2. hard to calibrate degree of fit via state and measurement noise covariance matrices
3. can become numerically unstable with unexpected noise (probabilities go to 0 or may oscillate wildly)

The Extended Kalman Filter partially solves one of these issues (that of non-linear state functions) by adjusting the Kalman distribution estimation to use one or two terms of the Taylor series expansion of the state and observation functions. This works well for functions completely described by their 1st and 2nd derivatives, although it has similar drawbacks in terms of calibration and stability.

Short of using a numerically expensive particle filter, it seems that a variant of the Unscented Kalman Filter (UKF) presents the best choice for the potential state systems I will be using. The UKF, using a deterministic sampling approach mapped through the non-linear functions, provides multiple moments of the underlying distribution. The distribution can then be described with greater accuracy than is afforded by the EKF’s Taylor expansion.

Topics to be Discussed:

1. Unscented Transform
2. Smoothing
3. Need for a Multiple Model approach with Markovian switching
4. Filter approach for multiple models
5. Possible Models


## Pricing & Regime (part 1)

High frequency trading tends to fall into the following categories:

1. some form of prop market making (probably the largest %)
2. gaming other players or microstructure
3. short-period arbitrage
4. execution algos

One might also run medium frequency strategies that pursue:

1. trend (if there is sufficient momentum)
2. cycles (follow the pivots for high-amplitude price cycles)
3. longer-period arbitrage

Focusing on market making and trend/cycle following, understanding price regime (the gross characteristics of price movement) and the pricing function (a view on expected forward price) is important.

Regime
We might like to have an indication of regime to determine whether we should be market making, following a trend, following cycles, etc.

In particular, with market making, there will be periods in the market where market making is “dangerous” (i.e. is almost guaranteed to lose money).   Aside from the obvious danger in gapping prices around news or other events, the main danger is in offering during periods of strong momentum.

There will be a strong order selection bias such that when the market is going up (down) you receive buying (selling) orderflow almost exclusively, and end up with short inventory against an appreciating (depreciating) price.

Hence, there will be periods during a trading day that are “good” for market making (i.e. when there is balanced order flow and very gradual price drift) and periods which are better suited for momentum or high-amplitude cycle following strategies.

Pricing
Pricing provides us with an expected mean and its derivatives over some forward period, as well as an estimate of noise and mean-reversion (MR) activity. This information is invaluable in determining appropriate market-making offering prices and in following high-amplitude price activity.

Goals
We would like to have the following:

1. A pricing function with (often) good accuracy over some forward period, tailored to the requirements of the regime
2. A notion of price regime
3. Other metrics, such as upper and lower noise around the price function (mean), periods and amplitudes of monotonic segments, etc.

In the context of market making we are looking for a monotonic and almost linear price function that represents a mean or mode through price noise. Calibrated in this fashion, one has a convenient view of the forward price function and the upper and lower bounds on prices we are willing to offer at. This, combined with inventory, can be used to dynamically bias the offerings.
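As a purely hypothetical sketch of this quoting idea (none of this is the actual strategy; every name and parameter here is invented for illustration):

```python
# Hypothetical sketch: quote around the pricing function (the mean),
# inside the noise bounds, shading both quotes against current inventory.
def make_quotes(price_fn_value, noise_band, inventory, max_inventory, skew=0.5):
    """Return (bid, ask) biased away from accumulating more inventory."""
    # shift quotes down when long (to shed inventory) and up when short
    bias = -skew * noise_band * (inventory / max_inventory)
    bid = price_fn_value - noise_band + bias
    ask = price_fn_value + noise_band + bias
    return bid, ask

# toy usage: slightly long inventory skews both quotes downward
bid, ask = make_quotes(price_fn_value=100.0, noise_band=0.05,
                       inventory=20, max_inventory=100)
```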

In the context of momentum or cycling, we are looking for a pricing function that follows the momentum or cycles with appropriate curvature and is reasonably predictive (or retrospective with low lag) with respect to pivot points.

Example
Here is an example of a pricing function (the green line) during a momentum period (over the course of some hours), encountering a disturbance and change in regime to a cycling mode thereafter:

Here is the continuation within the series with the pricing function in a momentum / cycling mode:

Note that the above is smoothed a posteriori, so the trajectory of the filter has more noise as it progresses tick to tick.

History Of Approaches
I’ve used many approaches in the past in modeling the price function:

1. a variety of filters from the signal processing space
2. signal decompositions and partial reconstitutions
3. a number of different stochastic state systems with an EKF (extended Kalman filter)
4. direct calibration of least-squares splines with various heuristic rules

My current approach is #4, but I would like to find a good solution using an SDE in a Bayesian framework, as it is more elegant and flexible. My past experience with EKFs was that they are hard to parameterize (via the state and measurement covariances) and can easily become unstable in the presence of unexpected noise.

A few months ago Max Dama posted a link to a paper describing a TV-VAR model and, of more interest, an alternative approach to describing the Kalman filter evolution rate with discount factors. The discounting approach solves one of the problems in parameterizing Kalman filters.
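In the spirit of that discounting idea (as treated in West and Harrison's dynamic linear model literature), the predict step can inflate the covariance by a discount factor δ instead of adding an explicit state noise covariance Q; a sketch with illustrative values:

```python
import numpy as np

def discounted_predict(m, P, A, delta=0.98):
    """Kalman predict step using a discount factor instead of Q (sketch).
    delta in (0, 1]: smaller values imply faster-evolving state."""
    m_pred = A @ m
    # inflate the covariance; implied Q = (1/delta - 1) * A P A'
    P_pred = (A @ P @ A.T) / delta
    return m_pred, P_pred

# toy usage: 1-d state, covariance inflated from 1.0 to 1/0.95
m, P = np.zeros(1), np.eye(1)
A = np.eye(1)
m_pred, P_pred = discounted_predict(m, P, A, delta=0.95)
```

This removes the hard-to-calibrate Q matrix in favor of a single, more interpretable rate parameter.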

That, and recent discussions with some other quants (hi Sasha), motivated me to reexamine the SDE approach. I’ll be discussing an approach in some detail in the next posts.