Category Archives: genetic algorithms

Calibrating Non-Convex Functions

Many of the calibrations I perform are on non-convex functions, functions with multiple local minima/maxima.    Solving a min(max)imization for this class of function precludes a wide range of algorithms based on gradient descent (such as BFGS).    There are a number of “metaheuristic” approaches to solving non-convex problems, such as: simulated annealing, genetic algorithms, differential evolution, particle swarm, and brute force grid search, amongst others.

For the last couple of years I have been using JGAP, a library for optimization via genetic algorithm.    For some time was under the impression that most GA libraries were similar in terms of core algorithm and just different at the API level.   I ran into some problems with the new variance model I am working on, where JGAP was not converging in the expected manner.

I recently began using a package in R for ad-hoc calibrations (genoud) and found that it converges quickly, whereas JGAP did very poorly in comparison.   DEoptim is a very similar package (a straight implementation of differential evolution). Differential Evolution  a cousin of GA, but with a different approach to population evolution.

Differential Evolution
Differential Evolution is very similar to standard GA approaches in that both involve working towards the optimum via a randomly generated population of “trial vectors” (θ) and a directed evolution of those vectors towards the optimum.   Differential evolution is implemented as follows:

generate Population = { θ1, θ2, θ3, ... } where θ in domain of function
determine θbest ← maxarg [θ : fitness(θ)] forall θ
repeat
   foreach θi in Population
      draw { θa, θb, θc } randomly from P (without duplication)
      θd ← θa + F(θb - θc)
      θnew ← crossover (θd, θi) 

      # adjust ith vector if fitness of generated superior
      if fitness(θnew) > fitness(θi) then
         θi ← θnew

      # adjust best vector if new optimum seen
      if fitness(θnew) > fitness(θbest) then
         θbest ← θnew
   end
until fitness(θbest) approaches target

One of the key innovations is that the variance of the innovations (mutations) is dependent on the dispersion of the vectors.  At the outset the dispersion will be large, and as the optimization progresses, the dispersion will reduce and hence mutations will be more focused.   The choice of the function F(), operating on the difference of vectors, has various effects on the speed of convergence and exploration of the domain.  I will not go into it here.

Improvements with Hybrid Approach
Provided one has a continuous function with derivatives within the neighborhood of a trial vector, the algorithm can be improved with the use of gradient descent (such as the BFGS algorithm) to locate the true optimum within a neighborhood.    Care must be taken not to optimize too early such that the diversity of trials is lost.

Some Tests
Here are some results on the two R packages mentioned above.  Ultimately I need to get an implementation into Java, but wanted to do a comparison with existing implementations first.   I ran optimizations on a particularly pathological gaussian mixture (taken from the examples in the genoud package):

Which looks like:

It is clear that gradient descent (ascent) solvers will gravitate to one of the local maxima with the above function.   Testing DEoptim with a population of 100 and similar parameters, both arrived within 1e-4 of the answer within 10 generations (without any BFGS optimization) and 5e-6 within 60 generations.

With BFGS optimization on, genoud arrived within 1e-10 within 2 generations!   Unfortunately genoud arrived at the wrong maximum with some other (more complex) mixture function, whereas the pure DE approach arrived at the proper one.    With appropriate parameterisation can be avoided, however.

Leave a Comment

Filed under genetic algorithms

Strategy Discovery

Today I want to discuss the process of building or discovering a strategy. Generally medium to high-frequency models fall into one of the following catagories:

  • set of rules / heuristics on top of statistical observations
  • analysis of price signal
  • evolving state-based model of prices
  • spread or portfolio based relationships
  • technical indicators
  • some combination of these within an bayesian or amplification framework

These models share a common problem in that they are just crude approximations. They attempt to accurately determine behavior on a macro level.

The market is the emergent behavior of the trades and order activity of all of its participants. The perfect model is one that would have to be able to predict the behavior of each individual participant and be aware of all external stimuli affecting their behavior. This is at worst unknowable and at best would require something akin to an omniscient AI.

The best we can do is have a view or views around how to model market behavior. We can chose one of three approaches towards modelling:

  • create models that rationalize some statistical or behavioral aspect of the market
  • create models using a evolved program or regression, without a preconceived rationalization
  • create models that embody a combination of the above two approaches

Preconceived models have the advantage of being explainable, whereas, generated models often are not. That said, it is intriguing to pursue evolution and/or program generation as a means of discovering strategies in an automated fashion.

Rationale
Manual model development and testing is very time consuming. One will start with a conjecture or skeleton idea for a new strategy. The parameter space or variants of the idea may be large. Each has to be tested, optimized, retested.

Many of my strategies start out as models that digest raw prices and produce some form of “hidden state”. This hidden state is designed to tell us something useful with less noise than the original signal. This state may be multidimensional and may require further regression to map to buy and sell signals.

Obtaining optimal strategies point towards a multivariate numerical or codified regression approach. The testing and discovery of parameters and model variations would best be automated.

Tools
There are a number of approaches used in optimization, regression, or discovery problems:

  • Regression
    ANN (Artificial Neural Nets), SVM (Scalable Vector Machine), RL (Reinforcement Learning)
  • Optimization
    GA (Genetic Algorithms), Gradient Descent, Quadratic Optimization, etc
  • Discovery
    GP (Genetic Programming), perhaps ANN as well

Strategy Discovery
Thus far I have mostly used tools in the Regression and Optimization categories to calibrate models. Genetic Programming represents an interesting alternative, where we generate “programs” or strategies, testing for viability and performance.

The “program” represents a random combination from an algebra of possible operations that operates on a set of inputs to produce an output. In our case, our inputs will be the digested information that our models produce. The program will map this into something that can be used to generate buy/sell/out signals.

Thousands of such programs are generated and evaluated against a fitness function. The fitest programs replicate, perform crossover, and mutate. This can be repeated for thousands of generations until programs with strong trading performance are determined.

An alternative and perhaps simpler approach is to use an ANN coupled with a GA. The GA generates weights / connections between neurons to produce a model between inputs and outputs.

Questions Under Consideration
ANNs and GPs differ in a number of important ways. Need to think further on the following:

  • ANNs and GPs can represent an infinite number of functions
    ANNs accomplish this, though, at the cost of numerous neurons
  • ANNs and GPs may have a very different search space in terms of volume
    We want to choose an approach that will converge more quickly (ie have a smaller search space)
  • How should we constrain the algebra or permutations to affect convergence
    There are many “programs” which are equivalent, there may also be certain permutations we may not want to allow.
  • What sort of inputs are useful and how do we detect those that are not
    Inputs that are not useful should ultimately have very little trace through the model. Will have to determine how to detect and prune these.

More thought needed …

Leave a Comment

Filed under genetic algorithms, genetic programming, neural networks, regression, strategies

Trader Bots

I came across this site today. I’m not a huge believer in technical analysis as a basis for trading, however these guys are doing something interesting. They are generating / seeding strategies as a genetic program based on a combination of technical, momentum, and sentiment inputs into a neural net. These are then bred / cross-pollinated to refine further.

The next part is an extrapolation from the very little they have indicated. I suspect they are doing the following:

  1. Generate initial strategies using a random genetic program that selects inputs from a subset of available technical, sentiment, and momentum indicators.
  2. Calibrate to best possible trading signal (given inputs) using a ANN (neural net)
  3. Evaluate utility function across some years of historical data
  4. Based on results, refine by breeding the strategies with a GA
  5. Rinse and Repeat

It is an automated approach to strategy descovery, avoiding costly manual research. Though it does not appear to make use of more sophisticated inputs & models, the general approach is nice. It would not be a surprise to find that some of these strategies are successful.

The approach can be expanded to incorporate more sophisticated models as inputs (such as basis function based signal decomposition, stochastic state systems, etc).

1 Comment

Filed under genetic algorithms, neural networks, strategies, technical-analysis

Feed Forward NN in "real life"

Turns out that the Nematode Caenorhabditis Elegans has a nervous system that is similar to a feed forward network. A feed forward network is one where neurons have no backward feedback from neurons “downsignal” (i.e. the neurons and synapses can be arranged as a directed acyclic graph). This is very analogous to the feed forward network first envisaged for Artificial Neural Networks.

The worm has exactly 302 neurons and ~5000 synapses, with little variation in connection between one worm and another. This implies on average less than 20 synapse connections per neuron. This is in contrast to the mammalian brain, where most neurons have a feedback loop back from other neurons downstream of the signal.

I am very enthusiastic about this area of research as it progresses us step-by-step closer to realizing mapping an organism brain onto a machine substrate. The nematode is quite tractable because of the fixed and very finite number of neurons.

ANNs are no longer in vogue, but I use feed forward ANNs for some regression problems. Of course my activation function is likely to be quite different from the biological equivalent. ANNs are not a very active area of research given their limitations, but one does find them convenient for massive multivariate regression problems where one does not understand the dynamics.

The regressions that I solve only have sparse {X,Y} pairs if at all and can only be evaluated as a utility function across the whole data set. This precludes the various standard incremental “learning” approaches. Instead I use a genetic algorithm to find the synapse matrix that maximizes the utility function.

SVM is more likely to be used in this era than ANNs for regression. Its drawback is that it requires one to do much trial and error to determine an appropriate basis function, transforming a nonlinear data set into a reasonably hyperlinear dataset in another space.

Leave a Comment

Filed under biological systems, genetic algorithms, neural networks, regression

Evolution of Distribution

For a number of problems, understanding the evolution of the distribution over time is important. The distribution tends to be stable over longer periods and unstable over shorter periods.

The distribution is going to be measured from a sample over some time period. One may want to take a blend of distributions measured over different time periods, combined as basis functions with weights summing to 1.

The interesting bit is predicting the distribution forward with some statistical accuracy. The order book and momentum indicators should tell us something about how the distribution is going to transform over the next period or based on when a certain price level is achieved.

We are going to use a GA to calibrate the transformation function against historical data. There are many different functions we could use, so we use a GP approach to play with the permutations.

Leave a Comment

Filed under genetic algorithms, pricing, statistics

GP for option pricing

As you probably know GP (Genetic Programming) is an extension of GA which rearranges algebraic or functional instruction trees to fit to a solution.


I had not thought of it previously, but could use such an approach with the right set of functional constructors to converge on an option pricing GP. Now if all we were trying to do was to replicate the Black / Scholes, CEV, or other gaussian distribution based model, would not be very interesting.

We know that the actual distribution are often non-gaussian. Could we produce a more accurate approximation of the hedging cost against a non-gaussian distribution (implying the true risk free price of the option) with GP?

Interestingly, Neural Networks are just special cases of a GP tree, so in the end GP is the most general approach to non-linear regression.

Leave a Comment

Filed under genetic algorithms, pricing, statistics

Trading Signals

I’m putting together a framework to evolve, test, and optimize signals using a genetic algorithm approach. Signals with statistically significant results will be further combined into bayseian networks and fed back into this testing framework.

Determining the conditionality of one signal against another requires insight and guesswork, or evaluating permutations of networks. With a GA optimizer and enough computing power should be able to determine networks that successfully amplify the combination of signals to one that is correct more often than not (or at least is more successful in the profitable situations than the losing ones).

The universe of events and indicators that have potential to be significant is large, as are their parameterizations. Choosing the search space will be a challenge, as is sometimes having access to the required data.

I plan to overlay our tick UI with trading signal indicators indicating a probability weighting when a signal reaches a non-neutral threshold. Should be interesting to visualize the results.

2 Comments

Filed under bayesian networks, genetic algorithms, indicators

Applying Genetic Algorithms

I have done some experimentation with genetic algorithms in the past, but am now looking to incorporate as an optimization tool into my rules-based strategy framework.

The behavior of our strategies are often parameterized. There can be so many combinations of parameters, that to arrive at a sucessful or best performing set can be a matter of guesswork, testing and retesting.

Enter Genetic Algorithms (GA). GA allows us to efficiently arrive at a (nearly) best fit set of parameters based on a fitness function. For those not familiar with genetic algorithms (GA), GA is a biology-inspired gene/generational means of evolving a solution based on fitness criteria. A population of genes is produced on each generation, the fittest of which are chosen and recombined with other genes. Successive generations will tend to be more and more fit, arriving at a fitness maximum.

In truth GA, as exotic as it sounds, is just one of many optimization techniques where one is trying to minimize, maximize, or find a zero for a complex function. GA is easy to set up and can work with black-box functions, so is a good candidate.

I need to set up an environment to run the optimization in parallel given the computation required. For each day we evaluate a strategy will be looking at 200K – 500K ticks or more. So to evaluate over, say, 3 months of data is 30 million ticks. Multiply this by 1000 or more fitness evaluations, and one is looking to evaluate billions of ticks and rules.

Given that we have datasynapse available on hundreds of blades, may adapt JGAP and the strategy engine for this infrastructure.

Leave a Comment

Filed under genetic algorithms, optimization, parallel processing