Consolidated Source of Data for Bitcoin

It seems like every other month there is a new bitcoin exchange.  For the purposes of trading research & backtesting it is important to have historical data across the most liquid exchanges.  My minimal list is:

  1. BTC/USD
    1. bitfinex (15%)
    2. bitstamp (5%)
    3. coinbase (new, but likely to garner market share)
  2. BTC/CNY
    1. okcoin (28%)
    2. btcn (44%)

(percentage volume sourced from …)

Each of these exchanges not only has a unique protocol but also unique semantics that need to be normalized.

For example, consider how Bitstamp reports a partial sweep of the orderbook.  Suppose a BUY 14 @ 250.20 is placed, crossing 4 orders on the sell side of the book.

In Bitstamp, one would see the following sequence of transactions:

  1. NEW BUY 14 @ 250.20, id: 43
  2. DEL  SELL 1.2 @ 250.05, id: 23
  3. UPDATE BUY 12.8 @ 250.20, id: 43  (updating the size of the aggressing order)
  4. TRADED 1.2 @ 250.05
  5. DEL SELL 0.3 @ 250.10, id: 24
  6. UPDATE BUY 12.5 @ 250.20, id: 43  (updating the size of the aggressing order)
  7. TRADED 0.3 @ 250.10
  8. TRADED 8 @ 250.20
  9. DEL BUY 0 @ 250.20, id: 43

The oddity here is that many market data streams & orderbook implementations will transact the crossing in one go, so one will usually only see: DEL, TRADE, DEL, TRADE, DEL, TRADE (and the deletes may not be sequenced between the trades either).   Where it gets odd is in replaying this data: a typical orderbook implementation will sweep the book as soon as it sees the aggressing order, without the intermediate UPDATE states.   In such an implementation, seeing an UPDATE to a non-zero size after the order has been fully crossed and deleted might be flagged as an error or a missed NEW, since the order is no longer on record in the orderbook.
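To make replay well-behaved, one option is to collapse Bitstamp's verbose sequence into the canonical DEL/TRADE form before feeding it to an orderbook.  A minimal Python sketch, assuming a hypothetical event schema (dicts with action/side/size/price/id) based on the example above:

```python
def normalize_sweep(events):
    """Collapse a Bitstamp-style sweep into canonical DEL/TRADE form by
    dropping the aggressing order's NEW, its intermediate UPDATEs, and its
    terminal zero-size DEL."""
    if not events or events[0]["action"] != "NEW":
        return list(events)
    aggressor = events[0]["id"]
    # keep only resting-order DELs and TRADED events
    return [e for e in events[1:] if e.get("id") != aggressor]

# the 9-event sequence from the example above
sweep = [
    {"action": "NEW", "side": "BUY", "size": 14, "price": 250.20, "id": 43},
    {"action": "DEL", "side": "SELL", "size": 1.2, "price": 250.05, "id": 23},
    {"action": "UPDATE", "side": "BUY", "size": 12.8, "price": 250.20, "id": 43},
    {"action": "TRADED", "size": 1.2, "price": 250.05},
    {"action": "DEL", "side": "SELL", "size": 0.3, "price": 250.10, "id": 24},
    {"action": "UPDATE", "side": "BUY", "size": 12.5, "price": 250.20, "id": 43},
    {"action": "TRADED", "size": 0.3, "price": 250.10},
    {"action": "TRADED", "size": 8, "price": 250.20},
    {"action": "DEL", "side": "BUY", "size": 0, "price": 250.20, "id": 43},
]
```

Applied to the sequence above, this yields the usual DEL, TRADE, DEL, TRADE, TRADE stream a conventional orderbook would expect.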

Another note is that Bitstamp does not indicate the side of the trade (i.e. which side aggressed).  While this omission is common in markets such as equities or FX, most Bitcoin exchanges do provide it.   Fortunately, because the initial crossing order is provided, one can use a bipartite graph (in the presence of multiple crossing orders) to determine the most likely aggressing order and therefore the trade sign.
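A much-simplified sketch of the idea (a single live aggressor rather than full bipartite matching): attribute each trade to the active order whose limit price crosses the trade price.  The event dicts are the same hypothetical schema as in the example above:

```python
def infer_trade_signs(events):
    """Label each TRADED event with the side of the live order whose limit
    price crosses the trade price (single-aggressor simplification of the
    bipartite-matching idea)."""
    active = {}            # order_id -> (side, limit_price)
    signs = []
    for e in events:
        if e["action"] == "NEW":
            active[e["id"]] = (e["side"], e["price"])
        elif e["action"] == "DEL":
            active.pop(e["id"], None)
        elif e["action"] == "TRADED":
            side = next((s for s, p in active.values()
                         if (s == "BUY" and p >= e["price"])
                         or (s == "SELL" and p <= e["price"])), None)
            signs.append(side)
    return signs

# the sweep from the example above (UPDATE events are simply ignored)
sweep = [
    {"action": "NEW", "side": "BUY", "size": 14, "price": 250.20, "id": 43},
    {"action": "DEL", "side": "SELL", "size": 1.2, "price": 250.05, "id": 23},
    {"action": "TRADED", "price": 250.05, "size": 1.2},
    {"action": "DEL", "side": "SELL", "size": 0.3, "price": 250.10, "id": 24},
    {"action": "TRADED", "price": 250.10, "size": 0.3},
    {"action": "TRADED", "price": 250.20, "size": 8},
    {"action": "DEL", "side": "BUY", "size": 0, "price": 250.20, "id": 43},
]
```

All three trades in the example are correctly labeled as buy-aggressed.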

Clearing House for Data

I would like to build and/or participate in the following:

  1. build robust normalized L3 or L2 -> L3 (implied) orderbook live feeds
    1. used to collect data into a simple binary tickdb format
    2. also can be reused as connectivity handlers for live trading
  2. normalize transaction stream (such as issues in the example above)
  3. identify buy/sell designation on trades based on exchange specific semantics
  4. in addition to exchange specific tick streams / dbs, create a consolidated OB stream:
    1. synchronized market state to nearest ms
    2. normalized orderid space so that order ids do not collide and can identify order source
  5. simple means to generate bars or filter for trades from the L3 data
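As a sketch of what a "simple binary tickdb format" with a normalized order-id space might look like, here is a fixed-width record using Python's struct module.  The field layout, the action/side encodings, and the top-byte exchange tag are illustrative assumptions, not an established format:

```python
import struct

# one fixed-width record per event, little-endian, no padding:
# <timestamp_ms, exchange_id, global_order_id, action, side, price, size>
TICK = struct.Struct("<qBqBBdd")

ACTIONS = {"NEW": 0, "DEL": 1, "UPDATE": 2, "TRADED": 3}
SIDES = {"BUY": 0, "SELL": 1}

def global_order_id(exchange_id, local_id):
    """Normalize the order-id space: the top byte identifies the source
    exchange, so ids from different exchanges cannot collide."""
    return (exchange_id << 56) | local_id

def pack_tick(ts_ms, exch, local_id, action, side, price, size):
    return TICK.pack(ts_ms, exch, global_order_id(exch, local_id),
                     ACTIONS[action], SIDES[side], price, size)
```

A record is 35 bytes; unpacking with `TICK.unpack` recovers all fields, and the source exchange can be read back out of the high byte of the order id.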

It takes some amount of time to develop and a fairly small amount of money to run in terms of hosting.   Assuming there are no EULA issues in doing so, one could perhaps provide the data as a non-profit sort of arrangement.   I am not looking to build a for-profit company around this, but rather a collective through which to give something back to the community, and perhaps be able to make use of donated resources and/or data.


1 Comment

Filed under strategies

Reinventing the Wheel

I guess I am old enough to have seen the wheel reinvented a number of times in both software infrastructure and financial technology spaces.  Unfortunately the next generation’s version of the wheel is not always better, and with youthful exuberance often ignores the lessons of prior “wheels”.   That said, we all know that sometimes innovation is a process of 3 steps forward and 2 steps backward.

Two examples have impacted me recently.


An example of a poorly reinvented wheel is Hadoop / HDFS in the technology space, where Amdahl’s law has been ignored completely (i.e. keep the data near the computation to achieve parallelism).   When one writes to HDFS, HDFS distributes blocks of data across nodes, but with no explicit way to control data / node affinity.   HDFS is only workable in scenarios where the data to be applied in a computation is <= the block size and packaged as a unit.

“Everybody and his brother” are touting big data on HDFS, S3, or something equivalent as the data solution for distributed computation, in many cases without thinking deeply about it.   I was recently involved with a company dealing with a big data problem in the Ad space.  I needed to track & model the browsing behavior (url visits) of 400M+ users based on ~4B Ad auctions daily, relate each URL to content categories (via classification & a taxonomy of concepts), and then determine a feature vector for each user.

I argued that using HDFS for this data would be a failure with respect to scaling.  The problem was that the auction data, 4B records of <timestamp,user,url, …>, was distributed across N nodes uniformly, rather than on userid MOD N; HDFS does not allow one to control data / node affinity.  My computation was some function over all events seen in a historical period for each user.  This function:

  • F( [<timestamp,user1,urlA, …>, <timestamp,user1,urlG, …>, …])

needed to be evaluated for each user, across all events for that user, and could not reasonably be decomposed into sub-functions on individual events.    The technology I had to use was Spark over HDFS-distributed timeseries.  I argued that to evaluate this function for each user, on average (N-1)/N of the requested data would need to be transported from other nodes (as only 1/N of the data was likely to be local in a uniformly distributed data-block scheme).   The (lack of) performance did not “disappoint”: instead of a linear speedup (N× faster), we saw a slowdown approaching N× slower, due to the dominance of communication.
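The locality argument can be illustrated with a toy partitioner.  Under uniform (here, round-robin) placement, each user's events are scattered across all N nodes; keyed placement on userid MOD N puts every user's events on a single node, so the per-user function needs no network transfer.  The event tuples and node counts below are made up for illustration:

```python
def round_robin_partition(events, n):
    """Uniform placement: event i goes to node i MOD n (HDFS-like spread)."""
    parts = [[] for _ in range(n)]
    for i, e in enumerate(events):
        parts[i % n].append(e)
    return parts

def keyed_partition(events, n):
    """Affinity placement: all of a user's events land on node user MOD n."""
    parts = [[] for _ in range(n)]
    for e in events:             # e = (timestamp, user, url)
        parts[e[1] % n].append(e)
    return parts

def colocation(parts, user):
    """Largest fraction of one user's events found on any single node."""
    counts = [sum(1 for e in p if e[1] == user) for p in parts]
    return max(counts) / sum(counts)

# 8 users x 100 events each, grouped by user in arrival order
events = [(t, u, "url") for u in range(8) for t in range(100)]
```

With 4 nodes, keyed placement gives 100% colocation per user, while round-robin leaves only ~1/N of each user's events on any one node.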

Bitcoin Exchanges

Bitcoin popularized a number of excellent ideas around decentralized clearing, accounting, and trust via the blockchain ledger.  The core technology behind bitcoin is very innovative and is being closely watched by traditional financial institutions.   The exchanges, on the other hand (with some exceptions), seem to have been built with little clue as to what preceded them in the financial space.

  1. JSON / REST is popular as a web transport, but is not an efficient or precise transport for market data.
    1. clue: provide a Nasdaq ITCH-like binary stream-based feed for efficiency and precision, OR FIX if you insist.
  2. Some exchanges (like Bitstamp) have a super-slow matching engine
    1. sweeping the book takes multiple seconds to transact
  3. Provide orderbook updates as transactions and not top-K levels
    1. transactions (new order, delete order, update order, traded) are more compact, provide more information, & are timely.
  4. Provide keyframes
    1. i.e. provide an enumeration of orders in the orderbook on connection and perhaps periodically in the stream to bootstrap the subscribers view of the orderbook and validate later in the stream.
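A minimal sketch of the keyframe idea: bootstrap an orderbook from a snapshot enumeration of live orders, then apply incremental transactions.  The event schema and snapshot shape are assumptions for illustration:

```python
class OrderBook:
    """Bootstraps from a keyframe (an enumeration of live orders), then
    applies incremental NEW / UPDATE / DEL transactions."""

    def __init__(self):
        self.orders = {}   # order_id -> (side, price, size)

    def keyframe(self, snapshot):
        """Reset state from a full snapshot, e.g. sent on connection."""
        self.orders = dict(snapshot)

    def apply(self, ev):
        action = ev["action"]
        if action == "NEW":
            self.orders[ev["id"]] = (ev["side"], ev["price"], ev["size"])
        elif action == "UPDATE":
            side, price, _ = self.orders[ev["id"]]
            self.orders[ev["id"]] = (side, price, ev["size"])
        elif action == "DEL":
            self.orders.pop(ev["id"], None)

    def best(self, side):
        """Best bid (highest BUY) or best offer (lowest SELL)."""
        prices = [p for s, p, _ in self.orders.values() if s == side]
        if not prices:
            return None
        return max(prices) if side == "BUY" else min(prices)
```

A periodic keyframe in the stream also lets subscribers validate their book against the exchange's, catching dropped messages.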

Leave a comment

Filed under strategies

Bitcoin, why I am still interested

The Not-so-Great News

Bitcoin has been in a (mostly) negative trend since early 2014, falling from its high (~1175) to, as of today, a low of 165.  It has also been associated with negative news across the year, concerning bitcoin theft, shoddy exchanges, illicit uses, etc.

Bitcoin has been a victim of its own “success” in terms of asset valuation.  The ascent to the > 1000 price was largely built in the Nov – Dec ’13 period, and naturally the coin is reverting to sustainable levels.

Unfortunately, the ascent to 1000+ was associated with a period where 2 algos were aggressively buying ~600,000 BTC from Mt Gox.   It is believed (though not confirmed) that these algos may have been used to perpetrate the fraud that stole a similar amount of bitcoin from Mt Gox (and its user base).   In other words, the algos may, through aggressive buying, have set up the momentum that took the price from sub-$200 to over $1000.  The whole movement may have been ignited and sustained by a fraud.  See this analysis.

Bitcoin now is in the perfect storm of:

  1. likely slow liquidation of stolen bitcoin (how much is left, hard to know)
    1. Evidence showed that the above algo started to liquidate in early 2014; however, it could not have liquidated its complete holdings (unless through another agent).
  2. liquidation of mined bitcoin as miners are desperate to recover what they can in a fallen valuation
    1. A significant amount of mining infrastructure was invested in on the basis of inflated BTC valuation.   With adjustments to hash difficulty this may normalize, but for now should pressure miners to liquidate.
  3. financial markets in disarray, triggered by the falling oil -> commodities -> EMG, Equities, etc.
    1. Expect this to leak into the BTC market as well.
  4. More news of stolen bitcoin

I believe Bitcoin and cryptocurrencies in general will recover, however.  The negative impact of all of the above makes it very likely that BTC will sink further before it recovers (so far today it hit a low of $165).

The shake-out from this has given rise, and will increasingly give rise, to more secure financial technology, as opposed to the quick-and-dirty implementations (such as Mt Gox’s php travesty).  That said, developing carefully and with appropriate safeguards is not easy, especially in the context of a startup racing to get a product out the door.

So why am I (still) interested

Bitcoin is one of the most transparent marketplaces (if not the single-most).  For example:

  1. trades are labeled in terms of which side aggressed (buy or sell)
    1. buy/sell imbalance
    2. orderbook resiliency
  2. orderbook information is very useful (not as obfuscated with the games in other markets)
    1. OB slope & resiliency
    2. OB prediction modeling
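As a sketch of the kind of orderbook features mentioned above, here are crude imbalance and slope measures over the top levels of the book.  The definitions (volume-weighted imbalance, volume per unit of price as a slope/resiliency proxy) are my own illustrative choices, not standard formulas:

```python
def book_imbalance(bids, asks, depth=5):
    """(bid volume - ask volume) / total volume over the top `depth` levels.
    Ranges over [-1, 1]; positive values indicate buy-side pressure."""
    bid_vol = sum(size for _, size in bids[:depth])
    ask_vol = sum(size for _, size in asks[:depth])
    return (bid_vol - ask_vol) / (bid_vol + ask_vol)

def book_slope(levels, depth=5):
    """Volume per unit of price across the top levels of one side of the
    book: a crude proxy for how much size absorbs a given price move."""
    top = levels[:depth]                 # [(price, size), ...] best first
    span = abs(top[-1][0] - top[0][0])
    return sum(size for _, size in top) / span
```

Both functions take price levels as (price, size) pairs sorted best-first, which is easy to derive from an L3 book.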

Where it lacks:

  1. low liquidity compared to more traditional assets (but enough for smaller operations)
  2. futures / forwards market is very low liquidity
  3. shorting can be difficult

In short, like many new markets & asset classes, there is often wider opportunity for earlier participants.   Markets tend to tighten as they mature.


Filed under strategies

Causality in observational data

In the past I had created clusters on assets to help identify relationships across assets.   The resulting graphs were useful in identifying assets for use in mean-reverting portfolios.   A difficulty in the approach was always around accurately measuring the strength of relationships between asset pairs.   I had looked at Granger-causality, which is fairly limited in that it expects asset relationships to follow a VECM-like model, and eventually settled on weighting across a number of techniques as an approximation.

Determining causality for more general relationships requires a very different approach, where Y ← f(X) + ε, i.e. f(X) may be an (unknown) non-linear function of X.   I came across an interesting paper, “Distinguishing cause from effect using observational data: methods and benchmarks” (link), which builds on work around looking at the asymmetry of the “complexity” of p(Y|X) versus p(X|Y): if Y ← X (X causes Y), p(Y|X) will tend to have lower complexity than p(X|Y).

The paper provides results on two methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI), where ANM generally did better than IGCI across a variety of scenarios.

The algorithm for ANM in python-like pseudo-code (taken from the paper):

def causality(xy: DataFrame, complexity: FunctionType, complexity_threshold: float):
    n = nrows(xy)
    xy = normalize (xy)
    ## split into disjoint training and testing halves
    training, testing = split (xy, n/2)

    ## regress Y <- X and X <- Y on training set
    thetaY = GaussianProcess.regress (training.y, training.x)
    thetaX = GaussianProcess.regress (training.x, training.y)

    ## predict Y' <- X' and X' <- Y' & compute residuals on testing set
    eY = testing.y - GaussianProcess.predict (testing.x, thetaY)
    eX = testing.x - GaussianProcess.predict (testing.y, thetaX)

    ## determine complexity of each proposition on variable and residuals
    Cxy = complexity (testing.x, eY)
    Cyx = complexity (testing.y, eX)

    ## determine whether X -> Y, Y -> X, or indeterminate
    if (Cyx - Cxy) > complexity_threshold:
        return X_causes_Y
    elif (Cxy - Cyx) > complexity_threshold:
        return Y_causes_X
    else:
        return None

The paper recommended using a score based on the Hilbert-Schmidt Independence Criterion (HSIC) as the complexity test. I will have to code the above up and see how it does on a broad set of assets. The paper did test against the CEP benchmark, which includes some known financial relationships.
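For reference, here is a pure-Python sketch of the biased HSIC estimator with Gaussian kernels and median-heuristic bandwidths, usable as the complexity(x, e) score in the pseudo-code above (lower dependence between input and residual suggesting the correct causal direction).  This is a from-scratch sketch, not the paper's implementation:

```python
import math
import random

def _gram(xs, sigma):
    """Gaussian kernel Gram matrix k(a, b) = exp(-(a-b)^2 / 2 sigma^2)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma * sigma)) for b in xs] for a in xs]

def _median_bandwidth(xs):
    """Median heuristic: bandwidth = median pairwise distance."""
    d = sorted(abs(a - b) for i, a in enumerate(xs) for b in xs[i + 1:])
    m = d[len(d) // 2]
    return m if m > 0 else 1.0

def _center(K):
    """Double-center: HKH with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def hsic(xs, ys):
    """Biased HSIC estimator tr(KHLH) / n^2; near 0 iff xs, ys independent."""
    n = len(xs)
    K = _center(_gram(xs, _median_bandwidth(xs)))
    L = _center(_gram(ys, _median_bandwidth(ys)))
    return sum(K[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n * n)
```

On synthetic data, a deterministic relationship scores far higher than an independent pairing, which is the asymmetry the ANM test relies on.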


Am quite busy now with another investigation, but will revisit this with a proper implementation.


Filed under strategies

Thompson Sampling

I recently attended a talk by David Simchi-Levi of MIT, where he discussed an approach to online price discovery for inventory, to maximize some objective such as profit.   The scenario was one where we could observe whether inventory was sold or not sold for each potential buyer in the marketplace, giving an empirical view of demand behavior as a function of price.   The optimal setting in selling the inventory is one that maximizes price × liquidation probability.

When we have no knowledge about the true demand for inventory as a function of price, we must start with some prior estimate in the form of a sold/unsold distribution (the demand function in terms of price) and modify this online during the selling period.   The setup of the problem is then:

  • have a fixed period in which can sell the inventory
  • can observe whether a prospective buyer bought at offer price or rejected the price as too high
  • determine a number of different selling price levels & some prior on the probability of sale at each price, starting with a uniform distribution

This can be modeled as a multi-armed bandit problem, where we decide amongst n one-armed bandits (slot machines), determining which slot machine gives the highest payout by iteratively adjusting our model based on observations.

In determining the optimal price, we can formulate this as a sequence of bandits: a sequence of prices ranging from low to high.  Associated with each price is a distribution representing our view of the demand at that price (i.e. the probability of sale).  With no prior, we can start with an equi-probable view of demand across prices, using an initial Beta distribution at each price level of B(α=1, β=1).

Rather than provide the formal setup for Thompson sampling, I will illustrate its use for the above problem.  The sold/unsold distributions & selection of price can then be evolved as follows:

  • take a sample from each distribution associated with a price (representing the probability of sale at each given price)
  • choose the ith price (bandit) that maximizes the objective: p(sale[i]) * price[i], where p(sale) is the sampled probability from each demand/price distribution
  • offer the ith price to the marketplace
  • the unit of inventory is observed to be sold or unsold
  • adjust the beta distribution associated with price[i] (the price used in the offer) as follows:
    • if sold:  B’ = B(α+1, β)
    • if unsold: B’ = B(α, β+1)
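The steps above can be sketched directly in Python with random.betavariate.  The price grid and the true_p vector simulating the market's unknown demand are made-up inputs for illustration:

```python
import random

def thompson_pricing(prices, true_p, rounds=3000, seed=0):
    """Thompson sampling over one Beta(α, β) posterior per candidate price.
    `true_p` simulates the market's (unknown) sale probability per price."""
    rng = random.Random(seed)
    ab = [[1, 1] for _ in prices]          # B(α=1, β=1) uniform priors
    counts = [0] * len(prices)
    for _ in range(rounds):
        # 1. sample a plausible sale probability from each price's posterior
        samples = [rng.betavariate(a, b) for a, b in ab]
        # 2. offer the price maximizing sampled expected revenue p * price
        i = max(range(len(prices)), key=lambda k: samples[k] * prices[k])
        counts[i] += 1
        # 3. observe sold/unsold and update that price's posterior
        if rng.random() < true_p[i]:
            ab[i][0] += 1                  # sold:   B' = B(α+1, β)
        else:
            ab[i][1] += 1                  # unsold: B' = B(α, β+1)
    return counts
```

With prices of 10/30/50 and simulated sale probabilities of 0.9/0.5/0.02, expected revenues are 9, 15, and 1, so the sampler should concentrate its offers on the middle price.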

The net effect over time is that the distributions converge such that the optimal bandit (associated with the optimal price) offers the highest probability in sampling, or at least the highest expectation (price × p(sold)).  The approach (Thompson sampling) has many applications in finance for online optimisation.  It is definitely going to be part of my toolset going forward.

Leave a comment

Filed under strategies


The T-SNE approach is a very intuitive approach to dimensional reduction and visualization of high-dimensional features.  I implemented this in F# recently and had used implementations in R and python previously.  I have found it very useful in determining whether I have designed a feature set correctly, such that it has a natural degree of separation with respect to labels.

The algorithm is elegant: determine the pairwise probabilities of points being neighbors in the high-dimensional space, then find a distribution of points in 2-dimensional space that, to the extent possible, reflects the same distances (probabilities), with a bias towards preserving locality.   This is solved as an iterative gradient descent over the pairwise relationships across n points.
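The first half of the algorithm (the pairwise neighbor probabilities in the high-dimensional space) can be sketched as follows.  Real t-SNE chooses a per-point bandwidth to hit a target perplexity; the single fixed sigma here is a simplification:

```python
import math

def affinities(points, sigma=1.0):
    """Symmetrized Gaussian neighbor probabilities p_ij over the input space
    (fixed sigma; real t-SNE tunes sigma_i per point via perplexity)."""
    n = len(points)

    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # conditional probabilities p_{j|i}, each row normalized to sum to 1
    cond = []
    for i in range(n):
        w = [0.0 if j == i else
             math.exp(-d2(points[i], points[j]) / (2 * sigma * sigma))
             for j in range(n)]
        s = sum(w)
        cond.append([wj / s for wj in w])

    # symmetrize: p_ij = (p_{j|i} + p_{i|j}) / 2n, so all p_ij sum to 1
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
```

The gradient descent then fits 2-D points whose Student-t pairwise distribution matches this P matrix; computing P alone is already O(n²), which is the scaling problem discussed below.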

The problem is that this is O(n^2), equivalent to solving the n-body problem.   My data sets are in the 100s of thousands or millions of points, so as you can imagine, this is no longer feasible on a single processor.

Barnes-Hut is an approximation approach for n-body problems, bringing the complexity to O(n log n), but it can only be effective if one’s partitioning of space is coarse.  For some of my problems, I have found that I cannot get appropriate separation with this sort of approximation.

In terms of parallelizing, n-body problems require fine-grained parallelism to solve efficiently.   Given that fine-grained parallelism is limited in scale (how many processors can you have on one machine or GPU?), I have been considering whether there is an iterative hierarchical approach that could be applied with distributed parallelism.  I would not expect such an approach to get anywhere close to linear speedup, but it can perhaps do better than a limited # of local nodes.

Another approach to this problem, which is easily distributed, would be a meta-heuristic approach such as differential evolution.

Leave a comment

Filed under strategies

Money Management

It has been almost a year since my last post.  I have been far too busy getting a new trading desk up and running.   I thought to discuss money management, since I am revisiting it right now.


It is easy to think that the trading signal is the most important aspect of a trading strategy, but money management (and execution) can be even more important.   Loosely defined, money management is a mechanism for position-level risk management.  The mechanism attempts to regulate the position, accomplishing a number of things:

  1. ride out a profitable signal as long as there is continued profit potential
    1. close out a profitable position when the p&l is drifting sideways
    2. alternatively, close out a profitable position when there is a drawdown
    3. otherwise, allow the position to continue to profit, even through transient negative noise
  2. close out our losing positions as quickly as possible
    1. close position once we have a view that it is unlikely to be profitable
  3. close out the strategy if it seems unstable
    1. for example, it keeps hitting stop-loss
    2. risk measures indicative of unstable market situation for strategy

A desirable feature of a money manager is that when pairing the money manager and signal together, we have a return distribution with positive skew and very limited negative tails.   We can even have a signal with < 50% wins, but because of the generated bias in + returns / – returns, have an overall positive equity curve.   Of course I would advocate for a much higher win ratio than 50% 😉

Signal → Position

I take the approach of having a trading signal that scales between [-1,1] or [0,1] on a continuous basis.   In my trading systems the money manager not only works as a risk manager, but also decides how to scale the signal into a desired position.

For example, if our maximum position is $5 million, we might scale our desired position from 0 to $5 million (if the signal reaches full power at 1).  A signal at or close to 0 would indicate being out of the market, and 0.5 would mean being at half strength, or $2.5 million in.   Here is an example signal from 0 to 1:

Trading signals can be noisy, though we do our best to produce smooth signals.   Without regulating how we map the signal to position size, the up and down dips in the signal would imply thrashing in and out of the position, which would be costly.

Hence, we should try to enforce direction monotonicity, so as to avoid thrashing.
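One simple way to enforce this is a hysteresis band: only move the position when the signal-implied target has moved materially.  A sketch, using the $5 million cap from the example above (the band width is an assumed tuning parameter):

```python
def scale_signal(signal, prev_position, max_position=5_000_000, band=0.1):
    """Map a [0, 1] signal to a position size, ignoring target moves smaller
    than `band` of the maximum position to avoid thrashing on signal noise."""
    target = max(0.0, min(1.0, signal)) * max_position
    if abs(target - prev_position) < band * max_position:
        return prev_position   # small dip or blip: hold the current position
    return target
```

Small oscillations in the signal around the current level leave the position untouched, while a genuine move re-scales it.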

Types of stop-loss

There are a number of stop-loss types we should consider:

  1. stop-loss:
    1. stop when (smoothed) equity curve has reached a negative return threshold
  2. stop-profit:
    1. exit an up-to-current profitable trade, but one that has lost some % from the high
  3. stop-drift
    1. a time and slope based stop that closes out a position whose equity curve is drifting more-or-less sideways for a significant period
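The three stop types can be sketched as a single check over a (smoothed) equity curve.  All thresholds here are illustrative assumptions, not recommended settings:

```python
def check_stops(equity, t, stop_loss=-0.02, giveback=0.5,
                drift_window=50, drift_eps=0.001):
    """Evaluate stop-loss, stop-profit, and stop-drift on a (smoothed)
    equity curve `equity` (fractional returns) up to index t."""
    cur = equity[t]
    high = max(equity[:t + 1])
    # stop-loss: equity has reached a negative return threshold
    if cur <= stop_loss:
        return "stop-loss"
    # stop-profit: still profitable overall, but gave back `giveback` of the high
    if high > 0 and (high - cur) >= giveback * high:
        return "stop-profit"
    # stop-drift: positive p&l drifting sideways for a significant period
    if t >= drift_window and cur > 0 and abs(cur - equity[t - drift_window]) < drift_eps:
        return "stop-drift"
    return None
```

In practice each trigger would also transition the state machine described below into its reentry-avoidance states rather than just closing the position.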

Risk Reentry Avoidance

On a stop-loss, not only do we want to close the position, but we also have to “back away” from the signal, such that we do not immediately get back into an undesirable situation.   Depending on why we exited the position, we may want to:

  1. disable market entry until the signal has gone to zero
  2. impose a time penalty
  3. impose a market reentry restriction (wait for the market regime to reach a certain stage)


Here is a finite state machine that illustrates a possible state system guiding position scaling and money management:

The state system expresses some of the above money management features.   To make it effective, one needs to be clever about deciding whether a negative movement is a sign to get out or a transient movement.   One can use a combination of the signal, smoothed series, and aspects of order book and trade data to get a better read on this.


Filed under strategies