High Performance Computing on the Cheap

I have a couple of trading strategies in research that require extremely compute-intensive calibrations, which can run for days or weeks on a multi-CPU box. Fortunately the problem lends itself to massive parallelism.

I am starting my own trading operation, so it is especially important to determine how to maximize my gflops / $. Some preliminaries:

  • my calibration is not amenable to SIMD (therefore GPUs are not going to help much)
  • I need to have a minimum of 8 GB memory available
  • my problem performance is best characterized by the SPECfprate benchmark

I started by investigating grid solutions. Imagine if I could use a couple of thousand boxes on one of the grids for a few hours. How much would that cost?

Commercial Grids
So I investigated Amazon EC2 and the Google App Engine. Of the two, only Amazon looked to have higher-performance servers available. Going through the cost math for both Amazon and Google revealed that neither platform is priced in a reasonable way for HPC.

Amazon charges $0.80 per compute hour, about $580 / month or $7000 per compute year, on one of their “extra-large high CPU” boxes. This configuration of box is a 2007-spec Opteron or Xeon. That would imply, at best, a dual-CPU Xeon X5300-family system with 8 cores and a SPECfprate of 66. $7000 per compute year is much too dear; certainly there are cheaper options.

Hosting Services
It turns out that there are some inexpensive hosting services that can provide SPECfprate ~70 machines for around $150 / month. That works out to $1800 / year. Not bad, but can we do better?

Just How Expensive Is One of these “High Spec” boxes?
The high-end 8-core X5570-based MacPro is the least expensive high-end Xeon-based server. It does not, however, offer the best performance / $ if your computation can be distributed. The X5500 family performs at 140-180 SPECfprates, at a cost of over $2000 for the two CPUs alone.

There is a new kid on the block: the Core i7 family. The Core i7 920, priced at $230, generates ~80 SPECfprates and can be overclocked to around 100. A barebones compute box can be built for around $550. I could build two of these and surpass the performance of a dual-CPU X5500 system, saving $2000 (given that the least expensive such X5500 system is ~$3000).

Cost Comparison Summary
Here is a comparison of cost per 100-SPECfprate compute year for the various alternatives. We will assume 150 watts of power consumption per CPU at $0.10 / kWh, in addition to system costs; purchased hardware is amortized over two years.

  1. Amazon EC2
    $10,600 / year. 100/66 perf x $0.80 / hr x 365 x 24
  2. Hosting Service
    $2,700 / year. 100/70 perf x $150 x 12
  3. MacPro 2009 8 core dual X5570
    $1070 / year. 100 / 180 perf x $3299 / 2 + $160 power
  4. Core i7 920 Custom Build
    $430 / year. 100 / 80 perf x $550 / 2 + $88 power
  5. Core i7 920 Custom Build Overclocked
    $385 / year. 100 / 100 perf x $550 / 2 + $100 power
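The arithmetic above can be reproduced with a short script. The figures (SPECfprates, two-year amortization, and the power estimates) are taken directly from the list; the function and option names are just illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def cost_per_100_specfprate(specfprate, yearly_system_cost, yearly_power_cost=0.0):
    """Scale a system's yearly cost to the price of 100 SPECfprate units.

    yearly_system_cost is hourly rate x hours for rented capacity, or the
    purchase price amortized over two years for owned boxes.
    """
    return 100.0 / specfprate * yearly_system_cost + yearly_power_cost

costs = {
    "Amazon EC2":        cost_per_100_specfprate(66,  0.80 * HOURS_PER_YEAR),
    "Hosting service":   cost_per_100_specfprate(70,  150 * 12),
    "MacPro dual X5570": cost_per_100_specfprate(180, 3299 / 2, 160),
    "Core i7 920":       cost_per_100_specfprate(80,  550 / 2,  88),
    "Core i7 920 (OC)":  cost_per_100_specfprate(100, 550 / 2,  100),
}

# print cheapest first
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} ${cost:,.0f} / 100-SPECfprate year")
```

Small rounding differences aside, the ranking matches the list: the overclocked i7 build comes out cheapest and EC2 most expensive by a wide margin.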

The Core i7 920 build is the clear winner. One can build 5-6 of these for the cost of each X5570-based system. I will build a cluster of these.



Filed under HPC, performance

6 responses to “High Performance Computing on the Cheap”

  1. bbc

    you forgot to include utility bills for option 3,4,5 LoL

  2. tr8dr

    actually no, the “+ $XX power” line items are the utilities.

  3. John Carse

    What about bandwidth costs? And latency? How do these costs stand up against Amazon’s EC2 offering?

    I have been thinking about hosting a trading platform on EC2 when the time comes. The primary motivation for me was network latency and system /network availability.

    • tr8dr

      Hi, I was not focused on that. At least for the compute-intensive tasks I’m doing, aside from shipping the original data, there is not much bandwidth usage. As this post is dated, I expect that the cost structure has changed. Amazon, for instance, has a hadoop-based service that is much cheaper than their regular service. Unfortunately hadoop does not map well to my problems.

      • Dan

        This is interesting .. why doesn’t Hadoop map well? I understand that map/reduce is more useful for text than streaming data, but can’t most optimizations be decomposed into, e.g.:

        – Train on a subset e.g. one day (so map over days)
        – Keep statistics of how many instances there were
        – Combine parameters weighted by those statistics (so reduce)?
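        The decomposition above could be sketched as follows (a toy stand-in, not actual Hadoop code: the “calibration” here is just a mean estimator, and the names are illustrative):

```python
def train_on_day(day_data):
    """Map step: fit parameters on one day's subset (toy mean estimator)."""
    n = len(day_data)
    params = sum(day_data) / n  # stand-in for a real calibration
    return params, n

def combine(results):
    """Reduce step: instance-weighted average of the per-day parameters."""
    total = sum(n for _, n in results)
    return sum(p * n for p, n in results) / total

days = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
per_day = [train_on_day(d) for d in days]  # the map phase, one task per day
combined = combine(per_day)                # the reduce phase
print(combined)  # 3.5, the instance-weighted mean across all days
```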

        HDFS does a nice job of managing a lot of data files, and Hadoop does a nice job of managing task execution, moving tasks to where the data is, keeping track of failures, etc..

        I’m currently thinking of abusing Hadoop as a file storage and job scheduling framework, and just implementing very coarse grained jobs using the above approach .. does this seem like a bad idea?

      • tr8dr

        Yes, you can do coarse-grained parallelism with hadoop and your example usage makes sense.

        Hadoop may have evolved since I last looked at it, but it seemed to have a fairly limited tuple-space concept and restricted the allowed interactions across nodes. I needed to have intermediate tuples applied into running nodes. It’s been a while since I looked at it, so I don’t remember the exact limitations at the time.

        For a more general tuple-based coordination approach, look at Linda (the coordination language). It was the conceptual ancestor of JavaSpaces and many other coordination paradigms (including hadoop). Unfortunately hadoop dropped a lot of the flexibility allowing for inter-node interactions (if I remember correctly).
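        For readers unfamiliar with Linda, its model boils down to a few operations on a shared tuple space: out() deposits a tuple, in() destructively takes a matching tuple, rd() reads one without removing it. A minimal single-process sketch, with None acting as a wildcard in patterns (a toy illustration, not any real Linda implementation):

```python
import threading

class TupleSpace:
    """Toy Linda-style tuple space for illustration only."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Deposit a tuple and wake any blocked readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, pattern):
        """Find the first tuple matching pattern; None is a wildcard field."""
        for t in self._tuples:
            if len(t) == len(pattern) and all(
                    p is None or p == v for p, v in zip(pattern, t)):
                return t
        return None

    def in_(self, pattern):
        """Blocking destructive read of a matching tuple (Linda's in)."""
        with self._cond:
            while (t := self._match(pattern)) is None:
                self._cond.wait()
            self._tuples.remove(t)
            return t

    def rd(self, pattern):
        """Blocking non-destructive read of a matching tuple (Linda's rd)."""
        with self._cond:
            while (t := self._match(pattern)) is None:
                self._cond.wait()
            return t

ts = TupleSpace()
ts.out(("result", "day1", 3.5))
print(ts.rd(("result", "day1", None)))  # reads without removing
print(ts.in_(("result", None, None)))   # takes the tuple out of the space
```

        Worker nodes coordinating this way can consume intermediate tuples produced by other running nodes, which is the flexibility the comment above notes is missing from hadoop's map/reduce model.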

        My desire in possibly mapping my calibrations to hadoop was to take advantage of the cheaper pricing available for hadoop runs. In reality, there are much richer parallel environments.
