2009-05-26

Benchmarking Amazon EC2 with GHC

My personal computers are pretty old and/or slow. I have an old PowerBook G4 and a newish EEE PC. The PowerBook was top of its class when I got it, with most options maxed out. Alas, that was five years ago. The EEE PC is by definition not a top performer, nor does it try to be. I find that the two machines perform similarly in day-to-day tasks, at least when the EEE PC is in “Super Performance” mode.

Truth is that for most of the mundane stuff I do these two machines perform acceptably. I’m obviously not going to be watching any 1080p movies on them, or enjoying the latest games (from a productivity standpoint I’m not sure these are such bad things), but most everything else works fine.

Where I feel the performance does hurt me is when compiling with GHC (or any compiler for that matter, it’s just that GHC is the one I use most). I often spend way too much of my precious little private developer time waiting for the compiler to finish. This is in stark contrast to the situation at work, where I run GHC on a pretty zippy Dell PowerEdge Blade Server.

In the near future I expect to be spending more time developing at home and want to be able to do so more efficiently, preferably on par with the situation at work. I could obviously buy shiny new hardware, but being a miser (in case you couldn’t already tell from my hardware) I’m looking for alternatives that would allow me to avoid or defer a hefty up-front investment. One such alternative I’m considering is renting compute capacity in the Amazon Elastic Compute Cloud (EC2).

EC2 compute capacity is sold in the form of instances at an hourly rate ranging from $0.10 to $0.80 depending on capacity/performance, plus some small change for bandwidth and persistent storage. While there are a couple of hurdles to overcome in order to leverage EC2 as a development workstation, the first thing I want to do is make sure the goal is worth pursuing in the first place, i.e. will I get the desired performance gains at a reasonable price?
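
To put those rates in some perspective, here is a back-of-the-envelope estimate in Haskell. The usage pattern below (two-hour sessions, twenty evenings a month) is just an assumption I made up for the sake of illustration, not a measurement:

    -- Rough cost sketch; the rate is Amazon's published price, the usage
    -- pattern is an assumption made up for this example.
    hourlyRate, hoursPerSession, sessionsPerMonth :: Double
    hourlyRate       = 0.20  -- High-CPU Medium instance, USD per hour
    hoursPerSession  = 2.0   -- assumed length of an evening of hacking
    sessionsPerMonth = 20    -- assumed number of such evenings per month

    monthlyCost :: Double
    monthlyCost = hourlyRate * hoursPerSession * sessionsPerMonth  -- 8 USD

    main :: IO ()
    main = putStrLn ("Approximate monthly cost: $" ++ show monthlyCost)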

To at least begin to answer this question I’ve done some informal benchmarking of the aforementioned systems, excluding the pricier EC2 instances. All the regular benchmarking caveats apply, and to reinforce the unscientificity of it all I’m not going to bother providing complete specs for the systems. Here are the fundamentals:

  • Apple PowerBook G4: 1.5 GHz PowerPC G4 processor, 2 GB RAM, 5400 rpm HD.
  • Asus EEE PC 900HA: 1.6 GHz Intel Atom processor, 1 GB RAM, 4200 rpm HD.
  • Dell PowerEdge 1855 Blade Server: Two single-core 3.2 GHz Xeon processors, 2 GB RAM.
  • Amazon EC2 Small Instance: 1 EC2 Compute Unit (1 virtual core), 1.7 GB RAM, $0.10 per hour.
  • Amazon EC2 High-CPU Medium Instance: 5 EC2 Compute Units (2 virtual cores), 1.7 GB RAM, $0.20 per hour.

According to Amazon “one EC2 Compute Unit provides the equivalent CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor.”

I figure the results are probably more interesting than the details of the tests, so here they are. The systems are ordered by increasing performance, which happened to be consistent across the tests:

Benchmark results, times in seconds (shorter is better). The time in parentheses was measured with GHC 6.10.1 rather than 6.10.3 (see the highlighting-kate notes below).

                    astro-tables build   highlighting-kate build   fad test suite
  PowerBook G4                     292                     (848)               28
  EEE PC                           291                       643               18
  EC2 Small                        171                       519               15
  Dell PowerEdge                    75                       260                6
  EC2 Medium                        55                       172                4

The EEE PC was in “Super Performance” mode during the tests and the PowerBook was at its highest CPU speed. All times are the “real” (wall-clock) time as measured by the Unix time command.
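
For the record, a rough Haskell equivalent of what time measures (wall-clock time only) could look like the sketch below. This is just an illustration of the measurement, not something I actually ran:

    import Data.Time.Clock (diffUTCTime, getCurrentTime)
    import System.Process  (system)

    -- Run a shell command and print its wall-clock ("real") time.
    timeCommand :: String -> IO ()
    timeCommand cmd = do
      start <- getCurrentTime
      _     <- system cmd
      end   <- getCurrentTime
      putStrLn (cmd ++ ": " ++ show (diffUTCTime end start))

    main :: IO ()
    main = mapM_ timeCommand
      [ "cabal install"                     -- astro-tables build
      , "cabal install --flags=executable"  -- highlighting-kate build
      , "./Test"                            -- fad test suite
      ]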

As can be seen, an Amazon EC2 Small instance is only marginally faster than the EEE PC. An EC2 High-CPU Medium instance, on the other hand, is significantly faster than the zippy Dell PowerEdge Blade Server. Is either one a good deal? Good question; I think a case could be made either way depending on your priorities, but I’m not going to tackle that today.

If you care about the details of the tests, read on; if not, please move on to your next blog of choice!

astro-tables build

This package currently consists of a single automatically generated 4000-line monster of a module [1]. The code is pretty straightforward but the module takes ages to compile, almost certainly due to me giving the type checker an unnecessarily hard time. I have a trivial rewrite on my to-do list which I expect will shorten the compilation time dramatically, but the current form comes in kind of handy for the purposes of this benchmark. The Git repo is git://github.com/bjornbm/astro-tables.git and the commit used in the benchmarking was e63b8978833878526870b2101697197ff64af593. I made sure the dependencies were already installed and ran time cabal install.
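
For a sense of the shape of code I mean, here is a purely hypothetical sketch of a big generated table of terms. The name and the numbers below are made up and this is not the actual astro-tables code, but if the real module looks anything like this, my guess is that the class constraint hanging off every literal is part of what gives the type checker such a hard time:

    -- Purely illustrative; not the actual generated module.
    nutationTerms :: Fractional a => [(Int, Int, Int, Int, Int, a, a, a)]
    nutationTerms =
      [ (0, 0, 0,  0, 1, -17.2064, -0.0175, 9.2052)
      , (0, 0, 2, -2, 2,  -1.3171, -0.0002, 0.5730)
      , (0, 0, 2,  0, 2,  -0.2276,  0.0000, 0.0978)
      -- ... and over a thousand more rows like these ...
      ]

    main :: IO ()
    main = print (length (nutationTerms :: [(Int, Int, Int, Int, Int, Double, Double, Double)]))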

highlighting-kate build

From recent memory I knew that John MacFarlane’s highlighting-kate package has a hefty number of modules (the majority of which are also automatically generated) that take a fair amount of time to compile. I downloaded version 0.2.4 from Hackage, made sure all dependencies were already installed, and ran time cabal install --flags=executable.
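
As an aside, if I recall correctly most of those generated modules are one per supported syntax. A minimal sketch for poking at the installed library, assuming the languages export (I haven’t double-checked the exact API of 0.2.4, so treat this as an approximation):

    import Text.Highlighting.Kate (languages)

    -- List the syntaxes supported by the installed highlighting-kate;
    -- each one corresponds to a generated parser module.
    main :: IO ()
    main = mapM_ putStrLn languages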

I ran into one snag with this test: the build wouldn’t complete with GHC 6.10.3 on the PowerBook G4 due to some problem with pcre-light (which tends to give me headaches on pretty much every platform). This particular headache [2] I was unable to resolve and had to run the test using GHC 6.10.1 on the PowerBook G4.

fad test suite

Finally I did a runtime performance benchmark (as opposed to a compilation benchmark): running the test suite of the fad library. The Git repo is git://github.com/bjornbm/fad.git and the commit used was cd2965a6741291570930e4bf6e9f8f9ab64ccadd. I ran ghc --make Test and then time ./Test.
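
For context, fad does forward-mode automatic differentiation. A minimal usage sketch, assuming the diff function exported by Numeric.FAD in the commit above, would be something like:

    import Numeric.FAD (diff)

    -- Differentiate x^2 + sin x at x = 2 using forward-mode AD.
    main :: IO ()
    main = print (diff (\x -> x * x + sin x) (2 :: Double))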


  1. An implementation of the 678 lunisolar terms and 687 planetary terms of the IAU 2000A Precession-Nutation Model.

  2. gcc: Internal error: Virtual timer expired (program cc1)

Comments:

  1. We have quoted your conclusion and also created a graph image from your numbers on www.cloudclimate.com’s blog.

  2. If you want to do it right, you have to take into consideration the time to push and pull data from EC2.

  3. Anonymous: My thought was to use an EC2 instance as a development workstation (via e.g. terminal or NoMachine NX) rather than as a build server. Thus there would be no push/pull involved in compiling.

  4. I also have an eee pc and considered using EC2 for compilation.

    I chose not to use EC2 because it seemed you pay the hourly rate whether you’re compiling or not. I found no way to pause the instance, and when you stop the instance the state is lost.

    I wondered if Amazon allows resellers to buy an instance and use it to sell X instances with the pause functionality.

  5. Anonymous #2: By using Elastic Block Store (EBS) you can get persistent state pretty transparently (except for the root volume). But you would still want to just leave your instance up during the entire programming session.
