A Few Thoughts on Rapid Genome Sequencing and The Archon Prize

The December 2006 issue of The Scientist has an interesting article on new sequencing technologies.  "The Human Genome Project +5", by Victor McElheny, contains a few choice quotes.  Phil Sharp, from MIT, says he "would bet on it without a question that we will be at a $1,000 genome in a five-year window."  Presently we are at about US$10 million per genome, so we have a ways to go.  It's interesting to see just how much technology has to change before we get there.
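To put a number on "a ways to go," here is the compounding arithmetic as a quick Python sketch.  It assumes, purely for illustration, that the cost falls by the same factor every year over Sharp's five-year window.

```python
# How fast must costs fall to go from ~US$10 million per genome to
# US$1,000 in five years? Steady yearly compounding is an assumption.
current_cost = 10_000_000  # US$ per genome, late-2006 estimate from the text
target_cost = 1_000        # US$ per genome, Sharp's target
years = 5

total_factor = current_cost / target_cost    # overall reduction needed
annual_factor = total_factor ** (1 / years)  # constant per-year reduction

print(f"overall reduction: {total_factor:,.0f}x")
print(f"required per year: {annual_factor:.1f}x")
```

That works out to a 10,000-fold drop overall, or a bit over sixfold every single year.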

The Archon X-Prize for Genomics specifies sequencing 100 duplex genomes in 10 days, at a cost of no more than US$10,000 per genome.  In other words, that is roughly 600 billion bases (100 genomes at about 6 billion bases each) at a cost of under two microdollars per base.  Looking at it yet another way, winning requires 6000 person-days at present productivity numbers for commercially available instruments, whereas 10 days provides only 30 person-days of round-the-clock productivity.
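The scale of the target, as a back-of-envelope Python sketch.  The ~6 billion bases per genome is an assumption, chosen because it is what the 600 billion base total implies.

```python
# Scale of the Archon prize target. bases_per_genome is an assumption
# (~6 billion bases per duplex genome, implied by the 600-billion total).
genomes = 100
bases_per_genome = 6e9
days = 10
cost_cap_per_genome = 10_000  # US$, from the prize rules as quoted

total_bases = genomes * bases_per_genome                # bases for all 100 genomes
bases_per_day = total_bases / days                      # required daily throughput
cost_per_base = cost_cap_per_genome / bases_per_genome  # US$ per base at the cap

print(f"total: {total_bases:.1e} bases, {bases_per_day:.1e} bases/day")
print(f"cost cap: {cost_per_base * 1e6:.2f} microdollars per base")
```

The daily figure, 60 billion bases, is the number to keep in mind when comparing against instrument throughput.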

I tried to find a breakdown of genome sequencing costs on the web, and all I could come up with is an estimate for the maize genome published in 2001.  I'll use that as a cost model for state-of-the-art sequencing of eukaryotes (using Sanger sequencing on capillary-based instruments).  Bennetzen et al. recount the "National Science Foundation-Sponsored Workshop Report: Maize Genome Sequencing Project" in the journal Plant Physiology, and report:

The participants concurred that the goal of sequencing all of the genes in the maize genome and placing these on the integrated physical and genetic map could be pursued by a combination of technologies that would cost about $52 million. The breakdown of estimated costs would be:

  • Library construction and evaluation, $3 million
  • BAC-end sequencing, $4 million
  • 10-Fold redundant sequencing of the gene-rich and low-copy-number regions, $34 million
  • Locating all of the genes on an integrated physical-genetic map, $8 million
  • Establishing a comprehensive database system, $3 million.

From the text, it seems that decreases in costs are built into the estimate.  If we chuck out the database system, since this is already built for humans and other species, we are down to direct costs of something like $49 million for a genome of approximately 2.5 gigabases (GB).  The Archon prize doesn't specify whether competitors can use existing chromosomal maps to assemble sequence data, so presumably all the information is fair game.  That lets us toss out another $8 million in cost.  The 10-fold redundant sequencing is probably overkill at this point, but I will keep all those costs because the Archon prize requires an error rate of no more than 1 in 100,000 bases; you have to beat down the error regardless of the sequencing method.  Rounding down to $40 million for charity's sake, it looks like the labor and processing associated with producing the short overlapping sequences necessary for Sanger sequencing (library construction plus BAC-end sequencing, $7 million) account for about 17.5 percent of the total.  These costs are probably fixed for approaches that employ shotgun sequencing.
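The trimming is easy to check.  A minimal Python sketch using the workshop's line items in US$ millions; the dictionary labels are my own shorthand, not the report's wording.

```python
# Maize cost model from the 2001 workshop report (US$ millions), trimmed as
# described: drop the database (leaves $49M), then the mapping step ($41M).
costs = {
    "library construction and evaluation": 3,
    "BAC-end sequencing": 4,
    "10-fold redundant sequencing": 34,
    "physical-genetic map": 8,
    "database system": 3,
}
assert sum(costs.values()) == 52  # matches the report's ~$52 million

dropped = ("database system", "physical-genetic map")
kept = {item: cost for item, cost in costs.items() if item not in dropped}
direct = sum(kept.values())  # rounded down to $40M in the text

# Pre-sequencing work: building libraries and sequencing BAC ends.
prep = kept["library construction and evaluation"] + kept["BAC-end sequencing"]
prep_share = prep / 40  # fraction of the rounded $40M total

print(f"direct costs: ${direct}M, prep: ${prep}M, share: {prep_share:.1%}")
```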

Again using the Archon prize as a simple comparison, and taking US$10 million (the size of the prize purse) as the total budget, that's US$1.75 million just to spend on labor for getting ready to do the actual sequencing.  In 1998, one FTE (full-time equivalent) of sequencing labor cost US$135,000 per year.  If you assume the dominant cost for preparing the library and verifying the BACs is labor, you can hire about 13 people.  This looks like a lot of work for 13 people, and, given the amount of time required to do all the cloning and wait for bacteria to grow, not something they could accomplish even within the 10 days allotted for the whole project.
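The headcount arithmetic, as a small sketch using only the figures quoted above.  Since an FTE is an annual cost, the result is really person-years of labor the prep budget would buy.

```python
# Prep labor budget and headcount, using the article's figures:
# 17.5% of a $10 million total, and $135,000 per FTE-year (1998).
budget = 10_000_000   # US$
prep_share = 0.175
fte_cost = 135_000    # US$ per full-time person-year

prep_budget = budget * prep_share  # about $1.75 million
headcount = prep_budget / fte_cost  # a little under 13 people

print(f"prep budget: ${prep_budget:,.0f}, headcount: {headcount:.1f}")
```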

The other 82.5 percent of the $10 million you can spend on the actual sequencing.  The prize guidelines say you don't have to include the price of the instruments in the cost, but just for the sake of argument I'll do that here.  And I'll mix and match the cost estimates from the maize project for Sanger sequencing with other technologies.  Judging by its combination of read length and throughput, the most promising commercial instrument appears to be the 454 pyrosequencer, at $500,000 a pop, even if its read length isn't quite long enough yet.  If you buy 16 of those beasties, it appears you can sequence about 1.6 GB a day, about a factor of 40 below the 60 GB a day required to win the Archon prize.  Even if 454 gets the read length up to 500 bases, they are still an order of magnitude shy on the sequencing rate alone, setting aside the sample prep.
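A rough sketch of the throughput gap.  The fleet rate is the ~1.6 GB/day estimate above; the ~100-base current read length and the idea that daily output scales linearly with read length are my assumptions, for illustration only.

```python
# Throughput gap for a fleet of 454 instruments bought with 82.5% of $10M.
instrument_price = 500_000                 # US$ each
sequencing_budget = 10_000_000 * 0.825     # US$ available for instruments
instruments = int(sequencing_budget // instrument_price)

fleet_rate = 1.6e9     # bases/day for 16 instruments (estimate from the text)
required_rate = 6e10   # 600 billion bases over 10 days
shortfall = required_rate / fleet_rate     # "about a factor of 40"

# ASSUMPTIONS: ~100-base reads today, and output proportional to read length.
current_read, longer_read = 100, 500       # bases
shortfall_long = shortfall * current_read / longer_read

print(f"{instruments} instruments, {shortfall:.1f}x short today, "
      f"{shortfall_long:.1f}x short with {longer_read}-base reads")
```

Under those assumptions the gap shrinks from roughly 40x to under 10x, which is the "order of magnitude shy" in the text.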

Alternatively, you could simply buy 600 of the 454 instruments, and then you'd be set, at least for throughput.  Might blow your budget, though, with the $300 million retail cost.  But you could take solace in how happy you'd make all the investors in 454.
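And the brute-force version, again as a sketch: the per-instrument rate is simply the fleet estimate divided by 16, and only raw throughput is considered.

```python
import math

# Instruments needed to hit the prize rate on raw throughput alone.
per_instrument_rate = 1.6e9 / 16  # bases/day each, inferred from the fleet estimate
required_rate = 6e10              # bases/day for 600 billion bases in 10 days
instrument_price = 500_000        # US$ each

needed = math.ceil(required_rate / per_instrument_rate)  # round up to whole machines
fleet_cost = needed * instrument_price                    # retail cost in US$

print(f"{needed} instruments, ${fleet_cost / 1e6:,.0f} million")
```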