454's Microfabricated Pyrosequencer

Today, Nature published online an article entitled "Genome sequencing in microfabricated high-density picolitre reactors," by Margulies et al.  The paper describes embedding beads coated with DNA in 1.6 million wells etched into the end of a fiber-optic slide, where the slide is produced by repeatedly folding and drawing a fiber-optic cable.  Each well serves as a reaction chamber for the sequencing-by-synthesis method known as Pyrosequencing.  The research used an instrument built by 454 Life Sciences.

My email is ringing off the hook today with questions about how this fits into my estimates of sequencing and synthesis productivity ("Carlson Curves").  Thanks for your interest, everyone.

A few comments.  The first thing to note about the article is that the authors state they sequenced "25 million bases, at 99% or better accuracy, in one four hour run."  So at 6.25 million bases per hour, they appear to be doing quite well compared to a Sanger-based 96-capillary instrument, which the authors assert reads out 67,000 bases per hour.
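The headline comparison is simple arithmetic on the two figures quoted above. Here is a minimal sketch that reproduces it. It counts raw instrument throughput only, so it ignores sample preparation and the large difference in read length between the two methods.

```python
# Figures quoted in the post (Margulies et al., 2005)
bases_per_run = 25_000_000          # bases sequenced in one run
run_hours = 4                       # length of one instrument run
capillary_bases_per_hour = 67_000   # 96-capillary Sanger instrument, per the authors

def bases_per_hour(bases: float, hours: float) -> float:
    """Raw instrument throughput; sample preparation is not counted."""
    return bases / hours

pyro_rate = bases_per_hour(bases_per_run, run_hours)
speedup = pyro_rate / capillary_bases_per_hour

print(f"454 instrument: {pyro_rate:,.0f} bases/hour")
print(f"Ratio to one 96-capillary machine: {speedup:.0f}x")
```

On instrument time alone, that works out to roughly 93 capillary machines' worth of output.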

Digging into the text a bit, we find that the average length of the DNA the authors were able to read was about 100 bases, which they note is far shorter than the ~750 bases standard in Sanger sequencing.  The article also notes that prepping the DNA samples required 10 person-hours: 4 hours for fragmenting genomic DNA into bite-sized pieces and generating a library from those pieces, and 6 hours to put that DNA on beads and then put the beads on the sequencing chip.

So, that's roughly 14 hours from start to getting sequence data, which puts the productivity number at about 10 million bases per person per day.  This is better than running a couple of capillary-based instruments, it's true, but there is still an enormous amount of skilled labor in those 10 hours of sample preparation.  If you take a look at the supplementary information, documents s1 and s3 in particular, you'll see the processing is by no means trivial.  Actually, the enzymatic rigmarole is quite impressive.  But I wouldn't want to do it myself.  Looking ahead, I don't see any reason it can't be automated.  Given time, patience, and some effort at the microfluidics, the whole process should require only minimal human attention.  That will definitely make an impact on productivity.  No doubt 454 is planning for this eventuality.  The upshot is that this paper puts a point, more or less, right on my previously published curves.  It is consistent with progress made with previous technologies, but is actually a bit slower than the estimate Mostafa Ronaghi gave me in 2003.  That's life.
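Here is a rough sketch of the labor side of the calculation, using only the hours reported in the paper. It assumes the preparation steps and the run happen back to back. It stops at bases per person-hour, because converting that into a per-day figure depends on how a working day is counted. The 1-hour "automated" case is purely hypothetical and only illustrates that labor productivity scales inversely with hands-on time.

```python
# Figures from Margulies et al. as summarized in the post
bases_per_run = 25_000_000
library_prep_hours = 4   # fragmenting genomic DNA and generating the library
bead_loading_hours = 6   # putting DNA on beads and beads on the chip
run_hours = 4            # instrument run time

hands_on_hours = library_prep_hours + bead_loading_hours
elapsed_hours = hands_on_hours + run_hours  # assumes steps run back to back

def bases_per_person_hour(bases: float, hands_on: float) -> float:
    """Output per hour of skilled human labor; unattended instrument time is not counted."""
    return bases / hands_on

print(f"Elapsed time: {elapsed_hours} hours")
print(f"Manual prep: {bases_per_person_hour(bases_per_run, hands_on_hours):,.0f} bases/person-hour")

# Hypothetical: if automation cut hands-on time to 1 hour per run,
# labor productivity would rise in inverse proportion.
print(f"Hypothetical 1-hour prep: {bases_per_person_hour(bases_per_run, 1):,.0f} bases/person-hour")
```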

Here's a bit more info.  The New York Times is reporting that:

Jonathan Rothberg, board chairman of 454 Life Sciences, said the company was already able to decode DNA 400 units at a time in test machines. It was working toward sequencing a human genome for $100,000, and if costs could be further reduced to $20,000 the sequencing of individual genomes would be medically worthwhile, Dr. Rothberg said.

We'll see.  We are still a long way from the Thousand Dollar Genome, and this paper appears to be keeping the pace.  All in all, it looks promising, though I wince at the current $500,000 instrument cost.  I don't have enough information at hand to make my own estimates of per base sequencing cost, and I haven't had a chance to contact anyone at the company to suss out the productivity issues better.  I'll update this if and when such conversations take place.
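To put "a long way" in numbers, here is a small sketch of the fold reductions separating Rothberg's $100,000 target from the cost points mentioned above. It also counts how many successive halvings each reduction represents. Keep in mind that the $100,000 figure is a stated goal, not a demonstrated cost.

```python
import math

baseline = 100_000  # Rothberg's stated near-term target per human genome (USD)

targets = {
    "medically worthwhile (per Rothberg)": 20_000,
    "Thousand Dollar Genome": 1_000,
}

def fold_reduction(start: float, goal: float) -> float:
    """How many times cheaper the goal is than the starting cost."""
    return start / goal

def halvings_needed(start: float, goal: float) -> float:
    """Number of successive cost halvings needed to get from start to goal."""
    return math.log2(start / goal)

for name, goal in targets.items():
    print(f"{name}: {fold_reduction(baseline, goal):.0f}x cheaper, "
          f"~{halvings_needed(baseline, goal):.1f} halvings")
```

Even from the $100,000 target, the Thousand Dollar Genome is a hundredfold reduction, or between six and seven halvings of cost.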

UPDATE (5 Aug 05):  The $500,000 per instrument cost comes from the NYT article:

The Joint Genome Institute, a federal genome sequencing center in Walnut Creek, Calif., has ordered one of 454's $500,000 sequencing machines but has not yet installed it. Paul Richardson, the institute's head of technology development, said the new approach "looks very, very promising" and could reduce sequencing costs fourfold.

The machine's limitation is that at present it can only read DNA fragments 100 units or so in length, compared with the 800-unit read length now attained by the Sanger-based machines. The shorter read length makes it harder to reassemble all the fragments into a complete genome, Dr. Richardson said, so although microbial genomes can be assembled with the new method, mammalian genomes may be beyond its reach at present.
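Why do shorter reads make reassembly harder? One way to see it is with repeats: a read that lies entirely inside a repeated stretch of sequence could have come from any copy, so it cannot be placed unambiguously. Below is a toy illustration, not how a real assembler works. The genome is random, and the 300-base repeat and 2,000-base flanking regions are hypothetical sizes chosen for the demonstration. The sketch counts the fraction of read positions whose sequence occurs exactly once in the genome, for the ~100- and ~800-base read lengths quoted above.

```python
import random
from collections import Counter

def random_dna(n: int, rng: random.Random) -> str:
    return "".join(rng.choices("ACGT", k=n))

def fraction_uniquely_placeable(genome: str, read_length: int) -> float:
    """Fraction of start positions whose read_length-long window occurs exactly once."""
    windows = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)

rng = random.Random(2005)      # fixed seed so the toy genome is reproducible
repeat = random_dna(300, rng)  # hypothetical 300-base repeated element
genome = (random_dna(2000, rng) + repeat + random_dna(2000, rng)
          + repeat + random_dna(2000, rng))

for read_length in (100, 800):  # read lengths quoted above
    frac = fraction_uniquely_placeable(genome, read_length)
    print(f"{read_length}-base reads: {frac:.1%} of positions uniquely placeable")
```

100-base reads that fall wholly within either copy of the repeat are ambiguous. Every 800-base read spans the repeat and picks up unique flanking sequence. Real genomes, particularly mammalian ones, contain many repeats longer than 100 bases, which is consistent with the concern quoted above.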

Dr. Fraser, director of the Institute for Genomic Research in Rockville, Md., also said that the new machine's short read lengths "limit its overall utility at this point."