DNA Cost and Productivity Data, aka "Carlson Curves"

I have received a number of requests in recent days for my early DNA synthesis and productivity data, so I have decided to post it here for all who are interested. Please remember where you found it.

A bit of history: my efforts to quantify the pace of change in biotech started in the summer of 2000, while I was trying to forecast where the industry was headed. At the time, I was a Research Fellow at the Molecular Sciences Institute (MSI) in Berkeley, working on what became the essay “Open Source Biology and Its Impact on Industry”, written in the summer of 2000 for the inaugural Shell/Economist World in 2050 Competition and originally titled “Biological Technology in 2050”. I was trying to conceive of where things were going many decades out, and gathering these numbers seemed like a good way to anchor my thinking. I had the first, very rough, data set by about September of 2000. I presented the curves that summer for the first time to an outside audience in the form of a Global Business Network (GBN) Learning Journey that stopped at MSI to see what we were up to. Among the attendees was Stewart Brand, who I understand soon started referring to the data as “Carlson Curves” in his own presentations. I published the data for the first time in 2003 in a paper with the title “The Pace and Proliferation of Biological Technologies”. Somewhere in there Ray Kurzweil started making reference to the curves, and then a 2006 article in The Economist, “Life 2.0”, brought them to a wider audience and cemented the name. It took me years to get comfortable with “Carlson Curves” because, even if I did sort it out first, it is just data rather than a law of the universe. But eventually I got it through my thick skull that it is quite good advertising.

The data was very hard to come by when I started. Sequencing was still a labor-intensive enterprise, and therefore highly variable in cost, and synthesis was slow, expensive, and relatively rare. I had to call people up to get rough estimates of how much time and effort they were putting in, and to root around in journal articles and technical notes for any quantitative data on instrument performance. This was so early in the development of the field that, when I submitted what became the 2003 paper, one of the reviews came back with the complaint that the reviewer – certainly the infamous Reviewer Number 2 – was “unaware of any data suggesting that sequencing is improving exponentially”.

Well, yes, that was the first paper that collected such data.

The review process led to somewhat labored language in the paper, asserting only the “appearance” of exponential progress when comparing the data to Moore's Law. I also recall showing Freeman Dyson the early data, and he cast a very skeptical eye on the conclusion that there were any exponentials to be written about. The data was, in all fairness, a bit thin at the time. But the trend seemed clear to me, and the paper laid out why I thought the exponential trends would, or would not, continue. Stewart Brand, and Drew Endy at the next lab bench over, grokked it all immediately, which lent some comfort that I wasn’t sticking my neck out so very far.

I've written previously about when the comparison with Moore's Law does, and does not, make sense. (Here, here, and here.) Many people choose to ignore the subtleties. I won't belabor the details here, other than to observe succinctly that the role of DNA in constructing new objects is, at least for the time being, fundamentally different from that of transistors. For the last forty years, the improved performance of each new generation of chip and electronic device has depended on those objects containing more transistors, and the demand for greater performance has driven an increase in the number of transistors per object. In contrast, the economic value of synthetic DNA is decoupled from the economic value of the object it codes for; in principle you only need one copy of DNA to produce many billions of objects and many billions of dollars in value.

To be sure, prototyping and screening of new molecular circuits requires quite a bit more than one copy of the DNA in question, but once you have your final sequence in hand, your need for additional synthesis for that object goes to zero. And even while the total demand for synthetic DNA has grown over the years, the price per base has on average fallen about as fast; consequently, as best as I can tell, the total dollar value of the industry hasn't grown much over the last ten years. This makes it very difficult to make money in the DNA synthesis business, and may help explain why so many DNA synthesis companies have gone bankrupt or been folded into other operations. Indeed, most of the companies that provided DNA or gene synthesis as a service no longer exist. Similar business model challenges make it difficult to sell stand-alone synthesis instruments. Thus the productivity data series for synthesis instruments ends several years ago, because it is too difficult to evaluate the performance of proprietary instruments run solely by the remaining service providers. DNA synthesis is likely to remain a difficult business until there is a business model in which the final value of the product, whatever that product is, depends on the actual number of bases synthesized and sold. As I have written before, I think that business model is likely to be DNA data storage. But we shall see.

The business of sequencing, of course, is another matter. It's booming. But as far as the “Carlson Curves” go, I long ago gave up trying to track this on my own, because a few years after the 2003 paper came out the NHGRI started tracking and publishing sequencing costs. Everyone should just use that data. I do.

Finally, a word on cost versus price. For normal, healthy businesses, you expect the price of something to exceed its cost, and for the business to make at least a little bit of money. But when it comes to DNA, especially synthesis, it has always been difficult to determine the true cost, because the price per base has frequently been below the cost, thereby leading those businesses into bankruptcy. Some service operations are intentionally run at negative margins in order to attract business; that is, they are loss leaders for other services, or are maintained at scale so that the company retains access to that capacity for its own internal projects. There are a few operations that appear to be priced so that they are at least revenue neutral and don't lose money. Thus there can be a wide range of prices at any given point in time, which further complicates sorting out how the technology may be improving and what impact this has on the economics of biotech. Moreover, we might expect the price of synthetic DNA to *increase* occasionally, either because providers can no longer afford to lose money or because competition is reduced. There is no technological determinism here. Just as Moore's Law is ultimately a function of industrial planning and expectations, there is nothing about Carlson Curves that says prices must fall monotonically.

A note on methods and sources: as described in the 2003 paper, this data was generally gathered by calling people up or by extracting what information I could from what little was written down and published at the time. The same is true for later data. The quality of the data is limited primarily by that availability and by how much time I could spend to develop it. I would be perfectly delighted to have someone with more resources build a better data set.

The primary academic references for this work are:

Robert Carlson, “The Pace and Proliferation of Biological Technologies”. Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. Sep, 2003, 203-214. http://doi.org/10.1089/153871303769201851.

Robert Carlson, “The changing economics of DNA synthesis”. Nat Biotechnol 27, 1091–1094 (2009). https://doi.org/10.1038/nbt1209-1091.

Robert Carlson, Biology Is Technology: The Promise, Peril, and New Business of Engineering Life, Harvard University Press, 2011. Amazon.

Here are my latest versions of the figures, followed by the data. Updates and commentary are on the Bioeconomy Dashboard.

Creative Commons image license terms (Attribution-NoDerivatives 4.0 International, CC BY-ND 4.0):

  • Share — copy and redistribute the material in any medium or format for any purpose, even commercially.

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.

Here is the cost data (units in [USD per base]):

| Year | DNA Sequencing | Short Oligo (Column) | Gene Synthesis |
|------|----------------|----------------------|----------------|
| 1990 | 25             |                      |                |
| 1992 |                | 1                    |                |
| 1995 | 1              | 0.75                 |                |
| 1999 |                |                      | 25             |
| 2000 | 0.25           | 0.3                  |                |
| 2001 |                |                      | 12             |
| 2002 |                |                      | 8              |
| 2003 | 0.05           | 0.15                 | 4              |
| 2004 | 0.025          |                      |                |
| 2006 | 0.00075        | 0.1                  | 1              |
| 2007 |                |                      | 0.5            |
| 2009 | 8E-06          | 0.08                 | 0.39           |
| 2010 | 3.17E-06       | 0.07                 | 0.35           |
| 2011 | 2.3E-06        | 0.07                 | 0.29           |
| 2012 | 1.6E-06        | 0.06                 | 0.2            |
| 2013 | 1.6E-06        | 0.06                 | 0.18           |
| 2014 | 1.6E-06        | 0.06                 | 0.15           |
| 2015 | 1.6E-09        |                      |                |
| 2016 | 1.6E-09        | 0.05                 | 0.03           |
| 2017 | 1.6E-09        | 0.05                 | 0.02           |
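As a sanity check on the exponential character of these numbers, here is a back-of-the-envelope sketch (plain Python, no dependencies) that fits a line to log10(cost) versus year for the sequencing column. The data points are taken directly from the table above; the fit itself is purely illustrative, not part of the original analysis:

```python
import math

# (year, USD per base) points for DNA sequencing, from the cost table above
seq = [(1990, 25), (1995, 1), (2000, 0.25), (2003, 0.05),
       (2006, 0.00075), (2009, 8e-6), (2012, 1.6e-6)]

# Least-squares fit of log10(cost) against year gives an average decline rate.
n = len(seq)
xs = [year for year, _ in seq]
ys = [math.log10(cost) for _, cost in seq]
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)

fold_per_year = 10 ** (-slope)            # average cost reduction factor per year
halving_time = math.log10(2) / (-slope)   # years for the cost per base to halve

print(f"sequencing cost fell roughly {fold_per_year:.1f}x per year "
      f"over this span (halving time about {halving_time:.2f} years)")
```

Over these particular points the fit comes out close to a halving every year, i.e., somewhat faster than the canonical Moore's Law doubling time; the later, post-2007 points are what drag the average down.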

Here is the productivity data (units in [bases per person per day] and [number of transistors per chip]) — note that commercially available synthesis instruments were not sold new for the decade following 2011, and I have not sat down to figure out the productivity of any of the new boxes that may be for sale as of today:

| Year | Reading DNA | Writing DNA | Transistors |
|------|-------------|-------------|-------------|
| 1971 |             |             | 2250        |
| 1972 |             |             | 2500        |
| 1974 |             |             | 5000        |
| 1978 |             |             | 29000       |
| 1982 |             |             | 1.20E+05    |
| 1985 |             |             | 2.75E+05    |
| 1986 | 25600       |             |             |
| 1988 |             |             | 1.18E+06    |
| 1990 |             | 200         |             |
| 1993 |             |             | 3.10E+06    |
| 1994 | 62400       |             |             |
| 1997 | 4.22E+05    | 15320       |             |
| 1998 |             |             | 7.50E+06    |
| 1999 | 576000      |             | 2.40E+07    |
| 2000 |             | 1.38E+05    | 4.20E+07    |
| 2003 |             |             | 2.20E+08    |
| 2004 |             |             | 5.92E+08    |
| 2006 | 10000000    |             |             |
| 2007 | 200000000   | 2500000     |             |
| 2008 |             |             | 2000000000  |
| 2009 | 6000000000  |             |             |
| 2010 | 17000000000 |             |             |
| 2011 |             |             | 2600000000  |
| 2012 | 54000000000 |             |             |
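To make the Moore's Law comparison concrete, here is a minimal sketch that estimates doubling times from pairs of endpoints in the productivity data. The endpoints are taken from the table above; using only two points is a deliberate simplification that ignores everything in between, so treat the results as rough characterizations rather than fitted trends:

```python
import math

def doubling_time(year0, value0, year1, value1):
    """Years for a quantity to double, assuming steady exponential growth
    between the two endpoints."""
    return (year1 - year0) * math.log(2) / math.log(value1 / value0)

# Endpoints from the productivity table above.
t_transistors = doubling_time(1971, 2250, 2011, 2.6e9)
t_reading = doubling_time(1986, 25600, 2012, 5.4e10)

print(f"transistors per chip: doubling roughly every {t_transistors:.1f} years")
print(f"reading DNA (bases/person/day): doubling roughly every {t_reading:.1f} years")
```

By this crude measure, the transistor series doubles on the canonical two-year Moore's Law schedule, while sequencing productivity over its full span doubled somewhat faster on average, with most of that improvement concentrated in the post-2005 jumps visible in the table.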