So, Nature Reviews Genetics has an article, “Computational solutions to large-scale data management and analysis”, which claims the following in the abstract (boldface mine):
Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems.
I’ve dealt with cloud computing before, but what I want to address is the $5,000 for hundreds of gigabases (that’s hundreds of billions of bases; the human genome is about 3 gigabases, times two cuz we’re stupid diploids).
So is that $5,000 figure correct?
Cuz there are a lot of things that go into the cost of sequencing:
1) Buying the machine. The current high-throughput machine goes for about $500,000 (that’s actually pretty generous). If you amortize the cost of the machine over two years–let’s call that 100 weeks, and we can generate a couple of hundred gigabases (200 Gb) per week, then… (Mad Biologist takes off shoes to do the countings) …each ~200 Gb costs around $5,000. Unless a wealthy donor paid for the machine, some state or federal funding agency paid for this. Your grant might not have to pay for this, but a taxpayer somewhere did.
2) Paying people to run the machine. By run, I’m also including the basic bioinformatics support (not downstream analysis stuff, just getting files to people, etc.). Let’s say this is two people at $150,000 per year (what a bargain!). Per above, tack on another $3,000 per week. Again, if your institution has other funding to pay for this, or eats the cost itself, this isn’t reflected in the price, but it is a cost that someone, somewhere is bearing.
3) Embiggening your processes.
Just like real men go to Baghdad, real men also sequence lots. If you’re a small center, you might only have a few machines, and you don’t need a lot of (or any) infrastructure to manage this. You also don’t need to go heavy into robotics or other automated processes. If you move to a large scale, this will require some infrastructure building costs (on the other hand, if you’re simply moving your infrastructure from one technology to another, these costs are rather small–the personnel are already in place).
I’m willing to go along with the costs of reagents plus minimal ancillary costs, such as electricity, running around $5,000. Maybe. And if that’s all you have to write your purchase or billing order for, then the price of sequencing is around $5,000. But the cost, including the externalized costs, is much more.
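For anyone who would rather not take off their shoes, here is a rough Python sketch of the tally above. It uses only the ballpark numbers from this post, and it makes two assumptions that are mine for illustration: that the $150,000 is the combined yearly cost of both people, and that a year has 50 working weeks (to match "two years is 100 weeks"). The names are made up for this example; this is back-of-the-envelope arithmetic, not real accounting.

```python
# Back-of-the-envelope weekly cost of sequencing ~200 Gb.
# All figures are the post's rough estimates, not measured data.
MACHINE_PRICE = 500_000        # dollars, one high-throughput sequencer
AMORTIZATION_WEEKS = 100       # "two years, let's call that 100 weeks"
STAFF_COST_PER_YEAR = 150_000  # ASSUMPTION: combined cost of both people
WEEKS_PER_YEAR = 50            # ASSUMPTION: consistent with 100 weeks = 2 years
REAGENTS_PER_WEEK = 5_000      # reagents plus minimal ancillary costs
GIGABASES_PER_WEEK = 200       # "a couple of hundred gigabases"


def weekly_costs():
    """Return each weekly cost component in dollars."""
    return {
        "machine": MACHINE_PRICE / AMORTIZATION_WEEKS,
        "staff": STAFF_COST_PER_YEAR / WEEKS_PER_YEAR,
        "reagents": REAGENTS_PER_WEEK,
    }


costs = weekly_costs()
total = sum(costs.values())
per_gigabase = total / GIGABASES_PER_WEEK

for name, dollars in costs.items():
    print(f"{name:>8}: ${dollars:,.0f} per week")
print(f"   total: ${total:,.0f} per week (${per_gigabase:,.2f} per Gb)")
```

Under these assumptions the total comes to about $13,000 per ~200 Gb, and that is before counting any of the scale-up infrastructure from point 3.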
Cited article: Schadt EE, Linderman MD, Sorenson J, Lee L, & Nolan GP (2010). Computational solutions to large-scale data management and analysis. Nature Reviews Genetics, 11 (9), 647-57. PMID: 20717155