So, Nature Reviews Genetics has an article, “Computational solutions to large-scale data management and analysis”, which claims the following in the abstract (boldface mine):
Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems.
I’ve dealt with cloud computing before, but what I want to address is the $5,000 for hundreds of gigabases (that’s hundreds of billions of bases; the human genome is about 3 gigabases, times two cuz we’re stupid diploids).
So is that $5,000 figure correct?
Cuz there are a lot of things that go into the cost of sequencing:
1) Buying the machine. The current high-throughput machine goes for about $500,000 (that’s actually pretty generous). If you amortize the cost of the machine over two years–let’s call that 100 weeks, and we can generate a couple of hundred gigabases (200 Gb) per week, then… (Mad Biologist takes off shoes to do the countings) …each ~200 Gb costs around $5,000. Unless a wealthy donor paid for the machine, some state or federal funding agency paid for this. Your grant might not have to pay for this, but a taxpayer somewhere did.
2) Paying people to run the machine. By run, I’m also including the basic bioinformatics support (not downstream analysis stuff, just getting files to people, etc.). Let’s say this is two people at $150,000 per year total (what a bargain!). Per above, tack on another $3,000 per week. Again, if your institution has other funding to pay for this, or eats the cost itself, this isn’t reflected in the price, but it is a cost that someone, somewhere is bearing.
3) Embiggening your processes.
Just like real men go to Baghdad, real men also sequence lots. If you’re a small center, you might only have a few machines, and you don’t need a lot of (or any) infrastructure to manage this. You also don’t need to go heavy into robotics or other automated processes. If you move to a large scale, this will require some infrastructure building costs (on the other hand, if you’re simply moving your infrastructure from one technology to another, these costs are rather small–the personnel are already in place).
I’m willing to go along with the costs of reagents plus minimal ancillary costs, such as electricity, running around $5,000. Maybe. And if that’s all you have to write your purchase or billing order for, then the price of sequencing is around $5,000. But the cost, including the externalized costs, is much more.
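For anyone who wants to redo the countings without taking off their shoes, here is a rough sketch of the arithmetic above. The machine price, the 100-week amortization, the 200 Gb per week, and the $5,000 reagent figure come from the post. Treating the $150,000 staff figure as the yearly total for both people, and using 50 working weeks per year, are assumptions made here so the numbers line up. The infrastructure costs from point 3 are left out because no number is given for them.

```python
# Back-of-the-envelope cost per week (~200 Gb), using the post's round numbers.
MACHINE_PRICE = 500_000     # dollars, point 1
AMORTIZATION_WEEKS = 100    # "two years", as rounded in the post
STAFF_PER_YEAR = 150_000    # dollars; ASSUMPTION: total for both people
WEEKS_PER_YEAR = 50         # ASSUMPTION: consistent with 100 weeks = 2 years
REAGENTS_PER_WEEK = 5_000   # dollars, the figure quoted in the abstract
GB_PER_WEEK = 200           # "a couple of hundred gigabases" per week


def weekly_costs():
    """Return the per-week cost components in dollars."""
    return {
        "machine": MACHINE_PRICE / AMORTIZATION_WEEKS,
        "staff": STAFF_PER_YEAR / WEEKS_PER_YEAR,
        "reagents": REAGENTS_PER_WEEK,
    }


costs = weekly_costs()
total = sum(costs.values())
per_gb = total / GB_PER_WEEK

for name, dollars in costs.items():
    print(f"{name:>8}: ${dollars:,.0f} per week")
print(f"   total: ${total:,.0f} per week, or ${per_gb:,.0f} per Gb")
```

Under these assumptions the sticker price (reagents only) is well under half of the weekly total. Change any constant to match your own center and the split will move accordingly.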
Cited article: Schadt EE, Linderman MD, Sorenson J, Lee L, & Nolan GP (2010). Computational solutions to large-scale data management and analysis. Nature Reviews Genetics, 11(9), 647-657. PMID: 20717155
Just to supply a bit of info, if you work at a big research U (as I did this year on sabbatical) you can probably get HT sequencing done (Solexa or 454) for about $5k, including reagent costs, if you amortize over a bunch of runs (the reagents are sold in kits of 10 rxns, at least for the Solexa). This doesn’t include the time for a lab student/post-doc to prep the samples, or the analysis time (minimum of 20h at $40 per h, if I remember right). Then there’s figuring out what to do with the data, which is the problem that most of the early adopters outside the sequencing community seem to be having right now 🙂
Scientists are notoriously bad at costing things. My boss will say X will cost $30K and I say “are you crazy? It will be at least $800k and 2 years to develop”. I’d be lucky if the $30k even covered the cost of components, let alone the gazillion other things. You’d think people might listen to me after 20 years of doing that sort of work – but nooo … I’m being obstructive when I say “it will cost a hell of a lot more” or “it can’t be done in that time with the resources we have”.
I’d love to get my hands on an Ion Torrent Sequencer. As it is, I can get roughly half a gig for ~15K*. I don’t have a machine, and don’t want one even if it were handed to me, so that price is acceptable and all part of doing business!
*Includes sample prep which for me (dirt and poop) can be a bit of a headache when trying to get it of sufficient quality for high-throughput sequencing. I’d rather let them do it for the nominal fee they charge me for it.
Probably the most honest price for sequencing is what a third party is willing to charge you to sequence something. I’ve recently been quoted a bit less than $25K for a 30X human genome (I supply DNA; they deliver FASTQ), or about $3.6K/Gb.
Now, that doesn’t include analysis, but for communicating to the green eyeshade folks that isn’t important to me (since we’ve sunk the cost into the server & the rest is my time).
Costs also get funny in an academic environment because of how they are shared. Core facilities are paid to buy the equipment & are expected to run it, so you really can get things much cheaper than what I cost before — just as you point out many costs are coming out of someone else’s wallet. If both are being funded by the public sector, that’s really a shell game.
While the material costs of sequencing a human genome might conceivably reach $1000 in a few years, what is really relevant is how much it is going to cost a person who wants their own genome sequenced. Furthermore, what is it going to cost to get the DNA sequence translated to yield personal information of practical value? Governments and health insurance companies are not going to pay these costs for the average citizen.
As you have pointed out, the purchase and maintenance of equipment for $1000 genome analyses will not be cheap and these and other overhead costs will have to be factored into the provision of any service to provide a full genomic analysis. The labour time to prepare a full report of a human genome in a meaningful way, even if major aspects are automated, will be staggering. The average lawyer or accountant charges about $300 to $400 per hour. What will a genetic consultant charge?
When one further factors in inflation, which is very likely to markedly increase soon, perhaps the costs of requisite sequencing chemicals and other materials might never get as low as $1000. In any event, I seriously doubt that the true fully loaded costs will ever reach this arbitrary number.
Moreover, as the limits of scalability are pushed with gene sequencing, the error rate is likely to markedly increase. This is especially problematic if it is the detection of mutations in DNA sequences that is the prime objective for sequencing whole genomes of individuals in the first place. The potential rate of false positives and negatives is staggering considering that there are about 2.9 billion base pairs in a human genome even with 30X coverage.
My guess is that it may eventually cost $10,000 for a quality genomic analysis, but even at this price, only the very wealthy or foolhardy would even contemplate such a personal expenditure.
@ “S. Pelech – Kinexus”
This can be done in a standardized way by a computer program that doesn’t charge $300/hour. It already is, with services like 23andMe, for SNP data.
My guess is that this will happen next year, if it hasn’t already. And $10,000 is less than many medical procedures, at least in the US.