Can Science Be Open When Resources (and Funds) Are Private?

Michael Nielsen has a Wall Street Journal op-ed about open science that is making the rounds in the science bloggysphere. Before I get to why I think Nielsen is wrong–at least when it comes to biology–I should say that I support open science. More importantly, I’m funded to do it. When, at Major Sequencing Center, we sequence a bunch of bacterial genomes, we are required to release those genomes rapidly and make them publicly available–we can’t hold them back for months so our collaborators can get first dibs. So I’ve got some open science bona fides. OPEN SCIENCETISMZ!! I HAZ THEM!!

But, in my case, we can do this because our funding incentives are different: the funding agency wants us to produce data for the larger scientific community, not do a bunch of specific R01-like grants. I’m sure Nielsen’s anecdote is accurate, but I think this is an unfortunate example, since genomics is actually more open than most of the biological sciences:

As one biologist told me, he had been “sitting on [the] genome” for an entire species of life for more than a year. A whole species of life! Just imagine the vital discoveries that other scientists could have made if that genome had been uploaded to an online database.

So anyway, let’s talk about incentives, and then move to funding. Incentives are important:

For the large sequencing centers, most of the projects are geared towards genome production. That is, the funding agency assesses whether or not benchmarks for sequence (and assembly and annotation) quality have been met in the time frame expected. To put it more crassly, renewal of funding is not primarily determined by manuscript output. Renewal of funding is determined by genome output. Yes, publications by the center are included in renewals, although ‘prestigious’ publications by other groups not associated with the center can also matter. And Dooling is right: often the sequencing centers are the only groups with the bioinformatics and analytical resources and know-how to make sense of the data, so, in reality, the centers end up publishing papers using the data.

But for the large centers, this is essentially contract work: the funding agency has determined that a certain amount of genomic data is required to aid other scientists in one or more disciplines, and the center is obligated to deliver these data. That’s what pays the bills.

The smaller centers often do not have these arrangements. The funding agency treats this as a typical research grant. There are specific aims and hypotheses designed to address a particular research goal. But more importantly, these grants are not structured with the expectation that the funded group will rapidly deliver a set of data to a wider community. The incentive structure is that, by the end of the grant, the researchers will have addressed some specific questions. To be crass again, the ability to renew the grant (or leverage it into another grant) is determined by publication output at the time of grant renewal, which can be several years. This creates an incentive to not share data, often to the detriment of the field as a whole.

That is what is operative here, and Nielsen stumbles into it:

If you’re a scientist applying for a job or a grant, the biggest factor determining your success will be your record of scientific publications. If that record is stellar, you’ll do well. If not, you’ll have a problem. So you devote your working hours to tasks that will lead to papers in scientific journals.

Even if you personally think it would be far better for science as a whole if you carefully curated and shared your data online, that is time away from your “real” work of writing papers. Except in a few fields, sharing data is not something your peers will give you credit for doing.

The problem is much, much worse than not finding time to write papers. Peer-reviewed papers, to funding agencies, are the mark of good research (whether that should be the case is an entirely separate post). They are currency: more papers increase the likelihood of funding. In biology, generating the data undergirding those peer-reviewed papers is time-consuming and expensive. You don’t want, and more importantly, can’t afford to have someone swooping in and publishing ‘your’ data first. This isn’t idle speculation on my part: I know of several cases where smaller groups did some genomic sequencing, released the data, and then had a paper scooped by another group (often a high-powered analysis group focused solely on genetic analysis of sequence data). When these scooped investigators then try to convince funders that they’ve been productive, they have one fewer paper to show for their efforts (and these efforts are expensive and time-consuming).

Then there’s the notion of competition, or competing labs. If you make ongoing work open to all, not only do you lose out in publication, but you’re giving away your next set of experiments, often in a crowded field. I’ve seen scooping here too, where a smaller lab presents work, only to have a much larger lab head think, “We could do that”, mobilize his resources and then scoop the smaller lab.

Here’s the problem: biology is expensive, and so we need funding agencies (private or public). It’s not just some dudes (and dudettes) hacking away on a computer. Thanks to the need for large amounts of funding (a small NIH-funded lab spends $250,000 per year, and the university typically collects an additional $100,000, give or take), funders need some sort of way to assess if a lab has been productive. In most cases, that assessment revolves around peer-reviewed publication. Publications translate into funding dollars, without which you cannot be a research biologist*.

If you want to fix this problem, exhorting scientists with these recommendations isn’t enough:

A good start would be for government grant agencies (like the National Institutes of Health and the National Science Foundation) to work with scientists to develop requirements for the open sharing of knowledge that is discovered with public support. Such policies have already helped to create open data sets like the one for the human genome. But they should be extended to require earlier and broader sharing. Grant agencies also should do more to encourage scientists to submit new kinds of evidence of their impact in their fields—not just papers!—as part of their applications for funding.

Until the incentive system is fundamentally changed, with the removal of peer-reviewed publications as evidence of productivity, we won’t see a whole lot of open science.

*Of course, there is good biology that can be done with limited funding, but the high-powered stuff, as well as supporting graduate students and post-docs, requires serious money. Some at teaching institutions can be very productive if they are able to tap into the limited funding for undergraduate training (although this access increases if you can piggyback off a research grant…).
