Zika, Data Release, and the Tragedy of the Commons

This does not bode well for public health-related sharing (boldface mine):

When researchers in Brazil posted four Zika virus genome sequences in the online repository GenBank on 26 January, they were complying with a call for scientists to openly release their data during public-health emergencies. By 10 February, the information had been used by Slovenian researchers for their own Zika paper in the New England Journal of Medicine (NEJM) — apparently, a textbook example of the power of rapid, open data-sharing.

But the process didn’t go entirely smoothly. Oliver Pybus, an evolutionary and infectious-disease biologist at the University of Oxford, UK, who works with the Brazilian group, has complained that the NEJM paper did not adequately credit the original data-providers when it only included the GenBank accession number for the data. And Pybus says that he is concerned that this lack of formal recognition could dissuade others from rapidly sharing data during an outbreak.

“The very first big Zika virus paper in the New England Journal of Medicine has just created exactly the opposite incentive for groups in Brazil that we want to create. We want them to feel confident they can put their data immediately online without any possible disadvantage to them,” Pybus says. The authors of the NEJM paper waited to release their own data until their paper was published, he notes.

The obvious solution would have been to contact the authors and make them middle authors on the paper–if their data were critical to the analysis (and selecting the proper isolates for analysis is experimental design), then they should be co-authors.

(Aside for the non-scientists: Publications are not only used for determining tenure, but also–and arguably more importantly–as proof of productivity for grants. Without funding, your career can crater–usually, science isn’t cheap. ‘Scooping’ someone can have real professional consequences. That’s why this post is hopelessly naive).

Admittedly, I’m old enough to remember the Ft. Lauderdale Agreement, but Pybus is absolutely correct: if people are afraid of being scooped, then they won’t make their data public. At some point, microbiological public health data cease to be current and become ‘stale’. That doesn’t mean those data aren’t useful for other purposes, but for public health intervention purposes, they aren’t very helpful (other than for ‘reconstructing the apocalypse’).

This isn’t just Zika virus either. There was an mcr-1/colistin resistance* paper in The Lancet where the authors didn’t appear to contact the sequence depositors (who had deposited the genome a couple of months earlier). Would it have killed them to offer a middle-authorship to the genome producers?**

I’m also not surprised that this happened in NEJM, or by NEJM’s statement that the scientists should resolve this issue: NEJM has always been a step behind on how to handle genomic data and, in the past anyway, hasn’t really understood ‘genomics culture.’

One partial solution is journals like Genome Announcements, which, well, announce that you’ve produced a genome (or genomes), which can then be cited. While that solves part of the problem, these aren’t high-profile ‘glamour pubz’***. Also, it still takes a short while to get even these articles written and published, and, when weeks or days matter, this isn’t the same as immediate release.

I wish I could say something like “Researchers will have to behave better”, but enough of them won’t, which could turn this into a tragedy-of-the-commons problem. I wish I could have confidence in journals and reviewers to police this–and some do–but, as I noted, not all journals ‘are aware of all genomics traditions’ (to use a phrase). I’m also not sure if funders will police this (though some would be willing to try).

I don’t want to call this a potential crisis, but it’s definitely a problem.

*Colistin is the last-line antibiotic against certain bacterial infections. Tracking the spread of resistance to it, the sources of that resistance, and so on is obviously important.

**It’s all the more ridiculous since I described this on my fucking blog a couple of weeks before the paper.

***Depending on the importance of data produced, arguably journals like this should be seen as higher-profile than they are.

  1. Hello Mike (whoever you may be), Oliver Pybus here. Thanks for the post. Unfortunately, the Nature News article highlighted the issue I was less concerned about, specifically credit/citation of publicly available genomes on GenBank. I’ve emailed with the NEJM authors and we’ve resolved that issue between us.

    Much more important are the incentives on individual researchers to share or hoard data that may be of immediate relevance during a public health emergency. I believe the journals have a responsibility to encourage best practise. The more powerful the journal is, the more responsibility they should shoulder. All that NEJM (and more recently, Lancet ID) need to do is ask authors to release sequences on GenBank at submission, not after acceptance or publication. That would immediately incentivise all groups to do likewise. Instead, the most powerful journals have in effect said, go ahead, use others’ data, don’t reciprocate and share your own, we’ll still publish your work. How can this possibly support “data sharing”?

