What Should Be The Long-Term Fate of Scientific Databases?

One of the problems that all scientific databases face is funding. From the funders’ perspective, they are being asked to make a commitment in perpetuity: with rare exceptions, there is always more work to be done. At the same time, if the funding disappears, then the database vanishes or becomes increasingly irrelevant (meaning the money previously spent loses its value). I bring this up because NHGRI (an institute within NIH) is taking a very hard look at five databases that it supports:

Researchers are protesting plans to reorganize and cut the budgets of five model-organism databases supported by the US National Institutes of Health (NIH).

The databases host a rich trove of information on species of budding yeast, flies, roundworms, zebrafish and mice that hastens progress in biology and human disease, according to an open letter signed by 46 researchers, addressed to NIH director Francis Collins and NIH institute heads. It was published on 21 June on the Genetics Society of America website.

The databases house information about the genetics and biology of model organisms, such as data on genes, proteins, gene expression and numerous other traits. They also host analysis tools.

“NIH’s support for the [model organism databases] over the last two decades has enabled a bevy of pivotal discoveries that lie at the true heart of the NIH mission. We urge you to continue the far-sighted policy of support for vital research infrastructure,” states the letter, which will be presented to Collins on 14 July at the Allied Genetics Conference in Orlando, Florida.

Officials with the National Human Genome Research Institute (NHGRI) told researchers in May that the agency plans to gradually cut support for the databases by 30–40% beginning next year. For the current fiscal year, the agency has dedicated US$17.6 million to support the 5 databases, each of which is run independently.

Here’s the problem:

The agency is hoping to get other NIH institutes to help to support the databases. But at a May meeting in Rockville, Maryland, NHGRI officials asked principal investigators in charge of the five databases to submit a proposal within several months for integrating some of their features. The agency would like to merge administration of the databases completely over the next few years.

“I know there is a lot of concern on the part of the community, but there is a need for the broad scientific community to support these resources better, and that is something that our institute currently cannot afford,” says Valentina Di Francesco, programme director for computational genomics and data science at the NHGRI. And, she adds, the independent operation of each database is confusing for users, who must navigate five separate user interfaces and sets of tools if they want to query information on all five species.

Those databases are important. But at this point, they are probably more useful to researchers funded by NIGMS (the NIH's National Institute of General Medical Sciences) and by the NSF (which is even more cash-strapped). That obviously presents a problem for NHGRI, because $17.6 million isn’t chicken feed: it’s a little over three percent of NHGRI’s budget. In an era with no new spending increases, if institutes want to fund new areas, something else has to get cut.

I don’t have any easy solutions (or hard ones, for that matter), but this is a real problem for database maintenance. No funder wants to (or is able to) make commitments in perpetuity.

