An Issue Equal to Scientific Fraud: Bad Editing and Reviewing

The Mad Biologist, of course, is never a bad reviewer…

Every so often, there’s a spate of articles about scientific fraud (here’s a recent entry). I don’t mean to downplay the issue–it is serious. In my daily work, though, I encounter far more problems with poor reviewing and editing: that is, plain errors*. One of the things that keeps me off the streets and out of trouble is bioinformatic curation, which might (?) sound glamorous but really amounts to nothing more than reading the scientific literature closely and extracting useful information to be added to databases and used in various software programs. For example, a paper might describe a mutation in a gene that confers resistance to an antibiotic, so I need to figure out what the mutation is and whether it’s actually worth incorporating: I might not agree with the authors’ assessment (e.g., what they’re calling ‘resistance’ probably shouldn’t be considered resistance), or there might be some other data issue.

Let me give you a few, somewhat anonymized, examples:

  1. One paper described four different mutations that conferred resistance to an antibiotic–this was not some ‘throwaway’ point in the supplemental materials, but the primary point of the paper. Two had incorrectly identified locations. Because the authors had submitted their reads to NCBI (the U.S. government’s publicly available sequence repository), I was able to reassemble the genomes from which these results were derived and identify the correct positions. Both were clear typographical errors (e.g., position 82 instead of 92) and could not be attributed to anything else (e.g., for the cognoscenti, this isn’t an issue of quibbling about start sites).
  2. One paper described a series of ‘new’ genes (and they conferred the expected phenotypes) that were completely identical to existing genes except for the last twenty amino acids or so. Both visual inspection of the sequences and the reassembly of the genomes in which they were found showed these were obvious frameshift mutations, and not ‘novel’ at all. That is, this was a clear-cut case of assembly error.
  3. Another paper had mutations that did not align to the reference, but in a weird way: they were off by 1, 4, 5, and 6 as one progressed from the 5′ end (the ‘front’ of the gene) to the 3′ end. What (I think) happened is that, when the authors downloaded the sequence, they did not have it on a single line (e.g., ATGCATTCCC…). Instead, there were line breaks in the standard NCBI format of 70 characters per line, and they counted those line breaks as part of the sequence: the off-by-1 mutation occurred on the second line, the off-by-4 mutation occurred on the fifth line, and so on–each position inflated by the number of preceding line breaks. Again, this is a very simple set of errors.
  4. Yet another case involved a position on a sequence, let’s say position 3,554, which was referred to nearly two dozen times (it was a key point of the paper); but it was clear from the only figure in the paper that it was actually position 8,554–and the deposited sequence backed that up.
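The line-break mechanism in example 3 is easy to demonstrate. This is a toy sketch of my inferred explanation, not the authors’ actual data: the sequence and positions below are made up, and only the 70-character line width matches NCBI’s standard FASTA wrapping. If you report coordinates by counting characters in the raw downloaded text–newlines included–each position is inflated by (line number − 1), exactly the pattern above.

```python
# Toy demonstration (hypothetical sequence and positions): counting
# newline characters as bases shifts each reported position by the
# number of line breaks before it, i.e. by (line number - 1).

WIDTH = 70  # NCBI's standard FASTA line width

seq = "ACGT" * 120  # hypothetical 480 bp gene, as one unbroken string

# The same sequence as it would appear in a downloaded FASTA body,
# wrapped at 70 characters per line.
wrapped = "\n".join(seq[i:i + WIDTH] for i in range(0, len(seq), WIDTH))

for true_pos in (50, 100, 300, 450):  # 1-based positions in the gene
    # Indexing into the raw file text without stripping line breaks:
    newlines_before = (true_pos - 1) // WIDTH
    naive_pos = true_pos + newlines_before
    # Same base, shifted coordinate:
    assert wrapped[naive_pos - 1] == seq[true_pos - 1]
    line_number = newlines_before + 1
    print(f"line {line_number}: true {true_pos}, naive {naive_pos}, "
          f"off by {naive_pos - true_pos}")
```

Run it and the offsets come out 0, 1, 4, and 6 for lines 1, 2, 5, and 7–the same staircase pattern as in the paper, and trivially avoided by stripping newlines before counting.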

I’ll stop with the examples because this is depressing, but I could go on for thousands of words if I were so inclined (life is too short, however). There are several important things to note, though. First, it’s a pain in the ass to extract these data, and some people just might give up, which is a failure of communication (I stick with it because I am paid to do so–it’s not the fun part of my job). Second–and far more important than the Mad Biologist’s suffering–how do we trust the other parts of the paper? These examples are not about some minute part of a supplemental table; they are the critical findings of the papers. So how does a reader handle this? Ignore everything else? Pick only the good parts? Third, all of these issues should have been caught. Reviewers should check the key findings of a manuscript under review (and editors should make sure they’ve done so).

I don’t think these are cases of fraud at all (I’ve seen very few cases that even make me suspect fraud), but they are errors–ones that could have, and should have, been caught in the review process. Fraud is an obvious and documented problem, but poor reviewing and editing is also a serious one.

I will now go outside and yell at clouds.

*I have seen the very occasional article that makes me wonder if the whole damn thing was made up, however.

This entry was posted in Publishing.

3 Responses to An Issue Equal to Scientific Fraud: Bad Editing and Reviewing

  1. Morgan Price says:

    For some of your examples, carefully reading the manuscript itself would not necessarily reveal the error. This level of checking goes beyond what reviewers are usually expected to do. Also, I think having one person check these sorts of things is adequate. So my proposal is: once a paper is provisionally accepted, there should be a “data reviewer”. For a genomics paper, it might be a lot of work though!

  2. Kaleberg says:

    This is serious stuff. It looks like this kind of error can waste a lot of people’s time. I’m wondering if there’s a systemic way of cutting the likelihood of this kind of error. For example, require that sequences extracted from known databases be referenced by some kind of “URL”. That way, one class of errors could be caught by simply “following” the URL and checking that the referenced gene sequence is the same as the one in the paper. If we start now, some of this stuff could be regular practice by 2050.

  3. Arjun says:

    I think reviewers _should_ be expected to look at the actual point mutation. Reading to make sure the text makes sense is fine, but reviewers should be expected to take at least a cursory glance at the data, try installing and running the software, etc. In a pure mol bio paper, wouldn’t you expect the reviewers to look at the gel images and see if the expected bands were there, if there were extra bands, or if the controls were appropriate and looked clean? That’s not to say we won’t miss stuff, but a basic level of glancing at and/or spot-checking the actual data should be expected.

    I’ve been known to make mistakes, and I would much rather a reviewer catch them than have to issue a correction. Then again, I don’t review that many papers, and when I do, I spend perhaps more time than I can afford on this stuff.
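Kaleberg’s “follow the URL” idea above can be sketched concretely. The following is a rough, hypothetical illustration, not a production tool: the accession handling and comparison logic are my own assumptions, though the efetch URL format is NCBI’s standard E-utilities interface for fetching a FASTA record by accession.

```python
# Sketch of an automated "does the claimed sequence match the deposited
# one?" check, per Kaleberg's suggestion. Hypothetical helper names;
# the efetch URL format is NCBI E-utilities' documented one.
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession: str) -> str:
    """Build a plain-text FASTA efetch URL for a nucleotide accession."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH}?{query}"

def fasta_to_sequence(fasta_text: str) -> str:
    """Drop the '>' header line and all line breaks from a FASTA record."""
    lines = fasta_text.strip().splitlines()
    return "".join(line for line in lines if not line.startswith(">"))

def matches_deposited(accession: str, claimed_seq: str) -> bool:
    """Fetch the deposited record and compare it to the paper's claim."""
    with urlopen(efetch_url(accession)) as response:
        deposited = fasta_to_sequence(response.read().decode())
    return deposited == claimed_seq.upper().replace("\n", "")
```

A submission system could run a check like this automatically at upload time, which would catch the whole class of transcription errors described in the post without asking reviewers to do any extra work.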
