A couple of weeks ago, I found a post, “Is HIV ‘fingerprinting’ junk courtroom science?”, which argues:
But calling the comparison of HIV strains’ genes “fingerprinting” — calling to mind the more-familiar matching of human suspects’ DNA to blood at a crime scene — is dangerously misleading, they warn.
“By calling such investigations HIV fingerprinting, scientists raise unrealistic expectations” about the method’s accuracy among juries and judges, they write. “Unlike for (human) DNA fingerprinting, where a likelihood can be calculated for a full match between the evidential DNA and the suspect’s DNA, there is never a full match between the RNA or the DNA of HIV in two samples, even within an individual.”
That is partly because HIV strains are constantly evolving, they note.
“Proper identification of the transmission source would require two major assumptions: that a phylogenetic tree can flawlessly reconstruct a true epidemic history and that strains from all patients ever infected with HIV are available as controls,” the authors write. “Both assumptions are unrealistic.”
Worse, since the full range of local genetic variation and the relatedness of local HIV strains are unknown, the probability that one individual’s HIV infection came from another specific individual simply cannot be quantified, the authors note.
“Because the full transmission tree is unknown, no likelihood can be attached to the a priori hypothesis of direct transmission,” they write.
I think there’s a far more serious problem, and it has to do with the rapid evolution of HIV (the virus that causes AIDS).
HIV evolves so quickly that most patients don’t have one virus; they have multiple, genetically distinct viruses. In other words, each patient has a population of HIV viruses that changes rapidly through time. So what is actually being reported is what we call the consensus sequence: the most common base at each position across the viruses sampled from the patient. To use a toy example, imagine that, using the omniscient powers of the Mad Biologist, we know that a patient has viruses with the following sequences (the number of copies of each sequence indicates its relative proportion in the population; these aren’t real sequences):
Due to how most HIV sequencing is done, we would report the consensus sequence as ACTGG. Now, suppose that patient infected another patient, but due to random factors, different population dynamics, or because TCTGT transmits to other patients better, that new patient has the following HIV sequences (again, known through my awesome superpowers):
The consensus is TCTGT, and it looks completely different. But it gets worse. Sometimes, the consensus doesn’t even exist in the patient. How is that even possible? Imagine the following sequences:
Here, the consensus is TCTGC, and that sequence does not exist in the population. To avoid that problem, we can use what’s known as a 50% majority consensus, where we call only those bases shared by a majority of the sequences and report N everywhere else (i.e., here we would report NCTGN), but we lose a lot of information (CTG is invariant, so it doesn’t help us tell anything apart).
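For anyone who wants to play with the idea, here is a minimal Python sketch of the two kinds of consensus. The function name `consensus`, its `min_fraction` parameter, and the `population` counts are all made up for illustration: this is a hypothetical population picked so that the plain consensus comes out as TCTGC and the majority consensus as NCTGN, not real data and not necessarily how any sequencing pipeline does it. Ties for the most common base are broken arbitrarily in this sketch.

```python
from collections import Counter

def consensus(population, min_fraction=0.0):
    """Call the most common base at each position.

    population: dict mapping a sequence to how many copies of it exist
                (all sequences are assumed to have the same length).
    min_fraction: a base is called only if its share of the population
                  is strictly greater than this value; otherwise 'N'.
    """
    total = sum(population.values())
    length = len(next(iter(population)))
    called = []
    for i in range(length):
        # Tally the bases seen at position i, weighted by copy number.
        column = Counter()
        for seq, count in population.items():
            column[seq[i]] += count
        base, n = column.most_common(1)[0]
        called.append(base if n / total > min_fraction else "N")
    return "".join(called)

# Hypothetical toy population (sequence -> number of copies).
population = {
    "TCTGA": 2,
    "TCTGT": 1,
    "ACTGC": 2,
    "GCTGC": 1,
    "GCTGT": 1,
}

plain = consensus(population)          # most common base at every site
majority = consensus(population, 0.5)  # only bases held by a majority

print(plain, majority, plain in population)
```

In this made-up population, T is the most common first base and C the most common last base, but no single virus carries both, which is how a consensus can describe a virus that isn’t actually there.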
People are developing ways to sequence the whole population of HIV viruses in a patient, but that creates a whole new set of difficulties: populations of HIV can change rapidly, and, if the virus population responds to the new patient’s immune system, that too can alter the frequencies of individual HIV viruses. I don’t even know how you would definitively identify a source in a legal context with all that going on.
Having said that, HIV fingerprinting can work as an epidemiological tool, provided the samples are collected shortly after infection. But, outside of that situation, I think a good defense attorney could honestly raise issues that lead to reasonable doubt.