To paraphrase Mark Twain, biological understanding may not repeat itself, but it does rhyme. There’s a recent commentary, “The Great DNA Data Deficit: Are Genes for Disease a Mirage?”, from the Bioscience Resource Project which, if Twitter is any guide (and how could Twitter possibly not be?), is leading to some angry rebuttals, even as I write this. The piece discusses Genome Wide Association Studies (‘GWAS’), and argues that GWAS is overhyped because many common diseases have a very limited genetic basis. Before I deal with the piece (which has good and bad points), I would note that I’m having flashbacks to the neutralist/selectionist debates (which always seem to flare up with the advent of new data types).
Like those debates, I think we’ll ultimately come to the conclusion that some diseases have a very limited genetic component and some don’t; some can be attributed to common alleles, and some rare alleles; some traits will be determined by variation among many genes, and others among few; some diseases or traits will be driven by large scale deletions, some won’t be. As a dissertation committee member once told me, “It ultimately comes down to those stupid fucking natural history facts.” And I see no reason why it should be any different this time.
Anyway, on to the commentary, which has good and bad points.
A while ago, I wrote about the need for more precision when we specify the environmental component of heritability. Let me explain what that means. When we try to figure out if a trait is ‘genetic’–that is, has a high heritability–we need to specify three things. First, we need to accurately define the phenotype or trait under consideration. For instance, one wouldn’t want to lump together Crohn’s Disease, ulcerative colitis, and pouchitis, even though they’re all described as inflammatory bowel disease, since they’re separate diseases (I think most GWAS are fine in this regard, but I’m including this for the sake of completeness).
The second factor, if you’re doing GWAS and not just ol’ timey quantitative genetics, is to appropriately describe the available genetic variation. With the ‘first generation screens’–looking for diagnostic changes in single nucleotides scattered throughout the genome (single nucleotide polymorphisms or SNPs)–there are concerns over whether we are missing certain other types of changes such as copy number variation (i.e., multiple copies of a single gene) or deletions (missing chunks of sequence that don’t remove the SNP). Unlike the authors, I do think searching for this type of variation is important, since, in some cases, these changes will probably have some additional effect. It very well could be that many common diseases are collections of rare diseases that are lumped together (as one shouldn’t do with inflammatory bowel disease). Let’s do some science and find out.
The third factor that needs to be specified is the environment. When we state that a disease (or any other trait) is highly heritable (for those remembering some genetics, a high h²), this is actually a statement about heritability in a given environment. Since the origins of quantitative genetics, it has been well documented in a variety of plants and animals (including humans) that the environment can play a huge role in the heritability of a trait. That’s why agricultural geneticists go to great pains to control the environments of their subjects.
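To make the "in a given environment" point concrete, here is a minimal sketch. It assumes the simplest textbook decomposition, where phenotypic variance is just genetic variance plus environmental variance (no gene-by-environment interaction or covariance), and the variance values are made up for illustration, not taken from any study.

```python
def heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Share of phenotypic variance due to genetic variance.

    Assumes V_P = V_G + V_E, i.e. no gene-environment interaction
    or covariance (a simplifying assumption for this sketch).
    """
    phenotypic_variance = genetic_variance + environmental_variance
    if phenotypic_variance <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_variance / phenotypic_variance


# Hypothetical numbers: the same genetic variance in two environments.
V_G = 1.0
controlled_plot = heritability(V_G, environmental_variance=0.25)
variable_field = heritability(V_G, environmental_variance=4.0)

print(f"controlled environment: h2 = {controlled_plot:.2f}")
print(f"variable environment:   h2 = {variable_field:.2f}")
```

Nothing about the genes changes between the two calls; only the amount of environmental variation does, and the heritability moves with it.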
At this point, GWAS defenders will say, “That’s why we use twin studies!” And twin studies have demonstrated high heritabilities. Consequently, we simply have to keep searching for the ‘missing heritability.’ But there are two problems with these studies. Here’s one problem:
Studies of human twins estimate heritability (h²) by calculating disease incidence in monozygotic (genetically identical) twins versus dizygotic (fraternal) twins (who share 50% of their DNA). If monozygotic twin pairs share disorders more frequently than do dizygotic twins, it is presumed that a genetic factor must be involved. A problem arises, however, when the number resulting from this calculation is considered to be an estimate of the relative contribution of genes and environment over the whole population (and environment) from which the twins were selected. This is because the measurements are done in a series of pairwise comparisons, meaning that only the variation within each twin pair is actually being measured. Consequently, the method implicitly defines as environment only the difference within each twin pair. Since each twin pair normally shares location, parenting styles, food, schooling, etc., much of the environmental variability that exists between individuals in the wider population is de facto excluded from the analysis. In other words, heritability (h²), when calculated this way, fails to adequately incorporate environmental variation and inflates the relative importance of genes.
Heritability studies of humans are classic experiments that have been conducted many times and they have strong defenders among modern geneticists (e.g. Visscher et al. 2008). Nevertheless, criticisms such as those above are not novel. They are a specific example of the general problem, formulated by Richard Lewontin (of Harvard University), that the contributions of genes to a trait normally depend on the particular environment. And further, that susceptibility to environment depends on genes. In consequence, there can be no universal constant (such as h²) that defines their relationship to one another (Lewontin, Rose and Kamin 1984; Lewontin 1993). Lewontin is not alone among geneticists in his dismissal of heritability as it is used in human genetics. Martin Bobrow of Cambridge University, for example, has called human heritability “a poisonous concept” and “almost uninterpretable”.
If one accepts either that h² is consistently inflated, or that it is essentially meaningless, even “poisonous”, then the only current evidence supporting genetic susceptibility as a major cause of disease disappears. “The Missing Heritability of Complex Diseases”, DNAs’ so-called ‘dark matter’, becomes simply an artefact arising from overinterpretation of twin studies.
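For readers who haven’t seen how a twin-study number is actually produced, here is a rough sketch. The commentary doesn’t say which estimator it has in mind, so I’m assuming the classic textbook one, Falconer’s formula, which works from the trait correlation within identical (MZ) pairs and within fraternal (DZ) pairs. The correlations below are invented for illustration.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer's twin-study decomposition from within-pair correlations.

    r_mz: trait correlation within monozygotic (identical) pairs
    r_dz: trait correlation within dizygotic (fraternal) pairs
    Relies on the usual assumptions (e.g. MZ and DZ pairs share their
    environments to the same degree), which is exactly what is in dispute.
    """
    return {
        "h2": 2.0 * (r_mz - r_dz),  # additive genetic share
        "c2": 2.0 * r_dz - r_mz,    # shared (common) environment share
        "e2": 1.0 - r_mz,           # non-shared environment share
    }


# Hypothetical correlations, not from any real study.
estimates = falconer_estimates(r_mz=0.6, r_dz=0.4)
for name, value in estimates.items():
    print(f"{name} = {value:.2f}")
```

Note that everything hangs on two correlations measured within pairs; whatever environmental variation never shows up as a difference between pairs of co-twins cannot be separated out by this arithmetic.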
A second, related criticism is based on some work I’m doing that involves twin studies. When you compare twin pairs (we have metadata related to the main study condition), whether they be identical or fraternal, they can differ greatly both in the environmental variability within pairs and in the means among pairs. Worse, some of this environmental variability is correlated with twin type. In English, this means that disentangling the effect of genetic relatedness (i.e., identical or fraternal) from the other covariates (e.g., diet) is impossible–and with most twin studies, if you rigorously characterize the patients, this conflation will be impossible to avoid*.
Years ago, Graham Bell wrote a wonderful essay in Molds, Molecules, and Metazoa where he reviewed many studies of breeding and crop experiments that assessed heritability–and keep in mind, these were only ‘semi-wild’ plots, so the importance of genotype is overestimated. In most cases, the relative contribution of genetics to a trait was between five and fifteen percent, and these were traits that could be altered in artificial systems (i.e., nobody is going to waste their time trying to alter phenotypes that just don’t budge). I think we’ll ultimately see the same phenomenon here: there will be a range of heritabilities, but, for common diseases, the heritabilities will be rather low.
Until then, I’ll just sit back and watch the fireworks….
*Imagine that you ask patients twenty questions, which can only be answered with one of two responses (and there are always two answers for every question in the study population). You would need 2²⁰ (that is, 1,048,576) patients to see every combination once.
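The arithmetic in the footnote is easy to check. The helper below is just an illustration (the function name is mine): with independent questions, the number of distinct answer patterns is the number of possible answers raised to the number of questions.

```python
def combinations_needed(n_questions: int, answers_per_question: int = 2) -> int:
    """Number of distinct answer patterns across all questions.

    This is the minimum number of patients needed to observe every
    pattern once, assuming each patient shows a different pattern.
    """
    return answers_per_question ** n_questions


print(combinations_needed(20))  # 1048576
print(combinations_needed(10))  # 1024
```

Even ten yes/no covariates already give more patterns than most twin studies have pairs.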