There’s a recent and ridiculous paper in Culture and Brain on the genetics of intelligence in Chinese college students. Thankfully, Jeremy Yoder demolishes its statistics, and, as he notes, we’ll probably see more SHTstat (the analytical module for SHTseq).
That said, what bothers me about the paper is that even if some of the SNPs turned out to be ‘real’, it’s still garbage because it doesn’t account for the environment. We have no idea of the cognitive background of these students. To explain in more detail, the variation in IQ scores can be attributed to four factors:
1) Genotype: one or more genetic differences (in this study, SNPs, or ‘single nucleotide polymorphisms’).
2) Environment: some students were raised in a ‘smart’ environment, others in a ‘stupid’ environment. In the U.S., the IQ differences between Alabama and Massachusetts that show up within every socioeconomic group could be a case of environmental effects on IQ. Likewise, lead exposure also seems to affect IQ.
3) Genotype by environment interactions (‘gxe’): certain genetic differences, when found in certain environments, will be associated with differences in intelligence. For example, one or more SNPs might lead to higher IQ in a child in Massachusetts but not in a child in Alabama, while children who lack those SNPs will not differ in IQ between the two states, on average (note the ‘on average’).
4) Genotype by environment covariance: certain genotypes (combinations of SNPs) are more likely to be found in certain environments. In other words, children with ‘smart genes’ will be more likely to be reared in ‘smart environments.’
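In quantitative-genetics shorthand, these four factors line up with the textbook partition of phenotypic variance. This is a simplified sketch that assumes genetic and environmental effects combine additively; the symbols are standard textbook notation, not anything taken from the paper:

```latex
V_P = V_G + V_E + V_{G \times E} + 2\,\mathrm{Cov}(G, E)
```

Here V_P is the variance in IQ scores, and the four terms on the right correspond to factors 1 through 4. Not measuring the environment doesn’t make the last three terms go away; it just folds them into whatever gets attributed to genotype or to residual noise.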
Here’s why the environment matters. If we look at gxe interactions (#3), incorrectly specifying the environment will lead to underestimating the effects of genotype. Translated into English, if I lump together all of the students’ scores without accounting for their environments (e.g., Alabama and Massachusetts), I will weaken the apparent effects of certain SNPs, perhaps to the point that they fail to reach the threshold of significance. In the hypothetical example in #3, if I looked at Massachusetts and Alabama separately, I would realize that genetic variation does matter (at least in Massachusetts), whereas if I lumped them together, I might conclude that there is a much weaker genetic effect.
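A minimal sketch of that dilution, using made-up scores. The states, the scores, and the 10-point effect are hypothetical and chosen only for illustration; the code compares the mean scores of carriers and non-carriers within each state, then in the pooled sample:

```python
from statistics import mean

# Hypothetical toy data: (state, carries_snp, iq_score).
# Assumption: carriers score ~10 points higher, but only in Massachusetts;
# non-carriers score the same on average in both states.
children = [
    ("MA", True, 110), ("MA", True, 112), ("MA", False, 100), ("MA", False, 102),
    ("AL", True, 100), ("AL", True, 102), ("AL", False, 100), ("AL", False, 102),
]

def snp_effect(rows):
    """Mean score of carriers minus mean score of non-carriers."""
    carriers = [iq for _, snp, iq in rows if snp]
    others = [iq for _, snp, iq in rows if not snp]
    return mean(carriers) - mean(others)

# Analyzing each environment separately.
ma_effect = snp_effect([r for r in children if r[0] == "MA"])
al_effect = snp_effect([r for r in children if r[0] == "AL"])

# Lumping everyone together, ignoring the environment.
pooled_effect = snp_effect(children)

print(f"MA: {ma_effect}, AL: {al_effect}, pooled: {pooled_effect}")
```

In this toy example the SNP is worth 10 points in Massachusetts and nothing in Alabama, but pooled it looks like a 5-point effect. With real, noisy data, a halved effect is also more likely to miss the significance threshold.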
On the other hand, genotype by environment covariance will do the opposite. For example, if I don’t realize that children whose SNPs give them a slightly higher IQ score are more likely to be raised in environments that dramatically increase IQ (e.g., better nutrition, less lead exposure, better educational access, better-educated parents, etc.), then I would conclude, erroneously, that these SNPs contribute much more to IQ than they actually do.
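A toy calculation shows the inflation. Everything here is hypothetical: a SNP with a small direct effect, an ‘enriched’ environment with a large one, and carriers who are more likely to grow up in that environment. Simple stratification (compare within each environment, then average) stands in for the regression adjustment a real study would use, and it only works because the environment was recorded:

```python
from statistics import mean

# Hypothetical parameters, chosen for illustration only.
BASELINE = 100
TRUE_SNP_EFFECT = 2   # direct effect of carrying the SNP
ENRICHED_BONUS = 8    # effect of an IQ-boosting environment (nutrition, schooling, ...)

def score(carrier: int, enriched: int) -> int:
    """Additive toy model: no noise and no gene-by-environment interaction."""
    return BASELINE + TRUE_SNP_EFFECT * carrier + ENRICHED_BONUS * enriched

# Genotype-environment covariance: 3 of 4 carriers grow up in enriched homes,
# versus only 1 of 4 non-carriers.
pairs = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 3
kids = [(c, e, score(c, e)) for c, e in pairs]

def carrier_gap(rows):
    """Mean score of carriers minus mean score of non-carriers."""
    carriers = [s for c, _, s in rows if c]
    others = [s for c, _, s in rows if not c]
    return mean(carriers) - mean(others)

# Ignoring the environment: the SNP soaks up part of the environmental bonus.
naive_effect = carrier_gap(kids)

# Comparing carriers and non-carriers within each environment, then averaging.
within = [carrier_gap([k for k in kids if k[1] == env]) for env in (0, 1)]
adjusted_effect = mean(within)

print(f"naive: {naive_effect}, adjusted: {adjusted_effect}, true: {TRUE_SNP_EFFECT}")
```

Ignoring the environment, the naive comparison credits the SNP with 6 points, three times its true 2-point effect; within each environment, the carrier gap is the true 2 points.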
By the way, that last scenario sounds like a society where one’s economic success is heavily determined by academic performance, including test taking, and that success in turn improves the academic prospects of one’s children.
In other words, China (and, increasingly, the U.S.).
Point being, a lot of these studies are very poorly designed because they don’t collect the appropriate metadata (this isn’t unique to human genomics, since I see the same thing in microbial genomics and microbiome studies). It’s easy to look at a couple of traits and do genotyping. Designing the right study with the right subjects and the appropriate metadata is very hard. But nobody said this science stuff was supposed to be easy.
So we’re not only going to see a lot of SHTstat; I think we’ll also see a lot of SHTdesign and SHTmetadata. And in the meantime, it will take a lot of effort to correct public misperceptions, if we can do that at all.