Yes, it’s a brutal charge, but, sadly, it’s true. Economist Moshe Adler (who has also written an excellent book, Economics for the Rest of Us: Debunking the Science That Makes Life Dismal) has squared off against economist Raj Chetty and his co-authors, who have used value-added measurements (which we find very problematic when applied to education) to claim that teacher quality can affect lifetime earnings. Chetty’s work received a lot of attention when President Obama mentioned it in his 2012 State of the Union address (and the initial reports from 2010 had problems as well). Here’s Adler’s summary of Chetty’s work:
The first part of the report (NBER Working Paper No. 19423) reviewed here linked information about students and teachers in grades three through eight in New York City (NYC), spanning the years 1989-2009. The research used this linked dataset for “value added” (VA) calculations for individual teachers. The model used for the VA calculation controls for factors such as students’ prior achievement, parents’ income, and the performance of other students in the classroom. But none of these factors can entirely predict a child’s performance. After accounting for all known and easily measurable social and economic factors, a residue still remains. The report assumes that this residue is attributable to the teacher, and it calculates a teacher’s value-added by using the residues of his or her students.
The second part of the report then linked the incomes of adults with the value-added of their teachers when those earner-adults were students. Using this linked data set they found that a one unit (one standard deviation) increase in teacher value-added increases income at age 28 by $286 per year or 1.34%. The study then assumes that this percentage increase in income will hold for a person’s entire working life, producing a cumulative lifetime increase of $39,000 per student.
Sounds impressive. But there are problems, as Adler lays out (boldface mine):
1. An earlier version of the report found that an increase in teacher value-added has no effect on income at age 30, but this result is not mentioned in this revised version. Instead, the authors state that they did not have a sufficiently large sample to investigate the relationship between teacher value-added and income at any age after 28, but this claim is untrue. They had 220,000 observations (p. 15), which is a more than sufficiently large sample for their analysis.
2. The method used to calculate the 1.34% increase is misleading, since observations with no reported income were included in the analysis, while high earners were excluded. If done properly, it is possible that the effect of teacher value-added is to decrease, not increase, income at age 28 (or 30).
3. The increase in annual income at age 28 due to having a higher quality teacher “improved” dramatically from the first version of the report ($182 per year, report of December, 2011) to the next ($286 per year, report of September, 2013). Because the data sets are not identical, a slight discrepancy between estimates is to be expected. But since the discrepancy is so large, it suggests that the correlation between teacher value-added and income later in life is random.
4. In order to achieve its estimate of a $39,000 income gain per student, the report makes the assumption that the 1.34% increase in income at age 28 will be repeated year after year. Because no increase in income was detected at age 30, and because 29.6% of the observations consisted of non-filers, this assumption is unjustified.
5. The effect of teacher value-added on test scores fades out rapidly. The report deals with this problem by citing two studies that it claims buttress the validity of its own results. This claim is both wrong and misleading.
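To make points 3 and 4 concrete, here’s a back-of-the-envelope sketch in Python. The only inputs are the two numbers from the report quoted above ($286 per year and 1.34% at age 28); the flat earnings profile and the 43-year working life are my illustrative assumptions, not the report’s actual projection:

```python
# Figures taken from the report as summarized above
gain_at_28 = 286.0   # extra annual income at age 28, in dollars
pct_gain = 0.0134    # the same gain expressed as a percentage

# Implied mean income at age 28 (just the ratio of the two reported numbers)
mean_income_28 = gain_at_28 / pct_gain   # about $21,300

# Hypothetical flat profile: suppose the $286 gain simply repeats for an
# entire working life of, say, 43 years (an assumption for illustration)
working_years = 43
flat_lifetime_gain = gain_at_28 * working_years   # about $12,300

print(round(mean_income_28), round(flat_lifetime_gain))
```

Under that flat profile the cumulative gain is roughly $12,300, not $39,000; the headline number therefore depends on applying the 1.34% to a much larger projected earnings stream, and on the persistence assumption that points 1 and 4 call into question.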
While Adler’s entire critique is damning (and accessible to non-specialists), point #1 just floors me.
Nowhere in any of the correspondence between Adler and Chetty et alia is there any serious effort to assess the power of the test; Adler, on the other hand, has done those power calculations, and finds that even the smaller dataset is about eight times larger than the minimum size required (Adler’s response to their response is, erm… devastating). In other words, there’s no reason to discard the data for the 30-year-olds, except for their inconvenience.
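For readers who want to see what such a power calculation looks like, here’s a minimal sketch using the standard Fisher z-transform approximation for detecting a correlation. The effect size r = 0.02 is my illustrative choice, not Adler’s actual number:

```python
from math import log
from statistics import NormalDist

def n_required_for_correlation(r, alpha=0.05, power=0.80):
    """Smallest sample size that detects a true correlation of r at
    significance level alpha with the given power (Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    c = 0.5 * log((1 + r) / (1 - r))   # Fisher z-transform of r
    return ((z_alpha + z_power) / c) ** 2 + 3

# Even for a very small effect, the required sample is a fraction of
# the 220,000 observations the authors had in hand
print(round(n_required_for_correlation(0.02)))   # roughly 20,000
```

Under those assumptions, a couple of tens of thousands of observations suffice, which is consistent with Adler’s point that a 220,000-observation sample is far more than large enough.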
You don’t do science like this. Period.
This seems like a case where an initial, trumpeted finding, never confirmed by peer review, is confronted with additional data and falls apart.
In the entire history of science, this has never happened. I kid: it happens all the time. The question is, as a scientist, do you double down or admit that the effect is either very weak or non-existent? Human nature being what it is, the temptation is to double down, especially when you made your bones on this stuff (and the President mentioned your work!). Hell, senior researchers often don’t back off until they are completely dogpiled.
At what point does the dam break? Just how many shoddy methods, misuses and misrepresentations of data, and the like will it take before education reformers lose legitimacy? Good policy can’t be built on incorrect science and evidence.
Sadly, I think there’s a long way to the bottom of that bottle.