Understanding the Limitations of Your Data: The Student Debt Edition

As I’ve noted many times, when doing analysis, you must understand the limitations of your data, whether it be the microbiome, genomics, education statistics, polling data, or, well, anything else.

With presidential candidates Sens. Elizabeth Warren and Bernie Sanders both offering plans to wipe out student debt, some have argued this isn’t progressive*, because debt forgiveness would disproportionately miss low-income households, which supposedly don’t carry much educational debt. But, as Matt Bruenig notes, this argument rests largely on a misunderstanding of the data used to make it (boldface mine):

…I set out to produce my own distributional analysis over the past few days but learned something very unpleasant in the process: the data source everyone uses for these estimates, the Survey of Consumer Finances (SCF), is not well-suited for the purpose and almost certainly systematically understates how much student debt is carried by low-income individuals.

Everyone uses the SCF for these purposes because it is the only high-quality wealth survey in the US that appears to allow for distributional analysis on student debt. The think tanks that have put out student debt figures based on it include behemoths like Brookings and the Urban Institute. Those figures have then been cited in articles at the New York Times, the Washington Post, the Wall Street Journal, Vox, Slate, and virtually every other media outlet.

But a deep dive into the methods of the SCF, along with comparisons of the SCF to other student debt data sources, clearly show that these figures are off the mark, and probably dramatically so.

Rather than grouping all related people who live in the same household, they [the SCF] instead construct a Primary Economic Unit (PEU) for each household, which “consists of an economically dominant single individual or couple (married or living as partners) in a household and all other individuals in the household who are financially interdependent with that individual or couple.”

Importantly for our purposes here, the financially independent relatives of the economically dominant individual or couple, such as many young adults living with their parents, are not included in the PEU. In fact, they are excluded from the survey altogether.

This is a problem because it means that student debtors who live with their parents, which many do in part because of their student debt, are either absent from the survey sample or being counted as part of their parent’s PEU. Specifically, if the parents tell the survey taker that their co-resident student debtor kid is financially independent, then they are just dropped out of the survey universe. If they tell the survey taker that their kid is financially interdependent, then the kid (and their student debt, if their parents properly estimate it) are included in the parent’s PEU, meaning that it is the parent’s characteristics (age, income, education) that gets assigned to the kid’s student debt.

Once you understand how the PEU works, it does not take a genius to realize that the SCF must be missing a lot of student debt and especially student debt carried by low and moderate income young adults who are living with their parents. And even when it does pick up that kind of debt, it is assigning it to the older, generally more affluent parents of the student debtors, which should also skew the distribution of student debt up the age and income ladders.
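To see how the two mechanisms Bruenig describes (dropping financially independent co-resident kids, and pinning interdependent kids' debt on their parents) can move the numbers, here is a toy sketch in Python. Everything in it is made up for illustration: the six people, their incomes and debts, the `LOW_INCOME_CUTOFF`, and the function names are all hypothetical, and using the head's income as the PEU's income is my simplification. It is not the SCF's actual procedure or data, just the bookkeeping logic applied to a tiny invented population.

```python
from dataclasses import dataclass


@dataclass
class Person:
    household: int
    income: int         # annual income in dollars (hypothetical)
    student_debt: int   # student debt in dollars (hypothetical)
    is_head: bool       # the "economically dominant" individual
    independent: bool   # financially independent of the head (non-heads only)


LOW_INCOME_CUTOFF = 30_000  # arbitrary threshold chosen for illustration

# Invented population: two debtors live with parents, two live on their own.
PEOPLE = [
    Person(1, 90_000, 0, True, False),        # parent
    Person(1, 15_000, 30_000, False, True),   # independent adult child
    Person(2, 80_000, 0, True, False),        # parent
    Person(2, 12_000, 20_000, False, False),  # interdependent adult child
    Person(3, 25_000, 10_000, True, False),   # low-income debtor, own household
    Person(4, 100_000, 40_000, True, False),  # high-income debtor, own household
]


def low_income_share_actual(people):
    """Share of all student debt owed by people who are themselves low income."""
    total = sum(p.student_debt for p in people)
    low = sum(p.student_debt for p in people if p.income < LOW_INCOME_CUTOFF)
    return low / total


def low_income_share_peu(people):
    """Same share under a simplified PEU-style rule.

    Independent non-heads are dropped entirely; everyone else's debt is
    classified by the income of the household head (a simplification).
    """
    head_income = {p.household: p.income for p in people if p.is_head}
    total = 0
    low = 0
    for p in people:
        if not p.is_head and p.independent:
            continue  # excluded from the survey universe
        total += p.student_debt
        if head_income[p.household] < LOW_INCOME_CUTOFF:
            low += p.student_debt
    return low / total


print(f"actual low-income share:    {low_income_share_actual(PEOPLE):.0%}")  # 60%
print(f"PEU-style low-income share: {low_income_share_peu(PEOPLE):.0%}")     # 14%
```

In this invented population, low-income people owe 60% of the debt, but the PEU-style accounting reports 14%: one debtor's $30,000 vanishes from the sample, and another's $20,000 gets counted against an $80,000 income. The sizes are arbitrary; the direction of the bias is the point.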

The Federal Reserve realized five years ago there are problems with these data, so it’s not just one dude at a thinky tank.

What this means is that we are overestimating how much student debt is owed by well-off people, and underestimating how much is owed by low-income people.

Shocking, I know…

*There’s also nothing ‘unprogressive’ about policies that help middle-income people, as opposed to focusing on lower-income people.
