Public Health Big Data: The Wages of Privatization

I spent a significant part of my career in and around these kinds of data, and I think the article below, while very good at describing the failures of COVID-19 data collection, misses the larger picture: these failures are the result of massive underfunding of public data collection, and also of the failures of privatized data collection. We excel at using ‘big data’ to determine what needless crap you will be more likely to buy–and by “excel”, I mean we dump tons of money and personnel into it (this is a substantial part of the tech economy, which is to say, advertising). At the same time, we have massively underfunded public data collection, to the point where most of these systems are run on shoestring budgets with too few personnel. Those personnel are almost always tasked with keeping existing systems from collapsing while somehow also developing new ones, all while being underpaid and given too few resources for the amount of work they’re expected to do (retention is an issue).

Anyway, onto the article (boldface mine):

Yet when it comes to evaluating NPI use in this pandemic, we seem unable or unwilling to muster the testing data that could inform statistical models and guide our actions. Forecasters and planners desperately need timely testing data. Yet as the absence of comprehensive public data on race and ethnicity revealed, the United States has underfunded and undermined its disease surveillance programs and done a poor job of organizing its 50 state systems for collecting and reporting testing data. The pandemic affects all states, yet states’ data are incomplete and uneven at best… The shortcomings are even more puzzling in the light of two decades of bipartisan federal efforts to build measurement and public reporting systems for health care and implement electronic health records.

The best database on testing for Covid-19 in the United States, created through valiant efforts by news media organizations to fill the gap left by the CDC, contains testing data limited to aggregated counts of the tests done each day, the states where tests were performed, and the number of positive results. The validity and reliability of the data are not fully known. Inspection of the data suggests a patchwork of inconsistent reporting from state and commercial labs. The database lacks basic information about tests such as the characteristics of the people tested, where they were tested, how they were selected for testing, and what factors led to the decision to test them. Yet these data are the best we have.

That the United States is failing such a simple test of its capacity to protect public health is shocking. Collecting and reporting public health data are not rocket science. Other countries, notably Canada and Belgium, are already reporting nationwide data on testing at the individual level, including individual demographic data (using ranges for each person to protect privacy) and other key attributes for each test. The United States was once a leader in collecting systematic federal data on population health. Now our national disease-tracking effort seems stuck with well-meaning but scattershot efforts by tech companies using cellular phone signals, social media surveys, online searches, and smart thermometers as we try to guess where Covid-19 outbreaks may be lurking. Small one-off studies using convenience samples have popped up to try to fill the vacuum with basics such as percentages of cases that are asymptomatic and of symptomatic people who seek care. Because of sampling bias, these studies are producing wildly different and nearly uninterpretable results. Estimates are so wide ranging that modelers have little choice but to default back to imprecise assumptions.

In the information age, the United States seems to be swimming in big data. This country has generated many of the world’s largest, most innovative, most profitable data companies. Yet when it comes to forecasting the spread of a major pandemic that is killing Americans and wreaking havoc on our economy, we seem oddly lost. With more than 80,000 dead and no end in sight, our national efforts seem feebler and more halting than the 19th-century work of Florence Nightingale in the Crimean War and William Farr in England, where they used systematically collected epidemiologic data and rigorous analysis to save countless lives. Would that our statistical models had such standardized, systematically collected, and readily reported data to inform them. Reopening state economies without the precision provided by analysis of rigorously reported testing data seems a peculiarly American form of madness.

Let’s talk about a couple of the specifics.

First, electronic health records aren’t about public reporting; they’re about billing. Anyone who has ever tried to get antibiotic susceptibility testing data–these are the lab tests that determine which antibiotics will be effective against bacterial infections–from private institutions knows that hospital laboratory information management systems (LIMS) all differ from one another, even when they supposedly come from the same software vendor. These systems were never designed for public health information extraction, but for billing. To the extent these LIMS inform patient care, they are designed to provide the minimal amount of information a physician needs to treat: anything more, such as making data extraction easy for public health agencies, costs more, so it’s not a priority (to say the least). One reason public health agencies have built sentinel surveillance systems is that it is so difficult to get this information from the existing healthcare system*.

Second, as I noted at the beginning, our ‘big data’ is not harnessed for the public good. When it is, in my experience, the companies doing the work do not have the experience, though they do have the resources–often an excess of resources–to do what needs to be done. These companies are in an excellent position to poach the expertise that exists in public health and other science agencies. Why work for a lower salary doing the same thing in government that one could do in the private sector?** This, in turn, makes public health agencies more dependent on these companies, and often leaves them with inferior systems.

If you want good public health systems, then you have to pay for them. You have to keep this expertise in the government. While that might not be possible with salaries, a good working environment and benefits help. Part of that good working environment involves adequate resources and personnel, so workers aren’t robbing Peter to pay Paul. You have to build the expertise in house, because, if you don’t, then you’ll pay more to an external company, and often fail to get what you need. When we contrast the small size of public health budgets, much of which has little to do with electronic data collection, with the massive amounts we spend on private ‘tech’ data collection, it becomes obvious how underfunded these systems are.

You get what you pay for, and we didn’t pay for it.

*This is one more reason why we need a universal healthcare system, regardless of how BERNIE WILL PAY FOR IT? and other concerns.

**The only advantage the public sector can have is job security, though agencies are using contractors more and more, especially in the information tech areas, so even that isn’t the advantage it once was. While this pattern has accelerated under Republican rule, it began under Al Gore’s ‘reinventing government’ initiative.
