Data Week: How to collate data

Welcome to Plugging the Gap (my email newsletter about Covid-19 and its economics). In case you don’t know me, I’m an economist and professor at the University of Toronto. I have written lots of books including, most recently, one on Covid-19. You can follow me on Twitter (@joshgans) or subscribe to this email newsletter here.

This week is data week at this newsletter. Today’s newsletter looks at the efforts of economist Emily Oster to collate Covid-19 data from schools and childcare centres around the US, an effort that is already bearing fruit.

Emily Oster is well known for being the rational voice of parenting. She has written books such as Expecting Better and Cribsheet and has a regular newsletter, Parent Data. She had already launched a site to explain Covid to ordinary folks. A few months back, she realised that public data on Covid-19 outbreaks and transmission in schools and childcare centres was woefully bad. When these reopened in August, there would be little ability to track cases and inform good public policy. So she turned to crowdsourcing the data directly from those places and collating it in a way that would be of use to anyone. The result today is the Covid-19 School Response Dashboard.

She wrote:

Our goal was to start not with case counts of Covid-19 but with schools. We wanted to ask schools how they were opening (or not), what mitigation strategies they were using, and to describe their enrollment and staffing levels. And then we wanted to ask them about Covid-19 cases. But only once we had the context.

The highly decentralized, fragmented American school system makes this kind of data collection difficult, and it may explain why a coordinated Covid-19 response in schools has been so hard. Reporting requirements vary from district to district and from state to state. Even in states with detailed coronavirus school case dashboards, such as Tennessee, the group that creates the dashboard, usually a state Covid-19 response team, often does not have good access to underlying enrollment data.

Private schools have little or no reporting requirements for coronavirus, but in many locations they are the only ones to open. These private schools are an opportunity to learn about what might happen when public schools open in these areas, but only if we have data.

This fragmentation means there is no centralized location to look for the context we need. But it also provides opportunities: Since there is so much individual decision-making, there is enormous variation across the country in school reopening plans. This variation provides an opportunity to learn about the effectiveness of different reopening strategies. But, again, only if we have the data.

I can’t do it justice. It is a dream for anyone interested in what is happening in schools. It covers 1,006 schools across 48 states with 528,000 students and 54,000 staff. They even set it up so you can do your own analysis. It represents the pinnacle of what can be achieved at so little cost in the digital age.

BusinessWeek has the story of how this all came to be. A highly recommended read. Here is a taste:

The ability to do a true cost-benefit analysis—for Oster or any of us—is hamstrung by the lack of comprehensive infection data from child-care settings. So Oster and her team at another of her projects, the website Covid Explained, began collecting their own data. “Part of it is, honestly, I’m looking to shame the world, the CDC, states, whoever, who are telling us it’s impossible to learn from this,” she says. “I’m a lady with a newsletter. You should improve your data-collection efforts.”

Preliminary Findings

It is early days yet, but Oster has begun to notice some trends. The dashboard reinforces the idea that context really matters:

And we have information on Covid-19 cases, at least in the first weeks of school. So far, the numbers are small. In our data, as of Sunday, confirmed case rates in students are 0.073 percent and, in staff, 0.14 percent. That means, in a school of 1,350 students you’d expect one case every two weeks and, in a staff of 100, one case about every 14 weeks. These numbers are about three times as high if we include suspected cases.

The top-line numbers are usually what people ask about first, but by starting with the context we can look at all sorts of additional information.

For example: In some school districts, staff are working in person and students are not in person. Staff suspected and confirmed case rates in these schools look similar to schools that have students in person (although all are low), which suggests that staff may be spreading the coronavirus to each other, or these cases may be the result of general community spread. Another simple finding: Private schools in our data have lower infection rates, which seems to reflect, at least in part, their demographics and the fact that they do more mitigation.
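The "one case every two weeks" and "one case about every 14 weeks" figures quoted above follow from simple arithmetic on the case rates. Here is a minimal sketch of that calculation in Python. The function names are hypothetical, and the two-week reporting period is an assumption inferred from the wording of the quote rather than something the dashboard states here.

```python
def expected_cases(rate_percent, population):
    """Expected cases in one reporting period, given a case rate in percent."""
    return rate_percent / 100 * population


def weeks_per_case(rate_percent, population, period_weeks=2):
    """Average weeks between cases, assuming the rate stays constant.

    period_weeks=2 is an assumption inferred from the quoted passage.
    """
    return period_weeks / expected_cases(rate_percent, population)


# Figures quoted above: 0.073% of students, 0.14% of staff.
student_cases = expected_cases(0.073, 1350)  # school of 1,350 students
staff_weeks = weeks_per_case(0.14, 100)      # staff of 100

print(f"Student cases per period: {student_cases:.1f}")
print(f"Weeks per staff case: {staff_weeks:.1f}")
```

Under those assumptions, a school of 1,350 students sees roughly one confirmed case per period, and a staff of 100 sees one roughly every 14 weeks, in line with the quoted figures.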

And there’s more. This is from The Atlantic:

Our data on almost 200,000 kids in 47 states from the last two weeks of September revealed an infection rate of 0.13 percent among students and 0.24 percent among staff. That’s about 1.3 infections over two weeks in a school of 1,000 kids, or 2.2 infections over two weeks in a group of 1,000 staff. Even in high-risk areas of the country, the student rates were well under half a percent. (You can see all the data here.)

School-based data from other sources show similarly low rates. Texas reported 1,490 cases among students for the week ending on September 27, with 1,080,317 students estimated at school—a rate of about 0.14 percent. The staff rate was lower, about 0.10 percent.

These numbers are not zero, which for some people means the numbers are not good enough. But zero was never a realistic expectation. We know that children can get COVID-19, even if they do tend to have less serious cases. Even if there were no spread in schools, we’d see some cases, because students and teachers can contract the disease off campus. But the numbers are small—smaller than what many had forecasted.

Predictions about school openings hurting the broader community seem to have been overblown as well. In places such as Florida, preliminary data haven’t shown big community spikes as a result of school openings. Rates in Georgia have continued to decline over the past month. And although absence of evidence is not evidence of absence, I’ve read many stories about outbreaks at universities, and vanishingly few about outbreaks at the K–12 level.
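The rates in The Atlantic excerpt above can be checked with a little arithmetic. The sketch below, in Python with hypothetical function names, recomputes the Texas student rate from the reported counts and converts the percentage rates into cases per 1,000 people. It uses only the numbers quoted above.

```python
def rate_percent(cases, population):
    """Case rate as a percentage of the population."""
    return cases / population * 100


def per_thousand(rate_pct):
    """Convert a percentage rate into cases per 1,000 people."""
    return rate_pct / 100 * 1000


# Texas, week ending September 27: 1,490 cases among 1,080,317 students.
texas_student_rate = rate_percent(1490, 1080317)

# Dashboard rates from the excerpt: 0.13% (students), 0.24% (staff).
students_per_1000 = per_thousand(0.13)
staff_per_1000 = per_thousand(0.24)

print(f"Texas student rate: {texas_student_rate:.2f}%")
print(f"Students per 1,000: {students_per_1000:.1f}")
print(f"Staff per 1,000: {staff_per_1000:.1f}")
```

The Texas rate comes out at about 0.14 percent and the student figure at 1.3 per 1,000, matching the excerpt. A staff rate of 0.24 percent works out to 2.4 per 1,000, slightly above the 2.2 quoted. The excerpt does not say why, though rounding in the published rate is one possible explanation.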

This is important. Despite earlier fears, schools appear to be relatively safe and do not appear to be drivers of outbreaks. Some say we should never gamble with children, and that anything short of 100% certainty qualifies as ‘a gamble’. But there are costs to keeping kids at home too, which is itself a gamble. There is only one way to grapple with fears, and that is by taking what data we do have and examining it closely.

All this gives me considerable comfort.

What about Ontario?

So Oster has covered the US. What about Canada? There is data. The Ontario government is tracking cases in schools and reporting them by school. The data is available but it is bare-bones. You would have to do considerable work even to keep track of the nature of any outbreak, let alone correlate the data with other key variables such as local outbreaks. It offers nothing like the richness of Oster’s framework.

This is a problem. We are entering the second wave and, with that, a discussion as to whether to close down schools. Thankfully, the government’s focus is to shut down other things before schools. But we do need to understand whether schools are a problem going forward. Oster’s data will actually give us clues, but only if we have more granular data for our own schools.

What did I miss?