Column: When tracking Tennessee COVID-19 data, mind the backfill

(Photo: Getty Images)

To say that I’m knowledgeable about Tennessee’s COVID-19 data as a private citizen would be accurate. After nearly four months of tracking new hospitalization data near-daily, I’ve learned the patterns in the data and how the state reports it. Most importantly, though, I’ve also learned about the flaws in the current hospitalization data.

On March 30, 2020, I started a Google Docs spreadsheet tracking COVID-19 numbers reported by the State of Tennessee. I started it because, at the time, the state wasn’t providing a historical record of cases that I felt was usable.

I’ve been updating this same spreadsheet nearly every night with numbers reported by the state daily for Tennessee and for my county. Since I began, I’ve tracked every major data addition and change to the COVID-19 data released.

I began to demand more data, even tweeting about it several times. Tennessee should have been reporting current and pending COVID-19 hospitalizations, as most states had been since early in the pandemic, but it wasn’t. I knew they weren’t listening to me, some rando on Twitter barking orders at them like some kind of… citizen.

So I was elated when they finally started releasing this data, with figures going back to May 29. Eventually, I got historical data dating back even further. I added new columns to my spreadsheet and started recording the data as far back as the state had it.

You see, numbers such as total cases, total tests, and total recovered are reported and, in most instances, don’t change after they’ve been reported for that day. Those who track this data daily, such as myself, local news organizations, journalists, and the wonderful folks at The COVID Tracking Project, see this data for what it is: a snapshot for the day.

We record the data as it stands each day and rely on it not changing afterward, so that we can use it to draw trend lines, spot emerging patterns, and make educated decisions. That works well, as long as the numbers we record don’t change when we move on to tomorrow, the weekend, or even next week.

But the State of Tennessee backfills the current and pending hospitalization numbers, current and pending patients in the ICU, and current and pending patients on ventilators from COVID-19. “Backfill” here means once the initial report for a date has been made, the numbers in the past are adjusted, or “backfilled,” to be more accurate for that day when more data is available, often a day or two later or even up to a week or two later.

Tennessee routinely reports one current hospitalization number in their graphics on social media. However, in the days that follow, they continually update the numbers for previous dates. Anyone who isn’t aware that the state is doing this could be basing decisions or judgments on numbers lower than what reality actually holds.

For example, on Sept. 14, 2020, the Tennessee Department of Health tweeted out the following graphic, listing current hospitalizations as 703 (pink box and arrow added). 

Chart from Tennessee Department of Health.

Keep two key things in mind when you see this number. First, it is for Sept. 13: as this graphic shows, the state reports current hospitalizations with a one-day delay. Second, it counts only current “positive” cases and excludes current “pending” cases.

By Sept. 21, the state showed 793 current positive hospitalizations for Sept. 13, 90 more than the Department of Health originally reported. Tennessee “backfilled” the data to add those additional 90 patients.

Sept. 19 Tennessee Department of Health Report.
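For anyone keeping their own spreadsheet, one way to avoid losing this history is to store every release as its own snapshot instead of overwriting a single value per date. The Python sketch below is a minimal illustration, assuming each release can be reduced to a mapping of data dates to current positive hospitalizations. The names `Release`, `data_date_for` and `revisions_for` are hypothetical, and the only figures used are the Sept. 13 numbers described above (703 in the Sept. 14 tweet, 793 by Sept. 21), with the one-day reporting delay applied.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Release:
    """Hypothetical record of one day's published numbers."""
    published: date          # day the state posted the numbers
    values: dict[date, int]  # data date -> current positive hospitalizations


def data_date_for(published: date) -> date:
    """Headline figures lag one day: a Sept. 14 graphic describes Sept. 13."""
    return published - timedelta(days=1)


def revisions_for(day: date, releases: list[Release]) -> list[tuple[date, int]]:
    """Return every (published, value) pair reported for `day`, oldest first."""
    return sorted((r.published, r.values[day]) for r in releases if day in r.values)


sept_13 = data_date_for(date(2020, 9, 14))
releases = [
    Release(date(2020, 9, 14), {sept_13: 703}),  # figure in the Sept. 14 tweet
    Release(date(2020, 9, 21), {sept_13: 793}),  # same date after backfilling
]

history = revisions_for(sept_13, releases)
first_value, latest_value = history[0][1], history[-1][1]
print(f"first reported {first_value}, now {latest_value}, "
      f"backfilled {latest_value - first_value:+d}")
# prints: first reported 703, now 793, backfilled +90
```

Keeping every snapshot, rather than only the latest value, preserves both numbers: the one people saw on the day and the one the state settled on later.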

I compared the current hospitalization numbers in the graphic the Tennessee Department of Health tweets every day with what the state currently shows for those same dates, for most of September.

Date | TN tweeted current hospitalizations | Actual number now | Difference, tweeted vs. actual | TN tweeted change from previous day | Actual change from previous day
9/2/2020 | 897 | | | |
9/3/2020 | 865 | 865 | 0 | -32 | -32
9/4/2020 | 846 | 857 | +11 | -19 | -8
9/5/2020 | 819 | 848 | +29 | -27 | -9
9/6/2020 | 826 | 826 | 0 | -18 | -22
9/7/2020 | 844 | 846 | +2 | +18 | +20
9/8/2020 | 862 | 866 | +4 | +16 | +20
9/9/2020 | 848 | 858 | +10 | -18 | -8
9/10/2020 | 808 | 808 | 0 | -50 | -50
9/11/2020 | 805 | 832 | +27 | -3 | +24
9/12/2020 | 696 | 789 | +93 | -109 | -43
9/13/2020 | 703 | 793 | +90 | -14 | +4
9/14/2020 | 762 | 831 | +69 | +35 | +38
9/15/2020 | 791 | 814 | +23 | -40 | -17
9/16/2020 | 823 | 823 | 0 | +9 | +9
9/17/2020 | 735 | 806 | +71 | -88 | -17
9/18/2020 | 706 | 773 | +71 | -100 | -29
9/19/2020 | 662 | 762 | +100 | -115 | -11
9/20/2020 | 693 | 704 | +11 | -51 | -58
9/21/2020 | 766 | 766 | 0 | +62 | +62
9/22/2020 | 779 | | | +13 |


Every time there was a backfill, the number went up, not down (evident in the third column of numbers). In most cases, the day-over-day change the state reported made the trend look better than the revised numbers show, with drops looking steeper and increases looking smaller (compare the last two columns of numbers). This isn’t the fault of the state, in my opinion. They are just doing the math based on the previous number they reported.
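The third column and the last column are simple arithmetic on the first two. The minimal Python sketch below recomputes them for Sept. 10 through Sept. 16, using the tweeted and revised figures from the table; the function name `compare` is hypothetical. It leaves out the state’s own day-over-day change, which comes from the tweeted graphics rather than being calculated here.

```python
# Tweeted vs. revised ("actual number now") current hospitalizations,
# Sept. 10-16, 2020, copied from the table.
rows = [
    ("9/10", 808, 808),
    ("9/11", 805, 832),
    ("9/12", 696, 789),
    ("9/13", 703, 793),
    ("9/14", 762, 831),
    ("9/15", 791, 814),
    ("9/16", 823, 823),
]


def compare(rows):
    """Yield (date, backfill difference, actual change from previous day)."""
    previous_actual = None
    for day, tweeted, actual in rows:
        difference = actual - tweeted
        # The first row has no earlier revised value to compare against.
        change = None if previous_actual is None else actual - previous_actual
        yield day, difference, change
        previous_actual = actual


results = list(compare(rows))
for day, difference, change in results:
    print(day, f"{difference:+d}", "n/a" if change is None else f"{change:+d}")

# Every backfill in this window raised the number or left it unchanged.
assert all(difference >= 0 for _, difference, _ in results)
```

The differences come out as 0, +27, +93, +90, +69, +23 and 0, and the revised day-over-day changes as +24, -43, +4, +38, -17 and +9, matching the table for those dates.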

People who report on pandemic numbers record them as they are reported and use that data to produce trend lines, without going back to see whether the numbers have changed. Incorrect assumptions built on those early figures can then ripple out to everyone who relies on the data.

Another implication I see is for the average passerby who occasionally sees the graphics that the Department of Health and local news stations put out (using just the number reported that day) and who could make wildly inaccurate assumptions about the state of the pandemic.

I believe the state is being truthful when they report the number in their graphics and on their website, as that seems to be the accurate number as of that date.

The issue, I think, is how the public understands these numbers. We’ve gotten used to the number of cases, hospitalizations and deaths not changing because they are totals for the entire pandemic. However, when a number reported as “current” keeps changing after it’s been published, failing to make it abundantly clear that the figures are being backfilled for accuracy could cause more harm than good.

If we want to get an accurate picture of the impact of COVID-19 on our healthcare system or for the public to make informed health decisions, we need to use the most accurate numbers possible. 

The state doesn’t mention anywhere on their current hospitalizations page that these numbers are continuously updated for prior days. They don’t say that “data is subject to change,” or disclose that a figure reported for any given day could be revised later.

The public relies on people giving them the data they need to make important health decisions. Those who provide that data must do so accurately. Those who track this metric over time need to follow the backfills every day and update past numbers, so that the people relying on those numbers reach accurate decisions and conclusions.
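For trackers, following the backfills can be as simple as re-reading the full history each day, overwriting stored values and logging what changed. Below is a minimal Python sketch, assuming the latest release has already been loaded into a dictionary of ISO-format dates to values. `apply_release` is a hypothetical helper, not part of any state tool, and the example uses the Sept. 12 through Sept. 14 figures from the table above.

```python
def apply_release(stored: dict[str, int], release: dict[str, int]) -> list[str]:
    """Overwrite stored values with the latest release; return a log of backfills.

    `stored` is the tracker's own series (ISO date -> value), updated in place.
    Dates seen for the first time are added without a log entry.
    """
    log = []
    for day, new_value in sorted(release.items()):
        old_value = stored.get(day)
        if old_value is not None and old_value != new_value:
            log.append(f"{day}: {old_value} -> {new_value} ({new_value - old_value:+d})")
        stored[day] = new_value
    return log


# Values as first tweeted, then the state's current figures for the same dates.
stored = {"2020-09-12": 696, "2020-09-13": 703}
latest = {"2020-09-12": 789, "2020-09-13": 793, "2020-09-14": 831}

log = apply_release(stored, latest)
for line in log:
    print(line)
# 2020-09-12: 696 -> 789 (+93)
# 2020-09-13: 703 -> 793 (+90)
```

Using ISO dates (year-month-day) as keys keeps the log in chronological order when the dates are sorted as plain strings.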