As the Covid-19 pandemic continues, the proliferation of models of the spread of the virus rivals the spread of the virus itself. Looking at these models is dizzying, and their accuracy so far, especially with respect to heterogeneity across regions, is unimpressive. Of course, modeling an ongoing pandemic is hard, but recently a more fundamental problem has started to bother me: perhaps the models we’re seeing are not the models we need. I do not know if the models we need exist and I’m unaware of them, if they don’t exist but could, or if they are beyond our grasp. What models do I mean? It pains me as a physicist to write this, but: detailed ones. I will explain, beginning with an analogy.
Suppose delivery routes in your city are, on average, 30 miles long. You have a delivery service and you want to know how much fuel in total your vans will need to make 5 trips; this depends on the distance traveled. You know the exact routes the vans take. Do you conclude that the total distance is 150 miles? Of course not! You don’t take the generic solution to the problem, you take the specific solution that’s the sum of the five actual distances that your vans will travel. This might be close to 150; it might not. If we wanted to make predictions that were optimal over a wide range of routes, or a wide range of potential universes of delivery services, we’d use a general model, but otherwise, specificity wins.
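The arithmetic behind the analogy can be made concrete with a tiny sketch. The five route lengths below are invented purely for illustration; only the 30-mile average and the five trips come from the example above.

```python
# Hypothetical route lengths in miles (made-up numbers, for illustration only).
city_average_miles = 30.0
actual_routes_miles = [12.5, 48.0, 22.0, 61.5, 35.0]

# Generic answer: the city-wide average times the number of trips.
generic_estimate = city_average_miles * len(actual_routes_miles)

# Specific answer: add up the routes these particular vans will drive.
specific_total = sum(actual_routes_miles)

print(f"generic estimate: {generic_estimate:.1f} miles")
print(f"specific total:   {specific_total:.1f} miles")
```

With other made-up routes the two numbers could just as easily agree; the point is only that the specific sum is the one that determines the fuel bill.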
For disease spreads, we can imagine various modeling approaches. There are:
(1) Simple differential equation based models, for example SIR models. These are elegant and interesting, describing in a generic sense the rise and fall of diseased populations as a function of just a few parameters. Any graduate student in the sciences should be able to read, write, and simulate the equations of such a model, and even fit data to the model to estimate values of the parameters. (Keep in mind, of course, that one can always fit any data to any model; this in itself doesn’t tell you whether the model is any good.) This is an excellent approach if one wants to gain a general understanding of epidemic spread, just as minimal models are, throughout science, an excellent way to learn about the general features of all sorts of phenomena. A simple parameter to extract from such a model is R0, the basic reproduction number, which characterizes how many secondary cases are generated per infection in a completely susceptible population; R0 > 1 indicates increasing infection. (See, however, below…)
(2) Models that include network connectivity. Spatial structure is an important part of real-life transmission — I can only transmit a contagious disease to someone I contact. We can combine models like the SIR model with graphs of varying degrees and types of connectivity. Lots of people do this. Again, there are general insights to be gained.
(3) Models that include the realistic connectivity networks of the world we live in. People in different regions, or with different lifestyles, have different patterns of connectivity with others. An exact (and absurdly complex) map would show each person’s connections with others; an approximation to this would have patterns of transport, types of housing, densities and occupancies of buildings, and so on represented by samples drawn from the appropriate distributions. A New York City subway rider does not have the same connectivity graph as a bike commuter riding through a park. With information on actual structure and habits, we can model infection dynamics on realistic networks. This has been done in the past! See this 2018 study, for example: “Measurability of the epidemic reproduction number in data-driven contact networks” [1]. Notably, the authors write: “We find that the classical concept of the basic reproduction number is untenable in realistic populations, and it does not provide any conceptual understanding of the epidemic evolution.”
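To make approach (1) concrete, here is a minimal sketch of a standard SIR model, integrated with simple forward Euler steps. The parameter names and values (`beta`, `gamma`, the initial infected fraction, the step size) are my own illustrative assumptions and are not fit to any data. In this model R0 = beta / gamma.

```python
def simulate_sir(beta, gamma, i0=1e-4, days=300, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    s, i, r are fractions of the population. beta is the transmission
    rate and gamma the recovery rate (both per day), so R0 = beta / gamma.
    Returns a list of (t, s, i, r) tuples.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt   # S -> I
        new_recoveries = gamma * i * dt      # I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


def peak_infected(history):
    """Largest infected fraction reached during the run."""
    return max(i for _, _, i, _ in history)


gamma = 0.1  # illustrative value: a 10-day mean infectious period
for beta in (0.08, 0.25):
    hist = simulate_sir(beta, gamma)
    print(f"R0 = {beta / gamma:.1f}: "
          f"peak infected fraction = {peak_infected(hist):.4f}, "
          f"final recovered fraction = {hist[-1][3]:.4f}")
```

Note what is absent: there is no geography, no households, no subway. Everyone mixes with everyone, which is exactly the simplification that approaches (2) and (3) relax.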
Why? Because the network matters! Again quoting from the above paper, “the heterogeneity and clustering of human interactions (e.g., contacts between household members, classmates, work colleagues) alter[s] the standard results of fundamental epidemiological indicators, such as the reproduction number and generation time over the course of an epidemic.”
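A toy illustration of why the network matters, in the spirit of approach (2): the sketch below runs the same stochastic infection process, with the same per-contact transmission and recovery probabilities, on two networks that have the same number of people and the same number of contacts but different wiring. Everything here (the network sizes, the probabilities, the function names) is a hypothetical choice of mine for illustration; it is not the method of the paper quoted above.

```python
import random


def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    neighbours = {node: set() for node in range(n)}
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            other = (node + offset) % n
            neighbours[node].add(other)
            neighbours[other].add(node)
    return neighbours


def random_graph(n, n_edges, rng):
    """Place n_edges links between uniformly chosen distinct pairs of nodes."""
    neighbours = {node: set() for node in range(n)}
    placed = 0
    while placed < n_edges:
        a, b = rng.sample(range(n), 2)
        if b not in neighbours[a]:
            neighbours[a].add(b)
            neighbours[b].add(a)
            placed += 1
    return neighbours


def outbreak_size(neighbours, p_transmit, p_recover, n_seeds, rng):
    """Discrete-time stochastic SIR; returns how many nodes were ever infected."""
    infected = set(rng.sample(sorted(neighbours), n_seeds))
    ever_infected = set(infected)
    while infected:
        newly_infected = set()
        for node in infected:
            for other in neighbours[node]:
                if other not in ever_infected and rng.random() < p_transmit:
                    newly_infected.add(other)
        recovered = {node for node in infected if rng.random() < p_recover}
        infected = (infected - recovered) | newly_infected
        ever_infected |= newly_infected
    return len(ever_infected)


def mean_outbreak(neighbours, rng, runs=20):
    """Average outbreak size over several runs with identical disease parameters."""
    sizes = [outbreak_size(neighbours, 0.2, 0.3, 5, rng) for _ in range(runs)]
    return sum(sizes) / runs


rng = random.Random(0)
n, k = 500, 4
ring = ring_lattice(n, k)                 # contacts only with nearby nodes
mixed = random_graph(n, n * k // 2, rng)  # same number of contacts, random wiring
mean_ring = mean_outbreak(ring, rng)
mean_mixed = mean_outbreak(mixed, rng)
print(f"mean outbreak size, ring lattice: {mean_ring:.1f} of {n}")
print(f"mean outbreak size, random graph: {mean_mixed:.1f} of {n}")
```

The disease is identical in both cases; only the contact structure differs. One would expect the randomly wired network to support much larger outbreaks than the ring, where infection can only creep between neighbours, so a single "effective" parameter fit to one would say little about the other.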
This shouldn’t surprise us, for two reasons. The general approach of (2) above already tells us that geometry matters, not for the existence of infectious spreading but for its characteristics. It strikes me as analogous to percolation theory in physics — asking what fraction of sites on a lattice I’ll have to randomly occupy with a stepping stone before I can cross from one side to the other. That fraction varies a lot on different lattices, though it exists for all of them.
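The percolation analogy is easy to play with numerically. The sketch below is a rough Monte Carlo estimate, with lattice size, trial count, and function names chosen by me for illustration. It randomly occupies sites on a square grid and checks whether one can step from the left edge to the right edge, either moving only to the four nearest neighbours or also allowing diagonal steps. In the infinite-lattice limit the known thresholds are roughly 0.59 and 0.41, respectively; on a small grid the transition is smeared out.

```python
import random
from collections import deque

FOUR_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
EIGHT_NEIGHBOURS = FOUR_NEIGHBOURS + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def crosses(occupied, steps):
    """True if occupied sites connect the left column to the right column."""
    size = len(occupied)
    queue = deque((row, 0) for row in range(size) if occupied[row][0])
    seen = set(queue)
    while queue:  # breadth-first search over occupied sites
        row, col = queue.popleft()
        if col == size - 1:
            return True
        for d_row, d_col in steps:
            nxt = (row + d_row, col + d_col)
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and occupied[nxt[0]][nxt[1]] and nxt not in seen):
                seen.add(nxt)
                queue.append(nxt)
    return False


def crossing_fraction(p, steps, size=30, trials=200, seed=0):
    """Fraction of random lattices, each site occupied with probability p, that can be crossed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        occupied = [[rng.random() < p for _ in range(size)] for _ in range(size)]
        hits += crosses(occupied, steps)
    return hits / trials


for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    four = crossing_fraction(p, FOUR_NEIGHBOURS)
    eight = crossing_fraction(p, EIGHT_NEIGHBOURS)
    print(f"p = {p:.1f}: crossing fraction {four:.2f} (4 neighbours), "
          f"{eight:.2f} (8 neighbours)")
```

Both connectivity rules show a transition from "never crosses" to "always crosses," but at different occupation fractions: the existence of the threshold is general, its value is a property of the particular lattice.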
More importantly, our goal in analyzing an ongoing pandemic is to understand that pandemic, not the general case. We care about the spread of infection in the particular networks of roads, buildings, jobs, hospitals, etc., that characterize, and differently characterize, New York, Oregon, Singapore, etc. It’s only minimally helpful to apply the same naive model to all, even as a curve-fitting exercise, because while we can always fit effective parameters, the regions’ curves themselves may be profoundly different.
Again to use an analogy, this time from my own lab’s work: We recently examined the surprisingly large impact of weak antibiotics on gut microbes, looking in zebrafish as a model organism (paper; blog post). In part of this work, we realized that we could understand bacterial dynamics with a simple biophysical model that makes testable predictions and that has only a few parameters that could be fit from the data. We are quite happy about this — the existence of a useful minimal model suggests that we understand the mechanisms that broadly govern our bacterial populations. However, if we cared about one specific fish, and that fish only, we wouldn’t use this general model except as a very rough guide — we’d instead carefully map that fish’s intestinal anatomy, the fluctuations of its particular bacteria, etc.
Why, then, do we not make realistic (and much more detailed) pandemic models?
One possibility is that we do, but these models are not widely publicized. I would like to believe this, but I am doubtful. The influential IHME model is simply a region-by-region curve fit to a Gaussian error function (!!). Most others, e.g. this one that does county-by-county fitting in Virginia, this one from UPenn, and all but one of the models listed here, ignore spatial structure. The one exception is from the “GLEAM” project, which involves the last author of the 2018 realistic network paper noted above. This looks fascinating, but there isn’t a lot of information at the web site, and its most recent projection was posted 17 days ago.
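For concreteness, here is what a curve of this type looks like: a cumulative count that rises as a scaled error function of time. This is a sketch of the general functional form only, with hypothetical parameter names and values of my choosing; it is not IHME's code or their fitted numbers.

```python
import math


def cumulative_erf(t, total, rate, midpoint):
    """Cumulative count at time t for an error-function-shaped epidemic curve.

    total    -- final cumulative count the curve saturates at
    rate     -- how sharply the curve rises (per day)
    midpoint -- day on which half of the total has been reached
    """
    return 0.5 * total * (1.0 + math.erf(rate * (t - midpoint)))


# Illustrative values only, not fit to any data.
total, rate, midpoint = 1000.0, 0.1, 40.0

# Daily increments implied by the cumulative curve.
daily = [cumulative_erf(t + 0.5, total, rate, midpoint)
         - cumulative_erf(t - 0.5, total, rate, midpoint)
         for t in range(0, 81)]

for day in (20, 30, 40, 50, 60):
    print(f"day {day}: daily increment = {daily[day]:.2f}")
```

Three numbers per region describe the whole curve, the daily counts are forced to fall exactly as they rose, and nothing in the function knows anything about who contacts whom.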
Another possibility is that it’s much easier to make poorer models. Certainly it seems like everyone these days is doing it. We’ve progressed a little bit beyond a few weeks ago when I (really) saw graphs fitting simple exponentials to mortality data, with y-axes extending to the billions, feeding hysterical conclusions. We haven’t progressed very far, though, and we’re inundated with mediocre, easy models. Ignoring them is probably best, but it’s hard to do.
I’d like to imagine that, as with the Manhattan Project, there is a sequestered band of scientists somewhere doing the difficult work of slogging through demographic, transport, and city planning data to construct realistic pandemic models, and that they’re so occupied with this real work that they aren’t wasting time Tweeting. I am skeptical they exist, though. They should, however: such a group would be a good resource for the next pandemic.
Today’s Illustration
Extracting a cell from an embryo. A worse version of an illustration I made for the book chapter I’m presently working on — see the link!
— Raghuveer Parthasarathy, April 21, 2020
[1] Q.-H. Liu, M. Ajelli, A. Aleta, S. Merler, Y. Moreno, A. Vespignani, Measurability of the epidemic reproduction number in data-driven contact networks. PNAS 115, 12680-12685 (2018).
Hi Raghu,
As always, thank you for the post. You raise some very good points, and I think another reason why we haven’t seen very many detailed studies might have to do with the way resources are being, or have been, partitioned. For example, while governors may have the largest say in how their state responds to the pandemic (which is probably a good thing), they have to fight for a share of a large pot of resources to do so. This points to a flaw in resource allotment from the beginning, but it also points to a weird dynamic in determining the quantity of resources allotted. Perhaps it isn’t as strategic for a state to give highly tailored information for its particular situation when the federal government sees 49 other states making a simpler argument for maybe a larger piece of the pie. Perhaps our leadership (or lack thereof) doesn’t allow for complex arguments in the face of complex problems.
Related question: Are you or any colleagues currently teaming up to develop such a model?
Hope you are well.
Thanks! About “Perhaps our leadership (or lack thereof) doesn’t allow for complex arguments in the face of complex problems.” I agree, though I’m sadly realizing that most people, not just our leadership, have an even bigger aversion to complex arguments than I thought!
About developing a better model: no, I’m not working on this. I could give a bunch of reasons that aren’t wholly satisfying. If you want, I can tack these onto the post!
What specific questions do you want to use the models to answer?
It’s not a good idea to add model complexity because “X matters” rather than because there’s a specific question you have in mind and a plausible argument for why modelling X in detail is relevant to answering that question.
So a third possibility is that the kind of detail you want isn’t considered necessary for answering the kinds of questions these models are used for. Maybe the model parameters are fit with such noisy/stale data that it would make no sense to ask detailed questions of the resulting models. In this case, the kinds of details you suggest including may not be relevant for the kinds of questions which *are* asked.
There are plenty of bad models out there — particularly the U Washington IHME one.
But how much looking have you done for good ones? Have you read the Imperial College London paper from March 16? Their model seems quite good, and well-suited to simulate the effects of various policies.
Some colleagues and I (at Leeds and Cambridge) are starting to do what you suggest, but it is data and time hungry. We did apply for a grant to do this a couple of years ago, but the need was not appreciated then. In a way this is quite mundane (where people go each day and who they meet), but it would be so useful for many tasks (e.g. crime and transport modelling as well as epidemics).
Great! Sorry for my slow “approval” of this comment to be posted — WordPress used to alert me if comments appear, but this doesn’t seem to be happening anymore, so I have to remember to check manually.
https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html