As the Covid-19 pandemic continues, the proliferation of models of the spread of the virus rivals the spread of the virus itself. Looking at these models is dizzying, and their accuracy so far, especially with respect to heterogeneity across regions, is unimpressive. Of course, modeling an ongoing pandemic is hard, but recently a more fundamental problem has started to bother me: perhaps the models we’re seeing are not the models we need. I do not know if the models we need exist and I’m unaware of them, if they don’t exist but could, or if they are beyond our grasp. What models do I mean? It pains me as a physicist to write this, but: detailed ones. I will explain, beginning with an analogy.
Suppose delivery routes in your city are, on average, 30 miles long. You have a delivery service and you want to know how much fuel in total your vans will need to make 5 trips; this depends on the distance traveled. You know the exact routes the vans take. Do you conclude that the total distance is 150 miles? Of course not! You don’t take the generic solution to the problem, you take the specific solution that’s the sum of the five actual distances that your vans will travel. This might be close to 150; it might not. If we wanted to make predictions that were optimal over a wide range of routes, or a wide range of potential universes of delivery services, we’d use a general model, but otherwise, specificity wins.
For disease spreads, we can imagine various modeling approaches. There are:
(1) Simple differential equation based models, for example SIR models. These are elegant and interesting, describing in a generic sense the rise and fall of diseased populations as a function of just a few parameters. Any graduate student in the sciences should be able to read, write, and simulate the equations of such a model, and even fit data to the model to estimate values of the parameters. (Keep in mind, of course, that one can always fit any data to any model; this in itself doesn’t tell you whether the model is any good.) This is an excellent approach if one wants to gain a general understanding of epidemic spread, just as minimal models are, throughout science, an excellent way to learn about the general features of all sorts of phenomena. A simple parameter to extract from such a model is R0, the basic reproduction number, which characterizes how many secondary cases are generated per infection in a completely susceptible population; R0 > 1 indicates increasing infection. (See, however, below…)
(2) Models that include network connectivity. Spatial structure is an important part of real-life transmission — I can only transmit a contagious disease to someone I contact. We can combine models like the SIR model with graphs of varying degrees and types of connectivity. Lots of people do this. Again, there are general insights to be gained.
(3) Models that include the realistic connectivity networks of the world we live in. People in different regions, or with different lifestyles, have different patterns of connectivity with others. An exact (and absurdly complex) map would show each person’s connections with others; an approximation to this would have patterns of transport, types of housing, densities and occupancies of buildings, and so on represented by samples drawn from the appropriate distributions. A New York City subway rider does not have the same connectivity graph as a bike commuter riding through a park. With information on actual structure and habits, we can model infection dynamics on realistic networks. This has been done in the past! See this 2018 study, for example: “Measurability of the epidemic reproduction number in data-driven contact networks”. Notably, the authors write: “We find that the classical concept of the basic reproduction number is untenable in realistic populations, and it does not provide any conceptual understanding of the epidemic evolution.”
Why? Because the network matters! Again quoting from the above paper, “the heterogeneity and clustering of human interactions (e.g., contacts between household members, classmates, work colleagues) alter[s] the standard results of fundamental epidemiological indicators, such as the reproduction number and generation time over the course of an epidemic.”
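A toy illustration of the point, in the spirit of approach (2) rather than (3): the sketch below runs the same stochastic SIR rules on two graphs with the same number of nodes and edges, one a ring where everyone only contacts near neighbours, the other wired at random. Everything here is an assumption made for illustration (the function names, the discrete-time update, the values of `beta` and `gamma`); it is not the method of the paper quoted above.

```python
import random

def ring_lattice(n, k):
    """Hypothetical helper: each node is linked to its k nearest neighbours on either side."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)}

def random_graph(n, n_edges, rng):
    """Hypothetical helper: n_edges distinct edges with endpoints chosen uniformly at random."""
    adj = {i: set() for i in range(n)}
    placed = 0
    while placed < n_edges:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            placed += 1
    return adj

def run_sir(adj, beta, gamma, rng):
    """Discrete-time stochastic SIR from one random seed case.

    Per time step, each infected node infects each susceptible neighbour
    with probability beta, then recovers with probability gamma.
    Returns the total number of nodes ever infected.
    """
    status = {node: "S" for node in adj}
    patient_zero = rng.randrange(len(adj))
    status[patient_zero] = "I"
    infected = [patient_zero]
    while infected:
        newly_infected = set()
        still_infected = []
        for node in infected:
            for neighbour in adj[node]:
                if status[neighbour] == "S" and rng.random() < beta:
                    newly_infected.add(neighbour)
            if rng.random() < gamma:
                status[node] = "R"
            else:
                still_infected.append(node)
        for node in newly_infected:
            status[node] = "I"
        infected = still_infected + sorted(newly_infected)
    return sum(1 for s in status.values() if s == "R")

def mean_outbreak_size(adj, beta, gamma, runs=20, seed=0):
    """Average final outbreak size over several independent runs."""
    rng = random.Random(seed)
    return sum(run_sir(adj, beta, gamma, rng) for _ in range(runs)) / runs

n, k = 1000, 2                      # assumed sizes, chosen only to run quickly
ring = ring_lattice(n, k)           # n * k edges, all local
rewired = random_graph(n, n * k, random.Random(1))  # same edge count, no locality
print("ring:  ", mean_outbreak_size(ring, beta=0.3, gamma=0.3))
print("random:", mean_outbreak_size(rewired, beta=0.3, gamma=0.3))
```

The per-contact infection and recovery probabilities are identical in the two runs; only the wiring differs. Any single "effective" reproduction number fitted to one of these outbreaks would say little about the other.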
This shouldn’t surprise us, for two reasons. The general approach of (2) above already tells us that geometry matters, not for the existence of infectious spreading but for its characteristics. It strikes me as analogous to percolation theory in physics — asking what fraction of sites on a lattice I’ll have to randomly occupy with a stepping stone before I can cross from one side to the other. That fraction varies a lot on different lattices, though it exists for all of them.
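The percolation picture is easy to play with numerically. The following is a minimal sketch, assuming site percolation on a finite n-by-n grid, with the triangular lattice represented as a square grid plus one diagonal; the function names and the grid size are my own choices for illustration. In the infinite-lattice limit the known site thresholds are about 0.593 for the square lattice and exactly 0.5 for the triangular lattice, so at an occupation fraction between the two, the lattices behave very differently.

```python
import random
from collections import deque

SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# Triangular lattice drawn as a square grid with one extra diagonal (6 neighbours).
TRIANGULAR = SQUARE + [(1, 1), (-1, -1)]

def crosses(n, p, offsets, rng):
    """Occupy each site with probability p; return True if occupied
    sites form a connected path from the left column to the right column."""
    occupied = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    queue = deque((row, 0) for row in range(n) if occupied[row][0])
    seen = set(queue)
    while queue:  # breadth-first search over occupied sites
        row, col = queue.popleft()
        if col == n - 1:
            return True
        for d_row, d_col in offsets:
            r, c = row + d_row, col + d_col
            if 0 <= r < n and 0 <= c < n and occupied[r][c] and (r, c) not in seen:
                seen.add((r, c))
                queue.append((r, c))
    return False

def crossing_probability(n, p, offsets, trials=200, seed=0):
    """Fraction of random configurations in which a crossing exists."""
    rng = random.Random(seed)
    return sum(crosses(n, p, offsets, rng) for _ in range(trials)) / trials

for p in (0.45, 0.55, 0.65):
    print(p,
          "square:", crossing_probability(30, p, SQUARE),
          "triangular:", crossing_probability(30, p, TRIANGULAR))
```

On a grid this small the transition is smeared out rather than sharp, but the shift between the two lattices should still be visible.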
More importantly, our goal in analyzing an ongoing pandemic is to understand that pandemic, not the general case. We care about the spread of infection in the particular network of roads, buildings, jobs, hospitals, etc., that characterize, and differently characterize, New York, Oregon, Singapore, etc. It’s only minimally helpful to apply the same naive model to all, even as a curve-fitting exercise, because while we can always fit effective parameters, the regions’ curves themselves may be profoundly different.
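For reference, the kind of "naive model" in question, the SIR model of item (1), fits in a few lines. This is a minimal sketch, assuming the standard well-mixed equations dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI for population fractions S, I, R, integrated with a plain Euler step; the function name, step size, and parameter values are illustrative choices, not fits to any data. In this model R0 = β/γ.

```python
def simulate_sir(beta, gamma, i0=0.001, dt=0.1, t_max=200.0):
    """Euler integration of the well-mixed SIR equations.

    s, i, r are fractions of the population. Returns a list of
    (time, s, i, r) tuples. All parameter values are illustrative.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(0.0, s, i, r)]
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory

for beta, gamma in [(0.5, 0.2), (0.1, 0.2)]:   # R0 = 2.5 and R0 = 0.5
    run = simulate_sir(beta, gamma)
    peak_infected = max(i for _, _, i, _ in run)
    print(f"R0 = {beta / gamma:.1f}: peak infected fraction = {peak_infected:.3f}")
```

Two numbers, β and γ, determine the whole curve. That is what makes the model elegant, and also why the same pair of fitted parameters cannot carry information about how New York differs from Oregon.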
Again to use an analogy, this time from my own lab’s work: We recently examined the surprisingly large impact of weak antibiotics on gut microbes, looking in zebrafish as a model organism (paper; blog post). In part of this work, we realized that we could understand bacterial dynamics with a simple biophysical model that makes testable predictions and that has only a few parameters that could be fit from the data. We are quite happy about this — the existence of a useful minimal model suggests that we understand the mechanisms that broadly govern our bacterial populations. However, if we cared about one specific fish, and that fish only, we wouldn’t use this general model except as a very rough guide — we’d instead carefully map that fish’s intestinal anatomy, the fluctuations of its particular bacteria, etc.
Why, then, do we not make realistic (and much more detailed) pandemic models?
One possibility is that we do, but these models are not widely publicized. I would like to believe this, but I am doubtful. The influential IHME model is simply a region-by-region curve fit to a Gaussian error function (!!). Most others, e.g. this that does county-by-county fitting in Virginia, this from UPenn, and all but one of the models listed here, ignore spatial structure. The one exception is from the “GLEAM” project, which involves the lead author of the 2018 realistic network paper noted above. This looks fascinating, but there isn’t a lot of information at the web site, and its last updated projection was 17 days ago.
Another possibility is that it’s much easier to make poorer models. Certainly it seems like everyone these days is doing it. We’ve progressed a little bit beyond a few weeks ago when I (really) saw graphs fitting simple exponentials to mortality data, with y-axes extending to the billions, feeding hysterical conclusions. We haven’t progressed very far, though, and we’re inundated with mediocre, easy models. Ignoring them is probably best, but it’s hard to do.
I’d like to imagine that, like the Manhattan Project, there is a sequestered band of scientists somewhere doing the difficult work of slogging through demographic, transport, and city planning data to construct realistic pandemic models, and that they’re so occupied with this real work that they aren’t wasting time Tweeting. I am skeptical they exist, though. They should, however: it would be a good resource for the next pandemic.
Extracting a cell from an embryo. A worse version of an illustration I made for the book chapter I’m presently working on — see the link!
— Raghuveer Parthasarathy, April 21, 2020
Q.-H. Liu, M. Ajelli, A. Aleta, S. Merler, Y. Moreno, A. Vespignani, Measurability of the epidemic reproduction number in data-driven contact networks. PNAS 115, 12680-12685 (2018).