Wikipedia:Reference desk/Archives/Mathematics/2020 March 18

= March 18 =

How many lives would preventing a single additional COVID-19 infection today save in all? (Galton–Watson / Poisson branching question)
I've tried asking this on Stackoverflow and Mathoverflow, but it seems to be getting voted down. I'd be so grateful if somebody here could have a look:

Consider a tree process with Poisson branching with mu = 0.9, to simulate an epidemic decaying under strong social isolation.

How many total infections would stopping a single infection at the start of the tree save in all?

If we assume an average CFR of 3.4%, how many lives is that in all?

Bonus points for an image of what typical trees look like with mu = 0.9

So, to try to make the model clearer: start with a node. That node will have n forward branches, where n = 0, 1, 2, 3, ... according to a Poisson distribution with mean 0.9. This generates 0, 1, 2, 3, ... new nodes. Repeat over all the new nodes, until an iteration produces no further new nodes. What do the trees that result from such a process look like (i.e. a sample of trees)? And, counting up the total of new nodes, what is the average of that number?
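As a quick sanity check on the numbers asked for, the process just described can be simulated directly. A Python sketch (the Poisson sampler uses Knuth's multiplication method, which is fine for a mean this small):

```python
import math
import random

def poisson(mu, rng):
    """Sample a Poisson(mu) variate (Knuth's multiplication method)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def tree_size(mu, rng):
    """Total number of nodes in one Galton-Watson tree, root included.
    Iterative breadth-first growth, so deep chains cause no recursion."""
    size, frontier = 0, 1
    while frontier:
        size += frontier                      # count this generation
        frontier = sum(poisson(mu, rng) for _ in range(frontier))
    return size

rng = random.Random(42)
samples = [tree_size(0.9, rng) for _ in range(100_000)]
mean_size = sum(samples) / len(samples)
print(f"mean tree size ~ {mean_size:.2f}")    # theory: 1/(1 - 0.9) = 10
```

With μ = 0.9 the sample mean comes out close to the theoretical 1/(1 − 0.9) = 10, i.e. the index infection plus an average of 9 further infections.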

I think the answer is 10 new infections, so that each single new infection would cost an average 0.34 lives.

But if somebody could either confirm the maths for me, or run a simulation to show me what the trees look like, I would be enormously grateful.

Motivation: Last night I managed to persuade my rabbi to cancel her service this Saturday morning, which might have had 120 people in it. How many new transmission events would there be in that sanctuary and in that kiddush hall? I don't know; I'm not an epidemiologist. But I think that the effect of each transmission event would potentially be an average of 0.34 lives(*), of people who were never there, and never had any choice about it.

Have I got this right?

I'm frantic to get a second pair of eyes on this, because if it's right, it's something everyone needs to know: the cost in lives of each single additional infection they fail to stop.

(*) In reality it would actually be more than 0.34 lives, because the UK is not yet doing strong social isolation, so there are still going to be n rounds of full-speed doubling on top of the 0.34.

And yes, I know this is a spherical cow problem. But sometimes spherical cows are useful.

Thanks in advance, Jheald (talk) 08:29, 18 March 2020 (UTC)


 * Given the expected number of infections, we can compute the expected number of fatalities by simply multiplying it by the CFR – provided that we overlook the nicety that the latter is usually defined in relation to the number of diagnosed cases. This is not an issue if we replace "infected" by "clinically ill". With μ representing the expected number of sprouts from any given tree node (where it is not relevant whether that number has a Poisson distribution or some other distribution, as long as it is independent across nodes), and E being the expected number of nodes in the tree, if E < ∞ it has to satisfy the equation E = 1 + μE. It follows that E = 1 / (1 − μ). Using the value μ = 0.9, we get E = 10. Multiplying by a CFR of 0.034 gives, indeed, 0.34. However, these nodes represent human individuals who in reality run the risk of getting infected from many sides, and what this tree model does not account for is that an individual surviving on one contact tree, so to speak, may be a fatality on another tree. The number obtained is at best an upper bound. This simple tree model only works at the onset of a potential epidemic, when virtually no one is infected yet. In the midst of an epidemic it may not be adequate for this problem; you need to consider networks. --Lambiam 12:05, 18 March 2020 (UTC)
 * The question makes an assumption that if X caught the virus from W, W caught it from U, and so on down to B catching it from A, then if you could have isolated A at the start then X would not get the disease. And while it's true that the disease would have had to take a different path to get to X, that doesn't mean B couldn't have gotten the virus from A' instead and X would still get it anyway, or that C couldn't have gotten it from B', etc. I think the only person who would have truly made a difference was the very first person to get the virus; if we could somehow go back in time, figure out who that person was, and isolate him or her before they passed the virus on to anyone else, then the whole epidemic could have been avoided in the first place. But once the genie was out of the bottle, I doubt that what would happen to a single individual would make much difference. Slowing the rate of infection, when averaged over an entire population, can make a difference, but individuals not so much. Even then, the relationship between the rate of infection and the number of people eventually infected is highly non-linear and difficult to model; assuming it's a differentiable or even continuous function may not be realistic. --RDBury (talk) 13:44, 18 March 2020 (UTC)
 * The usual mathematical models for the spread of infectious disease involve continuous functions and do not give rise to singularities. There is a connection to percolation theory, where models may exhibit singularities at the critical probability. That requires, though, that the underlying graph is infinite, as it usually is in percolation models. --Lambiam 17:24, 18 March 2020 (UTC)
 * What I was thinking about was the rate of transmission, which they call R0 in the article you just mentioned. If R0 < 1 then the disease dies out, there is no epidemic, and the total number of infected people is small, presumably proportional to the number of people initially infected. If R0 > 1 then the number of people infected grows exponentially at first, but a logistic curve is more realistic in the long run. In this case the number of infected people is large, presumably a percentage of the entire population. There seems to be a discontinuity at R0 = 1. You're right in that the usual models are differentiable with respect to time, but I was thinking about the steady state (i.e. t = ∞) as a function of the model's parameters, which can have singularities even if the model is differentiable as a function of time. --RDBury (talk) 17:15, 19 March 2020 (UTC)
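The steady-state discontinuity described here can be made concrete with the classic SIR final-size relation, z = 1 − exp(−R0·z), where z is the eventual attack rate: for R0 ≤ 1 the only solution is z = 0, while for R0 > 1 a positive root appears. A small fixed-point-iteration sketch:

```python
import math

def final_size(r0, tol=1e-12):
    """Attack rate z solving z = 1 - exp(-r0*z), found by fixed-point
    iteration started away from the trivial root z = 0."""
    z = 0.5
    for _ in range(10_000):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            break
        z = z_new
    return z

for r0 in (0.5, 0.9, 1.0, 1.5, 2.0):
    print(f"R0 = {r0}: eventual attack rate ~ {final_size(r0):.4f}")
```

For R0 below 1 the iteration collapses to (essentially) zero; for R0 = 2 it gives an attack rate of about 0.80. At R0 = 1 exactly, the convergence becomes very slow, which is itself a signature of the critical point.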
 * R = 1 is the classic 'interesting' parameter value of the Galton–Watson process, often motivated by the question: what typically happens to the survival and prevalence of a surname, inherited from father to child, in a population whose size is approximately static over time? See both the en-wiki and the fr-wiki articles for different but complementary discussions of this. It turns out that for R ≤ 1 eventual extinction is predicted with probability 1. A population may still become extinct with R > 1, but the probability is less than one. However, in the R = 1 case the big population must still be made of somebody, so how does this work? If one starts a number of surnames (genetic markers, neutrons etc.) running at time 0, most random-walk their way to zero relatively quickly. A few do not. These are the ones that random-walked away from zero and happened to become very large, unlike those that remained in the danger zone. As a result, the extended population rapidly becomes dominated by rather few surnames, the rest having died out. Ultimately one surname becomes the unique survivor, all others having disappeared; until ultimately it too disappears. Alternatively, if one imposes a hard constraint fixing the population total to a particular value (cf. the difference between a canonical and a grand canonical ensemble), the predicted balance of proportions in the varying-total model works pretty well as a prediction of the evolving balance of proportions in the fixed-total model, except that per that constraint the final population (and with it the final surname) never disappears. Jheald (talk) 15:03, 20 March 2020 (UTC)
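The extinction probabilities mentioned here can be checked numerically: for Poisson(μ) offspring, the extinction probability q is the smallest non-negative root of q = exp(μ(q − 1)), obtained by iterating the offspring generating function from q = 0. A Python sketch:

```python
import math

def extinction_probability(mu, iters=100_000):
    """Iterate the Poisson offspring generating function q -> exp(mu*(q-1))
    from q = 0; this converges to its smallest root, which is the
    extinction probability of the Galton-Watson process."""
    q = 0.0
    for _ in range(iters):
        q = math.exp(mu * (q - 1.0))
    return q

for mu in (0.9, 1.0, 1.5):
    print(f"mu = {mu}: extinction probability ~ {extinction_probability(mu):.4f}")
```

For μ ≤ 1 this comes out as 1 (certain eventual extinction, with the μ = 1 case converging only very slowly), while for μ = 1.5 it is about 0.42: a supercritical line still dies out more than two-fifths of the time.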
 * Thanks very much. Regarding "This simple tree model only works at the onset of a potential epidemic, when virtually no one is infected yet" -- as of March 19, that is the reality of where we are at the moment in Europe (including even Italy): we are still a large number of doublings from the peak, the number of those so far infected or infectious is still only quite a small proportion, and almost all of the epidemic is still to come. As to the numbers who would nevertheless get infected, by some other route, anyway, that will depend on which strategy your government is running, and how successful they are at it. Up until Monday afternoon, the UK government were on track for a "mitigate" strategy, where the epidemic was expected to run its course, with a proportion ultimately infected of about 0.8 -- with a projected best case of 250,000 deaths, though in reality at least twice that, because intensive care would have been up to 16x overloaded (as a country-wide average; some spots would be even more), plus more if non-ICU medical care also buckled, plus more if the isolation strategies weren't as successful or as widely taken up as modelled. (Figures from Monday evening's paper out of Imperial.) We are now told that the government has switched to a "suppress" strategy, which according to the team at Imperial might come in at only 20,000 deaths -- so presumably only about one-twelfth as many infected, i.e. a probability of only about 0.07 of being infected anyway even if they escape being infected by this chain. The reality could end up somewhere between the two, if the strategy failed to get R below 1, so failed to stop the exponential growth, but sufficiently spread the peak that a larger proportion of the population still remained uninfected by the time mass-vaccination became a possibility.


 * The other thing that occurred to me is that there is of course some multiple-counting in assigning the responsibility for a downstream death to every node in the chain upstream of it. The situation is perhaps analogous to that in An Inspector Calls. While none of the family may be solely responsible for the tragedy, none can be absolved of it.


 * It is a responsibility on all of us to try to limit those downstream infections. Jheald (talk) 17:12, 19 March 2020 (UTC)


 * (Added after the section was archived)
 * Thanks again to User:Lambiam and User:RDBury for the above.
 * I finally got round to doing some simulations. Worth noting that the number of infections in the model above is always at least one, given that there has been an initial infection. In 40% of cases (for Poisson with μ = 0.9) the chain ends right there. But I was surprised, from the simulations, at quite how heavy-tailed the distribution for the total number infected in the chain becomes. The standard deviation σ appears to go as roughly $$E \sqrt{E}$$ (compared to e.g. $$\sqrt{E}$$ for a Poisson distribution), which works out at about 31.6 for the Poisson μ = 0.9 case. So if the chain or tree is not very short, it can be really quite long: if I run ten simulations, the number of onward infections in the chain or tree will most often be 0, 1 or 2. But if not, it will not uncommonly be in two figures or even three figures. This seems consonant with the survival curve plotted at Galton–Watson process for μ = 0.9, which shows there is an appreciable probability that the chain or tree may persist even up to 30 or so generations.
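For what it's worth, this heavy tail has a closed form, via a standard generating-function computation. Writing the total progeny as $$T = 1 + \sum_{i=1}^{N} T_i$$ (the root plus the subtrees of its N children) and taking expectations and variances of both sides gives, for offspring mean μ < 1 and offspring variance σ²,

$$\operatorname{E}[T] = \frac{1}{1-\mu}, \qquad \operatorname{Var}(T) = \frac{\sigma^2}{(1-\mu)^3}.$$

For Poisson offspring σ² = μ, so μ = 0.9 gives Var(T) = 0.9 / 0.1³ = 900, i.e. a standard deviation of exactly 30, in line with the empirical estimate of about 31.6 above.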
 * I hope to add some plots of typical trees in due course, but am on the lookout for a nice library for displaying trees -- I haven't been able to search very much so far, given various challenges I have been having with internet access.
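Pending a proper plotting library, here is a minimal stdlib-only sketch that grows sample trees from the model and renders them as indented ASCII, one 'o' per node with children indented under their parent (the Poisson sampler is Knuth's multiplication method):

```python
import math
import random

def poisson(mu, rng):
    """Sample a Poisson(mu) variate (Knuth's multiplication method)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def render_tree(mu, rng, depth=0, lines=None):
    """Grow one Galton-Watson tree and collect an indented ASCII
    rendering; returns the list of lines, one per node."""
    if lines is None:
        lines = []
    lines.append("  " * depth + "o")
    for _ in range(poisson(mu, rng)):       # recurse into each child
        render_tree(mu, rng, depth + 1, lines)
    return lines

rng = random.Random(7)
for i in range(3):                          # a few sample trees
    tree = render_tree(0.9, rng)
    print(f"tree {i}: {len(tree)} node(s)")
    print("\n".join(tree))
```

Most draws give a tree of only a node or two, but re-running with different seeds occasionally produces the long straggly chains described above.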
 * As to realism or not of the model, one thing it does ignore is variation in how much each person is likely to spread the disease, whether due to genetics or occupation or behaviour or whatever (cf eg super-spreaders). I presume this would tend to make the tree a bit "clumpier" than Poisson for a given average number of sprouts μ. The other side of this, I think, is that it might be a little more probable that anyone you give it to might have caught it anyway, per User:RDBury above.
 * The other thing it doesn't include is any relationship between generations of the tree in sprouting probabilities -- for example, in the current household lockdown, if one member of a household gets it, they might be quite likely to spread it to other members of the household, but those other members might not be so likely to spread it further. Similarly, on a larger scale, workplaces might act as 'compartments'. The more that different compartments can be isolated from each other -- as, of course, with advice to stay at home if infected -- the shorter typical chains may become for a particular population-wide value of μ. Any of this would need a much more sophisticated model to capture.
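The compartment effect described here can be illustrated with a toy two-type chain (all the rates below are made-up numbers for illustration, not fitted to anything): 'community' cases seed both community and household cases, but household cases pass the infection onward at a much lower rate. Even with the same first-generation mean of 0.9, the expected chain becomes much shorter:

```python
import math
import random

def poisson(mu, rng):
    """Sample a Poisson(mu) variate (Knuth's multiplication method)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def two_type_chain_size(rng, mu_cc=0.6, mu_ch=0.3, mu_hc=0.1):
    """Total cases from one 'community' index case. Community cases seed
    Poisson(mu_cc) community and Poisson(mu_ch) household cases; household
    cases seed only Poisson(mu_hc) community cases (illustrative rates)."""
    size, community, household = 0, 1, 0
    while community or household:
        size += community + household
        new_c = (sum(poisson(mu_cc, rng) for _ in range(community))
                 + sum(poisson(mu_hc, rng) for _ in range(household)))
        new_h = sum(poisson(mu_ch, rng) for _ in range(community))
        community, household = new_c, new_h
    return size

rng = random.Random(1)
n = 50_000
mean_two_type = sum(two_type_chain_size(rng) for _ in range(n)) / n
# closed form: E_c = 1 + 0.6*E_c + 0.3*E_h and E_h = 1 + 0.1*E_c,
# so E_c = 1.3 / 0.37
print(f"mean chain size ~ {mean_two_type:.2f} (theory {1.3 / 0.37:.2f})")
```

The single-type model with μ = 0.9 gives a mean chain of 10; here the same per-index-case mean of 0.9 yields only about 3.5, because the household 'dead ends' absorb much of the branching.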
 * But as a rough back-of-an-envelope estimate of one's personal responsibility not to let the infection propagate onwards, I still think the broad-brush picture above does help usefully make one think about the potential consequences of actions. Jheald (talk) 12:04, 29 March 2020 (UTC)