The Zero Times Infinity Problem
August 24, 2013
There are two ways to keep yourself safe while rock climbing: The first option is to protect yourself carefully with ropes and gear, so that if you fall you won't fall too far or hard. The second option is to make sure not to fall. I think that the option climbers choose reflects their perception of what I like to call the "zero times infinity" problem, in which you multiply a near-zero probability by a near-infinite loss.
The first group of climbers think, "If I had a bad fall, that would be really, really bad, so I'm gonna try to make sure that none of my falls can be like that." These people are effectively saying zero times infinity is infinity. The second group of climbers think, "Let me just make extra sure not to fall, and then it doesn't really matter how bad it would be if I did because it's not gonna happen." This group takes zero times infinity to be zero. For the record, I'm in the first group for climbing (but of course the second for measure-theoretic integration!).
The zero times infinity problem has more general applications to disaster planning: The probability of some disaster occurring is very small, but the loss incurred if the disaster happened would be very large. Unfortunately, by the very nature of this problem, it's hard to make reasonable quantitative inferences about it: Because the disaster very rarely happens, or perhaps has never happened yet, it's hard to estimate the probability of occurrence. And since we've only observed between zero and a handful of disasters, it's hard to estimate how much loss we would incur if one did happen.
And it makes quite a difference! If the yearly probability of a major disaster in Chicago is anywhere between 0.1% and 1%, and the loss from this disaster is anywhere from \$1 billion to \$10 billion, then the yearly expected loss is anywhere between \$1 million and \$100 million, which makes the difference between ignoring the possibility of disaster and actively planning for it.
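To make the arithmetic concrete, here is that calculation as a few lines of Python (the probability and loss ranges are just the ones from the example above):

```python
# Expected-loss range from the Chicago example: multiply the extremes
# of the probability range by the extremes of the loss range.
p_low, p_high = 0.001, 0.01        # 0.1% to 1% yearly disaster probability
loss_low, loss_high = 1e9, 1e10    # $1 billion to $10 billion in losses

print(f"Expected yearly loss: ${p_low * loss_low:,.0f} "
      f"to ${p_high * loss_high:,.0f}")
# Expected yearly loss: $1,000,000 to $100,000,000
```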
To get a sense of how tough this problem is statistically, suppose we're trying to estimate the probability of some disaster happening in a given year. It hasn't happened in any of the last 100 years, and we don't have any data from before that. Let's get a 95% confidence interval for the disaster probability: If we suppose that disasters occur independently in each year with some true (fixed) probability $p$, then the probability of having seen no disasters so far is $(1 - p)^{100}$. The one-sided 95% confidence interval consists of all $p$ for which this probability is at least $0.05$, which gives $p \in [0, 1 - 0.05^{1/100}] = [0, 0.0295]$. So with 95% confidence, the yearly disaster probability could be as large as 3% a year, which is large enough to plan for, or as small as 0, which means we have no problem.
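If you want to check that interval yourself, the computation is one line of Python:

```python
# The one-sided 95% interval contains every p under which the probability
# of seeing zero disasters in 100 years, (1 - p)**100, is at least 0.05.
p_max = 1 - 0.05 ** (1 / 100)
print(f"Upper bound on the yearly disaster probability: {p_max:.4f}")
# Upper bound on the yearly disaster probability: 0.0295
```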
In other words, despite a hundred years' worth of data, we still don't know whether or not to plan for disaster! What can we do about this? Here are three general strategies:
Firstly, we can sometimes expand the amount of data we have by looking at "semi-parallel universes". Perhaps any given city has only a hundred years of disaster data, but collectively, the world's cities have many times more. It's probably possible to use this auxiliary data, for example, to quantify how summer temperatures and population densities contribute to the risk and cost of major fires. Or you could look at climbing statistics to determine the rate at which strong, careful climbers fall. Incorporating these horizontal parallels seems to work well for many insurance problems.
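As a rough sketch of what this pooling might look like, suppose (purely as an assumption for illustration) that each city's yearly disaster probability is drawn from a shared Beta distribution. Then even a city with a spotless record can borrow strength from the other cities' data, by shrinking its raw estimate toward the pooled rate. All of the counts and parameters below are made up:

```python
# A crude empirical-Bayes sketch of pooling disaster records across
# cities. Every number here is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_cities, years = 200, 100
true_rates = rng.beta(0.5, 60, size=n_cities)  # assumed shared prior on risk
disasters = rng.binomial(years, true_rates)    # each city's 100-year record

pooled_rate = disasters.sum() / (n_cities * years)
print(f"Pooled yearly disaster rate across cities: {pooled_rate:.4f}")

# Shrink one city's raw estimate toward the pooled rate by treating the
# pooled data as pseudo-counts; prior_strength tunes how hard we shrink.
prior_strength = 100                           # assumed, for illustration
city_disasters, city_years = 0, 100            # a city with a clean record
shrunk = (city_disasters + prior_strength * pooled_rate) / (city_years + prior_strength)
print(f"Shrunk estimate for the zero-disaster city: {shrunk:.4f}")
```

The point is that the zero-disaster city's estimate is no longer exactly zero: it inherits a small positive rate from the cities that did have disasters.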
Not every kind of disaster has such nice "horizontal" parallels that can be drawn, though. For example, try figuring out the probability of a major global epidemic and the cost it would incur. There are no parallel worlds we can observe to glean analogous data from, and (thankfully) there hasn't been a major epidemic since the 1918 flu pandemic killed 3-5% of the world's population. It might, however, be possible to draw "vertical" parallels, by analyzing the frequency, size, and cost of smaller-scale epidemics, and trying to extrapolate these numbers to larger-scale events.
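Here is one way such a vertical extrapolation might go, sketched on synthetic data: assume small-epidemic sizes follow a power-law (Pareto) tail, estimate the tail exponent from the many small events we do observe, and extrapolate the probability of an event far larger than anything in the record. Both the power-law assumption and all of the numbers below are hypothetical:

```python
# Fit a power-law tail to (synthetic) small-epidemic sizes and
# extrapolate the rate of a much larger event.
import numpy as np

rng = np.random.default_rng(1)
x_min = 1_000                                        # smallest epidemic recorded
sizes = x_min * (1 - rng.random(500)) ** (-1 / 1.5)  # Pareto(alpha=1.5) samples

# Maximum-likelihood (Hill) estimate of the tail exponent alpha.
alpha_hat = len(sizes) / np.log(sizes / x_min).sum()

# Extrapolated exceedance probability: P(size > x) = (x / x_min) ** -alpha.
big = 10_000_000
print(f"alpha ~ {alpha_hat:.2f}, "
      f"P(size > {big:,}) ~ {(big / x_min) ** -alpha_hat:.2e}")
```

Whether the power law actually holds out to catastrophic sizes is exactly the kind of assumption such an extrapolation lives or dies by.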
Finally, we can also make progress on the zero times infinity problem by incorporating disaster-specific modeling: In the back-of-the-envelope calculation above, for example, we supposed that disasters occur independently from year to year, which is actually false for disasters like forest fires: If there hasn't been a forest fire for a while, my understanding is that a new fire becomes more likely, because easily-flammable brush keeps accumulating. Building disaster-specific models lets you incorporate this kind of structure, which can be valuable when you don't have much other data.
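As a toy example of what a disaster-specific model might look like, here is a simulation in which the yearly fire hazard rises linearly with the years of brush accumulated since the last fire, so fire years are no longer independent. The functional form and all parameters are assumed purely for illustration:

```python
# Toy forest-fire model: the yearly fire probability grows with the
# years since the last fire, reflecting accumulating brush.
import numpy as np

rng = np.random.default_rng(2)

def simulate_fires(n_years, base=0.005, per_year=0.002):
    """Hazard rises linearly with years since the last fire (assumed form)."""
    years_since = 0
    fires = []
    for year in range(n_years):
        p = min(1.0, base + per_year * years_since)
        if rng.random() < p:
            fires.append(year)
            years_since = 0
        else:
            years_since += 1
    return fires

fires = simulate_fires(1_000)
print(f"{len(fires)} fires in 1,000 years; "
      f"mean gap between fires: {np.diff(fires).mean():.1f} years")
```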
Zero times infinity problems are extremely important both to individuals and to society. Failing to solve them can lead to major financial loss, serious injury, or death. These high-stakes questions, however, yield interesting and challenging statistical problems, which we can sometimes solve with a bit of creativity.