#### Initial results for the Pfizer vaccine suggest that the vaccine is 95% effective. Eric Jacquier, clinical professor of finance, explains how this relatively small test sample can be used to make forecasts for the entire population.

Pfizer just submitted its phase III trial results to the FDA. They reveal a vaccine with an impressive 95% efficacy, a spectacular result. The public statement indicates that Pfizer monitored about 43,660 volunteers for 3 months, split into two groups, or “arms”, of 21,830 each, one receiving a placebo and the other the vaccine.

*Efficacy* is the ratio of COVID cases avoided over the number of cases that would have occurred without the vaccine. Pfizer reports 162 COVID cases in the placebo arm versus 8 in the vaccine arm, an efficacy ratio of (162-8)/162, the reported 95%. Undoubtedly a rare uplifting conversation topic over Thanksgiving dinner, properly distanced, masked, and reduced in size, please. How do such small numbers, 162 and 8, allow us to make precise forecasts for an entire population? How confident are we that the efficacy will be close to 95%? What can we say about how many cases will be avoided in a vaccinated population?
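As a quick check of that arithmetic, here is a minimal sketch. The function name is illustrative, and the formula as written assumes both arms have the same size, as they do in this trial.

```python
def vaccine_efficacy(placebo_cases: int, vaccine_cases: int) -> float:
    """Share of placebo-arm cases that did not occur in the vaccine arm.

    Valid as written only when both arms have the same number of volunteers.
    """
    return (placebo_cases - vaccine_cases) / placebo_cases


efficacy = vaccine_efficacy(placebo_cases=162, vaccine_cases=8)
print(f"Efficacy: {efficacy:.1%}")  # Efficacy: 95.1%
```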

**Allowing for uncertainty**

We do not know the efficacy because we do not know the two infection rates in the ratio. Rather, Pfizer *estimated* them from two samples of 21830 volunteers, 0.74% and 0.037% for the unvaccinated and vaccinated arms. Other sets of 21830 volunteers with the same characteristics, or the same volunteers at another period, would have resulted in numbers different from 162 and 8. This is the *estimation uncertainty* that must be incorporated in any forecast. As a ratio of uncertain estimates, the 95% efficacy number is itself uncertain.

Table 1 documents the uncertainty in infection rates from the Pfizer volunteers. The two rows show the results both at the trial size and scaled to a population of one million, for which we may want to make a prediction. For the trial estimate, the vaccine reduces infection rates from 7400 to 370 per million individuals. With no vaccine, estimation uncertainty amounts to a 50% chance of between 7000 and 7800 cases, and a 90% chance of between 6500 and 8400 cases per million. The right half of Table 1 shows the *“unfavorable”* aspect of the uncertainty for a vaccinated population. Per million, we have a 20% chance of 390 to 470 cases, a 15% chance of 470 to 640, a 3% chance of more than 640, and a negligible chance of more than 730 cases.
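The unvaccinated ranges can be roughly reproduced with a normal approximation to the binomial. This is a sketch under that assumption, not necessarily the method behind Table 1, and the function name is illustrative. The approximation is less reliable with only 8 cases, so only the placebo arm is shown.

```python
from math import sqrt
from statistics import NormalDist


def rate_interval_per_million(cases, arm_size, coverage):
    """Central interval for an infection rate, in cases per million.

    Uses a normal approximation to the binomial (an assumption of this sketch).
    """
    rate = cases / arm_size
    std_error = sqrt(rate * (1 - rate) / arm_size)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (rate - z * std_error) * 1e6, (rate + z * std_error) * 1e6


# Placebo arm: 162 cases among 21,830 volunteers.
low50, high50 = rate_interval_per_million(162, 21_830, coverage=0.50)
low90, high90 = rate_interval_per_million(162, 21_830, coverage=0.90)
print(f"50% interval: {low50:,.0f} to {high50:,.0f} per million")
print(f"90% interval: {low90:,.0f} to {high90:,.0f} per million")
```

Rounded to the nearest hundred, both intervals land on the ranges quoted for the unvaccinated population.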

Informally, Table 1 shows that the vaccine is highly effective even accounting for uncertainty in a most *pessimistic* way: an unlikely high of 640 cases per million vaccinated, down from an unlikely low of 6400 per million for an unvaccinated population, still yields a 90% vaccine efficacy. The unfavorable aspects of the uncertainty about the vaccine are the scenarios where the non-vaccinated infection rate is low while the vaccinated rate is high. Inspecting the two samples separately does not tell us how likely these scenarios are. Yet these two uncertainties interact to produce the uncertainty of the efficacy ratio itself.

**The US population has a much higher infection rate than the Pfizer placebo arm**

Another reason to study the efficacy ratio itself is this: we cannot predict that one million vaccinated US residents will have a 0.037% infection rate in the future, as per Table 1. That would assume that, unvaccinated, they would have had a 0.74% infection rate, as in the Pfizer trial. But it happens that the unvaccinated US population has a much higher infection rate than the Pfizer placebo arm.

The Pfizer trial lasted about 3 months and began in late July. Over roughly the same period, August to the end of October, the USA, population 328 million, recorded 4,552,413 cases. Further, the Pfizer sample minimum age is 12 years old. The CDC reports only 83 cases under the age of 14 for the whole of 2020. A proper US base population then is the 268 million over age 14. This gives an infection rate of 1.7%. Even this rate does not include positive individuals who did not get tested. In contrast, Pfizer likely closely monitors all 43,660 volunteers. The Pfizer placebo arm does have a vastly lower infection rate than the US population.
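The comparison is simple arithmetic on the figures just quoted. In this short sketch the variable names are illustrative and every number comes from the text.

```python
us_cases = 4_552_413                  # recorded US cases, August to end of October
us_population_over_14 = 268_000_000   # base population used in the text
us_rate = us_cases / us_population_over_14

placebo_rate = 162 / 21_830           # Pfizer placebo arm
ratio = us_rate / placebo_rate

print(f"US rate:      {us_rate:.2%}")
print(f"Placebo rate: {placebo_rate:.2%}")
print(f"The US rate is about {ratio:.1f} times the placebo-arm rate")
```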

And of course, the US rate is now higher and will be even higher in early January as vaccination starts in earnest.

This is not a problem. It has no reason to bias the results. A crucial trial design feature is for the two arms, vaccinated and placebo, to be as similar as possible. Ideally, the only difference should be that one group was vaccinated and the other not. This is needed for the 8 cases to be comparable to the 162 cases and to have a reliable 95% efficacy estimate. However, it is not so important that the unvaccinated sample be exactly similar to, say, a population we want to make predictions for. All we need is that the difference does not affect the efficacy ratio. This is why we must understand the uncertainty of the efficacy ratio itself.

**Up to a 95% chance that the vaccine efficacy is above 92%**

The efficacy ratio is one minus the ratio of the vaccinated to the non-vaccinated infection rate. Its uncertainty follows from that of these two rates. Its exact distribution is easily simulated (black line in Figure 1) or well approximated by a normal distribution (red line). For any x value from 0.9 to 1, Figure 1 shows the probability that the vaccine efficacy is **less** than that value. The plot shows that there is virtually no chance that the vaccine efficacy is below 90%, a 5% chance it is between 90% and 92%, and a 95% chance it is above 92%.
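One way such a simulation could be set up is sketched below. It assumes each infection rate follows a Beta distribution given the observed counts and a flat prior, which is a modeling choice of this sketch and not necessarily the one behind Figure 1. The function names are illustrative, and the probabilities it prints depend on that choice, so they need not match the figure exactly.

```python
import random
from bisect import bisect_left


def simulate_efficacy(placebo_cases, vaccine_cases, arm_size, draws=50_000, seed=0):
    """Draw plausible efficacy values given the case counts in each arm."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Beta(cases + 1, non-cases + 1): uncertainty about a rate under a
        # flat prior (an assumption of this sketch).
        placebo_rate = rng.betavariate(placebo_cases + 1, arm_size - placebo_cases + 1)
        vaccine_rate = rng.betavariate(vaccine_cases + 1, arm_size - vaccine_cases + 1)
        samples.append(1 - vaccine_rate / placebo_rate)
    return sorted(samples)


def prob_below(sorted_samples, threshold):
    """Fraction of simulated efficacies below the threshold."""
    return bisect_left(sorted_samples, threshold) / len(sorted_samples)


samples = simulate_efficacy(placebo_cases=162, vaccine_cases=8, arm_size=21_830)
for threshold in (0.90, 0.92, 0.95):
    print(f"P(efficacy < {threshold:.2f}) = {prob_below(samples, threshold):.3f}")
```

Plotting `prob_below` against thresholds from 0.9 to 1 gives a curve of the same kind as the one described for Figure 1.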

To forecast future COVID cases avoided by vaccination in, say, January, we would combine this efficacy uncertainty with a forecast of the relevant population infection rate at the time of vaccination. At the least, we would use the latest infection rate available; better, we would extrapolate it given the rise in cases.
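A bare-bones version of that combination is sketched here. The 1.7% rate is the August to October US figure from earlier in the article, used only as a placeholder for a real forecast. The efficacy values are points along the range discussed above. The function name is illustrative, and the sketch ignores any indirect effect of vaccination on transmission.

```python
def cases_avoided_per_million(unvaccinated_rate, efficacy):
    """Expected cases avoided if one million people are vaccinated."""
    return unvaccinated_rate * efficacy * 1_000_000


placeholder_rate = 0.017  # stand-in for a forecast of the unvaccinated infection rate
for efficacy in (0.90, 0.92, 0.95):
    avoided = cases_avoided_per_million(placeholder_rate, efficacy)
    print(f"Efficacy {efficacy:.0%}: about {avoided:,.0f} cases avoided per million")
```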

**From sample to real-world: the long run**

As the period extends, additional information will accumulate on the durability of protection, about which this trial provides little information. In the long run, as the vaccine efficacy declines, antigen tests of vaccinated individuals will tell us about the durability of the vaccine and the need for re-vaccination. Another, positive, long-run effect unmeasurable in the trial is that as the vaccine gets more widely distributed, it will reduce the number of contagious people circulating in the population. This will in turn reduce transmission and the infection rate in the non-vaccinated population. In effect, the vaccine will provide some degree of social distancing.

**From sample to real-world: What could go wrong**

The analysis of a trial is based on simplifying statistical assumptions unlikely to be exactly true. Typical assumptions are that the data, namely each volunteer’s chance of getting infected, 1) are independent of each other, and 2) have the same distribution, i.e. the same chance of getting the disease. Two things generally happen when these assumptions fail: 1) uncertainty goes up, and 2) we underestimate uncertainty. It’s easy to see in the math, but we can get some intuition without it.

Why is estimation more precise when “*volunteer’s chances of getting the disease are independent from one another*”? It owes to the amount of information we can get from a given number of volunteers. Take an extreme example: we want to estimate the probability p of getting a disease, say it is 0.5 but we don’t know it, so we collect a sample of volunteers. Take 2 volunteers isolated from each other, one in Rio, the other in Berlin. We have a 50% chance of getting the right p of 0.5, with one sick and the other not, and a 50% chance of getting p wrong: 1 with both sick or 0 with both not sick. Now, if the two volunteers live in the same house or work in the same office, when one gets sick, the other will too. We now get a p of either 0 or 1, never 0.5. This increases the uncertainty of the estimate of p, which is either 1 or 0, rather than 0, 0.5, or 1.
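The two-volunteer example can be checked by listing every possible outcome. In the sketch below the function names are illustrative, and "linked" means the extreme case described above, where both volunteers always share the same outcome.

```python
from fractions import Fraction
from itertools import product


def estimate_distribution_independent(n, p):
    """Probability of each possible estimate of p from n independent volunteers."""
    dist = {}
    for outcome in product((0, 1), repeat=n):  # 1 = sick, 0 = not sick
        prob = Fraction(1)
        for sick in outcome:
            prob *= p if sick else 1 - p
        estimate = Fraction(sum(outcome), n)
        dist[estimate] = dist.get(estimate, 0) + prob
    return dist


def estimate_distribution_linked(p):
    """All volunteers share one outcome, so the estimate is either 0 or 1."""
    return {Fraction(0): 1 - p, Fraction(1): p}


def variance(dist):
    """Variance of the estimate under the given distribution."""
    mean = sum(x * w for x, w in dist.items())
    return sum(w * (x - mean) ** 2 for x, w in dist.items())


p = Fraction(1, 2)
independent = estimate_distribution_independent(2, p)
linked = estimate_distribution_linked(p)
print({str(k): str(v) for k, v in sorted(independent.items())})
print("variance, independent:", variance(independent))  # 1/8
print("variance, linked:     ", variance(linked))       # 1/4
```

Calling `estimate_distribution_independent(4, p)` gives the same kind of breakdown for four volunteers.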

Of course, we don’t do trials with 2 volunteers, rather 21,830. The more volunteers the more precisely we estimate p, but the story is similar. Take 4 volunteers, p can be 0, 0.25, 0.5, 0.75, or 1. We get 0.5 on average but we are likely to get 0.25 or 0.75, and much less likely to get the worst estimates, 0 and 1. Increasing the sample size from 2 to 4 yields a more precise estimate of p. If the four volunteers live or work together, we are back to p = 0 or 1. Take 1,000 volunteers living in 100 family groups of 10. We violate independence again since if someone gets sick the family gets sick. We get no more information than with 100 *independent* volunteers. Further, Table 1 and Figure 1 would be wrong because all computations assume independence. They would understate the uncertainty of the estimates. Trial designers make sure that volunteers are “independent” from each other in this statistical sense.
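A small simulation illustrates the family-group example. It assumes the extreme case where a whole family shares one outcome, and it uses p = 0.5 purely for illustration. The function name is hypothetical.

```python
import random
from statistics import pstdev


def simulate_estimates(n_groups, group_size, p, trials=1000, seed=0):
    """Estimates of p when everyone within a group shares the same outcome."""
    rng = random.Random(seed)
    total = n_groups * group_size
    estimates = []
    for _ in range(trials):
        sick_groups = sum(rng.random() < p for _ in range(n_groups))
        estimates.append(sick_groups * group_size / total)
    return estimates


p = 0.5  # illustrative value, not a trial figure
sd_families = pstdev(simulate_estimates(n_groups=100, group_size=10, p=p))
sd_independent_1000 = pstdev(simulate_estimates(n_groups=1000, group_size=1, p=p))
sd_independent_100 = pstdev(simulate_estimates(n_groups=100, group_size=1, p=p, seed=1))

print(f"1,000 volunteers in 100 families: spread of estimate = {sd_families:.4f}")
print(f"1,000 independent volunteers:     spread of estimate = {sd_independent_1000:.4f}")
print(f"100 independent volunteers:       spread of estimate = {sd_independent_100:.4f}")
```

The spread for the family sample should be close to that of 100 independent volunteers, and roughly three times the spread that a formula assuming 1,000 independent volunteers would report.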

The same consequence follows, increased and understated uncertainty, when volunteers have an unequal chance of being sick, i.e., the sample is heterogeneous. Take 10,000 volunteers, half young people and half seniors, with p_{Y}=0.1 and p_{O}=0.9 chances of getting sick. Say we estimate a single probability of getting sick, p, in the sample, and we find p = 0.51. It may be useful to discuss the average infection probability for the population as a whole, and 0.51 is not a bad estimate of (0.1+0.9)/2. Yet again, standard statistics that assume all volunteers have the same p understate the uncertainty of this 0.51. The problem gets a bit philosophical: understate uncertainty **relative to what**? Relative to a sample of 10,000 **identical** volunteers with true p=0.5. For Pfizer, the 21830 unvaccinated volunteers with an estimated 0.74% infection rate likely differ vastly: young and old, in more or less good cardiovascular and respiratory health, etc. The sample is not homogeneous. We can do two things. First, we can adjust the calculation to allow for heterogeneity if we want to study the precision of the estimate of this single p, the 0.74% for Pfizer, or the average of p_{Y} and p_{O}. Second, we can break the sample into two homogeneous smaller samples, the young and the seniors. Each sample has only 5,000 volunteers, but it is homogeneous. The uncertainty is higher than with 10,000 volunteers, but we compute it correctly. Sample heterogeneity is taken seriously: the brief Pfizer announcement shows that they clearly studied sub-samples.
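The second option, splitting the sample, can be sketched as follows. The case counts are hypothetical, set to what p_{Y}=0.1 and p_{O}=0.9 would produce on average, and the standard error uses the usual binomial formula, which is appropriate within a homogeneous group. The function name is illustrative.

```python
from math import sqrt


def rate_and_std_error(cases, n):
    """Estimated rate and its binomial standard error for a homogeneous group."""
    rate = cases / n
    return rate, sqrt(rate * (1 - rate) / n)


# Hypothetical counts: 5,000 young volunteers and 5,000 seniors.
young_rate, young_se = rate_and_std_error(cases=500, n=5_000)
senior_rate, senior_se = rate_and_std_error(cases=4_500, n=5_000)

# Same rate measured on 10,000 identical volunteers, for comparison.
_, young_se_full_size = rate_and_std_error(cases=1_000, n=10_000)

print(f"Young:   rate {young_rate:.2f}, standard error {young_se:.4f}")
print(f"Seniors: rate {senior_rate:.2f}, standard error {senior_se:.4f}")
print(f"Halving the sample multiplies the standard error by {young_se / young_se_full_size:.2f}")
```

The last line shows the cost of splitting: with half the volunteers, the standard error of each group's rate grows by a factor of about 1.41, the square root of 2.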