Last edited on January 18, 2018 at 08:16:08 AM

Coverage Probability

The coverage probability of a confidence interval procedure for estimating \(\pi\) at a fixed value of \(\pi\) is

\[C_n(\pi) = \sum_{k=0}^nI(k, \pi)\binom{n}{k}\pi^k(1 - \pi)^{n-k}\]

where \(I(k, \pi)\) equals 1 if the interval contains \(\pi\) when \(X = k\) and equals 0 if it does not contain \(\pi\).

Coverage Probability (concept)

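  • The coverage probability of a confidence interval procedure is the probability, for a fixed \(\pi\), that the interval it produces contains \(\pi\); equivalently, it is the long-run proportion of intervals from repeated samples that contain \(\pi\).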

  • Confidence intervals are constructed at a given confidence level \((1 - \alpha)\), which is referred to as the nominal coverage probability or the nominal confidence level.

  • In an ideal setting, the nominal confidence level will equal the coverage probability; however, when assumptions used to derive a confidence interval are not satisfied, the actual coverage probability can be either less than or greater than the nominal confidence level.
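The long-run interpretation in the bullets above can be checked by simulation. The following is only a sketch under assumed settings: \(n = 25\), \(\pi = 0.70\), a 95% Wald interval, 100,000 simulated samples, and an arbitrary seed. All object names (n_sim, pi_true, B, and so on) are hypothetical and chosen for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
n_sim   <- 25               # assumed number of Bernoulli trials
pi_true <- 0.70             # assumed true success probability
alpha   <- 0.05             # 1 - alpha is the nominal confidence level
B       <- 100000           # assumed number of simulated samples

x_sim  <- rbinom(B, n_sim, pi_true)      # simulated success counts
p_sim  <- x_sim / n_sim                  # simulated sample proportions
ME_sim <- qnorm(1 - alpha/2) * sqrt(p_sim * (1 - p_sim) / n_sim)

# TRUE when the simulated Wald interval contains pi_true
covered <- (pi_true >= p_sim - ME_sim) & (pi_true <= p_sim + ME_sim)
coverage_mc <- mean(covered)             # Monte Carlo estimate of coverage
coverage_mc
```

The estimate should agree with the exact value of \(C_n(\pi)\) up to Monte Carlo error, which shrinks as B grows.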

Example 8.24

Consider a random variable \(X \sim Bin(n = 25, \pi)\), and define \(P = \frac{X}{n}\).

  • Compute the coverage probability for a 95% Wald (asymptotic) confidence interval if \(\pi = 0.70\).

  • The Wald (asymptotic) confidence interval for \(\pi\) is given below.

\[CI_{1 - \alpha}(\pi) = \left[ p - z_{1 - \alpha/2} \sqrt{\frac{p(1 - p)}{n}}, p + z_{1 - \alpha/2} \sqrt{\frac{p(1 - p)}{n}}\right]\]

Solution

To compute \(C_{n = 25}(\pi = 0.70)\), one must consider all the possible outcomes for \(X\) when \(n = 25\). The random variable \(X\) can assume values \(0, 1, 2, \ldots,25\), and for each value of \(X\) a different value of \(p\) (the sample proportion of successes) results, which one uses with the Wald (asymptotic) confidence interval to compute a 95% confidence interval.

Code

n <- 25            # number of Bernoulli trials
alpha <- 0.05      # alpha level
x <- 0:n           # vector containing values RV can assume
p <- x/n           # vector of possible p values
z <- qnorm(1 - alpha/2)     # critical value
ME <- z*sqrt(p*(1 - p)/n)   # margin of error
lcl <- p - ME      # lower confidence limit  
ucl <- p + ME      # upper confidence limit  
PI <- 0.70         # PI = P(Success)
BP <- dbinom(x, n, PI)      # Binomial probability
cover <- (PI >= lcl) & (PI <= ucl)  # Logical vector 

Code (continued)

RES <- cbind(x, p, lcl, ucl, BP, cover) # cover is coerced to 0/1
DT::datatable(round(RES, 4), options = list(pageLength = 5, 
                                            autoWidth = TRUE))

Computing the Coverage Probability

  • Recall that \(C_n(\pi) = \sum_{k=0}^nI(k, \pi)\binom{n}{k}\pi^k(1 - \pi)^{n-k}\).
  • To evaluate the sum in code, add the binomial probabilities (BP) for every value of \(X\) whose Wald interval contains \(\pi\).
x[cover]
[1] 13 14 15 16 17 18 19 20 21

In this problem, \[C_{n = 25}(\pi = 0.70) = P(X = 13) + \cdots + P(X = 21).\]

Final Code

dbinom(x[cover], n, PI)
[1] 0.02677676 0.05355351 0.09163601 0.13363585 0.16507958 0.17119364
[7] 0.14716646 0.10301652 0.05723140
sum(dbinom(x[cover], n, PI))
[1] 0.9492897
binom::binom.coverage(p = 0.70, n = 25, 
                      conf.level = 0.95, method = "asymptotic")
      method   p  n  coverage
1 asymptotic 0.7 25 0.9492897

\(C_{n = 25}(\pi = 0.70) = 0.9492897\).

Example 8.24 (continued)

  • Compute and graph the coverage probability for the Wald (asymptotic) confidence interval, using a confidence level of 95% with 2000 equally spaced values of \(\pi\).

  • Previously, we computed the coverage probability when \(\pi\) was 0.70. In this problem, we will need to compute 2000 coverage probability values and graph those against the 2000 values of \(\pi\).

R Code

n <- 25            # number of Bernoulli trials
alpha <- 0.05      # alpha level
CL <-  1 - alpha   # Confidence level
x <- 0:n           # vector containing values RV can assume
p <- x/n           # vector of possible p values
z <- qnorm(1 - alpha/2)     # critical value
ME <- z*sqrt(p*(1 - p)/n)   # margin of error
lcl <- p - ME      # lower confidence limit  
ucl <- p + ME      # upper confidence limit  
m <- 2000          # number of pi values in the grid
PI <- seq(1/m, 1 - 1/m, length.out = m)   # PI = P(Success)
P_cov <- numeric(m) # allocating storage space
for(i in 1:m){
  cover <- (PI[i] >= lcl) & (PI[i] <= ucl)  # Logical vector 
  P_cov[i] <- sum(dbinom(x[cover], n, PI[i]))
}
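The same computation can be packaged as a function and applied over the grid, which some readers may find easier to reuse. This is a sketch, not part of the original slides: wald_coverage(), PI_grid, and P_cov_fun are hypothetical names, and the function simply repeats the steps used for the single value \(\pi = 0.70\).

```r
# Hypothetical helper: exact coverage probability of the Wald interval
# at one value of pi (pi_true) for a Binomial(n, pi_true) count.
wald_coverage <- function(pi_true, n, alpha = 0.05) {
  x  <- 0:n                                   # possible success counts
  p  <- x / n                                 # corresponding sample proportions
  ME <- qnorm(1 - alpha/2) * sqrt(p * (1 - p) / n)
  cover <- (pi_true >= p - ME) & (pi_true <= p + ME)
  sum(dbinom(x[cover], n, pi_true))           # add P(X = k) over covering k
}

m_grid    <- 2000
PI_grid   <- seq(1/m_grid, 1 - 1/m_grid, length.out = m_grid)
P_cov_fun <- vapply(PI_grid, wald_coverage, numeric(1), n = 25)

wald_coverage(0.70, n = 25)   # single-value check against the earlier result
```

Using vapply() with numeric(1) makes the expected return type explicit, so an unexpected result fails loudly rather than silently changing the shape of the output.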

Final Graph Code

plot(PI, P_cov, type = "l", xlab = expression(pi), 
     ylab = "Coverage Probability", ylim = c(0.0, 1.05))
lines(c(1/m, 1 - 1/m), c(CL, CL), col = "red", 
      lty = "dotted")
text(0.5, CL + 0.05, paste("Targeted Confidence Level =", CL))

Final Graph

ggplot2 code

DF <- data.frame(PI, P_cov)
library(ggplot2)
ggplot(data = DF, aes(x = PI, y = P_cov)) + 
  geom_line() + 
  theme_bw() + 
  labs(x = expression(pi), y = "Coverage Probability") + 
  geom_hline(yintercept = CL, color = "red", lty = "dashed") + 
  annotate("text", x = 0.5, y = CL + 0.05,   # draws one label, not one per row
           label = paste("Targeted Confidence Level =", CL))

ggplot2 Graph

Using binom

library(binom)
binom.plot(n = 25, method = binom.asymp, np = 2000)

Better Confidence Intervals for \(\pi\)

  • Wilson confidence interval

  • Agresti-Coull confidence interval

  • Clopper-Pearson confidence interval

Wilson Confidence Interval

\[ \mathbb{P}\left(P-z_{1-\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}\leq\pi\leq P + z_{1-\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}\,\right)\approx 1-\alpha \] Solving the two inequalities for \(\pi\) gives

\[ CI_{1 - \alpha}(\pi) = [lcl, ucl], \] where \(lcl = \dfrac{p+\frac{z^2_{1-\alpha/2}}{2n}-z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}+\frac{z^2_{1-\alpha/2}}{4n^2}}}{\left(1+\frac{z^2_{1-\alpha/2}}{n} \right)}\), and \(ucl = \dfrac{p+\frac{z^2_{1-\alpha/2}}{2n}+z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}+\frac{z^2_{1-\alpha/2}}{4n^2}}}{\left(1+\frac{z^2_{1-\alpha/2}}{n} \right)}\).
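The step from the probability statement to these limits is a quadratic inequality in \(\pi\). As a sketch, write \(z\) for \(z_{1-\alpha/2}\) (a shorthand introduced here) and replace \(P\) by its observed value \(p\). The event inside the probability is then

\[ (p - \pi)^2 \leq z^2\,\frac{\pi(1-\pi)}{n} \quad\Longleftrightarrow\quad \left(1 + \frac{z^2}{n}\right)\pi^2 - \left(2p + \frac{z^2}{n}\right)\pi + p^2 \leq 0. \]

The quadratic formula gives the two roots

\[ \pi = \frac{p + \frac{z^2}{2n} \pm z\sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}, \]

and the inequality holds for \(\pi\) between them, so the smaller root is \(lcl\) and the larger root is \(ucl\).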

Computing Options for Wilson (score) Confidence Interval

  • Use prop.test()

  • Use binom.confint() from binom

prop.test(x = 26, n = 40, correct = FALSE, conf.level = 0.90)$conf.int
[1] 0.5200677 0.7609263
attr(,"conf.level")
[1] 0.9
library(binom)
binom.confint(x = 26, n = 40, conf.level = 0.90, methods = "wilson")
  method  x  n mean     lower     upper
1 wilson 26 40 0.65 0.5200677 0.7609263
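To connect the closed-form limits to the software output, the sketch below evaluates the Wilson formula directly in base R for the same data (26 successes in 40 trials, 90% confidence). The object names (x_obs, n_obs, center, half, denom) are hypothetical and used only for this illustration.

```r
x_obs <- 26; n_obs <- 40; conf <- 0.90
z     <- qnorm(1 - (1 - conf)/2)          # z_{1 - alpha/2}
p_hat <- x_obs / n_obs                    # sample proportion

center <- p_hat + z^2 / (2 * n_obs)
half   <- z * sqrt(p_hat * (1 - p_hat) / n_obs + z^2 / (4 * n_obs^2))
denom  <- 1 + z^2 / n_obs

wilson <- c(lcl = (center - half) / denom, ucl = (center + half) / denom)
wilson

# Compare with prop.test() without the continuity correction
pt <- as.numeric(prop.test(x_obs, n_obs, correct = FALSE,
                           conf.level = conf)$conf.int)
all.equal(unname(wilson), pt)
```

The comparison relies on correct = FALSE; with the default continuity correction, prop.test() returns a different (wider) interval.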

Agresti-Coull Confidence Interval for \(\pi\)

\[ CI_{1-\alpha}(\pi)=\left[\tilde{p}-z_{1-\alpha/2} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}},\: \tilde{p}+z_{1-\alpha/2} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}} \right] \]

where \(X\) denotes the number of successes in a sample of size \(n\),

  • \(\tilde{n} = n + z^2_{1 - \alpha/2}\), and

  • \(\tilde{p} = \frac{1}{\tilde{n}}\left(X + \frac{1}{2}z^2_{1 - \alpha/2} \right)\).

  • Compute with binom.confint() using methods = "ac"

binom.confint(x = 26, n = 40, conf.level = 0.90, methods = "ac")
         method  x  n mean    lower    upper
1 agresti-coull 26 40 0.65 0.519717 0.761277
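The adjusted quantities \(\tilde{n}\) and \(\tilde{p}\) are simple to compute directly, which makes the "Wald interval on adjusted data" structure visible. A minimal sketch for the same data follows; the names n_tilde, p_tilde, and ME_tilde are hypothetical and mirror the notation above.

```r
x_obs <- 26; n_obs <- 40; conf <- 0.90
z <- qnorm(1 - (1 - conf)/2)              # z_{1 - alpha/2}

n_tilde  <- n_obs + z^2                   # adjusted sample size
p_tilde  <- (x_obs + z^2 / 2) / n_tilde   # adjusted proportion
ME_tilde <- z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)

ac <- c(lcl = p_tilde - ME_tilde, ucl = p_tilde + ME_tilde)
round(ac, 6)
```

The rounded limits should match the lower and upper values reported by binom.confint() with methods = "ac".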

Clopper-Pearson Confidence Interval for \(\pi\)

Often referred to as an "exact" confidence interval for \(\pi\). The Clopper-Pearson confidence interval is

\[ CI_{1-\alpha}(\pi)=\left[\beta_{\alpha/2, x, n - x + 1}, \beta_{1 - \alpha/2, x + 1, n - x} \right] \] where \(x\) is the number of observed successes in \(n\) trials, \(\beta_{\alpha/2, x, n - x + 1}\) is the \(\alpha/2\) quantile of a beta distribution with shape parameters \(x\) and \(n - x + 1\), and \(\beta_{1 - \alpha/2, x + 1, n - x}\) is the \(1-\alpha/2\) quantile of a beta distribution with shape parameters \(x + 1\) and \(n - x\). The function binom.confint() from the binom package returns a Clopper-Pearson confidence interval when the user supplies the argument methods = "exact".

Computing Clopper-Pearson Confidence Interval

alpha <- 0.10
n <- 40
x <- 26
CI <- c(qbeta(alpha/2, x, n - x + 1), qbeta(1 - alpha/2, x + 1, n - x))
CI
[1] 0.5080545 0.7744675
binom.confint(x = x, n = n, conf.level = 1 - alpha, methods = "exact")
  method  x  n mean     lower     upper
1  exact 26 40 0.65 0.5080545 0.7744675
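Two quick checks help place the Clopper-Pearson interval alongside the earlier coverage calculation. The sketch below first confirms that base R's binom.test() reports the same limits, then computes the exact coverage of a 95% Clopper-Pearson interval at the assumed values \(n = 25\) and \(\pi = 0.70\). The names with the _cp suffix are hypothetical, and the limits at \(x = 0\) and \(x = n\) are set to 0 and 1 by the usual convention.

```r
# Check 1: binom.test() uses the same beta quantiles
cp_base <- as.numeric(binom.test(26, 40, conf.level = 0.90)$conf.int)
cp_base

# Check 2: exact coverage of the 95% Clopper-Pearson interval
n_cp <- 25; alpha_cp <- 0.05; pi_true <- 0.70   # assumed illustration values
k <- 0:n_cp                                     # possible success counts
lcl_cp <- qbeta(alpha_cp/2, k, n_cp - k + 1)
ucl_cp <- qbeta(1 - alpha_cp/2, k + 1, n_cp - k)
lcl_cp[k == 0]    <- 0                          # boundary convention
ucl_cp[k == n_cp] <- 1                          # boundary convention

cover_cp    <- (pi_true >= lcl_cp) & (pi_true <= ucl_cp)
coverage_cp <- sum(dbinom(k[cover_cp], n_cp, pi_true))
coverage_cp
```

By construction, the coverage of the Clopper-Pearson interval is at least the nominal level, so coverage_cp should not fall below 0.95; the price is a wider interval.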

Which One?

Expected Width of 95% Confidence Intervals when \(n = 20\)
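Expected width can be computed the same way as coverage probability: weight the width of the interval at each possible count by its binomial probability and sum. The sketch below does this for the Wald and Wilson intervals only, assuming \(n = 20\), a 95% level, and an arbitrary grid of \(\pi\) values. It is not the code behind the original figure; the names (n_w, width_wald, width_wilson, expected_width, EW) are hypothetical, and the Wald limits are not truncated to \([0, 1]\).

```r
n_w <- 20; alpha_w <- 0.05
z <- qnorm(1 - alpha_w/2)
k <- 0:n_w                                # possible success counts
p <- k / n_w                              # corresponding sample proportions

# Interval width at each possible count
width_wald   <- 2 * z * sqrt(p * (1 - p) / n_w)
width_wilson <- 2 * z * sqrt(p * (1 - p) / n_w + z^2 / (4 * n_w^2)) /
                (1 + z^2 / n_w)

# Expected width at pi_true: sum of width(k) * P(X = k)
expected_width <- function(width, pi_true) {
  sum(width * dbinom(k, n_w, pi_true))
}

pi_grid <- seq(0.01, 0.99, by = 0.01)     # assumed grid of pi values
EW <- data.frame(
  PI     = pi_grid,
  wald   = vapply(pi_grid, function(pv) expected_width(width_wald, pv), numeric(1)),
  wilson = vapply(pi_grid, function(pv) expected_width(width_wilson, pv), numeric(1))
)
head(EW)
```

Plotting the wald and wilson columns against PI gives one curve per method, which can then be read together with the coverage plots when choosing an interval.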