Extrapolate coverage and width using sub-sampled bootstrap confidence intervals.
Source: R/calc_coverage.R
extrapolate_coverage.Rd
Given a set of bootstrap confidence intervals calculated across sub-samples with different numbers of replications, extrapolates the coverage and width of the intervals to a specified (larger) number of bootstraps. The function also calculates the associated Monte Carlo standard errors. The nominal confidence level is not an argument; it is determined by how the lower and upper bounds were calculated.
Usage
extrapolate_coverage(
data,
CI_subsamples,
true_param,
B_target = Inf,
criteria = c("coverage", "width"),
winz = Inf,
nested = FALSE,
format = "wide",
width_trim = 0,
cover_na_val = NA,
width_na_val = NA
)
Arguments
- data
data frame or tibble containing the simulation results.
- CI_subsamples
list or name of column from data containing list of confidence intervals calculated based on sub-samples with different numbers of replications.
- true_param
vector or name of column from data containing corresponding true parameters.
- B_target
number of bootstrap replications to which the criteria should be extrapolated, with a default of B_target = Inf.
- criteria
character or character vector indicating the performance criteria to be calculated, with possible options "coverage" and "width".
- winz
numeric value for winsorization constant. If set to a finite value, estimates will be winsorized at the constant multiple of the inter-quartile range below the 25th percentile or above the 75th percentile of the distribution. For instance, setting winz = 3 will truncate estimates that fall below P25 - 3 * IQR or above P75 + 3 * IQR.
- nested
logical value controlling the format of the output. If FALSE (the default), then the results will be returned as a data frame with rows for each distinct number of bootstraps. If TRUE, then the results will be returned as a data frame with a single row, with each performance criterion containing a nested data frame.
- format
character string controlling the format of the output when CI_subsamples has results for more than one type of confidence interval. If "wide" (the default), then each performance criterion will have a separate column for each CI type. If "long", then each performance criterion will be a single variable, with separate rows for each CI type.
- width_trim
numeric value specifying the trimming percentage to use when summarizing CI widths across replications from a single set of bootstraps, with a default of 0.0 (i.e., use the regular arithmetic mean).
- cover_na_val
numeric value to use for calculating coverage if bootstrap CI end-points are missing. Default is NA.
- width_na_val
numeric value to use for calculating width if bootstrap CI end-points are missing. Default is NA.
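The winsorization rule described for winz can be sketched in base R as follows. This is only an illustration of the rule as documented, not the package's internal code: winsorize_iqr is a hypothetical helper, and it assumes the quartiles are computed with the default method of quantile().

```r
# Hypothetical helper (not part of simhelpers): winsorize values that fall
# more than winz * IQR beyond the quartiles.
winsorize_iqr <- function(x, winz = Inf) {
  if (!is.finite(winz)) return(x)          # winz = Inf leaves values untouched
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - winz * iqr               # P25 - winz * IQR
  upper <- q[2] + winz * iqr               # P75 + winz * IQR
  pmin(pmax(x, lower), upper)              # pull extreme values in to the fences
}

# Toy values with one extreme observation
widths <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1000)
widths_w <- winsorize_iqr(widths, winz = 3)
```

For these toy values the quartiles are 3.25 and 7.75, so the upper fence is 7.75 + 3 * 4.5 = 21.25 and only the value 1000 is pulled in.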
Value
A tibble containing the number of simulation iterations, performance criteria estimate(s) and the associated MCSE.
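The estimates and MCSEs at a single number of bootstraps can be sketched roughly as below. This is an assumption inferred from the example output on this page (for instance, a coverage of 0.996875 with an MCSE of 0.003125 when K = 80), not a copy of the package's code: summarize_CIs is a hypothetical helper that averages over the sub-sampled intervals within each replication and then takes the mean and standard error across replications.

```r
# Hypothetical helper (not part of simhelpers): coverage and width at a single
# number of bootstraps, with Monte Carlo standard errors.
# `lower` and `upper` are K x m matrices: K simulation replications (rows),
# m sub-sampled confidence intervals per replication (columns).
summarize_CIs <- function(lower, upper, true_param) {
  covered <- lower <= true_param & true_param <= upper
  cover_k <- rowMeans(covered)          # per-replication coverage rate
  width_k <- rowMeans(upper - lower)    # per-replication average width
  K <- nrow(lower)
  data.frame(
    K = K,
    coverage = mean(cover_k),
    coverage_mcse = sd(cover_k) / sqrt(K),
    width = mean(width_k),
    width_mcse = sd(width_k) / sqrt(K)
  )
}

# Toy input: 80 replications with 4 intervals each; one interval misses the truth.
lower <- matrix(1, nrow = 80, ncol = 4)
upper <- matrix(3, nrow = 80, ncol = 4)
upper[1, 1] <- 1.5
res <- summarize_CIs(lower, upper, true_param = 2)
```

With one miss among 80 x 4 intervals, this toy input gives a coverage of 0.996875 and an MCSE of 0.003125.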
References
Boos DD, Zhang J (2000). “Monte Carlo evaluation of resampling-based hypothesis tests.” Journal of the American Statistical Association, 95(450), 486-492. doi:10.1080/01621459.2000.10474226.
Examples
library(simhelpers)
# data-generating process: t-distributed observations shifted by mu
dgp <- function(N, mu, nu) {
  mu + rt(N, df = nu)
}
# analysis function: trimmed mean with bootstrap CIs for each value in B_vals
estimator <- function(
  dat,
  B_vals = c(49, 59, 89, 99),
  m = 4,
  trim = 0.1
) {
  # compute estimate and standard error
  N <- length(dat)
  est <- mean(dat, trim = trim)
  se <- sd(dat) / sqrt(N)
  # draw max(B_vals) bootstrap re-samples; keep the estimate and SE from each
  booties <- replicate(max(B_vals), {
    x <- sample(dat, size = N, replace = TRUE)
    data.frame(
      M = mean(x, trim = trim),
      SE = sd(x) / sqrt(N)
    )
  }, simplify = FALSE) |>
    dplyr::bind_rows()
  # confidence intervals for each B_vals, using m sub-samples of each size
  CIs <- bootstrap_CIs(
    boot_est = booties$M,
    boot_se = booties$SE,
    est = est,
    se = se,
    CI_type = c("normal", "basic", "student", "percentile"),
    B_vals = B_vals,
    reps = m,
    format = "wide-list"
  )
  # return a one-row data frame with the CIs stored in a list-column
  res <- data.frame(
    est = est,
    se = se
  )
  res$CIs <- CIs
  res
}
# build a simulation driver function
simulate_bootCIs <- bundle_sim(
  f_generate = dgp,
  f_analyze = estimator
)
# run 80 replications, overriding the default B_vals
boot_results <- simulate_bootCIs(
  reps = 80, N = 20, mu = 2, nu = 3,
  B_vals = seq(49, 149, 20)
)
extrapolate_coverage(
  data = boot_results,
  CI_subsamples = CIs,
  true_param = 2
)
#> K_boot_coverage bootstraps boot_coverage_normal boot_coverage_basic
#> 49 80 49 1.000000 0.975000
#> 69 80 69 0.996875 0.984375
#> 89 80 89 1.000000 0.984375
#> 109 80 109 1.000000 0.996875
#> 129 80 129 1.000000 1.000000
#> 149 80 149 1.000000 1.000000
#> Inf 80 Inf 1.000276 1.012390
#> boot_coverage_student boot_coverage_percentile boot_coverage_mcse_normal
#> 49 0.993750 0.984375 0.0000000000
#> 69 0.996875 1.000000 0.0031250000
#> 89 1.000000 0.996875 0.0000000000
#> 109 1.000000 1.000000 0.0000000000
#> 129 1.000000 1.000000 0.0000000000
#> 149 1.000000 1.000000 0.0000000000
#> Inf 1.004117 1.008974 0.0002755921
#> boot_coverage_mcse_basic boot_coverage_mcse_student
#> 49 0.011425322 0.004391357
#> 69 0.010280625 0.003125000
#> 89 0.011201328 0.000000000
#> 109 0.003125000 0.000000000
#> 129 0.000000000 0.000000000
#> 149 0.000000000 0.000000000
#> Inf 0.006077006 0.002899379
#> boot_coverage_mcse_percentile boot_width_normal boot_width_basic
#> 49 0.009268915 1.245114 1.196489
#> 69 0.000000000 1.237440 1.278346
#> 89 0.003125000 1.243898 1.216022
#> 109 0.000000000 1.242731 1.276201
#> 129 0.000000000 1.244961 1.227860
#> 149 0.000000000 1.248090 1.284844
#> Inf 0.005301301 1.246348 1.293357
#> boot_width_student boot_width_percentile boot_width_mcse_normal
#> 49 1.294746 1.196489 0.03971512
#> 69 1.402500 1.278346 0.03876459
#> 89 1.326687 1.216022 0.03993220
#> 109 1.404885 1.276201 0.03815378
#> 129 1.348227 1.227860 0.03936534
#> 149 1.394158 1.284844 0.03917576
#> Inf 1.421547 1.293357 0.03923332
#> boot_width_mcse_basic boot_width_mcse_student boot_width_mcse_percentile
#> 49 0.03817230 0.04376367 0.03817230
#> 69 0.04046026 0.04738402 0.04046026
#> 89 0.03827794 0.04632374 0.03827794
#> 109 0.03899069 0.04911762 0.03899069
#> 129 0.03895437 0.04695755 0.03895437
#> 149 0.04185961 0.04817379 0.04185961
#> Inf 0.04187225 0.05108521 0.04187225
extrapolate_coverage(
  data = boot_results,
  CI_subsamples = CIs,
  true_param = 2,
  B_target = 999,
  format = "long"
)
#> K_boot_coverage bootstraps CI_type boot_coverage boot_coverage_mcse
#> 1 80 49 normal 1.000000 0.0000000000
#> 2 80 69 normal 0.996875 0.0031250000
#> 3 80 89 normal 1.000000 0.0000000000
#> 4 80 109 normal 1.000000 0.0000000000
#> 5 80 129 normal 1.000000 0.0000000000
#> 6 80 149 normal 1.000000 0.0000000000
#> 7 80 999 normal 1.000207 0.0002070379
#> 8 80 49 basic 0.975000 0.0114253216
#> 9 80 69 basic 0.984375 0.0102806254
#> 10 80 89 basic 0.984375 0.0112013276
#> 11 80 109 basic 0.996875 0.0031250000
#> 12 80 129 basic 1.000000 0.0000000000
#> 13 80 149 basic 1.000000 0.0000000000
#> 14 80 999 basic 1.010471 0.0053018657
#> 15 80 49 student 0.993750 0.0043913573
#> 16 80 69 student 0.996875 0.0031250000
#> 17 80 89 student 1.000000 0.0000000000
#> 18 80 109 student 1.000000 0.0000000000
#> 19 80 129 student 1.000000 0.0000000000
#> 20 80 149 student 1.000000 0.0000000000
#> 21 80 999 student 1.003628 0.0025534950
#> 22 80 49 percentile 0.984375 0.0092689145
#> 23 80 69 percentile 1.000000 0.0000000000
#> 24 80 89 percentile 0.996875 0.0031250000
#> 25 80 109 percentile 1.000000 0.0000000000
#> 26 80 129 percentile 1.000000 0.0000000000
#> 27 80 149 percentile 1.000000 0.0000000000
#> 28 80 999 percentile 1.007932 0.0046862397
#> boot_width boot_width_mcse
#> 1 1.245114 0.03971512
#> 2 1.237440 0.03876459
#> 3 1.243898 0.03993220
#> 4 1.242731 0.03815378
#> 5 1.244961 0.03936534
#> 6 1.248090 0.03917576
#> 7 1.246120 0.03918583
#> 8 1.196489 0.03817230
#> 9 1.278346 0.04046026
#> 10 1.216022 0.03827794
#> 11 1.276201 0.03899069
#> 12 1.227860 0.03895437
#> 13 1.284844 0.04185961
#> 14 1.289335 0.04156572
#> 15 1.294746 0.04376367
#> 16 1.402500 0.04738402
#> 17 1.326687 0.04632374
#> 18 1.404885 0.04911762
#> 19 1.348227 0.04695755
#> 20 1.394158 0.04817379
#> 21 1.416410 0.05065018
#> 22 1.196489 0.03817230
#> 23 1.278346 0.04046026
#> 24 1.216022 0.03827794
#> 25 1.276201 0.03899069
#> 26 1.227860 0.03895437
#> 27 1.284844 0.04185961
#> 28 1.289335 0.04156572
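The extrapolated rows (bootstraps = Inf or 999) can be approximated by hand with a straight-line fit in 1 / B, in the spirit of Boos and Zhang (2000). The sketch below assumes an unweighted least-squares fit and uses the "basic" coverage values copied from the output above. It illustrates the idea only: it is not the package's implementation and does not reproduce the MCSEs.

```r
# Coverage of the "basic" interval at each number of bootstraps,
# copied from the example output above.
B <- c(49, 69, 89, 109, 129, 149)
coverage_basic <- c(0.975, 0.984375, 0.984375, 0.996875, 1, 1)

# Assumption: coverage is approximately linear in 1 / B.
fit <- lm(coverage_basic ~ I(1 / B))

# Extrapolate to B_target = Inf (where 1 / B = 0) and to B_target = 999.
extrapolated <- predict(fit, newdata = data.frame(B = c(Inf, 999)))
```

On these numbers the fit lands close to the values reported for the basic interval (about 1.012 at Inf and 1.010 at 999). Because a straight line is unbounded, extrapolated coverage can exceed 1, as it does here.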