I haven’t studied bootstrapping in a long time, so take this with a grain of salt.
If you are talking about just the simple AM of the measurements as the estimator, then yes, the CI would asymptotically approach [min-max] of the measurements as you push the confidence level toward 100%.
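To illustrate (a minimal sketch on made-up data, not taken from any particular source): with a simple percentile bootstrap of the AM, widening the confidence level just pushes the limits toward the most extreme bootstrap means, and those can never fall outside the observed data range:

set.seed(1)
x = rlnorm(7, meanlog = 0, sdlog = log(3))   # hypothetical n = 7 sample, GSD = 3
boot.means = replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975))        # 95% percentile bootstrap CI for the AM
quantile(boot.means, c(0.0005, 0.9995))      # 99.9% CI: limits move toward the extremes
range(boot.means)                            # bootstrap means never go outside...
range(x)                                     # ...the range of the data themselves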
However, if you use the Land estimator, for example, you might get a somewhat more complicated interval.
My recollection of what I read (back in the early 2000s, when I was looking for alternatives to traditional formulas) is that bootstrapping provides a sound approach in many situations, and I had not read any mention of the issue you describe as a limitation of the approach: after all, [min-max] of your data is pretty wide for a CI around the AM!
I used bootstrapping myself in a couple of papers when the quantity of interest didn’t lend itself to theoretical confidence interval formulas.
As to why there is, e.g., no tool using it in IH: it requires random number generation, we have existing formulas for most of our questions, and, more recently, we have Bayesian stats.
I have occasionally found Land’s estimate of the AM UCL to be well above the max value. I think this may be because of small sample sizes and elevated GSDs (even if all the values are less than, say, 20% of the OEL).
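For reference (from memory, so double-check against a proper source): Land’s one-sided UCL has the form exp( ybar + s^2/2 + s*H/sqrt(n-1) ), where ybar and s are the mean and SD of the log-transformed values and H is Land’s H-statistic, which itself grows with s. With small n and large s (high GSD), the s*H/sqrt(n-1) term blows up, which is how the UCL can end up above the observed max.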
As you alluded to, this is far less often an issue in Bayesian stats, as the priors tend to prevent ridiculous / unrealistic estimates.
But even if it’s potentially redundant, I think it’s interesting…
Even without RNG, there could be a ‘rule of thumb’ in the logic:
The population AM is going to be less than the sampled max value unless your sample falls entirely below the population AM. And you can estimate how likely that is to happen from the estimated percentile of your estimated AM (using an estimated GSD) and your sample size:
nsim = 1000000                         # number of simulated lognormal values
mu = 0                                 # log-scale mean (GM = 1)
sig = 1.1                              # sub in your ln(GSD) here; ln(3) is about 1.1
mean = exp(mu + sig^2/2)               # arithmetic mean of the lognormal
y = rlnorm(nsim, mu, sig)
mean.percentile = mean( y < mean )     # proportion of values below the AM (about 71%)
sample.size = 7                        # enter sample size here
(Prob = mean.percentile^sample.size)   # probability that all n values fall below the AM
# about 0.09 with these inputs
In this case (GSD = 3, sample size = 7), you could say with roughly 90% confidence that the AM is less than the max value.
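As a side note (my own shortcut, not part of the code above): the simulation isn’t strictly needed for this rule of thumb, because for a lognormal the proportion of values below the AM is pnorm(log(GSD)/2), regardless of the GM. So the whole thing reduces to a closed form, which also answers the "even without RNG" point above:

gsd = 3
n = 7
p.below = pnorm(log(gsd)/2)   # proportion below the AM, about 0.71 (matches mean.percentile)
p.below^n                     # about 0.09, probability all n values fall below the AM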
In the same way, I think "Prob" gives an estimate of how likely a bootstrapped estimate is to be completely wrong.
Again, this is possibly all redundant. But I think it’s interesting, and it may come in handy in certain circumstances where other CI estimates are unrealistic.
nsim = 1000000
mu = 0
sig = 1.1                              # sub in your ln(GSD) here
mean = exp(mu + sig^2/2)               # arithmetic mean of the lognormal
y = rlnorm(nsim, mu, sig)
mean.percentile = mean( y < mean )     # proportion of values below the mean, about 71%
sample.size = 7                        # enter sample size here
(Prob = mean.percentile^sample.size)   # probability that all 7 values fall below the mean, about 9%
dbinom(7, 7, mean.percentile)          # equivalent calculation
dbinom(0, 7, 1 - mean.percentile)      # equivalent calculation
What this means to me is: "in the scenario evaluated, the chance that no value from an n = 7 sample would be greater than the true mean is 9%".
I must admit I am unsure whether this really implies: "in such a scenario, the chances that the true mean would be greater than the max value are 9%".
This feels like mistakenly equating P(A|B) and P(B|A). But I am unsure and will keep racking my brain.
In any case, calculating CIs using several approaches (frequentist, bootstrap, Bayesian) certainly seems a sound strategy when one feels something might be off.
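For what it’s worth, here is a toy comparison on made-up data (the numbers are illustrative only); the Bayesian interval from whichever tool you already use would be the third point of comparison:

set.seed(2)
x = rlnorm(7, meanlog = 0, sdlog = log(3))   # hypothetical skewed n = 7 sample
t.test(x)$conf.int                           # classic t-based 95% CI for the AM
boot.means = replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975))        # 95% percentile bootstrap CI for the AM

When these disagree badly (small n, high GSD), that alone is a useful warning that something might be off.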