Sample order changing confidence limits

Mikeyjj · 7 December 2023 20:29

It looks like changing the order of the data can affect the confidence intervals.

For example, if I use a 90% credible interval, for the following ordered data I get different UTLs:

19.4, 28.9, <5.5. UTL: 940

28.9, 19.4, <5.5. UTL: 1080

That, in turn, can slightly affect the overexposure risk.

Why is that and is there a certain order the data should be entered? Thanks!

jerome.lavoue · 8 December 2023 02:03

Hello Michael,

Good spotting !

What you are seeing is “Markov chain” variability.

Because Bayesian inference in expostats relies on simulation (the model has no closed-form solution), the results vary slightly each time the calculation is run.

This variation will always be much smaller than the uncertainty associated with the parameters themselves.

In your case the difference is noticeable because you are in a very uncertain situation: only 3 values, one of them censored, and you are looking at a very difficult quantity such as P95. It is noticeable, but still very small when you consider the entire confidence interval.
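To illustrate the kind of run-to-run variation described here, below is a minimal Python sketch. It is not the expostats model: it only estimates the 95th percentile of a standard normal distribution from simulated draws, and the function name `simulated_p95` and the number of draws are assumptions chosen for illustration.

```python
import random

def simulated_p95(seed, n_draws=5000):
    """Estimate the 95th percentile of a standard normal from random draws."""
    rng = random.Random(seed)  # the seed fixes the whole sequence of draws
    draws = sorted(rng.gauss(0.0, 1.0) for _ in range(n_draws))
    return draws[(95 * n_draws) // 100]

# The exact value is about 1.645. Each seed gives a slightly different
# estimate, while repeating a seed reproduces the same estimate.
for seed in (1, 2, 3):
    print(seed, round(simulated_p95(seed), 3))
```

The spread between seeds shrinks as `n_draws` grows, which is why this simulation noise is normally small compared with the uncertainty on the quantity itself.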

Now, on why you only see this when changing the order, and not when you launch another run of expostats: you exposed our “trick”. Because people are used to traditional fixed equations and not to this kind of variability, we created a pre-analysis algorithm that ensures the “random seed” is the same whenever the same data are entered. It is very simple: it just ties the seed value to the values of the data pasted together. So by entering the data in a different order, you saw what a different random seed would yield.
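As a rough illustration of the idea, here is a minimal Python sketch. It is not the actual expostats code: the helper name `seed_from_data`, the `|` separator and the use of a SHA-256 hash are all assumptions. The only thing taken from the description above is that the seed is tied to the data values pasted together.

```python
import hashlib

def seed_from_data(values):
    """Hypothetical helper: derive a seed from the entered values, in order."""
    pasted = "|".join(values)  # the pasted string depends on the order
    digest = hashlib.sha256(pasted.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # first 8 bytes as an integer

order_a = ["19.4", "28.9", "<5.5"]
order_b = ["28.9", "19.4", "<5.5"]

# Same data in the same order: the seed is reproduced.
print(seed_from_data(order_a) == seed_from_data(order_a))

# Same data in a different order: the pasted string, and so the seed, changes.
print(seed_from_data(order_a) == seed_from_data(order_b))
```

With a scheme like this, re-entering identical data always reruns the same simulation, while reordering the values behaves like picking a new seed.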

So, two takeaways from my answer:
1. The order has no importance, but changing it will make the user experience “Markov chain” variability.
2. This variability should always be very small, but it can be noticeable when looking at very uncertain quantities such as the 95% UCL on the 95th percentile.

Does that make sense ?

Mikeyjj · 8 December 2023 04:50

Hi Jerome,

I can’t take credit for this catch haha, one of my sharp colleagues noticed while we were going through examples.

I understand most of what you’re saying; I think my only gap is why a random seed is necessary for the simulation.

Thanks!
Mike

jerome.lavoue · 15 December 2023 23:41

Hello,
I am no random simulation specialist either, but here is my attempt: to obtain the posterior sample in Bayesian analysis, the various algorithms used (e.g. Gibbs sampling) all rely on random number generation at their core. Generating random numbers is a science in itself, and the random numbers we obtain are not truly random: their generation depends on a specific number used to initialize the process, the random seed. Using the same random seed will lead to generating the same series of “random” numbers.

So in expostats, the same data (in the same order) will use the same random seed, and therefore lead to exactly the same result.
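To show why a starting number is needed at all, here is a toy pseudo-random generator in Python. It is a linear congruential generator, shown only as an illustration: it is not what expostats or Gibbs samplers use in practice, and the function name `lcg` is an assumption (the constants are a commonly cited textbook set). Each value is computed from the previous one, so the first value needs a starting point, and that starting point is the seed.

```python
def lcg(seed, n):
    """Toy linear congruential generator returning n values in [0, 1)."""
    a, c, m = 1664525, 1013904223, 2**32
    state = seed
    out = []
    for _ in range(n):
        state = (a * state + c) % m  # next state depends only on the previous one
        out.append(state / m)        # scale the integer state to [0, 1)
    return out

# Same seed: identical series of "random" numbers.
print(lcg(42, 5) == lcg(42, 5))

# Different seed: a different series.
print(lcg(42, 5) == lcg(43, 5))
```

Because the whole series is a deterministic function of the seed, fixing the seed fixes every downstream simulated result.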

See the Wikipedia page: Random seed.