Compliance and variance analysis assumptions

Hello all (or Jérôme haha) (and @PeterK)

Rappaport and Kupper’s book “Quantitative Exposure Assessment” talks about between-/within-worker variance and compliance testing, based on either the exceedance fraction or the arithmetic mean.

To estimate between- and within-worker variance you need multiple samples on multiple people.

They also state that both compliance tests assume uncorrelated data, i.e. one sample per person.

Obviously it would be desirable to do both at the same time, which raises two questions I was hoping either of you could help with:

  1. How important is this assumption of independent data?
  2. Does this assumption apply to Bayesian methods, or is it a frequentist thing?

(Reference: pg 90, last paragraph)

Thanks for the recommendation @PeterK. I got it from your lecture and it’s a great book.

Hello John,

The easy part first: no difference between Bayesian and frequentist approaches for this question.

More difficult: is this important…

In theory it is: since “worker” is likely to be, at least to some extent, a determinant of exposure levels, two repeats on the same worker will tend to be closer to each other than to measurements on other workers.

Then there is “in practice”: let’s focus on group exceedance. You can estimate it two ways: with the simple model (one lognormal distribution, Tool 1), which assumes independent samples, or with the more complex model (ANOVA, Tool 2), which also assumes independent samples, but only within workers.
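For reference, here is how I would write the two models on the log scale (my own notation, so possibly not the exact WEBEXPO parameterisation), with $x_{ij}$ the $j$-th measurement on worker $i$:

$$
\text{Simple model: } \ln x_{ij} \sim N(\mu,\ \sigma^2)
$$
$$
\text{ANOVA model: } \ln x_{ij} = \mu + b_i + \varepsilon_{ij}, \qquad b_i \sim N(0,\ \sigma_B^2), \quad \varepsilon_{ij} \sim N(0,\ \sigma_W^2)
$$
$$
\text{within-worker correlation: } \rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2}
$$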

Here’s one of the WEBEXPO examples (example 5):

worker-1 31
worker-1 60.1
worker-1 133
worker-1 27.1
worker-2 61.1
worker-2 5.27
worker-2 30.4
worker-2 31.7
worker-3 20.5
worker-3 16.5
worker-3 15.5
worker-3 71.5

Simplified analysis (Tool 1): GM = 31, GSD = 2.4
More complex analysis (Tool 2, group tab): GM = 31, GSD = 2.6

In that case the complex model estimates quite a low within-worker correlation (0.13) and we see no big difference.
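For anyone who wants to poke at the numbers, here is a rough frequentist sketch of the same two summaries on that data: classical sample statistics for the pooled lognormal fit, and a method-of-moments one-way ANOVA for the variance split. This is not what the Expostats tools actually do (they are Bayesian), so the GM/GSD will only come out close to the values above, and with only three workers the crude estimate of the within-worker correlation is very unstable and won’t match Tool 2’s 0.13.

```python
import math
import statistics

# WEBEXPO example 5 data, grouped by worker
data = {
    "worker-1": [31, 60.1, 133, 27.1],
    "worker-2": [61.1, 5.27, 30.4, 31.7],
    "worker-3": [20.5, 16.5, 15.5, 71.5],
}

# Pooled lognormal fit (what the simple model estimates)
logs = [math.log(x) for xs in data.values() for x in xs]
gm = math.exp(statistics.mean(logs))    # geometric mean, ~31
gsd = math.exp(statistics.stdev(logs))  # geometric SD, ~2.3

# Method-of-moments one-way random-effects ANOVA on the log scale
k = len(data)                           # number of workers
m = 4                                   # measurements per worker
grand_mean = statistics.mean(logs)
worker_means = [statistics.mean([math.log(x) for x in xs]) for xs in data.values()]

ss_between = m * sum((wm - grand_mean) ** 2 for wm in worker_means)
ss_within = sum(
    (math.log(x) - wm) ** 2
    for wm, xs in zip(worker_means, data.values())
    for x in xs
)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (k * (m - 1))

var_within = ms_within
var_between = max((ms_between - ms_within) / m, 0.0)
rho = var_between / (var_between + var_within)  # within-worker correlation

print(f"GM = {gm:.1f}, GSD = {gsd:.2f}, rho = {rho:.2f}")
```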

There is also the consideration of sample size: the more complex model is more costly because it adds one parameter to estimate. So with a small sample size, its results will be more heavily influenced by the prior, which might offset its theoretical advantage. In the frequentist world there is no prior, but the estimates will be more variable / unstable.

If we take a much bigger sample (10 workers × 10 measurements per worker and a high correlation of 0.66, example 4 in the WEBEXPO report), we get the following:

Simple model: GM = 29, GSD = 2.5
Complex model: GM = 29, GSD = 2.6

This seems close too, but (not shown above) uncertainty is appreciably higher with the more complex model: the 95th percentile and its UCL are 135 and 175 for the simpler model versus 135 and 300 for the more complex one, so there is definitely a potential impact on the decision. This is the main theoretical issue with not taking correlation into account: it causes underestimation of variance and uncertainty.
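To see why the point estimates agree while the UCLs diverge: the point estimate of the 95th percentile of a lognormal distribution depends only on GM and GSD,

$$
\hat{P}_{95} = \mathrm{GM} \times \mathrm{GSD}^{\,1.645}
$$

and plugging in the rounded values above gives roughly 130 to 140 for both models, in the same ballpark as the 135 quoted. The correlation mostly shows up in how uncertain that estimate is, hence the very different UCLs.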

Bottom line (in my opinion): try both, since Expostats allows it. If the conclusions are dramatically different, maybe lean towards the simpler model if you have few samples, and towards the more complex / theoretically sounder model if you have a heftier sample size.

Did I just increase the general level of confusion? :slight_smile:


No extra confusion. That makes sense!

  1. A more complex model needs more data
  2. Don’t use more complex models than necessary
  3. Unaccounted-for correlated data can lead to underestimation of variance
  4. Compare the models to see if there is a substantial difference

A follow up question to clarify, if I may:

Say you had 6 samples collected on 5 people (1 worker sampled twice, so some theoretical correlation is introduced).

Is it fair to say that sample size would play a bigger role in the quality of the estimates (e.g. group mean) than the effect of the correlated data?

Or put another way, if you had 5 samples from 5 people, and you had the chance to collect 1 more sample on an already sampled worker, should you?

My intuition says: yes! The correlation is real, but considering how low-quality OH data sets are (small sample sizes etc.), it’s not the primary concern.

Is that fair?

I think yes. I remember my failed answer to you in Melbourne :slight_smile: more samples are always good!

The idea of correlation is that with 2 presumably correlated samples (and recall the median correlation observed by Kromhout et al. in their dataset was 0.2, fairly low), the information you get is a little less than you think, and the simple model will not reflect that.
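One way to put a rough number on “a little less than you think” (this is just the standard cluster-sampling approximation, not something from the book): $m$ correlated repeats on one worker carry about the information of

$$
n_\text{eff} = \frac{m}{1 + (m-1)\rho}
$$

independent measurements. With the 6-on-5 design above, the two repeats at $\rho = 0.2$ are worth about $2 / 1.2 \approx 1.7$ independent samples, so the sixth measurement still adds real information, just a bit less than a sample on a new worker would.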


I see, I see!

Thanks for helping me get that straight in my mind.

Thanks Jérôme.
