Effect of data censorship on parameter estimates

I was playing around with trying to incorporate analytical and sampling uncertainty.

The made-up set of results was:

13, 50, 15, 25, 26, 45

I roughly calculated some uncertainty in the results and used the following:

[7.5-19.1], [28.7-73.4], [8.6-22.0], [14.3-36.7], [14.9-38.2], [25.8-66.1]

I was expecting the upper credible intervals for the parameters to increase. The logic being that more uncertainty in the results means more uncertainty in the parameters, which means wider credible intervals.

To my surprise, they all decreased!

Is this what you would expect to see? Have I done something wrong, or am I just not understanding the Bayes magic behind the scenes?

Thanks in advance

Parameter                        Original Results   Censored Results
GM                               26                 24
95% Upper Cred. of GM            46                 43
95th %tile                       71.7               59.6
95% Upper Cred. of 95th %tile    255                229
Overexposure risk                24.2%              17.1%
AM                               31.7               27.9
95% Upper Cred. of AM            77.5               70.1

Hello John,

Interesting example!

What I see from your results is more like a similar uncertainty, but point estimates a little lower (i.e., the distance between the estimates and the UCLs seems similar).
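A quick way to see this is to compare each UCL to its point estimate as a ratio, which is the natural scale for lognormal quantities; a sketch in Python using the figures quoted above:

```python
# (point estimate, 95% UCL) pairs from the table: original vs. with error.
pairs = {
    "GM":         ((26, 46),     (24, 43)),
    "95th %tile": ((71.7, 255),  (59.6, 229)),
    "AM":         ((31.7, 77.5), (27.9, 70.1)),
}

for name, ((est0, ucl0), (est1, ucl1)) in pairs.items():
    # Similar UCL/estimate ratios mean similar relative uncertainty.
    print(f"{name}: original UCL/estimate = {ucl0 / est0:.2f}, "
          f"with error = {ucl1 / est1:.2f}")
```

The ratios come out close in each pair (e.g. 46/26 ≈ 1.77 vs. 43/24 ≈ 1.79 for the GM), which is what "similar uncertainty, lower point estimates" looks like numerically.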

I would have intuitively expected, like you, to have more uncertainty with the intervals.

The slightly lower values I can explain: as your error CIs are symmetrical but the overall model is lognormal, the lower range has more impact than the higher.
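A minimal sketch of that effect, using the values and intervals from the first post: on the log scale, the midpoint of a symmetric interval is the geometric mean of its endpoints, and for every one of these intervals that geometric midpoint sits below the original value, so the intervals as a whole pull a lognormal fit downward.

```python
import math

# John's point values and his (roughly symmetric) error intervals.
values = [13, 50, 15, 25, 26, 45]
intervals = [(7.5, 19.1), (28.7, 73.4), (8.6, 22.0),
             (14.3, 36.7), (14.9, 38.2), (25.8, 66.1)]

for v, (lo, hi) in zip(values, intervals):
    geo_mid = math.sqrt(lo * hi)  # interval midpoint on the log scale
    print(f"value {v}: log-scale midpoint {geo_mid:.1f}")
```

For example, the interval [28.7, 73.4] around 50 has a geometric midpoint of about 45.9, noticeably below 50.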

Can you provide me with the error CVs you used to derive your CIs? We have a model with measurement error that I am curious to check with your dataset.

Jérôme

Hi Jerome,

Thanks for the response.

The effect of the CIs being symmetrical and the model being lognormal makes sense. It's an interesting problem.

I don't know how to calculate the standard deviations, and therefore also the error coefficients of variation, in this case.

I used a very crude, and probably wrong, way of getting the CIs. I'll email the Excel file with my numbers to your email (jerome.lavoue(at)umontreal.ca).

I found an example of analytical uncertainty from a lab I use, which was ±40% (to what %CI, I'm not sure yet). I then found the accuracy of the calibration equipment and timing equipment.

I then calculated the sample CIs by:

Upper sample CI = Upper weight CI / Lower volume CI
Lower sample CI = Lower weight CI / Upper volume CI

Again, this may very well be wrong, but it's meant to be a proof of concept.
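That crude propagation can be written out as a short sketch; the weight, volume, and error figures below are hypothetical placeholders (a 40% analytical error as mentioned above, an assumed 5% volume error), not the actual numbers from the Excel file:

```python
def crude_sample_ci(weight, volume, weight_rel_err, volume_rel_err):
    """Crude interval propagation for concentration = weight / volume:
    the widest ratio pairs each weight extreme with the opposite
    volume extreme, per the scheme above."""
    w_lo, w_hi = weight * (1 - weight_rel_err), weight * (1 + weight_rel_err)
    v_lo, v_hi = volume * (1 - volume_rel_err), volume * (1 + volume_rel_err)
    return (w_lo / v_hi, w_hi / v_lo)

# Hypothetical example: 40% analytical (weight) error, 5% volume error.
lo, hi = crude_sample_ci(weight=100.0, volume=1.0,
                         weight_rel_err=0.40, volume_rel_err=0.05)
print(f"[{lo:.1f}, {hi:.1f}]")  # → [57.1, 147.4]
```

Note the resulting interval is already asymmetric around the nominal 100, which ties back to the lognormal point above.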

I'm thinking of doing a masters research project (or similar) on finding out the different uncertainties involved in monitoring (analytical, equipment, user error, etc.) and seeing their effect on the parameters we use to determine compliance with legislative requirements.

It seems like hygienists put a lot of confidence in results that have a lot more uncertainty than we may first realize.

Hello John,

You can have a look at the following link: WebExpo ─ Towards a Better Interpretation of Measurements of Occupational Exposure to Chemicals in the Workplace

This is the scientific report describing some of our efforts at creating Bayesian algorithms to solve IH-related data interpretation questions. There is a section and some examples about measurement error.

We have set up a theoretical model, but haven't yet studied the implications of using this model (i.e., including a measurement error CV in the data interpretation).

Your calculation may not be the purest mathematically (I don't know the formula to mix sampling volume and analytical errors myself), but it makes sense to me for a rough idea.

Historically, measurement error was dismissed from IH data interpretation when some simulations showed that this error would be fairly negligible compared to environmental variability as long as it remained under a CV of 20%. With the advent of Bayesian stats, it has become easier to set up measurement error models, and I think it is time these results were revisited, especially since I agree with you that we might be overoptimistic as to the actual accuracy of our measurements.
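The "negligible below a 20% CV" argument can be sketched numerically. The environmental GSD of 2.5 below is an illustrative assumption, and the lognormal relation sd_log² = ln(1 + CV²) is used to put the error CV on the log scale, where the two variances add:

```python
import math

gsd_env = 2.5   # assumed environmental GSD (illustrative)
cv_err = 0.20   # measurement error CV at the historical threshold

sd_env = math.log(gsd_env)
sd_err = math.sqrt(math.log(1 + cv_err**2))  # error SD on the log scale

# Variances add on the log scale, so the GSD actually observed is:
gsd_obs = math.exp(math.sqrt(sd_env**2 + sd_err**2))
print(f"environmental GSD {gsd_env}, observed GSD {gsd_obs:.2f}")  # → 2.55
```

With these numbers a 20% error CV only moves the GSD from 2.50 to about 2.55, which is the sense in which the old simulations called it negligible; with a small GSD or a large CV the inflation is much less ignorable.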

Jérôme

Hi Jérôme,

After reading the WebExpo report (particularly pp. 33-34), I realize that my initial expectations were the exact opposite of what is going on.

Not incorporating measurement error overestimates the GSD, and therefore the 95th percentile, etc.
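That GSD inflation is easy to reproduce with a quick simulation (the GM, GSD, error CV, and sample size here are illustrative assumptions, not WebExpo's): draw exposures from a lognormal with a known environmental GSD, multiply in a lognormal measurement error, and the naive GSD computed from the noisy values comes out larger than the environmental one.

```python
import math
import random
import statistics

random.seed(42)

gsd_env = 2.0   # assumed "true" environmental GSD (illustrative)
cv_err = 0.3    # assumed measurement error CV (illustrative)
n = 20000

sd_env = math.log(gsd_env)
sd_err = math.sqrt(math.log(1 + cv_err**2))  # error SD on the log scale

# Log of each observed value = true log-exposure + multiplicative error.
obs_logs = [random.gauss(math.log(30), sd_env) + random.gauss(0, sd_err)
            for _ in range(n)]

gsd_naive = math.exp(statistics.stdev(obs_logs))
print(f"environmental GSD {gsd_env}, naive GSD from noisy data {gsd_naive:.2f}")
```

A model that accounts for the error CV would attribute part of that spread to measurement noise and recover something closer to the environmental GSD, which is why the corrected 95th percentile ends up lower.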

While including it results in a potentially more accurate estimate of the level of risk, it's less conservative.

Unless you are very confident in your estimate of the error CV, incorporating error also runs the risk of underestimating the exposure profile.

As a hygienist who would rather slightly overestimate than underestimate risk, I now wonder if it's worth doing and communicating to others.

Your thoughts?

Hello John,

Here are some "partially" educated thoughts: the few examples we ran in WebExpo indeed suggest that overestimation of risk would be the main consequence of not incorporating measurement error in our models. Moreover, these examples did not really suggest an increase in final uncertainty, probably because there was little uncertainty in the CVs themselves.

I am for now reluctant to recommend any course of action until more extensive simulations have been run, using plausible GSDs and error scenarios, to assess the actual impact on risk decisions. The protocol is already in place to do this in our group… but we are running late for lack of students interested in computational theoretical issues :slight_smile:

I also need to get back into error measurement theory…

The old papers on measurement error, based on simplifying assumptions, had suggested S&A CVs were probably not a big issue because of overwhelming environmental variability. We can probably wait a bit for a more definitive answer; at least now we have the means to answer the question without any simplification.

As an aside (another topic where we are a bit stalled), there is also the question of what the real error CV is for an 8-h TWA derived after a hygienist decided that a 2-h measurement (with presumably known S&A CV) was "representative"…

Hello colleagues.

Here (link) is the result of some exploration following this discussion.

Jérôme