Reference range: Which statistical intervals to use?

Wei Liu¹, Frank Bretz², Mario Cortina-Borja³.

Abstract

Reference ranges, which are data-based intervals aiming to contain a pre-specified large proportion of the population values, are powerful tools to analyse observations in clinical laboratories. Their main point is to classify any future observations from the population which fall outside them as atypical and thus may warrant further investigation. As a reference range is constructed from a random sample from the population, the event 'a reference range contains (100 P)% of the population' is also random. Hence, all we can hope for is that such event has a large occurrence probability. In this paper we argue that some intervals, including the P prediction interval, are not suitable as reference ranges since there is a substantial probability that these intervals contain less than (100 P)% of the population, especially when the sample size is large. In contrast, a (P,γ) tolerance interval is designed to contain (100 P)% of the population with a pre-specified large confidence γ so it is eminently adequate as a reference range. An example based on real data illustrates the paper's key points.

Keywords: Nonparametric prediction interval; nonparametric tolerance interval; prediction interval; reference range; tolerance interval

Year: 2020 PMID: 33054684 PMCID: PMC8008401 DOI: 10.1177/0962280220961793

Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021

1 Introduction

The ‘Choose Wisely’ campaign was developed in the United States in 2012 by the American Board of Internal Medicine Foundation and was launched in the United Kingdom in 2016 by the Academy of Medical Royal Colleges. It aims to encourage a dialogue between clinicians and patients regarding the risk and benefits of interventions, and the practice of evidence-based treatment regimens.[1] As described recently,[2] this conversation often refers to the patient’s observed values of relevant clinical markers. Since the clinical laboratory provides comparator intervals to assist the clinician in determining a context for an individual value, a natural question from the patient is ‘Are my test results typical with respect to a healthy population?’. Although such assessment values are often referred to as the test’s normal range, this terminology should be discouraged as it implies that such a result has a binary ‘normal or abnormal’ quality which may lead to an arbitrary dichotomous interpretation of the patient’s health status.[2] Instead, the terms ‘reference limits’ or ‘reference range’ should be used in this context. Reference ranges are powerful tools in laboratory medicine to aid decision making[3] and their use has become increasingly prevalent in clinical practice. Searching in the Web of Science engine at the time of writing for articles published between 1999 and 2019 with ‘reference range’ as a topic, we found 5431 articles of which 469 appeared in 2019, in contrast to 268 articles that appeared in 2009. These articles have collectively been cited by 91,034 publications of which 11,270 appeared in 2019, 2.4 times more than the number of citing publications 10 years earlier. Apart from the important individual overtones for patients, incorrectly estimating the reference range of a sensitive clinical marker of physiological function has enormous public health implications. 
For example, underestimating the upper limit of a reference range would mean classifying a large number of people as diseased, thus affecting the doses of medication prescribed.[4] Construction of appropriate reference ranges is therefore crucial in laboratory medicine practice. Well-known general references[3,5-9] and a case for teaching tolerance intervals in introductory statistics courses[10] are available. It is common practice to assume that clinical markers related to a disease follow a normal distribution among healthy subjects. If there is evidence against this assumption we could fit models to specify optimal transformations to normality, e.g. logarithmic or square root, though this might still result in biased estimates of the upper or lower limits of the reference range depending on whether the distribution is right or left skewed.[9] Alternatively we could construct reference ranges under specific parametric assumptions other than normality, or follow a nonparametric procedure. The focus of this paper is on the construction of parametric and nonparametric reference ranges for a selected reference population based on a random sample from the population. The problems related to selecting a reference population have been discussed elsewhere.[6] A P (commonly set to 95%) reference range is a data-based interval that purports to include (100 P)% of the values in the population of interest. Its main purpose is to classify any future observation from the population that falls outside the interval as atypical, so that it may warrant further investigation. Let $F$ denote the continuous cumulative distribution function (cdf) of the population, and $\xi_p$ denote the $100p$-th percentile of the population for a given $p \in (0, 1)$. The interval $[\xi_{(1-P)/2}, \xi_{(1+P)/2}]$ contains exactly (100 P)% of the population and would be used as the P reference range had F been known. Since $F$ is usually not known completely in real problems, the reference range has to be estimated from a random sample $\mathbf{Y} = (Y_1, \ldots, Y_n)$ from the population, i.e. $Y_1, \ldots, Y_n$ are independent random variables identically distributed with cdf $F$. Note that we follow the notation in Krishnamoorthy and Mathew,[11] denoting the interval’s content level by P and its confidence level by γ. When $F$ is assumed to be a normal distribution with unknown mean μ and unknown variance $\sigma^2$, we have $\xi_p = \mu + z_p \sigma$ where $z_p$ denotes the $100p$-th percentile of the standard normal distribution N(0, 1). When $F$ is not assumed to have a parametric form, nonparametric (or distribution free) methods can be used. In this paper, both normal-based and nonparametric methods are considered. As a reference range depends on the random sample, the proportion of the population contained in it is also random. Thus the question is ‘which statistical intervals should be used as reference ranges?’ In this article we argue that a P prediction interval, which continues to be used as a reference range in the literature,[6,12,13] is not fit for this purpose since there is a substantial probability (due to the randomness in the sample) that the prediction interval contains less than (100 P)% of the population. We then argue that a (P, γ) tolerance interval, with confidence γ set at a pre-specified large value, is valid as a reference range since it guarantees, with large confidence γ with respect to the randomness in the sample, that it contains (100 P)% of the population values. Several authors have proposed to use tolerance intervals as reference ranges.[5,14,15] With almost 80 years of research on tolerance intervals or regions, various parametric and nonparametric procedures are readily available for use as reference ranges. The next two sections discuss reference ranges based on the normal distribution, and nonparametric reference ranges. They are followed by a section considering a numerical example, and a final one with concluding remarks.

2 Reference ranges based on the normal distribution

2.1 Reference ranges currently in use

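Based on the sample $\mathbf{Y} = (Y_1, \ldots, Y_n)$, one reference range that has been widely used is the P prediction interval for a future observation Y from a population with distribution $N(\mu, \sigma^2)$[6,12,13]
$$RR_1 = [\bar{Y} - c_1 S,\; \bar{Y} + c_1 S]$$
where $\bar{Y} = \sum_{i=1}^n Y_i / n$ is the sample mean, $S^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 / (n-1)$ is the sample variance, $t_{p,\nu}$ is the $100p$-th percentile of the t distribution with ν degrees of freedom (df), $\nu = n - 1$, and $c_1 = t_{(1+P)/2,\,n-1} \sqrt{1 + 1/n}$. A relevant guide on prediction intervals for reference regions is available,[7] and we note that the prediction interval RR1 has also been called the P expectation tolerance interval.[16,17] Other reference ranges are based on estimators of the percentiles $\xi_{(1-P)/2}$ and $\xi_{(1+P)/2}$ and include
$$RR_i = [\bar{Y} - c_i S,\; \bar{Y} + c_i S], \quad i = 2, 3, 4$$
where, writing $a_n = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$ so that $E(S) = a_n \sigma$, $c_2 = z_{(1+P)/2}$, $c_3 = z_{(1+P)/2}/a_n$, and $c_4 = z_{(1+P)/2}\, a_n$.[9,12] Now $\bar{Y} + c_2 S$ is a naïve estimator of $\xi_{(1+P)/2}$, $\bar{Y} + c_3 S$ has the minimum variance among unbiased estimators of $\xi_{(1+P)/2}$, and $\bar{Y} + c_4 S$ has minimum mean squared error among estimators of the form $\bar{Y} + cS$ where c is a constant.[12] One immediate question is whether these reference ranges $RR_i$ contain (100 P)% of the values in the population, which is the objective of a reference range. Note that the proportion of the population within the reference range $RR_i$ is given by
$$K_i = P\{\bar{Y} - c_i S \le Y \le \bar{Y} + c_i S \mid \mathbf{Y}\} = \Phi\!\left(\frac{\bar{Y} + c_i S - \mu}{\sigma}\right) - \Phi\!\left(\frac{\bar{Y} - c_i S - \mu}{\sigma}\right) \qquad (1)$$
where $Y \sim N(\mu, \sigma^2)$ and is independent of the sample $\mathbf{Y}$, $P\{\cdot \mid \mathbf{Y}\}$ is the conditional probability of Y conditioning on the sample $\mathbf{Y}$, and $\Phi$ is the cdf of a N(0, 1) random variable. Hence the objective of a reference range is to have $K_i \ge P$. It is clear from equation (1) that K is a random variable depending on the random sample via $\bar{Y}$ and S, so whether ‘$K_i \ge P$’ holds is also random. As a result, all we can hope is that the event $\{K_i \ge P\}$ has a large probability of occurrence. We note from equation (1) that K increases as c increases. Hence, among the $RR_i$ ($i = 1, \ldots, 4$) given above, the one that has the largest c contains the largest proportion of the population. Figure 1 compares the $c_i$ for given sample sizes and P = 0.95. Clearly, c1 is the largest among the $c_i$ ($i = 1, \ldots, 4$), and so RR1 contains the largest proportion of the population among the four reference ranges. We therefore investigate whether or not ‘$K_1 \ge P$’ has a large probability of occurring, as required for RR1 to be used as a reference range.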

Figure 1.

The value of c as a function of the sample size n.

First, note that
$$E(K_1) = E\big[P\{\bar{Y} - c_1 S \le Y \le \bar{Y} + c_1 S \mid \mathbf{Y}\}\big] = P\{\bar{Y} - c_1 S \le Y \le \bar{Y} + c_1 S\} \qquad (2)$$
$$= P\left\{\frac{|Y - \bar{Y}|}{S\sqrt{1 + 1/n}} \le t_{(1+P)/2,\,n-1}\right\} = P \qquad (3)$$
where the equality in equation (2) results directly from the well-known conditional expectation formula,[18] and the equality in equation (3) follows from the fact that $(Y - \bar{Y})/(\sigma\sqrt{1 + 1/n})$ is distributed N(0, 1) and is independent of $S/\sigma$ which has the distribution $\sqrt{\chi^2_{n-1}/(n-1)}$, with $\chi^2_{n-1}$ denoting a chi-squared random variable with $n-1$ df. That the probability in equation (3) is equal to P qualifies RR1 as a P prediction interval for a future observation Y from the same population from which the sample is drawn. Second, the distribution of K1 can be studied by simulating a large number of independent realisations of K1. Note from equation (1) that
$$K_1 = \Phi\!\left(\frac{Z}{\sqrt{n}} + c_1 \sqrt{\frac{\chi^2_{n-1}}{n-1}}\right) - \Phi\!\left(\frac{Z}{\sqrt{n}} - c_1 \sqrt{\frac{\chi^2_{n-1}}{n-1}}\right) \qquad (4)$$
where $Z = \sqrt{n}(\bar{Y} - \mu)/\sigma$ is a standard N(0, 1) random variable, $\chi^2_{n-1} = (n-1)S^2/\sigma^2$ is a chi-squared random variable with $n-1$ df, and Z and $\chi^2_{n-1}$ are statistically independent. From equation (4), K1 can easily be simulated. For given P and n, the replicas of K1 are simulated, based on which the probability density function (pdf) of K1 can be accurately approximated. In Figure 2, the kernel density estimate[19] of the pdf of K1 based on the simulated K1 values is plotted (by using the R package KernSmooth)[20] for four sample sizes n, the three largest being 50, 100 and 150. Based on the simulated values, we approximated $P\{K_1 < P\}$ by the proportion of the values that are less than P; the results are 0.385, 0.429, 0.450 and 0.459 for the four sample sizes in increasing order, the last three corresponding to n = 50, 100 and 150, respectively. Note that $P\{K_1 < P\}$ is given in Figure 2 by the area under the pdf to the left of the vertical line at P.
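The simulation just described can be sketched in a few lines. The following is a minimal Python illustration (the paper itself uses R), not the authors' code. It assumes μ = 0 and σ = 1 without loss of generality, draws K1 as Φ(Z/√n + c1·s) − Φ(Z/√n − c1·s) with s² distributed as a chi-squared variable with n − 1 df divided by n − 1, and obtains the t percentile with a simple numerical routine. The helper names `t_cdf`, `t_quantile` and `simulate_k1`, the seed and the number of replicas are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def t_cdf(x, df, steps=2000):
    """Student t cdf at x >= 0, integrating the density with Simpson's rule."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile of the t distribution for p > 0.5, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_k1(n, P, reps, rng):
    """Simulate the content K1 of the P prediction interval (mu=0, sigma=1)."""
    c1 = t_quantile((1 + P) / 2, n - 1) * math.sqrt(1 + 1 / n)
    values = []
    for _ in range(reps):
        a = rng.gauss(0, 1) / math.sqrt(n)            # sample mean
        chi2 = rng.gammavariate((n - 1) / 2, 2)       # chi-squared, n-1 df
        s = math.sqrt(chi2 / (n - 1))                 # sample standard deviation
        values.append(PHI(a + c1 * s) - PHI(a - c1 * s))
    return values


rng = random.Random(2020)   # arbitrary seed, for reproducibility only
P, n = 0.95, 50
k1 = simulate_k1(n, P, 20000, rng)
mean_k1 = sum(k1) / len(k1)
prob_below = sum(k < P for k in k1) / len(k1)
print(f"mean of K1: {mean_k1:.4f}")
print(f"estimated P(K1 < P): {prob_below:.3f}")
```

With n = 50 the estimated probability should fall close to the 0.429 quoted in the text, up to Monte Carlo error, while the mean of the simulated K1 values should be close to P.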

Figure 2.

The pdf’s of K1 for various sample sizes n.

Given equation (3), it can be shown by the delta method that $\sqrt{n}(K_1 - P)$ tends, as $n \to \infty$, to a normal distribution with zero mean and finite variance. This is supported by Figure 2, which shows that the pdf of K1 becomes more symmetric, centered at P and with decreasing variance, as n increases. Note that n = 150 is not yet large enough for the pdf of K1 to be close to a normal pdf. From a brief simulation study we found that in order to achieve this satisfactorily the sample size must be very large indeed. Even for n = 10,000 the skewness and kurtosis values suggest a significant lack of normality. The coefficient of variation of K1 is 0.014 for n = 150, falls below 0.01 as n grows further, and is around 0.002 for n = 10,000. This asymptotic normal distribution implies that $P\{K_1 < P\} \to 1/2$ as $n \to \infty$, that is, the probability of the reference range RR1 containing less than (100 P)% of the population is about 1/2 when the sample size is large. The argument above means that, due to the sample’s randomness, using RR1 as the reference range implies that there is a substantial probability, close to 50% when n is sufficiently large, that the reference range does not fulfill its objective of containing (100 P)% of the population. The property $E(K_1) = P$ in equation (3) has the following interpretation. A large number of individuals, I say, collect independent samples, one each, and compute the corresponding reference ranges RR1 based on their own samples. Then the proportions of the population contained in these I reference ranges, $K_1^{(1)}, \ldots, K_1^{(I)}$, are random values from the interval (0, 1) and form a random sample from the distribution of K1, although some values could be very close to 0 and some values could be very close to 1. The property $E(K_1) = P$ merely says that the average $\sum_{i=1}^{I} K_1^{(i)} / I$ is close to P when I is large. 
Hence, the proportion of the population that one particular reference range contains could be very small, but this is compensated by some very large proportions of the population that some other individuals’ reference ranges might contain, in the sense that the average of the I proportions is close to P. This potential for compensation from other reference ranges is unlikely to offer any comfort to someone who knows that their own reference range has a substantial probability of containing less than (100 P)% of the population. It is clearly desirable to have a high confidence that our own reference range contains (100 P)% of the population. Hence RR1 falls short on this ground and should not be used as a reference range. The justification for using prediction intervals as reference ranges[5,13] is that exactly (100 P)% of the future observations from the population should fall within the prediction intervals. It is clear from the line of reasoning stated in the previous paragraphs that this argument is not valid. The inappropriateness of prediction regions when used as reference regions has also been noted in Sections 2.2 and 3.3 of Dong and Mathew.[15] In the next section we discuss tolerance intervals since several authors[5,14,15] have proposed to use them as reference ranges. For example it has been stated that ‘it would seem that the statistical tolerance interval is what clinical chemists have in mind when they speak of a reference range derived from a sample of individuals representing some defined population’[5] (p. 55).

2.2 Tolerance intervals

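A (P, γ) tolerance interval, with content level P, is a data-based random interval constructed to contain (100 P)% of the population with a pre-specified (large) confidence level γ with respect to the randomness in the sample.[11,16,17,21-23] Specifically, a (P, γ) tolerance interval is given by[11]
$$RR_5 = [\bar{Y} - c_5 S,\; \bar{Y} + c_5 S]$$
where the critical constant $c_5$ is chosen such that
$$P\left\{\Phi\!\left(\frac{Z}{\sqrt{n}} + c_5 \sqrt{\frac{\chi^2_{n-1}}{n-1}}\right) - \Phi\!\left(\frac{Z}{\sqrt{n}} - c_5 \sqrt{\frac{\chi^2_{n-1}}{n-1}}\right) \ge P\right\} = \gamma \qquad (5)$$
where the random variables Z (standard normal) and $\chi^2_{n-1}$ (chi-squared with $n-1$ df) in equation (5) are the same as those in equation (4). The R package tolerance[24,25] can be used to compute c5. Figure 3 compares c1 and c5 for given sample sizes with P = 0.95 and several values of γ. It is clear from Figure 3 that c5 is considerably larger than c1, as is required for RR5 to contain (100 P)% of the population with a pre-specified large confidence γ with respect to the randomness in the sample. Also, as expected, c5 increases with γ as seen in Figure 3.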
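The defining condition for c5 can be illustrated with a Monte Carlo sketch in Python. This is not the algorithm used by the R package tolerance; it is an assumption-laden illustration of the definition. For each simulated pair (Z, chi-squared), it finds by bisection the smallest half-width r for which Φ(a + r) − Φ(a − r) ≥ P, where a = Z/√n, converts it to the smallest admissible factor r/s, and then takes the empirical γ-quantile of these factors. The names `half_width_for_content` and `tolerance_factor_mc`, the seed and the number of replicas are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def half_width_for_content(a, P):
    """Smallest r >= 0 such that PHI(a + r) - PHI(a - r) >= P (bisection)."""
    lo, hi = 0.0, abs(a) + 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if PHI(a + mid) - PHI(a - mid) < P:
            lo = mid
        else:
            hi = mid
    return hi


def tolerance_factor_mc(n, P, gamma, reps, rng):
    """Monte Carlo estimate of the factor c with P{content >= P} = gamma."""
    needed = []
    for _ in range(reps):
        a = rng.gauss(0, 1) / math.sqrt(n)                          # Z / sqrt(n)
        s = math.sqrt(rng.gammavariate((n - 1) / 2, 2) / (n - 1))   # S / sigma
        # The interval mean +/- c*s has content >= P exactly when c >= r / s.
        needed.append(half_width_for_content(a, P) / s)
    needed.sort()
    return needed[math.ceil(gamma * reps) - 1]   # empirical gamma-quantile


rng = random.Random(1)   # arbitrary seed
c5 = tolerance_factor_mc(n=210, P=0.95, gamma=0.95, reps=5000, rng=rng)
print(f"Monte Carlo estimate of c5 for n = 210: {c5:.3f}")
```

For routine use an exact or well-tested routine is preferable; the sketch is only meant to make the roles of P and γ in equation (5) concrete.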

Figure 3.

The values of c1 and c5 for various sample sizes n.

2.3 Equal-tailed tolerance intervals

The (P, γ) tolerance interval RR5 contains (100 P)% of the population with a pre-specified (large) confidence γ with respect to the randomness in the sample. But the proportion P of the population contained in RR5 may not be the central part of the population. If we insist that a reference range should contain the central (100 P)% of the population, i.e. $[\xi_{(1-P)/2}, \xi_{(1+P)/2}] = [\mu - z_{(1+P)/2}\sigma,\; \mu + z_{(1+P)/2}\sigma]$, with pre-specified confidence γ with respect to the randomness in the sample, then we should use the following interval as the reference range
$$RR_6 = [\bar{Y} - c_6 S,\; \bar{Y} + c_6 S]$$
where the critical constant $c_6$ is chosen such that
$$P\{\bar{Y} - c_6 S \le \mu - z_{(1+P)/2}\sigma \ \text{ and } \ \mu + z_{(1+P)/2}\sigma \le \bar{Y} + c_6 S\} = \gamma.$$
This interval is called the equal-tailed or central tolerance interval.[15] A formula for values of c6 is available[11] and c6 can be computed using the function K.factor of the R package tolerance.[24,25] This interval can be viewed as a γ confidence simultaneous lower confidence bound on the quantile $\xi_{(1-P)/2}$ and upper confidence bound on the quantile $\xi_{(1+P)/2}$.[26] It is clear that comprising the central (100 P)% of the population implies containing (100 P)% of the population. Hence the equal-tailed RR6 satisfies a more stringent requirement than RR5 and, as a result, c6 is larger than c5. Figure 4 compares c5 and c6 for given sample sizes with P = 0.95 and a given confidence γ. It is clear from Figure 4 that $c_6 > c_5$, as expected.
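The equal-tailed condition also lends itself to a short Monte Carlo sketch in Python, again as an illustration of the definition rather than the formula cited from the literature or the K.factor routine. Taking μ = 0 and σ = 1, the interval with factor c covers the central interval exactly when c·s ≥ z + |a|, with a = Z/√n, s = S/σ and z the (1 + P)/2 normal percentile, so c6 can be estimated as the empirical γ-quantile of (z + |a|)/s. The function name `equal_tailed_factor_mc`, the seed and the number of replicas are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def equal_tailed_factor_mc(n, P, gamma, reps, rng):
    """Monte Carlo estimate of the equal-tailed tolerance factor."""
    z = NormalDist().inv_cdf((1 + P) / 2)
    needed = []
    for _ in range(reps):
        a = rng.gauss(0, 1) / math.sqrt(n)                          # Z / sqrt(n)
        s = math.sqrt(rng.gammavariate((n - 1) / 2, 2) / (n - 1))   # S / sigma
        # Both end-point conditions hold exactly when c >= (z + |a|) / s.
        needed.append((z + abs(a)) / s)
    needed.sort()
    return needed[math.ceil(gamma * reps) - 1]   # empirical gamma-quantile


rng = random.Random(7)   # arbitrary seed
c6 = equal_tailed_factor_mc(n=210, P=0.95, gamma=0.95, reps=20000, rng=rng)
print(f"Monte Carlo estimate of c6 for n = 210: {c6:.3f}")
```

Because the equal-tailed requirement is stricter, the estimate should come out somewhat larger than the corresponding factor for the ordinary (P, γ) tolerance interval at the same n, P and γ.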

Figure 4.

The values of c5 and c6 for various sample sizes n.

Our view is that the (P, γ) tolerance interval RR5 should be used as the reference range since its form is centered at $\bar{Y}$, mimicking the form of the equal-tailed tolerance interval RR6, and with a large confidence γ it does contain (100 P)% of the population. Only if we specifically require the reference range to contain the central (100 P)% of the population, $[\xi_{(1-P)/2}, \xi_{(1+P)/2}]$, should the equal-tailed tolerance interval RR6 be used; otherwise it is unnecessarily wide and flags fewer individuals as atypical than the tolerance interval RR5.

3 Nonparametric reference ranges

3.1 Nonparametric prediction intervals

When $F$ is not assumed to have a specific form, nonparametric reference ranges can be considered and are based on the order statistics $Y_{(1)} \le \cdots \le Y_{(n)}$ of the sample $\mathbf{Y} = (Y_1, \ldots, Y_n)$, and the sample quantiles have been used to estimate the population quantiles $\xi_{(1-P)/2}$ and $\xi_{(1+P)/2}$.[6] In what follows, $k_p$ and $k_t$ are indices used for prediction and tolerance intervals, respectively. Let $k_p$, with $1 \le k_p < (n+1)/2$, be the largest natural number such that
$$P\{Y_{(k_p)} \le Y \le Y_{(n+1-k_p)}\} \ge P \qquad (6)$$
where Y is a future observation from the population independent of the random sample $\mathbf{Y}$ as before. Using the well-known facts that $F(Y_1), \ldots, F(Y_n)$ are independent, each having a uniform distribution on the interval (0, 1), and that $F(Y_{(k)})$ is the k-th order statistic of $F(Y_1), \ldots, F(Y_n)$ and has a beta distribution with parameters k and $n+1-k$, the probability in (6) is equal[16] to $(n + 1 - 2k_p)/(n + 1)$. Hence the constraint on $k_p$ required in equation (6) gives
$$k_p = \lfloor (n+1)(1-P)/2 \rfloor \qquad (7)$$
where $\lfloor a \rfloor$ denotes the integer part of a. This leads to the use of the nonparametric prediction interval
$$RR_7 = [Y_{(k_p)},\; Y_{(n+1-k_p)}]$$
as a reference range. An interesting remark is that $Y_{(k_p)}$ and $Y_{(n+1-k_p)}$ are consistent point estimators of the population quantiles $\xi_{(1-P)/2}$ and $\xi_{(1+P)/2}$, respectively. The proportion of the population contained in RR7 is given by
$$K_7 = F(Y_{(n+1-k_p)}) - F(Y_{(k_p)}) \qquad (8)$$
which is a random variable. The important question is whether the probability that this proportion is at least P, given by
$$P\{K_7 \ge P\} \qquad (9)$$
is sufficiently large to qualify the P prediction interval as a reference range. By noting that $K_7$ and $F(Y_{(n+1-2k_p)})$ follow the same beta distribution $\mathrm{Beta}(n+1-2k_p,\, 2k_p)$, Tukey’s equivalence blocks result[27] directly implies that
$$P\{K_7 \ge P\} = 1 - I_P(n+1-2k_p,\; 2k_p) \qquad (10)$$
where $I_x(a, b)$ denotes the cdf, evaluated at x, of the beta distribution with parameters a and b. This probability can be easily calculated using the function pbeta in R. Note that, as $n \to \infty$, the beta distribution converges to a normal distribution with mean P, thus the probability in equation (10) approaches 0.5 as $n \to \infty$. Figure 5 plots this probability against n for selected values of P. The plots are saw-tooth shaped due to the discreteness of n and $k_p$. It is clear from the figure that this probability can be substantially smaller than P, and approaches 0.5 when n is large, as expected from the asymptotic normal distribution pointed out above. 
This shows that the nonparametric prediction interval has a substantial probability, close to 0.5 when n is large, of containing less than (100 P)% of the population values. Hence, this nonparametric prediction interval should not be used as a reference range for the same reason as the prediction interval based on the normal distribution.
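The calculation behind this conclusion is easy to reproduce. The sketch below is written in Python rather than R and uses only the standard library. It relies on the identity that, for integer parameters, the beta cdf equals a binomial tail probability, I_x(a, b) = P{Bin(a + b − 1, x) ≥ a}, so no special functions are needed. The index is taken as the largest k with (n + 1 − 2k)/(n + 1) ≥ P, and the content of the interval between the k-th smallest and k-th largest observations is taken to follow a Beta(n + 1 − 2k, 2k) distribution. The function names are hypothetical, and n = 210 is chosen to match the sample size of the example section.

```python
import math


def beta_cdf_int(x, a, b):
    """Beta(a, b) cdf at x for positive integers a, b (binomial identity)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))


def k_pred(n, P):
    """Largest k such that (n + 1 - 2k) / (n + 1) >= P."""
    return math.floor((n + 1) * (1 - P) / 2)


def prob_content_at_least(n, k, P):
    """P{content of [Y_(k), Y_(n+1-k)] >= P}; content ~ Beta(n+1-2k, 2k)."""
    return 1 - beta_cdf_int(P, n + 1 - 2 * k, 2 * k)


n, P = 210, 0.95
k_p = k_pred(n, P)
prob_at_least = prob_content_at_least(n, k_p, P)
print(f"k_p = {k_p}")
print(f"P(content >= P) = {prob_at_least:.3f}")
print(f"P(content <  P) = {1 - prob_at_least:.3f}")
```

For n = 210 and P = 0.95 the probability of the content falling below P should agree, after rounding, with the 39% reported in the example section.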

Figure 5.

The probability in equation (10) for various sample sizes n.

3.2 Nonparametric tolerance intervals

A (P, γ) nonparametric tolerance interval is constructed to contain (100 P)% of the population with a pre-specified (large) confidence γ with respect to the randomness in the sample. Consider the following nonparametric tolerance interval[21]
$$RR_8 = [Y_{(k_t)},\; Y_{(n+1-k_t)}]$$
where $k_t$ should be the largest natural number such that the proportion of the population contained in RR8, given by
$$K_8 = F(Y_{(n+1-k_t)}) - F(Y_{(k_t)})$$
following similar lines as K7 in equation (8), is at least P with probability γ with respect to the randomness in the sample $\mathbf{Y}$. It follows therefore from equation (10) that $k_t$ should be the largest natural number that satisfies
$$1 - I_P(n+1-2k_t,\; 2k_t) \ge \gamma.$$
For given n, P and γ, $k_t$ can be easily computed by a direct search over the natural numbers in the range from 1 to $k_p$. Note that if the sample size n is too small, then the existence of $k_t$ is not guaranteed unless n satisfies[11]
$$(n-1)P^n - nP^{n-1} + 1 \ge \gamma.$$
The equal-tailed or central nonparametric tolerance intervals can be constructed in a similar way. Our view is that a (P, γ) nonparametric tolerance interval is pertinent as a reference range, as in the normal distribution case. Hence, to save space, we do not go into the details of the equal-tailed nonparametric tolerance intervals. Figure 6 compares $k_p$ and $k_t$ for given sample sizes n with P = 0.95 and given values of γ. It is clear from Figure 6 that $k_t$ is considerably smaller than $k_p$, and so RR8 is wider than RR7, as is required for RR8 to contain (100 P)% of the population with a pre-specified large confidence γ with respect to the randomness in the sample. Also, as expected, $k_t$ decreases as γ increases.
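The direct search for the tolerance index can be sketched as follows, in Python with the standard library only. The sketch assumes that the content of the interval between the k-th smallest and k-th largest observations follows a Beta(n + 1 − 2k, 2k) distribution, evaluates its cdf through the binomial identity for integer parameters, and uses the fact that the confidence decreases as k increases so the search can stop at the first failure. The minimum sample size is found as the smallest n for which k = 1 is feasible, i.e. (n − 1)P^n − nP^(n−1) + 1 ≥ γ. All function names are hypothetical.

```python
import math


def beta_cdf_int(x, a, b):
    """Beta(a, b) cdf at x for positive integers a, b (binomial identity)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))


def confidence(n, k, P):
    """P{content of [Y_(k), Y_(n+1-k)] >= P}; content ~ Beta(n+1-2k, 2k)."""
    return 1 - beta_cdf_int(P, n + 1 - 2 * k, 2 * k)


def k_tol(n, P, gamma):
    """Largest k whose interval has confidence >= gamma, or None if none exists."""
    best = None
    for k in range(1, n // 2 + 1):
        if confidence(n, k, P) >= gamma:
            best = k
        else:
            break   # confidence decreases in k, so no larger k can qualify
    return best


def min_sample_size(P, gamma):
    """Smallest n for which the interval [Y_(1), Y_(n)] reaches confidence gamma."""
    n = 2
    while (n - 1) * P**n - n * P**(n - 1) + 1 < gamma:
        n += 1
    return n


n, P, gamma = 210, 0.95, 0.95
k_t = k_tol(n, P, gamma)
n_min = min_sample_size(P, gamma)
print(f"k_t = {k_t}, confidence = {confidence(n, k_t, P):.4f}")
print(f"minimum n for P = {P}, gamma = {gamma}: {n_min}")
```

When the sample is too small, `k_tol` returns `None`, which mirrors the remark that the index is not guaranteed to exist unless n is large enough.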

Figure 6.

The values of $k_p$ and $k_t$ for various sample sizes n given P and γ.

4 Example

A random sample of n = 210 observations on fasting plasma glucose is taken from the population of interest. The data and the R code for all the computations in this paper are available online. Suppose that the usual normality tests[28] show that it is reasonable to assume the population has a normal distribution. The sample mean $\bar{Y}$ and the sample standard deviation S are computed, with S = 0.41 (in mmol/L). If we use the P prediction interval as the reference range, then it is given by RR1 = $[\bar{Y} - c_1 S,\; \bar{Y} + c_1 S]$. Note, however, as pointed out above, that the probability of the prediction interval containing less than (100 P)% of the population can be substantial and is computed to be 47%. So there is a 47% probability that the interval does not do what it purports to do: contain (100 P)% of the population. If we use the (P, γ) tolerance interval as the reference range, then it is given by RR5 = $[\bar{Y} - c_5 S,\; \bar{Y} + c_5 S]$. This interval is wider than the prediction interval. But, as we pointed out, the tolerance interval does contain (100 P)% of the population with probability γ. Therefore, any future observations falling outside this interval can be regarded as atypical and should be considered for further investigation. While the tolerance interval above has a confidence γ of containing (100 P)% of the population, it has a less than γ probability of containing the central (100 P)% of the population, $[\xi_{(1-P)/2}, \xi_{(1+P)/2}]$. This probability is computed to be 86%. In order to have a probability γ of containing the central (100 P)% of the population, $[\xi_{(1-P)/2}, \xi_{(1+P)/2}]$, we can use the equal-tailed tolerance interval, which is given by RR6 = $[\bar{Y} - c_6 S,\; \bar{Y} + c_6 S]$. The confidence that this equal-tailed tolerance interval contains (100 P)% of the population is computed to be 99%, which is much larger than γ. Hence, with a 99% probability, the equal-tailed tolerance interval contains (100 P)% of the population. Furthermore, we estimated that the equal-tailed (P, γ) tolerance interval is the (0.957, γ) tolerance interval, that is, the interval contains 95.7% of the population with confidence γ. Now suppose that the distribution of the population cannot be assumed to be normal. 
Then nonparametric reference ranges should be used. If we use the nonparametric P prediction interval as the reference range, then it is given by RR7 = $[Y_{(k_p)},\; Y_{(n+1-k_p)}]$ with $k_p$ the largest index satisfying the prediction constraint. Note, however, as we have pointed out, that the probability of the prediction interval containing less than (100 P)% of the population can be substantial and is computed to be 39%. So there is a 39% probability that the interval does not do what it purports to do: contain (100 P)% of the population. If we use the nonparametric (P, γ) tolerance interval as the reference range, with γ = 0.95, then it is given by RR8 = $[Y_{(k_t)},\; Y_{(n+1-k_t)}]$ with $k_t$ the largest index satisfying the tolerance constraint. This tolerance interval is wider than the nonparametric prediction interval but, as we pointed out, it does contain (100 P)% of the population with 95% confidence. Therefore, any future observations falling outside this interval can be regarded as atypical and should be considered for further investigation. Finally, we note that nonparametric intervals are usually wider than the corresponding parametric ones since they require fewer assumptions than the parametric model.

5 Conclusions

The objective of a reference range is to contain a pre-specified large content level P of the population with confidence level γ, so that a future observation falling outside the reference range is regarded as atypical and considered for further investigation. This procedure should be useful as part of screening programmes, whose aim is to identify subjects at sufficient risk of a specific disorder who may benefit from further investigation or direct preventive action to avoid death or disability and to improve their quality of life.[29] Since a reference range depends on the random sample, the event ‘a reference range contains (100 P)% of the population’ is also random and so we can never be certain that a reference range contains (100 P)% of the population. All we can hope for is that the event ‘a reference range contains (100 P)% of the population’ occurs with a large probability, γ. Based on this premise, we have argued that the P prediction interval is not suitable as a reference range since there is a substantial probability, close to 50% when n is large, that the prediction interval contains less than (100 P)% of the population. In contrast, a (P, γ) tolerance interval is designed to contain (100 P)% of the population with a pre-specified large confidence γ so it is eminently adequate as a reference range. Tolerance intervals or regions have been studied by many statisticians since the 1940s. Various parametric and nonparametric procedures are readily available for use as reference ranges or reference regions.[11,16,17,24] Finally, we note that there is some work on constructing reference ranges specifically assuming that the clinical marker follows a log-normal distribution,[30] and on sample size calculation for reference ranges,[31,32] and tolerance intervals.[33-35] These aspects, however interesting, fall beyond the scope of our paper.

13 in total

Review 1. Reference values: from philosophy to a tool for laboratory medicine.

Authors: Joseph Henny; Per Hyltoft Petersen
Journal: Clin Chem Lab Med Date: 2004 Impact factor: 3.694

2. Estimation of reference ranges from normal samples.

Authors: P Royston; J N Matthews
Journal: Stat Med Date: 1991-05 Impact factor: 2.373

Review 3. Multivariate probability-based detection of drug-induced hepatic signals.

Authors: Donald C Trost
Journal: Toxicol Rev Date: 2006

Review 4. Reference values: a review.

Authors: Anne Geffré; Kristen Friedrichs; Kendal Harr; Didier Concordet; Catherine Trumel; Jean-Pierre Braun
Journal: Vet Clin Pathol Date: 2009-09 Impact factor: 1.180

5. Normal probability plots with confidence.

Authors: Wanpen Chantarangsi; Wei Liu; Frank Bretz; Seksan Kiatsupaibul; Anthony J Hayter; Fang Wan
Journal: Biom J Date: 2014-10-21 Impact factor: 2.207

6. Choosing Wisely: how the UK intends to reduce harmful medical overuse.

Authors: Jacqui Wise
Journal: BMJ Date: 2017-01-26

7. Simultaneous inference for several quantiles of a normal population with applications.

Authors: Wei Liu; Frank Bretz; Anthony J Hayter; Ekkehard Glimm
Journal: Biom J Date: 2012-11-21 Impact factor: 2.207

8. Assessing uncertainty in reference intervals via tolerance intervals: application to a mixed model describing HIV infection.

Authors: Hormuzd A Katki; Eric A Engels; Philip S Rosenberg
Journal: Stat Med Date: 2005-10-30 Impact factor: 2.373

9. A new approach to sample size calculation for reference interval studies.

Authors: C Jennen-Steinmetz; S Wellek
Journal: Stat Med Date: 2005-10-30 Impact factor: 2.373

10. Effect of Thyroid Function Variations Within the Laboratory Reference Range on Health Status, Mood, and Cognition in Levothyroxine-Treated Subjects.

Authors: Mary H Samuels; Irina Kolobova; Anne Smeraglio; Meike Niederhausen; Jeri S Janowsky; Kathryn G Schuff
Journal: Thyroid Date: 2016-07-25 Impact factor: 6.568