Literature DB >> 33266690

Likelihood Ratio Testing under Measurement Errors.

Michel Broniatowski, Jana Jurečková, Jan Kalina.

Abstract

We consider the likelihood ratio test of a simple null hypothesis (with density f_0) against a simple alternative hypothesis (with density g_0) in the situation that the observations X_i are mismeasured due to the presence of measurement errors. Thus instead of X_i for i = 1, …, n, we observe Z_i = X_i + δV_i with an unobservable parameter δ and an unobservable random variable V_i. When we ignore the presence of measurement errors and perform the original test, the probability of type I error becomes different from the nominal value, but the test is still the most powerful among all tests on the modified level. Further, we derive the minimax test of some families of misspecified hypotheses and alternatives. The test exploits the concept of pseudo-capacities elaborated by Huber and Strassen (1973) and Buja (1986). A numerical experiment illustrates the principles and performance of the novel test.

Keywords:  2-alternating capacities; measurement errors; misspecified hypothesis and alternative; robust testing; two-sample test

Year:  2018        PMID: 33266690      PMCID: PMC7512565          DOI: 10.3390/e20120966

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Measurement technologies are often affected by random errors; if the goal of an experiment is to compare two probability distributions using data, the conclusion can be distorted when the data are affected by measurement errors. If the data are mismeasured due to the presence of measurement errors, the statistical inference performed with them is biased and trends or associations in the data are deformed. This is common for a broad spectrum of applications, e.g., in engineering, physics, biomedicine, molecular genetics, chemometrics, and econometrics. Some observations can even remain undetected, e.g., in measurements of magnetic or luminous flux in analytical chemistry when the flux intensity falls below some flux limit. Actually, we can hardly imagine real data free of measurement errors; the question is how severe the measurement errors are and what their influence on the data analysis is [1,2,3]. A variety of functional models have been proposed for handling measurement errors in statistical inference. Technicians, geologists, and other specialists are aware of this problem and try to reduce the effect of measurement errors with various ad hoc procedures. However, this effect cannot be completely eliminated or substantially reduced unless we have some additional knowledge of the behavior of the measurement errors. There exists a rich literature on statistical inference in error-in-variables (EV) models, as is evidenced by the monographs of Fuller [4], Carroll et al. [5], and Cheng and van Ness [6], and the references therein. The monographs [4] and [6] deal mostly with the classical Gaussian setup, while [5] discusses numerous inference procedures under a semi-parametric setup. Nonparametric methods in EV models are considered in [7,8] and the references therein, and in [9], among others. The regression quantile theory in the area of EV models was initiated by He and Liang [10].
Arias [11] used an instrumental variable estimator for quantile regression, considering biases arising from unmeasured ability and measurement errors. Papers dealing with practical aspects of measurement error models include [12,13,14,15,16], among others. Recent developments in treating the effect of measurement errors on econometric models were presented in [17,18]. The advantage of rank and signed rank procedures in measurement error models was demonstrated recently in [19,20,21,22,23,24]. The problem of interest in the present paper is to study how measurement errors can affect the conclusion of the likelihood ratio test. The distribution function of the measurement errors is considered unknown, up to zero expectation and unit variance. When we use the likelihood ratio test while ignoring the possible measurement errors, we can suffer a loss in both the probabilities of errors of the first and second kind. However, we show that under a small variance of the measurement errors, the original likelihood ratio test is still most powerful, only on a slightly changed significance level. On the other hand, we may consider the situation that the hypothesis or the alternative is a whole class of distributions of the mismeasured observations. Then both the hypothesis and the alternative are composite families, and if they are bounded by alternating Choquet capacities of order 2, we can look for a minimax test based on the ratio of the capacities, or equivalently on the ratio of the pair of least favorable distributions of the hypothesis and the alternative, respectively (cf. Huber and Strassen [25]).

2. Likelihood Ratio Test under Measurement Errors

Our primary goal is to test the null hypothesis H_0 that the independent observations X_1, …, X_n come from a population with a density f against the alternative H_1 that the true density is g, where f and g are fixed densities of our interest. For identifiability, we shall assume that f and g are continuous and symmetric around 0. Although the alternative is the main concern of the experimenter, measurement errors or just the nature of the experiment may cause the situation that the true alternative should be considered composite. Specifically, the X_i can be affected by additive measurement errors, which appears in numerous fields, as illustrated in Section 1. Hence the alternative is the composite H_1*, under which the observations Z_i = X_i + δV_i are identically distributed with a continuous density. Here, both under the hypothesis and under the alternative, the V_i are independent random variables, unobservable with unknown distribution, independent of the X_i. The parameter δ is also unknown; for simplicity we only assume that E V_i = 0 and E V_i² = 1. The mismeasured, hence unobservable, X_i are assumed to have the density g under the alternative. Quite analogously, the mismeasured observations lead to a composite hypothesis H_0*, under which the density of the observations is the convolution of f with the distribution of δV_i, while the X_i are assumed to have density f. If we knew the exact densities under both hypotheses, we would use the Neyman-Pearson critical region

W = { (z_1, …, z_n) : ∏_{i=1}^n g(z_i)/f(z_i) ≥ u },   (1)

with u determined so that the test has a prescribed significance level α under H_0. When W is applied to the mismeasured observations, the probability of rejection must be computed with respect to the convolution densities, conditionally on the measurement errors; hence the size of the critical region W, when used for testing H_0* against H_1*, differs from α. We then ask how the critical region W in (1) behaves when it is used as a test of H_0* against H_1*. This problem we shall attack with an expansion of the convolution density in δ close to zero.
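A quick Monte Carlo sketch can illustrate the size shift described above. The choice f = N(0,1), g = N(1,1) and normal measurement errors is a hypothetical example, not taken from the paper; for this pair the log-likelihood ratio reduces to Σ z_i − n/2, so the Neyman-Pearson region (1) rejects for large Σ z_i.

```python
import math
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 20, 0.05

# Hypothetical example: f = N(0,1), g = N(1,1). The NP statistic is sum(Z_i);
# under H_0 without measurement errors it is N(0, n), giving the critical value u.
u = NormalDist(0.0, math.sqrt(n)).inv_cdf(1 - alpha)

def empirical_size(delta, reps=200_000):
    """Rejection rate of the original NP region when Z_i = X_i + delta * V_i."""
    X = rng.normal(0.0, 1.0, (reps, n))
    V = rng.standard_normal((reps, n))   # E V = 0, E V^2 = 1
    Z = X + delta * V
    return float(np.mean(Z.sum(axis=1) > u))

print(empirical_size(0.0))   # close to the nominal 0.05
print(empirical_size(0.3))   # slightly inflated: the level shifts with delta
```

The critical region itself is unchanged, so by the Neyman-Pearson lemma the test remains most powerful, only at this slightly shifted level.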

Approximations of Densities

Denote by f and g the densities of X under the hypothesis and alternative, respectively. For identifiability, we shall assume that f and g are continuous and symmetric around 0. Denote by h_{f,δ} the density of Z = X + δV; this means that X is affected by an additive measurement error δV, where V is independent of X, with E V = 0 and E V² = 1. Notice that if the densities of X and V are strongly unimodal, then that of Z is also strongly unimodal (see [26]). Under some additional conditions on V, we shall derive approximations of h_{f,δ} and h_{g,δ} for small δ. More precisely, we assume that both f and g have differentiable and integrable derivatives up to order 5. Then we have the following expansion of h_{f,δ}, with a parallel result for h_{g,δ}: assume that V is symmetric with E V⁴ < ∞; then

h_{f,δ}(z) = f(z) + (δ²/2) f''(z) + O(δ⁴) as δ → 0.   (3)

Sketch of proof: Let φ_Z be the characteristic function of Z. By the independence of X and V, φ_Z(t) = φ_X(t) φ_V(δt), where φ_V denotes the characteristic function of V. Taking the inverse Fourier transform on both sides and expanding φ_V(δt) around 0, we obtain (3), taking the above assumptions on V into account. □ Consider the problem of testing the hypothesis that the observations are distributed according to the density f against the alternative that they are distributed according to the density g. In parallel, we consider the hypothesis that the observations are distributed according to h_{f,δ} against the alternative that the true density is h_{g,δ}. Let the likelihood ratio test with critical region W have the significance level α, and consider the test with the same critical region W based on the mismeasured observations Z_1, …, Z_n. We know neither δ nor the distribution of V; hence the latter test is just an application of the critical region W to the contaminated data. Thus, due to our lack of information, we use this test even for testing H_0* against H_1*, and the performance of this test is of interest. This is described in the following theorem: the significance level of the test with critical region W, applied to the mismeasured observations, equals α + O(δ²) as δ → 0. Sketch of proof: If f is symmetric, then the derivative f^(k) is symmetric for k even and skew-symmetric for k odd. Moreover, because f^(4) and g^(4) are integrable, the remainder terms in (3) are uniformly bounded. Hence, using the expansion (3), we obtain the statement. □
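The expansion (3) can be checked numerically in a case where the convolution is available in closed form. With the hypothetical choice X ~ N(0,1) and V ~ N(0,1), the sum Z = X + δV is exactly N(0, 1 + δ²), and the second-order approximation f(z) + (δ²/2) f''(z) should deviate from the exact density only at order δ⁴:

```python
import math

def phi(z, s=1.0):
    """Density of N(0, s^2)."""
    return math.exp(-z * z / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

def second_order(z, delta):
    """f(z) + (delta^2 / 2) f''(z) for the standard normal f."""
    f = phi(z)
    f2 = (z * z - 1.0) * f          # f''(z) for the standard normal density
    return f + 0.5 * delta ** 2 * f2

z = 0.7
# exact density of Z = X + delta*V is N(0, 1 + delta^2)
err1 = abs(phi(z, math.sqrt(1 + 0.1 ** 2)) - second_order(z, 0.1))
err2 = abs(phi(z, math.sqrt(1 + 0.2 ** 2)) - second_order(z, 0.2))
# doubling delta multiplies the error by roughly 2^4 = 16, consistent with O(delta^4)
```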

3. Robust Testing

If the observations are mismeasured or contaminated, we observe Z + δV with unknown δ and unobservable V instead of Z. Hence, instead of the simple H_0 and H_1, we are led to a composite hypothesis and alternative H_0* and H_1*. Following [25], we can try to find suitable 2-alternating capacities dominating H_0* and H_1*, and to construct a pertaining minimax test. As before, we assume that Z and V are independent, E V = 0, and E V² = 1. Moreover, we assume that f and g are symmetric, strongly unimodal and differentiable up to order 5, with integrable derivatives and increasing distribution functions F and G, respectively. The measurement errors V are assumed to satisfy E V⁴ ≤ K with a fixed K > 0. Hence, the distribution of V is restricted to have tails lighter than the t-distribution with 4 degrees of freedom. We shall construct a pair of 2-alternating capacities around specific subfamilies of H_0* and H_1*. Let us determine the capacity v around g; the capacity w for f is analogous. By Theorem 1, each admissible density of the mismeasured observations under the alternative is approximated by g(z) + (δ²/2) g''(z) up to O(δ⁴). We shall concentrate on the family G of densities obtained from this approximation over the admissible values of δ and of the moments of V, with suitable fixed bounds on the parameters (similarly the family F for f). Indeed, under our assumptions, each member of the family is a positive and symmetric density. Let P_h denote the probability distribution induced by a density h from the family, with B being the Borel σ-algebra. Then the set function

v(A) = sup { P_h(A) : h ∈ G }, A ∈ B,

is a pseudo-capacity in the sense of Buja [27]. Analogously, consider a density f symmetric around 0 and satisfying the assumptions of Theorem 1 as a simple hypothesis; construct the family of densities and the corresponding family of distributions similarly as above. Then the resulting set function w is a pseudo-capacity in the sense of Buja [27]. Buja [27] showed that on any Polish space there exists a (possibly different) topology which generates the same Borel σ-algebra and in which every pseudo-capacity is a 2-alternating capacity in the sense of [25]. Let us now consider the problem of testing the hypothesis H_0* against the alternative H_1* based on an independent random sample Z_1, …, Z_n. Assume that f and g satisfy (5).
Then, following [27] and [25], we have the main theorem providing the minimax test of H_0* against H_1* with significance level α: the test rejects H_0* for large values of the likelihood ratio of the least favorable pair of distributions induced by the capacities, with the critical value chosen so that the level α is guaranteed over the whole composite hypothesis.
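The flavor of a least-favorable-pair test can be conveyed by Huber's classical censored likelihood ratio for contamination neighborhoods, a special case of the Huber-Strassen construction: the pointwise log-likelihood ratio is clipped at two constants, so that no single mismeasured observation can dominate the statistic. The densities f = N(0,1), g = N(1,1) and the clipping constants below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def censored_llr(z, c_lo=-2.0, c_hi=2.0):
    """Clipped pointwise log-likelihood ratio log(g/f) for f = N(0,1), g = N(1,1).

    Pointwise, log(g(z)/f(z)) = z - 1/2; clipping it to [c_lo, c_hi] gives the
    statistic of the least favorable pair in Huber's contamination model
    (the clipping constants here are illustrative)."""
    return np.clip(z - 0.5, c_lo, c_hi)

z = rng.normal(0.0, 1.0, 50)
z[0] = 1e6                            # a gross measurement error
t_robust = censored_llr(z).sum()      # outlier's influence is bounded by c_hi
t_naive = (z - 0.5).sum()             # unclipped log-LR is dominated by the outlier
```

The clipped statistic changes by at most c_hi when any single observation is replaced by an arbitrarily large value, which is exactly the robustness the minimax construction buys.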

4. Numerical Illustration

We observe independent observations Z_i = X_i + δV_i for i = 1, …, n, as described in Section 3, where the V_i are independent identically distributed (with a distribution function F) but unobserved. The primary task here is to test H_0* against H_1* with fixed parameters of the families. We perform all the computations using the R software [28]. To describe our approach to computing the test, we need the sets of pseudo-distribution functions corresponding to the sets of pseudo-densities constructed in Section 3, one under the hypothesis and one under the alternative; our task is to approximate the capacities and the resulting test statistic. The functions are evaluated over a grid with step 0.05. Then, the maximization in (8) and (9) is performed for values of z over the grid and over the four boundary values of the parameter rectangle. Additional computations with 10 randomly selected pairs of parameters revealed that the optimum is attained in one of the boundary values. Further, the Radon-Nikodym derivatives of the capacities v and w are estimated by a finite difference approximation in order to compute the test statistic. The test rejects if the test statistic exceeds a critical value, which (as well as the p-value) can be approximated by a Monte Carlo simulation, i.e., by repeatedly generating random variables under the hypothesis; we generate them 10,000 times here. We perform the following particular numerical study. We compute the critical value of the test for the chosen sample sizes and parameter values. Further, we are interested in evaluating the probability of rejecting the null hypothesis for data generated with different values of λ̃ and σ̃². Its values are shown in Table 1 and Table 2, which are approximated using (again) 10,000 randomly generated samples from (10). The boldface numbers are equal to the power of the test (under the simple alternative).
The proposed test seems meaningful: its power in the setting of Table 2 exceeds that of Table 1; in addition, the power increases with increasing λ̃ if σ̃² is held fixed, and it also increases with increasing σ̃² if λ̃ is held fixed.
Table 1

Probability of rejection by the test in the simulation with .

Value of λ̃    σ̃² = 3    σ̃² = 4    σ̃² = 5    σ̃² = 6
0.25           0.39       0.52       0.61       0.67
0.35           0.50       0.67       0.75       0.81
0.45           0.61       0.76       0.85       0.89
Table 2

Probability of rejection by the test in the simulation with .

Value of λ̃    σ̃² = 3    σ̃² = 4    σ̃² = 5    σ̃² = 6
0.25           0.55       0.73       0.82       0.87
0.35           0.72       0.86       0.93       0.96
0.45           0.82       0.94       0.97       0.99
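The Monte Carlo step for the critical value and p-value can be sketched as follows. The statistic used here is a stand-in (the log-likelihood ratio sum for a hypothetical f = N(0,1), g = N(1,1) with normal measurement errors of scale δ); in the paper's procedure the capacity-ratio statistic of Section 3 takes its place, but the simulation logic is the same.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_stat(n=20, delta=0.3, reps=10_000):
    """Simulate the test statistic under the null: Z_i = X_i + delta*V_i with
    X_i ~ f = N(0,1); the statistic is sum of log(g/f)(Z_i) = sum(Z_i) - n/2."""
    X = rng.normal(0.0, 1.0, (reps, n))
    Z = X + delta * rng.standard_normal((reps, n))
    return Z.sum(axis=1) - n / 2.0

T0 = simulate_stat()                 # 10,000 draws under the null
crit = np.quantile(T0, 0.95)         # approximate 5% critical value

def p_value(t_obs):
    """Monte Carlo p-value: fraction of null draws at least as extreme."""
    return float(np.mean(T0 >= t_obs))

# fresh null data should reject at roughly the nominal 5% rate
rate = float(np.mean(simulate_stat() > crit))
```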

5. Conclusions

The likelihood ratio test of a simple hypothesis with density f against a simple alternative with density g is considered in the situation that the observations are mismeasured due to the presence of measurement errors. Thus, instead of X_i for i = 1, …, n, we observe Z_i = X_i + δV_i with an unobservable parameter δ and an unobservable random variable V_i. When we ignore the presence of measurement errors and perform the original test, the probability of type I error becomes different from the nominal value, but the test is still the most powerful among all tests on the modified level. Under some assumptions on f, g and V, and for small δ, we further construct a minimax likelihood ratio test of some families of distributions of the observations, based on capacities of the Huber-Strassen type. The test treats the composite null and alternative hypotheses, which cover all possible measurement errors satisfying the assumptions. The advantage of the novel test is that it keeps the probability of type I error below the desired value (the significance level) across all possible measurement errors. The test is performed in a straightforward way, while the user must specify particular (not excessively large) values of δ and K. We do not consider this a limiting requirement, because parameters corresponding to the severity of measurement errors are commonly chosen in a similar way in numerous measurement error models [5,23] or robust optimization procedures [29]. The critical value of the test can be approximated by a simulation. The numerical experiment in Section 4 illustrates the principles and performance of the novel test.
  5 in total

1.  Predicting and correcting bias caused by measurement error in line transect sampling using multiplicative error models.

Authors:  Tiago A Marques
Journal:  Biometrics       Date:  2004-09       Impact factor: 2.571

2.  Non-parametric regression estimation from data contaminated by a mixture of Berkson and classical errors.

Authors:  Raymond J Carroll; Aurore Delaigle; Peter Hall
Journal:  J R Stat Soc Series B Stat Methodol       Date:  2007-11-01       Impact factor: 4.488

3.  All your data are always missing: incorporating bias due to measurement error into the potential outcomes framework.

Authors:  Jessie K Edwards; Stephen R Cole; Daniel Westreich
Journal:  Int J Epidemiol       Date:  2015-04-28       Impact factor: 7.196

4.  Quantifying uncertainty of determination by standard additions and serial dilutions methods taking into account standard uncertainties in both axes.

Authors:  Wojciech Hyk; Zbigniew Stojek
Journal:  Anal Chem       Date:  2013-05-30       Impact factor: 6.986

5.  Measurement error is often neglected in medical literature: a systematic review.

Authors:  Timo B Brakenhoff; Marian Mitroiu; Ruth H Keogh; Karel G M Moons; Rolf H H Groenwold; Maarten van Smeden
Journal:  J Clin Epidemiol       Date:  2018-03-06       Impact factor: 6.437

  1 in total

1.  Sparse Density Estimation with Measurement Errors.

Authors:  Xiaowei Yang; Huiming Zhang; Haoyu Wei; Shouzheng Zhang
Journal:  Entropy (Basel)       Date:  2021-12-24       Impact factor: 2.524

