Hypothesis Testing of Inclusion of the Tolerance Interval for the Assessment of Food Safety.

Abstract

In the testing of food quality and safety, we contrast the contents of a newly proposed food (a genetically modified food) with those of conventional foods. Because the contents vary widely between crop varieties and production environments, we propose a two-sample test of substantial equivalence that examines the inclusion of the tolerance intervals of two populations: the population of the contents of the proposed food, which we call the target population, and the population of the contents of the conventional food, which we call the reference population. Rejection of the null hypothesis guarantees that the contents of the proposed food essentially do not include outliers in the population of the contents of the conventional food. The existing tolerance interval (TI0) is constructed to have at least a pre-specified level of coverage probability. Here, we introduce a complementary tolerance interval (TI1) that is guaranteed to have at most a pre-specified level of coverage probability. By applying TI0 and TI1 to the samples from the target population and the reference population respectively, we construct a test statistic for testing the inclusion of the two tolerance intervals. To examine the performance of the testing procedure, we conducted a simulation that reflects the genetic and environmental effects and the residual variation observed in a crop experiment. As a case study, we applied the test to examine whether the distribution of the protein content of rice in the Kyushu area is included in the distribution of the protein content of rice in the other areas of Japan.

Year: 2015 PMID: 26509690 PMCID: PMC4624947 DOI: 10.1371/journal.pone.0141117

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The safety assessment of genetically modified (GM) foods was recognized as an important issue in the Organization for Economic Cooperation and Development (OECD) discussions that resumed in 1988. Since it was first suggested in 1993 [1], substantial equivalence has been used worldwide as the starting point for the safety assessment of GM foods. Substantial equivalence embodies the concept that if a new food or food component is found to be substantially equivalent to an existing food or feed component, it can be treated in the same manner with respect to safety [2]. To decide whether a modified product is substantially equivalent, the manufacturer tests the product for unexpected changes in a limited set of components, such as toxins, nutrients, or allergens, that are present in the unmodified food. Piaggio et al. [3] provided a clear framework for reporting equivalence randomized trials. Ennis and Ennis [4,5] used an open interval to define equivalence and provided methods for testing a null hypothesis of nonequivalence. McNally et al. [6] proposed applying the generalized test function method, in comparison with the confidence interval approach, for assessing population bioequivalence. Herman and Price [7] reviewed the research of the past two decades on the mechanisms that affect crop composition in GM and traditionally bred crops. In substantial equivalence tests of population means, exact equality cannot be proven, so a buffer margin c for the treatment effect is defined, and equivalence is declared when the treatment effect lies between −c and c. A broad range of factors affect crop composition, such as the genetic background [8,9], environmental factors [10,11], and agronomic practices [12,13]. Ricroch et al. [14] reviewed published studies on the effect of genetic modification in comparison with environmental and intervarietal variation.
Because the contents vary widely between crop varieties and production environments, the test of substantial equivalence should examine the inclusion of the tolerance intervals of the samples from the two populations: the population of the contents of the proposed food or feed, which we call the target population and denote POPtar, and the population of the contents of the conventional food or feed, which we call the reference population and denote POPref (Fig 1).

Fig 1

The distributions of two normal populations, the target population (POPtar) and the reference population (POPref), and the tolerance intervals TI(γ tar, POPtar) and TI(γ ref, POPref).

As an example, γ tar and γ ref were set to 0.05 and 0.01 respectively.

Statistical tolerance intervals are useful in many practical applications, and their construction has been studied extensively [15]. Formulas for tolerance intervals (regions) with known and unknown mean and variance were given by Proschan [16] for the univariate normal distribution and by Chew [17] for the multivariate normal distribution. Tolerance interval procedures were developed for the balanced one-way random model [18] and for general linear mixed models with balanced data [19] and unbalanced data [20]. A (1 − γ, 1 − α) tolerance interval (TI0) based on a sample is constructed so that it includes at least a proportion 1 − γ of the sampled population with confidence 1 − α [21]. Such an interval is usually referred to as a (1 − γ)-content, (1 − α)-confidence (coverage) tolerance interval. Here, we introduce a complementary tolerance interval that is guaranteed to have at most a pre-specified level of coverage probability: a (1 − γ, 1 − α) tolerance interval (TI1) based on a sample is constructed so that it includes at most a proportion 1 − γ of the sampled population with confidence 1 − α. When TI0 and TI1 are applied to the samples from the target population and the reference population respectively, rejection of the test guarantees that the target population essentially does not include outliers in the reference population.

Material and Methods

Two complementary tolerance intervals and two-sample hypothesis testing

We consider a sample X from a Gaussian population N(μ, σ²). When the sample is collected by simple random sampling, the sample mean and the sample variance follow

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{R_0}\right), \qquad \hat{\sigma}^2 \sim \frac{\sigma^2}{m}\,\chi^2_m, \tag{1}$$

where R 0 is the sample size and the degree of freedom, m, is R 0 − 1. Allowing for the uncertainty of both the sample mean and the sample variance, the conventional two-sided (1 − γ)-content, (1 − α)-confidence tolerance interval is defined as

$$\mathrm{TI}_0 = \left[\bar{X} - k_0\,\hat{\sigma},\ \bar{X} + k_0\,\hat{\sigma}\right], \qquad k_0 = \sqrt{\frac{m\,\chi^2_{1,1-\gamma}(\mathrm{ncp} = 1/R_0)}{\chi^2_{m,\alpha}}}, \tag{2}$$

where χ²_{1,1−γ}(ncp = 1/R 0) denotes the 100(1 − γ)th percentile of the non-central chi-squared distribution with one degree of freedom and non-centrality parameter 1/R 0, and χ²_{m,α} denotes the 100αth percentile of the chi-squared distribution with m degrees of freedom [22]. The notation ncp stands for non-centrality parameter. Equivalently, R 0 is the ratio of σ² to the variance of the sample mean, R 0 = σ²/Var(X̄), and m is the ratio of 2σ⁴ to the variance of the sample variance, m = 2σ⁴/Var(σ̂²). The tolerance interval TI0 covers at least 1 − γ of the population with probability 1 − α.

Here, we introduce a new tolerance interval TI1 defined by

$$\mathrm{TI}_1 = \left[\bar{X} - k_1\,\hat{\sigma},\ \bar{X} + k_1\,\hat{\sigma}\right], \qquad k_1 = \sqrt{\frac{m\,\chi^2_{1,1-\gamma}(\mathrm{ncp} = 1/R_0)}{\chi^2_{m,1-\alpha}}}, \tag{3}$$

where χ²_{m,1−α} denotes the 100(1 − α)th percentile of the chi-squared distribution with m degrees of freedom. As shown below, TI1 covers at most 1 − γ of the population with probability 1 − α. As the sample size increases, both complementary tolerance intervals converge to the population tolerance interval.

We contrast the tolerance interval of the target population, POPtar, with that of the reference population, POPref. Given γ tar and γ ref (γ tar > γ ref), the null hypothesis is that the tolerance interval of POPtar is not included in the tolerance interval of POPref; the alternative is that it is included. To make the dependence of the tolerance intervals on the sample X and the population P explicit, we write them as TI0(α, γ, X), TI1(α, γ, X), and TI(γ, P).
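Eqs (2) and (3) differ only in which chi-squared percentile appears in the denominator: the small 100αth percentile widens TI0, and the large 100(1 − α)th percentile narrows TI1. The Python sketch below computes both intervals from summary statistics. It is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the one-degree-of-freedom non-central chi-squared percentile is obtained by bisection on its exact normal-CDF form, and the central chi-squared percentile uses the Wilson-Hilferty approximation, which is an assumption that is accurate for moderate m (roughly m ≥ 10).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ncx2_df1_quantile(p: float, ncp: float) -> float:
    """p-quantile of the non-central chi-squared distribution with 1 degree of freedom.

    Uses P(X <= t^2) = Phi(t - sqrt(ncp)) - Phi(-t - sqrt(ncp)) and bisection on t.
    """
    root_ncp = math.sqrt(ncp)

    def cdf_of_t(t: float) -> float:
        return STD_NORMAL.cdf(t - root_ncp) - STD_NORMAL.cdf(-t - root_ncp)

    lo, hi = 0.0, 1.0
    while cdf_of_t(hi) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf_of_t(mid) < p:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t * t


def chi2_quantile(p: float, df: float) -> float:
    """Approximate p-quantile of chi-squared(df) via Wilson-Hilferty (assumption)."""
    z = STD_NORMAL.inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def tolerance_intervals(mean, var, r0, m, alpha, gamma):
    """Return (TI0, TI1), each a (lower, upper) tuple, following Eqs (2) and (3)."""
    q = ncx2_df1_quantile(1.0 - gamma, 1.0 / r0)
    sd = math.sqrt(var)
    k0 = math.sqrt(m * q / chi2_quantile(alpha, m))        # small denominator: wider TI0
    k1 = math.sqrt(m * q / chi2_quantile(1.0 - alpha, m))  # large denominator: narrower TI1
    return (mean - k0 * sd, mean + k0 * sd), (mean - k1 * sd, mean + k1 * sd)


ti0, ti1 = tolerance_intervals(mean=6.92, var=0.22, r0=18, m=17, alpha=0.05, gamma=0.05)
print("TI0:", tuple(round(v, 2) for v in ti0))
print("TI1:", tuple(round(v, 2) for v in ti1))
```

With the Kyushu summary statistics of the case study (mean 6.92, variance 0.22, R 0 = 18, m = 17), TI0 comes out at about (5.60, 8.24), which agrees with the reported (5.59, 8.24) up to rounding of the published mean and variance. For non-iid data, R 0 and m would be replaced by their effective values.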
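Our framework for testing substantial equivalence is to test the null hypothesis

$$H_0:\ \mathrm{TI}(\gamma_{tar}, POP_{tar}) \not\subseteq \mathrm{TI}(\gamma_{ref}, POP_{ref})$$

against the alternative hypothesis

$$H_1:\ \mathrm{TI}(\gamma_{tar}, POP_{tar}) \subseteq \mathrm{TI}(\gamma_{ref}, POP_{ref}).$$

We define the test statistic, α TI01, from the target-sample interval TI0(α, γ tar, Xtar) and the reference-sample interval TI1(α, γ ref, Xref), where Xtar and Xref are the samples from POPtar and POPref respectively. The p value is obtained by locating the test statistic on its distribution in the boundary case TI(γ tar, POPtar) = TI(γ ref, POPref).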
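The exact formula for α TI01 is not reproduced in this text. As an illustrative assumption only, the Python sketch below treats α TI01 as the smallest α at which TI0(α, γ tar, Xtar) lies inside TI1(α, γ ref, Xref). This reading is consistent with the case study, where the two intervals are not nested at α = 0.05 and α TI01 = 0.247. Because TI0 narrows and TI1 widens as α grows, inclusion is monotone in α and bisection applies. The interval functions are passed in as hypothetical callables, so the search does not depend on how the intervals are computed.

```python
from typing import Callable, Optional, Tuple

Interval = Tuple[float, float]


def is_included(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def alpha_ti01(
    ti0_target: Callable[[float], Interval],
    ti1_reference: Callable[[float], Interval],
    lo: float = 1e-6,
    hi: float = 0.5,
    tol: float = 1e-9,
) -> Optional[float]:
    """Smallest alpha in [lo, hi] with ti0_target(alpha) inside ti1_reference(alpha).

    Hypothetical reading of the statistic (assumption). Relies on inclusion being
    monotone in alpha; returns None if the intervals are not nested even at alpha = hi.
    """
    def nested(alpha: float) -> bool:
        return is_included(ti0_target(alpha), ti1_reference(alpha))

    if not nested(hi):
        return None
    if nested(lo):
        return lo
    while hi - lo > tol:  # invariant: not nested at lo, nested at hi
        mid = 0.5 * (lo + hi)
        if nested(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy intervals: TI0 shrinks and TI1 grows linearly in alpha; they nest from alpha = 0.25 on.
toy_stat = alpha_ti01(lambda a: (-2.0 + a, 2.0 - a), lambda a: (-1.5 - a, 1.5 + a))
print(toy_stat)
```

In practice the two callables would wrap a tolerance-interval routine evaluated on Xtar and Xref with their own R 0 and m.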

Mixed effect model and the coverage probabilities of the tolerance intervals

The two complementary tolerance intervals (Eqs (2) and (3)) can be generalized to non-iid samples. The effective sample size, R 0, is obtained by comparing the variance of the estimated global mean with the total variance:

$$R_0 = \frac{\hat{\sigma}_T^2}{\mathrm{Var}(\hat{\mu})}.$$

The effective degree of freedom, m, is obtained by equating the estimated variance of the estimated total variance with the variance expected under Satterthwaite's chi-squared approximation, σ̂²_T ∼ (σ²_T/m)χ²_m:

$$m = \frac{2\left(\hat{\sigma}_T^2\right)^2}{\mathrm{Var}(\hat{\sigma}_T^2)}.$$

As an example, we consider hypothetical samples with random genetic and environmental effects. These samples mimic maize samples of 61 lines from eight multi-site field studies, whose field sites represented 47 unique environments in the commercial maize-growing regions of the United States, Canada, Chile and Argentina [23]. Each field site used a randomized complete block design with three or four blocks. The variances of the random components of the concentrations of two analytes (tryptophan and oleic acid) were used to generate the simulated data. Table 1 shows these variances. The environmental component is large for tryptophan, whereas the genetic component dominates for oleic acid.
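The two moment-matching formulas for R 0 and m can be sketched in Python as follows. The parametric-bootstrap helper is a hypothetical illustration for the simplest case, an iid normal sample, where R 0 and m should come out close to the sample size n and to n − 1; for the mixed-model samples described in the text, each bootstrap replicate would instead be generated from, and refitted with, the fitted mixed model.

```python
import math
import random
import statistics


def effective_sizes(total_var: float, var_of_mean: float, var_of_total_var: float):
    """Effective sample size R0 and Satterthwaite degrees of freedom m."""
    r0 = total_var / var_of_mean
    m = 2.0 * total_var ** 2 / var_of_total_var
    return r0, m


def bootstrap_effective_sizes(sample, n_boot=2000, seed=1):
    """Parametric-bootstrap estimate of (R0, m) for an iid normal sample (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    var_hat = statistics.variance(sample)
    sd_hat = math.sqrt(var_hat)
    boot_means, boot_vars = [], []
    for _ in range(n_boot):
        boot = [rng.gauss(mu_hat, sd_hat) for _ in range(n)]
        boot_means.append(statistics.fmean(boot))
        boot_vars.append(statistics.variance(boot))
    return effective_sizes(var_hat, statistics.variance(boot_means), statistics.variance(boot_vars))


data_rng = random.Random(0)
data = [data_rng.gauss(5.0, 2.0) for _ in range(30)]
r0_hat, m_hat = bootstrap_effective_sizes(data)
print(f"R0 ~ {r0_hat:.1f} (n = 30), m ~ {m_hat:.1f} (n - 1 = 29)")
```

For iid data, Var(X̄) = σ²/n and Var(σ̂²) = 2σ⁴/(n − 1), so the formulas return exactly n and n − 1; the bootstrap version only approximates these values.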

Table 1

Variance of random components of a maize experiment.

	% Total variance
Analyte	G	E	GxE	B	ε
Tryptophan	6.7	71.6	3.5	0.1	18.1
Oleic acid	55.6	16.4	8.7	0.1	19.3

G, genotype; E, environment; GxE, interaction of genotype and environment; B, block; ε, residual.

Table 2 shows the simulation setting, with a total of nE = 50 environments, nG = 50 genotypes, and nB = 4 blocks per environment. We generated 1,000 datasets from normal random numbers with the variances in Table 1. We fitted a linear mixed model to each dataset and estimated the total mean and the variance components. The variance of the estimated total mean and that of the estimated total variance, which are required to calculate R 0 and m, were estimated from 100 parametric bootstrap runs.

Table 2

Design setup for the simulation study.

	Genotype
Environment	G01−G10	G11−G20	G21−G30	G31−G40	G41−G50
E01−E10	√	√
E11−E20		√	√
E21−E30			√	√
E31−E40				√	√
E41−E50	√				√
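The incomplete design in Table 2 pairs each group of ten environments with two adjacent groups of ten genotypes, wrapping around so that E41−E50 meets both G41−G50 and G01−G10. The Python sketch below generates one dataset on that design. It is an assumption-laden illustration, not the authors' simulation code: it treats the Table 1 percentages directly as variances (total 100), nests blocks within environments, and leaves the mixed-model fit to a dedicated package.

```python
import math
import random
from collections import defaultdict

# Table 1 percentages, used here as variances on an arbitrary scale (assumption).
VARIANCE_COMPONENTS = {
    "tryptophan": {"G": 6.7, "E": 71.6, "GxE": 3.5, "B": 0.1, "e": 18.1},
    "oleic acid": {"G": 55.6, "E": 16.4, "GxE": 8.7, "B": 0.1, "e": 19.3},
}
N_ENV, N_GEN, N_BLOCK, GROUP_SIZE = 50, 50, 4, 10
N_GROUPS = N_ENV // GROUP_SIZE  # five groups of ten


def design():
    """Yield the (environment, genotype) cells ticked in Table 2 (0-based indices)."""
    for env in range(N_ENV):
        env_group = env // GROUP_SIZE
        for gen_group in (env_group, (env_group + 1) % N_GROUPS):
            for gen in range(gen_group * GROUP_SIZE, (gen_group + 1) * GROUP_SIZE):
                yield env, gen


def simulate(analyte: str, mean: float = 0.0, seed: int = 0):
    """Return rows (env, gen, block, y) with y = mean + G + E + GxE + B + e."""
    rng = random.Random(seed)
    sd = {name: math.sqrt(v) for name, v in VARIANCE_COMPONENTS[analyte].items()}
    g = [rng.gauss(0.0, sd["G"]) for _ in range(N_GEN)]
    e = [rng.gauss(0.0, sd["E"]) for _ in range(N_ENV)]
    b = [[rng.gauss(0.0, sd["B"]) for _ in range(N_BLOCK)] for _ in range(N_ENV)]
    rows = []
    for env, gen in design():
        ge = rng.gauss(0.0, sd["GxE"])  # each (env, gen) cell occurs once in the design
        for blk in range(N_BLOCK):
            y = mean + g[gen] + e[env] + ge + b[env][blk] + rng.gauss(0.0, sd["e"])
            rows.append((env, gen, blk, y))
    return rows


rows = simulate("tryptophan")
envs_per_genotype = defaultdict(set)
for env, gen, _, _ in rows:
    envs_per_genotype[gen].add(env)
print(len(rows), {len(s) for s in envs_per_genotype.values()})
```

Each dataset has 50 × 20 × 4 = 4,000 observations, and every genotype is grown in 20 environments, which is what links all genotype groups together in the mixed-model fit.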

The estimated m and R 0 were widely dispersed (Fig 2). The mean of the estimated m was 98.0 for tryptophan and 149.3 for oleic acid, and the mean of the estimated R 0 was 65.4 for tryptophan and 70.8 for oleic acid. Fig 3 shows the median and the lower and upper 5th percentiles of the coverage probabilities of the tolerance intervals TI0 and TI1. The coverage probability of TI0 exceeds the nominal coverage probability (1 − γ) with probability 0.95. For γ = 0.01 (the first dotted vertical line from the left in both panels), the lower 5th percentiles of the coverage probability of TI0 were larger than 98.9% for both tryptophan and oleic acid, the upper 5th percentiles were smaller than 99.9% and 99.8% for tryptophan and oleic acid respectively, and the medians were 99.6% for both analytes. This means that TI0 covers at least 1 − γ of the population with probability 95%.

Fig 2

The distributions of the estimated effective sample size (R 0) and the effective degree of freedom (m) for tryptophan and oleic acid separately.

Fig 3

The median, lower and upper 5 percentiles of the coverage probabilities of the tolerance intervals, TI0 (dark gray area) and TI1 (light gray area).

For each area, the middle, lower, and upper curves represent the median and the lower and upper 5th percentiles of the coverage probabilities, respectively. The five dotted horizontal lines represent the nominal coverage probability (1 − γ) for the five marked values of γ (0.01, 0.02, 0.03, 0.04, and 0.05).

In contrast, the coverage probability of TI1 is smaller than the nominal coverage probability (1 − γ) with probability 0.95. For γ = 0.01 (the first dotted vertical line from the left in both panels), the upper 5th percentiles of the coverage probability of TI1 were smaller than 99.0% for both tryptophan and oleic acid, the lower 5th percentiles were larger than 95.9% and 96.7% for tryptophan and oleic acid respectively, and the medians were 97.8% and 98.1% respectively. This means that TI1 covers at most 1 − γ of the population with probability 95%.
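The complementary coverage behaviour of TI0 and TI1 can also be checked in the simplest iid setting, where R 0 = n and m = n − 1. The Python sketch below is a hypothetical Monte Carlo check, not the mixed-model simulation reported above: it draws normal samples, builds TI0 and TI1 with the percentile definitions of Eqs (2) and (3), computes each interval's exact coverage of N(0, 1), and reports how often TI0 covers at least 1 − γ and TI1 at most 1 − γ. Both fractions should be close to 1 − α. The central chi-squared percentile again uses the Wilson-Hilferty approximation, which is an assumption.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()


def ncx2_df1_quantile(p, ncp):
    """p-quantile of non-central chi-squared (1 df), by bisection on its exact CDF."""
    a = math.sqrt(ncp)

    def cdf(t):
        return STD_NORMAL.cdf(t - a) - STD_NORMAL.cdf(-t - a)

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (0.5 * (lo + hi)) ** 2


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-squared p-quantile (assumption)."""
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + STD_NORMAL.inv_cdf(p) * math.sqrt(c)) ** 3


def coverage_check(n=50, alpha=0.05, gamma=0.05, reps=2000, seed=2):
    """Fractions of samples with coverage(TI0) >= 1 - gamma and coverage(TI1) <= 1 - gamma."""
    m = n - 1
    q = ncx2_df1_quantile(1.0 - gamma, 1.0 / n)
    k0 = math.sqrt(m * q / chi2_quantile(alpha, m))
    k1 = math.sqrt(m * q / chi2_quantile(1.0 - alpha, m))
    rng = random.Random(seed)
    hits0 = hits1 = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar, s = fmean(x), stdev(x)
        # Exact proportion of N(0, 1) lying inside xbar +/- k * s.
        cov0 = STD_NORMAL.cdf(xbar + k0 * s) - STD_NORMAL.cdf(xbar - k0 * s)
        cov1 = STD_NORMAL.cdf(xbar + k1 * s) - STD_NORMAL.cdf(xbar - k1 * s)
        hits0 += int(cov0 >= 1.0 - gamma)
        hits1 += int(cov1 <= 1.0 - gamma)
    return hits0 / reps, hits1 / reps


frac_ti0, frac_ti1 = coverage_check()
print(f"P(cov TI0 >= 0.95) ~ {frac_ti0:.3f}; P(cov TI1 <= 0.95) ~ {frac_ti1:.3f}")
```

Because coverage is computed exactly for each simulated interval, the only Monte Carlo error is in the fraction itself (about ±0.005 with 2,000 replicates).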

Results

The p values and the power of the hypothesis test: a simulation study

To investigate the performance of the test procedure, we conducted a simulation study contrasting two normal populations. The values of γ tar and γ ref were set to 0.05 and 0.01 respectively. POPtar and POPref are assumed to follow normal distributions with means μ tar = μ ref = 0. Solving the relation TI(γ tar, POPtar) = TI(γ ref, POPref), we obtained σ tar0 = 1.41σ ref0 as the population parameter under the null hypothesis. The sample sizes were fixed at 50 for both POPtar and POPref. The distribution under the null hypothesis was obtained from 10,000 simulation trials. For each value of σ tar = (1 − 0.05i)σ tar0, i = 0, 1, 2, …, 10, 1,000 values of α TI01 were generated randomly. Fig 4A shows the distribution of the p values, and Fig 4B the power of the test at the significance level of 0.05. The p values were approximately uniformly distributed when the null hypothesis holds (σ tar/σ ref = 1.41). The power at the null hypothesis was 0.051, slightly larger than but close to the significance level of 0.05 (Fig 4B). The power increased to 0.606 for σ tar/σ ref = 1.06 and to 0.999 for σ tar/σ ref = 0.78.
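The p values and powers in Fig 4 and Table 3 come from comparing simulated values of α TI01 with its simulated null distribution. The Python sketch below shows only that bookkeeping and the σ tar grid; generating the α TI01 values themselves is out of scope. It assumes, as an illustration, that small values of α TI01 favour the alternative, so the p value is the fraction of null values at or below the observed one. The function names are hypothetical.

```python
from typing import List, Sequence


def sigma_grid(sigma_tar0: float, n_steps: int = 10) -> List[float]:
    """sigma_tar = (1 - 0.05 i) * sigma_tar0 for i = 0, 1, ..., n_steps."""
    return [(1.0 - 0.05 * i) * sigma_tar0 for i in range(n_steps + 1)]


def empirical_p_value(null_stats: Sequence[float], observed: float) -> float:
    """Fraction of null values at or below the observed statistic (small favours H1; assumption)."""
    return sum(s <= observed for s in null_stats) / len(null_stats)


def rejection_rate(alt_stats: Sequence[float], null_stats: Sequence[float],
                   level: float = 0.05) -> float:
    """Estimated power: share of simulated statistics whose empirical p value is <= level."""
    rejected = sum(empirical_p_value(null_stats, s) <= level for s in alt_stats)
    return rejected / len(alt_stats)


# With sigma_ref = 1, the first six grid values are the ratios of Table 3 (1.41 down to 1.06).
print([round(r, 2) for r in sigma_grid(1.41)])
```

Evaluated at the null boundary (i = 0), `rejection_rate` estimates the type I error rate, which the text reports as 0.051 for sample size 50.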

Fig 4

The p value (A) and the power (B) of the test with sample size 50 for both target population and reference populations.

See Material and Methods for the definition of the test statistic α TI01. The dotted line represents the significance level of 0.05.

To see the effect of sample size, we repeated the simulation with sample sizes of 100, 150 and 200 for both POPtar and POPref. Table 3 shows the power to reject the null hypothesis at the significance level of 0.05. The power for σ tar/σ ref = 1.06 rose to 0.852 when the sample size was doubled and to 0.987 when the sample size was 200. In contrast, the power for σ tar/σ ref = 1.41 stayed close to 0.05.

Table 3

The power to reject the null hypothesis with significance level of 0.05 for each combination of sample size and the size of σ tar/σ ref.

	σ tar/σ ref
Sample size	1.06	1.13	1.20	1.27	1.34	1.41
50	0.606	0.449	0.287	0.171	0.097	0.051
100	0.852	0.684	0.456	0.246	0.114	0.053
150	0.948	0.816	0.578	0.343	0.164	0.045
200	0.987	0.915	0.708	0.390	0.183	0.058

A case study of testing inclusion of tolerance intervals: Contrasting protein composition of rice in Kyushu against other areas in Japan

As an empirical example, we applied the hypothesis test to examine whether the protein content of rice in the Kyushu area (Kyushu; here the prefectures Fukuoka and Kagoshima) is included in that of the other areas of Japan (Japan). We downloaded the rice composition data for Japan from The Food Composition Database for Safety Assessment of Genetically Modified Crops as Foods and Feeds [24,25]. These are third-party data not owned by the authors. Major varieties of non-glutinous rice cultivated and distributed in Japan were collected from 1999 to 2009 (except in 2003 and 2004). Every year, 15 or 16 samples covering 10−12 varieties were collected. The production areas span Japan from the far north to the south of the country. Table 4 shows the number of samples by variety and production area. In total, the sample XJapan includes 120 rice samples and the sample XKyushu includes 18 rice samples.

Table 4

Number of rice samples by variety and production area.

	Variety
Production area (prefecture)	Aic	Aki	Don	Hae	Han	Hin	Hit	Hos	Kin	Kir	Kos	Mas	Tsu	Yum	Total
Aichi	5														5
Akita		9													9
Aomori												2	4	2	8
Fukui					9						1				10
Fukuoka						9									9
Hokkaido								9		9					18
Hyogo			1												1
Ibaraki											8				8
Iwate							9								9
Kagoshima						9									9
Miyagi							9								9
Nagano		2													2
Niigata			4								9				13
Shiga									9						9
Tochigi											9				9
Yamagata		1		9											10
Total	5	12	5	9	9	18	18	9	9	9	27	2	4	2	138

Aic, Aichinokaori; Aki, Akitakomachi; Don, Dontokoi; Hae, Haenuki; Han, Hanaechizen; Hin, Hinohikari; Hit, Hitomebore; Hos, Hoshinoyume; Kin, Kinuhikari; Kir, Kirara397; Kos, Koshihikari; Mas, Masshigura; Tsu, Tsugaruroman; Yum, Yumeakari.

To test whether the protein content of rice in Kyushu is included in that of Japan, we applied TI0 and TI1 to the samples from Kyushu and Japan respectively. The null hypothesis is that the tolerance interval of the protein content of rice in Kyushu is not included in the tolerance interval of that in Japan; the alternative is that it is included. The values of γ Kyushu and γ Japan were set to 0.05 and 0.01 respectively. Using a linear mixed-effect model, we estimated the total mean of POPJapan as μ Japan = 6.60, the variances of the random genetic, environmental and G×E effects as $\hat{\sigma}_G^2 = 0.05$, $\hat{\sigma}_E^2 = 0.07$ and $\hat{\sigma}_{G\times E}^2 = 0$ respectively, and the error variance as $\hat{\sigma}_\varepsilon^2 = 0.19$, giving a total variance of $\hat{\sigma}_{T,Japan}^2 = 0.31$. The variance of the estimated total mean and that of the estimated total variance were estimated from 1,000 runs of parametric bootstrap; these values provide the effective sample size, R 0,Japan, and the effective degree of freedom, m Japan. The sample from POPKyushu is assumed to be an iid sample from a normal distribution with mean μ Kyushu = 6.92 and variance $\hat{\sigma}_{T,Kyushu}^2 = 0.22$. In this case, R 0,Kyushu is the sample size, 18, and m Kyushu = R 0,Kyushu − 1 = 17. With these values of R 0 and m, we obtained TI1(α = 0.05, γ = 0.01, XJapan) = (5.34, 7.86) and TI0(α = 0.05, γ = 0.05, XKyushu) = (5.59, 8.24). The former does not include the latter, because the upper limit of the Kyushu interval, 8.24, exceeds that of the Japan interval, 7.86. The value of the test statistic, α TI01, was obtained numerically as 0.247 by solving Eq (3). We obtained the p value by locating α TI01 on its distribution under the null hypothesis, which we generated by parametric bootstrap assuming μ Kyushu = μ Japan and σ T,Kyushu = 1.41σ T,Japan. Without loss of generality, we assumed μ Kyushu = μ Japan = 0 and σ T,Japan = 1.
The iid sample for Kyushu was generated from normal random numbers with mean 0 and standard deviation 1.41. For the Japan sample, we generated the genetic effect (G), the environmental effect (E), the G×E interaction and the error term by splitting the total variance into variance components in proportion to the estimated variance components. We generated 1,000 such datasets. For each simulated dataset, we estimated the means and total variances of Kyushu and Japan, and estimated their variances from 100 parametric bootstrap runs. From these estimates we obtained the values of R 0 and m and the value of α TI01. The 1,000 values of α TI01 gave the cumulative distribution of α TI01 under the null hypothesis (Fig 5), from which we obtained a p value of 0.195. The null hypothesis is therefore not rejected at the 0.05 significance level.
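Rescaling the Japan variance components so that σ T,Japan = 1 keeps their relative sizes: the estimates 0.05, 0.07, 0 and 0.19 (total 0.31) become about 0.161, 0.226, 0 and 0.613. The Python sketch below shows that step and the generation of one null dataset. The (genotype, environment) labels, the helper names, and the purely additive G + E + G×E + error layout are illustrative assumptions; the actual Japan sample follows the unbalanced layout of Table 4.

```python
import math
import random
from collections.abc import Hashable
from typing import Dict, List, Sequence, Tuple

# Case-study estimates for Japan: G, E, GxE and error variances (total 0.31).
JAPAN_COMPONENTS = {"G": 0.05, "E": 0.07, "GxE": 0.0, "error": 0.19}


def scale_components(components: Dict[str, float], total: float = 1.0) -> Dict[str, float]:
    """Rescale variance components to sum to `total`, keeping their proportions."""
    current = sum(components.values())
    return {name: total * v / current for name, v in components.items()}


def null_kyushu_sample(n: int = 18, sd_ratio: float = 1.41, seed: int = 0) -> List[float]:
    """iid N(0, sd_ratio^2) sample: sigma_T,Kyushu = 1.41 * sigma_T,Japan with sigma_T,Japan = 1."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_ratio) for _ in range(n)]


def null_japan_sample(labels: Sequence[Tuple[Hashable, Hashable]],
                      components: Dict[str, float], seed: int = 0) -> List[float]:
    """Additive G + E + GxE + error values for hypothetical (genotype, environment) labels."""
    rng = random.Random(seed)
    sd = {name: math.sqrt(v) for name, v in components.items()}
    effects: Dict[Tuple[str, Hashable], float] = {}

    def effect(kind: str, key: Hashable) -> float:
        # Draw each random effect once; reuse it for every observation sharing the key.
        if (kind, key) not in effects:
            effects[(kind, key)] = rng.gauss(0.0, sd[kind])
        return effects[(kind, key)]

    return [effect("G", g) + effect("E", e) + effect("GxE", (g, e)) + rng.gauss(0.0, sd["error"])
            for g, e in labels]


scaled = scale_components(JAPAN_COMPONENTS)
print({name: round(v, 3) for name, v in scaled.items()})
labels = [("Koshihikari", "Niigata-1999"), ("Koshihikari", "Niigata-2000"),
          ("Akitakomachi", "Akita-1999")]
print(null_japan_sample(labels, scaled), null_kyushu_sample()[:3])
```

Each null dataset produced this way would then go through the same estimation of means, total variances, R 0, m and α TI01 as the observed data.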

Fig 5

Distribution of α TI01 obtained by 1,000 simulation trials under the null hypothesis.

The dotted line represents the observed value α TI01 = 0.247.

Conclusion

In this study, we proposed a hypothesis test of the inclusion of tolerance intervals using the existing tolerance interval and a newly introduced complementary tolerance interval. In the simulation, the power of the test at the null hypothesis (σ tar/σ ref = 1.41) stayed close to 0.05 (Fig 4, Table 3), which indicates that the test approximately maintains its nominal significance level. However, the test statistic, α TI01, is complex in form, and we could not attach a direct interpretation to it. Further work is needed to develop candidate test statistics that measure the extent of coverage or non-coverage of the target population by the reference population. The mixed effect model enables unbiased estimation of the effective sample size and the effective degree of freedom when the samples consist of subsamples collected under various genetic and environmental conditions. However, a survey may instead be designed to collect matched controls, and developing a testing procedure for such samples is another promising direction. As an alternative to testing non-inferiority or substantial equivalence of the population mean, the proposed test examines the “range” of the distribution. A statistical test on the range of the distribution may be useful, especially when it is difficult to describe the distribution with a simple statistical model. If a large sample is available, non-parametric tolerance intervals can be constructed [26,27]; a future study will investigate the statistical properties of the corresponding non-parametric testing procedure.

1. Tests for individual and population bioequivalence based on generalized p-values.

Authors: Richard J McNally; Hari Iyer; Thomas Mathew
Journal: Stat Med Date: 2003-01-15 Impact factor: 2.373

2. Natural variability of metabolites in maize grain: differences due to genetic background.

Authors: Tracey L Reynolds; Margaret A Nemeth; Kevin C Glenn; William P Ridley; James D Astwood
Journal: J Agric Food Chem Date: 2005-12-28 Impact factor: 5.279

3. Availability and utility of crop composition data.

Authors: Kazumi Kitta
Journal: J Agric Food Chem Date: 2013-06-18 Impact factor: 5.279

4. Evaluation of genetically engineered crops using transcriptomic, proteomic, and metabolomic profiling techniques.

Authors: Agnès E Ricroch; Jean B Bergé; Marcel Kuntz
Journal: Plant Physiol Date: 2011-02-24 Impact factor: 8.340

5. Unintended compositional changes in genetically modified (GM) crops: 20 years of research.

Authors: Rod A Herman; William D Price
Journal: J Agric Food Chem Date: 2013-02-25 Impact factor: 5.279

6. Model-based tolerance intervals derived from cumulative historical composition data: application for substantial equivalence assessment of a genetically modified crop.

Authors: Bonnie Hong; Tracey L Fisher; Theresa S Sult; Carl A Maxwell; James A Mickelson; Hirohisa Kishino; Mary E H Locke
Journal: J Agric Food Chem Date: 2014-09-29 Impact factor: 5.279

7. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement.

Authors: Gilda Piaggio; Diana R Elbourne; Stuart J Pocock; Stephen J W Evans; Douglas G Altman
Journal: JAMA Date: 2012-12-26 Impact factor: 56.272
