Wonkuk Kim, Derek Gordon, Jonathan Sebat, Kenny Q Ye, Stephen J Finch.
Abstract
Recent studies suggest that copy number polymorphisms (CNPs) may play an important role in disease susceptibility and onset. Currently, the detection of CNPs mainly depends on microarray technology. For case-control studies, conventionally, subjects are assigned to a specific CNP category based on the continuous quantitative measure produced by microarray experiments, and cases and controls are then compared using a chi-square test of independence. The purpose of this work is to specify the likelihood ratio test statistic (LRTS) for a case-control sampling design based on the underlying continuous quantitative measurement, and to assess its power and relative efficiency (as compared to the chi-square test of independence on CNP counts). The sample size and power formulas of both methods are given. For the latter, the CNPs are classified using the Bayesian classification rule. The LRTS is more powerful than this chi-square test for the alternatives considered, especially alternatives in which the at-risk CNP categories have low frequencies. An example of the application of the LRTS is given for a comparison of CNP distributions in individuals of Caucasian or Taiwanese ethnicity, where the LRTS appears to be more powerful than the chi-square test, possibly due to misclassification of the most common CNP category into a less common category.
Year: 2008 PMID: 18941524 PMCID: PMC2566806 DOI: 10.1371/journal.pone.0003475
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1 (1a and 1b). In these figures, we present probability density plots for statistical distributions that are mixtures of four univariate normal distributions with equally spaced means 1, 2, 3, and 4 and a common variance.
In Figure 1a, the variance of each component distribution is . In Figure 1b, the variance of each component distribution is ¼.
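The mixture density described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' plotting code: the equal mixing proportions (0.25 each) and the helper names `normal_pdf`, `mixture_pdf`, and `trapezoid_integral` are assumptions introduced here, and only the Figure 1b variance of ¼ is used because it is the one stated in the text.

```python
import math

MEANS = (1.0, 2.0, 3.0, 4.0)               # equally spaced means from the caption
PROPORTIONS = (0.25, 0.25, 0.25, 0.25)     # assumption: equal mixing proportions
VARIANCE_1B = 0.25                         # common variance stated for Figure 1b


def normal_pdf(x, mean, variance):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def mixture_pdf(x, proportions=PROPORTIONS, means=MEANS, variance=VARIANCE_1B):
    """Density of a normal mixture with a common component variance."""
    return sum(p * normal_pdf(x, m, variance) for p, m in zip(proportions, means))


def trapezoid_integral(f, lo, hi, steps=20000):
    """Numerical integral of f over [lo, hi] using the trapezoid rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


if __name__ == "__main__":
    # Tabulate the density on a coarse grid; a plotting library could be
    # used instead to draw a curve like the ones in the figure.
    for i in range(0, 11):
        x = 0.5 * i
        print(f"x = {x:.1f}  density = {mixture_pdf(x):.4f}")
```

With equal proportions the density is symmetric about 2.5, the midpoint of the four means; unequal proportions would tilt the curve toward the more frequent components.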
Simulation results of the null distribution of chi-squared test.
| Sample size | Proportions | 0.975 Level | 0.10 Level | 0.05 Level | 0.025 Level | 0.01 Level | KS-Test P-value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 200 | (0.25, 0.25, 0.25, 0.25) | 0.976 | 0.107 | 0.042 | 0.018 | 0.005 | 0.72 |
| 500 | (0.25, 0.25, 0.25, 0.25) | 0.971 | 0.092 | 0.047 | 0.025 | 0.007 | 0.78 |
| 200 | (0.1, 0.2, 0.3, 0.4) | 0.979 | 0.094 | 0.046 | 0.018 | 0.006 | 0.54 |
| 500 | (0.1, 0.2, 0.3, 0.4) | 0.983 | 0.106 | 0.056 | 0.036 | 0.010 | 0.36 |
Based on 1000 replications for each setting.
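A simulation of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: CNP categories are sampled directly instead of being classified from continuous measurements, the sample size is treated as the number of subjects per group, only the upper-tail levels are checked, and the function names are hypothetical. The critical values are the standard chi-square quantiles for 3 degrees of freedom (a 2 × 4 table).

```python
import random

# Upper-tail critical values of the chi-square distribution with 3 degrees
# of freedom, keyed by nominal significance level.
CRITICAL_VALUES_DF3 = {0.10: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345}


def sample_counts(rng, n, proportions):
    """Draw n subjects and count how many fall in each CNP category."""
    counts = [0] * len(proportions)
    for category in rng.choices(range(len(proportions)), weights=proportions, k=n):
        counts[category] += 1
    return counts


def pearson_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic for a 2 x K table of category counts."""
    n_case, n_control = sum(case_counts), sum(control_counts)
    total = n_case + n_control
    statistic = 0.0
    for cases, controls in zip(case_counts, control_counts):
        column_total = cases + controls
        if column_total == 0:
            continue  # an empty category contributes nothing
        for observed, row_total in ((cases, n_case), (controls, n_control)):
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def empirical_rejection_rates(n_per_group, proportions, replications=1000, seed=0):
    """Fraction of null replicates exceeding each nominal critical value."""
    rng = random.Random(seed)
    statistics = [
        pearson_chi_square(
            sample_counts(rng, n_per_group, proportions),
            sample_counts(rng, n_per_group, proportions),
        )
        for _ in range(replications)
    ]
    return {
        level: sum(s > critical for s in statistics) / replications
        for level, critical in CRITICAL_VALUES_DF3.items()
    }


if __name__ == "__main__":
    print(empirical_rejection_rates(200, (0.25, 0.25, 0.25, 0.25)))
    print(empirical_rejection_rates(500, (0.1, 0.2, 0.3, 0.4)))
```

If the asymptotic approximation holds, each empirical rate should fall close to its nominal level, within the Monte Carlo error expected from 1000 replications.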
Simulation results of the null distribution of LRTS.
| Sample size | Proportions | 0.975 Level | 0.10 Level | 0.05 Level | 0.025 Level | 0.01 Level | KS-Test P-value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 200 | (0.25, 0.25, 0.25, 0.25) | 0.979 | 0.103 | 0.045 | 0.015 | 0.007 | 0.81 |
| 500 | (0.25, 0.25, 0.25, 0.25) | 0.971 | 0.097 | 0.052 | 0.021 | 0.013 | 0.79 |
| 200 | (0.1, 0.2, 0.3, 0.4) | 0.977 | 0.106 | 0.046 | 0.020 | 0.005 | 0.34 |
| 500 | (0.1, 0.2, 0.3, 0.4) | 0.982 | 0.109 | 0.060 | 0.028 | 0.011 | 0.41 |
Based on 1000 replications for each setting.
Simulation results for LRTS under alternative distributions.
| MOI | Method to calculate power | 10⁻³ Level | 10⁻⁴ Level | 10⁻⁵ Level | KS-Test P-value |
| --- | --- | --- | --- | --- | --- |
| Dosage | Simulation | 0.958 (0.946, 0.970) | 0.866 (0.845, 0.887) | 0.735 (0.708, 0.762) | 0.01 |
| Dosage | Asymptotic | 0.949 | 0.856 | 0.712 | |
| Extremes | Simulation | 0.950 (0.936, 0.964) | 0.857 (0.835, 0.879) | 0.738 (0.711, 0.765) | 0.07 |
| Extremes | Asymptotic | 0.946 | 0.848 | 0.700 | |
Legend for Table 2. Based on 1000 replications and a sample size of 200 per case/control group.
95% approximate confidence intervals for simulated power are given in parentheses.
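The text does not state how these intervals were computed. A minimal sketch, assuming the usual normal-approximation (Wald) interval for a proportion estimated from 1000 replications, reproduces the tabulated bounds for the Dosage model at the 10⁻³ level; the function name `wald_interval` is introduced here for illustration.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p_hat is the simulated power (fraction of replicates that rejected),
    n is the number of replications, and z = 1.96 gives roughly 95% coverage.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    lower, upper = wald_interval(0.958, 1000)
    print(f"({lower:.3f}, {upper:.3f})")  # prints "(0.946, 0.970)"
```

Because the interval width shrinks with the square root of the number of replications, halving it would require roughly four times as many simulated data sets.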
Here, we present simulated and asymptotic power for the LRTS when the alternative hypothesis is true, that is, when the mixing proportions differ between the two groups. The mixing proportions are computed using equations (4) for the Dosage and Extremes models, where CNP population frequencies are as specified above (Methods - Genetic model parameters for efficiency analysis). For the Dosage model, the relative risks are: R₂ = 1.8, R₃ = 1.8² = 3.24, R₄ = 1.8³ = 5.83. For the Extremes model, the relative risks are: R₁ = 1, R₂ = 0.3, R₃ = 0.3, R₄ = 1. Asymptotic power is computed using the non-centrality parameter documented in equation (A1). The column “KS-Test P-value” refers to the p-value computed using the Kolmogorov-Smirnov goodness-of-fit test, as implemented in the R programming environment.
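The relative risks in this legend follow a simple pattern that can be checked numerically. The sketch below is an illustration only: equation (4) is not reproduced in this text, so the case-group formula used here (each population frequency weighted by its relative risk and renormalised, which follows from Bayes' rule) is an assumption about its form, and the equal population frequencies are hypothetical placeholders, not the values from the Methods section.

```python
def dosage_relative_risks(base, n_categories):
    """Multiplicative dosage model: R_k = base ** (k - 1), so R_1 = 1."""
    return [base ** k for k in range(n_categories)]


def case_mixing_proportions(population_frequencies, relative_risks):
    """P(category | case), proportional to frequency times relative risk."""
    weights = [p * r for p, r in zip(population_frequencies, relative_risks)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical population frequencies, chosen only for illustration.
    frequencies = (0.25, 0.25, 0.25, 0.25)

    dosage = dosage_relative_risks(1.8, 4)   # 1, 1.8, 1.8 squared, 1.8 cubed
    extremes = [1.0, 0.3, 0.3, 1.0]          # Extremes model from the legend

    print("Dosage relative risks:", [round(r, 3) for r in dosage])
    print("Dosage case proportions:",
          [round(p, 3) for p in case_mixing_proportions(frequencies, dosage)])
    print("Extremes case proportions:",
          [round(p, 3) for p in case_mixing_proportions(frequencies, extremes)])
```

The control-group proportions also depend on the baseline disease risk, which is not given here, so they are left out of the sketch.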
Figure 2. Here we present the relative efficiency Eff (defined in Methods) of the chi-square test of independence in relation to the LRTS as a function of the separation between the four component distributions that comprise the mixture distribution.
All information regarding parameter specification for the Dosage and Extremes models for which relative efficiencies are calculated is presented in the Methods section (Genetic model parameters for efficiency analysis).
Parameter estimation with 3 component normal mixtures for probe P4077 ratio data.
| Hypothesis | Estimated parameters | CNP Category | | |
| --- | --- | --- | --- | --- |
| Null | Mixing proportions | 0.815 | 0.179 | 0.006 |
| Null | Means | 1.062 | 1.446 | 2.191 |
| Alternative | Mixing proportions for Taiwanese | 0.626 | 0.362 | 0.011 |
| Alternative | Mixing proportions for Caucasians | 0.843 | 0.152 | 0.005 |
| Alternative | Means | 1.056 | 1.420 | 2.180 |
Legend for Table 4. Data are determined for 261 individuals of Caucasian ethnicity and 88 individuals of Taiwanese ethnicity. The estimated variance (η) under both the null and alternative hypotheses is 0.03.
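To illustrate how estimates like these feed the Bayesian classification rule mentioned in the abstract, the sketch below assigns a probe ratio to the component with the largest posterior probability. It uses the null-hypothesis estimates from the table and the common variance of 0.03. The maximum-posterior rule, the zero-based component indices, and the function names are assumptions made for this example, not details confirmed by the text.

```python
import math

# Null-hypothesis estimates from the table above.
MIXING_PROPORTIONS = (0.815, 0.179, 0.006)
MEANS = (1.062, 1.446, 2.191)
VARIANCE = 0.03  # common estimated variance from the legend


def posterior_probabilities(x, proportions=MIXING_PROPORTIONS, means=MEANS,
                            variance=VARIANCE):
    """Posterior probability of each mixture component given measurement x."""
    # The normalising constant of the normal density is the same for every
    # component (common variance), so it cancels and is omitted.
    weights = [
        p * math.exp(-((x - m) ** 2) / (2.0 * variance))
        for p, m in zip(proportions, means)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def classify(x):
    """Index (0, 1 or 2) of the component with the highest posterior."""
    posteriors = posterior_probabilities(x)
    return max(range(len(posteriors)), key=posteriors.__getitem__)


if __name__ == "__main__":
    for measurement in (1.0, 1.25, 1.45, 2.2):
        rounded = [round(p, 3) for p in posterior_probabilities(measurement)]
        print(measurement, classify(measurement), rounded)
```

Measurements that fall between two component means receive split posterior probabilities, which is where a hard category assignment can misclassify subjects and where a test based on the continuous measurement keeps more information.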
Figure 3. In these figures, we provide histograms of P4077 probe ratio data for Taiwanese, Caucasian and Combined (Taiwanese and Caucasian) samples.
We also provide a fitted probability density function line for each data set. These graphs were created using the R programming environment. The horizontal axis labeled “MEASUREMENT” refers to each individual's probe ratio data value (after log transform) for the P4077 probe.