Weihua Guan, Chun Li.
Abstract
BACKGROUND: Rapid advances in next-generation sequencing technologies facilitate genetic association studies of an increasingly wide array of rare variants. Capturing rare or less common variants requires a large number of individuals, but a large-scale study using whole-genome or exome sequencing is still costly. DNA pooling can serve as a cost-effective approach, with the potential limitation that the identity of individual genomes is lost, so individual characteristics and environmental factors cannot be adjusted for in association analysis, which may result in power loss and a biased estimate of the genetic effect.
Year: 2014 PMID: 25485788 PMCID: PMC4259344 DOI: 10.1371/journal.pone.0114523
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Characteristics of simulation models.
| Model | n | RAF | corr | ORg | ORz | αmax | note |
| 1 | 1000 | .01 | 0 | 1.0 | (1.5, 1.5, 1.5, 1.5, 1.5) | 0 | Base model |
| 2 | 1000 | .01 | | 1.0 | (1.5, 1.5, 1.5, 1.5, 1.5) | 0 | Correlated G and Z |
| 3 | 1000 | .01 | | 1.0 | | | Unequal %sample |
| 4 | | .01 | | 1.0 | | 0 | Large sample size |
| 5 | 1000 | .01 | 0 | 3.5 | (1.5, 1.5, 1.5, 1.5, 1.5) | 0 | Base model |
| 6 | 1000 | .01 | | 3.5 | (1.5, 1.5, 1.5, 1.5, 1.5) | 0 | Correlated G and Z |
| 7 | 1000 | .01 | | 3.5 | | 0 | Not all Z observed |
| 8 | 1000 | .01 | | 3.5 | | 0 | Varying effect of Z |
| 9 | | .01 | | 3.5 | | 0 | Large sample size |
| 10 | 1000 | .01 | | 3.5 | | | Unequal %sample |
*. The simulation model consists of 5 covariates, each with OR of 1.5. In analysis, we assume that only the last two covariates are considered.
$. In analysis, a more stringent significance threshold (10⁻⁴) is used than in the other simulation models (.05).
n: number of cases, assuming a case:control ratio of 1:1; RAF: risk allele frequency; ORg: odds ratio for the risk allele; ORz: odds ratios for the covariates; corr: correlation coefficient between the causal variant and the last covariate; αmax: variation in sample proportions (see “Methods”). Models 1–4 were simulated under the null hypothesis of no association, and models 5–12 under the alternative hypothesis. Models 1 and 5 were treated as baseline models, and parameter changes in the other models were highlighted.
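To make the baseline settings concrete, the following is a minimal sketch of how case-control data with one rare variant and five covariates might be generated. It is not the authors' simulation code. The function name `simulate_case_control`, the logistic disease model with an additive per-allele effect, the standard normal covariates, and the intercept of -3 are all assumptions made for illustration, and the sketch covers only the uncorrelated case (corr = 0).

```python
import math
import random


def simulate_case_control(n, raf, or_g, or_z, intercept=-3.0, seed=0):
    """Draw n cases and n controls from a hypothetical logistic disease model.

    Assumptions (not from the source): genotype G counts risk alleles at one
    biallelic variant, covariates Z are independent standard normals, and
    log-odds of disease are additive in G and Z.
    """
    rng = random.Random(seed)
    beta_g = math.log(or_g)                      # per-allele log odds ratio
    beta_z = [math.log(v) for v in or_z]         # covariate log odds ratios
    cases, controls = [], []
    while len(cases) < n or len(controls) < n:
        g = sum(rng.random() < raf for _ in range(2))   # 0, 1 or 2 risk alleles
        z = [rng.gauss(0.0, 1.0) for _ in beta_z]
        eta = intercept + beta_g * g + sum(b * v for b, v in zip(beta_z, z))
        diseased = rng.random() < 1.0 / (1.0 + math.exp(-eta))
        target = cases if diseased else controls
        if len(target) < n:                      # keep the 1:1 ratio
            target.append((g, z))
    return cases, controls


# Settings resembling model 5: RAF = .01, ORg = 3.5, five covariates with OR 1.5.
cases, controls = simulate_case_control(200, 0.01, 3.5, [1.5] * 5, seed=1)
```

Each returned record is a `(genotype, covariates)` pair; setting `or_g=1.0` gives data under the null hypothesis, as in model 1.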
Type 1 error for multiple-imputation based pooling method (“poolMI”), individual sequencing of all samples (“seqall”) and pooling without considering other risk factors (“poolunivariate”).
| Model | seqall | poolunivariate | poolMI |
| 1 | .044 | .046 | .043 |
| 2 | .046 | .458 | .048 |
| 3 | .058 | .368 | .060 |
| 4 | 1.1E-4 | .040 | 1.2E-4 |
The significance level is .05 for models 1–3 and 10⁻⁴ for model 4. The number of simulations is 1000 for models 1–3 and 100,000 for model 4.
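When reading the type 1 error table, it can help to know how much an empirical rejection rate is expected to fluctuate from simulation noise alone. The sketch below uses the usual binomial standard error for a proportion; the helper names `mc_standard_error` and `within_band` and the 1.96 multiplier (an approximate 95% band) are choices made here for illustration, not part of the reported analysis.

```python
import math


def mc_standard_error(alpha: float, n_sims: int) -> float:
    """Binomial standard error of an empirical rejection rate at level alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_sims)


def within_band(observed: float, alpha: float, n_sims: int, z: float = 1.96) -> bool:
    """True if the observed rate lies within z standard errors of alpha."""
    return abs(observed - alpha) <= z * mc_standard_error(alpha, n_sims)


# Values taken from the table: model 2 at level .05 with 1000 simulations.
for label, rate in [("seqall", 0.046), ("poolunivariate", 0.458), ("poolMI", 0.048)]:
    print(label, within_band(rate, 0.05, 1000))
```

Under this approximation the band at level .05 with 1000 simulations is roughly .05 ± .0135, so the seqall and poolMI entries for models 1–3 fall inside it, while the poolunivariate entries for models 2 and 3 (.458 and .368) fall far outside.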
Power for multiple-imputation based pooling method (“poolMI-prob”), individual sequencing of all samples (“seqall”) and pooling without considering other risk factors (“poolunivariate”).
| Model | seqall | poolunivariate | poolMI |
| 5 | .71 | .42 | .66 |
| 6 | .63 | .35 | .59 |
| 7 | .42 | .25 | .37 |
| 8 | .52 | .30 | .47 |
| 9 | .71 | .24 | .64 |
| 10 | .50 | .25 | .48 |
*. Power adjusted for the nominal false positive rates.
The significance level is .05 (10⁻⁴ for model 11). The number of simulations is 1000.
Figure 1. Power for individual sequencing of all samples, pooling with individual genotype imputed, and pooling without considering other risk factors.
The simulation setting is described in Table 1, model 5, but with different odds ratios for the risk allele (ORg). The number of simulations is 200 for each setting.
Figure 2. Power for individual sequencing of all samples, pooling with individual genotype imputed, and pooling without considering other risk factors.
The simulation setting is similar to that described in Table 1, model 5, but with different risk allele frequencies (RAF), n = 5000 cases/controls, and ORg = 2. The number of simulations is 200 for each setting.
Figure 3. Power for individual sequencing of all samples, pooling with individual genotype imputed, and pooling without considering other risk factors.
The simulation setting is described in Table 1, model 5, but with different odds ratios for the covariates (ORz). The number of simulations is 200 for each setting.
Type 1 error (model 2) and power (model 6) for multiple-imputation based pooling method with sequencing error rate of 0.5%, 1%, and 2%.
| Model | Error rate 0.5% | Error rate 1% | Error rate 2% |
| 2 | .051 | .053 | .056 |
| 6 | .66 | .67 | .62 |
Number of simulations is 1000.
Figure 4. Design of DNA pooling with sample matching.
After sample matching and pool creation, the pools are grouped into K groups, with the allele frequency in each group denoted by (p_1, …, p_K). Pools from the same group are randomly distributed into M lanes, with sequencing errors (e_1, …, e_M).
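To illustrate why lane-level sequencing error matters for rare variants, the sketch below relates a group's true allele frequency p to the fraction of reads that would show the risk allele in a lane with error rate e. The symmetric miscall model (each read flips between the two alleles with probability e) and both function names are assumptions introduced here; the source does not spell out its error model in this section.

```python
def observed_allele_fraction(p: float, e: float) -> float:
    """Expected fraction of risk-allele reads under a symmetric miscall rate e.

    Assumed model: a true risk allele is read correctly with probability 1 - e,
    and a true reference allele is misread as the risk allele with probability e.
    """
    return p * (1.0 - e) + (1.0 - p) * e


def corrected_allele_frequency(q: float, e: float) -> float:
    """Invert the assumed model to recover p from the observed fraction q."""
    if not 0.0 <= e < 0.5:
        raise ValueError("error rate must lie in [0, 0.5)")
    p = (q - e) / (1.0 - 2.0 * e)
    return min(1.0, max(0.0, p))     # clamp to a valid frequency


# With RAF = .01 and a 1% error rate, the expected read fraction is .0198.
q = observed_allele_fraction(0.01, 0.01)
print(q, corrected_allele_frequency(q, 0.01))
```

Under this assumed model, an error rate equal to the risk allele frequency nearly doubles the expected fraction of risk-allele reads, which is one way to see why error rates of 0.5% to 2% are relevant when RAF is .01.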