Yan Guo, David C Samuels, Jiang Li, Travis Clark, Chung-I Li, Yu Shyr.
Abstract
Next-generation sequencing (NGS) technology has given researchers the opportunity to study the genome in unprecedented detail. In particular, NGS has been applied to disease association studies. Unlike genotyping chips, NGS is not limited to a fixed set of SNPs. Prices for NGS are now comparable to those of SNP chips, although for large studies the cost can still be substantial. Pooling techniques are often used to reduce the overall cost of large-scale studies. In this study, we designed a rigorous simulation model to test the practicability of estimating allele frequency from pooled sequencing data. We took crucial factors into consideration, including pool size, overall depth, average depth per sample, pooling variation, and sampling variation. We used real data to demonstrate and measure reference allele preference in DNAseq data and implemented this bias in our simulation model. We found that pooled sequencing data can introduce high levels of relative error rate (defined as the error rate divided by the targeted allele frequency) and that the error is more severe for SNPs with a low minor allele frequency (MAF) than for SNPs with a high MAF. To overcome the error introduced by pooling, we recommend a large pool size and a high average depth per sample.
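The kind of simulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: genotypes are drawn under Hardy-Weinberg equilibrium; pooling variation is modelled by giving each individual a DNA share drawn from a normal distribution with mean 1 and standard deviation `pool_sd` (truncated at zero); sampling variation is a binomial draw of reads at an overall depth of pool size × average depth per sample; and reference allele preference enters through a hypothetical `alt_bias` weight, where values below 1 favour the reference allele. All function and parameter names are hypothetical.

```python
import numpy as np


def simulate_pooled_af(true_maf, pool_size, depth_per_sample, pool_sd,
                       alt_bias=1.0, n_sims=1000, seed=0):
    """Simulate allele-frequency estimates from one pooled library (hypothetical model)."""
    if alt_bias <= 0:
        raise ValueError("alt_bias must be positive")
    rng = np.random.default_rng(seed)
    # Overall depth of the pool = pool size x average depth per sample.
    total_depth = int(round(pool_size * depth_per_sample))
    estimates = np.empty(n_sims)
    for s in range(n_sims):
        # Diploid genotypes (0, 1 or 2 minor alleles) under Hardy-Weinberg equilibrium.
        genotypes = rng.binomial(2, true_maf, size=pool_size)
        # Pooling variation: unequal DNA contributions, truncated at zero.
        weights = np.clip(rng.normal(1.0, pool_sd, size=pool_size), 0.0, None)
        if weights.sum() == 0.0:
            weights = np.ones(pool_size)
        pool_af = np.dot(weights, genotypes / 2.0) / weights.sum()
        pool_af = min(max(pool_af, 0.0), 1.0)  # guard against rounding just above 1
        # Assumed reference preference: alt molecules sampled with relative weight alt_bias.
        q = pool_af * alt_bias / (pool_af * alt_bias + (1.0 - pool_af))
        # Sampling variation: binomial draw of alt-supporting reads.
        alt_reads = rng.binomial(total_depth, q)
        estimates[s] = alt_reads / total_depth
    return estimates


def relative_rmse(estimates, true_maf):
    """RMSE of the estimates divided by the targeted allele frequency."""
    return float(np.sqrt(np.mean((estimates - true_maf) ** 2)) / true_maf)


if __name__ == "__main__":
    for maf in (0.005, 0.01, 0.05, 0.2, 0.5):
        est = simulate_pooled_af(maf, pool_size=200, depth_per_sample=5,
                                 pool_sd=0.2, alt_bias=0.92, n_sims=2000)
        print(f"MAF={maf:<6} relative RMSE={relative_rmse(est, maf):.3f}")
```

With identical pool settings, the relative RMSE printed for the rare alleles is markedly larger than for common ones, which mirrors the abstract's finding; increasing `pool_size` or `depth_per_sample` shrinks it.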
Year: 2013 PMID: 23476151 PMCID: PMC3582166 DOI: 10.1155/2013/895496
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Allele balance for 3 independent datasets.
| Dataset | Sample | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Mean 95% CI (lower) | Mean 95% CI (upper) |
|---|---|---|---|---|---|---|---|---|---|
| SureSelect | 1055QC0003 | 0.091 | 0.423 | 0.48 | 0.48 | 0.536 | 0.862 | 0.476 | 0.483 |
| SureSelect | 1055QC0004 | 0.1 | 0.427 | 0.477 | 0.48 | 0.53 | 0.826 | 0.477 | 0.483 |
| SureSelect | 1055QC0005 | 0.046 | 0.429 | 0.481 | 0.482 | 0.536 | 0.939 | 0.479 | 0.486 |
| SureSelect | 1055QC0006 | 0.1 | 0.418 | 0.478 | 0.481 | 0.542 | 0.909 | 0.477 | 0.485 |
| SureSelect | 1055QC0007 | 0.156 | 0.417 | 0.476 | 0.475 | 0.536 | 0.879 | 0.472 | 0.479 |
| SureSelect | 1055QC0008 | 0.148 | 0.421 | 0.481 | 0.482 | 0.542 | 0.905 | 0.479 | 0.486 |
| SureSelect | 1055QC0009 | 0.148 | 0.422 | 0.478 | 0.48 | 0.536 | 0.963 | 0.476 | 0.483 |
| SureSelect | 1055QC0011 | 0.1 | 0.421 | 0.481 | 0.48 | 0.538 | 0.952 | 0.477 | 0.484 |
| SureSelect | 1055QC0012 | 0.095 | 0.429 | 0.478 | 0.48 | 0.531 | 1 | 0.477 | 0.483 |
| SureSelect | 1055QC0013 | 0.165 | 0.424 | 0.482 | 0.482 | 0.541 | 0.9 | 0.479 | 0.486 |
| SureSelect | 1055QC0014 | 0.103 | 0.429 | 0.481 | 0.483 | 0.538 | 0.818 | 0.48 | 0.487 |
| SureSelect | 1055QC0016 | 0.13 | 0.425 | 0.48 | 0.482 | 0.54 | 0.909 | 0.478 | 0.485 |
| SureSelect | 1055QC0017 | 0.136 | 0.422 | 0.481 | 0.48 | 0.536 | 0.9 | 0.477 | 0.483 |
| SureSelect | 1055QC0018 | 0.182 | 0.424 | 0.48 | 0.48 | 0.537 | 0.987 | 0.477 | 0.483 |
| SureSelect | 1055QC0020 | 0.2 | 0.432 | 0.483 | 0.485 | 0.536 | 0.815 | 0.482 | 0.488 |
| SureSelect | 1055QC0021 | 0.12 | 0.429 | 0.481 | 0.484 | 0.538 | 1 | 0.48 | 0.487 |
| SureSelect | 1055QC0022 | 0.091 | 0.424 | 0.478 | 0.479 | 0.533 | 0.905 | 0.476 | 0.482 |
| SureSelect | 1055QC0024 | 0.077 | 0.422 | 0.478 | 0.478 | 0.535 | 0.857 | 0.474 | 0.481 |
| SureSelect | 1055QC0025 | 0.13 | 0.429 | 0.481 | 0.484 | 0.54 | 0.897 | 0.481 | 0.488 |
| SureSelect | 1055QC0026 | 0.13 | 0.42 | 0.478 | 0.479 | 0.539 | 0.793 | 0.476 | 0.482 |
| SureSelect | 1055QC0028 | 0.039 | 0.419 | 0.477 | 0.476 | 0.531 | 0.938 | 0.472 | 0.479 |
| TruSeq | 10009 | 0.044 | 0.447 | 0.5 | 0.499 | 0.55 | 1 | 0.496 | 0.501 |
| TruSeq | 10244 | 0.091 | 0.444 | 0.5 | 0.497 | 0.55 | 0.909 | 0.495 | 0.499 |
| TruSeq | 10290 | 0.065 | 0.444 | 0.5 | 0.497 | 0.55 | 0.917 | 0.495 | 0.499 |
| TruSeq | 20007 | 0.077 | 0.447 | 0.5 | 0.498 | 0.55 | 0.923 | 0.496 | 0.5 |
| TruSeq | 20017 | 0.044 | 0.447 | 0.5 | 0.498 | 0.55 | 0.921 | 0.496 | 0.5 |
| TruSeq | 20301 | 0.077 | 0.449 | 0.5 | 0.499 | 0.55 | 0.967 | 0.497 | 0.501 |
| Array based | ERR004043 | 0.04 | 0.376 | 0.44 | 0.447 | 0.511 | 0.986 | 0.44 | 0.453 |
| Array based | ERR004047 | 0.125 | 0.391 | 0.447 | 0.451 | 0.503 | 1 | 0.446 | 0.457 |
| Array based | SRR013908 | 0.081 | 0.37 | 0.475 | 0.481 | 0.584 | 0.977 | 0.472 | 0.489 |
| Array based | SRR013909 | 0.071 | 0.372 | 0.476 | 0.484 | 0.591 | 0.95 | 0.476 | 0.492 |
| Array based | SRR015428 | 0.093 | 0.389 | 0.488 | 0.49 | 0.586 | 0.909 | 0.483 | 0.498 |
| Array based | SRR015429 | 0.1 | 0.426 | 0.496 | 0.497 | 0.564 | 0.913 | 0.491 | 0.503 |
| All | Mean | 0.103 | 0.421 | 0.482 | 0.483 | 0.543 | 0.919 | 0.479 | 0.487 |
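The per-sample columns in the allele balance table (five-number summary, mean, and a 95% confidence interval for the mean) can be computed from per-site read counts with a short routine. The table does not define allele balance, so this sketch assumes it is the fraction of reads carrying the non-reference allele at a heterozygous site (values below 0.5 would then reflect reference allele preference). It also assumes quartiles follow the linear-interpolation rule that R's `summary()` uses by default and that the interval is a normal approximation. The function names and toy counts are hypothetical.

```python
import math
import statistics


def allele_balance(alt_reads, total_reads):
    """Hypothetical definition: fraction of non-reference reads at one heterozygous site."""
    return alt_reads / total_reads


def summarize_allele_balance(values, z=1.96):
    """Summary matching the table columns; the CI is a normal approximation (assumption)."""
    # method="inclusive" is the linear-interpolation quantile rule (R's default, type 7).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean,
        "3rd Qu.": q3,
        "Max.": max(values),
        "Mean 95% CI (lower)": mean - half_width,
        "Mean 95% CI (upper)": mean + half_width,
    }


if __name__ == "__main__":
    # Toy (alt_reads, total_reads) pairs for a handful of heterozygous sites.
    counts = [(12, 25), (9, 20), (15, 30), (20, 41), (7, 16), (11, 21)]
    balances = [allele_balance(alt, total) for alt, total in counts]
    for column, value in summarize_allele_balance(balances).items():
        print(f"{column}: {value:.3f}")
```

In a real pipeline the `counts` list would come from the reference and alternate read depths reported for each heterozygous call; with thousands of sites per sample the interval becomes as narrow as the ones in the table.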
Figure 1: Relative RMSE for different pool sizes and MAFs under different standard deviations.
Figure 2: Relative RMSE for different pool sizes and MAFs under different average per-sample depths.
Summary statistics of the estimated allele frequency over 10,000 simulations at different MAFs.
| MAF (%) | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Var. | Relative RMSE |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.0000 | 0.0036 | 0.0049 | 0.0050 | 0.0064 | 0.0162 | 0.0000 | 0.5037 |
| 1 | 0.0000 | 0.0075 | 0.0098 | 0.0100 | 0.0124 | 0.0264 | 0.0000 | 0.3552 |
| 5 | 0.0256 | 0.0448 | 0.0500 | 0.0500 | 0.0551 | 0.0795 | 0.0001 | 0.1540 |
| 10 | 0.0615 | 0.0928 | 0.0999 | 0.1000 | 0.1071 | 0.1401 | 0.0001 | 0.1070 |
| 20 | 0.1444 | 0.1904 | 0.2000 | 0.2001 | 0.2098 | 0.2558 | 0.0002 | 0.0716 |
| 30 | 0.2449 | 0.2889 | 0.2997 | 0.2998 | 0.3106 | 0.3619 | 0.0003 | 0.0537 |
| 40 | 0.3397 | 0.3879 | 0.3998 | 0.4000 | 0.4118 | 0.4707 | 0.0003 | 0.0442 |
| 50 | 0.4348 | 0.4877 | 0.5000 | 0.4998 | 0.5116 | 0.5675 | 0.0003 | 0.0359 |
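Assuming, in line with the abstract's definition of relative error, that the Relative RMSE column is the root-mean-square error of the $S$ simulated estimates $\hat p_s$ divided by the targeted MAF $p$, it splits into a variance term and a bias term, with $\sigma_{\hat p}^2 = \frac{1}{S}\sum_s(\hat p_s - \bar{\hat p})^2$ and $\bar{\hat p} = \frac{1}{S}\sum_s \hat p_s$. Because the tabulated means sit essentially on the targeted MAFs, the bias term is negligible here, and the spread alone accounts for the column. The worked rows below estimate $\sigma_{\hat p}$ from the interquartile range (the IQR of a normal distribution is about $1.349\,\sigma$), which is a rough check rather than the authors' calculation:

```latex
\begin{aligned}
\text{Relative RMSE} &= \frac{1}{p}\sqrt{\frac{1}{S}\sum_{s=1}^{S}\left(\hat p_s - p\right)^2}
  = \frac{\sqrt{\sigma_{\hat p}^{2} + \left(\bar{\hat p} - p\right)^{2}}}{p},\\[4pt]
\text{MAF } 10\%:&\quad \sigma_{\hat p} \approx \frac{0.1071 - 0.0928}{1.349} \approx 0.0106,
  \qquad \frac{0.0106}{0.10} \approx 0.106 \quad (\text{table: } 0.1070),\\
\text{MAF } 50\%:&\quad \sigma_{\hat p} \approx \frac{0.5116 - 0.4877}{1.349} \approx 0.0177,
  \qquad \frac{0.0177}{0.50} \approx 0.035 \quad (\text{table: } 0.0359).
\end{aligned}
```

The absolute spread grows only slowly with MAF while the divisor $p$ grows linearly, which is why the relative RMSE falls from about 0.50 at 0.5% to about 0.04 at 50%.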
Figure 3: 1000 Genomes MAF distributions.
Figure 4: Median error rates from simulating 1000 exome sequences with different numbers of lanes. The 2-lane simulation shows nearly 30% error, whereas the 16-lane simulation shows an error rate of only about 5%.
Pooled versus individual sequencing costs by number of samples.
| Sequencing per pool / No. of samples | 200 | 400 | 600 | 800 | 1000 |
|---|---|---|---|---|---|
| 2 lanes | $3,650 | $4,050 | $4,450 | $4,850 | $5,250 |
| 4 lanes | $6,650 | $7,050 | $7,450 | $7,850 | $8,250 |
| 6 lanes | $9,650 | $10,050 | $10,450 | $10,850 | $11,250 |
| 8 lanes | $12,650 | $13,050 | $13,450 | $13,850 | $14,250 |
| 10 lanes | $15,650 | $16,050 | $16,450 | $16,850 | $17,250 |
| 12 lanes | $18,650 | $19,050 | $19,450 | $19,850 | $20,250 |
| 16 lanes | $24,650 | $25,050 | $25,450 | $25,850 | $26,250 |
| Individual prep. | $125,000 | $250,000 | $375,000 | $500,000 | $625,000 |
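Every cell of the pricing table is reproduced exactly by a simple linear cost model, which makes the scale of the savings easy to explore. The decomposition below is an assumption fitted to the table, not one stated in the source: reading the column headers as numbers of samples, the pooled cost is $1,500 per lane plus $2 per sample plus a $250 fixed charge, and individual sequencing costs $625 per sample. The constant and function names are hypothetical.

```python
# Hypothetical constants, fitted so that the formulas reproduce every cell of the
# pricing table; the source does not itemise the costs this way.
PER_LANE = 1_500              # dollars per sequencing lane for the pool
PER_SAMPLE_POOLED = 2         # dollars per sample added to the pool
POOL_FIXED = 250              # fixed dollars per pooled library
PER_SAMPLE_INDIVIDUAL = 625   # dollars per individually prepared and sequenced sample


def pooled_cost(lanes: int, samples: int) -> int:
    """Total cost of sequencing `samples` individuals as one pool on `lanes` lanes."""
    return PER_LANE * lanes + PER_SAMPLE_POOLED * samples + POOL_FIXED


def individual_cost(samples: int) -> int:
    """Total cost of preparing and sequencing every sample individually."""
    return PER_SAMPLE_INDIVIDUAL * samples


if __name__ == "__main__":
    sample_counts = [200, 400, 600, 800, 1000]
    for lanes in (2, 4, 6, 8, 10, 12, 16):
        row = " | ".join(f"${pooled_cost(lanes, n):,}" for n in sample_counts)
        print(f"| {lanes} lanes | {row} |")
    row = " | ".join(f"${individual_cost(n):,}" for n in sample_counts)
    print(f"| Individual prep. | {row} |")
    # Largest pooled design versus individual sequencing for 1,000 samples (625000 / 26250).
    print(f"{individual_cost(1000) / pooled_cost(16, 1000):.1f}x cheaper")
```

Under this reading, even the 16-lane pooled design for 1,000 samples costs roughly 1/24 of individual sequencing, so the practical question becomes how many lanes are needed to bring the error rate down to an acceptable level (Figure 4).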