Matthew N Bainbridge, Min Wang, Daniel L Burgess, Christie Kovar, Matthew J Rodesch, Mark D'Ascenzo, Jacob Kitzman, Yuan-Qing Wu, Irene Newsham, Todd A Richmond, Jeffrey A Jeddeloh, Donna Muzny, Thomas J Albert, Richard A Gibbs.
Abstract
We have developed a solution-based method for targeted DNA capture-sequencing that is directed to the complete human exome. This approach allows the discovery of greater than 95% of all expected heterozygous single base variants, requires as little as 3 Gbp of raw sequence data and constitutes an effective tool for identifying rare coding alleles in large scale genomic studies.
Year: 2010 PMID: 20565776 PMCID: PMC2911110 DOI: 10.1186/gb-2010-11-6-r62
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
A comparison of different capture methodologies
| Study | Capture type | Reactors | Capture size | Sequencer type |
|---|---|---|---|---|
| Ng | Array | Multiple array | >30 Mbp | Illumina |
| Choi | Array | Single array | >30 Mbp | Illumina |
| Gnirke | Solution | Single tube | <5 Mbp | Illumina |
| This study | Solution | Single tube | >30 Mbp | SOLiD/Illumina |
Figure 1. Normalized coverage of replicate SOLiD libraries 2 to 4 versus normalized coverage of replicate library 1. Average coverage for each target region in library 1 is plotted against library 2 (blue), library 3 (green) and library 4 (red). Coverage for each target is represented as a proportion of the total sequence generated. The line X = Y indicates approximately equal levels of coverage in both libraries.
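The normalization described in the caption, where each target's coverage is expressed as a proportion of the total sequence generated, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `normalize_coverage`, the target names and the coverage values are all hypothetical.

```python
def normalize_coverage(per_target_coverage):
    """Express each target's coverage as a fraction of the library total.

    per_target_coverage: dict mapping target name -> average coverage.
    (Hypothetical input format, chosen only for this sketch.)
    """
    total = sum(per_target_coverage.values())
    if total == 0:
        raise ValueError("library has no coverage to normalize")
    return {target: cov / total for target, cov in per_target_coverage.items()}


# Toy values, not taken from the study.
lib1 = {"target_A": 40.0, "target_B": 10.0, "target_C": 50.0}
lib2 = {"target_A": 18.0, "target_B": 6.0, "target_C": 36.0}

norm1 = normalize_coverage(lib1)
norm2 = normalize_coverage(lib2)

# Points near the line X = Y have similar normalized coverage in both libraries.
for target in lib1:
    print(target, round(norm1[target], 3), round(norm2[target], 3))
```

Normalizing by the library total makes libraries with different sequencing depths comparable on the same axes, which is why replicates sequenced to different depths can be plotted against each other.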
Alignment statistics for SOLiD frag sequencing libraries
| | SOLiD library 1 | SOLiD library 2 | SOLiD library 3 | SOLiD library 4 |
|---|---|---|---|---|
| Total reads aligned | 199,704,874 | 53,558,797 | 47,640,266 | 46,687,848 |
| Total data aligned (Gbp) | 9.99 | 2.68 | 2.38 | 2.33 |
| Reads on target (%) | 50.1 | 50.9 | 49.5 | 48.0 |
| Duplicate reads (%) | 32.8 | 22.1 | 22.5 | 20.9 |
| Mean coverage (X)a | 43 | 22 | 20 | 19 |
| Median coverage (X)a | 42 | 19 | 16 | 15 |
| Targets hit (%) | 99.3 | 98.44 | 98.4 | 98.1 |
| Bases ≥1× coverage (%) | 97.6 | 94.31 | 93.5 | 92.5 |
| Bases ≥10× coverage (%)a | 89.4 | 70.8 | 65.9 | 64.1 |
| Bases ≥20× coverage (%)a | 78.9 | 48.2 | 42.0 | 40.5 |
a. Calculated after duplicate read removal.
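Coverage summary rows of the kind shown in this table (mean, median, and the percentage of bases at or above a depth threshold) can be derived from per-base depths as in the sketch below. It assumes duplicates have already been removed and that depths are available as a plain list; the function name `coverage_summary` and the toy depths are hypothetical and do not come from the study.

```python
from statistics import mean, median


def coverage_summary(depths, thresholds=(1, 10, 20)):
    """Summarize per-base depths over the target region.

    depths: list of integer read depths, one per targeted base
            (hypothetical input format for this sketch).
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no target bases supplied")
    summary = {"mean": mean(depths), "median": median(depths)}
    for t in thresholds:
        # Percentage of target bases covered by at least t reads.
        summary[f"pct_ge_{t}x"] = 100.0 * sum(d >= t for d in depths) / n
    return summary


# Toy depths, not taken from the study.
toy_depths = [0, 5, 12, 25, 30, 8, 15, 22, 0, 40]
stats = coverage_summary(toy_depths)
print(stats)
```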
Figure 2. Average coverage for each target region as sequenced by Illumina 75-bp frag reads plotted against SOLiD library 1. Coverage for each target is normalized for the total sequence aligned to target. The line X = Y indicates approximately equal levels of coverage in both libraries.
Alignment statistics for Illumina PE and frag sequencing libraries
| | Illumina Frag | Illumina PE |
|---|---|---|
| Total reads aligned | 33,524,973 | 37,832,835 |
| Total data aligned (Gbp) | 2.51 | 2.84 |
| Reads on target (%) | 67.62 | 78.0 |
| Duplicate reads (%) | 30.97 | 8.3 |
| Mean coverage (X)a | 24 | 52 |
| Median coverage (X)a | 20 | 40 |
| Targets hit (%) | 99.36 | 99.6 |
| Bases ≥1× Coverage (%) | 96.39 | 98.9 |
| Bases ≥10× Coverage (%)a | 71.33 | 90.8 |
| Bases ≥20× Coverage (%)a | 51.23 | 76.9 |
a. Calculated after duplicate read removal.
Variant discovery and HapMap concordance for different sequencing types and varying amounts of sequence data
| | Illumina Frag | Illumina PE | Illumina PE (high stringency) | SOLiD 1 | SOLiD 1 |
|---|---|---|---|---|---|
| Bases produced (Gbp) | 2.51 | 2.84 | | 3.4 | 9.99 |
| Bases on target after duplicate removal (Gbp) | 1.04 | 2.01 | | 0.59 | 1.72 |
| Total SNPs | 21,239 | 27,953 | 26,489 | 19,790 | 24,077 |
| dbSNP SNPs | 19,525 | 23,745 | 23,133 | 18,016 | 21,350 |
| dbSNP (%) | 91.9 | 84.95 | 87.3 | 91.04 | 88.67 |
| HapMap variant concordance (%) | 83.0 | 96.0 | 95.8 | 81.6 | 92.9 |
| Variant concordance (>9× coverage) (%) | 95.5 | 98.5 | 98.2 | 94.5 | 97.2 |
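The "dbSNP (%)" row is the ratio of the "dbSNP SNPs" row to the "Total SNPs" row, and this can be checked directly from the table. The sketch below recomputes it from the counts above. The column labels used as dictionary keys are chosen here for illustration (the two SOLiD columns are distinguished only by their position in the table), and the function name `pct_in_dbsnp` is hypothetical.

```python
# (total SNPs, SNPs present in dbSNP, reported dbSNP %) per column of the table.
variant_table = {
    "Illumina Frag": (21239, 19525, 91.9),
    "Illumina PE": (27953, 23745, 84.95),
    "Illumina PE (high stringency)": (26489, 23133, 87.3),
    "SOLiD 1 (first column)": (19790, 18016, 91.04),
    "SOLiD 1 (second column)": (24077, 21350, 88.67),
}


def pct_in_dbsnp(total_snps, dbsnp_snps):
    """Percentage of called SNPs that are already catalogued in dbSNP."""
    return 100.0 * dbsnp_snps / total_snps


recomputed = {
    column: pct_in_dbsnp(total, known)
    for column, (total, known, _reported) in variant_table.items()
}

for column, (_total, _known, reported) in variant_table.items():
    print(f"{column}: recomputed {recomputed[column]:.2f}%, reported {reported}%")
```

The recomputed values agree with the reported row to within rounding.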
Figure 3. Coverage distribution across target regions of SOLiD libraries 1 (10 Gbp) and 2 (3 Gbp) and Illumina PE and frag libraries. The number of bases at each level of coverage for each library type is shown for approximately 10 Gbp of SOLiD data (green), approximately 3 Gbp of SOLiD data (red), approximately 3 Gbp of Illumina PE data (yellow) and approximately 3 Gbp of Illumina frag data (blue) after duplicate removal.