Vera N Rykalina, Alexey A Shadrin, Vyacheslav S Amstislavskiy, Evgeny I Rogaev, Hans Lehrach, Tatiana A Borodina.
Abstract
Hybridization-based target enrichment protocols require relatively large starting amounts of genomic DNA, which are not always available. Here, we tested three approaches to pre-capture library preparation starting from 10 ng of genomic DNA: (i and ii) whole-genome amplification of DNA samples with the REPLI-g (Qiagen) and GenomePlex (Sigma) kits followed by standard library preparation, and (iii) library construction with the low-input-oriented ThruPLEX kit (Rubicon Genomics). Exome capture with Agilent SureSelectXT2 Human AllExon v4+UTRs capture probes and HiSeq2000 sequencing were performed for the test libraries along with a control library prepared from 1 µg of starting DNA. The tested protocols were characterized in terms of mapping efficiency, enrichment ratio, coverage of the target region, and reliability of SNP genotyping. REPLI-g- and ThruPLEX-FD-based protocols seem to be adequate solutions for exome sequencing of low-input samples.
Year: 2014 PMID: 24992588 PMCID: PMC4081514 DOI: 10.1371/journal.pone.0101154
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The experimental scheme.
Two DNA samples (Test DNA 1 and Test DNA 2) were subjected to four exome sequencing (ES) protocols performed in parallel: control (Standard ES) and three modified (REPLI-g ES, GenomePlex ES and ThruPLEX-FD ES). Common steps performed in parallel for several protocols are shown by text boxes spanning the corresponding number of protocol columns.
Features of the target region.
| Number of segments | Total length of target region (Mb) | Mean length of the segment (bp) | Median length of the segment (bp) | Maximum length of the segment (bp) | Minimum length of the segment (bp) |
| 199268 | 70.37 | 353.1 | 203 | 21747 | 114 |
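As a quick consistency check on the table above, the mean segment length should equal the total target length divided by the number of segments. The sketch below only re-derives that value from the two reported numbers; the variable names are illustrative and not taken from the paper.

```python
# Values copied from the "Features of the target region" table.
n_segments = 199268
total_length_mb = 70.37

# Mean segment length (bp) = total target length (bp) / number of segments.
mean_length_bp = total_length_mb * 1e6 / n_segments

print(f"mean segment length: {mean_length_bp:.1f} bp")
```

The result agrees with the reported mean of 353.1 bp. The median (203 bp) is well below the mean, which points to a right-skewed length distribution with a few very long segments.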
Alignment statistics.
| Library preparation method | Number of raw reads (Mb of seq) | Percentage of duplicates (% of raw reads) | Percentage of high-confident reads | Percentage of high-confident reads mapped uniquely to hg19: total (% of raw reads) | hg19: mate is mapped (% of total) | hg19: mate is on the same chromosome (% of total) | hg19: mate is on the same chromosome and has proper orientation (% of total) | Percentage of high-confident reads mapped uniquely to FR | Percentage of high-confident reads mapped uniquely to TR (% of raw reads) |
| Standard ES | |||||||||
| Test DNA 1 | 95033226 (9598) | 20.21 | 75.62 | 71.82 | 99.51 | 99.26 | 99.25 | 50.20 | 47.97 |
| Test DNA 2 | 173021034 (17475) | 25.07 | 71.31 | 68.02 | 99.79 | 99.73 | 99.73 | 49.06 | 47.34 |
| GenomePlex ES | |||||||||
| Test DNA 1 | 65957628 (5740) | 18.21 | 69.54 | 63.09 | 96.58 | 95.94 | 95.21 | 31.87 | 30.40 |
| Test DNA 2 | 70253046 (6252) | 20.58 | 63.54 | 57.49 | 95.39 | 94.09 | 93.58 | 31.95 | 31.00 |
| ThruPLEX-FD ES | |||||||||
| Test DNA 1 | 60302154 (6091) | 34.43 | 61.02 | 57.83 | 99.04 | 97.45 | 97.41 | 40.26 | 38.25 |
| Test DNA 2 | 81220550 (8203) | 45.38 | 50.10 | 47.54 | 98.93 | 97.08 | 97.03 | 31.73 | 30.28 |
| REPLI-g ES | |||||||||
| Test DNA 1 | 89106596 (8999) | 20.70 | 75.59 | 71.90 | 99.66 | 99.53 | 99.53 | 50.99 | 49.03 |
| Test DNA 2 | 146075078 (14754) | 25.30 | 71.64 | 68.38 | 99.61 | 99.54 | 99.34 | 49.72 | 48.04 |
*High-confident reads: reads with a probability of wrong mapping lower than 0.05 according to their MAPQ score (MAPQ > 13).
**Some of the GenomePlex ES library reads contained sequences of the primer used for whole-genome amplification. These common segments were cut out before the alignment. As a result, 13.8% and 11.9% of nucleotides were removed from the reads of the Test DNA 1 and Test DNA 2 libraries, respectively.
***FR: flanking regions, which include 100 bp from both ends of the targeted sequences.
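The first two footnotes can be checked numerically. The sketch below assumes the standard Phred-scaled definition of MAPQ (MAPQ = -10 log10 of the probability that the mapping is wrong), and it assumes that the "Mb of seq" column divided by the number of raw reads gives the mean read length after primer trimming. Neither assumption is stated explicitly in the source, and the function names are illustrative.

```python
import math

def mapq_to_error_prob(mapq):
    """Phred-scaled mapping quality -> probability that the mapping is wrong."""
    return 10 ** (-mapq / 10)

def error_prob_to_mapq(prob):
    """Inverse conversion: error probability -> Phred-scaled quality."""
    return -10 * math.log10(prob)

# A 0.05 error probability corresponds to MAPQ of about 13.01, so the
# first integer MAPQ strictly below 0.05 is 14, i.e. MAPQ > 13.
threshold = error_prob_to_mapq(0.05)
p13 = mapq_to_error_prob(13)
p14 = mapq_to_error_prob(14)

def mean_read_length(n_reads, megabases):
    """Mean read length in bp (assumes the Mb column counts sequenced bases)."""
    return megabases * 1e6 / n_reads

# (raw reads, Mb of sequence) copied from the alignment statistics table.
standard = {"Test DNA 1": (95033226, 9598), "Test DNA 2": (173021034, 17475)}
genomeplex = {"Test DNA 1": (65957628, 5740), "Test DNA 2": (70253046, 6252)}

removed_pct = {}
for sample in standard:
    full = mean_read_length(*standard[sample])
    trimmed = mean_read_length(*genomeplex[sample])
    # Fraction of nucleotides lost, relative to the untrimmed Standard ES reads.
    removed_pct[sample] = 100 * (1 - trimmed / full)
    print(f"{sample}: {full:.1f} bp -> {trimmed:.1f} bp, "
          f"{removed_pct[sample]:.1f}% removed")
```

Under these assumptions the implied losses match the 13.8% and 11.9% quoted in the second footnote, which suggests that the "Mb of seq" values for GenomePlex ES refer to the trimmed reads.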
Coverage statistics for the target region.
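| Amplification method | Mean coverage | Coverage depth 0 (% of bases in TR) | Depth 1–10 (% of bases in TR) | Depth 11–20 (% of bases in TR) | Depth 21–30 (% of bases in TR) | Depth 31–40 (% of bases in TR) | Depth 41–50 (% of bases in TR) | Depth 51–60 (% of bases in TR) | Depth 61+ (% of bases in TR) |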
| Standard ES | |||||||||
| Test DNA 1 | 20.80 | 1.56 | 30.75 | 29.95 | 17.49 | 9.12 | 4.70 | 2.48 | 3.83 |
| Test DNA 2 | 20.07 | 2.46 | 34.32 | 26.60 | 15.65 | 8.82 | 4.95 | 2.81 | 4.05 |
| GenomePlex ES | |||||||||
| Test DNA 1 | 19.59 | 8.85 | 39.18 | 21.20 | 11.53 | 6.55 | 3.89 | 2.44 | 5.92 |
| Test DNA 2 | 17.79 | 11.14 | 39.75 | 19.97 | 10.95 | 6.25 | 3.71 | 2.29 | 5.06 |
| ThruPLEX-FD ES | |||||||||
| Test DNA 1 | 19.90 | 1.27 | 26.61 | 32.28 | 20.85 | 10.51 | 4.71 | 2.06 | 1.62 |
| Test DNA 2 | 19.07 | 1.56 | 26.95 | 31.34 | 21.94 | 11.57 | 4.60 | 1.40 | 0.41 |
| REPLI-g ES | |||||||||
| Test DNA 1 | 20.92 | 1.99 | 30.96 | 28.39 | 17.17 | 9.48 | 5.11 | 2.79 | 3.98 |
| Test DNA 2 | 20.01 | 3.41 | 36.06 | 25.67 | 14.58 | 8.08 | 4.56 | 2.61 | 4.65 |
Analysis was performed on subsets of reads uniquely mapped to the target region and having approximately equal total amounts of bases (∼17×10^8 bases).
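The coverage table groups target bases into depth bins (0, 1–10, ..., 61+). A minimal sketch of how such a summary could be computed from a list of per-base depths is given below. The input `toy_depths` is made-up illustration data, not data from the study, and the function names are hypothetical.

```python
from collections import Counter

BIN_LABELS = ["0", "1-10", "11-20", "21-30", "31-40", "41-50", "51-60", "61+"]

def bin_label(depth):
    """Map a per-base depth to the bin used in the coverage table."""
    if depth == 0:
        return "0"
    if depth > 60:
        return "61+"
    low = (depth - 1) // 10 * 10 + 1  # 1, 11, 21, ...
    return f"{low}-{low + 9}"

def coverage_summary(depths):
    """Return (mean coverage, {bin label: % of bases}) for per-base depths."""
    counts = Counter(bin_label(d) for d in depths)
    total = len(depths)
    percents = {label: 100 * counts[label] / total for label in BIN_LABELS}
    return sum(depths) / total, percents

# Made-up depths for ten target bases (illustration only).
toy_depths = [0, 3, 10, 11, 25, 25, 47, 60, 61, 120]
mean_cov, percents = coverage_summary(toy_depths)

print(f"mean coverage: {mean_cov:.2f}")
for label in BIN_LABELS:
    print(f"{label:>6}: {percents[label]:.1f}%")
```

In practice the per-base depths would come from a depth-reporting tool run over the target intervals; that step is outside the scope of this sketch.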
Figure 2. Per-base sequencing depth distribution on the target region.
Figure 3. Coverage distribution along the target regions with different percentages of GC bases.
Figure 4. Profiles of coverage depth along the target region for Test DNA 1 (upper panel) and Test DNA 2 (lower panel) WES libraries.
Pearson correlation coefficient with coverage profile of Standard ES strategy and average deviation from Standard ES coverage profile.
| Strategy | Pearson correlation with Standard ES coverage profile | Average deviation from Standard ES coverage profile |
| ThruPLEX-FD ES, Test DNA1 (DNA2) | 0.986 (0.980) | 0.318 (0.681) |
| REPLI-g ES, Test DNA1 (DNA2) | 0.809 (0.755) | 1.478 (1.950) |
| GenomePlex ES, Test DNA1 (DNA2) | 0.589 (0.947) | 3.149 (2.233) |
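The two similarity measures in this table can be illustrated on toy data. The Pearson coefficient below follows its textbook definition. The source does not define "average deviation", so the mean absolute difference used here is an assumption and may differ from the authors' measure. Both coverage profiles are made-up numbers.

```python
import math

def pearson(x, y):
    """Textbook Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_abs_deviation(x, y):
    """ASSUMED definition of 'average deviation': mean absolute difference."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Made-up coverage profiles over four target positions (illustration only).
standard_profile = [10, 20, 30, 40]
tested_profile = [12, 18, 33, 37]

r = pearson(standard_profile, tested_profile)
dev = mean_abs_deviation(standard_profile, tested_profile)
print(f"Pearson r = {r:.3f}, mean absolute deviation = {dev:.2f}")
```

A profile that tracks the Standard ES profile closely gives r near 1 and a small deviation, which is the pattern the table reports for ThruPLEX-FD ES.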
Figure 5. Sharing of genetic variations between strategies depicted in a Venn diagram.
Only variations with a minimum depth of coverage of 20× and a minimum quality of 13 in all four strategies were taken into account. The names of the samples are abbreviated: Standard ES = St; ThruPLEX-FD ES = Tp; REPLI-g ES = Rg; GenomePlex ES = Gp. The lower left tile presents the overall statistics, where “Total” indicates the number of all unique SNVs found in the region of interest, i.e. the union of the SNV sets found by each strategy.
Comparison of SNVs found in regions with coverage ≥ 20 in both Standard ES and one of the three tested strategies.
| Strategy | % of TR covered with depth ≥ 20 for both tested and Standard ES libraries | Shared SNVs | Exclusive SNVs present only in Standard ES | Exclusive SNVs present only in tested ES | Discovery rate: (found by tested ES)/(total found) |
| REPLI-g ES, Test DNA1 (DNA2) | 34.92 (32.91) | 18890 (17561) | 1408 (1091) | 1281 (1093) | 0.9348 (0.9447) |
| ThruPLEX-FD ES, Test DNA1 (DNA2) | 36.95 (37.80) | 19836 (19722) | 1592 (1803) | 1152 (1273) | 0.9295 (0.9209) |
| GenomePlex ES, Test DNA1 (DNA2) | 26.27 (29.32) | 13010 (14180) | 1661 (2226) | 803 (789) | 0.8927 (0.8705) |
Only high-confidence (probability of false positive detection <0.05) SNVs were taken into account.
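The discovery rate column can be reproduced from the three SNV counts in each row, reading "found by tested ES" as shared plus tested-only SNVs and "total found" as the union of both call sets. That reading is an interpretation of the column header, and the sketch below simply checks it against the reported values.

```python
def discovery_rate(shared, only_standard, only_tested):
    """(SNVs found by the tested ES) / (SNVs found by either strategy)."""
    found_by_tested = shared + only_tested
    total_found = shared + only_standard + only_tested
    return found_by_tested / total_found

# (shared, only in Standard ES, only in tested ES, reported rate),
# copied from the SNV comparison table.
rows = {
    "REPLI-g ES, Test DNA1": (18890, 1408, 1281, 0.9348),
    "REPLI-g ES, Test DNA2": (17561, 1091, 1093, 0.9447),
    "ThruPLEX-FD ES, Test DNA1": (19836, 1592, 1152, 0.9295),
    "ThruPLEX-FD ES, Test DNA2": (19722, 1803, 1273, 0.9209),
    "GenomePlex ES, Test DNA1": (13010, 1661, 803, 0.8927),
    "GenomePlex ES, Test DNA2": (14180, 2226, 789, 0.8705),
}

computed = {}
for name, (shared, only_std, only_tested, reported) in rows.items():
    computed[name] = discovery_rate(shared, only_std, only_tested)
    print(f"{name}: computed {computed[name]:.4f}, reported {reported:.4f}")
```

With this reading, all six computed rates agree with the reported values to four decimal places.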