Praveen F Cherukuri, Valerie Maduro, Karin V Fuentes-Fajardo, Kevin Lam, David R Adams, Cynthia J Tifft, James C Mullikin, William A Gahl, Cornelius F Boerkoel.
Abstract
BACKGROUND: Whole-exome sequencing (WES) is rapidly evolving into a tool of choice for fast, inexpensive identification of molecular genetic lesions within targeted regions of the human genome. While biases in WES coverage of nucleotides in targeted regions are recognized, it is not well understood how repetition of WES improves the interpretation of sequencing results in a clinical diagnostic setting.
Year: 2015 PMID: 26602380 PMCID: PMC4659195 DOI: 10.1186/s12864-015-2107-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Independent technical replicate targeted exome capture and aggregate data alignment results for a six-member, multi-generation family. a Probability density function plots of three independent library targeted exome capture experiments as a function of sequencing depth per targeted base (X), shown for each member of the six-member, three-generation family. b Mean depth of coverage as a function of total bases aligned in the targeted exome region (62 Mb) for all 18 technical replicates, and for aggregate data for the 6 individuals derived by merging the 3 technical replicate captures per individual. c Percent of targeted bases sequenced at the ≥1x, ≥10x, and ≥20x thresholds as a function of the total number of bases aligned in the targeted exome region (62 Mb) for technical replicate and aggregate data for each individual. Black lines show the local polynomial regression (loess) fit to the data with the default span value of 0.75, and red dashed lines represent the predicted 95 % confidence interval along the fitted line
Whole exome sequencing mean target coverage depth and percent target coverage statistics of replicate and aggregate data
| WES sample | Mean target depth | % Target ≥1x | % Target ≥10x | % Target ≥20x |
|---|---|---|---|---|
| ID3866 | | | | |
| Replicate 1 | 68x | 96.2 | 91.5 | 86.6 |
| Replicate 2 | 65x | 96.3 | 91.9 | 87.3 |
| Replicate 3 | 71x | 96.4 | 92.3 | 88.3 |
| Mean ± SE | | 96.3 ± 0.1 | 91.9 ± 0.4 | 87.4 ± 0.8 |
| Aggregate | 205x | 97.6 | 95.1 | 93.8 |
| ID4382 | | | | |
| Replicate 1 | 69x | 96.5 | 92.1 | 87.8 |
| Replicate 2 | 61x | 96.3 | 91.6 | 85.8 |
| Replicate 3 | 63x | 96.3 | 91.7 | 86.4 |
| Mean ± SE | | 96.4 ± 0.1 | 91.8 ± 0.4 | 86.7 ± 1.0 |
| Aggregate | 193x | 97.7 | 95.3 | 93.9 |
| ID4384 | | | | |
| Replicate 1 | 56x | 96.0 | 90.5 | 83.5 |
| Replicate 2 | 68x | 96.3 | 92.0 | 87.4 |
| Replicate 3 | 83x | 96.5 | 92.7 | 89.4 |
| Mean ± SE | | 96.3 ± 0.3 | 91.7 ± 1.2 | 86.8 ± 3.0 |
| Aggregate | 208x | 97.6 | 95.1 | 93.8 |
| ID4385 | | | | |
| Replicate 1 | 51x | 96.2 | 90.3 | 83.1 |
| Replicate 2 | 86x | 96.7 | 92.9 | 89.6 |
| Replicate 3 | 61x | 96.3 | 91.6 | 85.7 |
| Mean ± SE | | 96.4 ± 0.2 | 91.6 ± 1.3 | 86.1 ± 3.2 |
| Aggregate | 198x | 97.7 | 95.3 | 93.9 |
| ID4386 | | | | |
| Replicate 1 | 64x | 96.4 | 91.6 | 86.8 |
| Replicate 2 | 69x | 96.3 | 91.9 | 87.4 |
| Replicate 3 | 49x | 96.0 | 90.3 | 82.1 |
| Mean ± SE | | 96.2 ± 0.2 | 91.2 ± 0.8 | 85.4 ± 2.9 |
| Aggregate | 182x | 97.6 | 94.9 | 93.5 |
| ID4606 | | | | |
| Replicate 1 | 77x | 96.5 | 92.1 | 87.8 |
| Replicate 2 | 48x | 96.0 | 90.1 | 81.2 |
| Replicate 3 | 55x | 96.2 | 90.9 | 83.7 |
| Mean ± SE | | 96.2 ± 0.2 | 91.0 ± 1.0 | 84.2 ± 3.3 |
| Aggregate | 179x | 97.5 | 95.0 | 93.5 |
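The columns of the table above (mean target depth and percent of targeted bases at ≥1x, ≥10x, and ≥20x) can be illustrated with a minimal Python sketch. The function names and the per-base depths below are hypothetical and are not taken from the study. The sketch also assumes that merging replicates simply adds their per-base depths, which ignores any reads dropped during merging (for example by duplicate removal).

```python
def coverage_summary(depths, thresholds=(1, 10, 20)):
    """Return (mean depth, {threshold: percent of bases with depth >= threshold})."""
    if not depths:
        raise ValueError("depths must be non-empty")
    n = len(depths)
    percent = {t: 100.0 * sum(d >= t for d in depths) / n for t in thresholds}
    return sum(depths) / n, percent


def aggregate_depths(replicates):
    """Position-wise sum of replicate depths (assumption: no reads lost on merging)."""
    return [sum(column) for column in zip(*replicates)]


# Hypothetical per-base depths for an 8-base target in three technical replicates.
replicates = [
    [0, 5, 12, 25, 30, 18, 22, 9],
    [1, 8, 9, 19, 28, 21, 15, 0],
    [0, 7, 11, 24, 35, 19, 25, 12],
]

for i, depths in enumerate(replicates, start=1):
    print(f"Replicate {i}:", coverage_summary(depths))
print("Aggregate:", coverage_summary(aggregate_depths(replicates)))
```

On this toy input, the share of bases at ≥20x rises from 37.5 % in the first replicate to 87.5 % in the aggregate, which mirrors the direction of the effect shown in the table, not its magnitude.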
Fig. 2 Estimation of stochastic variability between technical replicate targeted sequencing experiments within the same individual. a Schematic representation of the intersection–union test (IUT) on technical replicate data generated independently in triplicate (R1, R2 and R3). The probability density function was generated from technical replicate data of the single individual (ID3866) with the least variable input sequence data. The IUT is performed at preset thresholds to test for low stochastic variability (H0) against the alternative hypothesis of high stochastic variability (H1). b Area-proportional Euler Venn diagram (eulerAPE v3.0) of targeted bases sequenced in the three technical replicates R1, R2 and R3 at ≥20x. The square represents the total targeted bases, and the white area represents the targeted bases not sequenced at the given threshold (≥20x). c Area-proportional Euler Venn diagram (eulerAPE v3.0) of targeted bases sequenced in the three technical replicates R1, R2 and R3 at ≥1x. The square represents the total targeted bases (not proportional relative to the circles), and the white area represents the targeted bases not sequenced at the given threshold (≥1x)
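The counts behind a three-set Venn diagram like the ones in panels b and c can be sketched as follows. This is only the set bookkeeping, not the intersection–union test itself. The function names and depths are hypothetical.

```python
def covered(depths, threshold):
    """Set of target positions whose depth is at or above the threshold."""
    return {i for i, d in enumerate(depths) if d >= threshold}


def venn_regions(r1, r2, r3, threshold=20):
    """Count targeted bases in each region of a three-replicate Venn diagram."""
    if not (len(r1) == len(r2) == len(r3)):
        raise ValueError("replicates must cover the same target positions")
    a, b, c = (covered(r, threshold) for r in (r1, r2, r3))
    return {
        "R1_only": len(a - b - c),
        "R2_only": len(b - a - c),
        "R3_only": len(c - a - b),
        "R1_R2_only": len((a & b) - c),
        "R1_R3_only": len((a & c) - b),
        "R2_R3_only": len((b & c) - a),
        "all_three": len(a & b & c),
        # Bases missed by every replicate: the white area of the square.
        "none": len(r1) - len(a | b | c),
    }


# Hypothetical per-base depths for an 8-base target.
r1 = [0, 5, 12, 25, 30, 18, 22, 9]
r2 = [1, 8, 9, 19, 28, 21, 15, 0]
r3 = [0, 7, 11, 24, 35, 19, 25, 12]

print(venn_regions(r1, r2, r3, threshold=20))
```

The eight counts always sum to the number of targeted positions, which is a quick consistency check when the same bookkeeping is applied to real per-base depth files.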
Fig. 3 Impact of deep sequencing as estimated by aggregate exome sequence data from replicates in the least variable individual. a Area-proportional Euler Venn diagram (eulerAPE v3.0) of targeted bases sequenced by standard exome sequencing (regular) and aggregate exome sequencing. Sequenced bases are represented within the circles of the Venn diagram (black numbers), whereas bases targeted but missed by exome sequencing are represented by the square (red numbers). The left panel shows targeted bases in megabases (Mb), and the right panel shows the results as a percentage of total targeted bases. b Distribution analysis of the number of consecutive targeted bases recovered by deep sequencing to ≥20x. The left panel is a log-log plot of the frequency of consecutive targeted bases recovered. The right panel plots the distribution of the total number of bases sequenced as a function of consecutive targeted bases recovered by deep sequencing to ≥20x. Red dashed lines represent the 95 % confidence interval of the loess fit to the data (blue line). c UCSC Genome Browser screenshot of an LMNA exon that illustrates the variability of ≥20x sequencing along the length of the exon. The black arrow and red box highlight a known disease-causing mutation (c.16C>T; p.Q6X) that is consistently missed at the ≥20x threshold by all three technical replicates but covered by aggregate sequencing. Aggregate data cover the entire exon to ≥20x
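The run-length analysis in panel b can be sketched as below. As an assumption for illustration, a base counts as "recovered" when it is below the threshold in the regular exome but at or above it in the aggregate. The source does not spell out the exact definition, and the function names and depths are hypothetical.

```python
from collections import Counter


def recovered_runs(regular, aggregate, threshold=20):
    """Lengths of consecutive stretches recovered only by the aggregate data."""
    runs = []
    current = 0
    for reg, agg in zip(regular, aggregate):
        if reg < threshold <= agg:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def run_length_distribution(runs):
    """Map run length -> (number of runs, total bases contributed by those runs)."""
    counts = Counter(runs)
    return {length: (n, length * n) for length, n in sorted(counts.items())}


# Hypothetical depths along one contiguous 8-base target interval.
regular = [0, 5, 12, 25, 30, 18, 22, 9]
aggregate = [1, 20, 32, 68, 93, 58, 62, 21]

runs = recovered_runs(regular, aggregate, threshold=20)
print(runs)
print(run_length_distribution(runs))
```

The first value of each pair corresponds to the frequency plotted in the left panel, and the second to the total bases plotted in the right panel. With real data, the runs would need to be computed separately for each target interval so that stretches do not span interval boundaries.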