Anil Patwardhan, Jason Harris, Nan Leng, Gabor Bartha, Deanna M Church, Shujun Luo, Christian Haudenschild, Mark Pratt, Justin Zook, Marc Salit, Jeanie Tirch, Massimo Morra, Stephen Chervitz, Ming Li, Michael Clark, Sarah Garcia, Gemma Chandratillake, Scott Kirk, Euan Ashley, Michael Snyder, Russ Altman, Carlos Bustamante, Atul J Butte, John West, Richard Chen.
Abstract
BACKGROUND: Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment.
Year: 2015 PMID: 26269718 PMCID: PMC4534066 DOI: 10.1186/s13073-015-0197-4
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 A total of 5,419 genes in the MIG drawn from five data sources. The bulk (98 %) of genes came from HGMD, OMIM, and GTR, with additional genes supplemented from COSMIC (67) and PharmGKB (1). Areas of vertical overlap indicate genes common to multiple sources
Fig. 2 Coverage efficiency in the medically interpretable genome (MIG). Shown is the cumulative distribution of on-target sequence coverage obtained from sequencing NA12878 across multiple platforms: Personalis Accuracy and Content Enhanced (ACE) Clinical Exome, Agilent SureSelect Clinical Research Exome (SSCR), Agilent SureSelect Human All Exon v5 plus untranslated regions (UTR) (SS), Illumina’s Nextera Exome Enrichment (NX), NimbleGen SeqCap EZ Human Exome Library v3.0 (NG), and 31× whole-genome sequencing (WGS) using an Illumina PCR-free protocol. For clinical applications, we indicate ≥20× as the minimum coverage threshold required (gray line) among all coding (left) and non-coding (right) regions. For reference, insets show an expanded distribution of sequence coverage. ACE and conventional WES data are normalized to 100× mean target coverage
Fig. 3 Percentage of MIG exons ‘finished’ as the coverage stringency varies. The left graph shows the percentage of MIG exons (y-axis) with ≥90.0-100.0 % of bases covered at ≥20× depth (x-axis) among different platforms using data obtained on NA12878. The right graph shows the percentage of finished exons (y-axis) with 100.0 % base coverage as the local coverage depth varies from ≥10× to ≥20× (x-axis). At higher coverage stringencies, ACE finishes more exons than the WGS or other WES assays, whether the region is defined as the entire exon (solid curves) or only the coding subset (circles). ACE and conventional WES data are normalized to 100× mean target coverage
Fig. 4 Relationship between GC content and the percentages of MIG exons ‘finished’ by platform. Regions with >30-80 % GC content (x-axis) represent 99 % of exons in the MIG. Finishing is determined by 100 % base coverage at ≥20×
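The ‘finished’ criterion used in Figs. 3 and 4 (a given fraction of an exon's bases covered at or above a depth threshold) can be illustrated with a minimal sketch. The function names, the dictionary layout, and the toy depth values below are hypothetical and chosen only for illustration; they are not the pipeline used in the study.

```python
from typing import Dict, Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of bases in one exon whose depth is >= min_depth."""
    if not depths:
        raise ValueError("exon has no bases")
    return sum(d >= min_depth for d in depths) / len(depths)


def is_finished(depths: Sequence[int], min_depth: int = 20,
                min_fraction: float = 1.0) -> bool:
    """An exon is 'finished' when enough of its bases reach min_depth."""
    return fraction_covered(depths, min_depth) >= min_fraction


def percent_finished(exons: Dict[str, Sequence[int]], min_depth: int = 20,
                     min_fraction: float = 1.0) -> float:
    """Percentage of exons that meet the finishing criterion."""
    finished = sum(is_finished(d, min_depth, min_fraction) for d in exons.values())
    return 100.0 * finished / len(exons)


# Toy per-base depths (invented values, not data from the study).
toy_exons = {
    "exon_a": [25, 30, 41, 22],  # every base >= 20x
    "exon_b": [25, 19, 41, 22],  # one base below 20x
    "exon_c": [12, 15, 18, 11],  # no base reaches 20x
}

print(percent_finished(toy_exons))                     # strict: 100 % of bases at >= 20x
print(percent_finished(toy_exons, min_depth=10))       # relaxed depth threshold
print(percent_finished(toy_exons, min_fraction=0.75))  # relaxed base fraction
```

Varying `min_fraction` corresponds to the x-axis of the left panel of Fig. 3, and varying `min_depth` to the x-axis of the right panel.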
Fig. 5 Disease-associated variants covered at ≥20× for 56 genes in the ACMG gene list. The x-axis labels give the total number of disease-associated SNVs (daSNVs) drawn from HGMD for each ACMG gene; the y-axis gives the percentage of those variants covered at ≥20×. For brevity, only the highest percentage observed across the conventional WES platforms (SS, SSCR, NX, NG) is shown (Max over all WES). For 17 of the 56 genes, none of the conventional WES platforms covered every daSNV at ≥20×. Ranked by the number of genes with 100 % of constituent daSNVs covered at ≥20×, the platforms were ACE (51 genes), SSCR (39 genes), NX (36 genes), SS (15 genes), NG (12 genes), and WGS (2 genes). The y-axis is truncated at 95 %, with truncated points labelled accordingly
Accuracy across target regions. Errors, sensitivity, and FDR for the ACE, WGS, SSCR, SS, NX, and NG platforms, based on evaluation of observed variant calls using data normalized to 100× mean coverage (conventional WES and ACE) or 31× WGS. Calculations are based on position and genotype matching to the GIBv2.18 high-confidence call set within the MIG (left), a target region common to all ACE and WES platforms (middle, Common Target File), and a target region aggregated across all ACE- and WES-specific target files that contain moderate-impact and high-impact loci (right, Union Target File)
| Platform | Variant | MIG TP | MIG FP | MIG FN | MIG %Sens (95% CI) | MIG %FDRᵃ (95% CI) | Common TP | Common FP | Common FN | Common %Sens (95% CI) | Common %FDRᵃ (95% CI) | Union TP | Union FP | Union FN | Union %Sens (95% CI) | Union %FDRᵃ (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACE | SNV | 5362 | 5 | 62 | 98.9 (98.5-99.1) | 0.1 (<0.1-0.2) | 7133 | 12 | 90 | 98.8 (98.5-99.0) | 0.2 (0.1-0.3) | 7486 | 6 | 191 | 97.5 (97.1-97.8) | 0.1 (<0.1-0.2) |
| ACE | InDel | 34 | 1 | 2 | 94.4 (81.3-99.3) | 2.9 (0.1-14.9) | 83 | 0 | 0 | 100 (95.7-100) | <0.1 (<0.1-4.3) | 198 | 3 | 16 | 92.5 (88.1-95.7) | 1.5 (0.3-4.3) |
| WGSᵇ | SNV | 5309 | 2 | 115 | 97.9 (97.5-98.2) | <0.1 (<0.1-0.1) | 7076 | 6 | 147 | 98.0 (97.6-98.3) | 0.1 (<0.1-0.2) | 7479 | 2 | 198 | 97.4 (97-97.8) | <0.1 (<0.1-0.1) |
| WGSᵇ | InDel | 33 | 1 | 3 | 91.7 (77.5-98.2) | 2.9 (0.1-15.3) | 78 | 0 | 5 | 94.0 (86.5-98.0) | <0.1 (<0.1-4.6) | 197 | 2 | 17 | 92.1 (87.6-95.3) | 1.0 (0.1-3.6) |
| SSCR | SNV | 5341 | 4 | 83 | 98.5 (98.1-98.8) | 0.1 (<0.1-0.2) | 7107 | 11 | 116 | 98.4 (98.1-98.7) | 0.2 (0.1-0.3) | 7443 | 4 | 234 | 97.0 (96.5-97.3) | 0.1 (<0.1-0.1) |
| SSCR | InDel | 34 | 2 | 2 | 94.4 (81.3-99.3) | 5.6 (0.7-18.7) | 82 | 0 | 1 | 98.8 (93.5-100) | <0.1 (<0.1-4.4) | 194 | 4 | 20 | 90.7 (85.9-94.2) | 2 (0.6-5.1) |
| SS | SNV | 5355 | 2 | 69 | 98.7 (98.4-99.0) | <0.1 (<0.1-0.1) | 7126 | 5 | 97 | 98.7 (98.4-98.9) | 0.1 (<0.1-0.2) | 7468 | 3 | 209 | 97.3 (96.9-97.6) | <0.1 (<0.1-0.1) |
| SS | InDel | 33 | 2 | 3 | 91.7 (77.5-98.2) | 5.7 (0.7-19.2) | 82 | 0 | 1 | 98.8 (93.5-100) | <0.1 (<0.1-4.4) | 192 | 5 | 22 | 89.7 (84.8-93.4) | 2.5 (0.8-5.8) |
| NX | SNV | 5240 | 4 | 184 | 96.6 (96.1-97.1) | 0.1 (<0.1-0.2) | 7020 | 8 | 203 | 97.2 (96.8-97.6) | 0.1 (<0.1-0.2) | 6754 | 10 | 923 | 88.0 (87.2-88.7) | 0.1 (0.1-0.3) |
| NX | InDel | 33 | 2 | 3 | 91.7 (77.5-98.2) | 5.7 (0.7-19.2) | 77 | 2 | 6 | 92.8 (84.9-97.3) | 2.5 (0.3-8.8) | 160 | 6 | 54 | 74.8 (68.4-80.4) | 3.6 (1.3-7.7) |
| NG | SNV | 5190 | 31 | 234 | 95.7 (95.1-96.2) | 0.6 (0.4-0.8) | 6900 | 39 | 323 | 95.5 (95.0-96.0) | 0.6 (0.4-0.8) | 7065 | 38 | 612 | 92.0 (91.4-92.6) | 0.5 (0.4-0.7) |
| NG | InDel | 31 | 4 | 5 | 86.1 (70.5-95.3) | 11.4 (3.2-26.7) | 74 | 2 | 9 | 89.2 (80.4-94.9) | 2.6 (0.3-9.2) | 168 | 10 | 46 | 78.5 (72.4-83.8) | 5.6 (2.7-10.1) |
FDR, false discovery rate; FN, false negatives; FP, false positives; MIG, medically interpretable genome; Sens, sensitivity; TP, true positives
ᵃ FDR is used in lieu of specificity due to a large skew in the TN, FP class distribution
ᵇ In WGS data, there was no difference in error rates when using either VQSLOD scores or hard-thresholding cutoffs for InDels
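As an illustration of how the %Sens and %FDR columns follow from the TP, FP, and FN counts, the sketch below computes sensitivity as TP / (TP + FN) and FDR as FP / (TP + FP). The interval method is an assumption: the record does not state how the 95 % CIs were derived, and an exact (Clopper-Pearson) binomial interval is used here only because it appears to agree with several of the tabulated intervals. All function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_discovery_rate(tp: int, fp: int) -> float:
    """False discovery rate: FP / (TP + FP)."""
    return fp / (tp + fp)


def _log_pmf(k: int, n: int, p: float) -> float:
    """Log of P(X = k) for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def _prob_between(lo_k: int, hi_k: int, n: int, p: float) -> float:
    """P(lo_k <= X <= hi_k) for X ~ Binomial(n, p)."""
    return math.fsum(math.exp(_log_pmf(k, n, p)) for k in range(lo_k, hi_k + 1))


def _bisect(move_right: Callable[[float], bool], iterations: int = 60) -> float:
    """Bisection on [0, 1]; move_right(p) is True while p lies below the root."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if move_right(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    # Lower bound: the p at which P(X >= k) rises to alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: _prob_between(k, n, n, p) < alpha / 2.0)
    # Upper bound: the p at which P(X <= k) falls to alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: _prob_between(0, k, n, p) > alpha / 2.0)
    return lower, upper


# ACE InDel counts within the MIG, taken from the table: TP = 34, FP = 1, FN = 2.
tp, fp, fn = 34, 1, 2
sens_lo, sens_hi = exact_binomial_ci(tp, tp + fn)
print(f"Sens {100 * sensitivity(tp, fn):.1f} % "
      f"({100 * sens_lo:.1f}-{100 * sens_hi:.1f})")
print(f"FDR  {100 * false_discovery_rate(tp, fp):.1f} %")
```

With these counts the sketch gives a sensitivity of 94.4 % and an interval of roughly 81.3-99.3 %, in line with the tabulated row. The wide interval is a reminder that the InDel estimates rest on only a few dozen calls.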
Accuracy in GC-rich regions. Errors, sensitivity, and FDR for the ACE, WGS, SSCR, SS, NX, and NG platforms, based on evaluation of observed variant calls using data normalized to 100× mean coverage (conventional WES and ACE) or 31× WGS. Calculations are based on position and genotype matching to the GIBv2.18 less restrictive call set within the MIG (left), a target region common to all ACE and WES platforms (middle, Common Target File), and a target region aggregated across all ACE- and WES-specific target files that contain moderate-impact and high-impact loci (right, Union Target File)
| Platform | Variant | MIG TP | MIG FP | MIG FN | MIG %Sens (95% CI) | MIG %FDRᵃ (95% CI) | Common TP | Common FP | Common FN | Common %Sens (95% CI) | Common %FDRᵃ (95% CI) | Union TP | Union FP | Union FN | Union %Sens (95% CI) | Union %FDRᵃ (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACE | SNV | 518 | 0 | 16 | 97.0 (95.2-98.3) | <0.1 (<0.1-0.7) | 706 | 1 | 22 | 97.0 (95.5-98.1) | 0.1 (<0.1-0.8) | 562 | 2 | 30 | 94.9 (92.8-96.6) | 0.4 (<0.1-1.3) |
| ACE | InDel | 18 | 1 | 1 | 94.7 (74.0-99.9) | 5.3 (0.1-26.0) | 23 | 0 | 0 | 100 (85.2-100) | <0.1 (<0.1-14.8) | 37 | 0 | 3 | 92.5 (79.6-98.4) | <0.1 (<0.1-9.5) |
| WGSᵇ | SNV | 499 | 0 | 35 | 93.4 (91.0-95.4) | <0.1 (<0.1-0.7) | 701 | 0 | 27 | 96.3 (94.6-97.5) | <0.1 (<0.1-0.5) | 573 | 0 | 19 | 96.8 (95.0-98.1) | <0.1 (0-0.6) |
| WGSᵇ | InDel | 18 | 0 | 1 | 94.7 (74.0-99.9) | <0.1 (<0.1-18.5) | 23 | 0 | 0 | 100 (85.2-100) | <0.1 (<0.1-14.8) | 38 | 0 | 2 | 95.0 (83.1-99.4) | <0.1 (<0.1-9.3) |
| SSCR | SNV | 504 | 1 | 30 | 94.4 (92.1-96.2) | 0.2 (<0.1-1.1) | 684 | 4 | 44 | 94.0 (92.0-95.6) | 0.6 (0.2-1.5) | 545 | 2 | 47 | 92.1 (89.6-94.1) | 0.4 (<0.1-1.3) |
| SSCR | InDel | 17 | 1 | 2 | 89.5 (66.9-98.7) | 5.6 (0.1-27.3) | 21 | 1 | 2 | 91.3 (72.0-98.9) | 4.5 (0.1-22.8) | 37 | 0 | 3 | 92.5 (79.6-98.4) | <0.1 (<0.1-9.5) |
| SS | SNV | 497 | 2 | 37 | 93.1 (90.6-95.1) | 0.4 (<0.1-1.4) | 704 | 0 | 24 | 96.7 (95.1-97.9) | <0.1 (<0.1-0.5) | 562 | 1 | 30 | 94.9 (92.8-96.6) | 0.2 (<0.1-1) |
| SS | InDel | 16 | 2 | 3 | 84.2 (60.4-96.6) | 11.1 (1.4-34.7) | 21 | 0 | 2 | 91.3 (72.0-98.9) | <0.1 (<0.1-16.1) | 37 | 0 | 3 | 92.5 (79.6-98.4) | <0.1 (<0.1-9.5) |
| NX | SNV | 465 | 1 | 69 | 87.1 (83.9-89.8) | 0.2 (<0.1-1.2) | 650 | 1 | 78 | 89.3 (86.8-91.4) | 0.2 (<0.1-0.9) | 484 | 0 | 108 | 81.8 (78.4-84.8) | <0.1 (<0.1-0.8) |
| NX | InDel | 19 | 0 | 0 | 100 (82.4-100) | <0.1 (<0.1-17.6) | 21 | 0 | 2 | 91.3 (72.0-98.9) | <0.1 (<0.1-16.1) | 31 | 1 | 9 | 77.5 (61.5-89.2) | 3.1 (0.1-16.2) |
| NG | SNV | 346 | 6 | 188 | 64.8 (60.6-68.8) | 1.7 (0.6-3.7) | 436 | 14 | 292 | 59.9 (56.2-63.5) | 3.1 (1.7-5.2) | 373 | 10 | 219 | 63.0 (59.0-66.9) | 2.6 (1.3-4.7) |
| NG | InDel | 11 | 0 | 8 | 57.9 (33.5-79.7) | <0.1 (<0.1-28.5) | 11 | 1 | 12 | 47.8 (26.8-69.4) | 8.3 (0.2-38.5) | 20 | 1 | 20 | 50.0 (33.8-66.2) | 4.8 (0.1-23.8) |
FDR, false discovery rate; FN, false negatives; FP, false positives; MIG, medically interpretable genome; Sens, sensitivity; TP, true positives
ᵃ FDR is used in lieu of specificity due to a large skew in the TN, FP class distribution
ᵇ In WGS data, there was no difference in error rates when using either VQSLOD scores or hard-thresholding cutoffs for InDels
Fig. 6 Coverage gaps in retinitis pigmentosa and cystic fibrosis genes are recovered with augmented exome approaches. Chromosomal position (x-axis) is plotted against coverage depth (y-axis) averaged over multiple 1000 Genomes samples, with the clinical coverage threshold (≥20×) represented by a horizontal black line. Blue areas represent mean depth of coverage across coding and non-coding regions using the SS (light blue) and SSCR (dark blue) exomes. Areas in green represent coverage gaps ‘filled in’ by ACE. These include GC-rich areas of the RPGR gene that harbor known pathogenic variants associated with retinitis pigmentosa (a), and non-coding regions of the CFTR gene (b)