Michael D Linderman, Tracy Brandt, Lisa Edelmann, Omar Jabado, Yumi Kasai, Ruth Kornreich, Milind Mahajan, Hardik Shah, Andrew Kasarskis, Eric E Schadt.
Abstract
BACKGROUND: Whole exome and genome sequencing (WES/WGS) is now routinely offered as a clinical test by a growing number of laboratories. As part of the test design process, each laboratory must determine the performance characteristics of the platform, test and informatics pipeline. This report documents one such characterization of WES/WGS.
Year: 2014 PMID: 24758382 PMCID: PMC4022392 DOI: 10.1186/1755-8794-7-20
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Summary of samples used in validation experiments
| NA12878 | F | Heterozygous | |
| NA12891 | M | Homozygous | Father to NA12878 |
| NA12892 | F | N/A | Mother to NA12878 |
| NA10080 | M | Heterozygous | |
| NA18507 | M | N/A | |
Known variants are those reported by Coriell.
Figure 1. Schematic of sequencing runs used in WES validation experiments. Up to 4 samples are multiplexed into a single high-throughput lane. Cross-hatching indicates the same or different library preparations. Run #4 is an Illumina HiSeq 2500 RapidRun, with each lane treated as a separate replicate. Individual replicates are named as sample/run-machine-slot.
Figure 2. Schematic of sequencing runs used in WGS validation experiments. Cross-hatching indicates the same or different library preparations. Individual replicates are named as sample/run-machine-slot.
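The replicate naming scheme in the two captions above (sample/run-machine-slot) can be split into its fields programmatically. The following is a minimal sketch only: the `Replicate` type and `parse_replicate` function are hypothetical names, and the exact string form `NA12878/r1-1-1` (with an `r` prefix, as the replicate labels appear in the tables) is an assumption.

```python
from typing import NamedTuple


class Replicate(NamedTuple):
    """Fields of a replicate name (hypothetical helper type)."""
    sample: str
    run: int
    machine: int
    slot: int


def parse_replicate(name: str) -> Replicate:
    """Parse a name such as 'NA12878/r1-1-1' into sample, run, machine, slot.

    Assumes the label after the slash is 'r<run>-<machine>-<slot>'.
    """
    sample, sep, label = name.partition("/")
    if not sep or not label.startswith("r"):
        raise ValueError(f"unexpected replicate name: {name!r}")
    run, machine, slot = (int(field) for field in label[1:].split("-"))
    return Replicate(sample, run, machine, slot)


a = parse_replicate("NA12878/r1-1-1")
b = parse_replicate("NA12878/r3-2-1")
same_run = a.run == b.run              # False: runs 1 and 3
same_machine = a.machine == b.machine  # False: machines 1 and 2
```

Comparing the parsed fields in this way is one possible starting point for grouping replicate pairs by run or machine.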
Description of reference callsets used in analysis
| In-house Omni 2.5 Microarray | Described in methods section |
| 1000G Omni 2.5 Microarray | As distributed in GATK Resource Bundle version 2.3 |
| Hapmap | Version 3.3. As distributed in GATK Resource Bundle version 2.3 |
| Genome in a Bottle | Version 2.17, dated Oct. 17 2013. Most restrictive “high confidence” intervals excluding simple repeats, segmental duplications, decoys, STRs, and known CNVs. |
| Autism (ASD) Panel | 129 kilobase targeted clinical sequencing panel of genes related to autism spectrum disorder (ASD). Indels are left aligned. |
Figure 3. Precision vs. non-reference concordance (NRC) for WES (A) and WGS (B) SNVs (solid line) and Indels (dashed line) as a function of VQSR VQSLoD score relative to GIAB high confidence and all variant sets. Thick line is mean across all replicates; shaded region shows the standard deviation. Points show the PR at the VQSR PASSing threshold. VQSR is not applied in WES; points show PR for PASSing and all variants (both PASS and not-PASS).
Figure 4. Different genotype concordance metrics. ‘B’ is the alternate allele. Each metric is calculated as the ratio of red elements to blue-outline elements. Non-reference genotype concordance (NRC) is genotype-aware sensitivity/recall.
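To make the relationship between these metrics concrete, the sketch below computes overall genotype concordance, non-reference sensitivity (NRS) and NRC from a truth callset and a test callset. It is illustrative only and rests on assumptions not stated in the caption: genotypes are encoded as alternate-allele counts (0 = AA, 1 = AB, 2 = BB), metrics are evaluated over the truth sites, and a site missing from the test callset is treated as homozygous reference. The function name `concordance_metrics` is hypothetical.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def concordance_metrics(truth: dict, test: dict) -> dict:
    """Compare test genotypes to truth genotypes over the truth sites.

    Both arguments map a site key to an alternate-allele count
    (0 = AA, 1 = AB, 2 = BB). Sites absent from `test` are assumed
    to be homozygous reference (an assumption of this sketch).
    """
    n_sites = n_match = 0
    n_nonref = n_site_found = n_genotype_found = 0
    for site, truth_gt in truth.items():
        test_gt = test.get(site, 0)
        n_sites += 1
        n_match += test_gt == truth_gt
        if truth_gt > 0:                              # truth is AB or BB
            n_nonref += 1
            n_site_found += test_gt > 0               # site-level hit
            n_genotype_found += test_gt == truth_gt   # genotype-level hit
    return {
        "concordance": _ratio(n_match, n_sites),
        "NRS": _ratio(n_site_found, n_nonref),
        "NRC": _ratio(n_genotype_found, n_nonref),
    }


truth = {"s1": 1, "s2": 2, "s3": 1, "s4": 0}
test = {"s1": 1, "s2": 1, "s4": 0}
metrics = concordance_metrics(truth, test)
# concordance = 2/4, NRS = 2/3, NRC = 1/3
```

In this toy example the site `s2` is detected but genotyped as heterozygous rather than homozygous alternate, so it counts toward NRS but not toward NRC, which is why NRC can never exceed NRS under these definitions.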
Summary of validation experiments performed
| Measure concordance with high-density SNP arrays | |
| Measure concordance with Genome In A Bottle (GIAB) NA12878 callset | |
| Measure concordance with calls from a targeted NGS panel (all calls previously validated either by in-house Sanger assays or the presence of the variant in the sample in Hapmap, 1000G, etc.) | |
| Measure concordance for the same sample sequenced in the same run with: | |
| A) Different sample preparations across the same flow cell, or | |
| B) The same sample preparation across different flow cells | |
| Measure concordance for the same sample on the same machine with: | |
| A) The same sample preparation across different runs, or | |
| B) Different sample preparations across different runs | |
| Measure concordance for the same sample with the same sample preparation on different machines of the same model in the same run cycle | |
| Measure concordance between high-throughput and rapid run Illumina modes |
WES and WGS samples used in different concordance experiments
| NA12878: r1-1-1 vs. r1-1-2 | NA12878: r1-1-1 vs. r1-1-3 | |
| NA18507: r2-1-2 vs. r2-1-3 | NA18507: r2-1-3 vs. r2-1-4 | |
| NA18507: r2-1-2 vs. r3-1-2 | NA18507: r2-1-3 vs. r3-1-3 | |
| NA18507: r2-1-3 vs. r3-1-2 | NA18507: r2-1-4 vs. r3-1-3 | |
| NA12878: r1-1-1 vs. r2-1-1 vs. r3-1-1 | NA12878: r1-1-1 vs. r2-1-1 vs. r3-1-1 | |
| | NA12878: r2-2-1 vs. r3-2-1 | |
| NA12878: r3-1-1 vs. r3-2-1 | NA12878: r3-1-1 vs. r3-2-1 | |
| NA12878: r4-1-1 vs. r4-2-1 | NA12878: r2-1-1 vs. r2-2-1 | |
| NA18507: r4-1-2 vs. r4-2-2 | | |
| NA12878, NA18507, NA12891, NA12892 r4-*-* versus all others | N/A |
Summary of WES coverage statistics
| NA12878 | ||||||
| r1-1-1 | 57.4 | 96.8 | 93.6 | 89.2 | 79.3 | 83.8 |
| r1-1-2 | 57.4 | 96.8 | 93.6 | 89.1 | 79.3 | 83.8 |
| r2-1-1 | 66.8 | 96.2 | 92.0 | 85.4 | 75.2 | 79.7 |
| r3-1-1 | 81.8 | 96.7 | 93.6 | 90.7 | 85.1 | 84.8 |
| r3-2-1 | 74.5 | 96.7 | 93.6 | 90.8 | 85.0 | 85.1 |
| r4-1-1 | 83.7 | 96.7 | 93.7 | 90.9 | 85.5 | 85.0 |
| r4-2-1 | 73.6 | 96.6 | 93.2 | 89.1 | 81.5 | 83.1 |
| NA12891 | ||||||
| r3-1-3 | 70.9 | 96.5 | 92.4 | 86.0 | 75.8 | 80.1 |
| r4-1-3 | 70.8 | 96.6 | 92.5 | 85.9 | 75.7 | 80.2 |
| r4-2-3 | 60.9 | 96.3 | 91.4 | 82.2 | 69.8 | 76.5 |
| NA12892 | ||||||
| r3-1-4 | 70.7 | 96.5 | 92.7 | 87.5 | 78.7 | 82.5 |
| r4-1-4 | 71.4 | 96.6 | 92.8 | 87.7 | 79.0 | 82.7 |
| r4-2-4 | 61.7 | 96.4 | 91.9 | 84.5 | 73.2 | 79.7 |
| NA18507 | ||||||
| r2-1-2 | 66.4 | 96.4 | 92.1 | 84.7 | 73.8 | 78.7 |
| r2-1-3 | 92.2 | 96.7 | 93.6 | 90.9 | 86.2 | 84.9 |
| r3-1-2 | 81.2 | 96.7 | 93.2 | 88.9 | 81.3 | 82.9 |
| r4-1-2 | 82.8 | 96.7 | 93.3 | 89.2 | 81.7 | 83.2 |
| r4-2-2 | 71.5 | 96.5 | 92.6 | 86.4 | 76.6 | 80.4 |
| NA10080 | ||||||
| r2-1-4 | 60.7 | 96.4 | 92.1 | 84.5 | 73.2 | 78.4 |
Percent RefSeq Coding Bases Callable reports the percentage of all RefSeq coding exon bases (as downloaded from the UCSC Genome Browser) considered callable. The baseline value for the capture targets is 95.5%.
Summary of WGS coverage statistics
| NA12878 | ||||||
| r1-1-1 | 51.0 | 95.4 | 94.9 | 94.3 | 93.1 | 91.8 |
| r1-1-3 | 49.8 | 95.4 | 94.9 | 94.3 | 93.1 | 91.9 |
| r2-1-1 | 36.5 | 95.4 | 94.7 | 93.5 | 86.9 | 91.6 |
| r2-2-1 | 34.6 | 95.4 | 94.7 | 93.2 | 82.9 | 91.2 |
| r3-1-1 | 49.9 | 95.5 | 95.0 | 94.5 | 93.5 | 91.9 |
| r3-2-1 | 39.8 | 95.5 | 94.8 | 93.9 | 89.4 | 91.0 |
| NA12891 | ||||||
| r3-1-2 | 42.7 | 96.8 | 96.1 | 94.3 | 88.7 | 91.2 |
| NA12892 | ||||||
| r3-1-4 | 45.1 | 95.2 | 90.8 | 86.6 | 81.3 | 90.9 |
| NA18507 | ||||||
| r2-1-3 | 30.0 | 96.8 | 95.7 | 89.4 | 65.8 | 87.3 |
| r2-1-4 | 35.8 | 96.8 | 96.0 | 92.3 | 83.4 | 90.2 |
| r3-1-3 | 51.7 | 96.9 | 96.3 | 95.4 | 91.6 | 91.6 |
| NA10080 | ||||||
| r2-1-2 | 35.1 | 96.9 | 96.1 | 92.0 | 81.7 | 89.0 |
Percent RefSeq Coding Bases Callable reports the percentage of all RefSeq coding exon bases (as downloaded from the UCSC Genome Browser) considered callable.
Figure 5. WES and WGS genotype concordance (concordance), non-reference sensitivity (NRS) and non-reference concordance (NRC) relative to three different SNP microarray genotypes: 1) an Illumina Omni2.5 genotype performed in-house, 2) an Omni2.5 genotype performed as part of the 1000 Genomes project, and 3) Hapmap 3.3 genotype. Not all genotypes are available for all samples. Only those SNPs within the exome capture targets are considered for WES concordance. The GAP does not report non-variant sites, so homozygous reference calls are not considered in the concordance evaluation.
Site-level sensitivity (NRS), specificity and genotype-level NRC relative to Sanger-validated and/or Hapmap- or 1000G-reported variants in 129 Kb of target from a clinical gene panel (the ASD panel callset)
| NA12878 Exome | ||||||||
| r1-1-1 | 48 | 40 | 5 | 3 | 0 | 0 | 93.0% | 81.4% |
| r1-1-2 | 46 | 40 | 2 | 3 | 1 | 0 | 93.0% | 83.7% |
| r2-1-1 | 45 | 37 | 1 | 6 | 1 | 0 | 86.0% | 74.4% |
| r3-1-1 | 44 | 40 | 0 | 3 | 1 | 0 | 93.0% | 83.7% |
| r3-2-1 | 44 | 40 | 0 | 3 | 1 | 0 | 93.0% | 83.7% |
| r4-1-1 | 44 | 39 | 0 | 4 | 1 | 0 | 90.7% | 79.1% |
| r4-2-1 | 44 | 39 | 0 | 4 | 1 | 0 | 90.7% | 81.4% |
| NA12878 Genome | ||||||||
| r1-1-1 | 64 | 54 | 1 | 8 | 1 | 1 | 87.3% | 81.0% |
| r1-1-3 | 74 | 53 | 11 | 9 | 1 | 1 | 85.7% | 79.4% |
| r2-1-1 | 73 | 53 | 10 | 9 | 1 | 1 | 85.7% | 77.8% |
| r2-2-1 | 73 | 54 | 10 | 8 | 1 | 1 | 87.3% | 81.0% |
| r3-1-1 | 71 | 55 | 8 | 7 | 1 | 1 | 88.9% | 81.0% |
| r3-2-1 | 64 | 52 | 1 | 10 | 1 | 1 | 84.1% | 74.6% |
Not all variants in this interval were validated, e.g. intronic variants, so we cannot conclusively determine if a variant is a false positive. Instead we report “excess positives”, variants discovered in the NGS replicates that were not Sanger confirmed or conclusively reported elsewhere.
False negative variants in NA12878 relative to the ASD panel callset
| chr12:2614070 | G > T | G/T | 6/7 | Yes (silent) | Called, but filtered in one replicate |
| chrX:15863648 | GA > G | GA/G | 0/7 | No | Not called in any replicate; 10 bp homopolymer |
| chrX:135115669 | GA > G | GA/G | 0/7 | No | Called, but filtered in two replicates; 11 bp homopolymer |
| chrX:152954025 | A > G | G/G | 6/7 | Yes (UTR) | Low depth region |
| chrX:153287314 | TG > T | TG/T | 0/7 | Yes (UTR) | Called, but filtered in all replicates for QD; 10 bp homopolymer |
| Genome | |||||
| chr5:176639217 | TA > T | TA/T | 0/6 | No | Not called in any replicate; 13 bp homopolymer |
| chr7:146805220 | AT > A | AT/A | 0/6 | No | Not called in any replicate; 11 bp homopolymer |
| chr10:89720633 | CT > C | CT/C | 0/6 | No | Not called in any replicate; 15 bp homopolymer |
| chr10:89720907 | T > G | T/G | 0/6 | No | Called, but filtered in all replicates |
| chr11:70348852 | G > CG | CG/CG | 0/6 | No | Called, but as heterozygous in all replicates; inside 12 bp homopolymer |
| chrX:15863648 | GA > G | GA/G | 0/6 | No | Not called in any replicate; 10 bp homopolymer |
| chrX:132888207 | TA > T | TA/T | 3/6 | No | Not called in three replicates; 16 bp homopolymer |
| chrX:135067675 | G > C | G/C | 5/6 | Yes (missense) | Not called in one replicate |
| chrX:135115669 | GA > G | GA/G | 1/6 | No | Called in one replicate; 11 bp homopolymer |
| chrX:153287314 | TG > T | TG/T | 1/6 | Yes (UTR) | Called in one replicate; 10 bp homopolymer |
| chrX:153357614 | TA > T | TA/T | 1/6 | No | Called in one replicate; 13 bp homopolymer |
The most common error mode is single-base deletions in homopolymer regions. Each variant is annotated as to whether it would be further reviewed in the interpretation workflow (Yes) or automatically filtered out from further consideration (No).
Figure 6. WES and WGS genotype concordance (concordance), non-reference sensitivity (NRS) and non-reference concordance (NRC) for all pairwise comparisons of the technical replicates, differentiated by the kind of comparison: intra-run, inter-run, inter-machine and inter-mode. Those comparisons marked as “other” fit into multiple categories, including inter-library. Each replicate is alternately treated as both “truth” and “test”.
Percentage of variants, by type, that are called identically at the site level across different subsets of replicates
| NA12878 WES | 1/7 | 2/7 | 3/7 | 4/7 | 5/7 | 6/7 | 7/7 |
| SNV | 3.20 | 1.37 | 1.18 | 1.28 | 1.42 | 2.69 | 88.85 |
| Indel | 10.25 | 4.60 | 2.62 | 2.33 | 2.51 | 3.71 | 73.99 |
| NA18507 WES | 1/5 | 2/5 | 3/5 | 4/5 | 5/5 | | |
| SNV | 2.87 | 1.60 | 1.47 | 2.62 | 91.43 | | |
| Indel | 7.61 | 3.57 | 2.96 | 3.70 | 82.17 | | |
| NA12878 WGS | 1/6 | 2/6 | 3/6 | 4/6 | 5/6 | 6/6 | |
| SNV | 3.40 | 1.76 | 1.52 | 1.79 | 3.69 | 87.80 | |
| Indel | 20.20 | 9.89 | 6.11 | 5.17 | 7.07 | 51.60 | |
| NA18507 WGS | 1/3 | 2/3 | 3/3 | | | | |
| SNV | 2.61 | 2.93 | 94.46 | | | | |
| Indel | 20.01 | 13.10 | 66.89 | | | | |
N/N indicates the variant site, but not necessarily genotype, was identified in all replicates.
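The k/N breakdown in this table can be reproduced from per-replicate call sets with a simple tally. The sketch below is an illustration under stated assumptions rather than the pipeline used in the study: each replicate is represented as a set of variant site keys (genotype is ignored, matching the site-level comparison), percentages are taken over the union of sites seen in any replicate, and the function name `site_reproducibility` is hypothetical.

```python
from collections import Counter


def site_reproducibility(replicate_sites: list) -> dict:
    """Percentage of distinct variant sites found in exactly k of N replicates.

    `replicate_sites` is a list of sets, one per replicate, holding
    site keys such as (chrom, pos, ref, alt). Returns {k: percent}
    for k = 1..N, with percentages summing to 100 over the union.
    """
    n_replicates = len(replicate_sites)
    # Number of replicates in which each distinct site was called.
    times_seen = Counter(site for sites in replicate_sites for site in sites)
    total = len(times_seen)
    if total == 0:
        return {k: 0.0 for k in range(1, n_replicates + 1)}
    histogram = Counter(times_seen.values())
    return {
        k: 100.0 * histogram.get(k, 0) / total
        for k in range(1, n_replicates + 1)
    }


replicates = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
breakdown = site_reproducibility(replicates)
# a is in 3/3, b in 2/3, c and d in 1/3 -> {1: 50.0, 2: 25.0, 3: 25.0}
```

The final bucket (N/N) corresponds to sites identified in every replicate, which is the quantity reported in the last populated column of each row above.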