Sek Won Kong1,2, In-Hee Lee3,4, Xuanshi Liu3,4, Joel N Hirschhorn4,5, Kenneth D Mandl3,4,6. 1. Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA. Sekwon.Kong@childrens.harvard.edu. 2. Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA. Sekwon.Kong@childrens.harvard.edu. 3. Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA. 4. Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA. 5. Broad Institute, Cambridge, Massachusetts, USA. 6. Department of Biomedical Informatics, Harvard Medical School, Boson, Massachusetts, USA.
Abstract
PURPOSE: To evaluate the coverage and accuracy of whole-exome sequencing (WES) across vendors. METHODS: Blood samples from three trios underwent WES at three vendors. Relative performance of the three WES services was measured for breadth and depth of coverage. The false-negative rates (FNRs) were estimated using the segregation pattern within each trio. RESULTS: Mean depth of coverage for all genes was 189.0, 124.9, and 38.3 for the three vendor services. Fifty-five of the American College of Medical Genetics and Genomics 56 genes, but only 56 of 63 pharmacogenes, were 100% covered at 10 × in at least one of the nine individuals for all vendors; however, there was substantial interindividual variability. For the two vendors with mean depth of coverage >120 ×, analytic positive predictive values (aPPVs) exceeded 99.1% for single-nucleotide variants and homozygous indels, and sensitivities were 98.9-99.9%; however, heterozygous indels showed lower accuracy and sensitivity. Among the trios, FNRs in the offspring were 0.07-0.62% at well-covered variants concordantly called in both parents. CONCLUSION: The current standard of 120 × coverage for clinical WES may be insufficient for consistent breadth of coverage across the exome. Ordering clinicians and researchers would benefit from vendors' reports that estimate sensitivity and aPPV, including depth of coverage across the exome.
PURPOSE: To evaluate the coverage and accuracy of whole-exome sequencing (WES) across vendors. METHODS: Blood samples from three trios underwent WES at three vendors. Relative performance of the three WES services was measured for breadth and depth of coverage. The false-negative rates (FNRs) were estimated using the segregation pattern within each trio. RESULTS: Mean depth of coverage for all genes was 189.0, 124.9, and 38.3 for the three vendor services. Fifty-five of the American College of Medical Genetics and Genomics 56 genes, but only 56 of 63 pharmacogenes, were 100% covered at 10 × in at least one of the nine individuals for all vendors; however, there was substantial interindividual variability. For the two vendors with mean depth of coverage >120 ×, analytic positive predictive values (aPPVs) exceeded 99.1% for single-nucleotide variants and homozygous indels, and sensitivities were 98.9-99.9%; however, heterozygous indels showed lower accuracy and sensitivity. Among the trios, FNRs in the offspring were 0.07-0.62% at well-covered variants concordantly called in both parents. CONCLUSION: The current standard of 120 × coverage for clinical WES may be insufficient for consistent breadth of coverage across the exome. Ordering clinicians and researchers would benefit from vendors' reports that estimate sensitivity and aPPV, including depth of coverage across the exome.
Entities:
Keywords:
breadth of coverage; depth of coverage; false negatives; pharmacogenomics; whole-exome sequencing
Next generation sequencing, widely used by researchers is entering clinical care as a diagnostic test, but methods and quality vary across vendors. Customers should be fully informed of a test’s accuracy in detecting coding variants, both to drive selection of high quality vendors, and to be aware of the likelihood of false positives and negatives.[1,2] The accuracy of variant detection in coding regions is lower for whole-exome sequencing (WES) than whole-genome sequencing (WGS), even at equivalent coverage.[3,4] Nonetheless, WES is widely used due to lower cost and because most disease-associated genomic variants discovered thus are in coding regions and splice sites of protein coding genes.WES is performed as a series of biochemical and computational analytic procedures, varying from vendor to vendor, that influence exome coverage and genotype accuracy. Factors contributing to variation include: (1) quality of gDNA,[5,6] (2) DNA extraction methods,[7,8] (3) sequence library preparation including exome capture[9] and PCR amplification,[10] (4) the sequencing platform,[11,12] (5) short read-length and depth of coverage,[12,13] (6) computational analytical pipeline,[14] (7) sequence contexts such as GC-contents and simple repetitive DNA sequences[11,15] and (8) the type of variant (single nucleotide variant (SNV), insertion/deletion (indels), and more complex variants).[16] Re-analysis of raw sequence reads using a standardized software pipeline can improve comparability between WES results,[17] but cannot overcome the differences in targeted intervals, capture efficiency and sequencing chemistry.[18]We systematically compared the WES results from three vendors selected on the basis of varying exome capture methods with differing read-lengths and mean depths of coverage. We reprocessed raw reads using a single standard analytical pipeline to minimize variability due to bioinformatics pipelines among the vendors. For each gene, we focused on breadth of coverage at a minimum depth of ten high-quality aligned reads. For each individual, concordant and discordant calls among the vendors were analyzed for genotype quality (GQ), depth of coverage (DP) and presence of reported minor allele frequency (MAF) to prioritize likely true-positive variants using all variant calls from the three vendors. Finally, we calculated the analytical positive predictive value (aPPV) and sensitivity for each vendor using a likely true-positive set, and estimated a lower bound for the false-negative rate (FNR) in each offspring among the three trios.
MATERIALS AND METHODS
Samples and whole exome sequencing
Blood-derived DNA samples from three trios – hereafter referred to as trios A, B and C – were collected and aliquoted per each vendor’s specification. Exome sequencing and subsequent use for research was approved by the Boston Children’s Hospital Committee on Clinical Investigation. We selected three regional vendors providing the Clinical Laboratory Improvement Amendments certified clinical sequencing service – denoted as V1, V2, and V3 – all using a four-channel sequencing by synthesis technology platform (Illumina HiSeq 2500 for V1 and V2, and HiSeq 4000 for V3). Three different whole exome capture methods were used: the Illumina Nextera Rapid capture (Nextera, V1), Agilent SureSelectXT (SureSelectXT, V2), and NimbleGen SeqCapEZ-MedExome (MedExome, V3). Read-lengths were 76, 150, and 101 bps for V1, V2 and V3, respectively.
Comparison of capture targets, variant calling and annotation
Comparison of exome capture target regions, calculation of coverage in target regions, and variant calling pipeline are described in Supplementary Methods. All VCF files were annotated using ENSEMBL Variant Effect Predictor (VEP) release 86.[19] Rare and high impact variants (RHI) were defined in two ways. First as MAF <0.005 in any population from the 1000 Genomes Project[20] or Exome Aggregation Consortium (ExAC).[21] Second as predicted high impact by VEP resulting in frameshift, transcript-ablation, transcript-amplification, splice-acceptor, splice-donor, start-lost, stop-lost and stop-gain.We compiled a list of 6,367 putative disease-associated genes from the following databases: Human Gene Mutation Database (Professional 2016.02), Online Mendelian Inheritance in Men,[22] ClinVar,[23] Genetic Testing Registry,[24] Cancer Gene Census,[25] and OrphaNet.[26] All online databases were accessed on Oct. 27th, 2016 (Table S1). The nuclear genes implicated in clinical drug response and metabolism were collected from the Pharmacogenomics Knowledge Base Very Important Pharmacogenes excluding one mitochondrial gene MT-RNR1 (PGx-VIPs, N=63)[27]. Discordant variants in the American College of Medical Genetics (ACMG) 56 genes[28] and PGx-VIPs were further inspected using Integrated Genome Viewer (version 2.3.79).
Analytical positive predictive value and sensitivity
To evaluate the performance of variant calls in detecting the true genotype for each vendor, we defined a ‘likely true-positive’ set of variants among all unique variants identified by V1 and V2. In an individual, we compared variant calls between V1 and V2 in the genomic regions covered with ≥10x in all three vendors (3COV) and used V3 as a tiebreaker. First, a concordant variant between V1 and V2 was considered as ‘likely true-positive.’ Second, a discordant variant between V1 and V2 was labeled as ‘likely true-positive’ if V3 called the same variant. Finally, a discordant variant between V1 and V2 for which V3 failed to call was considered as ‘likely true-positive’ if GQ≥20 and MAF>0 was reported in ExAC. Then we calculated aPPV – the proportion of likely true-positive variants out of total variants identified by the vendor – and sensitivity – the proportion of likely true-positive variants identified by the vendor out of total likely true-positive variants. FNR was calculated as 1–sensitivity per exome.
False positives and false negatives among trios
We analyzed the segregation pattern in autosomes in offspring for the loci where both parents had DP≥10x in each trio. We analyzed all variants in overlapping 3COV regions of both parents where one parent was consistently called as heterozygous and the other as reference concordant homozygous in all three vendors. We then used the variant call in the offspring to estimate FNRs in these loci (see Supplementary Methods).For all summary statistics, mean values across nine individuals are shown in the Results. Standard deviations and the other descriptive statistical scores are detailed in Tables and Supplementary Tables.
RESULTS
Depth and breadth of coverage for target regions
Compared to previous generation technologies, current hybridization capture based methods have superior coverage and probe design.[29] However, none of the three capture methods was designed to cover 100% of coding exons in the current version of Consensus Coding Sequence (CCDS).[30] Nonetheless, 99.97, 99.85 and 99.67% of CCDS overlapped the target regions of Nextera (V1), SureSelectXT (V2) and MedExome (V3), respectively. All three methods targeted ≥ 99.8% of coding exons of putative disease-associated genes (N=6,367)(Table S2).Mean depth of coverage across all CCDS genes was 189.0 (V1), 124.9 (V2), and 38.3 (V3), although there was a wide range of variation across genes (Figure S1). We analyzed coverage in more detail only for V1 and V2, because V3 did not meet the current standards typically used for WES. For each CCDS gene, we calculated the percent of exonic bases covered at ≥10x that provides 95% sensitivity for heterozygous SNVs (Table S3).[4] The proportion of genes covered 100% at ≥10x was 80.1% (V1) and 79.1% (V2) on average (Table S4).For the ACMG 56 genes and PGx-VIPs, we compared the list of finished genes – i.e., 100% covered at ≥10x – in at least one of nine individuals for each vendor. Except for RYR1, RYR2 and TGFBR1, 53 genes were finished in at least one of the nine individuals by V1. Likewise, V2 had incomplete coverage for PKP2, RB1 and SDHD (Figure 1A). From the PGx-VIPs (N=63), 61 and 60 genes were finished by V1 and V2 respectively (Figure 1B). The breadth of coverage for both the ACMG 56 genes and PGx-VIPs was higher than exome-wide averages at all thresholds (Wilcoxon signed-rank tests p-values < 0.01, Table 1). For CCDS genes, V1 had consistently higher breadth of coverage compared to V2. However, no differences were found for the ACMG 56 genes and PGx-VIPs, except for PGx-VIPs at ≥10x (Wilcoxon signed-rank test p-value < 0.01). Of note, MYBPC3 and TNNI3 were not completely targeted by SureSelectXT, but showed 100% coverage in one or more individuals. SDHD was not well-covered by V2 (80.3% on average at ≥10x).
Figure 1
Variability in breadth of coverage for the American College of Medical Genetics (ACMG) 56 genes and 63 pharmacogenes among the nine individuals
The percentages of coding sequence bases covered with per-site read depth ≥10x are shown for each of ACMG 56 genes (A) and 63 genes from the Pharmacogenomics Knowledge Base Very Important Pharmacogenes (PGx-VIPs) (B). Of 63 pharmacogenes, the 12 clinically actionable genes per the Clinical Pharmacogenetics Implementation Consortium guidelines are highlighted with blue background and bold in their symbols. Each row represents a gene and columns are grouped by the nine individuals across the three vendors. Green squares represent finished genes (i.e., 100% covered at ≥10x), and yellow (95 – 99%) and red (<95%) squares represent lower breadths of coverage at ≥10x. Genes with significantly different breadths of coverage between V1 and V2 are marked with * and † (Wilcoxon signed-rank tests, Bonferroni corrected p-values < 0.05). * represents better breadth of coverage in V1 and † shows better ones in V2.
Table 1
Comparison of covered regions between two vendors
For each vendor, mean and standard deviation (SD) for percentage of coding sequence bases covered at different thresholds are calculated from the set of nine individuals. Mean depth of coverage is shown next to vendor identifier in top row. Consensus Coding Sequence genes (N = 18,616), putative disease-associated genes (N = 6,367) from disease gene databases, the American College of Medical Genetics (ACMG) 56 genes, and the Pharmacogenomics Knowledge Base Very Important Pharmacogenes (PGx-VIPs, N = 63) are listed in Table S1. Sum of coding sequence bases for each region of interests is shown as total bases.
We observed a wide range in the breadth of coverage across the nine individuals. As such, only 47 (V1) and 41 (V2) of the ACMG 56 genes were finished in all nine individuals (Figure 1A). Twelve of the PGx-VIPs with the Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines (CPIC genes)[31] were finished by both V1 and V2 in at least one of the nine individuals (Figure 1B). Percent coverage of the PGx-VIPs also varied across individuals; 9 (V1) and 8 (V2) of the 12 CPIC genes were finished in all nine individuals. For the ACMG 56 genes, V1 had significantly better breadths of coverage for PKP2, RB1, and SDHD, and V2 had advantages for DSG2, RYR1, RYR2, SCN5A, and TGFBR1 (Mann-Whitney-Wilcoxon tests, Bonferroni corrected p-values <0.05). Seven out of 63 PGx-VIPs showed significantly different breadth of coverage between V1 and V2 (Figure 1B).We examined finished genes at ≥20x coverage that could provide 99% sensitivity for heterozygous SNVs.[4] At this threshold, 40 (V1) and 26 (V2) of the ACMG 56, and 9 (V1) and 5 (V2) of the 12 CPIC genes were finished in all nine suggesting incomplete breadth of coverage even with higher mean depth of coverage of V1 and V2 (Figure S2). At ≥20x, variability in breadth of coverage across nine individuals was significantly higher than at ≥10x (Figure S3). Breadth of coverage for four genes (ATP7B, BMPR1A, OTC and SMAD4) recently added to the ACMG gene list for reporting secondary findings[32] are shown in Figure S4.
Concordant and discordant variant calls among the vendors in clinically implicated genes
For each variant type, we checked agreement of variant calls among the three vendors, restricting the analysis to genomic regions with ≥10x in all three vendors to minimize the effects of the lower coverage in V3 (Table 2). We also compared concordance rates between V1 and V2, to further minimize the effect of low coverage in V3. As expected, restricting the analysis to genomic regions with ≥10x coverage increased concordance rates for all types of variants (Tables 2
and
S5). Of note, heterozygous indels showed the lowest concordance rates at any thresholds compared to the other types of variants.
Table 2
Concordant variant calls among the vendors
Overall concordance rates of called variants are significantly different across variant types. For each variant type, we calculate the proportion of concordant variants among the union of variants called by the three vendors. A total of 58,870 unique variants were called including 682 multi-allelic variants. Mean and range across the nine individuals are shown for the number of union and concordant variants, and mean and standard deviation are shown for concordance rates. ‘V1-V2-V3’ refers to results from variants covered with ≥10x in all three vendors, and ‘V1-V2’ refer to results from variants covered with ≥10x or 20x by both V1 and V2. Concordant variants in ‘V1-V2-V3’ include the ones with completely agreed by the three vendors.
Heterozygous variants
Homozygous variants
SNVs
INDELs
SNVs
INDELs
V1-V2-V3(≥10x)
Union of variants
18,702.4[17,085–21,939]
1,961.4[1,740–2,139]
9,945.3[9,647–10,270]
607.7[527–695]
Concordant variants
17,478.1[15,923–20,561]
1,051.7[941–1,187]
9,839.6[9,559–10,121]
501.0[448–556]
Concordance (%)
93.4 ± 0.85
53.7± 3.34
98.9 ± 0.34
82.6 ± 2.19
V1-V2(≥10x)
Union of variants
19,369.8[17,598–22,848]
2,298.4[2,188–2,442]
10,397.9[10,086–10,793]
697.9[646–749]
Concordant variants
18,356.2[16,673–21,722]
1,426.9[1,261–1,627]
10,304.0[10,008–10,628]
580.2[536–618]
Concordance (%)
94.7 ± 0.68
62.0 ± 2.99
99.1 ± 0.26
83.2 ± 1.10
V1-V2(≥20x)
Union of variants
18,488.9[16,755–21,840]
1,888.1[1,786–2,013]
9,923.4[9,637–10,229]
593.9[546–637]
Concordant variants
17,649.0[15,994–20,929]
1,234.8[1,108–1,417]
9,856.4[9,566–10,129]
515.7[481–552]
Concordance (%)
95.4 ± 0.71
67.7 ± 3.12
99.5 ± 0.09
89.0 ± 1.19
Concordance rates for the variants in clinically implicated genes such as putative disease-associated genes, the ACMG 56 and PGx-VIPs were consistently higher compared to those of exome-wide averages. When calculating the concordance rate, we excluded all variants in the major histocompatibility complex region (HMC, chr6:28,477,798–33,448,354). In 6,367 putative disease-associated genes, concordance rates between V1 and V2 with ≥10x were consistently higher – 96.8 (heterozygous SNVs), 99.5 (homozygous SNVs), 63.3 (heterozygous indels), and 85.2% (homozygous indels). Compared to the concordance rates calculated with VCF files generated using a single software pipeline in our study, concordance rates with vendor-provided VCF files were consistently lower (Table S5).For the ACMG 56 genes, 98.3% of heterozygous and 100% of homozygous SNVs were concordant between V1 and V2. Discordant variants in the ACMG 56 were found in APOB, KCNH2, PKP2, PMS2, SCN5A, and TSC2 (Table S6). No homozygous indel was found in the ACMG 56 genes across the nine individuals, and only three of eight heterozygous indels was concordant between V1 and V2. For the PGx-VIPs, all eight homozygous indels and eight of 21 heterozygous indels were concordant between V1 and V2. Concordance rates were 94.6 and 99.6% for heterozygous and homozygous SNVs, respectively. Overall, concordance rate among the ACMG 56 and PGx-VIPs was higher than the exome-wide average except for indels.Rare genetic variants with deleterious impacts on protein function are often prioritized in WES for further evaluation and validation if such variants are found in putative disease-associated genes. To minimize false-negatives for this class of variants, we used all RHI variants discovered by V1 and V2 without restricting to ≥10x covered regions, and selected RHI variants in putative disease-associated genes. Concordance rate for heterozygous RHI SNVs was 90.4%, and all five homozygous RHI SNVs found were concordant. All RHI SNVs were concordant for genomic regions covered with ≥10x in both V1 and V2. There were no homozygous RHI indels, but only 11 of 46 heterozygous RHI indels from the nine individuals were concordant. We visually inspected read alignments in the regions surrounding discordant RHI variants in Integrated Genome Viewer. The discordant calls were associated with 1) low depth of coverage, 2) allelic imbalance, 3) strand bias, 4) read alignments suggesting structural variation and indels, and 5) homopolymers (Table S6). Next, we examined potential false-positives among concordant variant calls between V1 and V2. We restricted variant calls to heterozygous RHI variants discovered in putative disease-genes (excluding MHC), which resulted in 30 SNVs and 3 indels in three probands. None of these were likely false-positives according to segregation pattern – i.e., they were all inherited from only one of the two parents. Therefore, the proportion of likely false-positives among concordant calls between V1 and V2 would be very low.We found two discordant variants that were reported as pathogenic in ClinVar. Pathogenic heterozygous SNVs – one in FCGR1A and the other in VPS13B – were called by V1, but V2 failed to call the variants because of low coverage for both loci. (Table S6).
Analytical positive predictive value and sensitivity of variant calls
Discordant calls are due to either false-positive calls in one or more vendors, or a failure to detect a variant in one or more vendors. To further characterize discordant variant calls, we examined GQ, DP and reported MAFs > 0 in ExAC for these. Among the discordant SNVs between V1 and V2 (5.8% of total SNVs), 3.8% had low GQ (<20) in one of two vendors. For the discordant SNVs with good GQ (≥20), 0.1% had low DP (<10) in one of two vendors. Overall, 2.0% of SNVs that were discordant between V1 and V2 but had good genotype quality with ≥10x depth of coverage, of which 1.2% had reported MAFs>0 in the ExAC server (Figure 2A
and
Table S7).
Figure 2
Analytical positive predictive value and sensitivity of variant calls from each vendor
As shown in (A), bi-allelic SNVs concordant and discordant between V1 and V2 were sequentially checked for genotype quality (GQ) score, per-site depth of coverage (DP) and reported minor allele frequency (MAF) in the Exome Aggregation Consortium (ExAC) server. The number of concordant and discordant SNVs is shown, as is the average number and standard deviation of variants meeting each criterion across nine individuals. In parentheses, the same statistics are expressed as a percentage of the total number of variants seen in either vendor. The scheme for calculating analytical positive predictive value (aPPV) and sensitivity is shown in (B). In each individual, a likely true-positive set of variants is compiled by aggregating all unique variants seen in both V1 and V2 (blue rectangle) that had GQ ≥20, DP≥10 and (for vendor-specific variants only) MAF >0 in ExAC. All variants discovered by a vendor are used to calculate aPPV (the green shaded squares divided by the red rectangle) and sensitivity (the proportion of the green shaded squares divided by the blue rectangle). The mean and standard deviation across nine individuals for aPPV and sensitivity are shown in (C), for each variant type and vendor.
TP: true positive, FP: false positive, and FN: false negative.
We considered a minimal set of likely true-positive calls to be with a MAF reported in ExAC, GQ≥20 and DP≥10. GQ and DP are often used to filter out variants with erroneous variant calls[33]; however, applying these filtering criteria could result in false-negatives in WES. The proportion of retained variants among 3CON varied by variant type and among the vendors (Figure S5). In V1, 99.6% of heterozygous SNVs passed both DP≥10 and GQ≥20 thresholds. Using the same criteria, 99.3% of heterozygous SNVs retained for V2 (Table S8).To calculate the sensitivity and aPPV of each vendor’s result, we selected likely true-positive calls from aggregated variant calls from V1 and V2. The blue rectangle in Figure 2B represents the likely true-positive set from V1 and V2, and all variant calls from a given vendor are represented by the red rectangle. For both vendors, aPPVs and sensitivities were 99% or higher except for heterozygous indels. V1 showed higher sensitivities for heterozygous indels but lower aPPVs compared to V2 (Figure 2C). Of note, V2 had lower sensitivity for heterozygous indels compared to the other types of variant. V1 showed the lowest aPPV for heterozygous indels compared to the other types of variant and also compared to V2. Thus, accurate detection of heterozygous indels would be challenging for both V1 and V2. Our estimation of sensitivity represents an upper bound since true-positive variants might have not been captured and/or called by any of two vendors, and variant calls that did not pass DP and GQ thresholds could also include true-positive variants.We took an advantage of the family-based trio design to calculate FNRs in regions of high coverage (Figure 3), using the variant calls from all three vendors, but restricting the analysis to genomic regions covered with ≥10x by all three vendors. In trio A, a total of 58,458 variants were found by the three vendors, of which 80.9% (47,281/58,458) were in 3COV regions for both parents (Table S9). From these, we focused on the 22,397 loci where both parents had completely concordant variant calls across all three vendors – hereafter referred to as 3CON – and where one parent was heterozygous and the other was homozygous for the reference allele. For these variants, the offspring should be heterozygous about 50% of the time. In the offspring of trio A, 11,042 (49.3% of 22,397) were called as heterozygous in at least one vendor, of which 10,907 were 3CON. For these loci, concordance rates in the offspring of trio A were higher for both SNVs and indels – 98.9 and 95.7%, respectively – compared with genome-wide concordance rates; these loci also showed higher concordance rates in the six unrelated individuals from trio B and C (98.7 and 92.4% for heterozygous SNVs and indels, respectively). We identified variants as likely false-negatives if there was a no-call or a homozygous reference call in the offspring for trio A for one vendor and a heterozygous call in the other two vendors. There were 8 (V1), 68 (V2) and 23 (V3) likely false-negatives, of which 0% (V1:0/8), 14.7% (V2:10/68) and 0% (V3:0/23) were due to low coverages (< 10x). Of 8 likely false-negatives from V1, 7 were called as homozygous for the reference allele and one was a no-call. Of 68 loci that V2 did not call heterozygous, six were no-calls and 62 were called as homozygous for the reference allele. For V3, all 23 were called as homozygous for the reference allele. We did not consider the variants that were called but had discordant numbers of variant alleles among the vendors – e.g., heterozygous in two vendors and homozygous for the variant allele in the other – since these are more likely ‘genotype errors’ rather than false-negatives.
Figure 3
Vendor-specific false negative rates in trio
For each trio, we calculate number of vendor-specific false negatives in offspring by focusing on the 3CON autosomal loci where one parent is heterozygous (i.e., 0/1-0/1-0/1 for V1, V2, and V3, respectively) and the other is reference concordant homozygous (A). We restrict the analysis to genomic regions covered with ≥10x by all three vendors. Venn diagram (B) shows each subgroup of concordant and discordant variants, and our interpretation of each group is shown in (C). The black solid line in (B) represents the total number of variants discovered by two or more vendors, which we use as a denominator to calculate vendor-specific false negative rates (FNRs) in each trio. FNRs and the numbers of 3CON, 2CONs and vendor-specific heterozygous variants found in offspring are shown in (C). For instance, in trio A, the FNR for V1 (0.07%) is calculated as the number of loci where V1 did not detect the heterozygous variant (8 variants, corresponding to “f”) divided by the total number of likely true set of heterozygous variants (i.e., 2CONs and 3CON) in the proband (23 + 68 + 8 + 10,907 = 11,006, corresponding to “d + e + f + g”).
We then used these false negative loci to estimate a lower bound for false negative rates for each trio. For the denominator of likely true positive variants, we used variants that were called as heterozygous in the offspring in at least two vendors; in trio A there were 11,006 such variants. The lower bound of vendor-specific FNRs in these well-covered regions were 0.07% for V1 (8/11,006), 0.62% for V2 (68/11,006) and 0.22% for V3 (23/11,006). FNRs estimated using the same analysis strategy were 0.16% (V1), 0.56% (V2) and 0.38% (V3) for trio B, and 0.13% (V1), 0.20% (V2) and 0.48% (V3) for trio C (Figure 3C) (we included analysis of V3, even with a lower sequencing depth, because we focused on genomic loci covered at a minimum of 10x in all three vendors). Our estimation of FNRs is comparable to the one reported by Li et al. in family samples.[34] Variants in low coverage regions (<10x) – comprising, for instance, 10.0% of CCDS in trio A – were not used to calculate FNRs; however, the FNR in offspring was likely very low for high coverage regions in both parents, supporting the strength of trio sequencing for some clinical conditions,[35] at least in regions of consistently high coverage.
DISCUSSION
We performed comparative analysis of WES results from three vendors to measure the empirical coverage of medically implicated genes and concordance rate of variant calls among the vendors, using uniform variant calling methods to remove variability from software analytical pipelines. The mean percentage of CCDS coding regions covered with ≥10x was above 95% for the nine individuals and all vendors. For the ACMG 56 and PGx-VIPs, we found a wide range of difference in breadth of coverage with ≥10x across the nine individuals. The mean depth of coverage provided by V2 was typical for clinical WES[36]; however, KCNQ1, PKP2, RB1 and TGFBR1 had variable coverage across the nine individuals with < 95% coverage in some individuals. The depth of coverage provided by V3 would be suboptimal for clinical use.Rare and high impact (RHI) variants in putative disease-associated genes were less concordant than exome-wide averages except for homozygous SNVs, suggesting that these variants were enriched for substantial numbers of false-positives, false-negatives, or variant calling errors. Among these, low coverage in a vendor usually was the source of discordant calls, and the variant was likely a true positive in the other vendor(s). Moreover, structural variation, homopolymer or simple repeats were frequently found in the flanking regions of discordant RHI variants. Therefore, further evaluation and validation including visual inspection of aligned reads and validation using an orthogonal method will be particularly important for RHI variants. Of note, some rare variants have low coverage in population-scale databases such as ExAC. For instance, only two out of 14 discordant RHI variants had good coverage in ExAC. Conversely, estimating accuracy mostly with common variants can bias the results for well-covered genomic regions in population databases where it would be easier to call variants using NGS. Analyzing aPPV and sensitivity of rare variants with or without good coverage in population scale databases for research and clinical applications would be an interesting research topic but is beyond the scope of our current study.For the sites where both parents were highly covered by the three vendors, vendor-specific FNRs in offspring were low for all vendors and three trios (0.07–0.62%). This estimation of FNR cannot be extrapolated to the rest of the genome, since false-negatives are more prevalent outside of the genomic regions covered with ≥10x in all three vendors (3COV) regions (which comprise 10.0, 5.5 and 7.3% of CCDS in trios A, B and C, respectively). Similarly, variants that are harder to call may have higher FNRs, as illustrated by loci in 3COV regions with discordant calls in parents (i.e., 19.1, 12.1 and 15.1% of all variants found in trio A, B and C, respectively). Even so, our results support the strength of trio WES sequencing for molecular genetic diagnosis for most SNVs in regions of high coverage.[35]Our study has some limitations. First of all, although the vendors covered a range of read-lengths – 76 (V1), 150 (V2) and 101 bps (V3) – and mean coverage from 38.3 (V3) to 189.0 (V1), we only sampled a small number of vendors, so our results may not generalize to other vendors or sequencing platforms. To minimize additional sources of variability, we used the same DNA stocks and analytical pipeline, but it is possible that different pipelines or different quality DNA samples could have yielded different results.[37] Notably, we did not validate discordant genetic variants among the vendors – therefore the performance measures were relative between vendors, nor did we use reference genetic material for which several gold standard variant call sets are available, making it difficult to definitively classify discordant genotypes into false-positives, false-negatives, or genotype errors. However, our study used trio samples and multiple vendors to overcome some of these limitations. Furthermore, the utility of a gold standard set of variants from NA12878 may itself be somewhat limited because it is widely used to ensure quality (and hence may be used to optimize platforms for calling this particular set of variants) but does not cover all known disease-associated genes. Estimation of sensitivity and aPPV from an individual genome may also be difficult to generalize to a range of samples[38]; we observed substantial inter-individual variation of the breadth of coverage for clinically implicated genes across the nine individuals in the current study.Mean depth of coverage was a general indicator of overall sensitivity, but did not capture variability in coverage across potentially clinically important genes.[37] For example, V2 provided a standard mean depth of coverage (125x) for clinical research WES, but showed large variability across individuals in the breadth of coverage for the ACMG 56 and CPIC genes, and had consistently higher FNRs in three trios compared with V1. It appears that a depth of coverage for clinical WES may need to be more even, or to be closer to what V1 provided (190x) to achieve consistent coverage across actionable genes across many individuals. Higher depths of coverage (e.g., ≥20x and 30x) were required to improve concordance rates between V1 and V2 for indels. Nonetheless, exonic regions with high sequence homology are challenging to analyze using WES.[30]Establishing the clinical utility of WES therefore requires ongoing measurement of the breadth and depth of coverage and accuracy – both across vendors and within individual vendors over time.[39,40] We observed substantial inter-individual variation in coverage of medically implicated genes. Because aPPV and sensitivity are imperfectly captured by mean coverage, we suggest that clinical WES service providers should inform users as to the range of sensitivity and aPPV for different classes of variants across the sets of genes that are relevant to the particular clinical scenario, estimated across a large cohort of clinical samples. This sort of information will help clinicians both select services and also to interpret clinical reports and distinguish truly negative findings from false-negatives due to low coverage.
Authors: Nicole M Kuderer; Kimberly A Burton; Sibel Blau; Andrea L Rose; Stephanie Parker; Gary H Lyman; C Anthony Blau Journal: JAMA Oncol Date: 2017-07-01 Impact factor: 31.777
Authors: Qianqian Zhu; Qiang Hu; Lori Shepherd; Jianmin Wang; Lei Wei; Carl D Morrison; Jeffrey M Conroy; Sean T Glenn; Warren Davis; Marilyn L Kwan; Isaac J Ergas; Janise M Roh; Lawrence H Kushi; Christine B Ambrosone; Song Liu; Song Yao Journal: Cancer Epidemiol Biomarkers Prev Date: 2015-05-19 Impact factor: 4.254
Authors: Amy S Gargis; Lisa Kalman; Meredith W Berry; David P Bick; David P Dimmock; Tina Hambuch; Fei Lu; Elaine Lyon; Karl V Voelkerding; Barbara A Zehnbauer; Richa Agarwala; Sarah F Bennett; Bin Chen; Ephrem L H Chin; John G Compton; Soma Das; Daniel H Farkas; Matthew J Ferber; Birgit H Funke; Manohar R Furtado; Lilia M Ganova-Raeva; Ute Geigenmüller; Sandra J Gunselman; Madhuri R Hegde; Philip L F Johnson; Andrew Kasarskis; Shashikant Kulkarni; Thomas Lenk; C S Jonathan Liu; Megan Manion; Teri A Manolio; Elaine R Mardis; Jason D Merker; Mangalathu S Rajeevan; Martin G Reese; Heidi L Rehm; Birgitte B Simen; Joanne M Yeakley; Justin M Zook; Ira M Lubin Journal: Nat Biotechnol Date: 2012-11 Impact factor: 54.908
Authors: Neil I Weisenfeld; Shuangye Yin; Ted Sharpe; Bayo Lau; Ryan Hegarty; Laurie Holmes; Brian Sogoloff; Diana Tabbaa; Louise Williams; Carsten Russ; Chad Nusbaum; Eric S Lander; Iain MacCallum; David B Jaffe Journal: Nat Genet Date: 2014-10-19 Impact factor: 38.330
Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis Journal: Nature Date: 2015-10-01 Impact factor: 49.962
Authors: Andrew R Carson; Erin N Smith; Hiroko Matsui; Sigrid K Brækkan; Kristen Jepsen; John-Bjarne Hansen; Kelly A Frazer Journal: BMC Bioinformatics Date: 2014-05-02 Impact factor: 3.169
Authors: William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham Journal: Genome Biol Date: 2016-06-06 Impact factor: 13.583
Authors: Coren A Milbury; James Creeden; Wai-Ki Yip; David L Smith; Varun Pattani; Kristi Maxwell; Bethany Sawchyn; Ole Gjoerup; Wei Meng; Joel Skoletsky; Alvin D Concepcion; Yanhua Tang; Xiaobo Bai; Ninad Dewal; Pei Ma; Shannon T Bailey; James Thornton; Dean C Pavlick; Garrett M Frampton; Daniel Lieber; Jared White; Christine Burns; Christine Vietz Journal: PLoS One Date: 2022-03-16 Impact factor: 3.240
Authors: Edmond Wonkam-Tingang; Isabelle Schrauwen; Kevin K Esoh; Thashi Bharadwaj; Liz M Nouel-Saied; Anushree Acharya; Abdul Nasir; Suzanne M Leal; Ambroise Wonkam Journal: Exp Biol Med (Maywood) Date: 2021-03-09