Stephen E Lincoln, Tina Hambuch, Justin M Zook, Sara L Bristow, Kathryn Hatchell, Rebecca Truty, Michael Kennemer, Brian H Shirts, Andrew Fellowes, Shimul Chowdhury, Eric W Klee, Shazia Mahamdallie, Megan H Cleveland, Peter M Vallone, Yan Ding, Sheila Seal, Wasanthi DeSilva, Farol L Tomson, Catherine Huang, Russell K Garlick, Nazneen Rahman, Marc Salit, Stephen F Kingsmore, Matthew J Ferber, Swaroop Aradhya, Robert L Nussbaum.
Abstract
PURPOSE: To evaluate the impact of technically challenging variants on the implementation, validation, and diagnostic yield of commonly used clinical genetic tests. Such variants include large indels, small copy-number variants (CNVs), complex alterations, and variants in low-complexity or segmentally duplicated regions.
Year: 2021 PMID: 34007000 PMCID: PMC8460443 DOI: 10.1038/s41436-021-01187-w
Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822
Fig. 1 Technically challenging variant types.
Variants were categorized as technically challenging, or not, based on these six criteria. Note that some variants could be considered challenging for multiple reasons (e.g., a single-exon deletion within a segmentally duplicated region). Examples provided are variants observed in the prevalence analysis. Detailed criteria are provided in the Supplemental Methods. CNV copy-number variant, Indels insertions or deletions, NGS next-generation sequencing, Segdup segmental duplication, SNVs single-nucleotide variants, STR short tandem repeat.
Proof of concept and sensitivity study results.
| Study | Interlaboratory pilot study^a | | | | | | | | | | Sensitivity study^b | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Workflow | 1A | 1B | 2 | 3 | 4 | 5 | 6 | 7A | 7B | 8 | 1B | |||
| Sequencing | IPE | IPE | IPE | IPE | IPE | IPE | IPE | IPE | IPE | Ion | IPE | |||
| Targeting | Hyb | Hyb | Hyb | Hyb | Hyb | Hyb | GS | Hyb | Hyb | Amp | Hyb | |||
| Informatics | CS | CS | CS | TP | CS | CS | TP | EV | EV | EV | CS | |||
| Samples | Syn ( | Syn ( | Syn ( | Syn ( | Syn ( | Syn ( | Syn ( | Syn ( | Syn ( | Syn ( | GIAB ( | Syn ( | Ref ( | Clinical ( |
| SNVs | 15/15 | 15/15 | 15/15 | 11/11 | 15/15 | 12/12 | 15/15 | 11/11 | 15/15 | 10/10 | 434/434 | 1/1 | 1/1 | 18/18 |
| Short indel | 11/11 | 11/11 | 11/11 | 10/10 | 11/11 | 11/11 | 11/11 | 36/36 | 22/22 | 43/43 | 8/8 | |||
| Large indel | 4/4 | 4/4 | 4/4 | 4/4 | 4/4 | 4/4 | 8/8 | 6/6 | ||||||
| Segdup-associated | 3/3 | 3/3 | 3/3 | 2/2 | ||||||||||
| Low complexity | 4/4 | 4/4 | 4/4 | 4/4 | 4/4 | 3/3 | 6/6 | 2/2 | 1/1 | |||||
| Complex and small CNV | 2/2 | 2/2 | 2/2 | 3/3 | 3/3 | 2/2 | ||||||||
| CNV ≥2 exon | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 4/4 | 1/1 | ||
Amp amplicon sequencing, CNV copy-number variant, CS custom software, EV software provided by the sequencing equipment vendor, GIAB Genome in a Bottle, GS genome sequencing, Hyb hybridization capture, IPE Illumina Paired-end, indel insertion or deletion, Ion Ion Torrent, N/A not applicable, NGS next-generation sequencing, Ref reference specimens from public biobanks, Segdup segmental duplication, SNVs single-nucleotide variants, Syn synthetic controls, TP third-party software.
^a For the interlaboratory pilot study, the performance of 10 NGS workflows used by seven collaborating laboratories is shown for variants in two synthetic control specimens. In each data cell, the denominator is the number of variants within each assay’s target regions and the numerator is the number of these variants that were detected. Normal font indicates 100% observed sensitivity. Bold font indicates an observed limitation. Italics indicate that no study variants were present in regions interrogated by the assay. Details of each of the 10 workflows and the variants are provided in Tables S1 and S2. All workflows included bioinformatics methods to detect SNVs and small indels, and half (workflows 1A, 1B, 2, 6, and 7B) included additional methods to improve sensitivity for large indels and complex variants. CNVs were not included in this study. Workflow 1A had previously detected all of these variants in patients and was included primarily to validate the construction of the synthetic controls and to allow the comparison of synthetic and patient specimen data for the same variants (Figure S1). Workflow 1B corresponds to Fig. 2 and was an evolution of 1A, albeit with substantial differences in the variant calling algorithms used (Table S2, Fig. 2).
^b For the sensitivity study, performance is shown for variants in samples from each source. Positive controls with technically challenging variants were difficult to obtain, requiring a large number of reference specimens and additional synthetic controls. A list of samples used in this study is provided in Table S5.
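The table cells above report sensitivity as detected/total counts. As a minimal sketch of how such cells can be read, the Python below parses a cell and computes the observed sensitivity. It also computes a one-sided exact (Clopper-Pearson) lower bound for the case where every variant was detected. That bound is an illustrative assumption and is not part of the source analysis; the function names are hypothetical. It is included only to show why 15/15 and 434/434 both read as 100% observed sensitivity yet carry very different statistical weight.

```python
def parse_cell(cell):
    """Parse a 'detected/total' table cell; return None for blank or N/A cells."""
    text = cell.strip()
    if not text or text.upper() == "N/A":
        return None
    detected, total = text.split("/")
    return int(detected), int(total)


def observed_sensitivity(detected, total):
    """Fraction of in-target variants that the workflow detected."""
    return detected / total


def lower_bound_when_all_detected(total, alpha=0.05):
    """One-sided exact lower confidence bound for the case detected == total.

    Assumption for illustration only: solves p**total == alpha for p,
    which is the Clopper-Pearson bound when no variants were missed.
    """
    return alpha ** (1.0 / total)


# Two cells taken from the SNVs row of the table above.
for cell in ("15/15", "434/434"):
    detected, total = parse_cell(cell)
    print(
        cell,
        observed_sensitivity(detected, total),
        round(lower_bound_when_all_detected(total), 3),
    )
```

Under this assumed bound, a small denominator leaves considerable room for undetected limitations, which is consistent with the footnote's remark that positive controls for challenging variants were difficult to obtain.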
Fig. 2 Variant calling process used in prevalence study.
A next-generation sequencing (NGS) workflow designed to detect a wide variety of variant types was used in our prevalence study, sensitivity study, and also our pilot study (workflow 1B). NGS reads are aligned to a modified reference genome and multiple variant callers are then applied. Follow-up assays are used to confirm potential false positives, to determine the exact sequence of complex variants, and to resolve the location of variants within segmental duplications. For details see “Materials and Methods.” CNVs copy-number variants, GATK Genome Analysis Toolkit, Indels insertions or deletions, LR-PCR long-range polymerase chain reaction, MEIs mobile element insertions, NGS next-generation sequencing, QC quality control, Segdup segmental duplication, SNVs single-nucleotide variants.
Fig. 3 Prevalence of technically challenging variants.
For each clinical area, we evaluated the population of pathogenic or likely pathogenic (P/LP) variants that met one or more of our definitions of technically challenging (Fig. 1). Blue bars indicate the prevalence of challenging variants among all reported P/LP findings. The heatmap (green cells) indicates the relative contribution of each variant class to this result. Gray bars indicate the fraction of unique variants that were technically challenging (i.e., when the same variant appeared in more than one patient, it was counted only once in this analysis but was counted multiple times in the prevalence analysis [blue bars]). The differences between these two fractions result from a small number of relatively common P/LP variants that are (e.g., in carrier or neurology testing) or are not (e.g., preventive testing) technically challenging. A total of 102,085 patients with P/LP variants in 1,217 genes are represented in this data set. Challenging variants of most types were observed across clinical areas. CNV copy-number variant, Indel insertion or deletion.
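The difference between the two fractions in this caption (per reported finding versus per unique variant) can be sketched in a few lines of Python. The variant identifiers and counts below are invented toy data, and the function name is hypothetical; nothing here reproduces the study's actual data set.

```python
def challenging_fractions(findings):
    """Return (fraction of reported findings, fraction of unique variants)
    that are technically challenging.

    findings: list of (variant_id, is_challenging) tuples, one tuple per
    reported finding, so a variant seen in three patients appears three times.
    """
    findings = list(findings)
    per_finding = sum(1 for _, challenging in findings if challenging) / len(findings)

    # Collapse repeated observations so each variant is counted once.
    unique = dict(findings)
    per_unique = sum(1 for challenging in unique.values() if challenging) / len(unique)
    return per_finding, per_unique


# Toy data (hypothetical): one challenging variant seen in three patients,
# two non-challenging variants seen once each.
toy = [("A", True), ("A", True), ("A", True), ("B", False), ("C", False)]
per_finding, per_unique = challenging_fractions(toy)
print(per_finding, per_unique)
```

In this toy case the per-finding fraction exceeds the per-unique fraction because the recurrent variant is challenging, the same mechanism the caption describes for carrier and neurology testing.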
Fig. 4 Breakdown of clinical variants.
(a) Size distribution of pathogenic/likely pathogenic (P/LP) indels and copy-number variants (CNVs), whether technically challenging or not. Sixty-four percent of these variants were 1–5 bp in size (not shown). Single-nucleotide variants (SNVs), FMR1 trinucleotide repeat expansions, and variants in the CFTR poly-T/TG site are not included. (b) Next-generation sequencing (NGS) coverage of P/LP clinical variant locations in the gnomAD database of 125,748 exome sequences (version 2.2.1). The gnomAD genome sequences were not used in this analysis. The average gnomAD exome coverage was 76× at these clinical variant sites (much lower than the 660× average for our clinical testing). The rate at which clinical variant locations had less than the indicated degree of coverage in the gnomAD exomes was calculated at the specific thresholds shown: 5.1% of locations had no coverage (0×), 6.7% had less than 10× coverage (including 0×), and 10.1% had less than 20×. CNVs were not included in this analysis. (c) Comparison of P/LP clinical variant sites with the Genome in a Bottle (GIAB) benchmark regions using the version 3.3.2 and 4.1 GIAB data sets. Many (9.7%) of these variants were outside of the benchmark regions in all seven GIAB samples (“Not Any” category) and 15.1% of these variants were outside of these regions in at least one of the seven samples (“Not all”). However, the newer version 4.1 GIAB data, available only for one of the GIAB samples at this time, substantially improves this situation. CNVs were not included in this analysis.
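The cumulative coverage thresholds in panel (b) can be illustrated with a short Python sketch. The depth values are invented toy numbers, not gnomAD data, and the function name is hypothetical. The sketch only shows that each threshold is cumulative, so sites with 0× coverage are also counted in the "less than 10×" and "less than 20×" categories.

```python
def coverage_shortfall(depths, thresholds=(10, 20)):
    """Return the fraction of sites with zero coverage and the fraction
    below each threshold (thresholds are cumulative and include 0x sites).

    depths: per-site mean coverage values, one per clinical variant location.
    """
    n = len(depths)
    result = {"0x": sum(1 for d in depths if d == 0) / n}
    for t in thresholds:
        result[f"<{t}x"] = sum(1 for d in depths if d < t) / n
    return result


# Toy per-site depths (hypothetical values for illustration only).
toy_depths = [0, 0, 5, 12, 25, 40, 80, 100, 150, 300]
print(coverage_shortfall(toy_depths))
```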