Jordi Corominas¹, Sanne P Smeekens¹, Marcel R Nelen¹, Helger G Yntema¹,², Erik-Jan Kamsteeg¹,², Rolph Pfundt¹,², Christian Gilissen³.
Abstract
Massively parallel sequencing has become the predominant technique in genetic diagnostics and research. Many genetic laboratories have wrestled with the challenges of setting up genetic testing workflows based on an entirely new technology. The learning curve we went through as a laboratory brought growing pains as we gained new knowledge and expertise. Here we discuss some important mistakes made in our laboratory over 10 years of clinical exome sequencing, mistakes that have given us important new insights into how to adapt our working methods. We present these examples and the lessons we learned to help other laboratories avoid making the same mistakes.
Keywords: NGS data analysis; clinical exome; clinical variant interpretation; genetic diagnostics; next generation sequencing; whole exome sequencing
Year: 2022 PMID: 35191116 PMCID: PMC9541396 DOI: 10.1002/humu.24360
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.700
Figure 1. Issues encountered in data analysis. (a) IGV screenshot of the sequence alignment for a pathogenic coding variant in TCF20 that was initially not detected because, with alternate contigs included in the reference genome, the sequencing reads aligned to multiple locations. (b) Reanalysis of the same sample with alternate contigs excluded yielded unique alignments of the sequencing reads and detection of the variant. (c) Example of a coding exon in which a variant may be missed because the capture target (Agilent SureSelect v5) does not fully overlap the exonic region. (d) Percentage of coding bases (GENCODE Basic v.34lift37) that lie outside the capture targets, and that lie outside a 200 bp vicinity of the capture targets, for different enrichment methods. (e) Normalized coverage of capture targets (Agilent v5) for an exome sample when a heterogeneous reference cohort is used for CNV calling (CoNIFER). The number of CNV calls and the autosomal standard deviation (SD) are shown to illustrate the effect of a heterogeneous reference cohort. (f) The same sample analyzed with a more homogeneous reference cohort, showing reduced variation and fewer CNV calls. (g) UCSC Genome Browser gene view showing the gene structure and transcripts of CCDC141 for two GENCODE versions, highlighting how newly added coding exons can change variant interpretation. An exome variant is indicated in exon five of the transcript that is present only in GENCODE version 37. CNV, copy number variant; IGV, Integrative Genomics Viewer.
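Panel (d) reports the share of coding bases that fall outside the capture targets, with and without a 200 bp margin. A minimal sketch of that calculation follows, assuming half-open BED-style intervals on a single contig and reading "200 bp vicinity" as targets extended by 200 bp on each side. The function names and toy coordinates are hypothetical; this is not the authors' pipeline, and the numbers are not real GENCODE or SureSelect coordinates.

```python
from typing import List, Tuple

# Half-open [start, end) intervals on ONE contig, as in BED files (assumption).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bases(region: List[Interval], targets: List[Interval]) -> int:
    """Count bases of `region` covered by at least one interval in `targets`."""
    total = 0
    merged_targets = merge(targets)  # disjoint, so per-target overlaps never double count
    for rs, r_end in merge(region):
        for ts, te in merged_targets:
            total += max(0, min(r_end, te) - max(rs, ts))
    return total


def pct_coding_outside(coding: List[Interval], targets: List[Interval], pad: int = 0) -> float:
    """Percentage of coding bases not covered by targets extended by `pad` bp per side."""
    padded = [(max(0, s - pad), e + pad) for s, e in targets]
    coding_total = sum(e - s for s, e in merge(coding))
    outside = coding_total - overlap_bases(coding, padded)
    return 100.0 * outside / coding_total


# Toy coordinates (hypothetical, not real exons or capture targets).
coding_exons = [(1000, 1200), (5000, 5100)]
capture_targets = [(1050, 1200), (4700, 4850)]

print(pct_coding_outside(coding_exons, capture_targets))           # 50.0
print(pct_coding_outside(coding_exons, capture_targets, pad=200))  # 16.666... (50 of 300 bases)
```

A real analysis would key intervals by chromosome and use a dedicated interval library for speed; the nested loop here is kept for readability.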
Figure 2. Schematic representation of our interpretation workflow. Gray boxes on the left indicate the analysis of single nucleotide variants (SNVs; GATK calling), copy number variants (CNVs; CoNIFER, Krumm et al., 2012, and ExomeDepth, Plagnol et al., 2012), and “Other,” which includes interpretation of regions of homozygosity (Magi et al., 2014), uniparental disomy (Yauy et al., 2020), and repeat expansions in coding regions (Dolzhenko et al., 2017).
Figure 3. Issues encountered in data interpretation. (a) An obvious homozygous deletion in GPSM2 was incorrectly called as heterozygous by GATK, showing that visual inspection of the data is crucial for correct variant interpretation. (b) A pathogenic variant in the autosomal recessive (AR) gene MICU1 was detected together with a two-exon deletion in the same gene. (c) A nonsense variant in TCF4 was missed when filtering for de novo variants because the mother was a mosaic carrier (30%) of the variant (alignments sorted by base in IGV). (d) An isodicentric marker chromosome (q13.1) was detected in WES data as an ~8.4 Mb terminal gain of 15q11.1q13.1, showing that chromosome structure must be kept in mind when analyzing WES CNV data. (e) Pathogenic variants in control databases, such as the p.(Tyr735Cys) variant in DNMT3A, can be recognized by their overrepresentation in older individuals. (f) An 18 bp duplication in PHOX2B was not called by GATK but was detected on reanalysis prompted by the distinctive phenotype, underscoring the power of hypothesis-driven diagnostics. IGV, Integrative Genomics Viewer; WES, whole exome sequencing.
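Panel (c) describes a variant that was lost because the de novo filter required both parents to be wild type, while the mother actually carried it at about 30%. The sketch below contrasts such a strict filter with one that keeps parental alternate reads at low allele fraction for review. It assumes per-site reference and alternate read counts for the trio are already available; the class, function names, and thresholds are hypothetical illustrations, not the authors' workflow or validated cut-offs.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Reference and alternate read counts at one site in one individual."""
    ref: int
    alt: int

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def vaf(self) -> float:
        """Variant allele fraction; 0.0 when there is no coverage."""
        return self.alt / self.depth if self.depth else 0.0


def strict_de_novo(child: AlleleCounts, mother: AlleleCounts, father: AlleleCounts) -> bool:
    """Naive filter: child carries the variant and neither parent has a single alt read."""
    return child.alt > 0 and mother.alt == 0 and father.alt == 0


def classify_trio(child: AlleleCounts, mother: AlleleCounts, father: AlleleCounts,
                  min_depth: int = 20, min_alt_reads: int = 3,
                  het_min_vaf: float = 0.35) -> str:
    """Label a child variant, keeping low-fraction parental carriers for review.

    All thresholds are hypothetical defaults for illustration only.
    """
    if min(c.depth for c in (child, mother, father)) < min_depth:
        return "low_coverage"
    if child.alt < min_alt_reads:
        return "absent_in_child"
    parents = (mother, father)
    # A parent at or above het_min_vaf looks like an ordinary heterozygous carrier.
    if any(p.alt >= min_alt_reads and p.vaf >= het_min_vaf for p in parents):
        return "inherited"
    # Supported alt reads at lower VAF: possible parental mosaicism, so do not discard.
    if any(p.alt >= min_alt_reads for p in parents):
        return "possible_parental_mosaic"
    # Parents with fewer than min_alt_reads alt reads are treated as noise here.
    return "de_novo"


child = AlleleCounts(ref=40, alt=38)
mother = AlleleCounts(ref=42, alt=18)  # VAF 0.30, similar to the mosaic mother in panel (c)
father = AlleleCounts(ref=50, alt=0)

print(strict_de_novo(child, mother, father))  # False: the strict filter drops the variant
print(classify_trio(child, mother, father))   # possible_parental_mosaic
```

A fixed VAF cut-off is the simplest possible choice; a statistical test on the parental read counts would separate low-level mosaicism from sequencing noise more rigorously.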