Matthew N Wakeling, Thomas W Laver, Kevin Colclough, Andrew Parish, Sian Ellard, Emma L Baple.
Abstract
Multiple Nucleotide Variants (MNVs) are miscalled by the most widely utilised next generation sequencing (NGS) analysis pipelines, creating a risk of missing diagnoses that would previously have been made by standard Sanger (dideoxy) sequencing. These variants, which should be treated as a single insertion-deletion mutation event, are commonly called as separate Single Nucleotide Variants (SNVs). This can result in misannotation, incorrect amino acid predictions and potentially false positive and false negative diagnostic results. The risk will increase as confirmatory Sanger sequencing of SNVs ceases to be standard practice. Using simulated data and re-analysis of sequencing data from a diagnostic targeted gene panel, we demonstrate that the widely adopted GATK best practices pipeline miscalls MNVs and that alternative tools can call these variants correctly. Adopting calling methods that annotate MNVs correctly would solve the problem for individual laboratories; however, GATK best practices are the basis for important public resources such as the gnomAD database. We suggest that integrating a solution into these guidelines would be the optimal approach.
Keywords: GATK; GnomAD; genetic testing; multi nucleotide variants; next generation sequencing; variant calling
Year: 2019 PMID: 31976378 PMCID: PMC6957021 DOI: 10.12688/wellcomeopenres.15420.2
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Figure 1. Diagram illustrating how Multiple Nucleotide Variants will be misannotated if incorrectly treated as separate variants.
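The effect that Figure 1 describes can be reproduced with a short script. The following is a minimal sketch, not the authors' code: it assumes the standard genetic code and codons written on the coding strand, and the function name `per_snv_and_combined` is hypothetical. It translates each single-base change on its own (as separate SNV calls would be annotated) and then the codon with all changes applied together.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def per_snv_and_combined(wild, variant):
    """Translate a codon change two ways (hypothetical helper).

    Returns (singles, combined):
      singles  - amino acid produced by each changed base applied alone
      combined - amino acid produced when all changes are applied together
    """
    singles = []
    for i, (w, v) in enumerate(zip(wild, variant)):
        if w != v:
            # Apply only this one base change to the wild-type codon.
            single_codon = wild[:i] + v + wild[i + 1:]
            singles.append(CODON_TABLE[single_codon])
    return singles, CODON_TABLE[variant]


if __name__ == "__main__":
    # GCC (Ala) changed to TTC: each SNV alone gives Ser or Val,
    # but the combined codon encodes Phe.
    print(per_snv_and_combined("GCC", "TTC"))  # (['S', 'V'], 'F')
```

The one-letter codes are used for brevity; `*` denotes a stop codon.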
Simulated Multiple Nucleotide Variants within the HNF4A gene.
Variants are described according to Human Genome Variation Society sequence variation nomenclature guidelines [17].
| Variant | Genome position (GRCh37) | Nucleotide position | Codon | Wild-type codon | Variant codon |
|---|---|---|---|---|---|
| 1 | 20:43052669_43052671 | NM_175914:c.838_840 | p.Leu280 | CTG | TTC |
| 2 | 20:43053017_43053019 | NM_001030004:c.1186_1188 | p.*396 | TAA | TGG |
| 3 | 20:43056977_43056979 | NM_175914:c.1066_1068 | p.Ser356 | TCC | AGC |
| 4 | 20:43058207_43058209 | NM_175914:c.1261_1263 | p.Ser421 | TCT | TGA |
| 5 | 20:43058219_43058221 | NM_175914:c.1273_1275 | p.Lys425 | AAG | AGT |
Simulated Multiple Nucleotide Variants within the HNF4A gene as annotated by GATK best practices.
| Variant | Wild-type codon | Variant codon | GATK best practices, call 1 | GATK best practices, call 2 | Correct annotation | Likely implication‡ |
|---|---|---|---|---|---|---|
| 1 | CTG | TTC | c.838C>T | c.840G>C | c.838_840delinsTTC | False negative |
| 2 | TAA | TGG | p.*396* | p.*396* | p.*396Trpext*26 | False negative |
| 3 | TCC | AGC | c.1066T>A | c.1067C>G | c.1066_1067delinsAG | False positive |
| 4 | TCT | TGA | c.1262C>G | c.1263T>A | c.1262_1263delinsGA | False negative |
| 5 | AAG | AGT | c.1274A>G | c.1275G>T | c.1274_1275delinsGT | False positive or |
‡Based on testing for dominant acting heterozygous, pathogenic loss of function variants.
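The relationship between the separate SNV descriptions and the single delins description in the table above can be sketched in code. This is an illustrative simplification rather than a full HGVS implementation: it assumes a codon on the coding strand with a known c. position for its first base, no exon boundary inside the codon, and substitutions only. The function name `describe_changes` is hypothetical.

```python
def describe_changes(cdna_start, wild, variant):
    """Describe a codon change as separate SNVs and as one merged variant.

    cdna_start is the c. position of the first base of the codon.
    Returns (snvs, merged), where merged spans the first to the last
    changed base, including any unchanged base in between.
    """
    diffs = [i for i, (w, v) in enumerate(zip(wild, variant)) if w != v]
    if not diffs:
        raise ValueError("wild-type and variant codons are identical")

    # One substitution description per changed base.
    snvs = [f"c.{cdna_start + i}{wild[i]}>{variant[i]}" for i in diffs]

    first, last = diffs[0], diffs[-1]
    if first == last:
        merged = snvs[0]  # a single change stays a plain substitution
    else:
        inserted = variant[first:last + 1]
        merged = f"c.{cdna_start + first}_{cdna_start + last}delins{inserted}"
    return snvs, merged


if __name__ == "__main__":
    # Simulated variant 1: codon c.838_840, CTG changed to TTC.
    print(describe_changes(838, "CTG", "TTC"))
    # (['c.838C>T', 'c.840G>C'], 'c.838_840delinsTTC')
```

For variant 1 the merged description spans all three bases because the unchanged middle base lies between the two substitutions.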
Multiple Nucleotide Variants found in the re-analysed data from the diagnostic panel to be incorrectly annotated as separate variants.
| Gene | Wild-type codon | Variant codon | GATK best practices, call 1 | GATK best practices, call 2 | Correct annotation |
|---|---|---|---|---|---|
| | GCC | TTC | p.Ala752Val | p.Ala752Ser | p.Ala752Phe |
| | GAT | TCT | p.Asp615Ala | p.Asp615Tyr | p.Asp615Ser |
| | GAG | AGG | p.Glu421Gly | p.Glu421Lys | p.Glu421Arg |
| | TAC | CAA | p.Tyr61* | p.Tyr61His | p.Tyr61Gln |
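Cases like those in the table above arise when two SNV calls fall within a few bases of each other. One way a laboratory might flag such calls for manual review is sketched below. This is a hypothetical heuristic, not the method used in the paper: the `Snv` record, the function `group_nearby_snvs` and the `max_gap` default are all assumptions, and the sketch does not check that the calls lie on the same haplotype or in the same codon.

```python
from typing import NamedTuple


class Snv(NamedTuple):
    """Minimal SNV record (hypothetical; real callers emit VCF records)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def group_nearby_snvs(calls, max_gap=2):
    """Return groups of two or more SNVs on one chromosome whose
    consecutive positions are at most max_gap bases apart."""
    groups, current = [], []
    for call in sorted(calls, key=lambda c: (c.chrom, c.pos)):
        # Close the running group when the chromosome changes or the gap is too big.
        if current and (call.chrom != current[-1].chrom
                        or call.pos - current[-1].pos > max_gap):
            if len(current) > 1:
                groups.append(current)
            current = []
        current.append(call)
    if len(current) > 1:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Illustrative positions only, not taken from the study data.
    calls = [Snv("20", 100, "C", "T"), Snv("20", 102, "G", "C"),
             Snv("20", 500, "A", "G")]
    for group in group_nearby_snvs(calls):
        print([f"{c.chrom}:{c.pos}{c.ref}>{c.alt}" for c in group])
```

Any group flagged this way would still need phase and reading-frame checks before being reported as a single MNV.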