Jennifer L Yen1, Sarah Garcia2,3, Aldrin Montana2, Jason Harris2, Stephen Chervitz2, Massimo Morra2, John West2, Richard Chen2, Deanna M Church2,3.
Abstract
BACKGROUND: Clinical genomic testing depends on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is cross-identifying variants called at their genomic positions with resources that rely on transcript- or protein-based descriptions.
Keywords: Annotation; Clinical testing; Genomics; HGVS; Precision medicine; Sequencing; Syntax; Variant
Year: 2017 PMID: 28122645 PMCID: PMC5267466 DOI: 10.1186/s13073-016-0396-7
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Factors affecting HGVS syntax generation. a The transcript alignment approach can change the transcript's exon structure. Aligning a cDNA sequence to the genome with Splign and with BLAT places one exon of the CARD9 gene 10 kb apart in the two alignments (green arrow). b The transcript accession can affect which transcript a variant is associated with, and therefore its HGVS syntax. Here, the identified GNAS variant lies outside the clinically relevant transcript. Small differences between transcript versions may also change the coding sequence. c Within nucleotide repeats, the justification (5′ or 3′ placement) of a variant can change its reported position. d Transcript annotation directly determines how a variant is translated into a protein-level expression; incorrect transcript annotation can lead to incorrect protein syntax. e Representation of the variant in a particular expression: the same coding or protein variant can be written in different ways.
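Panels c and e reflect two HGVS normalization rules: within a repeat, a variant is placed at its most 3′ possible position, and an insertion that copies the bases immediately 5′ of it is described as a duplication. The Python sketch below illustrates both rules under simplifying assumptions that are not taken from the paper: the reference is a plain string read in transcript orientation, positions are 1-based on that string rather than real c. coordinates, and only pure deletions and pure insertions are handled (no delins, intronic offsets, or protein consequences). The function names are hypothetical.

```python
def shift_deletion_3prime(ref: str, start: int, end: int) -> tuple[int, int]:
    """Slide a deletion of ref[start..end] (1-based, inclusive) as far 3'
    as possible while the resulting sequence stays the same."""
    # Moving the window one base right leaves the result unchanged whenever
    # the base leaving on the left equals the base entering on the right
    # (ref is 0-based, so ref[start - 1] is 1-based position `start`).
    while end < len(ref) and ref[start - 1] == ref[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(ref: str, start: int, end: int) -> str:
    """HGVS-style deletion description after 3' shifting (no prefix)."""
    start, end = shift_deletion_3prime(ref, start, end)
    return f"{start}del" if start == end else f"{start}_{end}del"


def describe_insertion(ref: str, after: int, seq: str) -> str:
    """Describe inserting `seq` between 1-based positions `after` and `after + 1`."""
    # Rotate the inserted bases 3' while the next reference base equals the
    # first inserted base; every step yields the same alternate sequence.
    while after < len(ref) and ref[after] == seq[0]:
        seq = seq[1:] + seq[0]
        after += 1
    n = len(seq)
    # If the inserted bases copy the n bases just 5' of the insertion point,
    # HGVS calls the event a duplication of those bases.
    if after >= n and ref[after - n:after] == seq:
        return f"{after}dup" if n == 1 else f"{after - n + 1}_{after}dup"
    return f"{after}_{after + 1}ins{seq}"


# Illustrative, made-up sequences (not variants from the paper).
print(describe_deletion("AGTGTGAC", 2, 3))      # 5_6del
print(describe_insertion("ACTTTG", 2, "T"))     # 5dup
print(describe_insertion("ACGACGT", 3, "ACG"))  # 4_6dup
print(describe_insertion("ACGT", 2, "TT"))      # 2_3insTT
```

Applied to the exemplar table, these two rules appear consistent with pairs such as c.2339_2340insGGGCTCCCC versus c.2331_2339dup, although confirming that would require the actual transcript sequence.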
Fig. 2 Methodology of HGVS syntax comparison. To compare two HGVS expressions in our dataset, we applied the following assessments. a The query transcript must match the reference transcript; if the accession or version does not match, the variant is not assessed. b If the two expressions are identical as written, the match is “exact”. c If the two expressions differ in syntax but describe the same variant, the match is “equivalent”. If neither expression is an alternative representation of the other, the match is “incorrect”.
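The decision procedure in Fig. 2 can be sketched in Python as below. The transcript check and the exact-match test follow the caption directly; the equivalence test is a stand-in, because the caption does not say how equivalence was determined. Here it is approximated by a few hypothetical text normalizations drawn from the syntax differences in the exemplar table (stray spaces, bases listed after del/dup, COSMIC-style GA>TT, `*` versus Ter, and p.Arg317Arg versus p.Arg317=). The accession strings in the usage lines are placeholders.

```python
import re

# Hypothetical normalization rules approximating "equivalent"; each rewrites
# one syntax variant into a single canonical spelling.
_CODING_RULES = [
    (re.compile(r"([ACGT]{2,})>([ACGT]+)$"), r"delins\2"),  # "GA>TT" -> "delinsTT"
    (re.compile(r"(del|dup)[ACGT]+(?=ins|$)"), r"\1"),      # "dupA" -> "dup"
]
_PROTEIN_RULES = [
    (re.compile(r"\*"), "Ter"),                                # "p.Gln100*" -> "p.Gln100Ter"
    (re.compile(r"^p\.([A-Z][a-z]{2})(\d+)\1$"), r"p.\1\2="),  # "p.Arg317Arg" -> "p.Arg317="
]


def normalize(expr: str) -> str:
    expr = re.sub(r"\s+", "", expr)  # "c.1895 + 5" -> "c.1895+5"
    rules = _PROTEIN_RULES if expr.startswith("p.") else _CODING_RULES
    for pattern, repl in rules:
        expr = pattern.sub(repl, expr)
    return expr


def classify_match(query_tx: str, query: str, ref_tx: str, ref: str) -> str:
    """Classify a tool's HGVS expression against a reference expression."""
    # a: accession and version must both agree, otherwise skip the pair.
    if query_tx != ref_tx:
        return "not assessed"
    # b: identical strings.
    if query == ref:
        return "exact"
    # c: different spelling of the same change (approximated by normalize()).
    if normalize(query) == normalize(ref):
        return "equivalent"
    return "incorrect"


# "NM_A.1" / "NM_A.2" are placeholder accession.version strings.
print(classify_match("NM_A.1", "c.428dupA", "NM_A.1", "c.428dup"))   # equivalent
print(classify_match("NM_A.2", "c.428dup", "NM_A.1", "c.428dup"))    # not assessed
print(classify_match("NM_A.1", "p.*1143*", "NM_A.1", "p.Ter1143="))  # equivalent
```

Because the normalizations are purely textual, this sketch would still report a COSMIC-style insertion such as c.2339_2340insGGGCTCCCC as incorrect against c.2331_2339dup; deciding that equivalence requires the reference sequence.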
Fig. 3 Dataset composition. Number of variants and distribution of variant types in the Ground Truth, ClinVar, and COSMIC datasets. Because of transcript discrepancies, the number of variants evaluated may be smaller than the number of variants in the input set. SNV refers to single nucleotide variant.
Fig. 4 Summary of the Ground Truth set HGVS syntax assessment. a Fraction of unique transcript accessions and versions in the Ground Truth set that were available to SnpEff (snpeff), VEP (vep), and Variation Reporter (vr). If a transcript was not accessible to a tool, the variant could not be annotated with respect to that transcript. b Exact concordance of HGVS syntax at the coding (left) and protein (right) levels among the tools. c Accuracy of annotation across variants (total n = 121), shown as exact (turquoise) and equivalent (light turquoise) matches. Fractions are computed with respect to annotations on the relevant transcripts in the test set. d Accuracy of annotation by variant type across the tools. Variant types evaluated were deletions (del), deletion-insertions (indel), duplications (dup), insertions (ins), and single nucleotide variants (SNVs).
Fig. 5 ClinVar and COSMIC HGVS syntax assessment. a Overall concordance of syntax, by variant type, between the tools and ClinVar. b Overall concordance of syntax, by variant type, between the tools and COSMIC. All duplications were annotated as insertions in COSMIC. For both ClinVar and COSMIC, coding variants are shown in the upper panel and protein variants in the lower panel; bars represent the fraction of exact (turquoise) and equivalent (light turquoise) matches.
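The per-type fractions plotted in Figs. 4d and 5 can be tallied roughly as in the Python sketch below. It assumes, consistent with the captions of Figs. 3 and 4, that variants whose transcript could not be matched are left out of the denominator; the function name, input shape, and example data are hypothetical.

```python
from collections import defaultdict


def concordance_by_type(results):
    """Return {variant_type: (exact_fraction, equivalent_fraction)}.

    `results` is an iterable of (variant_type, match) pairs, where match is
    "exact", "equivalent", "incorrect", or "not assessed".
    """
    counts = defaultdict(lambda: {"exact": 0, "equivalent": 0, "assessed": 0})
    for variant_type, match in results:
        # Assumption: transcript mismatches never reach the denominator.
        if match == "not assessed":
            continue
        tally = counts[variant_type]
        tally["assessed"] += 1
        if match in ("exact", "equivalent"):
            tally[match] += 1
    return {
        variant_type: (t["exact"] / t["assessed"], t["equivalent"] / t["assessed"])
        for variant_type, t in counts.items()
    }


# Made-up example input, not data from the study.
example = [
    ("SNV", "exact"), ("SNV", "exact"), ("SNV", "equivalent"), ("SNV", "incorrect"),
    ("dup", "not assessed"), ("dup", "equivalent"),
]
print(concordance_by_type(example))
# {'SNV': (0.5, 0.25), 'dup': (0.0, 1.0)}
```

Stacking the two fractions per variant type gives the exact plus equivalent bars described in the captions.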
Exemplar variants demonstrating nomenclature discrepancies
| Variant type | ClinVar | COSMIC | SnpEff | VEP | VR | Preferred HGVS | Reference ID |
|---|---|---|---|---|---|---|---|
| Coding HGVS: variant type | |||||||
| Insertion | - | c.2339_2340insGGGCTCCCC | c.2331_2339dupGGGCTCCCC | c.2331_2339dupGGGCTCCCC | - | c.2331_2339dup | COSM12555* |
| Insertion | - | c.2262_2263ins14 | - | c.2262_2263insGGCATCTCAGCATC | - | c.2262_2263insGGCATCTCAGCATC | COSM5254274 |
| Deletion | c.1200-1delG | - | c.1200delG | c.1200delG | c.1201delAinsGA | c.1200-1delG | PTV021, rs63186960 |
| Deletion | c.1895+1_1895+4delGTGA | - | c.1895+5_1895+8delGTGA | c.1895+5_1895+8delGTGA | c.1895+9GTGAC>C | c.1895+5_1895+8delGTGA | PTV003, rs386834023 |
| Duplication | - | c.422_423insA | c.428dupA | c.428dupA | - | c.428dup | COSM4719972 |
| Indel | - | c.3141_3142GA>TT | c.3141_3142delGAinsTT | c.3141_3142delGAinsTT | - | c.3141_3142delinsTT | COSM4387531 |
| Indel | c.68-5_68-3delinsTT | - | c.68-5_68-3delCTCinsTT | c.68-5_68-3delCTCinsTT | c.68-5_68-3delCTCinsTT | c.68-5_68-3delinsTT | rs397516362 |
| SNV | c.1621A= | - | c.1621A>G | c.1621A>G | c.1621G= | c.1621G= | PTV099, rs2228006 |
| Protein HGVS: consequence | |||||||
| Frameshift | p.Arg227Lysfs | - | p.Arg227fs | p.Arg227LysfsTer31 | - | All are acceptable | rs80356649 |
| Frameshift | - | p.P1176fs*>46 | p.Pro1176fs | p.Pro1176AlafsTer117 | - | p.Pro1176fs or p.Pro1176AlafsTer117 | COSM5196763 |
| Frameshift | - | - | p.Glu238fs | p.Glu238ProfsTer9 | p.Phe237_Glu238insPro | p.Glu238fs or p.Glu238ProfsTer9 | PTV009 |
| In-frame insertion | - | p.Pro780_Tyr781insGlySerPro | p.Pro780_Tyr781insGlySerPro | p.Gly778_Pro780dup | - | p.Gly778_Pro780dup | COSM125551 |
| Synonymous | p.Arg317= | - | p.Arg317Arg | p.= | p.Arg317= | p.Arg317= | rs111033272 |
| Synonymous | - | p.*1143* | p.Ter1143Ter | p.= | - | p.Ter1143= | COSM3558732 |
| Stop gained | p.Gln100Ter | - | p.Gln100* | p.Gln100Ter | - | All are acceptable | rs119103276 |
| Extension | - | p.*1133L | p.Ter1133Leuext*? | p.Ter1133LeuextTer22 | - | p.Ter1133LeuextTer22 | COSM1569676 |
| In-frame insertion | - | - | p.Arg309_Arg310insArgArg | p.Arg310_Arg311dup | p.Arg311_Lys312insArgArg | p.Arg310_Arg311dup | PTV082 |
| In-frame deletion | - | p.T502_H505delTTGH | p.Thr502_His505del | p.Thr502_His505del | - | p.Thr502_His505del | COSM1163654 |
| In-frame deletion | - | - | p.Ala1111_Ala1119del | p.Ala1111_Ala1119del | p.Ala1119_Gly1120insAlaAlaAlaAlaAlaAlaAlaAlaAla | p.Ala1111_Ala1119del | PTV021 |
1 Known in My Cancer Genome as “G778_P780dup”
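Several protein-level discrepancies in the table are purely orthographic: COSMIC writes one-letter residue codes (p.P1176fs*>46, p.*1143*, p.T502_H505delTTGH), while the preferred forms use three-letter codes and Ter for the stop codon. The Python sketch below converts a small subset of one-letter expressions into three-letter HGVS. It is illustrative only: the function name, the accepted grammar, and the choice to drop COSMIC's frameshift suffix are assumptions, and it deliberately refuses cases it cannot express, such as a stop-loss that needs the new stop position for an ext expression.

```python
import re

# One-letter to three-letter amino acid codes; "*" (stop) is "Ter" in HGVS.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

_ONE_LETTER = re.compile(
    r"p\.(?P<ref>[A-Z*])(?P<pos>\d+)"       # first residue and position
    r"(?:_(?P<ref2>[A-Z*])(?P<pos2>\d+))?"  # optional range end
    r"(?P<kind>fs|del|dup|ins)?"            # event type (absent = substitution)
    r"(?P<res>[A-Z*]*)"                     # listed residues, if any
    r"(?P<rest>.*)"                         # anything else (e.g. COSMIC fs suffix)
)


def to_three_letter(expr: str) -> str:
    """Convert a simple one-letter protein expression to three-letter HGVS."""
    m = _ONE_LETTER.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a one-letter protein expression: {expr}")
    try:
        start = f"{AA3[m['ref']]}{m['pos']}"
        if m["ref2"]:
            start += f"_{AA3[m['ref2']]}{m['pos2']}"
        kind, res, rest = m["kind"], m["res"], m["rest"]
        if kind == "fs":
            # Keep the short form; the tool-specific suffix is dropped.
            return f"p.{start}fs"
        if rest:
            raise ValueError(f"unsupported syntax: {expr}")
        if kind in ("del", "dup"):
            # HGVS does not list the affected residues after del/dup.
            return f"p.{start}{kind}"
        if kind == "ins":
            if not res:
                raise ValueError(f"insertion without residues: {expr}")
            return f"p.{start}ins" + "".join(AA3[aa] for aa in res)
        if m["ref2"] or len(res) != 1:
            raise ValueError(f"unsupported substitution: {expr}")
        if res == m["ref"]:
            return f"p.{start}="  # synonymous change
        if m["ref"] == "*":
            # A stop-loss needs the new stop position (ext), which is absent.
            raise ValueError(f"cannot build an ext expression from: {expr}")
        return f"p.{start}{AA3[res]}"
    except KeyError as err:
        raise ValueError(f"unknown amino acid code in: {expr}") from err


for cosmic in ["p.P1176fs*>46", "p.*1143*", "p.T502_H505delTTGH"]:
    print(cosmic, "->", to_three_letter(cosmic))
# p.P1176fs*>46 -> p.Pro1176fs
# p.*1143* -> p.Ter1143=
# p.T502_H505delTTGH -> p.Thr502_His505del
```

For these three COSMIC entries the output matches a form listed in the Preferred HGVS column, whereas the COSMIC extension p.*1133L is rejected because the one-letter form omits the position of the new stop codon.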