Peter J Freeman1, Reece K Hart2,3, Liam J Gretton4, Anthony J Brookes1, Raymond Dalgleish1.
Abstract
The Human Genome Variation Society (HGVS) variant nomenclature is widely used to describe sequence variants in scientific publications, clinical reports, and databases. However, the HGVS recommendations are complex, and this often results in inaccurate variant descriptions being reported. The open-source hgvs Python package (https://github.com/biocommons/hgvs) provides a programmatic interface for parsing, manipulating, formatting, and validating variants according to the HGVS recommendations, but does not provide a user-friendly Web interface. We have developed a Web-based variant validation tool, VariantValidator (https://variantvalidator.org/), which utilizes the hgvs Python package and provides additional functionality to assist users who wish to accurately describe and report sequence-level variations that are compliant with the HGVS recommendations. VariantValidator was designed to guide users through the intricacies of the HGVS nomenclature: if the user makes a mistake, VariantValidator automatically corrects it where it can, and provides helpful guidance where it cannot. In addition, VariantValidator can interconvert genomic variant descriptions in HGVS and Variant Call Format with a degree of accuracy that surpasses most competing solutions.
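To make the HGVS-to-VCF interconversion mentioned above concrete, the following is a minimal sketch for the simplest case only: a single-base genomic substitution on a RefSeq chromosome accession. It is not how VariantValidator or the hgvs package implement conversion; the function name `hgvs_g_substitution_to_vcf` and the example description are hypothetical, and the sketch does not check the reference base against any sequence. Insertions and deletions are deliberately left out, because VCF adds an anchoring base and the two formats conventionally place ambiguous indels at opposite ends of a repeat.

```python
import re

# Matches only simple genomic substitutions on human chromosome accessions,
# e.g. NC_000007.13:g.100A>G (a hypothetical description used for illustration).
_SUBSTITUTION = re.compile(
    r"^NC_0000(?P<chrom>\d{2})\.\d+:g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def hgvs_g_substitution_to_vcf(description):
    """Return (CHROM, POS, REF, ALT) for a simple g. substitution.

    Illustrative only: no reference sequence lookup is performed, so an
    incorrect reference base in the input is not detected.
    """
    match = _SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple genomic substitution: {description!r}")

    number = int(match["chrom"])
    if not 1 <= number <= 24:
        raise ValueError(f"unexpected chromosome accession in {description!r}")

    # NC_000023 and NC_000024 are the X and Y chromosomes respectively.
    chrom = {23: "X", 24: "Y"}.get(number, str(number))
    return chrom, int(match["pos"]), match["ref"], match["alt"]


print(hgvs_g_substitution_to_vcf("NC_000007.13:g.100A>G"))
```

Both HGVS g. coordinates and VCF POS values are 1-based, which is why the position passes through unchanged for a substitution.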
Keywords: HGVS variant nomenclature; VCF; reference sequences; sequence variants; sequence variation; validation; variant call format
Year: 2017 PMID: 28967166 PMCID: PMC5765404 DOI: 10.1002/humu.23348
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
Figure 1. Mapping of variants onto alternative transcripts. Submitted variant descriptions are automatically mapped, via the selected genome build (GRCh38), onto all other transcripts that overlap the same genomic position. In this example, NM_182763.2:c.688+403C>T, which is intronic with respect to MCL1 transcript variant 2 mRNA, is mapped to an exonic variant in MCL1 transcript variant 1 mRNA, NM_021960.4:c.740C>T. The same initial variant description also maps to an exonic variant in MCL1 transcript variant 3 mRNA, NM_001197320.1:c.281C>T.
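The idea that one genomic position can be intronic in one transcript and exonic in another can be sketched with a toy exon model. Everything below is hypothetical: the transcript names, the exon coordinates, and the helper `classify_position` are invented for illustration and do not represent MCL1 or any real alignment data.

```python
def classify_position(genomic_pos, exons):
    """Classify a genomic position relative to one transcript.

    `exons` is a list of (start, end) genomic intervals, inclusive.
    Returns "exonic", "intronic", or "outside".
    """
    if any(start <= genomic_pos <= end for start, end in exons):
        return "exonic"
    transcript_start = min(start for start, _ in exons)
    transcript_end = max(end for _, end in exons)
    if transcript_start < genomic_pos < transcript_end:
        return "intronic"
    return "outside"


# Hypothetical transcripts of the same gene: the second one skips the middle exon.
TOY_TRANSCRIPTS = {
    "toy_tx_1": [(100, 200), (300, 400), (500, 600)],
    "toy_tx_2": [(100, 200), (500, 600)],
}

for name, exons in TOY_TRANSCRIPTS.items():
    print(name, classify_position(350, exons))
```

A real mapping would additionally translate the genomic position into each transcript's c. coordinate, which requires the transcript-to-genome alignments that this sketch omits.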
Figure 2. Variant descriptions at exon/intron boundaries. This illustrates how a three‐base deletion in the COL1A2 gene at the junction of the 3′ end of exon 19 with the adjacent intron might be described in two different ways in the context of the RefSeq transcript reference sequence NM_000089.3. Description A shows that the three deleted bases can be described at position NM_000089.3:c.1033_1035 where the deleted bases are GTT, but Description B shows that the variant can be normalized and described at position NM_000089.3:c.1035_1035+2 where the deleted bases are TGT. The latter description corresponds with the genomic variant description NC_000007.13:g.94039133_94039135delTGT. Formally, intronic variants described in the context of a transcript reference sequence must be accompanied by a genomic reference sequence to allow full verification of the variant. This is illustrated by Description C.
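The normalization from Description A to Description B follows the HGVS convention of reporting an ambiguous deletion at its most 3′ position. The sketch below shows that shifting step on a toy sequence. The five bases GTTGT are implied by the caption (deleting GTT or TGT gives the same result); the flanking bases, the 0-based indices, and the function name `shift_deletion_3prime` are assumptions made for illustration, and this is not the hgvs package's normalizer.

```python
def shift_deletion_3prime(seq, start, end):
    """Shift a deletion of seq[start:end] as far 3' as possible.

    Indices are 0-based with an exclusive end. The deletion can move one
    base to the right whenever the base leaving the window on the left
    equals the base entering it on the right, because the sequence that
    remains after the deletion is then unchanged.
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


# Toy context: "CC" and "AAC" are hypothetical flanks around GTTGT.
toy_seq = "CCGTTGTAAC"

start, end = shift_deletion_3prime(toy_seq, 2, 5)  # start from the GTT deletion
print(start, end, toy_seq[start:end])
```

In the toy sequence the deletion moves two bases to the right, from GTT to TGT, mirroring the two-base shift between c.1033_1035 and c.1035_1035+2 in the caption.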
Mutalyzer transcript designations do not correspond with the RefSeq transcript sequence definitions
| RefSeq accessions and versions for | RefSeq sequence definitions | Mutalyzer transcript designations |
|---|---|---|
| NM_022356.3 | Homo sapiens prolyl 3‐hydroxylase 1 (P3H1), transcript variant 1, mRNA. | LEPRE1_v003 |
| NM_001146289.1 | Homo sapiens prolyl 3‐hydroxylase 1 (P3H1), transcript variant 2, mRNA. | LEPRE1_v002 |
| NM_001243246.1 | Homo sapiens prolyl 3‐hydroxylase 1 (P3H1), transcript variant 3, mRNA. | LEPRE1_v001 |