Reece K Hart, Rudolph Rico, Emily Hare, John Garcia, Jody Westbrook, Vincent A Fusaro.
Abstract
Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available, comprehensive programming library exists for it. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describes sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics.
Year: 2014 PMID: 25273102 PMCID: PMC4287946 DOI: 10.1093/bioinformatics/btu630
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Using the hgvs package to project a variant in MCL1 from one transcript to another via GRCh37 chromosome 1. (a) An object representation of the result of parsing ‘NM_182763.2:c.688+403C>T’. Selected attributes are shown beneath. (b) A diagram of the MCL1 locus with five representations of a single variant. (c) Python code that demonstrates parsing, mapping between sequences, formatting and validating. Gray outline boxes enclose input, and the results appear immediately beneath. Circled numbers indicate a correspondence between the variants in (a) and code in (c). An SNV ❶, originally reported in the literature as NM_182763.2:c.688+403C>T (rs201430561), is projected onto chromosome 1 as variant ❸, and then projected onto an alternative transcript as variant ❹. The inferred protein changes of variants ❶ and ❹ are shown as protein variants ❷ and ❺. The results are formatted by ‘stringifying’ them using standard Python printing commands. Validation of a valid variant (281C>T; no error generated) and an error for an invalid variant (281A>T) are shown.
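To make the parse-then-stringify round trip in the caption concrete, the following is a minimal standalone sketch, not the hgvs package's own API. The names `SimpleVariant` and `parse_simple_variant` are hypothetical, and the regular expression is an assumption that covers only single-base substitutions with an optional intronic offset; the full HGVS grammar (deletions, insertions, UTR positions, protein variants) is far larger.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately narrow pattern: accepts only substitutions such
# as "NM_182763.2:c.688+403C>T". It is NOT the grammar used by the hgvs package.
_SUBSTITUTION = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+)"        # versioned accession, e.g. NM_182763.2
    r":(?P<type>[cgmnr])\."              # coordinate type, e.g. c (coding)
    r"(?P<base>\d+)"                     # base position
    r"(?P<offset>[+-]\d+)?"              # optional intronic offset, e.g. +403
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"  # reference and alternate bases
)


@dataclass(frozen=True)
class SimpleVariant:
    """Illustrative container for the components of a substitution variant."""
    ac: str
    type: str
    base: int
    offset: int
    ref: str
    alt: str

    def __str__(self):
        # Re-emit the variant in the same textual form it was parsed from.
        offset = f"{self.offset:+d}" if self.offset else ""
        return f"{self.ac}:{self.type}.{self.base}{offset}{self.ref}>{self.alt}"


def parse_simple_variant(text):
    """Parse a substitution variant string; raise ValueError if it does not match."""
    match = _SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a supported substitution variant: {text!r}")
    return SimpleVariant(
        ac=match["ac"],
        type=match["type"],
        base=int(match["base"]),
        offset=int(match["offset"] or 0),
        ref=match["ref"],
        alt=match["alt"],
    )


variant = parse_simple_variant("NM_182763.2:c.688+403C>T")
print(variant.ac, variant.base, variant.offset)
print(variant)  # stringifying reproduces the input form
```

Splitting the position into a base and a signed offset mirrors how an intronic coordinate such as 688+403 is written; projection between transcripts and the genome, and validation against reference sequence, require transcript alignment data and are beyond this sketch.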