Jeroen F J Laros, André Blavier, Johan T den Dunnen, Peter E M Taschner.
Abstract
BACKGROUND: The use of a standard human sequence variant nomenclature is advocated by the Human Genome Variation Society in order to unambiguously describe genetic variants in databases and literature. There is a clear need for tools that allow the mining of data about human sequence variants and their functional consequences from databases and literature. Existing text mining focuses on the recognition of protein variants and their effects. The recognition of variants at the DNA and RNA levels is essential for dissemination of variant data for diagnostic purposes. Development of new tools is hampered by the complexity of the current nomenclature, which requires processing at the character level to recognize the specific syntactic constructs used in variant descriptions.
Year: 2011 PMID: 21992071 PMCID: PMC3194197 DOI: 10.1186/1471-2105-12-S4-S5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
New symbols and symbol applications in the extended standard human sequence variant nomenclature (a)
| Symbol | Description |
|---|---|
| d | Downstream. Position number prefix for coding DNA positions following the end (3') of the transcript. Example: c.*405+d256G>T |
| n | Position number prefix for non-coding DNA positions. Numbering starts at the first nucleotide of the non-coding transcript. Example: n.46G>T |
| p | Suffix to specify protein isoforms in descriptions using LRG sequences (Locus Reference Genomic) [ |
| t | Suffix to specify transcript variants in descriptions using LRG sequences. Example: LRG_1t1 |
| u | Upstream. Position number prefix for coding DNA positions upstream (5') of the start of the transcript. Example: c.-110-u256G>T |
| _i | Gene symbol suffix to specify protein isoforms in protein variant descriptions using genomic reference sequences. Example: DMD_i2 |
| _v | Gene symbol suffix to specify transcript variants in coding DNA variant descriptions using genomic reference sequences. Example: DMD_v2 |
| ^ | Exclusive or: combines alternative DNA descriptions that are derived from a protein-level description. Example: backtranslation of p.Ser124Arg, where the Ser-124 codon at c.370_372 is AGC. The variant should be described as c.[370A>C^372C>R] to reflect that arginine can be encoded by six possible codons: AGR (AGA and AGG) and CGN (CGA, CGC, CGG and CGT). |
| / | Allele separator in mosaic cases. Used in ISCN [ |
| // | Allele separator in chimaeric cases. Used in ISCN [ |
| { } | Curly braces enclose "sub-alleles", changes within the range of duplications, inversions, gene conversions and insertions. Example: c.24_65inv{46G>T} (See [ |
| ; | Replaces + in SingleAlleleVarSet, MultiAlleleVars and MultiTranscriptVar |
| (;) | Replaces (+) indicating uncertain phase in UnkAlleleVars. In general, parentheses are used to indicate uncertainty. |
(a) See http://www.hgvs.org/mutnomen for a full list of symbols and their use.
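
The backtranslation example given for the `^` symbol can be worked through in code. The following is a minimal sketch, not the authors' implementation: the function name `backtranslate_snv`, the lookup tables, and the output format for a single alternative are assumptions made for illustration. It considers only single-nucleotide substitutions within one codon, uses the standard genetic code, and collapses multiple alternative bases at one position into an IUPAC ambiguity code (for example, A or G becomes R).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V",
}


def backtranslate_snv(ref_codon, first_pos, target_aa):
    """Describe all single-base changes in ref_codon that yield target_aa.

    ref_codon: reference codon as DNA, e.g. "AGC"
    first_pos: coding DNA position of the first codon base, e.g. 370
    target_aa: one-letter code of the observed amino acid, e.g. "R"
    Returns a coding DNA description, or None if no single-base
    substitution produces the target amino acid.
    """
    parts = []
    for offset, ref_base in enumerate(ref_codon):
        # Collect every alternative base at this position that gives target_aa.
        alts = sorted(
            alt for alt in BASES
            if alt != ref_base
            and CODON_TABLE[ref_codon[:offset] + alt + ref_codon[offset + 1:]]
            == target_aa
        )
        if alts:
            parts.append(f"{first_pos + offset}{ref_base}>{IUPAC[''.join(alts)]}")
    if not parts:
        return None
    if len(parts) == 1:
        # Assumed format for a single unambiguous alternative.
        return "c." + parts[0]
    # Alternatives at different positions are mutually exclusive: join with ^.
    return "c.[" + "^".join(parts) + "]"


print(backtranslate_snv("AGC", 370, "R"))  # c.[370A>C^372C>R]
```

For the codon AGC, a change of the first base to C gives CGC, and a change of the third base to A or G gives AGA or AGG, all of which encode arginine. The remaining arginine codons (CGA, CGG and CGT) would each require two substitutions, so this sketch does not report them.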