Kapil Raj Pandey, Narendra Maden, Barsha Poudel, Sailendra Pradhananga, Amit Kumar Sharma.
Abstract
The curation of genetic variants from biomedical articles is required for various clinical and research purposes. Variant databases that bring together comprehensive information about variants are becoming increasingly popular. These databases have immense utility, serving as a user-friendly information storehouse of variants for information seekers. While manual curation is the gold standard for curating variants, it is time-consuming on a large scale, which makes automation necessary. Curation of variants described in the biomedical literature may not be straightforward, mainly because of nomenclature and expression issues. Although current papers on variants increasingly follow the standard nomenclature, so that variants can easily be retrieved, the literature still holds a massive store of variants described under non-standard names, and the online search engines that are predominantly used may not be capable of finding them. Effective curation of variants requires knowledge of the overall curation process, the nature and types of difficulties that arise, and ways to tackle those difficulties during the task. Only through effective curation can variants be correctly interpreted. This paper presents the process and difficulties of curating genetic variants, with possible solutions and suggestions drawn from our work experience in the field and supported by the literature. The paper also highlights aspects of the interpretation of genetic variants and the importance of writing papers on variants in a standard and retrievable way.
Year: 2012 PMID: 23317699 PMCID: PMC5054708 DOI: 10.1016/j.gpb.2012.06.006
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Common automated mutation curation tools and their extraction strategies and quality measures
| Tool | Extraction strategy | Extracted association | Text source | Quality measures (P; R; F) |
| --- | --- | --- | --- | --- |
| MuteXt | Regular expression, word proximity, Swiss-Prot entry | Variant-protein (at amino acid level) | GPCR and NR protein related full texts and abstracts | 0.87; 0.87; U# |
| MEMA | Regular expression, word proximity | Variant-gene (at amino acid and DNA levels) | Medline abstracts | 0.93; 0.35; U∗ |
| Mutation GraB | Regular expression, graph metric, sequence check | Variant-protein-organism (at amino acid level) | Full text articles | 0.84; 0.90; 0.87 |
| Mutation miner | Regular expression, sentence co-mention | Variant-organism (at amino acid level) | Abstracts | 0.91; 0.46; 0.61 |
| Mutation finder | Regular expression | Gene-variant (at amino acid level) | Full text articles | 0.98; 0.81; 0.81 |
| Yip et al., 2007 | Regular expression, rule-based system | Gene-variant (at amino acid level) | Full text articles | 0.89; U; U |
| coagMDB | Regular expression, graph metric, sequence check | Gene-variant (at amino acid level) | Full text articles; serine protease | 87-93; 96-99; U |
| MuGeX | Regular expression | Gene-variant (at protein and DNA levels) | Medline abstracts; Alzheimer’s disease associated genes | 88.9; 91.3; U |
| Krallinger et al., 2009 | Regular expression, residue disambiguation and classification | Gene-variant (at protein level); natural vs artificial variants | Abstract and full text articles; kinase protein | 72; U; U and 93.88; U; U for natural vs artificial variants |
| PolySearch | Sentence co-mention, word association | SNP detection; gene-variant | Abstracts, full text articles | U; U; U |
Note: U indicates undetermined; #, G-protein-coupled receptor (GPCR) mutations; NR, nuclear hormone receptor. ∗ For example, when 100 abstracts were tested by MEMA for cited mutations in one letter code for variant-gene extraction pair, the quality measures, P and R values, were 0.93 and 0.35, respectively. P, precision; R, recall; F, F-score. See more details in the text.
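The relationship between the three quality measures can be checked with a few lines of Python. This is a minimal sketch that assumes the F column is the balanced F-score (the harmonic mean of precision and recall); the source does not state which F variant was used, and the function name `f_score` is chosen here for illustration only.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R values taken from the table above.
print(round(f_score(0.84, 0.90), 2))  # Mutation GraB row, prints 0.87
print(round(f_score(0.91, 0.46), 2))  # Mutation miner row, prints 0.61
```

Under that assumption, the Mutation GraB and Mutation miner rows are consistent with their reported F values. The low recall of 0.46 pulls the second F-score well below its precision of 0.91, which is why precision alone is a poor summary of a curation tool.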
Descriptive patterns or expression of variants in literature
| Level | Variant type | Descriptive patterns |
| --- | --- | --- |
| Amino acid | Missense | p.Arg30Gln; Arg30Gln; R30Q; Arg30 to Gln; Arg30 > Gln; Arg30 → Gln; Arg30toGln; Arg(30)-Gln; Arg-30 → Gln; Gln30; 30Gln; Arg/Gln at codon 30; Arginine to Glutamine (substitution) at (codon) 30; Arg > Gln change at amino acid 30; Glutamine for Arginine at residue 30; RQ30; Q30 |
| | Nonsense | R30X; p.Arg30X; R30Ter; R30∗; R30Stop |
| | Frameshift | R14fsX4; DeltaR30; ΔR30; 30delArg; Ins30Arg; deletion (or insertion) at codon 30 |
| | Silent | R30R; Arg30Arg; p.Arg30=; p.Arg30Arg |
| DNA | Substitution | c.90G>A; G90A; 90G/A; G-90>A; 90→A; 90G-A; G(90)→A; UTR: c.-90G>A; -90 G→A; G to A at -90; c.∗90G>A; c.∗+90G>A |
| | Frameshift | c.90delG; c.90del; 90delG; c.90insG; 90insG; 90del2; 1-bp del, 90G; c.89_90insG; c.89_90delinsA; 90delinsA; c.90dupG; insertion of G at position 90; Arg30fsX2; R30fs; insertion (or duplication) of G at position 90; deletion of 2 bp at codon 30 |
| | Intronic | A to G at splice acceptor of intron 2; IVS31AS, A-T, -2; 3061(-1)G --> A; IVS32DS, G-A, +1; IVS2-2A>G; IVS2+1G>A; IVS2+1(G>A); Intron 2 nt-51A>G; 401(-1)G --> A; IVS1, G-A, -1; c.400+30A/G; c.400+30A>G; 400+30A>G; c.-8C>G; Intron 2 (-8G->A); IVS1+15del3; 400+30delG; 400+30insG |
| | Large deletions/duplications | ### bp deletion; del exon 1, c.del exons 2_4; c.dup exons 2_4 |
| | SNP | rs# or ss#; for example: rs5495 |
| | Haplotype | Haplotype description is gene/locus specific; for example, (TG)m(T)n |
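To illustrate how a regular-expression approach might collapse several of the missense spellings listed above into one standard name, here is a minimal Python sketch. The two patterns and the function name `normalize_missense` are hypothetical and cover only a handful of the listed forms; they are not taken from any of the tools discussed.

```python
import re

# Standard three-letter and one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}
AA3 = "|".join(THREE_TO_ONE)   # alternation of three-letter codes
AA1 = "".join(ONE_TO_THREE)    # character class of one-letter codes

# Hypothetical patterns, assumed for illustration only.
MISSENSE_PATTERNS = [
    # p.Arg30Gln, Arg30Gln, Arg30 to Gln, Arg30toGln, Arg30 > Gln,
    # Arg30 → Gln, Arg(30)-Gln, Arg-30 → Gln
    re.compile(
        rf"\b(?:p\.)?(?P<ref>{AA3})[-(]?(?P<pos>\d+)\)?"
        rf"\s*(?:to|>|→|-)?\s*(?P<alt>{AA3})\b"
    ),
    # One-letter form such as R30Q
    re.compile(rf"\b(?P<ref>[{AA1}])(?P<pos>\d+)(?P<alt>[{AA1}])\b"),
]


def normalize_missense(text):
    """Return p.Xxx00Yyy-style names for the missense mentions found in text."""
    found = []
    for pattern in MISSENSE_PATTERNS:
        for match in pattern.finditer(text):
            ref = ONE_TO_THREE.get(match.group("ref"), match.group("ref"))
            alt = ONE_TO_THREE.get(match.group("alt"), match.group("alt"))
            if ref != alt:  # identical residues would be a silent change
                found.append(f"p.{ref}{match.group('pos')}{alt}")
    return found


for mention in ["R30Q", "Arg30 to Gln", "Arg(30)-Gln", "Arg-30 → Gln"]:
    print(mention, "->", normalize_missense(mention))
```

Descriptive forms such as "Glutamine for Arginine at residue 30" or position-only forms such as "Q30" are deliberately left out of this sketch. Note also that the one-letter pattern will match any token shaped like a letter, digits, and a letter, so its output still needs validation.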
Mutation-like terms and overlapping names that need to be validated against the variants curated by automated tools
| Category | Examples |
| --- | --- |
| Cell lines | T47D (breast cancer cells); L5178Y (lymphoblasts); C33A (human cervical cancer cells); V600E (BRAF thyroid cancer cells); H293K, T98G (human glioblastoma cell lines); M14T (T-cell line); H294R (adrenocortical line); A375M, F30K, F5K, T14D, T24C, T20C (cancer cell lines) |
| Gene names | L23A, E2F, H4M, ER |
| Protein names | A2V, S100D, S100C, S100E, P34S (sperm surface protein), C184L, A10L (viral), A11L (viral), A52R (viral) |
| Taxonomic entities | |
| Overlapping names | A13G, C13T |
| Others | M24R (filter); A83586C (antibiotic); A27L (immunogen); A9145C (antifungal) |
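One way to apply such a validation step to the output of an automated tool is a simple lookup against a list of known mutation-like names. The sketch below is an assumption-laden illustration: the dictionary `KNOWN_NON_VARIANTS` holds only a few entries copied from the table above, the candidate pattern is deliberately naive, and the function name `flag_candidates` is hypothetical.

```python
import re

# A few mutation-like names from the table above; a real list would be far longer.
KNOWN_NON_VARIANTS = {
    "T47D": "cell line",
    "C33A": "cell line",
    "T98G": "cell line",
    "L23A": "gene name",
    "A2V": "protein name",
    "A27L": "immunogen",
}

# Naive candidate pattern: one capital letter, digits, one capital letter.
CANDIDATE = re.compile(r"\b[A-Z]\d+[A-Z]\b")


def flag_candidates(text):
    """Split mutation-like tokens into likely variants and names needing manual review."""
    likely, review = [], []
    for token in CANDIDATE.findall(text):
        if token in KNOWN_NON_VARIANTS:
            review.append((token, KNOWN_NON_VARIANTS[token]))
        else:
            likely.append(token)
    return likely, review


likely, review = flag_candidates("R30Q was expressed in T47D cells.")
print(likely)  # ['R30Q']
print(review)  # [('T47D', 'cell line')]
```

A lookup of this kind only flags tokens for review. Deciding whether a flagged token is a variant or a name still depends on context, since the same string can be either.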