Steven Vercruysse, Martin Kuiper.
Abstract
BACKGROUND: Ideally, each Life Science article should get a 'structured digital abstract': a structured summary of the paper's findings that is both human-verified and machine-readable. But articles can contain a large variety of information types and contextual details, all of which need to be reconciled with appropriate names, terms and identifiers. This poses a challenge to any curator. Current approaches mostly use tagging or limited entry-forms for semantic encoding.
Year: 2012 PMID: 23110757 PMCID: PMC3532140 DOI: 10.1186/1756-0500-5-601
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Screenshots of the ‘MineMap’ test-application. Clockwise: input box for one article’s digital abstract (cooperatively created); partial list of curated papers; part of a visualised network.
Illustration of the Life Sciences’ many synonymous and polysemous terms
| AT3G48750 | CDC2, CDC2A, CDC2AAT, | A. thaliana |
| AT1G52340 | ABA Deficient 2, ATABA2, ATSDR1, GIN1, Glucose Insensitive 1, Impaired Sucrose Induction 4, ISI4, Salt Resistant 1, SDR1, SIS4, SRE1, Sugar-Insensitive 4, … | A. thaliana |
| Cyclin-Dependent Kinase 2 | H. sapiens | |
| ABCC8 | H. sapiens | |
| S100A8 | H. sapiens | |
| AATK | AATYK, KIAA0641, LMTK1, | H. sapiens |
| Leishmaniasis Resistance 1 | M. musculus | |
| CHLREDRAFT_184328 | C. reinhardtii | |
| Fruit | achene, berry, capsule, caryopsis, circumcissile capsule, cypsela, drupe, follicle, grain, nut, pod, poricidal capsule, silicula, siliqua, silique, … | Plant Ontology |
| nodal root | crown root, seminal root | Plant Ontology |
| erythrocyte | red blood cell, RBC | Cell Type |
Many gene aliases and synonyms as used in the literature. Terms in bold highlight polysemy conflicts, especially terms that are official gene symbols in two species. Note that ‘silique’ is a term used in Arabidopsis research to refer to the plant’s dry seed pods, although the official PO term is ‘fruit’. For A. thaliana we queried TAIR [18], for H. sapiens: HUGO [19], for M. musculus (mouse) and C. reinhardtii: NCBI [20], for Plant Ontology: PO [21], for Cell Type: OLS [22,23].
Figure 2. Terms as guides to meanings, and proposed upgrade for ‘controlled language’ input. (a) Example with two synonyms, of which one is polysemous as well. (b-c) Although the term AATK is preferred for some human gene, a user can make the link between the non-preferred synonym LMR1 and the same concept. Via an autosuggestion-panel (here a mock-up), the curator would be able to appoint the intended meaning for a term. Underlined terms are official, non-italic terms are synonyms. In italic is extra info for disambiguating polysemous terms.
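The autosuggestion idea in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Concept` class, the `suggest` function and the identifiers `C1` to `C4` are hypothetical names introduced here. The synonym lists are taken from the table and the Figure 2 caption, except for the last entry, which is invented purely so that LMR1 has a second meaning to disambiguate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str        # hypothetical identifier, not a real database ID
    preferred: str         # official term (underlined in the mock-up)
    context: str           # disambiguating info (italic in the mock-up)
    synonyms: tuple = ()   # non-preferred terms that point to this concept


CONCEPTS = [
    Concept("C1", "AT3G48750", "A. thaliana", ("CDC2", "CDC2A", "CDC2AAT")),
    Concept("C2", "AATK", "H. sapiens", ("AATYK", "KIAA0641", "LMTK1", "LMR1")),
    Concept("C3", "erythrocyte", "Cell Type", ("red blood cell", "RBC")),
    # Invented entry (assumption): gives LMR1 a second, conflicting meaning.
    Concept("C4", "LMR1", "hypothetical other organism"),
]


def suggest(typed):
    """Return (matched_term, is_preferred, concept) for every term
    that starts with the typed text, ignoring case."""
    prefix = typed.casefold()
    hits = []
    for concept in CONCEPTS:
        for term in (concept.preferred, *concept.synonyms):
            if term.casefold().startswith(prefix):
                hits.append((term, term == concept.preferred, concept))
    return hits


# Each hit is one row of the suggestion panel; the curator picks a row,
# and the picked row's concept_id is what gets stored.
for term, is_preferred, concept in suggest("lmr1"):
    kind = "official" if is_preferred else f"synonym of {concept.preferred}"
    print(f"{term} ({kind}; {concept.context}) -> {concept.concept_id}")
```

Typing a polysemous term yields one row per candidate meaning, so the ambiguity is resolved by the curator at input time rather than guessed afterwards.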
Figure 3. Benefits of biocuration with support for using synonymous and polysemous terms as they are. Provided a mechanism for linking each term to a unique identifier, the controlled language could have improved expressiveness and usability.
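One way to picture "using terms as they are" while still linking each term to a unique identifier is to store both side by side. The Python sketch below is an assumption-laden illustration, not the MineMap data model: `TermLink`, `readable`, `machine_key` and the `concept:` identifiers are hypothetical, and "relates to" is a placeholder relation, not a biological finding.

```python
from typing import NamedTuple


class TermLink(NamedTuple):
    as_written: str   # the curator's own wording, kept unchanged
    concept_id: str   # hypothetical unique identifier for the meaning


def readable(statement):
    """Human-readable form: terms appear exactly as the curator typed them."""
    return " ".join(
        part.as_written if isinstance(part, TermLink) else part
        for part in statement
    )


def machine_key(statement):
    """Machine-readable form: every linked term is replaced by its identifier."""
    return tuple(
        part.concept_id if isinstance(part, TermLink) else part
        for part in statement
    )


# Hypothetical identifiers (assumptions made for this sketch).
AATK_ID = "concept:AATK"
ERYTHROCYTE_ID = "concept:erythrocyte"

# Two curators phrase the same placeholder statement with different synonyms.
s1 = [TermLink("LMR1", AATK_ID), "relates to", TermLink("RBC", ERYTHROCYTE_ID)]
s2 = [TermLink("AATK", AATK_ID), "relates to",
      TermLink("red blood cell", ERYTHROCYTE_ID)]

print(readable(s1))
print(readable(s2))
print(machine_key(s1) == machine_key(s2))
```

The two statements read differently to a human but compare as equal once reduced to identifiers, which is the property the caption attributes to term-to-identifier linking.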