Weisong Liu¹, Stanley J F Laulederkind², G Thomas Hayman³, Shur-Jen Wang³, Rajni Nigam³, Jennifer R Smith³, Jeff De Pons³, Melinda R Dwinell¹, Mary Shimoyama¹.
Abstract
The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism names and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and to a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu
Year: 2015 PMID: 25619558 PMCID: PMC4305386 DOI: 10.1093/database/bau129
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
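The abstract describes OntoMate tagging abstracts with gene names and ontology terms. The paper summary does not specify OntoMate's extraction method, so the following is only a minimal sketch of one common approach, case-insensitive dictionary lookup, assuming a tiny hypothetical lexicon. The two concept IDs are taken from the comparison table below; the synonym entries and the `tag_abstract` function are illustrative and are not RGD's implementation.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: surface string -> (concept ID, preferred name).
# IDs come from the comparison table; the synonym choice is illustrative only.
LEXICON: Dict[str, Tuple[str, str]] = {
    "resistin": ("RGD:628781", "Retn"),
    "retn": ("RGD:628781", "Retn"),
    "kidney diseases": ("RDO:0000692", "Kidney Diseases"),
    "kidney disease": ("RDO:0000692", "Kidney Diseases"),
}


def tag_abstract(text: str, lexicon: Dict[str, Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, concept_id) for whole-word, case-insensitive lexicon matches."""
    hits: List[Tuple[int, int, str]] = []
    # Try longer surface strings first so a longer phrase wins over a shorter one it contains.
    for surface in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(surface) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # Skip a match that overlaps a match already accepted.
            if any(m.start() < end and start < m.end() for start, end, _ in hits):
                continue
            hits.append((m.start(), m.end(), lexicon[surface][0]))
    return sorted(hits)
```

For example, `tag_abstract("Resistin levels were elevated in rats with kidney diseases.", LEXICON)` yields one gene hit and one disease-term hit; the character offsets are what a user interface could use to highlight terms inside the abstract.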
Figure 1. OntoMate system architecture. The basic system consists of data collection, an article database, information extraction and information retrieval (indexing and user interface). The user interface can be adapted for different applications.
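To make the indexing and retrieval stages in Figure 1 concrete, here is a toy in-memory sketch, assuming each article arrives from the extraction stage with a set of concept IDs. It uses an inverted index (concept ID to PMIDs) with AND semantics for multi-concept queries. The class names and the small integer PMIDs are hypothetical placeholders; the real OntoMate indexing layer is not described at this level of detail.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Article:
    pmid: int
    year: int
    concept_ids: Set[str]  # assumed output of the information-extraction stage


@dataclass
class ConceptIndex:
    """Toy stand-in for the indexing and retrieval stages (hypothetical)."""
    articles: Dict[int, Article] = field(default_factory=dict)
    postings: Dict[str, Set[int]] = field(default_factory=lambda: defaultdict(set))

    def add(self, article: Article) -> None:
        # Store the article and record it under every concept it was tagged with.
        self.articles[article.pmid] = article
        for cid in article.concept_ids:
            self.postings[cid].add(article.pmid)

    def search(self, *concept_ids: str) -> List[int]:
        """Return PMIDs tagged with ALL given concepts, in ascending order."""
        if not concept_ids:
            return []
        result = set.intersection(*(self.postings.get(c, set()) for c in concept_ids))
        return sorted(result)
```

Because the query is expressed as concept IDs rather than strings, adding a new synonym to the tagger changes what gets indexed without changing how users query.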
Figure 2. RGD’s old and new workflows for manual gene curation. White boxes represent tasks involving the PubMed interface and colored boxes represent processes done in the RGD curation tool interface. The new workflow consolidates the old workflow's two interfaces into a single interface.
Comparison of term strings and gene query strings between PubMed and OntoMate searches
| Disease category | PubMed term query string (manually constructed) | PubMed sample gene query string (manually constructed) | OntoMate term query (generated from the input RGD disease ontology ID) | OntoMate sample gene query (generated from the input RGD gene ID) |
|---|---|---|---|---|
| Kidney diseases | (kidney or renal or urethral or ureteral or urinary or bladder) and (disease or injury or disorder or insufficiency or obstruction or polycystic or cyst or failure or stones) | (ADSF or RSTN or XCP1 or FIZZ3 or retn or resistin) | ‘Kidney Diseases’ (RDO:0000692) | ‘Retn’ (RGD:628781) |
These single-gene searches were run for the recent Renal Disease Portal curation at RGD. The OntoMate search is based on RGD gene IDs and ontology term IDs.
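The manually constructed PubMed strings in the table are Boolean combinations of OR-ed synonym groups. The sketch below, offered as an illustration only, reproduces them from the word lists shown in the table. The helper names and the `SYNONYMS` mapping are hypothetical; according to the table, OntoMate instead takes the two IDs (RDO:0000692 and RGD:628781) as its input, so the curator does not have to maintain such lists by hand.

```python
from typing import Dict, List


def or_group(terms: List[str]) -> str:
    """Join alternative terms into one parenthesized OR group."""
    return "(" + " or ".join(terms) + ")"


def pubmed_term_query(anatomy: List[str], condition: List[str]) -> str:
    """AND together two OR groups, mirroring the table's manual term string."""
    return f"{or_group(anatomy)} and {or_group(condition)}"


# Hypothetical ID -> synonym mapping; the synonyms are copied from the table.
SYNONYMS: Dict[str, List[str]] = {
    "RGD:628781": ["ADSF", "RSTN", "XCP1", "FIZZ3", "retn", "resistin"],
}


def pubmed_gene_query(rgd_id: str) -> str:
    """Expand a gene ID into the OR-ed synonym string a PubMed search would need."""
    return or_group(SYNONYMS[rgd_id])
```

`pubmed_gene_query("RGD:628781")` returns `(ADSF or RSTN or XCP1 or FIZZ3 or retn or resistin)`, the sample gene string from the table.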
Figure 3. OntoMate query results page. (A) ‘Query Condition’ displays the string of objects and terms used in the query. (B) The filter section allows users to adjust the results by publication date or by object/term refinements. The tabs display hyperlinked subsets of result categories. Any link may be selected to restrict or expand the selected results. A ‘filter path’ appears below the Query Condition to show the user which filters have been applied. (C) The search results are sorted by relevance by default, but can also be sorted by publication date or PMID. If the reference is already in RGD, an RGD logo appears (blue arrow) above the title. If any GO or disease vocabulary annotations have been made from this reference, an aspect initial appears in the upper right corner of the reference entry (short red arrow, D = disease). Mousing over the aspect letter opens a pop-up showing which annotation(s) have been made (long red arrow).
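The filter section and the ‘filter path’ in Figure 3 can be pictured as successive restrictions on a result list, each recorded as a readable step, followed by one of three sort orders. This is a minimal sketch under stated assumptions: `Hit`, `refine`, the year-only date filter, newest-first date sorting and ascending PMID sorting are all hypothetical choices, not documented OntoMate behavior.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    pmid: int
    year: int  # simplified stand-in for the publication date
    relevance: float
    concept_ids: FrozenSet[str]


def refine(hits: List[Hit], min_year: Optional[int] = None, concept: Optional[str] = None,
           sort_by: str = "relevance") -> Tuple[List[Hit], List[str]]:
    """Apply optional filters, record each as a filter-path step, then sort."""
    path: List[str] = []
    out = list(hits)
    if min_year is not None:
        out = [h for h in out if h.year >= min_year]
        path.append(f"year >= {min_year}")
    if concept is not None:
        out = [h for h in out if concept in h.concept_ids]
        path.append(f"concept = {concept}")
    # Assumed orders: relevance high-to-low, date newest-first, PMID ascending.
    keys = {
        "relevance": lambda h: -h.relevance,
        "date": lambda h: -h.year,
        "pmid": lambda h: h.pmid,
    }
    out.sort(key=keys[sort_by])
    return out, path
```

Returning the path alongside the results keeps the display of applied filters in sync with the filtering itself.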
Figure 4. Sample OntoMate abstract entry. The abstract has been opened by clicking the ‘Abstract and other fields’ button underneath the title. A ‘Read by’ note (red arrow) shows that another user has previously accessed the abstract. Clicking the bucket icon (blue arrow) above the title enters the abstract into the main RGD database and into the curation tool interface. Any of the hyperlinked terms can be placed in the curation tool term bucket by clicking the bucket icon (example at black arrow) to the left of the appropriate term. Clicking a hyperlinked term displays it in the term browser of the curation tool interface. Mousing over any of the terms listed below the abstract highlights the corresponding terms in the abstract.
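The mouse-over highlighting in Figure 4 implies that the system knows where each listed term occurs in the abstract. Assuming tagged spans of the form (start, end, concept ID), one simple way to render the highlight for a single hovered term is shown below. The `highlight` function and the `<mark>` markers are hypothetical choices for illustration, and the sketch assumes spans do not overlap.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, concept_id); assumed non-overlapping


def highlight(text: str, spans: List[Span], concept_id: str,
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap every span tagged with concept_id in highlight markers."""
    out: List[str] = []
    pos = 0
    for start, end, cid in sorted(spans):
        if cid != concept_id:
            continue
        out.append(text[pos:start])  # untouched text before this match
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])  # remainder after the last match
    return "".join(out)
```

For instance, hovering over the Retn gene term would wrap both "Resistin" and "Retn" in an abstract if the tagger mapped both to RGD:628781.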