Aravind Venkatesan1, Jee-Hyub Kim1, Francesco Talo1, Michele Ide-Smith1, Julien Gobeill2, Jacob Carter3, Riza Batista-Navarro3, Sophia Ananiadou3, Patrick Ruch2,4, Johanna McEntyre1.
Abstract
The tremendous growth in biological data has resulted in an increase in the number of research papers being published. This presents a great challenge for scientists in searching and assimilating the facts described in those papers. In particular, biological databases depend on curators to add highly precise and useful information, which is usually extracted by reading research articles. Therefore, there is an urgent need to find ways to improve the linking of literature to the underlying data, thereby minimising the effort spent browsing content and identifying key biological concepts. As part of the development of Europe PMC, we have developed a new platform, SciLite, which integrates text-mined annotations from different sources and overlays those outputs on research articles. The aim is to help researchers and curators using Europe PMC to find key concepts more easily, and to provide links to related resources or tools, bridging the gap between literature and biological data.
Keywords: Biocuration; Data Integration; Open Access; RDF; SPARQL; SciLite; Semantic Web; Text-Mining; Web Annotations
Year: 2017 PMID: 28948232 PMCID: PMC5527546 DOI: 10.12688/wellcomeopenres.10210.2
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Figure 1. Overview of how text mining results are incorporated into SciLite.
Figure 2. The figure illustrates a sample annotation of protein MMP9 described in an article (PMC4676863):
the figure lists the vocabularies used to represent the text-mined annotations. The annotation consists of a link for the tagged entity (Body - UniProt: P52176) and the mentions of the entity (Target) in the text snippet. The text is represented by: prefix – the text that occurs before the tagged entity; exact – the tagged entity itself (MMP9); and postfix – the text snippet that occurs after the tagged entity.
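The prefix/exact/postfix structure described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the SciLite implementation: the class names, the `locate` helper, the sample sentence and the UniProt URI form are assumptions made for the example; only the accession (P52176), the article ID (PMC4676863) and the three text fields come from the caption.

```python
from dataclasses import dataclass


@dataclass
class TextQuote:
    """Text snippet of an annotation target, as described in the Figure 2 caption."""
    prefix: str   # text that occurs before the tagged entity
    exact: str    # the tagged entity itself
    postfix: str  # text that occurs after the tagged entity


@dataclass
class Annotation:
    body: str         # link for the tagged entity (URI form is an assumption)
    article_id: str   # article in which the entity is mentioned
    quote: TextQuote  # where the mention occurs in the text


def locate(text: str, quote: TextQuote) -> int:
    """Return the character offset of the tagged entity in `text`, or -1.

    The prefix and postfix disambiguate repeated mentions of the same term.
    """
    start = text.find(quote.prefix + quote.exact + quote.postfix)
    if start == -1:
        return -1
    return start + len(quote.prefix)


# Hypothetical sentence, invented for illustration only.
article_text = "Increased expression of MMP9 was observed in tumour tissue."

annotation = Annotation(
    body="http://purl.uniprot.org/uniprot/P52176",  # assumed URI form
    article_id="PMC4676863",
    quote=TextQuote(
        prefix="Increased expression of ",
        exact="MMP9",
        postfix=" was observed",
    ),
)

offset = locate(article_text, annotation.quote)
tagged = article_text[offset:offset + len(annotation.quote.exact)]
```

Matching on the surrounding context rather than on the entity alone means that a term which appears several times in an article can still be anchored to the specific mention that was annotated.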
Figure 3. An illustration of a sample GeneRIF (gene function) annotation (PMC4676863):
the figure lists the vocabularies used to represent the annotation. The annotation consists of a Body (a text phrase about the protein mTOR) and a Target (a data source link for the described protein, UniProt: P09237).
Figure 5. The screenshot shows the front-end rendering of various annotation types for an article on Europe PMC.
Figure 6. A screenshot showing the 3D molecular structure for a given PDB accession number.
Figure 4. An illustration of the semi-automated feedback mechanism to improve annotations.
Erroneous annotations reported by users are compiled into a report by the helpdesk at Europe PMC. This report is used in two ways: a) as a quick fix, the particular annotation is deleted; b) in the longer term, the reports are used to refine the text-mining algorithms.
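The two uses of the report described in the Figure 4 caption can be sketched as follows. This is a hypothetical illustration, not the Europe PMC helpdesk workflow: the `apply_feedback` function, the dictionary fields (`id`, `type`, `exact`) and the sample records are all assumptions made for the example.

```python
from collections import Counter


def apply_feedback(annotations, reported_ids):
    """Split annotations using user reports.

    Returns the annotations to keep (the quick fix: reported ones are
    deleted) and a tally of reported annotations per type, which could
    inform longer-term refinement of the text-mining algorithms.
    """
    kept = [a for a in annotations if a["id"] not in reported_ids]
    removed = [a for a in annotations if a["id"] in reported_ids]
    reports_by_type = Counter(a["type"] for a in removed)
    return kept, reports_by_type


# Hypothetical records, invented for illustration only.
annotations = [
    {"id": 1, "type": "gene_protein", "exact": "MMP9"},
    {"id": 2, "type": "disease", "exact": "cancer"},
    {"id": 3, "type": "gene_protein", "exact": "was"},
    {"id": 4, "type": "gene_protein", "exact": "can"},
]
reported_ids = {3, 4}

kept, reports_by_type = apply_feedback(annotations, reported_ids)
```

In this sketch the tally shows which annotation types attract the most reports, which is one plausible signal for deciding where algorithm refinement is most needed.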