
Using distant supervised learning to identify protein subcellular localizations from full-text scientific articles.

Wu Zheng, Catherine Blake.

Abstract

Databases of curated biomedical knowledge, such as the protein locations recorded in the UniProtKB database, provide an accurate and useful resource for researchers and decision makers. Our goal is to augment the manual effort currently used to curate knowledge bases with automated approaches that leverage the increased availability of full-text scientific articles. This paper describes experiments that use distant supervised learning to identify protein subcellular localizations, which are important for understanding protein function and for identifying candidate drug targets. The experiments use Swiss-Prot, the manually annotated subset of the UniProtKB protein knowledge base, and 43,000 full-text articles from the Journal of Biological Chemistry that contain just under 11.5 million sentences. The system achieves 0.81 precision and 0.49 recall at the sentence level and an accuracy of 57% on held-out instances in a test set. Moreover, the approach identifies 8,210 instances that are not in the UniProtKB knowledge base. Manual inspection of the 50 most likely relations showed that 41 (82%) were valid. These results are of immediate benefit to researchers interested in protein function and suggest that distant supervision should be explored as a complement to other manual data curation efforts.
Copyright © 2015 Elsevier Inc. All rights reserved.
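
To illustrate the distant supervision idea named in the abstract, the sketch below weakly labels sentences by matching them against known (protein, location) pairs. It is not the authors' pipeline: the function names, the toy knowledge base, the protein names `ProtA` and `ProtB`, and the choice to treat co-mentions absent from the knowledge base as noisy negatives are all assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Example(NamedTuple):
    protein: str
    location: str
    sentence: str
    label: int  # 1 = pair found in the knowledge base, 0 = pair not found


def mentions(lowered_sentence, term):
    """Case-insensitive whole-word match of a term in an already lowercased sentence."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, lowered_sentence) is not None


def distant_label(sentences, kb_pairs, proteins, locations):
    """Weakly label each sentence that mentions both a protein and a location.

    No sentence is annotated by hand: the label comes only from whether the
    co-mentioned pair is already recorded in the knowledge base.
    """
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        found_proteins = [p for p in proteins if mentions(lowered, p)]
        found_locations = [loc for loc in locations if mentions(lowered, loc)]
        for protein in found_proteins:
            for location in found_locations:
                label = 1 if (protein, location) in kb_pairs else 0
                examples.append(Example(protein, location, sentence, label))
    return examples


# Hypothetical toy data standing in for a curated knowledge base and article text.
kb_pairs = {("ProtA", "nucleus")}
proteins = ["ProtA", "ProtB"]
locations = ["nucleus", "mitochondrion"]
sentences = [
    "ProtA accumulates in the nucleus after stimulation.",
    "ProtB was detected in the nucleus.",
    "ProtA was purified by affinity chromatography.",
]

examples = distant_label(sentences, kb_pairs, proteins, locations)
for ex in examples:
    print(ex.label, ex.protein, ex.location)
```

In this toy run the third sentence yields no example because it mentions no location. Pairs labeled 0 are not necessarily wrong; they may be true relations that are missing from the knowledge base, which is why predictions outside the knowledge base are worth inspecting manually.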

Keywords:  BioNLP; Distant supervised learning; Protein subcellular localization extraction; Relation extraction; Text mining

Year: 2015    PMID: 26220461    DOI: 10.1016/j.jbi.2015.07.013

Source DB: PubMed    Journal: J Biomed Inform    ISSN: 1532-0464    Impact factor: 6.317


  4 in total

1.  Improving chemical disease relation extraction with rich features and weakly labeled data.

Authors:  Yifan Peng; Chih-Hsuan Wei; Zhiyong Lu
Journal:  J Cheminform       Date:  2016-10-07       Impact factor: 5.514

2.  LocText: relation extraction of protein localizations to assist database curation.

Authors:  Juan Miguel Cejuela; Shrikant Vinchurkar; Tatyana Goldberg; Madhukar Sollepura Prabhu Shankar; Ashish Baghudana; Aleksandar Bojchevski; Carsten Uhlig; André Ofner; Pandu Raharja-Liu; Lars Juhl Jensen; Burkhard Rost
Journal:  BMC Bioinformatics       Date:  2018-01-17       Impact factor: 3.169

3.  Using distant supervision to augment manually annotated data for relation extraction.

Authors:  Peng Su; Gang Li; Cathy Wu; K Vijay-Shanker
Journal:  PLoS One       Date:  2019-07-30       Impact factor: 3.240

4.  Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network.

Authors:  Rakesh David; Rhys-Joshua D Menezes; Jan De Klerk; Ian R Castleden; Cornelia M Hooper; Gustavo Carneiro; Matthew Gilliham
Journal:  Sci Rep       Date:  2021-01-18       Impact factor: 4.379
