
Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network.

Rakesh David, Rhys-Joshua D Menezes, Jan De Klerk, Ian R Castleden, Cornelia M Hooper, Gustavo Carneiro, Matthew Gilliham.

Abstract

The increased diversity and scale of published biological data has led to a growing appreciation for the applications of machine learning and statistical methodologies to gain new insights. Key to achieving this aim is solving the Relationship Extraction problem, which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely the continuous bag of words (CBOW) and the bi-directional long short-term memory (bi-LSTM). These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the SUBA manually curated dataset. The system combines pre-processing of full-text articles in a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with an average precision, recall, accuracy and F1 score of 95.1%, 82.8%, 89.3% and 88.4%, respectively (n = 30). Comparable scoring metrics were obtained using the CropPAL database, which stores protein subcellular localisation in crop species, as an independent testing dataset, demonstrating the wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
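
For readers less familiar with the four scores quoted above, the following is a minimal sketch of how precision, recall, accuracy and F1 are conventionally computed for a binary relation classifier. It is not the authors' evaluation code. The `Confusion` class, the function names and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the SUBA or CropPAL results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary classifier (hypothetical container, not from the paper)."""
    tp: int  # relation present, predicted present
    fp: int  # relation absent, predicted present
    fn: int  # relation present, predicted absent
    tn: int  # relation absent, predicted absent


def precision(c: Confusion) -> float:
    # Fraction of predicted relations that are correct.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Confusion) -> float:
    # Fraction of true relations that were recovered.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def accuracy(c: Confusion) -> float:
    # Fraction of all decisions that are correct.
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall, written in terms of counts.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative counts only; NOT data from the study.
    example = Confusion(tp=80, fp=5, fn=20, tn=95)
    for name, fn_ in (("precision", precision), ("recall", recall),
                      ("accuracy", accuracy), ("F1", f1)):
        print(f"{name}: {fn_(example):.3f}")
```

Because the abstract reports averages over n = 30, the reported F1 is presumably an average of per-run F1 values, which need not equal the harmonic mean of the averaged precision and recall.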


Year:  2021        PMID: 33462256     DOI: 10.1038/s41598-020-80441-8

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


