
Text mining for modeling of protein complexes enhanced by machine learning.

Varsha D Badal, Petras J Kundrotas, Ilya A Vakser.

Abstract

MOTIVATION: Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, absence of post-processing of the spotted residues reduced usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins.
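To make the residue-spotting step concrete, the following is a minimal sketch of how residue mentions might be pulled from abstract text with a regular expression. It is not the authors' pipeline: the pattern, the name `spot_residues`, and the sample sentence are assumptions made for illustration, and only three-letter amino acid codes followed by a position number are recognized.

```python
import re

# Three-letter codes of the 20 standard amino acids.
THREE_LETTER = (
    "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
    "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
).split()

# Hypothetical pattern (an assumption, not taken from the paper):
# a three-letter code, an optional hyphen or space, then a sequence
# position of up to four digits, e.g. "Arg123", "Lys-45", "Tyr 7".
RESIDUE_RE = re.compile(
    r"\b(" + "|".join(THREE_LETTER) + r")[- ]?(\d{1,4})\b"
)


def spot_residues(text):
    """Return (amino_acid, position) pairs in order of appearance."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE_RE.finditer(text)]


# Invented example sentence, used only to exercise the pattern.
sample = (
    "Mutation of Arg123 and Lys-45 abolished binding, "
    "whereas Tyr 7 was dispensable."
)
print(spot_residues(sample))
```

A spotter of this kind returns every residue mentioned, including ones unrelated to binding (the dispensable residue in the sample sentence is still reported), which is why a subsequent filtering step is needed.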
RESULTS: We explored filtering of the irrelevant residues with two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, under different training/testing schemes. The DRNN model is superior to the SVM model when training is performed on PMC-OA full-text articles and the model is then applied to classification (interface or non-interface) of residues spotted in PubMed abstracts. The reason is that SVM success is often determined by the similarity of data/text patterns in the training and testing sets, whereas sentence structures in abstracts are, in general, different from those in full-text articles. When both training and testing are performed on full-text articles, or both on abstracts, the performance of the two models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage.
AVAILABILITY AND IMPLEMENTATION: The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year: 2021; PMID: 32960948; PMCID: PMC8088328; DOI: 10.1093/bioinformatics/btaa823

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


