Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function.

Amelia Villegas-Morcillo, Stavros Makrodimitris, Roeland C H J van Ham, Angel M Gomez, Victoria Sanchez, Marcel J T Reinders.

Abstract

MOTIVATION: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from them. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available.
RESULTS: We applied an existing deep sequence model, pretrained in an unsupervised setting, to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. It also partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance on the third Critical Assessment of Functional Annotation (CAFA3) benchmark. We also show that combining this sequence representation with protein 3D structure information does not improve performance, hinting that 3D structure may also be learned during the unsupervised pretraining.
AVAILABILITY AND IMPLEMENTATION: Implementations of all models used can be found at https://github.com/stamakro/GCN-for-Structure-and-Function.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.
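
The hand-crafted baselines named in the abstract (one-hot encoding of amino acids and k-mer counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the 20-letter alphabet, the choice of k = 2, the overlapping-window counting, and the all-zero row for unknown residues are assumptions made here for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Return one 20-element 0/1 row per residue.

    Residues outside the assumed alphabet (e.g. 'X') give an all-zero row.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    return rows


def kmer_counts(sequence, k=2):
    """Return a fixed-length vector (20**k entries) of overlapping k-mer counts.

    k-mers containing residues outside the assumed alphabet are ignored.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts["".join(kmer)] for kmer in product(AMINO_ACIDS, repeat=k)]
```

Note the difference in shape: the one-hot encoding grows with sequence length, whereas the k-mer vector has a fixed size regardless of length, which is one reason count-based features are convenient inputs for simple classifiers.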

Year: 2021    PMID: 32797179    DOI: 10.1093/bioinformatics/btaa701

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  14 in total

1.  Accurate protein function prediction via graph attention networks with predicted structure information.

Authors:  Boqiao Lai; Jinbo Xu
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

Review 2.  Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies.

Authors:  Rahmad Akbar; Habib Bashour; Puneet Rawat; Philippe A Robert; Eva Smorodina; Tudor-Stefan Cotet; Karine Flem-Karlsen; Robert Frank; Brij Bhushan Mehta; Mai Ha Vu; Talip Zengin; Jose Gutierrez-Marcos; Fridtjof Lund-Johansen; Jan Terje Andersen; Victor Greiff
Journal:  MAbs       Date:  2022 Jan-Dec       Impact factor: 5.857

3.  LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction.

Authors:  Zichen Wang; Steven A Combs; Ryan Brand; Miguel Romero Calvo; Panpan Xu; George Price; Nataliya Golovach; Emmanuel O Salawu; Colby J Wise; Sri Priya Ponnapalli; Peter M Clark
Journal:  Sci Rep       Date:  2022-04-27       Impact factor: 4.996

4.  Contrastive learning on protein embeddings enlightens midnight zone.

Authors:  Michael Heinzinger; Maria Littmann; Ian Sillitoe; Nicola Bordin; Christine Orengo; Burkhard Rost
Journal:  NAR Genom Bioinform       Date:  2022-06-11

Review 5.  Deep Learning Concepts and Applications for Synthetic Biology.

Authors:  William A V Beardall; Guy-Bart Stan; Mary J Dunlop
Journal:  GEN Biotechnol       Date:  2022-08-18

Review 6.  Machine learning for enzyme engineering, selection and design.

Authors:  Ryan Feehan; Daniel Montezano; Joanna S G Slusky
Journal:  Protein Eng Des Sel       Date:  2021-02-15       Impact factor: 1.952

Review 7.  Protein Design with Deep Learning.

Authors:  Marianne Defresne; Sophie Barbe; Thomas Schiex
Journal:  Int J Mol Sci       Date:  2021-10-29       Impact factor: 5.923

Review 8.  MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning.

Authors:  Haozheng Li; Yihe Pang; Bin Liu; Liang Yu
Journal:  Front Pharmacol       Date:  2022-03-08       Impact factor: 5.810

Review 9.  Representation learning applications in biological sequence analysis.

Authors:  Hitoshi Iuchi; Taro Matsutani; Keisuke Yamada; Natsuki Iwano; Shunsuke Sumi; Shion Hosoda; Shitao Zhao; Tsukasa Fukunaga; Michiaki Hamada
Journal:  Comput Struct Biotechnol J       Date:  2021-05-23       Impact factor: 7.271

Review 10.  Automatic Gene Function Prediction in the 2020's.

Authors:  Stavros Makrodimitris; Roeland C H J van Ham; Marcel J T Reinders
Journal:  Genes (Basel)       Date:  2020-10-27       Impact factor: 4.096

