Amelia Villegas-Morcillo1, Stavros Makrodimitris2,3, Roeland C H J van Ham2,3, Angel M Gomez1, Victoria Sanchez1, Marcel J T Reinders2,4. 1. Department of Signal Theory, Telematics and Communications, University of Granada, 18071 Granada, Spain. 2. Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands. 3. Keygene N.V., 6708PW Wageningen, The Netherlands. 4. Leiden Computational Biology Center, Leiden University Medical Center, 2333ZC Leiden, The Netherlands.
Abstract
MOTIVATION: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels are available. RESULTS: We applied an existing deep sequence model that had been pretrained in an unsupervised setting to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. It also partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining. AVAILABILITY AND IMPLEMENTATION: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
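The abstract describes feeding a fixed, pretrained sequence embedding into a two-layer perceptron that outputs probabilities for multiple molecular-function terms. A minimal NumPy sketch of that prediction head is shown below; the embedding dimension, hidden size, and number of GO terms are illustrative placeholders, not values taken from the paper, and the random vector stands in for an actual pretrained embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron for multi-label function prediction.

    x: (d,) fixed protein embedding from a pretrained sequence model.
    Returns per-term probabilities (terms treated as independent labels).
    """
    h = np.maximum(0.0, x @ W1 + b1)       # hidden layer with ReLU
    logits = h @ W2 + b2                   # one logit per function term
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid, not softmax: labels are not exclusive

# Illustrative sizes (hypothetical, not from the paper)
d, hidden, n_terms = 1024, 256, 100
W1 = rng.normal(0.0, 0.01, (d, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.01, (hidden, n_terms)); b2 = np.zeros(n_terms)

emb = rng.normal(size=d)  # placeholder for a pretrained embedding vector
probs = two_layer_mlp(emb, W1, b1, W2, b2)
print(probs.shape)
```

The sigmoid output reflects the multi-label nature of function prediction: a protein can be annotated with several Gene Ontology terms at once, so each term gets its own independent probability.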
Authors: Rahmad Akbar; Habib Bashour; Puneet Rawat; Philippe A Robert; Eva Smorodina; Tudor-Stefan Cotet; Karine Flem-Karlsen; Robert Frank; Brij Bhushan Mehta; Mai Ha Vu; Talip Zengin; Jose Gutierrez-Marcos; Fridtjof Lund-Johansen; Jan Terje Andersen; Victor Greiff Journal: MAbs Date: 2022 Jan-Dec Impact factor: 5.857
Authors: Zichen Wang; Steven A Combs; Ryan Brand; Miguel Romero Calvo; Panpan Xu; George Price; Nataliya Golovach; Emmanuel O Salawu; Colby J Wise; Sri Priya Ponnapalli; Peter M Clark Journal: Sci Rep Date: 2022-04-27 Impact factor: 4.996
Authors: Michael Heinzinger; Maria Littmann; Ian Sillitoe; Nicola Bordin; Christine Orengo; Burkhard Rost Journal: NAR Genom Bioinform Date: 2022-06-11