Amino acid encoding for deep learning applications.

Hesham ElAbd, Yana Bromberg, Adrienne Hoarfrost, Tobias Lenz, Andre Franke, Mareike Wendorff.

Abstract

BACKGROUND: The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as continuous vectors through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the iterative optimization of the model for the target prediction, a process called 'end-to-end learning', has led to state-of-the-art results in many fields. Although the usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN.
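To make the notion of an embedding matrix concrete, the following is a minimal NumPy sketch, not the authors' implementation. It treats every encoding scheme as a matrix with one row per amino acid and encodes a sequence by row lookup. The alphabet ordering, the `encode` helper, and the randomly initialised 8-dimensional matrix are illustrative assumptions; in end-to-end learning such a matrix would be a trainable parameter updated together with the rest of the model.

```python
import numpy as np

# Assumed alphabet ordering: the 20 standard amino acids, alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, matrix):
    """Map each residue to its row in an (alphabet_size, dim) encoding matrix."""
    indices = np.array([AA_TO_INDEX[aa] for aa in sequence])
    return matrix[indices]


# One-hot encoding is the identity matrix: one dimension per amino acid.
one_hot = np.eye(len(AMINO_ACIDS))

# Hypothetical 8-dimensional embedding, randomly initialised here.
# A deep learning framework would update these values during training.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(AMINO_ACIDS), 8))

x_one_hot = encode("ACDW", one_hot)      # shape (4, 20)
x_embedded = encode("ACDW", embedding)   # shape (4, 8)

# A row lookup is equivalent to multiplying the one-hot encoding by the matrix.
lookup_matches_matmul = np.allclose(x_one_hot @ embedding, x_embedded)
```

Under these assumptions, swapping one encoding scheme for another only changes the matrix passed to `encode`; the rest of the model is unaffected apart from the input dimension.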
RESULTS: Using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension, even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacity. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension.
CONCLUSION: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.
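The recommended random-vector baseline can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the `random_baseline` helper is hypothetical, the uniform sampling range is arbitrary (the abstract does not specify a distribution), and a one-hot matrix stands in for whichever encoding scheme is being benchmarked.

```python
import numpy as np


def random_baseline(reference_matrix, seed=0):
    """Return a random encoding matrix with the same shape as the reference.

    Assumption: values are drawn uniformly from [-1, 1); the abstract does
    not state which distribution the authors sampled from.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=reference_matrix.shape)


# Stand-in reference scheme: one-hot encoding of 20 amino acids.
reference = np.eye(20)
baseline = random_baseline(reference)

# Each amino acid should still receive a distinct vector, so the baseline
# keeps residues distinguishable while carrying no curated information.
distinct_rows = len(np.unique(baseline, axis=0))
```

Training the same architecture once with `reference` and once with `baseline` would then indicate how much of the performance comes from the information in the scheme and how much from residues merely being distinguishable.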

Keywords:  Amino acid encoding; Amino acids embedding; Convoluted-neural network (CNN); Deep-learning; HLA-II peptide interaction; Human-leukocyte antigen (HLA); Machine-learning (ML); Protein-protein interaction (PPI); Recurrent neural network (RNN)

Year:  2020        PMID: 32517697     DOI: 10.1186/s12859-020-03546-x

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


