
A novel sequence alignment algorithm based on deep learning of the protein folding code.

Mu Gao, Jeffrey Skolnick.

Abstract

MOTIVATION: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the 'twilight zone' of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent 'd'). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.
RESULTS: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.
AVAILABILITY AND IMPLEMENTATION: Datasets and source code of SAdLSA are available free of charge for academic users at http://sites.gatech.edu/cssb/sadlsa/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
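
The 'twilight zone' mentioned in the abstract refers to pairs of proteins whose sequence identity is too low for reliable homology detection. As a minimal illustration only (this is not part of SAdLSA), the sketch below computes percent identity over the aligned columns of a pairwise alignment. The function names, the gap character, and the 20 to 35% range used for the twilight zone are assumptions made here for the example; the abstract does not state a threshold.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Hypothetical helper for illustration; other identity definitions
    (e.g. normalizing by the shorter sequence length) are also in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned_cols = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        aligned_cols += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned_cols if aligned_cols else 0.0


def in_twilight_zone(identity_pct: float, low: float = 20.0, high: float = 35.0) -> bool:
    """True if identity falls in an assumed twilight-zone range (bounds are assumptions)."""
    return low <= identity_pct <= high


if __name__ == "__main__":
    # Toy alignment, not real protein data.
    a = "MKT-AYIAKQR"
    b = "MRTLAW-GKHR"
    pid = percent_identity(a, b)
    print(f"identity = {pid:.1f}%, twilight zone: {in_twilight_zone(pid)}")
```

Pairs in or below such a range are the cases where, according to the abstract, structure-informed alignment is expected to help most.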

Year:  2021        PMID: 32960943     DOI: 10.1093/bioinformatics/btaa810

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Local Alignment of DNA Sequence Based on Deep Reinforcement Learning.

Authors:  Yong-Joon Song; Dong-Ho Cho
Journal:  IEEE Open J Eng Med Biol       Date:  2021-04-27

2.  DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins.

Authors:  Sutanu Bhattacharya; Rahmatullah Roche; Bernard Moussad; Debswapna Bhattacharya
Journal:  Proteins       Date:  2021-10-11

3.  Contrastive learning on protein embeddings enlightens midnight zone.

Authors:  Michael Heinzinger; Maria Littmann; Ian Sillitoe; Nicola Bordin; Christine Orengo; Burkhard Rost
Journal:  NAR Genom Bioinform       Date:  2022-06-11

4.  High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function.

Authors:  Mu Gao; Peik Lund-Andersen; Alex Morehead; Sajid Mahmud; Chen Chen; Xiao Chen; Nabin Giri; Raj S Roy; Farhan Quadir; T Chad Effler; Ryan Prout; Subil Abraham; Wael Elwasif; N Quentin Haas; Jeffrey Skolnick; Jianlin Cheng; Ada Sedova
Journal:  Workshop Mach Learn HPC Environ       Date:  2021-12-27

5.  The role of local versus nonlocal physicochemical restraints in determining protein native structure. (Review)

Authors:  Jeffrey Skolnick; Mu Gao
Journal:  Curr Opin Struct Biol       Date:  2020-10-28       Impact factor: 7.786
