Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability.

Richa Mudgal, Ramanathan Sowdhamini, Nagasuma Chandra, Narayanaswamy Srinivasan, Sankaran Sandhya.

Abstract

Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another additionally containing designed sequences. Searches in the augmented database showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our findings, which are well supported statistically, show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold.
Copyright © 2013 Elsevier Ltd. All rights reserved.
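
The abstract names a roulette wheel-based method but does not give its details. As a rough illustration only, the following Python sketch shows generic roulette wheel (weight-proportionate) selection applied column by column to residue frequencies. The toy profile, the function names `roulette_pick` and `design_sequence`, and the assumption that each aligned position is reduced to a single frequency table are all hypothetical and are not taken from the paper.

```python
import random
from itertools import accumulate


def roulette_pick(weights, rng):
    """Pick one key from `weights` with probability proportional to its weight."""
    residues = list(weights)
    # Cumulative sums mark the end of each residue's slice of the wheel.
    cumulative = list(accumulate(weights[r] for r in residues))
    spin = rng.random() * cumulative[-1]
    for residue, bound in zip(residues, cumulative):
        if spin < bound:
            return residue
    return residues[-1]  # guard against floating-point edge cases


def design_sequence(profile, rng):
    """Sample one residue per column to build a single artificial sequence."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Hypothetical three-column profile (assumed values, for illustration only).
toy_profile = [
    {"A": 0.7, "G": 0.2, "S": 0.1},
    {"L": 0.5, "I": 0.3, "V": 0.2},
    {"K": 1.0},
]

rng = random.Random(0)  # fixed seed so repeated runs give the same sequences
designed = [design_sequence(toy_profile, rng) for _ in range(5)]
print(designed)
```

Residues with higher frequency in a column are chosen more often, but rarer residues still appear occasionally, which is how such a sampler can produce many distinct sequences from one profile.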

Keywords:  HMM; PSSM; hidden Markov model; in silico protein design; position-specific scoring matrix; protein evolution; remote homology detection; sequence analysis

Year:  2013        PMID: 24316367     DOI: 10.1016/j.jmb.2013.11.026

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


  7 in total

1.  Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection.

Authors:  Gayatri Kumar; Narayanaswamy Srinivasan; Sankaran Sandhya
Journal:  Methods Mol Biol       Date:  2022

2.  NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection.

Authors:  Richa Mudgal; Sankaran Sandhya; Gayatri Kumar; Ramanathan Sowdhamini; Nagasuma R Chandra; Narayanaswamy Srinivasan
Journal:  Nucleic Acids Res       Date:  2014-09-27       Impact factor: 16.971

3.  Master Blaster: an approach to sensitive identification of remotely related proteins.

Authors:  Chintalapati Janaki; Venkatraman S Gowri; Narayanaswamy Srinivasan
Journal:  Sci Rep       Date:  2021-04-22       Impact factor: 4.379

4.  Srinivasan (1962-2021) in Bioinformatics and beyond.

Authors:  M Michael Gromiha; Christine Orengo; Ramanathan Sowdhamini; Janet Thornton
Journal:  Bioinformatics       Date:  2022-02-03       Impact factor: 6.937

5.  Diversity and prevalence of ANTAR RNAs across actinobacteria.

Authors:  Dolly Mehta; Arati Ramesh
Journal:  BMC Microbiol       Date:  2021-05-29       Impact factor: 3.605

6.  De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods.

Authors:  Richa Mudgal; Sankaran Sandhya; Nagasuma Chandra; Narayanaswamy Srinivasan
Journal:  Biol Direct       Date:  2015-07-31       Impact factor: 4.540

7.  Use of designed sequences in protein structure recognition.

Authors:  Gayatri Kumar; Richa Mudgal; Narayanaswamy Srinivasan; Sankaran Sandhya
Journal:  Biol Direct       Date:  2018-05-09       Impact factor: 4.540
