A novel complexity measure for comparative analysis of protein sequences from complete genomes.

Tannistha Nandi, Debasis Dash, Rohit Ghai, Chandrika B-Rao, K Kannan, Samir K Brahmachari, C Ramakrishnan, Srinivasan Ramachandran.

Abstract

Analysis of sequence complexities of proteins is an important step in the characterization and classification of new genomes. A new measure has been proposed to compute sequence complexity in protein sequences based on linguistic complexity. The algorithm requires a single parameter, is computationally simple and provides a framework for comparative genomic analysis. Protein sequences were classified into groups of high or low complexity based on a quantitative measure termed F(c), which is proportional to the fraction of low complexity sequence present in the protein. The algorithm was tested on sequences of 196 non-homologous proteins whose crystal structures are available at ≤2.0 Å resolution. Protein sequences of high complexity had 'globular' structures (95% agreement), whereas those of low complexity had non-globular structures (80% agreement). Application of this measure to proteins of unknown structure/function from different genomes revealed that the sequences of high complexity constitute the majority in all genomes (about 90% in Archaea, about 93% in Eubacteria, 89% in Saccharomyces cerevisiae and 90% in Caenorhabditis elegans). Aeropyrum pernix among Archaea and Deinococcus radiodurans among Eubacteria have the lowest fraction of high complexity proteins (75% and 80% respectively). Further, it was observed that a few bacterial pathogens (Mycobacterium tuberculosis, Pseudomonas aeruginosa) have a high fraction of low complexity proteins. The program ScanCom is available from the authors as a PERL script (UNIX system).
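The abstract does not give the formula for F(c) or the details of ScanCom, so the following Python sketch is only a generic illustration of the idea it describes: scoring linguistic complexity and estimating the fraction of a protein covered by low complexity sequence. The function names, the window length, the k-mer range, and the threshold are all assumptions chosen for illustration, not the authors' parameters, and the result should not be read as the published F(c).

```python
def linguistic_complexity(seq: str, max_k: int = 3, alphabet_size: int = 20) -> float:
    """Distinct k-mers observed divided by the maximum possible, pooled over k = 1..max_k.

    A generic linguistic-complexity score, not the measure defined in the paper.
    """
    n = len(seq)
    observed = 0
    possible = 0
    for k in range(1, max_k + 1):
        if n < k:
            break
        # Number of distinct substrings of length k actually present.
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # Upper bound: limited by the alphabet and by the number of positions.
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible if possible else 0.0


def low_complexity_fraction(seq: str, window: int = 20,
                            threshold: float = 0.6, max_k: int = 3) -> float:
    """Fraction of residues lying in at least one window scoring below `threshold`.

    `window` and `threshold` are hypothetical values picked for this sketch.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    if n < window:
        # Sequence shorter than one window: score it as a whole.
        return 1.0 if linguistic_complexity(seq, max_k) < threshold else 0.0
    covered = [False] * n
    for start in range(n - window + 1):
        if linguistic_complexity(seq[start:start + window], max_k) < threshold:
            for i in range(start, start + window):
                covered[i] = True
    return sum(covered) / n


if __name__ == "__main__":
    diverse = "ACDEFGHIKLMNPQRSTVWY"   # every residue type once
    repeat = "A" * 20                  # homopolymer run
    for name, s in [("diverse", diverse), ("repeat", repeat),
                    ("mixed", diverse + repeat)]:
        print(name, round(low_complexity_fraction(s), 3))
```

A sequence could then be labelled high or low complexity by comparing this fraction with a cutoff; the abstract does not state the cutoff used, so any value chosen here would be a further assumption.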

Year:  2003        PMID: 12643768     DOI: 10.1080/07391102.2003.10506882

Source DB:  PubMed          Journal:  J Biomol Struct Dyn        ISSN: 0739-1102


  8 in total

1.  Global analysis of predicted proteomes: functional adaptation of physical properties.

Authors:  Christopher G Knight; Rees Kassen; Holger Hebestreit; Paul B Rainey
Journal:  Proc Natl Acad Sci U S A       Date:  2004-05-18       Impact factor: 11.205

2.  Effect of low-complexity regions on protein structure determination.

Authors:  Ryan M Bannen; Craig A Bingman; George N Phillips
Journal:  J Struct Funct Genomics       Date:  2008-02-27

3.  LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains.

Authors:  Sean M Cascarina; David C King; Erin Osborne Nishimura; Eric D Ross
Journal:  NAR Genom Bioinform       Date:  2021-05-26

4.  Composition-modified matrices improve identification of homologs of Saccharomyces cerevisiae low-complexity glycoproteins.

Authors:  Juan E Coronado; Oliver Attie; Susan L Epstein; Wei-Gang Qiu; Peter N Lipke
Journal:  Eukaryot Cell       Date:  2006-04

5.  SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks.

Authors:  Gaurav Sachdeva; Kaushal Kumar; Preti Jain; Srinivasan Ramachandran
Journal:  Bioinformatics       Date:  2004-09-16       Impact factor: 6.937

6.  Simple sequence proteins in prokaryotic proteomes.

Authors:  Mekapati Bala Subramanyam; Muthiah Gnanamani; Srinivasan Ramachandran
Journal:  BMC Genomics       Date:  2006-06-08       Impact factor: 3.969

7.  Atypical structural tendencies among low-complexity domains in the Protein Data Bank proteome.

Authors:  Sean M Cascarina; Mikaela R Elder; Eric D Ross
Journal:  PLoS Comput Biol       Date:  2020-01-27       Impact factor: 4.475

8.  Proteome-scale relationships between local amino acid composition and protein fates and functions.

Authors:  Sean M Cascarina; Eric D Ross
Journal:  PLoS Comput Biol       Date:  2018-09-24       Impact factor: 4.475
