A new algorithm for detecting low-complexity regions in protein sequences.

Sung W Shin, Sam M Kim.

Abstract

MOTIVATION: Pair-wise alignment of protein sequences and local similarity searches produce many false positives because of compositionally biased regions, also called low-complexity regions (LCRs), of amino acid residues. Masking and filtering such regions significantly improves the reliability of homology searches and, consequently, functional predictions. Most of the available algorithms are based on a statistical approach. We wished to investigate the structural properties of LCRs in biological sequences and develop an algorithm for filtering them.
RESULTS: We present an algorithm for detecting and masking LCRs in protein sequences to improve the quality of database searches. We developed the algorithm based on the complexity analysis of subsequences delimited by a pair of identical, repeating subsequences. Given a protein sequence, the algorithm first computes the suffix tree of the sequence. It then collects repeating subsequences from the tree. Finally, the algorithm iteratively tests whether each subsequence delimited by a pair of repeating subsequences meets a given criterion. Test results with 1000 proteins from 20 families in Pfam show that the repeating subsequences are a good indicator of low-complexity regions, and that the algorithm based on such structural information competes strongly with others.
AVAILABILITY: http://bioinfo.knu.ac.kr/research/CARD/
CONTACT: swshin@bioinfo.knu.ac.kr
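
The three steps described under RESULTS (index the sequence, collect repeating subsequences, test each region delimited by a pair of repeats) can be illustrated with a minimal sketch. It is not the authors' implementation: a dictionary of fixed-length k-mers stands in for the suffix tree, Shannon entropy stands in for the unspecified acceptance criterion, and the function names and the parameters `k`, `max_span`, `max_entropy`, and `mask_char` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_repeats(seq, k):
    """Map each k-mer that occurs at least twice to its start positions.

    Assumption: a plain dictionary replaces the suffix tree named in the
    abstract, so only repeats of one fixed length k are collected.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def mask_low_complexity(seq, k=2, max_span=20, max_entropy=1.5, mask_char="X"):
    """Mask regions delimited by a pair of identical k-mers when entropy is low.

    All defaults are illustrative assumptions, not published values.
    """
    masked = [False] * len(seq)
    for starts in find_repeats(seq, k).values():
        # Only consecutive occurrences of the same k-mer are paired here.
        for a, b in zip(starts, starts[1:]):
            end = b + k  # region runs from the first copy through the second
            if end - a > max_span:
                continue
            if shannon_entropy(seq[a:end]) <= max_entropy:
                for i in range(a, end):
                    masked[i] = True
    return "".join(mask_char if m else c for c, m in zip(seq, masked))


print(mask_low_complexity("MKV" + "Q" * 10 + "LTE"))  # MKVXXXXXXXXXXLTE
```

In this toy input the only repeated 2-mer is `QQ`, so the glutamine run is masked while the flanking residues are kept. A region such as `ACDEFGHIKLAC` is also delimited by a repeat (`AC`), but its composition is diverse, so it fails the entropy test and is left unmasked.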

Year:  2004        PMID: 15333459     DOI: 10.1093/bioinformatics/bth497

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  10 in total

1.  Effect of low-complexity regions on protein structure determination.

Authors:  Ryan M Bannen; Craig A Bingman; George N Phillips
Journal:  J Struct Funct Genomics       Date:  2008-02-27

2.  LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains.

Authors:  Sean M Cascarina; David C King; Erin Osborne Nishimura; Eric D Ross
Journal:  NAR Genom Bioinform       Date:  2021-05-26

3.  Understanding and identifying amino acid repeats.

Authors:  Hong Luo; Harm Nijveen
Journal:  Brief Bioinform       Date:  2014-07       Impact factor: 11.622

4.  Parasite infection of public databases: a data mining approach to identify apicomplexan contaminations in animal genome and transcriptome assemblies.

Authors:  Janus Borner; Thorsten Burmester
Journal:  BMC Genomics       Date:  2017-01-19       Impact factor: 3.969

5.  Disentangling the complexity of low complexity proteins.

Authors:  Pablo Mier; Lisanna Paladin; Stella Tamana; Sophia Petrosian; Borbála Hajdu-Soltész; Annika Urbanek; Aleksandra Gruca; Dariusz Plewczynski; Marcin Grynberg; Pau Bernadó; Zoltán Gáspári; Christos A Ouzounis; Vasilis J Promponas; Andrey V Kajava; John M Hancock; Silvio C E Tosatto; Zsuzsanna Dosztanyi; Miguel A Andrade-Navarro
Journal:  Brief Bioinform       Date:  2020-03-23       Impact factor: 11.622

6.  Atypical structural tendencies among low-complexity domains in the Protein Data Bank proteome.

Authors:  Sean M Cascarina; Mikaela R Elder; Eric D Ross
Journal:  PLoS Comput Biol       Date:  2020-01-27       Impact factor: 4.475

7.  Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions.

Authors:  Jaina Mistry; Robert D Finn; Sean R Eddy; Alex Bateman; Marco Punta
Journal:  Nucleic Acids Res       Date:  2013-04-17       Impact factor: 16.971

8.  iPDA: integrated protein disorder analyzer.

Authors:  Chung-Tsai Su; Chien-Yu Chen; Chen-Ming Hsu
Journal:  Nucleic Acids Res       Date:  2007-06-06       Impact factor: 16.971

9.  Sequence complexity of amyloidogenic regions in intrinsically disordered human proteins.

Authors:  Swagata Das; Uttam Pal; Supriya Das; Khyati Bagga; Anupam Roy; Arpita Mrigwani; Nakul C Maiti
Journal:  PLoS One       Date:  2014-03-03       Impact factor: 3.240

10.  Proteome-scale relationships between local amino acid composition and protein fates and functions.

Authors:  Sean M Cascarina; Eric D Ross
Journal:  PLoS Comput Biol       Date:  2018-09-24       Impact factor: 4.475
