
Large-scale comparison of protein sequence alignment algorithms with structure alignments.

J M Sauder, J W Arthur, R L Dunbrack.

Abstract

Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet the alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily- and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP, using up to four iterations. The resulting position-specific scoring matrix for each query was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's and therefore correctly align a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared one or more hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program.
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. Copyright 2000 Wiley-Liss, Inc.
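
The abstract contrasts two accuracy measures. The first is per-residue accuracy: how many of the residue pairs a sequence alignment proposes are correct. The second is the fraction of the structure-aligned residue pairs that the alignment reproduces. A minimal Python sketch of both is shown below. It assumes that each alignment is reduced to a set of aligned residue index pairs. It also assumes that a proposed pair counts as correct only if the reference structure alignment (e.g., CE) contains exactly the same pair. The abstract does not spell out the scoring rules, and the names aligned_pairs and alignment_scores are hypothetical. The same comparison would apply when one structure alignment (FSSP) is scored against another (CE).

```python
from typing import Set, Tuple

# An alignment is reduced to the set of (residue index in A, residue index in B)
# pairs it places in the same column; gapped columns contribute no pair.
Pair = Tuple[int, int]


def aligned_pairs(row_a: str, row_b: str, start_a: int = 0, start_b: int = 0) -> Set[Pair]:
    """Convert two gapped alignment rows ('-' marks a gap) into aligned residue pairs.

    start_a/start_b give the 0-based index of the first residue in each row, so a
    local alignment (e.g. a BLAST hit) can be mapped onto full-length numbering.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i, j = start_a, start_b
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_scores(test: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Score a sequence alignment against a reference structure alignment.

    Returns (accuracy, coverage):
      accuracy = correct pairs / pairs in the test alignment (per-residue quality)
      coverage = correct pairs / pairs in the reference structure alignment
    """
    correct = len(test & reference)
    accuracy = correct / len(test) if test else 0.0
    coverage = correct / len(reference) if reference else 0.0
    return accuracy, coverage


# Toy reference (structure-derived) alignment with one gap ...
reference = aligned_pairs("ACDEFG", "ACD-FG")
# ... and a shorter local alignment starting at residue 1 of both sequences.
local = aligned_pairs("CDEF", "CD-F", start_a=1, start_b=1)
print(alignment_scores(local, reference))  # (1.0, 0.6)
```

In this toy case, the local alignment is entirely correct but recovers only 3 of the 5 reference pairs. A method can therefore have good per-residue accuracy yet correctly align a small fraction of the structure-aligned residues.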

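The intermediate-sequence linking described in the abstract can be sketched in Python. The sketch assumes that each PSI-BLAST hit is stored as a set of (SCOP residue index, nr residue index) pairs, keyed by a hypothetical nr identifier. Suppose residue i of SCOP sequence A and residue j of SCOP sequence B are both aligned to the same residue of a shared nr hit. Then i and j are taken as aligned. Among all shared hits, the intermediate yielding the most aligned pairs is kept, which is one reading of "longest". The function names, data layout, and tie-breaking by identifier order are assumptions rather than the authors' code, and E-value filtering is omitted.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]
# Hypothetical layout: hits[scop_id][nr_id] = {(scop residue, nr residue), ...}
Hits = Dict[str, Dict[str, Set[Pair]]]


def align_via_intermediate(a_to_x: Set[Pair], b_to_x: Set[Pair]) -> Set[Pair]:
    """Combine alignments A->X and B->X into an A->B alignment through X."""
    # Invert B->X so each residue of the intermediate X points back to B.
    x_to_b = {x: b for b, x in b_to_x}
    # Keep A residues whose partner in X is also aligned to some B residue.
    return {(a, x_to_b[x]) for a, x in a_to_x if x in x_to_b}


def best_iss_alignments(hits: Hits, superfamily: Dict[str, str]) -> Dict[Tuple[str, str], Set[Pair]]:
    """For each same-superfamily SCOP pair sharing an nr hit, keep the longest ISS alignment."""
    results: Dict[Tuple[str, str], Set[Pair]] = {}
    for a, b in combinations(sorted(hits), 2):
        if superfamily[a] != superfamily[b]:
            continue  # only pairs within the same superfamily are evaluated
        best: Set[Pair] = set()
        # Sorting makes the choice deterministic when two intermediates tie.
        for x in sorted(hits[a].keys() & hits[b].keys()):
            candidate = align_via_intermediate(hits[a][x], hits[b][x])
            if len(candidate) > len(best):
                best = candidate
        if best:
            results[(a, b)] = best
    return results


hits: Hits = {
    "d1": {"nrX": {(0, 0), (1, 1), (2, 2), (3, 3)}, "nrY": {(0, 5)}},
    "d2": {"nrX": {(0, 1), (1, 2), (2, 3)}, "nrY": {(3, 5)}},
    "d3": {"nrX": {(0, 0)}},
}
superfamily = {"d1": "sfA", "d2": "sfA", "d3": "sfB"}
print(best_iss_alignments(hits, superfamily))
```

Here d1 and d2 share nrX and nrY. Routing through nrX gives three aligned pairs, versus one through nrY, so nrX is kept. Pairs involving d3 are skipped because d3 belongs to another superfamily.
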
Year:  2000        PMID: 10813826     DOI: 10.1002/(sici)1097-0134(20000701)40:1<6::aid-prot30>3.0.co;2-7

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585