
An assessment of substitution scores for protein profile-profile comparison.

Xugang Ye, Guoli Wang, Stephen F Altschul.

Abstract

MOTIVATION: Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied.
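The evaluation measure described above can be illustrated with a minimal sketch. It assumes we already have two lists of substitution scores for one profile column: scores against related columns and scores against unrelated columns. The function name, the input layout, and the choice to count ties as one half are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def prob_related_scores_higher(related_scores, unrelated_scores):
    """Estimate P(score vs. related column > score vs. unrelated column).

    Every (related, unrelated) pair of scores is compared once.
    Ties count as 0.5 (an assumption, not stated in the abstract).
    """
    if not related_scores or not unrelated_scores:
        raise ValueError("both score lists must be non-empty")

    wins = 0.0
    for r, u in product(related_scores, unrelated_scores):
        if r > u:
            wins += 1.0
        elif r == u:
            wins += 0.5
    return wins / (len(related_scores) * len(unrelated_scores))


# Hypothetical scores for one profile column.
p_hat = prob_related_scores_higher([3.0, 1.0], [2.0, 0.0])
```

A value near 1 would mean the scoring system almost always ranks related columns above unrelated ones; a value near 0.5 would mean it does no better than chance. Because only column-to-column scores are compared, no gap or offset costs need to be fitted.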
RESULTS: A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales.
CONTACT: altschul@ncbi.nlm.nih.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
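One plausible reading of the "symmetrization of mean profile-sequence scores" named under RESULTS is sketched below. It assumes each profile column is a probability distribution over residues, that the profile-sequence score of residue a against column p is the log-odds log(p(a)/b(a)) for background frequencies b, and that the two directional means are simply averaged. All names, the two-letter toy alphabet, and the exact formula are illustrative assumptions; the abstract does not give the definition.

```python
import math


def mean_profile_sequence_score(p, q, background):
    """Mean log-odds score of column p against residues drawn from column q.

    Assumes every probability is strictly positive (for example,
    after pseudocounts have been applied).
    """
    return sum(q[a] * math.log(p[a] / background[a]) for a in background)


def symmetrized_score(p, q, background):
    """Average the two directional means so that score(p, q) == score(q, p)."""
    return 0.5 * (
        mean_profile_sequence_score(p, q, background)
        + mean_profile_sequence_score(q, p, background)
    )


# Toy two-letter alphabet; real profiles would use the 20 amino acids.
background = {"A": 0.5, "B": 0.5}
col_p = {"A": 0.8, "B": 0.2}
col_q = {"A": 0.2, "B": 0.8}

similar = symmetrized_score(col_p, col_p, background)
dissimilar = symmetrized_score(col_p, col_q, background)
```

Under these assumptions, a column scored against itself yields its relative entropy with respect to the background, which is non-negative, while two columns with opposing residue preferences receive a negative score.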

Year: 2011   PMID: 21998158   PMCID: PMC3232366   DOI: 10.1093/bioinformatics/btr565

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


