
A protein alignment scoring system sensitive at all evolutionary distances.

S F Altschul

Abstract

Protein sequence alignments generally are constructed with the aid of a "substitution matrix" that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a "log-odds" matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be derived, a substitution matrix adapted to detecting relationships at any chosen evolutionary distance can be constructed. Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information, or somewhat less than a factor of five in statistical significance. A much greater cost may be incurred, however, if only a single substitution matrix, corresponding to the wrong evolutionary distance, is employed.
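The two quantitative ideas in the abstract, a log-odds score and the conversion between bits and statistical significance, can be illustrated with a minimal sketch. It is not the paper's scoring system: the two-letter alphabet, the frequencies `p` and `q`, and the function names are hypothetical, chosen only to show that a score of the form log2(q_ab / (p_a p_b)) is measured in bits and that a cost of b bits corresponds to a factor of 2^b in significance.

```python
import math

def log_odds_bits(q, p):
    """Log-odds scores in bits: s[a][b] = log2(q[a][b] / (p[a] * p[b])).

    q holds target frequencies for aligned pairs, p holds background
    frequencies for single residues. Both are illustrative inputs.
    """
    return {a: {b: math.log2(q[a][b] / (p[a] * p[b])) for b in q[a]} for a in q}

def significance_factor(bits):
    """Factor in statistical significance corresponding to a cost in bits."""
    return 2.0 ** bits

# Hypothetical two-letter alphabet; these numbers are not from the paper.
p = {"A": 0.5, "B": 0.5}
q = {"A": {"A": 0.4, "B": 0.1},
     "B": {"A": 0.1, "B": 0.4}}

scores = log_odds_bits(q, p)
print(scores["A"]["A"])  # positive: identical pair is more frequent than chance
print(scores["A"]["B"])  # negative: mismatched pair is less frequent than chance

# "Slightly over two bits" lies between 2 bits (factor 4) and log2(5) bits.
print(significance_factor(2.0), math.log2(5))
```

Under these toy frequencies, pairs that occur more often in related sequences than by chance score positive and the others score negative. The last line shows why a cost between 2 and about 2.32 bits is consistent with a factor between four and five.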


Year:  1993        PMID: 8483166     DOI: 10.1007/bf00160485

Source DB:  PubMed          Journal:  J Mol Evol        ISSN: 0022-2844            Impact factor:   2.395


  33 in total

1.  Amino acid substitution matrices from protein blocks.

Authors:  S Henikoff; J G Henikoff
Journal:  Proc Natl Acad Sci U S A       Date:  1992-11-15       Impact factor: 11.205

2.  Amino acid sequence of a globin from the sea cucumber Caudina (Molpadia) arenicola.

Authors:  F Mauri; J Omnaas; L Davidson; C Whitfill; G B Kitto
Journal:  Biochim Biophys Acta       Date:  1991-05-30

3.  The rapid generation of mutation data matrices from protein sequences.

Authors:  D T Jones; W R Taylor; J M Thornton
Journal:  Comput Appl Biosci       Date:  1992-06

4.  Exhaustive matching of the entire protein sequence database.

Authors:  G H Gonnet; M A Cohen; S A Benner
Journal:  Science       Date:  1992-06-05       Impact factor: 47.728

5.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors:  S Karlin; S F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  1990-03       Impact factor: 11.205

6.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

7.  Profile analysis: detection of distantly related proteins.

Authors:  M Gribskov; A D McLachlan; D Eisenberg
Journal:  Proc Natl Acad Sci U S A       Date:  1987-07       Impact factor: 11.205

8.  The significance of protein sequence similarities.

Authors:  J F Collins; A F Coulson; A Lyall
Journal:  Comput Appl Biosci       Date:  1988-03

9.  On the PAM matrix model of protein evolution.

Authors:  W J Wilbur
Journal:  Mol Biol Evol       Date:  1985-09       Impact factor: 16.240

  31 in total

Review 1.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.

Authors:  A A Schäffer; L Aravind; T L Madden; S Shavirin; J L Spouge; Y I Wolf; E V Koonin; S F Altschul
Journal:  Nucleic Acids Res       Date:  2001-07-15       Impact factor: 16.971

2.  Comparative maps of human 19p13.3 and mouse chromosome 10 allow identification of sequences at evolutionary breakpoints.

Authors:  R Puttagunta; L A Gordon; G E Meyer; D Kapfhamer; J E Lamerdin; P Kantheti; K M Portman; W K Chung; D E Jenne; A S Olsen; M Burmeister
Journal:  Genome Res       Date:  2000-09       Impact factor: 9.043

3.  A strategy to retrieve the whole set of protein modules in microbial proteomes.

Authors:  Stéphanie Le Bouder-Langevin; Isabelle Capron-Montaland; Renaud De Rosa; Bernard Labedan
Journal:  Genome Res       Date:  2002-12       Impact factor: 9.043

4.  The compositional adjustment of amino acid substitution matrices.

Authors:  Yi-Kuo Yu; John C Wootton; Stephen F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  2003-12-08       Impact factor: 11.205

5.  Sequence conserved for subcellular localization.

Authors:  Rajesh Nair; Burkhard Rost
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

6.  Evolution of an autotransporter: domain shuffling and lateral transfer from pathogenic Haemophilus to Neisseria.

Authors:  J Davis; A L Smith; W R Hughes; M Golomb
Journal:  J Bacteriol       Date:  2001-08       Impact factor: 3.490

7.  Harnessing Next Generation Sequencing in Climate Change: RNA-Seq Analysis of Heat Stress-Responsive Genes in Wheat (Triticum aestivum L.).

Authors:  Ranjeet R Kumar; Suneha Goswami; Sushil K Sharma; Yugal K Kala; Gyanendra K Rai; Dwijesh C Mishra; Monendra Grover; Gyanendra P Singh; Himanshu Pathak; Anil Rai; Viswanathan Chinnusamy; Raj D Rai
Journal:  OMICS       Date:  2015-09-25

Review 8.  Protein database searches using compositionally adjusted substitution matrices.

Authors:  Stephen F Altschul; John C Wootton; E Michael Gertz; Richa Agarwala; Aleksandr Morgulis; Alejandro A Schäffer; Yi-Kuo Yu
Journal:  FEBS J       Date:  2005-10       Impact factor: 5.542

9.  Splitting the BLOSUM score into numbers of biological significance.

Authors:  Francesco Fabris; Andrea Sgarro; Alessandro Tossi
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

10.  Comparison of methods for searching protein sequence databases.

Authors:  W R Pearson
Journal:  Protein Sci       Date:  1995-06       Impact factor: 6.725
