
Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

S Karlin, S F Altschul.

Abstract

An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they can reflect nucleotide or amino acid similarity measured in a wide variety of ways. Using an appropriate random model, we present a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score. A second class of results describes the composition of high-scoring segments. In certain contexts, these permit the choice of scoring systems which are "optimal" for distinguishing biologically relevant patterns. Examples are given of applications of the theory to a variety of protein sequences, highlighting segments with unusual biological features. These include distinctive charge regions in transcription factors and protooncogene products, pronounced hydrophobic segments in various receptor and transport proteins, and statistically significant subalignments involving the recently characterized cystic fibrosis gene.
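The abstract does not state the formulas themselves. As a minimal sketch of the kind of calculation it describes, the Python below scans a scored sequence for its highest-scoring segment and converts that score to an approximate chance probability. It assumes independent residues, a negative expected score per residue, and at least one positive score. The function names are illustrative, and the constant `K` is treated here as a supplied input rather than derived.

```python
import math


def max_segment_score(scores):
    """Highest aggregate score over all contiguous segments (single linear scan)."""
    best = running = 0.0
    for s in scores:
        running = max(0.0, running + s)  # restart when the running sum drops below zero
        best = max(best, running)
    return best


def solve_lambda(score_probs, tol=1e-12):
    """Positive root of sum(p * exp(lam * s)) = 1, found by bisection.

    score_probs maps each possible score to its probability under the
    random model (an assumed input, not taken from the abstract).
    """
    mean = sum(p * s for s, p in score_probs.items())
    if mean >= 0 or max(score_probs) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for s, p in score_probs.items()) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) < 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def segment_p_value(score, n, lam, K):
    """Approximate chance probability of a segment scoring at least `score`
    in a random sequence of length n: 1 - exp(-K * n * exp(-lam * score)).
    K is assumed to be known for the scoring scheme.
    """
    return -math.expm1(-K * n * math.exp(-lam * score))
```

For a toy scheme that scores +1 with probability 0.25 and -1 with probability 0.75, `solve_lambda({1: 0.25, -1: 0.75})` converges to ln 3, which can be checked by hand from the defining equation.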


Year:  1990        PMID: 2315319      PMCID: PMC53667          DOI: 10.1073/pnas.87.6.2264

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


