The relative inefficiency of sequence weights approaches in determining a nucleotide position weight matrix.

Lee A Newberg, Lee Ann McCue, Charles E Lawrence.

Abstract

Approaches based upon sequence weights are a popular way to construct a nucleotide position weight matrix from aligned inputs, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these, we find that approaches based upon sequence weights can perform very poorly in comparison to approaches based upon a theoretically optimal maximum-likelihood method in the inference of the parameters of a position weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Applying this ordering with the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that using weighted estimators on these data seriously limits the benefit of sequencing more than two or three additional species.
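The weighted estimator and the greedy species ordering described in the abstract can be illustrated with a minimal Python sketch. It assumes an F81-style substitution model, under which the base indicators at two leaves separated by tree path length d have correlation exp(-d); the covariance of the indicators for base b is then p_b(1 - p_b)R with R_ij = exp(-d_ij). Minimizing the sum of the variances subject to the weights summing to one gives the standard generalized-least-squares weights w = R^-1 1 / (1'R^-1 1), with minimized variance p_b(1 - p_b)/(1'R^-1 1). This is a textbook result under that assumption, not necessarily the derivation used in the paper. The tree, species names, sequences and helper names (`optimal_weights`, `greedy_order`, `weighted_pwm`) are hypothetical, and the maximum-likelihood comparison (including the 51% figure) is not reproduced.

```python
import math

BASES = "ACGT"

# Hypothetical four-species tree (not from the paper): A, B and C hang off an
# internal node by branches of length 0.05; that node and species D attach to
# the root by branches of length 0.5, so d(A,B) = 0.1 and d(A,D) = 1.05.
LEAVES = ["A", "B", "C", "D"]
PAIRS = {("A", "B"): 0.1, ("A", "C"): 0.1, ("B", "C"): 0.1,
         ("A", "D"): 1.05, ("B", "D"): 1.05, ("C", "D"): 1.05}


def dist(a, b):
    """Path length between two leaves of the hypothetical tree."""
    if a == b:
        return 0.0
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def correlation(leaves):
    """Assumed F81-style model: corr of base indicators at leaves i, j is exp(-d_ij)."""
    return [[math.exp(-dist(a, b)) for b in leaves] for a in leaves]


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def optimal_weights(leaves):
    """Minimum-variance weights w = R^-1 1 / (1' R^-1 1) and n_eff = 1' R^-1 1."""
    u = solve(correlation(leaves), [1.0] * len(leaves))
    n_eff = sum(u)
    return [x / n_eff for x in u], n_eff


def equal_weight_factor(leaves):
    """Var(p_hat_b) / (p_b (1 - p_b)) when every sequence gets weight 1/n."""
    n = len(leaves)
    return sum(sum(row) for row in correlation(leaves)) / n ** 2


def greedy_order(leaves):
    """Repeatedly add the species that most increases n_eff (most reduces variance)."""
    chosen, remaining = [], list(leaves)
    while remaining:
        best = max(remaining, key=lambda s: optimal_weights(chosen + [s])[1])
        chosen.append(best)
        remaining.remove(best)
    return chosen


def weighted_pwm(seqs, weights):
    """Per-column weighted base frequencies: sum of w_i over sequences with base b."""
    pwm = []
    for col in range(len(seqs[0])):
        freqs = {b: 0.0 for b in BASES}
        for seq, w in zip(seqs, weights):
            freqs[seq[col]] += w
        pwm.append(freqs)
    return pwm


if __name__ == "__main__":
    w, n_eff = optimal_weights(LEAVES)
    print("weights:", {s: round(x, 3) for s, x in zip(LEAVES, w)})
    print("variance factor, optimal:", round(1 / n_eff, 3),
          "equal weights:", round(equal_weight_factor(LEAVES), 3))
    print("greedy order:", greedy_order(LEAVES))
    # Hypothetical aligned sites, one sequence per leaf, in LEAVES order.
    print(weighted_pwm(["ACGT", "ACGA", "ACGT", "TCGA"], w)[0])
```

With these toy numbers the distant species D receives the largest weight and is picked second by the greedy ordering, while the near-duplicates B and C each add little once A is present. Because 1'R^-1 1 is the "effective number of sequences" under the assumed model, the greedy curve of 1/(1'R^-1 1) against the number of species sequenced is one way to visualize the kind of plateau the abstract describes.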

Year: 2005; PMID: 16646830; PMCID: PMC1479456; DOI: 10.2202/1544-6115.1135

Source DB: PubMed; Journal: Stat Appl Genet Mol Biol; ISSN: 1544-6115


References (19 in total; the first 10 are listed):

1.  The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons.

Authors:  Nikolaus Rajewsky; Nicholas D Socci; Martin Zapotocky; Eric D Siggia
Journal:  Genome Res       Date:  2002-02       Impact factor: 9.043

2.  Optimal classification of protein sequences and selection of representative sets from multiple alignments: application to homologous families and lessons for structural genomics.

Authors:  A C May
Journal:  Protein Eng       Date:  2001-04

3.  Additivity in protein-DNA interactions: how good an approximation is it?

Authors:  Panayiotis V Benos; Martha L Bulyk; Gary D Stormo
Journal:  Nucleic Acids Res       Date:  2002-10-15       Impact factor: 16.971

4.  An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences.

Authors:  C E Lawrence; A A Reilly
Journal:  Proteins       Date:  1990

5.  Weighting aligned protein or nucleic acid sequences to correct for unequal representation.

Authors:  P R Sibbald; P Argos
Journal:  J Mol Biol       Date:  1990-12-20       Impact factor: 5.469

6.  Weights for data related by a tree.

Authors:  S F Altschul; R J Carroll; D J Lipman
Journal:  J Mol Biol       Date:  1989-06-20       Impact factor: 5.469

7.  A fast and sensitive multiple sequence alignment algorithm.

Authors:  M Vingron; P Argos
Journal:  Comput Appl Biosci       Date:  1989-04

8.  Molecular phylogeny of Old World monkeys (Cercopithecidae) as inferred from gamma-globin DNA sequences.

Authors:  S L Page; Ch Chiu; M Goodman
Journal:  Mol Phylogenet Evol       Date:  1999-11       Impact factor: 4.286

9.  Phylogenetic shadowing of primate sequences to find functional regions of the human genome.

Authors:  Dario Boffelli; Jon McAuliffe; Dmitriy Ovcharenko; Keith D Lewis; Ivan Ovcharenko; Lior Pachter; Edward M Rubin
Journal:  Science       Date:  2003-02-28       Impact factor: 47.728

10.  Factors influencing the identification of transcription factor binding sites by cross-species comparison.

Authors:  Lee Ann McCue; William Thompson; C Steven Carmack; Charles E Lawrence
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

Cited by (3 in total):

1.  A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction.

Authors:  Lee A Newberg; William A Thompson; Sean Conlan; Thomas M Smith; Lee Ann McCue; Charles E Lawrence
Journal:  Bioinformatics       Date:  2007-05-08       Impact factor: 6.937

2.  Constructing a meaningful evolutionary average at the phylogenetic center of mass.

Authors:  Eric A Stone; Arend Sidow
Journal:  BMC Bioinformatics       Date:  2007-06-26       Impact factor: 3.169

3.  Phylogenetic weighting does little to improve the accuracy of evolutionary coupling analyses.

Authors:  Adam J Hockenberry; Claus O Wilke
Journal:  Entropy (Basel)       Date:  2019-10-12       Impact factor: 2.524
