Centroid estimation in discrete high-dimensional spaces with applications in biology.

Luis E Carvalho, Charles E Lawrence.

Abstract

Maximum likelihood estimators and other direct optimization-based estimators dominated statistical estimation and prediction for decades. Yet, the principled foundations supporting their dominance do not apply to the discrete high-dimensional inference problems of the 21st century. As is well known, statistical decision theory shows that maximum likelihood and related estimators use data only to identify the single most probable solution. Accordingly, unless this one solution so dominates the immense ensemble of all solutions that its probability is near one, there is no principled reason to expect such an estimator to be representative of the posterior-weighted ensemble of solutions, and thus to represent inferences drawn from the data. We employ statistical decision theory to find more representative estimators, centroid estimators, in a general high-dimensional discrete setting by using a family of loss functions with penalties that increase with the number of differences in components. We show that centroid estimates are obtained by maximizing the marginal probabilities of the solution components for unconstrained ensembles and for an important class of problems, including sequence alignment and the prediction of RNA secondary structure, whose ensembles contain exclusivity constraints. Three genomics examples are described that show that these estimators substantially improve predictions of ground-truth reference sets.
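
The contrast the abstract draws between the single most probable solution and a centroid can be illustrated with a small sketch. It is not the authors' code. It assumes the simplest case covered by the abstract: binary components, an unconstrained ensemble, and Hamming loss (the number of differing components), where maximizing the component marginals reduces to setting a component to 1 when its marginal probability exceeds 1/2. The toy posterior `TOY` and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: binary components, unconstrained ensemble,
# Hamming loss. TOY and every function name here are hypothetical.

def map_estimate(ensemble):
    """Return the single most probable configuration."""
    return max(ensemble, key=ensemble.get)


def marginals(ensemble):
    """Return the marginal probability that each component equals 1."""
    n = len(next(iter(ensemble)))
    totals = [0.0] * n
    for config, prob in ensemble.items():
        for i, bit in enumerate(config):
            if bit:
                totals[i] += prob
    return totals


def centroid_estimate(ensemble):
    """Set each component to 1 when its marginal probability exceeds 1/2."""
    return tuple(1 if m > 0.5 else 0 for m in marginals(ensemble))


def expected_hamming_loss(estimate, ensemble):
    """Posterior-weighted number of components where the estimate is wrong."""
    return sum(
        prob * sum(a != b for a, b in zip(estimate, config))
        for config, prob in ensemble.items()
    )


# Toy posterior over three binary components (probabilities sum to 1).
TOY = {
    (0, 0, 0): 0.30,
    (1, 0, 0): 0.10,
    (1, 1, 0): 0.20,
    (1, 0, 1): 0.20,
    (1, 1, 1): 0.20,
}

if __name__ == "__main__":
    for name, est in [("MAP", map_estimate(TOY)),
                      ("centroid", centroid_estimate(TOY))]:
        print(name, est, expected_hamming_loss(est, TOY))
```

In this toy ensemble the most probable configuration holds only 30% of the posterior mass, and the marginal-based estimate has a lower expected Hamming loss. Ensembles with exclusivity constraints, such as alignments or RNA secondary structures, need the additional treatment described in the paper and are not handled by this sketch.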

Year: 2008    PMID: 18305160    PMCID: PMC2265131    DOI: 10.1073/pnas.0712329105

Source DB: PubMed    Journal: Proc Natl Acad Sci U S A    ISSN: 0027-8424    Impact factor: 11.205


