
Accuracy estimation and parameter advising for protein multiple sequence alignment.

John Kececioglu, Dan DeBlasio.

Abstract

We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality.
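The two ideas in the abstract, an estimator built from alignment features and an advisor that picks the parameter choice whose alignment scores highest under that estimator, can be illustrated with a minimal sketch. It assumes each alignment has already been reduced to a vector of feature values, and it uses a degree-1 polynomial (a plain weighted sum of features) whose coefficients are fitted by ordinary least squares against known true accuracies. Least squares is a simpler, swapped-in technique, not the regression formulations developed for Facet, and the function names and parameter labels (`fit_linear_estimator`, `advise`, `"gap=5"`) are hypothetical.

```python
from typing import Dict, List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_linear_estimator(features: Sequence[Sequence[float]],
                         accuracies: Sequence[float],
                         ridge: float = 1e-9) -> List[float]:
    """Fit coefficients c minimizing sum_j (c . f_j - accuracy_j)^2.

    Uses the normal equations (F^T F + ridge * I) c = F^T a; the tiny ridge
    term only guards against a singular matrix.
    """
    k = len(features[0])
    ftf = [[sum(f[p] * f[q] for f in features) + (ridge if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    fta = [sum(f[p] * a for f, a in zip(features, accuracies)) for p in range(k)]
    return solve(ftf, fta)


def estimate(coeffs: Sequence[float], feats: Sequence[float]) -> float:
    """Estimated accuracy: weighted sum of an alignment's feature values."""
    return sum(c * f for c, f in zip(coeffs, feats))


def advise(coeffs: Sequence[float],
           candidates: Dict[str, Sequence[float]]) -> str:
    """Return the parameter choice whose alignment has the highest estimate.

    candidates maps each (hypothetical) parameter label to the feature vector
    of the alignment computed with that parameter choice.
    """
    return max(candidates, key=lambda p: estimate(coeffs, candidates[p]))


if __name__ == "__main__":
    # Toy training data: two features per alignment, with known true accuracy.
    train_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    train_acc = [0.6, 0.4, 1.0, 0.5]
    c = fit_linear_estimator(train_feats, train_acc)
    print(advise(c, {"gap=10": [0.2, 0.9], "gap=5": [0.8, 0.3]}))
```

On the toy data the fitted weights come out at about 0.6 and 0.4, so the advisor prefers `"gap=5"`, whose alignment has an estimated accuracy of about 0.60 against about 0.48 for `"gap=10"`. The advisor never consults a reference alignment; it relies only on the learned estimator, which is the property the abstract emphasizes.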

Year:  2013        PMID: 23489379      PMCID: PMC3619150          DOI: 10.1089/cmb.2013.0007

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  5 in total

1.  Adaptive Local Realignment of Protein Sequences.

Authors:  Dan DeBlasio; John Kececioglu
Journal:  J Comput Biol       Date:  2018-06-11       Impact factor: 1.479

2.  GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters.

Authors:  Itamar Sela; Haim Ashkenazy; Kazutaka Katoh; Tal Pupko
Journal:  Nucleic Acids Res       Date:  2015-04-16       Impact factor: 16.971

3.  Core column prediction for protein multiple sequence alignments.

Authors:  Dan DeBlasio; John Kececioglu
Journal:  Algorithms Mol Biol       Date:  2017-04-19       Impact factor: 1.405

4.  LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation.

Authors:  Emanuel Maldonado; Agostinho Antunes
Journal:  BMC Bioinformatics       Date:  2019-12-30       Impact factor: 3.169

5.  Automating parameter selection to avoid implausible biological pathway models.

Authors:  Chris S Magnano; Anthony Gitter
Journal:  NPJ Syst Biol Appl       Date:  2021-02-23