Statistical Uncertainty Analysis for Small-Sample, High Log-Variance Data: Cautions for Bootstrapping and Bayesian Bootstrapping.

Barmak Mostofian, Daniel M Zuckerman.

Abstract

Recent advances in molecular simulations allow the evaluation of previously unattainable observables, such as rate constants for protein folding. However, these calculations are usually computationally expensive, and even significant computing resources may result in a small number of independent estimates spread over many orders of magnitude. Such small-sample, high "log-variance" data are not readily amenable to analysis using the standard uncertainty (i.e., "standard error of the mean") because unphysical negative limits of confidence intervals result. Bootstrapping, a natural alternative guaranteed to yield a confidence interval within the minimum and maximum values, also exhibits a striking systematic bias of the lower confidence limit in log space. As we show, bootstrapping artifactually assigns high probability to improbably low mean values. A second alternative, the Bayesian bootstrap strategy, does not suffer from the same deficit and is more logically consistent with the type of confidence interval desired. The Bayesian bootstrap provides uncertainty intervals that are more reliable than those from the standard bootstrap method but must be used with caution nevertheless. Neither standard nor Bayesian bootstrapping can overcome the intrinsic challenge of underestimating the mean from small-size, high log-variance samples. Our conclusions are based on extensive analysis of model distributions and reanalysis of multiple independent atomistic simulations. Although we only analyze rate constants, similar considerations will apply to related calculations, potentially including highly nonlinear averages like the Jarzynski relation.
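The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the eight data values are hypothetical (chosen only to span many orders of magnitude), the Bayesian bootstrap is assumed to be the common Dirichlet-weight formulation, and the function names, the 95% level, and the resample count are illustrative choices.

```python
import numpy as np

def bootstrap_means(data, n_resamples, rng):
    """Standard bootstrap: resample with replacement, then average each resample."""
    data = np.asarray(data, dtype=float)
    idx = rng.integers(0, data.size, size=(n_resamples, data.size))
    return data[idx].mean(axis=1)

def bayesian_bootstrap_means(data, n_resamples, rng):
    """Bayesian bootstrap (assumed form): flat Dirichlet weights on the observed values."""
    data = np.asarray(data, dtype=float)
    weights = rng.dirichlet(np.ones(data.size), size=n_resamples)
    return weights @ data

def percentile_interval(means, level=0.95):
    """Central percentile interval of a set of resampled means."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [tail, 1.0 - tail])
    return lo, hi

# Hypothetical small sample with high log-variance (NOT data from the paper).
data = np.array([1e-3, 1e-2, 1e-1, 1.0, 1e1, 1e2, 1e3, 1e4])
rng = np.random.default_rng(0)

# Standard-error interval: mean +/- 1.96 * SEM (normal approximation).
mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(data.size)
sem_lo, sem_hi = mean - 1.96 * sem, mean + 1.96 * sem

bs_lo, bs_hi = percentile_interval(bootstrap_means(data, 20000, rng))
bb_lo, bb_hi = percentile_interval(bayesian_bootstrap_means(data, 20000, rng))

print(f"sample mean        : {mean:.4g}")
print(f"SEM interval       : [{sem_lo:.4g}, {sem_hi:.4g}]")
print(f"bootstrap interval : [{bs_lo:.4g}, {bs_hi:.4g}]")
print(f"Bayesian bootstrap : [{bb_lo:.4g}, {bb_hi:.4g}]")
```

For this toy sample the standard-error interval has a negative lower limit, while both resampling intervals stay between the smallest and largest observed values. The standard bootstrap's lower limit should fall well below the Bayesian bootstrap's, because a sizable fraction of resamples omit the largest values entirely. Neither interval can account for large values that were never observed, which is the underestimation problem the abstract describes.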

Year:  2019        PMID: 31002504      PMCID: PMC6754704          DOI: 10.1021/acs.jctc.9b00015

Source DB:  PubMed          Journal:  J Chem Theory Comput        ISSN: 1549-9618            Impact factor:   6.006

