Kathleen E Lotterhos, Jason H Moore, Ann E Stapleton.
Abstract
Increasingly complex statistical models are being used for the analysis of biological data. Recent commentary has focused on the ability to compute the same outcome for a given dataset (reproducibility). We argue that a reproducible statistical analysis is not necessarily valid because of unique patterns of nonindependence in every biological dataset. We advocate that analyses should be evaluated with known-truth simulations that capture biological reality, a process we call "analysis validation." We review the process of validation and suggest criteria that a validation project should meet. We find that different fields of science have historically failed to meet all criteria, and we suggest ways to implement meaningful validation in training and practice.
Year: 2018 PMID: 30532167 PMCID: PMC6301703 DOI: 10.1371/journal.pbio.3000070
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Flow chart showing some of the key steps in constructing simulations and validating data analysis methods.
PRC, precision-recall curve.
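
As a rough illustration of the idea in the abstract and the Fig 1 caption, the sketch below simulates a dataset in which the truly causal features are known, scores every feature with a simple analysis, and summarizes performance as a precision-recall curve. It is a toy under stated assumptions, not the authors' procedure: the function names, the shared per-sample factor used to induce nonindependence among features, the absolute-correlation score, and all parameter values are hypothetical choices made for this example.

```python
import random


def simulate_known_truth(n_samples=200, n_features=50, n_causal=5,
                         effect=1.0, shared_sd=0.5, seed=1):
    """Simulate features and an outcome where the causal features are known.

    Assumption for illustration: a shared per-sample factor is added to every
    feature, so features are correlated with each other (nonindependence).
    Only the first n_causal features actually drive the outcome.
    """
    rng = random.Random(seed)
    features, outcome = [], []
    for _ in range(n_samples):
        shared = rng.gauss(0, shared_sd)
        row = [shared + rng.gauss(0, 1) for _ in range(n_features)]
        y = effect * sum(row[:n_causal]) + rng.gauss(0, 1)
        features.append(row)
        outcome.append(y)
    truth = [j < n_causal for j in range(n_features)]
    return features, outcome, truth


def abs_correlation(x, y):
    """Absolute Pearson correlation, used here as a stand-in analysis method."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def precision_recall_curve(scores, truth):
    """Return (recall, precision) pairs as the score threshold is lowered.

    Features are ranked from highest to lowest score; the k-th pair describes
    calling the top k features "significant". Requires at least one true feature.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    n_true = sum(truth)
    curve, true_positives = [], 0
    for rank, j in enumerate(order, start=1):
        true_positives += truth[j]
        curve.append((true_positives / n_true, true_positives / rank))
    return curve


if __name__ == "__main__":
    features, outcome, truth = simulate_known_truth()
    scores = [abs_correlation([row[j] for row in features], outcome)
              for j in range(len(truth))]
    for recall, precision in precision_recall_curve(scores, truth)[:10]:
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Because the shared factor makes noncausal features correlate with the outcome as well, raising `shared_sd` in this toy setup is one way to see how nonindependence can affect a method that would otherwise look reliable.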