
The limits of p-values for biological data mining.

James D Malley, Abhijit Dasgupta, Jason H Moore.


Year:  2013        PMID: 23663551      PMCID: PMC3668262          DOI: 10.1186/1756-0381-6-10

Source DB:  PubMed          Journal:  BioData Min        ISSN: 1756-0381            Impact factor:   2.522



The use of p-values is widespread in the sciences, especially in biomedical research, and also underlies several analytic approaches in data mining. The original intent of the p-value is simple enough, but its application and interpretation are far from simple. If data are collected to evaluate an idea, a hypothesis, then accepting the idea when it is true is a good thing, and rejecting the idea when it is not true is also good. Two errors in reasoning from the data can then occur: a true idea is rejected (a Type I error), or a false idea is accepted (a Type II error).

However, simple criticisms and essential distinctions are immediate: (1) The p-value is not the probability of an idea being true; a statement of that kind requires, at the least, Bayes' theorem and a different frame for inference; (2) Just stating the result of a statistical test as a p-value is nearly uninformative, as a statistically significant outcome may have no practical biological importance; and continuing (3) The size of the departure from the proposed true idea, the effect size, could be quite small in the subject matter context; (4) The statistical method chosen for making a p-value declaration could be doubtful or inappropriate (i.e. wrong); (5) Reasoning forward from a declared p-value has uneven consequences: so-called false positives and false negatives are rampant and often hard to reckon with in many biomedical testing environments (e.g. mammograms); all of which is to say (6) The utility or cost of false positives and false negatives is unexamined in simple p-value declarations. All of the above is well known in the statistical community and has been much studied over many years [1,2].

More recent problems with p-values include: (1) Correction for multiple testing, over hundreds, thousands, or even millions of tests, using methods such as Bonferroni or False Discovery Rate (FDR).
This occurs often in genomics and data mining, and the corrections or adjustments are often scientifically ungrounded and assume the universal null hypothesis that all findings are due to chance [3]. The central problem is that such testing assumes the separate p-values are, in effect, independent agents, and the power to detect biological associations from one gene or genetic variant to the next is sent to zero. Introducing biologically realistic entanglements and higher-order correlations across genomic sites and events is deeply problematic and nearly impossible to get right; (2) Another problem is that the reported p-values of such tests of association are weirdly at odds with current basic science. Consider, for example, quantum mechanics, which is the single most experimentally well-validated understanding of basic physics ever proposed in the history of science. Despite its scientific rigor, quantum mechanics is accurate only to about eight or nine significant digits. However, it is not uncommon for researchers to report p-values of less than 10^-40. Such assertions report experimental testing outcomes more accurate than quantum mechanics, comparable to making declarations about events rarer than the decay of a proton. Further, such small p-values cannot be justified by randomization or statistically grounded arguments given the relatively small sample sizes in play. They only announce a blind faith in the validity of an assumed distribution (like the chi-squared) for parsing an observed test outcome far into its tail.

A problem closely related to the strict reliance on p-values, and to the two kinds of errors (false negatives and false positives), is the wide use of Receiver Operating Characteristic (ROC) curves. This scheme arose in the 1940s for testing the performance of a radio receiver, and for that kind of device good reception across an entire bandwidth makes sense.
So the device needs to have low reception error and high rejection of noise at many frequencies. For a medical test this usually makes no sense: the researcher makes a practical and scientific decision about applying the test (setting the threshold), and proceeds to use the test accordingly on the next patient. But the patient is not a radio under test. Next, the area under the ROC curve, the AUC value, is thought useful and is often reported as determinative. However, it is easy to construct simple and plausible examples where the AUC estimate is unstable, with multiple test outcomes all having AUC exactly equal to 1 and all being distinct in terms of inference.

So, given all the problems above, what good purpose is served, or could be served, by p-values? This can be resolved by bringing the focus back to the scientific, data mining questions: What are the hypotheses of interest (are there different ways to frame the analysis)? Are the hypotheses under study related in some way (independent, not independent)? What are the costs of drawing the wrong conclusion (what are the underlying risks, estimated effect sizes)? Beyond p-values, FDR, ROC, and AUC, are there more efficient uses of the same data? What is truly predictive rather than being merely significant? This last question is, indeed, the single most critical one, and it drives an informed and grounded response to all the others. We will explore these entangled issues in future editorials.
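As a concrete illustration of the two multiple-testing corrections named above, the following is a minimal sketch of the Bonferroni rule and the Benjamini-Hochberg step-up procedure for FDR. The function names and the ten p-values are hypothetical and chosen only for illustration; nothing here comes from the editorial's own data. The sketch shows only how the thresholds are computed. It treats each p-value as an independent agent, which is exactly the assumption the text questions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject test i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls FDR for independent tests)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every test ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (illustrative numbers only).
P_VALUES = [0.001, 0.008, 0.012, 0.021, 0.04, 0.06, 0.2, 0.35, 0.6, 0.9]

print("Bonferroni rejections:", sum(bonferroni(P_VALUES)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(P_VALUES)))
```

On these made-up numbers Bonferroni rejects one test and Benjamini-Hochberg rejects three, so the choice of correction alone changes which findings are declared significant, before any biological dependence between tests is considered.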
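The remark above that several tests can share an AUC of exactly 1 while differing in what they imply for the next patient can be sketched with a toy example. The three score sets, the six labels, and the 0.5 decision threshold below are hypothetical and are not taken from the editorial. AUC is computed here by its pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at(scores, labels, threshold):
    """Accuracy when any score at or above the threshold is called positive."""
    calls = [1 if s >= threshold else 0 for s in scores]
    return sum(c == y for c, y in zip(calls, labels)) / len(labels)


def class_gap(scores, labels):
    """Distance between the lowest positive score and the highest negative score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return min(pos) - max(neg)


LABELS = [0, 0, 0, 1, 1, 1]
# Three hypothetical tests scored on the same six patients.
TEST_A = [0.05, 0.10, 0.15, 0.85, 0.90, 0.95]  # wide gap straddling 0.5
TEST_B = [0.47, 0.48, 0.49, 0.51, 0.52, 0.53]  # very narrow gap straddling 0.5
TEST_C = [0.80, 0.82, 0.84, 0.90, 0.92, 0.94]  # separated, but entirely above 0.5

for name, scores in [("A", TEST_A), ("B", TEST_B), ("C", TEST_C)]:
    print(name,
          "AUC:", auc(scores, LABELS),
          "accuracy at 0.5:", accuracy_at(scores, LABELS, 0.5),
          "gap:", round(class_gap(scores, LABELS), 2))
```

All three tests rank every positive above every negative, so each has AUC equal to 1. At the fixed threshold of 0.5, however, test C calls every patient positive and is right only half the time, and test B separates the classes by a gap so small that modest measurement noise could flip its calls. The AUC alone does not distinguish these situations.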
References

1. K J Rothman. No adjustments are needed for multiple comparisons. Epidemiology, 1990-01.

2. Steven Goodman. A dirty dozen: twelve p-value misconceptions (review). Semin Hematol, 2008-07.

3. Sander Greenland. Null misinterpretation in statistical testing and its impact on health risk assessment. Prev Med, 2011-08-17.
