
The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted.

Olga A Vsevolozhskaya, Chia-Ling Kuo, Gabriel Ruiz, Luda Diatchenko, Dmitri V Zaykin.

Abstract

The increasing accessibility of data to researchers makes it possible to conduct massive amounts of statistical testing. Rather than follow specific scientific hypotheses with statistical analysis, researchers can now test many possible relationships and let statistics generate hypotheses for them. The field of genetic epidemiology is an illustrative case, where testing of candidate genetic variants for association with an outcome has been replaced by agnostic screening of the entire genome. Poor replication rates of candidate gene studies have improved dramatically with the increase in genomic coverage, due to factors such as adoption of better statistical practices and availability of larger sample sizes. Here, we suggest that another important factor behind the improved replicability of genome-wide scans is an increase in the amount of statistical testing itself. We show that an increase in the number of tested hypotheses increases the proportion of true associations among the variants with the smallest P-values. We develop statistical theory to quantify how the expected proportion of genuine signals (EPGS) among top hits depends on the number of tests. This enrichment of top hits by real findings holds regardless of whether genome-wide statistical significance has been reached in a study. Moreover, if we consider only those "failed" studies that produce no statistically significant results, the same enrichment phenomenon takes place: the proportion of true associations among top hits grows with the number of tests. The enrichment occurs even if true signals are encountered at a logarithmically decreasing rate as additional tests are performed.
© 2017 WILEY PERIODICALS, INC.
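
The enrichment described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' EPGS theory. It is a minimal Monte Carlo example under assumptions chosen purely for illustration: tests are independent, a fixed 1% of them are true signals, every true signal has the same effect (a z-statistic with mean 3), and "top hits" means the 10 smallest two-sided P-values. The function name and all parameter values are hypothetical.

```python
import heapq
import math
import random


def top_hit_true_fraction(m, k=10, prior=0.01, effect=3.0, reps=100, seed=1):
    """Average fraction of true signals among the k smallest P-values of m tests.

    Illustrative assumptions (not taken from the paper): independent tests,
    each a true signal with probability `prior`; z ~ N(effect, 1) for true
    signals and z ~ N(0, 1) for nulls.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        results = []
        for _ in range(m):
            is_true = rng.random() < prior
            z = rng.gauss(effect if is_true else 0.0, 1.0)
            # Two-sided P-value of a standard normal test statistic.
            p = math.erfc(abs(z) / math.sqrt(2.0))
            results.append((p, is_true))
        # Keep the k tests with the smallest P-values (the "top hits").
        top = heapq.nsmallest(k, results)
        total += sum(flag for _, flag in top) / k
    return total / reps


if __name__ == "__main__":
    for m in (100, 1_000, 10_000):
        print(m, round(top_hit_true_fraction(m), 3))
```

Under these assumptions, the printed fraction of true signals among the 10 top hits should rise as the number of tests grows from 100 to 10,000, even though the proportion of true signals among all tests stays fixed at 1%.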

Keywords:  Bayesian analysis; genome-wide association studies; genuine signal; multiple testing; probability of a true finding; sequencing studies; true association; variability of P-values

Year: 2017   PMID: 28913944   PMCID: PMC7080647   DOI: 10.1002/gepi.22064

Source DB: PubMed   Journal: Genet Epidemiol   ISSN: 0741-0395   Impact factor: 2.135


  1 in total

1.  Multi-ethnic GWAS and meta-analysis of sleep quality identify MPP6 as a novel gene that functions in sleep center neurons.

Authors:  Samar Khoury; Qiao-Ping Wang; Marc Parisien; Pavel Gris; Andrey V Bortsov; Sarah D Linnstaedt; Samuel A McLean; Andrew S Tungate; Tamar Sofer; Jiwon Lee; Tin Louie; Susan Redline; Mari Anneli Kaunisto; Eija A Kalso; Hans Markus Munter; Andrea G Nackley; Gary D Slade; Shad B Smith; Dmitri V Zaykin; Roger B Fillingim; Richard Ohrbach; Joel D Greenspan; William Maixner; G Gregory Neely; Luda Diatchenko
Journal:  Sleep       Date:  2021-03-12       Impact factor: 5.849

