Olga Vsevolozhskaya, Gabriel Ruiz, Dmitri Zaykin.
Abstract
Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. Hypothesis and significance testing, and the accompanying P-values, are being scrutinized as the most widely applied and abused practices. One line of critique is that P-values are inherently unfit to fulfill their ostensible role as measures of credibility for scientific hypotheses. It has also been suggested that while P-values may have a role as summary measures of effect, researchers underappreciate the degree of randomness in the P-value. High variability of P-values would suggest that, having obtained a small P-value in one study, one is nevertheless still likely to obtain a much larger P-value in a similarly powered replication study. Thus, the "replicability of a P-value" is itself questionable. To characterize P-value variability, one can use prediction intervals whose endpoints reflect the likely spread of P-values that could have been obtained by a replication study. Unfortunately, the intervals currently in use, the frequentist P-intervals, rest on unrealistic implicit assumptions. Namely, P-intervals are constructed under assumptions that imply substantial chances of encountering large effect sizes in an observational study, which leads to bias. The long-run frequentist probability provided by P-intervals is similar in interpretation to that of classical confidence intervals, but the endpoints of any particular interval lack interpretation as probabilistic bounds for the possible spread of future P-values that may have been obtained in replication studies.
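To make the variability argument concrete, the following Python sketch computes the central 80% range of one-sided replication P-values when the true noncentrality of a one-sided Z-test is known exactly. It assumes the replication Z-statistic is distributed as N(μ, 1); the helper names (`upper_tail`, `replication_p_range`) and the choice of μ (set so that the median replication P-value is 0.005) are illustrative assumptions, not taken from the paper.

```python
from math import erfc, sqrt
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()


def upper_tail(x: float) -> float:
    """One-sided P-value 1 - Phi(x), computed with erfc to stay accurate in the tail."""
    return 0.5 * erfc(x / sqrt(2))


def replication_p_range(mu: float, level: float = 0.80) -> Tuple[float, float]:
    """Hypothetical helper: central `level` range of one-sided replication
    P-values when the replication Z-statistic is N(mu, 1)."""
    half = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.2816 for 80%
    # A larger Z means a smaller P, so the Z quantiles swap roles on the P scale.
    return upper_tail(mu + half), upper_tail(mu - half)


# Noncentrality chosen (illustratively) so that the median replication P-value is 0.005.
mu = -STD_NORMAL.inv_cdf(0.005)
low, high = replication_p_range(mu)
print(f"80% of replication P-values lie in ({low:.1e}, {high:.3f})")
```

Even with the effect known exactly, 80% of replication P-values fall roughly between 6 × 10⁻⁵ and 0.1, a spread of more than three orders of magnitude around the median of 0.005.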
Along with classical frequentist intervals, there exists a Bayesian viewpoint toward interval construction in which the endpoints of an interval have a meaningful probabilistic interpretation. We propose Bayesian intervals for predicting P-value variability in prospective replication studies. Contingent upon approximate prior knowledge of the effect size distribution, our proposed Bayesian intervals have endpoints that are directly interpretable as probabilistic bounds for replication P-values, and they are resistant to selection bias. We showcase our approach by applying it to P-values reported for five psychiatric disorders by the Psychiatric Genomics Consortium.
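A minimal sketch of the conjugate idea, under simplifying assumptions: a normal prior N(0, v) is placed directly on the noncentrality μ of a one-sided Z-test (the paper specifies its prior on the effect size scale, so v here is a stand-in), the observed statistic is Z ~ N(μ, 1), and the replication shares the same μ. Then the posterior is μ | z ~ N(sz, s) with s = v/(1 + v), and the posterior predictive distribution of the replication statistic is Z* | z ~ N(sz, 1 + s). The interval endpoints are quantiles of this predictive distribution mapped to the P-value scale. The name `conjugate_bayes_p_interval` is hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()


def upper_tail(x: float) -> float:
    """One-sided P-value 1 - Phi(x), accurate in the far tail."""
    return 0.5 * erfc(x / sqrt(2))


def conjugate_bayes_p_interval(p_obs: float, prior_var: float,
                               level: float = 0.80) -> Tuple[float, float]:
    """Hypothetical helper: Bayesian prediction interval for a one-sided
    replication P-value, assuming Z ~ N(mu, 1), prior mu ~ N(0, prior_var),
    and a replication with the same mu."""
    z_obs = -STD_NORMAL.inv_cdf(p_obs)                # observed one-sided Z
    s = prior_var / (1 + prior_var)                    # shrinkage factor (= posterior variance)
    predictive = NormalDist(s * z_obs, sqrt(1 + s))    # Z* | z_obs ~ N(s z, 1 + s)
    tail = (1 - level) / 2
    # The upper Z quantile gives the lower P endpoint, and vice versa.
    return (upper_tail(predictive.inv_cdf(1 - tail)),
            upper_tail(predictive.inv_cdf(tail)))


# Small prior variance: the observed Z is shrunk strongly toward zero.
print(conjugate_bayes_p_interval(3.305e-06, prior_var=0.25))
# Very large prior variance: approaches the frequentist interval z +/- c*sqrt(2).
print(conjugate_bayes_p_interval(3.305e-06, prior_var=1e6))
```

As v grows, s approaches 1 and the predictive distribution tends to N(z, 2), the distribution behind the frequentist P-interval; a small v pulls the interval toward the null.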
Year: 2017 PMID: 29217835 PMCID: PMC5802740 DOI: 10.1038/s41398-017-0024-3
Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222
Fig. 1 Randomly simulated Z-statistics (dots) with the corresponding 80% prediction intervals (vertical error bars). Tests were performed on two samples (n₁ = n₂ = 50) from two different populations. The difference between population means was a random draw from the standard normal distribution. Pink highlights intervals that did not capture the value of the future test statistic.
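The Fig. 1 setup can be mimicked with a short simulation. This sketch assumes unit within-group variances, so the two-sample Z-statistic has noncentrality μ = δ/√(1/n₁ + 1/n₂) = 5δ and, with δ ~ N(0, 1), μ has prior variance 25. For brevity it draws the two sample means directly from their sampling distributions rather than generating 50 raw observations per group. It then checks how often a conjugate Bayes 80% interval for the replication statistic, built from the predictive distribution N(sz, 1 + s) with s = v/(1 + v), captures the replication value. The function and constant names are hypothetical.

```python
import random
from statistics import NormalDist

N1 = N2 = 50                        # group sizes from the Fig. 1 setup
SE = (1 / N1 + 1 / N2) ** 0.5       # SE of the mean difference, assuming unit variances
PRIOR_VAR = 1 / SE ** 2             # delta ~ N(0, 1) implies mu = delta / SE ~ N(0, 25)


def z_stat(delta: float, rng: random.Random) -> float:
    """Two-sample Z statistic; sample means are drawn from their sampling
    distributions instead of simulating 50 raw observations per group."""
    xbar1 = rng.gauss(0.0, (1 / N1) ** 0.5)
    xbar2 = rng.gauss(delta, (1 / N2) ** 0.5)
    return (xbar2 - xbar1) / SE


def coverage(n_sim: int = 20_000, level: float = 0.80, seed: int = 1) -> float:
    """Hypothetical helper: share of replication Z-statistics captured by the
    conjugate Bayes interval built from the original Z-statistic."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(0.5 + level / 2)
    s = PRIOR_VAR / (1 + PRIOR_VAR)
    half = c * (1 + s) ** 0.5
    hits = 0
    for _ in range(n_sim):
        delta = rng.gauss(0.0, 1.0)          # population mean difference, as in Fig. 1
        z, z_rep = z_stat(delta, rng), z_stat(delta, rng)
        hits += abs(z_rep - s * z) <= half
    return hits / n_sim


print(f"simulated coverage: {coverage():.3f}")
```

Because the interval is the exact posterior predictive interval when δ really is drawn from N(0, 1), the simulated coverage should sit close to the nominal 80%.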
Binomial probabilities for 80% prediction intervals, using a two-sample Z-test
| Type of selection | Prior variance | Conjugate Bayes | Mixture Bayes | P-intervals |
|---|---|---|---|---|
| 0 ≤ P ≤ 1 (no selection) | 0.25 | 80.1% | 80.2% | 80.2% |
| | 0.50 | 80.0% | 80.0% | 79.9% |
| | 1.00 | 80.0% | 80.0% | 80.0% |
| | 3.00 | 80.4% | 80.4% | 80.4% |
| | 5.00 | 80.2% | 80.2% | 80.3% |
| | 10.00 | 80.1% | 80.1% | 80.1% |
| 0.045 ≤ P ≤ … | 0.25 | 79.8% | 79.8% | 58.4% |
| | 0.50 | 80.1% | 80.1% | 66.7% |
| | 1.00 | 79.8% | 79.8% | 73.5% |
| | 3.00 | 80.0% | 80.0% | 80.2% |
| | 5.00 | 79.9% | 79.9% | 80.7% |
| | 10.00 | 80.1% | 80.1% | 80.8% |
| 0 ≤ P ≤ … | 0.25 | 80.0% | 80.0% | 46.0% |
| | 0.50 | 80.1% | 80.1% | 55.4% |
| | 1.00 | 80.2% | 80.2% | 65.5% |
| | 3.00 | 79.8% | 79.8% | 75.7% |
| | 5.00 | 80.4% | 80.4% | 78.4% |
| | 10.00 | 80.3% | 80.3% | 79.5% |
| 0 ≤ P ≤ … | 0.25 | 80.1% | 80.1% | 17.0% |
| | 0.50 | 80.1% | 80.1% | 29.7% |
| | 1.00 | 80.0% | 79.9% | 47.6% |
| | 3.00 | 80.0% | 80.0% | 70.2% |
| | 5.00 | 79.9% | 79.9% | 75.4% |
| | 10.00 | 79.7% | 79.8% | 78.2% |
| 5 × 10⁻⁸ ≤ P ≤ … | 3.00 | 80.1% | 80.1% | 62.8% |
| | 5.00 | 79.5% | 79.5% | 72.6% |
| | 10.00 | 79.8% | 79.8% | 78.3% |
| 5 × 10⁻⁹ ≤ P ≤ … | 3.00 | 80.0% | 80.0% | 60.6% |
| | 5.00 | 79.9% | 80.0% | 71.8% |
| | 10.00 | 80.2% | 80.2% | 78.1% |
The table illustrates the effect of thresholding observed P-values (e.g., selecting P-values that are statistically significant at the 5% level) on binomial probabilities.
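A sketch of a thresholding experiment, under simplified assumptions chosen for illustration: noncentralities are drawn on the Z-scale as μ ~ N(0, v), the original and replication statistics are independent N(μ, 1) draws, and only studies with one-sided P ≤ 0.05 are kept. Coverage is checked on the Z-scale, which is equivalent to checking it on the P-value scale because the map from Z to P is monotone. The frequentist P-interval uses Z* − Z ~ N(0, 2); the conjugate Bayes interval uses the predictive N(sz, 1 + s) with s = v/(1 + v). The function name and default threshold are hypothetical.

```python
import random
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()


def coverage_after_selection(prior_var: float, p_threshold: float = 0.05,
                             n_sim: int = 100_000, level: float = 0.80,
                             seed: int = 2) -> Tuple[float, float]:
    """Hypothetical helper: coverage of the frequentist P-interval and of the
    conjugate Bayes interval among studies with one-sided P <= p_threshold."""
    rng = random.Random(seed)
    c = STD_NORMAL.inv_cdf(0.5 + level / 2)
    z_cut = -STD_NORMAL.inv_cdf(p_threshold)   # one-sided P <= threshold iff Z >= z_cut
    s = prior_var / (1 + prior_var)
    kept = freq_hits = bayes_hits = 0
    for _ in range(n_sim):
        mu = rng.gauss(0.0, prior_var ** 0.5)    # true noncentrality of this study
        z = rng.gauss(mu, 1.0)                   # original study
        if z < z_cut:
            continue                             # not significant, so not selected
        z_rep = rng.gauss(mu, 1.0)               # replication with the same mu
        kept += 1
        freq_hits += abs(z_rep - z) <= c * 2 ** 0.5             # P-interval: Z* - Z ~ N(0, 2)
        bayes_hits += abs(z_rep - s * z) <= c * (1 + s) ** 0.5   # Z* | z ~ N(s z, 1 + s)
    return freq_hits / kept, bayes_hits / kept


freq_cov, bayes_cov = coverage_after_selection(prior_var=0.25)
print(f"P-interval: {freq_cov:.3f}, conjugate Bayes: {bayes_cov:.3f}")
```

Because the Bayesian predictive distribution conditions on the observed z, selecting studies by their z leaves its coverage at the nominal level, whereas the P-interval, centred on a selected and therefore inflated z, loses coverage.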
Fig. 2 Selection bias influences the performance of prediction intervals. Eighty-percent prediction intervals constructed for … perform noticeably worse than those constructed for a random statistic.
Binomial probabilities for 80% prediction intervals, using a two-sample Z-test
| Number of tests | Prior variance | Conjugate Bayes | Mixture Bayes | P-intervals |
|---|---|---|---|---|
| … | 0.25 | 80.4% | 80.4% | 63.8% |
| | 0.50 | 79.9% | 79.9% | 66.2% |
| | 1.00 | 80.6% | 80.6% | 70.4% |
| | 3.00 | 80.0% | 80.0% | 75.1% |
| | 5.00 | 80.1% | 80.1% | 76.7% |
| | 10.00 | 80.1% | 80.1% | 78.3% |
| … | 0.25 | 79.8% | 79.8% | 35.7% |
| | 0.50 | 80.2% | 80.2% | 42.3% |
| | 1.00 | 79.9% | 79.9% | 51.1% |
| | 3.00 | 79.6% | 79.6% | 65.1% |
| | 5.00 | 80.0% | 80.0% | 70.0% |
| | 10.00 | 79.8% | 79.8% | 74.5% |
| … | 0.25 | 80.0% | 80.1% | 16.9% |
| | 0.50 | 79.9% | 79.8% | 23.9% |
| | 1.00 | 80.0% | 79.9% | 35.0% |
| | 3.00 | 80.0% | 79.9% | 55.5% |
| | 5.00 | 79.7% | 79.6% | 63.1% |
| | 10.00 | 80.2% | 80.1% | 70.7% |
| … | 0.25 | 80.1% | 80.1% | 7.2% |
| | 0.50 | 80.1% | 80.0% | 12.9% |
| | 1.00 | 79.8% | 79.6% | 23.1% |
| | 3.00 | 79.7% | 79.1% | 46.2% |
| | 5.00 | 80.2% | 79.6% | 56.5% |
| | 10.00 | 80.2% | 79.5% | 66.5% |
The table illustrates the effect of selecting the most significant P-value (out of L tests) on P-interval coverage.
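Selecting the most significant of L tests can be simulated in the same spirit. The sketch below assumes L independent tests whose Z-scale noncentralities are drawn from N(0, v), keeps the test with the largest Z (smallest one-sided P), and replicates only that test with the same noncentrality. It compares the frequentist P-interval (Z* − Z ~ N(0, 2)) with the conjugate Bayes interval (Z* | z ~ N(sz, 1 + s), s = v/(1 + v)). Names and defaults are hypothetical.

```python
import random
from statistics import NormalDist
from typing import Tuple


def coverage_of_top_hit(prior_var: float, n_tests: int, n_sim: int = 20_000,
                        level: float = 0.80, seed: int = 3) -> Tuple[float, float]:
    """Hypothetical helper: coverage of the frequentist P-interval and the
    conjugate Bayes interval for the most significant of n_tests tests."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(0.5 + level / 2)
    s = prior_var / (1 + prior_var)
    freq_hits = bayes_hits = 0
    for _ in range(n_sim):
        mus = [rng.gauss(0.0, prior_var ** 0.5) for _ in range(n_tests)]
        zs = [rng.gauss(m, 1.0) for m in mus]
        best = max(range(n_tests), key=zs.__getitem__)   # largest Z = smallest one-sided P
        z, z_rep = zs[best], rng.gauss(mus[best], 1.0)   # replicate only the top hit
        freq_hits += abs(z_rep - z) <= c * 2 ** 0.5
        bayes_hits += abs(z_rep - s * z) <= c * (1 + s) ** 0.5
    return freq_hits / n_sim, bayes_hits / n_sim


print(coverage_of_top_hit(prior_var=0.25, n_tests=10))
```

With v = 0.25 and L = 10, the P-interval coverage falls well below 80% while the Bayesian interval stays near it. Because the tests are independent, the posterior for the selected test's noncentrality depends only on its own z, so taking the maximum does not bias the Bayesian interval.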
The effect of prior variance misspecification on the coverage of Bayesian prediction intervals
| Number of tests | Prior variance | Bayesian | Bayesian | P-intervals |
|---|---|---|---|---|
| … | 0.5 | 77.5% | 81.7% | 80.3% |
| | 1 | 76.5% | 81.5% | 79.7% |
| … | 0.25 | 77.6% | 81.1% | 63.8% |
| | 0.5 | 75.9% | 80.8% | 66.2% |
| … | 3 | 70.4% | 77.8% | 65.1% |
| | 1 | 72.1% | 77.2% | 51.1% |
| … | 0.25 | 76.4% | 78.2% | 16.9% |
| | 0.5 | 72.9% | 75.8% | 23.9% |
| … | 3 | 62.0% | 72.9% | 46.2% |
| | 10 | 69.5% | 77.1% | 66.5% |
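To probe prior misspecification, the sketch below draws noncentralities from a "true" N(0, v_true) prior but builds the conjugate Bayes 80% interval with an assumed variance v_assumed. As a simplification, it assumes each test's Z-statistic is N(μ, 1), that the most significant of `n_tests` independent tests is selected, and that its replication shares the same μ. All names are hypothetical.

```python
import random
from statistics import NormalDist


def bayes_coverage_misspecified(true_var: float, assumed_var: float,
                                n_tests: int = 1, n_sim: int = 20_000,
                                level: float = 0.80, seed: int = 4) -> float:
    """Hypothetical helper: coverage of the conjugate Bayes interval when the
    noncentralities come from N(0, true_var) but the interval assumes
    N(0, assumed_var); the most significant of n_tests tests is replicated."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(0.5 + level / 2)
    s = assumed_var / (1 + assumed_var)          # shrinkage implied by the assumed prior
    half = c * (1 + s) ** 0.5
    hits = 0
    for _ in range(n_sim):
        mus = [rng.gauss(0.0, true_var ** 0.5) for _ in range(n_tests)]  # true prior
        zs = [rng.gauss(m, 1.0) for m in mus]
        best = max(range(n_tests), key=zs.__getitem__)
        z_rep = rng.gauss(mus[best], 1.0)
        hits += abs(z_rep - s * zs[best]) <= half
    return hits / n_sim


print(bayes_coverage_misspecified(true_var=1.0, assumed_var=1.0))
print(bayes_coverage_misspecified(true_var=3.0, assumed_var=0.25))
```

When v_assumed equals v_true, coverage is close to 80%. Understating the prior variance (here v_true = 3, v_assumed = 0.25, a single test) shrinks the interval toward zero too aggressively, and coverage drops to around 60% in this configuration.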
Revised predictions based on recent results from the Psychiatric Genomics Consortium with the prior effect size distribution estimated for the bipolar disorder susceptibility loci
| SNP | Disorder | Cases | Controls | One-sided P-value | Conjugate Bayesᵃ interval | Mixture Bayesᵃ interval | Mixture Bayesᵇ interval | Lazzeroni et al. interval |
|---|---|---|---|---|---|---|---|---|
| rs2535629 | ADHD | 2787 | 2635 | 0.1005 | (0.023, 0.977) | (0.023, 0.977) | (0.023, 0.977) | (2.57e-5, 0.93) |
| | ASD | 4949 | 5314 | 0.098 | (0.022, 0.977) | (0.022, 0.977) | (0.022, 0.977) | (2.39e-5, 0.93) |
| | BP | 6990 | 4820 | 3.305e-06 | (0.017, 0.977) | (0.017, 0.977) | (6.92e-7, 0.93) | (1.69e-13, 0.04) |
| | MDD | 9227 | 7383 | 0.000108 | (0.016, 0.977) | (0.016, 0.977) | (5.25e-6, 0.95) | (4.89e-11, 0.18) |
| | Schizophrenia | 9379 | 7736 | 3.355e-05 | (0.015, 0.977) | (0.015, 0.977) | (3.98e-7, 0.95) | (6.92e-12, 0.11) |
| | All | 33,332 | 27,888 | 1.27e-12 | (0.001, 0.871) | (0.001, 0.871) | (2.2e-18, 1.5e-5) | (7.41e-23, 1.17e-5) |
ADHD, attention deficit-hyperactivity disorder; ASD, autism spectrum disorder; BP, bipolar disorder; MDD, major depressive disorder.
ᵃ Prior effect size distribution from the conjugate model, with the variance estimated from the tabulated effect sizes reported in Chen et al.
ᵇ Prior effect size distribution specified directly by the estimates reported in Chen et al.
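The Lazzeroni et al. column can be closely reproduced with a few lines of code. This sketch assumes the frequentist P-interval treats the replication statistic as having the same noncentrality as the original, so that Z* − Z ~ N(0, 2) and the interval for Z* is z ± z₁₋α/₂·√2. At the 95% level this matches the BP row (P = 3.305 × 10⁻⁶ gives about (1.69 × 10⁻¹³, 0.04)) and agrees with the other rows to within rounding of the printed values. The name `p_interval` is hypothetical. Tail probabilities use `math.erfc` because `1 - NormalDist().cdf(x)` rounds to zero for the smallest values in the table.

```python
from math import erfc, sqrt
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()


def upper_tail(x: float) -> float:
    """1 - Phi(x) via erfc, accurate far into the tail."""
    return 0.5 * erfc(x / sqrt(2))


def p_interval(p_obs: float, level: float = 0.95) -> Tuple[float, float]:
    """Hypothetical helper: frequentist P-interval for a one-sided replication
    P-value, assuming Z_rep - Z_obs ~ N(0, 2) (same noncentrality)."""
    z_obs = -STD_NORMAL.inv_cdf(p_obs)                     # one-sided Z from the P-value
    half = STD_NORMAL.inv_cdf(0.5 + level / 2) * sqrt(2)   # 1.96 * sqrt(2) at 95%
    return upper_tail(z_obs + half), upper_tail(z_obs - half)


# One-sided P-values from the table above.
for label, p in [("ADHD", 0.1005), ("BP", 3.305e-06), ("MDD", 0.000108), ("All", 1.27e-12)]:
    low, high = p_interval(p)
    print(f"{label}: ({low:.3g}, {high:.3g})")
```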
Revised predictions based on recent results from the Psychiatric Genomics Consortium with the prior effect size distribution estimated for cancer risk loci
| SNP | Disorder | Cases | Controls | One-sided P-value | Conjugate Bayesᵃ interval | Mixture Bayesᵃ interval | Mixture Bayesᵇ interval | Lazzeroni et al. interval |
|---|---|---|---|---|---|---|---|---|
| rs2535629 | ADHD | 2787 | 2635 | 0.1005 | (0.023, 0.977) | (0.023, 0.977) | (0.023, 0.977) | (2.57e-5, 0.93) |
| | ASD | 4949 | 5314 | 0.098 | (0.022, 0.977) | (0.022, 0.977) | (0.022, 0.977) | (2.39e-5, 0.93) |
| | BP | 6990 | 4820 | 3.305e-06 | (0.017, 0.977) | (0.017, 0.977) | (5.89e-9, 0.93) | (1.69e-13, 0.04) |
| | MDD | 9227 | 7383 | 0.000108 | (0.016, 0.977) | (0.016, 0.977) | (7.41e-5, 0.98) | (4.89e-11, 0.18) |
| | Schizophrenia | 9379 | 7736 | 3.355e-05 | (0.015, 0.977) | (0.015, 0.977) | (9.33e-7, 0.95) | (6.92e-12, 0.11) |
| | All | 33,332 | 27,888 | 1.27e-12 | (0.001, 0.871) | (0.001, 0.871) | (8.91e-20, 4.2e-5) | (7.41e-23, 1.17e-5) |
ᵃ Prior effect size distribution from the conjugate model, with the variance estimated from the tabulated effect sizes reported in Park et al.
ᵇ Prior effect size distribution specified directly by the estimates reported in Park et al.
Fig. 3 Complex diseases have intrinsically weak genetic effects, as illustrated by a Manhattan plot in which only a few significant P-values are highlighted in green. The effect sizes corresponding to P-values in the Manhattan plot look “L-shaped,” reflecting the idea that most signals are just noise with very small effect sizes (e.g., as measured by …), so the bulk of the effect size distribution is around zero and signals with large effect sizes are increasingly unlikely.
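For illustration, an L-shaped effect size distribution can be approximated by a mixture of zero-centred normal priors on the noncentrality: a narrow component for the many near-null signals and a wider one for the rare larger effects. This is a simplified stand-in chosen for the sketch, not necessarily the Mixture Bayes model behind the tables, and the weights and variances below are made up. With components N(0, v_k), the predictive distribution of the replication Z-statistic is itself a normal mixture: each component contributes N(s_k z, 1 + s_k) with s_k = v_k/(1 + v_k), reweighted by how well that component explains the observed z. The interval endpoints are found by bisection on the mixture CDF. The function name is hypothetical, and all weights must be positive.

```python
from math import erfc, exp, log, pi, sqrt
from statistics import NormalDist
from typing import List, Tuple

STD_NORMAL = NormalDist()


def upper_tail(x: float) -> float:
    """1 - Phi(x) via erfc, accurate in the far tail."""
    return 0.5 * erfc(x / sqrt(2))


def mixture_bayes_p_interval(p_obs: float, weights: List[float],
                             variances: List[float],
                             level: float = 0.80) -> Tuple[float, float]:
    """Hypothetical helper: prediction interval for a one-sided replication
    P-value under a prior mu ~ sum_k w_k N(0, v_k), with Z ~ N(mu, 1) and a
    replication sharing the same mu. Weights must be positive."""
    z = -STD_NORMAL.inv_cdf(p_obs)
    # Log marginal likelihood of z under each component: z ~ N(0, 1 + v_k).
    log_like = [log(w) - 0.5 * log(2 * pi * (1 + v)) - z * z / (2 * (1 + v))
                for w, v in zip(weights, variances)]
    top = max(log_like)
    unnorm = [exp(ll - top) for ll in log_like]     # subtract the max to avoid underflow
    total = sum(unnorm)
    post_w = [u / total for u in unnorm]
    # Component-wise predictive for Z*: N(s_k z, 1 + s_k), s_k = v_k / (1 + v_k).
    comps = [NormalDist(v / (1 + v) * z, sqrt(1 + v / (1 + v))) for v in variances]

    def predictive_cdf(x: float) -> float:
        return sum(w * c.cdf(x) for w, c in zip(post_w, comps))

    def quantile(q: float) -> float:
        lo, hi = -60.0, 60.0
        for _ in range(100):                         # bisection on a monotone CDF
            mid = (lo + hi) / 2
            if predictive_cdf(mid) < q:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    tail = (1 - level) / 2
    return upper_tail(quantile(1 - tail)), upper_tail(quantile(tail))


# Made-up L-shaped prior: 90% near-null effects, 10% larger effects.
print(mixture_bayes_p_interval(0.01, weights=[0.9, 0.1], variances=[0.01, 4.0]))
```

With these made-up settings, the observed P = 0.01 is attributed mostly to the near-null component, so the interval is pulled toward larger replication P-values than a single wide normal prior would give. With a single component, the function reduces to the conjugate normal case.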