Sarah Nostedt1,2, Ari R Joffe1,2.
Abstract
BACKGROUND: Misinterpretations of the p-value in null-hypothesis statistical testing are common. We aimed to determine the implications of observed p-values in critical care randomized controlled trials (RCTs).
Keywords: Bayesian; critical care; false positive rate; p-value; positive predictive value; randomized controlled trial
Year: 2021 PMID: 34841950 PMCID: PMC9149268 DOI: 10.1177/08850666211053793
Source DB: PubMed Journal: J Intensive Care Med ISSN: 0885-0666 Impact factor: 2.889
Glossary of Terminology and Methods Used.
| [the necessary Pr(H1)/Pr(Ho)] = [the desired Pr(H1│data)/Pr(Ho│data)] / BF. |
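The reverse-Bayes relation in the last glossary row can be checked numerically. The sketch below is illustrative only: the function name `necessary_prior` is hypothetical, and it assumes that a false positive rate of 5% corresponds to a desired posterior Pr(H1│data) of 0.95. Under that assumption, a likelihood ratio of 5.0 requires a prior Pr(H1) of about 0.79, which agrees with the 79.2% reported for the adult cohort.

```python
def necessary_prior(bf: float, desired_posterior: float = 0.95) -> float:
    """Prior Pr(H1) needed so that a Bayes factor `bf` yields `desired_posterior`.

    Implements: necessary prior odds = desired posterior odds / BF.
    """
    if bf <= 0:
        raise ValueError("bf must be positive")
    if not 0 < desired_posterior < 1:
        raise ValueError("desired_posterior must lie strictly between 0 and 1")
    desired_odds = desired_posterior / (1 - desired_posterior)
    prior_odds = desired_odds / bf
    # Convert odds back to a probability.
    return prior_odds / (1 + prior_odds)


if __name__ == "__main__":
    # Median likelihood ratios taken from the adult RCT table.
    for bf in (1.8, 5.0, 107.2):
        print(f"BF = {bf:6.1f} -> prior Pr(H1) needed: {necessary_prior(bf):.1%}")
```

The weaker the evidence (smaller BF), the closer to certainty the prior must already be before the data can deliver a 95% posterior.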
P-Value Categories Obtained in the Cohorts of Critical Care RCTs.
| P-value category | p = .051 to .10 | p = .05 to .0051 | p ≤ .005 |
|---|---|---|---|
| Adult RCTs | | | |
| n studies | 13 | 33 | 11 |
| % of studies with p ≤ .10 | 22.8% | 57.9% | 19.3% |
| % of all studies | 6.0% | 15.3% | 5.1% |
| Pediatric RCTs | | | |
| n studies | 13 | 10 | 2 |
| % of studies with p ≤ .10 | 52% | 40% | 8% |
| % of all studies | 10.8% | 8.3% | 1.7% |
| Consecutive RCTs | | | |
| Primary outcome n studies | 7 | 46 | 37 |
| % of studies with p ≤ .10 | 7.8% | 51.1% | 41.1% |
| Secondary outcome n studies | 17 | 44 | 26 |
| % of studies | 19.5% | 50.6% | 29.9% |
For Consecutive RCTs, only studies with p ≤ .10 for the primary outcome were screened for inclusion.
For secondary outcomes, 14 (16%) of p-values were >.10.
Reverse Bayesian Implications of the Obtained p-Values ≤.10 in Adult RCTs in the Field of Critical Care Research.
| Study group | Overall | p > .05 to .10 | p = .05 to .0051 | p ≤ .005 |
|---|---|---|---|---|
| Likelihood Ratio: Pr(data│H1)/Pr(data│Ho) | 5.0 [2.8, 14.0] | 1.8 [1.4, 2.2] | 5.0 [3.6, 7.9] | 107.2 [43.6, 107.2] |
| Prior Pr(H1) to have FPR 5% | 79.2 [58.0, 87.3] | 91.3 [89.6, 93.1] | 79.2 [70.6, 84.1] | 15.1 [15.1, 28.6] |
| Minimum FPR (using Prior Pr[H1] of 0.5) | 16.7 [7.0, 26.6] | 35.7 [31.3, 41.8] | 16.7 [11.2, 21.8] | 0.9 [0.9, 2.1] |
| Realistic FPR (using Prior Pr[H1] of 0.1) | 64.3 [39.8, 76.6] | 83.3 [80.4, 86.6] | 64.3 [53.2, 71.4] | 7.7 [7.7, 16.0] |
| BF Bound [upper bound for Pr(data│H1)/Pr(data│Ho)] | 3.5 [2.5, 7.4] | 2.0 [1.8, 2.2] | 3.5 [2.9, 4.8] | 58.3 [21.1, 58.3] |
| PPV: Highest Posterior PrU(H1│data) using Prior Pr(H1) of 0.5. | 77.8 [71.1, 87.9] | 66.4 [63.7, 68.5] | 77.8 [74.1, 82.8] | 98.3 [95.5, 98.3] |
| PPV: Realistic Posterior PrR(H1│data) using Prior Pr(H1) of 0.1 | 28.0 [21.4, 44.9] | 18.0 [16.4, 19.5] | 28.0 [24.1, 34.8] | 86.6 [70.1, 86.6] |
| 10th percentile of p-value prediction interval | 0.0001 [<0.0001, 0.0002] | 0.0003 [0.0002, 0.0004] | 0.0001 [<0.0001, 0.0001] | <0.0001 |
| 90th percentile of p-value prediction interval | 0.72 [0.47, 0.88] | 0.99 [0.95, 0.99] | 0.72 [0.60, 0.81] | 0.13 [0.13, 0.24] |
| Probability a replication study will have p ≤ 0.05 | 58.3 [50.0, 71.7] | 44.1 [40.7, 46.8] | 58.3 [53.7, 64.7] | 91.3 [84.3, 91.3] |
Values as: median [IQR] (range). BF: Bayes factor; FPR: false positive rate; Ho: null hypothesis; H1: alternative hypothesis; Pr: probability; PPV: positive predictive value. We report values to one decimal place for consistency of presentation; this is not intended to suggest the values can be known with such precision given the small sample sizes.
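The FPR and PPV rows follow from the likelihood ratio (or BF bound) and the assumed prior by ordinary Bayes updating on the odds scale. The following is a minimal sketch with hypothetical function names; it treats the FPR as 1 minus the posterior Pr(H1│data), which reproduces the tabulated medians (for example, LR 5.0 with a prior of 0.1 gives an FPR of about 64.3%, and a BF bound of 3.5 with a prior of 0.1 gives a PPV of about 28.0%).

```python
def posterior_h1(bf: float, prior_h1: float) -> float:
    """Posterior Pr(H1|data) from a Bayes factor (or likelihood ratio) and prior Pr(H1)."""
    if bf <= 0 or not 0 < prior_h1 < 1:
        raise ValueError("bf must be positive and prior_h1 strictly between 0 and 1")
    prior_odds = prior_h1 / (1 - prior_h1)
    posterior_odds = bf * prior_odds
    return posterior_odds / (1 + posterior_odds)


def false_positive_rate(lr: float, prior_h1: float) -> float:
    """Pr(Ho|data): the chance that a 'positive' finding is a false positive."""
    return 1 - posterior_h1(lr, prior_h1)


if __name__ == "__main__":
    # Median values from the 'Overall' column of the adult RCT table.
    lr, bf_bound = 5.0, 3.5
    for prior in (0.5, 0.1):
        print(
            f"prior = {prior}: FPR = {false_positive_rate(lr, prior):.1%}, "
            f"PPV = {posterior_h1(bf_bound, prior):.1%}"
        )
```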
Reverse Bayesian Implications of the Obtained p-Values ≤0.10 in Pediatric RCTs in the Field of Critical Care Research.
| Study group | Overall | p > 0.05 to 0.10 | p = 0.05 to 0.0051 | p ≤ 0.005 |
|---|---|---|---|---|
| Likelihood Ratio: Pr(data│H1)/Pr(data│Ho) | 2.2 [1.4, 7.4] | 1.5 [1.3, 1.8] | 6.1 [4.9, 16.6] | 59.9 |
| Prior Pr(H1) to have FPR 5% | 89.6 [72.1, 93.1] | 92.6 [91.3, 93.8] | 75.9 [53.6, 79.6] | 24.1 |
| Minimum FPR (using Prior Pr[H1] of 0.5) | 31.3 [12.0, 41.6] | 39.8 [35.7, 44.3] | 14.4 [5.8, 17.1] | 1.6 |
| Realistic FPR (using Prior Pr[H1] of 0.1) | 80.4 [55.1, 86.5] | 85.6 [83.3, 87.8] | 59.9 [34.5, 64.9] | 13.1 |
| BF Bound [upper bound for Pr(data│H1)/Pr(data│Ho)] | 2.2 [1.8, 4.5] | 1.8 [1.7, 2.0] | 4.0 [3.4, 8.5] | 29.6 |
| PPV: Highest Posterior PrU(H1│data) using Prior Pr(H1) of 0.5. | 68.5 [63.8, 82.0] | 64.5 [62.7, 66.4] | 79.8 [77.5, 89.4] | 96.7 |
| PPV: Realistic Posterior PrR(H1│data) using Prior Pr(H1) of 0.1 | 19.5 [16.4, 33.5] | 16.8 [15.8, 18.0] | 30.6 [27.7, 48.5] | 76.7 |
| 10th percentile of p-value prediction interval | 0.0002 [<0.0001, 0.0004] | 0.0004 [0.0003, 0.0005] | 0.0001 [<0.0001, 0.0001] | <0.0001 |
| 90th percentile of p-value prediction interval | 0.95 [0.62, 0.99] | 0.99 [0.99, 0.99] | 0.67 [0.44, 0.73] | 0.20 |
| Probability a replication study will have p ≤ 0.05 | 46.8 [40.8, 63.6] | 41.7 [39.3, 44.1] | 60.9 [57.9, 73.9] | 87.1 |
Values as: median [IQR] (range). BF: Bayes factor; FPR: false positive rate; Ho: null hypothesis; H1: alternative hypothesis; Pr: probability; PPV: positive predictive value. We report values to one decimal place for consistency of presentation; this is not intended to suggest the values can be known with such precision given the small sample sizes.
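The replication rows can be approximated from the observed p-value alone. The method is not spelled out in this extract, so the sketch below rests on assumptions: the true effect equals the observed effect, a replication z-statistic is distributed around the observed z with standard deviation √2, and the replication probability ignores the negligible opposite tail. Function names are hypothetical. With p = 0.002 these assumptions give a replication probability of about 87% and a 90th percentile of about 0.20, consistent with the pediatric p ≤ 0.005 column.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_from_two_sided_p(p: float) -> float:
    """Absolute z-statistic corresponding to a two-sided p-value."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def replication_probability(p: float, alpha: float = 0.05) -> float:
    """Chance that an identical replication reaches p <= alpha (same direction)."""
    return STD_NORMAL.cdf(z_from_two_sided_p(p) - z_from_two_sided_p(alpha))


def p_value_prediction_interval(p: float, coverage: float = 0.80) -> tuple[float, float]:
    """(lower, upper) percentiles of the replication p-value; 0.80 gives 10th and 90th."""
    z_obs = z_from_two_sided_p(p)
    half_width = STD_NORMAL.inv_cdf(0.5 + coverage / 2) * sqrt(2)
    return two_sided_p(z_obs + half_width), two_sided_p(z_obs - half_width)


if __name__ == "__main__":
    p_obs = 0.002
    low, high = p_value_prediction_interval(p_obs)
    print(f"replication probability: {replication_probability(p_obs):.1%}")
    print(f"80% prediction interval for p: {low:.2g} to {high:.2g}")
```

An observed p of exactly 0.05 gives a replication probability of 50% under these assumptions, which is why the p > 0.05 to 0.10 columns fall below one half.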
Reverse Bayesian Implications of the Obtained p-Values ≤ 0.10 for Primary Outcomes in Consecutive RCTs Recently Published in the Field of Critical Care Research.
| Study group | Overall | p > .05 to .10 | p = .05 to .0051 | p ≤ .005 |
|---|---|---|---|---|
| Likelihood Ratio: Pr(data│H1)/Pr(data│Ho) | 14.1 [5.0, 99.6] | 1.9 [1.3, 2.3] | 5.5 [3.7, 11.3] | 99.6 [85.2, 99.6] |
| Prior Pr(H1) to have FPR 5% | 57.5 [16.0, 79.2] | 90.9 [89.4, 93.6] | 77.6 [62.7, 83.8] | 16.0 [16.0, 18.6] |
| Minimum FPR (using Prior Pr[H1] of 0.5) | 6.7 [1.0, 16.7] | 34.4 [30.8, 43.7] | 15.4 [8.2, 21.4] | 1.0 [1.0, 1.2] |
| Realistic FPR (using Prior Pr[H1] of 0.1) | 39.1 [8.3, 64.3] | 82.5 [80.0, 87.5] | 62.1 [44.4, 71.0] | 8.3 [8.3, 9.8] |
| BF Bound [upper bound for Pr(data│H1)/Pr(data│Ho)] | 7.5 [3.5, 53.3] | 2.0 [1.7, 2.2] | 3.7 [2.9, 6.3] | 53.3 [44.5, 53.3] |
| PPV: Highest Posterior PrU(H1│data) using Prior Pr(H1) of 0.5. | 88.2 [77.8, 98.2] | 67.0 [62.9, 68.8] | 78.8 [74.3, 86.2] | 98.2 [97.8, 98.2] |
| PPV: Realistic Posterior PrR(H1│data) using Prior Pr(H1) of 0.1 | 45.3 [28.0, 85.5] | 18.4 [15.9, 19.7] | 29.3 [24.3, 41.0] | 85.5 [82.7, 85.5] |
| 10th percentile of p-value prediction interval | 0.000013 [0.00000033, 0.000068] | 0.00027 [0.00022, 0.00045] | 0.000059 [0.000019, 0.00011] | 0.00000033 [0.00000033, 0.00000050] |
| 90th percentile of p-value prediction interval | 0.47 [0.14, 0.72] | 0.98 [0.94, 1.0] | 0.70 [0.52, 0.80] | 0.14 [0.14, 0.16] |
| Probability a replication study will have p ≤ 0.05 | 72.1 [58.3, 90.8] | 44.9 [39.6, 47.1] | 59.7 [54.1, 69.3] | 90.8 [89.6, 90.8] |
Values as: median [IQR] (range). BF: Bayes factor; FPR: false positive rate; Ho: null hypothesis; H1: alternative hypothesis; Pr: probability; PPV: positive predictive value. We report values to one decimal place for consistency of presentation; this is not intended to suggest the values can be known with such precision given the small sample sizes.
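The BF Bound values are consistent with the commonly used upper bound 1/(−e·p·ln p), valid for p < 1/e. That this is the bound used here is an assumption, since the glossary definitions did not survive in this extract, but it reproduces the tabulated figures: p = 0.001 gives about 53.3 and p = 0.002 gives about 29.6. The function name is hypothetical.

```python
from math import e, log


def bf_bound(p: float) -> float:
    """Upper bound on the Bayes factor in favour of H1 for a given p-value.

    Assumed form: 1 / (-e * p * ln p), defined only for 0 < p < 1/e.
    """
    if not 0 < p < 1 / e:
        raise ValueError("the bound is only defined for 0 < p < 1/e")
    return 1 / (-e * p * log(p))


if __name__ == "__main__":
    for p in (0.05, 0.03, 0.005, 0.002, 0.001):
        print(f"p = {p:<6} -> BF bound = {bf_bound(p):.1f}")
```

Because this is an upper bound, the PPVs derived from it are best-case figures for the stated prior.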
Bayesian Analysis of Credibility Results for the Critical Care RCTs Cohorts.
| Analysis of Credibility of Results | p = .05 to .0051 | p ≤ .005 | p > .05 to .10 | p = .11 to .20 |
|---|---|---|---|---|
| Adult RCTs | | | | |
| Number of studies | n = 33 | n = 11 | n = 13 | n = 13 |
| Study ARD in mortality (%) | 18.0 [9.8, 21.2] | 15.0 [6.1, 25.2] | 9.0 [8.0, 12.3] | - |
| Skepticism Limit for Odds or Hazard Ratios | 3.83 [2.46, 10.59] | 1.29 [1.22, 1.60] | - | - |
| Advocacy Limit for Odds or Hazard Ratios | - | - | 19 012 [5339, 5.7 × 10^9] | 38.45 [2.02, 69.23] |
| Pediatric RCTs | | | | |
| Number of studies | n = 10 | n = 2 | n = 13 | n = 13 |
| Study ARD in mortality (%) | 21.1 [12.7, 40.3] | 27.4 and 33.3 | 10.0 [4.2, 23.1] | 10.1 [5.4, 20.6] |
| Skepticism Limit for Odds or Hazard Ratios | 7.28 [4.34, 20.34] | 2.39 and 5.31 | - | - |
| Advocacy Limit for Odds or Hazard Ratios | - | - | 10^5 [1606.86, >10^5] | 7406 [1073, 90 725] |
| Consecutive RCTs: primary outcome | | | | |
| Number of studies | n = 44 | n = 37 | n = 7 | - |
| Skepticism Limit for Odds or Hazard Ratios | 3.40 [2.08, 12.21] | 1.50 [1.20, 2.38] | - | - |
| Skepticism Limit for d | 1.55 [0.92, 2.58] | 0.71 [0.19, 1.81] | - | - |
| Advocacy Limit for Odds or Hazard Ratios | - | - | 11 111.0 [160.2, 16 424.2] (41.39-17 487.80) | - |
| Advocacy Limit for d | - | - | 5.09 [4.57, -] | - |
| Consecutive RCTs: secondary outcome | | | | |
| Number of studies | n = 43 | n = 26 | n = 17 | - |
| Skepticism Limit for Odds or Hazard Ratios | 2.73 [1.68, 8.56] | 1.64 [1.40, 1.92] | - | - |
| Skepticism Limit for d | 1.14 [0.58, 2.60] | 0.87 [0.39, 1.50] | - | - |
| Advocacy Limit for Odds or Hazard Ratios | - | - | 9.90 [3.19, 347910] | - |
| Advocacy Limit for d | - | - | 2.31 [0.60, -] | - |
Values as: Median [IQR] (range). ARD: absolute risk difference; d: standardized mean difference.
The credibility of significance (a finding with p ≤ .05) can be challenged by skeptics who believe prior evidence is unlikely to exceed the Skepticism Limit effect size. The credibility of non-significance (a finding with p > .05) can be challenged by advocates who accept effect sizes are unlikely to exceed the Advocacy Limit.
For the SL in the Consecutive RCTs, the SL for OR/HR was 2.29 [1.55, 4.89] for the 45 human studies and 16.95 [5.99, >10^5] for the 5 NHA studies; the SL for d was 0.66 [0.18, 1.92] for the 18 human studies and 1.55 [1.09, 3.44] for the 13 NHA studies. For AL, the only NHA study had an AL for OR of 15360.
For the SL in the Consecutive RCTs for the secondary outcome, the SL for OR/HR included only human studies (there were no NHA studies); the SL for d was 0.62 [0.33, 1.65] for the 21 human studies and 1.53 [1.01, 2.60] for the 15 NHA studies. For AL, there was 1 NHA study for OR/HR with AL 1083 and 1 NHA study for d with AL 2.74.
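To show how a Skepticism Limit relates to a reported confidence interval, the sketch below uses the closed form from the Analysis of Credibility for a statistically significant ratio whose 95% CI (L, U) lies entirely above 1: SL = exp(ln²(U/L) / (4·√(ln L · ln U))). That this exact form was used for the tabulated values is an assumption, the function name is hypothetical, and the CI in the example is invented for illustration rather than taken from any trial in the cohorts.

```python
from math import exp, log, sqrt


def skepticism_limit_ratio(lower: float, upper: float) -> float:
    """Skepticism Limit for an odds or hazard ratio with 95% CI (lower, upper).

    Assumes a significant harmful-direction result (1 < lower < upper). For a
    protective effect, invert the interval first: (1 / upper, 1 / lower).
    """
    if not 1 < lower < upper:
        raise ValueError("requires 1 < lower < upper (a significant ratio above 1)")
    log_l, log_u = log(lower), log(upper)
    return exp((log_u - log_l) ** 2 / (4 * sqrt(log_l * log_u)))


if __name__ == "__main__":
    # Hypothetical intervals: the second barely excludes 1.
    for ci in ((1.2, 3.0), (1.01, 3.0)):
        print(f"95% CI {ci} -> SL = {skepticism_limit_ratio(*ci):.2f}")
```

A CI that only just excludes 1 produces a large SL, meaning the finding is credible only to someone whose prior already allows large effects; this matches the higher SLs in the p = .05 to .0051 columns than in the p ≤ .005 columns.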
Reverse Bayesian Implications of the Obtained p-Values for Secondary Outcomes in Consecutive RCTs Recently Published in the Field of Critical Care Research.
| Study group | Overall | p > 0.05 | p = 0.05 to 0.0051 | p ≤ 0.005 |
|---|---|---|---|---|
| Likelihood Ratio: Pr(data│H1)/Pr(data│Ho) | 7.7 [3.1, 47.7] | 0.50 [0.13, 1.93] | 7.0 [3.6, 12.9] | 99.6 [59.9, 99.6] |
| Prior Pr(H1) to have FPR 5% | 71.1 [28.8, 85.9] | 97.4 [90.8, 99.3] | 73.1 [59.6, 84.1] | 16.0 [16.0, 24.1] |
| Minimum FPR (using Prior Pr[H1] of 0.5) | 11.5 [2.1, 24.2] | 66.5 [34.6, 88.7] | 12.5 [7.2, 21.8] | 1.0 [1.0, 1.6] |
| Realistic FPR (using Prior Pr[H1] of 0.1) | 53.8 [16.1, 74.2] | 94.7 [82.5, 98.6] | 56.3 [41.1, 71.4] | 8.3 [8.2, 13.1] |
| BF Bound [upper bound for Pr(data│H1)/Pr(data│Ho)] | 4.7 [2.6, 23.2] | 1.2 [1.0, 2.0] | 4.4 [2.9, 6.9] | 53.3 [29.6, 53.3] |
| PPV: Highest Posterior PrU(H1│data) using Prior Pr(H1) of 0.5. | 82.5 [72.5, 95.8] | 55.0 [50.1, 67.0] | 81.4 [74.1, 87.4] | 98.2 [96.7, 98.2] |
| PPV: Realistic Posterior PrR(H1│data) using Prior Pr(H1) of 0.1 | 34.3 [22.7, 71.8] | 11.9 [10.1, 18.5] | 32.7 [24.1, 43.5] | 85.5 [76.7, 85.5] |
| 10th percentile of p-value prediction interval | 0.000035 [0.0000016, 0.00014] | 0.0015 [0.00028, 0.0071] | 0.000041 [0.000015, 0.00011] | 0.00000033 [0.00000033, 0.00000095] |
| 90th percentile of p-value prediction interval | 0.61 [0.24, 0.85] | 1.0 [0.98, 1.0] | 0.63 [0.48, 0.81] | 0.14 [0.14, 0.20] |
| Probability a replication study will have p ≤ 0.05 | 64.3 [51.8, 85.0] | 27.8 [14.4, 44.9] | 62.9 [53.7, 71.0] | 90.8 [87.1, 90.8] |
14 (16.3%) of p-values were >.10. BF: Bayes factor; FPR: false positive rate; Ho: null hypothesis; H1: alternative hypothesis; Pr: probability; PPV: positive predictive value. We report values to one decimal place for consistency of presentation; this is not intended to suggest the values can be known with such precision given the small sample sizes.