Abstract
Concerns over reproducibility in research have reinvigorated the discourse on P-values as measures of statistical evidence. In a position statement, the American Statistical Association board of directors warns of P-value misuse and refers to the availability of alternatives. Despite the common practice of comparing P-values across different hypothesis tests in genetics, it is well appreciated that P-values must be interpreted alongside the sample size and experimental design used for their computation. Here, we discuss the evidential statistical paradigm (EP), an alternative to the Bayesian and Frequentist paradigms that has been implemented in human genetics studies. Using applications in Cystic Fibrosis genetic association analyses, and describing recent theoretical developments, we review how to measure statistical evidence using the EP in the presence of covariates, under model misspecification, and for composite hypotheses. Novel graphical displays are presented, and software for their computation is highlighted. The implications of multiple hypothesis testing for the EP are delineated in the analyses, demonstrating a view more consistent with scientific reasoning; the EP provides a theoretical justification for replication, which is a requirement in genetic association studies. As genetic studies grow in size and complexity, a fresh look at measures of statistical evidence that are sensible amid the analysis of big data is required.
Keywords: foundations of statistics; inference; likelihood paradigm; multiple hypothesis testing; statistical evidence
Year: 2018 PMID: 30120797 PMCID: PMC6284518 DOI: 10.1002/gepi.22151
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Properties of a diagnostic test for disease
| Hypothesis | x = 1 (positive) | x = 0 (negative) |
|---|---|---|
| A (disease present) | 0.95 | 0.05 |
| B (disease absent) | 0.02 | 0.98 |
Under hypothesis A the disease is present, under hypothesis B the disease is absent, and the observations x = 1 or 0 represent a positive or negative test result, respectively.
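As a minimal sketch of how these test properties translate into a likelihood ratio, the evidence that a result x carries for one hypothesis over the other is the ratio of the probabilities of x under the two hypotheses. The dictionary layout and function name below are illustrative only and do not come from the article.

```python
# Probabilities of each test result under the two hypotheses
# (values taken from the diagnostic test table).
P_X_GIVEN_H = {
    "A": {1: 0.95, 0: 0.05},  # disease present
    "B": {1: 0.02, 0: 0.98},  # disease absent
}


def likelihood_ratio(x, favored="A", other="B"):
    """Return P(x | favored) / P(x | other) for an observed test result x."""
    return P_X_GIVEN_H[favored][x] / P_X_GIVEN_H[other][x]


# A positive result supports A over B by 0.95 / 0.02.
lr_positive = likelihood_ratio(1, "A", "B")
# A negative result supports B over A by 0.98 / 0.05.
lr_negative = likelihood_ratio(0, "B", "A")

print(f"positive test: LR(A vs B) = {lr_positive:.1f}")
print(f"negative test: LR(B vs A) = {lr_negative:.1f}")
```

Both ratios exceed the k = 8 benchmark used in the later figures, so under that benchmark either test result would count as strong evidence for the hypothesis it favors.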
Figure 1. Standardized likelihood function for the proportion of p.Phe508del homozygotes, n = 1,000, x = 510 p.Phe508del/p.Phe508del observed. This standardized likelihood function provides a graphical representation of all possible likelihood ratios, with the 1/8 and 1/32 likelihood intervals (LI) providing the values of the proportion that are consistent with the data at the k = 8 and 32 evidence levels. The value of 0.51 is supported more than 8‐fold over values less than 0.48, but the data provide only weak evidence supporting 0.51 over 0.50.
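The quantities in this caption can be approximated with a short sketch, assuming a binomial likelihood for the homozygote count. The grid search, step size, and function names are choices made for this illustration, not the software referred to in the abstract.

```python
import math


def log_lik(p, x, n):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


def likelihood_ratio(p1, p2, x, n):
    """Likelihood ratio L(p1) / L(p2) for x successes in n trials."""
    return math.exp(log_lik(p1, x, n) - log_lik(p2, x, n))


def likelihood_interval(x, n, k, step=1e-4):
    """Grid approximation to the 1/k likelihood interval.

    Keeps every grid value p whose standardized likelihood
    L(p) / L(x/n) is at least 1/k and returns the smallest and largest.
    """
    max_ll = log_lik(x / n, x, n)
    grid = [i * step for i in range(1, int(round(1 / step)))]
    inside = [p for p in grid if log_lik(p, x, n) - max_ll >= -math.log(k)]
    return min(inside), max(inside)


x, n = 510, 1000
li_8 = likelihood_interval(x, n, 8)
li_32 = likelihood_interval(x, n, 32)
lr_51_vs_50 = likelihood_ratio(0.51, 0.50, x, n)

print("1/8 LI :", li_8)
print("1/32 LI:", li_32)
print("LR(0.51 vs 0.50):", round(lr_51_vs_50, 3))
```

The likelihood ratio for 0.51 versus 0.50 comes out only slightly above 1, which is the sense in which the data provide weak evidence between those two values.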
Figure 2. The bump function and relationships between EP error probabilities for a normal mean. Probabilities are displayed as a function of the distance between the two hypothesized values for the mean in standard error units. The maximum probability of misleading evidence is 0.02 for k = 8 and 0.004 for k = 32, for any n. (a) Probability of misleading evidence, k = 8 and k = 32 displayed. (b) Probabilities of weak, strong and misleading evidence; k = 8. Weak evidence probabilities are close to 1 for small parameter differences; strong evidence probabilities increase as differences increase. EP: evidential paradigm
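For a normal mean with known variance, the three probabilities in this figure have closed forms in terms of the standard normal distribution function. The sketch below assumes that setting; `c` denotes the distance between the two hypothesized means in standard error units, and the function names are illustrative.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def evidence_probabilities(c, k):
    """Probabilities of (misleading, weak, strong) evidence.

    c: distance between the two hypothesized means in standard error
       units (c > 0); k: evidence threshold for the likelihood ratio.
    Probabilities are computed under the hypothesis that is true.
    """
    shift = math.log(k) / c
    misleading = phi(-c / 2 - shift)  # LR >= k for the false hypothesis
    strong = phi(c / 2 - shift)       # LR >= k for the true hypothesis
    weak = 1.0 - misleading - strong  # LR strictly between 1/k and k
    return misleading, weak, strong


def max_misleading(k):
    """Location and height of the peak of the bump function."""
    c_star = math.sqrt(2 * math.log(k))
    return c_star, phi(-c_star)


for k in (8, 32):
    c_star, peak = max_misleading(k)
    print(f"k = {k}: peak {peak:.4f} at c = {c_star:.2f} standard errors")
```

The peak depends only on k, not on n, which matches the caption's statement that the maximum holds for any n.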
Figure 3. P‐value‐based analysis of the SLC26A9 chromosome 1 CF modifier locus with meconium ileus using the LocusZoom software (http://locuszoom.sph.umich.edu/genform.php?type=yourdata). Analysis of CF participants from the International CF Gene Modifier consortium including siblings (n = 6,770). Analysis adjusted for consortium site and genotyping platform. P‐values are from the Wald χ² test using generalized estimating equations with a logit link and an exchangeable covariance structure. CF: cystic fibrosis
EP analysis demonstrating values of sample size (n), number of observed p.Phe508del homozygotes (x) and proportion (x/n) for which strong evidence at k = 8 can be observed for a proportion of 0.51 versus 0.50 (left) and 0.501 versus 0.50 (right)
| n | x (0.51 vs 0.50) | x/n | x (0.501 vs 0.50) | x/n |
|---|---|---|---|---|
| 100 | 102 (>n) | NA | 570 (>n) | NA |
| 1,000 | 557 | 0.557 | 1,020 (>n) | NA |
| 10,000 | 5,102 | 0.5102 | 5,525 | 0.5525 |
| 100,000 | 50,552 | 0.5055 | 50,570 | 0.5057 |
| 1,000,000 | 505,052 | 0.5050 | 501,020 | 0.5010 |
Note. NA indicates there is no value of x for the given n that could produce evidence of strength 8.
EP: evidential paradigm.
The cells with (>n) indicate that the x required to demonstrate strong evidence is greater than n, which is not possible. As sample size increases, the observed x/n needed to demonstrate strong evidence becomes less extreme. As the two hypothesized values get closer together, one needs larger n to produce k‐fold evidence.
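The counts in this table can be checked with a small sketch. It assumes a binomial likelihood and that the two comparisons are proportions of 0.51 versus 0.50 and 0.501 versus 0.50, an assumption that reproduces the tabulated counts it was checked against. The function name `required_x` is hypothetical.

```python
import math


def required_x(n, p1, p0, k=8):
    """Smallest count x with L(p1) / L(p0) >= k, for p1 > p0.

    Returns None when the required count exceeds n, i.e. no possible
    outcome of the study could yield evidence of strength k.
    """
    a = math.log(p1 / p0)              # log-LR contribution of each success
    b = math.log((1 - p1) / (1 - p0))  # log-LR contribution of each failure
    # Solve x * a + (n - x) * b >= log(k) for x.
    x = math.ceil((math.log(k) - n * b) / (a - b))
    return x if x <= n else None


for n in (100, 1_000, 10_000):
    print(n, required_x(n, 0.51, 0.50), required_x(n, 0.501, 0.50))
```

A return value of `None` corresponds to the NA cells: with n = 100, even observing every participant as a homozygote could not give 8-fold evidence for 0.51 over 0.50.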
The required evidence strength to reject the null hypothesis (proportion of 0.50) in favor of 0.51 (left) or 0.501 (right)
| n | Evidence strength (0.51 vs 0.50) | Evidence strength (0.501 vs 0.50) |
|---|---|---|
| 100 | 1.36 | 1.033 |
| 1,000 | 2.31 | 1.10 |
| 10,000 | 3.63 | 1.36 |
| 100,000 | 6.77 × 10⁻⁵ | 2.31 |
| 1,000,000 | 2.58 × 10⁻⁷³ | 3.63 |
| 10,000,000 | <10⁻¹⁰⁰ | 2.68 × 10⁻⁷³ |
The required evidence changes as n or the alternative value changes, and as n increases one rejects the null hypothesis even when the data overwhelmingly favor it.
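One way to see where numbers of this kind come from is to evaluate the likelihood ratio at the point where a significance test just rejects. The sketch below rests on several assumptions not stated in this excerpt: a one-sided test at the 0.05 level, a normal approximation to the sample proportion with the standard error computed under the null value 0.50, and alternatives of 0.51 or 0.501. Under those assumptions it gives values close to several of the tabulated entries, but it is not presented as the authors' exact calculation.

```python
import math

# ASSUMPTION: one-sided 0.05 significance level (standard normal quantile).
Z_ALPHA = 1.6449


def lr_at_rejection_boundary(n, p1, p0=0.5, z_alpha=Z_ALPHA):
    """Likelihood ratio L(p1) / L(p0) when the test statistic sits
    exactly on the rejection boundary (normal approximation)."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
    c = (p1 - p0) / se                 # distance between hypotheses in SE units
    # For a normal mean, log LR = c * z - c^2 / 2 at standardized value z.
    return math.exp(c * z_alpha - c * c / 2)


for n in (100, 10_000, 100_000):
    print(n, lr_at_rejection_boundary(n, 0.51))
```

Once c exceeds twice the critical value, the likelihood ratio at the boundary drops below 1, so a result that is "just significant" supports the null value over the alternative.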
Figure 4. EP analysis of the SLC26A9 chromosome 1 locus with meconium ileus in CF. Analysis of CF participants of the International CF Gene Modifier consortium. All analyses adjusted for consortium site and genotyping platform. Analysis includes n = 5,869 unrelated individuals using a logistic regression likelihood. LIs for SNPs with max LRs > 1,000 noted in color. 1/k LIs, k = 8, 100, 1,000 displayed in red, green, and blue, respectively. MLE denoted in black on each LI. OR = 1 horizontal line noted as black solid line. Max LR and SNP name for the three SNPs with largest value noted on the figure, along with rs7512462 and rs4077468, which were identified in previous CF studies. CF: cystic fibrosis; MLE: maximum likelihood estimate
Summary statistics for the simple versus simple EP analyses for the unrelated (a) and related (b) CF samples
| SNP | MAF | max LR | MLE | 1/8 LI | 1/100 LI | 1/1,000 LI | Robust factor |
|---|---|---|---|---|---|---|---|
| a. Unrelated | | | | | | | |
| rs7512462 | 0.4089 | 317,462,247 | 1.3699 | 1.2345, 1.5163 | 1.1763, 1.5953 | 1.1353, 1.6530 | NA |
| rs142245823 | 0.4650 | 13,423,000,000 | 1.3980 | 1.2662, 1.5474 | 1.2066, 1.6239 | 1.1645, 1.6827 | NA |
| rs7549173 | 0.3951 | 9,652,137,218 | 1.3982 | 1.2632, 1.5438 | 1.2037, 1.6201 | 1.1646, 1.6744 | NA |
| rs4951271 | 0.4194 | 6,695,658,478 | 1.3980 | 1.2630, 1.5514 | 1.2035, 1.6322 | 1.1615, 1.6870 | NA |
| rs4077468 | 0.4129 | 4,679,117,865 | 1.3945 | 1.2598, 1.5474 | 1.2005, 1.6280 | 1.1586, 1.6870 | NA |
| b. Related | | | | | | | |
| rs7512462 | 0.4091 | 16,656,196 | 1.3221 | 1.1974, 1.4597 | 1.1410, 1.5318 | 1.1040, 1.5872 | 0.975 |
| rs142245823 | 0.4652 | 502,318,282 | 1.3526 | 1.2282, 1.4896 | 1.1704, 1.5632 | 1.1324, 1.6157 | 0.966 |
| rs7549173 | 0.3937 | 376,895,001 | 1.3494 | 1.2253, 1.4861 | 1.1706, 1.5556 | 1.1326, 1.6078 | 0.970 |
| rs4951271 | 0.4193 | 278,958,808 | 1.3424 | 1.2220, 1.4783 | 1.1645, 1.5514 | 1.1266, 1.6034 | 1.004 |
| rs4077468 | 0.4129 | 237,653,274 | 1.3458 | 1.2220, 1.4821 | 1.1645, 1.5553 | 1.1266, 1.6075 | 0.995 |
Note. SNPs with the largest max LRs displayed, and statistics for variants rs4077468 and rs7512462 that have displayed previous association evidence with CF. Robust adjustment factor applied to the analysis of the related sample.
MAF: minor allele frequency; MLE: maximum likelihood estimate.
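To illustrate how the columns of this table relate to one another, the sketch below uses a quadratic approximation to the log-likelihood on the log(OR) scale. That approximation is an assumption of this sketch; the tabulated intervals and max LRs come from the logistic regression likelihood itself, so only rough agreement should be expected. The function names are hypothetical.

```python
import math


def se_from_li(lower, upper, k):
    """Approximate standard error of log(OR) implied by a 1/k LI.

    Under a quadratic log-likelihood the 1/k interval on the log scale
    is the MLE plus or minus sqrt(2 * ln k) standard errors.
    """
    half_width = (math.log(upper) - math.log(lower)) / 2
    return half_width / math.sqrt(2 * math.log(k))


def approx_max_lr(or_mle, se):
    """Approximate LR of the MLE versus OR = 1 under the same assumption."""
    z = math.log(or_mle) / se
    return math.exp(z * z / 2)


# rs7512462, unrelated sample: MLE 1.3699, 1/8 LI (1.2345, 1.5163).
se = se_from_li(1.2345, 1.5163, k=8)
approx = approx_max_lr(1.3699, se)
print(f"implied SE of log(OR): {se:.4f}")
print(f"approximate max LR   : {approx:.3g}")
```

For this SNP the approximation lands on the order of 10^8, the same order of magnitude as the tabulated max LR.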
Figure 5. Estimated robust adjustment factor and its impact on association evidence. (a) The distribution of the estimated robust adjustment factor across the 222 variants of the SLC26A9 locus; (b) the impact of the robust adjusted profile likelihood function on the association evidence for rs61814952. The robust adjustment factor increases the tails of the likelihood function (Lp) (robust adjusted [solid line] versus unadjusted [dotted line]), widening the LIs. The likelihood function indicates that OR values around 1 are consistent with the data. OR: odds ratio
Figure 6. EP analysis with composite hypotheses in the unrelated CF participants of the International CF Gene Modifier consortium as in Figure 4 at the SLC26A9 chromosome 1 locus with meconium ileus. Analysis uses the generalized likelihood ratio (GLR) with composite null and alternative hypotheses for the OR; log10(GLR) < 0 represents evidence favoring the null hypothesis. All analyses adjusted for consortium site and genotyping platform. CF: cystic fibrosis; OR: odds ratio
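A generalized likelihood ratio compares the best-supported value within each composite hypothesis. The sketch below illustrates the idea for a single binomial proportion rather than the logistic regression OR used in the figure. The interval hypotheses are hypothetical and chosen only for illustration, since the hypotheses used in the article are not recoverable from this caption.

```python
import math


def log_lik(p, x, n):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


def sup_log_lik(interval, x, n):
    """Maximum of the log-likelihood over a closed interval of p.

    The binomial likelihood is unimodal, so the maximum is attained at
    the MLE x / n clipped to the interval.
    """
    lo, hi = interval
    p_best = min(max(x / n, lo), hi)
    return log_lik(p_best, x, n)


def log10_glr(h1, h0, x, n):
    """log10 of sup L over h1 divided by sup L over h0."""
    return (sup_log_lik(h1, x, n) - sup_log_lik(h0, x, n)) / math.log(10)


# HYPOTHETICAL composite hypotheses for a proportion.
H0 = (0.49, 0.50)
H1 = (0.50, 0.99)

favoring_h1 = log10_glr(H1, H0, x=510, n=1000)
favoring_h0 = log10_glr(H1, H0, x=495, n=1000)
print(favoring_h1, favoring_h0)
```

As in the caption, a negative log10(GLR) indicates that the best-supported value lies in the null region.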
The probabilities of misleading, weak and strong evidence for alternative ORs (OR1) of 1.10, 1.20, and 1.30 compared to OR = 1 as a function of k; n = 5,869
| k | Misleading (OR1 = 1.10) | Misleading (OR1 = 1.20) | Misleading (OR1 = 1.30) | Weak (OR1 = 1.10) | Weak (OR1 = 1.20) | Weak (OR1 = 1.30) | Strong (OR1 = 1.10) | Strong (OR1 = 1.20) | Strong (OR1 = 1.30) |
|---|---|---|---|---|---|---|---|---|---|
| 8 | 0.0204 | 0.0085 | 0.0013 | 0.5384 | 0.0988 | 0.0122 | 0.4411 | 0.8927 | 0.9865 |
| 32 | 0.0028 | 0.0028 | 0.0005 | 0.8075 | 0.1924 | 0.0253 | 0.1897 | 0.8048 | 0.9742 |
| 100 | 0.0004 | 0.0010 | 0.0002 | 0.9302 | 0.2919 | 0.0417 | 0.0695 | 0.7070 | 0.9580 |
| 1,000 | 0.0000 | 0.0001 | 0.0000 | 0.9965 | 0.5357 | 0.0989 | 0.0035 | 0.4642 | 0.9010 |
OR: odds ratio.
| | Evidential paradigm | Frequentist paradigm |
|---|---|---|
| Evidence for two simple hypothesized values of | | |
| Error favoring | | |
| Strong evidence | | |
| Intervals^a (e.g., mean of a normal distribution) | | |
| Other errors to minimize for study planning | | |
| Relationships | Strong evidence, | Strong evidence, |
^a There is a relationship, beyond the normal distribution, between exact confidence intervals and likelihood intervals but confidence intervals are also coupled with the Type I error probability.
^b The Type II error,