Peter Cahusac¹,²
Abstract
In an earlier article, Gordon Drummond summarized ongoing changes in how statistics are being used in experimental physiology. He described the near-ubiquitous use of the P value, cautioning against declaring an analysis 'statistically significant'. He mentioned alternative approaches, including Bayesian and likelihood approaches. This article focuses on the latter, although I first take another look at the P value. The likelihood approach is then introduced with a deliberately artificial example so that the concept can be grasped easily. Next, a more realistic example is described, with associated calculations. A further example using real categorical data is explained, showing how the approach relates to, and is superior to, the oft-used χ² test. A final discussion reveals that the likelihood approach, although mathematically and statistically accurate, is poorly supported by literature and training.
Keywords: P values; evidential approach; likelihood; support
Year: 2020 PMID: 32441825 PMCID: PMC7280729 DOI: 10.1113/EP088664
Source DB: PubMed Journal: Exp Physiol ISSN: 0958-0670 Impact factor: 2.858
Interpreting support values
| LR (inverse) | S (log LR) | Interpretation |
|---|---|---|
| 1 (1.00) | 0 | No evidence either way |
| 2.7 (0.37) | 1 | Weak evidence |
| 7.4 (0.14) | 2 | Moderate evidence |
| 20 (0.05) | 3 | Strong evidence |
| 55 (0.02) | 4 | Extremely strong evidence |
The first column gives the likelihood ratio (LR) and its inverse (in parentheses). The second column gives the natural logarithm of the LR (log LR), known as the support (S). The third column gives the verbal description of the level of support for hypothesis H1 versus H2 at each value of S. Negative values of S represent support for H2 versus H1. Support values can be reported to one decimal place. Unlike P values, this scale provides graded evidence without a threshold. The scale runs from −∞ to +∞; its midpoint of zero represents no evidence either way. This table has been adapted from that suggested by Goodman & Royall (1988). British courts use the same LR scale but ramped up: a court considers S = 4 to be only 'moderate evidence', and S = 8.6 to be 'strong evidence'. Compared with scientists, therefore, courts require more than twice the evidence (on the S scale) to influence their judgements.
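The conversion from LR to support, and from support to the verbal scale, can be sketched as below. The table only lists S = 0, 1, 2, 3 and 4, so rounding |S| to the nearest whole number (capped at 4) to pick a label is my own assumption, not part of the source scale; the helper names `support` and `describe` are hypothetical.

```python
import math

# Verbal labels from the table, indexed by S = 0, 1, 2, 3, 4.
LABELS = [
    "No evidence either way",
    "Weak evidence",
    "Moderate evidence",
    "Strong evidence",
    "Extremely strong evidence",
]

def support(lr: float) -> float:
    """Support S = ln(LR), where LR = L(H1) / L(H2)."""
    if lr <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log(lr)

def describe(s: float) -> str:
    """Map S onto the verbal scale.

    Assumption: |S| is rounded to the nearest integer and capped at 4,
    because the source table only labels S = 0, 1, 2, 3, 4.
    """
    level = min(math.floor(abs(s) + 0.5), 4)
    if level == 0:
        return LABELS[0]
    favoured = "H1" if s > 0 else "H2"  # negative S favours H2
    return f"{LABELS[level]} for {favoured}"

for lr in (1.0, 2.7, 7.4, 20.0, 55.0, 1 / 20):
    s = support(lr)
    print(f"LR = {lr:7.3f}  S = {s:6.2f}  {describe(s)}")
```

Because S is a logarithm, an LR of 20 and its inverse 1/20 give supports of equal size and opposite sign, which is why the scale is symmetric about zero.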
FIGURE 1 The distribution of heights for men (m) and women (w) from 20 countries (aggregate of Europe, North America, Australia and East Asia) in a cohort born between 1980 and 1994. The curves are likelihood functions calculated from the observed means and standard deviations. Each likelihood function is scaled so that its maximum value is one (see vertical axis), and this peak occurs at the mean of the distribution. The curve for men (dashed) is slightly wider because that distribution has a larger standard deviation. A randomly selected person with a height of 180 cm is indicated by the thin vertical line. The likelihoods, shown as horizontal dashed lines, are the heights of the two curves where the thin vertical line meets them. The brackets are labelled with the likelihood values, and the likelihood ratio (LR) calculation is given at the top right. The log LR, the support (S), is shown at the top left.
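The calculation illustrated in Figure 1 can be sketched as follows. The means and standard deviations below are HYPOTHETICAL placeholders, since the values behind the figure are not given here. Following the caption, each normal curve is rescaled so that its peak is 1 before the two heights at 180 cm are compared.

```python
import math

def scaled_likelihood(x: float, mean: float, sd: float) -> float:
    """Height of a normal curve at x, rescaled so that its peak (at the mean) is 1."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z)

# HYPOTHETICAL (mean, SD) in cm: placeholders only, not the data behind Figure 1.
MEN = (178.0, 7.5)
WOMEN = (165.0, 7.0)

height = 180.0
l_men = scaled_likelihood(height, *MEN)
l_women = scaled_likelihood(height, *WOMEN)
lr = l_men / l_women  # LR for "man" versus "woman"
print(f"L(men) = {l_men:.4f}  L(women) = {l_women:.4f}  "
      f"LR = {lr:.2f}  S = {math.log(lr):.2f}")
# Note: a ratio of unscaled normal densities would also carry the factor
# WOMEN[1] / MEN[1], which differs from 1 whenever the two SDs differ.
```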
FIGURE 2 Graphical illustration of the different scales for the likelihood ratio (LR) and support (S). The LR scale on the left shows that all LRs in favour of hypothesis H2 are 'compressed' between zero and one. Above one, the LR favours H1, extending up to infinity. On the right is the linear scale given by the logarithmic transform of the LR, S. Its midpoint is zero, representing no evidence either way. Above this are positive values in favour of H1 that extend to +∞; below it are negative values in favour of H2 that extend to −∞. If the null hypothesis, H0, is used, it replaces H2.
FIGURE 3 The likelihood function for the observed data. The function is centred on the mean value of 2.2, which is the maximum likelihood estimate, indicated by the dashed vertical line. The fine vertical lines meeting the curve mark the hypothesized means and are labelled with callouts for the null, minimal and previous values. The lines for the null value are so low that they are not visible. From each intersection, a horizontal line runs to the nearest vertical axis, where the likelihood value is given; this is the height of the curve as a proportion of its maximum of one. The thicker rectangular lines represent the likelihood interval for S = −2, which reaches e^−2 = 0.1353 on the vertical axis.
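The S = −2 likelihood interval in Figure 3 is the range of hypothesized means whose likelihood is at least e^−2 of the maximum. The sketch below assumes the t-based likelihood L = (1 + t²/df)^(−N/2) with N = 11 (df = 10) and a standard error back-calculated from the null hypothesis row (t = −4.1695 for a mean of 0) in the table of likelihoods that follows. These values are my inferences, chosen because they reproduce that table; they are not stated in this excerpt.

```python
import math

# Assumptions back-calculated from the tabulated t values (not stated in the text):
MEAN = 2.2            # observed mean, the MLE in Figure 3
N = 11                # sample size, so df = N - 1 = 10
SE = MEAN / 4.1695    # standard error, from t = -4.1695 for a hypothesized mean of 0

def rel_likelihood(mu: float) -> float:
    """Likelihood of a hypothesized mean mu, scaled so that the maximum (mu = MEAN) is 1."""
    t = (mu - MEAN) / SE
    return (1.0 + t * t / (N - 1)) ** (-N / 2)

def likelihood_interval(s: float):
    """Range of mu whose support relative to the MLE is at least s (s < 0)."""
    # Solve (1 + t^2/df)^(-N/2) = e^s for |t|.
    t_crit = math.sqrt((N - 1) * (math.exp(-2 * s / N) - 1.0))
    return MEAN - t_crit * SE, MEAN + t_crit * SE

lo, hi = likelihood_interval(-2.0)
print(f"S = -2 likelihood interval: {lo:.3f} to {hi:.3f}")
```

Under these assumptions, both the minimal (2) and previous (3) values fall inside the interval, while the null value of 0 lies far outside it.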
Likelihood and support values calculated from t values for hypothesized population mean values
| H | Represents | Hypothesized mean | t | L | 1/L | S |
|---|---|---|---|---|---|---|
| H0 | Null | 0 | −4.1695 | 0.0039 | 254.8447 | 5.541 |
| H1 | Minimal | 2 | −0.3790 | 0.9245 | 1.0816 | 0.078 |
| H2 | Previous | 3 | 1.5162 | 0.3204 | 3.1206 | 1.138 |
The three hypotheses (H) are numbered in the first column; the second column gives what each represents and the third gives its hypothesized mean. The likelihood (L) of each hypothesized mean is calculated from its t value and is scaled so that the maximum likelihood, at the observed mean, equals one. The next column gives the reciprocal of the likelihood, 1/L, and the last gives the support for the reciprocal, S = ln(1/L), that is, the evidence for the maximum likelihood estimate over each hypothesis.
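The table can be reproduced from its t values with the sketch below. It assumes the relative likelihood L = (1 + t²/df)^(−N/2) with N = 11 (df = 10). Neither the formula nor N is stated in this excerpt, but together they match every tabulated L, 1/L and S to the precision shown.

```python
import math

N = 11  # assumption: inferred sample size (df = 10) that reproduces the table

def relative_likelihood(t: float, n: int) -> float:
    """Likelihood of a hypothesized mean relative to the MLE, from its t value."""
    return (1.0 + t * t / (n - 1)) ** (-n / 2)

# t values copied from the table: H0 = null (0), H1 = minimal (2), H2 = previous (3)
t_values = {"H0": -4.1695, "H1": -0.3790, "H2": 1.5162}

results = {}
for name, t in t_values.items():
    L = relative_likelihood(t, N)
    results[name] = (L, 1 / L, math.log(1 / L))  # (L, 1/L, S)
    print(f"{name}: L = {L:.4f}  1/L = {1 / L:9.4f}  S = {math.log(1 / L):.3f}")
```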
Likelihood ratios (LR) and support (S) values given for three pairs of hypothesis comparisons
| Comparison | LR | S |
|---|---|---|
| H1 versus H0 | 235.6143 | 5.462 |
| H2 versus H0 | 81.6649 | 4.403 |
| H2 versus H1 | 0.3466 | −1.060 |
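These pairwise values follow directly from the reciprocal likelihoods in the previous table: the LR for Ha versus Hb is L(Ha)/L(Hb) = (1/L(Hb)) / (1/L(Ha)), and its support is the corresponding difference of S values. The pairing (minimal versus null, previous versus null, previous versus minimal) is inferred from the fact that these ratios match the tabulated LRs; the helper `lr` is hypothetical.

```python
import math

# 1/L values from the previous table: H0 = null, H1 = minimal, H2 = previous
recip = {"H0": 254.8447, "H1": 1.0816, "H2": 3.1206}

def lr(a: str, b: str) -> float:
    """LR for Ha versus Hb: L(Ha) / L(Hb) = (1/L(Hb)) / (1/L(Ha))."""
    return recip[b] / recip[a]

for a, b in [("H1", "H0"), ("H2", "H0"), ("H2", "H1")]:
    r = lr(a, b)
    print(f"{a} versus {b}: LR = {r:.4f}, S = {math.log(r):.3f}")
```

Small differences in the last decimal places arise because the tabulated 1/L values are themselves rounded.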
Likelihood ratios (LR) and support (S) values when N = 33
| Comparison | LR | S |
|---|---|---|
| H1 versus H0 | 6805159.2786 | 15.73 |
| H2 versus H0 | 338997.6799 | 12.73 |
| H2 versus H1 | 0.0498 | −3.00 |
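The N = 33 values can be reproduced by assuming the sample mean and standard deviation stay the same while N triples from an inferred N = 11. Since t = (μ − mean)/(SD/√N), each t value then grows by √(33/11), and the likelihood is recomputed with df = 32. Both the unchanged mean and SD and the original N = 11 are my assumptions; they are not stated in this excerpt, although they match the table.

```python
import math

# t values at N = 11 from the earlier table: H0 = null, H1 = minimal, H2 = previous
T_N11 = {"H0": -4.1695, "H1": -0.3790, "H2": 1.5162}

def log_likelihood(t: float, n: int) -> float:
    """ln of the relative likelihood (1 + t^2/df)^(-n/2), with df = n - 1."""
    return -(n / 2) * math.log1p(t * t / (n - 1))

def pairwise_support(n_old: int, n_new: int) -> dict:
    """Support for each pair at a new N, assuming the same sample mean and SD.

    With mean and SD unchanged, t scales with sqrt(N).
    """
    scale = math.sqrt(n_new / n_old)
    logl = {h: log_likelihood(t * scale, n_new) for h, t in T_N11.items()}
    pairs = [("H1", "H0"), ("H2", "H0"), ("H2", "H1")]
    return {(a, b): logl[a] - logl[b] for a, b in pairs}

for (a, b), s in pairwise_support(11, 33).items():
    print(f"{a} versus {b}: LR = {math.exp(s):.4f}, S = {s:.2f}")
```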
FIGURE 4 Sample size (N) increases the evidence in a linear fashion. This plot shows the support (S) values calculated when comparing hypothesis H2 with H1. The negative values of S are interpreted as evidence against H2 relative to H1, and this evidence strengthens as N grows. Equivalently, the evidence for H1 relative to H2 increases. Using z rather than t would have produced a line passing through zero: no data, no evidence.
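The contrast between t and z in Figure 4 can be sketched as follows, again assuming (from the tables, not from the text) that the tabulated t values came from N = 11 and that the mean and SD stay fixed as N changes. With z, the log likelihood is −z²/2 and z² grows in proportion to N, so S is exactly proportional to N and passes through zero; with t, the relationship is only approximately linear.

```python
import math

# Standardized distances (mu - mean) / SD for H1 (minimal, mu = 2) and
# H2 (previous, mu = 3), back-calculated as t / sqrt(11) from the tabulated t values.
# N = 11 is an inference from the tables, not a value stated in the text.
D1 = -0.3790 / math.sqrt(11)
D2 = 1.5162 / math.sqrt(11)

def support_t(n: int) -> float:
    """S for H2 versus H1 with the t-based likelihood; requires n >= 2."""
    df = n - 1
    t1_sq, t2_sq = n * D1 ** 2, n * D2 ** 2
    return -(n / 2) * (math.log1p(t2_sq / df) - math.log1p(t1_sq / df))

def support_z(n: int) -> float:
    """S for H2 versus H1 treating the SD as known (z), where ln L = -z**2 / 2."""
    return -(n / 2) * (D2 ** 2 - D1 ** 2)

for n in (2, 5, 11, 22, 33, 50):
    print(f"N = {n:2d}  S(t) = {support_t(n):7.3f}  S(z) = {support_z(n):7.3f}")
```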
FIGURE 5 The support function for changes in the count in the dead male cell. The 2-unit and 4-unit support intervals are shown by dashed lines. The changes assume fixed marginal totals. The term 'cell' here refers to one of the four values within the body of the coronavirus disease 2019 (COVID‐19) data table shown in the main text.
FIGURE 6 The support function for the odds ratio, which is maximal at the observed odds ratio of 1.945. The 2-unit and 4-unit support intervals are shown by dashed lines. The function assumes fixed marginal totals.
| Term | Definition |
|---|---|
| H0 | The null hypothesis, which typically represents no effect of a treatment on our measurements. |
| H1 | The alternative or experimental hypothesis, which represents some (any) effect of a treatment on our measurements. |
| Standard deviation | Roughly speaking, the average variability of individual data points. It tells us how much the data vary. It is the square root of the variance. |
| Standard error | Like the standard deviation, but for a statistic such as a mean. It tells us how much a statistic varies from sample to sample. It is the standard deviation of the sampling distribution. |
| Sampling distribution | The distribution of a statistic, such as a mean, when repeated samples of a specified size are taken from a large population. This is best demonstrated by computer simulation. The sampling distribution of the mean (which we are most often interested in) becomes closer to a normal distribution as the sample size increases. The standard deviation of the sampling distribution gives us the standard error. |
| z | Obtained from the standard normal distribution, this statistic represents the number of standard deviations a value is from a specified mean. If this concerns the sampling distribution, it represents the number of standard errors from the specified mean. It requires the population standard deviation to be known. |
| t | The statistic produced by the commonest statistical test, the t test. |
| F | This is obtained in an analysis of variance (ANOVA) and is the ratio of two estimates of population variance: the between‐groups variance to the within‐groups variance. If there are only two samples, the square root of F equals the absolute value of t. |
| χ² | The sum of independent squared standard normal (z) variables. |
| r | In a bivariate correlation, the correlation coefficient. |
| MLE | The maximum likelihood estimate: for a given model, the parameter value that makes the observed data most probable. It is therefore the value with the highest likelihood given the data. As the sample size increases towards infinity it has desirable properties, such as consistency (it converges to the true value) and asymptotic efficiency (in large samples, no other estimator is more efficient). |
| P value | The value obtained from a statistical test, giving the probability of obtaining our data, or more extreme data, assuming the null hypothesis is true. Often misunderstood. |
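The glossary notes that the sampling distribution is best demonstrated by computer simulation. A minimal simulation is sketched below, using a HYPOTHETICAL skewed population (exponential, mean 1 and SD 1) chosen only for illustration. The standard deviation of the simulated sample means, the standard error, should track SD/√n.

```python
import random
import statistics

def sample_means(population, n, reps, seed):
    """Means of `reps` samples of size n, drawn with replacement from `population`."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# HYPOTHETICAL skewed population (exponential, mean 1, SD 1); illustration only.
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1.0) for _ in range(100_000)]
pop_sd = statistics.pstdev(population)

se_simulated = {}
for n in (4, 16, 64):
    means = sample_means(population, n, reps=5_000, seed=n)
    se_simulated[n] = statistics.stdev(means)
    print(f"n = {n:2d}  SD of sample means = {se_simulated[n]:.4f}  "
          f"SD / sqrt(n) = {pop_sd / n ** 0.5:.4f}")
```

Plotting a histogram of `means` for each n would also show the sampling distribution becoming more symmetric and closer to normal as n increases, even though the population itself is skewed.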
| Sex | Dead | Alive | Total |
|---|---|---|---|
| Male | 41 | 3095 | 3136 |
| Female | 34 | 4992 | 5026 |
| Total | 75 | 8087 | 8162 |
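The abstract states that the likelihood approach relates to the χ² test. One standard link is sketched below under my own assumptions rather than as the article's calculation: the likelihood-ratio (G) statistic for a 2 × 2 table equals twice the log LR of a model with separate death rates for each sex versus a model of independence, so the corresponding support is S = G/2. The sketch also computes Pearson's χ² and the sample odds ratio, which matches the 1.945 quoted for Figure 6. The helper names are hypothetical.

```python
import math

# Observed counts from the table: rows = male, female; columns = dead, alive.
table = [[41, 3095], [34, 4992]]

def expected_counts(obs):
    """Expected counts under independence, computed from the marginal totals."""
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    total = sum(rows)
    return [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]

def g_and_chi2(obs):
    """Return (G, Pearson chi-square) for a 2 x 2 table with no zero cells."""
    expected = expected_counts(obs)
    cells = [(obs[i][j], expected[i][j]) for i in range(2) for j in range(2)]
    g = 2 * sum(o * math.log(o / e) for o, e in cells)
    chi2 = sum((o - e) ** 2 / e for o, e in cells)
    return g, chi2

g, chi2 = g_and_chi2(table)
support = g / 2  # S = ln(L_association / L_independence)
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])
print(f"OR = {odds_ratio:.3f}  G = {g:.3f}  S = {support:.3f}  Pearson chi2 = {chi2:.3f}")
```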