Daniela Dunkler, Maria Haller, Rainer Oberbauer, Georg Heinze.
Abstract
Most research in transplant medicine includes statistical analysis of observed data. Too often authors rely solely on P-values derived by statistical tests to answer their research questions. A P-value smaller than 0.05 is typically used to declare "statistical significance" and hence, "proves" that, for example, an intervention has an effect on the outcome of interest. Especially in observational studies, such an approach is highly problematic and can lead to false conclusions. Instead, adequate estimates of the observed size of the effect, for example, expressed as the risk difference, the relative risk or the hazard ratio, should be reported. These effect size measures have to be accompanied by an estimate of their precision, like a 95% confidence interval. Such a duo of effect size measure and confidence interval can then be used to answer the important question of clinical relevance.
Keywords: clinical significance; effect size measure; statistical inference; statistical significance; statistical tests
Year: 2019 PMID: 31560143 PMCID: PMC6972498 DOI: 10.1111/tri.13535
Source DB: PubMed Journal: Transpl Int ISSN: 0934-0874 Impact factor: 3.782
Review of all manuscripts published in Transplant International in the category “clinical research” in 2018
Simplified definitions of selected concepts of statistical testing and estimation
| Some key ingredients to statistical testing | |
|---|---|
| Null hypothesis | States that two different interventions (or exposures) lead to the same outcome, that is, that the effect size is 0 if expressed as a difference, or 1 if expressed as a ratio. |
| Alternative hypothesis | States that two different interventions lead to different outcomes, that is, that the effect size is not equal to 0 if expressed as a difference, or not equal to 1 if expressed as a ratio. |
| Statistical test | Depending on the research question (e.g., scale of the outcome), various statistical tests, like a t‐test, are available. A test statistic measures the “distance” between the data and the null hypothesis. A test is only valid if its underlying assumptions are met. These assumptions do not only encompass direct assumptions of the test, like approximate normal distribution in the case of a t‐test, but also assumptions about the conduct of the study, like random selection of subjects and treatment or that no interim analyses were conducted. |
| P‐value | To facilitate interpretation and comparison, a test statistic is usually transformed to a probability scale and expressed as a P‐value. |
| Key ingredients to estimation | |
| Effect size estimate | Expresses the expected difference or ratio in the outcome between two interventions. |
| Confidence interval | Expresses the imprecision of an estimate of effect size that arises from a limited sample size. Technically, when the study could be repeated very often and the confidence level is set to 95%, then 95% of the confidence intervals computed on the study repetitions will cover the true effect size. |
| Clinical relevance | Based on the effect size estimate and confidence interval in addition to subject matter knowledge and other published results, a researcher can finally answer the question of clinical relevance “Are observed differences between the two study groups large enough to be of clinical significance?” |
For methodologically correct definitions, we refer to Greenland et al. [2]. For information on statistical testing, we refer to textbooks on statistics, for example, Agresti et al. [14].
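The repeated-sampling reading of a confidence interval, and the step from a test statistic to a P-value, can be illustrated with a small simulation. The sketch below is not taken from the article: the true difference of 5, the standard deviation of 15, the group size of 200, and all function names are assumptions chosen for illustration, and the interval uses a large-sample normal approximation (z = 1.96) rather than a t-distribution.

```python
import math
import random
import statistics


def diff_of_means(x, y):
    """Return the difference of means and its standard error (unequal variances)."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return diff, se


def wald_ci(estimate, se, z=1.96):
    """Approximate 95% confidence interval (normal approximation, an assumption)."""
    return estimate - z * se, estimate + z * se


def two_sided_p(estimate, se):
    """Transform the test statistic estimate/se to a two-sided P-value
    under the null hypothesis of no difference (normal approximation)."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))


def coverage(true_diff=5.0, sd=15.0, n=200, repetitions=2000, seed=1):
    """Repeat a hypothetical study many times and count how often
    the interval covers the true difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        x = [rng.gauss(60.0 + true_diff, sd) for _ in range(n)]
        y = [rng.gauss(60.0, sd) for _ in range(n)]
        lower, upper = wald_ci(*diff_of_means(x, y))
        hits += lower <= true_diff <= upper
    return hits / repetitions


COVERAGE = coverage()
print(f"Share of intervals covering the true difference: {COVERAGE:.3f}")
```

With these settings the share of covering intervals should land close to the nominal 95%. Any single interval either covers the true effect or does not; the 95% describes the procedure over many repetitions.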
Some commonly used effect size measures to compare two interventions in transplantation research
| Scale of the outcome | Examples | Effect size measures | Example of interpretation* “If intervention 1 is compared to intervention 2, …” | Statistical method for generalization |
|---|---|---|---|---|
| Continuous | Glomerular filtration rate, glucose level | Difference of means | “…the expected difference in glomerular filtration rate is 5 mL/min/1.73 m².” | General linear model (linear regression, ANCOVA) |
| Binary within a fixed, fully observable time frame | Complications during transplantation, delayed graft function | Risk difference p1‐p2 | “… in 5% of all people the occurrence of a complication during transplantation could be avoided.” | Risk prediction after logistic regression |
| | | Relative risk (RR) p1/p2 | “… the probability of the occurrence of a complication during transplantation multiplies by 1.25.” | Poisson regression (for rare outcome events) |
| | | Odds ratio (OR) Odds1/Odds2 | “… the odds of the occurrence of a complication during transplantation multiply by 1.5.” [Odds1 = p1/(1−p1)] | Logistic regression |
| Binary within varying follow‐up time | Incidence of acute rejection episodes | Incidence rate ratio | “… the expected number of acute rejection episodes per patient year multiplies by 1.15.” | Poisson regression |
| Time‐to‐event | Patient survival, graft survival | Survival difference at t years S1(t)‐S2(t) | “… in 7% of all people graft loss within the first two years could be avoided.” | Survival estimation after Cox regression |
| | | Hazard ratio (HR) | “… the instantaneous mortality multiplies by 1.2.” | Cox regression |
The choice of effect size measure depends on the scale of the outcome. Examples of correct interpretations for comparison of two interventions are given. Statistical methods to generalize the analysis for adjustment for potential confounders or continuous exposure variables are presented. p1, p2: the observed event rates after intervention 1 or 2; S1(t), S2(t): the observed survival proportions at t years after interventions 1 and 2.
*For comparing exposures, change to “If exposed individuals are compared to unexposed individuals, …”
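For a binary outcome within a fixed time frame, the three effect size measures in the table can be computed directly from the event counts in the two groups. The following is a minimal sketch, not the authors' code: the counts (50 of 200 versus 40 of 200) are hypothetical, the function name is an assumption, and the 95% intervals are simple unadjusted Wald intervals, computed on the log scale for the two ratios. Adjusting for confounders would require the regression methods named in the table.

```python
import math


def two_by_two_effects(events1, n1, events2, n2, z=1.96):
    """Risk difference, relative risk and odds ratio with approximate 95% CIs.

    Assumes every cell of the 2x2 table is non-zero (no continuity correction).
    Returns a dict mapping each measure to (estimate, lower, upper).
    """
    p1, p2 = events1 / n1, events2 / n2

    # Risk difference p1 - p2 with a Wald interval on the original scale.
    rd = p1 - p2
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    # Relative risk p1 / p2; the interval is built on the log scale.
    rr = p1 / p2
    se_log_rr = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)

    # Odds ratio Odds1 / Odds2, with Odds = p / (1 - p); interval on the log scale.
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    se_log_or = math.sqrt(
        1 / events1 + 1 / (n1 - events1) + 1 / events2 + 1 / (n2 - events2)
    )

    def log_ci(estimate, se_log):
        return (estimate,
                math.exp(math.log(estimate) - z * se_log),
                math.exp(math.log(estimate) + z * se_log))

    return {
        "risk_difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "relative_risk": log_ci(rr, se_log_rr),
        "odds_ratio": log_ci(odds_ratio, se_log_or),
    }


# Hypothetical counts: 50 events in 200 patients versus 40 events in 200 patients.
EFFECTS = two_by_two_effects(50, 200, 40, 200)
for name, (estimate, lower, upper) in EFFECTS.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The same data yield three different numbers because each measure answers a slightly different question. Reporting the estimate together with its interval, rather than a P-value alone, is what allows clinical relevance to be judged.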