Pentti Nieminen, Jorma I Virtanen, Hannu Vähänikkilä.
Abstract
BACKGROUND: There is widespread evidence that statistical methods play an important role in original research articles, especially in medical research. The evaluation of statistical methods and reporting in journals suffers from a lack of standardized methods for assessing the use of statistics. The objective of this study was to develop and evaluate an instrument to assess the statistical intensity in research articles in a standardized way.
Year: 2017 PMID: 29053734 PMCID: PMC5650171 DOI: 10.1371/journal.pone.0186882
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Basic statistical methods used in medical research by research goal and type of outcome variable.
| Research goal | Type of outcome variable | | | |
|---|---|---|---|---|
| | Measurement from symmetric distribution | Measurement from very skewed distribution | Categorical variable | Time to event |
| | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier curve |
| | Independent samples t-test | Mann-Whitney test | Chi-square test | Kaplan-Meier curves and log-rank test |
| | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Kaplan-Meier curves and log-rank test |
| | t-test for repeated measurements | Wilcoxon test | McNemar test | |
| | Repeated-measures ANOVA | Friedman test | Cochran's Q test | |
| | Pearson correlation | Spearman correlation | Cross-tabulation with chi-square test, RR or OR statistics | |
| | Multiple linear regression | Negative binomial regression | Logistic regression | Cox proportional hazard regression |
Advanced statistical methods.
| Research goal | Brief description of methods |
|---|---|
| | Includes weighting procedures, imputation-based procedures and direct model-based analysis for handling incomplete data. |
| | Steps for constructing a multivariable model: stepwise variable selection, covariate adjustments, goodness-of-fit statistics and model validation, analysing interaction, influence analysis and other diagnostic statistics. |
| | Methods for analysing clustered data where repeated measurements are made for the same individuals over time or individuals are nested within groups. Extensions to basic regression methods can handle the dependencies between observations, and the following terms refer to these extensions: generalized estimating equations (GEE), hierarchical models, multilevel models, nested models, generalized linear mixed models, mixed effects models, random effect models. |
| | Measures to assess agreement between raters or observers for the same set of subjects or patients. For categorical outcomes, Cohen’s kappa and the more stable AC1 coefficient are the most-used measures. Intra-class correlation coefficients (ICC), with several versions for different experimental designs and aims of the study, are applied for assessing agreement with continuous outcomes. |
| | Meta-analysis uses data from numerous primary studies to produce an estimate of an overall association, and explores variation between the studies. |
| | Factor analysis combines multiple related variables into a small number of new variables which then represent the assumed latent characteristics in the subjects. |
| | Structural equation models (SEM) are composed of several causal statements which hypothesize causal relationships between several observed or unobserved (latent) variables. |
| | Cluster analysis identifies sets of individuals who are more like each other than they are like other individuals. This method is used to search for patterns in data and then to construct laws or rules that explain the pattern. |
| | Bayesian methods offer an alternative way of analysing data. Bayesian statistics creates and combines numerical values of prior belief, existing data and new data. |
| | Fractional polynomials, spline functions and generalized additive models (GAM) aim to extract full information from continuous variables in a multivariable setting with a plausible functional form. |
| | Artificial neural networks and machine learning are fields of computer science that apply algorithms to identify patterns and establish relationships in large data sets, learn from these data, and make predictions. |
| | Bootstrapping allows statistical inference and estimation of almost any statistic using a very general resampling procedure. |
| | Propensity scores are calculations of the likelihood of individuals being in a particular treatment or research group. Scores depend on those variables thought to influence group membership. A propensity score can be used as a covariate in a regression model, as a variable on which to match subjects or as a variable on which to stratify subjects. |
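The bootstrapping entry in the table above can be illustrated with a minimal sketch. It assumes a simple percentile interval, which is only one of several bootstrap variants; the function name `bootstrap_ci` and the example scores are hypothetical and not taken from the study.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic.

    Hypothetical helper for illustration: resample the data with
    replacement, recompute the statistic each time, and read the interval
    off the sorted resampled estimates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up intensity scores for ten articles (not data from the paper).
scores = [12, 15, 9, 21, 18, 7, 16, 14, 19, 11]
low, high = bootstrap_ci(scores, stat=statistics.median)
print(f"95% bootstrap interval for the median: {low} to {high}")
```

The same function works for any statistic that takes a list and returns a number, which is the sense in which the resampling procedure is "very general".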
Basic statistics of the intensity score by study design, sample size and main outcome.
| | Number of articles | Mean (SD) of SIMA score | P-value of ANOVA |
|---|---|---|---|
| Study design | | | < 0.001 |
| • cross-sectional survey | 209 | 15.2 (6.0) | |
| • longitudinal cohort study | 142 | 19.2 (5.1) | |
| • case-control | 49 | 16.3 (5.2) | |
| • intervention study (clinical trial) | 218 | 17.3 (5.9) | |
| • reliability / diagnostic study | 37 | 13.7 (6.4) | |
| • laboratory work | 111 | 7.9 (4.3) | |
| • meta-analysis | 39 | 18.5 (6.3) | |
| • case study | 13 | 2.6 (2.3) | |
| • other | 22 | 11.8 (7.6) | |
| Sample size | | | < 0.001 |
| • <30 | 135 | 10.1 (6.0) | |
| • 30–99 | 142 | 13.4 (5.6) | |
| • 100–300 | 133 | 16.5 (5.5) | |
| • >300 | 369 | 19.0 (5.1) | |
| Main outcome | | | < 0.001 |
| • Not significant | 143 | 17.3 (5.6) | |
| • Significant | 486 | 16.9 (5.8) | |
| • Not evaluated | 211 | 10.4 (7.1) | |
| All articles | 840 | 15.3 (6.8) | |
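As a rough plausibility check on the reported ANOVA p-values, a one-way F statistic can be recomputed from group sizes, means and SDs alone. The sketch below does this for the four sample-size groups in the table above. It is not the authors' computation, it works from rounded summary values, and the function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) triples.

    Hypothetical helper: between-group variation comes from the group
    means, within-group variation from the group SDs.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (number of articles, mean SIMA score, SD) for the sample-size groups
# <30, 30-99, 100-300 and >300, copied from the table.
sample_size_groups = [
    (135, 10.1, 6.0),
    (142, 13.4, 5.6),
    (133, 16.5, 5.5),
    (369, 19.0, 5.1),
]
f_stat, df1, df2 = anova_from_summary(sample_size_groups)
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

An F statistic of this magnitude on 3 and 775 degrees of freedom lies far beyond the 0.001 critical value, which is consistent with the reported P < 0.001.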
Fig 1. Intensity of statistical methods and reporting by the publication journal.
The dotted horizontal line shows the median value of all evaluated 840 original research articles.
Fig 2. The distribution of the intensity score by publication year in the Lancet and NEJM.
Inter-observer reliability of the statistical intensity score (SIMA score).
All raters ICC = 0.878 (agreement definition, single measures, mixed model).
| 0.984 | 0.943 | 0.909 | 0.843 | |
| 0.917 | 0.795 | |||
| 0.861 |
a Test-retest reliability
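For readers unfamiliar with the "agreement definition, single measures" form of the ICC, the following sketch computes it from a subjects-by-raters matrix using the standard two-way ANOVA mean squares. It assumes complete data with no missing ratings; the function name `icc_agreement_single` and the example scores are hypothetical and are not the study's data.

```python
def icc_agreement_single(scores):
    """ICC for absolute agreement, single measures, two-way model.

    scores: list of rows, one row per rated article, one column per rater.
    Uses ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
    where MSR, MSC and MSE are the row, column and residual mean squares.
    """
    n = len(scores)          # number of rated articles
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up scores: the second rater is always two points higher.
offset_scores = [[10, 12], [15, 17], [20, 22], [5, 7]]
print(round(icc_agreement_single(offset_scores), 3))
```

Under the agreement definition, a systematic offset between raters lowers the coefficient even when the raters rank the articles identically, which is why this version is stricter than a consistency-based ICC.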
Summary statistics of inter-rater (and intra-rater) reliability between the raters, based on percent agreement, kappa and AC1 from a total of 63 items.
| | Mean | Median | Minimum | Maximum |
|---|---|---|---|---|
| • % Agreement | 98.9 | 100.0 | 92.5 | 100.0 |
| • Kappa | 0.94 | 1.00 | 0.00 | 1.00 |
| • AC1 | 0.98 | 1.00 | 0.90 | 1.00 |
| • % Agreement | 95.2 | 97.5 | 72.5 | 100.0 |
| • Kappa | 0.75 | 0.84 | -0.03 | 1.00 |
| • AC1 | 0.93 | 0.96 | 0.50 | 1.00 |
| • % Agreement | 92.1 | 95.0 | 62.5 | 100.0 |
| • Kappa | 0.56 | 0.68 | -0.08 | 1.00 |
| • AC1 | 0.88 | 0.94 | 0.286 | 1.00 |
| • % Agreement | 91.8 | 95.0 | 22.5 | 100.0 |
| • Kappa | 0.63 | 0.78 | -0.04 | 1.00 |
| • AC1 | 0.87 | 0.94 | -0.50 | 1.00 |
| • % Agreement | 92.9 | 95.0 | 67.5 | 100.0 |
| • Kappa | 0.59 | 0.73 | -0.06 | 1.00 |
| • AC1 | 0.86 | 0.95 | 0.41 | 1.00 |
| • % Agreement | 91.7 | 95.0 | 25.0 | 100.0 |
| • Kappa | 0.62 | 0.77 | -0.05 | 1.00 |
| • AC1 | 0.87 | 0.95 | -0.41 | 1.00 |
| • % Agreement | 92.2 | 95.0 | 40.0 | 100.0 |
| • Kappa | 0.57 | 0.70 | -0.06 | 1.00 |
| • AC1 | 0.88 | 0.95 | -0.07 | 1.00 |
a SB = senior biostatistician, SB re = senior biostatistician rescoring, JB = junior biostatistician, SMR = senior medical researcher and JMR = junior medical researcher
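The table shows items where percent agreement is high while kappa is near zero or negative. The sketch below reproduces that pattern for two raters and one rarely used item, using the standard definitions of Cohen's kappa and Gwet's AC1. The ratings are invented for illustration, and the function name `kappa_and_ac1` is hypothetical.

```python
from collections import Counter


def kappa_and_ac1(ratings_a, ratings_b):
    """Percent agreement, Cohen's kappa and Gwet's AC1 for two raters.

    Assumes at least two categories occur in the data and that the raters
    do not both use a single identical category throughout (otherwise the
    denominators below are zero).
    """
    n = len(ratings_a)
    categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)

    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)

    # Kappa: chance agreement from the product of each rater's marginals.
    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # AC1: chance agreement from the average marginal of both raters.
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    pe_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)

    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return p_obs, kappa, ac1


# Invented example: 40 articles, an item that is almost never present.
# The raters agree on "no" 37 times and never agree on "yes".
rater_1 = ["no"] * 37 + ["yes"] + ["no", "no"]
rater_2 = ["no"] * 37 + ["no"] + ["yes", "yes"]
p_obs, kappa, ac1 = kappa_and_ac1(rater_1, rater_2)
print(f"agreement={p_obs:.3f} kappa={kappa:.3f} AC1={ac1:.3f}")
```

With very unbalanced categories, kappa's chance-agreement term approaches the observed agreement, so kappa collapses toward zero, whereas AC1 stays close to the percent agreement. This is the sense in which AC1 is described as the more stable coefficient.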