Lutz Bornmann, Rüdiger Mutz, Hans-Dieter Daniel.
Abstract
BACKGROUND: This paper presents the first meta-analysis of the inter-rater reliability (IRR) of journal peer reviews. IRR is defined as the extent to which two or more independent reviews of the same scientific document agree.
Year: 2010 PMID: 21179459 PMCID: PMC3001856 DOI: 10.1371/journal.pone.0014331
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview of mean reliabilities with confidence interval.
| Model | Method | Levels | N | Mean reliability | CI lower bound | CI upper bound |
| Fixed effects | van Houwelingen, Arends and Stijnen | 2 | 44 | .234 | .222 | .246 |
| Random effects | van Houwelingen, Arends and Stijnen | 2 | 44 | .341 | .289 | .392 |
| | van Houwelingen, Arends and Stijnen | 3 | 44 | .340 | .283 | .396 |
| | Hunter & Schmidt | 2 | 26 | .17 | .13 | .21 |
Notes: To obtain the reliability estimates (ICC/r²) shown in this table, correlations (r) were squared. N = number of coefficients included. Levels = number of levels in the meta-analysis.
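The squaring step described in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the back-transformation from the Fisher-Z scale is assumed to be the standard inverse (r = tanh(z)), and the input value .67 is the Model 0 intercept from the multilevel table, used here purely as an example input.

```python
import math

def fisher_z_to_r(z: float) -> float:
    """Invert the Fisher-Z transform (assumed standard form): r = tanh(z)."""
    return math.tanh(z)

def correlation_to_reliability(r: float) -> float:
    """Square a correlation to obtain a reliability estimate (ICC/r^2)."""
    return r ** 2

# Illustrative input (assumption): the Model 0 intercept, reported on the Fisher-Z scale.
z_mean = 0.67
r_mean = fisher_z_to_r(z_mean)
reliability = correlation_to_reliability(r_mean)
print(f"r = {r_mean:.3f}, reliability = {reliability:.3f}")
```

Under these assumptions the result lands near .34, which is in line with the random-effects mean reliability reported in the table.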
Figure 1. Forest plot of the predicted inter-rater reliability (Bayes estimate) for each study (random effects model without covariates) with 95% confidence interval (as bars) for each reliability coefficient (sorted in ascending order).
The 95% confidence interval of the mean value (vertical line) is shaded grey. Predicted values for the same author and year but with different letters (e.g., Herzog 2005a and Herzog 2005b) belong to the same study.
Description of the covariates included in the meta-regression analyses (n = 32 studies with 44 coefficients).
| Variable (metric) | Range | Mean | Standard Deviation |
| Number of manuscripts | 15–1983 | 321.98 | 398.13 |
| Variable (categorical) | Range | Frequency | Percent |
| Method | | | |
| ICC | 0–1 | 35 | 80 |
| r (RC) | 0–1 | 9 | 20 |
| Discipline | | | |
| Economics/Law | 0–1 | 3 | 7 |
| Natural Sciences | 0–1 | 7 | 16 |
| Medical Sciences | 0–1 | 11 | 25 |
| Social Sciences (RC) | 0–1 | 23 | 52 |
| Object of Appraisal | | | |
| Paper | 0–1 | 29 | 66 |
| Abstract (RC) | 0–1 | 15 | 34 |
| Cohort | | | |
| 1950–1979 | 0–1 | 15 | 34 |
| 1980–1989 | 0–1 | 9 | 21 |
| 1990–1999 | 0–1 | 5 | 11 |
| 2000–2008 | 0–1 | 8 | 18 |
| Unknown (RC) | 0–1 | 7 | 16 |
| Blinding | | | |
| Single | 0–1 | 22 | 50 |
| Double | 0–1 | 3 | 7 |
| Unknown (RC) | 0–1 | 19 | 43 |
| Rating System* | | | |
| Categorical | 0–1 | 35 | 80 |
| Metric | 0–1 | 5 | 11 |
| Unknown (RC) | 0–1 | 4 | 9 |
Note: RC = reference category in meta-regression analysis. Unknown = this information is missing in a study.
*Rating systems with nine or fewer categories are classified as categorical; those with more than nine categories are classified as metric [110].
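The classification rule in the footnote, together with the dummy coding against a reference category (RC), can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the use of `None` for missing information are assumptions made for the example.

```python
def classify_rating_system(n_categories):
    """Apply the footnote's rule: <= 9 categories is categorical, > 9 is metric.

    None stands for a study that does not report its rating scale (assumption).
    """
    if n_categories is None:
        return "unknown"
    return "categorical" if n_categories <= 9 else "metric"

def dummy_code(level, levels, reference):
    """Return 0/1 dummy variables for every level except the reference category."""
    return {name: int(level == name) for name in levels if name != reference}

levels = ["categorical", "metric", "unknown"]
for n in (5, 9, 10, None):
    label = classify_rating_system(n)
    # "unknown" is the reference category (RC) for the rating system in the table
    print(n, label, dummy_code(label, levels, reference="unknown"))
```

A study in the reference category gets zeros on all dummies, which is why the coefficients of the other categories are read relative to it.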
Multilevel meta-analyses of the metric inter-rater reliabilities (Fisher-Z of √r_tt or r).
| Model (cells: Coeff / SE) | Model 0: Intercept | Model 1: Number of Manuscripts | Model 2: Method | Model 3: Discipline | Model 4: Object of Appraisal | Model 5: Cohort | Model 6: Blinding | Model 7: Rating System | Model 8: Number of Manuscripts, Rating System |
| Intercept | .67 / .04 | .77 / .04 | .69 / .07 | .66 / .05 | .74 / .07 | .60 / .08 | .71 / .05 | 1.03 / .13 | 1.06 / .11 |
| Number of Manuscripts/100 | | −.03 / .007 | | | | | | | −.03 / .006 |
| Method (RC = r) | | | | | | | | | |
| ICC | | | −.02 / .08 | | | | | | |
| Discipline (RC = Social Sciences) | | | | | | | | | |
| Economics/Law | | | | .08 / .13 | | | | | |
| Natural Sciences | | | | −.02 / .09 | | | | | |
| Medical Sciences | | | | −.006 / .09 | | | | | |
| Object of Appraisal (RC = Abstract) | | | | | | | | | |
| Paper | | | | | −.06 / .08 | | | | |
| Cohort (RC = unknown) | | | | | | | | | |
| 1950–1979 | | | | | | .10 / .09 | | | |
| 1980–1989 | | | | | | .07 / .11 | | | |
| 1990–1999 | | | | | | .15 / .15 | | | |
| 2000–2008 | | | | | | −.007 / .12 | | | |
| Blinding (RC = unknown) | | | | | | | | | |
| Double | | | | | | | .15 / .11 | | |
| Single | | | | | | | −.05 / .08 | | |
| Rating System (RC = unknown) | | | | | | | | | |
| Categorical | | | | | | | | −.40 / .14 | −.32 / .11 |
| Metric | | | | | | | | −.33 / .16 | −.33 / .13 |
| Variance components | | | | | | | | | |
| Study Level 3 | .03 / .01 | .016 / .009 | .03 / .012 | .03 / .01 | .027 / .01 | .03 / .01 | .03 / .01 | .017 / .01 | .0036 / .02 |
| Coefficient Level 2 | .01 / .009 | .007 / .007 | .009 / .009 | .009 / .009 | .009 / .009 | .007 / .008 | .005 / .007 | .01 / .01 | .01 / .02 |
| −2LL | −8.4 | −23.7 | −8.5 | −8.8 | −12.7 | −10.7 | −10.7 | −15.3 | −30.0 |
| BIC | 2.0 | −9.9 | 1.9 | 8.5 | −2.4 | 10.1 | 3.2 | −1.4 | −12.6 |
Note: For each categorical variable, one category was chosen as the reference category (RC; e.g., RC = Social Sciences for the categorical variable discipline). For categorical variables, the effect of each predictor (a dummy variable representing one of the categories) is a regression coefficient (Coeff) that should be interpreted in relation to its standard error (SE) and to the effect of the reference category. Variance components for level 1 are derived from the data, whereas the variance components at level 2 and level 3 indicate the amount of variance attributable to differences between single reliability coefficients nested within studies (level 2) and to differences between studies (level 3). The log-likelihood statistic provided by SAS/proc mixed (−2LL) can be used to compare models, as can the Bayesian Information Criterion (BIC). The smaller the BIC, the better the model.
*p<.05.
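The reading rules in the note (a smaller BIC is better; a coefficient is judged against its standard error) can be illustrated with the values from the table. The sketch below is hedged: the function names are hypothetical, and the |Coeff / SE| > 1.96 cut-off is a rough normal-approximation heuristic assumed for the example, not necessarily the test used in the study.

```python
# BIC values copied from the BIC row of the multilevel table.
bic = {
    "Model 0": 2.0, "Model 1": -9.9, "Model 2": 1.9,
    "Model 3": 8.5, "Model 4": -2.4, "Model 5": 10.1,
    "Model 6": 3.2, "Model 7": -1.4, "Model 8": -12.6,
}

def best_model_by_bic(bic_values):
    """Return the name of the model with the smallest BIC."""
    return min(bic_values, key=bic_values.get)

def coeff_to_se_ratio(coeff, se):
    """Ratio of a regression coefficient to its standard error."""
    return coeff / se

def exceeds_rough_threshold(coeff, se, threshold=1.96):
    """Heuristic check (assumption): |Coeff / SE| above a normal-approximation cut-off."""
    return abs(coeff_to_se_ratio(coeff, se)) > threshold

def predict_fisher_z_model1(n_manuscripts):
    """Model 1 prediction on the Fisher-Z scale: intercept .77, slope -.03 per 100 manuscripts."""
    return 0.77 - 0.03 * (n_manuscripts / 100)

best = best_model_by_bic(bic)
print("Smallest BIC:", best)
print("Manuscripts/100 (Model 1):", round(coeff_to_se_ratio(-0.03, 0.007), 2))
print("ICC vs. r (Model 2):", round(coeff_to_se_ratio(-0.02, 0.08), 2))
print("Predicted Fisher-Z at 321.98 manuscripts:", round(predict_fisher_z_model1(321.98), 3))
```

With these inputs, Model 8 has the smallest BIC, and the number of manuscripts has a large ratio relative to its standard error while the method dummy (ICC vs. r) does not.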