Alberto Baccini¹, Lucio Barabesi¹, Giuseppe De Nicolao².
Abstract
This paper analyzes the concordance between bibliometrics and peer review. It draws evidence from the data of two experiments conducted by the Italian governmental agency for research evaluation (ANVUR). The agency performed the experiments to validate the adoption, in the Italian research assessment exercises, of a dual system of evaluation in which some outputs were evaluated by bibliometrics and others by peer review. The two experiments were based on stratified random samples of journal articles. Each article was scored both by bibliometrics and by peer review, and the degree of concordance between the two evaluations was then computed. The correct setting of the experiments is defined by developing the design-based estimation of Cohen's kappa coefficient and some testing procedures for assessing the homogeneity of missing proportions between strata. The results of both experiments show that, for each research area of science, technology, engineering and mathematics, the degree of agreement between bibliometrics and peer review is, at most, weak at the individual article level. Thus, the outcome of the experiments does not validate the use of the dual system of evaluation in the Italian research assessments. More generally, the very weak concordance indicates that metrics should not replace peer review at the level of the individual article. Hence, the use of the dual system in a research assessment might worsen the quality of information compared to the adoption of peer review only or bibliometrics only.
Year: 2020 PMID: 33206715 PMCID: PMC7673579 DOI: 10.1371/journal.pone.0242520
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
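Cohen's kappa is the agreement measure behind every table below. As a minimal sketch, assuming each rating is coded as an integer class 0..k-1 on an ordinal scale and that the articles form a simple random sample (no strata), the textbook weighted kappa compares the observed weighted agreement p_o with the agreement p_e expected from the two marginal distributions, kappa = (p_o - p_e) / (1 - p_e). The function names and the `"none"`, `"linear"` and `"quadratic"` weight schemes are illustrative choices, not necessarily the weights used by ANVUR or by the authors.

```python
from typing import List, Sequence


def weight_matrix(k: int, scheme: str) -> List[List[float]]:
    """Agreement weights w[i][j] in [0, 1] for an ordinal scale with k >= 2 classes."""
    if scheme == "none":  # unweighted kappa: only exact matches count as agreement
        return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    if scheme == "linear":
        return [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    if scheme == "quadratic":
        return [[1.0 - ((i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]
    raise ValueError(f"unknown weight scheme: {scheme!r}")


def weighted_kappa(r1: Sequence[int], r2: Sequence[int], k: int, scheme: str = "linear") -> float:
    """Cohen's kappa for two ratings of the same items, each coded 0..k-1."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed joint proportions of (rating 1, rating 2) pairs
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        p[a][b] += 1.0 / n
    rows = [sum(p[i]) for i in range(k)]
    cols = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = weight_matrix(k, scheme)
    po = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))  # observed agreement
    pe = sum(w[i][j] * rows[i] * cols[j] for i in range(k) for j in range(k))  # chance agreement
    if pe == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (po - pe) / (1.0 - pe)


# Hypothetical ratings of eight articles on a 4-class scale
bibliometric = [3, 2, 2, 1, 0, 3, 1, 2]
peer_review = [2, 2, 1, 1, 0, 3, 2, 2]
for scheme in ("none", "linear", "quadratic"):
    print(scheme, round(weighted_kappa(bibliometric, peer_review, k=4, scheme=scheme), 3))
```

Running the loop on the same pair of rating lists shows how much a single kappa figure depends on the chosen weight system, which is why the weights have to be stated alongside any estimate.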
Fig 1. Flowchart of EXP1.
The flowchart was drawn with diagrams.net, adopting its symbols and conventions.
Fig 2. Flowchart of EXP2.
The flowchart was drawn with diagrams.net, adopting its symbols and conventions.
Population, sample and sub-sample sizes for scientific areas in EXP1.
| Scientific Areas | Population | Sample | Sub-sample |
|---|---|---|---|
| Area 1—Mathematics and Informatics | 6758 | 631 | 438 |
| Area 2—Physics | 15029 | 1412 | 1212 |
| Area 3—Chemistry | 10127 | 927 | 778 |
| Area 4—Earth Sciences | 5083 | 458 | 377 |
| Area 5—Biology | 14043 | 1310 | 1058 |
| Area 6—Medicine | 21191 | 1984 | 1602 |
| Area 7—Agricultural and Veterinary Sciences | 6284 | 532 | 425 |
| Area 8a—Civil Engineering | 2460 | 225 | 198 |
| Area 9—Industrial and Information Engineering | 12349 | 1130 | 919 |
| Area 13—Economics and Statistics | 5681 | 590 | 590 |
| Total | 99005 | 9199 | 7597 |
Source: ANVUR (2013, Appendix B).
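Because the samples are stratified by scientific area, pooling all sampled articles with equal weight can misrepresent the population: in the EXP1 table the sub-sample fraction ranges from about 6.5% in Area 1 (438 of 6758) to about 10.4% in Area 13 (590 of 5681). The sketch below shows one natural plug-in version of a design-based estimate: each stratum's cross-tabulated proportions are weighted by its population share N_h/N before the kappa formula is applied. It assumes the sub-sample is the set of articles with both ratings, uses linear kappa weights, and uses hypothetical stratum names and counts; it is not presented as the authors' exact estimator, and it gives a point estimate only, with no variance.

```python
from typing import Dict, List, Tuple

Counts = List[List[int]]  # counts[i][j]: sampled articles rated i by one method and j by the other


def stratified_kappa(strata: Dict[str, Tuple[int, Counts]]) -> float:
    """Plug-in, linearly weighted kappa from a stratified sample.

    strata maps a stratum name to (N_h, counts_h): the stratum's population size
    and the k x k cross-tabulation of the ratings observed in its sample.
    """
    k = len(next(iter(strata.values()))[1])
    N = sum(Nh for Nh, _ in strata.values())
    p = [[0.0] * k for _ in range(k)]
    for Nh, counts in strata.values():
        nh = sum(map(sum, counts))
        for i in range(k):
            for j in range(k):
                # Stratified estimator of a population proportion: stratum share N_h / N
                # times the within-sample proportion (each sampled article stands for N_h / n_h)
                p[i][j] += (Nh / N) * counts[i][j] / nh
    rows = [sum(p[i]) for i in range(k)]
    cols = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]  # linear weights (assumed)
    po = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    pe = sum(w[i][j] * rows[i] * cols[j] for i in range(k) for j in range(k))
    return (po - pe) / (1.0 - pe)


# Expansion factors N_h / n_h implied by the EXP1 table, if n_h is the sub-sample size
print(round(6758 / 438, 1), round(5681 / 590, 1))  # Area 1, Area 13; prints 15.4 9.6

# Hypothetical strata with made-up population sizes and 3-class cross-tabulations
strata = {
    "stratum_x": (6000, [[30, 10, 2], [12, 40, 15], [3, 14, 20]]),
    "stratum_y": (3000, [[25, 5, 0], [6, 30, 8], [1, 9, 26]]),
}
print(round(stratified_kappa(strata), 3))
```

If every stratum had the same within-sample proportions, the stratum weights would cancel and the result would coincide with the unstratified kappa; the weighting only matters when areas differ, which is the situation the tables describe.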
Population, sample, sub-sample sizes and number of missing articles for scientific areas in EXP2.
| Scientific Areas | Population | Sample | Sub-sample | Missing |
|---|---|---|---|---|
| Area 1—Mathematics and Informatics | 4631 | 467 | 344 | 23 |
| Area 2—Physics | 10182 | 1018 | 926 | 10 |
| Area 3—Chemistry | 6625 | 662 | 549 | 9 |
| Area 4—Earth Sciences | 3953 | 394 | 320 | 6 |
| Area 5—Biology | 10423 | 1037 | 792 | 86 |
| Area 6—Medicine | 15400 | 1524 | 1071 | 231 |
| Area 7—Agricultural and Veterinary Sciences | 6354 | 638 | 489 | 8 |
| Area 8b—Civil Engineering | 2370 | 237 | 180 | 3 |
| Area 9—Industrial and Information Engineering | 9930 | 890 | 739 | 108 |
| Area 11b—Psychology | 1801 | 180 | 133 | 5 |
| Area 13—Economics and Statistics | 5490 | 512 | 498 | 14 |
| Total | 77159 | 7667 | 6041 | 503 |
Source: ANVUR (2017, Appendix B).
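The abstract mentions testing procedures for the homogeneity of missing proportions between strata, and the EXP2 table reports missing articles by area. The sketch below swaps in the textbook Pearson chi-square test of homogeneity, which treats strata as independent binomial samples and ignores sampling without replacement from a finite population; it is not the authors' procedure. Which denominator the missing counts should be divided by is not stated in this section, so the Sample column is used purely as an assumption.

```python
from typing import Sequence, Tuple


def missing_homogeneity_chi2(missing: Sequence[int], totals: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for
    H0: the proportion of missing articles is the same in every stratum."""
    if len(missing) != len(totals) or len(totals) < 2:
        raise ValueError("need at least two strata with matching lengths")
    pooled = sum(missing) / sum(totals)  # common missing proportion under H0
    if not 0.0 < pooled < 1.0:
        raise ValueError("pooled missing proportion must be strictly between 0 and 1")
    stat = 0.0
    for m, t in zip(missing, totals):
        # Two cells per stratum: missing and non-missing articles
        for observed, expected in ((m, t * pooled), (t - m, t * (1.0 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(totals) - 1


# EXP2 columns in table order (Areas 1, 2, 3, 4, 5, 6, 7, 8b, 9, 11b, 13);
# using Sample as the denominator is an assumption
missing = [23, 10, 9, 6, 86, 231, 8, 3, 108, 5, 14]
sample = [467, 1018, 662, 394, 1037, 1524, 638, 237, 890, 180, 512]
stat, dof = missing_homogeneity_chi2(missing, sample)
print(f"chi2 = {stat:.1f} on {dof} df")
```

With 11 areas the statistic has 10 degrees of freedom; the upper 5% point of the chi-square distribution with 10 degrees of freedom is about 18.31, which is the value the statistic would be compared with under this textbook test.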
Cohen’s kappa coefficient estimates (percent) for EXP1 (95% confidence intervals in parentheses), bibliometric vs peer review ratings.
| Area | ANVUR | | | |
|---|---|---|---|---|
| 1 | 31.73(23.00,40.00) | 31.73(25.21,38.26) | 33.40(26.80,40.00) | 15.07(11.76,18.38) |
| 2 | 25.15(21.00,29.00) | 25.15(21.10,29.19) | 29.15(25.29,33.01) | 18.91(16.24,21.58) |
| 3 | 22.96(17.00,29.00) | 22.96(18.05,27.86) | 23.98(19.09,28.88) | 14.52(11.32,17.71) |
| 4 | 29.85(21.00,39.00) | 29.85(23.32,36.37) | 30.24(23.69,36.79) | 20.32(15.66,24.99) |
| 5 | 34.53(29.00,40.00) | 34.53(30.51,38.54) | 36.62(32.72,40.51) | 23.85(21.13,26.58) |
| 6 | 33.51(29.00,38.00) | 33.51(30.30,36.72) | 34.62(31.47,37.77) | 22.73(20.51,24.95) |
| 7 | 34.37(27.00,42.00) | 34.37(27.99,40.75) | 36.62(30.59,42.65) | 22.60(18.43,26.77) |
| 8a | 22.61(11.00,34.00) | 22.61(12.70,32.52) | 22.99(13.06,32.92) | 16.35(8.90,23.80) |
| 9 | 17.10(13.00,21.00) | 17.10(13.17,21.03) | 21.95(17.78,26.11) | 12.56(10.12,15.01) |
| 13 | 61.04(53.00,69.00) | 54.17(49.37,58.98) | 54.17(49.37,58.98) | 54.17(49.37,58.98) |
| All | 38.00(36.00,40.00) | 34.15(32.64,35.66) | 35.76(34.28,37.24) | 23.28(22.23,24.33) |
a Source: [12, Appendix B]. Reproduced in [4].
b Estimated with the wrong system of weights, as documented in [8]. Benedetto et al. [15] attributed it to a “factual error in editing of the table” and published a corrected estimate of 54.17.
c Ancaiani et al. [4] reported a different estimate of 34.41, confirmed also in [15].
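Each cell of the kappa tables packs a point estimate and a 95% interval as `estimate(lower,upper)`. A small parser, assuming exactly that text layout and a symmetric normal-approximation interval (z of about 1.96), can back out the implied standard error. For Area 1 of EXP1 it makes visible that the ANVUR interval is wider than the interval reported, around the same point estimate, in the next column. The helper names are hypothetical.

```python
import re
from typing import Tuple

# One table cell such as "31.73(23.00,40.00)": estimate, then (lower, upper)
CELL = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)\s*")


def parse_cell(cell: str) -> Tuple[float, float, float]:
    """Split an 'estimate(lower,upper)' cell into three floats."""
    m = CELL.fullmatch(cell)
    if m is None:
        raise ValueError(f"not an 'estimate(lower,upper)' cell: {cell!r}")
    est, lo, hi = (float(g) for g in m.groups())
    return est, lo, hi


def implied_se(cell: str, z: float = 1.959964) -> float:
    """Standard error implied by a symmetric normal-approximation 95% interval."""
    _, lo, hi = parse_cell(cell)
    return (hi - lo) / (2.0 * z)


anvur = "31.73(23.00,40.00)"     # EXP1, Area 1, ANVUR column
next_col = "31.73(25.21,38.26)"  # EXP1, Area 1, same point estimate in the next column
print(f"{implied_se(anvur):.2f} vs {implied_se(next_col):.2f}")  # prints: 4.34 vs 3.33
```

The same two functions can be applied to every cell of a row to compare interval widths across columns, or to check how far an interval departs from being centred on its estimate.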
Cohen’s kappa coefficient estimates (percent) for EXP2 (95% confidence intervals in parentheses), bibliometric vs peer review ratings.
| Area | ANVUR | | | |
|---|---|---|---|---|
| 1 | 21.50(15.10,27.80) | 21.48(15.38,27.58) | 22.85(16.71,29.00) | 14.97(11.79,18.16) |
| 2 | 26.50(22.40,30.50) | 26.48(22.61,30.34) | 28.66(24.86,32.46) | 22.35(19.46,25.23) |
| 3 | 19.50(14.30,24.70) | 19.49(14.60,24.38) | 20.85(16.01,25.69) | 13.71(10.71,16.72) |
| 4 | 23.90(16.60,31.20) | 23.90(17.02,30.77) | 24.52(17.75,31.28) | 15.78(11.55,20.01) |
| 5 | 24.10(19.70,28.40) | 24.07(19.98,28.15) | 25.01(20.97,29.05) | 19.93(17.75,22.11) |
| 6 | 22.80(19.50,26.20) | 22.83(19.62,26.04) | 24.47(21.32,27.62) | 21.00(19.49,22.51) |
| 7 | 27.00(21.30,32.70) | 27.01(21.66,32.36) | 28.76(23.56,33.96) | 16.02(13.05,18.99) |
| 8b | 17.20(8.80,25.60) | 17.21(9.183,25.23) | 20.36(12.55,28.16) | 11.45(7.23,15.67) |
| 9 | 16.90(12.90,21.00) | 16.91(13.04,20.78) | 19.62(15.82,23.42) | 18.51(16.58,20.44) |
| 11b | 24.10(13.70,34.50) | 24.09(14.30,33.88) | 25.45(15.93,34.97) | 14.76(9.556,19.95) |
| 13 | 30.90(26.20,35.50) | 30.85(26.36,35.34) | 30.85(26.36,35.34) | 31.54(27.51,35.57) |
| All | 26.00(24.50,27.60) | 26.10(24.64,27.56) | 27.31(25.87,28.74) | 20.88(20.05,21.71) |
a Source: [17, Appendix B].
Fig 3. “Error-bar” plots of the Cohen’s kappa coefficient estimates (percent) for EXP1, bibliometric vs peer review ratings.
The confidence intervals are at the 95% confidence level and estimates corresponding to , and are in red, green and blue, respectively.
Fig 4. “Error-bar” plots of the Cohen’s kappa coefficient estimates (percent) for EXP2, bibliometric vs peer review ratings.
The confidence intervals are at the 95% confidence level and estimates corresponding to , and are in red, green and blue, respectively.
Cohen’s kappa coefficient estimates (percent) for EXP1 (95% confidence intervals in parentheses), P1 vs P2 ratings.
| Area | ANVUR | | DBR | IPR |
|---|---|---|---|---|
| 1 | 35.16(26.00,44.00) | 33.31(27.54,39.09) | 35.16(25.26,45.06) | 28.87(18.17,39.57) |
| 2 | 22.71(18.00,28.00) | 23.42(19.44,27.41) | 22.71(17.28,28.14) | 19.31(9.227,29.39) |
| 3 | 23.81(17.00,30.00) | 20.83(16.00,25.65) | 23.81(17.73,29.89) | 2.56(-7.01,12.15) |
| 4 | 25.48(15.00,36.00) | 23.27(16.55,30.00) | 25.48(16.59,34.36) | 12.37(-3.47,28.23) |
| 5 | 27.17(21.00,33.00) | 24.85(20.76,28.93) | 27.17(21.56,32.78) | 11.12(1.77,20.46) |
| 6 | 23.56(19.00,29.00) | 21.85(18.57,25.12) | 23.56(19.09,28.03) | 11.84(4.19,19.48) |
| 7 | 26.56(21.00,33.00) | 17.47(11.34,23.61) | 16.99(8.15,25.83) | 16.41(2.91,29.90) |
| 8a | 19.43(6.00,32.00) | 19.92(9.64,30.21) | 19.43(6.65,32.20) | 23.77(-7.45,54.99) |
| 9 | 18.18(12.00,24.00) | 19.39(14.93,23.84) | 18.18(11.72,24.64) | 21.1(10.70,31.50) |
| 13 | 45.99(38.00,54.00) | 38.98(33.50,44.47) | 38.98(33.50,44.47) | - |
| All | 33.00(31.00,35.00) | 26.68(25.16,28.20) | 27.92(25.90,29.95) | 18.90(15.30,22.50) |
a Source: [12]. Reproduced in [4].
b Estimated with the wrong system of weights, as reported in [8]. Benedetto et al. [15] attributed it to a “factual error in editing of the table” and published a corrected estimate of 16.99.
c Estimated with the wrong system of weights, as reported in [8]. Benedetto et al. [15] attributed it to a “factual error in editing of the table” and published a corrected estimate of 38.998.
d Ancaiani et al. [4] reported a different estimate of 28.16, confirmed also in [15].
e Weighted Cohen’s kappa for the sets of articles with a definite bibliometric rating (DBR).
f Weighted Cohen’s kappa for the sets of articles without a definite bibliometric rating and submitted to informed peer review (IPR).
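Footnotes e and f report weighted kappa separately for articles with a definite bibliometric rating (DBR) and for those submitted to informed peer review (IPR). The toy sketch below, with hypothetical counts, a single stratum and linear weights (which for two classes reduce to the unweighted kappa), shows why an overall figure cannot simply be read off as a size-weighted average of the two subset values: kappa depends on each group's marginal distributions, and these change when the groups are pooled. An empty subset returns `None` rather than a number.

```python
from typing import List, Optional

Counts = List[List[int]]  # counts[i][j]: articles rated i by P1 and j by P2


def kappa_from_counts(counts: Counts) -> Optional[float]:
    """Linearly weighted Cohen's kappa from a k x k table (k >= 2); None if the table is empty."""
    k = len(counts)
    n = sum(map(sum, counts))
    if n == 0:
        return None  # no articles in this subset, so no estimate
    p = [[c / n for c in row] for row in counts]
    rows = [sum(r) for r in p]
    cols = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    po = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    pe = sum(w[i][j] * rows[i] * cols[j] for i in range(k) for j in range(k))
    return (po - pe) / (1.0 - pe)


def pool(a: Counts, b: Counts) -> Counts:
    """Cell-by-cell sum of two cross-tabulations."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 2-class tables (not data from the experiments)
dbr = [[40, 10], [10, 40]]  # 100 articles
ipr = [[2, 2], [2, 14]]     # 20 articles
k_dbr, k_ipr = kappa_from_counts(dbr), kappa_from_counts(ipr)
k_all = kappa_from_counts(pool(dbr, ipr))
k_avg = (100 * k_dbr + 20 * k_ipr) / 120  # size-weighted average of the two subset values
print(f"DBR {k_dbr:.4f}, IPR {k_ipr:.4f}, pooled {k_all:.4f}, weighted average {k_avg:.4f}")
```

In this toy case the subset values are 0.6 and 0.375, so their size-weighted average is 0.5625, while the pooled table gives about 0.596 because pooling changes the marginal distributions used for chance agreement.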
Cohen’s kappa coefficient estimates (percent) for EXP2 (95% confidence intervals in parentheses), P1 vs P2 ratings.
| Area | ANVUR | | DBR | IPR |
|---|---|---|---|---|
| 1 | 20.20(12.90,27.50) | 23.92(17.68,30.15) | 20.18(11.11,29.25) | 35.71(22.11,49.31) |
| 2 | 19.50(14.60,24.40) | 21.13(16.75,25.52) | 19.50(14.24,24.77) | 20.29(6.81,33.77) |
| 3 | 14.00(7.90,20.10) | 14.67(9.36,19.98) | 13.99(7.14,20.83) | 15.53(3.04,28.02) |
| 4 | 18.90(11.10,26.80) | 18.63(11.97,25.29) | 18.94(10.14,27.75) | 12.4(-2.21,27.02) |
| 5 | 19.50(14.60,24.50) | 20.21(15.80,24.63) | 19.53(13.65,25.41) | 20.73(9.82,31.63) |
| 6 | 19.10(17.90,23.20) | 17.84(14.24,21.44) | 19.08(14.29,23.87) | 7.69(-0.73,16.13) |
| 7 | 19.60(13.40,25.80) | 22.38(17.14,27.63) | 19.57(11.58,27.57) | 28.34(17.54,39.15) |
| 8b | 3.50(-0.06,13.20) | 8.70(0.22,17.19) | 3.47(-9.42,16.37) | 22.41(5.90,38.92) |
| 9 | 15.10(9.90,20.30) | 15.36(10.84,19.89) | 15.09(8.87,21.31) | 12.71(1.65,23.76) |
| 11b | 25.70(13.30,38.20) | 25.79(15.39,36.19) | 25.72(8.93,42.50) | 20.68(0.22,41.15) |
| 13 | 31.20(25.40,36.90) | 31.15(25.69,36.61) | 31.15(25.69,36.61) | - |
| All | 23.40(21.60,25.20) | 23.54(21.97,25.10) | 23.50(21.48,25.52) | 19.85(15.94,23.77) |
a Source: [17].
b Weighted Cohen’s kappa for the sets of articles with a definite bibliometric rating (DBR).
c Weighted Cohen’s kappa for the sets of articles without a definite bibliometric rating and submitted to informed peer review (IPR).
Fig 5. “Error-bar” plots of the Cohen’s kappa coefficient estimates (percent) for EXP1, P1 vs P2 ratings.
The confidence intervals are at the 95% confidence level and estimates corresponding to , (DBR) and (IPR) are in red, green and blue, respectively.
Fig 6. “Error-bar” plots of the Cohen’s kappa coefficient estimates (percent) for EXP2, P1 vs P2 ratings.
The confidence intervals are at the 95% confidence level and estimates corresponding to , (DBR) and (IPR) are in red, green and blue, respectively.