David G Pina, Darko Hren, Ana Marušić.
Abstract
We analysed the peer review of grant proposals under Marie Curie Actions, a major EU research funding instrument, which involves two steps: an independent assessment (Individual Evaluation Report, IER) performed remotely by 3 raters, and a consensus opinion reached during a meeting by the same raters (Consensus Report, CR). For 24,897 proposals evaluated from 2007 to 2013, the association between average IER and CR scores was very high across different panels, grant calls and years. Median average deviation (AD) index, used as a measure of inter-rater agreement, was 5.4 points on a 0-100 scale (interquartile range 3.4-8.3), overall, demonstrating a good general agreement among raters. For proposals where one rater disagreed with the other two raters (n=1424; 5.7%), or where all 3 raters disagreed (n=2075; 8.3%), the average IER and CR scores were still highly associated. Disagreement was more frequent for proposals from Economics/Social Sciences and Humanities panels. Greater disagreement was observed for proposals with lower average IER scores. CR scores for proposals with initial disagreement were also significantly lower. Proposals with a large absolute difference between the average IER and CR scores (≥10 points; n=368, 1.5%) generally had lower CR scores. An inter-correlation matrix of individual raters' scores of evaluation criteria of proposals indicated that these scores were, in general, a reflection of raters' overall scores. Our analysis demonstrated a good internal consistency and general high agreement among raters. Consensus meetings appear to be relevant for particular panels and subsets of proposals with large differences among raters' scores.
Year: 2015 PMID: 26126111 PMCID: PMC4488366 DOI: 10.1371/journal.pone.0130753
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mean consensus report (CR) scores (±standard deviation, SD) across evaluation panels for all proposals and for proposals with disagreements among raters in their Individual Evaluation Report (IER) score and between Consensus Report (CR) and average Individual Evaluation Report (AVIER) scores.
| Panel | Total | All raters agree | One rater differs | All raters differ | AVIER vs CR difference |
|---|---|---|---|---|---|
| Chemistry | 81.0±9.8 (n = 2665) | 81.9±9.2 (n = 2362) | 75.3±13.2 (n = 132) | 73.2±10.0 (n = 171) | 70.6±19.9 (n = 32) |
| Economic and Social Sciences/Humanities | 78.1±12.9 (n = 4677) | 79.8±12.4 (n = 3646) | 74.6±13.1 (n = 431) | 70.7±12.9 (n = 600) | 73.1±19.5 (n = 142) |
| Information Science/Engineering | 76.9±11.9 (n = 2983) | 78.3±11.1 (n = 2478) | 70.9±13.7 (n = 199) | 69.2±12.7 (n = 306) | 62.7±18.0 (n = 50) |
| Environment | 80.4±10.4 (n = 3243) | 81.5±9.4 (n = 2860) | 74.5±13.3 (n = 153) | 70.1±13.8 (n = 230) | 66.1±20.9 (n = 42) |
| Life Sciences | 80.9±10.3 (n = 7658) | 82.0±9.4 (n = 6785) | 74.5±13.3 (n = 354) | 71.4±13.2 (n = 519) | 65.8±20.4 (n = 71) |
| Mathematics | 78.2±10.2 (n = 731) | 79.6±8.6 (n = 623) | 71.1±15.2 (n = 41) | 69.2±13.6 (n = 67) | 79.1±9.6 (n = 5) |
| Physics | 80.8±9.2 (n = 2940) | 81.6±8. (n = 2644) | 75.3±11.7 (n = 114) | 72.4±12.0 (n = 182) | 72.4±17.9 (n = 26) |
*Mean consensus report scores were compared only for all proposals (one-way ANOVA: F(6, 24890) = 81.5, p<0.001).
†Significantly lower than Chemistry, Environment, Life Sciences and Physics panels (Tukey post-hoc test, p<0.001 for all comparisons).
‡Significantly lower than Chemistry, Economic and Social Sciences/Humanities, Environment, Life Sciences and Physics panels (Tukey post-hoc test, p<0.001 for all comparisons).
§Disagreement is defined as one rater differing 10 or more points from other two raters, who agree within 5 points (scale 0–100).
¶Disagreement is defined as all raters differing 10 or more points (scale 0–100).
**Disagreement is defined as a difference of 10 or more points between the AVIER and CR scores (scale 0–100).
Associations and differences between consensus reports (CR) and average individual evaluation report (AVIER) scores and inter-rater agreement (average deviation index, AD index) for all proposals, and for different actions and panels*.
| Actions and panels (No. proposals) | r (CR vs AVIER) | Difference CR-AVIER (p value) | Median AD index (interquartile range, Q1–Q3) |
|---|---|---|---|
| All proposals (n = 24897) | 0.957 | 0.30 (<0.001) | 5.4 (3.4–8.3) |
| IAPP (n = 759) | 0.970 | -0.58 (<0.001) | 7.3 (4.6–10.6) |
| IEF (n = 20593) | 0.958 | 0.38 (<0.001) | 5.2 (3.3–8.0) |
| ITN (n = 3545) | 0.946 | 0.07 (0.319) | 6.3 (3.9–9.6) |
| IAPP panels | | | |
| Chemistry (n = 63) | 0.974 | -0.89 (0.053) | 7.2 (3.9–11.0) |
| Economic and Social Sciences/Humanities (n = 68) | 0.966 | -0.61 (0.284) | 7.2 (4.7–11.3) |
| Information Science/Engineering (n = 296) | 0.966 | -0.29 (0.154) | 7.3 (4.3–10.3) |
| Environment/Geosciences (n = 84) | 0.972 | -1.37 (0.003) | 7.6 (4.6–10.9) |
| Life Sciences (n = 203) | 0.973 | -0.72 (0.009) | 7.8 (5.6–10.9) |
| Mathematics (n = 6) | 0.994 | -0.68 (0.594) | 8.8 (5.3–13.4) |
| Physics (n = 39) | 0.974 | 0.21 (0.617) | 7.0 (4.0–9.8) |
| IEF panels | | | |
| Chemistry (n = 2204) | 0.963 | 0.51 (<0.001) | 4.9 (3.1–7.3) |
| Economic and Social Sciences/Humanities (n = 4228) | 0.945 | 0.71 (<0.001) | 6.7 (4.2–10.0) |
| Information Science/Engineering (n = 1888) | 0.962 | -0.00 (0.967) | 5.7 (3.5–8.4) |
| Environment/Geosciences (n = 2731) | 0.955 | 0.41 (<0.001) | 4.9 (3.1–7.5) |
| Life Sciences (n = 6408) | 0.964 | 0.22 (<0.001) | 4.8 (3.0–7.3) |
| Mathematics (n = 665) | 0.966 | 0.20 (0.054) | 5.3 (3.2–8.2) |
| Physics (n = 2469) | 0.966 | 0.39 (<0.001) | 4.7 (2.9–7.1) |
| ITN panels | | | |
| Chemistry (n = 398) | 0.909 | 0.11 (0.611) | 6.2 (3.7–9.6) |
| Economic and Social Sciences/Humanities (n = 381) | 0.940 | -0.43 (0.094) | 7.6 (5.0–11.0) |
| Information Science/Engineering (n = 799) | 0.953 | 0.12 (0.360) | 6.7 (4.5–9.6) |
| Environment/Geosciences (n = 428) | 0.950 | 0.15 (0.400) | 5.8 (3.4–9.3) |
| Life Sciences (n = 1047) | 0.953 | 0.14 (0.228) | 6.3 (3.8–9.4) |
| Mathematics (n = 60) | 0.903 | 0.67 (0.142) | 7.6 (4.5–11.3) |
| Physics (n = 432) | 0.925 | 0.02 (0.921) | 5.5 (3.4–8.8) |
*Abbreviations: IAPP—Industry-Academia Partnerships and Pathways, IEF—Intra-European Fellowships, ITN—Initial Training Networks.
†p<0.001 for all (Pearson correlation).
‡Difference between the scores on the scale from 0–100.
§Paired samples t-test.
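The two statistics reported in this table, the Pearson correlation between CR and AVIER scores and the paired difference CR minus AVIER, can be illustrated with a minimal sketch. The score lists below are hypothetical values invented for illustration, not data from the study, and the helper names `pearson_r` and `paired_t` are assumptions of this example. Only the t statistic is computed; obtaining a p value would require the t distribution (for instance from a statistics library).

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(x, y):
    """Mean paired difference (x - y) and the paired-samples t statistic."""
    d = [a - b for a, b in zip(x, y)]
    mean_diff = mean(d)
    # Standard error of the mean difference uses the sample SD (n - 1).
    t_stat = mean_diff / (stdev(d) / sqrt(len(d)))
    return mean_diff, t_stat


# Hypothetical consensus (CR) and average individual (AVIER) scores, 0-100 scale.
cr = [82.0, 75.0, 90.0, 68.0, 79.0]
avier = [81.0, 76.0, 89.0, 66.0, 80.0]

r = pearson_r(cr, avier)
mean_diff, t_stat = paired_t(cr, avier)
print(f"r = {r:.3f}, mean CR-AVIER = {mean_diff:.2f}, t = {t_stat:.3f}")
```

A positive mean difference in this sketch corresponds to a consensus score higher than the average of the individual scores, matching the sign convention of the "Difference CR-AVIER" column.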
Fig 1. Distribution of average deviation (AD) indices for all proposals.
Positively skewed distribution shows that the majority of proposals had AD index below 10 points (20,988 out of 24,897 proposals or 84.3%).
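As an illustration of the agreement measure, the sketch below computes an AD index for one proposal. It assumes the common definition of the AD index as the mean absolute deviation of the raters' scores from their mean; the exact formula is not spelled out in this record, so treat this as an assumption. The function name `ad_index` and the example scores are hypothetical.

```python
from statistics import mean


def ad_index(scores):
    """Average deviation (AD) index for one proposal.

    Assumed definition: mean absolute deviation of the raters'
    scores from their mean. Lower values mean closer agreement.
    """
    centre = mean(scores)
    return mean([abs(s - centre) for s in scores])


# Hypothetical IER scores from three raters on the 0-100 scale.
close_scores = [84.0, 85.0, 86.0]
spread_scores = [60.0, 75.0, 90.0]

print(ad_index(close_scores))
print(ad_index(spread_scores))
```

Under this definition, three raters scoring within a point or two of each other yield an AD index well below the 10-point mark mentioned in the caption, while scores spread 15 points apart land at exactly 10.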
Distribution of proposals with disagreements among raters in their Individual Evaluation Report (IER) score and between Consensus Report (CR) and average Individual Evaluation Report (AVIER) scores across evaluation panels*
| Panel (No. proposals) | One rater differs, n (row %) | All raters differ, n (row %) | Difference in AVIER vs CR, n (row %) |
|---|---|---|---|
| Chemistry (n = 2665) | 132 (5.0) | 171 (6.4) | 32 (1.2) |
| Economic and Social Sciences/Humanities (n = 4677) | 431 (9.2) | 600 (12.8) | 142 (3.0) |
| Information Science/Engineering (n = 2983) | 199 (6.7) | 306 (10.3) | 50 (1.7) |
| Environment/Geosciences (n = 3243) | 153 (4.7) | 230 (7.1) | 42 (1.3) |
| Life Sciences (n = 7658) | 354 (4.6) | 519 (6.8) | 71 (0.9) |
| Mathematics (n = 731) | 41 (5.6) | 67 (9.2) | 5 (0.7) |
| Physics (n = 2940) | 114 (3.9) | 182 (6.2) | 26 (0.9) |
*Disagreement is defined as one rater differing 10 or more points from other two raters, who agree within 5 points (scale 0–100).
†Disagreement is defined as all raters differing 10 or more points (scale 0–100).
‡Disagreement is defined as a difference of 10 or more points between the AVIER and CR scores (scale 0–100).
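The disagreement categories defined in the footnotes above can be expressed as a small classifier. This is a sketch under stated assumptions, not the authors' procedure: "one rater differs" is read as one score lying at least 10 points from each of the other two while those two lie within 5 points of each other, "all raters differ" as every pair of scores lying at least 10 points apart, and every remaining pattern is treated as agreement. The function names and thresholds as parameters are hypothetical.

```python
from statistics import mean


def classify_ier(scores, big=10.0, close=5.0):
    """Classify three IER scores (0-100 scale) into a disagreement category."""
    low, mid, high = sorted(scores)
    # After sorting, adjacent gaps of >= big imply every pair differs by >= big.
    if mid - low >= big and high - mid >= big:
        return "all raters differ"
    # Low score is the outlier; the two higher scores agree within `close`.
    if mid - low >= big and high - mid <= close:
        return "one rater differs"
    # High score is the outlier; the two lower scores agree within `close`.
    if high - mid >= big and mid - low <= close:
        return "one rater differs"
    # Assumption: any other pattern counts as agreement.
    return "agree"


def large_cr_shift(cr, scores, threshold=10.0):
    """True if the consensus score is >= threshold points from the AVIER."""
    return abs(cr - mean(scores)) >= threshold


# Hypothetical score triples.
print(classify_ier([60.0, 75.0, 90.0]))
print(classify_ier([80.0, 82.0, 95.0]))
print(classify_ier([80.0, 84.0, 88.0]))
print(large_cr_shift(70.0, [80.0, 82.0, 84.0]))
```

Sorting the three scores first keeps the checks short, because only the two adjacent gaps need to be inspected.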
Fig 2. Association between raters' agreement (AD index; a lower score means greater agreement) and the average Individual Evaluation Report (AVIER) (A) or consensus report (CR) (B).
The line is the regression line. Circles represent individual proposals.
Pearson’s inter-correlations of IER criteria of different raters*.
| Rater: criterion | R1 S&T quality | R1 Training/ToK | R1 Researcher | R1 Implementation | R1 Impact | R2 S&T quality | R2 Training/ToK | R2 Researcher | R2 Implementation | R2 Impact | R3 S&T quality | R3 Training/ToK | R3 Researcher | R3 Implementation | R3 Impact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rater 1: S&T quality | 1 | 0.698 | 0.600 | 0.668 | 0.693 | 0.291 | 0.279 | 0.231 | 0.278 | 0.274 | 0.296 | 0.290 | 0.231 | 0.289 | 0.282 |
| Rater 1: Training/ToK | | 1 | 0.582 | 0.718 | 0.740 | 0.282 | 0.361 | 0.248 | 0.319 | 0.324 | 0.270 | 0.357 | 0.236 | 0.324 | 0.320 |
| Rater 1: Researcher | | | 1 | 0.582 | 0.646 | 0.217 | 0.231 | 0.293 | 0.230 | 0.241 | 0.234 | 0.246 | 0.306 | 0.249 | 0.251 |
| Rater 1: Implementation | | | | 1 | 0.740 | 0.281 | 0.330 | 0.247 | 0.360 | 0.328 | 0.282 | 0.335 | 0.254 | 0.367 | 0.330 |
| Rater 1: Impact | | | | | 1 | 0.278 | 0.325 | 0.251 | 0.318 | 0.341 | 0.277 | 0.327 | 0.260 | 0.328 | 0.341 |
| Rater 2: S&T quality | | | | | | 1 | 0.694 | 0.590 | 0.668 | 0.685 | 0.295 | 0.286 | 0.230 | 0.285 | 0.276 |
| Rater 2: Training/ToK | | | | | | | 1 | 0.583 | 0.713 | 0.734 | 0.287 | 0.369 | 0.250 | 0.335 | 0.328 |
| Rater 2: Researcher | | | | | | | | 1 | 0.564 | 0.639 | 0.228 | 0.240 | 0.294 | 0.244 | 0.244 |
| Rater 2: Implementation | | | | | | | | | 1 | 0.730 | 0.282 | 0.332 | 0.245 | 0.367 | 0.330 |
| Rater 2: Impact | | | | | | | | | | 1 | 0.275 | 0.322 | 0.256 | 0.329 | 0.342 |
| Rater 3: S&T quality | | | | | | | | | | | 1 | 0.695 | 0.606 | 0.665 | 0.690 |
| Rater 3: Training/ToK | | | | | | | | | | | | 1 | 0.589 | 0.710 | 0.737 |
| Rater 3: Researcher | | | | | | | | | | | | | 1 | 0.573 | 0.645 |
| Rater 3: Implementation | | | | | | | | | | | | | | 1 | 0.733 |
| Rater 3: Impact | | | | | | | | | | | | | | | 1 |
*Evaluation criteria: Science and Technology (S&T) quality; Training (for ITN and IEF) or Transfer of Knowledge (ToK, for IAPP); Researcher (criterion used only for IEF); Implementation; Impact. N = 24897 for all except for “Researcher” where n = 20593. All correlations were statistically significant at p<0.001 level.