Esther Kaufmann, Werner W Wittmann.
Abstract
The success of bootstrapping, that is, replacing a human judge with a model (e.g., an equation), was demonstrated in Paul Meehl's (1954) seminal work and has been bolstered by the results of several meta-analyses. To date, however, the literature has lacked analyses that consider different types of meta-analyses as well as the potential dependence of bootstrapping success on the decision domain, the level of expertise of the human judge, and the criterion for what constitutes an accurate decision. In this study, we addressed these research gaps by conducting a meta-analysis of lens model studies. We compared the results of a traditional (bare-bones) meta-analysis with the findings of a meta-analysis of the success of bootstrap models corrected for various methodological artifacts. In line with previous studies, we found that bootstrapping was more successful than human judgment. Furthermore, bootstrapping was more successful in studies with an objective decision criterion than in studies with subjective or test score criteria. We did not find clear evidence that the success of bootstrapping depended on the decision domain (e.g., education or medicine) or on the judge's level of expertise (novice or expert). Correction of methodological artifacts increased the estimated success of bootstrapping, suggesting that previous analyses without artifact correction (i.e., traditional meta-analyses) may have underestimated the value of bootstrapping models.
Year: 2016 PMID: 27327085 PMCID: PMC4915695 DOI: 10.1371/journal.pone.0157914
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
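To make the idea of bootstrapping a judge concrete, the following minimal sketch fits a linear model to a judge's own judgments and compares the model's accuracy with the judge's accuracy. It is an illustration on synthetic data, not the authors' procedure: the function name `bootstrap_judge`, the simulated cue weights and noise levels, and the use of the simple difference between the two correlations as a success index are all assumptions (the paper's Eq 2 is not reproduced in this record).

```python
import numpy as np


def bootstrap_judge(cues, judgments, criterion):
    """Compare a judge with a linear model of that same judge.

    Hypothetical helper for illustration only.
    Returns (r_judge, r_model, difference).
    """
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(cues)), cues])
    # Regress the judge's own judgments on the cues (least squares).
    beta, *_ = np.linalg.lstsq(X, judgments, rcond=None)
    model_judgments = X @ beta
    # Accuracy of the human judge: correlation with the criterion.
    r_judge = np.corrcoef(judgments, criterion)[0, 1]
    # Accuracy of the bootstrap model: correlation of its output with the criterion.
    r_model = np.corrcoef(model_judgments, criterion)[0, 1]
    # Assumed success index: simple difference of the two correlations.
    return r_judge, r_model, r_model - r_judge


# Synthetic task (all values are made up for the example).
rng = np.random.default_rng(0)
n_cases, n_cues = 200, 3
cues = rng.normal(size=(n_cases, n_cues))
weights = np.array([0.6, 0.3, 0.1])
criterion = cues @ weights + rng.normal(scale=0.5, size=n_cases)
# The simulated judge weights the cues sensibly but applies them inconsistently.
judgments = cues @ weights + rng.normal(scale=1.0, size=n_cases)

r_judge, r_model, delta = bootstrap_judge(cues, judgments, criterion)
print(f"judge: {r_judge:.2f}  model: {r_model:.2f}  difference: {delta:.2f}")
```

In this simulation the model outperforms the judge because the regression strips out the judge's unsystematic error while keeping the judge's cue weighting, which is the usual explanation for why bootstrapping can work.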
Fig 1. The process of identifying relevant studies for the meta-analysis.
Studies included in the meta-analyses by decision domain and decision-maker expertise.
| No. | Study | Judges | Number of judgments | Number of cues | Judgment task | Criterion | Task results |
|---|---|---|---|---|---|---|---|
| 1) | Nystedt & Magnusson [ | 4 clinical psychologists | 38 | 3 | Judge patients based on patient protocols: | Rating on three psychological tests (■) | (*, +, s) |
| 2) | Levi [ | 9 nuclear medicine physicians | 280 (60 replications) | 5 | Assess probability of significant coronary artery disease based on patient profiles | Coronary angiography | (*, s) |
| 3) | LaDuca, Engel, & Chovan [ | 13 physicians | 30 | 5 | Judge the degree of severity (congestive heart failure) based on patient profiles | A single physician’s judgment (▲) | (*, s) |
| 4) | Smith, Gilhooly, & Walker [ | 40 general practitioners | 20 | 8 | Decision to prescribe an antidepressant based on patient profile | Guideline expert (▲) | (s) |
| 5a) | Einhorn [ (this publication contains two studies) | 3 pathologists | | 9 | Evaluate the severity of Hodgkin’s disease based on biopsy slides | Actual number of months of survival | (s) |
| 6a) | Grebstein [ | 10 clinical experts (varying in amounts of clinical experience) | 30 profiles | 10 | Judge Wechsler-Bellevue IQ scores from Rorschach psychograms | IQ test scores (■) | |
| 5b) | Einhorn [ First study (this publication contains two studies) | 29 clinicians | I: 77 MMPI profiles; II: 181 MMPI profiles | 11 | Judge the degree of neuroticism-psychoticism | Actual diagnosis (■) | (*, +, s) |
| 7) | Todd (1955, see [ | 10 clinical judges | 78 | 19 | Estimate patient IQ from the Rorschach test | IQ test scores (■) | |
| 8) | Speroff, Connors, & Dawson [ | 123 physicians: 105 house staff, 15 fellows, 3 attending physicians | 440 | 32 | Judge intensive care unit patients’ hemodynamic status (physicians’ estimation) | Patients’ actual hemodynamic status | (s) |
| 6b) | Grebstein [ | 5 students | 30 | 10 | Judge Wechsler-Bellevue IQ scores from Rorschach psychograms based on paper profiles | IQ test scores (■) | |
| 9) | Ashton [ | 13 executives, managers, sales personnel | 42 | 5 | Predict advertising sales for magazine based on case descriptions | Actual advertising pages sold | (*, +, s) |
| 10) | Roose & Doherty [ | 16 agency managers | 200 / 160 | 64 / 5 | Predict the success of life insurance salesmen based on paper profiles | One-year criterion for success | (*, +, s) |
| 11) | Goldberg [ | 43 bank loan officers | 60 | 5 | Predict bankruptcy experience based on large corporation profiles | Actual bankruptcy experience | |
| 12) | Kim, Chung, & Paradice [ | 3 experienced loan officers | 119 | 7 | Judge whether a firm would be able to repay the loan requested based on financial profiles | Actual financial data | (*, +, s) |
| 13) | Mear & Firth [ | 38 professional security analysts | 30 | 10 | Predict security returns based on financial profiles | Actual security returns | (s) |
| 14) | Ebert & Kruse [ | 5 securities analysts | 35 | 22 | Estimate future returns of common stocks | Actual returns | |
| 15) | Wright [ | 47 students | 50 | 4 | Predict price changes for stocks from 1970 until 1971 based on paper profiles of securities | Actual stock prices | (*, +, s) |
| 16) | Harvey & Harries [ (Experiment 1) | 24 psychology students | 40 | Not known | Forecast sales outcomes based on paper profiles | Actual sales outcome | Δ23 = -.07 (s) |
| 17) | Singh, 1990 [ | 52 business students | 35 | Not known | Estimate of the stock price of a company based on paper profiles | Actual stock prices | (s) |
| 18) | Dawes [ | 1 admission committee | 111 | 4 | Admission decision for graduate school based on paper profiles | Faculty ratings of performance in graduate school (▲) | |
| 19) | Cooksey, Freebody, & Davidson [ | 20 teachers | 118 | 5 | Judge I: Reading comprehension and II: Word knowledge of kindergarten children based on paper profiles | scores (■) | (*, +, s) |
| 20) | Wiggins & Kohen [ | 98 psychology graduate students | 110 | 10 | Forecast first-year-graduate grade point averages based on paper profiles | Actual first-year-graduate grade point averages | (s) |
| 21) | Wiggins, Gregory, & Diller, see Dawes and Corrigan [ repl. Wiggins and Kohen [ | 41 psychology students | 90 | 10 | Forecast first-year-graduate grade point averages based on paper profiles | Actual first-year-graduate grade point averages | |
| 22) | Athanasou & Cooksey [ | 18 technical and further education students | 120 | 20 | Judge whether students are interested in learning based on paper profile | Actual level of students’ interest | (*, +, s) |
| 23) | Szucko & Kleinmuntz [ | 6 experienced polygraph interpreters | 30 | 3–4 | Judge truthful / untruthful response based on polygraph protocols | Actual theft | (*, +, s) |
| 24) | Cooper & Werner [ | 18 (9 psychologists, 9 case managers) | 33 | 17 | Forecast violent behavior during the first six months of incarceration based on inmates’ data forms | Actual violent behavior during the first six months of imprisonment | (s) |
| 25) | Werner, Rose, Murdach, & Yesavage [ | 5 social workers | 40 | 19 | Predict imminent violence of psychiatric inpatients in the first 7 days following admission based on admission data | Actual violent acts in the first 7 days following admission | (*, +, s) |
| 26) | Werner, Rose, & Yesavage [ | 30 (15 psychologists, 15 psychiatrists) | 40 | 19 | Predict male patients’ violent behavior during the first 7 days following admission based on case material | Actual violence during the first 7 days following admission | (s) |
| 27) | Gorman, Clover, & Doherty [ | 8 students | 75: | | Predict students’ scores on an attitude scale ( examination ( interviews (I, III) and paper profiles (II, IV) | Actual data: (■) | (*, s) (.08), see Camerer [ |
| 28) | Lehman [ | 14 students | 40 | 19 | Assess imminent violence of male patients in the first 7 days following admission based on case material | Actual violent acts in the first 7 days following admission | (*, +, s) |
▲ = subjective criterion;
■ = test criterion;
(*) = idiographic approach (cumulating across individuals);
(*, +) = both research approaches are considered;
Δ = the success of bootstrapping models (see Eq 2); s = sub-sample of tasks for the second evaluation (psychometric corrected bootstrapping models).
Miscellaneous studies included in the meta-analysis.
| No. | Study | Judges | Number of judgments | Number of cues | Judgment task | Criterion | Domain | Task results |
|---|---|---|---|---|---|---|---|---|
| 29) | Stewart [ | 7 meteorologists | 75 (25) | 6 | Assess probability of hail or severe hail based on radar volume scans | Observed event | Meteorology | (*, s) |
| 30) | Stewart, Roebber, & Bosart [ | 4 (2 students, 2 experts) | | 12; 13; 24; 24 | Forecast 24-h maximum temperature, 12-h minimum temperature, 12-h precipitation, and 24-h precipitation for each day | temperature; precipitation | Meteorology | (*, +, s) |
| 31) | Steinmann & Doherty [ | 22 students | 192 (2 sessions with 96 judgments) | 2 | Decide which of two randomly chosen bags a sequence of chips had been drawn from | A hypothetical “judge” (▲) | Other | (*, s) |
| 32) | MacGregor & Slovic [ | | 40 | 4 | Estimate the time to complete a marathon based on runner profiles | Actual time to complete the marathon | Sport | (s) |
| 33) | McClellan, Bernstein, & Garbin [ | 26 psychology students | 128 | 5 | Estimate magnitude of fins-in and fins-out Mueller Lyer stimuli | Actual magnitude of fins-in and fins-out Mueller Lyer stimuli | Perception | (s) |
| 34) | Trailer & Morgan [ | 75 students | 50 | 11 | Predict the motion of objects based on situations in a questionnaire | Actual motion | Intuitive physics | (*, +, s) |
| 35) | Camerer [ | 21 | 18 | — | — | — | — | |
▲ = subjective criterion;
(*) = idiographic approach (cumulating across individuals);
(*, +) = both research approaches are considered;
Δ = the success of bootstrapping models (see Eq 2); s = subsample of tasks for the second evaluation (psychometric corrected bootstrapping models).
Results of the bare-bones meta-analysis organized by decision domain and decision maker’s expertise.
| Domains (expertise) | k | N | Δ | SD | 95% CI | 80% CI | | | | 75% |
|---|---|---|---|---|---|---|---|---|---|---|
| Medical | 14 | 293 | .00 | .00 | -.10 - .12 | .00 - .00 | 1.3 n.s. | 0.00 | 0.00 | 1,171 |
| | +3 | 324 | .03 | .00 | -.02 - .04 | .03 - .03 | 39.15** | 59.1 | 0.00 | 667 |
| Expert | 13 | 288 | .01 | .00 | -.10 - .12 | .01 - .01 | 1.19 n.s. | 0.00 | 0.00 | 1,262 |
| | +2 | 305 | .02 | .00 | -.02 - .04 | .02 - .03 | 36.59*** | 61.7 | 0.00 | 895 |
| Novice | — | — | — | — | — | — | — | — | — | — |
| Business | 10 | 244 | .02 | .00 | -.10 - .14 | .02 - .02 | .49 n.s. | 0.00 | 0.00 | 2,338 |
| Expert | 7 | 121 | .02 | .00 | -.15 - .20 | .02 - .02 | .22 n.s. | 0.00 | 0.00 | 3,791 |
| Novice | 3 | 123 | .00 | .00 | -.15 - .19 | .02 - .02 | .26 n.s. | 0.00 | 0.00 | 1,146 |
| | +1 | 125 | .02 | .00 | -.01 - .09 | .02 - .02 | 15.38*** | 80.5 | 0.001 | 1,686 |
| Education | 6 | 198 | .11 | .00 | -.02 - .25 | .11 - .11 | .68 | 0.00 | 0.00 | > 10,000 |
| | +3 | 208 | .12 | .00 | .11 - .21 | .12 - .12 | 67.14*** | 88.1 | 0.003 | > 10,000 |
| Expert | 3 | 41 | .04 | .00 | -.26 - .34 | .00 - .00 | .00 n.s. | 0.00 | 0.00 | > 10,000 |
| Novice | 3 | 157 | .13 | .00 | -.03 - .28 | .13 - .13 | .42 n.s. | 0.00 | 0.00 | 707 |
| | +2 | 162 | .13 | .00 | .11 - .22 | .13 - .13 | 47.16*** | 91.5 | 0.003 | 1,214 |
| Psychology | 9 | 105 | .14 | .00 | -.05 - .33 | .14 - .14 | 6.5 n.s. | 0.00 | 0.00 | > 10,000 |
| Expert | 4 | 59 | .03 | .00 | -.22 - .28 | .03 - .03 | .01 n.s. | 0.00 | 0.00 | 4,971 |
| | +2 | 62 | .03 | .00 | .01 - .10 | .03 - .03 | 3.31 n.s. | 0.00 | 0.00 | > 10,000 |
| Novice | 5 | 46 | .29 | .00 | .00 - .58 | .29 - .29 | 4.59 n.s. | 0.00 | 0.00 | 102 |
| | +1 | 47 | .30 | .00 | -.08 - .49 | .30 - .30 | 67.15*** | 92.6 | 0.11 | > 10,000 |
| Miscellaneous | 13 | 270 | .13 | .00 | .01 - .25 | .13 - .13 | 1.54 n.s. | 0.00 | 0.00 | 929 |
| Expert | 5 | 15 | .00 | .00 | -.51 - .50 | .00 - .00 | .00 n.s. | 0.00 | 0.00 | > 10,000 |
| | +3 | 27 | -.01 | .00 | -.23 - .21 | -.01 -.01 | .00 n.s. | 0.00 | 0.00 | > 10,000 |
| Novice | 12 | 255 | .14 | .00 | .02 - .26 | .14 - .14 | 1.25 n.s. | 0.00 | 0.00 | 1,269 |
| Overall Experts | 32 | 532 | .03 | .00 | -.07 - .10 | .03 - .03 | 1.56 n.s. | 0.00 | 0.00 | > 10,000 |
| | +5 | 820 | .04 | .00 | .01 - .05 | .04 - .04 | 53.33** | 32.5 | 0.006 | > 10,000 |
| Overall Novices | 20 | 578 | .12 | .00 | .03 - .20 | .12 - .12 | 9.65 n.s. | 0.00 | 0.00 | > 10,000 |
| Overall | 52 | 1,110 | .07 | .00 | .01 - .13 | .07 - .07 | 14.21 n.s. | 0.00 | 0.00 | > 10,000 |
| | +12 | 1,365 | .10 | .00 | .73 - .12 | .10 - .10 | 398*** | 84.2 | 0.005 | > 10,000 |
k = number of judgment tasks;
N = number of success indices;
Δ = the success of bootstrapping models (see Eq 2); SD = standard deviation of true score correlation; 95% CI = confidence interval; 80% CI = 80% credibility interval including lower 10% of the true score and the upper 10% of the true score; 75% = percent variance in observed correlation attributable to all artifacts; Publ. bias = publication bias corrected estimation by the trim-and-fill method (see [63]);
+ = the number of missing tasks indicated by the trim-and-fill method.
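The quantities defined in the legend above (weighted mean, SD of the true score, 80% credibility interval, and percent variance attributable to artifacts) can be illustrated with a minimal bare-bones computation in the Hunter and Schmidt style. This is a sketch under stated assumptions, not the authors' code: the function name `bare_bones` and the input values are hypothetical, and the sampling-error formula used is the standard approximation for correlations, which may not match how the paper treats the success index Δ.

```python
import numpy as np


def bare_bones(effects, sample_sizes):
    """Sample-size-weighted bare-bones summary (hypothetical helper)."""
    r = np.asarray(effects, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)

    # Sample-size-weighted mean effect.
    mean_r = np.sum(n * r) / np.sum(n)
    # Sample-size-weighted observed variance of the effects.
    var_obs = np.sum(n * (r - mean_r) ** 2) / np.sum(n)
    # Expected sampling-error variance (approximation for correlations; an assumption here).
    var_err = (1.0 - mean_r ** 2) ** 2 / (n.mean() - 1.0)
    # Variance left after removing sampling error, floored at zero.
    var_true = max(var_obs - var_err, 0.0)
    sd_true = float(np.sqrt(var_true))
    # Percent of observed variance attributable to sampling error (the "75% rule" quantity).
    pct_artifact = 100.0 * var_err / var_obs if var_obs > 0 else float("inf")
    # 80% credibility interval: mean plus or minus 1.28 true-score SDs.
    cred_80 = (mean_r - 1.28 * sd_true, mean_r + 1.28 * sd_true)

    return {
        "mean": float(mean_r),
        "sd_true": sd_true,
        "pct_artifact": float(pct_artifact),
        "cred_80": cred_80,
    }


# Made-up success indices and sample sizes, for illustration only.
result = bare_bones([0.10, 0.05, 0.12, 0.08], [30, 40, 50, 60])
print(result)
```

With small samples, the expected sampling-error variance can exceed the observed variance. In that case the true-score SD is floored at zero, the credibility interval collapses to a single point, and the artifact percentage exceeds 100, a pattern that resembles the many rows with SD = .00 and identical 80% CI bounds in the table.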
Results of the bare-bones meta-analysis of the success of bootstrapping organized by type of evaluation criterion.
| Evaluation criteria | k | N | Δ | SD | 95% CI | 80% CI | | | | 75% |
|---|---|---|---|---|---|---|---|---|---|---|
| Subjective | 4 | 76 | .03 | .00 | -.19 - .25 | .03 - .03 | .60 n.s. | 0.00 | 0.00 | 520 |
| | +2 | 81 | .02 | .00 | -.16 - .06 | .02 - .02 | 44.41*** | 88.7 | 0.01 | > 10,000 |
| Objective | 33 | 857 | .08 | .00 | .01 - .14 | .08 - .08 | 4.78 n.s. | 0.00 | 0.01 | 778 |
| | +9 | 1,020 | .10 | .00 | .06 - .12 | .10 - .10 | 216*** | 81.1 | 0.00 | 639 |
| Test | 15 | 177 | .07 | .00 | -.08 - .21 | .07 - .07 | 8.68 n.s. | 0.00 | 0.00 | 197 |
| | +3 | 330 | -.01 | .01 | -.12 - .09 | -.14 - .11 | 149.33*** | 88.6 | 0.03 | 86.14 |
k = number of judgment tasks;
N = number of success indices;
Δ = the success of bootstrapping (see Eq 2);
SD = standard deviation of true score correlation; 95% CI = confidence interval; 80% CI = 80% credibility interval including lower 10% of the true score and the upper 10% of the true score; 75% = percent variance in observed correlation attributable to all artifacts; Publ. bias = publication bias-corrected estimation by the trim-and-fill method (see [63]); + = the number of missing tasks indicated by the trim-and-fill method.
Fig 2. Scatter plot of the success of 365 bootstrapping procedures across 28 different tasks organized by decision domain and decision maker expertise.
Fig 3. Forest plots of the success of bootstrapping models organized by decision domain and decision maker expertise.
Positive values indicate that bootstrapping resulted in more accurate judgments than human judgment.
The success of bootstrapping according to bare-bones (in brackets) and psychometrically corrected lens model indices.
| Domains | k | N | Δoverall | Δexperts | Δnovices |
|---|---|---|---|---|---|
| Medical science | 10 | 258 | .35 (.01) | .35 (-.01) | .35 (-.01) |
| Business | 9 | 239 | .018 | .05 | .09 |
| Education | 4 | 156 | .21 (.12) | .18 (.15) | .14 (.04) |
| Psychology | 9 | 105 | .08 (.04) | .23 | .04 (.04) |
| Miscellaneous | 12 | 249 | .26 (.16) | .27 | .01 (-.02) |
| Overall | 44 | 1,007 | .23 (.07) | .22 (.13) | .17 (.02) |
k = number of judgment tasks; N = number of success indices; Δ = estimated success of bootstrapping (see Eq 2).
a = no correction of the Re component, because this component includes only objective criteria.
b = this column is the same as in Kaufmann et al. [11], Table 7, columns 5 and 6.
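The exact psychometric corrections applied in the paper are not spelled out in this record. As a hedged illustration of the general idea, the sketch below applies the classic correction for attenuation, dividing an observed correlation by the square root of the product of the two reliabilities. The function name `disattenuate` and the reliability values are hypothetical. Setting the criterion reliability to 1.0 mirrors note a, where objective criteria are left uncorrected.

```python
import math


def disattenuate(r_observed, rel_x=1.0, rel_y=1.0):
    """Classic correction for attenuation (hypothetical helper).

    rel_x and rel_y are the reliabilities of the two measures;
    a reliability of 1.0 means no correction for that measure.
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


# Made-up values: observed correlation .30, judgment reliability .80,
# objective criterion treated as perfectly reliable (no correction).
corrected = disattenuate(0.30, rel_x=0.80, rel_y=1.0)
print(round(corrected, 3))
```

Because reliabilities are at most 1, the corrected value is never smaller in magnitude than the observed one, which is consistent with the corrected indices in the table being at least as large as the bare-bones values in brackets.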