Esther Kaufmann, Ulf-Dietrich Reips, Werner W Wittmann.
Abstract
Achieving accurate judgment ('judgmental achievement') is of utmost importance in daily life across multiple domains. The lens model and the lens model equation provide useful frameworks for modeling components of judgmental achievement and for creating tools to help decision makers (e.g., physicians, teachers) reach better judgments (e.g., a correct diagnosis, an accurate estimation of intelligence). Previous meta-analyses of judgment and decision-making studies have attempted to evaluate overall judgmental achievement and have provided the basis for evaluating the success of bootstrapping (i.e., replacing judges by linear models that guide decision making). However, previous meta-analyses have failed to appropriately correct for a number of study design artifacts (e.g., measurement error, dichotomization), which may have biased estimates (e.g., of the variability between studies) and led to erroneous interpretations (e.g., with regard to moderator variables). In the current study we therefore conduct the first psychometric meta-analysis of judgmental achievement studies that corrects for a number of study design artifacts. We identified 31 lens model studies (N = 1,151, k = 49) that met our inclusion criteria. We evaluated overall judgmental achievement as well as whether judgmental achievement depended on decision domain (e.g., medicine, education) and/or the level of expertise (expert vs. novice). We also evaluated whether using corrected estimates affected conclusions with regard to the success of bootstrapping with psychometrically corrected models. Further, we introduce a new psychometric trim-and-fill method to estimate the effect sizes of potentially missing studies and to correct psychometric meta-analyses for effects of publication bias.
Comparison of the results of the psychometric meta-analysis with the results of a traditional meta-analysis (which only corrected for sampling error) indicated that artifact correction leads to a) an increase in values of the lens model components, b) reduced heterogeneity between studies, and c) an increase in the success of bootstrapping. We argue that psychometric meta-analysis is useful for accurately evaluating human judgment and for showing the success of bootstrapping.
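The abstract refers to the lens model equation without spelling it out. As a minimal sketch, assuming the standard formulation in which achievement (ra) combines linear knowledge (G), task predictability (Re) and judge consistency (Rs) with a residual term, the decomposition can be written as below. The residual correlation `C` and the numeric inputs are assumptions for illustration and do not come from this article.

```python
import math


def lens_model_achievement(G, Re, Rs, C=0.0):
    """Achievement r_a implied by the lens model equation.

    r_a = G * Re * Rs + C * sqrt(1 - Re**2) * sqrt(1 - Rs**2)

    G  : correlation between the predictions of the two linear models
    Re : task predictability (multiple correlation, cues -> criterion)
    Rs : judge consistency (multiple correlation, cues -> judgments)
    C  : correlation between the residuals of the two linear models
    """
    for name, value in (("G", G), ("Re", Re), ("Rs", Rs), ("C", C)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [-1, 1], got {value}")
    linear_part = G * Re * Rs
    residual_part = C * math.sqrt(1 - Re ** 2) * math.sqrt(1 - Rs ** 2)
    return linear_part + residual_part


# Hypothetical component values, chosen only to illustrate the arithmetic.
example = lens_model_achievement(G=0.8, Re=0.7, Rs=0.9, C=0.1)
linear_only = lens_model_achievement(G=0.8, Re=0.7, Rs=0.9)
```

With `C = 0` the function returns only the linear part, which is the quantity the component tables (G, Re, Rs) speak to.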
Year: 2013 PMID: 24391781 PMCID: PMC3877076 DOI: 10.1371/journal.pone.0083528
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The lens model applied to physicians' diagnosis of cancer (see [8]).
Figure 2. The process of identifying relevant studies for the meta-analysis.
Study characteristics ordered according to decision domain and expertise.
| | Study | Number of judges | Number of judgments | Number of cues | Judgment task | Criterion | Results |
| 1) | Nystedt, Magnusson | 4 clinical psychologists | 38 patient protocols | 3 | Evaluate patients on three traits: | Rating on three psychologist tests | |
| 2) | Levi | 9 nuclear medicine physicians | 280 patient cases, 60 replications | 5 | Assess probability of significant coronary artery disease | Coronary angiography | |
| 3) | LaDuca, Engel, Chovan | 13 physicians | 30 patient profiles | 5 | Degree of severity (congestive heart failure) | A single physician's judgment (▴) | |
| 4) | Smith, Gilhooly, Walker | 40 general practitioners | 20 case profiles | 8 | Prescription of an antidepressant | Guideline expert (▴) | |
| 5a) | Einhorn | 3 pathologists | | 9 | Evaluate the severity of Hodgkin's disease | Actual number of months of survival | |
| 5b) | Einhorn | 29 clinicians | | 11 | Judging the degree of neuroticism-psychoticism | Actual diagnosis | |
| 6) | Speroff, Connors, Dawson | 123 physicians: 105 house staff, 15 fellows, 3 attending physicians | 440 intensive care unit patients | 32 | Patients' hemodynamic status (physicians' estimation) | The patient's actual hemodynamic status | |
| 7) | Ashton | 13 executives, managers, sales personnel | 42 cases in a booklet | 5 | Predictions of advertising sales for | Actual advertising pages sold | |
| 8) | Roos, Doherty | 16 agency managers | 200/160 profiles | 64/5 | Predictability of success of life insurance salesman | One-year criterion for success | |
| 9) | Goldberg | 43 bank loan officers | 60 large industrial corporations profiles | 5 | Bankruptcy experience | Actual bankruptcy experience | |
| 10) | Kim, Chung, Paradice (*, +) | 3 experienced loan officers | 119 financial profiles: | 7 | To judge whether a firm would be able to repay the loan requested | Actual financial data | |
| 11) | Mear, Firth | 38 professional security analysts | 30 financial profiles | 10 | Predicted security returns | Actual security returns | |
| 12) | Wright | 47 students | 50 securities profiles | 4 | Price changes for stocks (from 1970 until 1971) | Actual financial data | |
| 13) | Harvey, Harries (Experiment 1) | 24 psychology students | 40 profiles | Not known | Forecast sales outcomes | Actual sales outcome | |
| 14) | Singh | 52 business students | 35 profiles | Not known | Estimates of the stock of a company | Actual realized values | |
| 15) | Cooksey, Freebody, Davidson | 20 teachers | 118 profiles of kindergarten children | 5 | | | |
| 16) | Wiggins, Kohen | 98 psychology graduate students | 110 profiles | 10 | Forecast first-year-graduate grade point averages | Actual first-year-graduate grade point averages | |
| 17) | Athanasou, Cooksey | 18 technical and further education students | 120 student profiles | 20 | Deciding that students are interested in learning | Actual level of students' interest | |
| 18) | Szucko, Kleinmuntz | 6 experienced polygraph interpreters | 30 polygraph protocols | 3-4 | Truthful/untruthful responses | Actual theft | |
| 19) | Cooper, Werner | 18: 9 psychologists, 9 case managers | 33 inmates' data forms | 17 | Forecast violence during the first 6 months of incarceration | Actual violent behavior within 6 months of imprisonment | |
| 20) | Werner, Rose, Murdach, Yesavage | 5 social workers | 40 admission data for psychiatric inpatients | 19 | Assess imminent violence in the first 7 days following admission | Actual outcome: violent acts in the first 7 days following admission | |
| 21) | Werner, Rose, Yesavage | 30: 15 psychologists, 15 psychiatrists | Case material for 40 male patients | 19 | Predicting patients' violence during the first 7 days following admission | Actual violence during the first 7 days following admission | |
| 22) | Gorman, Clover, Doherty | 8 students | 75: | | Prediction of each student's scores in an attitude scale ( | Actual data: | |
| 23) | Reynolds, Gifford | | Videotapes | I: 7, II: 8, III: 9 | To assess the intelligence: I: Audio condition, II: Visual condition, III: Audio plus visual condition | Wonderlic Personnel Test (a brief intelligence test) | |
| 24) | Bernieri, Gillis, Davis, Grahe | | 50 videotaped debates | | Rapport judgments | Interactants: self-reports context: | |
| 25) | Lehman | 14 students | Case material for 40 male patients | 19 | Assess imminent violence in the first 7 days following admission | Actual outcome (violent acts in the first 7 days following admission) | |
Note. ▴ = subjective criterion. r = type of correlation is unknown. (*) = idiographic approach (cumulating across individuals). (*, +) = both research approaches are considered.
1 This publication contains two studies.
Characteristics of studies in the ‘miscellaneous’ domain ordered by expertise.
| | Study | Number of judges | Number of judgments | Number of cues | Judgment task | Criterion | Research area | Results |
| 26) | Stewart | 7 meteorologists | 75 radar volume scans (25) | 6 | Assess probability of hail or severe hail | Observed event | Meteorology | |
| 27) | Stewart, Roebber, Bosart | 4: 2 students, 2 experts | | 12 | 24-h maximum temperature forecasts; 12-h minimum temperature forecasts | | Meteorology | |
| | | | | 13 | | | | |
| | | | | 24 | 12-h precipitation forecasts | | | |
| | | | | 24 | 24-h precipitation forecasts | | | |
| 28) | Steinmann, Doherty | 22 students | 192: 2 sessions with 96 judgments | 2 | To decide from which of two randomly chosen bags a sequence of chips had been drawn | A hypothetical “judge” (▴) | Other | |
| 29) | MacGregor, Slovic | | | 4 | Estimation of the time to complete a marathon | Actual time to complete the marathon | Sport | |
| 30) | McClellan, Bernstein, Garbin | 26 psychology students | 128 experimental stimuli | 5 | Magnitude estimations of fins-in and fins-out Müller-Lyer stimuli | Actual magnitude of fins-in and fins-out Müller-Lyer stimuli | Perception | |
| 31) | Trailer, Morgan | 75 students | 50 situations in a questionnaire | 11 | Predicting the motion of objects | Actual motion | Intuitive physics | |
Note. ▴ = subjective criterion. r = type of correlation is unknown. (*) = idiographic approach (cumulating across individuals). (*, +) = both research approaches are considered.
Figure 3. Forest plots of judgmental achievement and the underlying components.
Comparison of estimations of judgmental achievement (ra) with different meta-analytical approaches ordered by domain and experience level.
| | | | Bare-bones meta-analysis | | | | | Psychometric meta-analysis | | | | |
| Domains, experience levels | k | N | r | var | 75% | .10 | .80 | r | var | 75% | .10 | .80 |
| Medical science | 10 | 258 | .40 | .00 | 157.00 | .40 | .40 | .53 | .00 | 170.93 | .53 | .53 |
| Business science | 9 | 239 | .50 | .07 | 24.45 | .16 | .83 | .55 | .09 | 24.45 | .16 | .93 |
| Experts | 6 | 116 | .36 | .00 | 87.73 | .36 | .36 | .40 | .00 | 87.73 | .40 | .40 |
| Students | 3 | 123 | .63 | .10 | 8.52 | .22 | 1.00 | .70 | .11 | 8.52 | .27 | 1.12 |
| Education science | 4 | 156 | .39 | .00 | 177.89 | .39 | .39 | .51 | .00 | 355.11 | .51 | .51 |
| Experts | 2 | 40 | .57 | .00 | 975.69 | .57 | .57 | .62 | .00 | 975.69 | .62 | .62 |
| Students | 2 | 116 | .33 | .00 | 27,143 | .33 | .33 | .55 | .00 | 82,558 | .55 | .55 |
| | 2 | 116 | .36 | .00 | 27,136 | .36 | .36 | | | | | |
| Psychology | 14 | 249 | .22 | .00 | 448.50 | .22 | .22 | .24 | .00 | 448.54 | .24 | .24 |
| Experts | 4 | 59 | .10 | .00 | 975.77 | .10 | .10 | .11 | .00 | 975.77 | .11 | .11 |
| Students | 10 | 190 | .26 | .00 | 606 | .26 | .26 | .29 | .00 | 607.07 | .29 | .29 |
| Miscellaneous | 12 | 249 | .44 | .02 | 67.55 | .25 | .62 | .49 | .02 | 67.55 | .31 | .67 |
| Experts | 5 | 15 | .65 | .00 | 401.61 | .65 | .65 | .68 | .00 | 401.61 | .68 | .68 |
| Students | 11 | 234 | .43 | .00 | 86.40 | .43 | .43 | .48 | .00 | 86.55 | .48 | .48 |
| Overall | 49 | 1,151 | .39 | .02 | 69.42 | .21 | .57 | .45 | .02 | 74.55 | .27 | .63 |
| Experts | 27 | 488 | .37 | .00 | 129.00 | .37 | .37 | .47 | .00 | 135.00 | .47 | .47 |
| Students | 26 | 663 | .40 | .02 | 58.94 | .21 | .58 | .46 | .02 | 64.20 | .27 | .64 |
Note. k = number of correlations (tasks) according to Hunter and Schmidt [16]. N = total sample size according to Hunter and Schmidt [16]. r = mean true score correlation according to Hunter and Schmidt [16]. var = corrected variance according to Hunter and Schmidt ([16]; variance of the true score correlation). 75% rule = percentage of variance in observed correlations due to all artifacts; a value below 75% indicates a moderator variable. – Mean true score correlation exceeded the value of 1.
a In medical science only experts are included.
b We reran the analysis and substituted the .09 value with a .90 value.
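The quantities named in the note (mean correlation, corrected variance, the 75% rule) follow the Hunter and Schmidt approach. As a minimal sketch of the bare-bones step, assuming the usual sample-size-weighted formulas, the computation could look like this. The function names and the input correlations in the usage note are hypothetical, and the reading of the last two columns as bounds of an 80% credibility interval is an inference from the tabled values, not a statement from the article.

```python
import math


def bare_bones(correlations, sample_sizes):
    """Bare-bones meta-analysis: corrects for sampling error only."""
    k = len(correlations)
    total_n = sum(sample_sizes)
    pairs = list(zip(correlations, sample_sizes))
    # Sample-size-weighted mean correlation.
    mean_r = sum(n * r for r, n in pairs) / total_n
    # Sample-size-weighted observed variance of the correlations.
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in pairs) / total_n
    # Expected sampling-error variance, using the average sample size.
    var_error = (1 - mean_r ** 2) ** 2 / (total_n / k - 1)
    # Residual variance, floored at zero.
    var_true = max(var_obs - var_error, 0.0)
    # 75% rule: share of observed variance explained by sampling error.
    pct_artifact = 100 * var_error / var_obs if var_obs > 0 else float("inf")
    return mean_r, var_true, pct_artifact


def credibility_interval(mean_r, var_true, z=1.2816):
    """80% credibility interval (z = 1.2816 cuts off 10% in each tail)."""
    sd = math.sqrt(var_true)
    return mean_r - z * sd, mean_r + z * sd


# Overall bare-bones row of the table above: r = .39, var = .02.
low, high = credibility_interval(0.39, 0.02)
```

Rounded to two decimals, `low` and `high` reproduce the .21 and .57 reported in the Overall row. Calling `bare_bones([0.2, 0.4, 0.6], [20, 30, 50])` on made-up data returns a percentage just under 79, so the 75% rule would not flag a moderator in that toy case.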
Comparison of estimations of the task-predictability component (Re) with different meta-analytical approaches ordered by domain and experience level.
| | | | Bare-bones meta-analysis | | | | | Psychometric meta-analysis | | | | |
| Domains, experience level | k | N | R | var | 75% | .10 | .80 | R | var | 75% | .10 | .80 |
| Medical science | 10 | 258 | .67 | .00 | 105.89 | .67 | .67 | .92 | .00 | 198.55 | .92 | .92 |
| Business science | 9 | 239 | .71 | .02 | 34.97 | .52 | .89 | | | | | |
| Experts | 6 | 116 | .62 | .00 | 108.29 | .62 | .62 | | | | | |
| Students | 3 | 123 | .79 | .02 | 13.91 | .60 | .97 | | | | | |
| Education science | 4 | 156 | .70 | .00 | 257.26 | .70 | .70 | .74 | .00 | 257.26 | .74 | .74 |
| Experts | 2 | 40 | .68 | .00 | 1,690 | .68 | .68 | .72 | .00 | 1,690 | .72 | .72 |
| Students | 2 | 116 | .71 | .00 | 145.93 | .71 | .71 | .75 | .00 | 145.93 | .75 | .75 |
| Psychology | 14 | 249 | .68 | .00 | 77.79 | .68 | .68 | .72 | .00 | 78.62 | .72 | .72 |
| Experts | 4 | 59 | .80 | .00 | 256.36 | .80 | .80 | | | | | |
| Students | 10 | 176 | .63 | .00 | 91.12 | .63 | .63 | .68 | .00 | 91.12 | .68 | .68 |
| Miscellaneous | 12 | 249 | .88 | .01 | 23.75 | .75 | 1.00 | .93 | .01 | 23.74 | .80 | 1.00 |
| Experts | 5 | 15 | .69 | .00 | 356.44 | .69 | .69 | | | | | |
| Students | 11 | 234 | .89 | .00 | 39.67 | .89 | .89 | .94 | .00 | 39.67 | .94 | .94 |
| Overall | 49 | 1,151 | .73 | .01 | 44.21 | .60 | .85 | .81 | .01 | 66.00 | .69 | .93 |
| Experts | 27 | 488 | .68 | .00 | 126.13 | .68 | .68 | .76 | .00 | 157.52 | .76 | .76 |
| Students | 26 | 663 | .77 | .02 | 31.23 | .18 | .58 | .81 | .02 | 31.45 | .62 | .99 |
Note. k = number of correlations (tasks) according to Hunter and Schmidt [16]. N = total sample size according to Hunter and Schmidt [16]. R = mean true score correlation according to Hunter and Schmidt [16]. var = corrected variance according to Hunter and Schmidt ([16]; variance of the true score correlation). 75% rule = percentage of variance in observed correlations due to all artifacts; a value below 75% indicates a moderator variable. – Mean true score correlation exceeded the value of 1.
a In medical science only experts are included.
b See bare-bones meta-analysis; no correction because this category includes only objective criteria.
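Note b implies that measurement-error correction only matters when the criterion is not objective. A minimal sketch of that logic, assuming the classic correction for attenuation (observed correlation divided by the square root of the product of the two reliabilities), is shown below. The reliability values are hypothetical; the artifact values actually used in the article are not part of this excerpt.

```python
import math


def correct_for_attenuation(r_observed, rel_judgment=1.0, rel_criterion=1.0):
    """Disattenuate a correlation for unreliability in both measures.

    A reliability of 1.0 means "measured without error", so an objective
    criterion (rel_criterion = 1.0) contributes no correction.
    """
    for name, rel in (("rel_judgment", rel_judgment),
                      ("rel_criterion", rel_criterion)):
        if not 0.0 < rel <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {rel}")
    return r_observed / math.sqrt(rel_judgment * rel_criterion)


# Hypothetical reliabilities, for illustration only.
corrected = correct_for_attenuation(0.40, rel_judgment=0.81, rel_criterion=0.90)

# With perfectly reliable measures the estimate is left unchanged.
unchanged = correct_for_attenuation(0.67)
```

Because the divisor is at most 1, corrected values are never smaller in magnitude than the observed ones, and a corrected value can exceed 1 when the observed correlation is high and the assumed reliabilities are low.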
Comparison of estimations of the linear knowledge component (G) with different meta-analytical approaches ordered by domain and experience level.
| | | | Bare-bones meta-analysis | | | | | Psychometric meta-analysis | | | | |
| Domains, experience levels | k | N | G | var | 75% | .10 | .80 | G | var | 75% | .10 | .80 |
| Medical science | 10 | 258 | .61 | .02 | 50.72 | .42 | .79 | .82 | .02 | 68.47 | .64 | 1.00 |
| Business science | 9 | 239 | .66 | .07 | 15.85 | .32 | .99 | .73 | .08 | 15.85 | .37 | 1.09 |
| Experts | 6 | 116 | .55 | .05 | 33.56 | .26 | .83 | .60 | .06 | 35.56 | .28 | .91 |
| Students | 3 | 123 | .78 | .05 | 6.95 | .49 | 1.0 | .86 | .06 | 6.95 | .54 | 1.17 |
| Education science | 4 | 156 | .73 | .01 | 35.62 | .60 | .85 | .81 | .01 | 35.62 | .66 | .85 |
| Experts | 2 | 40 | .89 | .00 | 313.80 | .89 | .89 | .98 | .00 | 313.80 | .98 | .98 |
| Students | 2 | 116 | .68 | .00 | 51.03 | .68 | .68 | | | | | |
| Psychology | 9 | 105 | .38 | .02 | 73.77 | .19 | .56 | .42 | .03 | 73.77 | .19 | .64 |
| Experts | 4 | 59 | .17 | .00 | 444.93 | .17 | .17 | .18 | .00 | 444.93 | .18 | .18 |
| Students | 5 | 46 | .65 | .03 | 57.65 | .42 | .87 | .72 | .04 | 57.65 | .46 | .97 |
| Miscellaneous | 12 | 249 | .68 | .07 | 19.26 | .34 | 1.00 | .75 | .08 | 19.27 | .38 | 1.11 |
| Experts | 5 | 15 | .92 | .00 | 768.55 | .92 | .92 | .96 | .00 | 768.55 | .96 | .96 |
| Students | 11 | 234 | .66 | .06 | 24.15 | .34 | .97 | .74 | .08 | 24.15 | .37 | 1.1 |
| Overall | 44 | 1,007 | .63 | .05 | 24.91 | .34 | .91 | .77 | .07 | 37.11 | .43 | 1.1 |
| Experts | 27 | 488 | .57 | .04 | 43.69 | .31 | .82 | .71 | .06 | 49.80 | .39 | 1.0 |
| Students | 21 | 519 | .69 | .04 | 21.81 | .43 | .95 | .77 | .06 | 22.00 | .45 | 1.0 |
Note. k = number of correlations (tasks) according to Hunter and Schmidt [16]. N = total sample size according to Hunter and Schmidt [16]. G = mean true score correlation according to Hunter and Schmidt [16]. var = corrected variance according to Hunter and Schmidt ([16]; variance of the true score correlation). 75% rule = percentage of variance in observed correlations due to all artifacts; a value below 75% indicates a moderator variable. – Mean true score correlation exceeded the value of 1.
a In medical science only experts are included.
Comparison of estimations of the consistency component (Rs) with different meta-analytical approaches ordered by domain and experience level.
| | | | Bare-bones meta-analysis | | | | | Psychometric meta-analysis | | | | |
| Domains, experience level | k | N | R | var | 75% | .10 | .80 | R | var | 75% | .10 | .80 |
| Medical science | 10 | 258 | .81 | .00 | 74.95 | .81 | .81 | .96 | .00 | 126.87 | .96 | .96 |
| Business science | 9 | 239 | .81 | .01 | 28.60 | .68 | .93 | .89 | .02 | 28.60 | .70 | 1.0 |
| Experts | 6 | 116 | .62 | .00 | 268.23 | .62 | .62 | .69 | .00 | 108.29 | .69 | .69 |
| Students | 3 | 123 | .77 | .03 | 12.68 | .54 | .99 | .85 | .03 | 12.68 | .62 | 1.00 |
| Education science | 4 | 156 | .73 | .00 | 43.52 | .60 | .85 | .93 | .00 | 554.87 | .93 | .93 |
| Experts | 2 | 40 | .92 | .00 | 1,241 | .92 | .92 | .96 | .00 | 1,241 | .96 | .96 |
| Students | 2 | 116 | .66 | .00 | 422.27 | .66 | .66 | 1.00 | | | | |
| Psychology | 12 | 150 | .79 | .01 | 71.34 | .66 | .91 | .83 | .01 | 71.34 | .07 | .96 |
| Experts | 4 | 59 | .85 | .01 | 48.83 | .72 | .98 | .89 | .01 | 48.83 | .76 | 1.0 |
| Students | 8 | 91 | .74 | .00 | 107.28 | .74 | .74 | .78 | .00 | 107.28 | .78 | .78 |
| Miscellaneous | 12 | 249 | .71 | .00 | 90.86 | .71 | .71 | .75 | .00 | 90.92 | .75 | .75 |
| | | | | | | | 1.5 | | | | | |
| Experts | 5 | 15 | .95 | .00 | 1,724 | .95 | .95 | .98 | .00 | 1,724 | .98 | .98 |
| | | | | | | | 1.09 | | | | | |
| Students | 11 | 234 | .69 | .00 | 148.50 | .69 | .69 | .73 | .00 | 148.50 | .73 | .73 |
| Overall | 47 | 1,052 | .77 | .00 | 53.34 | .64 | .90 | .85 | .00 | 100.63 | .85 | .85 |
| | 58 | 1,260 | .66 | .14 | 17.31 | .18 | 1.13 | | | | | |
| Experts | 27 | 488 | .83 | .00 | 89.61 | .83 | .83 | .92 | .00 | 138.60 | .92 | .92 |
| | | | | | | | 1.02 | | | | | |
| Students | 24 | 564 | .71 | .01 | 75.37 | .63 | .80 | .78 | .00 | 139.48 | .78 | .78 |
Note. k = number of correlations (tasks) according to Hunter and Schmidt [16]. N = total sample size according to Hunter and Schmidt [16]. R = mean true score correlation according to Hunter and Schmidt [16]. var = corrected variance according to Hunter and Schmidt ([16]; variance of the true score correlation). 75% rule = percentage of variance in observed correlations due to all artifacts; a value below 75% indicates a moderator variable.
a In medical science only experts are included.
b Mean true score correlation exceeded the value of 1.
Comparison of the success of bootstrapping judges with a linear judgment model (GRe) based on different meta-analytical approaches (bare-bones vs. psychometric approach).
| Research area | k | N | Bare-bones (GRe) | Psychometric (GRe) | Estimated success (Bare-bones) | Estimated success (Psychometric) |
| Medical | 10 | 258 | .41 | .76 | .01 | .36 |
| Business | 9 | 239 | .47 | .52 | −.03 | .02 |
| Educational | 4 | 156 | .51 | .72 | .12 | .22 |
| Psychological | 9 | 105 | .26 | .30 | .04 | .08 |
| Miscellaneous | 12 | 249 | .60 | .70 | .16 | .26 |
| Overall | 44 | 1,007 | .46 | .62 | .07 | .23 |
a = there are 5 more R values of studies included having 144 participants;
= component R is not corrected.
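The bare-bones figures in this table can be retraced from the component tables. The sketch below assumes that the validity of the linear model is the product of G and Re, and that estimated success is that product minus the judges' achievement ra. This reading is an assumption that matches the bare-bones Medical and Overall rows; the function name is hypothetical.

```python
def bootstrapping_success(G, Re, ra):
    """Gain from replacing judges with their linear model: G * Re - ra."""
    return G * Re - ra


# Bare-bones values taken from the component tables above.
medical_model = 0.61 * 0.67                                     # G * Re, medical science
medical_gain = bootstrapping_success(G=0.61, Re=0.67, ra=0.40)
overall_gain = bootstrapping_success(G=0.63, Re=0.73, ra=0.39)

print(round(medical_model, 2), round(medical_gain, 2), round(overall_gain, 2))
```

A positive value means the linear model of the judge predicts the criterion better than the judge does; a negative value, as in the bare-bones Business row, means the judges outperform their own model.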