| Literature DB >> 34429084 |
Andrew James Amos1,2, Kyungmi Lee3, Tarun Sen Gupta4, Bunmi S Malau-Aduli4.
Abstract
PURPOSE: There is growing concern that inequities in methods of selection into medical specialties reduce specialist cohort diversity, particularly where measures designed for another purpose are adapted for specialist selection, prioritising reliability over validity. This review examined how empirical measures affect the diversity of specialist selection. The goals were to summarise the groups for which evidence is available, evaluate evidence that measures prioritising reliability over validity contribute to under-representation, and identify novel measures or processes that address under-representation, in order to make recommendations on selection into medical specialties and on the research required to support diversity.
Keywords: Application; Bias; Diversity; Equity; Ethnicity; Gender; Justice; Matching; Residency; Specialist selection
Year: 2021 PMID: 34429084 PMCID: PMC8385860 DOI: 10.1186/s12909-021-02685-w
Source DB: PubMed Journal: BMC Med Educ ISSN: 1472-6920 Impact factor: 2.463
Common instruments for selection into medical specialist training programmes [14, 20]
| Instrument | Description |
|---|---|
| Interview | Includes standardised and non-standardised interviews, which may be supported by psychometric evidence, although they frequently involve subjective judgements. |
| Academic record | Particularly school results measured against a year-cohort, but may include other information such as extra-curricular activities, awards, etc. |
| Tests | Includes exams which test general medical, not specialist, aptitude: • Standardised exams used for selection into medical school or licensure for practice, such as the United States Medical Licensing Exam(s) and the UK's Multi-Specialty Recruitment Assessment; and exams designed for particular specialties, including: • OSCE-format interviews • Situational judgement tests (SJT), which assess non-cognitive characteristics by presenting workplace-based scenarios requiring non-clinical decisions • Clinical problem-solving tests (CPST), which involve multiple-choice responses to clinical scenarios requiring clinical reasoning |
| Curriculum vitae (CV) | Structured or free-form document(s) provided by the candidate outlining their education, training, and work experiences. |
| Letter of recommendation (LoR) | Structured or free-form letters expressing an opinion on the candidate's specific or general capacities, often weighted for the perceived expertise or prestige of the undersigned; for example, greater weight may be given to a LoR from the Dean of a prominent medical school than to one from a consultant in a medical specialty. |
| Personal statement | Structured or free-form statements by the candidate, usually addressing specific criteria such as motivation, priorities, and personal circumstances. |
| Referee report | Structured or free-form reports by referees with knowledge of the candidate, addressing specific selection criteria. |
| Selection criteria | The criteria used for selection into individual specialist training programmes may not be precisely defined. Locally defined criteria may involve algorithms weighting the instruments described above, and may or may not involve objective thresholds or subjective judgements. |
Inclusion and exclusion study criteria
| Inclusion criteria | Exclusion criteria |
|---|---|
| • Selection into medical specialty training program • Results report empirical evidence about a measure used for medical specialty selection • Focus of article is on diversity or under-represented minority in medical specialty training • Published between 1.1.2000 and 31.12.2020 • English | • Selection into medical school • Selection into non-medical training: ○ Nursing ○ Allied Health ○ Dental ○ Pharmacy • Articles where diversity or under-represented minority in medical specialty training is not the focus • Not in English • Published prior to 1.1.2000 or after 31.12.2020 • Survey results only • Empirical results relate only to preferences, perceptions, and motivations to apply, and not to measures used as a basis of selection |
Fig. 1 PRISMA flowchart of literature search and article inclusion/exclusion
Summary of Reviewed Articles
| Article (bolded authors claimed evidence of bias) | Description | Main findings | Diversity conclusions | Strengths/limitations | MERSQI Score |
|---|---|---|---|---|---|
| MacLellan et al. (2010) [ | Compared IMG and DMG performance on in- and end-training exams | End-training exam pass rate: IMG 56% versus DMG 93.5% ( | |||
| | Compared IMG with DMG performance on end-training exams (GP/Family medicine) | • URM failed first attempt more than white DMG (OR 3.5, • IMG failed first attempt more than white DMG (OR 14.7, | | | |
| McManus et al. (2014) [ | Compared IMG with DMG performance on end-training exams (GP/Family medicine & Internal medicine) | IMG performed worse than DMG on end-training exams (~ 1.25 SD) | Raising cutoffs is needed for equivalence with DMG but would affect workforce | ||
| | Measured factors associated with differences in performance of IMG and DMG on end-training exams (GP/Family medicine) | • Clinical skill performance better predicted by SJT than CPST (beta 0.26 v 0.17) • SJT mediated relationship between English fluency and clinical skills performance | | | |
| Tiffin et al. (2014) [ | Measure IMG performance during residency | IMG more likely to receive unsatisfactory ARCP than DMG (OR 1.63, | Thresholds would need to be increased to achieve equivalence, but would affect workforce and decrease diversity | ||
| | Measure bias against IMG in resident selection comparing pre-training academic attainment with in-training assessment | UK overseas graduates more likely deemed appointable than IMG (OR 1.29, | | | |
| Wakeford et al. (2015) [ | Measure correlation between GP/Family medicine and Internal medicine exam performance by ethnicity | High correlation between GP/IM exam performance, suggesting validity of each assessment (and does not suggest bias against URM) URM performed less well | |||
| Woolf et al. (2019) [ | Measure effect of gender on specialty training selection | Across all specialties female applicants had: • No difference in applications • Increased offers (OR 1.4, • Increased acceptance (OR 1.43, 2 specialties had significant gender differences in applications (both favouring women): • Paediatrics (OR 1.57, • GP (OR 1.23, | | | |
| Aisen et al. (2018) [ | Examine effect of gender on | • Higher % of males matched (73% v 67%) Among matched applicants: • Males less honors (2.8 v 2.2, • Males higher USMLE1 (245.9 v 240.8, | | | |
| Brandt et al. (2013) [ | Examine effect of gender on | No gender difference on USMLE Females more likely to have honors (51% v 41%, | |||
| Chapman et al. (2019) [ | Identify factors associated with under-representation of women across | Female representation higher in specialties with lower mean USMLE1 entry score ( 1% increase in female faculty prevalence associated with 1.45% increase in female trainees in specialty ( | Association between female faculty and female trainees suggests mentoring may increase diversity | ||
| | Measure factors associated with selection to | Factors associated with selection: • Female • Younger • Higher USMLE 2 • DMG | | | |
| Dirschl et al. (2006) [ | Measure whether gender and academic scores can predict | • 12.5% female applicants • Faculty ratings of training were not associated with academic scores | | | |
| Driver et al. (2014) [ | Identify factors associated with | Increased % of selection associated with: • Higher USMLE1 (OR 3.22, • Letters of recommendation (OR 6.2, • Publications (OR 3, | |||
| | Measure effect of gender on selection into | • 13.8% female applicants • USMLE1 higher for selected (233 v 211, • Females had lower OR of matching (0.59, • Females had lower mean USMLE1 scores (222 v 230, | Possible bias remains after multivariate analysis | | |
| | Measure bias against African Americans due to | • Mean USMLE1 of African Americans was 200, non-AA was 216 • OR for rejection of AA varied from 3 to 6 ( | | | |
| | Measure gender bias in letters of recommendation for | LoR for males had: • More authentic tone • More references to personal drive, work, and power • LoR referring to power more likely to be associated with selection | | | |
| French et al. (2019) [ | Measure gender bias in LoR for | Female authors wrote longer letters | |||
| | Measure gender bias in standardised versus narrative LoR for | • No difference in ranking of male/female applicants • Female writers produce LoRs different to male writers (p < .05) • LoRs written for female applicants less positive than those written for male applicants ( | | | |
| Gardner et al. (2019) [ | Measure effect of USMLE cutoffs on underrepresented minorities in | Reducing USMLE1 cutoffs and adding SJT screening increased URMs offered interview by 8% | Does not claim bias | ||
| Girzadas et al. (2004) [ | Measure effect of gender on SLoR for | Female author with female applicant OR 2 to get highest ranking on LoR ( | |||
| | Measure gender bias in | • 24% female applicants Females were: • 30% of offered interviews • 38% of top quartile ( • 25% of selected • Female applicants' average USMLE1 score was 5 points lower ( • Female applicants had higher mean interview scores ( | • Associated with lower female USMLE1 scores • Associated with higher female interview scores | | |
| | Measure gender bias in LoR for | Female LoR had more communal phrases ( | | | |
| | Measure gender bias in LoR for | • Male applicant LoR had more agentic terms ( • LoR written by senior staff more likely to describe female applicants with communal terms ( | | | |
| Hopson et al. (2019) [ | Measure influence of gender on outcome of | No significant difference on standardised video interview | | | |
| Kobayashi et al. (2019) [ | Measure influence of gender on LoR in | Female applicants had: • Longer LoR ( • More “achieve” words ( No differences for male v female authors | |||
| | Measure gender bias in LoR for | M/F applicants had similar: • USMLE1 • Academic achievement LoR for male applicants had: • Fewer "feel" words ( • Fewer "biological" words ( | | | |
| | Measure correlation between USMLE scores and clinical competence at beginning of residency | • USMLE1 scores lower for URM (212 v 230, • URM not significantly worse than non-URM on OSCE stations at beginning of residency | | | |
| Norcini et al. (2014) [ | Predict patient outcomes of IMGs from USMLE scores | Increased USMLE2 CK score associated with decreased mortality as a physician 1 SD on USMLE 2 CK associated with 4% improvement in mortality | |||
| Poon et al. (2019) [ | Compare | • URM were 29% of applicants and 25% of enrolments • White/Asian applicants had higher USMLE1 than Black applicants (234 v 218, | Bias not evaluated | | |
| | Measure whether personality similarity biases the selection of | Clinicians rated candidates more favourably when they shared personality characteristics ( | | | |
| Scherl et al. (2001) [ | Measure gender bias in | No significant difference in selection of male and female charts | |||
| Stain et al. (2013) [ | Measure attributes of top-ranked applicants to | • Males had higher USMLE1 (238 v 230, • Males/Females had similar USMLE2 scores (245 v 244, Highly competitive programs associated with: • USMLE1 (RR 1.36) • Publications (RR 2.2) • Asian (RR 1.7 v white) | | | |
| Unkart et al. (2016) [ | Measure reduction in | URM were: • Older at entry (24 v 23, • Lower MCAT (30 v 33, • More likely to choose a less competitive specialty ( | |||
| Villwock et al. (2019) [ | Measure effect of STAR tool for selecting | • USMLE scores significantly increased after STAR tool • No differences in gender/URM before/after introduction of STAR selection tool | | | |
ARCP Annual Review of Competence Progression, CPST Clinical Problem Solving Test, DMG Domestic Medical Graduate, IMG International Medical Graduate, LoR Letter of Recommendation, PLAB Professional and Linguistic Assessment Board, SJT Situational Judgement Test, URM Underrepresented minority
a MERSQI scores include subscales which are not applicable to all articles; scores are scaled after removal of these subscales to allow comparison, with a maximum score of 18 for all articles (Reed et al., 2007) [17]
Fig. 2 Length of follow-up