Mark A Ward1, Debra L Palazzi1, Martin I Lorin1, Anoop Agrawal2, Hilel Frankenthal3, Teri L Turner1. 1. Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA. 2. Department of Medicine, Baylor College of Medicine, Houston, TX, USA. 3. Department of Child Health, University of Missouri Hospital and Clinics, St. Louis, MO, USA.
Abstract
BACKGROUND: The Medical Student Performance Evaluation (MSPE) is a primary source of information used by residency programs in their selection of trainees. The MSPE contains a narrative description of the applicant's performance during medical school. In 2002, the Association of American Medical Colleges' guideline for preparation of the MSPE recommended inclusion of a comparative summative assessment of the student's overall performance relative to his/her peers (final adjective). OBJECTIVE: We hypothesized that the inclusion of a final adjective in the MSPE affects a reviewer's assessment of the applicant's desirability more than the narrative description of performance, and we designed a study to evaluate this hypothesis. DESIGN: Fifty-six faculty members from the Departments of Pediatrics and Medicine with experience reviewing MSPEs as part of the intern selection process reviewed two pairs of mock MSPE letters. In each pair, the narrative in one letter was superior to that in the other. Two final adjectives describing relative class ranks were created. Each subject was first presented with a pair of letters with mismatched final adjectives (study), i.e., the letter with the stronger narrative was presented with the weaker final adjective and vice versa. The subject was then presented with a second pair of letters without final adjectives (control). Subjects ranked the relative desirability of the two applicants in each pair. RESULTS: The proportions of rankings congruent with the strength of the narratives under study and control conditions were compared. Subjects were significantly less likely to rank the applicants congruent with the strength of the narratives when the strength of the final adjectives conflicted with the strength of the narratives; 42.9% of study letters were ranked congruent with the narrative versus 82.1% of controls (p = 0.0001).
CONCLUSION: The MSPE final adjective had a greater impact than the narrative description of performance on the determination of applicant desirability. ABBREVIATIONS: MSPE: Medical Student Performance Evaluation; AAMC: Association of American Medical Colleges; BCM: Baylor College of Medicine.
Keywords:
Medical Student Performance Evaluation; dean’s letter; intern selection; medical student; residency applicants
Residency programs consider multiple factors when ranking applicants for the National Resident Matching Program. Program directors have indicated that the Medical Student Performance Evaluation (MSPE) is an important factor, used by 94% of directors [1]. The MSPE, formerly known as the ‘dean’s letter,’ aims to provide residency program directors with a comprehensive assessment of a student’s noteworthy characteristics, professional behaviors, salient experiences, and academic achievements.

In 2002, in response to concerns about the value of the dean’s letter, the Association of American Medical Colleges (AAMC) published guidelines for the letter, including changing the name to the MSPE to reflect its purpose as an evaluation of the student’s performance rather than a recommendation [2]. These guidelines recommended a specific format, which included the traditional narratives describing performance during clinical rotations and, if a school-wide comparison of the applicant is made, a summative assessment of the student’s overall performance relative to his or her peers. The 2016 recommendations of the AAMC refer to this assessment as the final adjective or the overall rating [3]; it has often informally been referred to as the bottom line. Examples of final adjectives include ‘outstanding,’ ‘superior,’ ‘very good,’ and ‘qualified.’ Currently there is no standard set of comparative descriptors used by every US medical school. The 2016 AAMC statement also recommends that if a final adjective is used, the MSPE should list the full range of descriptors and the percentage of students falling into each comparison group [3]. However, if a medical school does not provide a school-wide comparison, then the final adjective or overall rating should not be included.

Mallott suggested that many program directors focus primarily or solely on the final adjective when evaluating the MSPE [4].
While studies have looked at the value placed on clinical grades contained in the MSPE compared to class rank [5,6], there are no studies assessing the weight given to the narrative description of clinical performance versus the final adjective by faculty members when ranking applicants. Currently, within the same applicant pool for any given year, some MSPEs contain final adjectives and others do not. The purpose of our study was to determine the impact of the narrative description of clinical performance compared to the final adjective on the reviewer’s assessment of applicant desirability.
Methods
We designed a post-test-only study in which subjects served as their own controls. Subjects were full-time faculty members of the Department of Pediatrics or the Department of Medicine at Baylor College of Medicine (BCM). All subjects were serving or had recently served on their respective departments’ resident selection committees and had experience reviewing MSPEs. Participation was voluntary. After ranking the letters, participants completed a demographic survey that included a question as to whether or not they felt rushed during the study. This question was included as a surrogate measure to assess whether or not the faculty had sufficient time to read and reflect critically on the MSPEs. The study was conducted over a 6-month period.

One of the authors (MIL), with over 30 years of experience reviewing dean’s letters and MSPEs, created two pairs (pair AB and pair CD) of mock narrative descriptions of clinical performance (without grades), such that one narrative in each pair was superior to that of the other: narrative A was superior to B and narrative C was superior to D. Then, without being told which were the stronger applicants, the other five authors read and ranked A against B and C against D. Revisions of the narratives were made (by MIL) until the other authors reached consensus as to the correct rank order within each pair, consensus meaning that at least four of the five authors correctly identified the stronger applicant (see Appendix).

The authors then created two final adjectives describing the applicant as ‘outstanding’ or ‘excellent,’ followed by an explanation that ‘outstanding’ indicated that the applicant ranked in the top 20% and ‘excellent’ indicated that he or she ranked in the second 20% of students at that school. Each letter also included the full range of descriptors and the percentage of students falling into each comparison group.
The outcome measure assessed was whether or not the faculty member correctly (congruent with the strength of the narratives) assessed the ranking of each pair of MSPEs, both with and without a final adjective. To minimize implicit bias, the gender of the students was the same in all letters and the name of the medical school was not stated. Student names were ethnically similar. None of the students deviated from the expected graduation date, remediated coursework, or were the recipient of any adverse action(s). Each MSPE contained a short description of unique characteristics (called ‘noteworthy’ characteristics in the 2016 AAMC recommendations) and similar preclinical/basic science curricular performance and United States Medical Licensing Examination Step 1 scores.

Faculty members were instructed that they would be presented with two pairs of MSPEs representing four unique students from four different medical schools. They were to review the first pair of letters and determine which of the two students was the more desirable applicant, in effect ranking the two applicants. The first pair of letters (study condition) had mismatched final adjectives (i.e., the letter with the stronger performance narrative was coupled with the weaker final adjective and vice versa). For example, the stronger narrative included the weaker final adjective of ‘excellent,’ which was comparatively described as placing the applicant in the second quintile; the weaker narrative had the final adjective of ‘outstanding,’ placing the applicant in the first quintile.
Subjects were then to review and rank the second pair of letters (control condition), which had the sentence with the final adjective and the comparative description removed.

To eliminate bias in case the difference between the narratives in one pair was more obvious than in the other pair, subjects were randomized as to whether they received letters A and B with mismatched final adjectives and letters C and D as controls (no final adjective) or C and D with mismatched final adjectives and A and B as controls (Figure 1). Additionally, as there could be a tendency to favor the first or last letter read, the order of presentation of letters within each pair (i.e., stronger narrative/weaker adjective or weaker narrative/stronger adjective) was also randomized. After evaluating the first pair of letters, the subjects were given the second pair. Thus, every faculty member reviewed both pairs of letters (i.e., all four letters). Half of the subjects received MSPEs A and B with mismatched final adjectives followed by MSPEs C and D with the final adjectives removed. The other half received MSPEs C and D with mismatched final adjectives followed by MSPEs A and B with the final adjectives removed. The pair of letters with mismatched final adjectives was always presented first so that when subjects reviewed the second set (control) and realized that there were no final adjectives, they had no choice but to evaluate based on the narratives. Had the subjects been given the pair with no final adjectives first, when receiving the second pair with final adjectives they might have realized the nature of the study and been influenced to pay more attention to the narratives.
Figure 1.
Legend for figure for impact of the final adjective in the Medical Student Performance Evaluation on determination of applicant desirability.
Narratives A and B were always paired with each other, as were narratives C and D. Subjects were randomized as to which pair of letters was received under study versus control conditions. The order of presentation within each pair was also randomized (e.g., some received A then B, while others received B then A)
Data analysis was conducted with SYSTAT 13, version 1 (SYSTAT, Richmond, CA). When comparing two or more proportions, the Pearson chi-square test was used. If the sample size in any group was five or less, a Fisher exact test was performed. The alpha level for all analyses was set at 0.05, and a two-tailed test of significance was used for all calculations. The study was conducted with approval of the BCM Institutional Review Board for Human Research Protocols.
Results
A total of 56 of 60 invited faculty members agreed to participate. Thirty-seven participants (66%) were from the Department of Pediatrics and 19 (34%) from the Department of Medicine; 28 (50%) were male and 28 (50%) female; 11 (20%) were full professors. The majority (61%) had more than 3 years of experience on an intern selection committee.

The proportions of rankings congruent with the strength of the narratives under study and control conditions were compared. When ranking the applicants whose MSPEs had the mismatched final adjectives, subjects preferred the candidate with the stronger adjective (even though the narrative was weaker) the majority of the time (57.1%; 32 of 56). When ranking applicants whose letters had narratives only, with no final adjectives, most subjects (82.1%; 46 of 56) ranked the applicants congruent with the strength of the narratives. When comparing whether or not the faculty member correctly assessed the ranking of each pair of MSPEs, subjects were significantly more likely to rank the applicants incongruently when reviewing letters in which the strength of the final adjectives conflicted with the strength of the narratives (χ² = 18.4 with 1 degree of freedom, p < 0.0001). There were no significant differences in accuracy of rank order by subject specialty (pediatrics or medicine), gender, academic rank, or years of experience as an interviewer. None of the subjects reported feeling rushed.
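The reported chi-square statistic can be reproduced from the counts above (32 of 56 subjects ranked incongruently under the study condition, i.e., 24 of 56 congruently, versus 46 of 56 congruently under the control condition). A minimal sketch using SciPy, substituted here for illustration in place of the SYSTAT software the authors actually used:

```python
# Reproduce the 2x2 chi-square comparison of proportions reported above.
# Counts are taken from the Results: 24/56 (42.9%) congruent rankings in the
# study condition vs 46/56 (82.1%) in the control condition.
from scipy.stats import chi2_contingency

# Rows: study condition, control condition
# Columns: congruent with narrative, incongruent with narrative
table = [[24, 56 - 24],
         [46, 56 - 46]]

# correction=False gives the uncorrected Pearson chi-square,
# matching the reported value of 18.4 with 1 degree of freedom
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.2g}")
```

Running this yields a chi-square of 18.4 on 1 degree of freedom with p well below 0.0001, consistent with the result reported in the text.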
Discussion
Under the conditions of our study, for a majority (57.1%) of subjects the MSPE final adjective had a greater impact than the narrative description of clinical performance on the ranking of applicant desirability. This was despite the fact that the applicants were from different schools, making comparison based on their class quintiles problematic.

A study by Loftus et al. compared the performance of first-year residents with their reported performance as students on clinical rotations and with their medical school class ranks [5]. The authors concluded that a student’s clinical performance had a stronger relationship to performance as a resident than did the student’s class rank. Weissman has proposed that clerkship evaluation reports (narrative descriptions) are the critical elements in the MSPE [7]. If Loftus [5] and Weissman [7] are correct, and the student’s clinical clerkship performance record is a better predictor of residency performance than the student’s class rank, it is problematic that a majority (57%) of our subjects put more weight on the final adjective than on the narrative description of clerkship performance. We propose that the two most likely reasons for this are: (1) the relative ease of ranking by final adjective (in this case, quintile); (2) a belief by reviewers that clerkship performance descriptions are generally difficult to interpret. If the prime reason is ease of ranking by final adjective, we hypothesize that the inclusion of a final adjective describing the student’s performance relative to his or her peers will diminish the attention paid by reviewers to the narrative descriptions of performance in clinical clerkships and thereby diminish the usefulness of the narratives. If the prime reason is a belief that clerkship evaluations are difficult to interpret, this would support arguments for a uniform and transparent system of evaluation of student performance on clinical clerkships [7,8].
We agree with the AAMC recommendation that narratives be unedited for content.

In 2007, Lurie and colleagues surveyed residency program directors regarding the performance of the graduates of a single US medical school, comparing the directors’ perceptions of the residents’ performance with the students’ rankings in their MSPEs. The authors concluded, ‘Dean’s letter rankings are a significant predictor of later performance in internship’ [9]. In 2008, a review of MSPEs from 106 US medical schools found that while 80% followed the recommended format, only 17% provided comparative data such as quartiles or a specified hierarchy of categories such as outstanding, excellent, very good, and good [10]. In an additional 30%, comparative data were provided, but the information required to interpret the category was in the appendices rather than in the letters themselves. However, in a review of the literature in 2011, Harfmann and Zirwas concluded that the relationship between the MSPE and performance during residency was unpredictable [11].

The insistence of the 2016 AAMC recommendations that any final adjective be accompanied by a specific explanation of how it was determined is especially important in view of the high reliance on the final adjective found in our study. In light of our findings of undue reliance on the final adjective, we agree with the AAMC recommendations that: (1) a final adjective or overall rating be provided only if accompanied by a school-wide comparison with the school’s students; (2) the information used to rank students be stated clearly; (3) comparative data be provided in the body of the MSPE, eliminating Appendices A–D; (4) the MSPE provide information on how final grades and comparative data are derived.
Limitations
There are several limitations to our study. Our subjects had to read and evaluate only four letters and compare (rank) each letter with the other in its pair. In reality, faculty members usually have to evaluate a larger number of letters and rank each against all the others. In the study, subjects had set aside more than adequate time to read and rank the four letters; none of the subjects reported feeling rushed. In reality, reviewers may have significant time constraints. We hypothesize that if pressed for time, readers would be even more likely to rely on the final adjective, so our study may have underestimated the percentage of readers relying on the final adjective. Another limitation is representation from only two disciplines at one medical school. We did not provide grades, and it is possible that grades might influence reviewers more than either the narrative descriptions of clerkship performance or class rank. Finally, although we attempted to address and adjust for all biases, we recognize that there is no way to anticipate all biases that the reviewers may have had.
Conclusions
The MSPE serves as an evaluation of student performance, and the AAMC has created guidelines to enhance the value of this letter. Under study conditions, where reviewers of MSPEs were presented with mock letters from different schools with mismatched final adjectives (indicating class standing by quintile), the majority of reviewers were misled by the mismatched final adjective. Our finding that the final adjective had a greater impact than the narrative description of clinical performance on the determination of applicant desirability suggests the need for further standardization or other reforms in the narrative reporting of clerkship evaluations.