Andrea N Leep Hunderfund, Yoon Soo Park, Frederic W Hafferty, Kelly M Nowicki, Steven I Altchuler, Darcy A Reed.
Abstract
OBJECTIVE: To provide validity evidence for a multifaceted organizational program for assessing physician performance and evaluate the practical and psychometric consequences of 2 approaches to scoring (mean vs top box scores). PARTICIPANTS AND METHODS: Participants included physicians with a predominantly outpatient practice in general internal medicine (n=95), neurology (n=99), and psychiatry (n=39) at Mayo Clinic from January 1, 2013, through December 31, 2014. Study measures included hire year, patient complaint and compliment rates, note-signing timeliness, cost per episode of care, and Likert-scaled surveys from patients, learners, and colleagues (scored using mean ratings and top box percentages).
Keywords: FPPE, focused practice performance evaluation; GIM, general internal medicine; MSF, multisource feedback
Year: 2017 PMID: 30225409 PMCID: PMC6135024 DOI: 10.1016/j.mayocpiqo.2017.05.005
Source DB: PubMed Journal: Mayo Clin Proc Innov Qual Outcomes ISSN: 2542-4548
Demographic Characteristics of the 233 Study Participants
| Characteristic | Physicians |
|---|---|
| Age (y), mean ± SD | 50.1±11.4 |
| Male sex (No. [%]) | 151 (65) |
| Specialty (No. [%]) | |
| General internal medicine | 95 (41) |
| Neurology | 99 (42) |
| Psychiatry | 39 (17) |
| Academic rank (No. [%]) | |
| Professor | 47 (20) |
| Associate professor | 30 (13) |
| Assistant professor | 114 (49) |
| Instructor | 17 (7) |
| No rank | 25 (11) |
| Hire year (No. [%]) | |
| 2010-2014 | 52 (22) |
| 2005-2009 | 38 (16) |
| 2000-2004 | 39 (17) |
| 1995-1999 | 37 (16) |
| 1990-1994 | 28 (12) |
| 1985-1989 | 13 (5) |
| 1980-1984 | 12 (5) |
| Before 1980 | 14 (6) |
Percentages may not sum to 100% due to rounding.
Participating physicians were those identified by their department or division chair as having a predominantly outpatient clinical practice.
Age and academic rank as of January 1, 2014 (the midpoint of the 2-year study time frame).
Age and sex data were not linked to physician performance data to protect the anonymity of study participants.
Physician Clinical Performance Assessments: Corresponding Content Domains and Scores
| Assessment (physicians, No.) | Scale | Content domain | Mean scores: potential scores | Mean scores: observed scores, mean ± SD | Top box scores: potential scores | Top box scores: observed scores, mean ± SD |
|---|---|---|---|---|---|---|
| Patient complaint rate | Complaints per 100 outpatient visits (No.) | Patient satisfaction | 0.00+ | 0.32±1.78 | NA | NA |
| Patient compliment rate | Compliments per 100 outpatient visits (No.) | Patient satisfaction | 0.00+ | 0.12±0.76 | NA | NA |
| Timeliness of note signing (n=231) | Clinical notes signed on time (%) | Clinical processes | 0-100 | 96.0±6.6 | NA | NA |
| Mean internal cost per episode of care | | Costs | −3 to +3 SD | 0.56±0.59 | NA | NA |
| Patient satisfaction survey (n=201) | 5-Point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree) | Patient satisfaction | 1.00-5.00 | 4.73±0.27 | 0%-100% | 85.8±11.0 |
| Learner evaluations (n=141) | 5-Point Likert scale ranging from 1 (needs improvement) to 5 (top 10%) | Clinical processes | 1.00-5.00 | 4.06±0.31 | 0%-100% | 18.6±16.8 |
| MSF, internal medicine (n=10) | 5-Point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree) | Clinical processes | 1.00-5.00 | 4.41±0.49 | 0%-100% | 45.0±25.7 |
| Peer feedback, neurology (n=94) | 5-Point Likert scale ranging from 1 (never) to 5 (always) | Clinical processes | 1.00-5.00 | 4.94±0.08 | 0%-100% | 90.7±10.5 |
| MSF, psychiatry (n=36) | 4-Point Likert scale ranging from 1 (strongly disagree) to 4 (strongly agree) | Clinical processes | 1.00-4.00 | 3.77±0.25 | 0%-100% | 81.5±21.9 |
MSF = multisource feedback; NA = not applicable.
Of 233 physicians (95 internists, 99 neurologists, and 39 psychiatrists); assessment data are from 2013 and 2014 except for general internal medicine MSF data, which were collected only during 2014.
Using the Value Compass as a conceptual framework.
For Likert-scaled assessments, means were calculated first at the level of individual survey items, then across all items on a given instrument. For all measures, separate scores were calculated for 2013 and 2014, then averaged to summarize overall performance.
For Likert-scaled assessments, top box scores represent the percentage of optimal ratings (ie, the highest possible Likert scale rating) across all items for a given physician over the course of a year; separate scores were calculated for 2013 and 2014, then averaged to summarize overall performance.
Unsolicited complaints and compliments related to physician care.
Cost represents the internal costs of providing care to a patient, reflects utilization (eg, physicians who order more [or more costly] tests and consultations have higher internal cost per episode of care), and is unrelated to prices or charges to patients/insurers. Internal costs are attributed to the physician with the highest evaluation and management billing code on the first day of a patient's evaluation, and the subsequent days or weeks over which tests and consultations are performed are considered an episode of care.
Captures greater than 99% of normally distributed data.
Entry of free-text comments was required for ratings of 1 or 5.
Data are from 2014 only (n=30) because psychiatry MSF data from 2013 were stored in a way that precluded calculation of top box scores.
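The two scoring approaches described in the footnotes above can be sketched as follows. This is a minimal illustration, not the study's actual code: the function names, the dictionary layout, and the ratings are all hypothetical, and a 5-point scale is assumed.

```python
from statistics import mean

def mean_score(ratings_by_item):
    """Average ratings within each survey item, then average the item means."""
    item_means = [mean(ratings) for ratings in ratings_by_item.values() if ratings]
    return mean(item_means)

def top_box_score(ratings_by_item, top=5):
    """Percentage of all ratings (pooled across items) equal to the top scale point."""
    pooled = [r for ratings in ratings_by_item.values() for r in ratings]
    return 100.0 * sum(r == top for r in pooled) / len(pooled)

def two_year_summary(score_2013, score_2014):
    """Average the separate yearly scores into one overall score."""
    return mean([score_2013, score_2014])

# Hypothetical ratings for one physician (item name -> ratings received that year).
year_2013 = {"item_1": [5, 5, 4, 5], "item_2": [4, 5, 5, 3]}
year_2014 = {"item_1": [5, 4, 4, 5], "item_2": [5, 5, 5, 4]}

overall_mean = two_year_summary(mean_score(year_2013), mean_score(year_2014))
overall_top_box = two_year_summary(top_box_score(year_2013), top_box_score(year_2014))
```

Because a top box score counts only the highest rating, a physician whose ratings are mostly 4s can have a high mean score and a low top box score, which is consistent with the gap between the two columns for learner evaluations in the table.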
Physician Clinical Performance Assessments: Response Process and Internal Structure Validity Evidence
| Assessment | Items (No.) | Response process: physicians assessed per year (No.), mean (SD) | Response process: raters per physician per year (No.), mean (SD) | Response process: ratings per physician per year (No.), mean (SD) | Internal structure: Cronbach α | Internal structure: item discrimination index, mean (SD) |
|---|---|---|---|---|---|---|
| Patient complaint rate | NA | 217 (1) | NA | NA | NA | NA |
| Patient compliment rate | NA | 217 (1) | NA | NA | NA | NA |
| Timeliness of note signing | NA | 225 (1) | NA | NA | NA | NA |
| Mean internal cost per episode of care | NA | 205 (3) | NA | NA | NA | NA |
| Patient satisfaction survey | 9 | 191 (2) | 36 (18) | 314 (156) | 0.97 | 0.88 (0.04) |
| Learner evaluations | 22 | 115 (21) | 6 (2) | 126 (47) | 0.96 | 0.74 (0.12) |
| MSF (general internal medicine) | 7 | 11 | 4 | 27 | 0.89 | 0.88 (0.08) |
| Peer feedback (neurology) | 6 | 92 (1) | 7 (0) | 58 (19) | 0.83 | 0.78 (0.06) |
| MSF (psychiatry) | 5 | 26 (8) | 6 (2) | 17 (13) | 0.96 | 0.73 (0.11) |
MSF = multisource feedback; NA = not applicable.
Assessment data are from 2013 and 2014 except for general internal medicine MSF data, which were collected only during 2014.
Of 233 eligible physicians (although the number of physicians eligible for assessment by learner evaluations was likely <233 because not all physicians interact with residents and fellows). Specialties aimed to collect multisource or peer feedback for each physician every 2 to 3 years (general internal medicine, 95 physicians), every year (psychiatry, 39 physicians), or twice per year (neurology, 99 physicians).
Item discrimination indices were calculated at the item level using item-rest correlation coefficients, then averaged across all items within a given assessment to generate a mean item discrimination index.
Total pool of items; individual learner evaluation forms contained subsets of items.
No standard deviation because only 2014 data were available.
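The internal structure statistics in the table (Cronbach α and a mean item discrimination index based on item-rest correlations) can be illustrated with a short sketch. The helper names and the ratings matrix are hypothetical, and the formulas are the standard textbook definitions; the source does not state which software or exact estimator was used.

```python
from math import sqrt
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_rest_correlations(rows):
    """Correlate each item with the sum of the remaining items on the same form."""
    n_items = len(rows[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]
        result.append(pearson(item, rest))
    return result

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: each row is one completed form with three items.
forms = [[5, 5, 4], [4, 4, 4], [3, 4, 3], [5, 4, 5], [2, 3, 2]]

discrimination = item_rest_correlations(forms)
mean_discrimination = mean(discrimination)  # averaged across items, as in the footnote
alpha = cronbach_alpha(forms)
```

Using the rest score rather than the total score keeps each item from being correlated with itself, which would otherwise inflate the discrimination index on short instruments.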
Physician Clinical Performance Assessments: Correlation Matrix (Using Mean Scores)
| | Patient complaint rate | Patient compliment rate | Timeliness of note signing | Mean internal cost per episode of care | Patient satisfaction survey | Learner evaluations | MSF (GIM) | Peer feedback (neurology) | MSF (psychiatry) |
|---|---|---|---|---|---|---|---|---|---|
| Patient complaint rate | 1.00 | ||||||||
| Patient compliment rate | −0.01 (.91) | 1.00 | |||||||
| Timeliness of note signing | 0.06 (.38) | 0.05 (.50) | 1.00 | ||||||
| Mean internal cost per episode of care | −0.01 (.90) | 0.02 (.75) | 0.05 (.47) | 1.00 | |||||
| Patient satisfaction survey | −0.02 (.74) | 0.04 (.56) | −0.12 (.09) | 0.02 (.81) | 1.00 | ||||
| Learner evaluations | −0.10 (.25) | −0.04 (.60) | −0.09 (.29) | −0.16 (.07) | 0.26 (.003) | 1.00 | |||
| MSF (GIM) | −0.34 (.32) | NA | −0.50 (.11) | −0.22 (.51) | 0.42 (.19) | −0.93 (.24) | 1.00 | ||
| Peer feedback (neurology) | 0.08 (.46) | 0.12 (.27) | −0.08 (.43) | −0.27 (.008) | 0.12 (.25) | 0.10 (.35) | NA | 1.00 | |
| MSF (psychiatry) | −0.03 (.86) | NA | 0.18 (.29) | 0.11 (.53) | 0.10 (.59) | −0.07 (.71) | NA | NA | 1.00 |
GIM = general internal medicine; MSF = multisource feedback; NA = not applicable.
Data are given as correlation coefficients (P values); mean scores were calculated first at the item level, then across all items within a given instrument.
Insufficient variability precluded calculation of a correlation coefficient.
Physician Clinical Performance Assessments: Consequences of Measurement
| Assessment | Physicians (No.) | 1 SD, mean scores: cutoff score | 1 SD, mean scores: trigger rate (No. [%]) | 1 SD, top box scores: cutoff score | 1 SD, top box scores: trigger rate (No. [%]) | 2 SD, mean scores: cutoff score | 2 SD, mean scores: trigger rate (No. [%]) | 2 SD, top box scores: cutoff score | 2 SD, top box scores: trigger rate (No. [%]) |
|---|---|---|---|---|---|---|---|---|---|
| Patient complaint rate | 226 | >2.1 | 4 (2) | NA | NA | >3.9 | 3 (1) | NA | NA |
| Timeliness of note signing | 231 | <89.5% | 24 (10) | NA | NA | <82.9% | 10 (4) | NA | NA |
| Mean internal cost per episode of care | 210 | <−0.02 or >1.15 | 63 (30) | NA | NA | <−0.61 or >1.74 | 13 (6) | NA | NA |
| Patient satisfaction survey | 201 | <4.46 | 15 (7) | <74.8% | 18 (9) | <4.19 | 4 (2) | <63.8% | 0 |
| Learner evaluations | 141 | <3.75 | 13 (9) | <1.8% | 20 (15) | <3.43 | 4 (3) | <1% | 0 |
| MSF (internal medicine) | 10 | <3.92 | 1 (10) | <19.3% | 2 (18) | <3.43 | 1 (10) | <1% | 0 |
| Peer feedback (neurology) | 94 | <4.86 | 13 (14) | <80.3% | 13 (14) | <4.78 | 3 (3) | <69.8% | 4 (4) |
| MSF (psychiatry) | 36 | <3.52 | 6 (17) | <59.6% | 11 (31) | <3.26 | 1 (3) | <37.7% | 8 (22) |
MSF = multisource feedback; NA = not applicable.
Assessment data are from 2013 and 2014 except for general internal medicine MSF data, which were collected only during 2014; cutoff scores were not applied to patient compliments.
Hypothetical cutoff scores set at 1 or 2 SD above the mean for patient complaints; 1 or 2 SD below the mean for timeliness of note signing, patient satisfaction survey, learner evaluations, and multisource or peer feedback surveys; or 1 or 2 SD above and below the mean for mean internal costs per episode of care.
For Likert-scaled assessments, means were calculated first at the level of individual survey items, then across all items on a given instrument. For all measures, separate scores were calculated for 2013 and 2014, then averaged to summarize overall performance.
For Likert-scaled assessments, scores represent the percentage of optimal ratings (ie, the highest possible Likert scale rating) across all items for a given physician over the course of a year; separate scores were calculated for 2013 and 2014, then averaged to summarize overall performance.
A cutoff score of less than 1% was used when 2 SD below the mean was a negative value.
Data are from 2014 only (psychiatry MSF data from 2013 were stored in a way that precluded calculation of top box scores).
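The hypothetical cutoff scores described in the footnotes can be reproduced from the reported means and SDs. The sketch below is illustrative only: the function names and the list of physician scores are assumptions, while the means and SDs are taken from the top box columns of the scores table.

```python
def lower_cutoff(mean_value, sd, n_sd, floor=None):
    """Cutoff set n_sd standard deviations below the mean.

    If the result is negative and a floor is given, the floor is used instead,
    mirroring the footnote about using a cutoff of less than 1%.
    """
    cutoff = mean_value - n_sd * sd
    if floor is not None and cutoff < 0:
        return floor
    return cutoff

def trigger_rate(scores, cutoff):
    """Count and percentage of physicians scoring below the cutoff."""
    flagged = sum(score < cutoff for score in scores)
    return flagged, 100.0 * flagged / len(scores)

# Learner evaluations, top box scores: mean 18.6, SD 16.8 (from the scores table).
learner_1sd = lower_cutoff(18.6, 16.8, 1, floor=1.0)  # about 1.8
learner_2sd = lower_cutoff(18.6, 16.8, 2, floor=1.0)  # negative, so the 1% floor applies

# Patient satisfaction survey, top box scores: mean 85.8, SD 11.0.
patient_1sd = lower_cutoff(85.8, 11.0, 1)  # about 74.8

# Hypothetical top box scores for five physicians (not study data).
example_scores = [90.0, 70.0, 85.5, 60.0, 95.0]
flagged_count, flagged_pct = trigger_rate(example_scores, patient_1sd)
```

When the SD is large relative to the mean, as with learner evaluation top box scores, a 2 SD threshold falls below zero and flags no one, which matches the trigger rates of 0 reported for several top box cutoffs.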