
Who is watching the watchmen: Is quality reporting ever harmful?

Abstract

BACKGROUND: Quality reporting is increasingly used as a tool to encourage health systems, hospitals, and their practitioners to deliver the greatest health benefit. However, quality reporting systems may have unintended negative consequences, such as inadvertently encouraging "cherry-picking" by inadequately adjusting for patients who are challenging to take care of, or being underpowered to reliably detect meaningful differences in care. There have been no reports seeking to identify a minimum level of accuracy that ought to be viewed as a prerequisite for quality reporting.
METHOD: Using a decision analytic model, we seek to delineate minimal standards for quality measures to meet, using the simplest assumptions to illustrate what those standards may be.
RESULTS: We find that even under assumptions regarding optimal performance of the quality reporting system (sensitivity and specificity of 1), we can identify a minimal level of accuracy required for the quality reporting system to "do no harm": the increase in health-related quality of life from a higher rather than lower quality practitioner must be greater than the number of practitioners per patient divided by the proportion of patients willing to switch from a lower to a higher quality provider.
CONCLUSION: Quality measurement systems that have not been demonstrated to improve health outcomes should be held to a specific standard of measurement accuracy.


Keywords: Quality reporting; decision analysis; pay for performance; physician reporting; quality

Year: 2014 PMID: 26770710 PMCID: PMC4607192 DOI: 10.1177/2050312114523425

Source DB: PubMed Journal: SAGE Open Med ISSN: 2050-3121

Quality reporting is increasingly used as a tool to encourage health systems, hospitals, and their practitioners to deliver consistent care.[1-4] Indeed, there has been no shortage of published reports describing the possible advantages of quality reporting,[5-9] and a “report card” displaying adherence to quality measures can theoretically help consumers, payers, and employers make informed health plan choices and identify the highest quality providers.[10] Report cards could even be part of real-time feedback mechanisms incentivizing health plans to be active participants in improvement cycles of planning, doing, studying, and acting.[11-13] Quality reporting is given great importance in health reform and in state- and federal-sponsored exchanges of health information.[4] However, no measurement system is perfect. There have been many reports describing the limitations and unintended consequences of quality reporting systems (e.g. inadvertently encouraging “cherry-picking”[14] by inadequately adjusting for patients who are challenging to take care of, or being underpowered to reliably detect meaningful differences in care).[15-17] When quality metrics have not been demonstrated to improve health benefits, the danger of harm from such unintended consequences looms particularly large. We seek to delineate minimal standards for quality measures to meet, using the simplest assumptions to illustrate what those standards may be.

Development of a criterion

We believe a minimal moral requirement for any quality measurement ought to be based on the normative requirement exhibited in the Hippocratic Oath and many other codes of ethics: “first do no harm.” Accordingly, we lay out minimal criteria for quality measures to meet, using the simplest assumptions to illustrate what those criteria are. One simple way to operationalize the principle of “doing no harm” is to ensure that expected utility is not decreased. Accordingly, a quality measurement ought not to decrease expected utility and, ideally, should increase it.

Other simplifying assumptions, regarding characteristics of a particular health system, its patients, and its practitioners, facilitate our objective:

The relevant population consists of patients served by the health system and the practitioners who are being subjected to the quality standard.

The quality measurement system will partition all practitioners into a higher-performing subgroup and a lower-performing subgroup. While it is common and desirable for quality rating systems to have many more stratifications, we choose a simple partition for the sake of illustration, with the explicit understanding that this model can be generalized to more complicated systems.

Any measurement system can be characterized in terms of its operating characteristics (e.g. sensitivity and specificity, true positives, and false positives). A perfect classification system would have a sensitivity and specificity of 1, with all true positives and no false positives. Inadequate risk adjustment and/or insufficient statistical power will be reflected in subpar sensitivity and specificity (e.g. a practitioner who is falsely reported as low quality because of insufficient consideration of her willingness to take care of poor adherers). It is incumbent upon the creator of the measurement system to seek to optimize its performance characteristics.

In the long term, there will be realignment of the supply of practitioners and the demand by patients for those practitioners as a result of the quality measurement system. The realignment will not be perfect. Some people really like their doctor or have insurmountable barriers to choosing alternative doctors and will stick with them regardless of what a quality measurement system recommends. However, in the long term, some realignment will occur and will extend over a sufficiently long duration that its impacts will dwarf short-term effects.[18] Accordingly, the consequence of true positives from the quality measurement system will be that some additional patients will seek out and intentionally receive above-average care, whereas the consequence of false positives will be that some additional patients will seek out and unintentionally receive below-average care.

The quality metric under consideration has not yet been demonstrated to improve health outcomes.

Additional assumptions include the following: (a) truly labeling a practitioner as low quality may transiently lower practitioner utility, but this utility will revert to baseline rapidly enough to be insignificant for our analysis; (b) falsely labeling a practitioner as high quality will transiently raise practitioner utility, but this utility will revert to baseline rapidly enough to be insignificant for our analysis; and (c) truly labeling a practitioner as high quality may raise practitioner utility (e.g. due to enhanced self-regard and pride), but this elevation will be sufficiently small to be ignored. Falsely labeling a practitioner as low quality will lower utility by a substantial amount. We tested decrements ranging from 0 (base case analysis) to 0.5 (consistent with other negative transformative events).[19]

In order for a quality reporting system to do no harm, the overall utility after adoption of the quality rating system needs to be at least as great as the overall utility before adoption. In accord with the above assumptions, standard decision analytic methods can be used to create a decision tree comparing the expected value of “use of a quality measurement system” to the expected value of “no use of a quality measurement system” (Figure 1). This decision tree can then be used to identify the circumstances under which expected utility would be expected to increase by using a quality measurement system.

Figure 1. Decision tree and relevant calculations. The decision tree reads from left to right and represents possible pathways through the model. The square node at the left of the diagram is a “choose” node, representing the choice of using a quality reporting system (Report Card (RC)) or not using a quality reporting system (No RC). The circles at the origin of each branch are chance nodes, representing events that may or may not occur with a specified probability, depending on the use of RCs. The relevant population consists of providers and patients. Under the RC scenarios, providers are either in a higher-performing group (True+) or a lower-performing group (True−). However, the RC can categorize a provider as either high performing (Test+) or low performing (Test−). Ratio: ratio of practitioners to patients; p: probability that a provider is high quality; sens: sensitivity of the RC; spec: specificity of the RC; pswitch: proportion of patients who would switch from a low-quality provider to a high-quality provider based on information from the RC; µ: baseline utility; Δµ: change from baseline utility (magnitude of improvement in health-related quality of life) that would be expected to result from changing to a higher-quality provider; TP: true positive; FP: false positive. Solving the tree and applying the simplifying assumptions yields the criterion reported in the Results.

Results

Assuming that a quality measurement system has perfect performance characteristics (sensitivity of 1 and specificity of 1) and assuming a large loss of utility for a provider who is labeled as “low quality,” we can identify a particular criterion for this measure to “do no harm”: the increase in health-related quality of life that would occur as a consequence of correctly identifying a higher rather than lower quality practitioner must be greater than the number of practitioners per patient divided by the proportion of patients willing to switch from a lower to a higher quality provider. For example, if there are 0.001 practitioners for every patient (corresponding to a practice panel of 1000 for a primary care physician), and the proportion of patients who would switch from a low-quality provider to a high-quality provider based on information from the quality measurement system is 0.1 (meaning that 10% would switch), then higher quality practitioners would need to confer a health-related quality-of-life improvement of at least 0.01 utility units compared to lower quality practitioners (0.001 divided by 0.1). In sensitivity analyses in which we assume that practitioners sustain milder decrements in utility from being labeled as “low quality,” results are still notable, with higher quality practitioners needing to confer a health-related quality-of-life improvement of at least 0.005 utility units compared to lower quality practitioners.
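The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the stated criterion under perfect sensitivity and specificity; the function name `min_utility_gain` and its parameter names are assumptions introduced here, not part of the original analysis.

```python
def min_utility_gain(patients_per_practitioner: float, p_switch: float) -> float:
    """Minimum HRQoL gain (utility units) for a report card to 'do no harm'.

    Hypothetical helper: implements only the stated criterion
    (practitioners per patient) / (proportion of patients willing to switch),
    assuming sensitivity = specificity = 1.
    """
    if patients_per_practitioner <= 0:
        raise ValueError("patients_per_practitioner must be positive")
    if not 0 < p_switch <= 1:
        raise ValueError("p_switch must be in (0, 1]")
    practitioners_per_patient = 1.0 / patients_per_practitioner
    return practitioners_per_patient / p_switch


# Panel of 1000 patients, 10% of patients willing to switch.
print(round(min_utility_gain(1000, 0.10), 4))  # 0.01
```

Expressing the first input as panel size rather than as a ratio is a convenience choice; the two are reciprocals of each other.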

Discussion

This result has important implications for developers of quality measurement systems, consumer groups, health systems, and payers: in situations in which implementation of the quality metric has not been demonstrated to improve health outcomes, the onus should be on proponents of that metric to demonstrate that it “does no harm.” This evaluation can be accomplished with only a few inputs (Table 1): the proportion of patients who would switch practitioners, the ratio of practitioners to patients, and the magnitude of improvement in health-related quality of life that would be expected to result from higher rather than lower quality practitioners (the change from baseline utility in Figure 1). The proportion of patients who would switch practitioners can be estimated via surveys, the ratio of practitioners to patients is well known by any health system, and the magnitude of improvement in health-related quality of life can be estimated using validated models.[20]
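As a rough illustration of how the first two inputs determine the third, the sketch below tabulates the minimum required gain over the panel sizes and switching proportions used in Table 1. The names `min_gain`, `PANEL_SIZES`, and `SWITCH_PERCENTS` are hypothetical, and the calculation assumes perfect sensitivity and specificity, as in the base case.

```python
from itertools import product

# Grid values taken from Table 1; the names are illustrative assumptions.
PANEL_SIZES = (200, 500, 1000, 2000)   # patients per practitioner
SWITCH_PERCENTS = (5, 10, 20)          # % of patients willing to switch


def min_gain(panel_size: int, switch_percent: float) -> float:
    """Practitioners per patient divided by the proportion willing to switch."""
    return (1.0 / panel_size) / (switch_percent / 100.0)


# Iterate with the switching proportion as the outer loop, as in Table 1.
for pct, size in product(SWITCH_PERCENTS, PANEL_SIZES):
    print(f"{size:>5}  {pct:>3}%  {min_gain(size, pct):.4f}")
```

Because the threshold is inversely proportional to both inputs, doubling either the panel size or the switching proportion halves the required gain.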

Table 1.

Number of patients per practitioner	Proportion of patients willing to switch practitioners based on quality data (%)	Minimum increase in health-related quality of life between higher and lower quality physicians necessary to avoid doing harm
200	5	0.10
500	5	0.04
1000	5	0.02
2000	5	0.01
200	10	0.05
500	10	0.02
1000	10	0.01
2000	10	0.005
200	20	0.025
500	20	0.01
1000	20	0.005
2000	20	0.0025

Calculations of how much of an improvement in health-related quality of life for higher versus lower quality practitioners would be necessary in order for the quality rating to “do no harm.” Note that these calculations assume the ideal scenario of a quality reporting system that identifies higher and lower quality practitioners with a sensitivity and specificity of 1. Health-related quality of life is expressed in utility units, a preference-weighted quality-of-life metric on a scale of 0 (worst) to 1 (best).

While a 0.01 increment in utility may intuitively seem like a very small number, it is important to note that utility is an overall health-related quality-of-life measure rather than a disease-related quality-of-life measure. For this reason, very few improvements in medical care produce large changes in utility once averaged across the entire population. For example, if higher quality practitioners improved pain control for 10% of their population with chronic pain, and if this improvement resulted in a gain of 0.03 utility units for those patients who are affected (a typical minimum change reflecting clinical significance),[21] then the higher quality practitioners would be increasing overall utility across their panel by 0.003 utility units (0.10 × 0.03), which would be insufficient to meet the criterion for quality measurement.

Our base case calculations make the optimistic assumption that the quality measurement system has perfect sensitivity and specificity. In truth, the sensitivity and specificity of many quality measurement systems will be far below 1 because of many factors, including inadequate risk adjustment[22,23] and insufficient statistical power. However, assumptions of perfect sensitivity and specificity often yield useful bounding analyses (i.e. if a quality reporting system causes harm even under the idealized assumption of perfect performance characteristics, it would also cause harm under actual performance characteristics). Additionally, the base case calculation is not grossly inaccurate even with more realistic sensitivity and specificity estimates. Across a wide range of sensitivity and specificity assumptions, the minimum difference in health for high- versus low-quality providers would vary between one and two times the number of practitioners per patient divided by the proportion of patients willing to switch from a lower to a higher quality provider (Table 1).
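The pain-control example can be checked with a short sketch. The text does not say which threshold the 0.003 figure is compared against; the code below assumes the 0.01 threshold from the Results example (panel of 1000, 10% willing to switch). The function names are hypothetical.

```python
def panel_average_gain(fraction_affected: float, gain_per_affected: float) -> float:
    """Utility gain averaged over the whole panel, not only affected patients."""
    return fraction_affected * gain_per_affected


def do_no_harm_threshold(patients_per_practitioner: float, p_switch: float) -> float:
    """Stated criterion under perfect sensitivity and specificity."""
    return (1.0 / patients_per_practitioner) / p_switch


avg_gain = panel_average_gain(0.10, 0.03)      # 10% affected, 0.03 units each
# ASSUMPTION: compare against the Results example (1000 patients, 10% switch).
threshold = do_no_harm_threshold(1000, 0.10)
meets_criterion = avg_gain > threshold

print(f"average gain {avg_gain:.3f}, threshold {threshold:.3f}, "
      f"meets criterion: {meets_criterion}")
# average gain 0.003, threshold 0.010, meets criterion: False
```

The point of the sketch is the averaging step: a clinically meaningful gain for a subgroup shrinks in proportion to the share of the panel it reaches.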

Limitations

We seek to delineate minimal standards for quality measures to meet, using the simplest assumptions to illustrate what those standards are. Sensitivity and specificity in real life will be less than 1 and may be difficult to estimate because of ambiguity regarding the best gold standard;[24] however, it might not always be necessary to estimate them. If a quality reporting system would cause harm even under the idealized assumption of perfect performance characteristics, it would also cause harm under actual performance characteristics.

Quality metrics should be adjusted for those patient characteristics over which the practitioner and/or health system does not have locus of control, but not for those characteristics over which the practitioner and/or health system has locus of control.[25] If this principle is disregarded, risk adjustment degenerates into a logistical rather than a scientific discussion, focused on the question of what data are routinely available for risk adjustment, rather than the question of the data’s suitability, completeness for risk adjustment, or position in the causal pathway of quality of care.[26] Indeed, these and other principles in quality metric formulation have been well described, and disregarding them out of convenience (e.g. using what data are available even if other unavailable data are important) merely increases the likelihood of doing harm. Consequently, it can be argued that practitioners who are going to be subject to a quality measurement ought themselves to make a list of patient characteristics that are likely to be associated with the quality outcome of interest and that peers regard as being outside their locus of control, and these characteristics should be used as the adjustors.[27] A fair, explicit, and transparent procedure such as this not only reduces the likelihood that a quality metric may cause harm but also may encourage “buy in” from practitioners themselves.
Other limitations of this approach to measuring quality involve an explicit consideration of the well-being of practitioners as well as the well-being of patients. It may be argued that health systems should only be concerned with optimizing the health of their subscribers. However, this is a short-sighted perspective. Practitioner noncompliance and burnout will ultimately have pernicious effects on the health system overall. Finally, it can be argued that our approach is too simple, merely dividing practitioners into two strata, one of higher performers and one of lower performers. However, our approach can be applied to more sophisticated quality measurement systems and stratifications, albeit with a commensurate increase in mathematical complexity.

Conclusion

Quality measurement systems that have not been demonstrated to improve health outcomes should be held to a specific standard of measurement accuracy. The hypothesized benefit in quality of life resulting from the higher quality outcomes should exceed the number of practitioners per patient divided by the proportion of patients willing to switch from a lower to a higher quality provider. However, the most important reason to develop such a standard is to hold those who seek to measure the performance of health-care providers to the same standard demanded of the practitioners themselves—do no harm.

25 in total

Review 1. Control charts in healthcare quality improvement. A systematic review on adherence to methodological criteria.

Authors: A Koetsier; S N van der Veer; K J Jager; N Peek; N F de Keizer
Journal: Methods Inf Med Date: 2012-04-05 Impact factor: 2.176

2. How report cards on physicians, physician groups, and hospitals can have greater impact on consumer choices.

Authors: Anna D Sinaiko; Diana Eastman; Meredith B Rosenthal
Journal: Health Aff (Millwood) Date: 2012-03 Impact factor: 6.301

3. Association of National Hospital Quality Measure adherence with long-term mortality and readmissions.

Authors: David M Shahian; Gregg S Meyer; Elizabeth Mort; Susan Atamian; Xiu Liu; Andrew S Karson; Lawrence D Ramunno; Hui Zheng
Journal: BMJ Qual Saf Date: 2012-03-02 Impact factor: 7.035

4. Partnering for quality.

Authors: Peter J Pronovost; Christine G Holzmueller
Journal: J Crit Care Date: 2004-09 Impact factor: 3.425

Review 5. Making performance indicators work: experiences of US Veterans Health Administration.

Authors: Eve A Kerr; Barbara Fleming
Journal: BMJ Date: 2007-11-10

Review 6. Evaluation criteria for report cards of healthcare providers.

Authors: Jesse D Schold
Journal: Adv Health Econ Health Serv Res Date: 2008

7. The 'Global Outcomes Score': a quality measure, based on health outcomes, that compares current care to a target level of care.

Authors: David M Eddy; Joshua Adler; Macdonald Morris
Journal: Health Aff (Millwood) Date: 2012-11 Impact factor: 6.301

8. Development of the Multidimensional Health Locus of Control (MHLC) Scales.

Authors: K A Wallston; B S Wallston; R DeVellis
Journal: Health Educ Monogr Date: 1978

9. Do bad report cards have consequences? Impacts of publicly reported provider quality information on the CABG market in Pennsylvania.

Authors: Justin Wang; Jason Hockenberry; Shin-Yi Chou; Muzhe Yang
Journal: J Health Econ Date: 2010-12-10 Impact factor: 3.883

10. The association of candidate mortality rates with kidney transplant outcomes and center performance evaluations.

Authors: Jesse D Schold; Titte R Srinivas; Richard J Howard; Ian R Jamieson; Herwig-Ulf Meier-Kriesche
Journal: Transplantation Date: 2008-01-15 Impact factor: 4.939
