
A Bayesian approach to laboratory utilization management.

Ronald G Hauser¹, Brian R Jackson², Brian H Shirts³.

Abstract

BACKGROUND: Laboratory utilization management describes a process designed to increase healthcare value by altering requests for laboratory services. A typical approach to monitor and prioritize interventions involves audits of laboratory orders against specific criteria, defined as rule-based laboratory utilization management. This approach has inherent limitations. First, rules are inflexible. They adapt poorly to the ambiguity of medical decision-making. Second, rules judge the context of a decision instead of the patient outcome, allowing an order to simultaneously save a life and break a rule. Third, rules can threaten physician autonomy when used in a performance evaluation.
METHODS: We developed an alternative to rule-based laboratory utilization. The core idea comes from a formula used in epidemiology to estimate disease prevalence. The equation relates four terms: the prevalence of disease, the proportion of positive tests, test sensitivity and test specificity. When applied to a laboratory utilization audit, the formula estimates the prevalence of disease (pretest probability [PTP]) in the patients tested. The comparison of PTPs among different providers, provider groups, or patient cohorts produces an objective evaluation of laboratory requests. We demonstrate the model in a review of tests for enterovirus (EV) meningitis.
RESULTS: The model identified subpopulations within the cohort with a low prevalence of disease. These low prevalence groups shared demographic and seasonal factors known to protect against EV meningitis. This suggests too many orders occurred from patients at low risk for EV.
CONCLUSION: We introduce a new method for laboratory utilization management programs to audit laboratory services.


Keywords: Delivery of health care; efficiency; guideline adherence; health care; organization; physicians’ practice patterns; process assessment (health care); quality assurance; utilization review

Year: 2015 PMID: 25774321 PMCID: PMC4355837 DOI: 10.4103/2153-3539.151921

Source DB: PubMed Journal: J Pathol Inform

INTRODUCTION

The question can be raised, who benefits from which test, when, where, and at what cost?[1] In journals devoted to areas of medical practice as diverse as emergency medicine, medical education, coagulation, quality improvement, and HIV, interest in the appropriateness of laboratory testing has persisted from the 1940s to the present day.[2,3,4,5,6,7] Pathologists and others who evaluate laboratory test utilization management generally echo a common theme of improving patient care and decreasing medical costs.[8,9,10,11,12] The dominant method used to evaluate test utilization management involves a retrospective comparison of clinical practice guidelines to actual decisions.[7] We refer to this method as a rule-based approach because of its reliance on rules such as “patients taking warfarin should have at least one prothrombin time/international normalized ratio test within 60 days of beginning the drug.”[13] Because these rules have minimal ambiguity, they translate easily into database queries, which eliminate the need for chart review, an opinionated, time-consuming and costly endeavor.[14] In addition, healthcare administrators generally find such rules easy to implement because of their black-and-white interpretation, origin in empirical trials, and buy-in from physicians, whose professional societies and expert panels participate in their development. However, rule-based utilization management has limits, primarily due to its inability to monitor utilization in the ambiguous gray zones of medical decision-making.
Rule-based utilization management does not always perfectly reflect recognized guidelines, and physicians, who at times see the simplicity of rules as overgeneralized and unrealistic, may justifiably be reluctant to follow strict rules.[15] A limit on the number of well-researched, clear-cut medical decisions may explain the difficulty that rule-based utilization management programs, such as the Centers for Medicare and Medicaid Services’ Clinical Quality Measures or HEDIS, have had in expanding past a few hundred rules, even as the lower estimate on the number of medical scenarios exceeds this number by multiple orders of magnitude.[16] Rules may become more numerous and complex, but the rule-based approach shares many of the same limitations as another long-standing issue in the evaluation of medical decisions, the determination of pretest probability (PTP).[17,18,19,20,21,22] Both attempt to probe the black box that is the patient-physician relationship, whose nuances may not easily translate to structured fields in a database. A sufficiently advanced rule system may obviate the need for an autonomous physician, many of whom may have already left medical practice because autonomy is key to their job satisfaction.[23] Perhaps more importantly, utilization management rules, like pretest probability, focus on the pretext of a decision rather than its outcome. An alternative approach would evaluate the outcome of patterns of behavior over time, not a single decision formally separated from its complete context. We sought to create a method for test utilization management that shares the strengths of the rule-based approach, namely its empiricism, simplicity and avoidance of manual chart review, while ameliorating at least some of its limitations, such as its difficulty with ambiguous decisions, infringement on provider autonomy, and emphasis on the ordering decision rather than its outcome.

METHODS

Method Explanation

Our approach to test utilization management closely follows medical decision-making theory.[24] In medical decision making the optimal strategy when faced with a diagnostic dilemma is to choose the option with the highest utility: nonintervention, test, or treatment. The PTP of disease in the patient under evaluation informs the choice. Figure 1 shows a prototypical example of the utility curve for each choice across a range of PTP. At low PTP, the patient has a low probability of disease, and nonintervention maximizes utility. The patient likely has the disease at high PTP and will benefit most from treatment. The patient's disease state has the most uncertainty and the test has the greatest expected utility between the extreme values of PTP. Along the continuum of PTP the three decisions (nonintervention, test, and treat) form two points of equivalent utility (nonintervention-test and test-treat). We define these two cutoff points as PTPLow and PTPHigh [Figure 1].

Figure 1

Deciding to test, a graph of pretest probability (PTP) versus expected utility. Utility curves for treatment, test, and nonintervention are shown as gray solid lines. The dotted line represents the maximum utility of the available choices. The cutoff PTPs were labeled PTPLow and PTPHigh. PTP: Pretest probability, EU: Expected utility

The physician's estimate of the patient's probability of disease (PTPEst) compared to PTPLow and PTPHigh determines their choice. The decision to not intervene (PTPEst < PTPLow), test (PTPLow ≤ PTPEst ≤ PTPHigh), or treat (PTPHigh < PTPEst) becomes as simple as evaluating an inequality. Our model takes an identical approach by defining the appropriate use of a test as an estimate of PTP between PTPLow and PTPHigh. If we obtain the constant values of PTPLow and PTPHigh from cost-efficacy analysis, an estimate of the physician's PTP (PTPEst) would allow us to determine their decision-making strategy. To determine the value of PTPEst in the patient population tested, whose value falls between PTPLow and PTPHigh for appropriate testing, we defer to the Rogan and Gladen equation.[25] The Rogan and Gladen equation provides an unbiased estimate of the prevalence of disease as a function of three parameters: the proportion of tests indicating disease (t), the sensitivity of the test (α) and its specificity (β).[25] The variance of PTPEst decreases as the sample size (N) increases.[25] Thus, as the sample size grows large, PTPEst approaches the true PTP. Although the original paper does not mention a connection between the Rogan and Gladen equation and Bayes theorem, they can be demonstrated to be mathematically equivalent [Appendix 1].
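The equation itself does not appear in this text. As a sketch, the standard published form of the Rogan and Gladen estimator can be written with the symbols defined above (t, α, β, N), with p added here as a label for the true prevalence. The first line is the law of total probability for a positive result, the second solves it for p, and the third gives the variance under the assumption that α and β are known constants rather than estimates.

```latex
\begin{align}
t &= \alpha\,p + (1-\beta)(1-p) \\
\mathrm{PTP}_{\mathrm{Est}} &= \frac{t + \beta - 1}{\alpha + \beta - 1} \\
\operatorname{Var}\!\left(\mathrm{PTP}_{\mathrm{Est}}\right) &= \frac{t\,(1-t)}{N\,(\alpha + \beta - 1)^{2}}
\end{align}
```

The estimator is defined only when α + β > 1, that is, when the test performs better than chance.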

Method Application

Our model relies on cost-efficacy analysis to determine the PTP values (PTPLow, PTPHigh) for decision making. It is applicable whenever a reasonable cost efficacy analysis estimate is available. In its most general application one would evaluate the testing practices for an entire institution by determining if test outcomes are consistent with PTPs in the cost effective range. Actual PTPs and expected cost-effective ideal ranges would be reported as feedback to ordering providers. This feedback does not specify which specific ordering actions were noncompliant, but instead provides general feedback relating test orders to the results of those tests, allowing clinicians to re-calibrate their internal PTPs. There are several logical alternative applications of this method. Instead of stratifying the test results by ordering provider, they could be stratified by other variables, including disease risk factors or patient care setting with different PTP cutoffs for each group. For example, a provider who works in a clinic and an intensive care unit could receive separate feedback for each setting. For comparisons with smaller test volumes, the estimate of PTP is not biased because it normalizes test volume by using the positive rate (number of positives/total). However, the variance of PTPEst decreases with increasing sample size, which makes the conclusion of appropriate or inappropriate utilization management more robust with large samples. As physicians or health systems incorporate feedback about PTP in the context of their risk aversion strategies and situation-specific exposure to clinical scenarios, this feedback will inform their future decision making strategy without assigning blame to a specific clinical decision. Our model does not attempt to quantify the specific factors involved in the decision-making, but rather infers the provider's decision-making strategy from the estimate of PTP. 
In summary, we present a Bayesian approach to test utilization management that relates closely to decision-making theory.
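The feedback calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rogan_gladen`, the clipping of the estimate to the range 0 to 1, and the provider counts are assumptions made for the example, and the standard error treats sensitivity and specificity as known constants.

```python
import math


def rogan_gladen(positives, total, sensitivity, specificity):
    """Return (PTP estimate, standard error) from test counts.

    Assumes sensitivity and specificity are known constants.
    The estimate is clipped to [0, 1] (an assumption of this sketch),
    because a positive rate below the false-positive rate would
    otherwise yield a negative prevalence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    t = positives / total  # proportion of positive tests
    raw = (t + specificity - 1.0) / denom
    estimate = min(1.0, max(0.0, raw))
    std_err = math.sqrt(t * (1.0 - t) / total) / denom
    return estimate, std_err


# Hypothetical counts for two providers with the same positive rate
# but different test volumes (invented for illustration only).
small = rogan_gladen(4, 100, sensitivity=0.95, specificity=0.99)
large = rogan_gladen(400, 10000, sensitivity=0.95, specificity=0.99)
print(small, large)
```

Both providers receive the same PTP estimate because the positive rate is identical, while the standard error for the higher-volume provider is ten times smaller, which is the sense in which larger samples make the utilization judgment more robust.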

Example Method Application

We sought to observe the appropriate or inappropriate utilization of a diagnostic test when applied to a cohort stratified by disease risk. This allowed us to analyze how the decision to test varied by patient risk factors. We selected a disease with known risk factors and its corresponding diagnostic test. Enterovirus (EV) causes seasonal meningitis that primarily affects infants during summer and autumn (June to November).[26] A common test for its diagnosis is real-time reverse transcriptase polymerase chain reaction (PCR) for EV RNA from cerebrospinal fluid (CSF) (EV-PCR). We obtained the test results from a cohort of patients, stratified them by age and month of testing, and determined if appropriate or inappropriate testing occurred with our model for each stratum. We then looked for patterns of utilization across the risk-stratified cohort to qualitatively determine the effect of population risk on the physician's decision to test.

Variables: Positive rate and disease risk factors

Two tertiary-care hospitals (Yale-New Haven Hospital, New Haven, CT and the University of Washington Medical Center, Seattle, WA) and one national reference laboratory (ARUP Laboratories, Salt Lake City, UT) retrospectively contributed three data elements for each EV-PCR test performed between 2010 and 2012: the test result, patient age at the time of testing, and the month of order. Each institution employed a similar test methodology. The study used anonymized data and was determined to be human subjects exempt at participating institutions.

Constants: Low pretest probability, high pretest probability, sensitivity, and specificity

We obtained sensitivity, specificity, PTPLow, and PTPHigh through a systematic literature review. The review searched Medline with “EV [Mesh] NOT polio NOT poliovirus + English-only + Humans-only + Journal categories: Core clinical journals.” An author (RH) screened the resulting publications in two steps, first by title and abstract then by full text, to identify cost-efficacy studies with PTPLow, PTPHigh, sensitivity and specificity.

Analysis

We analyzed the data according to our model in steps. First, we stratified the test results by patient age and month of the order. The categories for age were < 1, 1, 2, 3–10, 11–20, 21–30, 31–40, 41–50, and > 50 years. We arrived at these age categories by modification of an age interval published by the Centers for Disease Control and Prevention on EV.[26] Second, we calculated the PTP (PTPEst) for each stratum. Third, we compared PTPEst to cost-effectiveness recommendations for testing (PTPLow, PTPHigh). We labeled as appropriate utilization each stratum with PTPEst between PTPLow and PTPHigh. Otherwise, the stratum received the label of inappropriate utilization. We performed the analysis separately for each individual site and again with data combined from all three sites.
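The three analysis steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's code: ages are taken as completed years, the record layout `(age_years, month, is_positive)` and all function names are hypothetical, the estimate is clipped to the range 0 to 1, the sensitivity and specificity defaults are the values reported in the Results, and the upper cutoff passed in the example is a placeholder because the text reports no PTPHigh.

```python
from collections import defaultdict

# Age categories from the text, as (low, high, label) in completed years.
AGE_BINS = [(0, 0, "<1"), (1, 1, "1"), (2, 2, "2"), (3, 10, "3-10"),
            (11, 20, "11-20"), (21, 30, "21-30"), (31, 40, "31-40"),
            (41, 50, "41-50")]


def age_bin(age_years):
    """Map an age in completed years to its age category."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_BINS:
        if low <= age_years <= high:
            return label
    return ">50"


def stratify(records):
    """Step 1: count [positives, total] per (age category, month)."""
    counts = defaultdict(lambda: [0, 0])
    for age_years, month, is_positive in records:
        cell = counts[(age_bin(age_years), month)]
        cell[0] += int(is_positive)
        cell[1] += 1
    return dict(counts)


def ptp_est(positives, total, sensitivity=0.95, specificity=0.99):
    """Step 2: Rogan and Gladen estimate, clipped to [0, 1]."""
    t = positives / total
    raw = (t + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, raw))


def label(ptp, ptp_low, ptp_high):
    """Step 3: appropriate if the estimate lies within the cutoffs."""
    return "appropriate" if ptp_low <= ptp <= ptp_high else "inappropriate"


# Invented records for illustration only.
records = [(0, 7, True), (0, 7, False), (0, 7, False), (0, 7, False),
           (35, 1, False), (35, 1, False), (60, 1, False)]
strata = stratify(records)
# ptp_high=0.5 is a placeholder; the text does not report PTPHigh.
labels = {key: label(ptp_est(pos, n), ptp_low=0.059, ptp_high=0.5)
          for key, (pos, n) in strata.items()}
print(labels)
```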

Sensitivity analysis

To explore the robustness of the outcome, we repeated the analysis after modifying the test performance and cohort stratification. We altered the performance of the test by an increase in both sensitivity and specificity, and, in a separate trial, a decrease in both sensitivity and specificity. We also changed the data stratification by aggregating by season instead of month, and again by aggregating age at different intervals: <1, 1–4, 5–9, 10–19, 20–44, ≥45 years.[26]

RESULTS

Constants: Low Pretest Probability, High Pretest Probability, Sensitivity and Specificity

To identify the constants required by our model, we screened the 642 publications returned by our Medline search. We filtered the publications to 14 with the title and abstract review. We found one cost efficacy study after full-text review.[27] The study assessed the use of EV-PCR in infants with fever and CSF pleocytosis admitted from an emergency department. It estimated sensitivity at 95%, specificity at 99%, and the PTPLow to exist between 5.9% (PTPLow1) and 12.8% (PTPLow2). It did not provide a value for PTPHigh.

Comparison to Recommendations

We collected 16,648 samples from Yale-New Haven (n = 1197), University of Washington Medical Center (n = 1000) and ARUP Laboratories (n = 14451). ARUP samples originated from 608 different hospitals and clinics across the United States. After stratifying the samples by age and month, we calculated the utilization management metrics of volume, positive rate, and PTPEst as shown in Table 1. We label each stratum as overuse (PTPEst < PTPLow1 = 5.9%), equivocal (5.9% = PTPLow1 ≤ PTPEst ≤ PTPLow2 = 12.8%), or not overuse (12.8% = PTPLow2 < PTPEst) in Figure 2. Overall we observed testing with overuse or equivocal benefit in 81 of the 108 strata (70.8% of total tests).
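The three-way labeling can be sketched with the constants reported above (sensitivity 95%, specificity 99%, PTPLow1 5.9%, PTPLow2 12.8%). The function names, the clipping of the estimate to the range 0 to 1, and the example positive rates are assumptions made for illustration. The helper `positive_rate_at` inverts the estimator to show which observed positive rate corresponds to each cutoff.

```python
SENS, SPEC = 0.95, 0.99            # test performance reported in the text
PTP_LOW1, PTP_LOW2 = 0.059, 0.128  # cutoffs reported in the text


def ptp_from_positive_rate(t):
    """Rogan and Gladen estimate, clipped to [0, 1] (sketch assumption)."""
    raw = (t + SPEC - 1.0) / (SENS + SPEC - 1.0)
    return min(1.0, max(0.0, raw))


def positive_rate_at(ptp):
    """Expected positive rate when the true pretest probability is ptp."""
    return SENS * ptp + (1.0 - SPEC) * (1.0 - ptp)


def classify(ptp):
    """Label a stratum by comparing its estimate to the two cutoffs."""
    if ptp < PTP_LOW1:
        return "overuse"
    if ptp <= PTP_LOW2:
        return "equivocal"
    return "not overuse"


# Illustrative positive rates (not taken from Table 1).
for rate in (0.039, 0.10, 0.20):
    print(rate, classify(ptp_from_positive_rate(rate)))

# Positive rates that correspond to the two cutoffs.
print(positive_rate_at(PTP_LOW1), positive_rate_at(PTP_LOW2))
```

Under these constants, a stratum needs a positive rate of roughly 6.5% to reach PTPLow1 and roughly 13.0% to exceed PTPLow2.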

Table 1

Test results, positive rate and PTPEst by age and month. For each cell the top line represents positive results/total, and the second line the positive rate. We calculated the PTPEst in the third line with a sensitivity of 95% and a specificity of 99%. PTP: Pretest probability

Figure 2

Comparison of clinician behavior to cost-efficacy analysis recommendations. We compared the pretest probability estimates (PTPEst) from Table 1 to cost efficacy recommendations (PTPLow1, PTPLow2) to determine overuse (PTPEst < PTPLow1 = 5.9%), equivocal benefit (5.9% = PTPLow1 ≤ PTPEst ≤ PTPLow2 = 12.8%), or not overuse (12.8% = PTPLow2 < PTPEst)

Testing for EV occurred in low risk groups including people age 21 or older and the months of December to May. People 21 or older represented 58.2% of total tests, had a positive rate of 3.9%, a PTPEst of 3.1%, and 44 of their 48 strata (54.2% of total tests) received a label of overuse or equivocal. Tests ordered from December to May represented 41.9% of total tests, had a positive rate of 4.5%, a PTPEst of 3.7%, and 50 of their 54 strata (40.5% of total tests) received a label of overuse or equivocal. In contrast, the highest value testing occurred in people under 21 years of age and between June and November. They produced 28.0% of total tests, had a positive rate of 6.6%, a PTPEst of 5.9%, and 11 of their 30 strata (1.7% of total tests) had a label of overuse or equivocal. We noticed an exception to the general age trend at 1 year of age. One-year-olds represented 2.1% (n = 357) of total tests, had a positive rate of 0.6%, a PTPEst of 0%, and all 12 monthly strata received a label of overuse or equivocal. We observed the trends for season and age, including the trend for 1 year of age, across the three sites.

Sensitivity Analysis

For sensitivity analysis, we altered several constant parameters [Table 2]. To increase the performance of the test, we changed the sensitivity from 95% to 100% and specificity from 99% to 100%. We also decreased test performance by decreasing sensitivity from 95% to 92% and specificity from 99% to 97%. We changed the stratification of order date from month to season, and the stratification of age (original: <1, 1, 2, 3–10, 11–20, 21–30, 31–40, 41–50, >50; altered: <1, 1–4, 5–9, 10–19, 20–44, ≥45). We observed the largest change in tests labeled overuse or equivocal, a 2.7% decrease (32% to 29.3%), when altering the age stratification. The alternative age stratification grouped 1-year-olds, a stratum shown to have low-value testing, with ages one to four.
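The effect of the test-performance assumptions on a single estimate can be illustrated with a short sweep. This is a sketch rather than a reproduction of Table 2: the scenario names and function names are hypothetical, the estimate is clipped to the range 0 to 1 as an assumption, and the input is the 3.9% positive rate reported above for patients aged 21 or older.

```python
# (sensitivity, specificity) pairs described in the sensitivity analysis.
SCENARIOS = {
    "improved": (1.00, 1.00),
    "baseline": (0.95, 0.99),
    "reduced": (0.92, 0.97),
}


def ptp_est(t, sensitivity, specificity):
    """Rogan and Gladen estimate, clipped to [0, 1] (sketch assumption)."""
    raw = (t + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, raw))


def sweep(positive_rate):
    """Recompute the estimate under each test-performance scenario."""
    return {name: ptp_est(positive_rate, sens, spec)
            for name, (sens, spec) in SCENARIOS.items()}


result = sweep(0.039)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For this input the estimate stays below the 5.9% cutoff in all three scenarios, so the label for such a group would not change with the assumed test performance.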

Table 2

Sensitivity analysis of percent overuse

DISCUSSION

We have created a unique approach to providing feedback on physicians' ordering habits. Our method derives from decision-making theory, and we base the conclusion of appropriate or inappropriate testing on cost-efficacy analysis. Unlike a rule-based approach, which dictates a particular physician decision in a given medical context, this method evaluates patterns of utilization, allowing the physician to tailor their decision-making strategy to the clinical context. It also differs from rule-based utilization management in that it evaluates the consequence of the decision to test, specifically the positive or negative result, rather than the context in which the physician made the choice. Because it uses retrospective test results, the information needed by the model exists in nearly all clinical laboratories without the need for manual chart review. The method for utilization management presented here demonstrates a basic concept upon which future authors could expand. For example, we focused on a prototypical test, a test with a binary result and a diagnostic interpretation. A variation of our model may apply to tests informing prognosis or monitoring for side-effects of therapy. Another model variation could measure the significance of sample size differences among the strata as well as uncertainty in the sensitivity and specificity.[28] For a situation where a cost-efficacy analysis does not exist, our approach could be adapted to identify variations in utilization among groups. As with other statistical models, random variation may influence the model's conclusions. For example, a physician could test a population with a high PTP of disease, yet by chance no patient might have a positive result. The PTPEst would then appear low because it depends on the proportion of positive results, and the physician would appear to over-utilize the test. Fortunately, the probability of this scenario decreases as the sample size increases.
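How quickly that probability falls can be sketched numerically. The example assumes independent tests, uses the sensitivity and specificity reported in the Results, and takes a hypothetical true PTP of 20%. The function name is illustrative.

```python
def prob_no_positives(ptp, n, sensitivity=0.95, specificity=0.99):
    """Probability that n independent tests all return negative.

    The chance of a positive result on any one test combines true
    positives and false positives (law of total probability).
    """
    t = sensitivity * ptp + (1.0 - specificity) * (1.0 - ptp)
    return (1.0 - t) ** n


# Hypothetical provider testing patients with a true PTP of 20%.
for n in (5, 20, 50):
    print(n, prob_no_positives(0.20, n))
```

With these inputs the chance of seeing no positive result is about one in three after 5 tests and becomes negligible by 50 tests.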
The increase in sample size decreases the variance of PTPEst. Each patient in the cohort of tested patients has an individual PTP, and the distribution of the PTP among the cohort represents a second important factor in the interpretation of the model. In a large sample, the PTPEst reflects the average PTP of the patients tested, but it does not provide information on the distribution of the patients’ PTP. Therefore, if PTPEst < PTPLow in a large population, it would be incorrect to conclude that the majority of tests occurred in patients with too low a PTP. Such a conclusion would assume that PTPEst represents the population median, which it does not. Instead, the conclusion could state that the average PTP of the cohort had a value less than the lower limit of the suggested PTP range. The systematic literature review to find a cost-efficacy analysis for EV-PCR found one paper that may overestimate the benefit of EV-PCR because of certain assumptions. First, the paper assumes a positive EV-PCR can rule out bacterial meningitis faster than the gold standard diagnosis for bacterial meningitis, CSF culture. Second, it assumes EV-PCR had a turnaround time of 1 day. Third, patients with confirmed EV meningitis were assumed discharged in 1 day. When evaluated at one of the study's sites, three cases of positive CSF culture had a result within 1 day, while only half (15/30) of the positive cases of EV meningitis met the 1-day turnaround time and 1-day discharge time. The submission of a sample to a reference laboratory, as occurred in the majority of samples in this study, would presumably require longer than 1 day. Because our method views decisions over time, not in the moment, it can separate the utilization metric (PTPEst) from the judgment of the metric (i.e. appropriate, inappropriate). The modularity of our approach allows a future cost-efficacy study to re-evaluate our conclusions with different PTP cutoff values (PTPLow, PTPHigh).
Similarly, a health care center could retrospectively trend its utilization knowing only its test results. We demonstrate our approach to utilization on a large retrospective cohort of patients tested by real-time reverse transcriptase EV-PCR. The sample, gathered over multiple years from sites across the United States, represents one of the largest published EV cohorts. By stratifying the cohort across age and season, we determined specific patient subgroups receiving low-value testing. As the next step, we could then provide quantitative feedback to inform physician decision making on individual patients with population-based concerns of resource allocation.

26 in total

1. Heuristics and biases: selected errors in clinical reasoning.

Authors: A S Elstein
Journal: Acad Med Date: 1999-07 Impact factor: 6.893

2. Choosing wisely: helping physicians and patients make smart decisions about their care.

Authors: Christine K Cassel; James A Guest
Journal: JAMA Date: 2012-04-04 Impact factor: 56.272

3. Utilization management in a large urban academic medical center: a 10-year experience.

Authors: Ji Yeon Kim; Walter H Dzik; Anand S Dighe; Kent B Lewandrowski
Journal: Am J Clin Pathol Date: 2011-01 Impact factor: 2.493

4. Comparison of administrative-only versus administrative plus chart review data for reporting HEDIS hybrid measures.

Authors: L Gregory Pawlson; Sarah Hudson Scholle; Anne Powers
Journal: Am J Manag Care Date: 2007-10 Impact factor: 2.229

5. The complexity of disease combinations in the Medicare population.

Authors: James Sorace; Hui-Hsing Wong; Chris Worrall; Jeffrey Kelman; Shahin Saneinejad; Thomas MaCurdy
Journal: Popul Health Manag Date: 2011-01-17 Impact factor: 2.459

Review 6. Do we now know what inappropriate laboratory utilization is? An expanded systematic review of laboratory clinical audits.

Authors: Ronald G Hauser; Brian H Shirts
Journal: Am J Clin Pathol Date: 2014-06 Impact factor: 2.493