Wenbo Wu¹˒², Yuan Yang³, Jian Kang¹˒², Kevin He¹˒²
Abstract
Provider profiling has been recognized as a useful tool for monitoring health care quality, facilitating inter-provider care coordination, and improving medical cost-effectiveness. Existing methods often use generalized linear models with fixed provider effects, especially when profiling dialysis facilities. As the number of providers under evaluation grows, the computational burden becomes formidable even for specially designed workstations. To address this challenge, we introduce a serial blockwise inversion Newton algorithm that exploits the block structure of the information matrix. A shared-memory divide-and-conquer algorithm is proposed to further boost computational efficiency. Beyond the computational challenge, the current literature lacks an appropriate inferential approach for detecting providers with outlying performance, especially when small providers with extreme outcomes are present. In this setting, traditional score and Wald tests, which rely on large-sample distributions of the test statistics, give inaccurate approximations of the small-sample properties. To address this inferential issue, we develop an exact test of provider effects based on exact finite-sample distributions, with the Poisson-binomial distribution as a special case when the outcome is binary. Simulation analyses demonstrate improved estimation and inference over existing methods. The proposed methods are applied to profiling dialysis facilities based on emergency department encounters, using a dialysis patient database from the Centers for Medicare & Medicaid Services.
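To illustrate the idea behind the serial blockwise inversion Newton algorithm named in the abstract, here is a minimal NumPy sketch for a logistic model with one fixed effect per provider, logit P(Y_ij = 1) = γ_i + x_ijᵀβ (this model form is an assumption for illustration). The provider-by-provider block of the information matrix is diagonal, so each Newton step only needs a k × k solve for β through the Schur complement, followed by a cheap back-substitution for every γ_i; the linear-algebra cost therefore grows linearly in the number of providers rather than cubically. The function names are hypothetical, this is not the authors' Rcpp/RcppArmadillo implementation, and it omits safeguards such as step halving and special handling of providers whose outcomes are all 0 or all 1.

```python
import numpy as np


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def blockwise_newton_step(y, X, prov, gamma, beta):
    """One Newton step for logit P(y=1) = gamma[prov] + X @ beta (assumed model),
    solved by blockwise inversion of the information matrix."""
    m, k = gamma.shape[0], X.shape[1]
    p = expit(gamma[prov] + X @ beta)
    w = p * (1.0 - p)                      # Bernoulli variances (information weights)
    r = y - p                              # residuals
    # Score vectors for provider effects and covariate coefficients.
    u_gamma = np.bincount(prov, weights=r, minlength=m)
    u_beta = X.T @ r
    # Information blocks: A = diag(a) (m x m), B (m x k), C (k x k).
    a = np.bincount(prov, weights=w, minlength=m)
    B = np.zeros((m, k))
    np.add.at(B, prov, w[:, None] * X)     # row i: sum_j w_ij * x_ij
    C = X.T @ (w[:, None] * X)
    # Schur complement of the diagonal block: S = C - B^T A^{-1} B.
    S = C - B.T @ (B / a[:, None])
    d_beta = np.linalg.solve(S, u_beta - B.T @ (u_gamma / a))
    # Back-substitution: d_gamma = A^{-1} (u_gamma - B d_beta).
    d_gamma = (u_gamma - B @ d_beta) / a
    return gamma + d_gamma, beta + d_beta


def fit_provider_logit(y, X, prov, m, tol=1e-8, max_iter=100):
    """Iterate blockwise Newton steps from zero until the update is below tol."""
    gamma, beta = np.zeros(m), np.zeros(X.shape[1])
    for _ in range(max_iter):
        gamma_new, beta_new = blockwise_newton_step(y, X, prov, gamma, beta)
        step = max(np.max(np.abs(gamma_new - gamma)), np.max(np.abs(beta_new - beta)))
        gamma, beta = gamma_new, beta_new
        if step < tol:
            break
    return gamma, beta
```

A quick sanity check is that one blockwise step coincides with a Newton step obtained by solving the full (m + k)-dimensional system directly, and that the score is numerically zero at convergence.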
Keywords: Poisson-binomial distribution; divide-and-conquer; emergency department encounters; exact test; parallel computing
Year: 2022 PMID: 35318706 PMCID: PMC9314652 DOI: 10.1002/sim.9387
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497
FIGURE 1. (1) Runtime of SerBIN and glm with provider counts varying from 100 to 2000 (left). To accommodate large provider counts for glm, experiments were conducted on an Intel® Xeon® Gold 6254 quad‐processor with a base frequency of 3.1 GHz and 576 GB of RAM. SerBIN was implemented using Rcpp and RcppArmadillo. , , Three covariates were included in model fitting with . The vertical axis is on a base‐10 log scale. (2) Runtime of SerBIN and BAN with provider counts varying from 2000 to 8000 (middle). Experiments were conducted on an Intel® Core™ i9‐9900K processor with a base frequency of 3.6 GHz and 16 GB of RAM. BAN was implemented using Rcpp and RcppArmadillo. A design matrix of 100 covariates was drawn based on (6) and then dichotomized column‐wise at the column median. Regression parameters were jointly sampled from a standard multivariate normal distribution. (3) Speedup of DACBIN relative to SerBIN with various thread and provider counts (right). The speedup with a given number of threads is defined as the ratio of the runtime of SerBIN to the runtime of DACBIN. Experiments were conducted on the Intel® Core™ i9‐9900K processor with 100 covariates generated as in (2). DACBIN was implemented using Rcpp and RcppArmadillo.
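The divide-and-conquer idea behind DACBIN can be sketched in the same spirit: given β, the provider blocks are independent, so providers can be split into chunks whose contributions to the Schur complement and to the reduced score are computed concurrently and then summed before the single k × k solve. This is one possible partition scheme, assumed for illustration and not necessarily the authors'. The sketch assumes rows are sorted by provider ID (0 to m - 1) and uses Python threads from `concurrent.futures` only to show the structure; speedups like those in panel (3) depend on a compiled shared-memory implementation. `dac_newton_step` and `_chunk_terms` are hypothetical names.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def _chunk_terms(y, X, prov_local, m_local, gamma_local, beta):
    """Newton-system pieces for the providers (re-indexed 0..m_local-1) in one chunk."""
    p = expit(gamma_local[prov_local] + X @ beta)
    w, r = p * (1.0 - p), y - p
    a = np.bincount(prov_local, weights=w, minlength=m_local)       # diagonal block
    u_gamma = np.bincount(prov_local, weights=r, minlength=m_local)
    B = np.zeros((m_local, X.shape[1]))
    np.add.at(B, prov_local, w[:, None] * X)                       # gamma-beta block
    C = X.T @ (w[:, None] * X)
    S_part = C - B.T @ (B / a[:, None])       # chunk's share of the Schur complement
    v_part = X.T @ r - B.T @ (u_gamma / a)    # chunk's share of the reduced score
    return a, u_gamma, B, S_part, v_part


def dac_newton_step(y, X, prov, gamma, beta, n_chunks=4):
    """One Newton step. Assumption: prov is sorted and takes values 0..m-1."""
    m = gamma.shape[0]
    bounds = np.linspace(0, m, n_chunks + 1).astype(int)   # provider range per chunk
    rows = np.searchsorted(prov, bounds)                  # matching row ranges

    def work(c):
        lo, hi, r0, r1 = bounds[c], bounds[c + 1], rows[c], rows[c + 1]
        return _chunk_terms(y[r0:r1], X[r0:r1], prov[r0:r1] - lo, hi - lo,
                            gamma[lo:hi], beta)

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(work, range(n_chunks)))

    # Combine: the reduced beta system is the sum of the chunk contributions.
    S = sum(part[3] for part in parts)
    v = sum(part[4] for part in parts)
    d_beta = np.linalg.solve(S, v)
    # Back-substitute provider effects chunk by chunk (done serially here for brevity).
    d_gamma = np.concatenate([(u_g - B @ d_beta) / a for a, u_g, B, _, _ in parts])
    return gamma + d_gamma, beta + d_beta
```

Because the chunk contributions are simply added, the result does not depend on the number of chunks, which makes a convenient correctness check against a single-chunk run.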
FIGURE 2. Type I error rates and power of the exact, score, and Wald tests. All values were calculated from 1000 independent replicates with , , and significance level . With the correlation varying from 0 to 0.9, the rates in Panel A were obtained assuming . In Panel B, the correlation was fixed at , whereas is allowed to vary in terms of relative deviation.
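For a binary outcome, the abstract notes that the exact finite-sample null distribution of a provider's event count is Poisson-binomial: under the null hypothesis, each subject has its own event probability, and the observed total is a sum of independent, non-identical Bernoulli variables. A minimal sketch follows. It assumes the null probabilities are supplied from a fitted model with the provider effect set to the national norm, and it forms a two-sided p-value by doubling the smaller tail; both choices are illustrative assumptions and may differ from the paper's exact construction. The function names are hypothetical.

```python
import numpy as np


def poisson_binomial_pmf(probs):
    """Exact pmf of O = sum of independent Bernoulli(probs[j]), by repeated convolution."""
    pmf = np.array([1.0])
    for q in probs:
        pmf = np.convolve(pmf, [1.0 - q, q])   # add one Bernoulli trial: O(n^2) overall
    return pmf


def exact_provider_test(y_obs, null_probs, alpha=0.05):
    """Exact test of one provider's event count against its null distribution.

    Assumption: null_probs are per-subject event probabilities under H0
    (provider effect equal to the national norm), taken from a fitted model.
    """
    pmf = poisson_binomial_pmf(null_probs)
    o = int(np.sum(y_obs))
    lower = pmf[: o + 1].sum()      # P(O <= o): fewer events than expected
    upper = pmf[o:].sum()           # P(O >= o): more events than expected
    p_two_sided = min(1.0, 2.0 * min(lower, upper))   # assumed doubling rule
    if lower < alpha / 2:
        flag = "better"
    elif upper < alpha / 2:
        flag = "worse"
    else:
        flag = "expected"
    return p_two_sided, flag
```

Unlike score and Wald statistics, these tail probabilities do not rely on a normal approximation, which is why they remain well defined for small providers with extreme outcomes (for example, zero events).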
FIGURE 3. Coverage probability (CP) vs correlation with varying levels of provider effect . In each scenario, 1000 data sets were simulated with providers, with the first provider having subjects.
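Coverage probability is the proportion of simulated data sets in which the confidence interval contains the true parameter value. The sketch below computes it and checks the computation on a toy case with a known answer: intervals for a normal mean with known variance, which have exact 95% coverage. The toy design is an assumption for illustration and is not the simulation design of Figure 3; `coverage_probability` is a hypothetical name.

```python
import numpy as np


def coverage_probability(lower, upper, truth):
    """Fraction of replicates whose interval [lower, upper] contains the true value."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return float(np.mean((lower <= truth) & (truth <= upper)))


# Toy check (assumed setup): known-variance normal-mean intervals have exact 95% coverage.
rng = np.random.default_rng(2022)
n_rep, n, mu = 5000, 30, 0.3
means = rng.normal(mu, 1.0, size=(n_rep, n)).mean(axis=1)
half_width = 1.959963984540054 / np.sqrt(n)
cp = coverage_probability(means - half_width, means + half_width, mu)
```

With 5000 replicates, the Monte Carlo standard error of an estimated 95% coverage is about 0.003, so values well outside 0.94 to 0.96 would signal a problem with the interval rather than simulation noise.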
Summary of model fitting for risk factors (binary) with 2018‐2019 ED visits data (reference group in parentheses)
| Risk factor | Count | Proportion | OR | SE | Z‐stat | P‐value | LB | UB |
|---|---|---|---|---|---|---|---|---|
| Year 2018 | 381 400 | 50.4% | 0.970 | 0.007 | | | 0.958 | 0.982 |
| Female | 358 157 | 47.3% | 1.015 | 0.008 | 1.932 | 0.053 | 1.000 | 1.031 |
| Diabetes as cause of ESRD | 371 643 | 49.1% | 0.998 | 0.008 | | 0.785 | 0.983 | 1.013 |
| Cardiogenic shocks | 99 201 | 13.1% | 0.879 | 0.010 | | | 0.862 | 0.896 |
| Age in years (60‐74) | | | | | | | | |
| 18‐24 | 4034 | 0.5% | 1.542 | 0.042 | 10.330 | | 1.420 | 1.674 |
| 25‐44 | 87 330 | 11.5% | 1.346 | 0.012 | 25.506 | | 1.315 | 1.377 |
| 45‐59 | 204 969 | 27.1% | 1.176 | 0.008 | 19.025 | | 1.156 | 1.195 |
| | 154 396 | 20.4% | 0.954 | 0.010 | | | 0.936 | 0.973 |
| BMI (18.5‐25) | | | | | | | | |
| | 22 708 | 3.0% | 1.010 | 0.020 | 0.520 | 0.603 | 0.971 | 1.051 |
| 25‐30 | 198 852 | 26.3% | 1.002 | 0.009 | 0.214 | 0.831 | 0.984 | 1.020 |
| | 346 225 | 45.7% | 0.982 | 0.009 | | 0.033 | 0.966 | 0.999 |
| Time on ESRD (1‐2 years) | | | | | | | | |
| 91 days to 6 months | 33 355 | 4.4% | 1.121 | 0.018 | 6.337 | | 1.082 | 1.162 |
| 6 months to 1 year | 59 437 | 7.9% | 1.019 | 0.015 | 1.293 | 0.196 | 0.990 | 1.048 |
| 2‐3 years | 98 224 | 13.0% | 1.001 | 0.012 | 0.049 | 0.961 | 0.976 | 1.025 |
| 3‐5 years | 160 276 | 21.2% | 1.009 | 0.011 | 0.833 | 0.405 | 0.987 | 1.031 |
| | 296 878 | 39.2% | 1.007 | 0.010 | 0.626 | 0.531 | 0.986 | 1.027 |
| LOHS (1st quartile) | | | | | | | | |
| 2nd quartile | 230 587 | 30.5% | 0.945 | 0.009 | | | 0.930 | 0.961 |
| 3rd quartile | 131 203 | 17.3% | 0.923 | 0.010 | | | 0.905 | 0.942 |
| 4th quartile | 196 958 | 26.0% | 0.910 | 0.009 | | | 0.894 | 0.927 |
| NHS (0 day) | | | | | | | | |
| 1‐89 days | 131 289 | 17.3% | 0.943 | 0.010 | | | 0.925 | 0.960 |
| 90‐365 days | 78 628 | 10.4% | 0.859 | 0.012 | | | 0.839 | 0.881 |
Note: LB and UB stand for lower and upper bounds of the 95% confidence intervals. A complete list of risk factors with summary statistics is available in Appendix E of the Supplementary Information.
Abbreviations: BMI, body mass index; ESRD, end‐stage renal disease; LOHS, length of hospital stay; NHS, nursing home stay (past 365 days); OR, odds ratio; PC, prevalent comorbidity; SE, standard error; Z‐stat, Z‐statistic (ratio of coefficient estimate to SE).
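The columns of the risk-factor table are linked by standard formulas for a logistic-regression coefficient β̂ on the log-odds scale: OR = exp(β̂), Z = β̂ / SE, a two-sided normal P-value of 2(1 - Φ(|Z|)), and 95% limits exp(β̂ ± 1.96 SE). The sketch below assumes, consistent with the Z-statistics reported, that the SE column refers to the log-odds coefficient; the function names are hypothetical. Because the table entries are rounded, recomputing the confidence limits from the printed OR and SE reproduces them only approximately.

```python
import math


def two_sided_p(z):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def summarize_log_odds(beta, se, z_crit=1.959963984540054):
    """Table-style summaries for one coefficient on the log-odds scale.

    Assumption: se is the standard error of beta (log-odds scale), not of the OR.
    """
    z = beta / se
    return {
        "OR": math.exp(beta),
        "Z": z,
        "P": two_sided_p(z),
        "LB": math.exp(beta - z_crit * se),   # lower 95% limit for the OR
        "UB": math.exp(beta + z_crit * se),   # upper 95% limit for the OR
    }
```

Applied to the reported Z-statistics, `two_sided_p` reproduces the printed P-values to three decimals; for example, Z = 1.932 for Female gives P ≈ 0.053.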
FIGURE 4. A matrix of histograms and scatter plots of test statistics using 2018‐2019 ED visits data. Facilities are stratified by ED visit rate or discharge count. Dashed lines represent the 2.5% and 97.5% quantiles of the standard normal distribution; solid black lines are 45‐degree lines.
Facility flagging (count/proportion) based on exact, score and Wald tests at significance level using 2018‐2019 ED visits data
| Exact test | Score: better | Score: expected | Score: worse | Total | Wald: better | Wald: expected | Wald: worse |
|---|---|---|---|---|---|---|---|
| better | 426/5.89% | 63/0.87% | 0/0% | 489/6.76% | 366/5.06% | 123/1.70% | 0/0% |
| expected | 0/0% | 6024/83.30% | 82/1.13% | 6106/84.43% | 0/0% | 6079/84.06% | 27/0.37% |
| worse | 0/0% | 0/0% | 637/8.81% | 637/8.81% | 0/0% | 10/0.13% | 627/8.67% |
| Total | 426/5.89% | 6087/84.17% | 719/9.94% | 7232/100% | 366/5.06% | 6212/85.90% | 654/9.04% |
Note: “better” indicates that the facility effect is significantly less than the national norm; “worse” indicates that the facility effect is significantly greater than the national norm; “expected” means that the facility effect is not significantly different from the national norm.
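For a test statistic on the Z scale (score or Wald), the three-way classification in the note above amounts to comparing the statistic with standard normal quantiles, as marked by the dashed lines in Figure 4. The sketch below assumes α = 0.05, because the significance level is missing from the extracted text. It also computes the overall agreement between two tests as the share of facilities on the diagonal of a cross-tabulation such as the exact-versus-score block above. `flag` and `agreement` are hypothetical helper names.

```python
import numpy as np
from statistics import NormalDist


def flag(z, alpha=0.05):
    """Classify a facility from a Z-scale statistic (assumed two-sided level alpha)."""
    c = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    if z < -c:
        return "better"     # effect significantly below the national norm
    if z > c:
        return "worse"      # effect significantly above the national norm
    return "expected"


def agreement(table):
    """Share of units classified identically by two tests (diagonal / total)."""
    t = np.asarray(table, dtype=float)
    return float(np.trace(t) / t.sum())
```

Using the exact-versus-score counts from the table (rows and columns ordered better, expected, worse), the agreement is 7087/7232, or about 98.0%; all disagreements are adjacent categories, with no facility flagged "better" by one test and "worse" by the other.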
Exact‐test based facility flagging (count/proportion) with and without empirical null (EN) adjustment at significance level using 2018‐2019 ED visits data
| Exact test without EN | With EN: better | With EN: expected | With EN: worse | Total |
|---|---|---|---|---|
| better | 140/1.94% | 349/4.82% | 0/0% | 489/6.76% |
| expected | 0/0% | 6106/84.43% | 0/0% | 6106/84.43% |
| worse | 0/0% | 389/5.38% | 248/3.43% | 637/8.81% |
| Total | 140/1.94% | 6844/94.63% | 248/3.43% | 7232/100% |
Note: “better” indicates that the facility effect is significantly less than the national norm; “worse” indicates that the facility effect is significantly greater than the national norm; “expected” means that the facility effect is not significantly different from the national norm.
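The extracted text does not describe how the empirical null (EN) adjustment is carried out. As an assumption, and not as the authors' procedure, one simple and widely used variant is sketched below: estimate the center and spread of the bulk of the facility z-scores robustly (median, and interquartile range divided by 1.349, the IQR of a standard normal), then re-standardize, so that flags are judged against the empirical rather than the theoretical N(0, 1) null. `empirical_null_adjust` is a hypothetical name.

```python
import numpy as np


def empirical_null_adjust(z):
    """Re-standardize z-scores by a robust estimate of the null's center and spread.

    Assumption: the bulk of facilities follow the null, so the median and IQR of
    the z-scores estimate the null's location and scale despite a few outliers.
    """
    z = np.asarray(z, dtype=float)
    center = np.median(z)
    q25, q75 = np.percentile(z, [25, 75])
    scale = (q75 - q25) / 1.3489795003921634   # IQR of N(0, 1)
    return (z - center) / scale, center, scale
```

When the estimated spread exceeds 1, as happens under overdispersion, the adjusted z-scores shrink toward zero and fewer facilities cross the flagging thresholds.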