Md Mahmudul Hasan1, Gary J Young2, Jiesheng Shi1, Prathamesh Mohite1, Leonard D Young3, Scott G Weiner4, Md Noor-E-Alam5.
Abstract
BACKGROUND: Buprenorphine is a widely used treatment option for patients with opioid use disorder (OUD). Premature discontinuation of this treatment has many negative health and societal consequences.
Year: 2021 PMID: 34836524 PMCID: PMC8620531 DOI: 10.1186/s12911-021-01692-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Conceptual framework for the study design
Fig. 2 Sample selection flow chart
Hyperparameter grids considered, and hyperparameters selected, for the machine learning models used to predict buprenorphine treatment discontinuation
| Treatment stage for making prediction | Machine learning model | Given hyperparameters | Selected hyperparameters |
|---|---|---|---|
| First stage models with baseline predictors | Logistic regression | solver: newton-cg, lbfgs, liblinear | solver: newton-cg |
| | | penalty: l1, l2, elasticnet, none | penalty: none |
| | | penalty/regularization strength, C: 0.001, 0.01, 0.1, 1, 10 | C: 0.001 |
| | Decision tree | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 10, 20, 30, 40, 50 | min_samples_leaf: 20 |
| | Random forest | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 5, 10, 20, 30, 40 | min_samples_leaf: 20 |
| | | n_estimators: 100, 200, 300, 400, 500 | n_estimators: 100 |
| | Extreme gradient boosting | learning_rate: 0.0001, 0.001, 0.01, 0.1, 1 | learning_rate: 1 |
| | | max_depth: 10, 20, 30, 40, 50 | max_depth: 40 |
| | Neural network | activation: relu, tanh, sigmoid, hard_sigmoid, linear | activation: relu |
| | | neurons: 10, 50, 100 | neurons: 100 |
| | | optimizer: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | optimizer: Nadam |
| | | epochs: 1, 10 | epochs: 10 |
| | | batch_size: 1000, 2000 | batch_size: 1000 |
| | Support vector machine | degree: 3, 4, 5, 6 | degree: 3 |
| | | gamma: 0.001, 0.01, 0.1 | gamma: 0.1 |
| | | C: 1, 10, 100 | C: 10 |
| Second stage models including 2 months PDC as continuous measure | Logistic regression | solver: newton-cg, lbfgs, liblinear | solver: liblinear |
| | | penalty: l1, l2, elasticnet, none | penalty: l2 |
| | | penalty/regularization strength, C: 0.001, 0.01, 0.1, 1, 10 | C: 0.1 |
| | Decision tree | criterion: gini, entropy | criterion: entropy |
| | | min_samples_leaf: 10, 20, 30, 40, 50 | min_samples_leaf: 40 |
| | Random forest | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 5, 10, 20, 30, 40 | min_samples_leaf: 10 |
| | | n_estimators: 100, 200, 300, 400, 500 | n_estimators: 200 |
| | Extreme gradient boosting | learning_rate: 0.0001, 0.001, 0.01, 0.1, 1 | learning_rate: 1 |
| | | max_depth: 10, 20, 30, 40 | max_depth: 20 |
| | Neural network | activation: relu, tanh, sigmoid, hard_sigmoid, linear | activation: linear |
| | | neurons: 10, 50, 100 | neurons: 100 |
| | | optimizer: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | optimizer: RMSprop |
| | | epochs: 1, 10 | epochs: 10 |
| | | batch_size: 1000, 2000 | batch_size: 1000 |
| | Support vector machine | degree: 3, 4, 5, 6 | degree: 3 |
| | | gamma: 0.001, 0.01, 0.1 | gamma: 0.1 |
| | | C: 1, 10, 100 | C: 100 |
| Second stage models including 3 months PDC as continuous measure | Logistic regression | solver: newton-cg, lbfgs, liblinear | solver: liblinear |
| | | penalty: l1, l2, elasticnet, none | penalty: l2 |
| | | penalty/regularization strength, C: 0.001, 0.01, 0.1, 1, 10 | C: 0.01 |
| | Decision tree | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 10, 20, 30, 40, 50 | min_samples_leaf: 40 |
| | Random forest | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 5, 10, 20, 30, 40 | min_samples_leaf: 10 |
| | | n_estimators: 100, 200, 300, 400, 500 | n_estimators: 100 |
| | Extreme gradient boosting | learning_rate: 0.0001, 0.001, 0.01, 0.1, 1 | learning_rate: 1 |
| | | max_depth: 10, 20, 30, 40 | max_depth: 30 |
| | Neural network | activation: relu, tanh, sigmoid, hard_sigmoid, linear | activation: tanh |
| | | neurons: 10, 50, 100 | neurons: 100 |
| | | optimizer: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | optimizer: Adam |
| | | epochs: 1, 10 | epochs: 10 |
| | | batch_size: 1000, 2000 | batch_size: 1000 |
| | Support vector machine | degree: 3, 4, 5, 6 | degree: 3 |
| | | gamma: 0.001, 0.01, 0.1 | gamma: 0.1 |
| | | C: 1, 10, 100 | C: 100 |
| Second stage models including 2 months PDC as categorical measure | Logistic regression | solver: newton-cg, lbfgs, liblinear | solver: liblinear |
| | | penalty: l1, l2, elasticnet, none | penalty: l2 |
| | | penalty/regularization strength, C: 0.001, 0.01, 0.1, 1, 10 | C: 0.01 |
| | Decision tree | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 10, 20, 30, 40, 50 | min_samples_leaf: 30 |
| | Random forest | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 5, 10, 20, 30, 40 | min_samples_leaf: 10 |
| | | n_estimators: 100, 200, 300, 400, 500, 600 | n_estimators: 500 |
| | Extreme gradient boosting | learning_rate: 0.0001, 0.001, 0.01, 0.1, 1 | learning_rate: 1 |
| | | max_depth: 10, 20, 30, 40 | max_depth: 10 |
| | Neural network | activation: relu, tanh, sigmoid, hard_sigmoid, linear | activation: tanh |
| | | neurons: 10, 50, 100 | neurons: 100 |
| | | optimizer: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | optimizer: RMSprop |
| | | epochs: 1, 10 | epochs: 10 |
| | | batch_size: 1000, 2000 | batch_size: 2000 |
| | Support vector machine | degree: 3, 4, 5, 6 | degree: 3 |
| | | gamma: 0.001, 0.01, 0.1 | gamma: 0.1 |
| | | C: 1, 10, 100 | C: 10 |
| Second stage models including 3 months PDC as categorical measure | Logistic regression | solver: newton-cg, lbfgs, liblinear | solver: liblinear |
| | | penalty: l1, l2, elasticnet, none | penalty: l2 |
| | | penalty/regularization strength, C: 0.001, 0.01, 0.1, 1, 10 | C: 0.1 |
| | Decision tree | criterion: gini, entropy | criterion: gini |
| | | min_samples_leaf: 10, 20, 30, 40, 50 | min_samples_leaf: 40 |
| | Random forest | criterion: gini, entropy | criterion: entropy |
| | | min_samples_leaf: 5, 10, 20, 30, 40 | min_samples_leaf: 10 |
| | | n_estimators: 100, 200, 300, 400, 500, 600 | n_estimators: 500 |
| | Extreme gradient boosting | learning_rate: 0.0001, 0.001, 0.01, 0.1, 1 | learning_rate: 1 |
| | | max_depth: 10, 20, 30, 40 | max_depth: 40 |
| | Neural network | activation: relu, tanh, sigmoid, hard_sigmoid, linear | activation: relu |
| | | neurons: 10, 50, 100 | neurons: 50 |
| | | optimizer: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam | optimizer: RMSprop |
| | | epochs: 1, 10 | epochs: 10 |
| | | batch_size: 1000, 2000 | batch_size: 2000 |
| | Support vector machine | degree: 3, 4, 5, 6 | degree: 3 |
| | | gamma: 0.001, 0.01, 0.1 | gamma: 0.1 |
| | | C: 1, 10, 100 | C: 10 |
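The hyperparameter names in the table follow scikit-learn/Keras conventions, which suggests a cross-validated grid search. A minimal sketch of how such a search might look for the first-stage logistic regression, assuming scikit-learn's GridSearchCV (synthetic data, not the paper's; the penalty grid is restricted to l2 here so that every listed solver supports it):

```python
# Hypothetical sketch, not the authors' code: grid search over the logistic
# regression hyperparameters listed in the table, scored by ROC AUC
# (the C-statistic reported in the paper) with five-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the claims-derived feature matrix
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {
    "solver": ["newton-cg", "lbfgs", "liblinear"],
    "penalty": ["l2"],  # subset of the paper's grid: valid for all three solvers
    "C": [0.001, 0.01, 0.1, 1, 10],
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)  # the "selected hyperparameters" column
```

The other models in the table (decision tree, random forest, SVM) can be tuned with the same GridSearchCV pattern by swapping the estimator and grid.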
Characteristics of Massachusetts commercially insured patients with buprenorphine prescriptions, overall and by treatment discontinuation^a status
| | Total | Continued | Discontinued | RR (95% CI) | p value |
|---|---|---|---|---|---|
| Number of patients | 5190 | 2665 | 2525 | | |
| Patient characteristics | | | | | |
| Patient age | 36.4 ± 11.3 | 37.3 ± 10.9 | 35.6 ± 11.5 | N/A | < 0.001 |
| Patient income | 72,109 ± 13,087 | 71,581 ± 12,961 | 72,667 ± 13,199 | N/A | 0.003 |
| Patient gender | | | | | |
| Male | 3165 (61) | 1596 (59.9) | 1569 (62.1) | 0.95 (0.89, 1.01) | 0.09 |
| Female | 2025 (39) | 1069 (40.1) | 956 (37.9) | Reference | Reference |
| Patient location | | | | | |
| Urban | 3582 (69) | 1793 (67.3) | 1789 (70.9) | Reference | Reference |
| Rural | 1608 (31) | 872 (32.7) | 736 (29.1) | 0.92 (0.86, 0.97) | 0.005 |
| Insurance type | | | | | |
| HMO | 1212 (23.4) | 521 (19.5) | 691 (27.4) | 1.27 (1.19, 1.35) | < 0.001 |
| PPO | 427 (8.2) | 187 (7) | 240 (9.5) | 1.25 (1.14, 1.37) | < 0.001 |
| Others | 3551 (68.4) | 1957 (73.5) | 1594 (63.1) | Reference | Reference |
| Provider specialty | | | | | |
| Primary care | 2951 (56.9) | 1537 (57.7) | 1414 (56) | 0.93 (0.87, 0.99) | 0.06 |
| Psychiatry | 1055 (20.3) | 553 (20.8) | 502 (19.9) | 0.92 (0.85, 1.01) | 0.04 |
| Others | 1184 (22.8) | 575 (21.5) | 609 (24.1) | Reference | Reference |
| Addiction treatment specialist | | | | | |
| No | 4831 (84.4) | 2481 (93.1) | 2350 (93.1) | Reference | Reference |
| Yes | 359 (15.6) | 184 (6.9) | 175 (6.9) | 1.00 (0.90, 1.12) | 0.97 |
| Pain specialist | | | | | |
| No | 5114 (98.5) | 2633 (98.8) | 2481 (98.3) | Reference | Reference |
| Yes | 76 (1.5) | 32 (1.2) | 44 (1.7) | 1.19 (0.98, 1.44) | 0.11 |
| Patients’ early treatment adherence | | | | | |
| Patients’ PDC within first two months (continuous measure) | 88.0 ± 25.6 | 95.3 ± 16.7 | 80.3 ± 30.6 | N/A | < 0.001 |
| Patients’ PDC within first three months (continuous measure) | 84.3 ± 28.9 | 94.2 ± 19.0 | 73.9 ± 33.6 | N/A | < 0.001 |
| PDC within first two months | | | | | |
| High ( | 4209 (81.1) | 2459 (92.3) | 1750 (69.3) | Reference | Reference |
| Low ( | 981 (18.9) | 206 (7.7) | 775 (30.7) | 1.90 (1.81, 1.99) | < 0.001 |
| PDC within first three months | | | | | |
| High ( | 3924 (75.6) | 2424 (91) | 1500 (59.4) | Reference | Reference |
| Low ( | 1266 (24.4) | 241 (9) | 1025 (40.6) | 2.11 (2.01, 2.22) | < 0.001 |
^a Treatment discontinuation was defined as a gap of 60 consecutive days without a buprenorphine prescription
P-values for patient-level characteristics with continuous measures were computed using a t-test
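The unadjusted risk ratios in the table can be computed directly from the cell counts. A minimal sketch (not the authors' code) of the standard log-scale Wald confidence interval, using the rural vs. urban counts above as input:

```python
# Sketch: unadjusted risk ratio with a 95% Wald CI on the log scale.
import math

def risk_ratio_ci(a, n1, b, n2, z=1.96):
    """RR of the event in group 1 vs. group 2.
    a/n1: events/total in the exposed group; b/n2: events/total in the reference group."""
    rr = (a / n1) / (b / n2)
    # Standard error of log(RR) for a 2x2 table
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Rural (736 discontinued of 1608) vs. urban reference (1789 of 3582)
rr, lo, hi = risk_ratio_ci(736, 1608, 1789, 3582)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

This closely reproduces the table's 0.92 (0.86, 0.97) for rural patients; small differences in the upper bound can arise from rounding or an alternative CI method.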
Buprenorphine treatment discontinuation prediction performance of machine learning models at the first and second stages of treatment (second-stage models include PDC as a continuous measure)
| Treatment stage for making prediction | Performance metric | Decision tree | Random forest | Extreme gradient boosting | Logistic regression | Neural network | Support vector machine |
|---|---|---|---|---|---|---|---|
| First stage models with baseline predictors | Precision | 0.57 ± 0.01 | 0.56 ± 0.02 | 0.53 ± 0.02 | 0.56 ± 0.02 | 0.56 ± 0.04 | 0.57 ± 0.03 |
| | Recall | 0.44 ± 0.02 | 0.49 ± 0.02 | 0.52 ± 0.03 | 0.38 ± 0.02 | 0.42 ± 0.05 | 0.38 ± 0.03 |
| | F1 score | 0.49 ± 0.02 | 0.53 ± 0.02 | 0.52 ± 0.03 | 0.45 ± 0.01 | 0.47 ± 0.03 | 0.38 ± 0.02 |
| | C-statistic | 0.58 ± 0.01 | 0.59 ± 0.02 | 0.55 ± 0.02 | 0.57 ± 0.02 | 0.57 ± 0.02 | 0.56 ± 0.02 |
| Second stage models with 2 months PDC* and baseline predictors | Precision | 0.67 ± 0.02 | 0.68 ± 0.01 | 0.59 ± 0.02 | 0.69 ± 0.03 | 0.67 ± 0.06 | 0.66 ± 0.03 |
| | Recall | 0.54 ± 0.04 | 0.55 ± 0.03 | 0.59 ± 0.03 | 0.46 ± 0.02 | 0.46 ± 0.05 | 0.51 ± 0.01 |
| | F1 score | 0.60 ± 0.03 | 0.61 ± 0.02 | 0.59 ± 0.02 | 0.55 ± 0.02 | 0.58 ± 0.01 | 0.58 ± 0.01 |
| | C-statistic | 0.69 ± 0.01 | 0.71 ± 0.01 | 0.65 ± 0.02 | 0.68 ± 0.02 | 0.67 ± 0.02 | 0.68 ± 0.02 |
| Second stage models with 3 months PDC* and baseline predictors | Precision | 0.69 ± 0.02 | 0.71 ± 0.02 | 0.62 ± 0.02 | 0.74 ± 0.03 | 0.73 ± 0.05 | 0.72 ± 0.07 |
| | Recall | 0.62 ± 0.04 | 0.60 ± 0.03 | 0.63 ± 0.04 | 0.40 ± 0.01 | 0.50 ± 0.04 | 0.49 ± 0.07 |
| | F1 score | 0.65 ± 0.03 | 0.65 ± 0.03 | 0.62 ± 0.03 | 0.40 ± 0.01 | 0.58 ± 0.02 | 0.58 ± 0.04 |
| | C-statistic | 0.74 ± 0.02 | 0.75 ± 0.03 | 0.69 ± 0.03 | 0.70 ± 0.02 | 0.71 ± 0.02 | 0.72 ± 0.03 |
*PDC is included in the model as a continuous variable
Each performance metric is reported as the mean ± standard deviation of the values obtained across the five folds of cross-validation
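The mean ± SD reporting convention above can be produced directly from cross-validation output. A hypothetical sketch with scikit-learn (illustrative model and data, not the paper's):

```python
# Sketch: five-fold cross-validation, reporting each metric as mean ± SD
# across the folds, as in the tables above (roc_auc = C-statistic).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, random_state=0)

cv = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5,
    scoring=["precision", "recall", "f1", "roc_auc"],
)

for name in ["precision", "recall", "f1", "roc_auc"]:
    scores = cv[f"test_{name}"]  # one value per fold
    print(f"{name}: {scores.mean():.2f} ± {scores.std():.2f}")
```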
Fig. 3 ROC curve (left) and precision-recall curve (right) for the 1st-stage model including baseline predictors (Panel A); ROC curve (left) and precision-recall curve (right) for the 2nd-stage model including baseline predictors and two-month PDC as a continuous variable (Panel B); ROC curve (left) and precision-recall curve (right) for the 2nd-stage model including baseline predictors and three-month PDC as a continuous variable (Panel C)
Buprenorphine treatment discontinuation prediction performance of machine learning models at the first and second stages of treatment (second-stage models include PDC as a dichotomous measure)
| Treatment stage for making prediction | Performance metric | Decision tree | Random forest | Extreme gradient boosting | Logistic regression | Neural network | Support vector machine |
|---|---|---|---|---|---|---|---|
| First stage models with baseline predictors | Precision | 0.57 ± 0.01 | 0.56 ± 0.02 | 0.53 ± 0.02 | 0.56 ± 0.02 | 0.56 ± 0.04 | 0.57 ± 0.03 |
| | Recall | 0.44 ± 0.02 | 0.49 ± 0.02 | 0.52 ± 0.03 | 0.38 ± 0.02 | 0.42 ± 0.05 | 0.38 ± 0.03 |
| | F1 score | 0.49 ± 0.02 | 0.53 ± 0.02 | 0.52 ± 0.03 | 0.45 ± 0.01 | 0.47 ± 0.03 | 0.38 ± 0.02 |
| | C-statistic | 0.58 ± 0.01 | 0.59 ± 0.02 | 0.55 ± 0.02 | 0.57 ± 0.02 | 0.57 ± 0.02 | 0.56 ± 0.02 |
| Second stage models with 2 months PDC* and baseline predictors | Precision | 0.66 ± 0.02 | 0.67 ± 0.02 | 0.58 ± 0.02 | 0.76 ± 0.02 | 0.67 ± 0.07 | 0.66 ± 0.03 |
| | Recall | 0.50 ± 0.04 | 0.55 ± 0.01 | 0.57 ± 0.02 | 0.37 ± 0.01 | 0.51 ± 0.12 | 0.52 ± 0.01 |
| | F1 score | 0.57 ± 0.02 | 0.60 ± 0.01 | 0.58 ± 0.02 | 0.49 ± 0.01 | 0.55 ± 0.05 | 0.58 ± 0.01 |
| | C-statistic | 0.67 ± 0.01 | 0.69 ± 0.01 | 0.63 ± 0.02 | 0.67 ± 0.02 | 0.67 ± 0.02 | 0.64 ± 0.01 |
| Second stage models with 3 months PDC* and baseline predictors | Precision | 0.67 ± 0.01 | 0.70 ± 0.02 | 0.62 ± 0.02 | 0.78 ± 0.03 | 0.72 ± 0.04 | 0.77 ± 0.06 |
| | Recall | 0.58 ± 0.04 | 0.59 ± 0.02 | 0.61 ± 0.01 | 0.44 ± 0.02 | 0.53 ± 0.03 | 0.44 ± 0.06 |
| | F1 score | 0.63 ± 0.02 | 0.63 ± 0.02 | 0.61 ± 0.01 | 0.56 ± 0.02 | 0.61 ± 0.02 | 0.56 ± 0.04 |
| | C-statistic | 0.71 ± 0.02 | 0.73 ± 0.02 | 0.68 ± 0.02 | 0.70 ± 0.02 | 0.70 ± 0.03 | 0.67 ± 0.02 |
*PDC is included in the model as a dichotomous variable
Each performance metric is reported as the mean ± standard deviation of the values obtained across the five folds of cross-validation
Fig. 6 ROC curve (left) and precision-recall curve (right) for the 2nd-stage model including baseline predictors and two-month PDC as a dichotomous variable (Panel A); ROC curve (left) and precision-recall curve (right) for the 2nd-stage model including baseline predictors and three-month PDC as a dichotomous variable (Panel B)
Fig. 4 Variable importance plots using SHAP values from the extreme gradient boosting model including two-month PDC (Panel A) and three-month PDC (Panel B) as a continuous measure; SHAP values computed from individual features’ values and their impact (both positive and negative) on treatment discontinuation (left in Panels A and B); average SHAP values of features showing average impact on, and correlation with, treatment discontinuation (right in Panels A and B)
Fig. 7 Variable importance plots using SHAP values from the extreme gradient boosting model including two-month PDC (Panel A) and three-month PDC (Panel B) as a dichotomous variable; SHAP values computed from individual features’ values and their impact (both positive and negative) on treatment discontinuation (left in Panels A and B); average SHAP values of features showing average impact on, and correlation with, treatment discontinuation (right in Panels A and B)
Fig. 5 Decision classification rules for the prediction of treatment discontinuation using the decision tree model (PDC considered as a continuous variable)
Fig. 8 Decision classification rules for the prediction of treatment discontinuation using the decision tree model (PDC considered as a dichotomous variable)
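Human-readable rules like those shown in Figs. 5 and 8 can be extracted from a fitted tree. A sketch using scikit-learn's export_text on synthetic data (the paper's actual features and fitted tree are not reproduced here):

```python
# Sketch: printing a fitted decision tree's classification rules as nested
# if/else thresholds, one branch per line, via sklearn's export_text.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in; min_samples_leaf mirrors the kind of value tuned above
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
tree = DecisionTreeClassifier(min_samples_leaf=40, random_state=0).fit(X, y)

rules = export_text(tree, feature_names=[f"feature_{i}" for i in range(4)])
print(rules)
```

With the real model, feature_names would be the study's baseline predictors plus the PDC measure, yielding rules such as the PDC thresholds visible in the figures.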