Bruce Pyenson¹, Maggie Alston¹, Jeffrey Gomberg², Feng Han³, Nikhil Khandelwal⁴, Motoharu Dei³, Monica Son³, Jaime Vora⁴
Abstract
BACKGROUND: Exocrine pancreatic insufficiency (EPI) is a serious condition characterized by a lack of functional exocrine pancreatic enzymes and the resultant inability to properly digest nutrients. EPI can be caused by a variety of disorders, including chronic pancreatitis, pancreatic cancer, and celiac disease. EPI remains underdiagnosed because of the nonspecific nature of clinical symptoms, lack of an ideal diagnostic test, and the inability to easily identify affected patients using administrative claims data.
Keywords: case-finding technique; claims data analysis; exocrine pancreatic insufficiency (EPI); identifying/predicting undiagnosed EPI; machine learning; predictive modeling
Year: 2019 PMID: 32685578 PMCID: PMC7299452 DOI: 10.36469/9727
Source DB: PubMed Journal: J Health Econ Outcomes Res ISSN: 2326-697X
Terminology Used in Machine Learning and Statistical Analysis
| Machine Learning | Statistical Analysis |
|---|---|
| Feature | Explanatory variable |
| Confusion matrix | Contingency table of predicted and actual status |
| Recall | Sensitivity |
| Precision | Positive predictive value |
| F1 score | Harmonic mean of sensitivity and positive predictive value |
Performance Metrics Definitions
| Metric | Definition |
|---|---|
| Precision | Measure of model exactness; the ratio of true positives to all cases predicted to be positive |
| Recall | Measure of model completeness; the ratio of true positives to all cases that are actual positives |
| F1 score | Summary measure balancing precision and recall; their harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of precision and recall) |
| Fβ score (β = 10) | Generalization of the F1 score with a parameter β that weights recall β times as heavily as precision; β = 10 strongly favors recall |
| Positive-unlabeled (PU) score | Measure of model performance on positive and unlabeled data that is positively correlated with the F1 score |
| Brier score loss | Mean squared difference between the predicted probability and the observed outcome (1 = positive, 0 = negative); lower is better |
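The definitions above translate directly into code. The following is a minimal Python sketch; the function names and all inputs are hypothetical and not taken from the study. With precision 0.1 and recall 0.9, the F1 score is low, while the Fβ score with β = 10 stays high because recall dominates it.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """True positives divided by all cases predicted positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """True positives divided by all actual positive cases."""
    return tp / (tp + fn)


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the F1 score; larger beta weights recall more heavily.
    """
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((prob - y) ** 2 for y, prob in zip(y_true, y_prob)) / len(y_true)


# Hypothetical model: 9 true positives, 81 false positives, 1 false negative.
p = precision(tp=9, fp=81)   # 0.1
r = recall(tp=9, fn=1)       # 0.9
f1 = f_beta(p, r, beta=1)    # 0.18, dragged down by the low precision
f10 = f_beta(p, r, beta=10)  # about 0.83, dominated by the high recall

# Hypothetical predicted probabilities for three patients (1 = EPI).
brier = brier_score([1, 0, 1], [0.9, 0.2, 0.6])  # about 0.07
print(f"F1={f1:.2f}  F10={f10:.2f}  Brier={brier:.2f}")
```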
Results of Baseline and High-performing Models
| | Baseline | Model 1 | Model 2 | Model 3 |
|---|---|---|---|---|
| Model Type | LASSO | Random Forest | Random Forest | Random Forest |
| Unlabeled data | Ignored unlabeled data | Assumed negative, ignored actual negative cases | Assumed negative, ignored actual negative cases | Assumed negative, ignored actual negative cases |
| Imbalanced data | N/A | Downsample, class weight | Downsample, subsample balanced weight | Repeated random subsampling |
| Validation method | 80% / 20% split validation | Nested cross validation | Nested cross validation | Nested cross validation |
| Optimized metric in hyperparameter selection | None | Fβ (β = 10), 100 random-search iterations | Fβ (β = 10) × 100 + PU score, 100 random-search iterations | Fβ (β = 10) × 100 + PU score, 60 random-search iterations |
| Fβ score, β = 10 (all data) | 0.32 | 0.71 | 0.71 | 0.72 |
| PU score (all data) | 0.93 | 9.22 | 12.45 | 10.69 |
| Recall (labeled data) | 0.92 | 0.81 | 0.77 | 0.80 |
| Brier score loss (labeled data) | 0.10 | 0.14 | 0.16 | 0.15 |
| Brier score loss (unlabeled data assumed negative) | 0.60 | 0.06 | 0.03 | 0.04 |
| F1 score (labeled data) | 0.90 | 0.86 | 0.84 | 0.86 |
| Precision (labeled data) | 0.88 | 0.91 | 0.93 | 0.91 |
| Share of unlabeled cases predicted positive | 0.93 | 0.07 | 0.04 | 0.06 |
LASSO: least absolute shrinkage and selection operator; PU: positive-unlabeled
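The results table compresses several design choices into short phrases. Below is a minimal Python sketch of how the positive-unlabeled relabeling and two imbalance remedies could look. Everything here is an assumption: the status codes and function names are hypothetical; the weighting is the "balanced" heuristic n_samples / (n_classes × class count) that scikit-learn uses for `class_weight="balanced"` (Model 2's "subsample balanced weight" plausibly corresponds to its per-bootstrap variant, `class_weight="balanced_subsample"`); and "repeated random subsampling" is read here as drawing several class-balanced training sets.

```python
import random
from collections import Counter

# Hypothetical status codes; the study's actual encoding is not described.
EPI, NO_EPI, UNLABELED = "EPI", "NO_EPI", "UNLABELED"


def pu_training_labels(records):
    """Treat unlabeled patients as negatives (0) and confirmed EPI as
    positives (1); drop confirmed non-EPI patients, mirroring the
    'assumed negative, ignored actual negative cases' row for Models 1-3."""
    label_of = {EPI: 1, UNLABELED: 0}
    return [(pid, label_of[s]) for pid, s in records if s in label_of]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def repeated_random_subsamples(data, n_rounds, seed=0):
    """Return n_rounds training sets, each holding every positive plus an
    equal-sized random draw (without replacement) of negatives."""
    rng = random.Random(seed)
    pos = [row for row in data if row[1] == 1]
    neg = [row for row in data if row[1] == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_rounds)]


# Toy cohort: 4 confirmed EPI, 2 confirmed non-EPI, 12 unlabeled patients.
records = ([(f"e{i}", EPI) for i in range(4)]
           + [(f"n{i}", NO_EPI) for i in range(2)]
           + [(f"u{i}", UNLABELED) for i in range(12)])
data = pu_training_labels(records)                      # 16 patients remain
weights = balanced_class_weights([y for _, y in data])  # class 1: 2.0, class 0: about 0.67
subsets = repeated_random_subsamples(data, n_rounds=3)  # 3 sets of 4 + 4 patients
print(weights, [len(s) for s in subsets])
```

Weighting keeps every patient but makes each rare positive count more in the impurity calculation, whereas subsampling discards negatives in each round; repeating the draw lets different negatives contribute across rounds.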
Confusion Matrix of Model 3
| Actual Condition | Predicted: No EPI | Predicted: EPI |
|---|---|---|
| No EPI | 183 (True negatives) | 32 (False positives, type I error) |
| EPI | 82 (False negatives, type II error) | 336 (True positives) |
| Unlabeled | 82,546 | 4,961 |
Note: The 2×2 block of true/false positives and negatives is the confusion matrix; the remaining row gives the model's predictions for unlabeled patients.
EPI: exocrine pancreatic insufficiency
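Working back from these counts is a useful consistency check on the results table. The Python sketch below recomputes Model 3's metrics. Two conventions are assumptions rather than statements from the study: for the all-data metrics, every patient not labeled EPI is treated as a negative, and the PU score is taken to be recall squared divided by the fraction of all patients predicted positive (the criterion proposed by Lee and Liu, 2003). Under these assumptions the counts reproduce the reported precision (0.91), recall (0.80), share of unlabeled cases predicted positive (0.06), Fβ score with β = 10 (0.72), and PU score (10.69); F1 comes out at 0.855, slightly below the 0.86 in the table.

```python
# Model 3 counts taken from the confusion matrix above.
tp, fp = 336, 32                # labeled patients predicted to have EPI
fn, tn = 82, 183                # labeled patients predicted not to have EPI
unl_pos, unl_neg = 4961, 82546  # unlabeled patients, by predicted class

# Metrics on labeled patients only.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# Share of unlabeled patients the model predicts to have EPI.
unlabeled_pos_rate = unl_pos / (unl_pos + unl_neg)

# All-data metrics (assumption): every patient not labeled EPI counts as a
# negative, so unlabeled patients predicted EPI become false positives.
n_all = tp + fp + fn + tn + unl_pos + unl_neg  # 88,140 patients
fp_all = fp + unl_pos
beta2 = 10 ** 2
f_beta10 = (1 + beta2) * tp / ((1 + beta2) * tp + beta2 * fn + fp_all)

# PU score (assumption): recall squared over the fraction predicted positive.
pred_pos_rate = (tp + fp_all) / n_all
pu_score = recall ** 2 / pred_pos_rate

# Composite objective listed for hyperparameter selection in Models 2 and 3.
objective = f_beta10 * 100 + pu_score

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
print(f"unlabeled predicted EPI={unlabeled_pos_rate:.2f}")
print(f"F(beta=10)={f_beta10:.2f} PU={pu_score:.2f} objective={objective:.1f}")
```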
Probability of EPI Assigned to Each Patient by Model 3
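| Probability of EPI | Number of Patients | Predicted EPI Status |
|---|---|---|
| 0% – < 5% | 35,868 | Not likely to have EPI |
| 5% – < 10% | 18,022 | Not likely to have EPI |
| 10% – < 15% | 11,895 | Not likely to have EPI |
| 15% – < 20% | 7,470 | Not likely to have EPI |
| 20% – < 25% | 4,186 | Not likely to have EPI |
| 25% – < 30% | 2,248 | Possibly likely to have EPI |
| 30% – < 35% | 1,085 | Possibly likely to have EPI |
| 35% – < 40% | 711 | Possibly likely to have EPI |
| 40% – < 45% | 671 | Possibly likely to have EPI |
| 45% – < 50% | 655 | Possibly likely to have EPI |
| 50% – < 55% | 719 | Likely to have EPI |
| 55% – < 60% | 666 | Likely to have EPI |
| 60% – < 65% | 678 | Likely to have EPI |
| 65% – < 70% | 884 | Likely to have EPI |
| 70% – < 75% | 1,006 | Likely to have EPI |
| 75% – < 80% | 767 | Highly likely to have EPI |
| 80% – < 85% | 397 | Highly likely to have EPI |
| 85% – < 90% | 192 | Highly likely to have EPI |
| 90% – < 95% | 20 | Highly likely to have EPI |
| 95% – 100% | 0 | Highly likely to have EPI |
| Subtotal, 0% – < 25% | 77,441 | Not likely to have EPI |
| Subtotal, 25% – < 50% | 5,370 | Possibly likely to have EPI |
| Subtotal, 50% – < 75% | 3,953 | Likely to have EPI |
| Subtotal, 75% – 100% | 1,376 | Highly likely to have EPI |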
EPI: exocrine pancreatic insufficiency
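The 25-point bands and their subtotals follow mechanically from the 5-point bins. This Python sketch reproduces the subtotals from the per-bin counts in the table; the function name `band_for` is hypothetical, and the cutoffs are read from the table's category labels.

```python
# Model 3 patient counts per 5-point probability bin, from 0% - <5%
# up to 95% - 100%, as listed in the table above.
bin_counts = [35868, 18022, 11895, 7470, 4186,
              2248, 1085, 711, 671, 655,
              719, 666, 678, 884, 1006,
              767, 397, 192, 20, 0]

# Category labels for the four 25-point bands, lowest band first.
BANDS = ["Not likely to have EPI", "Possibly likely to have EPI",
         "Likely to have EPI", "Highly likely to have EPI"]


def band_for(probability: float) -> str:
    """Map a predicted EPI probability in [0, 1] to its 25-point band."""
    index = min(int(probability * 4), 3)  # a probability of 1.0 stays in the top band
    return BANDS[index]


# Each band spans five consecutive 5-point bins.
band_totals = {BANDS[i]: sum(bin_counts[5 * i:5 * i + 5]) for i in range(4)}

for band, total in band_totals.items():
    print(f"{band}: {total}")
print("All patients:", sum(bin_counts))
```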
Most Frequent Features in Patients with 10% to 20% EPI Probability
| Feature | Code Type | Proportion of Patients with ≥1 Observation | Average Number of Observations among Patients with ≥1 Observation |
|---|---|---|---|
| Evaluation and Management | CPT | 100% | 15.81 |
| Pathology | CPT | 97% | 28.03 |
| Medicine | CPT | 90% | 17.72 |
| Special Encounters | DIAG | 78% | 24.07 |
| Other Symptoms | DIAG | 77% | 21.96 |
| Radiology | CPT | 74% | 5.85 |
| Cardiovascular | CPT | 73% | 3.84 |
| Metabolic | DIAG | 69% | 24.45 |
| Musculoskeletal | DIAG | 64% | 24.15 |
| Laboratory | REVCODE | 62% | 16.59 |
| Diabetes | DIAG | 56% | 41.46 |
| Hypertensive | DIAG | 56% | 23.94 |
| Ulcer Drugs | NDC | 55% | 4.96 |
| Other Analgesics | NDC | 55% | 4.56 |
| Antibiotics | NDC | 55% | 2.22 |
| Respiratory | DIAG | 54% | 14.80 |
| Other Special Encounters | DIAG | 52% | 7.98 |
| Insulin | NDC | 50% | 7.08 |
| Genitourinary | DIAG | 49% | 25.70 |
| Antihypertensives | NDC | 46% | 6.43 |
| Digestive | DIAG | 45% | 15.10 |
| Antihyperlipidemics | NDC | 44% | 6.39 |
| Pharmacy | REVCODE | 43% | 4.48 |
| Labs Vitamin Levels Test | CPT | 43% | 2.56 |
| Anesthesia | CPT | 42% | 1.93 |
| Mental | DIAG | 42% | 17.15 |
| Esophagus | DIAG | 40% | 12.62 |
| Radiology – Diagnostic | REVCODE | 40% | 2.07 |
| Antidepressants | NDC | 39% | 6.96 |
| Drugs Requiring Specific Identification | REVCODE | 38% | 6.54 |
CPT: current procedural terminology; DIAG: diagnosis; EPI: exocrine pancreatic insufficiency; NDC: national drug code; REVCODE: revenue code
Note: Observations were tabulated at the claim level
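The two summary columns can be computed from claim-level data roughly as follows. This Python sketch is an assumption about the tabulation, not the study's code: the records and the function name are hypothetical, each observation is taken to be one claim carrying a code from the feature's group, and the denominator is all patients in the probability stratum.

```python
from collections import defaultdict

# Hypothetical claim-level records: one (patient_id, feature) pair per claim
# that carries a code belonging to the feature's group.
claims = [
    ("p1", "Radiology"), ("p1", "Radiology"), ("p1", "Insulin"),
    ("p2", "Radiology"),
    ("p3", "Insulin"), ("p3", "Insulin"), ("p3", "Insulin"),
]
# All patients in the probability stratum; p4 has none of these claims.
patients = ["p1", "p2", "p3", "p4"]


def feature_summary(claims, patients):
    """Return {feature: (share of patients with >=1 observation,
    average observations among those patients)}."""
    per_feature = defaultdict(lambda: defaultdict(int))  # feature -> patient -> claims
    for patient_id, feature in claims:
        per_feature[feature][patient_id] += 1
    result = {}
    for feature, per_patient in per_feature.items():
        n_with = len(per_patient)
        result[feature] = (n_with / len(patients),
                           sum(per_patient.values()) / n_with)
    return result


summary = feature_summary(claims, patients)
for feature, (share, avg) in summary.items():
    print(f"{feature}: {share:.0%} of patients, {avg:.2f} observations on average")
```

Because the average is taken only over patients with at least one observation, a feature can be rare (low share) yet show a high average, as with Diabetes in the 10% to 20% stratum.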
Most Frequent Features in Patients with 80% to 90% EPI Probability
| Feature | Code Type | Proportion of Patients with ≥1 Observation | Average Number of Observations among Patients with ≥1 Observation |
|---|---|---|---|
| Evaluation and Management | CPT | 100% | 23.03 |
| Pathology | CPT | 96% | 35.25 |
| Medicine | CPT | 83% | 15.11 |
| Other Pancreatic Conditions | DIAG | 82% | 14.50 |
| Inflammatory Conditions Of Pancreas | DIAG | 81% | 44.52 |
| Special Encounters | DIAG | 76% | 34.87 |
| Other Symptoms | DIAG | 76% | 35.14 |
| Cardiovascular | CPT | 73% | 4.35 |
| Ulcer Drugs | NDC | 72% | 6.03 |
| Radiology | CPT | 69% | 6.53 |
| Other Analgesics | NDC | 67% | 10.23 |
| Symptoms (Abdominal And Pelvis) | DIAG | 65% | 49.05 |
| Laboratory | REVCODE | 65% | 23.98 |
| Musculoskeletal | DIAG | 63% | 23.38 |
| Digestive | DIAG | 58% | 42.34 |
| Metabolic | DIAG | 58% | 33.28 |
| Radiology Abdominal | CPT | 57% | 4.77 |
| Esophagus | DIAG | 55% | 23.38 |
| Mental | DIAG | 53% | 34.26 |
| Antibiotics | NDC | 53% | 2.25 |
| Other Special Encounters | DIAG | 53% | 9.54 |
| Respiratory | DIAG | 52% | 26.16 |
| Pharmacy | REVCODE | 50% | 9.30 |
| Digestive Surgery | CPT | 50% | 4.90 |
| Hypertensive | DIAG | 50% | 34.33 |
| Anesthesia | CPT | 49% | 2.82 |
| Genitourinary | DIAG | 48% | 26.69 |
| Emergency Room | REVCODE | 47% | 5.09 |
| Antidepressants | NDC | 45% | 7.59 |
| Drugs Requiring Specific Identification | REVCODE | 44% | 8.47 |
CPT: current procedural terminology; DIAG: diagnosis; EPI: exocrine pancreatic insufficiency; NDC: national drug code; REVCODE: revenue code
Note: Observations were tabulated at the claim level
Summary of Model 3 Feature Importance as Measured by Each Feature’s Contribution to Decrease in Gini Impurity
CPT: current procedural terminology; DIAG: diagnosis; NDC: national drug code; REVCODE: revenue code
Note: The contributions of all features to the decrease in Gini impurity sum to 1.00.
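Impurity-based importance of this kind is usually computed by summing, for each feature, the sample-weighted decrease in Gini impurity over every split that uses the feature, then normalizing so that all features sum to 1; for a forest, per-tree values are averaged, as scikit-learn's `feature_importances_` does. The toy single-tree Python sketch below uses hypothetical node counts and is not the study's code.

```python
from collections import defaultdict


def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def impurity_decrease(parent, left, right, n_total):
    """Sample-weighted decrease in Gini impurity produced by one split."""
    return (sum(parent) * gini(parent)
            - sum(left) * gini(left)
            - sum(right) * gini(right)) / n_total


# Hypothetical splits from one small tree over 100 patients:
# (feature, parent node, left child, right child), each node given as
# [patients without EPI, patients with EPI].
splits = [
    ("Inflammatory Conditions Of Pancreas", [80, 20], [70, 2], [10, 18]),
    ("Digestive", [70, 2], [60, 0], [10, 2]),
    ("Digestive", [10, 18], [8, 4], [2, 14]),
]

raw = defaultdict(float)
for feature, parent, left, right in splits:
    raw[feature] += impurity_decrease(parent, left, right, n_total=100)

# Normalize so that the contributions of all features sum to 1.00.
total = sum(raw.values())
importance = {feature: value / total for feature, value in raw.items()}
print({feature: round(value, 2) for feature, value in importance.items()})
```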