| Literature DB >> 32956069 |
Yasmeen Adar Almog¹, Angshu Rai¹, Patrick Zhang¹, Amanda Moulaison¹, Ross Powell¹, Anirban Mishra¹, Kerry Weinberg¹, Celeste Hamilton², Mary Oates³, Eugene McCloskey⁴, Steven R Cummings⁵.
Abstract
BACKGROUND: Fractures resulting from osteoporosis and low bone mass are common and impose a significant clinical, personal, and economic burden. Even after a fracture occurs, high fracture risk remains widely underdiagnosed and undertreated. Common fracture risk assessment tools use only a subset of clinical risk factors for prediction and often require manual data entry. Furthermore, these tools predict long-term risk and do not explicitly provide the short-term risk estimates needed to identify patients likely to experience a fracture in the next 1 to 2 years.
Keywords: AI; EHR; NLP; artificial intelligence; bone; deep learning; electronic health record; fracture; low bone mass; machine learning; natural language processing; osteoporosis; prediction
Year: 2020 PMID: 32956069 PMCID: PMC7600029 DOI: 10.2196/22550
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Sliding window algorithm schematic, shown for a patient with multiple fractures and a patient with no fracture. Dx: diagnosis; ICD: International Classification of Diseases.
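The sliding window idea in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each window pairs a fixed lookback period of diagnosis codes with a fixed prediction horizon, and that a window is labeled positive when a fracture code falls inside the horizon. The `Dx` record, the `sliding_windows` function, and the lookback, horizon, and step lengths (in days) are hypothetical; the default 730-day horizon only echoes the 1 to 2 year short-term window mentioned in the abstract.

```python
from typing import AbstractSet, List, NamedTuple, Tuple


class Dx(NamedTuple):
    day: int   # days since the patient's first record (hypothetical time axis)
    code: str  # ICD-10 diagnosis code


def sliding_windows(
    dxs: List[Dx],
    last_day: int,
    lookback: int = 365,
    horizon: int = 730,
    step: int = 90,
    fracture_codes: AbstractSet[str] = frozenset(),
) -> List[Tuple[int, List[str], bool]]:
    """Return (index_day, lookback_codes, fracture_in_horizon) for each window.

    All window sizes are hypothetical defaults, not published settings.
    """
    windows = []
    index_day = lookback  # first index date with a full lookback period behind it
    while index_day + horizon <= last_day:  # only keep fully observed horizons
        history = [d.code for d in dxs if index_day - lookback <= d.day < index_day]
        label = any(
            d.code in fracture_codes
            for d in dxs
            if index_day <= d.day < index_day + horizon
        )
        windows.append((index_day, history, label))
        index_day += step  # slide the window forward
    return windows


dxs = [Dx(10, "E11.9"), Dx(400, "M81.0"), Dx(800, "S72.001A")]
for w in sliding_windows(dxs, last_day=1100, lookback=365, horizon=365,
                         step=365, fracture_codes={"S72.001A"}):
    print(w)
# (365, ['E11.9'], False)
# (730, ['M81.0'], True)
```

A patient with several fractures simply produces several positive windows, one for each index date whose horizon contains a fracture code.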
Figure 2. 2D projection of ICD-10 code embeddings from the ICD code vectorization model: (a) all ICD-10 codes by the first letter (high-level category) of the code; (b) a cluster of alcohol-related codes near coordinates (2.3, 3), by code subgroup; (c) a cluster of kidney function-related codes near coordinates (3.75, 0.025), by code subgroup; and all ICD-10 fracture codes in region C, (d) by region of the body and (e) by frequency of occurrence. ICD: International Classification of Diseases; UMAP: uniform manifold approximation and projection.
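Clusters in a 2D projection such as Figure 2 reflect codes whose embedding vectors lie close together. How the ICD code vectorization model was trained is not described here, so this sketch only illustrates the neighborhood idea: it assumes cosine similarity as the similarity measure and uses hypothetical 3-dimensional toy vectors in place of learned embeddings. `cosine` and `nearest_codes` are illustrative names.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_codes(query: str, embeddings: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Return the k codes whose vectors are most similar to the query code's vector."""
    q = embeddings[query]
    scored = [(cosine(q, vec), code) for code, vec in embeddings.items() if code != query]
    scored.sort(reverse=True)  # highest similarity first
    return [code for _, code in scored[:k]]


# Toy vectors (hypothetical values, not learned embeddings).
toy = {
    "F10.20": [0.9, 0.1, 0.0],  # alcohol dependence
    "K70.30": [0.8, 0.2, 0.1],  # alcoholic cirrhosis of liver
    "N18.4": [0.1, 0.8, 0.4],   # chronic kidney disease, stage 4
    "N18.5": [0.0, 0.9, 0.3],   # chronic kidney disease, stage 5
}
print(nearest_codes("F10.20", toy))  # ['K70.30']
print(nearest_codes("N18.4", toy))   # ['N18.5']
```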
Figure 3. High-level architecture of the long short-term memory neural network, including the dimensionality of the inputs and the number of nodes in each layer. Dx: diagnosis; Icd2vec: ICD code vectorization; LSTM: long short-term memory.
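Figure 3 feeds a sequence of encoded diagnoses to an LSTM. As a rough sketch of what one recurrent step computes, the code below implements the standard LSTM cell equations (input, forget, and output gates plus a tanh candidate) in pure Python, followed by a single sigmoid output interpreted as a fracture probability. The parameter layout, the hidden size, the single-output head, and all function names are assumptions for illustration and do not reproduce the layer sizes shown in the figure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Each gate name ("i", "f", "o", "g") maps to (W, U, b):
# W is hidden x input, U is hidden x hidden, b has length hidden.
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def lstm_step(x: Sequence[float], h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; returns the new (hidden, cell) state."""
    def preact(gate: str) -> Vector:
        w, u, b = params[gate]
        return [a + r + bias for a, r, bias in zip(matvec(w, x), matvec(u, h), b)]

    i = [sigmoid(z) for z in preact("i")]    # input gate
    f = [sigmoid(z) for z in preact("f")]    # forget gate
    o = [sigmoid(z) for z in preact("o")]    # output gate
    g = [math.tanh(z) for z in preact("g")]  # candidate cell update
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def predict_risk(sequence: Sequence[Sequence[float]], params: Params,
                 w_out: Vector, b_out: float) -> float:
    """Run the cell over the sequence and map the final hidden state to a probability."""
    h = [0.0] * len(w_out)
    c = [0.0] * len(w_out)
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)


# With all-zero weights the cell state stays at 0, so the output is sigmoid(0).
params = {gate: (zeros(2, 3), zeros(2, 2), [0.0, 0.0]) for gate in "ifog"}
print(predict_risk([[0.1, 0.2, 0.3]] * 4, params, w_out=[1.0, -1.0], b_out=0.0))  # 0.5
```

In a trained model, each input vector `x` would be an embedded diagnosis (for example, an Icd2vec vector), and the weights would be learned rather than set by hand.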
List of physician interventions for human-level performance analysis.
| Type and name | Pharmacologic |
|---|---|
| Dual-energy x-ray absorptiometry | No |
| Vertebral fracture assessment | No |
| Quantitative computed tomography | No |
| Other bone density measurements (single energy x-ray absorptiometry, radiographic absorptiometry, ultrasound, single-photon absorptiometry) | No |
| Bone turnover markers | No |
| Administration of any medications referenced below | Yes |
| Bisphosphonates (alendronate, alendronate-cholecalciferol, ibandronate, risedronate, zoledronic acid) | Yes |
| Abaloparatide | Yes |
| Denosumab | Yes |
| Raloxifene | Yes |
| Bazedoxifene | Yes |
| Romosozumab | Yes |
| Teriparatide | Yes |
| Calcitonin | Yes |
| Osteoporosis (M80, M81, 733.0) | No |
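Read literally, the table lists events that count as a physician intervention. A minimal sketch of such a flag follows, assuming that a window is flagged when its record contains any listed test or medication, or an osteoporosis code matching the listed ICD-10 (M80, M81) or ICD-9 (733.0) prefixes. The exact time frame and matching rules are not given in this excerpt; the lowercase event labels, the dotted code format, and the name `physician_flag` are hypothetical.

```python
from typing import Iterable

# Hypothetical lowercase labels for the table's entries; real records would need
# a mapping from procedure and medication codes to these names.
NONPHARMACOLOGIC = frozenset({
    "dual-energy x-ray absorptiometry",
    "vertebral fracture assessment",
    "quantitative computed tomography",
    "single energy x-ray absorptiometry",
    "radiographic absorptiometry",
    "bone density ultrasound",
    "single-photon absorptiometry",
    "bone turnover markers",
})
MEDICATIONS = frozenset({
    "alendronate", "alendronate-cholecalciferol", "ibandronate", "risedronate",
    "zoledronic acid", "abaloparatide", "denosumab", "raloxifene",
    "bazedoxifene", "romosozumab", "teriparatide", "calcitonin",
})
INTERVENTIONS = NONPHARMACOLOGIC | MEDICATIONS
# ICD-10 (M80, M81) and ICD-9 (733.0x) osteoporosis codes, matched by prefix,
# assuming codes are stored with the decimal point.
OSTEOPOROSIS_PREFIXES = ("M80", "M81", "733.0")


def physician_flag(events: Iterable[str], dx_codes: Iterable[str]) -> bool:
    """True if any listed intervention or an osteoporosis diagnosis is present."""
    if any(event.lower() in INTERVENTIONS for event in events):
        return True
    return any(code.startswith(OSTEOPOROSIS_PREFIXES) for code in dx_codes)


print(physician_flag(["Denosumab"], []))         # True
print(physician_flag(["metformin"], ["E11.9"]))  # False
```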
Comparison of model performance metrics.
| Model | AUROCᵃ | Recall | Specificity | Precision | AUPRCᵇ |
|---|---|---|---|---|---|
| ICD code vectorization + LSTMᶜ | 0.812 | 0.646 | 0.812 | 0.192 | 0.462 |
| Patient-level vectorization + XGBoostᵈ | 0.790 | 0.670 | 0.758 | 0.161 | 0.358 |
| Ensemble | 0.818 | 0.693 | 0.777 | 0.177 | 0.463 |
| Baseline (age, sex) | 0.667 | 0.787 | 0.416 | 0.0855 | 0.119 |
| Baseline (age, sex, diagnosis count) | 0.668 | 0.547 | 0.707 | 0.114 | 0.130 |

ᵃAUROC: area under the receiver operating characteristic curve.
ᵇAUPRC: area under the precision-recall curve.
ᶜLSTM: long short-term memory.
ᵈXGBoost: extreme gradient boosting.
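The threshold-dependent columns (recall, specificity, precision) come from confusion-matrix counts at a chosen operating point, whereas AUROC summarizes ranking quality across all thresholds. A minimal sketch of both on toy inputs follows; the pairwise AUROC loop is the Mann-Whitney formulation, chosen for clarity rather than speed. The table does not say how the ensemble combines its two models, so `average_ensemble` is a hypothetical unweighted mean, shown only as one common option.

```python
from typing import Dict, List, Sequence


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Recall, specificity, and precision from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),       # fraction of fracture windows flagged
        "specificity": tn / (tn + fp),  # fraction of nonfracture windows not flagged
        "precision": tp / (tp + fp),    # fraction of flagged windows with a fracture
    }


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This O(P * N) pairwise loop is for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_ensemble(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Hypothetical ensemble: unweighted mean of two models' probabilities."""
    return [(a + b) / 2 for a, b in zip(p1, p2)]


print(threshold_metrics(tp=8, fp=2, tn=6, fn=4))
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Because fracture windows are a small minority, even a modest false-positive rate produces many flagged nonfracture windows, which is why precision in the table is far lower than recall or specificity.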
Human-level performance results.
| Cohort | Windows, n (%) | Flag, n (%) | No flag, n (%) |
|---|---|---|---|
|  | 630,445 (100) | —ᵃ | — |
|  | 561,247 (89.0) | — | — |
| Fracture | 28,626 (5.1) | 16,127 (56.3) | 12,499 (43.7) |
| Nonfracture | 532,621 (94.9) | 91,717 (17.2) | 440,904 (82.8) |
|  | 69,198 (11.0) | — | — |
| Fracture | 12,244 (17.7) | 10,277 (83.9) | 1967 (16.1) |
| Nonfracture | 56,954 (82.3) | 19,235 (33.8) | 37,719 (66.2) |

ᵃNot reported.
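If a physician flag is treated as a positive prediction and a fracture in the window as the true label, the counts above give human-level recall, specificity, and precision directly. The sketch below only performs this arithmetic. Because the row labels for the two subgroups are missing, they are referred to by window count, and `flag_rates` is an illustrative name.

```python
from typing import Dict


def flag_rates(flag_fx: int, total_fx: int, flag_nonfx: int, total_nonfx: int) -> Dict[str, float]:
    """Treat a physician flag as a positive prediction and fracture as the label."""
    tp, fn = flag_fx, total_fx - flag_fx           # fracture windows
    fp, tn = flag_nonfx, total_nonfx - flag_nonfx  # nonfracture windows
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Counts copied from the table above.
subgroup_561k = flag_rates(16_127, 28_626, 91_717, 532_621)
subgroup_69k = flag_rates(10_277, 12_244, 19_235, 56_954)
for name, rates in [("561,247 windows", subgroup_561k), ("69,198 windows", subgroup_69k)]:
    print(name, {k: round(v, 3) for k, v in rates.items()})
```

This gives recall ≈ 0.563, specificity ≈ 0.828, and precision ≈ 0.150 for the 561,247-window subgroup, and recall ≈ 0.839, specificity ≈ 0.662, and precision ≈ 0.348 for the 69,198-window subgroup. The recall and specificity values match the flag and no-flag percentages in the table.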
Figure 4. Exploration of model interpretability, comparing characteristics of the input data across the 4 prediction cohorts of the confusion matrix. FN: false negative; FP: false positive; ICD: International Classification of Diseases; TN: true negative; TP: true positive; UMAP: uniform manifold approximation and projection.
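Comparing input characteristics across the 4 confusion-matrix cohorts first requires assigning each prediction window to TP, FP, TN, or FN at some decision threshold. A minimal sketch follows, assuming a 0.5 threshold and a single numeric characteristic per window (for example, a count of diagnosis codes). The threshold, the toy data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def confusion_cohort(prob: float, label: int, threshold: float = 0.5) -> str:
    """Assign one prediction window to TP, FP, TN, or FN."""
    predicted = prob >= threshold
    if predicted:
        return "TP" if label else "FP"
    return "FN" if label else "TN"


def cohort_means(probs: Sequence[float], labels: Sequence[int],
                 feature: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Mean of one input characteristic (e.g. diagnosis count) per cohort."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for p, y, x in zip(probs, labels, feature):
        groups[confusion_cohort(p, y, threshold)].append(x)
    return {name: mean(values) for name, values in groups.items()}


# Hypothetical model outputs, labels, and diagnosis counts per window.
print(cohort_means([0.9, 0.8, 0.7, 0.2, 0.4], [1, 1, 0, 0, 1], [10, 20, 5, 3, 4]))
```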