Laura Oliva¹, Eric Horlick²,³, Bo Wang²,³,⁴,⁵, Ella Huszti¹,⁶, Ruth Hall¹,⁷, Lusine Abrahamyan⁸,⁹,¹⁰.
Abstract
PURPOSE: Routinely collected administrative data are widely used for population-based research. However, although atrial septal defects (ASD) and patent foramen ovale (PFO) are clinically very different, they share a single diagnostic code (ICD-9: 745.5; ICD-10: Q21.1). Using machine-learning-based approaches, we developed and validated an algorithm to differentiate between PFO and ASD patient populations within healthcare administrative data.
Keywords: Atrial; Foramen ovale; Machine learning; Patent; Septal defects; Septal occluder device
Year: 2022 PMID: 35387650 PMCID: PMC8988372 DOI: 10.1186/s12911-022-01837-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Exclusions from the original ICES data linkage of transcatheter closures in CIHI-DAD/SDS used to create the study cohort (reference standard)
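The cohort's stated inclusion criteria (Ontario residents aged 18 or older with a transcatheter PFO or ASD closure between October 2002 and December 2017) can be expressed as a simple record filter. The sketch below is hypothetical: the `ClosureRecord` fields, the `"ON"` province code, and the exact window boundaries (1 October 2002 to 31 December 2017) are assumptions, and the individual exclusion steps shown in Fig. 1 are not reproduced.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; real CIHI-DAD/SDS and CorHealth fields differ.
@dataclass
class ClosureRecord:
    patient_id: str
    province: str         # province of residence (assumed code, e.g. "ON")
    age_at_closure: int   # age in years on the procedure date
    procedure_date: date  # date of the transcatheter PFO/ASD closure


# Study window from the cohort description; exact day boundaries are assumed.
WINDOW_START = date(2002, 10, 1)
WINDOW_END = date(2017, 12, 31)


def in_study_cohort(r: ClosureRecord) -> bool:
    """Ontario resident, aged 18+, closure inside the study window."""
    return (
        r.province == "ON"
        and r.age_at_closure >= 18
        and WINDOW_START <= r.procedure_date <= WINDOW_END
    )


records = [
    ClosureRecord("a", "ON", 45, date(2010, 5, 3)),
    ClosureRecord("b", "ON", 17, date(2010, 5, 3)),  # excluded: under 18
    ClosureRecord("c", "QC", 50, date(2012, 1, 9)),  # excluded: not Ontario
    ClosureRecord("d", "ON", 62, date(2018, 2, 1)),  # excluded: after window
]
cohort = [r for r in records if in_study_cohort(r)]
print([r.patient_id for r in cohort])
```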
Demographic and clinical characteristics and secondary interventions of the study cohort
| Characteristic | PFO (n = 697) | ASD (n = 785) | p-value |
|---|---|---|---|
| Female sex, n (%) | 305 (43.8) | 498 (63.4) | < 0.001 |
| Age group, n (%) | | | < 0.001 |
| 18–60 | 542 (77.8) | 540 (68.8) | |
| > 60 | 155 (22.2) | 245 (31.2) | |
| Ischemic stroke (total number ≥ 1), n (%) | 275 (39.5) | 29 (3.7) | < 0.001 |
| Hemorrhagic stroke (total number ≥ 1), n (%) | < 6¹ | < 6¹ | 0.600 |
| TIA (total number ≥ 1), n (%) | 67 (9.6) | 15 (1.9) | < 0.001 |
| Other CHD hospitalizations, n (%) | 144 (20.7) | 167 (21.3) | 0.821 |
| Peripheral embolism, pulmonary embolism, or DVT, n (%) | 40 (5.7) | 13 (1.7) | < 0.001 |
| Dyslipidemia, n (%) | < 6¹ | < 6¹ | 1.000 |
| Thrombophilia, n (%) | < 6¹ | < 6¹ | 0.918 |
| Migraine, n (%) | 81 (11.6) | 31 (3.9) | < 0.001 |
| Renal failure, n (%) | 12 (1.7) | 32 (4.1) | 0.012 |
| AF, n (%) | 50 (7.2) | 120 (15.3) | < 0.001 |
| CAD, n (%) | 114 (16.4) | 166 (21.1) | 0.022 |
| CHF, n (%) | 34 (4.9) | 63 (8.0) | 0.019 |
| COPD, n (%) | 93 (13.3) | 97 (12.4) | 0.625 |
| Diabetes, n (%) | 72 (10.3) | 106 (13.5) | 0.073 |
| HTN, n (%) | 258 (37.0) | 302 (38.5) | 0.601 |
| Secondary interventions² | | | |
| Fluoroscopy, heart NEC without contrast, n (%) | 20 (2.9) | 26 (3.3) | 0.734 |
| Thoracic cavity NEC, n (%) | 41 (5.9) | 17 (2.2) | < 0.001 |
| Intravenous contrast injection, coronary veins, n (%) | 127 (18.2) | 118 (15.0) | 0.114 |
| Intraarterial contrast injection, pulmonary artery, n (%) | 298 (42.8) | 343 (43.7) | 0.755 |
| Intracardiac contrast injection, pulmonary artery, n (%) | 39 (5.6) | 10 (1.3) | < 0.001 |
| Steady state respiratory function study, n (%) | 134 (19.2) | 85 (10.8) | < 0.001 |
| Heart capacity measurement, oxygen consumption technique, n (%) | 123 (17.6) | 129 (16.4) | 0.581 |
| Pressure measurement, n (%) | 169 (24.2) | 318 (40.5) | < 0.001 |
| Ultrasound heart NEC, cardiac catheter inspection, n (%) | 52 (7.5) | 70 (8.9) | 0.356 |
| Heart and coronary artery ultrasound, n (%) | 55 (7.9) | 115 (14.6) | < 0.001 |
Ontario residents aged 18 years and older who had a transcatheter PFO or ASD closure between October 2002 and December 2017 (N = 1482), identified in the CorHealth Registry, the CIHI Discharge Abstract Database, and the Same Day Surgery Database
AF atrial fibrillation, CAD coronary artery disease, CHD congenital heart disease, CHF congestive heart failure, COPD chronic obstructive pulmonary disease, DVT deep vein thrombosis, HTN hypertension, NEC not elsewhere classified by CCI/CCP codes, TIA transient ischemic attack
¹Small cells (≤ 6 patients) were suppressed to comply with ICES privacy policies
²The 10 most frequent intervention codes other than transcatheter closure
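The table does not name the test behind its p-values. Assuming a Pearson chi-square test with Yates continuity correction on each 2 × 2 table (PFO vs ASD by characteristic present vs absent), the sketch below reproduces the reported values for COPD (0.625) and HTN (0.601); treat it as a consistency check under that assumption, not as the authors' stated method. Rows with suppressed cells cannot be checked this way.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square with Yates continuity correction for a 2x2 table.

    Layout: a = PFO with the characteristic, b = PFO without,
            c = ASD with the characteristic, d = ASD without.
    Returns (statistic, p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    # Continuity-corrected |ad - bc|, floored at zero
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square upper-tail probability is erfc(sqrt(stat / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


N_PFO, N_ASD = 697, 785  # cohort column totals


def compare(pfo_yes: int, asd_yes: int) -> float:
    """p-value for a binary characteristic, PFO vs ASD."""
    _, p = chi2_yates_2x2(pfo_yes, N_PFO - pfo_yes, asd_yes, N_ASD - asd_yes)
    return p


print(round(compare(93, 97), 3))    # COPD: table reports 0.625
print(round(compare(258, 302), 3))  # HTN: table reports 0.601
```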
Description and performance of the final random forest model for identifying PFO patients
| Model | Description | Accuracy (train) | Accuracy (test) | Sensitivity (train) | Sensitivity (test) | Specificity (train) | Specificity (test) |
|---|---|---|---|---|---|---|---|
| 7 | Age group, sex, AF, CAD, CHF, COPD, DM, HTN, migraine, other CHD admissions, Emb.*; number of events < 5 years prior to closure (ischemic stroke, hemorrhagic stroke, TIA); top 10 intervention codes (yes/no) | 0.946 | 0.756 | 0.908 | 0.657 | 0.978 | 0.848 |
| 7 (tuned) | Same variables as model 7, with tuned hyperparameters: mtry = 3; classification threshold cut-off = (0.38, 0.62) | 0.918 | 0.757 | 0.896 | 0.751 | 0.936 | 0.763 |
*Emb.: peripheral arterial embolism, pulmonary embolism, or deep vein thrombosis; DM: diabetes mellitus
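The tuned model's hyperparameters, `mtry` and a per-class cut-off vector, match the names of the `mtry` and `cutoff` arguments in R's `randomForest` package, where the predicted class is the one with the largest ratio of vote share to cut-off. The table does not state the software or which class each cut-off belongs to; the Python sketch below assumes 0.38 applies to PFO, which is consistent with sensitivity rising and specificity falling after tuning. With two classes whose cut-offs sum to 1, the rule reduces to predicting PFO whenever the PFO vote share exceeds 0.38 instead of 0.5.

```python
def rf_class_from_votes(pfo_vote_share: float,
                        cutoff_pfo: float = 0.38,
                        cutoff_asd: float = 0.62) -> str:
    """Pick the class with the largest vote-share-to-cut-off ratio.

    Assigning 0.38 to PFO and 0.62 to ASD is an assumption for illustration.
    """
    asd_vote_share = 1.0 - pfo_vote_share
    pfo_score = pfo_vote_share / cutoff_pfo
    asd_score = asd_vote_share / cutoff_asd
    # Ties go to ASD here for determinism (an illustrative choice)
    return "PFO" if pfo_score > asd_score else "ASD"


# A forest in which 40% of trees vote PFO:
print(rf_class_from_votes(0.40))            # PFO (tuned cut-offs)
print(rf_class_from_votes(0.40, 0.5, 0.5))  # ASD (plain majority vote)
```

Lowering the PFO cut-off trades specificity for sensitivity without refitting the forest, which matches the direction of change between the two rows of the table.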
Fig. 2 Variable importance plot based on mean decrease in Gini index
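Mean decrease in Gini index accumulates, over all trees, the impurity reduction achieved by splits on each variable. The sketch below computes the Gini impurity 1 − Σ p_k² of a node and the size-weighted decrease for one split. As an illustration only, it splits the whole cohort on any ischemic stroke using the counts from the cohort characteristics table; this is not a split taken from the fitted forest, and implementations differ in how they weight and average these decreases.

```python
def gini(counts: list[int]) -> float:
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_decrease(parent: list[int], left: list[int], right: list[int]) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = sum(parent)
    weighted = (sum(left) * gini(left) + sum(right) * gini(right)) / n
    return gini(parent) - weighted


# Class counts as [PFO, ASD]; illustrative split of the full cohort on
# "any ischemic stroke" (not a split from the paper's fitted model).
root = [697, 785]
stroke = [275, 29]
no_stroke = [697 - 275, 785 - 29]
print(round(gini_decrease(root, stroke, no_stroke), 3))
```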
Comparison of classification performance between the “traditional” model and the random forest models that considered additional variables
| Model | Test accuracy | Test sensitivity | Test specificity |
|---|---|---|---|
| Traditional* | 0.68 | 0.36 | 0.96 |
| Final random forest model (original) | 0.76 | 0.66 | 0.85 |
| Final random forest model (tuned) | 0.76 | 0.75 | 0.76 |
*Patients were classified as PFO based only on ‘any stroke or TIA within 1 year of closure’; all other patients were classified as ASD
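The traditional baseline is a one-variable rule, and the three reported metrics follow from a confusion matrix with PFO as the positive class. The sketch below is illustrative and its function names are hypothetical. It also checks that each reported accuracy is close to the prevalence-weighted average of sensitivity and specificity, assuming (the source does not say) that the test set has the same PFO share as the full cohort, 697 of 1482.

```python
def traditional_rule(stroke_or_tia_within_1y: bool) -> str:
    """Baseline rule: PFO if any stroke or TIA within 1 year of closure, else ASD."""
    return "PFO" if stroke_or_tia_within_1y else "ASD"


def metrics(y_true: list[str], y_pred: list[str], positive: str = "PFO") -> dict[str, float]:
    """Accuracy, sensitivity and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true PFO labelled PFO
        "specificity": tn / (tn + fp),  # share of true ASD labelled ASD
    }


# Accuracy implied by sensitivity and specificity at a given PFO prevalence.
PREVALENCE = 697 / 1482  # assumption: test-set PFO share equals the cohort share
reported = {  # model: (accuracy, sensitivity, specificity) from the table above
    "Traditional": (0.68, 0.36, 0.96),
    "Final random forest (original)": (0.76, 0.66, 0.85),
    "Final random forest (tuned)": (0.76, 0.75, 0.76),
}
implied = {
    name: PREVALENCE * sens + (1 - PREVALENCE) * spec
    for name, (_, sens, spec) in reported.items()
}
for name, acc in implied.items():
    print(f"{name}: reported {reported[name][0]:.2f}, implied {acc:.3f}")
```

The identity makes the trade-off visible: the traditional rule's high specificity keeps its accuracy near 0.68 despite a sensitivity of 0.36, while the tuned forest balances the two.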