Kyle P Schuler1, Anna R Hemnes2, Jeffrey Annis3,4, Eric Farber-Eger3,4, Brandon D Lowery3,4, Stephen J Halliday5, Evan L Brittain6.
Abstract
BACKGROUND: Study of pulmonary arterial hypertension (PAH) in claims-based (CB) cohorts may facilitate understanding of disease epidemiology; however, previous CB algorithms to identify PAH have had limited test characteristics. We hypothesized that machine learning algorithms (MLA) could accurately identify PAH in a CB cohort.
Keywords: Algorithm; Machine learning; Pulmonary hypertension
Year: 2022 PMID: 35643554 PMCID: PMC9145474 DOI: 10.1186/s12931-022-02055-0
Source DB: PubMed Journal: Respir Res ISSN: 1465-9921
Fig. 1 Features of screening tool. The screening tool included PAH medications (brand and generic names), Current Procedural Terminology (CPT) codes for right heart catheterization, and International Classification of Diseases (ICD) 9 or 10 codes for pulmonary hypertension. In final testing of the algorithm, the characteristics tested included strength, persistence and durability
Fig. 2 Study flow diagram. PAH pulmonary arterial hypertension, Spec specificity, Sens sensitivity, PPV positive predictive value, NPV negative predictive value
Test characteristics of the RF algorithm for the Testing Algorithm in the Final Cohort
| Dataset | AUC | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| Test (n = 338) | 0.96 | 0.88 | 0.93 | 0.89 | 0.92 |
| Training (n = 1356) | 0.94 (0.94–0.95) | 0.85 (0.83–0.87) | 0.92 (0.91–0.92) | 0.87 (0.85–0.88) | 0.91 (0.91–0.92) |
Values are for the cohort that was split into training and test sets (labeled “Final Test Set” and “Final Training Set” in Fig. 1). Training values are means and 95% confidence intervals based on 30 samples from tenfold cross-validation repeated 3 times
Fig. 3 Ranked features of the final random forest algorithm. The importance of each feature of the algorithm is depicted. Strength was defined as the number of mentions throughout the medical record. Persistence was defined as the number of days that a code or term stayed on the record. Durability was defined as the persistence divided by the length of the record after the first appearance on the record
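The three feature definitions in the Fig. 3 caption can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: the function name `record_features`, its inputs, and the choice to measure persistence as the span from first to last mention are assumptions.

```python
from datetime import date


def record_features(mention_dates, record_end):
    """Compute strength, persistence and durability for one code or term.

    mention_dates: dates on which the code/term appears in the record
    record_end:    last date of the patient's record
    """
    if not mention_dates:
        return {"strength": 0, "persistence": 0, "durability": 0.0}

    first, last = min(mention_dates), max(mention_dates)

    # Strength: number of mentions throughout the record.
    strength = len(mention_dates)

    # Persistence: days the code/term stayed on the record
    # (assumption: first mention to last mention).
    persistence = (last - first).days

    # Durability: persistence divided by the length of the record
    # after the first appearance.
    follow_up = (record_end - first).days
    durability = persistence / follow_up if follow_up > 0 else 0.0

    return {"strength": strength, "persistence": persistence, "durability": durability}


# Hypothetical patient: three mentions over one year, record ends a year later.
mentions = [date(2015, 1, 1), date(2015, 7, 1), date(2016, 1, 1)]
print(record_features(mentions, record_end=date(2017, 1, 1)))
```

Under these assumptions, a code that appears once and never again has a persistence and durability of zero, whereas a code mentioned up to the end of the record has a durability of 1.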
Test characteristics of cases identified by the random forest algorithm when deployed on the Synthetic Derivative (including cases from the test set)
| Test Characteristics | RF Algorithm in SD | RF Algorithm SD Non-Cases | French Registry | REVEAL | UK-Ireland Registry |
|---|---|---|---|---|---|
| n | 265 | 2,270,971 | 674 | 2525 | 482 |
| Age (years) | 52.0 (13.7) | 52.0 (21.4) | 50.0 (15.0) | 50.1 (14.4) | 50.1 (17.1) |
| Sex, % female (n) | 72.1% (191) | 54.2% (1,253,660) | 65.3% | 79.5% | 69.9% |
| CTD (n) | 33.2% (88) | 2.0% (465) | 15.3% | 25% | – |
| CHD (n) | 19.2% (51) | 0.9% (19,950) | 11.3% | 10% | – |
| ERA (n) | 70.6% (187) | < 0.1% (26) | – | 47% | 44.2% |
| GC stimulators (n) | 1.5% (4) | < 0.1% (9) | – | – | – |
| PDE5 inhibitors (n) | 84.2% (223) | 0.4% (9,224) | – | 49% | 29.2% |
| Prostanoids (n) | 63.0% (167) | < 0.1% (2) | – | – | 18.8% |
| (n = 8806) | |||||
| RA pressure (mmHg) | 10.1 (6.0) | 8.8 (8.3) | 8 (5.) | 9.3 (5.6) | 10.1 (6.0) |
| Mean PA (mmHg) | 49.8 (13.6) | 27.7 (11.7) | 55 (15) | 50.7 (13.6) | 54.1 (13.9) |
| PWP (mmHg) | 11.3 (6.1) | 15.5 (8.3) | 8.0 (3) | 9.1 (3.5) | 9.2 (3.5) |
| CO, Fick (L/min) | 4.8 (2) | 5.6 (3.7) | – | – | 4.0 (1.5)** |
| CO, TD (L/min) | 4.8 (1.7) | 5.1 (2.4) | – | – | 4.0 (1.5)** |
| CI, Fick (L/min/m²) | 2.5 (1.0) | 2.9 (2.7) | 2.9 (0.9)* | 2.4 (0.8)* | 2.1 (6.3) |
| CI, TD (L/min/m²) | 2.5 (0.8) | 2.6 (2.2) | 2.9 (0.9)* | 2.4 (0.8)* | 2.1 (6.3) |
| PVR (Wood units) | 9.9 (5.9) | 2.6 (2.5) | *** | *** | 12.8 (6.3) |
SD synthetic derivative, CTD connective tissue disease, CHD congenital heart disease, ERA endothelin receptor antagonists, GC guanylate cyclase, PDE5 phosphodiesterase type 5, RA right atrial, PA pulmonary arterial, PWP pulmonary wedge pressure, CO cardiac output, TD thermodilution, CI cardiac index, PVR pulmonary vascular resistance. Data expressed as % (n) or mean (SD) unless otherwise noted. For French Registry [32], all values taken from Table 1. For REVEAL [34], demographics and hemodynamics are taken from Table 1, and medications are taken from Table 2. For UK-Ireland Registry [33], demographics and hemodynamics are taken from Table 1, and medications are taken from Table 2 (values added across all years, n = 479)
*For REVEAL, Fick CI was used unless it was missing, in which case thermodilution CI was used. For French Registry, CI method was not indicated
**Method not indicated
***PVRI was reported for French Registry (M = 20.5, SD = 10.2) and REVEAL (M = 21.1, SD = 12.5)
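For readers relating the hemodynamic rows to one another, the standard definition of pulmonary vascular resistance in Wood units is (mean PA pressure − PWP) / cardiac output. The short Python sketch below applies it to the column means for the RF algorithm cases; the function name is hypothetical, and the source does not describe how its PVR values were derived.

```python
def pvr_wood_units(mean_pa_mmhg, pwp_mmhg, co_l_min):
    """Pulmonary vascular resistance in Wood units (mmHg per L/min)."""
    return (mean_pa_mmhg - pwp_mmhg) / co_l_min


# Column means for the RF algorithm cases: mean PA 49.8, PWP 11.3, CO 4.8.
print(round(pvr_wood_units(49.8, 11.3, 4.8), 1))
```

A value computed from group means need not match the reported mean PVR, because the average of per-patient ratios generally differs from the ratio of the averages.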