Pulkit Singh, Julian Haimovich, Christopher Reeder, Shaan Khurshid, Emily S Lau, Jonathan W Cunningham, Anthony Philippakis, Christopher D Anderson, Jennifer E Ho, Steven A Lubitz, Puneet Batra.
Abstract
BACKGROUND: Cardiac magnetic resonance imaging (CMR) is a powerful diagnostic modality that provides detailed quantitative assessment of cardiac anatomy and function. Automated extraction of CMR measurements from clinical reports, which are typically stored as unstructured text in electronic health record systems, would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require the development of engineered rules that are time-consuming and specific to the setting in which they were developed.
Keywords: cardiac MRI; clinical outcomes; deep learning; machine learning; natural language processing; transformers
Year: 2022 PMID: 35960155 PMCID: PMC9526125 DOI: 10.2196/38178
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. CONSORT (Consolidated Standards of Reporting Trials) diagram for study sample. CMR: cardiac magnetic resonance imaging; EWOC: Enterprise Warehouse of Cardiology.
Baseline characteristics of the training set, test set, and CMR outcomes set.
| Characteristic | Training set | Test set | CMRa outcomes setb |
| Age (years), median (Q1, Q3) | 54 (46, 64) | 58 (45, 66) | 57 (46, 67) |
| Female sex, n (%) | 95 (34.2) | 33 (33) | 3666 (39.5) |
| Diabetes mellitus, n (%) | 23 (8.3) | 10 (10) | 1216 (13.1) |
| Coronary artery disease, n (%) | 69 (24.8) | 31 (31) | 3406 (36.7) |
| Myocardial infarction, n (%) | 42 (15.1) | 15 (15) | 1791 (19.3) |
| Atrial fibrillation, n (%) | 104 (37.4) | 24 (24) | 3164 (34.1) |
| Obesity, n (%) | 12 (4.3) | 7 (7) | 631 (6.8) |
| Chronic kidney disease, n (%) | 26 (9.4) | 7 (7) | 1123 (12.1) |
| Hypertension, n (%) | 130 (46.8) | 55 (55) | 5563 (59.9) |
| Race or ethnicity, n (%) | | | |
| White | 237 (85.3) | 93 (93) | 7814 (84.2) |
| Asian | 14 (5.0) | 1 (1) | 251 (2.7) |
| Black | 13 (4.7) | 2 (2) | 520 (5.6) |
| Other | 7 (2.5) | 1 (1) | 195 (2.1) |
| Hispanic | 4 (1.4) | 0 (0) | 111 (1.2) |
| Unknown | 3 (1.1) | 3 (3) | 390 (4.2) |
aCMR: cardiac magnetic resonance imaging.
bIncludes all individuals in Enterprise Warehouse of Cardiology with a CMR report.
Figure 2. Example text from 3 cardiac magnetic resonance imaging reports (A, B, C) quantifying right ventricular function. The lack of consistency in how equivalent measurements are presented makes accurate extraction challenging. Yellow highlighted features indicate right ventricular end diastolic volume (RVEDV), whereas blue highlighted features indicate right ventricular end diastolic volume index (RVEDVI). Example C does not contain the RVEDVI feature. EDV: end diastolic volume; EF: ejection fraction; ESV: end systolic volume; RVEF: right ventricular ejection fraction; RVESV: right ventricular end systolic volume; RVESVI: right ventricular end systolic volume index; RVSV: right ventricular stroke volume.
Numerical transformations for an example snippet of text.
| Transformation name | Transformed snippet | Notes |
| Original | RVESVa: 51.01 ml | No transformation; for reference |
| Replaced decimal | RVESV: 51|01 ml | Decimal points replaced with special separator character; enables parsing as a single token rather than being broken up |
| Consistent digits | RVESV: 051010 ml | All numbers converted to be 6 digits in length |
| Scientific notation | RVESV: 5.10100e+01 | All numbers converted to scientific notation, with 5 significant digits |
| Words | RVESV: fifty one point zero one ml | Number converted to corresponding word representation |
aRVESV: right ventricular end systolic volume.
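The transformations in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the number-matching regular expression, and the 3-integer-plus-3-decimal layout used for the "consistent digits" form are assumptions inferred from the single example row. The scientific-notation form reproduces the five digits after the decimal point shown in the example; unlike the table row, this sketch keeps the unit.

```python
import re

# Assumed pattern: an integer with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n):
    """Spell out an integer in the range 0-999 (enough for this sketch)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    raise ValueError("only values below 1000 are handled in this sketch")


def replaced_decimal(text, sep="|"):
    """Swap the decimal point for a separator character."""
    return NUMBER.sub(lambda m: m.group().replace(".", sep), text)


def consistent_digits(text):
    """Pad to 3 integer and 3 decimal digits, then drop the point (assumed layout)."""
    return NUMBER.sub(
        lambda m: f"{float(m.group()):07.3f}".replace(".", ""), text)


def scientific(text):
    """Rewrite each number in scientific notation with 5 decimal places."""
    return NUMBER.sub(lambda m: f"{float(m.group()):.5e}", text)


def words(text):
    """Spell the integer part as words and the decimal part digit by digit."""
    def spell(m):
        whole, _, frac = m.group().partition(".")
        out = int_to_words(int(whole))
        if frac:
            out += " point " + " ".join(ONES[int(d)] for d in frac)
        return out
    return NUMBER.sub(spell, text)


snippet = "RVESV: 51.01 ml"
for transform in (replaced_decimal, consistent_digits, scientific, words):
    print(f"{transform.__name__}: {transform(snippet)}")
```

Values of 1000 or more would exceed six digits in `consistent_digits` and raise an error in `words`; a real pipeline would need to decide how to handle them.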
Figure 3. Architecture for fine-tuning a pretrained transformer with gold-standard cardiac magnetic resonance imaging annotations and predicting labels for each token. BERT: Bidirectional Encoder Representations from Transformers; ESV: end systolic volume.
Figure 4. Natural language processing workflow for collecting clinical annotations, modeling, and extracting measurements from cardiac magnetic resonance imaging reports. BERT: Bidirectional Encoder Representations from Transformers; ESV: end systolic volume; CMR: cardiac magnetic resonance imaging; PRAnCER: Platform Enabling Rapid Annotation for Clinical Entity Recognition; RVEDV: right ventricular end diastolic volume; RVESV: right ventricular end systolic volume.
Maximum macroaveraged F1-scores and bootstrapped 95% CIs on gold-standard test labels by pretrained weight initialization and numerical representation.
| Architecture | Numerical representation, maximum macroaveraged F1-score |
| | Original | Replaced decimal | Consistent digits | Scientific | Words |
| PubMedBERTa | 0.954 | 0.952 | 0.950 | 0.955b | 0.953 |
| SapBERT | 0.955 | 0.954 | 0.955 | 0.955 | 0.956b |
| Bio+Discharge | 0.950 | 0.953b | 0.953 | 0.952 | 0.946 |
| BERTLARGE | 0.951 | 0.957b (0.951-0.962) | 0.951 | 0.944 | 0.952 |
aBERT: Bidirectional Encoder Representations from Transformers.
bBest-performing numerical representation for each pretrained weight initialization.
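For readers unfamiliar with the metric in the table above, the following is a minimal sketch of a macroaveraged F1-score with a percentile bootstrap 95% CI. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, resampling is done over individual labeled items, and the source does not state whether the bootstrap was performed over tokens or reports, or which label classes entered the average.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over labels seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for macro F1 (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy labels for illustration only; these are not data from the study.
truth = ["RVESV", "RVESV", "RVEDV", "RVEDV", "O", "O", "O", "O"]
preds = ["RVESV", "RVEDV", "RVEDV", "RVEDV", "O", "O", "O", "RVESV"]
print(macro_f1(truth, preds), bootstrap_ci(truth, preds))
```

Because the per-label scores are averaged without weighting, rare measurement classes count as much as common ones, which is the usual reason for reporting a macroaverage on imbalanced token labels.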
Figure 5. Receiver operating characteristic curves for model predictions on the test set by cardiac magnetic resonance imaging measurement. AUC: area under the receiver operating characteristic curve.
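As a reminder of what the AUC in Figure 5 summarizes, here is a minimal sketch that computes it as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The function name and the toy inputs are illustrative assumptions; the source does not describe how its AUC values were computed.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Each positive/negative pair contributes 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores for illustration only; these are not data from the study.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of examples, so a rank-based implementation would be preferable for large test sets.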
Figure 6. Fine-tuned BERTLARGE performance with replaced decimal numerical representations, as a function of the number of annotated reports in the training set.
Figure 7. Association of extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction with clinical outcomes.