Jason A Fries, Paroma Varma, Vincent S Chen, Ke Xiao, Heliodoro Tejeda, Priyanka Saha, Jared Dunnmon, Henry Chubb, Shiraz Maskatia, Madalina Fiterau, Scott Delp, Euan Ashley, Christopher Ré, James R Priest.
Abstract
Biomedical repositories such as the UK Biobank provide increasing access to prospectively collected cardiac imaging; however, these data are unlabeled, which creates barriers to their use in supervised machine learning. We develop a weakly supervised deep learning model for classification of aortic valve malformations using up to 4,000 unlabeled cardiac MRI sequences. Instead of requiring highly curated training data, weak supervision relies on noisy heuristics defined by domain experts to programmatically generate large-scale, imperfect training labels. For aortic valve classification, models trained with imperfect labels substantially outperform a supervised model trained on hand-labeled MRIs. In an orthogonal validation experiment using health outcomes data, our model identifies individuals with a 1.8-fold increase in risk of a major adverse cardiac event. This work formalizes a deep learning baseline for aortic valve classification and outlines a general strategy for using weak supervision to train machine learning models on unlabeled medical images at scale.
Year: 2019 PMID: 31308376 PMCID: PMC6629670 DOI: 10.1038/s41467-019-11012-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Weak supervision scale-up performance metrics. Metrics include (a) positive predictive value (precision); (b) sensitivity (recall); (c) area under the ROC curve (AUROC); and (d) normalized discounted cumulative gain (NDCG). The y-axis is the score in [0,100] and the x-axis is the number of unlabeled MRIs used for training. The dashed horizontal line indicates the expert-labeled baseline model with augmentations. Shaded regions and gray horizontal lines indicate 95% confidence intervals (where n = the number of unlabeled training MRIs)
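Of the four metrics in the caption above, NDCG is the least familiar: it rewards rankings that place true positives near the top. The following is a minimal sketch for binary relevance, assuming the common log2 discount (the source does not state which NDCG variant was used). The function names and example scores are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(scores, labels):
    """NDCG for binary labels ranked by descending model score."""
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    ideal_dcg = dcg(sorted(labels, reverse=True))
    # With no positive labels the ideal DCG is zero, so return 0 by convention.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical scores: one positive case ranked first, the other ranked last.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 0, 1]
example_ndcg = ndcg(example_scores, example_labels)
```

A perfect ranking scores 1.0, and pushing a positive case further down the list lowers the score by a logarithmically shrinking amount.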
Best weak supervision vs. hand labeled models
| Model | Size | Precision | Recall | F1 | AUROC | NDCG |
|---|---|---|---|---|---|---|
| HL | 106 | 10.0 [1.3, 18.7] | 20.0 [5.4, 34.6] | 12.8 [2.5, 23.1] | 85.4 [80.8, 90.0] | 40.6 [36.4, 44.9] |
| HL + Aug. | 106 | 30.7 [20.8, 40.6] | 53.3 [38.7, 68.0] | 37.8 [27.7, 47.9] | 83.4 [79.5, 87.3] | 55.7 [51.5, 59.9] |
| WS | 4239 | | 53.3 [38.7, 68.0] | 60.8 [50.6, 71.0] | 91.4 [87.8, 95.0] | 84.5 [81.1, 88.0] |
| WS + Aug. | 4239 | 70.0 [55.4, 84.6] | | | | |
WS indicates weak supervision models, HL indicates hand-labeled models, and Aug. indicates augmentation. Scores are computed with 95% confidence intervals (where n = the size column), with bold text indicating best performance overall
Frame-level labeling function performance metrics
| Labeling functions | Coverage (%) | Conflict (%) | Pos. Acc. | Neg. Acc. | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| LF_Area | 22.6 | 11.5 | 76.5 | 62.9 | 25.0 | 31.0 | 27.7 |
| LF_Perimeter | 9.8 | 8.0 | 100.0 | 0.0 | 20.8 | 26.2 | 23.2 |
| LF_Eccentricity | 87.4 | 38.9 | 85.7 | 42.3 | 12.7 | 85.7 | 22.1 |
| LF_Intensity | 28.9 | 24.1 | 0.0 | 69.0 | 0.0 | 0.0 | 0.0 |
| LF_Ratio | 90.4 | 41.7 | 67.5 | 49.6 | 10.7 | 64.3 | 18.3 |
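The coverage and conflict columns above can be read with the definitions common in weak supervision tooling, which this sketch assumes: coverage is the share of frames on which a labeling function votes rather than abstains, and conflict is the share on which it votes while at least one other function casts a different vote. The vote encoding and the vote matrix are hypothetical.

```python
ABSTAIN = 0  # assumed encoding: +1 = BAV, -1 = TAV, 0 = abstain

def coverage(votes, j):
    """Fraction of rows where labeling function j does not abstain."""
    return sum(row[j] != ABSTAIN for row in votes) / len(votes)

def conflict(votes, j):
    """Fraction of rows where function j votes and another function disagrees."""
    n_conflict = 0
    for row in votes:
        if row[j] == ABSTAIN:
            continue
        others = [v for k, v in enumerate(row) if k != j and v != ABSTAIN]
        if any(v != row[j] for v in others):
            n_conflict += 1
    return n_conflict / len(votes)

# Hypothetical votes: rows are frames, columns are three labeling functions.
votes = [
    [1, 1, 0],
    [1, -1, 0],
    [0, -1, -1],
    [-1, 0, 1],
]
lf0_coverage = coverage(votes, 0)
lf0_conflict = conflict(votes, 0)
```

Under these definitions conflict can never exceed coverage, which matches every row of the table.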
Fig. 2 Unadjusted survival from MACE in 9230 participants stratified by model classification. MACE occurred in 59 of 570 individuals (10.4%) classified as BAV compared to 511 of 8660 individuals (5.9%) classified as TAV over the course of a median 19 years of follow-up (hazard ratio 1.8; 95% confidence interval 1.3–2.4; p = 8.83e−05, log-rank test)
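The event rates in the caption above can be checked with simple arithmetic. The sketch below recomputes the per-group incidence and the crude risk ratio. The crude ratio is not the reported hazard ratio, which comes from a time-to-event analysis, so the two should be close but need not match exactly.

```python
# Counts taken from the Fig. 2 caption.
bav_events, bav_total = 59, 570
tav_events, tav_total = 511, 8660

bav_rate = bav_events / bav_total  # proportion with MACE among BAV-classified
tav_rate = tav_events / tav_total  # proportion with MACE among TAV-classified
crude_risk_ratio = bav_rate / tav_rate

print(f"BAV: {bav_rate:.1%}  TAV: {tav_rate:.1%}  crude RR: {crude_risk_ratio:.2f}")
# prints "BAV: 10.4%  TAV: 5.9%  crude RR: 1.75"
```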
Fig. 3 Patient clustering visualization. t-SNE visualization of the last hidden layer outputs of the CNN-LSTM model as applied to 9230 patient MRI sequences and (a–d) frames capturing peak flow through the aorta for a random sample of patients. Blue and orange dots represent TAV and BAV cases. The model clusters MRIs based on aortic shape and temporal dynamics captured by the LSTM. The top example box (a) contains clear TAV cases with very circular flow shapes, with (b) and (c) becoming more irregular in shape until (d) shows highly irregular flow typical of BAV. Misclassifications of BAV (red boxes) generally occur when the model fails to differentiate regurgitation of the aortic valve and turbulent blood flow through a normal-appearing aortic valve orifice
Prediction set validation
| | Q1 | Q2 | Q3 | Q4 | Overall |
|---|---|---|---|---|---|
| | 24% (6) | 28% (7) | 36% (9) | 24% (6) | 28% |
| | 48% (12) | 44% (11) | 40% (10) | 56% (14) | 47% |
| | 28% (7) | 28% (7) | 24% (6) | 20% (5) | 25% |
| | 40% (10) | 44% (11) | 28% (7) | 36% (9) | 37% |
| | 4% (1) | 8% (2) | 16% (4) | 16% (4) | 11% |
| | 16% (4) | 4% (1) | 20% (5) | 24% (6) | 16% |
| | 40% (10) | 40% (10) | 20% (5) | 36% (9) | 34% |
| | 4% (1) | 4% (1) | 0% (0) | 4% (1) | 3% |
| Total Subjects | 25 | 25 | 25 | 25 | 100 |
Bold rows are disjoint category counts for true BAV, confounding non-BAV valve pathologies, and imaging artifacts. Italicized rows contain categories where counts may overlap with non-BAV valve pathologies and image artifacts
Fig. 4 Example MRI sequence data for BAV and TAV subjects. (a) Uncropped MRI frames for CINE, MAG, and VENC series in an oblique coronal view of the thorax centered upon an en face view of the aortic valve at the sinotubular junction (red boxes). (b) 15-frame subsequence of a phase-contrast MRI for all series, with peak frame outlined in blue. MAG frames at peak flow for 12 patients, broken down by class: (c) bicuspid aortic valve (BAV) and (d) tricuspid aortic valve (TAV)
Fig. 5 Aorta localization. (a) Uncropped MAG series MRI frame, showing 0–1 normalized, per-pixel standard deviation. (b) Green box is a zoom of the heart region and the red box corresponds to the aorta, the highest weighted pixel area in the image. (c) A box and whisker plot of per-frame standard deviations for all 4239 MRI sequences in the weak training set. Here the blue box represents the interquartile range (between the first and third quartiles), the black line is the median value, and the whiskers map to the minimum and maximum values across all frames at a given index. Note that the most variation occurs in the first 15 frames
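The localization idea in the caption above can be illustrated with a small sketch: compute the per-pixel standard deviation across frames, scale it to [0, 1], and pick the region where it is highest. The sliding-window search and the function names are assumptions for illustration. The source only states that the aorta corresponds to the highest weighted pixel area.

```python
from statistics import pstdev

def pixel_std_map(frames):
    """Per-pixel standard deviation across frames, min-max scaled to [0, 1]."""
    h, w = len(frames[0]), len(frames[0][0])
    std = [[pstdev([f[r][c] for f in frames]) for c in range(w)] for r in range(h)]
    flat = [v for row in std for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid dividing by zero on a constant sequence
    return [[(v - lo) / scale for v in row] for row in std]

def best_window(std_map, size):
    """Top-left (row, col) of the size x size window with the largest summed std."""
    h, w = len(std_map), len(std_map[0])
    best, best_pos = -1.0, (0, 0)
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            total = sum(std_map[r + i][c + j] for i in range(size) for j in range(size))
            if total > best:
                best, best_pos = total, (r, c)
    return best_pos

# Hypothetical 3-frame, 4x4 sequence: only the lower-right 2x2 block changes over time.
frames = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
for t, frame in enumerate(frames):
    for r in (2, 3):
        for c in (2, 3):
            frame[r][c] = float(t)
aorta_corner = best_window(pixel_std_map(frames), size=2)
```

On real images an array library would replace the nested loops, but the logic is the same: pulsatile flow makes the aorta the most temporally variable region.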
Fig. 6 Weak supervision workflow. Pipeline for probabilistic training label generation based on user-defined primitives and labeling functions. Primitives and labeling functions (step 1) are used to weakly supervise the BAV classification task and programmatically generate probabilistic training data from large collections of unlabeled MRI sequences (step 2), which are then used to train a noise-aware deep learning classification model (step 3)
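The three steps in the caption above can be sketched end to end. This is a toy illustration under stated assumptions, not the authors' pipeline: the thresholds are made up, and an unweighted vote passed through a sigmoid stands in for a learned label model that would weight each labeling function by its estimated accuracy. The primitive names follow the labeling function table.

```python
import math

ABSTAIN, TAV, BAV = 0, -1, 1

# Step 1: labeling functions over primitives (hypothetical thresholds).
def lf_area(p):
    return BAV if p["area"] > 0.6 else ABSTAIN

def lf_eccentricity(p):
    return BAV if p["eccentricity"] > 0.7 else TAV

def lf_ratio(p):
    return TAV if p["ratio"] < 0.5 else ABSTAIN

LABELING_FUNCTIONS = [lf_area, lf_eccentricity, lf_ratio]

# Step 2: turn votes into a probabilistic label P(BAV).
# Stand-in aggregation: sigmoid of the unweighted vote sum.
def probabilistic_label(primitives):
    total = sum(lf(primitives) for lf in LABELING_FUNCTIONS)
    return 1.0 / (1.0 + math.exp(-total))

# Step 3: noise-aware loss, here cross-entropy against the soft label.
def noise_aware_loss(predicted_prob, soft_label, eps=1e-12):
    p = min(max(predicted_prob, eps), 1.0 - eps)
    return -(soft_label * math.log(p) + (1.0 - soft_label) * math.log(1.0 - p))

example = {"area": 0.8, "eccentricity": 0.9, "ratio": 0.7}
soft = probabilistic_label(example)   # two BAV votes, one abstain
loss = noise_aware_loss(0.9, soft)
```

Training against soft labels lets the classifier discount examples where the labeling functions disagree, since those receive targets closer to 0.5.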
Fig. 7 Deep neural network for MRI sequence classification. Each MRI frame is encoded by the DenseNet into a feature vector f. These frame features are fed in sequentially to the LSTM sequence encoder, which uses a soft attention layer to learn a weighted mean embedding of all frames, S_emb. This forms the final feature vector used for binary classification
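The weighted mean embedding in the caption above can be illustrated with a small sketch of soft attention pooling over per-frame hidden states. Scoring frames by a dot product with an attention vector is an assumption for illustration: the source only states that a soft attention layer learns the weights. In a trained model the hidden states would come from the LSTM and the attention vector would be learned.

```python
import math

def soft_attention_pool(hidden_states, attention_vector):
    """Weighted mean of per-frame hidden states with softmax attention weights."""
    # Score each frame by a dot product with the attention vector.
    scores = [sum(h_i * a_i for h_i, a_i in zip(h, attention_vector))
              for h in hidden_states]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted sum of hidden states gives the sequence embedding.
    dim = len(hidden_states[0])
    embedding = [sum(w * h[d] for w, h in zip(weights, hidden_states))
                 for d in range(dim)]
    return embedding, weights

# Hypothetical 3-frame sequence with 2-dimensional hidden states.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = [2.0, 0.0]
s_emb, frame_weights = soft_attention_pool(hidden, attn)
```

Because the weights sum to one, the pooled vector stays on the same scale regardless of sequence length, and frames with higher scores contribute more to the final feature vector.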