Karen E Batch, Jianwei Yue, Alex Darcovich, Kaelan Lupton, Corinne C Liu, David P Woodlock, Mohammad Ali K El Amine, Pamela I Causa-Andrieu, Lior Gazit, Gary H Nguyen, Farhana Zulkernine, Richard K G Do, Amber L Simpson.
Abstract
The development of digital cancer twins relies on the capture of high-resolution representations of individual cancer patients throughout the course of their treatment. Our research aims to improve the detection of metastatic disease over time from structured radiology reports by exposing prediction models to historical information. We demonstrate that natural language processing (NLP) can generate better weak labels for semi-supervised classification of computed tomography (CT) reports when it is exposed to consecutive reports through a patient's treatment history. In total, 714,454 structured radiology reports from Memorial Sloan Kettering Cancer Center, adhering to a standardized departmental structured template, were used for model development, with a subset of the reports held out for validation. To develop the models, a subset of the reports was curated for ground truth: 7,732 reports in the lung metastases dataset from 867 individual patients; 2,777 reports in the liver metastases dataset from 315 patients; and 4,107 reports in the adrenal metastases dataset from 404 patients. We use NLP to extract and encode important features from the structured text reports, which are then used to develop, train, and validate models. Three models were developed to classify the type of metastatic disease and validated against the ground truth labels: a simple convolutional neural network (CNN), a CNN augmented with an attention layer, and a recurrent neural network (RNN). The models use features from consecutive structured text radiology reports of a patient to predict the presence of metastatic disease in the reports. A single-report model, previously developed to analyze one report at a time rather than multiple past reports, is included as a baseline, and the results from all four models are compared on accuracy, precision, recall, and F1-score. The best model is used to label all 714,454 reports and generate metastases maps.
Our results suggest that NLP models can extract cancer progression patterns from multiple consecutive reports and predict the presence of metastatic disease in multiple organs with higher performance than single-report-based prediction. This demonstrates a promising automated approach to labeling large numbers of radiology reports in a time- and cost-effective manner without involving human experts, and it enables tracking of cancer progression over time.
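As a concrete illustration of the single-report weak-labeling idea, a TF-IDF representation of report text can be paired with a linear classifier. This sketch uses scikit-learn with made-up toy reports and labels, and a plain logistic-regression classifier rather than the authors' ensemble:

```python
# Hypothetical sketch of single-report classification with TF-IDF features.
# The reports, labels, and classifier choice are illustrative assumptions,
# not the authors' exact pipeline or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "Lungs: No suspicious findings.",
    "Lungs: New nodule in the right lower lobe, suspicious for metastasis.",
]
labels = [0, 1]  # 0 = no metastases described, 1 = metastases described

# Vectorize unigrams and bigrams, then fit a linear classifier on the labels.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
baseline.fit(reports, labels)
prediction = baseline.predict(["Lungs: Growing nodule, suspicious for metastasis."])
```

In the multi-report setting studied here, the input would instead be features drawn from several consecutive reports for the same patient.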
Keywords: cancer; convolutional neural network (CNN); digital twins; machine learning; metastases; natural language processing (NLP); radiology; recurrent neural network (RNN)
Year: 2022 PMID: 35310959 PMCID: PMC8924403 DOI: 10.3389/frai.2022.826402
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Figure 1. Example report of a chest CT following the template implemented in July 2009. The “Findings” section contains observations specific to each organ site, while the “Impression” section can contain observations pertaining to any organ.
Model performance results for the baseline single-report metastases prediction model and the three novel multi-report metastases prediction models.

| Model | Metric | Lung train | Lung test | Lung validation | Liver train | Liver test | Liver validation | Adrenal train | Adrenal test | Adrenal validation |
|---|---|---|---|---|---|---|---|---|---|---|
| TF-IDF ensemble model (baseline) | Accuracy | 99.69% (±0.15%) | 99.23% (±0.32%) | 92.33% (±1.53%) | 90.12% (±2.86%) | 96.60% (±1.43%) | 93.80% (±1.39%) | 92.50% (±2.53%) | 96.10% (±1.53%) | |
| | Precision | 0.9977 (±0.00) | | | 0.8553 (±0.02) | 0.9060 (±0.03) | | 0.9080 (±0.02) | 0.8990 (±0.03) | |
| | Recall | 0.9833 (±0.00) | 0.9983 (±0.00) | 0.8932 (±0.01) | 0.6733 (±0.03) | 0.7794 (±0.04) | 0.4595 (±0.04) | 0.6860 (±0.03) | 0.8310 (±0.04) | 0.5000 (±0.04) |
| | F1-score | 0.9904 (±0.00) | 0.9436 (±0.01) | | 0.7535 (±0.02) | 0.8379 (±0.04) | 0.6182 (±0.04) | 0.7815 (±0.02) | 0.8637 (±0.03) | 0.6667 (±0.04) |
| Simple CNN | Accuracy | 99.93% (±5.21%) | 99.85% (±7.59%) | | | | 96.64% (±1.04%) | 98.56% (±1.14%) | 99.51% (±0.55%) | |
| | Precision | 0.9956 (±0.00) | 0.9950 (±0.00) | | 0.9851 (±0.01) | 0.9429 (±0.02) | 0.9746 (±0.02) | 0.9592 (±0.02) | | |
| | Recall | | | 0.8960 (±0.02) | 0.9706 (±0.02) | 0.8564 (±0.02) | 0.9746 (±0.02) | | | |
| | F1-score | 0.9978 (±0.00) | 0.9975 (±0.00) | 0.9234 (±0.02) | 0.9778 (±0.01) | 0.8920 (±0.02) | 0.9746 (±0.02) | 0.9691 (±0.02) | | |
| Augmented CNN | Accuracy | 99.90% (±0.14%) | 99.97% (±0.06%) | | 98.87% (±0.83%) | 96.81% (±1.01%) | | | | |
| | Precision | 0.9966 (±0.00) | 0.9952 (±0.00) | 0.9388 (±0.01) | 0.9710 (±0.02) | 0.9167 (±0.02) | 0.9467 (±0.01) | | | |
| | Recall | | | | | | 0.8511 (±0.02) | | | |
| | F1-score | 0.9983 (±0.00) | 0.9976 (±0.00) | | 0.9781 (±0.01) | 0.9041 (±0.02) | | | | |
| Bidirectional LSTM | Accuracy | 97.97% (±0.38%) | 99.23% (±0.39%) | 99.72% (±0.19%) | 96.66% (±1.03%) | 98.70% (±0.89%) | | 98.32% (±1.23%) | 99.03% (±0.77%) | |
| | Precision | 0.9052 (±0.01) | 0.9798 (±0.01) | 0.9660 (±0.01) | 0.8465 (±0.02) | 0.8919 (±0.02) | 0.8404 (±0.02) | 0.9661 (±0.02) | 0.9375 (±0.02) | |
| | Recall | 0.9366 (±0.01) | 0.9873 (±0.00) | 0.9803 (±0.01) | 0.8976 (±0.02) | 0.9781 (±0.01) | | 0.9702 (±0.02) | 0.9375 (±0.02) | |
| | F1-score | 0.9206 (±0.01) | 0.9835 (±0.01) | 0.9731 (±0.01) | 0.8713 (±0.02) | 0.8919 (±0.02) | 0.8717 (±0.02) | 0.9682 (±0.02) | 0.9375 (±0.02) | |
Organ datasets are split into three subsets for training (70%), testing (15%), and validation (15%). Values in parentheses are 95% confidence intervals, rounded to two decimal places; cells left blank did not survive extraction.
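The four metrics reported in the table can be reproduced from ground-truth and predicted labels, for example with scikit-learn; the label vectors below are toy data, not values from the study:

```python
# Computing accuracy, precision, recall, and F1-score for a binary
# metastases label; y_true and y_pred are made-up illustrative vectors.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # radiologist ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2%}")   # 6 of 8 correct
print(f"Precision: {precision_score(y_true, y_pred):.4f}")  # TP / (TP + FP)
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")     # TP / (TP + FN)
print(f"F1-score:  {f1_score(y_true, y_pred):.4f}")         # harmonic mean
```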
Figure 2. The architectures of the three multi-report prediction models. (A) The Simple CNN architecture, consisting of the embedding layer, 1D convolutional layer, max pooling layer, and dense layers. (B) The Augmented CNN, consisting of the same architecture as the Simple CNN with an added attention layer before the max pooling layer. (C) The Bi-LSTM, with the two LSTM layers processing inputs in opposite directions.
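A minimal sketch of two of the described architectures (the Simple CNN and the Bi-LSTM), written here in PyTorch; the vocabulary size, embedding dimension, filter counts, and sequence length are illustrative assumptions, not the authors' hyperparameters:

```python
# Sketch of the Simple CNN and Bi-LSTM report classifiers; all sizes are
# assumed for illustration, not taken from the paper.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Embedding -> 1D convolution -> global max pooling -> dense layers."""
    def __init__(self, vocab_size=5000, embed_dim=64, n_filters=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=5, padding=2)
        self.fc = nn.Sequential(nn.Linear(n_filters, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))             # (batch, n_filters, seq_len)
        x = x.max(dim=2).values                  # global max pooling
        return torch.sigmoid(self.fc(x))         # probability of metastases

class BiLSTM(nn.Module):
    """Embedding -> bidirectional LSTM -> dense layer on the last time step."""
    def __init__(self, vocab_size=5000, embed_dim=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, tokens):
        x, _ = self.lstm(self.embed(tokens))     # (batch, seq_len, 2*hidden)
        return torch.sigmoid(self.fc(x[:, -1]))  # last step -> probability

# Features from consecutive reports would be concatenated into one sequence;
# here, a random token batch of 4 sequences of length 200 stands in for it.
batch = torch.randint(0, 5000, (4, 200))
cnn_out, lstm_out = SimpleCNN()(batch), BiLSTM()(batch)
```

The Augmented CNN would add an attention layer between the convolution and the pooling step, as panel (B) describes.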
Figure 3. Lung metastases developing on serial CT scans (arrows). Axial images of the lung from three consecutive CT scans, showing the development of a lung nodule in the posterior right lower lobe (A–C). A separate nodule in the anterior right lower lobe also grew between the second (D) and third CT scan (E). A third nodule appeared in the left lower lobe on the third scan only (F). The first CT was negative for metastasis (A), with the text in the “Lungs” section of the findings reading “No suspicious findings.” The model correctly predicted that no metastases were described, with 100% confidence. The second CT (B,D) was labeled as positive for metastases by the radiologist, who had access to all three scans, but negative by the CNN (with a confidence of 99.60%), which only had reports for the first two. The third CT (C,E,F) was labeled as positive by both the radiologist and the CNN (with 100% confidence).