Grey Kuling1, Belinda Curpen2, Anne L Martel1.
Abstract
Radiology reports are one of the main forms of communication between radiologists and other clinicians, and contain important information for patient care. To use this information for research and automated patient care programs, the raw text must be converted into structured data suitable for analysis. State-of-the-art domain-specific contextual word embeddings from natural language processing (NLP) have been shown to achieve impressive accuracy for these tasks in medicine, but have yet to be utilized for section structure segmentation. In this work, we pre-trained a contextual embedding BERT model using breast radiology reports and developed a classifier that incorporated the embedding with auxiliary global textual features in order to perform section segmentation. This model achieved 98% accuracy in segregating free-text reports, sentence by sentence, into sections of information outlined in the Breast Imaging Reporting and Data System (BI-RADS) lexicon, which is a significant improvement over the classic BERT model without auxiliary information. We then evaluated whether using section segmentation improved the downstream extraction of clinically relevant information such as modality/procedure, previous cancer, menopausal status, purpose of exam, breast density, and breast MRI background parenchymal enhancement (BPE). Using the BERT model pre-trained on breast radiology reports, combined with section segmentation, resulted in an overall accuracy of 95.9% in the field extraction tasks. This is an improvement of 17.0 percentage points over the overall accuracy of 78.9% for field extraction with models using classic BERT embeddings and not using section segmentation. Our work shows the strength of using BERT in the analysis of radiology reports and the advantages of section segmentation by identifying the key features of patient factors recorded in breast radiology reports.
Keywords: BERT; BI-RADS; deep learning; natural language processing
Year: 2022 PMID: 35621895 PMCID: PMC9148091 DOI: 10.3390/jimaging8050131
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. This project aims to improve health indicator field extraction tasks by using section segmentation to narrow down the free-text report length. When a radiologist reads a report, they can divide the report into sections that are useful for finding specific information. With a classic BERT framework, the report is fed into the model without being narrowed into sections, which leaves the model uncertain about where the information is located. Using a BI-RADS BERT model to segment sections before field extraction, we achieve higher performance.
Dataset statistics.
| Dataset Name | Number of Patients | Number of Exams | Avg. Exams ± St.D. per Patient | Exam Date Range |
|---|---|---|---|---|
| Breast Imaging Radiology Reports A | 7917 | 80,648 | 10.2 ± 7.0 | 2005–2020 |
| Breast Imaging Radiology Reports B | 26,390 | 98,748 | 3.7 ± 3.0 | 2014–2018 |
Figure 2. Visual representation of the model architectures used for classification. (A) Text sequence classifier: this model takes a contextual embedding of the input text using a BERT architecture and then feeds the embedding into a fully connected linear layer to output a classification. (B) Text sequence classifier with auxiliary data: this model uses an auxiliary feature encoder to build an encoded auxiliary data vector that is concatenated with the contextual embedding for classification. (C) Auxiliary data encoder architecture: this encoder includes 3 fully connected layers followed by a Tanh activation function.
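The data flow of panels (B) and (C) can be sketched in a few lines of plain Python. This is a shape-level illustration only, not the authors' implementation: the layer sizes, the random weights, and the class names `AuxEncoder` and `ClassifierWithAux` are hypothetical, the contextual embedding is a stand-in vector rather than real BERT output, and a single Tanh after the third layer is an assumed reading of the caption.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one layer (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class AuxEncoder:
    """Panel C: three fully connected layers followed by a Tanh."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.layers = [
            init_layer(n_in, n_hidden, rng),
            init_layer(n_hidden, n_hidden, rng),
            init_layer(n_hidden, n_out, rng),
        ]

    def __call__(self, aux):
        h = aux
        for weights, bias in self.layers:
            h = linear(h, weights, bias)
        # Assumption: one Tanh applied after the third layer.
        return [math.tanh(v) for v in h]


class ClassifierWithAux:
    """Panel B: concatenate [embedding ; encoded aux], then one linear layer."""

    def __init__(self, emb_dim, aux_dim, aux_hidden, aux_out, n_classes, seed=0):
        rng = random.Random(seed)
        self.aux_encoder = AuxEncoder(aux_dim, aux_hidden, aux_out, rng)
        self.head = init_layer(emb_dim + aux_out, n_classes, rng)

    def __call__(self, embedding, aux):
        features = list(embedding) + self.aux_encoder(aux)
        logits = linear(features, *self.head)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        return predicted, logits


# Toy dimensions; a real BERT-base embedding would be much wider.
model = ClassifierWithAux(emb_dim=8, aux_dim=5, aux_hidden=4, aux_out=3, n_classes=6)
predicted, logits = model([0.1] * 8, [1.0, 0.0, 0.0, 2.0, 10.0])
```

Dropping the `aux_encoder` and feeding only the embedding to the linear head gives the simpler classifier of panel (A).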
Figure 3. Histograms of the labels for each field extracted from the breast radiology reports fine-tuning dataset. Each task suffers from a dominating label, which makes G.F1 a better measure of performance than accuracy.
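A small worked example shows why accuracy is misleading when one label dominates. G.F1 is not defined in this excerpt, so the sketch below uses macro-averaged F1 as a stand-in for an F1-based metric; the label names and counts are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical field with a dominating label: 95 "negative", 5 "positive".
y_true = ["negative"] * 95 + ["positive"] * 5
# A degenerate classifier that always predicts the dominating label.
y_pred = ["negative"] * 100

acc = accuracy(y_true, y_pred)   # 0.95
f1 = macro_f1(y_true, y_pred)    # about 0.487
```

The always-"negative" classifier scores 95% accuracy while never finding a positive case; the F1-based score exposes this because the minority label contributes an F1 of zero.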
Results of 5-fold cross validation section segmentation, evaluated with average accuracy with standard deviation (Std.Dev.) and average G.F1 with standard deviation (Std.Dev.) across all 900 reports used for fine-tuning. Aux. Data = the classification of the previous sentence in the report, the number of the given sentence it is classifying, and the total number of sentences in the report.
| Aux. Data | Base Model Fine-Tuned | Avg. Acc. ± Std.Dev. | Avg. G.F1 ± Std.Dev. |
|---|---|---|---|
| Without Aux. Data | Classic BERT | | |
| | BioClinical BERT | | |
| | BI-RADS BERT | | |
| With Aux. Data | Classic BERT | | |
| | BioClinical BERT | | |
| | BI-RADS BERT | | |
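The three auxiliary inputs named in the caption (the previous sentence's classification, the sentence number, and the total number of sentences) can be assembled as sketched below. The one-hot encoding, the unscaled counts, the function names, and the use of the predicted previous label at inference time are assumptions made for illustration; `SECTIONS` lists only the section names that appear in this document's tables, not the full label set.

```python
# Partial label set: only the section names visible in this document's tables.
SECTIONS = ["Title", "History/Cl. Ind.", "Findings"]


def build_aux_features(prev_section, sentence_index, total_sentences, sections=SECTIONS):
    """Return [one-hot previous section..., sentence number, total sentences].

    prev_section is None for the first sentence, giving an all-zero one-hot part.
    """
    one_hot = [1.0 if prev_section == s else 0.0 for s in sections]
    return one_hot + [float(sentence_index), float(total_sentences)]


def segment_report(sentences, classify):
    """Label sentences in order, feeding each prediction into the next step.

    classify is a hypothetical callable (sentence, aux_features) -> section name.
    """
    labels = []
    prev = None
    for i, sentence in enumerate(sentences, start=1):
        aux = build_aux_features(prev, i, len(sentences))
        prev = classify(sentence, aux)
        labels.append(prev)
    return labels


# Stub classifier for demonstration: first sentence is the title, the rest findings.
def stub_classify(sentence, aux):
    return "Title" if aux[3] == 1.0 else "Findings"


demo_labels = segment_report(
    ["BILATERAL BREAST MRI", "No suspicious mass.", "No adenopathy."], stub_classify
)
```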
Results of 5-fold cross validation section segmentation when trained using 10% of training data, evaluated with average accuracy with standard deviation (Std.Dev.) and average G.F1 with standard deviation (Std.Dev.) across all 900 reports used for fine-tuning. Aux. Data = the classification of the previous sentence in the report, the number of the given sentence it is classifying, and the total number of sentences in the report.
| Aux. Data | Base Model Fine-Tuned | Avg. Acc. ± Std.Dev. | Avg. G.F1 ± Std.Dev. |
|---|---|---|---|
| Without Aux. Data | Classic BERT | | |
| | BioClinical BERT | | |
| | BI-RADS BERT | | |
| With Aux. Data | Classic BERT | | |
| | BioClinical BERT | | |
| | BI-RADS BERT | | |
Results of 5-fold cross validation field extraction without section segmentation, evaluated with average accuracy with standard deviation (Std.Dev.) and average G.F1 with standard deviation (Std.Dev.) across all 5 folds.
| Field | Classic | BioClinical | BI-RADS |
|---|---|---|---|
| Modality/Procedure | | | |
| Previous Cancer | | | |
| Menopausal Status | | | |
| Purpose | | | |
| Density | | | |
| BPE | | | |
Results of 5-fold cross validation field extraction with section segmentation, evaluated with average accuracy with standard deviation (Std.Dev.) and average G.F1 with standard deviation (Std.Dev.) across all 5 folds.
| Field | SL | Section Used | Training Set Size ( | Classic | BioClinical | BI-RADS |
|---|---|---|---|---|---|---|
| Modality/Procedure | 128 | Title | 900 | | | |
| Previous Cancer | 32 | History/Cl. Ind. | 613 | | | |
| Menopausal Status | 128 | History/Cl. Ind. | 613 | | | |
| Purpose | 32 | History/Cl. Ind. | 613 | | | |
| Density | 32 | Findings | 897 | | | |
| BPE | 32 | Findings | 897 | | | |
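The field-to-section pairing in the table above can be written as a lookup that narrows a segmented report to the one section a field extractor needs. This is a minimal sketch: the mapping is taken from the table, while the function name and the example sentences and labels are hypothetical.

```python
# Section used for each field, as listed in the table above.
FIELD_TO_SECTION = {
    "Modality/Procedure": "Title",
    "Previous Cancer": "History/Cl. Ind.",
    "Menopausal Status": "History/Cl. Ind.",
    "Purpose": "History/Cl. Ind.",
    "Density": "Findings",
    "BPE": "Findings",
}


def select_section_text(sentences, section_labels, field):
    """Keep only the sentences whose predicted section matches the field's section."""
    wanted = FIELD_TO_SECTION[field]
    return " ".join(s for s, lab in zip(sentences, section_labels) if lab == wanted)


# Hypothetical report, already segmented sentence by sentence.
sentences = [
    "BILATERAL BREAST MRI.",
    "Screening, high risk.",
    "Post-menopausal.",
    "Mild background parenchymal enhancement.",
]
section_labels = ["Title", "History/Cl. Ind.", "History/Cl. Ind.", "Findings"]

bpe_input = select_section_text(sentences, section_labels, "BPE")
purpose_input = select_section_text(sentences, section_labels, "Purpose")
```

The field extraction model then sees only `bpe_input` or `purpose_input` rather than the whole report, which is the narrowing step described in Figure 1.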
Example of WordPiece tokenizer results for the word "mammogram".
| Model | WordPiece Tokenizer Vector |
|---|---|
| Classic | |
| BioClinical | |
| BI-RADS | |
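WordPiece splits a word by greedy longest-match-first lookup against a vocabulary, marking continuation pieces with a `##` prefix. The sketch below shows how the same word can come out as several pieces or as one, depending on the vocabulary. Both vocabularies are toy examples invented for illustration; they are not the vocabularies of the three models, and the outputs are not the values missing from the table.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabularies (hypothetical).
general_vocab = {"mam", "##mo", "##gram"}
domain_vocab = {"mammogram", "mam", "##mo", "##gram"}

general_pieces = wordpiece_tokenize("mammogram", general_vocab)
domain_pieces = wordpiece_tokenize("mammogram", domain_vocab)
```

With the general-purpose toy vocabulary the word is split into three sub-word pieces; with the domain vocabulary it is kept as a single token.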
Results of field extraction with section segmentation.
| Task | Section Used | Data Size ( | Max Seq | Classic Acc. (G. F1) | BioClinical Acc. (G. F1) | BI-RADS Acc. (G. F1) |
|---|---|---|---|---|---|---|
| Modality Procedure | Title | 900 | 32 | | | |
| | | | 128 | | | |
| | | | 512 | | | |
| Previous Cancer | History/Cl. Indication | 613 | 32 | | | |
| | | | 128 | | | |
| | | | 512 | | | |
| Menopausal Status | History/Cl. Indication | 613 | 32 | | | |
| | | | 128 | | | |
| | | | 512 | | | |
| Purpose | History/Cl. Indication | 613 | 32 | | | |
| | | | 128 | | | |
| | | | 512 | | | |
| Density | Findings | 897 | 32 | | | |
| | | | 128 | | | |
| | | | 512 | | | |
| BPE | Findings | 897 | 32 | | | |
| | | | 128 | | | |
| | | | 512 | | | |
Bonferroni-corrected Mann–Whitney U test results for section segmentation without auxiliary data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 352,753.0 | 0.0 | 0.0 | True |
| BI-RADS | Classic | 369,827.5 | 0.0002 | 0.0004 | True |
| BioClinical | Classic | 387,735.0 | 0.0347 | 0.0347 | True |
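For readers unfamiliar with the Stat column, the Mann–Whitney U statistic can be computed from rank sums as sketched below. This is a plain-Python illustration with midranks for ties, not the code used in the study; the sample values are hypothetical, and libraries differ in which of the two U values they report.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    n = len(pooled)
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each gets the average of those ranks.
        midrank = (i + 1 + j) / 2
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x


# Hypothetical per-report accuracies for two models.
model_a = [0.98, 1.00, 0.95, 1.00]
model_b = [0.90, 0.95, 0.92, 0.97]
u_a, u_b = mann_whitney_u(model_a, model_b)
```

The two U values always sum to n1 × n2. If each group contributes 900 per-report scores, U ranges from 0 to 810,000 and equals 405,000 when neither group tends to rank higher, which is the scale of the Stat values in the table above.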
Bonferroni-corrected Mann–Whitney U test results for section segmentation with auxiliary data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 394,951.5 | 0.0991 | 0.2974 | False |
| BI-RADS | Classic | 403,380.0 | 0.4161 | 0.4161 | False |
| BioClinical | Classic | 396,635.5 | 0.1428 | 0.2974 | False |
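The captions describe the correction as Bonferroni. The corrected p-values in the table above are, however, consistent with the Holm step-down variant (the smallest p-value is multiplied by 3, the next by 2, the largest by 1, with monotonicity enforced) rather than with multiplying every p-value by 3. That is an inference from the numbers, not something the source states. A minimal sketch of the Holm procedure, with a hypothetical function name:

```python
def holm_correction(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        scaled = min(1.0, (m - rank) * pvals[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


# Raw p-values from the table above, in row order.
with_aux = holm_correction([0.0991, 0.4161, 0.1428])
# roughly [0.2973, 0.4161, 0.2973]; the table reports 0.2974, presumably
# because the published p-values were rounded before being printed.
```

A plain Bonferroni correction would instead give 0.2973, 1.0 (capped), and 0.4284 for the same three rows.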
Bonferroni-corrected Mann–Whitney U test results for section segmentation without auxiliary data in an ablation study with 10% of training data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 304,253.0 | 0.0 | 0.0 | True |
| BI-RADS | Classic | 374,838.0 | 0.0019 | 0.0019 | True |
| BioClinical | Classic | 336,184.0 | 0.0 | 0.0 | True |
Bonferroni-corrected Mann–Whitney U test results for section segmentation with auxiliary data in an ablation study with 10% of training data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 242,663.5 | 0.0 | 0.0 | True |
| BI-RADS | Classic | 358,323.5 | 0.0 | 0.0 | True |
| BioClinical | Classic | 288,753.5 | 0.0 | 0.0 | True |
Bonferroni-corrected Mann–Whitney U test results for section segmentation without auxiliary data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 347,125.0 | 0.0 | 0.0 | True |
| BI-RADS | Classic | 363,098.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 388,187.5 | 0.0385 | 0.0385 | True |
Bonferroni-corrected Mann–Whitney U test results for section segmentation with auxiliary data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 395,781.0 | 0.1189 | 0.3568 | False |
| BI-RADS | Classic | 403,920.0 | 0.4438 | 0.4438 | False |
| BioClinical | Classic | 396,843.5 | 0.1489 | 0.3568 | False |
Bonferroni-corrected Mann–Whitney U test results for section segmentation without auxiliary data in an ablation study with 10% of training data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 295,455.5 | 0.0 | 0.0 | True |
| BI-RADS | Classic | 379,517.0 | 0.0073 | 0.0073 | True |
| BioClinical | Classic | 319,629.5 | 0.0 | 0.0 | True |
Bonferroni-corrected Mann–Whitney U test results for section segmentation with auxiliary data in an ablation study with 10% of training data.
| Group1 | Group2 | Stat | Pval | Pval Corrected | Reject |
|---|---|---|---|---|---|
| BI-RADS | BioClinical | 229,853.0 | 0.0 | 0.0 | True |
| BI-RADS | Classic | 362,248.5 | 0.0 | 0.0 | True |
| BioClinical | Classic | 261,946.0 | 0.0 | 0.0 | True |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of modality.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 30.0 | 0.0152 | 0.0152 | True |
| BI-RADS BERT | Classic | 36.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 55.0 | 0.0072 | 0.0145 | True |
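The McNemar test compares two classifiers on the same examples by looking only at the cases where exactly one of them is correct. The sketch below implements the exact (binomial) form, whose statistic is the smaller of the two discordant counts. The integer Stat values in the table are consistent with that form, but the source does not say which variant was used, so treat this as an illustration; the function name and example data are hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correctness flags.

    Returns (statistic, p_value), where the statistic is the smaller
    discordant count.
    """
    # b: model A right, model B wrong; c: model A wrong, model B right.
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    n = b + c
    stat = min(b, c)
    if n == 0:
        return 0, 1.0  # the models never disagree
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, k) for k in range(stat + 1)) / 2 ** n
    return stat, min(1.0, 2 * tail)


# Hypothetical paired results on 17 reports: A alone is right 10 times,
# B alone is right 2 times, and both are right 5 times.
correct_a = [True] * 10 + [False] * 2 + [True] * 5
correct_b = [False] * 10 + [True] * 2 + [True] * 5
stat, p_value = mcnemar_exact(correct_a, correct_b)
```

Reports on which both models agree, whether both right or both wrong, do not affect the result; only the 12 discordant pairs enter the calculation.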
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of previous cancer.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 30.0 | 0.0152 | 0.0152 | True |
| BI-RADS BERT | Classic | 36.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 55.0 | 0.0072 | 0.0145 | True |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of menopausal status.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 30.0 | 0.0152 | 0.0152 | True |
| BI-RADS BERT | Classic | 36.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 55.0 | 0.0072 | 0.0145 | True |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of purpose.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 30.0 | 0.0152 | 0.0152 | True |
| BI-RADS BERT | Classic | 36.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 55.0 | 0.0072 | 0.0145 | True |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of density.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 30.0 | 0.0152 | 0.0152 | True |
| BI-RADS BERT | Classic | 36.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 55.0 | 0.0072 | 0.0145 | True |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of BPE.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 30.0 | 0.0152 | 0.0152 | True |
| BI-RADS BERT | Classic | 36.0 | 0.0 | 0.0 | True |
| BioClinical | Classic | 55.0 | 0.0072 | 0.0145 | True |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of modality.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 2.0 | 0.0 | 0.0 | True |
| BI-RADS BERT | Classic | 3.0 | 0.0005 | 0.0142 | True |
| BioClinical | Classic | 11.0 | 0.1496 | 1.0 | False |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of previous cancer.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 2.0 | 0.0 | 0.0 | True |
| BI-RADS BERT | Classic | 3.0 | 0.0005 | 0.0142 | True |
| BioClinical | Classic | 11.0 | 0.1496 | 1.0 | False |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of menopausal status.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 2.0 | 0.0 | 0.0 | True |
| BI-RADS BERT | Classic | 3.0 | 0.0005 | 0.0142 | True |
| BioClinical | Classic | 11.0 | 0.1496 | 1.0 | False |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of purpose.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 2.0 | 0.0 | 0.0 | True |
| BI-RADS BERT | Classic | 3.0 | 0.0005 | 0.0142 | True |
| BioClinical | Classic | 11.0 | 0.1496 | 1.0 | False |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of density.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 2.0 | 0.0 | 0.0 | True |
| BI-RADS BERT | Classic | 3.0 | 0.0005 | 0.0142 | True |
| BioClinical | Classic | 11.0 | 0.1496 | 1.0 | False |
Bonferroni-corrected McNemar test results for field extraction with no section segmentation of BPE.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BioClinical | 2.0 | 0.0 | 0.0 | True |
| BI-RADS BERT | Classic | 3.0 | 0.0005 | 0.0142 | True |
| BioClinical | Classic | 11.0 | 0.1496 | 1.0 | False |
McNemar test results for field extraction of modality.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BI-RADS BERT with Segmentation | 10.0 | 0.0 | 0.0 | True |
McNemar test results for field extraction of previous cancer.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BI-RADS BERT with Segmentation | 10.0 | 0.0 | 0.0 | True |
McNemar test results for field extraction of menopausal status.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BI-RADS BERT with Segmentation | 10.0 | 0.0 | 0.0 | True |
McNemar test results for field extraction of purpose.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BI-RADS BERT with Segmentation | 10.0 | 0.0 | 0.0 | True |
McNemar test results for field extraction of density.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BI-RADS BERT with Segmentation | 10.0 | 0.0 | 0.0 | True |
McNemar test results for field extraction of BPE.
| Group1 | Group2 | Stat | Pval | Pval Corr | Reject |
|---|---|---|---|---|---|
| BI-RADS BERT | BI-RADS BERT with Segmentation | 10.0 | 0.0 | 0.0 | True |