Yi Liu, Qing Liu, Chao Han, Xiaodong Zhang, Xiaoying Wang.
Abstract
BACKGROUND: Breast magnetic resonance imaging (MRI) reports often describe multiple lesions, and radiologists usually focus on the index lesion, which is the most important to clinicians in determining patient management and prognosis. Natural language processing (NLP) has been used to extract information from mammography reports. However, few studies have investigated NLP on free-form text from breast MRI reports. The objective of the current study was to assess how accurately our NLP program extracts index lesions and their corresponding imaging features from the free-form text of breast MRI reports.
Keywords: BI-RADS; Breast cancer; Index lesion; Magnetic resonance imaging; Natural language processing; Rule-based method
Year: 2019 PMID: 31888615 PMCID: PMC6937920 DOI: 10.1186/s12911-019-0997-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Imaging features and their value set to be extracted from breast MRI reports
| Entity type | Imaging features | Data type | Value set of imaging feature | Units |
|---|---|---|---|---|
| Mass | Location | Categorical | Right, Left, Upper, Outer quadrant … | N/A |
| | Shape | Categorical | Oval, Round, Irregular … | N/A |
| | Size | Numerical | | mm |
| | Signal | Categorical | | N/A |
| | T1WI | Categorical | Low signal, Isointensity, High signal | N/A |
| | T2WI | Categorical | Low signal, Isointensity, High signal | N/A |
| | DWI | Categorical | Low signal, Isointensity, High signal | N/A |
| | Margin | Categorical | Smooth, Irregular, Spiculated … | N/A |
| | Internal enhancement | Categorical | Homogeneous, Heterogeneous … | N/A |
| | Enhancement kinetic curve | Categorical | Persistent, Plateau, Wash-out | N/A |
| NME | Location | Categorical | Right, Left, Upper, Outer quadrant … | N/A |
| | Distribution pattern | Categorical | Focal area, Linear, Segmental … | N/A |
| | Scope | Numerical | | mm |
| | Signal | Categorical | | N/A |
| | T1WI | Categorical | Low signal, Isointensity, High signal | N/A |
| | T2WI | Categorical | Low signal, Isointensity, High signal | N/A |
| | DWI | Categorical | Low signal, Isointensity, High signal | N/A |
| | Internal enhancement | Categorical | Homogeneous, Heterogeneous … | N/A |
| | Enhancement kinetic curve | Categorical | Persistent, Plateau, Wash-out | N/A |
| Other associated findings | Lymphadenopathy | Categorical | | N/A |
| | Invasion of skin, nipple, chest wall, pectoralis muscle | Categorical | | N/A |
T1WI: T1-weighted imaging; T2WI: T2-weighted imaging; DWI: diffusion-weighted imaging; NME: non-mass enhancement; N/A: not applicable
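The feature table above can be read as a small schema. The following is a minimal sketch of how such a schema might be encoded and used to check extracted values; it is not the authors' implementation. The names `FeatureSpec`, `MASS_FEATURES`, and `is_valid` are hypothetical, only a few mass features are included, and value sets that the table truncates with "…" are treated as open rather than filled in.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class FeatureSpec:
    """One row of the feature table (hypothetical encoding)."""
    data_type: str                  # "categorical" or "numerical"
    values: Tuple[str, ...] = ()    # only the values actually listed in the table
    closed: bool = False            # True only if the table lists the full value set
    unit: Optional[str] = None      # e.g. "mm"; None where the table says N/A


SIGNAL = ("Low signal", "Isointensity", "High signal")

# A subset of the mass features. Sets elided with "…" in the table stay open.
MASS_FEATURES = {
    "Shape": FeatureSpec("categorical", ("Oval", "Round", "Irregular")),
    "Size": FeatureSpec("numerical", unit="mm"),
    "T1WI": FeatureSpec("categorical", SIGNAL, closed=True),
    "T2WI": FeatureSpec("categorical", SIGNAL, closed=True),
    "DWI": FeatureSpec("categorical", SIGNAL, closed=True),
    "Enhancement kinetic curve": FeatureSpec(
        "categorical", ("Persistent", "Plateau", "Wash-out"), closed=True
    ),
}


def is_valid(spec: FeatureSpec, value: Union[str, int, float]) -> bool:
    """Check an extracted value against its spec."""
    if spec.data_type == "numerical":
        # Reject booleans explicitly, since bool is a subclass of int.
        return (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and value > 0
        )
    if spec.closed:
        return value in spec.values
    # Open value set: accept any non-empty string.
    return isinstance(value, str) and bool(value.strip())
```

Under these assumptions, `is_valid(MASS_FEATURES["T1WI"], "High signal")` passes, while a shape value outside the three listed examples is still accepted because the table does not give the complete shape vocabulary.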
BI-RADS assessment categories and the number of index lesions found for each category
| BI-RADS category | Implication | Number |
|---|---|---|
| 0 | Requires additional imaging assessment and/or prior imaging for comparison | 0 |
| 1 | Negative | 0 |
| 2 | Benign finding | 28 |
| 3 | Probably benign finding | 60 |
| 4 | Suspicious abnormality; biopsy should be considered | 87 |
| 5 | Highly suggestive of malignancy; appropriate action should be taken | 119 |
| 6 | Malignancy confirmed by biopsy; surgical resection when clinically feasible | 177 |
BI-RADS: Breast Imaging Reporting and Data System
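As a quick arithmetic check on the table above, the per-category counts can be summed and converted to shares. This is a small sketch that uses only the numbers in the table; the variable names are illustrative.

```python
# Index lesions per BI-RADS category, copied from the table above.
BIRADS_COUNTS = {0: 0, 1: 0, 2: 28, 3: 60, 4: 87, 5: 119, 6: 177}

total_lesions = sum(BIRADS_COUNTS.values())

# Share of index lesions in each category, as a fraction of the total.
shares = {cat: n / total_lesions for cat, n in BIRADS_COUNTS.items()}

# Lesions in categories 4 to 6 (suspicious through biopsy-confirmed).
suspicious_or_worse = sum(n for cat, n in BIRADS_COUNTS.items() if cat >= 4)

for cat, n in BIRADS_COUNTS.items():
    print(f"BI-RADS {cat}: {n:3d} lesions ({shares[cat]:.1%})")
print(f"Total: {total_lesions}; categories 4-6: {suspicious_or_worse}")
```

The counts sum to 471 index lesions, most of which fall in categories 4 to 6.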
Fig. 1 An overview of our NLP program for extracting breast MRI reports
Fig. 2 A representative original breast MRI report. The report consists of an imaging description and diagnostic impression
Fig. 3 Annotated text with the final extraction results. Each report was converted into a list of vocabulary flagged with its section, sentence, and vocabulary number after section segmentation, sentence segmentation, and tokenization
Fig. 4 The number of features in each lesion and the number of lesions in each report
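The preprocessing described in the Fig. 3 caption (section segmentation, sentence segmentation, then tokenization, with each token flagged by its section, sentence, and vocabulary number) can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the section header strings, the regular expressions, and the names `Token` and `flag_tokens` are all hypothetical, and the example assumes English text with explicit headers.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    section: int    # 1-based section number
    sentence: int   # 1-based sentence number within the section
    index: int      # 1-based token number within the sentence
    text: str


# Hypothetical header strings; real reports may mark sections differently.
SECTION_HEADERS = ("Imaging description:", "Diagnostic impression:")


def flag_tokens(report: str) -> List[Token]:
    """Convert a report into a flat list of position-flagged tokens."""
    header_pattern = "|".join(re.escape(h) for h in SECTION_HEADERS)
    # Section segmentation: split on the headers and drop empty pieces.
    sections = [s.strip() for s in re.split(header_pattern, report) if s.strip()]

    tokens: List[Token] = []
    for sec_no, section in enumerate(sections, start=1):
        # Sentence segmentation: break after "." or ";" followed by whitespace,
        # so decimals such as "2.3" are not split.
        sentences = [
            s.strip() for s in re.split(r"(?<=[.;])\s+", section) if s.strip()
        ]
        for sent_no, sentence in enumerate(sentences, start=1):
            # Tokenization: numbers (with optional decimals) or words,
            # where words may contain digits and hyphens (e.g. "BI-RADS").
            words = re.findall(
                r"\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9\-]*", sentence
            )
            for tok_no, word in enumerate(words, start=1):
                tokens.append(Token(sec_no, sent_no, tok_no, word))
    return tokens


example = (
    "Imaging description: An irregular mass in the left breast. Size 23 mm. "
    "Diagnostic impression: BI-RADS 5."
)
for token in flag_tokens(example):
    print(token)
```

Keeping the three position numbers on every token is what lets later rules relate a feature word to the lesion mentioned in the same sentence or section.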
Error analysis of the NLP system
| Entity type | Imaging features | Not detected (cases) | Not detected correctly (cases) |
|---|---|---|---|
| Index lesion | Location | 16 | 2 |
| | Shape | 2 | 24 |
| | Size | 20 | 5 |
| | T1WI | 21 | – |
| | T2WI | 18 | – |
| | DWI | 12 | 10 |
| | Margin | 7 | 9 |
| | Internal enhancement | 20 | 3 |
| | Enhancement kinetic curve | 20 | – |
| | Lymphadenopathy | 12 | 20 |
| | Nipple invasion | 29 | – |
| | Skin invasion | 43 | – |
| | Chest wall invasion | 5 | – |
| | Pectoralis muscle invasion | 3 | – |
| | BI-RADS category | 20 | 4 |
NLP: natural language processing; T1WI: T1-weighted imaging; T2WI: T2-weighted imaging; DWI: diffusion-weighted imaging; NME: non-mass enhancement; BI-RADS: Breast Imaging Reporting and Data System
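To see which features account for most of the errors, the two error columns above can be totalled and ranked. The sketch below copies the counts from the table. It treats a "–" cell as zero for the purpose of summing, which is an assumption, since the table does not say whether "–" means zero or not assessed.

```python
from typing import List, Optional, Tuple

# (feature, not detected, not detected correctly); None stands for "–".
ERRORS: List[Tuple[str, int, Optional[int]]] = [
    ("Location", 16, 2),
    ("Shape", 2, 24),
    ("Size", 20, 5),
    ("T1WI", 21, None),
    ("T2WI", 18, None),
    ("DWI", 12, 10),
    ("Margin", 7, 9),
    ("Internal enhancement", 20, 3),
    ("Enhancement kinetic curve", 20, None),
    ("Lymphadenopathy", 12, 20),
    ("Nipple invasion", 29, None),
    ("Skin invasion", 43, None),
    ("Chest wall invasion", 5, None),
    ("Pectoralis muscle invasion", 3, None),
    ("BI-RADS category", 20, 4),
]


def total_errors(row: Tuple[str, int, Optional[int]]) -> int:
    """Sum both error types for one feature, counting "–" as 0 (assumption)."""
    _, missed, wrong = row
    return missed + (wrong or 0)


missed_total = sum(row[1] for row in ERRORS)
wrong_total = sum(row[2] or 0 for row in ERRORS)
ranked = sorted(ERRORS, key=total_errors, reverse=True)

print(f"Not detected: {missed_total}; not detected correctly: {wrong_total}")
for row in ranked[:5]:
    print(f"{row[0]}: {total_errors(row)}")
```

Under that assumption, missed detections outnumber incorrect detections, and skin invasion is the single largest source of errors.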
Accuracy in extracting complete descriptions of breast lesions by the NLP system
| Entity type | Imaging features | Recall | Precision |
|---|---|---|---|
| Mass | Location | 90.1% | 95.7% |
| | Shape | 85.4% | 94.1% |
| | Size | 90.4% | 94.2% |
| | T1WI | 90.3% | 94.1% |
| | T2WI | 89.6% | 94.1% |
| | DWI | 88.7% | 93.2% |
| | Margin | 90.9% | 95.9% |
| | Internal enhancement | 91.6% | 90.2% |
| | Enhancement kinetic curve | 91.6% | 95.7% |
| NME | Location | 90.9% | 95.2% |
| | Distribution pattern | 86.2% | 94.1% |
| | Scope | 89.2% | 92.3% |
| | T1WI | 91.1% | 93.6% |
| | T2WI | 88.9% | 93.2% |
| | DWI | 88.6% | 94.0% |
| | Internal enhancement | 91.3% | 91.5% |
| | Enhancement kinetic curve | 90.9% | 95.0% |
| | Lymphadenopathy | 98.7% | 87.7% |
| Invasion | Nipple | 98.6% | 86.4% |
| | Skin | 97.4% | 86.8% |
| | Chest wall | 97.7% | 86.9% |
| | Pectoralis muscle | 96.2% | 85.8% |
| | BI-RADS category | 96.6% | 94.8% |
| | Overall | 91.5% | 92.9% |
NLP: natural language processing; T1WI: T1-weighted imaging; T2WI: T2-weighted imaging; DWI: diffusion-weighted imaging; NME: non-mass enhancement; BI-RADS: Breast Imaging Reporting and Data System
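Recall and precision in the table above follow the standard definitions: recall is the fraction of true feature mentions that the system found, and precision is the fraction of extracted mentions that were correct. The sketch below shows those formulas and derives an F1 score from the reported overall values. The true-positive, false-negative, and false-positive counts in the usage lines are hypothetical, because the underlying counts are not given in this section, and the F1 figure is a derived illustration rather than a number reported here.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of true items that were extracted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of extracted items that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts, only to show how the formulas are applied.
example_recall = recall(tp=90, fn=10)
example_precision = precision(tp=90, fp=5)

# Overall recall and precision as reported in the table above.
OVERALL_RECALL = 0.915
OVERALL_PRECISION = 0.929
overall_f1 = f1_score(OVERALL_PRECISION, OVERALL_RECALL)

print(f"Example recall: {example_recall:.3f}, precision: {example_precision:.3f}")
print(f"F1 derived from overall values: {overall_f1:.3f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, the derived F1 lies between the overall recall and precision, slightly closer to recall.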