J Martijn Nobel, Sander Puts, Jakob Weiss, Hugo J W L Aerts, Raymond H Mak, Simon G F Robben, André L A J Dekker.
Abstract
BACKGROUND: In the era of datafication, medical data must be accurate and structured to serve multiple applications. Data for oncological staging in particular need to be accurate, both to stage and treat the individual patient and to support population-level surveillance and outcome assessment. To support data extraction from free-text radiological reports, a Dutch natural language processing (NLP) algorithm was built to quantify the T-stage of pulmonary tumors according to the tumor node metastasis (TNM) classification. This structuring tool was translated and validated on English free-text radiological reports. A rule-based algorithm to classify T-stage was trained on 200 and validated on 225 English free-text radiological reports from diagnostic computed tomography (CT) obtained for staging of patients with lung cancer. The automated T-stage extracted by the algorithm from each report was compared with manual staging. A graphical user interface was built for training purposes to visualize the results of the algorithm by highlighting the extracted concepts and their modifying context.
Keywords: Classification system; Free-text; Natural language processing; Radiology; Reporting
Year: 2021 PMID: 34114076 PMCID: PMC8192634 DOI: 10.1186/s13244-021-01018-1
Source DB: PubMed Journal: Insights Imaging ISSN: 1869-4101
Cohort composition of the training and validation sets
| | Training (n = 200) | Validation (n = 225) |
|---|---|---|
| T1a | 4 | 6 |
| T1b | 27 | 31 |
| T1c | 42 | 42 |
| T2 | 6 | 3 |
| T2a | 32 | 44 |
| T2b | 27 | 23 |
| T3 | 33 | 41 |
| T4 | 29 | 35 |
| CT | 106 | 120 |
| PET | 77 | 88 |
| PET-CT | 17 | 17 |
Included report statistics by T-substage for the training and validation sets
Fig. 1 T-stage classifier. Schematic overview of T-stage classification. In the preprocessing step, raw data of the report are prepared for the actual processing. In the processing step, the following subtasks are performed: tumor size extraction, a T-stage presence check of abnormalities and involvement [8]
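The tumor size subtask can be illustrated with a minimal sketch. It is not the MedStruct implementation: the regular expression, the function names and the size thresholds are assumptions, the thresholds being those of the TNM 8th edition (an edition the record does not name), and the presence and involvement checks are left out.

```python
import re
from typing import Optional

# Upper size bounds in cm (inclusive), assuming the TNM 8th edition.
SIZE_THRESHOLDS_CM = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]

# Hypothetical pattern: one to three dimensions separated by "x" or "×",
# followed by an explicit unit. Sizes without a unit are not matched.
SIZE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*[x×]\s*\d+(?:\.\d+)?){0,2})\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_largest_size_cm(text: str) -> Optional[float]:
    """Return the largest reported dimension in cm, or None if no size is found."""
    largest = None
    for match in SIZE_PATTERN.finditer(text):
        dims = [float(d) for d in re.findall(r"\d+(?:\.\d+)?", match.group(1))]
        size = max(dims)
        if match.group(2).lower() == "mm":
            size = size / 10.0  # convert millimetres to centimetres
        if largest is None or size > largest:
            largest = size
    return largest


def size_to_t_stage(size_cm: float) -> str:
    """Map a tumor size in cm to a size-based T-substage."""
    for upper_bound, stage in SIZE_THRESHOLDS_CM:
        if size_cm <= upper_bound:
            return stage
    return "T4"


report = "Mass in the right upper lobe measuring 3.4 x 2.1 cm."
size = extract_largest_size_cm(report)
stage = size_to_t_stage(size) if size is not None else None
```

A report that gives a size without a unit yields `None` here, so a full classifier would need a fallback for such cases.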
Fig. 2 Graphical user interface MedStruct. Two screenshots of the graphical user interface MedStruct with the original report on the left side and its T-stage on the right side, combined with the items size, present and involvement. Also, N (nodal stage) and M (metastatic disease) are mentioned for future use. By using drop-down menus stages can be adjusted (upper). Annotated report at the left side and a feedback form at the right (lower) [22]
Fig. 3 MedStruct pipeline. Schematic overview of the MedStruct pipeline, in which five different microservices are present: preprocessing, spaCy, pyContextNLP, measurement extractor and T-stage classifier. The report can be processed either from an Excel file or directly from the graphical user interface (GUI). All components use an intermediate JavaScript Object Notation (JSON) annotation format to chain the pipeline components and can be consumed over REpresentational State Transfer (REST) or chained using a message broker. The use of a JSON annotation format simplifies reuse of the different components, enables mixing of programming languages, prevents duplicate processing and guarantees token alignment between components. This implementation saves annotations at token level instead of sentence level, which enables precise highlighting of annotations in a GUI. Detected tumors and lymph nodes are stored as objects in a list, allowing for detection of concurrent mentions. Documents can now be processed individually with the same rule-based algorithm
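The idea of chaining components through a shared token-level JSON annotation can be sketched as follows. The schema (`tokens`, `start`, `end`, `labels`) and the two toy components are hypothetical and stand in for the real microservices. In this sketch, each step receives and returns the same JSON-serializable document.

```python
import json
import re


def tokenize(doc: dict) -> dict:
    """Add token annotations with character offsets (hypothetical schema)."""
    doc["tokens"] = [
        {"id": i, "text": m.group(), "start": m.start(), "end": m.end(), "labels": []}
        for i, m in enumerate(re.finditer(r"\S+", doc["text"]))
    ]
    return doc


def tag_measurements(doc: dict) -> dict:
    """Label number and unit tokens, reusing the tokens of the previous step."""
    for token in doc["tokens"]:
        if re.fullmatch(r"\d+(?:\.\d+)?", token["text"]):
            token["labels"].append("NUMBER")
        elif token["text"].lower().strip(".,;") in {"mm", "cm"}:
            token["labels"].append("UNIT")
    return doc


def run_pipeline(text: str, components) -> dict:
    """Pass one JSON document through all components in order."""
    doc = {"text": text}
    for component in components:
        # Round-trip through JSON to mimic handing the document to another service.
        doc = json.loads(json.dumps(component(doc)))
    return doc


result = run_pipeline(
    "Mass of 34 mm in the left upper lobe.", [tokenize, tag_measurements]
)
labeled = [(t["text"], t["labels"]) for t in result["tokens"] if t["labels"]]
```

Because every token carries its own character offsets, a GUI can highlight `result["text"][start:end]` exactly, and a later component never has to tokenize the text again.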
T-stage classifier accuracy
| | English, training | English, validation | Dutch, training | Dutch, validation |
|---|---|---|---|---|
| Accuracy T-substage | 0.87 | 0.84 | 0.79 | 0.88 |
| Accuracy T-stage | 0.89 | 0.89 | 0.81 | 0.89 |
| Tumor size-based T-stage | 0.78 | 0.76 | 0.70 | 0.79 |
Accuracy scores of the training and validation sets in the English cohort and the Dutch cohort. In the Dutch group, the outcomes with the new processing structure are recalculated at the substage level
Fig. 4 Confusion matrices of the T-stage classification training set (upper) and validation set (lower)
Precision, recall and F1-scores T-substage
| Set | T-substage | Precision | Recall | F1 score |
|---|---|---|---|---|
| Training | T1a | 0.50 | 0.50 | 0.50 |
| Training | T1b | 0.92 | 0.89 | 0.91 |
| Training | T1c | 0.90 | 0.86 | 0.88 |
| Training | T2 | 0.67 | 1.00 | 0.80 |
| Training | T2a | 0.85 | 0.91 | 0.88 |
| Training | T2b | 1.00 | 0.89 | 0.94 |
| Training | T3 | 0.82 | 0.85 | 0.84 |
| Training | T4 | 0.83 | 0.83 | 0.83 |
| Validation | T1a | 1.00 | 0.50 | 0.67 |
| Validation | T1b | 0.83 | 0.65 | 0.73 |
| Validation | T1c | 0.82 | 0.88 | 0.85 |
| Validation | T2 | 0.60 | 1.00 | 0.75 |
| Validation | T2a | 0.83 | 0.89 | 0.86 |
| Validation | T2b | 0.88 | 0.91 | 0.89 |
| Validation | T3 | 0.90 | 0.88 | 0.89 |
| Validation | T4 | 0.84 | 0.89 | 0.87 |
Precision, recall and F1-scores T-substage for the training set and validation set
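As a reading aid, the relation between the three reported metrics can be written out in a few lines. This is a generic sketch of the standard definitions, not code from the study. The example counts for validation T1a (3 true positives, 0 false positives, 3 false negatives) are an assumption inferred from the 6 validation reports in that substage and the reported precision and recall.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision and recall from per-class counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed counts for validation T1a, inferred from the tables (not given in the source).
p, r = precision_recall(tp=3, fp=0, fn=3)
f1 = f1_score(p, r)
```

With these counts the sketch gives a precision of 1.0, a recall of 0.5 and an F1 score of about 0.67, in line with the validation T1a row.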
T-stage errors by category
| Error group | Error type | Description | Training | Validation |
|---|---|---|---|---|
| Data selection | Sectionizer | Detects information in wrong subheadings | 1 | 3 |
| Data selection | Missing blacklist synonyms | Falsely matched/falsely not excluded | 0 | 5 |
| Context | Context missing | Context not matched because of missing modifier | 1 | 0 |
| Context | Context mismatch | Context mismatch, wrong modifier detected | 1 | 3 |
| Concept matching | Measurement extractor | e.g., using abbreviations (e.g., (AP) × (TVR) × (SI)) | 1 | 2 |
| Concept matching | Complexity | T4 multiple lobes | 2 | 1 |
| Concept matching | Ambiguity | Confusion between node and mass (specific site: hilar) | 4 | 7 |
| Concept matching | Ambiguity | Nonspecific | 4 | 9 |
| Concept matching | Missing concepts synonyms | Lobulated | 1 | 0 |
| Concept matching | Missing concepts synonyms | Cystic | 2 | 0 |
| Concept matching | Missing concepts synonyms | Pleural thickening | 1 | 0 |
| Concept matching | Missing concepts synonyms | Spinal metastasis | 1 | 0 |
| Concept matching | Missing concepts synonyms | Costal involvement | 0 | 1 |
| Concept matching | Missing concepts synonyms | Supraclavicular extension | 0 | 1 |
| Reporter | Wrong input | Different sizes for the same tumor, no unit (mm/cm) present, size for tumor and atelectasis | 7 | 2 |
| Reporter | Wrong input | Satellite node | 1 | 1 |
| Total errors | | | 27 | 35 |
T-stage errors by category for training and validation sets
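The error totals can be cross-checked against the reported T-substage accuracy with a short calculation. The sketch assumes that each counted error corresponds to exactly one misclassified report at substage level, which the record does not state explicitly.

```python
# Report counts from the abstract and error totals from the table above.
cohorts = {
    "training": {"reports": 200, "errors": 27, "reported_accuracy": 0.87},
    "validation": {"reports": 225, "errors": 35, "reported_accuracy": 0.84},
}


def accuracy_from_errors(n_reports: int, n_errors: int) -> float:
    """Accuracy if every error is one misclassified report (assumption)."""
    return (n_reports - n_errors) / n_reports


derived = {
    name: accuracy_from_errors(c["reports"], c["errors"])
    for name, c in cohorts.items()
}
```

Under that assumption the derived values are 173/200 = 0.865 for training and 190/225 ≈ 0.844 for validation, close to the reported 0.87 and 0.84.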