Jia Li, Yucong Lin, Pengfei Zhao, Wenjuan Liu, Linkun Cai, Jing Sun, Lei Zhao, Zhenghan Yang, Hong Song, Han Lv, Zhenchang Wang.
Abstract
BACKGROUND: Given the increasing number of people suffering from tinnitus, accurately categorizing patients with actionable reports is attractive for assisting clinical decision making. However, this process requires experienced physicians and significant human labor. Natural language processing (NLP) has shown great potential in big-data analytics of medical texts; yet its application to domain-specific analysis of radiology reports remains limited.
Keywords: Artificial intelligence; Bidirectional Encoder Representations from Transformers; Deep learning; Natural language processing; Radiology report
Year: 2022 PMID: 35907966 PMCID: PMC9338483 DOI: 10.1186/s12911-022-01946-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Workflow of the study
Summary of NLP studies focusing on actionable radiology reports (ML: machine learning; DL: deep learning; BERT: Bidirectional Encoder Representations from Transformers).
| Author(s) | Language | Number of radiology reports | Algorithm | Section of report | Research objective |
|---|---|---|---|---|---|
| Carrodeguas et al. | English | 2306 | ML/DL | Impression | Classifying recommendation |
| Heilbrun et al. | English | 851 | Rule-based | Impression | Detecting critical finding |
| Lou et al. | English | 6000 | ML | Not mentioned | Classifying recommendation |
| Esteban et al. | English | 3401 | Software | Findings, impression | Classifying recommendation |
| Morioka et al. | English | 1402 | Rule-based | Not mentioned | Classifying disease condition |
| Fu et al. | English | 1000 | Rule-based, ML/DL | Not mentioned | Classifying disease condition |
| Nakamura et al. | Japanese | 63646 | BERT | Order, findings, impression | Detecting critical finding |
| Jujjavarapu et al. | English | 871 | ML | Not mentioned | Classifying disease condition |
| Liu et al. | Chinese | 1089 | BERT/ML | Findings | Classifying disease condition |
| Zhang et al. | Chinese | 359 | BERT pre-training | Findings | Classifying disease condition |
| Zaman et al. | English | 1503 | BERT pre-training | Findings | Classifying disease condition |
| Liu et al. | English | 594 | BERT | Not mentioned | Classifying certainty |
| Proposed study | Chinese | 5864 | BERT pre-training, DL | Findings | Classifying disease condition |
Data labeling criteria
| Classification | Potentially clinically important findings | Label instruction |
|---|---|---|
| Normal finding (labeled as 0) | NA | Reports in which all organs are described as normal |
| Irrelevant finding (labeled as 1) | | If any lesion is observed and should be reported, but the clinician is confident that the imaging finding provides limited information for the diagnosis of tinnitus. |
| Any degeneration: | | |
| -Brain degeneration | ||
| -Nasosinusitis (frontal sinus, sphenoid sinus, ethmoid sinus, maxillary sinus (except acute inflammation involving adjacent bone structures)) | ||
| -Nasal turbinate hypertrophy | ||
| -Deviation of nasal septum | ||
| -Sinus cyst | ||
| -Chronic middle ear mastoiditis (except acute inflammation involving adjacent bone structures) | | |
| -Auditory canal cerumen | ||
| -Low middle cranial fossa | ||
| Relevant finding (labeled as 2) | | If one or more imaging findings should be reported in detail and lead to a definite diagnosis, further examination, or clinical evaluation, or if the finding calls for urgent communication with clinicians for timely treatment. Because language expression varies, the labeler's judgment was used as the reference. |
| -Sigmoid sinus bone wall deficiency | | |
| -Superior semicircular canal dehiscence | ||
| -Auditory ossicle abnormality | ||
| -Bone fracture | ||
| -High jugular fossa | ||
| -Neoplasms | ||
| -Intracranial hemorrhage | ||
| -Cerebral infarction | ||
| -Cerebral herniation | ||
| -Nasosinusitis (morphologically altered bone or sinus tract obstruction) | ||
| -Tympanic lesion (inflammation, neoplasm or perforation) | ||
| -Otosclerosis | ||
| -Cholesteatoma | ||
| -Other neoplasm observed within the imaging field |
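The criteria above amount to a priority rule: any relevant finding makes the report label 2, otherwise any irrelevant finding makes it 1, and a report with no reportable finding is normal (0). A minimal sketch of that rule in Python, using small illustrative keyword sets; the set contents and the function name are assumptions, not the authors' code:

```python
# Hypothetical subsets of the criteria table; the full lists are above.
IRRELEVANT = {"brain degeneration", "sinus cyst", "nasal turbinate hypertrophy"}
RELEVANT = {"otosclerosis", "cholesteatoma", "bone fracture"}

def label_report(findings):
    """Relevant findings dominate: any relevant finding -> 2; otherwise
    any irrelevant finding -> 1; no reportable findings -> 0 (normal)."""
    if any(f in RELEVANT for f in findings):
        return 2
    if any(f in IRRELEVANT for f in findings):
        return 1
    return 0
```

In the study itself the labels were assigned manually by clinicians; this sketch only encodes the precedence among the three classes.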
Fig. 2 Details of report annotation and text preprocessing. Yellow characters mark findings irrelevant to tinnitus; red characters mark findings relevant to tinnitus that should be actionably reported when communicating with physicians. *All reports were written in Chinese; English translations are shown in the figure for illustration
Hyperparameters of model training
| Model | Layers | Epochs | Batch size | Optimizer |
|---|---|---|---|---|
| CNN | 16 | 20 | 32 | Adam |
| MLP | 16 | 20 | 32 | Adam |
| Bi-LSTM | 16 | 20 | 32 | Adam |
| Bi-LSTM-CNN | 32 | 20 | 32 | Adam |
| BERT (variants), fine-tune | 768 | 10 | 8 | Adam |
| BERT-pre-training | 768 | 10 | 8 | Adam |
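As a concrete illustration of one row of the table, the Bi-LSTM-CNN classifier trained with Adam could be sketched as follows in PyTorch. The vocabulary size, embedding and hidden dimensions, and the exact layer arrangement are assumptions; the table only fixes the epochs (20), batch size (32), and optimizer:

```python
import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    """Hypothetical Bi-LSTM-CNN text classifier for the 3 report classes."""
    def __init__(self, vocab_size=5000, embed_dim=128, hidden=64, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                             # x: (batch, seq_len) ids
        h, _ = self.lstm(self.embed(x))               # (batch, seq, 2*hidden)
        c = torch.relu(self.conv(h.transpose(1, 2)))  # (batch, 64, seq)
        pooled = c.max(dim=2).values                  # global max pooling
        return self.fc(pooled)                        # (batch, 3) class logits

model = BiLSTMCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, per table
logits = model(torch.randint(0, 5000, (32, 256)))  # one batch of 32 reports
```

Training would then run cross-entropy loss over 20 epochs with this optimizer, as listed in the table.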
Fig. 3 Illustration of the architecture of the BERT-based models. The English subtitle is a translation of a sentence from a Chinese radiology report
Training time, computing resources, and hyperparameters in IDPT
| Data size | Training epochs | Training batch size | Eval batch size | Eval strategy | Eval steps | GPU | Pre-training time per epoch |
|---|---|---|---|---|---|---|---|
| 10.7 MB | 10 | 16 | 16 | Steps | 100 | Nvidia GTX 1070 | 32 min |
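The core of in-domain pre-training (IDPT) is continued masked-language-model training on the unlabeled report corpus. A self-contained sketch of the standard BERT masking step (of the 15% selected tokens, 80% become [MASK], 10% are replaced by a random token, and 10% are kept); the mask-token ID and vocabulary size are assumptions matching a typical Chinese BERT vocabulary:

```python
import torch

def mask_tokens(input_ids, mask_token_id=103, vocab_size=21128, mlm_prob=0.15):
    """Produce (masked inputs, MLM labels); labels are -100 (ignored by the
    loss) everywhere except the randomly selected positions."""
    labels = input_ids.clone()
    selected = torch.bernoulli(torch.full(labels.shape, mlm_prob)).bool()
    labels[~selected] = -100
    ids = input_ids.clone()
    # 80% of selected positions -> [MASK]
    replace = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    ids[replace] = mask_token_id
    # half of the remainder (10% overall) -> random token; rest unchanged
    rand = (torch.bernoulli(torch.full(labels.shape, 0.5)).bool()
            & selected & ~replace)
    ids[rand] = torch.randint(vocab_size, labels.shape)[rand]
    return ids, labels
```

In practice this step runs inside the pre-training loop (10 epochs, batch size 16, evaluation every 100 steps per the table), with the model's MLM head predicting the original tokens at the labeled positions.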
Fig. 4 Token length distribution in the training dataset
Token length distribution in all training datasets.
| Classification | Label | Average token length (± standard deviation) | Number of samples |
|---|---|---|---|
| Normal finding | 0 | 182.92 ± 12.62 | 1100 |
| Irrelevant finding | 1 | 237.73 ± 28.45 | 2851 |
| Relevant finding | 2 | 262.52 ± 47.13 | 1913 |
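The per-label statistics in this table can be reproduced from per-report token counts. A minimal sketch with hypothetical counts (the real values come from tokenizing the 5864 reports):

```python
from statistics import mean, stdev

# Hypothetical token counts per report, grouped by label (0/1/2).
lengths = {0: [180, 175, 190, 186], 1: [240, 230, 245], 2: [260, 300, 228]}

def summarize(lengths):
    """Return {label: (mean length, standard deviation, sample count)}."""
    return {label: (round(mean(v), 2), round(stdev(v), 2), len(v))
            for label, v in lengths.items()}
```

Longer average lengths for the relevant class (262.52 vs. 182.92 tokens) suggest that reports with actionable findings carry more descriptive text, which matters when a maximum sequence length is chosen.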
Training time and hyperparameters in TLOS
| Token length | Train epochs | Batch size | Optimizer | Learning rate | Training time per epoch (Min) |
|---|---|---|---|---|---|
| 128 | 10 | 16 | Adam | 2e-5 | 12±0.24 |
| 256 | 10 | 16 | Adam | 2e-5 | 23±0.70 |
| 328 | 10 | 16 | Adam | 2e-5 | 31±0.47 |
| 468 | 10 | 16 | Adam | 2e-5 | 39±1.21 |
| 512 | 10 | 16 | Adam | 2e-5 | 43±0.62 |
Comparison of model performance metrics
| Embedding | Classifier | Accuracy | Precision | Recall | AUC | F1-score |
|---|---|---|---|---|---|---|
| Word2Vec | CNN | 0.729 | 0.744 | 0.729 | 0.767 | 0.733 |
| Word2Vec | MLP | 0.644 | 0.643 | 0.644 | 0.711 | 0.644 |
| Word2Vec | Bi-LSTM | 0.737 | 0.740 | 0.737 | 0.677 | 0.738 |
| Word2Vec | Bi-LSTM-CNN | 0.728 | 0.729 | 0.728 | 0.692 | 0.728 |
| BERT | CNN | 0.770 | 0.788 | 0.777 | 0.908 | 0.781 |
| BERT | MLP | 0.719 | 0.714 | 0.719 | 0.874 | 0.712 |
| BERT | Bi-LSTM | 0.777 | 0.792 | 0.780 | 0.888 | 0.774 |
| BERT | Bi-LSTM-CNN | 0.698 | 0.696 | 0.698 | 0.861 | 0.690 |
| BERT | Fine-tune | 0.760 | 0.761 | 0.759 | 0.868 | 0.760 |
| BERT | IDPT | | | | | |
| BERT-wwm-ext | Fine-tune | 0.756 | 0.756 | 0.756 | 0.883 | 0.754 |
| Mengzi | Fine-tune | 0.751 | 0.751 | 0.751 | 0.846 | 0.750 |
| Roberta | Fine-tune | 0.767 | 0.767 | 0.767 | 0.878 | 0.764 |
The highest value for each metric is highlighted in bold
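The metrics in the table can be computed from predicted class probabilities. A sketch with toy data using scikit-learn, assuming (as is common for multi-class report classification) weighted averaging across classes and one-vs-rest AUC; the specific averaging choices are assumptions, since the table does not state them:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy ground truth and predicted probabilities for the three classes
# (0 normal, 1 irrelevant, 2 relevant); each row sums to 1.
y_true = np.array([0, 1, 2, 1, 2, 0, 1, 2])
proba = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7],
                  [0.3, 0.4, 0.3], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1], [0.1, 0.6, 0.3]])
y_pred = proba.argmax(axis=1)  # hard class decision per report

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred, average="weighted"),
    "recall": recall_score(y_true, y_pred, average="weighted"),
    "f1": f1_score(y_true, y_pred, average="weighted"),
    # AUC needs the probabilities, not the hard labels
    "auc": roc_auc_score(y_true, proba, multi_class="ovr", average="weighted"),
}
```

The same probability matrix also drives the per-class ROC curves in the figures below.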
Fig. 5 ROC curves and AUC values of the BERT fine-tune and deep learning models. FPR: false positive rate; TPR: true positive rate.
Fig. 6 ROC curves and AUC values of BERT combined with deep learning models. FPR: false positive rate; TPR: true positive rate.
Fig. 7 ROC curves and AUC values of the original BERT, BERT fine-tune, and BERT-IDPT models. FPR: false positive rate; TPR: true positive rate.
Fig. 8 ROC curves and AUC values of BERT and variant models. FPR: false positive rate; TPR: true positive rate.
Comparison of BERT fine-tune with different max sequence lengths
| Model | Max sequence length | Accuracy | Precision | Recall | AUC | F1-score |
|---|---|---|---|---|---|---|
| BERT fine-tune | 128 | 0.741 | 0.738 | 0.741 | 0.866 | 0.736 |
| BERT fine-tune | 256 | 0.710 | 0.707 | 0.710 | 0.843 | 0.708 |
| BERT fine-tune | 328 | 0.616 | 0.627 | 0.616 | 0.797 | 0.601 |
| BERT fine-tune | 468 | 0.551 | 0.557 | 0.551 | 0.759 | 0.546 |
| BERT fine-tune | 512 | 0.760 | 0.761 | 0.759 | 0.868 | 0.760 |
Fig. 9 ROC curves and AUC values of the BERT model using different max sequence lengths. FPR: false positive rate; TPR: true positive rate.
Fig. 10 Comparison of accuracy of the BERT model using different max sequence lengths