Huanyao Zhang (1, 2), Danqing Hu (1, 2), Huilong Duan (1, 2), Shaolei Li (3), Nan Wu (4), Xudong Lu (1, 2)
Abstract
BACKGROUND: Computed tomography (CT) reports record a large volume of valuable information about patients' conditions and about radiologists' interpretations of radiology images. This information can support clinical decision-making and further academic study. However, because clinical reports are free text, these data are difficult to use effectively. In this study, we investigate a novel deep learning method that extracts entities from Chinese CT reports for lung cancer screening and TNM staging.
Keywords: BERT; CT reports; Lung cancer screening and staging; Named entity recognition; Pre-training; Transformer
Year: 2021 PMID: 34330277 PMCID: PMC8323233 DOI: 10.1186/s12911-021-01575-x
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The development pipeline of the proposed method
Entity types for clinical named entity recognition
| Entity type | Description | Instance |
|---|---|---|
| Vessel | Description of great vessel invasion | 病灶包绕右下肺动脉主 (The lesion surrounds the right lower pulmonary artery trunk) |
| Vertebral Body | Description of vertebral body invasion | 颈7椎体压缩变扁 (The C7 vertebral body is compressed and flattened) |
| PAOPᵃ | Description of pulmonary atelectasis or obstructive pneumonitis | 远端可见片絮影 (Patchy, flocculent opacities are visible distally) |
| Bronchus | Description of bronchial invasion | 凹陷 (indentation) |
| Pleura | Description of pleural invasion or metastasis | 增厚 (thickening) |
| Shape | Shape of mass | 类圆形 (round) |
| Density | Density of mass | 磨玻璃密度 (ground glass density) |
| Mass | Suspected mass/lump/lesion in lung | 结节 (nodule) |
| Enhancement | Enhancement extent of mass | 强化明显 (marked enhancement) |
| Size | Size of mass or lymph nodes | 25 × 22 cm |
| Location | Location of mass or lymph nodes | 左上肺右基底段 (upper left lung right basal segment) |
| Lymph | Suspected lymph node metastasis | 肿大淋巴结 (swollen lymph nodes) |
| Negation | Negative words | 未见 (no) |
| Effusion | Condition of pericardial effusion | 心包积液 (pericardial effusion) |
ᵃ PAOP: Pulmonary Atelectasis/Obstructive Pneumonitis
Fig. 2 A chest CT report sample annotated with BIO tags. a Original CT report (LOC: Location; SHP: Shape; MA: Mass; SZ: Size; Ng: Negation; LPH: Lymph). b Its English translation
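In the BIO scheme shown in Fig. 2, each token is tagged B-<type> (beginning of an entity), I-<type> (inside an entity), or O (outside any entity). Below is a minimal Python sketch of how such a tag sequence decodes back into entity spans. It makes two assumptions that this section does not state. First, tags are assigned per character, which is typical for Chinese NER. Second, an I- tag that does not continue an open entity of the same type starts a new entity, which is one common lenient convention. The function name `bio_to_spans` is hypothetical and is not the authors' code.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a per-character BIO tag sequence into (type, start, end) spans.

    Assumption: a stray I- tag (no open entity of the same type) is treated
    as the beginning of a new entity rather than being discarded.
    """
    spans: List[Span] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            if current_type is not None:  # close the entity that was open
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise: an I- tag that continues the open entity, nothing to do.
    if current_type is not None:  # entity running to the end of the sequence
        spans.append((current_type, start, len(tags)))
    return spans


# Usage with the tag abbreviations from Fig. 2 ("right lung nodule").
text = "右肺结节"
tags = ["B-LOC", "I-LOC", "B-MA", "I-MA"]
spans = bio_to_spans(tags)
print(spans)                             # [('LOC', 0, 2), ('MA', 2, 4)]
print([text[s:e] for _, s, e in spans])  # ['右肺', '结节']
```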
The statistics of annotated named entities in chest CT reports
| Entity type | Total count | Average length |
|---|---|---|
| Vessel | 51 | 10.82 |
| Vertebral Body | 28 | 13.75 |
| PAOP | 85 | 8.77 |
| Bronchus | 58 | 4.66 |
| Pleura | 230 | 4.27 |
| Shape | 513 | 4.37 |
| Density | 340 | 5.00 |
| Mass | 874 | 4.11 |
| Enhancement | 185 | 5.44 |
| Size | 774 | 7.35 |
| Location | 1937 | 8.77 |
| Lymph | 588 | 4.66 |
| Negation | 924 | 4.27 |
| Effusion | 412 | 4.37 |
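The Count and Average length columns can be derived directly from the annotated spans. This sketch rests on two assumptions. It measures length in characters, although the table does not give a unit. It also expects each report's annotations as a list of (type, start, end) offsets, which is a hypothetical input format. The function name `entity_statistics` is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Span = Tuple[str, int, int]  # (entity type, start, end exclusive), character offsets


def entity_statistics(reports: Iterable[Iterable[Span]]) -> Dict[str, Tuple[int, float]]:
    """Return {entity type: (count, average span length in characters)}."""
    counts: Dict[str, int] = defaultdict(int)
    total_length: Dict[str, int] = defaultdict(int)
    for spans in reports:
        for entity_type, start, end in spans:
            counts[entity_type] += 1
            total_length[entity_type] += end - start
    # Average length rounded to two decimals, as in the table above.
    return {t: (counts[t], round(total_length[t] / counts[t], 2)) for t in counts}
```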
Fig. 3 The architecture of the BERT-BTN model
The main hyper-parameters for the proposed model
| Parameter | Setting |
|---|---|
| LSTM_Hidden_Size | 128 |
| LSTM_Layer | 1 |
| Transformer_Layer | 1 |
| Transformer_Head | 1 |
| Dropout | 0.13 |
| Batch_size | 8 |
| Learning_Rate | 1e−4 |
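The hyper-parameters above can be collected into a typed configuration object. In this sketch the class and field names are hypothetical. The derived `transformer_d_model` property assumes that the Transformer layer consumes the concatenated forward and backward BiLSTM states, giving a width of 2 × 128 = 256. The table does not state this explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtnConfig:
    """Hyper-parameters from the table; field names are hypothetical."""
    lstm_hidden_size: int = 128
    lstm_layers: int = 1
    transformer_layers: int = 1
    transformer_heads: int = 1
    dropout: float = 0.13
    batch_size: int = 8
    learning_rate: float = 1e-4

    @property
    def transformer_d_model(self) -> int:
        # Assumption: the Transformer input is the concatenated forward and
        # backward BiLSTM hidden states, so its width is 2 * hidden size.
        return 2 * self.lstm_hidden_size

    def __post_init__(self) -> None:
        # Multi-head attention splits d_model evenly across the heads.
        if self.transformer_d_model % self.transformer_heads != 0:
            raise ValueError("d_model must be divisible by the number of heads")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
```

With a single attention head, as in the table, the divisibility check always passes. It only matters if the head count is varied.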
The F1 scores (%) of the proposed and benchmark models
| Model | Inexact-match macro | Inexact-match micro | Exact-match macro | Exact-match micro |
|---|---|---|---|---|
| FastText-Transformer | 89.29 ± 2.64 | 95.25 ± 0.46 | 77.12 ± 4.14 | 86.85 ± 1.18 |
| FastText-BiLSTM | 90.46 ± 1.31 | 95.72 ± 0.70 | 80.67 ± 0.87 | 88.08 ± 1.41 |
| FastText-BTN | 90.47 ± 1.82 | 95.22 ± 0.52 | 80.84 ± 3.16 | 87.76 ± 1.30 |
| BERT-Transformer | 90.94 ± 0.69 | 95.80 ± 0.31 | 81.67 ± 6.14 | 87.35 ± 1.23 |
| BERT-BiLSTM | 93.05 ± 0.89 | 97.27 ± 0.16 | 84.12 ± 1.59 | 90.13 ± 0.92 |
| BERT-BTN | 94.40 ± 0.91 | | 84.58 ± 2.72 | |
| BERT-fine-tune | 92.43 ± 0.61 | 96.22 ± 0.93 | 82.15 ± 3.41 | 88.33 ± 3.00 |
| BERT-BTN (with pre-training) | | 96.78 ± 0.73 | | 90.67 ± 0.51 |
Bold values indicate the best score for each evaluation metric.
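The table distinguishes exact-match from inexact-match scoring, and macro-averaging from micro-averaging. This section does not define these terms, so the sketch below adopts common conventions as assumptions:

- An exact match requires the same entity type and identical boundaries.
- An inexact match requires the same type and overlapping character offsets.
- Macro-F1 is the unweighted mean of the per-type F1 scores.
- Micro-F1 pools the match counts across all types.

The helper names are hypothetical. Spans are offsets within one report; for a whole corpus, a report id would need to be added to each span so that offsets from different reports do not collide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (entity type, start, end exclusive)


def spans_match(a: Span, b: Span, exact: bool) -> bool:
    """Same entity type plus identical (exact) or overlapping (inexact) offsets."""
    if a[0] != b[0]:
        return False
    if exact:
        return a[1:] == b[1:]
    return a[1] < b[2] and b[1] < a[2]


def f1(p_hit: int, p_total: int, g_hit: int, g_total: int) -> float:
    """F1 from matched/total predictions (precision) and matched/total gold (recall)."""
    precision = p_hit / p_total if p_total else 0.0
    recall = g_hit / g_total if g_total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ner_f1(gold: List[Span], pred: List[Span], exact: bool = True
           ) -> Tuple[Dict[str, float], float, float]:
    """Return (per-type F1, macro-F1, micro-F1); quadratic matching, fine for a sketch."""
    # counts[type] = [matched predictions, predictions, matched gold, gold]
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for p in pred:  # precision side: does each prediction hit some gold span?
        counts[p[0]][1] += 1
        counts[p[0]][0] += any(spans_match(p, g, exact) for g in gold)
    for g in gold:  # recall side: is each gold span hit by some prediction?
        counts[g[0]][3] += 1
        counts[g[0]][2] += any(spans_match(p, g, exact) for p in pred)
    per_type = {t: f1(*c) for t, c in counts.items()}
    macro = sum(per_type.values()) / len(per_type) if per_type else 0.0
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(4)))
    return per_type, macro, micro
```

Under these definitions, micro-F1 is dominated by frequent types such as Location and Negation. Macro-F1, by contrast, weights rare types such as Vessel and Vertebral Body equally with common ones. This is consistent with every complete row in the table reporting a lower macro score than micro score.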
Fig. 4 The training loss of models using BERT embedding
The exact-match macro-F1 scores (%) of the proposed and benchmark models for the 14 entity types
| Entity type | FastText-Transformer | FastText-BiLSTM | FastText-BTN | BERT-Transformer | BERT-BiLSTM | BERT-BTN | BERT-fine-tune | BERT-BTN (pre-training) |
|---|---|---|---|---|---|---|---|---|
| Vessel | 58.01 ± 11.71 | 54.54 ± 3.20 | 56.63 ± 13.56 | 47.42 ± 14.93 | 58.48 ± 5.28 | 59.05 ± 0.51 | 57.31 ± 18.21 | |
| Vertebral Body | 62.67 ± 24.11 | 59.01 ± 22.65 | 65.02 ± 12.48 | 74.41 ± 21.69 | 63.70 ± 24.61 | 70.67 ± 16.36 | 55.71 ± 35.11 | |
| PAOP | 60.54 ± 11.24 | 62.43 ± 8.84 | 66.94 ± 10.75 | 55.27 ± 18.44 | 75.49 ± 9.25 | 74.50 ± 12.46 | 72.32 ± 11.29 | |
| Bronchus | 60.55 ± 7.26 | 72.01 ± 3.48 | 70.79 ± 6.93 | 76.30 ± 7.61 | 79.81 ± 9.71 | 80.15 ± 5.64 | 79.37 ± 4.06 | |
| Pleura | 66.71 ± 10.56 | 84.22 ± 7.32 | 82.46 ± 7.53 | 79.64 ± 7.51 | 84.30 ± 2.60 | 85.10 ± 6.41 | 85.57 ± 3.87 | |
| Shape | 69.90 ± 7.34 | 77.52 ± 2.95 | 73.56 ± 2.64 | 79.25 ± 9.93 | 80.69 ± 2.77 | 81.65 ± 1.79 | ||
| Density | 84.33 ± 1.19 | 81.37 ± 3.23 | 83.85 ± 1.65 | 85.49 ± 8.00 | 87.46 ± 2.51 | 87.21 ± 4.56 | 86.08 ± 1.88 | |
| Mass | 80.16 ± 2.28 | 82.35 ± 3.16 | 83.04 ± 2.11 | 84.99 ± 7.43 | 84.76 ± 2.72 | 85.13 ± 2.4 | 77.44 ± 6.07 | |
| Enhancement | 74.51 ± 5.76 | 80.66 ± 2.81 | 76.54 ± 10.95 | 85.29 ± 5.76 | 84.33 ± 7.36 | 80.24 ± 14.90 | 84.27 ± 6.36 | |
| Size | 93.30 ± 1.65 | 95.58 ± 1.09 | 95.58 ± 1.08 | 95.70 ± 4.59 | 95.63 ± 1.78 | 96.03 ± 0.87 | 95.70 ± 1.32 | |
| Location | 83.87 ± 6.41 | 86.84 ± 2.46 | 86.87 ± 1.65 | 89.00 ± 3.58 | 91.36 ± 0.66 | 88.55 ± 4.00 | 90.60 ± 2.54 | |
| Lymph | 90.51 ± 4.00 | 94.16 ± 3.24 | 93.13 ± 1.17 | 93.65 ± 7.06 | 93.60 ± 3.66 | 94.09 ± 2.26 | 91.98 ± 3.46 | |
| Negation | 98.56 ± 0.41 | 98.58 ± 0.58 | 98.45 ± 2.97 | 98.84 ± 0.39 | 98.30 ± 0.22 | 94.59 ± 8.53 | 98.79 ± 0.38 | |
| Effusion | 96.12 ± 1.72 | 97.84 ± 0.18 | 97.82 ± 1.07 | 96.62 ± 4.10 | 95.61 ± 3.47 | 97.78 ± 0.92 | 96.52 ± 0.48 | |
Bold values indicate the best score for each evaluation metric.
The inexact-match macro-F1 scores (%) of the proposed and benchmark models for the 14 entity types
| Entity type | FastText-Transformer | FastText-BiLSTM | FastText-BTN | BERT-Transformer | BERT-BiLSTM | BERT-BTN | BERT-fine-tune | BERT-BTN (pre-training) |
|---|---|---|---|---|---|---|---|---|
| Vessel | 68.74 ± 5.78 | 59.25 ± 8.77 | 62.63 ± 9.30 | 51.06 ± 13.44 | 67.00 ± 9.22 | 73.48 ± 12.46 | 70.74 ± 10.08 | |
| Vertebral Body | 91.81 ± 7.22 | 92.75 ± 6.39 | 85.95 ± 6.79 | 80.00 ± 27.39 | 85.79 ± 11.08 | 85.24 ± 10.16 | 91.69 ± 13.64 | |
| PAOP | 76.25 ± 7.62 | 73.30 ± 14.43 | 77.31 ± 14.70 | 83.17 ± 4.08 | 91.16 ± 1.23 | 93.80 ± 3.13 | 85.44 ± 10.16 | |
| Bronchus | 73.86 ± 9.92 | 85.79 ± 6.32 | 83.21 ± 5.63 | 83.74 ± 3.51 | 90.54 ± 3.49 | 89.67 ± 4.51 | 91.83 ± 3.00 | |
| Pleura | 85.45 ± 7.95 | 93.70 ± 3.08 | 91.71 ± 4.50 | 84.78 ± 3.55 | 96.17 ± 1.77 | 95.25 ± 2.62 | 94.36 ± 4.48 | |
| Shape | 87.30 ± 6.44 | 89.73 ± 1.43 | 88.04 ± 2.74 | 89.01 ± 1.09 | 93.96 ± 2.10 | 93.28 ± 2.13 | 92.41 ± 1.74 | |
| Density | 91.68 ± 1.34 | 91.06 ± 1.77 | 92.72 ± 1.01 | 92.45 ± 1.83 | 93.34 ± 1.16 | 94.86 ± 2.81 | 95.49 ± 0.78 | |
| Mass | 95.32 ± 2.39 | 96.20 ± 1.11 | 96.83 ± 1.11 | 94.34 ± 1.06 | 97.01 ± 0.61 | 95.38 ± 3.96 | 96.86 ± 0.75 | |
| Enhancement | 89.62 ± 5.16 | 92.68 ± 3.00 | 89.04 ± 7.50 | 92.74 ± 1.89 | 95.48 ± 3.98 | 95.28 ± 3.2 | 94.51 ± 4.86 | |
| Size | 98.61 ± 0.46 | 98.44 ± 0.83 | 98.34 ± 0.46 | 97.74 ± 0.74 | 99.03 ± 0.59 | 98.45 ± 0.55 | 98.62 ± 0.77 | |
| Location | 93.52 ± 2.18 | 95.33 ± 1.05 | 95.48 ± 0.48 | 93.77 ± 0.92 | 97.41 ± 0.61 | 94.51 ± 3.35 | 97.24 ± 2.08 | |
| Lymph | 99.58 ± 0.37 | 99.71 ± 0.30 | 95.85 ± 2.54 | 99.13 ± 0.28 | 98.25 ± 1.76 | 99.41 ± 0.57 | 98.60 ± 2.28 | |
| Negation | 98.88 ± 0.36 | 99.06 ± 0.28 | 98.66 ± 0.52 | 99.05 ± 0.14 | 98.97 ± 0.25 | 99.00 ± 0.25 | 98.88 ± 0.11 | |
| Effusion | 99.31 ± 0.59 | 99.09 ± 0.64 | 97.65 ± 1.01 | 99.05 ± 1.31 | 99.08 ± 0.79 | 99.18 ± 0.61 | 99.30 ± 1.00 | |
Bold values indicate the best score for each evaluation metric.
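Each cell in these tables reports a score together with a ± spread. This section does not say what the spread measures. If, as is common, it is the standard deviation over repeated runs or cross-validation folds, a cell could be formed as shown below. `mean_pm_sd` is a hypothetical helper, and its use of the sample (n − 1) standard deviation is also an assumption.

```python
import statistics
from typing import Sequence


def mean_pm_sd(scores: Sequence[float]) -> str:
    """Format per-run F1 scores as 'mean ± SD' with two decimals.

    Assumption: ± is the sample standard deviation across runs or folds.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}"
```

If the authors instead reported the population standard deviation, `statistics.pstdev` would replace `statistics.stdev`. With only a handful of runs, the two give noticeably different spreads.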
Fig. 5 Comparison of the proposed and benchmark models across the 14 named entity types under the exact-match scheme
Fig. 6 Comparison of the proposed and benchmark models across the 14 named entity types under the inexact-match scheme