Hyung Jun Park, Namu Park, Jang Ho Lee, Myeong Geun Choi, Jin-Sook Ryu, Min Song, Chang-Min Choi.
Abstract
BACKGROUND: Extracting metastatic information from previous radiologic-text reports is important; however, laborious annotation has limited the usability of these texts. We developed a deep-learning model that extracts primary lung cancer sites, metastatic lymph nodes, and distant metastases from PET-CT reports to determine lung cancer stage.
Keywords: Auto-annotation; Deep learning; Lung cancer; Natural language processing; Pseudo-labelling
Year: 2022 PMID: 36050674 PMCID: PMC9438247 DOI: 10.1186/s12911-022-01975-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Structure of the model pre-processing and development for extracting information
Prevalence of primary sites and metastatic lymph nodes and organs
| Site | Validation set (N = 473): number | Validation set: prevalence | Additional-test set (N = 3362): number | Additional-test set: prevalence |
|---|---|---|---|---|
| Left | ||||
| Left (huge) | 6/473 | 0.0127 | 143/3362 | 0.0425 |
| Left lower lobe | 84/473 | 0.1776 | 553/3362 | 0.1645 |
| Left upper lobe | 126/473 | 0.2664 | 765/3362 | 0.2275 |
| Right | ||||
| Right (huge) | 8/473 | 0.0169 | 171/3362 | 0.0509 |
| Right lower lobe | 90/473 | 0.1903 | 690/3362 | 0.2052 |
| Right middle lobe | 27/473 | 0.0571 | 191/3362 | 0.0568 |
| Right upper lobe | 132/473 | 0.2791 | 849/3362 | 0.2525 |
| N1 | ||||
| Hilar | 142/473 | 0.3002 | ||
| Interlobar | 123/473 | 0.2600 | ||
| (Peri) Bronchial | 3/473 | 0.0063 | ||
| Lobar | 6/473 | 0.0127 | ||
| N2 | ||||
| Upper paratracheal | 32/473 | 0.0677 | ||
| Prevascular, retrotracheal | 36/473 | 0.0761 | ||
| Lower paratracheal | 77/473 | 0.1628 | ||
| Subaortic | 10/473 | 0.0211 | ||
| Para-aortic | 23/473 | 0.0486 | ||
| Subcarinal | 65/473 | 0.1374 | ||
| Para-oesophageal | 19/473 | 0.0402 | ||
| N3 | ||||
| Contralateral N1 | 43/473 | 0.0909 | ||
| Contralateral N2 | 87/473 | 0.1839 | ||
| Supraclavicular | 99/473 | 0.2093 | ||
| Intra-thoracic metastasis | ||||
| Malignant pleural effusion | 36/473 | 0.0761 | 342/3362 | 0.1017 |
| Malignant pericardial effusion | 6/473 | 0.0127 | 20/3362 | 0.0059 |
| Pleural nodule | 55/473 | 0.1163 | 542/3362 | 0.1612 |
| Contralateral lung | 61/473 | 0.1290 | 363/3362 | 0.1080 |
| Ipsilateral lung | 65/473 | 0.1374 | 1/3362 | 0.0003 |
| Synchronous lung cancer | 11/473 | 0.0233 | 13/3362 | 0.0039 |
| Lymphangitic meta | 9/473 | 0.0190 | 46/3362 | 0.0137 |
| Extra-thoracic metastasis | ||||
| Bone (including rib and sternum) | 119/473 | 0.2516 | 697/3362 | 0.2073 |
| Extra-thoracic lymph node | 82/473 | 0.1734 | 401/3362 | 0.1193 |
| Brain | 14/473 | 0.0296 | 75/3362 | 0.0223 |
| Adrenal | 22/473 | 0.0465 | 177/3362 | 0.0526 |
| Liver | 42/473 | 0.0888 | 62/3362 | 0.0184 |
| Other | 46/473 | 0.0973 | 143/3362 | 0.0425 |
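The prevalence column above is the number of reports mentioning a site divided by the size of the set. A minimal sketch of that arithmetic, using counts copied from the table; the helper name `prevalence` is hypothetical and not taken from the paper:

```python
def prevalence(count: int, total: int) -> float:
    """Return count / total rounded to four decimals, as in the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(count / total, 4)


# Counts copied from the validation-set columns (N = 473).
validation_counts = {
    "Left lower lobe": 84,
    "Right upper lobe": 132,
    "Bone (including rib and sternum)": 119,
}

for site, count in validation_counts.items():
    print(f"{site}: {count}/473 = {prevalence(count, 473):.4f}")
```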
Prediction accuracy for primary cancer location and metastatic sites
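| Site | Validation set: frequency | Validation set: precision | Validation set: sensitivity | Validation set: F1-score | Additional-test set: frequency | Additional-test set: precision | Additional-test set: sensitivity | Additional-test set: F1-score |
|---|---|---|---|---|---|---|---|---|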
| Left | ||||||||
| Left (huge) | 6 | 0.2000 (1/5) | 0.1667 (1/6) | 0.1818 | 143 | 0.6786 (19/28) | 0.1329 (19/143) | 0.2222 |
| Left lower lobe | 84 | 0.7143 (70/98) | 0.8333 (70/84) | 0.7692 | 553 | 0.7894 (521/660) | 0.9421 (521/553) | 0.8590 |
| Left upper lobe | 126 | 0.8684 (99/114) | 0.7857 (99/126) | 0.8250 | 765 | 0.9598 (717/747) | 0.9373 (717/765) | 0.9484 |
| Any of left† | 216 | 0.8940 (194/217) | 0.8981 (194/216) | 0.8961 | 1461 | 0.9666 (1387/1435) | 0.9493 (1387/1461) | 0.9579 |
| Right | ||||||||
| Right (huge) | 8 | 0.0000 (0/0) | 0.0000 (0/8) | 0.0000 | 171 | 0.0000 (0/0) | 0.0000 (0/171) | 0.0000 |
| Right lower lobe | 90 | 0.8690 (73/84) | 0.8111 (73/90) | 0.8391 | 690 | 0.9361 (659/704) | 0.9551 (659/690) | 0.9455 |
| Right middle lobe | 27 | 0.3898 (23/59) | 0.8519 (23/27) | 0.5349 | 191 | 0.4360 (177/406) | 0.9267 (177/191) | 0.5930 |
| Right upper lobe | 132 | 0.8850 (100/113) | 0.7576 (100/132) | 0.8163 | 849 | 0.9376 (766/817) | 0.9022 (766/849) | 0.9196 |
| Any of right† | 257 | 0.9141 (234/256) | 0.9105 (234/257) | 0.9123 | 1901 | 0.9616 (1853/1927) | 0.9748 (1853/1901) | 0.9681 |
| Overall | 473 | 0.7953 | 0.7738 | 0.7767 | 3362 | 0.8308 | 0.8504 | 0.8265 |
| Intra-thoracic | ||||||||
| Malignant effusion | 36 | 0.4096 (34/83) | 0.9444 (34/36) | 0.5714 | 342 | 0.5700 (334/586) | 0.9766 (334/342) | 0.7198 |
| Pleural nodule | 55 | 0.6296 (51/81) | 0.9273 (51/55) | 0.7500 | 542 | 0.7674 (508/662) | 0.9373 (508/542) | 0.8439 |
| Contralateral metastasis | 61 | 0.3846 (40/104) | 0.6557 (40/61) | 0.4848 | 363 | 0.3441 (287/834) | 0.7906 (287/363) | 0.4795 |
| Extra-thoracic | ||||||||
| Bone | 119 | 0.8298 (117/141) | 0.9832 (117/119) | 0.9000 | 697 | 0.7462 (682/914) | 0.9785 (682/697) | 0.8467 |
| Extra-thoracic LN‡ | 82 | 0.4530 (82/181) | 1.0000 (82/82) | 0.6236 | 401 | 0.3347 (399/1192) | 0.9950 (399/401) | 0.5009 |
| Adrenal | 22 | 0.4872 (19/39) | 0.8636 (19/22) | 0.6230 | 177 | 0.4958 (175/353) | 0.9887 (175/177) | 0.6604 |
| Liver | 42 | 0.8810 (37/42) | 0.8810 (37/42) | 0.8810 | 62 | 0.2609 (12/46) | 0.1935 (12/62) | 0.2222 |
| Overall | 473 | 0.6150 | 0.9113 | 0.7202 | 3362 | 0.5782 | 0.9276 | 0.6963 |
†Prediction of the cancer side (right or left) without considering the subdivision into lung lobes. ‡LN: lymph node
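Each precision and sensitivity cell in the table above is printed with its underlying fraction, so the F1-score can be recomputed from three counts: true positives, predicted positives, and actual positives. The sketch below is one way to reproduce a row; the function name `row_metrics` is hypothetical, and returning 0.0 for an empty denominator is an assumption chosen to match how the table reports the "Right (huge)" row (0/0 shown as 0.0000).

```python
def row_metrics(tp: int, predicted_pos: int, actual_pos: int):
    """Return (precision, sensitivity, F1), each rounded to four decimals.

    Assumption: a zero denominator yields 0.0, mirroring the table's
    convention for rows with no positive predictions.
    """
    precision = tp / predicted_pos if predicted_pos else 0.0
    sensitivity = tp / actual_pos if actual_pos else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return round(precision, 4), round(sensitivity, 4), round(f1, 4)


# Left lower lobe, validation set: precision 70/98, sensitivity 70/84.
print(row_metrics(70, 98, 84))
# Right (huge), validation set: no positive predictions, 8 actual cases.
print(row_metrics(0, 0, 8))
```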
Fig. 2 AUROC and AUPRC curves for the prediction of primary sites in the validation set and the additional-test set. The AUROC values for primary cancer sites were 0.913 and 0.946 in the validation set and the additional-test set, respectively. The micro-AUPRC values for primary cancer sites were 0.819 and 0.902 in the validation set and the additional-test set, respectively. F1: lines of constant F1-score (values from 0.2 to 0.8)
Accuracy for prediction of metastatic lymph nodes in the validation set
| Lymph node | Frequency | Precision | Sensitivity | Specificity | F1-score |
|---|---|---|---|---|---|
| Hilar | 141 | 0.8141 (127/156) | 0.9007 (127/141) | 0.9127 (303/332) | 0.8552 |
| Interlobar | 121 | 0.7740 (113/146) | 0.9339 (113/121) | 0.9062 (319/352) | 0.8464 |
| Lobar | 6 | 0.4286 (6/14) | 1.0000 (6/6) | 0.9829 (459/467) | 0.6000 |
| Upper paratracheal | 32 | 0.8696 (20/23) | 0.6250 (20/32) | 0.9932 (438/441) | 0.7273 |
| Prevascular, retrotracheal | 35 | 0.9259 (25/27) | 0.7143 (25/35) | 0.9954 (436/438) | 0.8065 |
| Lower paratracheal | 77 | 0.8533 (64/75) | 0.8312 (64/77) | 0.9722 (385/396) | 0.8421 |
| Subaortic | 10 | 0.6000 (9/15) | 0.9000 (9/10) | 0.9870 (457/463) | 0.7200 |
| Para-aortic | 23 | 0.6000 (21/35) | 0.9130 (21/23) | 0.9689 (436/450) | 0.7241 |
| Subcarinal | 65 | 0.8000 (60/75) | 0.9231 (60/65) | 0.9632 (393/408) | 0.8571 |
| Para-oesophageal | 19 | 0.7083 (17/24) | 0.8947 (17/19) | 0.9846 (447/454) | 0.7907 |
| Contralateral N1 | 61 | 0.5357 (45/84) | 0.7377 (45/61) | 0.9053 (373/412) | 0.6207 |
| Contralateral N2 | 109 | 0.7647 (52/68) | 0.4771 (52/109) | 0.9560 (348/364) | 0.5876 |
| Supraclavicular | 99 | 0.9307 (94/101) | 0.9495 (94/99) | 0.9813 (367/374) | 0.9400 |
| Overall | | 0.7663 | 0.8265 | 0.9603 | 0.7862 |
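The specificity fractions in this table follow from the precision and sensitivity fractions once the set size is known. A minimal sketch, assuming every row is evaluated over all 473 validation reports (consistent with the denominators shown); the function name `specificity_from_counts` is hypothetical:

```python
def specificity_from_counts(tp: int, predicted_pos: int, actual_pos: int, total: int):
    """Derive (true negatives, actual negatives, specificity) for one row."""
    false_pos = predicted_pos - tp      # predicted positive but actually negative
    negatives = total - actual_pos      # all reports without the finding
    true_neg = negatives - false_pos    # negatives the model left unflagged
    return true_neg, negatives, round(true_neg / negatives, 4)


# Hilar: precision 127/156, sensitivity 127/141, N = 473.
print(specificity_from_counts(127, 156, 141, 473))
# Supraclavicular: precision 94/101, sensitivity 94/99, N = 473.
print(specificity_from_counts(94, 101, 99, 473))
```

For the Hilar row this gives 303 true negatives out of 332 negatives, the same fraction printed in the specificity column.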
Fig. 3 AUROC and AUPRC curves for predicting metastatic organs in the validation set and the additional-test set. The AUROC values for predicting metastatic organs were 0.944 and 0.950 in the validation set and the additional-test set, respectively. The AUPRC values for metastatic organs were 0.687 and 0.640 in the validation set and the additional-test set, respectively. F1: lines of constant F1-score (values from 0.2 to 0.8)