Ching-Heng Lin, Kai-Cheng Hsu, Chih-Kuang Liang, Tsong-Hai Lee, Ching-Sen Shih, Yang C Fann.
Abstract
Patients with intracranial artery stenosis have a high incidence of stroke. Angiography reports contain rich but underutilized information that can enable the detection of cerebrovascular diseases. This study evaluated several natural language processing (NLP) techniques for accurately identifying stenosis in eleven intracranial arteries from angiography reports. Three NLP models, a rule-based model, a recurrent neural network (RNN), and a contextualized language model (XLNet), were developed and evaluated by internal-external cross-validation. Angiography reports from two independent medical centers were assessed: 9614 for training and internal testing, and 315 for external validation. In internal testing, XLNet performed best, with an area under the receiver operating characteristic curve (AUROC) ranging from 0.97 to 0.99 across the eleven targeted arteries. The rule-based model attained an AUROC from 0.92 to 0.96, and the RNN long short-term memory (LSTM) model attained an AUROC from 0.95 to 0.97. The study showed the potential of NLP techniques such as XLNet for routine, automatic screening of patients at high risk of intracranial artery stenosis using angiography reports. However, the models were investigated on relatively small samples, and the two centers differed markedly in report writing style and in the prevalence of stenosis, revealing challenges for model generalization.
Keywords: cerebrovascular diseases; deep learning; intracranial artery stenosis; natural language processing; rule-based model
Year: 2022 PMID: 36010232 PMCID: PMC9406429 DOI: 10.3390/diagnostics12081882
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
The percentage of cases with confirmed stenosis (≥50% diameter stenosis) for each artery in both internal and external datasets.
| Artery | Internal Dataset | External Dataset |
|---|---|---|
| RIICA (%) | 740 (7.7) | 12 (3.8) |
| RACA (%) | 416 (4.3) | 2 (0.6) |
| RMCA (%) | 967 (10.1) | 13 (4.1) |
| RPCA (%) | 491 (5.1) | 6 (1.9) |
| RIVA (%) | 1052 (10.9) | 2 (0.6) |
| BA (%) | 554 (5.8) | 9 (2.9) |
| LIICA (%) | 735 (7.6) | 4 (1.3) |
| LACA (%) | 407 (4.2) | 4 (1.3) |
| LMCA (%) | 1005 (10.5) | 10 (3.2) |
| LPCA (%) | 547 (5.7) | 3 (1.0) |
| LIVA (%) | 943 (9.8) | 2 (0.6) |
BA—basilar artery; LACA—left anterior cerebral artery; LIICA—left internal carotid artery; LIVA—left intracranial vertebral artery; LMCA—left middle cerebral artery; LPCA—left posterior cerebral artery; RACA—right anterior cerebral artery; RIICA—right internal carotid artery; RIVA—right intracranial vertebral artery; RMCA—right middle cerebral artery; RPCA—right posterior cerebral artery.
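The percentages in the table above follow directly from the case counts and the dataset sizes given in the abstract (9614 internal reports, 315 external reports). The short sketch below recomputes a few rows as a consistency check; the function and variable names are illustrative and do not come from the source.

```python
# Dataset sizes as stated in the abstract.
INTERNAL_TOTAL = 9614
EXTERNAL_TOTAL = 315

# (internal cases, external cases) for three arteries, copied from the table.
counts = {
    "RIICA": (740, 12),
    "RMCA": (967, 13),
    "BA": (554, 9),
}


def prevalence(cases: int, total: int) -> float:
    """Percentage of reports with confirmed stenosis, rounded to one decimal."""
    return round(100 * cases / total, 1)


table = {
    artery: (prevalence(internal, INTERNAL_TOTAL), prevalence(external, EXTERNAL_TOTAL))
    for artery, (internal, external) in counts.items()
}

for artery, (internal_pct, external_pct) in table.items():
    print(f"{artery}: internal {internal_pct}%, external {external_pct}%")
# RIICA: internal 7.7%, external 3.8%
# RMCA: internal 10.1%, external 4.1%
# BA: internal 5.8%, external 2.9%
```

The recomputed values match the table, and they make the gap between the two centers concrete: the external prevalence is well under half the internal prevalence for each of these arteries.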
Figure 1. Overview of stenosis identification models. Panel A presents a rule-based model, which is a handcrafted feature-based approach. Panel B presents a long short-term memory model, which is a recurrent neural network approach. Panel C presents XLNet, which is a pretrained language model approach.
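To make the idea of a handcrafted, rule-based approach concrete, here is a minimal sketch of what such a detector might look like. It is not the authors' model: the source does not list the actual rules, so every pattern, the negation list, and the function name `detect_stenosis` are hypothetical. The sketch also ignores the degree of stenosis, whereas the study labels an artery as positive only at ≥50% diameter stenosis.

```python
import re

# Hypothetical patterns for three of the eleven arteries (assumption, not from the source).
ARTERY_PATTERNS = {
    "RMCA": re.compile(r"\bright (?:middle cerebral artery|MCA)\b", re.IGNORECASE),
    "LMCA": re.compile(r"\bleft (?:middle cerebral artery|MCA)\b", re.IGNORECASE),
    "BA": re.compile(r"\bbasilar artery\b", re.IGNORECASE),
}
# Hypothetical finding and negation cues.
STENOSIS = re.compile(r"\b(?:stenosis|stenotic|narrowing)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|without|negative for)\b", re.IGNORECASE)


def detect_stenosis(report: str) -> dict[str, bool]:
    """Flag an artery when a non-negated sentence mentions it together with a stenosis cue."""
    found = {artery: False for artery in ARTERY_PATTERNS}
    # Split into sentences after a period or semicolon followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", report):
        if not STENOSIS.search(sentence) or NEGATION.search(sentence):
            continue
        for artery, pattern in ARTERY_PATTERNS.items():
            if pattern.search(sentence):
                found[artery] = True
    return found


example = (
    "Severe stenosis of the right middle cerebral artery. "
    "No stenosis of the basilar artery."
)
result = detect_stenosis(example)
print(result)
# {'RMCA': True, 'LMCA': False, 'BA': False}
```

Rules of this kind are tied to the vocabulary and sentence structure of the reports they were written for, which is one plausible reason a rule-based model can degrade when report writing styles differ between centers.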
Figure 2. The process of model training, internal testing, and external testing. Three different models were trained on 80% of the Linkou Chang Gung Memorial Hospital (LCGMH) dataset and tested on the remaining 20% for internal testing. The Kaohsiung Veterans General Hospital dataset was used for external testing. * The rule-based model was built in the first round and tested in all 10 rounds.
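The 80%/20% partition over 10 rounds described in the Figure 2 caption can be sketched as follows. The caption does not say how the rounds were drawn, so this sketch assumes independent random reshuffles of the internal dataset in each round; the function name `repeated_holdout`, its parameters, and the seed are illustrative assumptions.

```python
import random


def repeated_holdout(n_reports: int, rounds: int = 10, train_frac: float = 0.8, seed: int = 0):
    """Return a list of (train_indices, test_indices) pairs, one per round."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    cut = int(n_reports * train_frac)
    splits = []
    for _ in range(rounds):
        shuffled = list(range(n_reports))
        rng.shuffle(shuffled)
        splits.append((shuffled[:cut], shuffled[cut:]))
    return splits


# 9614 is the number of internal reports stated in the abstract.
splits = repeated_holdout(9614)
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
# 10 7691 1923
```

Under this assumption, each trained model is scored once on its own internal 20% and once on the fixed external dataset, which is how a mean and standard deviation across rounds can be reported for both.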
Comparison of area under the receiver operating characteristic curve (AUROC) results between internal and external testing datasets for the stenosis-identification task from three different NLP models. Results are expressed as mean ± standard deviation.
| Cerebral Artery (Prevalence in Internal/External Dataset, %) | Internal Testing: Rule-Based | Internal Testing: LSTM | Internal Testing: XLNet | External Testing: Rule-Based | External Testing: LSTM | External Testing: XLNet |
|---|---|---|---|---|---|---|
| RIICA (7.7/3.8) | 0.93 ± 0.01 | 0.95 ± 0.01 | 0.98 ± 0.00 | 0.71 | 0.76 ± 0.10 | 0.91 ± 0.11 |
| RACA (4.3/0.6) | 0.95 ± 0.01 | 0.96 ± 0.01 | 0.98 ± 0.01 | 0.50 | 0.73 ± 0.16 | 0.93 ± 0.01 |
| RMCA (10.1/4.1) | 0.94 ± 0.01 | 0.95 ± 0.01 | 0.99 ± 0.00 | 0.58 | 0.77 ± 0.10 | 0.97 ± 0.02 |
| RPCA (5.1/1.9) | 0.94 ± 0.01 | 0.95 ± 0.02 | 0.97 ± 0.01 | 0.50 | 0.58 ± 0.18 | 0.90 ± 0.06 |
| RIVA (10.9/0.6) | 0.96 ± 0.01 | 0.97 ± 0.01 | 0.99 ± 0.00 | 0.75 | 0.55 ± 0.19 | 0.99 ± 0.03 |
| BA (5.8/2.9) | 0.92 ± 0.02 | 0.95 ± 0.01 | 0.98 ± 0.01 | 0.83 | 0.47 ± 0.08 | 0.84 ± 0.04 |
| LIICA (7.6/1.3) | 0.93 ± 0.01 | 0.96 ± 0.01 | 0.98 ± 0.01 | 0.75 | 0.78 ± 0.07 | 0.93 ± 0.08 |
| LACA (4.2/1.3) | 0.95 ± 0.02 | 0.95 ± 0.01 | 0.98 ± 0.01 | 0.75 | 0.70 ± 0.15 | 0.99 ± 0.01 |
| LMCA (10.5/3.2) | 0.94 ± 0.01 | 0.95 ± 0.01 | 0.98 ± 0.00 | 0.50 | 0.80 ± 0.10 | 0.98 ± 0.01 |
| LPCA (5.7/1.0) | 0.93 ± 0.01 | 0.95 ± 0.02 | 0.98 ± 0.01 | 0.50 | 0.61 ± 0.14 | 0.79 ± 0.12 |
| LIVA (9.8/0.6) | 0.95 ± 0.01 | 0.97 ± 0.01 | 0.98 ± 0.00 | 0.50 | 0.65 ± 0.08 | 0.92 ± 0.09 |
BA—basilar artery; LACA—left anterior cerebral artery; LIICA—left internal carotid artery; LIVA—left intracranial vertebral artery; LMCA—left middle cerebral artery; LPCA—left posterior cerebral artery; RACA—right anterior cerebral artery; RIICA—right internal carotid artery; RIVA—right intracranial vertebral artery; RMCA—right middle cerebral artery; RPCA—right posterior cerebral artery.
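For readers who want to see how entries of the form "mean ± standard deviation" could be produced, the sketch below computes AUROC from its rank interpretation (the probability that a randomly chosen positive report receives a higher score than a randomly chosen negative one, with ties counted as one half) and then summarizes several rounds. This is an assumption about the procedure, not the authors' code: the source does not say which software was used or whether the sample or population standard deviation was reported, and the per-round values in the usage example are made up for illustration.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Format per-round AUROC values as 'mean ± sample standard deviation'."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# A model that gives every report the same score cannot rank anything.
print(auroc([0, 0, 1, 1], [1, 1, 1, 1]))  # 0.5

# Hypothetical per-round values, not taken from the study.
print(summarize([0.97, 0.98, 0.99]))  # 0.98 ± 0.01
```

With only two positive cases in an external set of 315 reports, as for RACA, RIVA, and LIVA, each AUROC estimate rests on very few positive/negative comparisons, so the external values for those arteries should be read with caution.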
Figure 3. Comparison of the receiver operating characteristic (ROC) curves for the stenosis detection tasks obtained by the rule-based model, the long short-term memory (LSTM) model, and the XLNet model on the internal test dataset. Each plot shows stenosis detection performance for one artery. The XLNet model shows a consistently larger area under the curve than the LSTM model.