Zhu Zhu1,2, Jing Li1,2, Jian Huang1,2, Zheming Li1,2, Hongjian Zhang3, Siyu Chen3, Qianhui Zhong3, Yulan Xie3, Shasha Hu1,2, Yinshuo Wang2,4, Dejian Wang5, Gang Yu1,2,6.
Abstract
Background: Due to the phenotypic similarities among different pediatric respiratory diseases with chronic cough, primary doctors often misdiagnose and the misuse of examinations is prevalent. In the pre-diagnosis stage, the patients' chief complaints and other information in the electronic medical record (EMR) provide a powerful reference for respiratory experts to make preliminary disease judgment and examination plan. In this paper, we proposed an intelligent prediagnosis system to predict disease diagnosis and recommend examinations based on EMR text.Entities:
Keywords: Chronic cough; deep learning; disease prediction; examination recommendation; natural language processing
Year: 2022 PMID: 35958012 PMCID: PMC9360821 DOI: 10.21037/tp-22-275
Source DB: PubMed Journal: Transl Pediatr ISSN: 2224-4336
Figure 1 Diagram of the workflow of the entire framework. A medical language model was trained with a medical literature corpus and word2vec algorithms. This model generated word embeddings for text data used in downstream tasks. TextCNN models based on EMR data were built with word embeddings as input for disease prediction and examination recommendation tasks. EMR, electronic medical record; TextCNN, text convolutional neural network.
Figure 2 Diagram of the workflow of the TextCNN disease prediction model. TextCNN, text convolutional neural network; BN, batch normalization; Fn, function; ReLU, rectified linear unit.
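The pipeline in Figures 1 and 2 (pretrained word embeddings fed into convolution, ReLU, max-over-time pooling, and a softmax over the 12 disease classes) can be sketched as a single forward pass. This is a minimal illustration only: the vocabulary size, embedding dimension, filter widths, and weights below are hypothetical, and the real system uses word2vec embeddings trained on a medical corpus plus learned convolution weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual hyperparameters are not given here.
VOCAB, EMB_DIM, N_CLASSES = 5000, 100, 12
FILTER_WIDTHS, N_FILTERS = (2, 3, 4), 64

# Word2vec-style embedding table (random here; in the paper it is pretrained
# on a medical literature corpus).
embeddings = rng.normal(size=(VOCAB, EMB_DIM))
conv_w = {w: rng.normal(scale=0.1, size=(N_FILTERS, w * EMB_DIM))
          for w in FILTER_WIDTHS}
dense_w = rng.normal(scale=0.1, size=(len(FILTER_WIDTHS) * N_FILTERS, N_CLASSES))

def textcnn_forward(token_ids):
    """One forward pass: embed -> convolve -> ReLU -> max-pool -> softmax."""
    x = embeddings[token_ids]                      # (seq_len, EMB_DIM)
    pooled = []
    for w, W in conv_w.items():
        # Slide a width-w window over the sequence (a 1D convolution).
        windows = np.stack([x[i:i + w].ravel() for i in range(len(x) - w + 1)])
        feat = np.maximum(windows @ W.T, 0.0)      # ReLU, (n_windows, N_FILTERS)
        pooled.append(feat.max(axis=0))            # max-over-time pooling
    logits = np.concatenate(pooled) @ dense_w      # (N_CLASSES,)
    p = np.exp(logits - logits.max())              # numerically stable softmax
    return p / p.sum()

probs = textcnn_forward(rng.integers(0, VOCAB, size=30))
```

The softmax output is a probability distribution over the 12 classes, from which the top-1 and top-3 predictions reported below can be read off.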
The 12 target categories in the prediagnosis disease prediction task
| Class label | Class name | Records | Disease names | ICD-10 code |
|---|---|---|---|---|
| 0 | AURI | 57318 | Acute upper respiratory infection | J06.900 |
| | | | Upper respiratory infection | J06.900x003 |
| 1 | Bronchitis | 38502 | Acute bronchitis | J20.900 |
| | | | Bronchitis | J40.x00 |
| | | | Asthmatic bronchitis | J45.901 |
| | | | Chronic asthmatic bronchitis | J44.804 |
| 2 | Asthma | 15002 | Noncritical bronchial asthma | J45.903 |
| | | | Asthma | J45.900 |
| | | | Bronchial asthma | J45.900x001 |
| | | | Cough variant asthma | J45.005 |
| 3 | Pharyngitis | 12374 | Acute pharyngitis | J02.900 |
| | | | Acute nasopharyngitis | J00.x00 |
| | | | Herpangina | B08.501 |
| | | | Infective nasopharyngitis | J00.x00x007 |
| | | | Pharyngitis | J02.900x004 |
| 4 | Pneumonia | 4465 | Bronchopneumonia | J18.000 |
| | | | Acute bronchopneumonia | J18.000a |
| | | | Pneumonia | J18.900 |
| | | | Acute pneumonia | J18.900a |
| 5 | Rhinitis | 4285 | Anaphylactic rhinitis | J30.400 |
| | | | Acute rhinitis | J00.x00 |
| | | | Allergic rhinitis with asthma | J45.004 |
| | | | Allergic rhinitis | J30.400 |
| | | | Rhinitis | J31.000x001 |
| 6 | Tonsillitis | 2992 | Acute tonsillitis | J03.900 |
| | | | Acute suppurative tonsillitis | J03.901 |
| 7 | Laryngitis | 2850 | Acute laryngitis | J04.000 |
| | | | Acute laryngotracheitis | J04.200 |
| 8 | Nasosinusitis | 2490 | Nasosinusitis | J32.900, J32.900x001 |
| | | | Acute nasosinusitis | J01.900 |
| | | | Chronic nasosinusitis | J32.900 |
| 9 | FLU | 2430 | Influenza | J11.101 |
| 10 | FBAO | 2327 | Upper airway cough syndrome | R05.x00 |
| | | | Acute tracheitis | J04.100 |
| 11 | Others | 1795 | – | – |
Disease abbreviations: AURI, acute upper respiratory infection; FLU, influenza; FBAO, foreign body airway obstruction.
The confusion matrix for evaluation of prediagnosis disease prediction tasks
| Category | True value =1 | True value =0 |
|---|---|---|
| Prediction value =1 | TP | FP |
| Prediction value =0 | FN | TN |
Value of the confusion matrix elements: TP, true positive; FP, false positive; FN, false negative; TN, true negative.
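From the four confusion-matrix cells above, the evaluation metrics used throughout the results follow directly. A minimal sketch (the counts are illustrative, not taken from the paper):

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of true positives that are recovered."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Illustrative counts only.
tp, fp, fn, tn = 80, 20, 10, 90
```

In the multi-class disease prediction task these quantities are computed per class (one-vs-rest) and then aggregated, as in the tables below.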
Figure 3Diagram of the performance comparison of our method (MSCNN) against 4 other algorithms (including LR, GDBT, HAN, and BERT) on the top-1 result in the disease prediction task. Figures (A-C) show the performance on precision, recall, and F1-score, respectively. The results showed that our method had the best performance on all 3 metrics. MSCNN, medical-semantic-aware convolution neural network; LR, logistic regression; GBDT, gradient-boosted decision tree; HAN, hierarchical attention networks; BERT, bidirectional encoder representations from transformers; AURI, acute upper respiratory infection; FLU, influenza; FBAO, foreign body airway obstruction.
Measurements of our method against 4 baseline methods and their data-enhanced versions for prediagnosis disease prediction
| Methods | AC | MA precision | WA precision | MA recall | WA recall | MA F1-score | WA F1-score |
|---|---|---|---|---|---|---|---|
| LR | 0.52 | 0.35 | 0.51 | 0.18 | 0.52 | 0.21 | 0.48 |
| LR + data enhancement | 0.47 | 0.34 | 0.48 | 0.31 | 0.47 | 0.31 | 0.47 |
| GBDT | 0.54 | 0.38 | 0.53 | 0.18 | 0.54 | 0.21 | 0.5 |
| GBDT + data enhancement | 0.54 | 0.44 | 0.54 | 0.32 | 0.53 | 0.34 | 0.53 |
| HAN | 0.62 | 0.51 | 0.61 | 0.42 | 0.62 | 0.45 | 0.61 |
| HAN + data enhancement | 0.63 | 0.54 | 0.62 | 0.42 | 0.63 | 0.46 | 0.62 |
| BERT | 0.64 | 0.54 | 0.61 | 0.44 | 0.65 | 0.48 | 0.65 |
| BERT + data enhancement | 0.64 | 0.55 | 0.62 | 0.44 | 0.67 | 0.47 | 0.65 |
| Ours (MSCNN) | 0.68 | 0.56 | 0.67 | 0.45 | 0.68 | 0.5 | 0.67 |
| Ours (MSCNN) + data enhancement | 0.68 | 0.59 | 0.67 | 0.49 | 0.68 | 0.51 | 0.67 |
Metrics included AC, MA precision, WA precision, MA recall, WA recall, MA F1-score, and WA F1-score. AC, accuracy; MA, macro average; WA, weighted average. Methods include: LR, logistic regression; GBDT, gradient-boosted decision tree; HAN, hierarchical attention networks; BERT, bidirectional encoder representations from transformers; MSCNN, medical-semantic-aware convolution neural network.
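The macro average (MA) treats every class equally, while the weighted average (WA) weights each class by its support, so frequent diseases such as AURI dominate. A minimal sketch, using three of the 12 classes purely for illustration (the F1 values and record counts are taken from the tables in this document, but the resulting averages are not the paper's 12-class results):

```python
def macro_average(values):
    """Unweighted mean: every class counts equally, regardless of size."""
    return sum(values) / len(values)

def weighted_average(values, supports):
    """Mean weighted by class support, so large classes dominate."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total

# Three classes only, for illustration: AURI, Bronchitis, Asthma.
f1 = [0.76, 0.74, 0.50]          # per-class MSCNN F1-scores
support = [57318, 38502, 15002]  # record counts per class

ma = macro_average(f1)
wa = weighted_average(f1, support)
```

Because the largest classes here also have the highest F1-scores, the weighted average comes out above the macro average, which mirrors the WA > MA pattern in the table above.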
The top-3 AC results of the 4 methods and their data-enhanced versions
| Methods | AC |
|---|---|
| LR | 0.759 |
| LR + Data Enhancement | 0.822 |
| GBDT | 0.789 |
| GBDT + Data Enhancement | 0.821 |
| HAN | 0.907 |
| HAN + Data Enhancement | 0.911 |
| BERT | 0.901 |
| BERT + Data Enhancement | 0.905 |
| Ours (MSCNN) | 0.923 |
| Ours (MSCNN) + Data Enhancement | 0.926 |
Methods include: LR, logistic regression; GBDT, gradient-boosted decision tree; HAN, hierarchical attention networks; BERT, bidirectional encoder representations from transformers; MSCNN, medical-semantic-aware convolution neural network; AC, accuracy.
Per-category measurements of the plain TextCNN model and our MSCNN across the 12 disease categories
| Disease | TextCNN precision | MSCNN precision | TextCNN recall | MSCNN recall | TextCNN F1-score | MSCNN F1-score |
|---|---|---|---|---|---|---|
| AURI | 0.56 | 0.75 | 0.59 | 0.77 | 0.58 | 0.76 |
| Bronchitis | 0.49 | 0.68 | 0.68 | 0.82 | 0.55 | 0.74 |
| Asthma | 0.48 | 0.68 | 0.26 | 0.45 | 0.32 | 0.50 |
| Pharyngitis | 0.41 | 0.58 | 0.25 | 0.42 | 0.29 | 0.48 |
| Pneumonia | 0.46 | 0.65 | 0.23 | 0.45 | 0.26 | 0.53 |
| Rhinitis | 0.37 | 0.58 | 0.23 | 0.43 | 0.33 | 0.49 |
| Tonsillitis | 0.27 | 0.45 | 0.11 | 0.29 | 0.17 | 0.35 |
| Laryngitis | 0.50 | 0.68 | 0.38 | 0.57 | 0.38 | 0.60 |
| Nasosinusitis | 0.49 | 0.68 | 0.28 | 0.45 | 0.34 | 0.51 |
| FLU | 0.19 | 0.36 | 0.05 | 0.19 | 0.07 | 0.26 |
| FBAO | 0.53 | 0.72 | 0.16 | 0.33 | 0.27 | 0.40 |
| Others | 0.39 | 0.56 | 0.22 | 0.48 | 0.30 | 0.48 |
Metrics included precision, recall, and F1-score. TextCNN, text convolutional neural network; MSCNN, medical-semantic-aware convolution neural network; AURI, acute upper respiratory infection; FLU, influenza; FBAO, foreign body airway obstruction.
Measurements of the 10 examination categories in the MSCNN examination recommendation approach
| Examination | AC | MA precision | WA precision | MA recall | WA recall | MA F1-score | WA F1-score |
|---|---|---|---|---|---|---|---|
| Blood RT | 0.88 | 0.84 | 0.87 | 0.55 | 0.88 | 0.56 | 0.83 |
| AL | 0.89 | 0.79 | 0.88 | 0.64 | 0.89 | 0.67 | 0.88 |
| hs-CRP | 0.87 | 0.84 | 0.87 | 0.56 | 0.87 | 0.57 | 0.83 |
| CAP | 0.98 | 0.7 | 0.97 | 0.62 | 0.98 | 0.65 | 0.97 |
| IGG | 0.98 | 0.72 | 0.97 | 0.62 | 0.98 | 0.65 | 0.97 |
| Alexin | 0.98 | 0.72 | 0.97 | 0.62 | 0.98 | 0.65 | 0.97 |
| Chest PA | 0.63 | 0.7 | 0.78 | 0.71 | 0.63 | 0.63 | 0.64 |
| Renal function | 0.98 | 0.87 | 0.98 | 0.8 | 0.98 | 0.83 | 0.98 |
| Stool | 0.99 | 0.91 | 0.99 | 0.59 | 0.99 | 0.65 | 0.99 |
| Respiratory virus detection | 0.89 | 0.54 | 0.96 | 0.65 | 0.89 | 0.55 | 0.92 |
Metrics included AC, MA precision, WA precision, MA recall, WA recall, MA F1-score, and WA F1-score. MSCNN, medical-semantic-aware convolution neural network; AC, accuracy; MA, macro average; WA, weighted average; blood RT, routine examination; AL, abnormal lymphocyte detection; hs-CRP, hypersensitive C-reactive protein; CAP, CAP allergen test; IGG, immunoglobulin; chest PA, chest posteroanterior.
The measurement metrics for the entire dataset in the examination recommendation task
| Status | Precision | Recall | F1-score |
|---|---|---|---|
| 0 | 0.98 | 0.94 | 0.96 |
| 1 | 0.74 | 0.89 | 0.81 |
| AC | 0.93 | | |
| MA | 0.86 | 0.91 | 0.88 |
| WA | 0.94 | 0.93 | 0.93 |
We conducted precision, recall, and F1-score measurements in positive (status =1, examination undertaken) and negative (status =0, examination not undertaken) cases. AC was calculated for the overall dataset. MA precision/recall/F1-score and WA precision/recall/F1-score were calculated through macro and weighted averages, respectively, of the 10 examination categories. AC, accuracy; MA, macro average; WA, weighted average.
Figure 4 Diagram of the ROC and AUC curves for the 10 examination categories in the examination recommendation task. The model showed superior classification ability in the renal function and stool examinations. ROC, receiver operating characteristic; AUC, area under the curve.
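The AUC values behind Figure 4 have a useful probabilistic reading: the AUC equals the probability that a randomly chosen positive case (examination undertaken) receives a higher model score than a randomly chosen negative case, i.e. the normalized Mann-Whitney U statistic. A minimal sketch with toy scores (not the paper's data):

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs the scores rank correctly, with
    ties counted as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: scores for whether an examination should be undertaken.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.3, 0.4, 0.6, 0.7]
auc = roc_auc(y_true, y_score)
```

An AUC near 1.0, as Figure 4 reports for the renal function and stool examinations, means the model almost always scores undertaken examinations above ones that were not undertaken.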