Mengying Wang¹, Zhenhao Wei², Mo Jia¹, Lianzhong Chen², Hong Ji³.
Abstract
PURPOSE: Predictive diagnosis of infectious diseases supports better treatment and strengthens the prevention and control of these diseases. Using real-world data from a hospital, this study designs a multiple infectious disease diagnostic model (MIDDM) that performs multi-class classification of infectious diseases to assist clinicians in infectious-disease decision-making.
Keywords: Deep learning; Early diagnosis; Infectious diseases; Multi-classification
Year: 2022 PMID: 35168624 PMCID: PMC8848865 DOI: 10.1186/s12911-022-01776-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Flowchart of enrolment
Key information extracted from medical records
| Target information class | Specific extraction |
|---|---|
| Patient information | Age, gender, visiting time |
| Physical examination | Temperature, blood pressure, pulse, respiratory rate |
| Symptom | Diagnosis, symptom |
| Medical history | Main complaint, history of present illness, anamnesis, medication |
| Medical laboratory examination | Name of item, results |
| Examination reports | Name of examination item, results, value range |
Fig. 2 Sequence labeling of the current case history based on the BiLSTM-CRF model (English and Chinese)
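Sequence labeling of this kind typically emits one tag per token, and the tagged sequence is then grouped back into entity spans. The source does not state which tagging scheme or label names were used, so the BIO scheme (`B-` begin, `I-` inside, `O` outside) and the `SYMPTOM`/`DRUG` labels below are assumptions. A minimal sketch of the span-grouping step, which works for both word-level English and character-level Chinese input:

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Use sep="" for character-level Chinese input, sep=" " for English words.
    """
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    # A trailing "O" sentinel flushes any span still open at the end of the sequence
    for token, tag in zip(tokens + [""], tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == cur_type:
            cur_tokens.append(token)  # continue the open span
            continue
        # Any other tag closes the open span first
        if cur_type is not None:
            spans.append((cur_type, sep.join(cur_tokens)))
        if tag.startswith(("B-", "I-")):
            # B- starts a span; a stray I- (no matching open span) is treated as a start
            cur_type, cur_tokens = tag[2:], [token]
        else:  # "O"
            cur_type, cur_tokens = None, []
    return spans


print(bio_to_spans(["Patient", "reports", "chest", "pain", "and", "fever"],
                   ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "B-SYMPTOM"]))
# → [('SYMPTOM', 'chest pain'), ('SYMPTOM', 'fever')]
```

Treating a stray `I-` tag as the start of a new span is a lenient design choice; a stricter decoder could discard such tokens instead.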
Data transformed into features after NLP word segmentation
| Case number | Main complaint_Femoral neck fracture pain | Past history_Femoral neck fracture | Main complaint_Symptoms_Chest pain | Main complaint_Symptoms_Fever | Temperature |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 0.91 |
| 2 | 0 | 1 | 0 | 1 | 0.89 |
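The table suggests that each extracted entity becomes a binary indicator column named `<section>_<entity>`, while temperature is rescaled (0.91, 0.89). The source does not say how the scaling was done; the sketch below assumes min-max scaling over the dataset, and the record layout (`entities`, `temperature`) and the helper `build_features` are hypothetical:

```python
from typing import Dict, List, Tuple


def build_features(records: List[Dict]) -> Tuple[List[str], List[List[float]]]:
    """Turn extracted entities into binary indicator columns plus a scaled temperature."""

    def names(record: Dict) -> set:
        # Prefix each entity with its record section, e.g. "Main complaint_Fever"
        return {f"{section}_{entity}" for section, entities in record["entities"].items()
                for entity in entities}

    # A sorted vocabulary gives every record the same column order
    vocab = sorted(set().union(*(names(r) for r in records)))
    temps = [r["temperature"] for r in records]
    t_min = min(temps)
    t_range = (max(temps) - t_min) or 1.0  # guard against all-equal temperatures

    rows = []
    for r in records:
        present = names(r)
        row = [1.0 if name in present else 0.0 for name in vocab]
        row.append((r["temperature"] - t_min) / t_range)  # min-max scaling to [0, 1]
        rows.append(row)
    return vocab + ["Temperature"], rows


records = [
    {"entities": {"Main complaint": ["Fever", "Chest pain"]}, "temperature": 39.0},
    {"entities": {"Main complaint": ["Fever"], "Past history": ["Femoral neck fracture"]},
     "temperature": 37.0},
]
columns, rows = build_features(records)
print(rows)  # → [[1.0, 1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
```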
Number of samples of each infectious disease category
| Infectious disease category | Viral hepatitis | Influenza | Hand, foot and mouth disease | Tuberculosis | Syphilis | Infectious diarrhea | Measles |
|---|---|---|---|---|---|---|---|
| Number of samples | 3663 | 5007 | 3616 | 5834 | 1500 | 730 | 270 |
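The class sizes are strongly imbalanced. A quick way to quantify this is to compute each class's share, the ratio of the largest to the smallest class, and "balanced" inverse-frequency weights `total / (k * n_c)`. The source does not say that MIDDM reweighted classes; the weights are shown only as one common remedy, not as the authors' method:

```python
# Sample counts per class, copied from the table above
counts = {
    "Viral hepatitis": 3663,
    "Influenza": 5007,
    "Hand, foot and mouth disease": 3616,
    "Tuberculosis": 5834,
    "Syphilis": 1500,
    "Infectious diarrhea": 730,
    "Measles": 270,
}

total = sum(counts.values())
k = len(counts)
# Share of the dataset held by each class
shares = {name: n / total for name, n in counts.items()}
# Ratio of the largest class to the smallest class
imbalance_ratio = max(counts.values()) / min(counts.values())
# Balanced inverse-frequency weights: total / (number_of_classes * class_count)
weights = {name: total / (k * n) for name, n in counts.items()}

print(total, round(imbalance_ratio, 1))                            # → 20620 21.6
print(round(shares["Measles"] * 100, 2), round(weights["Measles"], 2))  # → 1.31 10.91
```

With these weights, every class contributes the same total weight (`total / k`), so measles samples count roughly 21.6 times as much as tuberculosis samples in a weighted loss.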
Fig. 3 Multiple infectious disease diagnostic model (MIDDM) structure
Fig. 4 Auto-encoder model structure and hidden-layer output vector Z
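An auto-encoder compresses the input into a hidden vector Z and is trained to reconstruct the input from Z; the encoder can then serve as pre-training for the classifier. The source does not give the activation, loss or optimizer at this point, so the sigmoid encoder, linear decoder, mean-squared error and plain full-batch gradient descent below are assumptions, and `hidden=4` is a toy size (the study compares 256 to 4096 neurons). A minimal NumPy sketch:

```python
import numpy as np


def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))


def train_autoencoder(X: np.ndarray, hidden: int, lr: float = 0.5,
                      epochs: int = 500, seed: int = 0):
    """Fit a one-hidden-layer auto-encoder by full-batch gradient descent.

    Returns an `encode` function mapping inputs to the hidden vector Z,
    plus the per-epoch mean-squared reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, size=(hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = sigmoid(X @ W1 + b1)   # encoder: hidden-layer output Z
        X_hat = Z @ W2 + b2        # linear decoder: reconstruction of X
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean-squared error, back-propagated layer by layer
        g_out = 2.0 * err / err.size
        gW2, gb2 = Z.T @ g_out, g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * Z * (1.0 - Z)   # through the sigmoid
        gW1, gb1 = X.T @ g_pre, g_pre.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def encode(data: np.ndarray) -> np.ndarray:
        return sigmoid(data @ W1 + b1)

    return encode, losses


rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(32, 8)).astype(float)  # toy binary feature matrix
encode, losses = train_autoencoder(X, hidden=4)
print(encode(X).shape)  # → (32, 4)
```

In a pre-training setup, the learned `encode` weights would initialise the first layer of the downstream classifier before fine-tuning on the disease labels.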
Confusion matrix of the multi-classification model's predictions
| Real label | Predicted Class 1 | Predicted Class 2 | Predicted Class 3 |
|---|---|---|---|
| Class 1 | | | |
| Class 2 | | | |
| Class 3 | | | |
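With real labels on the rows and predictions on the columns, per-class recall divides each diagonal entry by its row sum, precision divides it by its column sum, and F1 is their harmonic mean. The counts below are hypothetical and only illustrate the arithmetic:

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall and F1 per class, plus overall accuracy.

    Rows of `cm` are real labels and columns are predicted labels. A class that is
    never predicted would give a zero column sum (division by zero).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # correctly classified samples per class
    recall = tp / cm.sum(axis=1)        # row sums: all samples with that real label
    precision = tp / cm.sum(axis=0)     # column sums: all samples predicted as that class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy


# Hypothetical counts, for illustration only
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 5, 44]]
precision, recall, f1, accuracy = per_class_metrics(cm)
print(round(float(accuracy), 4))  # → 0.8645
```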
Final model prediction results after auto-encoder pre-training with various numbers of neurons
| Number of neurons | 256 | 512 | 1024 | 2048 | 4096 |
|---|---|---|---|---|---|
| Test set accuracy | 82.71% | 86.03% | 89.52% | 89.74% | 89.67% |
Training and test results for MIDDM
| Infectious disease | Number of training samples | Training accuracy (%) | Number of test samples | Testing recall (%) | Testing precision (%) | F1-score |
|---|---|---|---|---|---|---|
| Viral hepatitis | 2954 | 99.86 | 709 | 99.44 | 87.04 | 0.8704 |
| Influenza | 3924 | 98.47 | 1083 | 95.38 | 91.42 | 0.9142 |
| Hand, foot and mouth disease | 3015 | 97.31 | 601 | 95.17 | 88.82 | 0.8882 |
| Tuberculosis | 4630 | 95.01 | 1204 | 86.88 | 94.66 | 0.9466 |
| Syphilis | 1208 | 83.03 | 292 | 72.60 | 89.45 | 0.8945 |
| Infectious diarrhea | 575 | 87.30 | 155 | 60.65 | 72.31 | 0.7231 |
| Measles | 190 | 42.11 | 80 | 37.50 | 44.12 | 0.4412 |
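F1 is conventionally the harmonic mean of precision and recall, F1 = 2PR / (P + R). In the table as extracted, every F1-score entry equals the precision column divided by 100, so recomputing F1 from the reported recall and precision is a useful consistency check; for viral hepatitis it gives about 0.928 rather than 0.8704. The sketch only works from the published percentages and does not reproduce the authors' evaluation; `f1_from_pct` is a hypothetical helper:

```python
# (testing recall %, testing precision %, listed F1) copied from the table above
rows = {
    "Viral hepatitis": (99.44, 87.04, 0.8704),
    "Influenza": (95.38, 91.42, 0.9142),
    "Hand, foot and mouth disease": (95.17, 88.82, 0.8882),
    "Tuberculosis": (86.88, 94.66, 0.9466),
    "Syphilis": (72.60, 89.45, 0.8945),
    "Infectious diarrhea": (60.65, 72.31, 0.7231),
    "Measles": (37.50, 44.12, 0.4412),
}


def f1_from_pct(recall_pct: float, precision_pct: float) -> float:
    r, p = recall_pct / 100, precision_pct / 100
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall


for disease, (r, p, listed) in rows.items():
    print(f"{disease}: recomputed F1 = {f1_from_pct(r, p):.4f}, listed = {listed:.4f}")
```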
Fig. 5 Relationship between the diagnostic recall of infectious diseases and the number of samples
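Because recall is computed per class, weighting each disease's testing recall by its number of test samples (a micro average) recovers the overall test accuracy, while an unweighted mean across diseases (a macro average) gives every disease equal weight. This sketch assumes the reported recalls were computed on exactly the test counts in the MIDDM results table, so recall × count rounds to whole numbers of correct cases. On these figures the micro average is about 89.52%, which coincides with the test-set accuracy reported for 1024 neurons, and the macro average is about 78.23%. The gap comes from the low recall on the three smallest classes:

```python
# Test-set size and testing recall (%) per disease, copied from the MIDDM results table
test_support = {
    "Viral hepatitis": 709,
    "Influenza": 1083,
    "Hand, foot and mouth disease": 601,
    "Tuberculosis": 1204,
    "Syphilis": 292,
    "Infectious diarrhea": 155,
    "Measles": 80,
}
test_recall_pct = {
    "Viral hepatitis": 99.44,
    "Influenza": 95.38,
    "Hand, foot and mouth disease": 95.17,
    "Tuberculosis": 86.88,
    "Syphilis": 72.60,
    "Infectious diarrhea": 60.65,
    "Measles": 37.50,
}

# Per-class recall times class size recovers the number of correctly classified cases
correct = {d: round(test_recall_pct[d] / 100 * n) for d, n in test_support.items()}
# Micro average: pooled correct predictions over all test cases (= overall accuracy)
overall_accuracy = sum(correct.values()) / sum(test_support.values())
# Macro average: every disease counts equally, regardless of its sample size
macro_recall = sum(test_recall_pct.values()) / len(test_recall_pct)

print(f"{overall_accuracy:.2%}")  # → 89.52%
print(f"{macro_recall:.2f}%")     # → 78.23%
```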
Recognition accuracy and recall rate of five types of entities (%)
| Model | Disease diagnosis: accuracy | Disease diagnosis: recall | Symptom: accuracy | Symptom: recall | Medicine: accuracy | Medicine: recall | Laboratory test: accuracy | Laboratory test: recall | Imaging examination: accuracy | Imaging examination: recall |
|---|---|---|---|---|---|---|---|---|---|---|
| HMM | 71.4 | 78.0 | 77.9 | 84.5 | 69.8 | 72.6 | 86.3 | 88.7 | 80.6 | 88.2 |
| CRF++ | 69.7 | 79.2 | 78.1 | 80.5 | 77.2 | 84.6 | 89.6 | 90.8 | 80.2 | 78.8 |
| LSTM-CRF | 85.3 | 87.5 | 81.8 | 82.5 | 90.2 | 91.5 | 89.6 | 88.5 | | |
| BiLSTM-CRF | 90.6 | | | | | | | | | |
CRF++ is an open-source implementation of the conditional random field (CRF) algorithm; in essence, it is a CRF. The authors regard it as the CRF tool with the best overall performance at the time of the study.
F1-score of five types of entity recognition (%)
| Model | Disease diagnosis | Symptom | Medicine | Laboratory test | Imaging examination | Average |
|---|---|---|---|---|---|---|
| HMM | 74.6 | 81.1 | 71.2 | 87.5 | 84.2 | 79.7 |
| CRF++ | 74.1 | 79.3 | 80.7 | 90.2 | 79.5 | 80.5 |
| LSTM-CRF | 86.4 | 84.7 | 86.6 | 90.8 | 89.0 | 87.5 |
| BiLSTM-CRF | | | | | | |
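For the HMM row, treating the "Accuracy" columns of the entity-recognition table as precision and taking the harmonic mean with recall reproduces the F1 values above (74.6, 81.1, 71.2, 87.5, 84.2) and their average of 79.7, which suggests that "Accuracy" there plays the role of precision. A small sketch of that computation; `f1_pct` is a hypothetical helper:

```python
# HMM row of the entity-recognition table: (accuracy, recall) in %, per entity type.
# Assumption: the "Accuracy" column is treated as precision when computing F1.
hmm = {
    "Disease diagnosis": (71.4, 78.0),
    "Symptom": (77.9, 84.5),
    "Medicine": (69.8, 72.6),
    "Laboratory test": (86.3, 88.7),
    "Imaging examination": (80.6, 88.2),
}


def f1_pct(p: float, r: float) -> float:
    return 2 * p * r / (p + r)  # harmonic mean; inputs and output in percent


scores = {entity: f1_pct(p, r) for entity, (p, r) in hmm.items()}
average = sum(scores.values()) / len(scores)  # unweighted (macro) average over entity types

print({e: round(s, 1) for e, s in scores.items()})
print(round(average, 1))  # → 79.7
```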
Fig. 6 Workflow
Comparison of the accuracy of infectious disease diagnosis between MIDDM and other models
| Infectious disease | MIDDM (%) | XGBoost (%) | Decision tree (%) | Bayesian (%) | Logistic regression (%) |
|---|---|---|---|---|---|
| Viral hepatitis | 99.44 | 96.19 | 90.13 | 85.19 | 91.26 |
| Influenza | 95.38 | 91.51 | 89.47 | 82.27 | 90.49 |
| Hand, foot and mouth disease | 95.17 | 90.03 | 88.29 | 84.44 | 85.49 |
| Tuberculosis | 86.88 | 83.08 | 80.21 | 76.29 | 82.31 |
| Syphilis | 72.60 | 70.75 | 70.28 | 65.09 | 68.87 |
| Infectious diarrhea | 60.65 | 56.38 | 56.38 | 54.26 | 56.38 |
| Measles | 37.50 | 36.25 | 32.50 | 33.75 | 35.00 |