Zhongliang Yang1,2, Yongfeng Huang3,4, Yiran Jiang5, Yuxi Sun5, Yu-Jin Zhang1, Pengcheng Luo6.
Abstract
Automatically extracting useful information from electronic medical records and using it to support disease diagnosis is a promising task for both clinical decision support (CDS) and natural language processing (NLP). Most existing systems rely on manually constructed knowledge bases and perform auxiliary diagnosis by rule matching. In this study, we present a clinical intelligent decision approach based on convolutional neural networks (CNN), which automatically extracts high-level semantic information from electronic medical records and then performs automatic diagnosis without manually constructed rules or knowledge bases. We collected 18,590 real-world clinical electronic medical records to train and test the proposed model. Experimental results show that the proposed model achieves 98.67% accuracy and 96.02% recall, which strongly supports the feasibility and effectiveness of using a convolutional neural network to automatically learn high-level semantic features of electronic medical records for assisted diagnosis.
Year: 2018 PMID: 29679019 PMCID: PMC5910396 DOI: 10.1038/s41598-018-24389-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The overall framework of the proposed model. A convolutional neural network extracts semantic feature vectors from unstructured electronic medical records and maps them into a feature space; a classifier then computes the probability of each disease, and the disease with the highest probability is taken as the model's auxiliary diagnosis.
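The caption describes the pipeline only at a high level. The sketch below shows one common way such a pipeline can be built: a text CNN with an embedding lookup, 1-D convolutions of several widths, max-over-time pooling to form the feature vector, and a softmax classifier whose argmax is the diagnosis. The filter widths, embedding size, tokenization, and all names (`TinyTextCNN`, `diagnose`, the token ids) are hypothetical assumptions, not the authors' configuration. The weights are random and untrained, so the output only demonstrates the forward pass.

```python
import math
import random


def conv1d_relu(seq, filt, bias):
    """Slide one filter of width k over a sequence of embedding vectors and apply ReLU."""
    k = len(filt)
    out = []
    for i in range(len(seq) - k + 1):
        s = bias
        for j in range(k):
            # Dot product between filter row j and the embedding of token i + j.
            s += sum(w * x for w, x in zip(filt[j], seq[i + j]))
        out.append(max(0.0, s))
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyTextCNN:
    """Hypothetical, untrained text CNN: embeddings -> convolutions -> max pooling -> softmax."""

    def __init__(self, vocab_size, emb_dim, widths, n_filters, n_classes, seed=0):
        rng = random.Random(seed)

        def g():
            return rng.gauss(0.0, 0.1)

        self.emb = [[g() for _ in range(emb_dim)] for _ in range(vocab_size)]
        # One (weights, bias) pair per filter; n_filters filters for each width k.
        self.filters = [([[g() for _ in range(emb_dim)] for _ in range(k)], 0.0)
                        for k in widths for _ in range(n_filters)]
        self.max_width = max(widths)
        self.W = [[g() for _ in range(len(self.filters))] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def features(self, token_ids):
        """Feature vector: max-over-time pooling of each filter's responses."""
        if len(token_ids) < self.max_width:
            raise ValueError("record is shorter than the widest filter")
        seq = [self.emb[t] for t in token_ids]
        return [max(conv1d_relu(seq, f, b)) for f, b in self.filters]

    def predict_proba(self, token_ids):
        """Probability of each disease class for one tokenized record."""
        h = self.features(token_ids)
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def diagnose(self, token_ids, labels):
        """Return the label with the highest probability and that probability."""
        p = self.predict_proba(token_ids)
        best = max(range(len(p)), key=p.__getitem__)
        return labels[best], p[best]


if __name__ == "__main__":
    DISEASES = ["Hypertension", "Diabetes", "COPD", "Arrhythmia", "Asthma", "Gastritis"]
    model = TinyTextCNN(vocab_size=50, emb_dim=8, widths=(2, 3, 4), n_filters=4, n_classes=6)
    record = [3, 17, 8, 21, 5, 9, 30]  # hypothetical token ids for one EMR
    print(model.diagnose(record, DISEASES))
```

Max-over-time pooling is what lets records of different lengths map to a fixed-size point in the feature space that the classifier operates on.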
Figure 2. The number and proportion of each disease in the C-EMRs.
The number of electronic medical records of each disease in the training set and test set, and the percentage of test data relative to training data of each disease.
| | Hypertension | Diabetes | COPD | Arrhythmia | Asthma | Gastritis | Total |
|---|---|---|---|---|---|---|---|
| Training set | 1250 | 1350 | 1250 | 1200 | 1000 | 950 | 7000 |
| Test set | 68 | 68 | 68 | 62 | 69 | 65 | 400 |
Figure 3. The training process. (a) Accuracy on the training and test sets, and loss on the training set, as a function of the number of epochs. (b) Prediction time for each EMR in the test set.
Results of different methods, where “CNN” indicates the performance of the proposed model.
| Method | Training precision | Training recall | Training F1-score | Training accuracy | Test precision | Test recall | Test F1-score | Test accuracy |
|---|---|---|---|---|---|---|---|---|
| SVM | 0.96 | 0.95 | 0.96 | 0.9549 | 0.93 | 0.93 | 0.93 | 0.9315 |
| MultinomialNB | 0.93 | 0.92 | 0.92 | 0.9236 | 0.87 | 0.86 | 0.86 | 0.8600 |
| LogisticRegression | 0.93 | 0.93 | 0.93 | 0.9293 | 0.92 | 0.92 | 0.92 | 0.9175 |
| KNeighborsClassifier | 0.89 | 0.89 | 0.89 | 0.8911 | 0.90 | 0.89 | 0.89 | 0.8925 |
| CNN | | | | | | | | |
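The table does not state how per-class scores were averaged into a single precision, recall, and F1-score. Support-weighted recall is always exactly equal to accuracy, and in every row above recall matches accuracy after rounding to two decimals, which is consistent with (though it does not prove) support-weighted averaging, as in the "weighted avg" row of scikit-learn's classification report. The hypothetical helper below computes these weighted scores together with accuracy, under that assumption.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (precision, recall, f1, accuracy), averaging per-class scores by class support."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)      # number of true records per class
    predicted = Counter(y_pred)    # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        p_c = tp[c] / predicted[c] if predicted[c] else 0.0
        r_c = tp[c] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = n_c / n                # weight each class by its share of the true labels
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    accuracy = sum(tp.values()) / n
    return precision, recall, f1, accuracy


if __name__ == "__main__":
    print(weighted_metrics(["A", "A", "B", "B"], ["A", "B", "B", "B"]))
```

Because each class's recall is weighted by its own support, the weighted recall reduces to total correct predictions divided by total records, which is accuracy.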
The average prediction time of different methods for each EMR in the test set, where “CNN” indicates the performance of the proposed model.
| Method | SVM | MultinomialNB | LogisticRegression | KNeighborsClassifier | CNN |
|---|---|---|---|---|---|
| Time (ms) | 180.5 ± 2.92 | 172.5 ± 2.55 | 167.5 ± 1.58 | 205.0 ± 1.0 | |
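The table does not state how the times were measured or whether "±" denotes a standard deviation or a standard error. Assuming it is the sample standard deviation of per-record wall-clock times, such a measurement could be sketched as follows. `per_record_latency_ms` is a hypothetical helper, and `predict` stands for any single-record prediction function; the dummy workload in the usage lines is only a placeholder.

```python
import statistics
import time


def per_record_latency_ms(predict, records):
    """Time predict(record) for each record; return (mean, sample stdev) in milliseconds."""
    if len(records) < 2:
        raise ValueError("need at least two records to estimate a standard deviation")
    times_ms = []
    for rec in records:
        start = time.perf_counter()
        predict(rec)  # hypothetical single-record prediction call
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(times_ms), statistics.stdev(times_ms)


if __name__ == "__main__":
    dummy_records = [list(range(200)) for _ in range(50)]
    mean_ms, sd_ms = per_record_latency_ms(sum, dummy_records)
    print(f"{mean_ms:.3f} ± {sd_ms:.3f} ms")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.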
Figure 4. Evolution of the feature space during training. In this feature space, each point represents one electronic medical record, and different colors indicate different diseases. At the beginning of training (Epoch = 0), the model parameters are randomly initialized, so all records are randomly distributed in the feature space and the diseases are not separable. After 5 epochs, records of different diseases begin to separate. After 10 epochs, the records of all diseases are largely separated, except in some regions and at the edges of each category. By 100 epochs, the samples of each disease are clearly and completely separated, and records of the same disease are clustered together.