Jakir Hossain Bhuiyan Masud, Chiang Shun, Chen-Cheng Kuo, Md Mohaimenul Islam, Chih-Yang Yeh, Hsuan-Chia Yang, Ming-Chin Lin.
Abstract
International Classification of Diseases (ICD) codes are currently used to improve clinical, financial, and administrative performance. Inaccurate ICD coding can lower the quality of care and delay or prevent reimbursement. However, selecting the appropriate ICD code from a patient's clinical history is time-consuming and requires expert knowledge. The rapid spread of electronic medical records (EMRs) has generated a large amount of clinical data and provides an opportunity to predict ICD codes using deep learning models. The main objective of this study was to use a deep learning-based natural language processing (NLP) model to accurately predict ICD-10 codes, which could help providers make better clinical decisions and improve their level of service. We retrospectively collected clinical notes from five outpatient departments (OPDs) of one university teaching hospital between January 2016 and December 2016. We applied NLP techniques, including Global Vectors (GloVe), word2vec, and embedding techniques, to process the data. The dataset was split into independent training and testing sets comprising 90% and 10% of the entire dataset, respectively. A convolutional neural network (CNN) model was developed, and its performance was measured using precision, recall, and F-score. A total of 21,953 medical records were collected from 5016 patients. The performance of the CNN model for the five departments was clinically satisfactory (precision: 0.50~0.69; recall: 0.78~0.91). The CNN model performed best for the cardiology department, with a precision of 69%, a recall of 89%, and an F-score of 78%. The CNN model for predicting ICD-10 codes provides an opportunity to improve the quality of care. Implementing this model in real-world clinical settings could reduce the manual coding workload, enhance the efficiency of clinical coding, and support physicians in making better clinical decisions.
Keywords: clinical note; convolutional neural network; diagnosis codes; medication lists; natural language processing
Year: 2022 PMID: 35629129 PMCID: PMC9146030 DOI: 10.3390/jpm12050707
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
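The 90%/10% split described in the abstract can be illustrated with a minimal sketch. The function name `split_records`, the fixed seed, and the use of integer placeholders for clinical notes are assumptions made for illustration; the record does not say how the authors shuffled or partitioned the data (for example, whether the split was done per department).

```python
import random


def split_records(records, test_fraction=0.10, seed=0):
    """Shuffle records and split them into disjoint training and test lists.

    Hypothetical helper: the study's actual splitting procedure is not
    described in this record.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Integer IDs stand in for the 21,953 clinical notes.
train, test = split_records(range(21953))
print(len(train), len(test))
```

Keeping the two lists disjoint is the point of the exercise: a note that appears in both sets would inflate the reported test metrics.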
Figure 1. Word2vector process.
Figure 2. Architecture of CBOW and skip-gram.
Figure 3. A one-hot vector.
Figure 4. Architecture of neural network.
Figure 5. Architecture of the deep-ADCA model.
Figure 6. The overall process used in our study.
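The ideas named in the captions for Figures 2 and 3 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' pipeline: the helper names, the window size of 1, and the three-token example are all assumptions. It shows how a token index becomes a one-hot vector, and how skip-gram (center word predicts each neighbor) and CBOW (neighbors predict the center word) derive training pairs from the same token sequence.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer index in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(index, size):
    """Return a vector of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec


def skipgram_pairs(tokens, window=1):
    """Skip-gram: one (center, context) pair per neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """CBOW: one (context list, center) pair per position."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs


# Hypothetical three-token note used only for illustration.
tokens = ["chest", "pain", "aspirin"]
vocab = build_vocab(tokens)
print(one_hot(vocab["pain"], len(vocab)))
print(skipgram_pairs(tokens))
print(cbow_pairs(tokens))
```

In a trained word2vec model the one-hot input selects a row of the embedding matrix, so the dense vector for each word is learned from pairs like these.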
Patient characteristics.
| Characteristic | Category | Value |
|---|---|---|
| Number of patients | Male | 2212 |
| | Female | 2804 |
| Age in years, mean (SD) | | 60.76 (18.38) |
| Number of clinical notes | All departments | 21,953 |
| | Cardiology | 3668 |
| | Neurology | 2762 |
| | Nephrology | 5789 |
| | Metabolism | 3707 |
| | Psychiatry | 6027 |
Performance of CNN model for different departments.
| Department | Test Cases | No. of ICD-10 Codes | No. of Drugs | Precision | Recall | F-Measure |
|---|---|---|---|---|---|---|
| Cardiology | 284 | 148 | 145 | 0.69 | 0.89 | 0.78 |
| Metabolism | 307 | 155 | 136 | 0.64 | 0.91 | 0.75 |
| Psychiatry | 475 | 193 | 128 | 0.50 | 0.87 | 0.64 |
| Nephrology | 432 | 277 | 221 | 0.48 | 0.84 | 0.62 |
| Neurology | 282 | 358 | 177 | 0.50 | 0.78 | 0.61 |
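The F-measure column can be cross-checked against the precision and recall columns, assuming it is the balanced F1 score, i.e. the harmonic mean 2PR / (P + R); the record does not state the formula explicitly. The sketch below recomputes it from the rounded values in the table. Because the inputs are already rounded to two decimals, small differences from the reported figures are expected (the Nephrology row differs by about 0.01).

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from the table above.
reported = {
    "Cardiology": (0.69, 0.89, 0.78),
    "Metabolism": (0.64, 0.91, 0.75),
    "Psychiatry": (0.50, 0.87, 0.64),
    "Nephrology": (0.48, 0.84, 0.62),
    "Neurology": (0.50, 0.78, 0.61),
}

for department, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{department}: computed {f:.3f}, reported {f_reported:.2f}")
```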
Figure 7. Probabilities of ICD-10 codes predicted from given inputs.
Figure 8. Prediction of missing diagnosis based on input drug.
Performance comparison with previous studies.
| Study | Approach | Dataset | Input | Target | Performance |
|---|---|---|---|---|---|
| Xie et al. [ | Deep learning | MIMIC-III | Diagnosis description | 2833 ICD-9 codes | Sensitivity: 0.29 |
| Huang et al. [ | Deep learning | MIMIC-III | Discharge summary | 10 ICD-9 codes and 10 blocks | F1 score: full code 0.69, ICD-9 block 0.72 |
| Zeng et al. [ | Deep learning | MIMIC-III | Discharge summary | 6984 ICD-9 codes | F1 score: 0.42 |
| Samonte et al. [ | Deep learning | MIMIC-III | Discharge summary | 10 ICD-9 codes | Recall: 0.62, F1-score: 0.67 |
| Hsu et al. [ | Deep learning | MIMIC-III | Discharge summary | Chapters (19), 50 and 100 ICD-9 codes | Micro F1 score: 0.76 |
| Gangavarapu et al. [ | Deep learning | MIMIC-III | Nursing notes | 19 chapters | Accuracy: 0.83 |
| Singaravelan et al. [ | Deep learning | Medical Center | Subjective component | 1871 ICD-9 codes | Recall: chapter 0.57, block 0.49, three-digit code 0.43, full code 0.45 |
| Our study | Deep learning | Medical Center | Clinical notes | 1131 ICD-10 codes | Precision: 0.50~0.69 |