Li Wang, Qinghua Wang, Heming Bai, Cong Liu, Wei Liu, Yuanpeng Zhang, Lei Jiang, Huji Xu, Kai Wang, Yunyun Zhou.
Abstract
Efficiently learning representations of clinical concepts (e.g., symptoms, lab tests) from unstructured clinical notes in electronic health record (EHR) data remains a significant challenge, since each patient may have multiple visits at different times and each visit may contain a different sequence of concepts. Learning distributed representations from the temporal patterns of clinical notes is therefore an essential step for downstream applications on EHR data. However, existing methods for EHR representation learning cannot adequately capture either the contextual information within a visit or the temporal information across multiple visits. In this study, we developed a new vector embedding method called EHR2Vec that can learn semantically meaningful representations of clinical concepts. EHR2Vec incorporates a self-attention structure and accurately identifies relevant clinical concept entities while taking into account time sequence information from multiple visits. Using EHR data from systemic lupus erythematosus (SLE) patients as a case study, we showed that EHR2Vec outperforms other well-known methods, including Word2Vec and Med2Vec, in identifying interpretable representations, according to clinical experts' evaluations.
Keywords: electronic health record; natural language processing; representation learning; unstructured clinical notes; word vector
Year: 2020 PMID: 32714371 PMCID: PMC7344186 DOI: 10.3389/fgene.2020.00630
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Overview of project design. Our project has four steps. First, we extract medical concepts (i.e., symptoms, lab tests, medications, and diagnoses) from free-text clinical notes in SLE patients' EHR data. Next, we align these medical concepts into a structured data format for each note per visit. Then, we sort the notes in chronological order for each patient. Finally, we compare the performance of three embedding methods: Word2Vec, Med2Vec, and EHR2Vec.
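The structuring and sorting steps in the caption above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(patient_id, visit_date, concept_type, concept)`, the function name `build_visit_sequences`, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracted records (invented for illustration):
# (patient_id, visit_date, concept_type, concept)
records = [
    ("p1", date(2015, 3, 2), "Medication", "Prednisone acetate"),
    ("p1", date(2014, 1, 10), "Symptom", "Facial rash"),
    ("p1", date(2014, 1, 10), "Lab Test", "Complement 3"),
    ("p2", date(2016, 6, 5), "Diagnosis", "LN"),
]

def build_visit_sequences(records):
    """Group concepts by (patient, visit date), then order each patient's visits by date."""
    visits = defaultdict(lambda: defaultdict(list))
    for patient_id, visit_date, concept_type, concept in records:
        visits[patient_id][visit_date].append((concept_type, concept))
    # Each patient maps to a chronologically sorted list of (date, concepts) pairs.
    return {
        patient_id: [(d, by_date[d]) for d in sorted(by_date)]
        for patient_id, by_date in visits.items()
    }

sequences = build_visit_sequences(records)
for visit_date, concepts in sequences["p1"]:
    print(visit_date, concepts)
```

Keeping each visit as a list of concepts and each patient as an ordered list of visits gives an embedding model both the within-visit context and the across-visit order.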
Figure 2. Deep learning architecture of EHR2Vec. EHR2Vec is built on a deep learning framework with two levels of optimization. The first level uses a multi-head self-attention structure to capture the relationships among different medical concepts within each visit. The second level uses the co-occurrence of visits to capture the relationships among a patient's visits.
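To make the first level concrete, the sketch below applies multi-head scaled dot-product self-attention to the concept vectors of a single visit. It is a simplified illustration under stated assumptions, not the EHR2Vec implementation: there are no learned query, key, or value projections (each head uses a slice of the input vector directly), the mean pooling in `visit_vector` is an assumed choice, and the visit-level co-occurrence objective is not shown. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors (no learned weights)."""
    scale = math.sqrt(len(vectors[0]))
    out = []
    for q in vectors:
        # Attention weights of this concept over every concept in the visit.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in vectors])
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(q))])
    return out

def multi_head_self_attention(vectors, num_heads):
    """Split each vector into num_heads equal slices, attend per slice, then concatenate."""
    dim = len(vectors[0])
    assert dim % num_heads == 0, "embedding size must be divisible by num_heads"
    head_dim = dim // num_heads
    heads = [
        attention_head([v[h * head_dim:(h + 1) * head_dim] for v in vectors])
        for h in range(num_heads)
    ]
    return [sum((head[i] for head in heads), []) for i in range(len(vectors))]

def visit_vector(vectors, num_heads=2):
    """Pool the attended concept vectors into one visit vector by averaging (an assumption)."""
    attended = multi_head_self_attention(vectors, num_heads)
    return [sum(col) / len(attended) for col in zip(*attended)]

# Toy 4-dimensional embeddings for three concepts in one visit (invented values).
visit = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
print(multi_head_self_attention(visit, num_heads=2))
print(visit_vector(visit))
```

Each output row is a weighted average of the concept vectors in the same visit, so a concept's representation reflects the other concepts it co-occurs with.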
Top 20 medication entities with the highest correlation to LN in the vector results obtained using four models.
| Rank | Correlation | Medication | Correlation | Medication | Correlation | Medication |
| 1 | 0.68 | Albumen | 0.31 | Tabellae rhei et natrii bicarbonatis | 0.93 | Hydroxychloroquine sulfate |
| 2 | 0.67 | Lamivudine | 0.31 | Ranitidine hydrochloride | 0.92 | Prednisone acetate |
| 3 | 0.66 | Felodipine | 0.27 | Iron sucrose | 0.90 | Methylprednisolone |
| 4 | 0.66 | Cefotaxime | 0.25 | Terazosin hydrochloride | 0.89 | Cyclophosphamide |
| 5 | 0.65 | Dexamethasone | 0.24 | Arotinolol hydrochloride | 0.86 | Calcium carbonate and vitamin D3 |
| 6 | 0.65 | Metoclopramide | 0.24 | Enalapril maleate | 0.82 | Omeprazole |
| 7 | 0.65 | Dengzhanxixin | 0.24 | Diammonium glycyrrhizinate | 0.79 | Calcitriol |
| 8 | 0.65 | Colquhounia root | 0.24 | Clopidogrel hydrogen sulfate | 0.75 | Alfacalcidol |
| 9 | 0.64 | Fasudil hydrochloride | 0.23 | Rabeprazole | 0.72 | Leflunomide |
| 10 | 0.63 | Salvianolate | 0.23 | Haloperidol | 0.71 | Total glucosides of paeony |
| 11 | 0.63 | Thiamazole | 0.23 | Prednisone | 0.70 | Aspirin |
| 12 | 0.62 | Cefoperazone Sodium and Tazobactam Sodium | 0.23 | Levothyroxine sodium | 0.65 | Prednisolone acetate |
| 13 | 0.62 | Leigongteng | 0.23 | Lithium carbonate | 0.63 | Folic acid |
| 14 | 0.62 | Thyroid | 0.23 | Urokinase | 0.62 | Levothyroxine sodium |
| 15 | 0.62 | Prednisone | 0.22 | Penicillins | 0.61 | Warfarin Sodium |
| 16 | 0.62 | Fluvoxamine maleate | 0.22 | Carvedilol | 0.60 | Mycophenolate mofetil |
| 17 | 0.62 | Sodium valproate | 0.22 | Mecobalamin | 0.60 | Pantoprazole |
| 18 | 0.61 | Salvianolate | 0.21 | Furosemide and spironolactone | 0.60 | Valsartan |
| 19 | 0.61 | Tacrolimus | 0.21 | Deslanoside | 0.58 | Spironolactone |
| 20 | 0.61 | Sanqi Panax Notoginseng | 0.21 | Cefradine | 0.57 | Low Molecular Weight Heparin Calcium |
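A ranking like the one in the table above can be produced by scoring every candidate entity against a query entity and keeping the top entries. The sketch below assumes cosine similarity as the score; the record does not state which measure the authors used. The entity names, vectors, and the helper `top_k_similar` are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_similar(query, embeddings, k=20):
    """Return the k (score, name) pairs most similar to the query entity."""
    q = embeddings[query]
    scored = [(cosine(q, vec), name) for name, vec in embeddings.items() if name != query]
    return sorted(scored, reverse=True)[:k]

# Toy 2-dimensional embeddings (invented values, not results from the paper).
embeddings = {
    "LN": [1.0, 0.0],
    "drug_a": [0.9, 0.1],
    "drug_b": [0.0, 1.0],
    "drug_c": [-1.0, 0.0],
}

for score, name in top_k_similar("LN", embeddings, k=2):
    print(f"{score:.2f} {name}")
```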
Figure 3. Performance comparison by intrusion analysis. We perform intrusion analysis to evaluate model performance by comparing clinicians' opinions with our identified medical concepts from four groups. EHR2Vec shows higher accuracy than the other two models, Word2Vec and Med2Vec.
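One common form of intrusion test can be sketched as follows. It assumes each task shows an evaluator a model's top-ranked concepts plus one "intruder" drawn from the lower half of the ranking, and that accuracy is the share of tasks in which the evaluator picks out the intruder. The exact protocol used in the study is not given in this record, so the task layout and both function names are assumptions.

```python
import random

def build_intrusion_task(ranked_concepts, top_n=4, rng=None):
    """Mix the top_n concepts with one intruder from the lower half of the ranking."""
    rng = rng or random.Random(0)
    top = ranked_concepts[:top_n]
    # Draw the intruder from the tail so it cannot overlap with the top concepts.
    tail = ranked_concepts[max(top_n, len(ranked_concepts) // 2):]
    if not tail:
        raise ValueError("ranking is too short to supply an intruder")
    intruder = rng.choice(tail)
    options = top + [intruder]
    rng.shuffle(options)
    return options, intruder

def intrusion_accuracy(tasks, picks):
    """Fraction of tasks in which the evaluator's pick matches the true intruder."""
    correct = sum(1 for (_, intruder), pick in zip(tasks, picks) if pick == intruder)
    return correct / len(tasks)

# Hypothetical ranking of ten concepts for one query, best match first.
ranked = [f"concept_{i}" for i in range(10)]
options, intruder = build_intrusion_task(ranked, top_n=4, rng=random.Random(1))
print(options, "intruder:", intruder)
```

If a model's top-ranked concepts are coherent, the intruder stands out and evaluators find it more often, so higher accuracy suggests more interpretable embeddings.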
Top 10 medical entities in terms of vector value rank in three different dimensions.
| [Lab Test] Complement 3-B | [Medication] Omeprazole | [Symptom] Widespread facial red rash |
| [Diagnosis] Pregnancy | [Lab Test] Urine protein qualitative test-U | [Symptom] Migratory double joint and shoulder pain |
| [Lab Test] Complement 4-B | [Medication] Calcium carbonate and vitamin D3 | [Symptom] Systemic diffusive and red rash |
| [Drug] Calcium carbonate and vitamin D3 | [Medication] Aspirin | [Symptom] Slightly swollen left-hand fingers |
| [Medication] Methylprednisolone | [Symptom] Cough | [Symptom] Scattered bleeding points on hands |
| [Medication] Methylprednisolone sodium succinate | [Medication] | [Symptom] Facial rash relief |
| [Lab Test] Anti-nuclear antibody (ANA)-B | [Medication] Prednisone acetate | [Symptom] Capillary and facial capillary expansion |
| [Medication] Prednisone acetate | [Lab Test] Anti-nuclear antibody (ANA)-B | [Symptom] Muscle and body tenderness |
| [Diagnosis] LN | [Diagnosis] LN | [Symptom] Scattered red rash |
| [Medication] Hydroxychloroquine | [Medication] | [Symptom] Left chest pain |
Figure 4. SLE affects more of patients' organs and body systems over time. SLE is a chronic disease with many comorbid conditions. This figure shows that more SLE patients are affected by comorbidities as time passes. For example, in the initial year (Year 0), 829 patients manifested kidney diseases, and by Year 9 the number had increased to 1,232.
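The yearly counts described in the caption above can be computed as a running total. The sketch assumes the figure counts patients cumulatively, so a patient first recorded with a condition in Year 3 is counted in every later year as well. The input mapping, the function name `cumulative_affected`, and the sample values are invented for illustration.

```python
def cumulative_affected(first_onset_year, max_year):
    """Count, for each year 0..max_year, the patients whose condition was first recorded by then.

    first_onset_year maps patient_id -> years since the first visit at which
    the condition was first recorded (hypothetical input format).
    """
    return [
        sum(1 for onset in first_onset_year.values() if onset <= year)
        for year in range(max_year + 1)
    ]

# Invented example: four patients with onset in Years 0, 3, 0, and 9.
onsets = {"p1": 0, "p2": 3, "p3": 0, "p4": 9}
print(cumulative_affected(onsets, max_year=9))
```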