Ashis Kumar Chanda, Tian Bai, Ziyu Yang, Slobodan Vucetic.
Abstract
BACKGROUND: Health providers create Electronic Health Records (EHRs) to describe the conditions of their patients and the procedures used to treat them. Medical notes entered by medical staff as free text are a particularly insightful component of EHRs. There is great interest in applying machine learning tools to medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of the terms in the notes is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology and the number of EHRs available in practical applications is often very small.
Keywords: EHR; Electronic health records; Embeddings; Medical terms; Natural language processing; UMLS
Year: 2022 PMID: 35488252 PMCID: PMC9052653 DOI: 10.1186/s12911-022-01850-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 The framework for the skip-gram algorithm
Fig. 2 Architecture of the proposed definition2vec algorithm
Fig. 3 Illustrating a process for extracting definitions of medical terms
Statistics of discharge summaries in the MIMIC-III training data
| Statistic | Value |
|---|---|
| # of training notes | 47,423 |
| # of unique medical terms in training data | 46,861 |
| Average # of medical terms per discharge summary | 671 |
| # of unique medical concepts in training data | 29,740 |
| Average # of medical concepts per discharge summary | 364 |
| Average # of definition words per medical concept | 16 |
| # of unique diagnosis codes in training data | 6,717 |
| Average # of diagnosis codes per discharge summary | 11 |
Accuracy of ICD-9-CM diagnosis code prediction using the large training data set (predicting the top 2,690 ICD-9-CM diagnosis codes, each occurring at least 10 times in the training data)
| Model | AUC (MIC) | AUC (MAC) | F1 (MIC) | F1 (MAC) | R@8 |
|---|---|---|---|---|---|
| BERT | 0.9580 | 0.8769 | 0.4516 | 0.0932 | 0.3922 |
| GloVe | 0.9703 | 0.8888 | 0.4727 | 0.1126 | 0.3938 |
| skip-gram | 0.9790 | 0.9316 | 0.4995 | 0.1333 | 0.4147 |
| fastText | | 0.9340 | 0.4950 | 0.1372 | 0.4168 |
| definition2vec | |||||
Bold font emphasizes the best method for each accuracy category
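The table reports micro-averaged (MIC) and macro-averaged (MAC) scores alongside R@8. As a reading aid, the sketch below shows one common way to compute micro F1, macro F1, and recall at k for multi-label code prediction. It is an illustrative assumption, not the authors' evaluation code: the function names and the toy labels are hypothetical, and macro F1 is averaged here over the codes that occur in the toy data rather than over a fixed code list.

```python
from typing import Dict, List, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Micro F1 pools counts over all codes; macro F1 averages per-code F1."""
    counts: Dict[str, List[int]] = {}  # code -> [tp, fp, fn]
    for truth, pred in zip(y_true, y_pred):
        for code in truth | pred:
            c = counts.setdefault(code, [0, 0, 0])
            if code in truth and code in pred:
                c[0] += 1  # true positive
            elif code in pred:
                c[1] += 1  # false positive
            else:
                c[2] += 1  # false negative
    if not counts:
        return {"micro": 0.0, "macro": 0.0}
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)
    return {"micro": f1_from_counts(tp, fp, fn), "macro": macro}


def recall_at_k(y_true: List[Set[str]], scores: List[Dict[str, float]], k: int = 8) -> float:
    """Mean fraction of each note's true codes found among its k top-scored codes."""
    recalls = []
    for truth, doc_scores in zip(y_true, scores):
        if not truth:
            continue  # recall is undefined for a note with no true codes
        top_k = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        recalls.append(len(truth & set(top_k)) / len(truth))
    return sum(recalls) / len(recalls) if recalls else 0.0


# Toy data with made-up labels "A", "B", "C" (not real ICD-9-CM codes).
toy_true = [{"A", "B"}, {"A"}, {"C"}]
toy_pred = [{"A"}, {"A", "B"}, set()]
toy_f1 = micro_macro_f1(toy_true, toy_pred)
```

Because macro averaging weights every code equally, rare codes that are never predicted correctly pull the macro score down sharply, which is consistent with the large gap between the MIC and MAC F1 columns.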
Accuracy of ICD-9-CM diagnosis code prediction using small training data sets (UT: number of unique medical terms, DC: number of ICD-9-CM diagnosis codes, PDC: number of predicted ICD-9-CM diagnosis codes occurring at least 10 times in training data)
| Model | 1000: AUC (MIC) | 1000: AUC (MAC) | 1000: F1 (MIC) | 1000: F1 (MAC) | 1000: R@8 | 5000: AUC (MIC) | 5000: AUC (MAC) | 5000: F1 (MIC) | 5000: F1 (MAC) | 5000: R@8 |
|---|---|---|---|---|---|---|---|---|---|---|
| GloVe | 0.8240 | 0.6919 | 0.1546 | 0.0266 | 0.3560 | 0.9122 | 0.8386 | 0.2829 | 0.0805 | 0.3997 |
| BERT | 0.8368 | 0.7212 | 0.1675 | 0.0341 | 0.3588 | 0.9198 | 0.8389 | 0.3016 | 0.1013 | 0.4063 |
| skip-gram | 0.8409 | 0.7426 | 0.1440 | 0.0320 | 0.3797 | 0.9439 | 0.9002 | 0.4274 | 0.2056 | 0.4621 |
| fastText | 0.8414 | 0.7720 | 0.1968 | 0.0711 | 0.4001 | 0.9468 | 0.9053 | 0.4291 | 0.2081 | 0.4663 |
| definition2vec | ||||||||||
Bold font emphasizes the best method for each accuracy category
Pearson correlation coefficient for semantic pair similarity
| Data set | GloVe | skip-gram | fastText | definition2vec |
|---|---|---|---|---|
| Pedersen | 0.2963 | 0.4297 | 0.6256 | |
| Pakhomov | 0.1712 | 0.5310 | 0.5732 | |
| UMNSRS | 0.2182 | 0.6058 | 0.6188 | |
Bold font emphasizes the best method for each accuracy category
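For readers unfamiliar with this kind of evaluation, the sketch below shows how a Pearson correlation for semantic pair similarity is typically obtained: the cosine similarity of each term pair's embeddings is correlated with the human similarity rating for that pair. This is an assumption about the general procedure, not the authors' code. The function names, the choice to skip pairs with a missing term, and the toy vectors and ratings are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def pair_similarity_correlation(
    embeddings: Dict[str, List[float]],
    rated_pairs: List[Tuple[str, str, float]],
) -> float:
    """Correlate model cosine similarities with human ratings.

    Pairs containing a term without an embedding are skipped (an assumption;
    other treatments of missing terms are possible).
    """
    model_scores, human_scores = [], []
    for term_a, term_b, rating in rated_pairs:
        if term_a in embeddings and term_b in embeddings:
            model_scores.append(cosine(embeddings[term_a], embeddings[term_b]))
            human_scores.append(rating)
    return pearson(model_scores, human_scores)


# Made-up 2-d vectors and ratings, for illustration only.
toy_embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
toy_pairs = [("a", "b", 4.0), ("a", "c", 1.0), ("b", "c", 1.5)]
toy_r = pair_similarity_correlation(toy_embeddings, toy_pairs)
```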
Cluster NMI value for different models
| Model | NMI value |
|---|---|
| GloVe | 0.1339 |
| Skip-gram | 0.2130 |
| fastText | 0.2834 |
| definition2vec | |
Bold font emphasizes the best method for each accuracy category
Top 10 nearest-neighbor terms for “heart attack” in definition2vec and skip-gram
| definition2vec (large data set) | skip-gram (large data set) | definition2vec (small data set) | skip-gram (small data set) |
|---|---|---|---|
| blockage | blocked artery | myocardial infarctions | pain |
| heart muscle | blockage | acute mi | cough blood |
| heart attacks | heart blockage | infarction | scheduling |
| heart blockage | heart muscle | hemorrhagic stroke | aortic aneurysms |
| blocked heart | blocked heart | myocarditis | abuse substance |
| heart block diagnosis | heart muscles | hypertensive crisis | providers |
| block heart | blood clots lung | myocardial | skip |
| slow heart rate | heart function | restrictive cardiomyopathy | caregiver |
| heart function | slow heart rate | ischemic change | substance abuse problem |
| myocardia | myocardial infarction | ischemia | cell phone |
Top 10 nearest-neighbor terms for “bipolar disorder” in definition2vec and skip-gram
| definition2vec (large data set) | skip-gram (large data set) | definition2vec (small data set) | skip-gram (small data set) |
|---|---|---|---|
| schizophrenia | schizophrenia | depression | armour |
| schizoaffective disorder | schizoaffective disorder | psychosis | parkinson disease |
| major depression | depression | asthma | sildenafil |
| paranoid schizophrenia | major depression | hyperlipidemia | addison disease |
| bpad | bpad | neuropathy | ckd |
| psychotic disorder | multiple personality disorder | diabetic neuropathy | amenorrhea |
| bipolar affective disorder | seizure disorder | dyslipidemia | renal carcinoma |
| mood disorder | mood disorder | hypertension | obesity hypoventilation syndrome |
| bipolar illness | pervasive developmental disorder | malignant hypertension | oa |
| bipolar mood disorder | paranoid schizophrenia | anxiety | esophageal dilatation |
Top 10 nearest-neighbor terms for two out-of-vocabulary (OOV) terms, “nicotine replacement therapy” and “gastric pains”, in definition2vec
| nicotine replacement therapy | gastric pains |
|---|---|
| nicotine replacement | stomach ache |
| smoking cessation therapy | stomach pain |
| nicotine patches | feeling bloated |
| nicotine transdermal patch | pain esophagus |
| ceassation smoking | gastrointestinal pain |
| nicotine dependence | esophageal pains |
| nicotine addiction | abdominal pains |
| quiting smoking | low ache |
| nicotine lozenges | low pains |
| dependence nicotine | gi pain |
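Nearest-neighbor lists like those in the tables above are usually produced by ranking every vocabulary term by the cosine similarity of its embedding to the query term's embedding. The sketch below assumes that procedure. It is not the authors' code, the function names are hypothetical, and the two-dimensional vectors are made up rather than taken from any trained model.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(query: str, embeddings: Dict[str, List[float]], k: int = 10) -> List[str]:
    """Return the k terms whose embeddings are most cosine-similar to the query's."""
    query_vec = embeddings[query]
    scored = (
        (cosine(query_vec, vec), term)
        for term, vec in embeddings.items()
        if term != query  # a term is trivially its own nearest neighbor
    )
    return [term for _, term in heapq.nlargest(k, scored)]


# Made-up vectors for illustration; real embeddings have far more dimensions.
toy_embeddings = {
    "heart attack": [1.0, 0.0],
    "myocardial infarction": [0.9, 0.1],
    "ischemia": [0.7, 0.3],
    "cell phone": [0.0, 1.0],
}
toy_neighbors = nearest_neighbors("heart attack", toy_embeddings, k=2)
```

For an OOV query, the same ranking applies once a vector for the term has been obtained by some other means. How definition2vec builds that vector is not shown here.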