Kavita Ganesan, Shane Lloyd, Vikren Sarkar.
Abstract
The ability to find highly related clinical concepts is essential for many applications, including hypothesis generation, query expansion for medical literature search, search results filtering, and ICD-10 code filtering. While manually constructed medical terminologies such as SNOMED CT can surface certain related concepts, these terminologies are inadequate because they depend on the expertise of several subject matter experts, which leaves the curation process open to geographic and language bias. In addition, these terminologies provide no quantifiable evidence of how related the concepts are. In this work, we explore an unsupervised graphical approach to mining related concepts by leveraging the volume of large collections of clinical notes. Our evaluation shows that this data-driven approach discovers highly related concepts for a variety of search terms, including medications, symptoms, and diseases.
Keywords: clinical concepts; concept graph; data mining; knowledge discovery; related concepts
Year: 2016 PMID: 27656096 PMCID: PMC5015701 DOI: 10.4137/BECB.S36155
Source DB: PubMed Journal: Biomed Eng Comput Biol ISSN: 1179-5972
Figure 1. Preprocessing steps applied to clinical notes.
Figure 2. Example of a Concept-Graph constructed from three preprocessed sentences. Each node represents a unique word in the text and stores the sentence identifier (SID) and the position of the word in the sentence (PID).
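
The node layout described in the Figure 2 caption can be sketched as a positional index: one entry per unique word, holding every (SID, PID) pair where that word occurs. This is a minimal illustration only. The function name `build_concept_graph`, the whitespace tokenization, the 1-based numbering, and the sample sentences are all assumptions, and edges between nodes are not modelled because the caption does not describe them.

```python
from collections import defaultdict


def build_concept_graph(sentences):
    """Map each unique word to the (SID, PID) pairs where it occurs.

    Hypothetical helper: SIDs and PIDs are numbered from 1, and sentences
    are assumed to be preprocessed and whitespace-tokenized already.
    """
    graph = defaultdict(list)
    for sid, sentence in enumerate(sentences, start=1):
        for pid, word in enumerate(sentence.split(), start=1):
            graph[word].append((sid, pid))
    return dict(graph)


# Invented sample sentences, not taken from the paper's data.
sentences = [
    "patient denies nausea vomiting",
    "nausea vomiting after meals",
    "chest pain radiating",
]
graph = build_concept_graph(sentences)
print(graph["nausea"])    # [(1, 3), (2, 1)]
print(graph["vomiting"])  # [(1, 4), (2, 2)]
```

Storing positions on the node, rather than keeping the raw sentences, makes it cheap to check later whether two words occur near each other in the same sentence.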
Figure 3. An overlap example, where vomiting is the query and nausea is the candidate concept. Positions are written as SID:PID, where SID is the sentence identifier and PID is the position of the word in the sentence.
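
To make the overlap idea concrete, the sketch below counts the sentences in which a candidate occurs close to the query and turns that count into two scores. Several things here are assumptions rather than the authors' stated method: the `window` parameter, the use of sentence-level probabilities, the textbook PMI formula, and the reading of PROB as a log co-occurrence probability (the excerpt does not define PROB; this reading is merely consistent with the negative scores in the tables). The postings are invented for illustration.

```python
import math


def overlap(query_postings, candidate_postings, window=2):
    """Return SIDs where the candidate lies within `window` positions of the query.

    `window` is a hypothetical parameter; the source does not specify one.
    """
    hits = set()
    for q_sid, q_pid in query_postings:
        for c_sid, c_pid in candidate_postings:
            if q_sid == c_sid and abs(q_pid - c_pid) <= window:
                hits.add(q_sid)
    return hits


def sentence_count(postings):
    """Number of distinct sentences a word appears in."""
    return len({sid for sid, _pid in postings})


def pmi_score(query_postings, candidate_postings, n_sentences, window=2):
    """Textbook PMI: log2(p(q, c) / (p(q) * p(c))), estimated per sentence."""
    joint = len(overlap(query_postings, candidate_postings, window)) / n_sentences
    if joint == 0:
        return float("-inf")
    p_q = sentence_count(query_postings) / n_sentences
    p_c = sentence_count(candidate_postings) / n_sentences
    return math.log2(joint / (p_q * p_c))


def prob_score(query_postings, candidate_postings, n_sentences, window=2):
    """Assumed stand-in for PROB: log2 of the co-occurrence probability."""
    joint = len(overlap(query_postings, candidate_postings, window)) / n_sentences
    return math.log2(joint) if joint else float("-inf")


# Invented SID:PID postings for a corpus of 8 sentences.
vomiting = [(1, 4), (2, 2), (3, 1)]
nausea = [(1, 3), (2, 1), (4, 2)]

print(overlap(vomiting, nausea))        # {1, 2}
print(pmi_score(vomiting, nausea, 8))
print(prob_score(vomiting, nausea, 8))  # -2.0
```

Under these assumptions, PMI divides out how common each word is on its own, whereas a raw co-occurrence probability rewards frequent words. That would be consistent with generic terms such as "Year" or "Left" ranking highly in the PROB columns of the results.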
Precision of top 50 results.
| SEARCH TERM | PMI | PROB |
|---|---|---|
| Chest pain | 0.980 | 0.918 |
| Syncope | 0.980 | 0.959 |
| Lithium | 0.939 | 0.959 |
| Advair | 1.000 | 1.000 |
| Myocardial infarction | 1.000 | 0.980 |
| Bloody stool | 0.980 | 0.980 |
| Beta-blocker | 0.980 | 0.939 |
| Fracture | 0.980 | 1.000 |
| Penicillin | 0.959 | 0.959 |
| Pregnancy | 1.000 | 0.898 |
| Average |
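
The Average row carries no values in this copy. As a convenience, the column means can be recomputed from the ten rows above. The snippet below is a simple arithmetic check on the figures as they appear in this table, not the averages as reported by the authors.

```python
from statistics import mean

# (PMI, PROB) precision at top 50, copied from the table rows above.
precision = {
    "Chest pain": (0.980, 0.918),
    "Syncope": (0.980, 0.959),
    "Lithium": (0.939, 0.959),
    "Advair": (1.000, 1.000),
    "Myocardial infarction": (1.000, 0.980),
    "Bloody stool": (0.980, 0.980),
    "Beta-blocker": (0.980, 0.939),
    "Fracture": (0.980, 1.000),
    "Penicillin": (0.959, 0.959),
    "Pregnancy": (1.000, 0.898),
}

pmi_avg = mean(pmi for pmi, _prob in precision.values())
prob_avg = mean(prob for _pmi, prob in precision.values())

print(f"PMI average:  {pmi_avg:.3f}")   # 0.980
print(f"PROB average: {prob_avg:.3f}")  # 0.959
```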
Utility scores at different rank cutoffs.
| | UTILITY@5 | UTILITY@10 | UTILITY@20 | UTILITY@30 | UTILITY@50 |
|---|---|---|---|---|---|
| PROB | 0.848 | 0.872 | 0.862 | 0.851 | 0.816 |
| PMI | 0.945 | 0.943 | 0.935 | 0.913 | 0.900 |
| Difference | +11.50% | +8.18% | +8.48% | +7.29% | +9.10% |
Notes: @ indicates the number of related concepts in consideration. The maximum possible utility score is 1 and the lowest possible utility score would be −1.
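Top 15 related concepts for four search queries, ranked by PMI and PROB scores. The rows form four consecutive blocks, one per query; the query labels are missing here, but the results suggest chest pain, syncope, Advair, and fracture, in that order.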
| PMI-BASED RANKING | SCORE | PROB-BASED RANKING | SCORE |
|---|---|---|---|
| Pleuritic | 14.1634 | Breath | −3.1421 |
| Substernal | 13.4485 | Shortness | −3.1595 |
| Ictus | 12.9084 | Denies | −3.8398 |
| Palpitations | 12.8769 | Year | −4.4803 |
| Radiating | 12.8751 | Cath | −4.5945 |
| Pleuric | 12.7979 | Sob | −4.6244 |
| Anomolous | 12.7943 | Pleuritic | −4.6458 |
| Aur | 12.6486 | Reason | −4.7249 |
| Epigatric | 12.6055 | Man | −4.8863 |
| Squeezing | 12.5825 | Back | −5.0036 |
| Tightness | 12.5236 | Major | −5.0598 |
| Shortness | 12.5001 | Surgical | −5.0639 |
| Experience | 12.3301 | Left | −5.1356 |
| Stuttering | 12.3132 | Sided | −5.1442 |
| Crushing | 12.0798 | Abdominal | −5.2019 |
| Presyncope | 16.5897 | Year | −3.0528 |
| Vasovagal | 14.4518 | Telemetry | −3.1863 |
| Orthopnea | 13.9959 | Reason | −3.4546 |
| Palpitations | 13.9798 | Man | −3.7325 |
| Telemetry | 12.6358 | Woman | −3.7713 |
| Nocturnal | 12.5801 | Pain | −4.0291 |
| Lightheadedness | 11.848 | Contrast | −4.0291 |
| Dyspnea | 11.2657 | Presyncope | −4.0932 |
| Near | 10.6774 | Palpitations | −4.1776 |
| Ankle | 10.5456 | Fall | −4.1776 |
| Suffered | 10.4894 | Dyspnea | −4.4866 |
| Tias | 10.3803 | Near | −4.5303 |
| Episode | 10.2679 | Episode | −4.5303 |
| Paroxysmal | 10.135 | Chest | −4.5754 |
| Fall | 9.9611 | Orthopnea | −4.7198 |
| Diskus | 18.804 | Bid | −2.0538 |
| Spiriva | 17.3586 | Albuterol | −2.3352 |
| Discus | 17.3257 | Diskus | −2.9652 |
| Singulair | 15.5246 | Puff | −3.0721 |
| Puff | 15.2113 | Daily | −3.1096 |
| Inh | 14.3914 | Prn | −3.2701 |
| Tiotropium | 13.7217 | Mcg | −3.4996 |
| Inhaler | 13.6754 | Spiriva | −3.6027 |
| Albuterol | 13.0639 | Dose | −3.6571 |
| Disk | 12.9482 | Nebs | −3.7137 |
| Puffs | 12.5906 | Combivent | −3.834 |
| Combivent | 12.4954 | Medications | −3.834 |
| Zocor | 12.3302 | Discharge | −3.834 |
| Flovent | 12.2347 | Inhaler | −4.1876 |
| Nebs | 11.9469 | Puffs | −4.1876 |
| Qday | 11.7117 | Disk | −4.2701 |
| Comminuted | 13.0576 | Left | −3.0121 |
| Dislocation | 12.7384 | Right | −3.0912 |
| Nondisplaced | 12.6857 | Evidence | −3.4714 |
| Malalignment | 12.6517 | Comminuted | −4.0818 |
| Displaced | 12.4208 | Dislocation | −4.1051 |
| Intertrochanteric | 12.3706 | Acute | −4.399 |
| Diaphyseal | 12.3344 | Distal | −4.505 |
| Subcapital | 12.2103 | Displaced | −4.5227 |
| Burst | 12.1787 | Compression | −4.7435 |
| Fibular | 12.1329 | Seen | −4.796 |
| Styloid | 12.0737 | Rib | −4.797 |
| Midshaft | 12.0667 | Spine | −4.8751 |
| Radius | 12.0599 | Identified | −4.9203 |
| Acetabular | 12.0419 | Report | −4.9692 |
| Subtrochanteric | 12.0087 | Final | −4.9785 |