Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, David Sontag.
Abstract
Demand for clinical decision support systems in medicine and self-diagnostic symptom checkers has substantially increased in recent years. Existing platforms rely on knowledge bases manually compiled through a labor-intensive process or automatically derived using simple pairwise statistics. This study explored an automated process to learn high quality knowledge bases linking diseases and symptoms directly from electronic medical records. Medical concepts were extracted from 273,174 de-identified patient records and maximum likelihood estimation of three probabilistic models was used to automatically construct knowledge graphs: logistic regression, naive Bayes classifier and a Bayesian network using noisy OR gates. A graph of disease-symptom relationships was elicited from the learned parameters and the constructed knowledge graphs were evaluated and validated, with permission, against Google's manually-constructed knowledge graph and against expert physician opinions. Our study shows that direct and automated construction of high quality health knowledge graphs from medical records using rudimentary concept extraction is feasible. The noisy OR model produces a high quality knowledge graph reaching precision of 0.85 for a recall of 0.6 in the clinical evaluation. Noisy OR significantly outperforms all tested models across evaluation frameworks (p < 0.01).
Year: 2017 PMID: 28729710 PMCID: PMC5519723 DOI: 10.1038/s41598-017-05778-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Concept extraction pipeline. Non-negated concepts and ICD-9 diagnosis codes are extracted from Emergency Department electronic medical records. Concepts, codes and concept aliases are mapped to unique IDs, which in turn populate a co-occurrence matrix of size (Concepts) × (Patients).
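As a rough illustration of the last step in this pipeline, the sketch below builds a binary (Concepts) × (Patients) matrix from per-patient concept sets. It assumes that negated mentions have already been filtered out. The function name `build_cooccurrence_matrix`, the `aliases` mapping, and the toy records are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def build_cooccurrence_matrix(
    records: List[Set[str]],
    aliases: Dict[str, str],
) -> Tuple[List[str], List[List[int]]]:
    """Return (concept_ids, matrix), where matrix[i][j] is 1 if concept i
    appears in patient record j and 0 otherwise."""
    # Map each alias to its canonical concept ID; unmapped names map to themselves.
    canonical = [{aliases.get(name, name) for name in record} for record in records]
    # One row per unique concept, in a stable (sorted) order.
    concept_ids = sorted(set().union(*canonical))
    row_of = {cid: i for i, cid in enumerate(concept_ids)}
    matrix = [[0] * len(records) for _ in concept_ids]
    for j, record in enumerate(canonical):
        for cid in record:
            matrix[row_of[cid]][j] = 1
    return concept_ids, matrix


# Toy records (hypothetical): each set holds the non-negated concepts and
# diagnosis names found in one patient record.
records = [
    {"ear pain", "fever", "otitis media"},
    {"earache", "middle ear infection"},
    {"fever", "cough"},
]
aliases = {"earache": "ear pain", "middle ear infection": "otitis media"}

concept_ids, matrix = build_cooccurrence_matrix(records, aliases)
for cid, row in zip(concept_ids, matrix):
    print(cid, row)
```

Storing one row per concept and one column per patient mirrors the matrix orientation stated in the caption; a sparse representation would be the more practical choice at the scale of hundreds of thousands of records.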
Figure 2. Workflow of modeling the relationship between diseases and symptoms and knowledge graph construction, for each of our 3 models (naive Bayes, logistic regression and noisy OR).
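For readers unfamiliar with noisy OR gates, here is a minimal sketch of the standard noisy OR conditional probability: each present disease independently fails to trigger the symptom, and a leak term accounts for causes outside the model. The names `activation` and `leak` and all numeric values are illustrative assumptions; they are not the learned parameters, and the study's exact parameterization may differ.

```python
from typing import Dict, Iterable


def noisy_or_probability(
    present_diseases: Iterable[str],
    activation: Dict[str, float],
    leak: float = 0.0,
) -> float:
    """P(symptom present | diseases) under a standard noisy OR gate.

    activation[d] is the probability that disease d alone turns the symptom on;
    leak is the probability that the symptom appears with no modeled cause.
    """
    # The symptom stays off only if the leak and every present disease all fail.
    p_absent = 1.0 - leak
    for disease in present_diseases:
        p_absent *= 1.0 - activation.get(disease, 0.0)
    return 1.0 - p_absent


# Hypothetical parameters for a single symptom.
activation = {"otitis media": 0.6, "influenza": 0.3}

p_both = noisy_or_probability(["otitis media", "influenza"], activation, leak=0.05)
p_none = noisy_or_probability([], activation, leak=0.05)
print(round(p_both, 3), round(p_none, 3))
```

With these toy values the symptom is absent with probability 0.95 × 0.4 × 0.7 = 0.266 when both diseases are present, so the gate returns 0.734; with no disease present only the leak of 0.05 remains.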
Figure 3. Distribution of number of diseases and symptoms per patient record.
Top edge suggestions by models for a randomly chosen disease (Middle Ear Infection). The number of shown edges corresponds to the number of edges in the Google health knowledge graph (GHKG). For logistic regression, naive Bayes, and noisy OR, the suggestions are ordered by their relative importance score. For the GHKG, the edges are sorted according to two broad buckets of edge frequency that are provided in the graph. The stars associated with each edge represent the expected frequency with which “disease A causes symptom B” as rated by physicians. [‘***’ = ‘always happens’, ‘**’ = ‘sometimes happens’, ‘*’ = ‘rarely happens’, ‘’ = ‘never happens’].
Top edge suggestions for ‘Middle Ear Infection’

| Ranking (importance score) | Logistic regression model | Rating | Naive Bayes model | Rating | Noisy OR model | Rating | Frequency (GHKG buckets) | GHKG | Rating |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ear pain | *** | Inflammation of ear | *** | Ear pain | *** | | Inflammation of ear | *** |
| 2 | Teeth chattering | | Ear pain | *** | Inflammation of ear | *** | | Ringing in the ears | ** |
| 3 | Red face | * | Exudate | *** | Sore throat | ** | | Headache | ** |
| 4 | Inflammation of ear | *** | Ache | *** | Coughing | * | | Nausea | * |
| 5 | Itchy eyes | ** | Nasal congestion | * | Fever | ** | | Crying | ** |
| 6 | Irritability | ** | Sore throat | ** | Nasal congestion | * | | Fever | ** |
| 7 | Anger | * | Runny nose | * | Pain | *** | | Nasal congestion | * |
| 8 | Red rashes | | Coughing | * | Ache | *** | | Ear pain | *** |
| 9 | Sleepiness | ** | Sensitivity to light | * | Chills | ** | | Loss of appetite | ** |
| 10 | Facial paralysis | | Fever | ** | Headache | ** | | Vertigo | * |
Top edge suggestions by models for ‘Gallstones’. The number of shown edges corresponds to the number of edges in the GHKG. For logistic regression, naive Bayes and noisy OR, the edges are ranked by their relative importance score. For the GHKG, the edges are sorted according to two broad buckets of symptom frequency that are provided in the graph [‘frequent’ and ‘always’]. The internal ordering of the edges within a given bucket is random. The stars associated with each edge represent the expected frequency with which “disease A causes symptom B” as rated by physicians. [‘***’ = ‘always happens’, ‘**’ = ‘sometimes happens’, ‘*’ = ‘rarely happens’, ‘’ = ‘never happens’].
Top edge suggestions for ‘Gallstones’

| Ranking (importance score) | Logistic regression model | Rating | Naive Bayes model | Rating | Noisy OR model | Rating | Frequency (GHKG buckets) | GHKG | Rating |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Abdominal cramping from Gallstones | *** | Abdominal cramping from Gallstones | *** | Pain | *** | | Back pain | ** |
| 2 | Pain in upper-right abdomen | *** | Pain in upper-right abdomen | *** | Nausea | *** | | Pain between shoulder blades | |
| 3 | Yellow skin and eyes | ** | Upper abdominal pain | *** | Abdominal pain | *** | | Severe pain | *** |
| 4 | Pain | *** | Dark urine | * | Pain in upper abdomen | *** | | Mild pain | ** |
| 5 | Pain in upper abdomen | *** | Yellow skin and eyes | ** | Vomiting | *** | | Night pain | |
| 6 | Dark urine | * | Pain in upper abdomen | *** | Chills | * | | Abdominal discomfort | *** |
| 7 | Upper abdominal pain | *** | Intermittent abdominal pain | *** | Tenderness | *** | | Nausea | *** |
| 8 | Dry skin | | Belching | | Abdominal cramping from Gallstones | *** | | Side pain | * |
| 9 | Sleepiness | * | Discomfort in upper abdomen | *** | Yellow skin and eyes | ** | | Pain in upper-right abdomen | *** |
| 10 | Abdominal pain | *** | Abdominal pain | *** | Pain in upper-right abdomen | *** | | Flatulence | |
| 11 | Restless legs syndrome | | Intermittent pain | *** | Diarrhea | * | | Indigestion | * |
| 12 | Side pain | * | Swollen veins in the lower esophagus | | Fever | ** | | Vomiting | *** |
| 13 | Regurgitation | | Fluid in the abdomen | | Flank pain | * | | Abdominal cramping from Gallstones | *** |
Figure 4. Precision-recall curves for two evaluation frameworks. (a) Precision-recall curve for the automatic evaluation against the GHKG. (b) Precision-recall curve rated according to physicians’ expert opinion. The red stars indicate thresholds corresponding to the two tags associated with symptoms in the GHKG (‘always’ or ‘frequent’). In both graphs, the relative performance of the models is the same.
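To make the precision-recall evaluation concrete, the following sketch ranks suggested disease-symptom edges by importance score and reports precision and recall against a reference edge set after each edge is admitted. It is one plausible way to trace such a curve, not the authors' evaluation code; the function name `precision_recall_curve`, the edge labels, and the scores are hypothetical, and tied scores are not treated specially.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (disease, symptom)


def precision_recall_curve(
    scores: Dict[Edge, float],
    reference: Set[Edge],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) after each ranked edge is admitted."""
    if not reference:
        raise ValueError("reference edge set must not be empty")
    # Highest importance score first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    curve = []
    hits = 0
    for k, (edge, score) in enumerate(ranked, start=1):
        if edge in reference:
            hits += 1
        # Precision: share of admitted edges that are in the reference.
        # Recall: share of reference edges that have been admitted.
        curve.append((score, hits / k, hits / len(reference)))
    return curve


# Hypothetical importance scores for one disease "D" and a reference edge set.
scores = {("D", "s1"): 0.9, ("D", "s2"): 0.7, ("D", "s3"): 0.4, ("D", "s4"): 0.1}
reference = {("D", "s1"), ("D", "s3"), ("D", "s5")}

curve = precision_recall_curve(scores, reference)
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Because the reference edge ("D", "s5") is never suggested in this toy example, recall cannot exceed 2/3 no matter how low the threshold is set.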
Subset of the knowledge graph learned using the noisy OR model. For each disease we show the full list of edges along with their corresponding importance score in parentheses. Symptoms are ordered according to importance scores.
Examples of Edge Suggestions for Noisy OR

| Disease | Suggested edges |
|---|---|
| Aphasia | problems with coordination (0.318), weakness (0.181), confusion (0.106), mental confusion (0.088), slurred speech (0.074), numbness (0.071), headache (0.049), seizures (0.045), weakness of one side of the body (0.042), difficulty speaking (0.034), blurred vision (0.018), malnutrition (0.017) |
| Appendicitis | pain (0.881), nausea (0.401), abdominal pain (0.361), tenderness (0.163), chills (0.152), diarrhea (0.124), vomiting (0.118), fever (0.096), loss of appetite (0.068), lower abdominal pain (0.040), cramping (0.037), constipation (0.036), discomfort (0.033), cyst (0.030), pain in right lower abdomen (0.029), sharp pain (0.023), pain during urination (0.022), pain in upper abdomen (0.020), pelvic pain (0.017), flank pain (0.016), vaginal discharge (0.013), abdominal discomfort (0.013), dull pain (0.012), infection (0.011) |
| Bed bug bite | skin rash (0.329), itching (0.173), anxiety (0.048), infection (0.029), sadness (0.026), depression (0.026), red spots (0.018), skin irritation (0.018), sweating (0.016), eye pain (0.015), lesion (0.012), substance abuse (0.011), hallucination (0.009), swollen feet (0.009), skin lesion (0.009), brief visual or sensory abnormality (0.009) |
| Bell’s palsy | numbness (0.308), weakness (0.198), headache (0.134), facial paralysis (0.071), ear pain (0.052), slurred speech (0.051), paralysis (0.046), facial pain (0.040), neck pain (0.038), facial swelling (0.037), tongue numbness (0.031), asymmetry (0.026), blurred vision (0.024), drooping of upper eyelid (0.020), lesion (0.019), malnutrition (0.019), difficulty swallowing (0.018), double vision (0.016) |
| Carpal tunnel syndrome | numbness (0.175), pain (0.167), hand pain (0.094), weakness (0.083), arm pain (0.071), wrist pain (0.060), swelling (0.054), hand numbness (0.041), redness (0.030), pins and needles (0.024), shoulder pain (0.024), vertigo (0.020), hand swelling (0.016), neck pain (0.016), infection (0.014), depression (0.011), sadness (0.011), anxiety (0.011), chronic back pain (0.010), back pain (0.010), malnutrition (0.010), severe pain (0.008), unsteadiness (0.008), dry skin (0.008) |
| Ectopic pregnancy | pain (0.537), bleeding (0.204), vaginal bleeding (0.181), abdominal pain (0.167), cramping (0.155), spotting (0.154), nausea (0.104), cyst (0.067), tenderness (0.055), lower abdominal pain (0.048), pelvic pain (0.040), diarrhea (0.031), vaginal discharge (0.023), discomfort (0.020), vomiting (0.016), back pain (0.015), vaginal pain (0.014), lightheadedness (0.011) |
| Kidney stone | pain (0.608), flank pain (0.495), nausea (0.232), blood in urine (0.141), pain during urination (0.084), vomiting (0.083), chills (0.067), abdominal pain (0.065), back pain (0.050), tenderness (0.040), discomfort (0.019), groin pain (0.018), severe pain (0.013), fever (0.012), testicle pain (0.011), frequent urge to urinate (0.011), lower abdominal pain (0.011), dark urine (0.011), urinary retention (0.011), sharp pain (0.010), cyst (0.010), pain in lower abdomen (0.010), diarrhea (0.009), constipation (0.008), infection (0.007), pelvic pain (0.007), side pain (0.004), dull pain (0.004) |
| Retinal detachment | vision loss (0.125), blurred vision (0.065), headache (0.057), neck pain (0.041), eye pain (0.039), dehydration (0.024), difficulty walking (0.023), itching (0.020), discomfort (0.018), unequal pupils (0.017), watery diarrhea (0.015), bone loss (0.015), partial loss of vision (0.014), ear pain (0.013), fast heart rate (0.012), slow bodily movement (0.009), low oxygen in the body (0.009), vision disorder (0.009), elevated alkaline phosphatase (0.009), seeing spots (0.009), abnormality walking (0.009), malnutrition (0.009) |
Figure 5. Comparison of disease frequency for the ‘adult’ age bracket (40–60 years old). The y-axis shows the number of identified diseases in the emergency department data. The x-axis records the expected frequency of diseases according to the Google health knowledge graph for the ‘adult’ age bracket. The points highlighted demonstrate instances of frequency misalignment due to the differences in populations considered.