Antony B Holmes, Alexander Hawson, Feng Liu, Carol Friedman, Hossein Khiabanian, Raul Rabadan.
Abstract
Electronic health record (EHR) systems offer an exceptional opportunity for studying many diseases and their associated medical conditions within a population. The increasing number of clinical record entries that have become available electronically provides access to rich, large sets of patients' longitudinal medical information. By integrating and comparing relations found in the EHRs with those already reported in the literature, we are able to verify existing associations and to identify rare or novel ones. Of particular interest is the identification of rare disease co-morbidities, where the small numbers of diagnosed patients make robust statistical analysis difficult. Here, we introduce ADAMS, an Application for Discovering Disease Associations using Multiple Sources, which contains various statistical and language processing operations. We apply ADAMS to the New York-Presbyterian Hospital's EHR to combine the information from the relational diagnosis tables and textual discharge summaries with that from PubMed and Wikipedia in order to investigate the co-morbidities of the rare diseases Kaposi sarcoma, toxoplasmosis, and Kawasaki disease. In addition to finding well-known characteristics of diseases, ADAMS can identify rare or previously unreported associations. In particular, we report a statistically significant association between Kawasaki disease and diagnosis of autistic disorder.
Year: 2011 PMID: 21731656 PMCID: PMC3121722 DOI: 10.1371/journal.pone.0021132
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The amount of data entered into the NYPH EHR database each year has increased exponentially since 1990, with data entry doubling every 8 years.
ICD-9 codes used to retrieve the patient sets for the three rare diseases and the two control groups from the NYPH EHR (2004–2009).
| Rare disease | ICD-9 codes | Number of patients |
| Kaposi sarcoma | 176, 176.0, 176.1, 176.2, 176.3, 176.4, 176.5, 176.8, 176.9 | 221 |
| Toxoplasmosis | 130, 130.0, 130.1, 130.2, 130.3, 130.4, 130.5, 130.7, 130.8, 130.9 | 138 |
| Kawasaki disease | 446.1 | 213 |
| Post-traumatic stress disorder (control) | 309.81 | 1281 |
| Influenza (control) | 487, 487.0, 487.1, 487.8 | 2582 |
Figure 2. Outline of how search terms are mapped against ICD-9 code descriptions, using malignant lymphoma as an example.
First, the search term (S1) is broken down into words, which are matched against the target phrase (S2) using regular expressions. Starting with every word in S1, S1 and S2 are compared; if there is no match, words are repeatedly removed from the match expression until only one word remains. If there is still no match, a Levenshtein distance function is used to compare the terms, and if the distance is lower than a threshold the terms are considered to match.
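The matching procedure described in this caption can be sketched roughly as follows. This is a minimal illustration, not the ADAMS implementation: the function names, the choice to drop words from the end of the search term, the use of word boundaries in the regular expression, and the default threshold of 2 are all assumptions, since the caption does not specify them.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def term_matches(search_term: str, target: str, threshold: int = 2) -> bool:
    """Return True if search_term (S1) is considered to match target (S2).

    Assumptions (hypothetical, not from the source): words are dropped from
    the END of S1, the remaining words must appear in order as whole words,
    and matching is case-insensitive.
    """
    words = search_term.lower().split()
    target = target.lower()
    # Start with every word in S1, then shorten until one word remains.
    for n in range(len(words), 0, -1):
        pattern = r".*".join(r"\b" + re.escape(w) + r"\b" for w in words[:n])
        if re.search(pattern, target):
            return True
    # Fallback: accept the pair if the edit distance is below the threshold.
    return levenshtein(search_term.lower(), target) < threshold
```

With these assumptions, `term_matches("malignant lymphoma", "Other malignant lymphoma unspecified site")` succeeds on the first regular expression, while a misspelled single-word description such as `"toxoplasmosys"` is only accepted by the Levenshtein fallback.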
Data source sizes.
| Data Source | Records |
| EHR ICD-9-CM patients | 768903 |
| NLP records | 406158 |
| PubMed Kaposi sarcoma | 1025 |
| PubMed Kawasaki disease | 598 |
| PubMed Toxoplasmosis | 960 |
| Wikipedia | One page per search term, if one exists. |
A selected list of diseases significantly associated with Kaposi sarcoma, toxoplasmosis, and Kawasaki disease, determined from the NYPH EHR, compared against either or both control groups of influenza and PTSD patients.
| ICD-9 | Description | Odds ratio | P-value | FDR |
| | | | | |
| 176.0 | Kaposi's sarcoma skin | N/A | | |
| 176.1 | Kaposi's sarcoma soft tissue | 116.83 | | |
| 176.2 | Kaposi's sarcoma palate | N/A | | |
| | | | | |
| 110.3 | Dermatophytosis of groin and perianal area | 15.58 | 0.001 | 0.005 |
| 110.4 | Dermatophytosis of foot | 3.46 | 0.005 | 0.022 |
| 112.2 | Candidiasis of other urogenital sites | 7.79 | 0.005 | 0.023 |
| | | | | |
| 078.5 | Cytomegaloviral disease | 23.19 | 0.002 | 0.007 |
| 786.3 | Hemoptysis | 9.66 | 0.003 | 0.012 |
| 284.1 | Pancytopenia | 4.83 | 0.014 | 0.044 |
| | | | | |
| 136.3 | Pneumocystosis | 24.95 | | |
| 176.0 | Kaposi's sarcoma skin | N/A | 0.003 | 0.022 |
| 176.4 | Kaposi's sarcoma lung | N/A | 0.003 | 0.023 |
| | | | | |
| 070.32 | Chronic viral hepatitis B without hepatic coma without hepatitis delta | 12.47 | 0.001 | 0.011 |
| 070.54 | Chronic hepatitis C without hepatic coma | 3.85 | 0.004 | 0.029 |
| 054.10 | Genital herpes unspecified | 6.24 | 0.007 | 0.041 |
| | | | | |
| 038.0 | Streptococcal septicemia | 13.92 | 0.008 | 0.047 |
| 038.8 | Other specified septicemias | 13.92 | 0.008 | 0.047 |
| 038.19 | Other staphylococcal septicemia | N/A | 0.009 | 0.048 |
| | | | | |
| 372.30 | Conjunctivitis unspecified | 3.67 | | |
| 034.1 | Scarlet fever | 10.39 | | 0.015 |
| 299.0 | Autistic disorder current or active state | 15.15 | | 0.017 |
| | | | | |
| 462 | Acute pharyngitis | 2.28 | | 0.003 |
| 446.5 | Giant cell arteritis | 24.06 | 0.002 | 0.020 |
| 447.8 | Other specified disorders of arteries and arterioles | N/A | 0.003 | 0.034 |
If no patients in the control group have a given diagnosis code, the odds ratio is not calculated (shown as N/A). Kawasaki vs. influenza yields no significant associations beyond those already found in Kawasaki vs. PTSD.
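The N/A convention can be illustrated with a small odds-ratio helper over a 2×2 table of patient counts. This is a sketch under stated assumptions: the function name and argument layout are hypothetical, the counts in the usage note are invented for illustration and are not taken from the study, and returning `None` when every case patient has the diagnosis is an added guard that the source does not mention.

```python
from typing import Optional


def odds_ratio(cases_with: int, cases_total: int,
               controls_with: int, controls_total: int) -> Optional[float]:
    """Odds ratio of a diagnosis in the disease group versus a control group.

    Returns None (reported as N/A) when no control patient has the diagnosis,
    because the ratio would require division by zero.
    """
    cases_without = cases_total - cases_with
    controls_without = controls_total - controls_with
    # Assumed guard: also undefined if every case patient has the diagnosis.
    if controls_with == 0 or cases_without == 0:
        return None
    return (cases_with * controls_without) / (cases_without * controls_with)
```

For example, with made-up counts of 4 of 20 case patients and 2 of 40 control patients carrying a code, `odds_ratio(4, 20, 2, 40)` gives 4.75, whereas `odds_ratio(3, 20, 0, 40)` returns `None`.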
Figure 3. The network of interactions of statistically significant diseases associated with Kawasaki disease compared to influenza, combined with results from NLP reports, PubMed articles and Wikipedia articles.
Diseases linked to the diagnoses from either PubMed (green links) or Wikipedia (blue links) are documented associations. Diseases associated purely from diagnoses (red links) or NLP reports (gold links) are novel associations that have not been reported before.
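The rule in this caption for separating documented from novel associations can be expressed as a short classification over the set of sources that support a link. This is only an illustrative sketch: the source labels, the function name, and the `"unsupported"` return value for an empty set are assumptions, and link colors are left out because the caption does not say how a link backed by several sources is drawn.

```python
DOCUMENTED_SOURCES = {"pubmed", "wikipedia"}  # literature-backed evidence
OBSERVED_SOURCES = {"diagnoses", "nlp"}       # evidence from the EHR itself


def classify_association(sources) -> str:
    """Label a disease link as 'documented' or 'novel' from its sources."""
    normalized = {s.lower() for s in sources}
    unknown = normalized - DOCUMENTED_SOURCES - OBSERVED_SOURCES
    if unknown:
        raise ValueError(f"unknown source(s): {sorted(unknown)}")
    # Any literature source makes the association a documented one.
    if normalized & DOCUMENTED_SOURCES:
        return "documented"
    # Supported only by diagnosis codes or NLP reports: candidate novel link.
    if normalized & OBSERVED_SOURCES:
        return "novel"
    return "unsupported"
```

Under this rule a link found in both the diagnosis tables and PubMed counts as documented, and only links seen exclusively in EHR-derived data are flagged as novel.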
Diseases significantly associated with toxoplasmosis, compared to Kaposi sarcoma (FDR threshold 0.05).
| ICD-9 | Description | Odds ratio | P-value | FDR |
| 130.0 | Meningoencephalitis due to toxoplasmosis | 70.46 | | |
| 130.7 | Toxoplasmosis of other specified sites | 72.07 | | |
| 780.39 | Other convulsions | 5.52 | | |
| 042 | Human immunodeficiency virus (HIV) disease | 1.90 | | |
| 130.8 | Multisystemic disseminated toxoplasmosis | N/A | | |
| 323.9 | Unspecified cause of encephalitis | N/A | | 0.004 |
| 345.10 | Generalized convulsive epilepsy without intractable epilepsy | 16.01 | | 0.005 |
| 345.90 | Epilepsy unspecified without intractable epilepsy | 3.20 | 0.001 | 0.011 |
| 130.2 | Chorioretinitis due to toxoplasmosis | N/A | 0.001 | 0.012 |
| 348.8 | Other conditions of brain | N/A | 0.001 | 0.012 |
| 784.0 | Headache | 2.31 | 0.003 | 0.021 |
| 363.00 | Focal chorioretinitis unspecified | N/A | 0.003 | 0.025 |
| 364.3 | Unspecified iridocyclitis | 12.81 | 0.003 | 0.025 |
| 644.10 | Other threatened labor unspecified as to episode of care | 12.81 | 0.003 | 0.025 |
| 648.91 | Other current conditions classifiable elsewhere of mother with delivery | 6.41 | 0.009 | 0.050 |
Table S7 shows the diseases significantly associated with Kaposi sarcoma, compared to toxoplasmosis. If no patients in the control group have a given diagnosis code, the odds ratio is not calculated (shown as N/A).
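The FDR column in these tables is a multiple-testing adjustment of the P-values. The text here does not state which procedure was used, so the sketch below assumes the widely used Benjamini-Hochberg method purely for illustration; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: the source does not name its FDR procedure; this is one
    common choice, shown only to illustrate how an FDR column can be derived.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An association would then be reported as significant when its adjusted value does not exceed the chosen threshold, for example 0.05.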