Steven C Bagley¹, Marina Sirota², Richard Chen³, Atul J Butte², Russ B Altman¹,⁴.
Abstract
Patterns of disease co-occurrence that deviate from statistical independence may represent important constraints on biological mechanism, which sometimes can be explained by shared genetics. In this work we study the relationship between disease co-occurrence and commonly shared genetic architecture of disease. Records of disease pairs from two electronic medical record (EMR) systems (Columbia and Stanford) were combined and compared with a large database of published disease-associated genetic variants (VARIMED); data on 35 disorders were available across all three sources, which together include medical records for over 1.2 million patients and variants from over 17,000 publications. Based on the sources in which they appeared, disease pairs were categorized as having predominantly clinical, genetic, or both kinds of manifestations. Confounding effects of age on disease incidence were controlled for by comparing diseases only when they fall in the same cluster of similarly shaped incidence patterns. We find that disease pairs that are overrepresented in both EMR systems and in VARIMED come from two main disease classes, autoimmune and neuropsychiatric. We further identify specific genes that are shared within these disease groups.
Year: 2016 PMID: 27115429 PMCID: PMC4846031 DOI: 10.1371/journal.pcbi.1004885
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
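The age control described in the abstract can be read as a filter on candidate pairs: two diseases are compared only if they fall in the same incidence-shape cluster. The sketch below is a minimal illustration under that reading, assuming cluster labels are already assigned (here four diseases with the labels listed in the disease table); how the clusters themselves were derived is not shown, and the helper name `same_cluster_pairs` is hypothetical.

```python
from itertools import combinations

# Cluster labels copied from the "Cluster name" column of the disease table.
CLUSTER = {
    "Alzheimer’s": "aged",
    "Attention deficit": "youth",
    "Autism": "youth",
    "Parkinson’s d.": "aged",
}


def same_cluster_pairs(cluster_of):
    """Return only the disease pairs whose two members share a cluster label."""
    diseases = sorted(cluster_of)
    return [
        (d1, d2)
        for d1, d2 in combinations(diseases, 2)
        if cluster_of[d1] == cluster_of[d2]
    ]


print(same_cluster_pairs(CLUSTER))
```

Of the six possible pairs, only Alzheimer’s with Parkinson’s d. and Attention deficit with Autism survive the filter; cross-cluster pairs such as Autism with Alzheimer’s are never tested.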
Fig 1. The overall information flow.
Clinical data on disease co-occurrence from the Columbia and Stanford EMRs were compared to the literature-mined gene and disease relationships in the VARIMED database.
Conventional 2 × 2 table for counting the presence/absence of a disease pair.
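The cells of a presence/absence table for one disease pair, and the Obs/Exp effect size used in the later figures and tables, follow from four counts. This is a minimal sketch, assuming the expected count is the number of patients times the product of the two marginal frequencies (statistical independence); the counts are hypothetical, and the significance test behind the EMR P-values is not reproduced here.

```python
def two_by_two(n_both, n_1, n_2, total):
    """Cells (a, b, c, d) of the 2 x 2 presence/absence table for one disease pair."""
    a = n_both                      # patients with both diseases
    b = n_1 - n_both                # disease 1 only
    c = n_2 - n_both                # disease 2 only
    d = total - n_1 - n_2 + n_both  # neither disease
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent")
    return a, b, c, d


def observed_over_expected(n_both, n_1, n_2, total):
    """Obs/Exp, with the expected count under independence: n_1 * n_2 / total."""
    expected = n_1 * n_2 / total
    return n_both / expected


# Hypothetical counts, not taken from either EMR.
print(two_by_two(20, 100, 50, 1000))              # (20, 80, 30, 870)
print(observed_over_expected(20, 100, 50, 1000))  # 4.0
```

An Obs/Exp above 1 marks an overrepresented pair and a value below 1 an underrepresented one.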
Informal names for each combination of statistically significant results from EMRs and VARIMED.
| EMRs | VARIMED | Informal name |
|---|---|---|
| + | + | “Clinical and genetic” |
| + | − | “Clinical without observed genetic effect” |
| − | + | “Genetic with no observed clinical effect” |

+ = significant, − = not significant.
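The naming scheme is a lookup on two significance flags. A minimal sketch, assuming "EMR significant" is a single flag for a pair (for example, significant in both Columbia and Stanford); the function name is hypothetical.

```python
from typing import Optional


def informal_name(emr_significant: bool, varimed_significant: bool) -> Optional[str]:
    """Map a pair's significance pattern (+/-) to its informal category name."""
    if emr_significant and varimed_significant:
        return "Clinical and genetic"
    if emr_significant:
        return "Clinical without observed genetic effect"
    if varimed_significant:
        return "Genetic with no observed clinical effect"
    return None  # significant in neither source: no category


print(informal_name(True, False))
```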
Counts and frequencies (as percent) for diseases that occur in both EMR data sets and in VARIMED.
Number of genes is from VARIMED. Cluster names were assigned by hand to facilitate comprehension, as described in the text.
| # | Disease name | Columbia count | Columbia percent | Stanford count | Stanford percent | Number of genes | Cluster name |
|---|---|---|---|---|---|---|---|
| 1 | Alcoholism | 27638 | 2.82 | 11363 | 4.10 | 81 | adulthood |
| 2 | Allergic rhinitis | 19216 | 1.96 | 22523 | 8.12 | 5 | other |
| 3 | Alopecia areata | 821 | 0.08 | 632 | 0.23 | 75 | other |
| 4 | Alzheimer’s | 9073 | 0.93 | 2444 | 0.88 | 179 | aged |
| 5 | Amyotrophic lateral sclerosis | 2182 | 0.22 | 276 | 0.10 | 70 | aged |
| 6 | Ankylosing spondylitis | 510 | 0.05 | 532 | 0.19 | 38 | adulthood |
| 7 | Aortic aneurysm | 2990 | 0.31 | 5401 | 1.95 | 22 | aged |
| 8 | Attention deficit | 6964 | 0.71 | 5043 | 1.82 | 93 | youth |
| 9 | Autism | 481 | 0.05 | 2423 | 0.87 | 218 | youth |
| 10 | Behcet’s s. | 53 | 0.01 | 82 | 0.03 | 42 | other |
| 11 | Bipolar disorder | 12373 | 1.26 | 7179 | 2.59 | 185 | adulthood |
| 12 | Cardiomyopathy | 11457 | 1.17 | 8212 | 2.96 | 4 | aged |
| 13 | Celiac sprue | 1954 | 0.20 | 1267 | 0.46 | 45 | other |
| 14 | Cholelithiasis | 15353 | 1.57 | 8095 | 2.92 | 5 | aged |
| 15 | Depression | 27085 | 2.77 | 8283 | 2.99 | 155 | adulthood |
| 16 | Diabetes type 1 | 19372 | 1.98 | 5116 | 1.84 | 323 | other |
| 17 | Diabetes type 2 | 60815 | 6.21 | 40176 | 14.49 | 254 | aged |
| 18 | Epilepsy | 12099 | 1.24 | 12095 | 4.36 | 9 | neonate |
| 19 | Goiter | 10820 | 1.11 | 9201 | 3.32 | 5 | adulthood |
| 20 | Gout | 192 | 0.02 | 106 | 0.04 | 12 | aged |
| 21 | HIV | 6138 | 0.63 | 1073 | 0.39 | 92 | adulthood |
| 22 | Hepatitis B | 5757 | 0.59 | 3212 | 1.16 | 14 | adulthood |
| 23 | Hepatitis C | 18421 | 1.88 | 6583 | 2.37 | 40 | aged |
| 24 | Hypertrophic cardiomyopathy | 603 | 0.06 | 831 | 0.30 | 4 | adulthood |
| 25 | Kawasaki’s d. | 495 | 0.05 | 328 | 0.12 | 66 | youth |
| 26 | Migraine | 8049 | 0.82 | 12593 | 4.54 | 18 | adulthood |
| 27 | Moyamoya | 130 | 0.01 | 557 | 0.20 | 8 | other |
| 28 | Multiple sclerosis | 14979 | 1.53 | 1685 | 0.61 | 261 | adulthood |
| 29 | Parkinson’s d. | 6116 | 0.62 | 2839 | 1.02 | 151 | aged |
| 30 | Psoriasis | 4577 | 0.47 | 3249 | 1.17 | 104 | adulthood |
| 31 | Rheumatoid arthritis | 7333 | 0.75 | 4775 | 1.72 | 348 | aged |
| 32 | Schizophrenia | 11256 | 1.15 | 1935 | 0.70 | 208 | adulthood |
| 33 | Sjogren’s s. | 348 | 0.04 | 893 | 0.32 | 7 | aged |
| 34 | Systemic lupus erythematosus | 3194 | 0.33 | 2090 | 0.75 | 175 | adulthood |
| 35 | Tuberculosis | 66569 | 6.80 | 912 | 0.33 | 32 | adulthood |
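Because each percentage is a count divided by the size of its EMR population, the rows can be used to back-calculate approximate population sizes, a quick check on the abstract's "over 1.2 million patients". This sketch assumes every percentage uses all patients in that EMR as its denominator and was rounded to two decimals; it intersects the range of denominators consistent with three rows per EMR.

```python
# (count, percent) pairs from the Columbia and Stanford columns of the table.
COLUMBIA = {"Alcoholism": (27638, 2.82), "Allergic rhinitis": (19216, 1.96), "Diabetes type 2": (60815, 6.21)}
STANFORD = {"Alcoholism": (11363, 4.10), "Allergic rhinitis": (22523, 8.12), "Diabetes type 2": (40176, 14.49)}


def implied_total_range(rows, half_step=0.005):
    """Range of denominators consistent with every (count, percent) row,
    allowing each percent to have been rounded to two decimals."""
    lo, hi = 0.0, float("inf")
    for count, percent in rows.values():
        lo = max(lo, count * 100 / (percent + half_step))
        hi = min(hi, count * 100 / (percent - half_step))
    return lo, hi


col_lo, col_hi = implied_total_range(COLUMBIA)
stan_lo, stan_hi = implied_total_range(STANFORD)
print(round(col_lo), round(col_hi), round(stan_lo), round(stan_hi))
```

Under these assumptions Columbia comes out at roughly 978,500 to 980,100 patients and Stanford at roughly 277,200 to 277,400, about 1.26 million in total.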
Fig 2. The incidence-by-age patterns of the five identified clusters.
Using data from Stanford’s EMR, each graph shows the incidence at each age, averaged over all disorders in the cluster. A loess smoother marks the overall trend with a colored band. The same cluster colors are used throughout this paper. See the text for a description of the cluster names.
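The per-cluster curves average the incidence of every disorder in a cluster, age by age. A minimal sketch with hypothetical three-bin curves (not EMR data); the optional shape normalization, which scales each curve to sum to 1 so that only its shape is compared, is an assumption suggested by the abstract's "similarly shaped incidence patterns", and the loess smoothing is not reproduced.

```python
from collections import defaultdict


def shape(curve):
    """Scale an incidence-by-age curve to sum to 1 so only its shape remains."""
    total = sum(curve)
    return [value / total for value in curve]


def cluster_mean_curves(curves, cluster_of, normalize=True):
    """Average the curves of all diseases in each cluster, one age bin at a time."""
    grouped = defaultdict(list)
    for disease, curve in curves.items():
        grouped[cluster_of[disease]].append(shape(curve) if normalize else list(curve))
    return {
        cluster: [sum(values) / len(values) for values in zip(*members)]
        for cluster, members in grouped.items()
    }


# Hypothetical incidence counts in three age bins (young, middle, old).
curves = {"X": [8, 2, 0], "Y": [6, 4, 0], "Z": [0, 1, 9]}
clusters = {"X": "youth", "Y": "youth", "Z": "aged"}
means = cluster_mean_curves(curves, clusters)
print(means)
```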
Fig 3. Network structure of the significant disease pairs that occur in both EMRs.
Each node represents a disease, with node size scaled to the disease frequency in the Columbia EMR. Each edge connects a statistically significant pair, with edge width scaled to the effect size (observed number divided by expected number). Node colors match the cluster colors in Fig 2.
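A network like this can be held as an undirected weighted graph. The sketch below builds one from five rows of the table of pairs significant in both EMRs, sizing nodes by Columbia counts. Using the Columbia Obs/Exp for edge width and scaling linearly to a maximum size are assumptions, since the caption does not say which EMR's effect size or which scaling was used.

```python
from collections import defaultdict

# (disease 1, disease 2, Columbia Obs/Exp) from five rows of the table of
# pairs significant in both EMRs but not in VARIMED.
EDGES = [
    ("Alcoholism", "Bipolar disorder", 7.40),
    ("Alcoholism", "Depression", 5.80),
    ("Alcoholism", "HIV", 5.38),
    ("Depression", "HIV", 6.57),
    ("HIV", "Tuberculosis", 6.03),
]
# Columbia patient counts from the disease table, used for node size.
COLUMBIA_COUNT = {"Alcoholism": 27638, "Bipolar disorder": 12373,
                  "Depression": 27085, "HIV": 6138, "Tuberculosis": 66569}


def build_network(edges):
    """Undirected weighted graph as nested dicts: graph[d1][d2] = Obs/Exp."""
    graph = defaultdict(dict)
    for d1, d2, ratio in edges:
        graph[d1][d2] = ratio
        graph[d2][d1] = ratio
    return graph


def scale(values, max_size):
    """Linearly scale positive values so the largest maps to max_size."""
    largest = max(values.values())
    return {key: max_size * value / largest for key, value in values.items()}


graph = build_network(EDGES)
node_size = scale(COLUMBIA_COUNT, max_size=40.0)
edge_width = scale({(d1, d2): r for d1, d2, r in EDGES}, max_size=5.0)
print(sorted(graph["HIV"]))
```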
Fig 4. Venn diagrams showing the overlap of the disease pairs from the two electronic medical records and from VARIMED.
At the top, the left diagram shows the overlap of statistically significant disease pairs that are underrepresented in Columbia and in Stanford; the right diagram shows the same for overrepresented pairs. The bottom diagram shows the overlap between the conjunctions (overlapping regions) of the upper diagrams and the disease pairs in VARIMED. Arrows show how the results from the EMR sources were combined with the VARIMED results. The counts of disease pairs shown do not correspond exactly to those in Tables 6 and 7 because the VARIMED results here include discordant pairs, which are underrepresented in one EMR and overrepresented in the other.
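The diagram logic reduces to set operations on unordered disease pairs. A minimal sketch with hypothetical placeholder diseases D1 to D5 (not results from the paper); treating "significant in both EMRs" as concordant over, concordant under, or discordant is an assumption about how pairs were removed before the VARIMED-only table.

```python
def pair(d1, d2):
    """Unordered disease pair."""
    return frozenset((d1, d2))


# Hypothetical significance results for placeholder diseases.
columbia_over = {pair("D1", "D2"), pair("D1", "D3"), pair("D2", "D4")}
stanford_over = {pair("D1", "D2"), pair("D1", "D3")}
columbia_under = {pair("D3", "D5")}
stanford_under = {pair("D3", "D5"), pair("D2", "D4")}
varimed = {pair("D1", "D2"), pair("D2", "D4"), pair("D4", "D5")}

over_both = columbia_over & stanford_over      # overlap in the right-hand upper diagram
under_both = columbia_under & stanford_under   # overlap in the left-hand upper diagram
discordant = (columbia_over & stanford_under) | (columbia_under & stanford_over)

clinical_and_genetic = over_both & varimed
clinical_only = over_both - varimed
significant_in_both_emrs = over_both | under_both | discordant
genetic_only = varimed - significant_in_both_emrs
print(sorted(sorted(p) for p in genetic_only))
```

In this toy case the pair D2/D4 is in VARIMED but discordant between the EMRs, which is the kind of pair that makes the diagram counts differ from the tables.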
Results for overrepresented disease pairs that are significant in Columbia and Stanford EMRs but not in VARIMED.
| # | Disease 1 | Disease 2 | Cluster name | Columbia Obs/Exp | Columbia P-value | Stanford Obs/Exp | Stanford P-value |
|---|---|---|---|---|---|---|---|
| 1 | Alcoholism | Bipolar disorder | adulthood | 7.40 | 0.00E+00 | 3.32 | 1.55E-239 |
| 2 | Alcoholism | Depression | adulthood | 5.80 | 0.00E+00 | 2.80 | 1.28E-179 |
| 3 | Alcoholism | HIV | adulthood | 5.38 | 0.00E+00 | 1.89 | 6.96E-08 |
| 4 | Alcoholism | Hepatitis B | adulthood | 3.45 | 3.71E-138 | 1.56 | 9.53E-10 |
| 5 | Alcoholism | Schizophrenia | adulthood | 6.82 | 0.00E+00 | 3.93 | 1.74E-94 |
| 6 | Alzheimer’s | Parkinson’s d. | aged | 15.91 | 0.00E+00 | 6.95 | 8.46E-88 |
| 7 | Aortic aneurysm | Cardiomyopathy | aged | 4.54 | 3.14E-54 | 1.51 | 4.81E-10 |
| 8 | Aortic aneurysm | Cholelithiasis | aged | 2.73 | 3.65E-23 | 1.44 | 1.02E-07 |
| 9 | Attention deficit | Autism | youth | 31.86 | 6.52E-126 | 7.35 | 2.03E-172 |
| 10 | Bipolar disorder | Migraine | adulthood | 3.46 | 3.56E-86 | 1.69 | 2.58E-32 |
| 11 | Cardiomyopathy | Diabetes type 2 | aged | 4.61 | 0.00E+00 | 1.29 | 2.89E-26 |
| 12 | Cholelithiasis | Diabetes type 2 | aged | 3.04 | 0.00E+00 | 1.31 | 1.23E-29 |
| 13 | Cholelithiasis | Hepatitis C | aged | 3.19 | 5.65E-203 | 2.81 | 3.47E-101 |
| 14 | Depression | HIV | adulthood | 6.57 | 0.00E+00 | 1.72 | 1.44E-04 |
| 15 | Depression | Migraine | adulthood | 4.14 | 9.28E-286 | 2.09 | 1.27E-83 |
| 16 | Depression | Schizophrenia | adulthood | 11.74 | 0.00E+00 | 3.89 | 1.89E-66 |
| 17 | Diabetes type 2 | Gout | aged | 3.44 | 3.32E-12 | 2.80 | 5.46E-11 |
| 18 | Diabetes type 2 | Hepatitis C | aged | 4.92 | 0.00E+00 | 1.46 | 2.92E-48 |
| 19 | Goiter | Tuberculosis | adulthood | 1.66 | 1.17E-65 | 1.78 | 5.86E-05 |
| 20 | HIV | Hepatitis B | adulthood | 9.56 | 1.46E-213 | 4.67 | 1.76E-21 |
| 21 | HIV | Tuberculosis | adulthood | 6.03 | 0.00E+00 | 3.12 | 1.03E-03 |
| 22 | Hepatitis B | Tuberculosis | adulthood | 4.28 | 0.00E+00 | 4.07 | 2.83E-14 |
| 23 | Migraine | Systemic lupus erythematosus | adulthood | 3.96 | 5.45E-31 | 1.75 | 9.76E-12 |
Obs/Exp = Observed/Expected.
Results for disease pairs that are significant in VARIMED after removing pairs that are significant in both Columbia and Stanford EMRs.
| # | Disease 1 | Disease 2 | Cluster name | Disease 1 genes | Disease 2 genes | Gene overlap | P-value | OR | EMR |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Alopecia areata | Behcet’s s. | other | 75 | 42 | 6 | 3.47E-08 | 38.73 | |
| 2 | Alopecia areata | Celiac sprue | other | 75 | 45 | 7 | 1.37E-09 | 42.79 | |
| 3 | Alopecia areata | Diabetes type 1 | other | 75 | 323 | 40 | 4.36E-49 | 65.18 | |
| 4 | Alzheimer’s | Diabetes type 2 | aged | 179 | 254 | 12 | 8.84E-06 | 5.24 | C |
| 5 | Ankylosing spondylitis | HIV | adulthood | 38 | 92 | 7 | 9.93E-10 | 45.80 | |
| 6 | Ankylosing spondylitis | Multiple sclerosis | adulthood | 38 | 261 | 10 | 1.08E-10 | 25.32 | C |
| 7 | Behcet’s s. | Diabetes type 1 | other | 42 | 323 | 18 | 6.74E-21 | 42.80 | |
| 8 | Celiac sprue | Diabetes type 1 | other | 45 | 323 | 16 | 3.46E-17 | 31.49 | S |
| 9 | HIV | Multiple sclerosis | adulthood | 92 | 261 | 26 | 4.00E-26 | 27.90 | |
| 10 | HIV | Psoriasis | adulthood | 92 | 104 | 15 | 1.74E-17 | 34.94 | C |
| 11 | HIV | Systemic lupus erythematosus | adulthood | 92 | 175 | 27 | 7.99E-32 | 44.09 | |
| 12 | Multiple sclerosis | Psoriasis | adulthood | 261 | 104 | 28 | 1.60E-25 | 21.56 | C |
| 13 | Multiple sclerosis | Schizophrenia | adulthood | 261 | 208 | 14 | 2.47E-06 | 5.06 | C |
| 14 | Multiple sclerosis | Systemic lupus erythematosus | adulthood | 261 | 175 | 47 | 1.92E-42 | 23.34 | C |
| 15 | Parkinson’s d. | Rheumatoid arthritis | aged | 151 | 348 | 15 | 2.22E-07 | 5.84 | C |
| 16 | Psoriasis | Systemic lupus erythematosus | adulthood | 104 | 175 | 26 | 1.06E-28 | 35.40 | C |
| 17 | Schizophrenia | Systemic lupus erythematosus | adulthood | 208 | 175 | 10 | 3.84E-05 | 5.37 | |
Disease 1/2 genes = number of genes for each disease in VARIMED, Gene overlap = number of shared genes, OR = odds ratio, EMR = the EMR in which the pair was significant (C = Columbia, S = Stanford).
Results for overrepresented (synergistic) disease pairs that are significant in Columbia and Stanford EMRs and in VARIMED.
Results are sorted by cluster and by Obs/Exp within each cluster.
| # | Disease 1 | Disease 2 | Cluster name | Columbia Obs/Exp | Columbia P-value | Stanford Obs/Exp | Stanford P-value |
|---|---|---|---|---|---|---|---|
| 1 | Ankylosing spondylitis | Psoriasis | adulthood | 7.13 | 6.22E-10 | 2.57 | 6.85E-04 |
| 2 | Ankylosing spondylitis | Systemic lupus erythematosus | adulthood | 46.88 | 3.13E-102 | 3.24 | 2.56E-04 |
| 3 | Bipolar disorder | Depression | adulthood | 16.27 | 0.00E+00 | 7.07 | 0.00E+00 |
| 4 | Bipolar disorder | Schizophrenia | adulthood | 22.34 | 0.00E+00 | 10.16 | 0.00E+00 |
| 5 | Rheumatoid arthritis | Sjogren’s s. | aged | 35.29 | 2.33E-111 | 10.92 | 1.51E-117 |
Obs/Exp = Observed/Expected.
Results of overrepresented disease pairs that are significant in Columbia and Stanford EMRs and in VARIMED, showing the genetic information from VARIMED.
disease1/2 genes = number of genes for each disease, gene overlap = number of shared genes, pvalue = p-value from Fisher's exact test, OR = odds ratio, gene names = gene symbols of the shared genes. Colons join groups of genes that were all mapped from the same variant.
| # | disease1 | disease2 | disease1 genes | disease2 genes | gene overlap | pvalue | OR | gene names |
|---|---|---|---|---|---|---|---|---|
| 1 | Ankylosing spondylitis | Psoriasis | 38 | 104 | 9 | 0.00 | 55.60 | CAST:ERAP1, ERAP1, HCP5, HLA-E, IL23R, MICA, MUC22, PSORS1C3, PTPN1 |
| 2 | Ankylosing spondylitis | Systemic lupus erythematosus | 38 | 175 | 9 | 0.00 | 32.96 | ABCF1, BTNL2, GPSM3, HCG23, HCP5, IL23R, MSH5:MSH5-SAPCD1, MUC22, TRIM31 |
| 3 | Bipolar disorder | Depression | 185 | 155 | 40 | 0.00 | 33.11 | ANK3, ANKS1B, BBS1, BCL11B, C11orf80, C15orf53, CACNA1C, CDH13, CNNM4, CNNM4:MIR3127, CNTNAP5, DDN, FER1L5, GLT8D1, GLT8D1:GNL3, GLT8D1:SPCS1, GNL3:PBRM1, GNL3:PBRM1:SNORD19, GNL3:SNORD69, ITIH1, ITIH3, ITIH4, KMT2D, LMAN2L, MACROD2, MAPK10, MUC22, NEK4, NFIX, PBRM1, PDE7B, PELI3, PRKAG1, REV1, SPCS1, SVEP1, SYNE1, TENM4, TMEM132D, ZNF804A |
| 4 | Bipolar disorder | Schizophrenia | 185 | 208 | 10 | 0.00 | 5.10 | ANK3, CACNA1C, CDH13, GPM6A, ITIH4, MAD1L1, MYO5B, PDE7B, PTPRG, ZNF804A |
| 5 | Rheumatoid arthritis | Sjogren’s s. | 348 | 7 | 4 | 0.00 | 31.15 | LOC100287329:LTA, LST1, LST1:NCR3, TNF |
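The gene-sharing statistics rest on a 2 × 2 table of genes (shared, disease 1 only, disease 2 only, neither) tested with Fisher's exact test. The sketch below is a minimal version under stated assumptions: a one-sided enrichment test, a sample (cross-product) odds ratio, and a hypothetical gene universe of 20,000, since the universe size is not given here. The tabulated OR may have been computed differently (for example as a conditional maximum-likelihood estimate), so this sketch is not expected to reproduce the table's values.

```python
from math import comb


def gene_table(n1, n2, overlap, universe):
    """2 x 2 gene table: shared, disease-1 only, disease-2 only, neither."""
    a = overlap
    b = n1 - overlap
    c = n2 - overlap
    d = universe - n1 - n2 + overlap
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with the universe size")
    return a, b, c, d


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c); infinite when b or c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def fisher_enrichment_p(n1, n2, overlap, universe):
    """One-sided Fisher exact p-value, P(shared >= overlap), via the hypergeometric distribution."""
    top = min(n1, n2)
    hits = sum(comb(n1, k) * comb(universe - n1, n2 - k) for k in range(overlap, top + 1))
    return hits / comb(universe, n2)


# Gene counts from the bipolar disorder / schizophrenia row; universe size is hypothetical.
p = fisher_enrichment_p(185, 208, 10, universe=20000)
odds = sample_odds_ratio(*gene_table(185, 208, 10, 20000))
print(p, odds)
```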
Fig 5. Incidence-by-age graphs for autism and Alzheimer’s disease.
Because of the gross disparities in these patterns, patients at risk for one disorder would be at low risk for the second disorder at any given age, reducing the observed comorbidity.
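A small worked example shows why such disparities distort comorbidity estimates. The population below is hypothetical: disease A occurs only in young patients and disease B only in old ones, and within each age group the two are independent. An expected count that ignores age predicts some co-occurrence, while the age-stratified expectation predicts none, so Obs/Exp computed against the pooled expectation would look strongly underrepresented for purely age-related reasons.

```python
def expected_pooled(strata):
    """Expected co-occurrence ignoring age: (patients with A) * (patients with B) / all patients."""
    n = sum(s["n"] for s in strata)
    n_a = sum(s["a"] for s in strata)
    n_b = sum(s["b"] for s in strata)
    return n_a * n_b / n


def expected_age_stratified(strata):
    """Expected co-occurrence if A and B are independent within each age group."""
    return sum(s["a"] * s["b"] / s["n"] for s in strata)


# Hypothetical population split into two age groups.
strata = [
    {"n": 1000, "a": 100, "b": 0},  # young patients: only disease A occurs
    {"n": 1000, "a": 0, "b": 100},  # old patients: only disease B occurs
]
print(expected_pooled(strata))          # 5.0
print(expected_age_stratified(strata))  # 0.0
```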