Benjamin S Glicksberg, Li Li, Marcus A Badgeley, Khader Shameer, Roman Kosoy, Noam D Beckmann, Nam Pho, Jörg Hakenberg, Meng Ma, Kristin L Ayers, Gabriel E Hoffman, Shuyu Dan Li, Eric E Schadt, Chirag J Patel, Rong Chen, Joel T Dudley.
Abstract
MOTIVATION: Underrepresentation of racial groups is an important challenge and a major gap in phenomics research. Most current human phenomics research is based primarily on European populations, so extending it to other population groups remains an open problem. One approach is to use data from electronic medical record (EMR) databases, which contain patient data from diverse demographics and ancestries. This racial underrepresentation in the data can have profound implications for healthcare delivery and actionability. To the best of our knowledge, our work is the first attempt to perform comparative, population-scale analyses of disease networks across three different populations, namely Caucasian (EA), African American (AA) and Hispanic/Latino (HL).
Year: 2016 PMID: 27307606 PMCID: PMC4908366 DOI: 10.1093/bioinformatics/btw282
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Workflow of the current study. We outline the steps taken in our study, from data organization and statistical methodologies to network analytics
Fig. 2. Disease and category frequency. Panel A shows disease counts (log10) overall and by EA, AA and HL cohorts. Panel B shows the distribution of the number of diseases encompassed within each of the 93 CCS disease categories used
Fig. 3. Disease susceptibility profiles based on racial group. We present the distribution of diseases (with highlighted examples) that have statistically significant (Bonferroni-corrected P < 4.2 × 10⁻⁵) differences in risk profiles for the AA and HL cohorts compared to EA. The race beta values are the effect sizes of race after controlling for age and sex; positive values indicate increased risk compared to EA and negative values indicate decreased risk
Fig. 4. Distribution of significantly connected disease pairs by racial cohort. We show the number of disease pairs that were both significantly temporally related and significantly comorbid for each racial group (P < 1.42 × 10⁻⁶ for both criteria)
Temporal directionality and connectivity significance of selected disease pairs unique to each race cohort
| Pop. | Disease 1 | Disease 2 | P | β |
|---|---|---|---|---|
| EA | Thyroid cancer | Postsurgical hypothyroidism | <6.4E−324 | 5.25 |
| EA | Lymphosarcoma | Aplastic anemia | <6.4E−324 | 3.42 |
| EA | Ulcerative colitis | Intestinal obstruction | <6.4E−324 | 3.27 |
| EA | Toxic diffuse goiter | Postsurgical hypothyroidism | 1.3E−153 | 3.11 |
| EA | Familial hypercholesterolemia | Acute cystitis | <6.4E−324 | 3.09 |
| AA | Diabetes mellitus, type 2 | Diabetic cataract | 2.1E−16 | 5.73 |
| AA | Hyperthyroidism | Toxic diffuse goiter | <6.4E−324 | 5.10 |
| AA | Chronic ulcer of skin | Osteomyelitis | 1.4E−235 | 4.96 |
| AA | Hypertension | IgA glomerulonephritis | 6.4E−75 | 4.09 |
| AA | HIV disease | Esophageal candidiasis | <6.4E−324 | 3.87 |
| HL | Diabetes mellitus, type 1 | Clostridium difficile colitis | 3.3E−73 | 2.51 |
| HL | Benign essential hypertension | Phobic disorder | 5.1E−28 | 2.25 |
| HL | Coronary artery disease | ARDS | 1.7E−61 | 1.89 |
| HL | Generalized anxiety disorder | Anemia | 3.1E−64 | 1.72 |
| HL | Major depressive disorder | Decubitus ulcer | 2.1E−42 | 1.67 |
For each population, we determined which temporally related disease pairs had Bonferroni-corrected significant connectivity (P < 1.42 × 10⁻⁶). We present particular disease pairs of interest from among the top-25 associations for each population, ranked by effect size. Effect size, or β, can be interpreted as the odds ratio of disease 2 occurring given disease 1, holding age and sex constant.
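A Bonferroni-corrected cutoff like the one in the note above divides a family-wise significance level by the number of tests performed. The sketch below is a minimal illustration of that rule only. The function names are hypothetical, and the significance level and test count passed in by the caller are assumptions, since neither is stated in this record.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_pairs(p_values: dict, alpha: float, n_tests: int) -> dict:
    """Keep only the entries whose P-value falls below the corrected cutoff.

    p_values maps a (disease 1, disease 2) pair to its P-value.
    """
    cutoff = bonferroni_threshold(alpha, n_tests)
    return {pair: p for pair, p in p_values.items() if p < cutoff}
```

With an assumed alpha of 0.05 and a hypothetical 1000 tests, `bonferroni_threshold(0.05, 1000)` gives a cutoff of 5 × 10⁻⁵; the larger the number of disease pairs tested, the stricter the cutoff becomes.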
Fig. 5. Network structure patterns for each racial cohort and hub connectivity. We provide race-specific networks for EA (A), AA (B) and HL (C) populations for disease pairs that were both significantly temporally related and significantly comorbid for each group (P < 1.42 × 10⁻⁶ for both criteria). Effect size, shown as edge weight, is the increased risk of developing the target disease when having the source disease, controlling for sex and age. Node size reflects the number of directed, outgoing connections. The larger text marks diseases identified as hubs for the population
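To make the directed-network terms in the caption concrete, the following sketch counts in-degree and out-degree from a list of weighted directed edges. It is an illustration only, not the authors' pipeline: the function name is hypothetical, and the three edges are taken from the EA rows of the disease-pair table, reading disease 1 as the source and disease 2 as the target.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) counters for (source, target, weight) edges."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target, _weight in edges:
        out_degree[source] += 1  # outgoing connection; drives node size in Fig. 5
        in_degree[target] += 1   # incoming connection
    return in_degree, out_degree


# Edges from the EA rows of the table; the weight is the reported effect size.
edges = [
    ("Thyroid cancer", "Postsurgical hypothyroidism", 5.25),
    ("Toxic diffuse goiter", "Postsurgical hypothyroidism", 3.11),
    ("Ulcerative colitis", "Intestinal obstruction", 3.27),
]
in_degree, out_degree = degree_counts(edges)
```

Each directed edge adds one to the out-degree of its source and one to the in-degree of its target, so the two totals are always equal across a network. This matches the metric table, where the mean in-degree and mean out-degree are identical within each cohort.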
Metric statistic results across race-specific networks
| Metric | EA | AA | HL | EA/AA (P) | EA/HL (P) | AA/HL (P) | Trend | |
|---|---|---|---|---|---|---|---|---|
| Closeness centrality | 0.27±0.38 | 0.21±0.35 | 0.16±0.37 | 0.28 | ||||
| Clustering coefficient | 0.05±0.09 | 0.08±0.1 | 0.01±0.05 | |||||
| Eccentricity | 0.78±1.18 | 0.69±1.21 | 0.21±0.50 | 0.42 | 1.37E | |||
| Edge count | 11.34±23.94 | 13.3±33.54 | 6.86±15.06 | 0.55 | 0.16 | |||
| In-degree | 5.67±7.89 | 6.65±8.16 | 3.43±3.1 | 0.13 | ||||
| Neighborhood connectivity | 109.22±66.37 | 289.76±111.47 | 69.08±34.72 | |||||
| Out-degree | 5.67±23.46 | 6.65±33.31 | 3.43±15.48 | 0.38 | – | – | – | – |
| Stress | 8.55±43.08 | 13.13±64.38 | 0.29±1.7 | 0.38 | 0.15 |
We determined significant differences (italicized) in network structure across the EA, AA and HL networks using a one-way ANOVA to compare average metric statistics for the race-cohort networks (P < 0.05). We then performed a Tukey HSD test on the significant results to determine specifically which races differed from one another (P < 0.05).
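The first step of the comparison described above, a one-way ANOVA, reduces to the ratio of between-group to within-group variance. The sketch below computes that F statistic in plain Python under the assumption that each cohort network contributes a list of per-node metric values. The function name is hypothetical, and the numbers in the usage line are toy values rather than data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values only: three groups whose means are 2, 3 and 4.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # F = 3.0
```

Turning F into a P-value requires the F distribution, which the standard library does not provide. In practice a statistics package would handle both steps; SciPy, for example, offers `scipy.stats.f_oneway` for the ANOVA and `scipy.stats.tukey_hsd` for the pairwise follow-up.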