Matthias Arnold, Mara L Hartsperger, Hansjörg Baurecht, Elke Rodríguez, Benedikt Wachinger, Andre Franke, Michael Kabesch, Juliane Winkelmann, Arne Pfeufer, Marcel Romanos, Thomas Illig, Hans-Werner Mewes, Volker Stümpflen, Stephan Weidinger.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases.
Year: 2012 PMID: 22988944 PMCID: PMC3782362 DOI: 10.1186/1471-2164-13-490
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Illustration of the different disease networks based on genome-wide association data. A: The bipartite graph constructed from all association data. The two disjoint node sets are diseases (n = 111) and loci (n = 734; 508 gene loci and 226 intergenic loci), connected by an edge if a variant (n = 1,120) within the respective locus is associated with the corresponding trait. B: The SLN (shared locus network), consisting of 84 traits and 157 loci, obtained by removing isolated traits and loci that are associated with only a single trait. C: The SVN (shared variant network), a variant-based representation of the data. Here, a trait and a locus are linked if the locus contains a variant associated with this and at least one other trait. The network consists of 175 SNPs located in 94 loci that are associated with 55 diseases (see also Additional file 2: Table S1). The colors of the disease nodes correspond to disease classes according to the MeSH ontology; multi-colored nodes indicate an association with different disease classes. Loci are depicted as transparent, diamond-shaped nodes. The node size reflects the number of loci a disease is associated with. In C, the edge color reflects the allelic information: gray indicates agonistic variant(s), red indicates antagonistic variant(s), and blue marks both agonistic and antagonistic signals.
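The three network definitions in this caption can be illustrated with a minimal Python sketch. The association records, variant IDs, and locus names below are hypothetical toy data, not values from the study, and the function names are chosen only for illustration. The sketch treats "shared" as identical variant IDs and ignores variants in LD, which is a simplification.

```python
from collections import defaultdict

# Hypothetical toy associations: (trait, variant, locus). Not data from the study.
ASSOCIATIONS = [
    ("CD", "rs1", "LOCUS_A"),
    ("UC", "rs1", "LOCUS_A"),
    ("AST", "rs2", "LOCUS_A"),
    ("T1D", "rs3", "LOCUS_B"),
    ("RA", "rs4", "LOCUS_B"),
    ("PS", "rs5", "LOCUS_C"),
]


def bipartite_edges(associations):
    """Panel A: one trait-locus edge per locus holding an associated variant."""
    return {(trait, locus) for trait, _variant, locus in associations}


def shared_locus_network(associations):
    """Panel B (SLN): keep only loci associated with at least two traits."""
    traits_by_locus = defaultdict(set)
    for trait, _variant, locus in associations:
        traits_by_locus[locus].add(trait)
    return {
        (trait, locus)
        for locus, traits in traits_by_locus.items()
        if len(traits) >= 2
        for trait in traits
    }


def shared_variant_network(associations):
    """Panel C (SVN): link trait and locus only via variants shared by >= 2 traits."""
    traits_by_variant = defaultdict(set)
    locus_of = {}
    for trait, variant, locus in associations:
        traits_by_variant[variant].add(trait)
        locus_of[variant] = locus
    return {
        (trait, locus_of[variant])
        for variant, traits in traits_by_variant.items()
        if len(traits) >= 2
        for trait in traits
    }


full = bipartite_edges(ASSOCIATIONS)
sln = shared_locus_network(ASSOCIATIONS)
svn = shared_variant_network(ASSOCIATIONS)
```

In this toy input, T1D and RA share LOCUS_B through two different variants, so they appear in the SLN but not in the SVN, which is the kind of difference between the locus-based and variant-based views that the caption describes.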
Figure 2. LD-based locus assignment and its error sources. Using chromosome 8q21.11 as an example, LD-based locus assignment is shown for six exemplary SNPs (blue box). LD information is given by a color scale displaying the LD measure r², with red depicting strong LD, blue low LD, and white no LD. Example SNPs in LD are connected with black dashed lines. The gray boxes show the two error sources of automated locus assignment. An assignment error I occurs if two variants that are not in LD, i.e. that lie in two independent LD blocks, are located in the same gene, intergenic region, or gene desert and are thus assigned the same locus. Here, this is the case for the variant pairs rs-A/rs-B and rs-E/rs-F. The consequence of this type of error is a shared association on the locus level that is not mirrored on the variant level. An assignment error II is introduced if two variants are in LD but differ in their assigned locus. Here, this is the case for rs-C and rs-D. Due to such abnormalities in the LD data, the link between both variants is lost if only the locus level is considered.
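The two error types can be expressed as a simple pairwise check. The following Python sketch is only an illustration: the locus labels, the r² value, and the cutoff of 0.8 are assumptions made for the example and are not taken from the study. Only the placeholder SNP names rs-A to rs-F follow the caption.

```python
from itertools import combinations

# Hypothetical locus annotation for the caption's placeholder SNPs.
LOCUS = {
    "rs-A": "GENE_1", "rs-B": "GENE_1",
    "rs-C": "GENE_2", "rs-D": "INTERGENIC_1",
    "rs-E": "GENE_3", "rs-F": "GENE_3",
}

# Hypothetical pairwise r^2 values; pairs not listed are treated as r^2 = 0.
R2 = {frozenset(("rs-C", "rs-D")): 0.9}

R2_THRESHOLD = 0.8  # assumed cutoff for "in LD", not stated in the caption


def in_ld(a, b, r2=R2, threshold=R2_THRESHOLD):
    """True if the pair reaches the assumed r^2 cutoff."""
    return r2.get(frozenset((a, b)), 0.0) >= threshold


def assignment_errors(locus):
    """Return (type I pairs, type II pairs) over all SNP pairs."""
    type_one, type_two = [], []
    for a, b in combinations(sorted(locus), 2):
        same_locus = locus[a] == locus[b]
        linked = in_ld(a, b)
        if same_locus and not linked:
            type_one.append((a, b))   # same locus, independent LD blocks
        elif linked and not same_locus:
            type_two.append((a, b))   # in LD, but assigned to different loci
    return type_one, type_two


type_one, type_two = assignment_errors(LOCUS)
```

With these toy inputs, `type_one` contains the pairs rs-A/rs-B and rs-E/rs-F and `type_two` contains rs-C/rs-D, matching the cases named in the caption.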
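Antagonistically linked traits
| Abbr. | Antagonistically linked traits (chromosomal location) | No. of linked traits | |
|---|---|---|---|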
| CD | AST ( | 8 | |
| AST | CD ( | 5 | |
| GLI | COR ( | 5 | |
| RA | AST ( | 3 | |
| MS | RA ( | 3 | |
| UC | AST ( | 3 | |
| LN | PN ( | 3 | |
| TN | GLI ( | 2 | |
| VIT | CD ( | 2 | |
| T1D | CD ( | 2 | |
| T2D | GLI ( | 2 | |
| CeD | UC ( | 2 | |
| IPF | GLI ( | 2 | |
| PS | AST ( | 2 | |
| SLE | CD ( | 2 | |
| GLA | GLI ( | 1 | |
| MEL | VIT ( | 1 | |
| PN | LN ( | 1 | |
| ALC | HNN ( | 1 | |
| CRC | CeD ( | 1 | |
| IBD | T1D ( | 1 | |
| HCC | MS ( | 1 | |
| BLC | AST ( | 1 | |
| COR | GLI ( | 1 | |
| HNN | ALC ( | 1 |
Listed are diseases and the traits that share an antagonistic variant with the respective disorder. In the first column the considered disease is given. The second column specifies the abbreviation of the disorder in the first column. The third column contains the abbreviated diseases as defined in column two which have an antagonistic link to the disorder in column one, followed by the chromosomal location of the antagonistically associated variant(s) in parentheses. The last column lists the count of traits antagonistically linked to the disorder in column one. For a more detailed listing see Additional file 2: Table S1.
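Agonistically linked traits
| Abbr. | Agonistically linked traits (chromosomal location) | No. of linked traits | |
|---|---|---|---|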
| CD | UC ( | 14 | |
| RA | CeD ( | 10 | |
| CeD | CD ( | 9 | |
| UC | CD ( | 9 | |
| COR | MCI ( | 7 | |
| SLE | RA ( | 7 | |
| T1D | RA ( | 6 | |
| HYP | ICA ( | 5 | |
| IBD | CD ( | 5 | |
| BLC | RA ( | 5 | |
| CAD | COR ( | 4 | |
| MG | IBD ( | 4 | |
| AAA | COR ( | 3 | |
| ICA | PD ( | 3 | |
| LN | SLE ( | 3 | |
| FL | LBL ( | 3 | |
| MCI | COR ( | 3 | |
| PD | HYP ( | 3 | |
| SS | SLE ( | 3 | |
| AS | UC ( | 3 | |
| VIT | CeD ( | 3 | |
| AA | T1D ( | 2 | |
| SC | UC ( | 2 | |
| AD | GLIOMA ( | 2 | |
| T2D | OBESITY ( | 2 | |
| HNN | GASTROINTESTINAL NEOPLASMS ( | 2 | |
| HTG | CD ( | 2 | |
| LEP | CD ( | 2 | |
| LL | CD ( | 2 | |
| LBL | FL ( | 2 | |
| MS | CD ( | 2 | |
| PVD | COPD ( | 2 | |
| PRN | COLORECTAL NEOPLASMS ( | 2 | |
| PS | UC ( | 2 | |
| COPD | LN ( | 2 |
Listed are diseases that share agonistic associations with at least two traits. In the first column the considered disease is given. The second column specifies the abbreviation of the disorder in the first column. The third column contains the disease abbreviations (as defined in column two) of traits which have an agonistic link to the disorder in column one, followed by the chromosomal location of the agonistically associated variant(s) in parentheses. Here, the full MeSH term is given for traits for which no abbreviation was defined. The last column lists the count of traits agonistically linked to the disorder in column one. For the complete list of agonistically linked traits and more details see Additional file 2: Table S1.
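The per-disease counts in the two tables above can be derived from variant-level records, as in the following Python sketch. It assumes that a shared variant counts as agonistic when both traits report the same risk allele and as antagonistic when the risk alleles differ, and that all alleles are given on the same strand. The records are hypothetical toy data, not values from the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (trait, variant, risk allele). Not data from the study.
RECORDS = [
    ("CD", "rs1", "A"),
    ("UC", "rs1", "A"),
    ("AST", "rs1", "G"),
    ("RA", "rs2", "T"),
    ("CeD", "rs2", "T"),
]


def linked_traits(records):
    """Map each trait to its agonistically and antagonistically linked traits."""
    alleles_by_variant = defaultdict(dict)
    for trait, variant, allele in records:
        alleles_by_variant[variant][trait] = allele

    agonistic = defaultdict(set)
    antagonistic = defaultdict(set)
    for alleles in alleles_by_variant.values():
        for (t1, a1), (t2, a2) in combinations(sorted(alleles.items()), 2):
            # Assumed rule: same risk allele = agonistic, otherwise antagonistic.
            target = agonistic if a1 == a2 else antagonistic
            target[t1].add(t2)
            target[t2].add(t1)
    return agonistic, antagonistic


agonistic, antagonistic = linked_traits(RECORDS)
# The last table column corresponds to the size of each set.
antagonistic_counts = {t: len(s) for t, s in antagonistic.items()}
```

In this toy example AST is antagonistically linked to two traits (CD and UC), while CD and UC are agonistically linked to each other.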
Figure 3. Clustering of diseases with respect to genetic signals. We applied complete-linkage hierarchical clustering to identify groups of traits that show homogeneous patterns of genetic overlap with other disorders. For each pair of diseases, we calculated the Pearson correlation of their patterns of overlap with the other diseases. The correlation values range from −1 (white), indicating complete negative correlation, to +1 (black), reflecting perfect positive correlation. As the minimal value of the correlation coefficient was > −0.1, we collapsed the range of negative correlation. The 15 disease clusters are denoted by red numbers. The Euclidean distance threshold was chosen as the maximal distance at which the six diseases showing no or only weak correlation with any other disease (disease names in gray) remain non-clustered.
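The clustering procedure described in this caption can be sketched in plain Python. This is a minimal illustration under several assumptions: the overlap counts are invented toy data, the correlation for a pair of diseases is computed over all remaining diseases, the Euclidean distance is taken between rows of the correlation matrix, and the threshold of 1.0 is arbitrary. None of these details are confirmed by the caption.

```python
from itertools import combinations
from math import dist, sqrt

TRAITS = ["CD", "UC", "CeD", "COR", "MCI", "CAD", "GLI"]
# Hypothetical symmetric counts of shared variants between traits.
OVERLAP = [
    [0, 5, 3, 0, 0, 0, 0],
    [5, 0, 2, 0, 0, 0, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 3, 0],
    [0, 0, 0, 4, 0, 2, 0],
    [0, 0, 0, 3, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_matrix(overlap):
    """Correlate each pair's overlap patterns with the remaining diseases."""
    n = len(overlap)
    corr = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        others = [k for k in range(n) if k not in (i, j)]
        r = pearson([overlap[i][k] for k in others],
                    [overlap[j][k] for k in others])
        corr[i][j] = corr[j][i] = r
    return corr


def complete_linkage(points, threshold):
    """Merge clusters while the largest pairwise distance stays <= threshold."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


corr = correlation_matrix(OVERLAP)
clusters = complete_linkage(corr, threshold=1.0)  # assumed threshold
named_clusters = [[TRAITS[i] for i in c] for c in clusters]
```

In this toy input, GLI shares no variants with any other trait, so its correlations are all zero and it stays unclustered, analogous to the gray disease names in the figure. For real data, a library implementation such as SciPy's hierarchical clustering would be the more usual choice.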
Figure 4. Data prioritization and analysis workflow. We established a semi-automated curation pipeline that automatically gathers and annotates GWA data obtained from three sources (locus assignment included). The last step of the preprocessing was the manual inspection of risk alleles and odds ratios. With this data set at hand, we constructed a locus-based (SLN) and a variant-based (SVN) network representation of the data. For quality reasons, we then limited analyses to the SVN and further investigated the contained variants and their effects.