Deisy Morselli Gysi, Tiago de Miranda Fragoso, Fatemeh Zebardast, Wesley Bertoli, Volker Busskamp, Eivind Almaas, Katja Nowick.
Abstract
Biological and medical sciences increasingly acknowledge the significance of gene co-expression networks for investigating complex systems, phenotypes or diseases. Typically, complex phenotypes are investigated under varying conditions. While approaches for comparing nodes and links in two networks exist, almost no methods are available for comparing multiple networks and, to the best of our knowledge, no comparative method allows for whole-transcriptome network analysis. However, many studies aim to compare networks from different conditions, for example, tissues, diseases, treatments, time points, or species. Here we present a method for the systematic comparison of an unlimited number of networks, with an unlimited number of transcripts: Co-expression Differential Network Analysis (CoDiNA). In particular, CoDiNA detects links and nodes that are common, specific or different among the networks. We developed a statistical framework to normalize between these different categories of common or changed network links and nodes, resulting in a comprehensive network analysis method that is more sophisticated than simply comparing the presence or absence of network nodes. Applying CoDiNA to a neurogenesis study, we identified candidate genes involved in neuronal differentiation. We experimentally validated one candidate, demonstrating that its overexpression resulted in a significant disturbance in the underlying gene regulatory network of neurogenesis. Using clinical studies, we compared whole-transcriptome co-expression networks from individuals with or without HIV and active tuberculosis (TB) and detected signature genes specific to HIV. Furthermore, analyzing multiple cancer transcription factor (TF) networks, we identified common and distinct features for particular cancer types. These CoDiNA applications demonstrate the successful detection of genes associated with specific phenotypes.
Moreover, CoDiNA can also be used for comparing other types of undirected networks, for example, metabolic, protein-protein interaction, ecological and psychometric networks. CoDiNA is publicly available as an R package in CRAN (https://CRAN.R-project.org/package=CoDiNA).
Year: 2020 PMID: 33057419 PMCID: PMC7561188 DOI: 10.1371/journal.pone.0240523
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Methods for comparing co-expression networks. The columns give the number of networks that can be compared (W); the statistical methodology used; whether the comparison focuses on nodes or links; the output type; whether a visualization tool is integrated; and the availability of the method.
| Method | Ref. | W (number of networks) | Description | Nodes or links | Output | Visual tool | Available | Network construction | Network size |
|---|---|---|---|---|---|---|---|---|---|
| | | ≥ 2 | Geometrical transformation, normalized scores for links and classification of nodes. | Links and nodes | Full network, nodes and links classification | Yes | R package | No | Small, medium and large |
| | [ | ≥ 2 | Jaccard similarities from the union, intersections and exclusive links. | Links | Full network | Yes | GUI * | No | Small and medium |
| | [ | > 2 | Finds conserved functional modules across multiple biological networks by transforming multiple networks into two feature matrices, factorizing the two feature matrices into consensus factors and a soft node selection | Links and nodes | Conserved modules in multiple networks | No | MATLAB | No | Small and medium |
| | [ | 2 | Hierarchical cluster analysis on the expression values | Nodes | Cluster of genes for each hierarchical group | Yes | R package | Gene modules | |
| | [ | 2 | Scores the links to construct a unified differential co-expression network | Links | Full network | No | In-house software ** | Yes | Small, medium and large |
| | [ | 2 | Calculates the expression correlation changes of gene pairs between two conditions. Characterizes a node condition by comparing the numbers of gene neighbors in different co-expression networks. | Links and nodes | Differentially regulated links involving differential co-expression links and transcriptional regulation links | No | R package | Yes | |
| | [ | 2 | Highlights a subset of differentially co-expressed genes and links as either differentially regulated genes or differentially regulated links | Links and nodes | Differentially regulated links involving differential co-expression links and transcriptional regulation links | Yes | R package | Yes | Small, medium and large |
| | [ | 2 | Probabilistic score for differential correlation | Nodes | Cluster of genes in each module | No | GUI * | Yes | |
| | [ | ≥ 2 | Identifies gene co-expression differences between multiple conditions based on WGCNA modules | Nodes | Cluster of genes in each module | No | In-house software ** | Gene modules | |
| | [ | 2 | Fisher’s z-test | Links | Full network | Yes | R package | Yes | Small and medium |
| | [ | 2 | Categorizes the correlation types for each group using Fisher’s transformation and later uses the concordant method for microarrays | Links | Specific links for each category | No | R package | Yes | |
| | [ | 2 | Uses a conditional F-statistic to calculate differences in co-expression | Links and nodes | Genes that are differentially co-expressed in each sample | No | In-house software ** | Differential gene-gene co-expression patterns | |
| | [ | 2 | Calculates the Jaccard, Simpson, Geometric, Hypergeometric and Cosine indexes and Pearson correlation for links. | Links | Full network | Yes | Web-based | Yes | Small |
| | [ | ≥ 2 | Using hierarchical data, finds genes that are specific for each “branch” | Nodes | Conserved and specific modules in each hierarchical | No | Web-based | Yes | |
| | [ | 2 | Sub-graph matching | Nodes | Sub-graph | No | In-house software ** | No | |
| | [ | 2 | Identifies modules of differential genes. Based on unweighted networks | Nodes | Cluster of genes in each module | No | In-house software ** | Both | Small and medium |
| | [ | 2 | Identifies conserved structures from topology and sequence similarity | Nodes | Conserved network structures | No | Web-based | No | |
| | [ | 2 | Computes graph similarities from trees for the nodes based on colouring graph theory | Nodes | Full network | No | In-house software ** | No | |
| | [ | 2 | Computes graph similarities for the nodes | Nodes | Node gaps, node mismatches and graph structural differences | No | Web-based | No | Small |
| | [ | ≥ 2 | Defines a “tissue-specific” network. Based on the average expression, defines tissue-specific genes | Links and nodes | Tissue-specific genes and their networks | No | In-house software ** | Yes | Small, medium and large |
Fig 1. Visual representation of the CoDiNA method for a 3-network comparison.
1a, 1b and 1c display three independent networks to be compared; violet links represent positively correlated gene-pairs, and green links negatively correlated ones. Node-size is relative to node strength. 1d shows the geometrical representation of CoDiNA: a 3D scatter-plot that is derived from plotting the weights of each link in the three networks. 1e displays the cube segments based on the τ-threshold.
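The geometric idea in panels 1d and 1e can be sketched in a few lines of Python. This is only an illustration, assuming that link weights lie in [-1, 1] and that a weight whose absolute value is below τ counts as an absent link; the function name `discretize_link`, the gene names, the weights and the value τ = 0.33 are all hypothetical and are not taken from the CoDiNA package.

```python
def discretize_link(weights, tau):
    """Map a link's weights (one per network) to the cube segment it falls in.

    Each coordinate becomes -1 (negative link), 0 (absent link) or
    +1 (positive link). Assumption: |weight| < tau means "absent".
    """
    codes = []
    for w in weights:
        if abs(w) < tau:
            codes.append(0)
        elif w > 0:
            codes.append(1)
        else:
            codes.append(-1)
    return tuple(codes)


# Hypothetical weights of three links in three networks: each link is one
# point in the 3D scatter plot, with one coordinate per network.
links = {
    ("geneA", "geneB"): (0.82, 0.75, 0.91),
    ("geneA", "geneC"): (0.64, -0.58, 0.70),
    ("geneB", "geneC"): (0.05, 0.88, -0.10),
}

segments = {pair: discretize_link(w, tau=0.33) for pair, w in links.items()}
```

Each resulting tuple names one segment of the cube, so links that land in the same segment share the same presence and sign pattern across the three networks.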
Fig 2. Visual representation of the CoDiNA method for a 3-network comparison: Categories definition.
2a represents where the α links lie in the 3D space. 2b and 2c represent the locations of β and γ links, respectively. The complete set of Φ and positions is shown in 2d.
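As a rough illustration of how a link could be assigned to one of the three categories, the Python sketch below takes a link that has already been reduced to one code per network (-1 negative, 0 absent, +1 positive). It assumes that α marks links common to all networks with the same sign, β marks links present in all networks with differing signs, and γ marks links present in only some networks. This mapping follows the "common, specific or different" classes named in the abstract but is not spelled out in the caption, and `classify_link` is a hypothetical name.

```python
def classify_link(codes):
    """Assign a discretized link (tuple of -1/0/+1 per network) to a category.

    Assumed mapping: alpha = common, beta = different, gamma = specific.
    """
    if all(c == 0 for c in codes):
        return None  # absent in every network: nothing to classify
    if 0 in codes:
        return "gamma"  # present in some networks only (specific)
    if len(set(codes)) == 1:
        return "alpha"  # same sign in every network (common)
    return "beta"  # present everywhere, but the sign changes (different)


# Hypothetical discretized links for a 3-network comparison.
examples = [(1, 1, 1), (1, -1, 1), (0, 1, 0), (0, 0, 0)]
categories = [classify_link(codes) for codes in examples]
```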
Fig 3. Visual representation of the CoDiNA method for a 3-network comparison: Scores definition.
The strength score and the internal score are shown in panels 3a and 3b, respectively. The strength score is calculated as the Euclidean distance from the center of the cube to the link and allows the selection of strong links. Links whose weights vary widely have lower scores, while links with more similar weights have higher scores. The second score, the internal score, is the distance from the link weight to the categorical weight and allows the selection of links that were well classified in the Φ category. Links with a low score are not assigned to a particular Φ category, while well-classified links receive higher scores. Both scores can be combined to visualize the filtered network that contains only strong and well-classified links, as shown in 3c. Finally, the CoDiNA network with classified links and nodes is displayed in panel 3d.
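The two distances behind these scores can be sketched as follows. This is a minimal Python illustration under stated assumptions: the cube is centred at the origin, the "categorical weight" is taken to be the corner point with coordinates -1, 0 or +1 obtained from the τ-threshold, and no normalization is applied. The caption describes scores in which higher values mean better classification; how the raw distance to the corner is turned into such a score is not given here, so the sketch stops at the raw distances. All function names and numbers are hypothetical.

```python
import math


def strength_distance(weights):
    """Euclidean distance from the cube centre (the origin) to the link's point."""
    return math.sqrt(sum(w * w for w in weights))


def distance_to_category_corner(weights, tau):
    """Euclidean distance from the link's point to the corner of its cube segment.

    Assumption: the corner has coordinate 0 where |weight| < tau and the
    sign of the weight (-1 or +1) elsewhere.
    """
    corner = [0.0 if abs(w) < tau else math.copysign(1.0, w) for w in weights]
    return math.sqrt(sum((w - c) ** 2 for w, c in zip(weights, corner)))


# Hypothetical links in three networks: one strong, one weak.
strong = (0.90, 0.80, 0.95)
weak = (0.40, 0.35, 0.50)

strong_is_farther_from_centre = strength_distance(strong) > strength_distance(weak)
strong_is_closer_to_corner = (
    distance_to_category_corner(strong, tau=0.33)
    < distance_to_category_corner(weak, tau=0.33)
)
```

In this toy case the strong link is both farther from the centre and closer to its category corner, which is the combination the filtered network in panel 3c is meant to retain.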
Fig 4. Running time for CoDiNA depends log-linearly on the number of nodes, the percentage of completeness of a network, and the number of networks under comparison.
Fig 5. Comparing children and adults with tuberculosis who were tested for HIV. The three panels show the categories of links and nodes: Panel I, the CoDiNA network comparing adults with or without HIV; Panel II, the CoDiNA network comparing children with or without HIV; and Panel III, the complete CoDiNA network including adults (A) and children (C) with or without HIV. Note that in the adult network (Panel I) most links are specific to the HIV+ state, whereas in children (Panel II) most links are specific to individuals without HIV. Judging from Panel III, many links are lost in adults and in HIV+ children compared to HIV− children.
Fig 6. Gene co-expression weighted network heatmap: Heatmap representing the omega values of links for each network.
The intensity of the color represents the weight of the link. Clusters of links that are stronger or weaker in certain groups of networks can be distinguished. Those clusters coincide with the CoDiNA subcategories shown on the left.
Disease Enrichment Analysis for each category in each CoDiNA network.
HIV I: HIV Infections; HIV S: HIV Seropositivity. p-values were determined by Fisher’s exact test, testing for enrichment of known disease genes within each category.
| Network | Disease | Φ | Known | Observed | p-value |
|---|---|---|---|---|---|
| Adults | TB | Undefined | 25 | 15 | 0.13 |
| Adults | HIV I | Common | 114 | 6 | 0.13 |
| Children | AIDS | Specific to HIV+ children | 22 | 8 | 0.02 |
| Children | sAIDS | Specific to HIV+ children | 18 | 7 | 0.02 |
| Children | TB | Specific to HIV− children | 92 | 49 | 0.04 |
| Children | HIV I | Common | 211 | 25 | 0.02 |
| Children | HIV S | Specific to HIV+ children | 6 | 3 | 0.06 |
| Complete | TB | Undefined | 42 | 22 | 0.01 |
| Complete | AIDS | Specific to HIV+ children | 10 | 2 | 0.09 |
| Complete | sAIDS | Specific to HIV+ | 9 | 1 | 0.09 |
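For readers who want to see what such an enrichment test computes, the sketch below implements a one-sided Fisher’s exact test as a hypergeometric upper tail, using only the Python standard library. It is an illustration under assumptions: the table reports the known and observed gene counts but not the category sizes or the background gene universe, so the counts in the usage example are hypothetical and the sketch does not attempt to reproduce the p-values above. The function name `enrichment_p_value` is likewise hypothetical.

```python
from math import comb


def enrichment_p_value(background, known, category, overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    background: number of genes in the universe
    known:      number of known disease genes in the universe
    category:   number of genes in the category being tested
    overlap:    number of known disease genes found in the category

    Returns the probability of seeing at least `overlap` known genes when
    `category` genes are drawn at random from the background.
    """
    total = comb(background, category)
    upper = min(known, category)
    tail = sum(
        comb(known, k) * comb(background - known, category - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 2000 background genes, 22 known disease genes,
# a category of 300 genes, 8 of which are known disease genes.
p = enrichment_p_value(background=2000, known=22, category=300, overlap=8)
```

A small p-value indicates that the category contains more known disease genes than expected by chance, given the assumed background.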
Fig 7. TF-TF CoDiNA networks for each of the glioma types: CoDiNA identified TFs with co-regulatory changes specific to each cancer. Panel I: astrocytoma; Panel II: glioblastoma; Panel III: oligodendroglioma.
Nodes are coloured according to the type of cancer with which CoDiNA associated them. Panel I: Mostly glioma and astrocytoma TFs are specific to the astrocytoma network. Panel II: The majority of nodes refer to specific changes in co-expression in glioblastoma, but there is an overlap with TFs involved in other gliomas. Panel III: Most links and genes are specific to oligodendroglioma, with some overlap of astrocytoma TFs.
Fig 8. Workflow process of the CoDiNA R package.
Input data for the CoDiNA R package can be any set of networks, filtered to contain only significant links (according to the network construction method used). An edge list is a list containing all the links and their weights. The user can assign a weight of zero to links for which the p-value is not significant. The function MakeDiffNet() classifies the links into the Φ and categories, and calculates and normalises the scores. Its output is used as input for assigning the nodes to categories by the function ClusterNodes(). The plot() function can be used on the output from MakeDiffNet() and automatically calls ClusterNodes().
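The input-preparation step described above (an edge list in which non-significant links receive a weight of zero) can be sketched in Python as follows. This only illustrates the preprocessing idea and does not call the R package: the function name `prepare_edge_list`, the tuple layout, the gene names and the significance level of 0.05 are assumptions made for the example.

```python
def prepare_edge_list(edges, alpha=0.05):
    """Return (node_1, node_2, weight) rows, zeroing non-significant links.

    edges: iterable of (node_1, node_2, weight, p_value) tuples.
    alpha: assumed significance level; links with p_value >= alpha get weight 0.
    """
    prepared = []
    for node_1, node_2, weight, p_value in edges:
        if p_value >= alpha:
            weight = 0.0  # keep the link in the list, but mark it as absent
        prepared.append((node_1, node_2, weight))
    return prepared


# Hypothetical correlations and p-values for one network.
raw_edges = [
    ("geneA", "geneB", 0.82, 0.001),
    ("geneA", "geneC", -0.15, 0.400),
    ("geneB", "geneC", -0.67, 0.010),
]

edge_list = prepare_edge_list(raw_edges)
```

Keeping the non-significant links with a weight of zero, instead of dropping them, means every network lists the same set of links, so each link has one weight per network when the networks are compared.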