Danila Vella1,2, Italo Zoppis2, Giancarlo Mauri2, Pierluigi Mauri1, Dario Di Silvestre3.
Abstract
The reductionist approach of dissecting biological systems into their constituents was successful in the first stage of molecular biology in elucidating the chemical basis of several biological processes. This knowledge helped biologists to understand the complexity of biological systems, showing that most biological functions do not arise from individual molecules; the emergent properties of biological systems therefore cannot be explained or predicted by investigating individual molecules without taking their relations into consideration. Thanks to improvements in current -omics technologies and the increasing understanding of molecular relationships, more and more studies are evaluating biological systems through approaches based on graph theory. Genomic and proteomic data are often combined with protein-protein interaction (PPI) networks, whose structure is routinely analyzed by algorithms and tools to characterize hubs/bottlenecks and topological, functional, and disease modules. Co-expression networks, on the other hand, represent a complementary procedure that makes system-level evaluation possible, including for organisms that lack information on PPIs. Based on these premises, we introduce the reader to PPI and co-expression networks, including aspects of reconstruction and analysis. In particular, the new idea of evaluating large-scale proteomic data by means of co-expression networks is discussed, with some examples of application. Their use to infer biological knowledge is shown, and special attention is devoted to topological and module analysis.
Keywords: -Omics data; Co-expression network; PPI network; Pearson’s correlation; Systems biology; Topological analysis; WGCNA
Year: 2017 PMID: 28477207 PMCID: PMC5359264 DOI: 10.1186/s13637-017-0059-z
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
Fig. 1 a Biological networks. Nodes may represent several types of biological elements, while the edges describe the nature of their relationship. If A and B are two nodes connected by an edge, (A,B)∈E, then B is a neighbor of A, or equivalently A and B are adjacent. b Protein network classification proposed by Vidal et al. [25]
Fig. 2 Pathguide website [40]. A repository containing information about 547 resources of molecular interactions and pathways
Fig. 3 ReactomeFIViz: from disease pathway to PPI network. Main steps to obtain a protein functional network and a physical protein network, starting from a specific pathway (oncogene induced senescence). Using ReactomeFIViz, pathways can be visualized in relation to others (a), can be detailed as a diagram showing all intermolecular relationships (b), and can be shown as a protein functional interaction network (c) containing only the relations among proteins that cooperate to perform a given molecular function. Finally, starting from a group of proteins of interest, it is possible to obtain a network of protein-protein interactions by STRING; in the reported example, the interactions shown are limited to the physical type, in particular binding, activation and inhibition (d)
Fig. 4 ACSL1 protein and its neighbors in two co-expression networks obtained by processing the protein expression profiles of a control group and a group of patients affected by amyloidosis. ACSL1 shows a different degree in the two groups of samples, suggesting that this protein may have a key role in the emergent phenotypes. Green edges represent a positive correlation between the expression profiles, while black edges indicate negative correlations. Thick edges indicate known interactions present in public PPI repositories
Fig. 5 Possible cases of correlation between two variables. a Positive correlation. b No correlation. c Negative correlation
Measures of dependence between two variables
| Co-expression measure | What it measures | Input | Features |
|---|---|---|---|
| Pearson’s correlation (PC) | Tendency to respond in opposite/same direction across different samples | Gene expression values | • Sensitive to outliers |
| Spearman’s correlation (SC) | Tendency to respond in opposite/same direction across different samples | Ranking values from expression levels in samples | • Robust to outliers |
| Mutual information | Reduction of uncertainty about a gene given knowledge about another gene | Gene expression values | • Measures complex non-linear relations (rarely present in biological data) |
| Kendall | Correspondence/compatibility between two rankings | Gene expression values | • Similar to SC |
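The contrast between the measures in the table can be made concrete with a short sketch. It uses the standard textbook definitions of Pearson's, Spearman's, and Kendall's (tau-a) coefficients written in plain Python; the two expression profiles are hypothetical, and mutual information is left out because it needs a density or binning estimate.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)


# Hypothetical profiles of two genes across six samples.
# gene_b rises monotonically with gene_a, but its last value is an outlier.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [1.1, 2.3, 2.9, 4.2, 5.1, 60.0]

print("Pearson :", round(pearson(gene_a, gene_b), 3))
print("Spearman:", round(spearman(gene_a, gene_b), 3))
print("Kendall :", round(kendall_tau(gene_a, gene_b), 3))
```

With these values the single outlier pulls Pearson's coefficient well below 1, while the two rank-based measures still report a perfect monotone relation, which is the sensitivity/robustness contrast listed in the Features column.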
Fig. 6 Shape and degree distribution of random, small-world, and scale-free models with respect to a biological network. Models were calculated by the ELIXIR web tool [131]
Fig. 7 Functions used to describe the degree distribution of biological networks. a Poisson curve and b power law, shown for different parameters. c Example of a three-node graphlet with frequency equal to 5
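For reference, the two curves named in this caption have standard textbook forms, given here as a sketch. The symbols λ (mean degree) and γ (degree exponent) are notation assumed for illustration and may not match the parameters plotted in the figure; P(k) denotes the probability that a node has degree k.

$$
P(k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!} \quad \text{(Poisson, random network)}, \qquad P(k) \propto k^{-\gamma} \quad \text{(power law, scale-free network)}
$$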
Centralities calculated by the CentiScaPe plugin for Cytoscape
| Centrality | Description | Biological meaning |
|---|---|---|
| Diameterᵃ | Defines the longest shortest path in the network | |
| Average distanceᵃ | Defines the mean length of all the shortest paths in the network | |
| Degreeᵇ | Describes the number of neighbors a node has | Highlights the number of nodes that regulate or are regulated by the node |
| Eccentricityᵇ | Describes the longest of the shortest paths from a node to the others, giving proximity information | Highlights the ease with which a protein reaches, or is reached by, all the other proteins in the network |
| Closenessᵇ | Describes, for the node | Highlights the probability of a protein to be functionally relevant for several proteins, but irrelevant for a few others |
| Radialityᵇ | Describes the integration of a node into the network | Highlights the ability of a protein to be functionally relevant for several proteins, but irrelevant for a few others |
| Centroidᵇ | Describes the neighborhood of nodes by highlighting nodes that have the highest number of neighbors separated by the minimal shortest path | Highlights a protein that tends to be functionally capable of organizing discrete protein clusters or modules |
| Stressᵇ | Describes the number of shortest paths that pass through a node | Highlights the relevance of a protein as functionally capable of holding together communicating nodes |
| Betweennessᵇ | Describes, for each pair of nodes, the number of shortest paths that pass through a specific node | Highlights the relevance of a protein as functionally capable of holding together communicating nodes |
| Bridgingᵇ | Describes the neighborhood of nodes by highlighting nodes with a high number of high-degree neighbors | Highlights a protein possibly bringing sets of regulatory proteins into communication |
| Eigenvectorᵇ | Describes a sort of weighted degree, where not only the number of neighbors matters but also the eigenvector centrality of the neighbors themselves | Highlights a protein interacting with several important proteins, suggesting a central super-regulatory role or a critical target of a regulatory pathway |
| Edge betweennessᶜ | Describes, for each pair of nodes, the shortest paths that pass through a specific edge | Highlights the relevance of the interaction as capable of organizing regulatory processes |
For each centrality, the topological and biological meanings are described. Superscript a indicates a network property, b a node property, and c an edge property
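Several of the simpler quantities in the table can be computed with breadth-first search alone, as the following sketch shows. It uses plain graph-theoretic definitions on a hypothetical, connected, unweighted toy network; CentiScaPe may normalize or invert some of these values differently, so the numbers are not meant to reproduce the plugin's output.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical toy network: a five-node chain A-B-C-D-E with a branch C-F.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "F")]
adj = build_adjacency(edges)
dist = {node: bfs_distances(adj, node) for node in adj}

# Node properties (assumes the network is connected).
degree = {n: len(adj[n]) for n in adj}
eccentricity = {n: max(dist[n].values()) for n in adj}
closeness = {n: 1 / sum(dist[n].values()) for n in adj}  # unnormalized

# Network properties.
diameter = max(eccentricity.values())
ordered_pairs = len(adj) * (len(adj) - 1)
average_distance = sum(sum(d.values()) for d in dist.values()) / ordered_pairs

print("degree:", degree)
print("eccentricity:", eccentricity)
print("diameter:", diameter)
print("average distance:", round(average_distance, 3))
```

In this toy network node C has the highest degree and the smallest eccentricity, which matches the intuition in the table that such a node reaches the rest of the network most easily.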
Fig. 8 Example of topological, functional and disease modules that do not fully overlap. The green nodes indicate a topological module, the blue nodes indicate a functional module, while the yellow nodes indicate a disease module
Fig. 9 Procedure used to identify/predict modules in biological networks. The network structure is used to identify groups of highly connected nodes by a graph clustering algorithm, while the GO annotations are used to improve the accuracy of the cluster prediction. The final result is a set of clusters of highly connected nodes related to significantly enriched functions/processes, which thus underlie the emergent phenotypes
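The two ingredients of this procedure, a topological grouping and a functional annotation check, can be sketched in a few lines. The sketch makes strong simplifying assumptions: connected components stand in for a real graph clustering algorithm, a one-sided hypergeometric test stands in for GO enrichment analysis, and the protein identifiers and the annotated set are hypothetical. It also applies the annotation test after clustering rather than using it to refine the clusters.

```python
from math import comb


def connected_components(adj):
    """Group nodes into connected components (a crude stand-in for clustering)."""
    seen, modules = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, module = [start], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(adj[node] - module)
        seen |= module
        modules.append(module)
    return modules


def enrichment_p(module, annotated, population):
    """P(at least k annotated nodes in the module) under the hypergeometric model."""
    total = len(population)
    hits_total = len(annotated & population)
    size = len(module)
    k = len(module & annotated)
    upper = min(hits_total, size)
    tail = sum(
        comb(hits_total, i) * comb(total - hits_total, size - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(total, size)


# Hypothetical network with two separate groups of four proteins each.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p3"),
         ("p5", "p6"), ("p6", "p7"), ("p7", "p8")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Hypothetical set of proteins sharing one functional annotation term.
annotated = {"p1", "p2", "p3", "p4"}
population = set(adj)

modules = connected_components(adj)
for module in modules:
    p_value = enrichment_p(module, annotated, population)
    print(sorted(module), "p =", round(p_value, 4))
```

Here the first module contains all four annotated proteins and receives a small p-value, while the second contains none and is not enriched. In practice many terms and modules are tested, so a multiple-testing correction would be needed.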