Zahra Razaghi-Moghadam, Zoran Nikoloski.
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises a support vector machine to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds both when predicting target genes for individual transcription factors and when predicting the entire network. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach can accommodate other network-based representations of large-scale data, and can be readily extended to help characterise other cellular networks, including protein-protein and protein-metabolite interactions.
Year: 2020 PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1
Source DB: PubMed Journal: NPJ Syst Biol Appl ISSN: 2056-7189
Fig. 1 Visualisation of the steps in GRADIS.
(a) GRADIS requires expression data and a set of known transcription factor (TF)–gene (G) interactions as input. (b) The samples in the expression data are first clustered using k-means clustering, and the respective centroids are used to obtain informative and non-redundant data. (c) Features are then constructed from the scaled data sets obtained from the sample clustering in (b).
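The sample-reduction step in panel (b) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain NumPy version of Lloyd's k-means, and the function names `kmeans_centroids` and `reduce_samples`, the number of clusters `k`, the iteration cap and the random seed are all assumptions made for illustration.

```python
import numpy as np


def kmeans_centroids(samples, k, n_iter=100, seed=0):
    """Basic Lloyd's k-means; returns the k centroids of `samples`.

    samples: array of shape (n_samples, n_genes), one row per sample.
    k, n_iter and seed are illustrative choices, not values from the paper.
    """
    X = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def reduce_samples(expression, k, seed=0):
    """Replace the samples (columns) of a genes x samples matrix by k centroids."""
    expression = np.asarray(expression, dtype=float)
    return kmeans_centroids(expression.T, k, seed=seed).T


# Toy example: two genes measured in four samples forming two tight groups.
toy = np.array([[0.0, 0.1, 1.0, 0.9],
                [0.0, 0.0, 1.0, 1.0]])
reduced = reduce_samples(toy, k=2)  # genes x 2 matrix of centroid profiles
```

Each column of `reduced` is one centroid, so redundant samples collapse into a single representative expression profile per cluster.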
Fig. 2 Construction of a Euclidean-metric complete graph.
An example of expression profiles (a) of a transcription factor (TF) and a gene (G) represented in the unit square (b), and (c) the adjacency matrix of the Euclidean-metric complete graph obtained from (b). The feature for a TF–gene pair is given by vectorisation of the upper triangle of the matrix (excluding the diagonal, which is non-informative).
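The feature construction described in this caption can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `pair_feature` is hypothetical, and the sketch assumes that both profiles have already been scaled to [0, 1] so that each sample corresponds to one point in the unit square.

```python
import numpy as np


def pair_feature(tf_profile, gene_profile):
    """Distance-profile feature for one TF-gene pair.

    Each sample i yields the point (tf_profile[i], gene_profile[i]).
    The points are the nodes of a complete graph whose edge weights are
    Euclidean distances; the feature is the upper triangle of the
    adjacency (distance) matrix, diagonal excluded.
    Assumes both profiles are already scaled to [0, 1].
    """
    tf = np.asarray(tf_profile, dtype=float)
    gene = np.asarray(gene_profile, dtype=float)
    if tf.ndim != 1 or tf.shape != gene.shape:
        raise ValueError("profiles must be 1-D arrays of equal length")
    points = np.column_stack([tf, gene])            # shape (n, 2)
    diff = points[:, None, :] - points[None, :, :]  # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # (n, n) distance matrix
    upper = np.triu_indices(len(points), k=1)       # strictly above diagonal
    return dist[upper]


# Three samples give the points (0, 0), (1, 0) and (0, 1).
feature = pair_feature([0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
```

For n samples the feature has n(n-1)/2 entries; in the toy example the three entries are the distances 1, 1 and √2.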
Comparative analysis based on area under the ROC curve (AUC).
| Data | ARACNE | CLR | TIGRESS | mrnet | GENIE3 | iRafNet | Wisdom of crowds | SIRENE (average) | Expression-based SVM | GRADIS |
|---|---|---|---|---|---|---|---|---|---|---|
| DREAM4 | 0.56 | 0.71 | 0.50 | 0.69 | 0.77 | 0.5 | 0.82 | 0.54 | 0.81 (0.77–0.84) | 0.86 (0.80–0.92) |
| DREAM4 | 0.54 | 0.64 | 0.50 | 0.65 | 0.69 | 0.5 | 0.78 | 0.48 | 0.83 (0.79–0.87) | 0.85 (0.82–0.88) |
| DREAM4 | 0.56 | 0.71 | 0.52 | 0.72 | 0.73 | 0.5 | 0.79 | 0.5 | 0.72 (0.66–0.77) | 0.77 (0.72–0.82) |
| DREAM4 | 0.55 | 0.67 | 0.51 | 0.67 | 0.69 | 0.5 | 0.78 | 0.5 | 0.70 (0.67–0.73) | 0.76 (0.72–0.80) |
| DREAM4 | 0.58 | 0.68 | 0.51 | 0.52 | 0.76 | 0.5 | 0.80 | 0.48 | 0.71 (0.64–0.79) | 0.77 (0.71–0.82) |
| DREAM5 | 0.50 | 0.50 | 0.74 | 0.74 | 0.82 | – | 0.81 | 0.42 | 0.84 (0.83–0.85) | 0.85 (0.84–0.86) |
| DREAM5 | 0.51 | 0.59 | 0.59 | 0.59 | 0.69 | – | 0.69 | 0.41 | 0.87 (0.85–0.88) | 0.94 (0.93–0.94) |
| DREAM5 | 0.50 | 0.52 | 0.52 | 0.52 | 0.54 | – | 0.54 | 0.49 | 0.80 (0.79–0.81) | 0.96 (0.96–0.97) |
The performance of GRADIS is compared with that of unsupervised approaches (ARACNE, CLR, GENIE3, iRafNet, mrnet, TIGRESS), their combination based on wisdom of crowds, and two supervised approaches (SIRENE and an expression-based SVM classifier). Since the performance is based on the global (i.e., network-centric) approach, for SIRENE we report the average AUC over all TFs (for the local comparison, refer to ‘Methods’). The numbers in parentheses are confidence intervals (see ‘Methods’). The comparison includes the five synthetic data sets from the DREAM4 challenge as well as the one synthetic and the two real-world data sets from the DREAM5 challenge. Results from iRafNet are not provided for the DREAM5 data sets due to a lack of data on knockout experiments and protein–protein interactions.
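As a reminder of what the AUC values in the table measure, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction, with ties counted as one half. It is not the evaluation code used in the paper; the function name `roc_auc` and the toy labels and scores are assumptions for illustration.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for a known TF-gene interaction, 0 otherwise.
    scores: classifier scores, higher meaning "more likely an interaction".
    """
    y = np.asarray(labels).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```

Under this definition, a scorer that assigns the same score to every pair obtains an AUC of 0.5, and a perfect ranking obtains 1.0.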