Surabhi Jagtap, Aurélie Pirayre, Frédérique Bidard, Laurent Duval, Fragkiskos D Malliaros.
Abstract
BACKGROUND: Gene expression is regulated at different molecular levels, including chromatin accessibility, transcription, RNA maturation, and transport. These regulatory mechanisms are strongly connected to cellular metabolism. To study the cellular system and its functioning, omics data can be generated at each molecular level and efficiently integrated. Here, we propose BRANEnet, a novel multi-omics integration framework for multilayer heterogeneous networks. BRANEnet is an expressive, scalable, and versatile method to learn node embeddings, leveraging random walk information within a matrix factorization framework. Our goal is to efficiently integrate multi-omics data to study different regulatory aspects of multilayered processes that occur in organisms. We evaluate our framework using multi-omics data of Saccharomyces cerevisiae, a well-studied yeast model organism.
Keywords: Biological network integration; Graph representation learning; Multi-omics data; Multilayer network; Regulatory network inference
Year: 2022 PMID: 36245002 PMCID: PMC9575224 DOI: 10.1186/s12859-022-04955-w
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Experimental design and BRANEnet processing workflow. The setup of the wet-lab experiments (steps 1, 2, and 3) is taken from the data descriptor article [28]. Steps 4, 5, and 6 perform dataset collection and preprocessing before integration. Step 7 learns embeddings using BRANEnet. Steps 8–10 are downstream bioinformatics tasks
Fig. 2 Overview of BRANEnet. Inputs (X) and (Y) are on the left. A multilayer network is composed of intra- and inter-omics relationships. The random walk-based PPMI matrix is computed for this network and then factorized to obtain the final embeddings
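The pipeline in Fig. 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a NetMF-style closed form for the random walk-based PPMI matrix, a dense symmetric adjacency matrix covering the whole multilayer network, and a truncated SVD as the factorization step. The names `random_walk_ppmi`, `embed`, `window`, and `dim`, as well as the toy graph, are hypothetical.

```python
import numpy as np


def random_walk_ppmi(adj, window=3):
    """PPMI matrix built from random-walk transition probabilities.

    Assumed form: M = (vol / window) * (P + P^2 + ... + P^window) * D^-1,
    followed by max(log M, 0), where P = D^-1 A.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one edge")
    volume = degree.sum()
    transition = adj / degree[:, None]  # row-stochastic P = D^-1 A
    power = np.eye(adj.shape[0])
    walk_sum = np.zeros_like(adj)
    for _ in range(window):
        power = power @ transition  # P^r for r = 1..window
        walk_sum += power
    pmi_arg = (volume / window) * walk_sum / degree[None, :]
    with np.errstate(divide="ignore"):  # log(0) becomes -inf, clipped below
        pmi = np.log(pmi_arg)
    return np.maximum(pmi, 0.0)


def embed(ppmi, dim=2):
    """Rank-`dim` embeddings from a truncated SVD: U_d * sqrt(S_d)."""
    u, s, _ = np.linalg.svd(ppmi)
    return u[:, :dim] * np.sqrt(s[:dim])


# Toy undirected graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

ppmi = random_walk_ppmi(adjacency, window=3)
embeddings = embed(ppmi, dim=2)
print(embeddings.shape)
```

Each row of `embeddings` is the vector for one node. A dense SVD is only practical for small graphs; a sparse truncated solver would be the natural substitute at scale.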
Fig. 3 TF-target prediction. Comparative performance of BRANEnet and baseline methods, measured by AUPR scores for both the average and weighted L2 coordinate-wise operations. The error bars show the standard deviation of AUPR scores over 10 runs
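The two coordinate-wise operations named in the Fig. 3 caption turn a pair of node embeddings into one edge feature vector. The sketch below assumes the definitions commonly used in the node embedding literature (average: mean of the two vectors; weighted L2: squared coordinate-wise difference) and uses step-wise average precision as a stand-in for AUPR. The function names and the toy scores are hypothetical, and the classifier that would map edge features to scores is left out.

```python
def average_op(u, v):
    """Average operator: coordinate-wise mean of two embeddings."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def weighted_l2_op(u, v):
    """Weighted L2 operator: coordinate-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


tf_vec = [1.0, 3.0]      # hypothetical embedding of a TF
target_vec = [3.0, 0.0]  # hypothetical embedding of a candidate target
print(average_op(tf_vec, target_vec))
print(weighted_l2_op(tf_vec, target_vec))

# Hypothetical classifier scores and ground-truth labels for four candidate edges.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```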
Fig. 4 a Precision@k for the top 500 edges compared to baseline methods. The x-axis represents the top k edges and the y-axis represents precision@k. b MCC@threshold compared to baseline methods. The x-axis and y-axis represent the threshold and MCC@threshold, respectively
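The two metrics plotted in Fig. 4 can be computed along the following lines. This is a sketch under stated assumptions: edges are ranked by a predicted score, an edge counts as predicted when its score is at or above the threshold, and MCC is defined as 0 when its denominator vanishes. The function names and toy values are hypothetical.

```python
import math


def precision_at_k(scored_edges, true_edges, k):
    """Fraction of the k highest-scoring edges that are in the reference set."""
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for edge, _ in ranked if edge in true_edges)
    return hits / len(ranked)


def mcc_at_threshold(scores, labels, threshold):
    """Matthews correlation coefficient after thresholding the scores."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical scored edges and reference interactions.
scored = [(("TF1", "G1"), 0.9), (("TF1", "G2"), 0.8),
          (("TF2", "G1"), 0.4), (("TF2", "G3"), 0.2)]
reference = {("TF1", "G1"), ("TF2", "G1")}
print(precision_at_k(scored, reference, k=2))

scores = [score for _, score in scored]
labels = [1 if edge in reference else 0 for edge, _ in scored]
print(mcc_at_threshold(scores, labels, threshold=0.85))
```

Sweeping `k` or `threshold` over a range of values gives curves of the kind shown in the two panels.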
Fig. 5 ION visualization for yeast during time-dependent heat stress, inferred using BRANEnet. Node color, node shape, and edge color represent the information shown in the legend. The label size of each node is proportional to its degree in the inferred network
ION-based identification of potential biomarkers
| Name | Node degree | Function | References | Module |
|---|---|---|---|---|
| STI1 | 40 | Hsp90 cochaperone | [ | A |
| SSA4 | 39 | Heat shock protein | [ | A |
| TFS1 | 46 | Inhibitor of carboxypeptidase Y, Ras GAP | [ | A |
| YMR090W | 50 | Unknown function | [ | A |
| SSE2 | 36 | Hsp110 family member | [ | A |
| IDH2 | 37 | Oxidative decarboxylation of isocitrate | [ | A |
| SSA1 | 40 | ATPase | [ | A |
| STE2 | 36 | Receptor for | [ | A |
| HSP104 | 33 | Disaggregase | [ | A |
| STI1 | 41 | Hsp90 cochaperone | [ | A |
| MET6 | 31 | Cobalamin-independent methionine synthase | [ | A |
| STR3 | 30 | Peroxisomal cystathionine beta-lyase | [ | A |
| RTC3 | 28 | Unknown function | [ | A |
| MSC1 | 27 | Unknown function | [ | A |
| PNC1 | 27 | Nicotinamidase acid | [ | A |
| GSP2 | 30 | GTP binding protein | [ | A |
| GRE3 | 31 | Aldose reductase | [ | A |
| YLR030W | 31 | Unknown function | – | B |
| SOL4 | 32 | 6-phosphogluconolactonase | [ | A |
| HSP12 | 28 | Heat shock protein | [ | A |
| IDH1 | 25 | Oxidative decarboxylation of isocitrate | [ | A |
For each potential biomarker of the heat stress response in yeast, the table provides its name, whether it is over- or under-expressed, its node degree in the ION, its function, cross-references, and a comparison of BRANEnet module information with external studies
Fig. 6 Functional enrichment of modules A and B. The y-axis lists the significantly enriched terms, while the x-axis shows their significance value (p value). Circle colors indicate the type of functional annotation: biological process (BP) in pink, molecular function (MF) in blue, and KEGG pathway in green. The size of each circle represents the number of differentially expressed genes/TFs
Fig. 7 a Precision@k for the top 500 edges. The x-axis represents the top k edges and the y-axis represents precision@k. b MCC@threshold. The x-axis and y-axis represent the threshold and MCC@threshold, respectively
Parameter sensitivity analysis for TF-target prediction
| | d = 32 | d = 64 | d = 128 | d = 256 |
|---|---|---|---|---|
| Average | | | | |
| T = 1 | 0.700 | 0.880 | 0.880 | 0.870 |
| T = 2 | 0.790 | 0.820 | 0.850 | 0.860 |
| T = 3 | 0.830 | 0.850 | 0.870 | 0.870 |
| T = 4 | 0.780 | 0.810 | 0.850 | 0.860 |
| T = 5 | 0.820 | 0.840 | 0.860 | 0.870 |
| Weighted L2 | | | | |
| T = 1 | 0.980 | 0.982 | 0.983 | 0.983 |
| T = 2 | 0.916 | 0.945 | 0.966 | 0.968 |
| T = 3 | 0.956 | 0.967 | 0.979 | 0.979 |
| T = 4 | 0.852 | 0.938 | 0.966 | 0.968 |
| T = 5 | 0.952 | 0.968 | 0.981 | 0.981 |
Node embeddings are computed using d ∈ {32, 64, 128, 256} and T ∈ {1, 2, 3, 4, 5}. Performance is measured by computing the AUPR score for the average and weighted L2 coordinate-wise operations
Fig. 8 Parameter sensitivity analysis for ION inference. Node embeddings are computed for different parameter settings. Performance is measured by computing MCC at different threshold values. The x-axis represents the MCC score at the threshold given on the y-axis