Ilaria Granata, Ichcha Manipur, Maurizio Giordano, Lucia Maddalena, Mario Rosario Guarracino.
Abstract
Studies about the metabolic alterations during tumorigenesis have increased our knowledge of the underlying mechanisms and consequences, which are important for diagnostic and therapeutic investigations. In this scenario and in the era of systems biology, metabolic networks have become a powerful tool to unravel the complexity of the cancer metabolic machinery and the heterogeneity of this disease. Here, we present TumorMet, a repository of tumor metabolic networks extracted from context-specific Genome-Scale Metabolic Models, as a benchmark for graph machine learning algorithms and network analyses. This repository has an extended scope for use in graph classification, clustering, community detection, and graph embedding studies. Along with the data, we developed and provided Met2Graph, an R package for creating three different types of metabolic graphs, depending on the desired nodes and edges: Metabolites-, Enzymes-, and Reactions-based graphs. This package allows the easy generation of datasets for downstream analysis.
Year: 2022 PMID: 36207341 PMCID: PMC9547001 DOI: 10.1038/s41597-022-01702-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Overview of the metabolic network construction. The context-specific GSMs used in this study derive from the human generic GSM through the integration of tissue-specific multi-omics data (tissue-specific GSMs from Human Metabolic Atlas) or through the integration of TCGA transcriptomics data (PDGSMMs from Biomodels). The context-specific GSMs, which carry information about biochemical reactions, are the input for creating the context-specific metabolic networks of the TumorMet repository. Metabolites-based_tissue networks are generated by integrating TCGA gene/enzyme-expression data into the tissue-specific GSMs to weight the edges, which represent enzymes connecting two metabolites. Networks of different patients have the same structure but different edge weights, depending on the patient expression profile. Enzymes-, Reactions-, and Metabolites-based_PDGSMMs networks are created from PDGSMMs and have either enzymes/reactions as nodes connected by metabolites, or metabolites as nodes connected by enzymes. Networks of different patients have different structures and no weights.
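The Metabolites-based construction described in the caption can be illustrated with a toy example. This is only a sketch and not the Met2Graph implementation: the reaction list, enzyme names, and expression values are hypothetical, and the weighting rule (edge weight equals the expression of the connecting enzyme, summed when several enzymes link the same pair) is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical toy reactions: (substrates, products, enzyme). Not TumorMet data.
REACTIONS = [
    (["glucose"], ["g6p"], "HK1"),
    (["g6p"], ["f6p"], "GPI"),
    (["f6p"], ["f16bp"], "PFKM"),
]


def metabolite_graph(reactions, expression):
    """Return {(substrate, product): weight} with metabolites as nodes.

    Each enzyme links every substrate of its reaction to every product.
    Assumption: the weight is the enzyme's expression value; when several
    enzymes link the same pair, their values are summed.
    """
    edges = defaultdict(float)
    for substrates, products, enzyme in reactions:
        for s in substrates:
            for p in products:
                edges[(s, p)] += expression.get(enzyme, 0.0)
    return dict(edges)


# Two hypothetical patients: same structure, different edge weights.
patient_a = metabolite_graph(REACTIONS, {"HK1": 5.0, "GPI": 2.5, "PFKM": 1.0})
patient_b = metabolite_graph(REACTIONS, {"HK1": 0.5, "GPI": 4.0, "PFKM": 3.0})
```

Both patients get the same edge set, which mirrors the statement that tissue-derived networks share one structure and differ only in their weights.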
Properties of the Metabolites-based networks derived from tissue models.
| | Kidney | Lung | Brain | Breast | Ovary | Prostate |
|---|---|---|---|---|---|---|
| # Graphs | 928 | 1135 | 702 | 1217 | 379 | 551 |
| # Vertices | 4034 | 3990 | 3922 | 3394 | 3827 | 3939 |
| # Edges | 9210 | 9058 | 8914 | 6548 | 8533 | 8747 |
| Edge density | 0.000566 | 0.00056 | 0.00058 | 0.00057 | 0.00058 | 0.00056 |
| Avg. network degree | 4.56 | 4.54 | 4.54 | 3.86 | 4.46 | 4.44 |
| Edge weights | √ | √ | √ | √ | √ | √ |
| Assortativity degree | −0.038 | −0.035 | −0.034 | −0.049 | −0.027 | −0.03 |
| Global transitivity | 0.12 | 0.13 | 0.13 | 0.053 | 0.135 | 0.132 |
| Avg. local transitivity | 0.14 | 0.14 | 0.15 | 0.13 | 0.14 | 0.15 |
| Minimum diameter | 134.75 | 134.15 | 143.41 | 141.43 | 131.15 | 146.73 |
| Maximum diameter | 243.08 | 206.47 | 200.47 | 236.41 | 188.13 | 225.4 |
For each tissue dataset (columns), we report the number of graphs (first row) and the topological properties of the corresponding networks: number of vertices and edges, edge density, average network degree, presence of edge weights, assortativity degree, global transitivity, average local transitivity, and minimum and maximum diameter (second through eleventh rows). Within each tissue dataset, all Metabolites-based networks share the same structure, and thus the same topological properties, because they derive from the same tissue metabolic model personalized with gene expression values.
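Two of the reported properties can be recomputed from the vertex and edge counts alone. The sketch below assumes the directed-graph density E / (V (V − 1)) and the mean total degree 2E / V; these conventions are inferred from the numbers in the table, not stated in the source, and the results match the reported values only to within rounding.

```python
def directed_density(n_vertices, n_edges):
    """Edges divided by the number of ordered vertex pairs."""
    return n_edges / (n_vertices * (n_vertices - 1))


def average_degree(n_vertices, n_edges):
    """Mean total degree: each edge contributes to two endpoints."""
    return 2 * n_edges / n_vertices


# Vertex and edge counts copied from the table above.
COUNTS = {"Kidney": (4034, 9210), "Breast": (3394, 6548)}

for tissue, (v, e) in COUNTS.items():
    print(tissue, round(directed_density(v, e), 6), round(average_degree(v, e), 2))
```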
For each tissue dataset (columns) of the Metabolites- (a), Enzymes- (b), and Reactions-based_PDGSMMs (c) networks, we report the number of graphs (first row) and the topological properties of the corresponding networks: number of vertices and edges, edge density, average network degree, presence of edge weights, assortativity degree, global transitivity, average local transitivity, and minimum and maximum diameter (second through eleventh rows).
| | Kidney | Lung | Brain | Breast | Ovary | Prostate |
|---|---|---|---|---|---|---|
| (a) Metabolites-based_PDGSMMs | | | | | | |
| # Graphs | 737 | 829 | 138 | 920 | 295 | 470 |
| # Vertices | 2679.05 ± 316.11 | 2619.5 ± 310.49 | 2634.49 ± 277.2 | 2576 ± 303.92 | 2576.93 ± 307.47 | 2676.14 ± 300.88 |
| # Edges | 6121.64 ± 839.57 | 6008.53 ± 908.15 | 6074.34 ± 783.77 | 5870.26 ± 841.16 | 5729.2 ± 837.64 | 5799.38 ± 769.08 |
| Edge density | 0.00086 ± 0.000009 | 0.0009 ± 0.0000009 | 0.0009 ± 0.0000009 | 0.0009 ± 0.0001 | 0.00087 ± 0.0001 | 0.0008 ± 0.000009 |
| Avg. network degree | 4.56 ± 0.23 | 4.57 ± 0.3 | 4.6 ± 0.28 | 4.54 ± 0.28 | 4.44 ± 0.29 | 4.33 ± 0.27 |
| Edge weights | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Assortativity degree | −0.01 ± 0.02 | 0.006 ± 0.029 | −0.004 ± 0.033 | 0.012 ± 0.031 | −0.008 ± 0.034 | −0.018 ± 0.03 |
| Global transitivity | 0.16 ± 0.02 | 0.17 ± 0.026 | 0.16 ± 0.03 | 0.17 ± 0.029 | 0.14 ± 0.035 | 0.12 ± 0.035 |
| Avg. local transitivity | 0.12 ± 0.02 | 0.12 ± 0.02 | 0.12 ± 0.02 | 0.12 ± 0.02 | 0.11 ± 0.021 | 0.11 ± 0.02 |
| Minimum diameter | 134.8 | 134.01 | 140.76 | 118.44 | 120.84 | 145.97 |
| Maximum diameter | 302.7 | 241.72 | 217.06 | 255.8 | 211.24 | 282.75 |
| (b) Enzymes-based_PDGSMMs | | | | | | |
| # Graphs | 737 | 829 | 138 | 920 | 295 | 470 |
| # Vertices | 1941.256 ± 300.92 | 1859.76 ± 317.84 | 1911.35 ± 274.35 | 1846.58 ± 305.48 | 1859.98 ± 309.68 | 1934.25 ± 266.7 |
| # Edges | 63906.79 ± 18916.49 | 59341.79 ± 20947.88 | 63485.08 ± 17933.19 | 59530.08 ± 19744.67 | 59316.15 ± 202888.06 | 63922 ± 16898.25 |
| Edge density | 0.016 ± 0.002 | 0.016 ± 0.002 | 0.07 ± 0.002 | 0.016 ± 0.002 | 0.016 ± 0.002 | 0.016 ± 0.002 |
| Avg. network degree | 63.8 ± 14.16 | 61.23 ± 16.3 | 64.63 ± 14.07 | 62.15 ± 15.67 | 61.39 ± 15.86 | 64.49 ± 13.05 |
| Edge weights | x | x | x | x | x | x |
| Assortativity degree | 0.25 ± 0.04 | 0.25 ± 0.04 | 0.25 ± 0.04 | 0.26 ± 0.04 | 0.24 ± 0.046 | 0.25 ± 0.038 |
| Global transitivity | 0.18 ± 0.04 | 0.19 ± 0.046 | 0.18 ± 0.039 | 0.19 ± 0.046 | 0.19 ± 0.047 | 0.18 ± 0.035 |
| Avg. local transitivity | 0.29 ± 0.018 | 0.3 ± 0.02 | 0.03 ± 0.018 | 0.3 ± 0.02 | 0.298 ± 0.02 | 0.29 ± 0.015 |
| Minimum diameter | 14 | 13 | 13 | 14 | 13 | 14 |
| Maximum diameter | 34 | 36 | 28 | 33 | 35 | 30 |
| (c) Reactions-based_PDGSMMs | | | | | | |
| # Graphs | 737 | 829 | 138 | 920 | 295 | 470 |
| # Vertices | 3578.24 ± 595.037 | 3511.46 ± 637.32 | 3560.4 ± 543.41 | 3431.28 ± 591.12 | 3327.51 ± 582.49 | 3398 ± 527.44 |
| # Edges | 54823.89 ± 16130.9 | 60808.68 ± 19146.22 | 60137 ± 17749 | 59467 ± 18330 | 49776.08 ± 17158.5 | 46345.11 ± 14345.88 |
| Edge density | 0.0043 ± 0.0008 | 0.0048 ± 0.0007 | 0.004 ± 0.00085 | 0.0049 ± 0.00086 | 0.004 ± 0.00092 | 0.004 ± 0.0008 |
| Avg. network degree | 30.2 ± 6.13 | 33.74 ± 7.05 | 33.17 ± 7.1 | 33.9 ± 7.17 | 29.3 ± 6.96 | 26.91 ± 5.79 |
| Edge weights | x | x | x | x | x | x |
| Assortativity degree | 0.027 ± 0.016 | 0.052 ± 0.18 | 0.023 ± 0.15 | 0.048 ± 0.17 | 0.065 ± 0.2 | 0.06 ± 0.17 |
| Global transitivity | 0.028 ± 0.015 | 0.038 ± 0.017 | 0.037 ± 0.016 | 0.038 ± 0.017 | 0.028 ± 0.016 | 0.024 ± 0.013 |
| Avg. local transitivity | 0.038 ± 0.006 | 0.04 ± 0.006 | 0.043 ± 0.0059 | 0.04 ± 0.006 | 0.04 ± 0.006 | 0.04 ± 0.006 |
| Minimum diameter | 48 | 48 | 48 | 48 | 48 | 49 |
| Maximum diameter | 103 | 113 | 97 | 104 | 102 | 101 |
Observe that each network derived from PDGSMMs and corresponding to each patient sample has a different structure since the starting models are patient-specific (see Paragraphs on Metabolites-, Enzymes-, and Reactions-based PDGSMM networks). Therefore, values for network properties are reported as average ± standard deviation across all the networks of each dataset.
Properties of the Simplified Networks. See the caption of Table 1 for details.
| | Simpl-Kidney-441 | Simpl-Kidney-1034 | Simpl-Lung-312 | Simpl-Lung-1017 |
|---|---|---|---|---|
| # Graphs | 299 | 299 | 337 | 337 |
| # Vertices | 441 | 1034 | 312 | 1017 |
| # Edges | 1585 | 3226 | 1090 | 3102 |
| Edge density | 0.0163 | 0.006 | 0.022 | 0.006 |
| Avg. network degree | 7.18 | 6.24 | 6.98 | 6.1 |
| Edge weights | ✓ | ✓ | ✓ | ✓ |
| Assortativity degree | −0.22 | −0.13 | −0.11 | −0.12 |
| Global transitivity | 0.3 | 0.21 | 0.45 | 0.23 |
| Avg. local transitivity | 0.23 | 0.22 | 0.29 | 0.22 |
| Minimum diameter | 15.52 | 125.99 | 16.88 | 79.7 |
| Maximum diameter | 39.37 | 455.36 | 32.14 | 267.6 |
Classes per dataset for usage validation of Metabolites-based networks through classification. Only primary tumors have been selected.
| Kidney | | Lung | | Brain | |
|---|---|---|---|---|---|
| Cases | 822 | Cases | 1025 | Cases | 666 |
| Kidney Renal Papillary cell carcinoma (KIRP) | 288 | Adenocarcinoma (LUAD) | 524 | Glioblastoma multiforme (GBM) | 155 |
| Kidney Renal Clear cell carcinoma (KIRC) | 534 | Squamous cell carcinoma (LUSC) | 501 | Lower grade glioma (LGG) | 511 |
| Breast | | Ovary | | Prostate | |
| Cases | 1085 | Cases | 290 | Cases | 497 |
| | | High-grade serous ovarian cancer subtypes | | Gleason score | |
| Basal-like | 192 | Differentiated | 75 | Pattern 3 | 199 |
| HER2-enriched | 82 | Mesenchymal | 75 | Pattern 4 | 249 |
| Luminal A | 564 | Proliferative | 75 | Pattern 5 | 49 |
| Luminal B | 207 | Immunoreactive | 65 | | |
| Normal-like | 40 | | | | |
Networks provided in the TumorMet repository.
| Tumor tissue | Type of network | Data used to build the networks | Number of networks |
|---|---|---|---|
| Kidney | Metabolites-based_tissue | • Tissue-Specific Model - Kidney • TCGA-KIRC & TCGA-KIRP GE | 928: 607 TCGA-KIRC, 321 TCGA-KIRP |
| | Metabolites-, Enzymes-, Reactions-based_PDGSMMs | • PDGSMMs from TCGA-KIRC & TCGA-KIRP • TCGA-KIRC & TCGA-KIRP GE (only for Metabolites-based) | 737: 484 TCGA-KIRC, 253 TCGA-KIRP |
| | Simplified | • Tissue-Specific Model - Kidney • TCGA-KIRC & TCGA-KIRP GE | 299 for each simplification: 193 TCGA-KIRC, 106 TCGA-KIRP |
| Lung | Metabolites-based_tissue | • Tissue-Specific Model - Lung • TCGA-LUAD & TCGA-LUSC GE | 1135: 585 TCGA-LUAD, 550 TCGA-LUSC |
| | Metabolites-, Enzymes-, Reactions-based_PDGSMMs | • PDGSMMs from TCGA-LUAD & TCGA-LUSC • TCGA-LUAD & TCGA-LUSC GE (only for Metabolites-based) | 829: 429 TCGA-LUAD, 400 TCGA-LUSC |
| | Simplified | • Tissue-Specific Model - Lung • TCGA-LUAD & TCGA-LUSC GE | 337 for each simplification: 174 TCGA-LUAD, 163 TCGA-LUSC |
| Brain | Metabolites-based_tissue | • Tissue-Specific Model - Brain • TCGA-GBM & TCGA-LGG GE | 702: 173 TCGA-GBM, 529 TCGA-LGG |
| | Metabolites-, Enzymes-, Reactions-based_PDGSMMs | • PDGSMMs from TCGA-GBM • TCGA-GBM GE (only for Metabolites-based) | 138 TCGA-GBM |
| Breast | Metabolites-based_tissue | • INIT Cancer Model - Breast • TCGA-BRCA GE | 1217 TCGA-BRCA |
| | Metabolites-, Enzymes-, Reactions-based_PDGSMMs | • PDGSMMs from TCGA-BRCA • TCGA-BRCA GE (only for Metabolites-based) | 920 TCGA-BRCA |
| Ovary | Metabolites-based_tissue | • Tissue-Specific Model - Ovary • TCGA-OV GE | 379 TCGA-OV |
| | Metabolites-, Enzymes-, Reactions-based_PDGSMMs | • PDGSMMs from TCGA-OV • TCGA-OV GE (only for Metabolites-based) | 295 TCGA-OV |
| Prostate | Metabolites-based_tissue | • Tissue-Specific Model - Prostate • TCGA-PRAD GE | 551 TCGA-PRAD |
| | Metabolites-, Enzymes-, Reactions-based_PDGSMMs | • PDGSMMs from TCGA-PRAD • TCGA-PRAD GE (only for Metabolites-based) | 470 TCGA-PRAD |
For each tumor tissue: the type of networks, the data used to generate them (metabolic models and Gene Expression (GE) data from TCGA projects), and the number of networks, subdivided by TCGA project ID where applicable. Among the PDGSMM-derived networks, GE data have been used to weight the edges only for the Metabolites-based_PDGSMMs networks.
Classes of PDGSMMs used to accomplish the classification task of Kidney and Lung PDGSMMs derived networks.
| | Kidney | Lung |
|---|---|---|
| Cases | 737 | 829 |
| Classes | KIRC 484 | LUAD 429 |
| | KIRP 253 | LUSC 400 |
Fig. 2 Scheme of the content of the TumorMet repository.
Fig. 3 t-SNE representations of the Gram matrices of the test sets of the Kidney (a), Lung (b), Brain (c), and Ovary (d) Metabolites-based_tissue datasets. The TSNE function of the sklearn.manifold library has been used to generate the plots.
Fig. 4 t-SNE representations of the Gram matrices of the test sets of the Breast_4cl (a), Breast_5cl (b), Prostate1 (c), and Prostate2 (d) Metabolites-based_tissue datasets. The TSNE function of the sklearn.manifold library has been used to generate the plots.
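A Gram (kernel) matrix stores similarities, whereas t-SNE is usually described in terms of distances. One common way to bridge the two is the kernel-induced distance d_ij = sqrt(k_ii + k_jj − 2 k_ij). The sketch below shows that conversion on a hypothetical 3×3 matrix; the captions do not say whether the authors applied this step or passed the Gram matrices to TSNE directly, so treat it as an assumption.

```python
import math


def gram_to_distance(gram):
    """Kernel-induced distance: d_ij = sqrt(k_ii + k_jj - 2 * k_ij)."""
    n = len(gram)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = gram[i][i] + gram[j][j] - 2.0 * gram[i][j]
            dist[i][j] = math.sqrt(max(sq, 0.0))  # clamp rounding noise
    return dist


# Hypothetical Gram matrix of a linear kernel on the points 0, 1, and 3
# (k_ij = x_i * x_j), so the distances should equal |x_i - x_j|.
GRAM = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0],
    [0.0, 3.0, 9.0],
]
DIST = gram_to_distance(GRAM)
```

The resulting matrix could then be given to `sklearn.manifold.TSNE` with `metric="precomputed"` and `init="random"`, since PCA initialization is not available for precomputed distances.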
Classification scores on weighted and unweighted Metabolites-based_PDGSMMs networks of Kidney samples.
| | Metabolites-based_PDGSMMs Kidney | |
|---|---|---|
| # Classes | 2 | |
| # Samples per class | 484/253 | |
| Accuracy avg % | 83.45 ± 4.58 | 85.48 ± 3.12 |
| Precision avg % | 82.28 ± 5.13 | 84.43 ± 3.71 |
| Recall avg % | 80.87 ± 4.82 | 82.99 ± 3.36 |
| F1 avg % | 81.32 ± 5.08 | 83.59 ± 3.47 |
| MCC avg | 0.63 ± 0.10 | 0.67 ± 0.07 |
Fig. 5 Classification scores on the Metabolites-based_tissue datasets. The box-plots show the classification scores obtained from the 10 iterations of the evaluation procedure on the training sets of the six Metabolites-based_tissue datasets. (a–c) report Accuracy, Precision, Recall, and F1 as percentages; (d) reports MCC values.
Classification scores on Enzymes- and Reactions-based_PDGSMMs Kidney and Lung datasets.
| | Kidney | | Lung | |
|---|---|---|---|---|
| | Enzymes-based_PDGSMMs | Reactions-based_PDGSMMs | Enzymes-based_PDGSMMs | Reactions-based_PDGSMMs |
| # Classes | 2 | 2 | 2 | 2 |
| # Samples per class | 484/253 | 484/253 | 429/400 | 429/400 |
| Accuracy avg % | 78.97 ± 5.15 | 83.44 ± 4.32 | 78.17 ± 2.89 | 77.93 ± 2.62 |
| Precision avg % | 77.35 ± 6.00 | 81.99 ± 4.91 | 78.57 ± 2.69 | 78.36 ± 2.44 |
| Recall avg % | 75.72 ± 5.36 | 81.16 ± 5.00 | 78.00 ± 3.04 | 77.83 ± 2.75 |
| F1 avg % | 76.18 ± 5.42 | 81.39 ± 4.94 | 78.05 ± 3.04 | 77.76 ± 2.75 |
| MCC avg | 0.53 ± 0.11 | 0.63 ± 0.10 | 0.57 ± 0.06 | 0.56 ± 0.05 |
Classification scores on Metabolites-based_tissue datasets.
| | Kidney | Lung | Brain | Breast_4cl | Breast_5cl | Ovary | Prostate1 | Prostate2 |
|---|---|---|---|---|---|---|---|---|
| # Classes | 2 | 2 | 2 | 4 | 5 | 4 | 2 | 2 |
| # Samples per class | 159/90 | 158/150 | 109/358 | 135/58/395/145 | 135/58/395/145/28 | 53/46/53/53 | 140/172 | 140/209 |
| Accuracy avg % | 92.80 ± 4.87 | 94.87 ± 3.68 | 95.83 ± 2.65 | 84.91 ± 4.15 | 81.02 ± 4.29 | 79.78 ± 7.79 | 71.83 ± 8.17 | 75.08 ± 6.17 |
| Precision avg % | 91.97 ± 5.5 | 94.94 ± 3.85 | 93.63 ± 4.69 | 81.60 ± 4.99 | 72.30 ± 6.6 | 79.83 ± 8.57 | 72.23 ± 8.32 | 74.84 ± 6.11 |
| Recall avg % | 92.99 ± 5.1 | 94.95 ± 3.54 | 95.23 ± 3.57 | 85.93 ± 5.36 | 78.55 ± 7.3 | 79.93 ± 8.86 | 72.25 ± 8.22 | 75.99 ± 6.39 |
| F1 avg % | 92.12 ± 5.3 | 94.74 ± 3.83 | 94.15 ± 3.76 | 82.66 ± 4.76 | 73.36 ± 6.11 | 78.09 ± 8.83 | 71.14 ± 8.41 | 74.31 ± 6.35 |
| MCC avg | 0.85 ± 0.1 | 0.90 ± 0.07 | 0.89 ± 0.07 | 0.77 ± 0.06 | 0.73 ± 0.06 | 0.73 ± 0.1 | 0.44 ± 0.16 | 0.51 ± 0.12 |
| # Samples per class | 375/198 | 366/351 | 46/511 | 57/24/169/62 | 57/24/169/62/12 | 22/19/22/22 | 59/77 | 59/89 |
| Accuracy % | 97.03 | 93.72 | 91.00 | 85.26 | 83.64 | 70.59 | 73.53 | 73.00 |
| Precision % | 96.40 | 93.72 | 85.92 | 80.62 | 74.46 | 73.49 | 73.63 | 72.60 |
| Recall % | 97.13 | 93.72 | 93.36 | 83.99 | 81.83 | 70.33 | 74.05 | 73.53 |
| F1% | 96.75 | 93.72 | 88.56 | 82.02 | 77.48 | 71.05 | 73.44 | 72.57 |
| MCC | 0.94 | 0.87 | 0.79 | 0.77 | 0.76 | 0.61 | 0.48 | 0.46 |
Top: CV on training sets; Bottom: Validation on test sets.
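For readers less familiar with the reported scores, the sketch below computes accuracy, precision, recall, F1, and the Matthews correlation coefficient (MCC) from a binary confusion matrix. The counts are hypothetical and not taken from the TumorMet experiments, and the source does not state how scores were averaged across classes in the multi-class datasets, so only the two-class case is covered.

```python
import math


def binary_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and MCC from a 2x2 confusion matrix.

    Assumes at least one predicted positive and one actual positive,
    so that precision and recall are defined.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
SCORES = binary_scores(tp=90, fp=10, fn=5, tn=45)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it informative for imbalanced class splits such as 484/253.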
| Measurement(s) | gene expression, metabolic relationships |
| Technology Type(s) | Genome Scale Metabolic Models; Computational network biology |
| Sample Characteristic - Organism | Homo sapiens |