Léon-Charles Tranchevent, Petr V Nazarov, Tony Kaoma, Georges P Schmartz, Arnaud Muller, Sang-Yoon Kim, Jagath C Rajapakse, Francisco Azuaje.
Abstract
BACKGROUND: One of the main current challenges in computational biology is to make sense of the huge amounts of multidimensional experimental data that are being produced. For instance, large cohorts of patients are often screened using different high-throughput technologies, effectively producing multiple patient-specific molecular profiles for hundreds or thousands of patients.
Keywords: Biological networks; Network topology; Network-based methods
Year: 2018 PMID: 29880025 PMCID: PMC5992838 DOI: 10.1186/s13062-018-0214-9
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Fig. 1 Workflow of our network-based method. The raw omics data are first processed into data matrices by applying dimensionality reduction. The selected omics features are then used to infer Patient Similarity Networks (PSN), from which topological features are extracted. These network topological features are then used to build classification models, with classes defined according to the binary clinical descriptors.
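The network inference and feature extraction steps of this workflow can be illustrated with a minimal Python sketch. It assumes Pearson correlation as the patient similarity measure, a fixed threshold of 0.5 for drawing edges, and degree centrality as the only topological feature. These choices, the function names (`pearson`, `build_psn`, `degree_centrality`), and the toy profiles are illustrative assumptions, not the settings used in the manuscript. The dimensionality reduction and classification steps are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sd_x * sd_y)


def build_psn(profiles, threshold=0.5):
    """Link two patients when their profile correlation reaches the threshold.

    profiles: dict mapping patient id -> list of selected omics features.
    Returns an undirected network as a dict of neighbour sets.
    """
    ids = list(profiles)
    adjacency = {p: set() for p in ids}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if pearson(profiles[p], profiles[q]) >= threshold:
                adjacency[p].add(q)
                adjacency[q].add(p)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other patients each patient is connected to."""
    n = len(adjacency)
    if n < 2:
        return {p: 0.0 for p in adjacency}
    return {p: len(nbrs) / (n - 1) for p, nbrs in adjacency.items()}


# Toy profiles (hypothetical values, four features per patient).
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.5],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [1.0, 1.5, 3.5, 4.0],
}
psn = build_psn(profiles)
features = degree_centrality(psn)
```

In this toy example, P1, P2 and P4 have positively correlated profiles and form a triangle, while P3 stays isolated. The resulting per-patient centrality values are the kind of topological feature that would then be passed to a classifier.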
Summary of the experiments described in the manuscript together with their global settings
| Tag | Cohort | Model integration | Feature sets | Data sources |
|---|---|---|---|---|
| Classical | Both | No | - | All |
| Topological | Both | Yes | All | All |
| Integrated | Both | Yes | All | All |
| Centrality | Both | No | Centralities (all) | All |
| Single centrality | Both | No | Centralities (one) | All |
| node2vec | Both | No | node2vec | All |
| Diffusion | Both | No | Diffusion | All |
| Modularity | Both | No | Modularities | All |
| Transcriptomic (microarray) | Both | No | All | Transcriptomic (microarray) |
| Transcriptomic (RNA-seq) | Both | No | All | Transcriptomic (RNA-seq) |
| Transcriptomic (both) | Small | No | All | Transcriptomic (both) |
| Genomic (aCGH) | Small | No | All | Genomic |
| Fused | Both | Yes | All | All |
For the parameters that are not mentioned (e.g., dimension reduction strategy, network inference method, classification algorithm), the experiments are repeated for all possible values. Integration with weighted voting scheme. An equivalent tag for these models on the small cohort is All three sources. This means two on the large cohort and three on the small cohort. On the large cohort, it is equivalent to the topological model.
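The notes mention integration with a weighted voting scheme. A generic weighted vote over binary predictions might look like the sketch below. The function name `weighted_vote`, the example weights, and the rule that a tie resolves to the negative class are assumptions made for illustration; the notes do not specify how the weights were derived.

```python
def weighted_vote(predictions, weights):
    """Combine binary (0/1) predictions from several models.

    Each model votes with its weight; the positive class wins only if it
    collects more than half of the total weight (ties go to class 0).
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    positive = sum(w for p, w in zip(predictions, weights) if p == 1)
    return 1 if positive > total / 2 else 0


# Hypothetical example: one strong model outvotes two weaker ones.
combined = weighted_vote([1, 0, 0], [0.9, 0.3, 0.4])
# With equal weights the same predictions reduce to a simple majority vote.
majority = weighted_vote([1, 0, 0], [1.0, 1.0, 1.0])
```

The design point is that a single well-performing model can override several weaker ones, which a plain majority vote does not allow.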
Fig. 2 Performance of the network-based method and its components. The performance (i.e., balanced accuracy) of classification models in various settings, and for the three clinical endpoints of interest. a Performance of classical, topological and integrated models on the large cohort (498 samples). b Performance of classical, topological and integrated models on the small cohort (142 samples). c Performance of models using only one of the four feature sets at once (centrality, node2vec, diffusion and modularity) or all of them (topological, as in a). Results were obtained on the large cohort. d Performance of models using a single centrality metric or all centrality metrics at once. Results were obtained on the large cohort.
Fig. 3 Impact of the data sources on the performance. The performance (i.e., balanced accuracy) of classification models in various settings, and for the three clinical endpoints of interest. a Performance of the topological models relying only on a single transcriptomic data source (greens), or on both sources (red, equivalent to the topological model presented in Fig. 2a). Results were obtained on the large cohort. b Same as a but on the small cohort. Performance of topological models using one (greens and maroon), two (dark green, only transcriptomic) or three data sources (red, equivalent to the topological model presented in Fig. 2a).
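Both performance figures report balanced accuracy. For a binary endpoint, balanced accuracy is the mean of sensitivity and specificity, which makes it less flattering than plain accuracy when the classes are imbalanced. A minimal sketch of the computation follows; the function name and the toy labels are illustrative only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary (0/1) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy imbalanced endpoint: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A trivial model that always predicts the majority class.
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```

Here the trivial model reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, the same as random guessing.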
Results of the Chi-squared tests on the clinical descriptors of the CAMDA 2017 neuroblastoma dataset
| | Gender | Age | MYCN | Risk | Stage | Prog | Death |
|---|---|---|---|---|---|---|---|
| Gender | - | 1 | 1 | 1 | 1 | 1 | 1 |
| Age | 0.61 | - | 5.3e-4 | 8.8e-28 | 1.6e-19 | 3.6e-7 | 4.8e-11 |
| MYCN | 0.50 | 2.5e-5 | - | 3.2e-44 | 7.4e-11 | 2.3e-8 | 8.2e-17 |
| Risk | 0.09 | 4.2e-29 | 1.5e-45 | - | 1.7e-57 | 3.4e-25 | 1.6e-34 |
| Stage | 0.43 | 7.7e-21 | 3.5e-12 | 8.2e-59 | - | 4.2e-21 | 1.9e-21 |
| Prog | 0.58 | 1.7e-8 | 1.1e-9 | 1.6e-26 | 2.0e-22 | - | 1.2e-49 |
| Death | 0.37 | 2.3e-12 | 3.9e-18 | 7.5e-36 | 9.0e-23 | 5.5e-51 | - |
Results are presented for all pairwise comparisons, with Bonferroni-corrected P values in the upper triangle and uncorrected P values in the lower triangle. Notes: Age: age at diagnosis, MYCN: MYCN mutation status, Risk: high-risk, Stage: INSS tumor stage, Prog: progression, Death: death from disease.
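The relation between the two triangles can be sketched in Python. The sketch assumes the correction was applied over all 21 pairwise comparisons among the seven descriptors. The notes do not state this count, but it is consistent with the reported values (for example, Age vs MYCN: 2.5e-5 × 21 ≈ 5.3e-4). The function name `bonferroni` is illustrative.

```python
from math import comb


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


# Seven clinical descriptors give C(7, 2) = 21 pairwise comparisons
# (assumed to be the family of tests being corrected).
n_descriptors = 7
n_tests = comb(n_descriptors, 2)

# Uncorrected P values taken from the lower triangle of the table.
age_vs_mycn = bonferroni(2.5e-5, n_tests)    # about 5.3e-4 in the table
gender_vs_age = bonferroni(0.61, n_tests)    # capped at 1
```

The cap at 1 explains why every corrected comparison involving Gender is reported as 1.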