Lingmei Li1, Yifang Wei1, Guojing Shi1, Haitao Yang2, Zhi Li3, Ruiling Fang1, Hongyan Cao1,4, Yuehua Cui5.
Abstract
Lower-grade gliomas (LGG), characterized by heterogeneity and invasiveness, originate from the central nervous system. Although studies focusing on molecular subtyping and molecular characteristics have provided novel insights into improving the diagnosis and therapy of LGG, there is an urgent need to identify new molecular subtypes and biomarkers that could improve patient survival outcomes. Here, we propose a joint similarity network fusion (Joint-SNF) method that integrates different omics data types and constructs a fused network using the Joint and Individual Variation Explained (JIVE) technique under the SNF framework. Focusing on the joint network structure, a spectral clustering method was employed to obtain patient subtypes. Simulation studies show that the proposed Joint-SNF method outperforms the original SNF approach under various simulation scenarios. We further applied the method to a Chinese LGG data set including mRNA expression, DNA methylation and microRNA (miRNA) data. Three molecular subtypes were identified and showed statistically significant differences in patient survival outcomes. The five-year mortality rates of the three subtypes are 80.8%, 32.1%, and 34.4%, respectively. After adjusting for clinically relevant covariates, the death risk of patients in Cluster 1 was 5.06 times higher than that of patients in other clusters. The fused network obtained by the proposed Joint-SNF method enhances strong similarities and thus greatly improves subtyping performance compared to the original SNF method. The findings in the real application may provide important clues for improving patient survival outcomes and for precision treatment of Chinese LGG patients. An R package implementing the method is available on GitHub at https://github.com/Sameerer/Joint-SNF.
Keywords: Joint-SNF; LGG; Multi-omics data integration; Subtypes identification
Year: 2022 PMID: 35860412 PMCID: PMC9284445 DOI: 10.1016/j.csbj.2022.06.065
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Average NMI on the simulated datasets, with standard errors in parentheses.
| Method | SimData1: Low | SimData1: Moderate | SimData1: High | SimData2: Low | SimData2: Moderate | SimData2: High | SimData3: Low | SimData3: Moderate | SimData3: High |
|---|---|---|---|---|---|---|---|---|---|
| Joint-SNF | | | | | | | | | |
| JIVE-SNF | 0.364 (0.094) | 0.346 (0.076) | 0.360 (0.067) | 0.307 (0.087) | 0.346 (0.093) | 0.339 (0.072) | 0.235 (0.080) | 0.358 (0.098) | 0.339 (0.080) |
| SNF | 0.265 (0.032) | 0.356 (0.033) | 0.466 (0.054) | 0.196 (0.034) | 0.308 (0.031) | 0.362 (0.034) | 0.131 (0.035) | 0.277 (0.031) | 0.325 (0.031) |
| IntNMF | 0.294 (0.031) | 0.351 (0.036) | 0.414 (0.049) | 0.251 (0.030) | 0.310 (0.028) | 0.339 (0.023) | 0.205 (0.039) | 0.285 (0.037) | 0.328 (0.031) |
| CIMLR | 0.293 (0.042) | 0.449 (0.058) | 0.668 (0.052) | 0.170 (0.036) | 0.353 (0.044) | 0.452 (0.062) | 0.091 (0.043) | 0.303 (0.042) | 0.381 (0.038) |
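The table above scores each method by normalized mutual information (NMI) between the true and estimated cluster labels. As a minimal sketch of how such a score can be computed, the function below uses the geometric-mean normalization, NMI = I(U;V) / sqrt(H(U) H(V)). The choice of normalization and the function name `nmi` are assumptions for illustration; the source does not state which variant was used.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings.

    Assumption: geometric-mean normalization, I(U;V) / sqrt(H(U) * H(V)).
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of an empirical distribution
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # Mutual information from the joint and marginal counts
    mi = sum(
        (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in count_joint.items()
    )

    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings are a single cluster
    denom = math.sqrt(h_true * h_pred)
    return mi / denom if denom > 0.0 else 0.0


# Relabeling the clusters does not change the score
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# Labelings that carry no information about each other score zero
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Values close to 1 indicate that the estimated clusters recover the true grouping, and values close to 0 indicate no agreement beyond chance.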
Baseline characteristics of 86 LGG patients.
| Item | Classification | Value |
|---|---|---|
| Age, years | | 38.56 ± 11.60 |
| Gender | Female | 40 (46.5) |
| | Male | 46 (53.5) |
| WHO grade | II | 52 (60.5) |
| | III | 34 (39.5) |
| Sample type | Primary | 81 (94.2) |
| | Recurrent | 5 (5.8) |
| Survival outcome | Dead | 42 (48.8) |
| | Alive | 44 (51.2) |
| IDH mutation status | Mutant | 59 (68.6) |
| | Wildtype | 26 (30.2) |
| | NA | 1 (1.2) |
Fig. 1. Schematic representation of the Joint-SNF method used for LGG subtyping.
Fig. 2. Kaplan-Meier curves showing overall survival for the three subtypes of LGG obtained by Joint-SNF (A) and SNF (B).
Clinical and pathological characteristics of different subtypes.
| Characteristic | Cluster 1 (n = 26) | Cluster 2 (n = 28) | Cluster 3 (n = 32) |
|---|---|---|---|
| Age, years | 42.65 ± 14.61 | 37.61 ± 8.75 | 36.06 ± 10.42 |
| Female, n (%) | 12 (46.1) | 14 (50.0) | 14 (43.8) |
| WHO grade, n (%) | | | |
| Grade II | 1 (3.8) | 26 (92.9) | 25 (78.1) |
| Grade III | 25 (96.2) | 2 (7.1) | 7 (21.9) |
| Death event, n (%) | 20 (76.9) | 6 (21.4) | 16 (50.0) |
Comparison of log-rank test p-values for Joint-SNF and SNF across different numbers of subtypes.
| Method | 3 clusters | 4 clusters | 5 clusters |
|---|---|---|---|
| Joint-SNF | 7.48E-08 | 6.28E-07 | 3.10E-07 |
| SNF | 1.17E-07 | 6.37E-07 | 4.29E-04 |
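The p-values above come from log-rank tests comparing survival curves across subtypes. As a rough illustration of the mechanics, here is a minimal two-group log-rank test in plain Python. The study compares three or more subtypes, which needs the k-group form with k - 1 degrees of freedom; the two-group simplification, the function name `logrank_two_groups`, and the toy data are assumptions made to keep the sketch short.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if death observed, 0 if censored
    groups : 0 or 1 (group membership)
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Deaths occurring exactly at time t
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(
            1 for ti, e, g in zip(times, events, groups)
            if ti == t and e and g == 1
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data: group 1 dies early, group 0 dies late, no censoring
times = [1, 2, 3, 4, 10, 11, 12, 13]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
chi2, p_value = logrank_two_groups(times, events, groups)
```

A smaller p-value means the survival curves of the groups separate more clearly, which is how the table ranks the two methods at each number of clusters.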
Fig. 3. Boxplots of the -log10(p-value) from the log-rank test for differences between survival curves, assuming different numbers of clusters, using Joint-SNF and SNF over 20 random sample splits.
The mean p-value of the log-rank test over 20 random sample splits.
| Method | 3 clusters | 4 clusters | 5 clusters |
|---|---|---|---|
| Joint-SNF | 8.61E-05 | 1.83E-04 | 6.71E-04 |
| SNF | 1.68E-04 | 1.64E-03 | 1.11E-02 |
Cox regression results of 86 LGG patients.
| Item | Coefficient (SE) | z | p-value | HR (95% CI) |
|---|---|---|---|---|
| Subtypes | | | | |
| Cluster 1* | 1.622 (0.626) | 2.591 | 0.009 | 5.062 (1.484, 17.260) |
| Cluster 3 | 0.885 (0.488) | 1.814 | 0.070 | 2.423 (0.931, 6.306) |
| Age | 0.347 (0.325) | 1.067 | 0.286 | 1.415 (0.747, 2.679) |
| Gender | 0.020 (0.320) | 0.064 | 0.949 | 1.020 (0.545, 1.911) |
| WHO grade | 0.603 (0.487) | 1.240 | 0.215 | 1.829 (0.704, 4.748) |
Note: *Statistically significant at the 0.05 level. Cluster 2 was used as the reference group for the subtype comparison. For age, patients were divided into two groups using 36 years as the cutoff value.
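The numeric columns of the Cox table are internally consistent with the usual Wald relationships: the test statistic is the coefficient divided by its standard error, the hazard ratio is the exponentiated coefficient, and the 95% confidence interval is exp(coefficient ± 1.96 × SE). The sketch below checks this for the Cluster 1 row; the helper name `wald_summary` and the 1.96 critical value are assumptions for illustration.

```python
import math


def wald_summary(beta, se, z_crit=1.96):
    """Wald z statistic, hazard ratio, and CI from a Cox coefficient."""
    z = beta / se
    hr = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return z, hr, lower, upper


# Cluster 1 row of the Cox table: coefficient 1.622, standard error 0.626
z, hr, lower, upper = wald_summary(1.622, 0.626)
print(f"z = {z:.3f}, HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The results agree with the reported 2.591, 5.062, and (1.484, 17.260) up to rounding of the published coefficient and standard error, which supports reading the hazard ratio of about 5.06 as the Cluster 1 versus Cluster 2 comparison.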
Fig. 4. Boxplots of the pathway activity for 5 pathways in different subtypes.
Fig. 5. (A) Dendrogram representing hierarchical clustering of identified co-expressed modules. (B) Heatmap visualizing the correlation between module eigengenes and clinical traits of LGG. Each row represents a color module, and each column represents a clinical feature. Each cell is filled with the correlation and p-value.
Fig. 6. GO biological process enrichment analysis (A) and KEGG enrichment analysis (B) for 478 genes in the yellow module. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. (A) Heatmap reflecting the expression level of candidate genes in the three subtypes. Each row corresponds to a gene feature and each column corresponds to a patient. Red and blue colors indicate relatively high and low gene expression. The three colored bars on the top indicate subtype clusters 1, 2 and 3 from left to right. (B) Network diagram of the interactions between hub genes (red) and candidate genes (blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Prognostic survival curves of the 9 hub genes, sorted in ascending order by p-value.
| Lists of the joint and individual structure matrices |
| 1. A singular value decomposition is performed for each data set. Using the singular values (Σ) and the right singular vectors (V), the reduced data set is ΣV′. |
| 2. Set J = { |
| 3. Set |
| 4. |
| 5. Repeat steps 2–4 until the Frobenius norm of the difference between the current and previous iteration in both |
| 6. Return results (J, A, and the ranks used in the decomposition). |
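Step 1 of the decomposition above replaces each data matrix with ΣV′. A minimal NumPy sketch of that reduction follows, assuming rows are features and columns are samples (an assumption; the source does not state the orientation). The function name `svd_reduce` is hypothetical. The point of the reduction is that ΣV′ has far fewer rows than the original matrix when features greatly outnumber samples, yet it preserves all inner products between samples.

```python
import numpy as np


def svd_reduce(X):
    """Return the reduced data set ΣV' from the thin SVD X = U Σ V'."""
    # full_matrices=False gives the thin SVD: s holds singular values,
    # Vt holds the right singular vectors as rows (that is, V')
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s[:, None] * Vt  # scale each row of V' by its singular value


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # 50 features, 6 samples (toy data)
R = svd_reduce(X)              # 6 x 6 reduced representation

# Sample-by-sample inner products are unchanged by the reduction
gram_preserved = np.allclose(R.T @ R, X.T @ X)
```

Because the sample Gram matrix is preserved, sample-level structure computed on the reduced data matches what would be obtained on the full matrix, at lower cost.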
| normalize the weight matrix |
| |
| |
| |
| 3. Iteratively update similarity network |
| 4. |
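The formula for the weight-matrix normalization step did not survive in the text above. As a hedged illustration only, the sketch below uses the normalization from the original SNF formulation, in which each diagonal entry is set to 1/2 and each off-diagonal entry is divided by twice the sum of the off-diagonal entries in its row. Whether Joint-SNF uses exactly this form is an assumption, and the function name `snf_normalize` is hypothetical.

```python
def snf_normalize(W):
    """Row-normalize a similarity matrix in the style of the original SNF.

    P[i][j] = W[i][j] / (2 * sum of W[i][k] over k != i)   for j != i
    P[i][i] = 1/2
    """
    n = len(W)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        off_diag_sum = sum(W[i][k] for k in range(n) if k != i)
        if off_diag_sum <= 0:
            raise ValueError(f"row {i} has no positive off-diagonal weight")
        for j in range(n):
            P[i][j] = 0.5 if i == j else W[i][j] / (2.0 * off_diag_sum)
    return P


# Toy symmetric similarity matrix for three patients
W = [
    [1.0, 0.2, 0.6],
    [0.2, 1.0, 0.4],
    [0.6, 0.4, 1.0],
]
P = snf_normalize(W)
row_sums = [sum(row) for row in P]
```

Each row of the result sums to 1, and fixing the diagonal at 1/2 keeps the self-similarity from dominating the scale of the other entries.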