Ziyu Ning1, Chenchen Feng1, Chao Song2, Wei Liu3, Desi Shang4, Meng Li1, Qiuyu Wang1, Jianmei Zhao1, Yuejuan Liu1, Jiaxin Chen1, Xiaoyang Yu5, Jian Zhang1, Chunquan Li1.
Abstract
Accurate prediction of classification biomarkers and disease status is indispensable for clinical cancer diagnosis and research. However, the robustness of conventional gene biomarkers is limited by poor reproducibility across measurement platforms and patient cohorts. In this study, we collected 4775 samples from 12 cancer datasets, comprising 4636 TCGA samples and 139 GEO samples. We developed a new method, miDRW, that uses a directed random walk to detect miRNA-mediated subpathway activities. To calculate the activity of each miRNA-mediated subpathway, we constructed a global directed pathway network (GDPN) with genes as nodes. We then identified miRNAs whose expression levels were strongly inversely correlated with those of their differentially expressed target genes in the GDPN. Finally, we computed the activity of each miRNA-mediated subpathway by integrating topological information, the differential levels of miRNAs and genes, gene expression levels, and the target relationships between miRNAs and genes. The results showed that miDRW achieved a more robust and accurate overall performance than existing pathway-based, miRNA-based, and gene-based classification methods. High-frequency miRNA-mediated subpathways are more reliable for classifying samples and for selecting therapeutic strategies.
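The abstract does not say how "strongly inversely correlated" is measured. The sketch below is minimal and assumes Pearson correlation across samples, tested for each miRNA–target pair against a hypothetical cutoff of r ≤ −0.5. The function name, data layout, and cutoff are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def anticorrelated_targets(mirna_expr, gene_expr, targets, de_genes, cutoff=-0.5):
    """Return, for each miRNA, its DE target genes whose expression is
    strongly inversely correlated with the miRNA's expression.

    mirna_expr: dict miRNA -> 1-D array of expression across samples
    gene_expr:  dict gene  -> 1-D array of expression across the same samples
    targets:    dict miRNA -> iterable of target genes
    de_genes:   set of differentially expressed genes
    cutoff:     hypothetical Pearson-correlation threshold (assumption)
    """
    selected = {}
    for mir, x in mirna_expr.items():
        kept = [
            g for g in targets.get(mir, ())
            # Only DE targets are tested; np.corrcoef returns a 2x2 matrix
            # whose off-diagonal entry is the Pearson correlation.
            if g in de_genes and np.corrcoef(x, gene_expr[g])[0, 1] <= cutoff
        ]
        if kept:  # keep the miRNA only if at least one target qualifies
            selected[mir] = kept
    return selected


# Toy data: 6 samples, two miRNAs, three candidate targets.
mirna_expr = {"miR-a": np.array([1, 2, 3, 4, 5, 6]),
              "miR-b": np.array([1, 1, 2, 2, 3, 3])}
gene_expr = {"g1": np.array([6, 5, 4, 3, 2, 1]),
             "g2": np.array([1, 2, 3, 4, 5, 7]),
             "g3": np.array([3, 1, 4, 1, 5, 9])}
targets = {"miR-a": ["g1", "g2", "g3"], "miR-b": ["g2"]}
result = anticorrelated_targets(mirna_expr, gene_expr, targets, de_genes={"g1", "g2"})
# result == {"miR-a": ["g1"]}
```

Here only g1 qualifies. It is differentially expressed and falls as miR-a rises. g2 is positively correlated with both miRNAs, and g3 is not differentially expressed, so it is never tested.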
Keywords: cancer biomarker; classification; miRNA-mediated subpathway; topological information
Year: 2019 PMID: 31408573 PMCID: PMC6763789 DOI: 10.1002/1878-0261.12563
Source DB: PubMed Journal: Mol Oncol ISSN: 1574-7891 Impact factor: 6.603
Figure 1. Details of the miDRW‐based miRNA‐mediated subpathway activity inference method. The miRNA‐mediated subpathways are obtained from the gene profiles using the miDRW method. z(g) is the row vector of expression values of gene g across all samples. a(miR_j), the activity of the miRNA‐mediated subpathway of miR_j, is likewise a row vector: it corresponds to row j (miRNA miR_j) and holds one value per sample. The middle panel gives an overview of miDRW‐based miRNA‐mediated subpathway activity inference. The GDPN is constructed from 150 metabolic and 150 nonmetabolic pathways and contains 4113 gene nodes and 40 875 directed edges. The dotted circle is a virtual node that ensures gene weights flow through the GDPN. P_0 is the initial weight vector of the genes, and P is the output weight vector. For miR_j, the edge directions were reversed when the pathways were merged into the GDPN. The subpathway activity a(miR_j) integrates, using the weights in P, only the expression vectors of the significantly differentially expressed target genes of miR_j.
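The caption describes a directed random walk with restart over the GDPN, followed by a weighted combination of target-gene expression. The NumPy sketch below is minimal and follows the generic directed-random-walk scheme. Several choices in it are assumptions that the caption does not state: a restart probability of 0.7; uniform redistribution of weight from dead-end genes as a stand-in for the virtual node; and an activity score equal to the sum of w·sign(t)·z over the DE targets, divided by the square root of the sum of w². All function names are hypothetical.

```python
import numpy as np


def directed_random_walk(adj, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed gene network.

    adj[i, j] = 1 encodes an edge i -> j in a network whose pathway edges
    have already been reversed. p0 holds the initial gene weights (P_0).
    Returns the steady-state weight vector P (sums to 1).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-normalise the adjacency matrix. Genes without outgoing edges pass
    # their weight uniformly to all genes (assumed stand-in for the virtual node).
    trans = np.where(out_deg[:, None] > 0,
                     adj / np.maximum(out_deg, 1.0)[:, None],
                     1.0 / n)
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (trans.T @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def subpathway_activity(z, weights, t_scores, target_idx):
    """Activity a(miR_j) across samples from the miRNA's DE target genes.

    z:          genes x samples matrix of standardised expression (rows are z(g))
    weights:    P returned by directed_random_walk
    t_scores:   per-gene differential statistic; only its sign is used
    target_idx: indices of the significantly DE targets of miR_j
    """
    idx = np.asarray(target_idx)
    w = weights[idx]
    signed_w = w * np.sign(t_scores[idx])
    return signed_w @ z[idx] / np.sqrt(np.sum(w ** 2))


# Toy 4-gene chain 0 -> 1 -> 2 -> 3 (gene 3 has no outgoing edge).
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 2] = adj[2, 3] = 1
p = directed_random_walk(adj, p0=np.ones(4))
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 5))            # 4 genes x 5 samples
t = np.array([2.5, -3.1, 0.4, 1.8])        # hypothetical t-statistics
activity = subpathway_activity(z, p, t, target_idx=[0, 1, 3])
```

The result is one activity value per sample, a row vector like a(miR_j) in the figure. It can then be used as a feature for classification.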
Figure 2. Classification performance of logistic regression. (A) Bar heights show the AUCs obtained by logistic regression in the within‐dataset experiments. (B) Bar heights show the AUCs obtained by logistic regression in the ‘GEO‐>TCGA’ cross‐dataset experiment. (C) Bar heights show the AUCs obtained by logistic regression in the ‘TCGA‐>GEO’ cross‐dataset experiment. (D) Global view of statistical significance across the 11 within‐dataset experiments. Rows represent cancers and columns represent methods. Values are −log10(p) from the Wilcoxon signed‐rank test comparing the AUCs of miDRW with those of each other method. Error bars in (A), (B), and (C) represent standard deviation.
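Panel (D) reports −log10(p) from a Wilcoxon signed‐rank test on AUCs. The sketch below is minimal and assumes that the AUCs being compared are paired per repeated run. It computes an AUC with the rank (Mann–Whitney) formulation and then the significance score with SciPy's `wilcoxon`. The paired AUC values are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def neg_log10_wilcoxon(auc_a, auc_b):
    """-log10(p) of a two-sided Wilcoxon signed-rank test on paired AUCs."""
    p = wilcoxon(auc_a, auc_b).pvalue
    return -np.log10(p)


# Hypothetical paired AUCs from 10 repeated runs of two methods.
midrw = np.array([0.91, 0.93, 0.90, 0.95, 0.92, 0.94, 0.96, 0.89, 0.93, 0.97])
other = midrw - np.arange(1, 11) * 0.01
score = neg_log10_wilcoxon(midrw, other)
```

Larger values in panel (D) correspond to smaller p, meaning the AUC difference between miDRW and the other method is more consistent across runs.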
Figure 3. The selected active miRNA‐mediated subpathways are associated with important pathways. (A) Hierarchical cluster analysis based on the active miRNA‐mediated subpathways of KICH before the median frequency. Rows represent miRNA‐mediated subpathways and columns represent samples (red and green bars denote normal and cancer samples, respectively). (B) Summary bubble‐bar plot of the functional enrichment results for the active miRNA‐mediated subpathways of KICH. The bars on the right show the percentage of significantly differentially expressed genes annotated to each KEGG pathway. Bubble size indicates the number of genes in each KEGG pathway, and color indicates the FDR, with darker colors corresponding to smaller FDRs. (C) Pie chart of the proportion of active miRNA‐mediated subpathways present in different cancers. Most active miRNA‐mediated subpathways are cancer‐specific.
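Panel (B) reports enrichment FDRs but does not name the test. The sketch below is minimal and uses one common approach, assumed here rather than taken from the paper: a one-sided hypergeometric test per KEGG pathway, followed by Benjamini–Hochberg adjustment. The background of 4113 genes reuses the GDPN node count purely for illustration. The pathway and query sizes are invented, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def enrichment_pvalue(n_overlap, n_background, n_pathway, n_query):
    """One-sided hypergeometric P(X >= n_overlap).

    n_background: genes in the background (population size)
    n_pathway:    background genes annotated to the KEGG pathway
    n_query:      genes in the query set (e.g. DE targets in the subpathways)
    """
    # sf(k - 1) = P(X > k - 1) = P(X >= k)
    return hypergeom.sf(n_overlap - 1, n_background, n_pathway, n_query)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each adjusted value is the minimum over larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    fdr = np.empty(n)
    fdr[order] = np.minimum(scaled, 1.0)
    return fdr


# Illustrative overlaps of a 200-gene query set with an 80-gene pathway.
pvals = [enrichment_pvalue(k, 4113, 80, 200) for k in (15, 8, 3)]
fdrs = benjamini_hochberg(pvals)
```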
Figure 4. A snapshot of the KEGG pathway ‘Pathways in cancer’ (hsa05200). Orange (yellow) nodes represent the differentially expressed target genes of hsa‐miR‐134 (hsa‐miR‐326). Pink nodes represent the differentially expressed target genes shared by hsa‐miR‐134 and hsa‐miR‐326.
Figure 5. Influence of topological structure on the miDRW method, and its reproducibility power, in within‐dataset experiments. (A) Reproducibility power of the miDRW method in within‐dataset experiments. The x‐axis represents the number of top miRNA‐mediated subpathways, and the y‐axis shows the reproducibility power (C score) of the top k miRNA‐mediated subpathways, for k = 10, 20, 30, and 40. (B) Influence of topological structure and target relationships on the miDRW method. The x‐axis represents the percentage of deleted target genes and miRNAs, and the y‐axis shows the corresponding AUCs.
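Panel (B) measures robustness by deleting a growing percentage of target relationships and recording the resulting AUCs. The sketch below shows one such perturbation and is minimal. It assumes that deletion acts on a random fraction of miRNA–target pairs, so that a miRNA left with no targets drops out entirely. The function name and seeding are hypothetical, and the paper's exact deletion scheme may differ.

```python
import random


def delete_target_pairs(targets, fraction, seed=0):
    """Randomly delete a fraction of miRNA-target pairs.

    targets:  dict miRNA -> list of target genes
    fraction: share of pairs to delete (0.0 to 1.0)
    Returns a new dict; miRNAs with no remaining targets are omitted.
    """
    pairs = [(mir, gene) for mir, genes in targets.items() for gene in genes]
    rng = random.Random(seed)  # fixed seed makes each perturbation reproducible
    n_keep = len(pairs) - round(fraction * len(pairs))
    kept = rng.sample(pairs, n_keep)
    perturbed = {}
    for mir, gene in kept:
        perturbed.setdefault(mir, []).append(gene)
    return perturbed


# Ten hypothetical pairs; delete half of them.
targets = {"miR-a": ["g1", "g2", "g3", "g4"],
           "miR-b": ["g2", "g5", "g6"],
           "miR-c": ["g7", "g8", "g9"]}
half = delete_target_pairs(targets, 0.5)
```

Repeating the activity calculation and classification on each perturbed target map, over increasing fractions, would produce a curve like the one in panel (B).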