Lili Su¹, Guang Liu¹, Ying Guo², Xuanping Zhang¹, Xiaoyan Zhu¹, Jiayin Wang¹.
Abstract
Cancer-associated genes (CAGs) are being identified at an increasing rate as research into biological mechanisms advances. Integrative analysis of protein-protein interaction (PPI) networks and the co-expression patterns of these genes can help identify new disease-associated genes and clarify their importance in specific diseases. This study proposed a PPI network and co-expression integration analysis model (PRNet) that combines PPI networks with gene co-expression patterns to identify potential causal risk genes for pancreatic adenocarcinoma (PAAD). We scored the importance of candidate genes in four steps. First, we constructed a high-confidence PPI network whose edges are weighted by co-expression. Second, we extracted protein regulatory sub-networks with a random walk algorithm. Third, we constructed disease-specific networks based on known CAGs. Finally, we scored the genes of these sub-networks with the PageRank algorithm. The top-ranked genes we identified were more critical in tumours than the genes on the known CAG list, and they significantly stratified the overall survival of PAAD patients. These results suggest that ranking cancer-associated genes with PRNet can identify new disease-associated genes and is more informative than the original CAG list. This can help investigators screen potential biomarkers for validation and for exploring molecular mechanisms.
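The first step, building a co-expression-weighted PPI network, can be sketched as follows. This is a minimal illustration under stated assumptions and is not the PRNet implementation. It assumes Pearson correlation computed across the same TCGA samples, uses the absolute correlation |r| as the edge weight, and applies a hypothetical cut-off `min_abs_r`. The paper's actual correlation measure and threshold are not given here. The names `pearson` and `weight_ppi_edges` are illustrative only.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant expression: correlation undefined, treat as 0
    return sxy / math.sqrt(sxx * syy)


def weight_ppi_edges(ppi_edges, expression, min_abs_r=0.3):
    """Weight PPI edges by co-expression and drop weakly co-expressed pairs.

    ppi_edges:  iterable of (gene_a, gene_b) pairs from a PPI database
    expression: dict mapping gene -> expression values over the same samples
    min_abs_r:  hypothetical cut-off on |r| (not a value from the paper)
    Returns a dict {(gene_a, gene_b): |r|}.
    """
    weighted = {}
    for a, b in ppi_edges:
        if a not in expression or b not in expression:
            continue  # no expression data for one partner: drop the edge
        r = pearson(expression[a], expression[b])
        if abs(r) >= min_abs_r:
            weighted[(a, b)] = abs(r)
    return weighted


expression = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],  # perfectly correlated with A
    "C": [4, 3, 2, 1],  # perfectly anti-correlated with A
    "D": [1, 3, 2, 4],  # r = 0.8 with A
}
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E")]  # "E" has no expression data
print(weight_ppi_edges(edges, expression, min_abs_r=0.9))
# {('A', 'B'): 1.0, ('A', 'C'): 1.0}
```

Taking |r| treats strong negative co-expression as evidence for an interaction, just like strong positive co-expression. A signed weight would be an equally plausible choice.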
Keywords: cancer-associated genes; co-expression; machine learning; pancreatic adenocarcinoma; protein-protein interaction
Year: 2022 PMID: 35711911 PMCID: PMC9197464 DOI: 10.3389/fgene.2022.854661
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1 Flowchart of PRNet. Protein interaction information was obtained from validated PPI databases, and the expression correlation of each gene pair was calculated from the TCGA PAAD mRNA expression matrix. These datasets were merged to construct a PAAD-specific weighted PPI network. A short-random-walk algorithm was used to partition this large network into multiple communities. Only the communities containing known CAGs were kept as disease-specific sub-networks. Finally, the PageRank algorithm was applied to the resulting network to calculate the PageRank (PR) value of each node.
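The last two steps of this flowchart, keeping CAG-containing communities and then scoring nodes with PageRank, can be sketched as below. Community detection itself is assumed to have been run already, so `communities` is given as a list of gene sets. The caption's "short random walks" wording suggests a Walktrap-style method, but that is an inference. The sketch keeps the subgraph induced by all genes in CAG-containing communities, which is one possible reading of the caption. It then computes weighted PageRank by power iteration with a damping factor of 0.85. That value is the conventional default, not a parameter reported for PRNet. All function names are hypothetical.

```python
from collections import defaultdict


def disease_subnetwork(weighted_edges, communities, known_cags):
    """Keep edges whose two genes both lie in a community containing a known CAG.

    weighted_edges: dict {(gene_a, gene_b): weight}
    communities:    list of sets of genes (output of a community-detection step)
    known_cags:     set of known cancer-associated genes
    """
    keep = set()
    for community in communities:
        if community & known_cags:  # community contains at least one known CAG
            keep |= community
    return {(a, b): w for (a, b), w in weighted_edges.items()
            if a in keep and b in keep}


def weighted_pagerank(weighted_edges, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank on an undirected, edge-weighted graph via power iteration.

    Each node passes its score to neighbours in proportion to edge weight.
    """
    neighbours = defaultdict(dict)
    for (a, b), w in weighted_edges.items():
        if w <= 0 or a == b:
            continue  # ignore non-positive weights and self-loops
        neighbours[a][b] = neighbours[a].get(b, 0.0) + w
        neighbours[b][a] = neighbours[b].get(a, 0.0) + w
    nodes = list(neighbours)
    n = len(nodes)
    if n == 0:
        return {}
    strength = {v: sum(neighbours[v].values()) for v in nodes}  # weighted degree
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}  # teleportation term
        for v in nodes:
            share = damping * pr[v] / strength[v]
            for u, w in neighbours[v].items():
                new[u] += share * w
        converged = sum(abs(new[v] - pr[v]) for v in nodes) < tol
        pr = new
        if converged:
            break
    return pr


edges = {("H", "A"): 1.0, ("H", "B"): 1.0, ("H", "C"): 1.0, ("X", "Y"): 1.0}
communities = [{"H", "A", "B", "C"}, {"X", "Y"}]
sub = disease_subnetwork(edges, communities, known_cags={"A"})  # drops X-Y
scores = weighted_pagerank(sub)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], round(scores["H"], 3))  # H 0.48
```

In this toy star network the hub gene "H" ranks first because every leaf sends all of its score to it. Candidate genes would be ranked the same way, by sorting on the PR value.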
FIGURE 2 Features of the novel PAAD candidate genes. (A) Distribution of the known CAGs after reordering. (B) CERES scores of the top 719 genes and of the CAG list relative to all genes. (C) CERES score distribution of the 20 top-ranked genes. (D) Functional annotation of the 20 top-ranked genes.
FIGURE 3 Kaplan–Meier overall survival (OS) curves for the genes ranked 1 to 10.
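Per-gene survival curves like these are usually drawn with the Kaplan–Meier estimator and compared with a log-rank test. The sketch below implements both in plain Python. It assumes patients are split into high and low groups at the median expression of a gene. This is a common choice, but the caption does not state it. Here `times` are follow-up times and `events` are 1 for an observed death and 0 for censoring. All function names are illustrative.

```python
import math
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def logrank(times1, events1, times2, events2):
    """Two-group log-rank test. Returns (chi-square statistic, p-value) with 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times1, events1)]
              + [(t, e, 1) for t, e in zip(times2, events2)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)               # at risk, both groups
        n1 = sum(1 for s, _, g in pooled if s >= t and g == 0)   # at risk, group 1
        d = sum(1 for s, e, _ in pooled if s == t and e)         # deaths, both groups
        d1 = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def split_by_median(expr, times, events):
    """Hypothetical grouping: high vs low expression around the median."""
    cut = median(expr)
    high = [(t, e) for x, t, e in zip(expr, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expr, times, events) if x <= cut]
    return high, low


high, low = split_by_median([5.1, 0.2, 4.8, 0.5, 6.0, 0.1],
                            [2, 20, 3, 18, 1, 25], [1, 1, 1, 0, 1, 1])
high_t, high_e = zip(*high)
low_t, low_e = zip(*low)
print(kaplan_meier(high_t, high_e), logrank(high_t, high_e, low_t, low_e))
```

The p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)). This avoids needing a statistics library.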
FIGURE 4 Consensus clustering of PAAD samples based on the top-ranked genes. (A) Consensus clustering heatmap of PAAD samples. (B) Silhouette plot of the two clusters (G1/G2) defined by the top-ranked genes. (C) Principal component analysis of the total mRNA expression profile in the TCGA dataset. (D) Kaplan–Meier OS curves for the two subgroups. (E) Differentially expressed genes between the G1 and G2 subgroups. (F) GSEA of the differentially expressed genes between the G1 and G2 subgroups using hallmark pathways. G1: low-risk subgroup; G2: high-risk subgroup.
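A silhouette plot like panel B shows how well each sample fits its assigned cluster. For sample i, a(i) is its mean distance to the other members of its own cluster, and b(i) is its smallest mean distance to any other cluster. The silhouette width is then s(i) = (b(i) - a(i)) / max(a(i), b(i)). The sketch below accepts any precomputed distance matrix. For consensus clustering, one common choice is 1 minus the consensus matrix, though this study's exact choice is not stated. The resampling-based consensus step itself is not shown. Singleton clusters are assigned s = 0 by convention, and the function name is illustrative.

```python
def silhouette_values(dist, labels):
    """Silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every sample.

    dist:   symmetric n x n distance matrix (list of lists)
    labels: cluster label of each sample, e.g. "G1"/"G2"; needs >= 2 clusters
    """
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i in range(n):
        totals = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                totals[labels[j]] += dist[i][j]
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            widths.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = totals[own] / counts[own]  # mean distance within own cluster
        b = min(totals[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Toy example: four samples on a line forming two well-separated groups.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
print(silhouette_values(dist, ["G1", "G1", "G2", "G2"]))
```

Widths close to 1 indicate samples that sit firmly inside their cluster. Negative widths flag samples that lie closer to the other subgroup than to their own.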