Yuyao Gao, Xiao Chang, Jie Xia, Shaoyan Sun, Zengchao Mu, Xiaoping Liu
Abstract
Hepatocellular carcinoma (HCC) is one of the most common causes of cancer-related death, but its pathogenesis is still unclear. Because the disease involves multiple biological processes, systematic identification of disease genes and module biomarkers can provide a better understanding of disease mechanisms. In this study, we propose a network-based approach that integrates multi-omics data to discover disease-related genes. We applied our method to HCC data from The Cancer Genome Atlas (TCGA) database and obtained a functional module of 15 disease-related genes that serve as network biomarkers. Classification and hierarchical clustering results demonstrate that the identified functional module can effectively distinguish the disease group from the control group with both supervised and unsupervised methods. In brief, this computational method for identifying potential functional disease modules could be useful for disease diagnosis and for further mechanistic study of complex diseases.
Keywords: biomarkers; differential partial correlation network; functional module identification; hepatocellular carcinoma; multi-omics data
Year: 2021 PMID: 34335688 PMCID: PMC8320536 DOI: 10.3389/fgene.2021.672117
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 The flowchart for data integration. (A) Construction of the differential partial correlation network (Diff-PCORN). After an initial network is constructed from the PPI network, edges with a significant Pearson correlation (adjusted p < 0.01) are kept as the PCCN. The PCORN is then constructed by deleting non-significant edges (dotted lines in the PCCN), i.e., those whose partial-correlation adjusted p-value is greater than or equal to 0.05. Edges present in both PCORN (disease) and PCORN (control) are removed. (B) Construction of the differential methylation network (Diff-MN). A Pearson correlation coefficient is calculated between two methylation sites if their corresponding genes interact. Each edge is scored by the maximum correlation difference between MN (disease) and MN (control), and edges with a score greater than 0.7 form the Diff-MN. (C) Candidate genes are ranked by variation frequency, and the top 15 genes are identified as potential disease-related genes or module biomarkers.
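The network-construction steps in panels (A) and (B) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: every function name is invented, p-values come from a Fisher z-transformation (normal approximation) rather than whatever exact test the study used, Benjamini-Hochberg is assumed as the adjustment method, each PCCN edge is assumed to be conditioned on its shared PCCN neighbours for the partial-correlation test, and the Diff-MN score is assumed to use the absolute correlation difference. Panel (C) is not sketched because the caption does not define the variation frequency.

```python
import math

# Hypothetical sketch of Figure 1 (A)-(B). Function names, the Fisher-z test,
# BH adjustment and the choice of conditioning genes are assumptions.


def clip(r, eps=1e-6):
    """Keep a correlation strictly inside (-1, 1) so atanh/sqrt stay finite."""
    return max(min(r, 1 - eps), -1 + eps)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n, order=0):
    """Two-sided p-value of a (partial) correlation via Fisher's z (normal approx.)."""
    z = math.atanh(clip(r)) * math.sqrt(n - 3 - order)
    return math.erfc(abs(z) / math.sqrt(2))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    ranked = sorted(range(m), key=lambda i: pvals[i])
    adj, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = ranked[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adj[i] = running
    return adj


def pcorn(expr, ppi_edges):
    """PCORN for one condition. expr maps gene -> expression values per sample."""
    n = len(next(iter(expr.values())))
    edges = [tuple(sorted(e)) for e in ppi_edges if e[0] in expr and e[1] in expr]
    # Step 1 (PCCN): keep PPI edges whose Pearson correlation has adjusted p < 0.01.
    adj = bh_adjust([fisher_p(pearson(expr[a], expr[b]), n) for a, b in edges])
    pccn = [e for e, p in zip(edges, adj) if p < 0.01]
    nbrs = {}
    for a, b in pccn:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    # Step 2 (PCORN): assumed first-order partial correlation given each shared
    # PCCN neighbour k; the weakest (largest) p-value represents the edge.
    worst = []
    for a, b in pccn:
        r_ab = pearson(expr[a], expr[b])
        ps = []
        for k in nbrs[a] & nbrs[b]:
            r_ak = clip(pearson(expr[a], expr[k]))
            r_bk = clip(pearson(expr[b], expr[k]))
            pr = (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak**2) * (1 - r_bk**2))
            ps.append(fisher_p(pr, n, order=1))
        worst.append(max(ps) if ps else fisher_p(r_ab, n))  # no neighbour: Pearson p
    # Delete edges whose adjusted partial-correlation p-value is >= 0.05.
    return {e for e, p in zip(pccn, bh_adjust(worst)) if p < 0.05}


def diff_pcorn(expr_disease, expr_control, ppi_edges):
    """Drop edges present in both PCORNs (symmetric difference of the edge sets)."""
    return pcorn(expr_disease, ppi_edges) ^ pcorn(expr_control, ppi_edges)


def diff_mn(meth_d, meth_c, sites, ppi_edges, threshold=0.7):
    """Diff-MN. sites maps gene -> methylation-site ids; meth_* map site -> values.
    Edge score = max |r_disease - r_control| over site pairs (absolute value assumed)."""
    kept = set()
    for g1, g2 in ppi_edges:
        diffs = [abs(pearson(meth_d[s], meth_d[t]) - pearson(meth_c[s], meth_c[t]))
                 for s in sites.get(g1, []) for t in sites.get(g2, [])]
        if diffs and max(diffs) > threshold:
            kept.add(tuple(sorted((g1, g2))))
    return kept
```

With toy data in which genes A and B co-vary only in the disease samples and A and C co-vary only in the control samples, `diff_pcorn` keeps both edges, since neither appears in both condition-specific PCORNs.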
FIGURE 2 Interaction network of disease-related genes. In total, 29 interactions among the 15 disease-related genes are identified in the STRING database, and together they can be considered a disease module. Edges with a higher STRING combined score are drawn wider, and nodes with a higher degree are drawn larger.
FIGURE 3 Validation of the identified disease module. (A) In total, 723 known cancer genes were obtained from the CGC database, and nine of them are identified as disease-related genes. (B) In total, 168 hepatocellular carcinoma-associated genes were obtained from the KEGG database, and four of them are identified as disease-related genes.
Common genes between predicted disease-related genes and known cancer genes.
FIGURE 4 Results of classification and clustering with the identified disease-related genes. (A) ROC curve obtained from classification between the tumor and normal groups using fivefold cross-validation. ROC, receiver operating characteristic; AUC, area under the curve. (B) ROC curve obtained from classification between the tumor and normal groups in the independent dataset GSE14520. (C) Heat map of hierarchical clustering with single linkage and cityblock distance. (D) Heat map of hierarchical clustering with the same parameters in the independent dataset GSE14520.
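The evaluation in Figure 4 can be sketched with the Python standard library alone. The caption does not name the classifier, so this hypothetical sketch only shows how an AUC is obtained from classifier scores (the Mann-Whitney formulation) and how single-linkage agglomerative clustering with city-block distance groups samples described by the module genes. All function names are illustrative assumptions, not the study's code.

```python
from itertools import combinations


def cityblock(u, v):
    """Manhattan (city-block) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def single_linkage(points, k):
    """Naive agglomerative clustering with single linkage; stops at k clusters."""
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > k:
        best = None
        # Single linkage: cluster distance = closest pair of members.
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = min(cityblock(points[a], points[b]) for a in ci for b in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge j into i (j > i, so i stays valid)
        del clusters[j]
    return clusters


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability a tumor sample outscores a normal one (ties count 1/2)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

In practice, `scipy.cluster.hierarchy.linkage(X, method="single", metric="cityblock")` would be the usual library route for the clustering; the naive loop above is only meant to make the linkage rule explicit.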