Jifeng Zhang, Shoubao Yan, Cheng Jiang, Zhicheng Ji, Chenrun Wang, Weidong Tian.
Abstract
Prognostic gene signatures are critical for assessing cancer prognosis and for precise treatment. However, their network properties remain unclear. Here, we obtained nine prognostic gene sets, comprising 1439 prognostic genes from different cancers, from related publications. Four network centralities were used to examine the network properties of prognostic genes (PG) compared with other gene sets, based on the Human Protein Reference Database (HPRD) and String networks. Besides the clustering coefficient, we also proposed three novel network measures for further investigating the network properties of prognostic gene sets (PGS). The results showed that PG did not occupy key positions in the human protein interaction network and were more similar to essential genes than to cancer genes. However, PGS had significantly smaller intra-set distance (IAD) and inter-set distance (IED) than random sets (p-value < 0.001). Moreover, we found that PGS tended to be distributed within network modules rather than between modules (p-value < 0.01), and that the functional intersection of the modules enriched with PGS was closely related to cancer development and progression. Our research reveals the common network properties of cancer prognostic gene signatures in the human protein interactome. We argue that these properties are biologically meaningful and useful for understanding the molecular mechanisms of these signatures.
Keywords: cancer; human protein interactome; modules; network property; prognostic genes; prognostic genes sets
Year: 2020 PMID: 32111006 PMCID: PMC7140842 DOI: 10.3390/genes11030247
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
List of literature sources, cancer types and sizes of prognostic gene sets in this study.
| Study ¹ | Disease | Number of Prognostic Genes in Study | Gene Set | Number of Prognostic Genes in Gene Set |
|---|---|---|---|---|
| Gentles et al. | Multiple tumor types | various | S1 | 120 * |
| The Cancer Genome Atlas Research Network | Ovarian carcinoma | 190 | S2 | 185 |
| Lenz et al. | (Diffuse) Large-B-cell lymphomas | 39, 283, 71 | S3 | 330 |
| Zhao et al. | Renal cell carcinoma | 259 | S4 | 222 |
| Dave et al. | Burkitt’s lymphoma | 217 | S5 | 200 |
| Bullinger et al. | Acute myeloid leukemia (AML) | 133 | S6 | 103 |
| Liu et al. | (Triple-negative) Breast cancer | 11 | S7 | 135 |
| Wang et al. | (Lymph-node-negative) Breast cancer | 76 | | |
| van de Vijver et al. | Breast cancer | 70 | | |
| Wistuba et al. | Lung adenocarcinoma | 31 | S8 | 118 |
| Tang et al. | Non-small cell lung cancer (NSCLC) | 12 | | |
| Xie et al. | NSCLC | 59 | | |
| Zhu et al. | NSCLC | 15 | | |
| Boutros et al. | NSCLC | 6 | | |
| Lau et al. | NSCLC | 3 | | |
| Gerami et al. | Melanoma | 28 | S9 | 174 |
| Wu et al. | Prostate cancer | 32 | | |
| Li et al. | AML | 24 | | |
| Lohavanichbutr et al. | Oral squamous cell carcinomas (OSCC) | 13 | | |
| Sveen et al. | Colorectal cancer | 7 | | |
| Smith et al. | Colon cancer | 34 | | |
| Ramaswamy et al. | Solid tumors | 17 | | |
| Yeoh et al. | Acute lymphoblastic leukemia (ALL) | 7–20 | | |

¹: Please see supplementary Table S1 for details of references. *: S1 consists of the top 60 adversely prognostic genes and the top 60 favorably prognostic genes, based on the global meta-z score. A blank Gene Set cell means the study belongs to the gene set named in the nearest row above.
Figure 1. Schematic diagram of calculating the GDM (genset-distribution in modules) of gene sets in the network. The formula for GDM and its calculation process are given for the example in the chart below.
Figure 2. Human protein-protein interaction networks and their node degree distributions. (A) and (B) show the power-law degree distributions of the Human Protein Reference Database (HPRD) network and the String network, respectively; (C) shows the HPRD network (V9.0), consisting of 9402 nodes and 36,746 edges, in which the scattered red nodes represent prognostic genes.
Figure 3. Boxplots of degree (A,B), betweenness (C,D), closeness (E,F) and eigenvector centrality (G,H) of the 1439 prognostic genes and four other gene sets for comparison, based on the HPRD and String networks. A one-tailed t-test was used to test whether the averages of the four network centrality measures differed significantly between the union set of all prognostic genes (PG), the essential gene set (ES), and the cancer gene set (CA) (triple asterisks, p-value < 0.001; n.s., not significant). The black dashed lines and the numbers in maroon show the average level of each centrality measure for the whole network. The figure shows that the four network properties of PG are significantly different from those of CA and the metastasis-angiogenesis gene set (MA) but are close to those of ES.
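The comparison of a gene set's centrality with the network-wide average can be illustrated with a minimal sketch. It covers only degree and closeness centrality, uses a five-node toy graph rather than HPRD or String, and the node names and the `gene_set` are hypothetical. It is not the authors' pipeline.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of hop distances; meant for a connected graph."""
    out = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        out[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return out


# Hypothetical toy network: A is a hub, E hangs off D.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

deg = degree_centrality(adj)
clo = closeness_centrality(adj)

# Hypothetical gene set made of peripheral nodes only.
gene_set = {"B", "E"}
set_mean_degree = sum(deg[g] for g in gene_set) / len(gene_set)
network_mean_degree = sum(deg.values()) / len(deg)
print(set_mean_degree, network_mean_degree)
```

A set mean below the network mean, as here, corresponds to a gene set that does not occupy central positions. Betweenness and eigenvector centrality would need a graph library or a longer implementation and are left out of this sketch.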
Figure 4. Distributions of clustering coefficient (CC) (A,B), intra-set distance (IAD) (C,D), inter-set distance (IED) (E,F), and genset-distribution in modules (GDM) (G,H) of the nine prognostic gene sets (PGS), random sets, and four other gene sets for comparison, based on the HPRD and String networks. The random sets were sampled from the whole HGNC gene database 1000 times, with each sample containing 120 genes. Differences in the distributions of the four network properties between PGS and the random gene sets (or other gene sets) were assessed using a one-tailed KS test (triple asterisks, p-value < 0.001; double asterisks, p-value < 0.01; single asterisk, p-value < 0.05; n.s., not significant). In general, the four network properties differed significantly between PGS and the random gene sets. "Random" indicates the random gene sets; "OGS" indicates the other comparison gene sets, namely CA, MA, ES, and the housekeeping gene set (HK). Here, each PGS was treated as a separate individual, and “PGS & PGS” indicates the IED between one PGS and another.
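The random-set comparison in this caption can be sketched as follows. This record does not give the exact definitions of IAD and IED, so the sketch assumes IAD is the mean shortest-path hop distance over all gene pairs within a set, and IED is the mean hop distance between genes of two different sets. The ring-shaped toy network, the set sizes, and the sampling from network nodes (rather than from the HGNC database) are also illustrative assumptions.

```python
import random
from bisect import bisect_right
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def intra_set_distance(adj, genes):
    """ASSUMED definition: mean hop distance over all connected pairs in the set."""
    genes = [g for g in genes if g in adj]
    dists = []
    for i, u in enumerate(genes):
        reach = bfs_distances(adj, u)
        dists.extend(reach[v] for v in genes[i + 1:] if v in reach)
    return sum(dists) / len(dists) if dists else float("nan")


def inter_set_distance(adj, genes_a, genes_b):
    """ASSUMED definition: mean hop distance between genes of set A and set B."""
    targets = [g for g in genes_b if g in adj]
    dists = []
    for u in genes_a:
        if u in adj:
            reach = bfs_distances(adj, u)
            dists.extend(reach[v] for v in targets if v in reach)
    return sum(dists) / len(dists) if dists else float("nan")


def ks_one_sided(x, y):
    """One-sided two-sample KS statistic D+ = max_t [F_x(t) - F_y(t)].

    D+ is large when values in x tend to be smaller than values in y.
    """
    xs, ys = sorted(x), sorted(y)
    best = 0.0
    for t in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        best = max(best, fx - fy)
    return best


# Toy stand-in for a protein interaction network: a ring of 40 nodes.
n_nodes = 40
adj = {i: {(i - 1) % n_nodes, (i + 1) % n_nodes} for i in range(n_nodes)}

# Hypothetical "signature-like" sets: each one is a run of neighbouring nodes.
clustered_sets = [list(range(0, 5)), list(range(10, 15)), list(range(20, 25))]

# Random sets of the same size, drawn with a fixed seed for repeatability.
rng = random.Random(0)
nodes = list(adj)
random_sets = [rng.sample(nodes, 5) for _ in range(200)]

iad_clustered = [intra_set_distance(adj, s) for s in clustered_sets]
iad_random = [intra_set_distance(adj, s) for s in random_sets]
d_plus = ks_one_sided(iad_clustered, iad_random)
ied_example = inter_set_distance(adj, clustered_sets[0], clustered_sets[1])
print(iad_clustered, d_plus, ied_example)
```

The sketch stops at the test statistic. Turning it into a p-value would typically be left to a statistics library, for example a two-sample KS routine that accepts a one-sided alternative.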
Figure 5. Intersections of enriched gene ontology (GO) terms of network modules containing at least two PGS, obtained by functional enrichment analysis. The upper part of the figure shows the top ten GO terms (BP), and the lower part shows the only three GO terms (BP) found in total. Terms are sorted in ascending order of p-value. The p-values were estimated using Fisher’s test and adjusted using FDR, and for a GO term common to two modules the final p-value is the larger of the two. The top left of the figure also shows which PGS are included in these modules.
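The p-value handling described in this caption can be sketched in three steps: a one-sided Fisher (hypergeometric) enrichment test per term, an FDR adjustment, and taking the larger adjusted p-value for terms shared by two modules. The caption does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the term names and p-values are made up for illustration.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher/hypergeometric tail P(X >= k).

    k: annotated genes in the module, n: module size,
    K: annotated genes in the background, N: background size.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (ASSUMED FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def intersect_terms(module_a, module_b):
    """Terms enriched in both modules, scored by the larger adjusted p-value."""
    shared = module_a.keys() & module_b.keys()
    merged = {t: max(module_a[t], module_b[t]) for t in shared}
    return sorted(merged.items(), key=lambda item: item[1])


# Hypothetical raw p-values for four terms tested in one module.
terms = ["term_1", "term_2", "term_3", "term_4"]
raw_a = [0.01, 0.04, 0.03, 0.005]
adj_a = dict(zip(terms, benjamini_hochberg(raw_a)))

# Hypothetical adjusted p-values for a second module sharing two terms.
adj_b = {"term_1": 0.03, "term_4": 0.001, "term_9": 0.02}

final = intersect_terms(adj_a, adj_b)
print(final)
```

Taking the larger of the two p-values is the conservative choice: a shared term is only reported as significant if it passes the threshold in both modules.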