| Literature DB >> 23799036 |
Feng-Hsiang Chung1, Henry Hsin-Chung Lee, Hoong-Chien Lee.
Abstract
Significantly expressed genes extracted from microarray gene expression data have proved very useful for identifying genetic biomarkers of diseases, including cancer. However, deriving a disease related inference from a list of differentially expressed genes has proven less than straightforward. In a systems disease such as cancer, how genes interact with each other should matter just as much as the level of gene expression. Here, in a novel approach, we used the network and disease progression properties of individual genes in state-specific gene-gene interaction networks (GGINs) to select cancer genes for human colorectal cancer (CRC) and obtain a much higher hit rate of known cancer genes when compared with methods not based on network theory. We constructed GGINs by integrating gene expression microarray data from multiple states--healthy control (Nor), adenoma (Ade), inflammatory bowel disease (IBD) and CRC--with protein-protein interaction database and Gene Ontology. We tracked changes in the network degrees and clustering coefficients of individual genes in the GGINs as the disease state changed from one to another. From these we inferred the state sequences Nor-Ade-CRC and Nor-IBD-CRC both exhibited a trend of (disease) progression (ToP) toward CRC, and devised a ToP procedure for selecting cancer genes for CRC. Of the 141 candidates selected using ToP, ∼50% had literature support as cancer genes, compared to hit rates of 20% to 30% for standard methods using only gene expression data. Among the 16 candidate cancer genes that encoded transcription factors, 13 were known to be tumorigenic and three were novel: CDK1, SNRPF, and ILF2. We identified 13 of the 141 predicted cancer genes as candidate markers for early detection of CRC, 11 and 2 at the Ade and IBD states, respectively.Entities:
Mesh:
Substances:
Year: 2013 PMID: 23799036 PMCID: PMC3683052 DOI: 10.1371/journal.pone.0065683
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1ToP procedure flow chart for selecting of CRC cancer genes.
DEG, differentially expressed gene; PPIN, protein-protein interaction network. Boxes in the right-most column illustrate how the predicted tumorigenic gene CDC6 satisfies the ToP criteria: the gene-gene interaction sub-network associated with it grows markedly as the state progress from normal through adenoma to CRC.
Figure 2Number of genes and gene-pair interactions in networks as functions of Pearson p-value.
Number of genes (A) and gene-pair interactions (B) in the disease specific networks, as functions of Pearson p-value threshold, p 0, in the 8-sample gene-networks of patients belonging to the four state-types: Nor, Ade, IBD, and CRC. Non-Nor results are averaged over 100 random 8-sample sets. Error bars indicate standard deviations. Asterisks above (below) the curves give p-values of two-sample Student’s t-test between CRC and IBD (CRC and Nor): * p-value<10−4; ** p-value<10−8; *** p-value<10−12; **** p-value<10−16.
Structural parameters for the four gene-gene interaction networks* (GGINs).
| Network | No. of nodes | No. of edges | Mean degree < | Power-law exponent of degree distribution | Mean clustering coefficient |
| Nor | 1436 | 1215 | 1.69 | −2.75 | 0.0458 |
| Ade | 1801 | 2281 | 2.53 | −2.23 | 0.0904 |
| IBD | 2478 | 3457 | 2.79 | −2.22 | 0.0922 |
| CRC | 2318 | 4988 | 4.30 | −1.85 | 0.1266 |
For Pearson p-value p 0 = 0.01. For disease networks numbers given are averaged over 100 8-sample networks.
Figure 3Percentage of genes in a given range of clustering coefficient plotted as a function of degrees in the Nor, Ade, IBD and CRC networks.
Genes of degree 1 are not shown. The clustering coefficient of a gene of degree 2 is either 0 or 1. Asterisks indicate p-values (by Wilcoxon rank sum tests) relative to Nor: * p-value <0.05; ** p-value <0.01.
Figure 4Function-function networks.
Nodes are functional modules named after Gene Ontology terms. Functional modules containing less than 70 genes are not shown. The diameter of a module scales with the logarithm of the number of genes in the module. The color shade of a module indicates the number of intra-module gene-gene interactions per gene. The thickness of the edge indicates the number of inter-module gene-gene interactions.
Figure 5Percentage overlaps of functional modules.
0 For a given functional module, the percentage overlap is expressed as the ration of the number of links (belonging to that module) common to the two networks to the number of links in the smaller partner. Asterisks indicate p-values from one-sample Student’s t-test of the Ade-CRC intersection versus the other five intersections: for *, **, and ***, p-value<10−2, 10−3, and 10−4, respectively.
Figure 6Examples of changes in partial gene networks connected to cancer genes.
Partial networks to which the four ToP genes ILF2 (top left), CDK1 (bottom left), SNRPF (top right), and MCM10 (bottom right) separately belong in the Nor, Ade, IBD and CRC networks. In each case, the size of the module connected to the ToP gene increases along the state sequence Nor-Ade-CRC or Nor-IBD-CRC, or both. Nodal trim color code: over-expression, red; under-expression, blue; neutral, black. Nodal color code for GO functions: cell cycle, green; RNA splicing, purple; DNA repair, brown; chromatin remodelling and histone modification, yellow.
Figure 7Results from 1000 randomization tests (white box) and in actual cases (gray box).
Randomization tests are type-1 for eBayes and SAM, and type-2 for ToP and ToP+SAM (see Methods). (A) Number of genes selected. (B) Percentage of genes listed in CancerGenes [40] database among those selected in (A). ***, p-value <0.001 for permutation test by randomization; **, p-value <0.01; *, p-value <0.05.
Figure 8Percentages of selected genes listed inCancerGenes [ and gene coding transcription factors (TFs).
Non-tumor TF means not listed in CancerGenes. (A) In gene set selected by statistical threshold. (B) In top 134 genes in gene sets. Numbers given above bars indicate total number genes in set.
Gene ontology enrichment analysis for predicted cancer genes.
| Gene Ontology | Class | Genes (%) |
| Adjusted |
| Nuclear lumen | CC | 71 (51%) | 2.60e-33 | 6.40e-31 |
| Cell cycle | BP | 46 (33%) | 1.40e-23 | 9.50e-21 |
| Nucleoside binding | MF | 47(34%) | 1.50e-12 | 2.50e-10 |
The 48 genes among the 141 predicted cancer genes known in the literature as diagnostic or prognostic markers for CRC, or are reported to be associated with them (Table S5).
| Gene symbol | Gene name | No. of researchpapers | Diagnosticmarkers | Prognostic markers |
| Sustaining proliferative signaling | ||||
| AURKB | aurora kinase B | 3 | V | |
| BUB1B | budding uninhibited by benzimidazoles 1 homolog beta (yeast) | 4 | V | |
| CCND1 | cyclin D1 | 3 | ||
| CDC25A | cell division cycle 25 homolog A (S. pombe) | 4 | V | |
| CDC25B | cell division cycle 25 homolog B (S. pombe) | 2 | V | |
| CDK1 | cyclin-dependent kinase 1 | 4 | V | |
| CDK2 | cyclin-dependent kinase 2 | >20 | ||
| CDK4 | cyclin-dependent kinase 4 | 6 | V | |
| CDK8 | cyclin-dependent kinase 8 | 3 | V | |
| CENPA | centromere protein A | 3 | ||
| E2F5* | E2F transcription factor 5, p130-binding | 4 | ||
| HMGA1 | high mobility group AT-hook 1 | 2 | ||
| MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast) | 2 | V | |
| MKI67 | antigen identified by monoclonal antibody Ki-67 | 4 | V | |
| MYC* | v-myc myelocytomatosis viral cancer gene homolog (avian) | 5 | V | |
| PML | promyelocytic leukemia | 3 | V | |
| PLK1 | polo-like kinase 1 (Drosophila) | 3 | V | |
| PTPN11 | protein tyrosine phosphatase, non-receptor type 11 | 3 | ||
| SKP2 | S-phase kinase-associated protein 2 (p45) | 2 | V | |
| TUBB | tubulin, beta | 5 | ||
| Resisting cell death | ||||
| BCL2* | B-cell CLL/lymphoma 2 | 7 | V | |
| BIRC5 | baculoviral IAP repeat-containing 5 | >20 | V | |
| KAT2B | K(lysine) acetyltransferase 2B | 1 | ||
| HSPH1 | heat shock 105 kDa/110 kDa protein 1 | 3 | V | |
| RFC2* | replication factor C (activator 1) 2, 40 kDa | 1 | ||
| TRAP1 | TNF receptor-associated protein 1 | 3 | ||
| Inducing agiogenesis | ||||
| PECAM1 | platelet/endothelial cell adhesion molecule | 2 | ||
| MMP2 | matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase,72 kDa type IV collagenase) | 7 | V | |
| MMP9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase,92 kDa type IV collagenase) | 16 | V | |
| Activating invasion and metastasis | ||||
| CEBPB | CCAAT/enhancer binding protein (C/EBP), beta | 3 | ||
| CSE1L | CSE1 chromosome segregation 1-like (yeast) | 5 | V | |
| PLAU | plasminogen activator, urokinase | 4 | V | |
| PSAT1 | phosphoserine aminotransferase 1 | 1 | ||
| SNRPF | small nuclear ribonucleoprotein polypeptide F | 3 | ||
| SPARC | secreted protein, acidic, cysteine-rich (osteonectin) | 7 | V | |
| TGFB1* | transforming growth factor, beta 1 | 11 | V | |
| Enabling replicative immortality | ||||
| PARP1 | poly (ADP-ribose) polymerase 1 | 1 | V | |
| TOP2A | topoisomerase (DNA) II alpha 170 kDa | 2 | ||
| Epigenetic switching | ||||
| EZH2 | enhancer of zeste homolog 2 (Drosophila) | 4 | V | |
| HDAC2 | histone deacetylase 2 | 9 | V | |
| NAT10 | N-acetyltransferase 10 (GCN5-related) | 13 | ||
| PRMT1 | protein arginine methyltransferase 1 | 2 | V | |
| Hyperactivation of fatty acid synthase | ||||
| FASN | fatty acid synthase | 3 | V | |
| Genome Instability and Mutation | ||||
| ATR | ataxia telangiectasia and Rad3 related | 3 | ||
| MRE11A | MRE11 meiotic recombination 11 homolog A (S. cerevisiae) | 4 | ||
| MSH2 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) | >20 | ||
| Deregulating cellular energetics | ||||
| PKM2 | pyruvate kinase, muscle | 4 | V | |
| Avoiding immune destruction | ||||
| ILF2 | interleukin enhancer binding factor 2, 45 kDa | 2 | ||
The sixteen transcription factors predicted to be cancer genes in this study.
| TF | Degree | Clustering coefficient | Student’s | Student’s | OMIM | Listed in | |||||||
| Nor | Ade/IBD | CRC | Nor | Ade/IBD | CRC |
| fold change |
| fold change | ||||
| Ade only | |||||||||||||
| +CDK2 | 1 | 22 | 48 | 0 | 0.05 | 0.05 | 5.58e-03 | 1.51 | 6.88e-02 | 1.33 | 116953 | YES | |
| +EZH2 | 0 | 6 | 11 | 0 | 0.07 | 0.02 | 1.63e-02 | 1.71 | 7.74e-02 | 1.46 | 601573 | YES | |
| +HDAC2 | 0 | 3 | 35 | 0 | 0 | 0.07 | 1.47e-05 | 1.53 | 3.54e-02 | 1.33 | 605164 | YES | |
| +HMGA1 | 0 | 0 | 6 | 0 | 0 | 0.13 | <1e-06 | 1.78 | 9.47e-02 | 1.25 | 600701 | YES | |
| +KAT2B | 2 | 3 | 8 | 0 | 0 | 0 | 1.35e-03 | 0.55 | 1.60e-01 | 0.82 | 602303 | YES | |
| SUPT16H | 1 | 5 | 8 | 0 | 0.4 | 0.32 | 5.39e-05 | 1.58 | 6.46e-02 | 1.21 | 605012 | YES | |
| TRIM28 | 0 | 4 | 8 | 0 | 0.17 | 0.11 | 1.47e-05 | 1.56 | 4.79e-02 | 1.33 | 601742 | YES | |
| YEATS4 | 0 | 3 | 12 | 0 | 0.67 | 0.68 | 1.02e-03 | 1.68 | 5.50e-02 | 1.48 | 602116 | YES | |
| +CDK1 | 21 | 39 | 68 | 0.01 | 0.03 | 0.01 | 3.25e-03 | 2.27 | 5.34e-02 | 1.92 | 116940 | NO | |
| +ILF2 | 1 | 7 | 12 | 0 | 0.29 | 0.64 | <1e-06 | 1.83 | 1.58e-02 | 1.49 | 603181 | NO | |
| +SNRPF | 1 | 10 | 20 | 0 | 0.27 | 0.38 | 4.28e-04 | 1.52 | 6.97e-02 | 1.37 | 603541 | NO | |
| Ade & IBD | |||||||||||||
| +CEBPB | 1 | 1/6 | 7 | 0 | 0/0 | 0.05 | 3.24e-04/7.92e-05 | 1.80/1.20 | 5.15e-03 | 2.48 | 189965 | YES | |
| +E2F5 | 1 | 1/0 | 7 | 0 | 0/0 | 0.29 | <1e-06/1.76e-03 | 1.99/0.55 | 5.56e-03 | 1.89 | 600967 | YES | |
| +MYC | 1 | 0/1 | 21 | 0 | 0/0 | 0.03 | <1e-06/3.1e-01 | 3.05/0.13 | 2.26e-02 | 2.14 | 190080 | YES | |
| RUVBL1 | 0 | 2/0 | 17 | 0 | 1/0 | 0.26 | <1e-06/2.8e-03 | 2.10/0.41 | 8.37e-03 | 1.78 | 603449 | YES | |
| IBD only | |||||||||||||
| +PML | 0 | 0 | 11 | 0 | 0 | 0.05 | <1e-06 | 0.85 | 2.00e-02 | 0.42 | 102578 | YES | |
In each case, the degree of the TF increases in the progression Nor to Ade to CRC. TF’s in the first column marked by “+” have been reported in the literature as being associated with CRC and appear in Table 3.
Predicted candidate diagnostic markers for early detection of CRC in the Ade or IBD state.
| Gene | Degree | Clustering coefficient | Student’s | Student’s | ANOVA |
| |||||||
| Nor | Ade | CRC | Nor | Ade | CRC |
| fold change |
| fold change | ||||
| Ade | |||||||||||||
| *SUPT16H | 1 | 5 | 8 | 0 | 0.4 | 0.32 | 5.39e-05 | 1.58 | 6.46e-02 | 1.21 | 1.00e-05 | YES | |
| #PRMT5 | 0 | 6 | 8 | 0 | 0.33 | 0.25 | <1e-06 | 1.78 | 2.77e-02 | 1.34 | 5.61e-06 | YES | |
| NOLC1 | 1 | 9 | 28 | 0 | 0.78 | 0.57 | 2.85e-05 | 1.75 | 2.30e-02 | 1.44 | 4.22e-05 | NO | |
| #$PSAT1 | 0 | 5 | 18 | 0 | 0.4 | 0.67 | <1e-06 | 6.15 | 2.70e-03 | 5.62 | 3.54e-05 | NO | |
| CCT7 | 1 | 9 | 18 | 0 | 0.56 | 0.41 | 8.43e-05 | 1.54 | 2.58e-02 | 1.4 | 1.05e-03 | NO | |
| CCT4 | 0 | 7 | 17 | 0 | 0.71 | 0.46 | <1e-06 | 1.57 | 1.62e-02 | 1.45 | 1.25e-04 | NO | |
| *#ILF2 | 1 | 7 | 12 | 0 | 0.29 | 0.64 | <1e-06 | 1.83 | 1.58e-02 | 1.49 | 2.13e-05 | NO | |
| $CCT3 | 0 | 5 | 11 | 0 | 0.6 | 0.67 | <1e-06 | 1.84 | 1.45e-02 | 1.55 | 1.81e-04 | NO | |
| DARS | 0 | 6 | 10 | 0 | 0.33 | 0.47 | 4.89e-05 | 1.62 | 3.71e-02 | 1.43 | 1.37e-03 | NO | |
| CCT8 | 1 | 7 | 9 | 0 | 0.76 | 1 | 1.91e-05 | 1.51 | 1.43e-02 | 1.42 | 4.56e-04 | NO | |
| GEMIN6 | 0 | 5 | 9 | 0 | 0.5 | 0.72 | <1e-06 | 1.56 | 1.91e-02 | 1.39 | 5.28e-04 | NO | |
| IBD | |||||||||||||
| $#*CEBPB | 1 | 6 | 7 | 0 | 0.00 | 0.05 | 7.92e-05 | 1.20 | 5.15e-03 | 1.31 | 1.01E-03 | YES | |
| $#PLAU | 0 | 5 | 6 | 0 | 0.00 | 0.07 | 3.22e-05 | 1.75 | 7.44e-03 | 1.43 | 7.76E-05 | YES | |
A biomarker for early detection in the Ade state is not a DEG in the IBD state, and vice versa. The hash and asterisk superscripts indicate the gene also appears in Tables 3 and 4, respectively. Genes with a $ superscript are common to the ToP+SAM lists for the Ade and IBD sequences; genes without are from the Ade sequence only.
Types of cancer genes contributed by the Ade and IBD state-sequences.
| From TPS list generated by | Predicted cancer genes | In | Reported in literature as CRC-related | TFs | Markers for early detection (state of detection) |
| Ade sequence | 67 | 27 | 15 | 11 | 9 (Ade) |
| Both Ade & IBD sequences | 67 | 39 | 32 | 4 | 2 (Ade) 2 (IBD) |
| IBD sequence | 7 | 1 | 1 | 1 | 0 |
| Total | 141 | 67 | 48 | 16 | 13 |