Jian Liu, Yuhu Cheng, Xuesong Wang, Lin Zhang, Hui Liu.
Abstract
Early diagnosis of colorectal cancer is urgently needed. Some feature genes that are important to colorectal cancer development have been identified. However, for early-stage colorectal cancer, less is known about the specific cancer genes that are associated with advanced clinical stage. In this paper, we propose a feature extraction method named Optimal Mean based Block Robust Feature Extraction (OMBRFE) to identify feature genes associated with advanced clinical-stage colorectal cancer using integrated colorectal cancer data. First, based on the optimal mean and the L2,1-norm, a novel feature extraction method called Optimal Mean based Robust Feature Extraction (OMRFE) is proposed to identify feature genes. Then the OMBRFE method, which introduces the block idea into OMRFE, is put forward to process the integrated colorectal cancer data, which comprise multiple types of genomic data: copy number alterations, somatic mutations, methylation expression alterations, and gene expression changes. Experimental results demonstrate that OMBRFE is more effective than previous methods in identifying feature genes. Moreover, the genes identified by OMBRFE are verified to be closely associated with advanced clinical-stage colorectal cancer.
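The abstract names the L2,1-norm as one of the two ingredients of OMRFE. As a minimal sketch (not the authors' implementation; the matrices are hypothetical and rows are assumed to stand for samples), the snippet below contrasts the L2,1-norm with the squared L2 (Frobenius) norm. An outlying row raises the L2,1-norm linearly but the squared norm quadratically, which is the usual motivation for calling L2,1-based losses robust.

```python
import math

def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (rows assumed to be samples)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)

def squared_frobenius_norm(matrix):
    """Sum of all squared entries, i.e. the usual least-squares loss."""
    return sum(v * v for row in matrix for v in row)

# Hypothetical residual matrices: the second one has its last row scaled by 10.
clean = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
with_outlier = [[1.0, 0.0], [0.0, 1.0], [30.0, 40.0]]

for name, m in (("clean", clean), ("with outlier", with_outlier)):
    print(name, l21_norm(m), squared_frobenius_norm(m))
```

On these matrices the L2,1-norm goes from 7 to 52, while the squared Frobenius norm goes from 27 to 2502, so the single outlying row dominates the squared loss far more strongly.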
Year: 2017 PMID: 28819308 PMCID: PMC5561268 DOI: 10.1038/s41598-017-08881-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The graphical depiction of gene identification using OMRFE.
Figure 2. The OMBRFE model for cancer gene identification from colorectal cancer integrated data.
Figure 3. The identification accuracies of OMRFE with different values of l.
Figure 4. The identification accuracies of OMRFE, RFE and FE, where FE is the feature extraction method with the L2-norm, RFE is the robust feature extraction method with the L2,1-norm, and OMRFE is the robust feature extraction method with the L2,1-norm and an optimal mean removed. NSR denotes the noise-to-signal ratio.
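The Figure 4 caption distinguishes OMRFE from RFE by the removal of an optimal mean. This section does not give the authors' definition of that mean, so the following is only an illustration of the general idea under a simplifying assumption: when residuals are measured with unsquared Euclidean norms, the center that minimizes their sum is the geometric median, not the arithmetic mean. The Weiszfeld iteration below is a standard algorithm swapped in for illustration; the function names and sample points are hypothetical.

```python
import math

def arithmetic_mean(points):
    """Coordinate-wise average; minimizes the sum of squared distances."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def geometric_median(points, iterations=200, eps=1e-12):
    """Weiszfeld iteration; minimizes the sum of unsquared distances."""
    center = arithmetic_mean(points)
    for _ in range(iterations):
        # Each point is weighted by the inverse of its distance to the center;
        # eps guards against division by zero if the center hits a data point.
        weights = [1.0 / max(distance(p, center), eps) for p in points]
        total = sum(weights)
        center = [sum(w * p[j] for w, p in zip(weights, points)) / total
                  for j in range(len(center))]
    return center

# Three nearby samples and one gross outlier (hypothetical data).
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
mean_center = arithmetic_mean(samples)      # pulled toward the outlier
robust_center = geometric_median(samples)   # stays near the three inliers
print(mean_center, robust_center)
```

The arithmetic mean lands at (25.25, 25.25), whereas the robust center stays close to (0.5, 0.5). In the paper's setting the mean is presumably optimized together with the feature extraction itself, which this sketch does not attempt.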
The top 10 GO terms corresponding to genes identified by different methods. The method columns named in the table header are OMBRFE, OMRFE, CRPCA-OM, RPCA, SPCA and PMD; the Input and PV (p-value) entries of each row are listed in their original left-to-right order.
| Rank | Name | Input | PV | Genes in Genome |
|---|---|---|---|---|
| 1 | Tissue development | 74, 72, 74, 63, 74 | 1.19E-15, 8.85E-14, 2.67E-15, 8.84E-12, 7.13E-15 | 1794 |
| 2 | Cell development | 76, 69, 75, 66, None | 1.59E-14, 1.74E-10, 1.10E-13, 1.97E-11 | 1970 |
| 3 | Regulation of developmental process | 77, 71, 78, 75, 72 | 9.70E-16, 5.84E-12, 6.73E-16, 1.13E-16, 1.69E-12 | 1912 |
| 4 | Regulation of multicellular organismal development | 74, 60, 73, 63, 60 | 1.74E-20, 8.39E-12, 1.75E-19, 1.04E-15, 7.23E-12 | 1469 |
| 5 | Positive regulation of gene expression | 68, 60, 65, 59, 66 | 4.59E-19, 1.31E-13, 6.85E-17, 2.52E-15, 4.38E-17 | 1332 |
| 6 | Positive regulation of nucleobase-containing compound metabolic process | 66, 61, 64, 59, 66 | 5.28E-16, 1.45E-12, 1.42E-14, 9.94E-14, 2.71E-15 | 1448 |
| 7 | Regulation of cell differentiation | 62, 61, 65, 64, 57 | 2.22E-14, 3.94E-13, 9.37E-16, 3.04E-17, 3.55E-11 | 1405 |
| 8 | Positive regulation of nitrogen compound metabolic process | 66, 63, 64, 61, 66 | 1.76E-15, 4.03E-13, 4.45E-14, 2.28E-14, 2.52E-15 | 1484 |
| 9 | Positive regulation of transcription, DNA-templated | 62, 57, 60, 56, 67 | 3.11E-17, 1.43E-13, 1.02E-15, 3.45E-15, 3.11E-15 | 1221 |
| 10 | Positive regulation of cellular biosynthetic process | 65, 66, 63, 62, 65 | 4.49E-14, 7.59E-14, 9.61E-13, 4.17E-14, 5.57E-15 | 1547 |
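This section does not state how the PV entries were computed. A common choice for GO term enrichment is a one-sided hypergeometric test, sketched below as an assumption rather than as the authors' procedure. It treats "Genes in Genome" as the number of genes annotated with the term and "Input" as the number of identified genes carrying that annotation; the population size and the size of the identified gene list are hypothetical placeholders, so the result is not expected to reproduce the table.

```python
from fractions import Fraction
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: total number of genes considered
    annotated:  genes in the population annotated with the GO term
    sample:     size of the identified gene list
    hits:       identified genes annotated with the GO term
    """
    upper = min(annotated, sample)
    # Count the ways to draw at least `hits` annotated genes, using exact integers.
    tail = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(hits, upper + 1))
    return float(Fraction(tail, comb(population, sample)))

# 74 and 1794 are taken from the first table row; 20000 and 500 are hypothetical.
p = enrichment_p_value(population=20000, annotated=1794, sample=500, hits=74)
print(f"{p:.3e}")
```

Exact integer arithmetic avoids the underflow that floating-point binomial coefficients would hit at these sizes; a statistics library would normally be used instead.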
Figure 5. Venn diagram of the feature genes identified by OMBRFE and Elastic Net.
The top 20 genes of OMBRFE unique, Elastic Net unique and the overlapping portions of OMBRFE and Elastic Net.
| OMBRFE unique | Overlap | Elastic Net unique |
|---|---|---|
| APC, RUNX3, MSX1, RB1, NRAS, EDNRB, KRAS, OBSCN, MLH1, CACNA1G, PTEN, GPC6, PDE4D, CARD11, RNF213, CCND1, WBSCR17, SOCS2, CSMD1. | GNAS, WT1, MGMT, DIRAS3, TTN, PKD2L1, JAKMIP1, NTRK1, SEMA3B, WRN, BCL2, PLAGL1, PPP2R2C, DMD, RHD, CCND2, PLEKHA4, PIK3R1, PRDM16, FCRL4. | SYK, DDX5, ADRA2C, HSD17B2, HIST1H4I, FOXP4, REEP5, PDK4, OR51E2, S100P, HIP1, ZNF570, SDHC, DDIT3, CRTC1, SLC22A11, CYP26B1, GPR125, TNFAIP3, CATSPER4. |
The detailed information of the top 20 genes identified by OMBRFE.
| NO. | Gene Symbol | Location | Function of Genes |
|---|---|---|---|
| 1 | GNAS | 20q13.3 | It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Colloid carcinoma associated with intraductal papillary mucinous neoplasms and its intestinal-type preinvasive precursor are associated with high frequencies of GNAS mutations. |
| 2 | APC | 5q21-q22 | This gene encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway. It is also involved in other processes including cell migration and adhesion, transcriptional activation, and apoptosis. |
| 3 | WT1 | 11p13 | This gene encodes a transcription factor that contains four zinc-finger motifs at the C-terminus and a proline/glutamine-rich DNA-binding domain at the N-terminus. WT1 is a major regulator of tumor angiogenesis and progression. |
| 4 | MGMT | 10q26 | Alkylating agents are potent carcinogens that can result in cell death, mutation and cancer. The protein encoded by this gene is a DNA repair protein that is involved in cellular defense against mutagenesis and toxicity from alkylating agents. |
| 5 | RUNX3 | 1p36 | This gene encodes a member of the runt domain-containing family of transcription factors. It functions as a tumor suppressor, and the gene is frequently deleted or transcriptionally silenced in cancer. |
| 6 | DIRAS3 | 1p31 | This gene encodes a member of the ras superfamily. It is an imprinted gene with monoallelic expression of the paternal allele, which is associated with growth suppression. The encoded protein may also play a role in autophagy in certain cancer cells by regulating the autophagosome initiation complex. |
| 7 | MSX1 | 4p16.2 | This gene encodes a member of the muscle segment homeobox gene family. The encoded protein functions as a transcriptional repressor during embryogenesis through interactions with components of the core transcription complex and other homeoproteins. |
| 8 | RB1 | 13q14.2 | The protein encoded by this gene is a negative regulator of the cell cycle and was the first tumor suppressor gene found. The encoded protein also stabilizes constitutive heterochromatin to maintain the overall chromatin structure. |
| 9 | TTN | 2q31 | This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. DNA sequence analysis of patients with dilated cardiomyopathy shows that genetic variation in the TTN gene contributes to 14% of the cases. |
| 10 | NRAS | 1p13.2 | This is an N-ras oncogene encoding a membrane protein that shuttles between the Golgi apparatus and the plasma membrane. Mutations in this gene have been associated with somatic rectal cancer, follicular thyroid cancer, autoimmune lymphoproliferative syndrome, Noonan syndrome, and juvenile myelomonocytic leukemia. |
| 11 | EDNRB | 13q22 | The protein encoded by this gene is a G protein-coupled receptor which activates a phosphatidylinositol-calcium second messenger system. Its ligand, endothelin, consists of a family of three potent vasoactive peptides: ET1, ET2, and ET3. Studies suggest that the multigenic disorder, Hirschsprung disease type 2, is due to mutations in the endothelin receptor type B gene. |
| 12 | KRAS | 12p12.1 | This gene, a Kirsten ras oncogene homolog from the mammalian ras gene family, encodes a protein that is a member of the small GTPase superfamily. The transforming protein that results is implicated in various malignancies, including lung adenocarcinoma, mucinous adenoma, ductal carcinoma of the pancreas and colorectal carcinoma. |
| 13 | OBSCN | 1q42.13 | The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. |
| 14 | PKD2L1 | 10q24 | This gene encodes a member of the polycystin protein family. The encoded protein contains multiple transmembrane domains, and cytoplasmic N- and C-termini. The protein may be an integral membrane protein involved in cell-cell/matrix interactions. |
| 15 | MLH1 | 3p21.3 | This gene was identified as a locus frequently mutated in hereditary nonpolyposis colon cancer (HNPCC). It is a human homolog of the E. coli DNA mismatch repair gene mutL, consistent with the characteristic alterations in microsatellite sequences (RER+ phenotype) found in HNPCC. |
| 16 | CACNA1G | 17q22 | Voltage-sensitive calcium channels mediate the entry of calcium ions into excitable cells, and are also involved in a variety of calcium-dependent processes, including muscle contraction, hormone or neurotransmitter release, gene expression, cell motility, cell division, and cell death. This gene encodes a T-type, low-voltage activated calcium channel. The function of T-type channels is important for the proliferation of human ovarian cancer cells. |
| 17 | PTEN | 10q23.3 | This gene was identified as a tumor suppressor that is mutated in a large number of cancers at high frequency. The protein encoded by this gene is a phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase. |
| 18 | JAKMIP1 | 4p16.1 | This gene encodes Janus kinase and microtubule interacting protein 1. Overexpression of JAKMIP1 is associated with Wnt/β-catenin pathway activation and promotes cancer cell proliferation. |
| 19 | NTRK1 | 1q21-q22 | This gene encodes a member of the neurotrophic tyrosine kinase receptor (NTKR) family. The presence of this kinase leads to cell differentiation and may play a role in specifying sensory neuron subtypes. Mutations in this gene have been associated with congenital insensitivity to pain, anhidrosis, self-mutilating behavior, mental retardation and cancer. |
| 20 | GPC6 | 13q32 | The glypicans comprise a family of glycosylphosphatidylinositol-anchored heparan sulfate proteoglycans, and they have been implicated in the control of cell growth and cell division. The glypican encoded by this gene is a putative cell surface coreceptor for growth factors, extracellular matrix proteins, proteases and anti-proteases. |