Pei-Wei Zhang, Lei Chen, Tao Huang, Ning Zhang, Xiang-Yin Kong, Yu-Dong Cai.
Abstract
As vast data sets of cancer genomes accumulate, more efficient and automated procedures are needed to classify cancer types and to identify a small set of essential genes that distinguish different cancers. Because protein expression is more stable than gene expression, we used data from reverse phase protein arrays (RPPA), a powerful and robust antibody-based high-throughput approach for targeted proteomics. In this study, we proposed a computational framework that classifies patient samples into ten major cancer types based on RPPA data using the SMO (Sequential Minimal Optimization) method. A careful feature selection procedure, combining mRMR (minimum Redundancy Maximum Relevance) feature selection and IFS (Incremental Feature Selection) on the training set, selected 23 important proteins from a total of 187. Using these 23 proteins, we classified the ten cancer types with an MCC (Matthews Correlation Coefficient) of 0.904 on the training set, evaluated by 10-fold cross-validation, and an MCC of 0.936 on an independent test set. We then analyzed the 23 proteins further. Most of them reflect hallmarks of cancer; Chk2, for example, plays an important role in the proliferation of cancer cells. This analysis lends credence to the importance of these proteins as indicators for cancer classification. We also believe our methods and findings may shed light on the discovery of specific biomarkers for different types of cancer.
Year: 2015 PMID: 25822500 PMCID: PMC4378934 DOI: 10.1371/journal.pone.0123147
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
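
The abstract reports classification quality as an MCC over ten classes. This record does not say which multiclass form of the MCC was used, so the sketch below assumes one common generalization (Gorodkin's R_K), computed from per-class label counts. The function name `multiclass_mcc` and the four-sample example are illustrative only and are not taken from the study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient (assumed R_K form)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)                            # true count per class
    p_k = Counter(y_pred)                            # predicted count per class
    numerator = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    denominator = math.sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # Convention assumed here: return 0.0 when the score is undefined,
    # e.g. when only one class appears in the labels or predictions.
    return numerator / denominator if denominator else 0.0


# Made-up labels for illustration: one BRCA sample is misclassified as OV.
example_true = ["BRCA", "BRCA", "BRCA", "OV"]
example_pred = ["BRCA", "BRCA", "OV", "OV"]
print(multiclass_mcc(example_true, example_pred))
```

For two classes this expression reduces to the familiar binary MCC, which is why it is a frequent choice for multiclass reporting.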
The ten types of cancers and their sample sizes.
| Cancer Type | Cancer Abbreviation | Cancer Name | Sample size | Number of training samples | Number of test samples |
|---|---|---|---|---|---|
| 1 | BLCA | Bladder Urothelial Carcinoma | 127 | 102 | 25 |
| 2 | BRCA | Breast invasive carcinoma | 747 | 598 | 149 |
| 3 | COAD/READ | Colon adenocarcinoma and Rectum adenocarcinoma | 464 | 371 | 93 |
| 4 | GBM | Glioblastoma multiforme | 215 | 172 | 43 |
| 5 | HNSC | Head and Neck squamous cell carcinoma | 212 | 170 | 42 |
| 6 | KIRC | Kidney renal clear cell carcinoma | 454 | 363 | 91 |
| 7 | LUAD | Lung adenocarcinoma | 237 | 190 | 47 |
| 8 | LUSC | Lung squamous cell carcinoma | 195 | 156 | 39 |
| 9 | OV | Ovarian serous cystadenocarcinoma | 412 | 330 | 82 |
| 10 | UCEC | Uterine Corpus Endometrioid Carcinoma | 404 | 323 | 81 |
| Total | | | 3467 | 2775 | 692 |
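
The per-type counts in the table are consistent with a roughly 80/20 training/test split applied within each cancer type. That ratio is inferred from the numbers and is not stated in this record. The short check below, with a hypothetical helper `check_splits`, verifies that each row adds up and that each training count equals 80% of the row total after rounding.

```python
# (abbreviation, sample size, training samples, test samples), copied from the table
SPLITS = [
    ("BLCA", 127, 102, 25),
    ("BRCA", 747, 598, 149),
    ("COAD/READ", 464, 371, 93),
    ("GBM", 215, 172, 43),
    ("HNSC", 212, 170, 42),
    ("KIRC", 454, 363, 91),
    ("LUAD", 237, 190, 47),
    ("LUSC", 195, 156, 39),
    ("OV", 412, 330, 82),
    ("UCEC", 404, 323, 81),
]


def check_splits(rows, train_fraction=0.8):
    """Return (total, train, test) sums after checking every row.

    train_fraction=0.8 is an assumption inferred from the table.
    """
    for abbr, total, train, test in rows:
        if train + test != total:
            raise ValueError(f"{abbr}: training + test does not equal sample size")
        if train != round(train_fraction * total):
            raise ValueError(f"{abbr}: training count is not ~{train_fraction:.0%} of total")
    return (
        sum(r[1] for r in rows),
        sum(r[2] for r in rows),
        sum(r[3] for r in rows),
    )


totals = check_splits(SPLITS)
print(totals)  # (3467, 2775, 692)
```
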
The top 23 important proteins for the classification of the 10 cancer types.
| Order | Name | Gene Name | Protein function and regulatory pathways |
|---|---|---|---|
| 1 | FASN | FASN | Fatty acid synthase (FASN) catalyzes the synthesis of long-chain fatty acids from acetyl-CoA and malonyl-CoA. Associated with poor prognosis in breast and prostate cancer. |
| 2 | Claudin-7 | CLDN7 | Claudins make up tight junction strands. |
| 3 | PR | PGR | Progesterone receptor, Transcription Factor |
| 4 | TIGAR | C12ORF5 | Regulates p53 tumor suppressor pathway and glycolysis |
| 5 | GATA3 | GATA3 | Transcription factor |
| 6 | NDRG1_pT346 | NDRG1 | A member of the NDRG family that functions in growth, differentiation, and cell survival |
| 7 | AR | AR | Androgen receptor (AR). Transcription Factor |
| 8 | PREX1 | PREX1 | Guanine nucleotide exchange factor acting downstream of heterotrimeric G proteins |
| 9 | PEA15_pS116 | PEA15 | Implicated in the regulation of multiple cellular processes including apoptosis, integrin activation, and insulin-sensitive glucose transport in insulin-responsive cells. Its activation is mediated through binding to multiple proteins, including ERK1&2, RSK2, Akt, FADD, and Caspase-8. |
| 10 | Cyclin_B1 | CCNB1 | Cyclin B1 regulates mitosis. Cyclin B1 levels rise during S phase and G2, and peak at mitosis. |
| 11 | ER-alpha | ESR1 | Estrogen receptor, Transcription Factor |
| 12 | AMPK_alpha | PRKAA1 | Involved in energy homeostasis regulation |
| 13 | Acetyl-a-Tubulin-Lys40 | | The cytoskeleton consists of three types of cytosolic fibers: microtubules, microfilaments (actin filaments), and intermediate filaments. Acetylation of α-tubulin at Lys40 is required for dynamic cell shape remodeling, cell motility, tubulin stability and terminal branching of cortical neurons |
| 14 | Rab-25 | RAB25 | A member of the Rab11 family with small Ras-like GTPase activity. Increased Rab25 expression is associated with aggressive growth in ovarian and breast cancer, where Rab25 may inhibit apoptosis and promote cancer cell proliferation and invasion through regulation of vesicle transport and cellular motility. |
| 15 | Chk2 | CHEK2 | Kinase acting downstream of ATM/ATR; involved in DNA damage checkpoint control, embryonic development, and tumor suppression |
| 16 | E-Cadherin | CDH1 | A member of the transmembrane glycoprotein superfamily that mediates calcium-dependent cell-cell adhesion and normal tissue development. |
| 17 | ACC1 | ACACA | Key enzyme in the biosynthesis and oxidation of fatty acids. Involved in energy homeostasis regulation |
| 18 | GAPDH | GAPDH | Glyceraldehyde 3-phosphate dehydrogenase |
| 19 | PKC-alpha_pS657 | PRKCA | PKC alpha is a ubiquitously expressed PKC isozyme that has been implicated in the regulation of a broad range of cellular functions |
| 20 | TFRC | TFRC | Transferrin receptor |
| 21 | Cyclin_E1 | CCNE1 | Cyclin E has been found to be associated with the transcription factor E2F in a temporally regulated manner. The cyclin E/E2F complex is detected primarily during the G1 phase of the cell cycle and decreases as cells enter S phase. E2F is known to be a critical transcription factor for expression of several S phase specific proteins. |
| 22 | CD20 | CD20 | A B-lymphocyte surface molecule present during the differentiation of B cells into plasma cells |
| 23 | GAB2 | GAB2 | A docking protein, which mainly mediates the interaction between receptor tyrosine kinases (RTKs) and non-RTK receptors. |
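
According to the abstract, the 23 proteins above were chosen by ranking all 187 proteins with mRMR and then applying IFS. A minimal sketch of the IFS step follows, assuming the ranking is already available and that some scoring function (for example, cross-validated MCC of a classifier) can evaluate a feature subset. The names `incremental_feature_selection` and `toy_score`, and the toy scores themselves, are hypothetical and are not the study's results.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
) -> Tuple[List[str], List[float]]:
    """Evaluate the top-k ranked features for k = 1..N and keep the best k.

    Returns the selected feature list and the score for every k.
    Ties are resolved in favor of the smaller subset.
    """
    best_k, best_score = 0, float("-inf")
    curve: List[float] = []
    for k in range(1, len(ranked_features) + 1):
        score = score_fn(ranked_features[:k])  # e.g. cross-validated MCC
        curve.append(score)
        if score > best_score:                 # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), curve


# Toy example: the first five table entries and made-up scores that peak at k = 3.
ranked = ["FASN", "Claudin-7", "PR", "TIGAR", "GATA3"]
TOY_SCORES = {1: 0.50, 2: 0.70, 3: 0.90, 4: 0.85, 5: 0.85}


def toy_score(subset: Sequence[str]) -> float:
    """Stand-in for a real evaluation; depends only on the subset size."""
    return TOY_SCORES[len(subset)]


selected, curve = incremental_feature_selection(ranked, toy_score)
print(selected)  # ['FASN', 'Claudin-7', 'PR']
```

In a real run, `score_fn` would train and evaluate the classifier on the training set only, so that the independent test set stays untouched during feature selection.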