Austin H Chen, Yin-Wu Tsau, Ching-Heng Lin.
Abstract
BACKGROUND: High-throughput microarray experiments now permit researchers to screen thousands of genes simultaneously and to determine the differing expression levels of genes in normal and cancerous tissues. In this paper, we address the challenge of selecting a relevant and manageable subset of genes from a large microarray dataset. Currently, most gene selection methods focus on identifying a set of genes that can further improve classification accuracy. Few, if any, of the genes in these small sets, however, are biologically relevant (i.e., supported by medical evidence). To deal with this critical issue, we propose two novel methods that can identify biologically relevant cancer genes.
Year: 2010 PMID: 20433712 PMCID: PMC2873479 DOI: 10.1186/1471-2164-11-274
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The flowchart of the random forest gene selection method.
Figure 2. The hyperplane and the support vectors. The red line is the hyperplane that separates the two classes of samples. The blue points are the support vectors. A is a correctly classified sample that has little influence on the hyperplane. B is a correctly classified sample that influences the hyperplane. C is an incorrectly classified sample.
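The three cases in Figure 2 can be told apart by the signed margin y(w·x + b) of each sample. The sketch below is illustrative only: the weight vector `w`, the offset `b`, the sample coordinates, and the function names are made-up assumptions, not part of the paper's method. It relies on the standard soft-margin SVM property that samples with a signed margin of at most 1 are the ones that determine the hyperplane.

```python
def signed_margin(w, b, x, y):
    """Return y * (w . x + b) for a label y in {-1, +1}."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def categorize(w, b, x, y, tol=1e-9):
    """Map a sample to the A/B/C cases described in the Figure 2 caption."""
    m = signed_margin(w, b, x, y)
    if m <= 0:
        return "C"  # on the wrong side of (or exactly on) the hyperplane
    if m <= 1 + tol:
        return "B"  # correctly classified, on or inside the margin
    return "A"      # correctly classified, beyond the margin


# Hypothetical hyperplane x1 = 0 and three hypothetical samples.
w, b = (1.0, 0.0), 0.0
samples = {
    "far": ((3.0, 1.0), +1),     # signed margin 3.0
    "edge": ((1.0, 2.0), +1),    # signed margin 1.0
    "wrong": ((0.5, 0.0), -1),   # signed margin -0.5
}
labels = {name: categorize(w, b, x, y) for name, (x, y) in samples.items()}
```

In a soft-margin SVM, misclassified samples such as C also contribute to the solution; the caption separates them from B only to distinguish correct from incorrect classification.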
Parameter settings in SVM for SVST method.
| Parameter | Setting |
|---|---|
| Kernel Type | Linear |
| Gamma [Default: 1/(# of genes)] | 1/7200 for leukemia; 1/12600 for prostate cancer |
| Cost | 1 |
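As a rough illustration of the table above, the settings can be written down as plain data. The dictionary layout and the helper name `default_gamma` are assumptions made for this sketch and do not correspond to any particular SVM library's API; only the values (linear kernel, gamma = 1/(number of genes), cost = 1) come from the table.

```python
def default_gamma(n_genes):
    """Default gamma from the table: 1 / (number of genes)."""
    return 1.0 / n_genes


# Hypothetical settings layout; the gene counts are implied by the gamma values.
svm_settings = {
    "leukemia": {"kernel": "linear", "gamma": default_gamma(7200), "cost": 1},
    "prostate": {"kernel": "linear", "gamma": default_gamma(12600), "cost": 1},
}
```

Note that the linear kernel K(x, z) = x·z contains no gamma term, so with this kernel the gamma value should not affect the fitted hyperplane.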
The biologically relevant genes found in leukemia.
| | SNR | t-Test | LSD | TNoM | MDMR | WEPO | RFGS* | SVST |
|---|---|---|---|---|---|---|---|---|
| Gene1 | ZYX | SNRPD1 | SNRPD1 | KLHDC10 | ZYX | PTMA | MGST1 | ZYX |
| Gene2 | TCF3 | PRPF18 | LAMP2 | BTG2 | APLP2 | CXCR4 | CD63 | TCF3 |
| Gene3 | CCND3 | LAMP2 | PRPF18 | CD68 | MGST1 | IFITM3 | SERPING1 | CD33 |
| Gene4 | CST3 | PRKCI | PRKCI | EIF4A1 | CSTA | ADA | QSOX1 | CD63 |
| Gene5 | CD33 | MSH2 | GTF2E2 | PFKL | CD63 | RPL23A | APLP2 | TCRA |
| Gene6 | CD79A | GTF2E2 | MSH2 | LIPE | CTSD | | PLCB2 | SPTAN1 |
| Gene7 | SPTAN1 | DCK | ALCAM | | LYN | | POU2AF1 | MPO |
| Gene8 | Macmarcks | | | | CLU | | CTSD | CST3 |
| Gene9 | | | | | FAH | | ACADM | HOXA9 |
| Gene10 | | | | | PLEK | | | CD79A |
| Gene11 | | | | | MPO | | | Macmarcks |
| Gene12 | | | | | LRPAP1 | | | CCND3 |
| Gene13 | | | | | | | | PSMB9 |
| Gene14 | | | | | | | | IL18 |
| Gene15 | | | | | | | | STOM |
| Total | 8 | 7 | 7 | 6 | 12 | 5 | 9* | 15 |
Comparison of biologically relevant genes in leukemia identified using 8 methods. An * indicates the average number of biologically relevant genes found in the top 25 genes using the random forest gene selection method.
Figure 3. The average number of hits generated by the random forest method regarding biologically relevant genes in leukemia.
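Because a random forest yields a different ranking on each run, the starred RFGS value is an average over runs. A minimal sketch of that bookkeeping follows, assuming each run produces a ranked gene list and that a "hit" is a top-ranked gene that appears in a set of biologically relevant genes. The function names, the toy rankings, and the `GENE_X`-style placeholders are hypothetical; only the idea of counting hits in the top 25 comes from the text.

```python
def count_hits(ranked_genes, relevant, top_n=25):
    """Count genes among the top_n ranked genes that are in the relevant set."""
    return sum(1 for gene in ranked_genes[:top_n] if gene in relevant)


def average_hits(runs, relevant, top_n=25):
    """Average the hit count over several independently ranked runs."""
    return sum(count_hits(run, relevant, top_n) for run in runs) / len(runs)


# Toy data: three relevant genes and two invented runs of three genes each.
relevant = {"MGST1", "CD63", "APLP2"}
runs = [
    ["MGST1", "GENE_X", "CD63"],   # 2 hits in the top 3
    ["GENE_Y", "APLP2", "GENE_Z"], # 1 hit in the top 3
]
avg = average_hits(runs, relevant, top_n=3)
```

With real data, `top_n` would stay at its default of 25 and `relevant` would hold the genes supported by medical evidence.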
Functions of the biologically relevant genes found in leukemia.
| Gene Name | Gene Function | Evidence References |
|---|---|---|
| ZYX | Adhesion plaque protein. Binds alpha-actinin and the CRP protein. May be a component of a signal transduction pathway that mediates adhesion-stimulated changes in gene expression. | [ |
| TCF3 | Heterodimers between TCF3 and tissue-specific basic helix-loop-helix (bHLH) proteins play major roles in determining tissue-specific cell fate during embryogenesis, like muscle or early B-cell differentiation. Binds to the kappa-E2 site in the kappa immunoglobulin gene enhancer. | [ |
| CD33 | In the immune response, may act as an inhibitory receptor upon ligand induced tyrosine phosphorylation by recruiting cytoplasmic phosphatase(s). | [ |
| CD63 | This antigen is associated with early stages of melanoma tumor progression. May play a role in growth regulation. A multi-pass membrane protein of the lysosome and late endosome membranes; also found in Weibel-Palade bodies of endothelial cells and in platelet dense granules. Found in melanomas, hematopoietic cells, and tissue macrophages. | [ |
| TCRA | T cell receptor alpha-chain. | [ |
| SPTAN1 | Fodrin, which seems to be involved in secretion, interacts with calmodulin in a calcium-dependent manner. | [ |
| MPO | Part of the host defense system of polymorphonuclear leukocytes. It is responsible for microbicidal activity against a wide range of organisms. | [ |
| CST3 | As an inhibitor of cysteine proteinases, this protein is thought to serve an important physiological role as a local regulator of this enzyme activity. | [ |
| HOXA9 | Sequence-specific transcription factor which is part of a developmental regulatory system that provides cells with specific positional identities on the anterior-posterior axis. | [ |
| CD79A | Required in cooperation with CD79B for initiation of the signal transduction cascade activated by binding of antigen to the B-cell antigen receptor complex. | [ |
| Macmarcks | May be involved in coupling the protein kinase C and calmodulin signal transduction systems. | [ |
| CCND3 | Essential for the control of the cell cycle at the G1/S (start) transition. Potentiates the transcriptional activity of ATF5. | [ |
| PSMB9 | The proteasome is a multicatalytic proteinase complex which is characterized by its ability to cleave peptides with Arg, Phe, Tyr, Leu, and Glu adjacent to the leaving group at neutral or slightly basic pH. The proteasome has an ATP-dependent proteolytic activity. This subunit is involved in antigen processing to generate class I binding peptides. | [ |
| IL18 | Augments natural killer cell activity in spleen cells and stimulates interferon gamma production in T-helper type I cells. | [ |
| STOM | Interacting selectively with one or more specific sites on a receptor molecule, a macromolecule that undergoes combination with a hormone, neurotransmitter, drug or intracellular messenger to initiate a change in cell function. | [ |
The 15 biologically relevant genes found in the top 25 ranked genes in leukemia selected using the SVST method.
The biologically relevant genes found in prostate cancer.
| | SNR | t-Test | LSD | TNoM | MDMR | WEPO | RFGS* | SVST |
|---|---|---|---|---|---|---|---|---|
| Gene1 | HPN | UCK2 | UCK2 | NFIX | HPN | NF2 | PTGDS | HPN |
| Gene2 | PTGDS | LPIN1 | LPIN1 | FOXG1 | PDIA5 | PTGDS | HPN | NELL2 |
| Gene3 | NELL2 | KIAA0746 | KIAA0746 | PML | ICA1 | KLK3 | CLU | PTGDS |
| Gene4 | S100A4 | GNB2L1 | GNB2L1 | | AGR2 | CLU | NELL2 | S100A4 |
| Gene5 | TARP | CAV2 | CAV2 | | KLK3 | MYL6 | SERPINF1 | TNFSF10 |
| Gene6 | COL4A6 | IGBP1 | IGBP1 | | UAP1 | FLNA | HSPA8 | SERBP1 |
| Gene7 | ANGPT1 | CASP3 | CASP3 | | FBP1 | SERPING1 | XBP1 | RBP1 |
| Gene8 | RBP1 | DOPEY2 | DOPEY2 | | | ACTG2 | ALCAM | GSTM1 |
| Gene9 | GSTM1 | PDIA5 | PDIA5 | | | | AGR2 | ANGPT1 |
| Gene10 | | | | | | | | LMO3 |
| Gene11 | | | | | | | | COL4A6 |
| Gene12 | | | | | | | | DIO2 |
| Gene13 | | | | | | | | TARP |
| Total | 9 | 9 | 9 | 3 | 7 | 8 | 9* | 13 |
Comparison of biologically relevant genes in prostate cancer identified using 8 methods. An * indicates the average number of biologically relevant genes found in the top 25 genes using the random forest gene selection method.
Figure 4. The average number of hits generated by the random forest method regarding biologically relevant genes in prostate cancer.
Functions of the biologically relevant genes found in prostate cancer.
| Gene Name | Gene Function | Evidence References |
|---|---|---|
| HPN | Plays an essential role in cell growth and maintenance of cell morphology. | [ |
| S100A4 | S100 calcium binding protein A4. | [ |
| RBP1 | Intracellular transport of retinol. | [ |
| ANGPT1 | Appears to play a crucial role in mediating reciprocal interactions between the endothelium and surrounding matrix and mesenchyme. | [ |
| COL4A6 | Type IV collagen is the major structural component of glomerular basement membranes (GBM), forming a 'chicken-wire' meshwork together with laminins, proteoglycans, and entactin/nidogen. | [ |
| NELL2 | Chicken nel-like 2 homolog with a wide and weak expression, expressed in adult and fetal brain and hemopoietic cells (nucleated peripheral blood cells) but not in B cells. | [ |
| GSTM1 | Conjugation of reduced glutathione to a wide number of exogenous and endogenous hydrophobic electrophiles. | [ |
| PTGDS | It is likely to play important roles in both maturation and maintenance of the central nervous system and male reproductive system. | [ |
| TARP | Transmembrane receptor activity. | [ |
| LMO3 | Lim domain only 3. | [ |
| DIO2 | Essential for providing the brain with appropriate levels of T3 (3,5,3'-triiodothyronine) during the critical period of development. | [ |
| SERBP1 | May play a role in the regulation of mRNA stability. | [ |
| TNFSF10 | Induces apoptosis. Its activity may be modulated by binding to the decoy receptors TNFRSF10C/TRAILR3, TNFRSF10D/TRAILR4 and TNFRSF11B/OPG that cannot induce apoptosis. | [ |
Comparison of related methods and results.
| Authors | Methods | Cancer Type | Results |
|---|---|---|---|
| Ben-Tor et al. [ | TNoM | Ovarian | 4/137 (Among the top 137 genes, 8 are cancer-related genes. 4 genes (GAPDH, SLPI, HE4 and keratin 18) are ovarian genes.) |
| Covell et al. [ | SOM | Bladder | 1/5 (1 out of the top 5 genes is a Bladder gene) |
| | Up-regulated in tumor cells and down-regulated in normal cells | Breast | 1/3 (1 out of the top 3 genes is a Breast gene) |
| | | CNS | 5/62 (5 out of the top 62 genes are CNS genes) |
| | | Colorectal | 2/37 (2 out of the top 37 genes are Colorectal genes) |
| | | Leukemia | 11/68 (11 out of the top 68 genes are Leukemia genes) |
| | | Lung | 1/4 (1 out of the top 4 genes is a Lung gene) |
| | | Lymphoma | 7/33 (7 out of the top 33 genes are Lymphoma genes) |
| | | Melanoma | 3/12 (3 out of the top 12 genes are Melanoma genes) |
| | | Mesothelioma | 0/49 (0 out of the top 49 genes are Mesothelioma genes) |
| | | Pancreas | 2/9 (2 out of the top 9 genes are Pancreas genes) |
| | | Prostate | 6/36 (6 out of the top 36 genes are Prostate genes) |
| | | Renal | 4/26 (4 out of the top 26 genes are Renal genes) |
| | | Uterine | 1/42 (1 out of the top 42 genes is a Uterine gene) |
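The "hits / top N" entries above can be turned into hit rates so that lists of different lengths are easier to compare. The sketch below uses only counts reported in this record (11/68 and 6/36 from the comparison table, and the 15 relevant genes in the top 25 for SVST on leukemia); the function name `hit_rate` and the dictionary labels are assumptions introduced for illustration. The underlying datasets and evidence criteria differ between studies, so the resulting numbers are a rough guide rather than a controlled comparison.

```python
from fractions import Fraction


def hit_rate(hits, top_n):
    """Fraction of the top_n selected genes that are biologically relevant."""
    return Fraction(hits, top_n)


# (hits, top_n) pairs as reported in the tables of this record.
reported = {
    "Covell et al., leukemia": (11, 68),
    "Covell et al., prostate": (6, 36),
    "SVST, leukemia": (15, 25),
}
rates = {name: float(hit_rate(h, n)) for name, (h, n) in reported.items()}
```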
Statistically sound performance comparison for the leukemia dataset.
| Methods | 25 genes | 50 genes | 75 genes | 100 genes | 125 genes | 150 genes |
|---|---|---|---|---|---|---|
| SNR | .90(.87 to 1) | .93(.87 to .99) | .94(.89 to 1) | .95(.87 to .99) | .94(.88 to 1) | .96(.85 to 1) |
| t-Test | .88(.67 to 1) | .91(.66 to .99) | .91(.69 to .99) | .91(.65 to 1) | .92(.69 to .99) | .92(.64 to 1) |
| LSD | .85(.50 to 1) | .88(.53 to .95) | .89(.51 to .94) | .89(.52 to 1) | .87(.54 to .97) | .89(.54 to 1) |
| TNoM | .73(.67 to .91) | .73(.65 to .90) | .73(.66 to .91) | .73(.67 to .90) | .76(.69 to .92) | .75(.67 to .92) |
| MDMR | .91(.79 to 1) | .93(.74 to .98) | .93(.72 to .96) | .94(.78 to .98) | .94(.76 to 1) | .94(.79 to .99) |
| WEPO | .64(.46 to .79) | .61(.51 to .79) | .60(.50 to .76) | .67(.52 to .81) | .69(.50 to .85) | .73(.53 to .86) |
| RFGS | .86(.75 to .95) | .85(.76 to .98) | .85(.75 to .94) | .86(.75 to .95) | .88(.78 to .99) | .86(.73 to .97) |
| SVST | .95(.88 to 1) | .98(.87 to .99) | .97(.85 to 1) | .98(.87 to 1) | .98(.88 to .99) | .97(.87 to 1) |
Figure 5. Statistically sound performance comparison among 8 methods for the leukemia dataset.
Statistically sound performance comparison for the prostate cancer dataset.
| Methods | 25 genes | 50 genes | 75 genes | 100 genes | 125 genes | 150 genes |
|---|---|---|---|---|---|---|
| SNR | .86(.82 to .95) | .86(.82 to .95) | .85(.80 to .97) | .86(.83 to .95) | .83(.80 to .93) | .84(.82 to .96) |
| t-Test | .80(.67 to .94) | .82(.66 to .92) | .82(.67 to .90) | .81(.67 to .93) | .81(.68 to .93) | .80(.69 to .95) |
| LSD | .79(.65 to .94) | .81(.63 to .93) | .81(.62 to .95) | .81(.64 to .95) | .81(.67 to .94) | .82(.64 to .93) |
| TNoM | .65(.53 to .80) | .65(.51 to .78) | .63(.50 to .79) | .65(.53 to .80) | .65(.52 to .78) | .63(.51 to .81) |
| MDMR | .87(.76 to .95) | .84(.75 to .97) | .86(.76 to .98) | .86(.75 to .97) | .87(.78 to .95) | .87(.74 to .98) |
| WEPO | .56(.43 to .70) | .57(.44 to .69) | .67(.53 to .74) | .70(.55 to .79) | .68(.52 to .75) | .73(.64 to .86) |
| RFGS | .80(.65 to .91) | .81(.68 to .92) | .78(.63 to .91) | .82(.68 to .92) | .79(.65 to .90) | .81(.67 to .92) |
| SVST | .92(.85 to .95) | .90(.83 to .96) | .91(.84 to .95) | .92(.87 to .94) | .92(.82 to .95) | .93(.81 to .97) |
Figure 6. Statistically sound performance comparison among 8 methods for the prostate cancer dataset.
Figure 7. The gene-gene interaction graph of biologically relevant leukemia genes identified by the SVST method.
The gene-gene interaction among identified leukemia genes.
| Biologically relevant gene | Number of interacting genes | Bridge gene | Linked biologically relevant gene |
|---|---|---|---|
| ZYX | 15 | NEDD9 | TCF3 |
| | | ATXN1 | CST3 |
| | | TES | SPTAN1 |
| TCF3 | 47 | NEDD9 | ZYX |
| | | CREBBP | HOXA9 |
| CD33 | 3 | PTPN6 | CD79A |
| | | SRC | SPTAN1 |
| CD63 | 13 | HLADRA | TCRA |
| TCRA | 15 | HLADRA | CD63 |
| | | HSPA5 | MPO |
| SPTAN1 | 46 | ACTB | MPO |
| | | CASP3 | IL18 |
| | | TES | ZYX |
| | | SRC | CD33 |
| MPO | 35 | ACTB | SPTAN1 |
| | | HSPA5 | TCRA |
| CST3 | 9 | ATXN1 | ZYX |
| HOXA9 | 13 | CREBBP | TCF3 |
| CD79A | 16 | PTPN6 | CD33 |
| MACMARCKS | 1 | - | |
| CCND3 | 26 | - | |
| PSMB9 | 13 | - | |
| IL18 | 8 | CASP3 | SPTAN1 |
| STOM | 8 | - | |
Preliminary study of gene-gene interaction of biologically relevant leukemia genes identified by the SVST method.
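A bridge gene in the table above is an interaction partner shared by two biologically relevant genes. One way to recover such bridges is to intersect the partner sets of each pair of genes, as in the sketch below. The function name `find_bridges` is hypothetical, and the partner sets are deliberately incomplete: they contain only the bridge genes listed in the table, since the full interactor lists (for example, all 15 partners of ZYX) are not given in this record.

```python
from itertools import combinations


def find_bridges(partners):
    """Return {(gene_a, gene_b): shared partners} for pairs with any overlap."""
    bridges = {}
    for a, b in combinations(sorted(partners), 2):
        shared = partners[a] & partners[b]
        if shared:
            bridges[(a, b)] = shared
    return bridges


# Partial partner sets taken from the bridge-gene column of the table.
partners = {
    "ZYX": {"NEDD9", "ATXN1", "TES"},
    "TCF3": {"NEDD9", "CREBBP"},
    "CST3": {"ATXN1"},
    "HOXA9": {"CREBBP"},
    "STOM": set(),  # no bridge gene reported
}
bridges = find_bridges(partners)
```

Each key is an alphabetically ordered pair, so every link appears once even though the table lists it from both sides (for example, ZYX to TCF3 and TCF3 to ZYX via NEDD9).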