Abstract
In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic that measures a gene's contribution to the joint discriminability when the gene is added to a set of genes. Based on this statistic, we define a novel hypothesis test for the statistical redundancy of a gene and propose two gene selection methods. Simulation studies illustrate the agreement between the statistical redundancy test and the gene selection methods. Real data examples show that the proposed methods select a compact gene subset that not only supports high-quality cancer classifiers but also shows biological relevance.
Keywords: cancer classification; gene selection; microarray; statistical redundancy
Year: 2007 PMID: 19455233 PMCID: PMC2675847
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
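The abstract describes the eigenvalue-ratio statistic only in words, so the following is a minimal sketch under stated assumptions rather than the paper's definition. It assumes that "joint discriminability" is the nonzero eigenvalue of the Fisher matrix Sw^{-1} Sb for two classes, which equals (n_a n_b / (n_a + n_b)) d' Sw^{-1} d with d the difference of class means, and that the statistic is the ratio of this eigenvalue after and before a candidate gene is added. The function names and the toy expression values are hypothetical and are not taken from the paper's data.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def discriminability(class_a, class_b, genes):
    """Nonzero eigenvalue of Sw^{-1} Sb for two classes (assumed measure)."""
    na, nb = len(class_a), len(class_b)
    mean_a = [sum(s[g] for s in class_a) / na for g in genes]
    mean_b = [sum(s[g] for s in class_b) / nb for g in genes]
    d = [x - y for x, y in zip(mean_a, mean_b)]
    k = len(genes)
    sw = [[0.0] * k for _ in range(k)]  # pooled within-class scatter
    for samples, mean in ((class_a, mean_a), (class_b, mean_b)):
        for s in samples:
            c = [s[g] - mu for g, mu in zip(genes, mean)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += c[i] * c[j]
    w = solve(sw, d)  # w = Sw^{-1} d
    return na * nb / (na + nb) * sum(di * wi for di, wi in zip(d, w))


def eigenvalue_ratio(class_a, class_b, selected, candidate):
    """Ratio of discriminability with and without the candidate gene."""
    base = discriminability(class_a, class_b, selected)
    extended = discriminability(class_a, class_b, selected + [candidate])
    return extended / base


# Toy data (hypothetical): rows are samples, columns are genes 0, 1, 2.
class_a = [[1.0, 0.2, 5.0], [1.2, -0.1, 5.5], [0.8, 0.1, 4.6], [1.1, -0.3, 5.2]]
class_b = [[2.0, 0.1, 3.0], [2.3, -0.2, 3.4], [1.9, 0.3, 2.7], [2.1, -0.1, 3.1]]

base = discriminability(class_a, class_b, [0])
r_noise = eigenvalue_ratio(class_a, class_b, [0], 1)  # gene 1: little class signal
r_info = eigenvalue_ratio(class_a, class_b, [0], 2)   # gene 2: strong class signal
```

Under this assumed definition the ratio can never fall below 1, because adding a gene cannot reduce the discriminant criterion; a ratio close to 1 is what would mark a candidate gene as adding little beyond the genes already selected.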
Gene p-values for statistical redundancy test and t-test.
| 1 | −2.12132 | 1.002851 | 0.57 | 0.98 |
| 2 | −2.82843 | 1.000007 | 0.98 | 0.98 |
| 3 | −3.53553 | 1.000011 | 0.95 | 0.98 |
| 4 | −4.24264 | 1.000016 | 0.98 | 0.98 |
| 5 | −4.94975 | 1.000021 | 0.96 | 0.98 |
| 6 | −5.65685 | 1.000028 | 0.96 | 0.98 |
| 7 | −6.36396 | 1.000035 | 0.93 | 0.98 |
| 8 | −7.07107 | 1.000044 | 0.92 | 0.98 |
| 9 | −7.77817 | 1.000053 | 0.92 | 0.98 |
| 10 | −8.48528 | 1.03608 | 0 | 0 |
| 11 | −9.19239 | 1.001265 | 0.7 | 0.98 |
| 12 | −9.89949 | 1.000086 | 0.95 | 0.98 |
| 13 | −10.6066 | 1.000098 | 0.85 | 0.98 |
| 14 | −11.3137 | 1.000112 | 0.9 | 0.98 |
| 15 | −12.0208 | 1.000126 | 0.9 | 0.98 |
| 16 | −12.7279 | 1.000141 | 0.88 | 0.98 |
| 17 | −13.435 | 1.000158 | 0.86 | 0.98 |
| 18 | −14.1421 | 1.000175 | 0.83 | 0.98 |
| 19 | −14.8492 | 1.000192 | 0.85 | 0.98 |
| 20 | −15.5563 | 1.082116 | 0 | 0 |
Gene filtering process in Algorithm 1.
| 1 | 20 | 19, 18, 17, 16, 15, 14, 13, 12, 11 |
| 2 | 10 | 9, 8, 7, 6, 5, 4, 3, 2, 1 |
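The two rows above read as a select-then-filter loop: at each step one gene is kept and the genes judged redundant with it are dropped. The sketch below shows only that control flow. It is not Algorithm 1 itself: the function names are hypothetical, the ranking is a plain descending gene index, and the `same_block` rule is a stand-in for the paper's statistical redundancy test, chosen so that the toy run has the same shape as the table.

```python
def select_and_filter(ranked, is_redundant):
    """Greedy loop: keep the best remaining gene, drop genes redundant with it.

    ranked       -- candidate genes, best first (assumed ordering)
    is_redundant -- callable(gene, chosen) -> bool, a placeholder for a
                    redundancy test such as the one the paper proposes
    Returns a list of (selected gene, genes filtered at that step).
    """
    remaining = list(ranked)
    steps = []
    while remaining:
        chosen = remaining.pop(0)
        filtered = [g for g in remaining if is_redundant(g, chosen)]
        remaining = [g for g in remaining if not is_redundant(g, chosen)]
        steps.append((chosen, filtered))
    return steps


def same_block(gene, chosen):
    """Stand-in rule (assumption): genes 1-10 and 11-20 form two blocks."""
    return (gene - 1) // 10 == (chosen - 1) // 10


steps = select_and_filter(list(range(20, 0, -1)), same_block)
for step, (chosen, filtered) in enumerate(steps, start=1):
    print(step, chosen, filtered)
```

Passing the redundancy test in as a callable keeps the loop independent of how redundancy is decided, so a real test statistic could be substituted without changing the selection logic.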
Genes selected under different pair-wise correlations.
| $0.9^{\lvert i-j\rvert}$ | 20 10 | 20 10 |
| $0.8^{\lvert i-j\rvert}$ | 20 10 4 | 20 16 11 9 |
| $0.7^{\lvert i-j\rvert}$ | 20 11 10 6 3 | 20 17 15 12 10 |
| $0.6^{\lvert i-j\rvert}$ | 20 11 10 7 4 2 | 20 18 16 14 10 |
| $0.5^{\lvert i-j\rvert}$ | 20 14 11 10 8 6 4 2 | 20 19 17 15 13 11 |
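The first column of the table above appears to give the pair-wise correlation between genes i and j as a base raised to the power |i-j|, so that neighbouring genes are strongly correlated and distant genes only weakly. A minimal sketch of such a correlation matrix follows; the function name is hypothetical, and the choice of 20 genes is an assumption taken from the gene indices that appear in the tables.

```python
def correlation_matrix(rho, n_genes):
    """Matrix with entry rho ** |i - j| for genes i and j (0-based indices)."""
    return [[rho ** abs(i - j) for j in range(n_genes)] for i in range(n_genes)]


corr = correlation_matrix(0.9, 20)

# Correlation between adjacent genes and between genes ten positions apart.
adjacent = corr[18][19]
ten_apart = corr[9][19]
```

With a base of 0.9, genes ten positions apart (for example genes 10 and 20) still have a correlation of about 0.35, whereas with a base of 0.5 the correlation drops below 0.001 at the same distance. This may help explain why more genes are selected as the base decreases, although that reading is an interpretation and is not stated in this excerpt.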
The 13 selected genes for Leukemia data.
| X95735 | Zyxin |
| M27891 | CST3 Cystatin C (amyloid angiopathy and cerebral hemorrhage) |
| M55150 | FAH Fumarylacetoacetate |
| M19507 | MPO Myeloperoxidase |
| U82759 | GB DEF = Homeodomain protein HoxA9 mRNA |
| U59632 | Cell division control related protein (hCDCrel-1) mRNA |
| X57398 | NME1 Non-metastatic cells 1, protein (NM23A) expressed in isoform a |
| M31523 | TCF3 Transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47) |
| M11722 | Terminal deoxynucleotidyl transferase mRNA (TdT) |
| U52682 | IRF4 Interferon regulatory factor 4 |
| HG651–HT4201 | Adducin, Alpha Subunit, Alt. Splice 2 |
| U51010 | GB DEF = Nicotinamide N-methyltransferase gene, exon 1 and 5′ flanking region |
| M37457 | Na+,K+-ATPase catalytic subunit alpha-III isoform gene |
Golub’s 50 genes that are selected and/or filtered.
| Z15115 | X15949 | ||||
| X63469 | U20998 | ||||
| X90858 | U46751 | M57710 | M80254 | ||
| Y00787 | M28130 | L08246 | M69043 | ||
| M91432 | X74262 | U32944 | |||
| M27891 | M63138 | M83652 | M19045 | ||
| X95735 | M23197 | M84526 | M16038 | X17042 | M62762 |
| M11722 | M92287 | U05259 | |||
| M55150 | U50136 | ||||
| M32304 | M81695 | ||||
| M31211 | X59417 | M91432 | U22376 | U26266 | D38073 |
| M31523 | Z15115 | ||||
| HG1612-HT1612 | M29696 | L47738 | D26156 | ||
| X95735 | X04085 | ||||
| M19507 | M96326 | ||||
| M13792 | M31303 | U35451 | |||
| M31523 | Y08612 | M31211 | U29175 | X63469 | Z69881 |
| M27891 | Y00787 | Y12670 | |||
| M19507 | X85116 | ||||
| M31523 | S50223 | M13792 | |||
The 46 selected genes for Colon cancer data.
| H08393 | COLLAGEN ALPHA 2(XI) CHAIN (Homo sapiens) |
| M63391 | Human desmin gene, complete cds |
| T47377 | S-100P PROTEIN (HUMAN) |
| R84411 | SMALL NUCLEAR RIBONUCLEOPROTEIN ASSOCIATED PROTEINS B AND B’ (HUMAN) |
| Z50753 | H.sapiens mRNA for GCAP-II/uroguanylin precursor |
| H55916 | PEPTIDYL-PROLYL CIS-TRANS ISOMERASE, MITOCHONDRIAL PRECURSOR (HUMAN) |
| T59878 | PEPTIDYL-PROLYL CIS-TRANS ISOMERASE B PRECURSOR (HUMAN) |
| H17434 | NUCLEOLIN (HUMAN) |
| M26383 | Human monocyte-derived neutrophil-activating protein (MONAP) mRNA, complete cds |
| T60778 | MATRIX GLA-PROTEIN PRECURSOR (Rattus norvegicus) |
| M82919 | Human gamma amino butyric acid (GABAA) receptor beta-3 subunit mRNA, complete cds |
| R42244 | ANTIGEN PEPTIDE TRANSPORTER 1 (HUMAN) |
| M80815 | H.sapiens a-L-fucosidase gene, exon 7 and 8, and complete cds. |
| R62549 | PUTATIVE SERINE/THREONINE-PROTEIN KINASE B0464.5 IN CHROMOSOME III (Caenorhabditis elegans) |
| T72863 | FERRITIN LIGHT CHAIN (HUMAN) |
| T41204 | P14780 92 KD TYPE V COLLAGENASE PRECURSOR |
| X68688 | H.sapiens ZNF33B gene |
| T61661 | PROFILIN I (HUMAN) |
| R52081 | TRANSCRIPTIONAL ACTIVATOR GCN5 (Saccharomyces cerevisiae) |
| D26129 | RIBONUCLEASE PANCREATIC PRECURSOR (HUMAN); contains element MER21 repetitive element |
| T47383 | ALKALINE PHOSPHATASE, PLACENTAL TYPE 1 PRECURSOR (Homo sapiens) |
| R80427 | C4-DICARBOXYLATE TRANSPORT SENSOR PROTEIN DCTB (Rhizobium leguminosarum) |
| M28128 | Homo sapiens eosinophil cationic protein (ECP) mRNA, complete cds |
| R73660 | GAMMA-INTERFERON-INDUCIBLE PROTEIN IP-30 PRECURSOR (HUMAN) |
| M96839 | Human proteinase 3 gene, exon 5 and cds (3′ end) |
| R65697 | ATP SYNTHASE A CHAIN (Trypanosoma brucei brucei) |
| H64807 | PLACENTAL FOLATE TRANSPORTER (Homo sapiens) |
| K02268 | Human enkephalin B (enkB) gene, exon 4 and 3′ flank and complete cds |
| H73908 | METALLOTHIONEIN-IA (Bos taurus) |
| M28373 | Homo sapiens amyloid protein A4 precursor mRNA, 3′ end of cds |
| M94132 | Human mucin 2 (MUC2) mRNA sequence |
| M23419 | INITIATION FACTOR 5A (HUMAN); contains element PTR5 repetitive element |
| J03210 | Human collagenase type IV mRNA, 3′ end |
| T53396 | 60S ACIDIC RIBOSOMAL PROTEIN P1 (Polyorchis penicillatus) |
| T72175 | IG KAPPA CHAIN PRECURSOR V-III REGION (HUMAN) |
| M29277 | Human isolate JuSo MUC18 glycoprotein mRNA (3′ variant), complete cds |
| M85289 | Human heparan sulfate proteoglycan (HSPG2) mRNA, complete cds. |
| R28373 | HEMOGLOBIN BETA CHAIN (HUMAN) |
| T67406 | COMPLEMENT C4 PRECURSOR (Homo sapiens) |
| T57882 | MYOSIN HEAVY CHAIN, NONMUSCLE TYPE A (Homo sapiens) |
| H02465 | GUANINE NUCLEOTIDE-BINDING PROTEIN G(I)/G(S)/G(O) GAMMA-7 SUBUNIT (Bos taurus) |
| M59807 | NATURAL KILLER CELLS PROTEIN 4 PRECURSOR (HUMAN); contains element MSR1 repetitive element |
| R38636 | UROKINASE PLASMINOGEN ACTIVATOR SURFACE RECEPTOR, GPI-ANCHORED (HUMAN) |
| T57780 | IG LAMBDA CHAIN C REGIONS (HUMAN) |
| R33481 | TRANSCRIPTION FACTOR ATF-A AND ATF-A-DELTA (Homo sapiens) |
| M31994 | Human cytosolic aldehyde dehydrogenase (ALDH1) gene, exon 13 |