Shuangge Ma, Xiao Song, Jian Huang.
Abstract
BACKGROUND: Considerable effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. Gene expression data have been shown to have a cluster structure, in which clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into account.
Year: 2007 PMID: 17316436 PMCID: PMC1821041 DOI: 10.1186/1471-2105-8-60
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Gap statistics as a function of the number of clusters. Red solid line: Colon data; green dashed line: Nodal data.
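As a rough illustration of how a gap-statistic curve like the one in Figure 1 can be computed, the sketch below compares the log within-cluster dispersion of the data with its average over uniform reference data sets drawn from the bounding box of the data. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kmeans_dispersion` and `gap_statistic`, the use of plain Lloyd's k-means, and the defaults `n_ref`, `n_init` and `n_iter` are all hypothetical choices made for illustration.

```python
import numpy as np

def kmeans_dispersion(X, k, rng, n_init=5, n_iter=50):
    """Smallest within-cluster sum of squares found by Lloyd's k-means over n_init random starts."""
    best = np.inf
    for _ in range(n_init):
        # Start from k distinct rows of X chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                members = X[labels == j]
                if len(members):  # keep the old center if a cluster becomes empty
                    centers[j] = members.mean(axis=0)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d.min(axis=1).sum())
    return best

def gap_statistic(X, k_values, n_ref=10, seed=0):
    """Gap(k) = mean over reference sets of log W_k(reference) minus log W_k(data)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)  # bounding box for the uniform reference
    gaps = []
    for k in k_values:
        log_w = np.log(kmeans_dispersion(X, k, rng))
        ref = [np.log(kmeans_dispersion(rng.uniform(lo, hi, size=X.shape), k, rng))
               for _ in range(n_ref)]
        gaps.append(np.mean(ref) - log_w)
    return np.array(gaps)

# Toy data (assumption): rows are the objects being clustered, e.g. genes.
toy_rng = np.random.default_rng(1)
toy_X = np.vstack([toy_rng.normal(0.0, 0.1, (20, 2)),
                   toy_rng.normal(10.0, 0.1, (20, 2))])
toy_gaps = gap_statistic(toy_X, [1, 2])
```

When genes are clustered, each row of `X` would be one gene's expression profile across samples. Plotting the returned values against `k_values` gives a curve of the kind the caption describes.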
Comparison of estimation and prediction performance of different approaches.
| Data | Measure | Lasso | Simple | GLasso | SGLasso |
| Colon | Nonzero | 19 | 500 | 500 | 22 |
| | Cluster | - | 9 | 9 | 8 |
| | Prediction | 0.129 | 0.226 | 0.161 | 0.129 |
| Nodal | Nonzero | 37 | 500 | 233 | 66 |
| | Cluster | - | 20 | 9 | 17 |
| | Prediction | 0.245 | 0.163 | 0.122 | 0.122 |
| Follicular | Nonzero | 15 | 729 | 233 | 79 |
| | Cluster | - | 34 | 2 | 13 |
| | Prediction | 5.9 | 2.3 | 0.5 | 6.5 |
| MCL | Nonzero | 15 | 834 | 132 | 28 |
| | Cluster | - | 30 | 3 | 3 |
| | Prediction | 8.2 | 6.2 | 19.3 | 20.3 |
Nonzero: number of genes in the final model. Cluster: number of clusters in the final model. Prediction: leave-one-out prediction error for Colon and Nodal; logrank statistic for Follicular and MCL.
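The summary measures defined in the legend above can be sketched in a few lines. The example below is illustrative only: `summarize_model` and `leave_one_out_error` are hypothetical helper names, and the nearest-centroid classifier is a simple stand-in used to keep the sketch self-contained, not one of the penalized models compared in the table. The logrank statistic used for the survival data sets is not covered here.

```python
import numpy as np

def summarize_model(coef, cluster_labels, tol=0.0):
    """Return (number of genes with nonzero estimates, number of clusters containing such a gene)."""
    coef = np.asarray(coef, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    nonzero = np.abs(coef) > tol
    n_genes = int(nonzero.sum())
    n_clusters = int(np.unique(cluster_labels[nonzero]).size)
    return n_genes, n_clusters

def leave_one_out_error(X, y, fit, predict):
    """Fraction of samples misclassified when each sample is held out in turn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i              # train on every sample except i
        model = fit(X[keep], y[keep])
        errors += int(predict(model, X[i:i + 1])[0] != y[i])
    return errors / n

# Stand-in classifier (assumption): assign each sample to the nearest class centroid.
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

# Toy inputs (assumption): six genes in three clusters, six samples in two classes.
toy_coef = [0.0, 0.3, 0.0, -0.1, 0.0, 0.0]
toy_clusters = [1, 1, 2, 2, 3, 3]
toy_X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])
```

Calling `summarize_model(toy_coef, toy_clusters)` gives the "Nonzero" and "Cluster" counts for the toy coefficient vector, and `leave_one_out_error(toy_X, toy_y, fit_centroids, predict_centroids)` gives the corresponding leave-one-out error. Any other model can be plugged in through the `fit` and `predict` arguments.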
Colon data: genes with nonzero estimates from SGLasso.
| Estimate | Gene ID | Gene Description |
| 0.229 | Hsa.1047 | Small Nuclear Ribonucleoprotein Associated Protein B/B' |
| 0.385 | Hsa.1410 | TRANSLATIONAL INITIATION FACTOR 2 BETA SUBUNIT (HUMAN) |
| -0.058 | Hsa.1039 | Homo sapiens secretory pancreatic stone protein (PSP-S) mRNA |
| -0.110 | Hsa.1013 | PROFILIN I (HUMAN) |
| -0.018 | Hsa.2809 | IG MU CHAIN C REGION (HUMAN) |
| -0.072 | Hsa.42949 | ESTROGEN SULFOTRANSFERASE (Bos taurus) |
| -0.155 | Hsa.1454 | Human gamma amino butyric acid (GABAA) receptor beta-3 subunit mRNA |
| 0.233 | Hsa.8214 | PUTATIVE SERINE/THREONINE-PROTEIN KINASE B0464.5 I |
| 0.193 | Hsa.1209 | P14780 92 KD TYPE V COLLAGENASE PRECURSOR |
| -0.299 | Hsa.8147 | Human desmin gene, complete cds. |
| -0.511 | Hsa.37937 | MYOSIN HEAVY CHAIN, NONMUSCLE (Gallus gallus) |
| 0.181 | Hsa.462 | Human serine kinase mRNA, complete cds. |
| 0.484 | Hsa.627 | Human monocyte-derived neutrophil-activating protein (MONAP) mRNA |
| 0.097 | Hsa.601 | Human aspartyl-tRNA synthetase alpha-2 subunit mRNA |
| -0.525 | Hsa.696 | Human cleavage stimulation factor |
| 0.238 | Hsa.1682 | TRISTETRAPROLINE (HUMAN) |
| -0.492 | Hsa.1832 | MYOSIN REGULATORY LIGHT CHAIN 2, SMOOTH MUSCLE ISOFORM |
| -0.254 | Hsa.612 | Human beta adaptin mRNA |
| 0.967 | Hsa.6814 | COLLAGEN ALPHA 2(XI) CHAIN (Homo sapiens) |
| 0.189 | Hsa.3306 | Human gene for heterogeneous nuclear ribonucleoprotein core protein A1. |
| 0.227 | Hsa.3016 | S-100P PROTEIN (HUMAN) |
| 0.167 | Hsa.2928 | H.sapiens mRNA for p cadherin. |
Nodal data: genes with nonzero estimates from SGLasso.
| Estimate | Gene ID | Gene Description |
| 0.008 | D63486_at | Human mRNA for KIAA0152 gene, complete cds |
| 0.094 | X74496_at | H.sapiens mRNA for prolyl oligopeptidase |
| 0.014 | Y10260_at | H.sapiens EYA1 gene |
| -0.063 | U27185_at | Human RAR-responsive (TIG1) mRNA, complete cds |
| -0.062 | U69263_at | Human matrilin-2 precursor mRNA, partial cds |
| 0.001 | D87673_at | Human mRNA for heat shock transcription factor 4, complete cds |
| -0.100 | M83233_at | Homo sapiens transcription factor (HTF4A) mRNA, complete cds |
| 0.011 | U07223_at | Human beta2-chimaerin mRNA, complete cds |
| -0.121 | X16354_at | Human mRNA for transmembrane carcinoembryonic antigen BGPa |
| 0.451 | M59916_at | Human acid sphingomyelinase (ASM) mRNA, complete cds |
| 0.047 | U88898_r_at | Human endogenous retroviral H protease |
| 0.104 | X97630_at | H.sapiens mRNA for serine/threonine protein kinase EMK |
| 0.001 | S83309_s_at | germ cell nuclear factor |
| -0.101 | D87071_at | Human mRNA for KIAA0233 gene, complete cds |
| 0.013 | J00277_at | Human c-Ha-ras1 proto-oncogene, complete coding sequence |
| 0.823 | J02982_f_at | Human glycophorin B mRNA, complete cds |
| 0.001 | M69013_at | Human guanine nucleotide-binding regulatory protein mRNA |
| -0.096 | X92396_at | H.sapiens mRNA for novel gene in Xq28 region |
| -0.091 | Y00815_at | Human mRNA for LCA-homolog. LAR protein |
| -0.116 | AB000114_at | Human mRNA for osteomodulin |
| 0.016 | D50532_at | Human mRNA for macrophage lectin 2, complete cds |
| -0.072 | M83221_at | Homo sapiens I-Rel mRNA, complete cds |
| -0.061 | X76717_at | H.sapiens MT-1l mRNA |
| 0.031 | J02645_at | Human translational initiation factor (eIF-2), alpha subunit mRNA |
| -0.070 | X53587_at | Human mRNA for integrin beta 4 |
| -1.323 | AFFX-CreX-3_st | X03453 Bacteriophage P1 cre recombinase protein |
| -0.083 | D80009_at | Human mRNA for KIAA0187 gene |
| 0.019 | J04615_at | Human lupus autoantigen mRNA, complete cds |
| 0.009 | L20861_at | Homo sapiens proto-oncogene (Wnt-5a) mRNA |
| -0.056 | L20971_at | Human phosphodiesterase mRNA, complete cds |
| -0.026 | M84820_s_at | Human retinoid X receptor beta (RXR-beta) mRNA, complete cds |
| 0.156 | U37408_at | Human CtBP mRNA, complete cds |
| 0.079 | U89336_cds7_at | receptor for advanced glycosylation end products gene |
| 0.070 | X76059_at | H.sapiens mRNA for YRRM1 |
| -0.084 | X82207_at | H.sapiens mRNA for beta-centractin (PC3) |
| 0.025 | X99687_at | H.sapiens mRNA for methyl-CpG-binding protein 2 |
| 0.501 | L38933_rna1_at | the longest open reading frame predicts a protein of 202 amino acids |
| -0.034 | U02493_at | Human 54 kDa protein mRNA, complete cds |
| 0.016 | U77846_rna1_s_at | Human elastin gene, Human elastin gene |
| -0.102 | X07618_s_at | Human mRNA for cytochrome P450 db1 variant a |
| -0.614 | X15357_at | Human mRNA for natriuretic peptide receptor (ANP-A receptor) |
| 0.033 | Y08265_s_at | H.sapiens mRNA for DAN26 protein, partial |
| -0.033 | HG3521-HT3715_at | Ras-Related Protein Rap1b |
| -0.078 | L33075_at | Homo sapiens ras GTPase-activating-like protein (IQGAP1) mRNA |
| 0.011 | X66364_at | H.sapiens mRNA PSSALRE for serine/threonine protein kinase |
| -0.094 | AF009674_at | Homo sapiens axin (AXIN) mRNA, partial cds. |
| -0.379 | AFFX-BioB-3_at | J04423 E coli bioB gene biotin synthetase |
| -0.184 | AFFX-BioDn-3_at | J04423 E coli bioD gene dethiobiotin synthetase |
| 0.058 | HG2465-HT4871_at | Dna-Binding Protein Ap-2, Alt. Splice 3 |
| 0.017 | D00762_at | Human mRNA for proteasome subunit HC8 |
| -0.090 | HG1612-HT1612_at | Macmarcks |
| -0.062 | U09178_s_at | Human dihydropyrimidine dehydrogenase mRNA, complete cds |
| 0.017 | U29175_at | Human transcriptional activator (BRG1) mRNA, complete cds. |
| -1.184 | U39817_at | Human Bloom syndrome protein (BLM) mRNA, complete cds |
| -0.211 | U41344_at | Human prolargin (PRELP) gene, 5' flanking sequence |
| -0.080 | X16832_at | Human mRNA for cathepsin H (EC 3.4.22.16) |
| 0.013 | X99226_at | H.sapiens mRNA for FAA protein |
| 0.000 | Z49878_at | H.sapiens mRNA for guanidinoacetate N-methyltransferase |
| -0.105 | X68560_at | H.sapiens SPR-2 mRNA for GT box binding protein |
| 0.048 | HG3998-HT4268_at | L-Glycerol-3-Phosphate:Nad+ Oxidoreductase |
| 0.001 | U79285_at | Human clone 23828 mRNA sequence |
| 0.003 | X79981_at | H.sapiens VE-cadherin mRNA |
| 0.024 | X98176_at | H.sapiens mRNA for MACH-beta-1 protein. |
| 0.001 | Z18956_at | H.sapiens mRNA for taurine transporter |
| 0.014 | U18548_at | Human GPR12 G protein coupled-receptor gene, complete cds. |
| -0.281 | Z22536_at | Homo sapiens ALK-4 mRNA, complete CDS |
Follicular data: genes with nonzero estimates from SGLasso.
| Estimate | Gene ID | Gene Description |
| 0.035 | 227117_at | CDNA FLJ40762 fis, clone TRACH2002847 |
| -0.049 | 228671_at | hypothetical protein LOC199953 |
| 0.070 | 228776_at | gap junction protein, alpha 7, 45 kDa (connexin 45) |
| 0.055 | 230448_at | hypothetical protein MGC15523 |
| -0.042 | 230297_x_a | synaptic Ras GTPase activating protein 1 homolog |
| 0.091 | 230938_x_a | activating transcription factor 5 |
| 0.053 | 209863_s_a | tumor protein p73-like |
| 0.002 | 224125_at | pleckstrin homology domain containing, family N member 1 |
| 0.002 | 230826_at | monocyte to macrophage differentiation-associated 2 |
| 0.001 | 238605_at | Transcribed locus |
| 0.005 | 222545_s_a | chromosome 10 open reading frame 57 |
| 0.062 | 239565_at | CDNA FLJ37010 fis, clone BRACE2009732 |
| 0.022 | 242904_x_a | |
| 0.026 | 222015_at | Casein kinase 1, epsilon |
| 0.032 | 219361_s_a | interferon stimulated exonuclease gene 20 kDa-like 1 |
| 0.046 | 223333_s_a | angiopoietin-like 4 |
| 0.084 | 224357_s_a | membrane-spanning 4-domains, subfamily A, member 4 |
| 0.046 | 204470_at | chemokine (C-X-C motif) ligand 1 |
| 0.023 | 205114_s_a | chemokine (C-C motif) ligand 3 |
| 0.118 | 208470_s_a | haptoglobin |
| 0.058 | 237542_at | Transcribed locus |
| 0.052 | 202953_at | complement component 1, q subcomponent, B chain |
| 0.022 | 206214_at | phospholipase A2, group VII |
| 0.085 | 210321_at | granzyme H (cathepsin G-like 2, protein h-CCPX) |
| 0.054 | 214038_at | chemokine (C-C motif) ligand 8 |
| -0.074 | 201841_s_a | heat shock 27 kDa protein 1 |
| -0.028 | 211429_s_a | serpin peptidase inhibitor, clade A, member 1 |
| -0.120 | 211470_s_a | sulfotransferase family, cytosolic, 1C, member 1 |
| 0.081 | 216950_s_a | Fc fragment of IgG, high affinity Ia, receptor (CD64) |
| -0.056 | 222694_at | hypothetical protein MGC2752 |
| -0.042 | 232618_at | chromosome Y open reading frame 15A |
| -0.016 | 232874_at | Dedicator of cytokinesis 9 |
| -0.034 | 237222_at | |
| 0.024 | 240105_at | Chromosome 21 open reading frame 66 |
| -0.016 | 241755_at | Ubiquinol-cytochrome c reductase core protein II |
| -0.032 | 242306_at | TPA regulated locus |
| -0.040 | 243705_at | DDHD domain containing 1 |
| 0.049 | 237131_at | hypothetical protein LOC645469 |
| 0.042 | 238359_at | |
| -0.045 | 242601_at | hypothetical protein LOC253012 |
| 0.049 | 243101_x_a | Chromosome 20 open reading frame 160 |
| 0.059 | 219360_s_a | transient receptor potential cation channel, subfamily M |
| -0.006 | 226665_at | AHA1, activator of heat shock 90 kDa protein ATPase homolog 2 |
| -0.007 | 231852_at | |
| -0.009 | 241946_at | zinc finger, DHHC-type containing 21 |
| -0.009 | 208067_x_a | ubiquitously transcribed tetratricopeptide repeat gene |
| -0.009 | 210973_s_a | fibroblast growth factor receptor 1 |
| -0.006 | 220235_s_a | chromosome 1 open reading frame 103 |
| -0.001 | 227697_at | suppressor of cytokine signaling 3 |
| -0.006 | 227404_s_a | Early growth response 1 |
| -0.004 | 235102_x_a | GRB2-related adaptor protein |
| -0.007 | 209189_at | v-fos FBJ murine osteosarcoma viral oncogene homolog |
| -0.001 | 213281_at | V-jun sarcoma virus 17 oncogene homolog (avian) |
| -0.008 | 201694_s_a | early growth response 1 |
| -0.003 | 202672_s_a | activating transcription factor 3 |
| 0.002 | AFFX-r2-Hs | |
| -0.088 | 223710_at | chemokine (C-C motif) ligand 26 |
| -0.091 | 228844_at | solute carrier family 13, member 5 |
| -0.023 | 233831_at | hypothetical protein LOC644752 |
| -0.058 | 234062_at | CDNA FLJ12400 fis, clone MAMMA1002782 |
| 0.021 | 239574_at | Enoyl Coenzyme A hydratase domain containing 3 |
| 0.044 | 240142_at | |
| -0.140 | 215536_at | major histocompatibility complex, class II, DQ beta 2 |
| -0.045 | 218935_at | EH-domain containing 3 |
| -0.041 | 211177_s_a | thioredoxin reductase 2 |
| 0.014 | 231119_at | replication factor C (activator 1) 3, 38 kDa |
| 0.032 | 232475_at | chromosome 15 open reading frame 42 |
| 0.022 | 237546_at | Transcribed locus |
| 0.047 | 238201_at | |
| 0.072 | 239670_at | |
| 0.051 | 240607_at | Hypothetical protein LOC150271 |
| 0.039 | 241411_at | weakly similar to NP 055301.1 neuronal thread protein AD7c-NTP |
| 0.002 | 223745_at | F-box protein 31 |
| 0.002 | 230280_at | tripartite motif-containing 9 |
| -0.008 | 226771_at | ATPase, Class I, type 8B, member 2 |
| -0.010 | 226869_at | Full length insert cDNA clone ZD77F06 |
| -0.002 | 203029_s_a | protein tyrosine phosphatase, receptor type, N polypeptide 2 |
| -0.005 | 209459_s_a | 4-aminobutyrate aminotransferase |
| -0.009 | 221790_s_a | low density lipoprotein receptor adaptor protein 1 |
MCL data: genes with nonzero estimates from SGLasso.
| Estimate | Gene ID | Gene Description |
| 0.011 | 24860 | Hs.522568, Phosphatidylinositol-specific phospholipase C |
| 0.005 | 26556 | Hs.173438, Fas apoptotic inhibitory molecule |
| 0.018 | 28537 | Hs.120949, CD36 antigen |
| 0.030 | 28640 | Hs.84113, Cyclin-dependent kinase inhibitor 3 |
| -0.004 | 28679 | Hs.469723, RNA, U17D small nucleolar |
| 0.005 | 30010 | Hs.85137, Cyclin A2 |
| 0.009 | 32690 | Hs.3104, Kinesin family member 14 |
| 0.010 | 32973 | Hs.58992, SMC4 structural maintenance of chromosomes 4-like 1 |
| 0.078 | 27095 | Hs.156346, Topoisomerase (DNA) II alpha 170 kDa |
| 0.094 | 30157 | Hs.497741, Centromere protein F, 350/400 ka |
| 0.100 | 30898 | Hs.532755, Likely ortholog of mouse gene trap locus 3 |
| 0.084 | 31049 | Hs.241517, Polymerase (DNA directed), theta |
| 0.080 | 34771 | Hs.524390, Tubulin, alpha, ubiquitous |
| -0.067 | 16541 | Hs.30054, Coagulation factor V |
| -0.065 | 23972 | Hs.431009, Zinc finger protein, multitype 2 |
| -0.036 | 24262 | |
| -0.061 | 24379 | Hs.120260, Immunoglobulin superfamily receptor translocation associated 1 |
| -0.056 | 25058 | Hs.298990, actin dependent regulator of chromatin |
| -0.101 | 25171 | Hs.21388, Zinc finger, DHHC domain containing 21 |
| -0.103 | 26192 | Hs.530274, Aldolase B, fructose-bisphosphate |
| -0.037 | 27659 | Hs.437336, Hypothetical protein MGC61571 |
| -0.091 | 29653 | Hs.40758, RAB30, member RAS oncogene family |
| -0.053 | 31196 | Hs.508010, Fibronectin type III domain containing 3 |
| -0.033 | 32497 | |
| -0.076 | 32947 | Hs.522863, Chromosome Y open reading frame 15A |
| -0.076 | 33506 | Hs.364045, Hypothetical protein LOC92270 |
| -0.059 | 33892 | Hs.105956, Alpha 1,4-galactosyltransferase |
| -0.060 | 34438 | Hs.368912, Dipeptidylpeptidase 4 |
Figure 2. Paths of parameter estimates for Lasso, GLasso and SGLasso. Red lines: cluster 1; blue lines: cluster 2; green lines: cluster 3. Solid lines: β1, β4 and β7; dashed lines: β2, β5 and β8; dash-dotted lines: β3, β6 and β9. The grey lines show the selected tuning parameters. C1, C2 and C3 in the lower-left panel denote clusters 1, 2 and 3, respectively.