Abstract
BACKGROUND: Methods for extracting useful information from the datasets produced by microarray experiments are currently of great interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods evaluate genes in pairs, scoring how well each pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared their performance with that of two standard methods. To assess the ability to generalize class differences, we studied how well the selected gene sets are suited for learning a classifier.
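The idea of scoring genes in pairs can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the code below assumes one plausible choice: project the two-gene data onto a diagonal-linear-discriminant-style axis and take the two-sample t-statistic of the projected values. The function names `t_score` and `pair_score` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def t_score(a, b):
    """Absolute two-sample t-statistic (unequal variances) for two 1-D samples."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(a.mean() - b.mean()) / se

def pair_score(x, y, labels):
    """Score a gene pair (assumed scheme, not the paper's confirmed formula).

    x, y   : expression values of the two genes, one value per sample
    labels : boolean array, True for class 1 and False for class 2
    """
    data = np.column_stack([x, y])
    c1, c2 = data[labels], data[~labels]
    n1, n2 = len(c1), len(c2)
    # Pooled per-gene variance: the "diagonal" assumption ignores covariance.
    pooled_var = ((n1 - 1) * c1.var(axis=0, ddof=1)
                  + (n2 - 1) * c2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    # Axis along the variance-scaled difference of the class means.
    axis = (c1.mean(axis=0) - c2.mean(axis=0)) / pooled_var
    proj = data @ axis
    return t_score(proj[labels], proj[~labels])

# Synthetic illustration: two genes, each moderately informative on its own.
rng = np.random.default_rng(0)
n = 200
labels = np.arange(2 * n) < n            # first n samples belong to class 1
gene_a = rng.normal(size=2 * n) + 1.5 * labels
gene_b = rng.normal(size=2 * n) - 1.5 * labels

single_a = t_score(gene_a[labels], gene_a[~labels])
single_b = t_score(gene_b[labels], gene_b[~labels])
combined = pair_score(gene_a, gene_b, labels)
print(f"gene A: {single_a:.2f}  gene B: {single_b:.2f}  pair: {combined:.2f}")
```

In this synthetic case the pair scores higher than either gene alone because the two genes carry independent class signal. A pair can therefore rank highly even when one of its members has a modest individual rank.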
Year: 2002 PMID: 11983058 PMCID: PMC115205 DOI: 10.1186/gb-2002-3-4-research0017
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Example of a good pair of genes in the colon dataset. The expression values give almost full separation between normal and tumor tissues. The x-axis (horizontal) shows the expression values of M63391, and the y-axis (vertical) shows the expression values of H08393. Normal tissues are marked 'x' and tumor tissues are marked 'o'. The expression values have been through the preprocessing steps described in the text. Also plotted are the DLD axis and the class-decision boundary for these two genes. Note that the DLD axis and the decision boundary are orthogonal, but because the two axes are scaled differently they do not appear so in the plot.
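The geometry described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the DLD axis is taken to be the difference of class means scaled by the pooled per-gene variance, the decision boundary is taken to pass through the midpoint of the class means (equal priors), and the six data points are invented. The names `dld_axis_and_boundary` and `predict` are hypothetical.

```python
import numpy as np

def dld_axis_and_boundary(c1, c2):
    """Return the DLD weight vector, a point on the boundary, and the boundary direction.

    c1, c2 : arrays of shape (n_samples, 2), one row per tissue sample.
    """
    n1, n2 = len(c1), len(c2)
    pooled_var = ((n1 - 1) * c1.var(axis=0, ddof=1)
                  + (n2 - 1) * c2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    w = (c1.mean(axis=0) - c2.mean(axis=0)) / pooled_var   # DLD axis direction
    midpoint = (c1.mean(axis=0) + c2.mean(axis=0)) / 2     # lies on the boundary
    boundary_dir = np.array([-w[1], w[0]])                 # w rotated by 90 degrees
    return w, midpoint, boundary_dir

def predict(points, w, midpoint):
    """True where a point falls on the class-1 side of the boundary."""
    return (points - midpoint) @ w > 0

# Invented two-gene expression values for illustration only.
normal = np.array([[1.0, 5.0], [2.0, 6.0], [1.5, 5.5]])
tumor = np.array([[4.0, 1.0], [5.0, 2.0], [4.5, 1.5]])

w, midpoint, boundary_dir = dld_axis_and_boundary(normal, tumor)
print("DLD axis:", w)
print("axis . boundary direction:", w @ boundary_dir)
print("normal predicted as class 1:", predict(normal, w, midpoint))
print("tumor predicted as class 1:", predict(tumor, w, midpoint))
```

The dot product of the axis and the boundary direction is zero, so the two lines are orthogonal in data coordinates. When plotting with matplotlib, calling `ax.set_aspect("equal")` on the axes gives both axes the same scale, which makes that right angle visible on screen.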
Figure 2. Plots of prediction accuracy on the colon and ALL/AML datasets using four different feature subset selection (FSS) procedures and DLD prediction. (a) Colon dataset: LOOCV and DLD prediction. (b) Colon dataset: L-31-OCV and DLD prediction. (c) ALL/AML dataset: LOOCV and DLD prediction. (d) ALL/AML dataset: L-36-OCV and DLD prediction. The x-axis shows the number of genes in the feature subsets, and the y-axis shows average prediction accuracy. The FSS procedures individual ranking (IR), all pairs (AP), forward selection (FS) and greedy pairs (GP) are explained in the text.
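The evaluation behind Figure 2 combines feature subset selection with cross-validated DLD prediction. A minimal sketch of one such combination, leave-one-out cross-validation with individual ranking, is given below. It assumes that genes are ranked by an absolute t-statistic, that ranking is repeated inside every fold so the held-out sample never influences selection, and that the data are synthetic. The helper names `t_scores`, `dld_fit` and `loocv_accuracy` are hypothetical, and the paper's exact protocol may differ.

```python
import numpy as np

def t_scores(X, y):
    """Absolute two-sample t-statistic for every gene (column) of X."""
    a, b = X[y], X[~y]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def dld_fit(X, y):
    """Diagonal linear discriminant: weight vector and midpoint of class means."""
    a, b = X[y], X[~y]
    pooled = ((len(a) - 1) * a.var(axis=0, ddof=1)
              + (len(b) - 1) * b.var(axis=0, ddof=1)) / (len(X) - 2)
    w = (a.mean(axis=0) - b.mean(axis=0)) / pooled
    mid = (a.mean(axis=0) + b.mean(axis=0)) / 2
    return w, mid

def loocv_accuracy(X, y, n_genes):
    """Leave-one-out accuracy with gene ranking redone inside each fold."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        X_tr, y_tr = X[train], y[train]
        # Individual ranking: keep the n_genes highest-scoring genes.
        top = np.argsort(t_scores(X_tr, y_tr))[::-1][:n_genes]
        w, mid = dld_fit(X_tr[:, top], y_tr)
        predicted = (X[i, top] - mid) @ w > 0
        correct += int(predicted == y[i])
    return correct / len(y)

# Synthetic data: 40 samples, 50 genes, of which the first 5 are informative.
rng = np.random.default_rng(0)
y = np.arange(40) < 20
X = rng.normal(size=(40, 50))
X[y, :5] += 3.0

accuracy = loocv_accuracy(X, y, n_genes=4)
print(f"LOOCV accuracy with 4 genes: {accuracy:.2f}")
```

Calling `loocv_accuracy` for a range of `n_genes` values would trace out one accuracy curve of the kind plotted in each panel. Swapping the ranking step for a pair-based selection would give the corresponding curve for a pair-based procedure.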
Top-ranked 50 genes (25 pairs) for ALL/AML class separation using AP (all pairs) ranking
| Pair rank | Gene ID | Pair score | Gene score | Gene rank | Gene annotation |
| 1 | M84526_at | 16.16 | 12.88 | 1 | DF D component of complement (adipsin) |
| 1 | M92287_at | 16.16 | 8.87 | 6 | CCND3 Cyclin D3 |
| 2 | M23197_at | 15.43 | 11.72 | 2 | CD33 CD33 antigen (differentiation antigen) |
| 2 | M31523_at | 15.43 | 11.0 | 3 | TCF3 Transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47) |
| 3 | U46499_at | 13.56 | 9.14 | 5 | Glutathione S-transferase, microsomal |
| 3 | M31303_rna1_at | 13.56 | 6.52 | 36 | Oncoprotein 18 (Op18) gene |
| 4 | M63138_at | 13.53 | 9.46 | 4 | CTSD Cathepsin D (lysosomal aspartyl protease) |
| 4 | HG1612-HT1612_at | 13.53 | 8.47 | 10 | Macmarcks |
| 5 | X62320_at | 12.58 | 8.38 | 11 | GRN Granulin |
| 5 | Z14982_rna1_at | 12.58 | 5.54 | 93 | MHC-encoded proteasome subunit gene LAMP7-E1 (LMP7) |
| 6 | M31211_s_at | 12.41 | 8.61 | 9 | MYL1 Myosin light chain (alkali) |
| 6 | X62654_rna1_at | 12.41 | 7.32 | 17 | ME491 gene extracted from |
| 7 | M27891_at | 12.15 | 8.83 | 7 | CST3 Cystatin C (amyloid angiopathy and cerebral hemorrhage) |
| 7 | U89922_s_at | 12.15 | 5.24 | 115 | LTB Lymphotoxin-beta |
| 8 | X59417_at | 12.14 | 7.99 | 12 | Proteasome iota chain |
| 8 | X52056_at | 12.14 | 6.52 | 37 | SPI1 Spleen focus forming virus (SFFV) proviral integration oncogene spi1 |
| 9 | M19507_at | 12.09 | 7.03 | 24 | MPO Myeloperoxidase |
| 9 | M89957_at | 12.09 | 6.92 | 28 | IGB Immunoglobulin-associated beta (B29) |
| 10 | M84371_rna1_s_at | 11.9 | 7.18 | 23 | CD19 gene |
| 10 | U16954_at | 11.9 | 6.43 | 40 | (AF1q) mRNA |
| 11 | M63379_at | 11.74 | 7.56 | 13 | CLU Clusterin (complement lysis inhibitor; testosterone-repressed prostate message 2; apolipoprotein J) |
| 11 | M83667_rna1_s_at | 11.74 | 7.52 | 14 | NF-IL6-beta protein mRNA |
| 12 | M16038_at | 11.72 | 7.31 | 18 | LYN V-yes-1 Yamaguchi sarcoma viral related oncogene homolog |
| 12 | Y08612_at | 11.72 | 6.31 | 47 | Rabaptin-5 protein |
| 13 | D88422_at | 11.68 | 8.72 | 8 | Cystatin A |
| 13 | M11722_at | 11.68 | 7.18 | 21 | Terminal transferase mRNA |
| 14 | X66401_cds1_at | 11.67 | 5.89 | 69 | LMP2 gene extracted from |
| 14 | Y00433_at | 11.67 | 4.72 | 182 | GPX1 Glutathione peroxidase 1 |
| 15 | M63959_at | 11.61 | 6.59 | 34 | LRPAP1 Low density lipoprotein-related protein-associated protein 1 (alpha-2-macroglobulin receptor-associated protein 1) |
| 15 | X51521_at | 11.61 | 6.44 | 38 | VIL2 Villin 2 (ezrin) |
| 16 | Z15115_at | 11.37 | 7.49 | 15 | TOP2B Topoisomerase (DNA) II beta (180 kDa) |
| 16 | U10868_at | 11.37 | 5.55 | 92 | ALDH7 Aldehyde dehydrogenase 7 |
| 17 | Y12670_at | 11.25 | 5.67 | 82 | LEPR Leptin receptor |
| 17 | U77948_at | 11.25 | 5.6 | 87 | KAI1 Kangai 1 (suppression of tumorigenicity 6, prostate; CD82 antigen (R2 leukocyte antigen detected by monoclonal antibody IA4)) |
| 18 | U46751_at | 11.2 | 5.8 | 73 | Phosphotyrosine independent ligand p62 for the Lck SH2 domain mRNA |
| 18 | L06797_s_at | 11.2 | 5.77 | 74 | Probable G protein-coupled receptor LCR1 homolog |
| 19 | M95678_at | 11.19 | 5.73 | 77 | PLCB2 Phospholipase C, beta 2 |
| 19 | U72936_s_at | 11.19 | 5.38 | 101 | X-linked helicase II |
| 20 | S76617_at | 11.16 | 6.43 | 41 | BLK Protein-tyrosine kinase blk |
| 20 | L09209_s_at | 11.16 | 5.9 | 68 | APLP2 Amyloid beta (A4) precursor-like protein 2 |
| 21 | M55150_at | 11.11 | 6.7 | 32 | FAH Fumarylacetoacetate |
| 21 | M96803_at | 11.11 | 4.74 | 180 | SPTBN1 Spectrin, beta, non-erythrocytic 1 |
| 22 | X17042_at | 11.08 | 6.93 | 27 | PRG1 Proteoglycan 1, secretory granule |
| 22 | X99920_at | 11.08 | 5.15 | 124 | S100 calcium-binding protein A13 |
| 23 | S50223_at | 10.86 | 6.65 | 33 | HKR-T1 |
| 23 | U82759_at | 10.86 | 4.39 | 229 | GB DEF Homeodomain protein HoxA9 mRNA |
| 24 | J03589_at | 10.83 | 5.62 | 85 | Ubiquitin-like protein GDX |
| 24 | X12447_at | 10.83 | 5.16 | 122 | ALDOA Aldolase A |
| 25 | X74262_at | 10.8 | 5.89 | 70 | Retinoblastoma binding protein p48 |
| 25 | L19437_at | 10.8 | 5.13 | 127 | TALDO Transaldolase |
Top-ranked 50 genes (25 pairs) for colon tumor/normal class separation using AP (all pairs) ranking
| Pair rank | Gene ID | Pair score | Gene score | Gene rank | Gene annotation |
| 1 | M63391 | 10.02 | 5.57 | 2 | Human desmin gene, complete cds |
| 1 | H08393 | 10.02 | 5.47 | 4 | Collagen alpha 2(XI) chain ( |
| 2 | X12671 | 9.85 | 5.37 | 5 | Human gene for heterogeneous nuclear ribonucleoprotein (hnRNP) core protein A1 |
| 2 | Z50753 | 9.85 | 5.09 | 9 | |
| 3 | R87126 | 9.58 | 6.37 | 1 | Myosin heavy chain, nonmuscle ( |
| 3 | X63629 | 9.58 | 4.86 | 12 | |
| 4 | M36634 | 9.27 | 4.65 | 15 | Human vasoactive intestinal peptide (VIP) mRNA, complete cds |
| 4 | H11084 | 9.27 | 3.55 | 65 | Vascular endothelial growth factor ( |
| 5 | J05032 | 8.96 | 5.2 | 8 | Human aspartyl-tRNA synthetase alpha-2 subunit mRNA, complete cds |
| 5 | U19969 | 8.96 | 3.05 | 132 | Human two-handed zinc finger protein ZEB mRNA, partial cds |
| 6 | J02854 | 8.94 | 5.28 | 7 | Myosin regulatory light chain 2, smooth muscle isoform (human) (contains element TAR1 repetitive element) |
| 6 | R54097 | 8.94 | 3.93 | 46 | Translational initiation factor 2 beta subunit (human) |
| 7 | H06524 | 8.89 | 4.18 | 29 | Gelsolin precursor, plasma (human) |
| 7 | U22055 | 8.89 | 3.77 | 51 | Human 100 kDa coactivator mRNA, complete cds |
| 8 | M76378 | 8.74 | 4.81 | 13 | Human cysteine-rich protein (CRP) gene, exons 5 and 6 |
| 8 | T62947 | 8.74 | 4.12 | 34 | 60S ribosomal protein L24 ( |
| 9 | D21261 | 8.67 | 3.46 | 76 | SM22-alpha homolog (human) |
| 9 | H20709 | 8.67 | 2.69 | 203 | Myosin light chain alkali, smooth-muscle isoform (human) |
| 10 | X86693 | 8.64 | 4.16 | 30 | |
| 10 | D14812 | 8.64 | 2.66 | 211 | Human mRNA for ORF, complete cds |
| 11 | H09719 | 8.35 | 2.57 | 237 | Tubulin alpha-6 chain ( |
| 11 | L07648 | 8.35 | 2.31 | 321 | Human MXI1 mRNA, complete cds |
| 12 | X12369 | 8.25 | 3.27 | 97 | Tropomyosin alpha chain, smooth muscle (human) |
| 12 | R98842 | 8.25 | 3.05 | 131 | Prothymosin alpha ( |
| 13 | J04102 | 8.11 | 3.06 | 128 | Human erythroblastosis virus oncogene homolog 2 (ets-2) mRNA, complete cds |
| 13 | U14631 | 8.11 | 2.84 | 164 | Human 11 beta-hydroxysteroid dehydrogenase type II mRNA, complete cds |
| 14 | T63133 | 8.06 | 2.85 | 160 | Thymosin beta-10 (human) |
| 14 | T61661 | 8.06 | 2.51 | 255 | Profilin I (human) |
| 15 | T92451 | 8.06 | 4.12 | 33 | Tropomyosin, fibroblast and epithelial muscle-type (human) |
| 15 | U09587 | 8.06 | 3.46 | 74 | Human glycyl-tRNA synthetase mRNA, complete cds |
| 16 | T71025 | 8.0 | 4.34 | 24 | Human |
| 16 | L11706 | 8.0 | 3.18 | 104 | Human hormone-sensitive lipase (LIPE) gene, complete cds |
| 17 | Z48541 | 7.96 | 3.14 | 120 | |
| 17 | D25217 | 7.96 | 2.54 | 249 | Human mRNA (KIAA0027) for ORF, partial cds |
| 18 | M76378 | 7.94 | 5.04 | 10 | Human cysteine-rich protein (CRP) gene, exons 5 and 6 |
| 18 | T56604 | 7.94 | 3.89 | 48 | Tubulin beta chain ( |
| 19 | X54942 | 7.93 | 4.42 | 23 | |
| 19 | R44301 | 7.93 | 3.36 | 85 | Mineralocorticoid receptor ( |
| 20 | T90280 | 7.86 | 3.52 | 70 | Ribophorin II precursor (human) |
| 20 | T51534 | 7.86 | 3.01 | 137 | Cystatin C precursor (human) |
| 21 | R96357 | 7.81 | 2.89 | 156 | Polyadenylate-binding protein ( |
| 21 | R46753 | 7.81 | 2.64 | 216 | Cyclin-dependent kinase inhibitor 1 ( |
| 22 | M76378 | 7.75 | 4.51 | 20 | Human cysteine-rich protein (CRP) gene, exons 5 and 6 |
| 22 | D00860 | 7.75 | 3.38 | 81 | Ribose-phosphate pyrophosphokinase I (human) |
| 23 | X14958 | 7.56 | 4.53 | 19 | Human hmgI mRNA for high mobility group protein Y |
| 23 | X87159 | 7.56 | 2.63 | 221 | |
| 24 | T51023 | 7.55 | 4.12 | 32 | Heat shock protein HSP 90-beta (human) |
| 24 | D31716 | 7.55 | 2.83 | 168 | Human mRNA for GC box binding protein, complete cds |
| 25 | M26383 | 7.52 | 5.53 | 3 | Human monocyte-derived neutrophil-activating protein (MONAP) mRNA, complete cds |
| 25 | T47377 | 7.52 | 4.11 | 35 | S-100P protein (human) |