Gabriel S Eichler, Mark Reimers, David Kane, John N Weinstein.
Abstract
Interpretation of microarray data remains a challenge, and most methods fail to consider the complex, nonlinear regulation of gene expression. To address that limitation, we introduce Learner of Functional Enrichment (LeFE), a statistical/machine learning algorithm based on Random Forest, and demonstrate it on several diverse datasets: smoker/never smoker, breast cancer classification, and cancer drug sensitivity. We also compare it with previously published algorithms, including Gene Set Enrichment Analysis. LeFE regularly identifies statistically significant functional themes consistent with known biology.
Year: 2007 PMID: 17845722 PMCID: PMC2375025 DOI: 10.1186/gb-2007-8-9-r187
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. The LeFE algorithm illustrated schematically for a category of two genes. See Materials and methods for further details and Table 4 for a description of the steps (keyed to the circled letters). LeFE, Learner of Functional Enrichment.
Top 20 LeFE categories for current versus never-smoker classification
| Rank | Category | FDR |
| 1 | Electron transporter activity BioCarta | ~0 |
| 1 | Carbohydrate metabolism (GO:0005975) | ~0 |
| 1 | Electron transport BioCarta | ~0 |
| 1 | Glutathione metabolism GenMAPP | ~0 |
| 1 | Pentose Phosphate Pathway BLACK | ~0 |
| 6 | Xenobiotic metabolism (GO:0006805) | 0.016 |
| 6 | O Glycans biosynthesis GenMAPP | 0.016 |
| 8 | PentosePathway BLACK | 0.045 |
| 9 | Protein amino acid O-linked glycosylation (GO:0006493) | 0.069 |
| 10 | Pentose phosphate pathway GenMAPP | 0.094 |
| 10 | Gamma hexachlorocyclohexane degradation GenMAPP | 0.094 |
| 12 | Tyrosine metabolism GenMAPP | 0.097 |
| 13 | Cysteine metabolism (GO:0006534) | 0.105 |
| 13 | G1pPathway BLACK | 0.105 |
| 13 | T cell differentiation (GO:0030217) | 0.105 |
| 13 | Fatty acid metabolism BioCarta | 0.105 |
| 17 | Retrograde vesicle-mediated transport, Golgi to ER (GO:0006890) | 0.11 |
| 18 | Aldehyde metabolism (GO:0006081) | 0.11 |
| 19 | Digestion (GO:0007586) | 0.114 |
| 19 | MAP00051 Fructose and mannose metabolism GenMAPP | 0.114 |
FDR, false discovery rate; GO, Gene Ontology; LeFE, Learner of Functional Enrichment.
Top 20 categories for sensitivity to gefitinib
| Rank | Category | FDR |
| 1 | Androgen up genes na | 0.347 |
| 2 | EGF receptor signaling pathway BioCarta | 0.408 |
| 3 | MAP00100 Sterol biosynthesis GenMAPP | 0.531 |
| 4 | Epidermal growth factor receptor signaling pathway (GO:0007173) | 0.531 |
| 5 | G1/S transition of mitotic cell cycle (GO:0000082) | 0.628 |
| 6 | positive regulation of I-kappaB kinase/NF-kappaB cascade (GO:0043123) | 0.628 |
| 7 | Cell-cell adhesion (GO:0016337) | 0.748 |
| 8 | Aspartate catabolism (GO:0006533) | 0.915 |
| 9 | Calcium-independent cell-cell adhesion (GO:0016338) | 0.915 |
| 10 | Regulation of glycolysis (GO:0006110) | 0.915 |
| 11 | Detection of pest, pathogen or parasite (GO:0009596) | 0.915 |
| 12 | MalatePathway BLACK | 0.915 |
| 12 | RarPathway BLACK | 0.915 |
| 14 | Epidermis development (GO:0008544) | 0.931 |
| 14 | Regulation of endocytosis (GO:0030100) | 0.931 |
| 16 | NFKB reduced Hinata et al 2003 | 0.998 |
| 17 | mRNA editing (GO:0006381) | ~1 |
| 17 | EMT DOWN Jechlinger et al 2003 | ~1 |
| 19 | Chloride transport (GO:0006821) | ~1 |
| 19 | Induction of apoptosis by intracellular signals (GO:0008629) | ~1 |
FDR, false discovery rate; GO, Gene Ontology; LeFE, Learner of Functional Enrichment.
Figure 2. Importance plots (probability density distributions) of gene importance scores calculated by LeFE: smoker versus nonsmoker dataset. Shown are representative distributions for three gene categories (red curves) and their corresponding negative control gene sets (black curves). The curves were smoothed according to default settings of the 'density' function in R. The shifted secondary peaks, denoted by red arrows, for aldehyde metabolism and glutathione metabolism reflect genes important to the Random Forest models. The viral life cycle category contains no secondary peaks and therefore does not appear to be associated with smoking. See Results for further details.
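The smoothing mentioned in the caption can be illustrated with a minimal sketch. R's 'density' function defaults to a Gaussian kernel with the rule-of-thumb bandwidth documented as bw.nrd0; the Python below mirrors that formula using only the standard library. The importance scores are hypothetical values chosen to show a shifted secondary peak; they are not data from the paper, and the function names are illustrative.

```python
import math
import statistics

def nrd0_bandwidth(values):
    """Rule-of-thumb bandwidth, following the formula documented for R's bw.nrd0."""
    n = len(values)
    sd = statistics.stdev(values)
    # method="inclusive" matches R's default (type 7) quantile definition.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        spread = sd or abs(values[0]) or 1.0
    return 0.9 * spread * n ** -0.2

def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each point of grid."""
    h = bandwidth if bandwidth is not None else nrd0_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]

# Hypothetical importance scores (assumption, for illustration only).
control_scores = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.1, -0.3, 0.0]
category_scores = [-0.2, 0.0, 0.1, 0.2, -0.1, 2.8, 3.0, 3.3]

step = 0.05
grid = [i * step for i in range(-60, 121)]  # -3.0 to 6.0
control_density = gaussian_kde(control_scores, grid)
category_density = gaussian_kde(category_scores, grid)
```

Plotting `category_density` against `control_density` over `grid` would give curves of the kind the caption describes: the category curve carries extra mass near a score of 3, where the control curve is essentially zero.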
Top 20 categories for breast cancer classification
| Rank | Category | FDR |
| 1 | Breast_cancer_estrogen_signalling GEArray | 0.02 |
| 1 | Drug_resistance_and_metabolism BioCarta | 0.02 |
| 1 | FRASOR_ER_UP Frasor_et_al_2004 | 0.02 |
| 4 | mta3Pathway BioCarta | 0.041 |
| 5 | Fatty_Acid_Synthesis BioCarta | 0.065 |
| 6 | Cell_cycle_checkpoint II | 0.065 |
| 6 | p35alzheimersPathway BioCarta | 0.065 |
| 6 | FRASOR_ER_DOWN Frasor_et_al_2004 | 0.065 |
| 9 | L-phenylalanine catabolism | 0.068 |
| 9 | UDP-glucose metabolism | 0.068 |
| 9 | Cell_cycle_regulator | 0.068 |
| 12 | Electron_transporter_activity BioCarta | 0.078 |
| 13 | skp2e2fPathway BioCarta | 0.1 |
| 14 | Fatty_acid_metabolism BioCarta | 0.1 |
| 15 | Ubiquinone biosynthesis | 0.102 |
| 16 | MAP00010_Glycolysis_Gluconeogenesis GenMAPP | 0.12 |
| 16 | MAPKKK_cascade GO | 0.12 |
| 18 | G1Pathway BioCarta | 0.134 |
| 18 | MAP00280_Valine_leucine_and_isoleucine_degradation GenMAPP | 0.134 |
| 20 | Response to metal ion | 0.144 |
FDR, false discovery rate; GO, Gene Ontology; LeFE, Learner of Functional Enrichment.
Figure 3. A comparison of LeFE with PathwayRF. Shown is a comparison of Learner of Functional Enrichment (LeFE) and PathwayRF with respect to the size distribution of categories identified as important for breast cancer classification using the Gene Ontology (GO) biological process categories. (a) Scatter plots showing category rank versus category size. Ties in category ranks were resolved through random reordering. Red lines are lowess regressions. (b) Comparison of GO superset and subset ranks. Almost all points for PathwayRF are below the blue x = y line, indicating that supersets rank lower (better) than their corresponding subsets. The panel for LeFE shows no such bias. (c) The GO biological process hierarchy (with the most general categories toward the top). Blue circles denote the top 25 categories ranked by PathwayRF; red circles denote the same for LeFE; and yellow circles denote categories in the top 25 for both algorithms. The mean GO level is 4.92 for PathwayRF and 7.08 for LeFE. There are no cases in which LeFE's top results are the ancestors of top results from PathwayRF. However, the black edges highlight eight cases in which LeFE found categories that are progeny of categories identified by PathwayRF.
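The two rank conventions that appear in this record can be sketched as follows: the shared ranks shown in the tables (tied categories all receive the lowest rank of their group) and the random reordering of ties mentioned in panel (a). This is an illustrative sketch only; the function names and the seed parameter are assumptions, and the source does not say how the authors implemented the reordering.

```python
import random

def competition_ranks(scores):
    """Rank ascending; tied scores share the lowest rank of their group (1, 1, 1, 4, ...)."""
    first_position = {}
    for position, value in enumerate(sorted(scores), start=1):
        first_position.setdefault(value, position)
    return [first_position[value] for value in scores]

def break_ties_randomly(scores, seed=0):
    """Return a strict 1..n ranking in which tied items are ordered at random."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    rng.shuffle(indices)                   # put items in random order first
    indices.sort(key=lambda i: scores[i])  # stable sort keeps that order within ties
    ranks = [0] * len(scores)
    for rank, i in enumerate(indices, start=1):
        ranks[i] = rank
    return ranks

# FDR values of the first four rows of the breast cancer table.
fdr = [0.02, 0.02, 0.02, 0.041]
shared = competition_ranks(fdr)
strict = break_ties_randomly(fdr, seed=1)
```

Here `shared` reproduces the ranks 1, 1, 1, 4 shown in the table, while `strict` assigns ranks 1 to 3 to the tied categories in a seed-dependent order and rank 4 to the last one.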
Steps in the LeFE algorithm
| Step | Details |
| A | The gene expression matrix, signature vector, and gene categories (not shown in Figure 1) are entered |
| B | For each category |
| C | A negative control set consisting of |
| D | A composite matrix of gene expression |
| E | A random forest with 400 trees is built on |
| F | A vector |
| G | The statistical significance of the gene expression evidence for rejection of the null hypothesis that the mean of |
| H | Applying the above procedure to a single gene category creates three outputs. The first (denoted |
Shown are the steps in the Learner of Functional Enrichment (LeFE) algorithm, with the letters in the left-hand column corresponding to those in Figure 1.
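Step G compares the importance scores of the category genes with those of the negative control genes. The step descriptions above are truncated, so the exact test is not recoverable from this record; the sketch below assumes a one-sided permutation test on the difference in mean importance, which is one plausible way to carry out such a comparison and not necessarily the authors' procedure. The importance scores are hypothetical numbers taken as given (no Random Forest is fitted here), and all names are illustrative.

```python
import random
import statistics

def permutation_p_value(category_scores, control_scores, n_permutations=2000, seed=0):
    """One-sided permutation p-value for mean(category) > mean(control)."""
    rng = random.Random(seed)
    observed = statistics.fmean(category_scores) - statistics.fmean(control_scores)
    pooled = list(category_scores) + list(control_scores)
    k = len(category_scores)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_large + 1) / (n_permutations + 1)

# Hypothetical importance scores (assumption, for illustration only).
category = [2.1, 1.8, 2.5, 0.1]
control = [0.0, -0.1, 0.2, 0.1, -0.2, 0.05, 0.0, 0.1, -0.05, 0.15, 0.0, -0.1]
p_value = permutation_p_value(category, control)
```

A small `p_value` indicates that the category genes are, on average, more important to the model than the control genes; repeating the calculation for every category would yield the per-category p-values from which an FDR could then be estimated.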
Figure 4. Replicate applications of LeFE to the breast cancer classification dataset. Scatter plot comparing the ranks resulting from two applications of Learner of Functional Enrichment (LeFE) to the breast cancer classification dataset, with n = 75, n = 6, and nTree = 400. The inset is an enlargement of the top 50 categories. r denotes the Pearson correlation coefficient of the ranks (that is, the Spearman correlation coefficient).
Correlation of ranks between two applications of LeFE with different random number generator seeds
| Dataset | Overall correlation | Top 50 correlations |
| Breast cancer | 0.98 | 0.70 |
| Current smoker/never-smoker | 0.97 | 0.70 |
| Gefitinib | 0.95 | 0.61 |
LeFE, Learner of Functional Enrichment.
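The correlations above are Pearson correlations computed on ranks, which is the definition of the Spearman coefficient. A minimal standard-library sketch of that calculation is given below; the two rank vectors are hypothetical stand-ins for the category ranks from two runs with different seeds, and the function names are illustrative.

```python
import math

def average_ranks(values):
    """Ascending 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical category ranks from two runs (assumption, for illustration only).
run_a = [1, 2, 3, 4, 5]
run_b = [2, 1, 3, 5, 4]
rho = spearman(run_a, run_b)
```

Restricting both vectors to the categories ranked in the top 50 before calling `spearman` would correspond to the "Top 50 correlations" column; with far fewer points, small rank swaps weigh more heavily, which is consistent with the lower values reported there.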