| Literature DB >> 18818771 |
Abstract
Analysis of microarray and other high throughput data often involves identification of genes consistently up or down-regulated across samples as the first step in extraction of biological meaning. This gene-level paradigm can be limited as a result of valid sample fluctuations and biological complexities. In this report, we describe a novel method, SLEPR, which eliminates this limitation by relying on pathway-level consistencies. Our method first selects the sample-level differentiated genes from each individual sample, capturing genes missed by other analysis methods, ascertains the enrichment levels of associated pathways from each of those lists, and then ranks annotated pathways based on the consistency of enrichment levels of individual samples from both sample classes. As a proof of concept, we have used this method to analyze three public microarray datasets with a direct comparison with the GSEA method, one of the most popular pathway-level analysis methods in the field. We found that our method was able to reproduce the earlier observations with significant improvements in depth of coverage for validated or expected biological themes, but also produced additional insights that make biological sense. This new method extends existing analyses approaches and facilitates integration of different types of HTP data.Entities:
Mesh:
Year: 2008 PMID: 18818771 PMCID: PMC2546449 DOI: 10.1371/journal.pone.0003288
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1Schematic overview of SLEPR method (see Materials And Methods section for more details).
The goal of SLEPR method is to use sample-level differentiated genes for each sample to capture the sample-level specificity for gene-level variance, and then use functional enrichment levels of these gene lists to evaluate pathway-level data consistency associated with the contrasted classes in study: Inclusion/Target class versus Exclusion/Background class (e.g., NGT versus DM2+IGT in the human type 2 diabetes mellitus (DM2) study [23]). Step 1 of SLEPR is to assign the samples to the Inclusion class (I) and Exclusion class (E). Then for each genes or features in study (i.e., G1, G2, G3…Gn), consider the data distribution and use median and MADe of data in samples of class E to set up the cutoff for sample-level differentiated genes for each genes (Step 2). Each gene Gi will have its own cutoff to determine if it is a sample-level differentiated gene. Gene Gi will be selected as the sample-level differentiated gene for a sample if the data of gene Gi in this sample is beyond the cutoff (Step 3). Each sample including samples from both I and E classes will have its own sample-level differentiated gene list (L1, L2, L3….) (Step 3). To determine the functional enrichment levels in any a priori defined gene sets, pathways, or functional categories (e.g., GO terms) for each of the sample-level differentiated lists, batch computation of Fisher's exact test based enrichment analysis is performed and the results are merged automatically into a matrix (e.g., Stanford format file) of enrichment scores which consists of enrichment scores of each sample from class I and E for each term (T1, T2, T3, …Tm), which are transformed as −log10(p-value) of Fisher's exact test p-values (Step 4). To determine whether a gene set, pathway, or functional category (e.g., GO term) is significant in terms of how consistent it is enriched across samples, a pathway ranking algorithm is applied to the enrichment score matrix to obtain pathway ranking scores, which considers both positive contribution of class I and negative contribution of class E from individual sample-level enrichment level (see details in Materials And Methods section) (Step 5). To determine the statistical significance of actual ranking of a gene set or a GO term in the contrasted classes: I versus E, the entire procedure (steps 1 to 5) is repeated 1000 times or more by simply permutating the class labels for each selected samples (Step 6). The pathway ranking scores of each term from each permutation are pooled together and used to build the empirically derived distribution of pathway ranking scores from the permutation procedure. The permutated p-value for each term is calculated as the fraction of random trials resulting in permutated pathway ranking scores higher than the actual score from the original sample assignments.
Top ranked GSEA annotation terms in SLEPR Analysis Result for Comparison of NGT vs DM2+IGT in human DM2 data.
| GSEA_TermName | GSEA_TermID | Combined Ranking | Permutated P_Val | FDR q_Val |
| Mitochondrial genes | HUMAN_MITODB_6_2002 | 1.009408 | 0.00011557 | 0.144 |
| Mitochondrial genes | MITOCHONDRIA | 0.96734041 | 0.00013403 | 0.0835 |
| Genes involved in electron transport | ELECTRON_TRANSPORT_CHAIN | 0.73313179 | 0.00033066 | 0.13733333 |
| Oxidative Phosphorylation | MOOTHA_VOXPHOS | 0.63908609 | 0.00048395 | 0.15075 |
| PGC related genes | PGC | 0.41673472 | 0.00146549 | 0.3652 |
| RIBOSOMAL_PROTEINS | RIBOSOMAL_PROTEINS | 0.26145395 | 0.00456661 | 0.9483 |
Microarray data for human type 2 diabetes mellitus (DM2) study [23] was re-analyzed with SLEPR method to compare either NGT versus IGT+DM2 (IGT+DM2 as Exclusion/Background class in SLEPR). One-side MADe method was used in SLEPR for selection of highly expressed genes as sample-level differentiated genes in the comparison of NGT versus IGT+DM2. Two-side MADe method was also used for selection of highly and lowly expressed genes as sample-level differentiated genes, and similar result was obtained (see Table S4). 1000 permutations were performed. Combined_Ranking: Combined ranking scores for terms; Permutated_P_Val: p-value of terms derived from permutated data; FDR_q_Val: FDR of terms derived from permutated data (see Materials And Methods section for details).
Top ranked GSEA annotation terms in SLEPR Analysis Result for Comparison of NGT vs DM2 in human DM2 data.
| GSEA_TermName | GSEA_TermID | Combined Ranking | Permutated P_Val | FDR_q_Val |
| Mitochondrial genes | HUMAN_MITODB_6_2002 | 2.33012855 | 0.0000025 | 0.003 |
| Mitochondrial genes | MITOCHONDRIA | 1.19377468 | 0.0000975 | 0.0585 |
| PROPANOATE_METABOLISM | PROPANOATE_METABOLISM | 0.87576134 | 0.00029917 | 0.11966667 |
| PGC related genes | PGC | 0.85709181 | 0.00031583 | 0.09475 |
| Oxidative Phosphorylation | MOOTHA_VOXPHOS | 0.83877765 | 0.0003425 | 0.0822 |
| Genes involved in electron transport | ELECTRON_TRANSPORT_CHAIN | 0.80602325 | 0.0003875 | 0.0775 |
| RIBOSOMAL | ||||
| PROTEINS | RIBOSOMAL_PROTEINS | 0.73004401 | 0.00051333 | 0.088 |
| Downregulated in correlation with overt Alzheimer's Disease, in the CA1 region of the hippocampus | ALZHEIMERS_DISEASE_DN | 0.43387443 | 0.00222333 | 0.3335 |
Microarray data for human type 2 diabetes mellitus (DM2) study [23] was re-analyzed with SLEPR method to compare either NGT versus DM2 (DM2 as Exclusion/Background class in SLEPR). One-side MADe method was used in SLEPR for selection of highly expressed genes as sample-level differentiated genes in the comparison of NGT versus DM2. 1000 permutations were performed. Combined_Ranking: Combined ranking scores for terms; Permutated_P_Val: p-value of terms derived from permutated data; FDR_q_Val: FDR of terms derived from permutated data (see Materials And Methods section for details).
Top ranked GSEA annotation terms in GSEA Analysis Result for Comparison of NGT vs DM2+IGT in human DM2 data.
| GSEA_TermName | GSEA_TermID | ES | NES | NOM p-val | FDR q-val | FWER p-val |
| Upregulated by expression of mutant MeCP2 (Rett syndrome) vs. wt MeCP2 in fibroblasts | RETT_UP | 0.6723 | 1.830771 | 0.001** | 0.93558 | 0.56 |
| Granule constituents expressed during mouse promyelocytic cell line cell differentiation | LIAN_MYELOID_DIFF_GRANULE | 0.5568 | 1.791872 | 0.002012 | 0.700521 | 0.683 |
| Genes highly associated with medulloblastoma treatment failure | POMEROY_MD_TREATMENT GOOD_VS_POOR_DN | 0.5402 | 1.720124 | 0.016327 | 0.971015 | 0.875 |
| Regulated by UV-B light in normal human epidermal keratinocytes, cluster 8 | UVB_NHEK3_C8 | 0.4442 | 1.685022 | 0.001** | 1 | 0.931 |
| Genes involved in electron transport | ELECTRON_TRANSPORT_CHAIN | 0.3457 | 1.680579 | 0.060797 | 0.863601 | 0.937 |
| Down-regulated in mycosis fungoides (cutaneous T-cell lymphoma) T-cells resistant to IFN-alpha, compared to sensitive parent cell line | IFNALPHA_RESIST_DN | 0.5722 | 1.648226 | 0.014315 | 0.980393 | 0.968 |
| Regulated by UV-B light in normal human epidermal keratinocytes, cluster 6 | UVB_NHEK3_C6 | 0.4663 | 1.642358 | 0.018987 | 0.885222 | 0.979 |
Microarray data for human type 2 diabetes mellitus (DM2) study [23] was re-analyzed with GSEA method (using the newest version (v 2.0.1) GSEA tool [22]) to compare NGT versus IGT+DM2. 1000 permutations were performed. **: p-value(s) is adjusted to p = 1/number of permutation for p = 0 according to GSEA manual.
Top ranked GSEA annotation terms in GSEA Analysis Result for Comparison of NGT vs DM2 in human DM2 data.
| GSEA_TermName | GSEA_TermID | ES | NES | NOM p-val | FDR q-val | FWER p-val |
| Up-regulated following treatment with Et-743 at any timepoint in at least 8 of 11 sarcoma cell lines | ET743_SARCOMA_UP | 0.4778 | 1.851262 | 0.0019455 | 0.835225 | 0.416 |
| Oxidative Phosphorylation | MOOTHA_VOXPHOS | 0.6187 | 1.844164 | 0.02 | 0.460014 | 0.446 |
| Target genes down regulated by p53 | KANNAN_P53_DN | 0.6835 | 1.84351 | 0.005848 | 0.309037 | 0.447 |
| Genes involved in electron transport | ELECTRON_TRANSPORT_CHAIN | 0.594 | 1.840375 | 0.0217822 | 0.240362 | 0.46 |
| p-regulated in liver, heart or kidney tissue from hypophysectomized rats (lacking growth hormone), compared to normal controls | HYPOPHYSECTOMY_RAT_UP | 0.5129 | 1.776757 | 0.0040404 | 0.3918 | 0.664 |
| Upregulated by expression of mutant MeCP2 (Rett syndrome) vs. wt MeCP2 in fibroblasts | RETT_UP | 0.544 | 1.697153 | 0.0179641 | 0.740229 | 0.855 |
| Genes down-regulated by LIF treatment (10 ng/ml, overnight) in AtT20 cells | ABBUD_LIF_DN | 0.569 | 1.682703 | 0.0226804 | 0.727445 | 0.872 |
| OXIDATIVE_PHOSPHORYLATION | OXIDATIVE_PHOSPHORYLATION | 0.5167 | 1.653935 | 0.0459082 | 0.832665 | 0.908 |
Microarray data for human type 2 diabetes mellitus (DM2) study [23] was re-analyzed with GSEA method (using the newest version (v 2.0.1) GSEA tool [22]) to compare NGT versus DM2. 1000 permutations were performed. **: p-value(s) is adjusted to p = 1/number of permutation for p = 0 according to GSEA manual.
Figure 2Heatmap of enrichment scores in all samples from NGT versus IGT and DM2 for the top 17 ranked terms of SLEPR result listed in Table S2.
The enrichment scores, which in general derived from Fisher's exact test p-value using formula (−Log10(p-value)), were floored to 0 if the ListHits<2 or p-value>0.05. The rows of the heatmap are the ranked terms in the same order as in Table S2 (Top 7 of them shown in Table 1) from top to bottom with the higher ranks at the top. The gradient of red color in heatmap indicated the enrichment levels.
Comparison of SLEPR results using a series of cutoffs for selection of sample-level differentiated genes.
| TermName (TermID) | 0.5XMADe | 0.75XMADe | 1XMADe | 1.25XMADe | 1.5XMADe | |||||
| p-Value* | Rank | p-Value* | Rank | p-Value* | Rank | p-Value* | Rank | p-Value* | Rank | |
| Mitochondrial genes (HUMAN_MITODB_6_2002) | 0.150 | 153 | 3.08E-04 | 1 | 1.16E-04 | 1 | 6.58E-05 | 2 | 1.03E-05 | 1 |
| Mitochondrial genes (MITOCHONDRIA) | 0.170 | 179 | 6.38E-04 | 2 | 1.34E-04 | 2 | 2.22E-05 | 1 | 2.31E-05 | 2 |
| Genes involved in electron transport (ELECTRON_TRANSPORT_CHAIN) | 4.49E-03 | 7 | 8.82E-04 | 4 | 3.31E-04 | 3 | 1.75E-04 | 3 | 7.61E-05 | 4 |
| Oxidative Phosphorylation (MOOTHA_VOXPHOS) | 1.01E-02 | 9 | 6.91E-04 | 3 | 4.83E-04 | 4 | 2.22E-04 | 4 | 7.53E-05 | 3 |
| PGC related genes (PGC) | 1.05E-02 | 10 | 2.93E-03 | 6 | 1.47E-03 | 5 | 5.05E-04 | 6 | 6.21E-04 | 7 |
Comparison of permutated p-values and ranks of SLEPR analysis results of for human type 2 diabetes mellitus (DM2) data [23] with IGT and DM2 samples as Exclusion/Background class by using a series of different cutoffs (0.5, 0.75,1,1.25,1.5× of MADe) for selection of highly expressed genes as sample-level differentiated genes. The one-side MADe method selecting for highly expressed genes was used (see Materials And Methods section). The top 5 ranked terms from the case of 1XMADe are selected for comparison of permutated p-values and ranks for the same terms with those derived from other cutoffs. 1000 permutations were performed.
Top ranked terms in SLEPR analysis result for testis-related tissues in GNF dataset.
| GSEA_TermName | GSEA_TermID | Combined_Ranking | Permutated_P_Val | FDR_q_Val |
| Genes expressed specifically in human testis tissue | HUMAN_TISSUE_TESTIS | 25.62201 | 7.29E-07 | 0.001 |
| Testis related genes curated from the GNF normal tissue compendium | TESTIS_EXPRESSED_GENES | 24.08417 | 7.29E-07 | 0.0005 |
| Cell-cycle dependent genes regulated following exposure to serum in a variety of human fibroblast cell lines | SERUM_FIBROBLAST_CELLCYCLE | 6.951284 | 9.56E-05 | 0.0437 |
| 50 top ranked SAM-defined over-expressed genes in each subgroup__PR | ZHAN_MM_CD138_PR_VS_REST | 6.523997 | 0.000106 | 0.0365 |
Microarray data for GNF human tissues study [41] was analyzed with SLEPR method to compare testis related tissues (total 5 testis related tissues selected: Testis Germ Cells, Testis Interstitial, Testis Leydig Cell, Testis Seminiferous Tubule, Testis) (as Inclusion/Target tissues in SLEPR) to the other tissues (total 74 tissues) (as Exclusion/Background class in SLEPR) as for GSEA annotated gene sets. In SLEPR method, one-side MADe method for selection of highly expressed genes as sample-level differentiated genes was used. 1000 permutations were performed.
Top ranked GO Biological Processes terms in SLEPR analysis result for testis-related tissues in GNF dataset.
| TermName | Term | Combined_Ranking | Permutated_P_Val | FDR_q_Val |
| spermatogenesis | GO∶0007283 | 17.21279 | 6.71E-07 | 0.0005 |
| male gamete generation | GO∶0048232 | 17.21279 | 6.71E-07 | 0.0005 |
| sexual reproduction | GO∶0019953 | 16.27175 | 6.71E-07 | 0.000333 |
| reproduction | GO∶0000003 | 16.17275 | 6.71E-07 | 0.00025 |
| gametogenesis | GO∶0007276 | 13.95838 | 6.71E-07 | 0.0002 |
| nuclear division | GO∶0000280 | 4.461007 | 0.000178 | 0.044167 |
| M phase | GO∶0000279 | 4.361697 | 0.000195 | 0.041429 |
Microarray data for GNF human tissues study [41] was analyzed with SLEPR method to compare testis related tissues (total 5 testis related tissues selected: Testis Germ Cells, Testis Interstitial, Testis Leydig Cell, Testis Seminiferous Tubule, Testis) as Inclusion/Target tissue to the rest of the tissues (74 other tissues) as Exclusion/Background class for testis-specific GO biological processes, using one-side MADe method for selection of highly expressed genes as sample-level differentiated genes. 1000 permutations were performed.
Top ranked terms in GSEA analysis result for testis-related tissues in GNF dataset.
| GSEA_TermName | GSEA_TermID | ES | NES | NOM p-val | FDR q-val | FWER p-val |
| 50 most interesting genes upregulated by the combination of TSA and DAC in at least one of four pancreatic cancer cell lines, but not in normal (HPDE) cells | TSADAC_PANC50_UP | 0.571787 | 2.3071 | 0.001** | 0.002 | 0.004 |
| Up-regulated 2 hours after VEGF treatment in human umbilical vein endothelial cells | VEGF_HUVEC_2HRS_UP | 0.562051 | 2.01 | 0.001** | 0.077 | 0.121 |
| Testis related genes curated from the GNF normal tissue compendium | TESTIS_EXPRESSED_GENES | 0.868249 | 1.8032 | 0.001** | 0.271 | 0.417 |
| Genes expressed specifically in human testis tissue | HUMAN_TISSUE_TESTIS | 0.904183 | 1.7737 | 0.001** | 0.255 | 0.474 |
Microarray data for GNF human tissues study [41] was analyzed with GSEA method [23] (Using the newest version (v2.0.1) GSEA tool [24]) to compare testis related tissues (total 5 testis related tissues selected: Testis Germ Cells, Testis Interstitial, Testis Leydig Cell, Testis Seminiferous Tubule, Testis) to the other tissues (total 74 tissues) as for GSEA annotated gene sets. 1000 permutations were performed. **: p-value(s) is adjusted to p = 1/number of permutation for p = 0 according to GSEA manual.
Figure 3Heatmap of enrichment scores of sample-level differentiated genes of all samples in human GNF tissue dataset
[41] for the top 8 ranked GO biological process terms shown in Table 7. The enrichment scores, which in general derived from Fisher's exact test p-value using formula (−Log10(p-value)), were floored to 0 if the ListHits<2 or p-value>0.05. The rows of the heatmap are the terms and columns are tissue samples from the dataset. The gradient of red color in heatmap indicated the enrichment levels.
Top ranked GO Biological Processes terms of SLEPR result for muscle-related tissues in GNF dataset.
| TermName | Term | Combined_Ranking | Permutated_P_Val | FDR_q_Val |
| muscle contraction | GO∶0006936 | 5.035783 | 0.000195 | 0.289 |
| muscle development | GO∶0007517 | 3.766843 | 0.000438 | 0.325 |
| cell-cell signaling | GO∶0007267 | 3.115529 | 0.000755 | 0.373667 |
| morphogenesis | GO∶0009653 | 2.51698 | 0.001281 | 0.47525 |
| development | GO∶0007275 | 2.509388 | 0.001292 | 0.3836 |
| organogenesis | GO∶0009887 | 2.434454 | 0.001375 | 0.34 |
| cell motility | GO∶0006928 | 2.410239 | 0.00141 | 0.298857 |
| striated muscle contraction | GO∶0006941 | 1.781977 | 0.002482 | 0.4605 |
| regulation of body fluids | GO∶0050878 | 1.602872 | 0.003043 | 0.501778 |
| angiogenesis | GO∶0001525 | 1.40132 | 0.003982 | 0.5909 |
Microarray data for GNF human tissues study [41] was analyzed with SLEPR method to compare muscle related tissues (total 4 muscle related tissues selected: Cardiac Myocytes, Heart, Skeletal Muscle, Smooth Muscle) as Inclusion/Target tissue to the rest of tissues (74 other tissues) as Exclusion/Background class for testis-specific GO biological processes, using one-side MADe method for selection of highly expressed genes as sample-level differentiated genes (see Materials And Methods section). 1000 permutations were performed.
Top ranked terms in SLEPR analysis results for a well-studied prostate cancer dataset.
| GSEA_TermName | GSEA_TermID | Combined_Ranking | Permutated_P_Val | FDR q_Val |
| Sixty-seven genes commonly upregulated in cancer relative to normal tissue, from a meta-analysis of the OncoMine gene expression database | CANCER_NEOPLASTIC_META_UP | 7.312481 | 5.43E-07 | 0.000111 |
| Genes downregulated in response to glutamine starvation | PENG_GLUTAMINE_DN | 6.578343 | 5.43E-07 | 6.25E-05 |
| These are genes identified by simple statistical criteria as differing in their mRNA expresssion between WTs and fetal kidneys LOW | LI_FETAL_VS_WT_KIDNEY_UP | 5.961163 | 5.43E-07 | 0.00004 |
| Genes 2fold upregulated by insulin | ROME_INSULIN_2F_UP | 5.905375 | 5.43E-07 | 3.7E-05 |
| Genes up-regulated by MYC in P493-6 (B-cell) | SCHUMACHER_MYC_UP | 5.762658 | 5.43E-07 | 3.45E-05 |
| Genes highly expressed in hepatocellular carcinoma with poor survival. | HCC_SURVIVAL_GOOD_VS_POOR_DN | 5.449725 | 5.43E-07 | 2.94E-05 |
A well studied prostate cancer dataset [42], [43] was analyzed with SLEPR method. SLEPR method used the 25 tumor samples as Inclusion/Target class compared to 9 nonmalignant tissues as Exclusion/Background class for GSEA annotated terms, by two-sided MADe method for selection of both highly and lowly expressed genes as sample-level differentiated genes. Only top ranked functional terms were shown in the table (the chromosomal location-based annotation terms were taken out for simplicity). 1000 permutations were performed.
Top ranked terms in GSEA analysis results for a well-studied prostate cancer dataset.
| GSEA_TermName | GSEA_TermID | ES | NES | NOM p-val | FDR q-val | FWER p-val |
| Genes up-regulated by MYC in P493-6 (B-cell) | SCHUMACHER_MYC_UP | 0.6769 | 2.043 | 0.001* | 0.031466 | 0.039 |
| Genes overexpressed in polyclonal plasmablastic cells (PPCs), mature plasma cells isolated from tonsils (TPCs), and mature plasma cells isolated from bone marrow (BMPCs), as compared to B cells purified from peripheral blood (PBBs) and tonsils (TBCs) | TARTE_PC | 0.6101 | 1.954 | 0.0019193 | 0.07472 | 0.144 |
| Genes downregulated in response to glutamine starvation | PENG_GLUTAMINE_DN | 0.4842 | 1.939 | 0.001** | 0.058919 | 0.167 |
| Genes downregulated in response to leucine starvation | PENG_LEUCINE_DN | 0.5067 | 1.911 | 0.001** | 0.066378 | 0.238 |
| Downregulated in HL-60 promyeloid leukemic cells after treatment with the cytotoxic drug cantharidin | CANTHARIDIN_DN | 0.6218 | 1.903 | 0.001* | 0.059042 | 0.26 |
| Genes downregulated in response to rapamycin starvation | PENG_RAPAMYCIN_DN | 0.5000 | 1.891 | 0.0020576 | 0.060076 | 0.309 |
| Sixty-seven genes commonly upregulated in cancer relative to normal tissue, from a meta-analysis of the OncoMine gene expression database | CANCER_NEOPLASTIC_META_UP | 0.6634 | 1.883 | 0.0042105 | 0.058265 | 0.335 |
A well studied prostate cancer dataset [42], [43] was analyzed with GSEA method (Using the newest version (v2.0.1) GSEA tool [24]). GSEA compared the tumor samples to the nonmalignant tissues. Only top ranked functional terms were shown in the table (the chromosomal location-based annotation terms were taken out for simplicity, the full ranked list can be obtained from table S7). 1000 permutations were performed. **: p-value(s) is adjusted to p = 1/number of permutation for p = 0 according to GSEA manual.