Edward Y Chen, Christopher M Tan, Yan Kou, Qiaonan Duan, Zichen Wang, Gabriela Vaz Meirelles, Neil R Clark, Avi Ma'ayan.
Abstract
BACKGROUND: System-wide profiling of genes and proteins in mammalian cells produces lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment against existing lists created from prior knowledge and organized into gene-set libraries. While many enrichment analysis tools and gene-set library databases have been developed, there is still room for improvement.
Year: 2013 PMID: 23586463 PMCID: PMC3637064 DOI: 10.1186/1471-2105-14-128
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Enrichr workflow. Enrichr receives lists of human or mouse genes as input. It uses 35 gene-set libraries to compute enrichment. The enrichment results are interactively displayed as bar graphs, tables, grids of terms with the enriched terms highlighted, and networks of enriched terms.
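The core step of this workflow, scoring the overlap between an input gene list and each term of a gene-set library, can be sketched as follows. This is a minimal illustration that uses the one-sided hypergeometric tail, a common formulation of the Fisher exact test for enrichment; it is not Enrichr's own code. The function names, the toy library, and the background size `n_background` are assumptions made for the example.

```python
from math import comb


def hypergeom_tail(k, n_input, n_term, n_background):
    """P(overlap >= k) when n_input genes are drawn at random from a
    background of n_background genes, n_term of which belong to the term."""
    upper = min(n_input, n_term)
    total = comb(n_background, n_input)
    hits = sum(
        comb(n_term, i) * comb(n_background - n_term, n_input - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enrich(input_genes, library, n_background):
    """Return (term, overlap size, p-value) tuples, smallest p-value first."""
    input_set = set(input_genes)
    results = []
    for term, genes in library.items():
        term_set = set(genes)
        overlap = input_set & term_set
        p = hypergeom_tail(len(overlap), len(input_set), len(term_set), n_background)
        results.append((term, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical gene names, background of 10 genes).
toy_library = {"termA": ["G1", "G2", "G3"], "termB": ["G7", "G8", "G9"]}
results = enrich(["G1", "G2", "G3"], toy_library, n_background=10)
for term, overlap, p in results:
    print(term, overlap, round(p, 4))
```

For real gene lists the choice of background (for example, all genes covered by the library versus all annotated genes) changes the p-values, so it should be stated alongside any result.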
List of gene set libraries ranked by number of terms
| Gene-set library | Terms | Gene coverage | Genes per term |
|---|---|---|---|
| Down-regulated CMAP | 6100 | 8695 | 100 |
| Up-regulated CMAP | 6100 | 11251 | 100 |
| HMDB Metabolites | 3906 | 3729 | 47.1495 |
| GeneSigDB | 2139 | 23729 | 126.6947 |
| Human CoR Complexome | 1796 | 10231 | 158.2778 |
| CORUM | 1673 | 2741 | 4.6934 |
| Cancer Cell Line Encyclopedia | 967 | 15797 | 176.2079 |
| GO Biological Process | 941 | 7683 | 78.4676 |
| MSigDB Computational | 858 | 10061 | 106.4207 |
| Genome Browser PWMs | 615 | 13362 | 275.1447 |
| MGI Mammalian Phenotype Top 4 | 476 | 10496 | 201.7101 |
| Kinase Enrichment Analysis KEA | 474 | 4533 | 36.7089 |
| ENCODE TF ChIP-seq | 434 | 19851 | 1064.055 |
| GO Molecular Function | 402 | 8469 | 121.8284 |
| Chromosome Location | 386 | 32740 | 84.8187 |
| PPI Hub Proteins | 385 | 16487 | 247.2286 |
| Histone Modifications ChIP-seq | 356 | 21921 | 1232.129 |
| TRANSFAC/JASPAR PWMs | 335 | 42887 | 1249.63 |
| Pfam InterPro Domains | 311 | 7588 | 35.3408 |
| BioCarta Pathways | 249 | 1295 | 17.6506 |
| ChIP Enrichment Analysis ChEA | 240 | 42574 | 1455.7 |
| microRNA TargetScan | 222 | 7504 | 154.6036 |
| GO Cellular Component | 205 | 7325 | 172.1268 |
| KEGG Pathways | 200 | 4128 | 48.44 |
| WikiPathways | 199 | 2854 | 38.8191 |
| MSigDB Oncogenic Signatures | 189 | 11250 | 165.709 |
| OMIM Expanded | 187 | 2178 | 88.9198 |
| Mouse Gene Atlas | 96 | 20686 | 660.1354 |
| NCI-60 Cancer Cell Lines | 93 | 12232 | 343.3333 |
| OMIM Disease | 90 | 1759 | 25.0667 |
| VirusMINT | 85 | 851 | 14.8824 |
| Human Gene Atlas | 84 | 15381 | 449.7619 |
| SILAC Phosphoproteomics | 84 | 7732 | 341.869 |
| Reactome Pathways | 78 | 3185 | 72.5128 |
| MGI Mammalian Phenotype Top 3 | 71 | 10406 | 717.4366 |
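The three numeric columns above can be read as summary statistics of each library. The sketch below assumes they are the number of terms, the number of distinct genes covered, and the average number of genes per term; that reading is inferred from the values, not stated in this extract. The library layout (a dict mapping term name to a gene list) and the function name are hypothetical.

```python
def library_stats(library):
    """Return (number of terms, distinct genes covered, mean genes per term)."""
    n_terms = len(library)
    if n_terms == 0:
        return 0, 0, 0.0
    # Union of all gene sets gives the gene coverage of the library.
    coverage = len(set().union(*library.values()))
    # Average term size, counting each gene once per term.
    genes_per_term = sum(len(set(genes)) for genes in library.values()) / n_terms
    return n_terms, coverage, genes_per_term


# Toy library with hypothetical gene names.
toy_library = {"t1": ["A", "B", "C"], "t2": ["B", "C", "D", "E"]}
stats = library_stats(toy_library)
print(stats)
```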
Figure 2. Validation of enrichment scoring methods. (a) Histogram of overall appearance of genes in gene sets within all the gene-set libraries implemented in Enrichr, plotted on a log-log scale; (b-c) Random gene lists are used to obtain enrichment analysis rankings using the Fisher exact test. Average ranks with their associated standard deviations are plotted against gene list length for the ChEA gene-set library (b) and the GO Biological Process gene-set library (c); (d-e) Ranks of specific transcription factors in enrichment analyses using the ChEA gene-set library by the various enrichment analysis scoring methods. Lists of differentially expressed genes after knockdown of the transcription factors with entries in the ChEA gene-set library were used as input; (d) Average rank for those factors comparing the three scoring methods; (e) histogram of cumulative ranks for the three methods.
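The random-list experiment described in panels (b-c) can be approximated with a short simulation: draw many random gene lists, rank the terms of a library by Fisher exact test p-value for each list, and record the mean and standard deviation of every term's rank. The sketch below is an illustration under stated assumptions, not the authors' procedure; the toy background, toy library, function names, trial count, and the alphabetical tie-breaking rule are all hypothetical choices.

```python
import random
from math import comb
from statistics import mean, pstdev


def fisher_p(k, n_input, n_term, n_background):
    """One-sided hypergeometric tail P(overlap >= k)."""
    total = comb(n_background, n_input)
    upper = min(n_input, n_term)
    return sum(
        comb(n_term, i) * comb(n_background - n_term, n_input - i)
        for i in range(k, upper + 1)
    ) / total


def rank_terms(input_genes, library, n_background):
    """Return {term: rank}; rank 1 is the smallest p-value."""
    s = set(input_genes)
    pvals = {
        t: fisher_p(len(s & set(g)), len(s), len(set(g)), n_background)
        for t, g in library.items()
    }
    # Ties are broken by term name only to keep the sketch deterministic.
    ordered = sorted(pvals, key=lambda t: (pvals[t], t))
    return {t: i + 1 for i, t in enumerate(ordered)}


def random_rank_baseline(library, background, list_size, n_trials=200, seed=0):
    """Mean and standard deviation of each term's rank over random gene lists."""
    rng = random.Random(seed)
    ranks = {t: [] for t in library}
    for _ in range(n_trials):
        sample = rng.sample(background, list_size)
        for t, r in rank_terms(sample, library, len(background)).items():
            ranks[t].append(r)
    return {t: (mean(r), pstdev(r)) for t, r in ranks.items()}


# Toy setup: 50 hypothetical background genes, one small and one large term.
background = [f"G{i}" for i in range(50)]
toy_library = {"small": background[:3], "large": background[:30]}
baseline = random_rank_baseline(toy_library, background, list_size=10)
for term, (avg, sd) in baseline.items():
    print(term, round(avg, 2), round(sd, 2))
```

Repeating the call for several values of `list_size` gives the kind of rank-versus-list-length curves the caption describes.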
Rank of entries from the ChEA gene-set library using the three scoring methods implemented in Enrichr, given as input lists of up- or down-regulated genes identified from studies that profiled gene expression after knockdown or knockout of the same transcription factors
| Transcription factor | Regulation | PMID | Ranks (scoring method 1) | Ranks (scoring method 2) | Ranks (scoring method 3) |
|---|---|---|---|---|---|
| Nanog | Up | 16518401 | 1, 4, 5, 16, 28, 33, 62, 144 | 2, 4, 15, 18, 22, 28, 33, 116 | 1, 5, 12, 16, 18, 28, 37, 117 |
| Nanog | Down | 16518401 | 5, 11, 14, 16, 39, 58, 78, 92 | 1, 3, 4, 20, 41, 54, 61, 64 | 1, 6, 12, 15, 18, 56, 70, 73 |
| Pou5f1 | Up | 16518401 | 3, 11, 12, 18, 27, 71, 81 | 1, 4, 12, 23, 33, 35, 36 | 1, 8, 14, 15, 21, 50, 54 |
| Pou5f1 | Down | 16518401 | 32, 64, 78, 156, 176, 181, 204 | 1, 65, 92, 121, 160, 165, 188 | 23, 52, 90, 127, 171, 176, 192 |
| Nanog | Up | 16767105 | 3, 7, 12, 18, 38, 46, 56, 113 | 1, 3, 11, 17, 21, 23, 26, 69 | 3, 5, 9, 12, 25, 29, 36, 80 |
| Nanog | Down | 16767105 | 18, 28, 79, 89, 92, 102, 160, 164 | 4, 17, 21, 33, 44, 83, 139, 157 | 23, 25, 35, 48, 60, 86, 142, 186 |
| Pou5f1 | Up | 16767105 | 1, 9, 18, 23, 31, 82, 120, 183 | 2, 5, 10, 20, 30, 34, 79 | 1, 2, 16, 20, 23, 55, 88 |
| Pou5f1 | Down | 16767105 | 25, 44, 124, 166, 167, 180, 216 | 47, 49, 60, 131, 139, 169, 200 | 43, 44, 74, 134, 147, 153, 177 |
| Sox2 | Up | 16767105 | 2, 10, 35, 59, 61, 70, 121 | 11, 15, 26, 36, 68, 71, 103 | 3, 9, 26, 44, 58, 80, 123 |
| Sox2 | Down | 16767105 | 5, 44, 50, 130, 139, 149, 176 | 10, 72, 85, 106, 110, 140, 151 | 1, 61, 82, 108, 116, 166, 177 |
| Sox2 | Up | 17515932 | 2, 14, 15, 41, 50, 61, 82 | 6, 27, 30, 35, 44, 49, 55 | 2, 7, 24, 39, 44, 45, 57 |
| Sox2 | Down | 17515932 | 8, 19, 68, 93, 117, 164, 216 | 6, 29, 73, 95, 124, 146, 210 | 4, 17, 84, 103, 132, 151, 168 |
| Klf4 | Up | 18264089 | 1, 27, 31, 183 | 6, 22, 31, 199 | 1, 23, 31, 210 |
| Klf4 | Down | 18264089 | 61, 71, 163, 200 | 78, 85, 190, 222 | 78, 79, 209, 219 |
| Zfp281 | Up | 18757296 | 3, 24 | 3, 6 | 3, 6 |
| Zfp281 | Down | 18757296 | 60, 159 | 63, 138 | 64, 147 |
| Chd1 | Up | 19587682 | 126 | 106 | 107 |
| Chd1 | Down | 19587682 | 231 | 214 | 125 |
| Tbx3 | Up | 20139965 | 110 | 96 | 96 |
| Tbx3 | Down | 20139965 | 93 | 70 | 76 |
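A per-method average rank, of the kind summarized in Figure 2d, can be computed by pooling the rank lists in each of the three rank columns. The sketch below uses only the Zfp281 and Tbx3 rows from the table above to stay short. Because this extract does not label which column corresponds to which scoring method, the columns are referred to by position (`method_1` to `method_3`), and those names are placeholders.

```python
from statistics import mean

# (factor, regulation) -> rank lists for the three rank columns, in table order.
rows = {
    ("Zfp281", "Up"): ([3, 24], [3, 6], [3, 6]),
    ("Zfp281", "Down"): ([60, 159], [63, 138], [64, 147]),
    ("Tbx3", "Up"): ([110], [96], [96]),
    ("Tbx3", "Down"): ([93], [70], [76]),
}


def average_rank_per_method(rows):
    """Pool all ranks in each column and return their mean, one value per column."""
    pooled = {"method_1": [], "method_2": [], "method_3": []}
    for rank_lists in rows.values():
        for name, ranks in zip(pooled, rank_lists):
            pooled[name].extend(ranks)
    return {name: mean(ranks) for name, ranks in pooled.items()}


averages = average_rank_per_method(rows)
for name, avg in averages.items():
    print(name, round(avg, 2))
```

A lower average means the knocked-down factor's own ChEA entries tend to appear nearer the top of the ranking. Averaging over this small subset is for illustration only and does not reproduce the figure.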
Figure 3. Global view of signatures created using genes that are highly expressed in cancer cell lines and their matching human tissues. Enriched terms are highlighted on each grid based on the level of significance using various gene-set libraries, each represented by a different color. Circles are used to highlight specific clusters of enriched terms.