Justin Andrew Christiaan Powell.
Abstract
BACKGROUND: Despite the widespread use of high-throughput expression platforms and the availability of a desktop implementation of Gene Set Enrichment Analysis (GSEA) that enables non-experts to perform gene set based analyses, the necessary precompiled gene sets are rarely available for species other than human.
Year: 2014 PMID: 24884810 PMCID: PMC4038065 DOI: 10.1186/1471-2105-15-146
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow of GO2MSIG. Input data is shown in filled blue boxes; algorithmic steps are shown in unfilled boxes.
Figure 2. Gene set size distribution. Red bars show the distribution of gene set sizes in the MSigDB c5 collection for sets from 10 to 200 genes in size. Blue bars show the distribution for the equivalent collection generated by GO2MSIG. The 10 to 200 gene size range includes over 89% of the gene sets in each collection.
Sizes of gene set collections built from the NCBI gene2go table
| NCBI taxonomy ID | | | | |
|---|---|---|---|---|
| 234826 | 196 | 48 (40) | | |
| 212042 | 1288 | 218 (55) | 221 (60) | |
| 3702 | 27942 | 2032 (129) | 1951 (85) | |
| 227321 | 7326 | 1152 (69) | 35 (31) | |
| 198094 | 5097 | 465 (81) | 466 (81) | |
| 9913 | 5567 | 2634 (67) | 1285 (58) | |
| 6239 | 12642 | 1505 (84) | 1098 (81) | |
| 195099 | 1826 | 315 (62) | 316 (63) | |
| 246194 | 2609 | 363 (64) | 362 (65) | |
| 227377 | 1798 | 271 (67) | 272 (67) | |
| 214684 | 3427 | 969 (68) | | |
| 7955 | 16957 | 2201 (83) | 1342 (68) | |
| 243164 | 1583 | 265 (72) | 265 (71) | |
| 352472 | 7694 | 1184 (86) | 801 (72) | |
| 7227 | 12560 | 2750 (83) | 2459 (78) | |
| 205920 | 1090 | 221 (56) | 223 (59) | |
| 511145 | 2518 | 198 (112) | | |
| 9031 | 2104 | 1460 (64) | 643 (52) | |
| 243231 | 3269 | 347 (82) | 348 (82) | |
| 9606 | 18106 | 5808 (82) | 4403 (81) | |
| 265669 | 2811 | 384 (79) | 385 (79) | |
| 243233 | 2902 | 377 (72) | 378 (72) | |
| 10090 | 24667 | 5615 (79) | 3643 (74) | |
| 222891 | 928 | 204 (54) | 206 (56) | |
| 39947 | 4266 | 30 (18) | 2 (14) | |
| 36329 | 1770 | 212 (65) | 219 (67) | |
| 223283 | 3950 | 436 (73) | 439 (77) | |
| 10116 | 18599 | 5746 (79) | 3081 (75) | |
| 246200 | 4250 | 497 (85) | 496 (86) | |
| 559292 | 6244 | 2005 (75) | 1849 (74) | |
| 284812 | 5276 | 1627 (82) | 1118 (67) | |
| 211586 | 4272 | 418 (79) | 419 (79) | |
| 999953 | 1073 | 157 (74) | 147 (80) | |
| 9606 | 18106 | | 1422 (69)² | |
| 9606 | 18106 | 5383 (80) | | |
Gene sets were built from the NCBI gene2go annotation table and the GO ontology downloaded on 13 September 2013. Default settings were used, which filter out gene sets containing fewer than 10 or more than 700 genes. Organisms were omitted when their largest collection contained fewer than 30 sets. Where using all evidence codes yields fewer gene sets than using high-quality codes only, this is due to maximum set size filtering.
¹ For comparison, the currently available MSigDB GO-based human collection and a human set built from the annotation file for the Affymetrix HG-U133 Plus 2.0 array are also shown.
² Set number and sizes were calculated for the MSigDB collection with filtering as above (the full collection contains 1454 gene sets).
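
The size filtering described in the table note can be illustrated with a minimal Python sketch. It is not the GO2MSIG implementation: it only groups genes by their directly annotated GO term and ignores the GO ontology structure, which the tool also takes as input. The function names `build_gene_sets` and `filter_by_size` are hypothetical, and the column order (tax_id, GeneID, GO_ID, Evidence as the first four tab-separated fields) is an assumption about the gene2go file layout.

```python
import csv
from collections import defaultdict

MIN_SIZE = 10   # default lower bound stated in the table note
MAX_SIZE = 700  # default upper bound stated in the table note


def build_gene_sets(gene2go_lines, tax_id, evidence_codes=None):
    """Group gene IDs by GO term for one taxon (hypothetical helper).

    gene2go_lines: iterable of tab-separated lines; the first four fields are
    assumed to be tax_id, GeneID, GO_ID and Evidence.
    evidence_codes: optional set of evidence codes to keep; None keeps all.
    """
    sets = defaultdict(set)
    for row in csv.reader(gene2go_lines, delimiter="\t"):
        # Skip blank lines and the '#'-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        row_tax, gene_id, go_id, evidence = row[0], row[1], row[2], row[3]
        if row_tax != str(tax_id):
            continue
        if evidence_codes is not None and evidence not in evidence_codes:
            continue
        sets[go_id].add(gene_id)
    return dict(sets)


def filter_by_size(gene_sets, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Keep only the sets whose gene count lies within [min_size, max_size]."""
    return {go_id: genes for go_id, genes in gene_sets.items()
            if min_size <= len(genes) <= max_size}
```

With a file handle opened on a local copy of the table, `filter_by_size(build_gene_sets(handle, 9606))` would give the size-filtered sets for human under these assumptions. Passing a restricted `evidence_codes` set shrinks individual sets, so a set that exceeded the upper bound with all codes can fall back inside it, which is consistent with the note about maximum set size filtering.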