Nathan L Tintle, Alexandra Sitarik, Benjamin Boerema, Kylie Young, Aaron A Best, Matthew Dejongh.
Abstract
BACKGROUND: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed.
Year: 2012 PMID: 22873695 PMCID: PMC3462729 DOI: 10.1186/1471-2105-13-193
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Characteristics of the set of microarrays
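| Genes | Arrays | Source | GO BP | GO CC | GO MF | KEGG | MO | SEED SS | SEED Scenario | SEED Path | Total | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|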
| 7989 | 55 | GEO | 799 | 117 | 819 | 113 | 1626 | 200 | 139 | 158 | 3971 | |
| | | | | | | | | | | | | |
| 4778 | 41 | GEO | 695 | 84 | 615 | 79 | 1008 | 193 | 93 | 98 | 2865 | |
| | | | | | | | | | | | | |
| 1849 | 104 | GEO | 651 | 106 | 564 | 89 | 295 | 257 | 72 | 90 | 2124 | |
| | | | | | | | | | | | | |
| 2215 | 407 | GEO | 604 | 79 | 499 | 83 | 441 | 180 | 86 | 92 | 2064 | |
| | | | | | | | | | | | | |
| 2750 | 852 | PD | 697 | 86 | 604 | 90 | 521 | 360 | 98 | 118 | 2574 | |
| 1880 | 78 | GEO | 571 | 65 | 477 | 79 | 257 | 238 | 64 | 70 | 1830 | |
| 1849 | 89 | GEO | 556 | 70 | 488 | 81 | 248 | 256 | 57 | 73 | 1829 | |
| | | | | | | | | | | | | |
| 731 | 43 | GEO | 332 | 46 | 252 | 42 | 80 | 80 | 14 | 14 | 860 | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| 8198 | 195 | GEO | 457 | 111 | 547 | 103 | 1672 | 421 | 138 | 156 | 3605 | |
| 4084 | 119 | GEO | 767 | 150 | 724 | 103 | 757 | 367 | 110 | 128 | 3106 | |
| 1242 | 100 | GEO | 445 | 64 | 367 | 59 | 251 | 135 | 14 | 15 | 1350 | |
| | | | | | | | | | | | | |
| 1521 | 56 | GEO | 572 | 76 | 426 | 77 | 248 | 174 | 42 | 42 | 1657 | |
| | | | | | | | | | | | | |
| 4329 | 907 | M3D | 847 | 115 | 788 | 98 | 752 | 399 | 160 | 169 | 3328 | |
| 2001 | 72 | GEO | 653 | 77 | 673 | 81 | 365 | 328 | 88 | 91 | 2356 | |
| 5598 | 176 | GEO | 823 | 133 | 835 | 104 | 1042 | 389 | 136 | 154 | 3616 | |
| 4050 | 245 | M3D | 746 | 112 | 641 | 95 | 705 | 328 | 109 | 125 | 2861 | |
| 4515 | 42 | GEO | 719 | 109 | 602 | 93 | 920 | 418 | 144 | 165 | 3170 | |
| | 3581 | | 10934 | 1600 | 9921 | 1469 | 11188 | 4723 | 1564 | 1767 | 43166 | |
1 For which array data is available.
2 M3D = Many Microbe Microarrays Database [18], PD = The Paul Dunman Laboratory, GEO = Gene Expression Omnibus [17].
3 Gene Ontology [13], BP = Biological Process, CC = Cellular Component, MF = Molecular Function, KEGG = Kyoto Encyclopedia of Genes and Genomes [14], MO = Microbes Online Predicted Operons using the method of Price et al. [16,21], SEED [15,22], SS = SEED subsystems, Scenario = SEED scenario, Path = SEED path.
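The column layout of the table above can be checked arithmetically. The following is a minimal sketch, assuming that the eight counts preceding the last value in each row are the per-source gene set counts in the order given in footnote 3, and that the last value is their total. The three rows are copied from the table; the variable and function names are illustrative only.

```python
# Per-source gene set counts followed by the reported total, copied from
# three rows of the table. Assumed column order (from footnote 3):
# GO BP, GO CC, GO MF, KEGG, MO, SEED SS, SEED Scenario, SEED Path, Total.
ROWS = [
    (799, 117, 819, 113, 1626, 200, 139, 158, 3971),
    (695, 84, 615, 79, 1008, 193, 93, 98, 2865),
    (847, 115, 788, 98, 752, 399, 160, 169, 3328),
]


def total_matches(row):
    """Return True when the last entry equals the sum of the eight counts."""
    *counts, reported_total = row
    return sum(counts) == reported_total


checks = [total_matches(row) for row in ROWS]
print(checks)
```

Each of the three rows passes the check, which supports reading the final column as the number of gene sets summed over all eight sources.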
Example of overlap among gene sets related to arginine biosynthesis
| fig|83333.1.peg.269 | Ornithine carbamoyltransferase (EC 2.1.3.3) | + | M | M | + | + | None |
| fig|83333.1.peg.2440 | N-succinyl-L,L-diaminopimelate desuccinylase (EC 3.5.1.18) | + | + | M | M | M | with peg.2339 |
| fig|83333.1.peg.2771 | N-acetylglutamate synthase (EC 2.3.1.1) | + | + | M | + | + | None |
| fig|83333.1.peg.3116 | Argininosuccinate synthase (EC 6.3.4.5) | + | + | + | + | + | None |
| fig|83333.1.peg.3181 | Arginine pathway regulatory protein ArgR, repressor of arg regulon | + | M | M | M | M | None |
| fig|83333.1.peg.3294 | Acetylornithine and N-succinyl-L,L-diaminopimelate aminotransferase (EC 2.6.1.11 and EC 2.6.1.17) | + | M | M | + | + | None |
| fig|83333.1.peg.3877 | Acetylornithine deacetylase (EC 3.5.1.16) | + | + | M | + | + | None |
| fig|83333.1.peg.3878 | N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) | + | + | M | + | + | with peg.3879 and peg.3880 |
| fig|83333.1.peg.3879 | Acetylglutamate kinase (EC 2.7.2.8) | + | + | M | + | + | with peg.3878 and peg.3880 |
| fig|83333.1.peg.3880 | Argininosuccinate lyase (EC 4.3.2.1) | + | + | + | + | + | with peg.3878 and peg.3879 |
| fig|83333.1.peg.4164 | Ornithine carbamoyltransferase (EC 2.1.3.3) | + | M | M | + | + | None |
| Number of other genes in the set1 | 0 | 2 | 28 | 34 | 0 |
1 Genes in the set that do not appear in this table.
+ means present in the set.
M means Missing (not present in the set).
None means that the gene is not present in any predicted operon.
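The overlap shown in the table can be quantified with a simple set comparison. The sketch below is one possible way to do this, using the Jaccard index on the presence (+) and missing (M) flags of the eleven listed genes. The column labels did not survive in this text, so `set_1` to `set_5` are placeholder names, and the "other genes" in each set are not included because their identities are not given.

```python
from itertools import combinations

# Presence (+) / missing (M) flags for the five gene set columns, copied
# row by row from the table. Keys are shortened gene identifiers.
ROWS = {
    "peg.269": "+MM++",
    "peg.2440": "++MMM",
    "peg.2771": "++M++",
    "peg.3116": "+++++",
    "peg.3181": "+MMMM",
    "peg.3294": "+MM++",
    "peg.3877": "++M++",
    "peg.3878": "++M++",
    "peg.3879": "++M++",
    "peg.3880": "+++++",
    "peg.4164": "+MM++",
}

# Placeholder labels: the real column headers are not available here.
SET_NAMES = [f"set_{i}" for i in range(1, 6)]


def build_sets(rows):
    """Turn the flag strings into one set of gene identifiers per column."""
    sets = {name: set() for name in SET_NAMES}
    for gene, flags in rows.items():
        for name, flag in zip(SET_NAMES, flags):
            if flag == "+":
                sets[name].add(gene)
    return sets


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


gene_sets = build_sets(ROWS)
overlap = {
    (x, y): jaccard(gene_sets[x], gene_sets[y])
    for x, y in combinations(SET_NAMES, 2)
}
for pair, value in overlap.items():
    print(pair, round(value, 2))
```

Because the unlisted genes are ignored, these values describe overlap among the listed arginine biosynthesis genes only, not overlap between the complete sets.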
Median (and maximum) of set sizes by organism and source
| GO BP | GO CC | GO MF | KEGG | MO | SEED SS | SEED Scenario | SEED Path | |
|---|---|---|---|---|---|---|---|---|
| 8 (3661) | 5 (1574) | 5 (4487) | 15 (192) | 3 (42) | 7 (57) | 5 (28) | 5 (28) | |
| | | | | | | | | |
| 7 (2141) | 7.5 (1037) | 5 (2580) | 13 (53) | 3 (32) | 5 (74) | 4 (17) | 4 (17) | |
| | | | | | | | | |
| 7 (1268) | 11 (580) | 4 (1498) | 10 (66) | 3 (18) | 6 (47) | 4 (13) | 5 (13) | |
| | | | | | | | | |
| 6 (1039) | 6 (468) | 4 (1242) | 10 (63) | 2 (32) | 5 (39) | 3 (15) | 3 (15) | |
| | | | | | | | | |
| 8 (1413) | 4.5 (711) | 5 (1596) | 11 (107) | 2 (23) | 5 (35) | 4 (12) | 4 (12) | |
| 6 (1064) | 16 (554) | 5 (1229) | 8 (97) | 4 (53) | 5 (32) | 3 (10) | 4 (10) | |
| 6 (965) | 11.5 (503) | 5 (1138) | 9 (66) | 4 (46) | 5 (34) | 4 (10) | 4 (10) | |
| | | | | | | | | |
| 5 (350) | 9 (196) | 5 (407) | 5 (52) | 5 (66) | 4 (32) | 4.5 (9) | 4.5 (9) | |
| | | | | | | | | |
| | | | | | | | | |
| 7 (1578) | 3 (703) | 5 (2184) | 24 (354) | 3 (36) | 8 (66) | 4 (37) | 4 (37) | |
| 7 (2076) | 4 (1054) | 4 (2480) | 15 (202) | 3 (52) | 6 (58) | 3 (14) | 3 (14) | |
| 7 (549) | 10 (315) | 4 (622) | 6 (53) | 3 (29) | 4 (32) | 3.5 (13) | 4 (13) | |
| | | | | | | | | |
| 6 (815) | 11.5 (418) | 4.5 (932) | 10 (52) | 4 (29) | 5 (37) | 3 (9) | 3 (9) | |
| | | | | | | | | |
| 7 (2370) | 4 (1308) | 4 (2665) | 14 (180) | 2 (28) | 6 (63) | 3.5 (18) | 3 (18) | |
| 7 (1207) | 10 (690) | 4 (1412) | 11 (124) | 4 (33) | 5 (33) | 3 (12) | 3 (12) | |
| 8 (2981) | 4 (1626) | 4 (3511) | 17 (200) | 2 (31) | 7 (58) | 4 (26) | 4 (26) | |
| 7 (1959) | 5.5 (1024) | 5 (2237) | 14 (105) | 2 (28) | 7 (60) | 4 (15) | 4 (14) | |
| 7 (1618) | 5 (896) | 4 (1771) | 12 (87) | 3 (46) | 6 (83) | 4 (15) | 4 (15) | |
| 7 (3661) | 6.5 (1626) | 4 (4487) | 12 (354) | 3 (66) | 6 (83) | 4 (37) | 4 (37) | |
1 The minimum set size in all cases was 2.
2 Gene Ontology [13], BP = Biological Process, CC = Cellular Component, MF = Molecular Function, KEGG = Kyoto Encyclopedia of Genes and Genomes [14], MO = Microbes Online Predicted Operons using the method of Price et al. [16,21], SEED [15,22], SS = SEED subsystems, Scenario = SEED scenario, Path = SEED path.
Pearson correlations between consistency metrics
| 1.0 | | | | | | | |
| 0.95 | 1.0 | | | | | | |
| 0.38 | 0.40 | 1.0 | | | | | |
| 0.36 | 0.39 | 0.99 | 1.0 | | | | |
| -0.15 | -0.21 | -0.30 | -0.28 | 1.0 | | | |
| -0.15 | -0.21 | -0.29 | -0.28 | 0.98 | 1.0 | | |
| -0.30 | -0.42 | -0.43 | -0.43 | 0.64 | 0.62 | 1.0 | |
Mean levels of consistency metrics by source (rank out of the 8 sources in parentheses)
| Gene Ontology | BP | 0.10 (7) | 1.23 (5) | 0.43 (6) | 0.37 (7) |
| Gene Ontology | CC | 0.10 (5) | 1.26 (7) | 0.50 (3) | 0.42 (4) |
| Gene Ontology | MF | 0.10 (6) | 1.24 (6) | 0.42 (8) | 0.40 (6) |
| KEGG | | 0.10 (8) | 1.28 (8) | 0.43 (7) | 0.31 (8) |
| MO: Predicted Operons | | 0.06 (1) | 0.92 (1) | 0.57 (1) | 0.56 (1) |
| SEED | SS | 0.09 (4) | 1.19 (4) | 0.47 (5) | 0.41 (5) |
| SEED | Scenarios | 0.08 (3) | 1.06 (3) | 0.49 (4) | 0.49 (2) |
| SEED | Paths | 0.08 (2) | 1.05 (2) | 0.50 (2) | 0.48 (3) |
a. Smaller values mean more consistent sets, because there is less variability within the set.
b. Larger values mean more consistent sets, because the sets contain genes with higher correlations.
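The ranks in parentheses follow directly from the mean values once the direction of each metric is known. As a minimal sketch, the code below reproduces the ranks of the last metric column, which footnote b marks as "larger is more consistent". The source labels and the helper name `rank_sources` are illustrative, and ties are not handled; the first two columns contain values that are equal at two decimals, so their published ranks were presumably computed from unrounded means.

```python
# Mean values from the last metric column of the table above
# (larger values mean more consistent sets, per footnote b).
LAST_COLUMN = {
    "GO BP": 0.37,
    "GO CC": 0.42,
    "GO MF": 0.40,
    "KEGG": 0.31,
    "MO": 0.56,
    "SEED SS": 0.41,
    "SEED Scenarios": 0.49,
    "SEED Paths": 0.48,
}


def rank_sources(values, larger_is_better):
    """Rank sources from most consistent (1) to least consistent.

    Assumes all values are distinct; tied values would receive
    arbitrary consecutive ranks.
    """
    ordered = sorted(values, key=values.get, reverse=larger_is_better)
    return {name: position + 1 for position, name in enumerate(ordered)}


ranks = rank_sources(LAST_COLUMN, larger_is_better=True)
for name, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, name)
```

For a metric where smaller values are better (footnote a), the same function would be called with `larger_is_better=False`.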
Figure 1. Gene set consistency by gene set size across the eight gene set sources. a. Assessing gene set consistency using s; smaller values of s indicate more consistent sources. b. Assessing gene set consistency using s; smaller values of s indicate more consistent sources. c. Assessing gene set consistency using corr; larger values of corr indicate more consistent sources. d. Assessing gene set consistency using PC1; larger values of PC1 indicate more consistent sources.
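The exact definitions of s, corr, and PC1 are not given in this text, so the following is only a sketch under stated assumptions, not the authors' implementation. It takes s to be the standard deviation of expression across the genes of a set within each array, averaged over arrays; corr to be the mean pairwise Pearson correlation between genes in the set; and PC1 to be the fraction of variance captured by the first principal component of the gene correlation matrix. The function name and input layout are hypothetical.

```python
import numpy as np


def consistency_metrics(expr):
    """Compute three illustrative consistency metrics for one gene set.

    expr: 2-D array, rows = genes in the set, columns = arrays.
    The metric definitions here are assumptions (see the lead-in).
    Genes with constant expression would produce NaN correlations.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]

    # Assumed "s": spread across genes within each array, averaged over arrays.
    s = expr.std(axis=0, ddof=1).mean()

    # Assumed "corr": mean of the off-diagonal gene-gene Pearson correlations.
    r = np.corrcoef(expr)
    upper = np.triu_indices(n_genes, k=1)
    corr = r[upper].mean()

    # Assumed "PC1": largest eigenvalue of the correlation matrix as a
    # fraction of the eigenvalue sum (eigvalsh returns ascending order).
    eigenvalues = np.linalg.eigvalsh(r)
    pc1 = eigenvalues[-1] / eigenvalues.sum()

    return {"s": s, "corr": corr, "PC1": pc1}


# Three genes whose expression rises and falls together across four arrays.
coherent = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 2.0, 3.0],
])
coherent_metrics = consistency_metrics(coherent)

# Five unrelated genes across fifty arrays (synthetic, for contrast only).
rng = np.random.default_rng(0)
random_metrics = consistency_metrics(rng.normal(size=(5, 50)))
print(coherent_metrics, random_metrics)
```

Under these assumed definitions, a perfectly co-expressed set gives corr and PC1 equal to 1 (up to floating-point error), while a set of unrelated genes gives lower values for both, which matches the direction stated in the caption.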
Summary of optimal set sources by statistical method
| Statistical method | Optimal gene set sources |
|---|---|
| Rank ordered differential expression; Gene set analysis on differential expression | Predicted Operons, Paths, Scenarios |
| Rank ordered absolute expression; On/off calling algorithms; Flux-balance analysis; Gene set analysis on absolute expression | Predicted Operons, Paths, Scenarios |
| Correlation of pairs of genes; K-means clustering; Regulatory Network Inference; Operon prediction | Predicted Operons, Gene Ontology: Cellular Component Hierarchy, Paths, Scenarios |