Kristof Engelen, Qiang Fu, Pieter Meysman, Aminael Sánchez-Rodríguez, Riet De Smet, Karen Lemmens, Ana Carolina Fierro, Kathleen Marchal.
Abstract
BACKGROUND: Microarrays are the main technology for large-scale transcriptional gene expression profiling, but the large bodies of data available in public databases are not useful due to the large heterogeneity. There are several initiatives that attempt to bundle these data into expression compendia, but such resources for bacterial organisms are scarce and limited to integration of experiments from the same platform or to indirect integration of per experiment analysis results. METHODOLOGY/PRINCIPAL
Year: 2011 PMID: 21779320 PMCID: PMC3136457 DOI: 10.1371/journal.pone.0020938
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
An overview of the content of the three expression compendia that can be accessed through COLOMBOS.
| 4295 | 4105 | 4525 |
| 1429 | 259 | 717 |
| GEO, AE | GEO | GEO |
| 1483 | 265 | 723 |
| 84 | 9 | 25 |
| 35 | 13 | 9 |
| 6.1% | 6.40% | 3.90% |
| 242 | 67 | 77 |
| 56 | 24 | 23 |
| | | |
| EcoCyc | BioCyc | BioCyc |
| RegulonDB | DBTBS | |
| EcoCyc | BioCyc | BioCyc |
| UniProt GOA | UniProt GOA | UniProt GOA |
Finding potential novel Fur targets: a case study.
| Locus tag | Name | Description | Operon | Known | COLOMBOS | Meta-analysis | Evidence |
| b1681 |
| SufBCD Fe-S cluster scaffold |
| + | + | Fur, OxyR, IHF, IscR | |
| b1683 |
| SufBCD Fe-S cluster scaffold |
| + | + | Fur, OxyR, IHF, IscR | |
| b2392 |
| Manganese transport protein |
| + | + | + | Fur, MntR |
| b2673 |
| Glutaredoxin-like protein |
| + | + | + | Fur, NrdR |
| b2674 |
| Not annotated |
| + | + | + | Fur, NrdR |
| b2675 |
| Ribonucleoside-Pi reductase 2 α |
| + | + | + | Fur, NrdR |
| b2676 |
| Ribonucleoside-Pi reductase 2 β |
| + | + | + | Fur, NrdR |
| b4291 |
| Fe3+ dicitrate transport protein |
| + | + | Fur, CRP, PdhR | |
| b0468 |
| Inner membrane protein |
| + | Predicted | ||
| b0804 |
| PKHD-type hydroxylase |
| + | Predicted; Fur dependent expression | ||
| b1018 |
| UPF0409 protein |
| + | Predicted; functional in related strain | ||
| b1452 |
| Uncharacterized protein |
| + | + | Fur dependent expression | |
| b1494 |
| Probable zinc protease |
| + | Potential operon | ||
| b1495 |
| Uncharacterized protein |
| + | Predicted | ||
| b1705 |
| Not annotated |
| + | Predicted; Fur dependent expression | ||
| b2211 |
| ATP-binding ABC transporter |
| + | |||
| b3070 |
| Uncharacterized protein |
| + | + | Predicted | |
| b3337 |
| Bacterioferritin-associated ferredoxin |
| + | Indirect RyhB regulation | ||
| b3410 |
| Ferrous iron transport protein C |
| + | TU | ||
| b4366 |
| Transcriptional activator protein |
| + |
Conceptual comparison of COLOMBOS with similar initiatives.
| COLOMBOS | M3D | GXA | GeneVestigator | |
| | | | |
| Cross-platform compendia | Single platform compendia (Affymetrix) | Experiment centered (ArrayExpress meta-analysis) | Single platform compendia (Affymetrix) |
| Prokaryotes (3) | Prokaryotes (2) and a eukaryote | Eukaryotes (10) | Eukaryotes (9) and a prokaryote |
| Incorporation of multiple species-specific DBs | Referral to BioCyc, SGD | EBI | None |
| Microarray annotation and condition ontology | Microarray annotation | Microarray annotation and condition ontology | Microarray annotation |
| Interactive visualization, expression analysis | Visualization, expression analysis | Interactive visualization, expression analysis | Interactive visualization, expression analysis |
| | | | |
| Multiple queries | Single query | Single query | Single query (limited) |
| Gene IDs; functional or structural characteristics | Gene IDs | Gene/protein IDs | Gene IDs |
| Experiment, annotation, or ontology | Experiment, annotation | Experiment, annotation, or ontology | Annotation |
| Analysis results and/or entire compendia | Analysis results and/or entire compendia | Only experiments indirectly (through ArrayExpress) | Analysis results (limited) |
Compendium: a data matrix (genes in rows, microarrays in columns) combining expression measurements from different experiments. An experiment is a set of microarrays submitted to the public DBs as such, implying that they were performed by the same lab and on the same technological platform.
Single- vs. cross-platform: combining data from the same technological platform is relatively easy, as the same preprocessing methodology can be employed; COLOMBOS is unique in combining data from different platforms using a specialized homogenization pipeline.
Meta-analysis: expression data are not combined directly; instead, experiments are analyzed separately, after which the results are compared.
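The compendium definition above can be illustrated with a minimal Python sketch. It only shows the data layout (genes in rows, microarrays in columns, several experiments side by side) and does not attempt the homogenization pipeline, which is not described here. The function name `build_compendium`, the gene and array identifiers, and all values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy input: experiment -> array -> {gene: expression value}.
# All identifiers and numbers are invented for illustration only.
experiments: Dict[str, Dict[str, Dict[str, float]]] = {
    "exp_A": {
        "A_array1": {"geneX": 1.2, "geneY": -0.4},
        "A_array2": {"geneX": 0.9, "geneY": -0.1},
    },
    "exp_B": {
        "B_array1": {"geneY": 0.3, "geneZ": 2.0},
    },
}


def build_compendium(
    exps: Dict[str, Dict[str, Dict[str, float]]]
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Return (genes, arrays, matrix) with genes in rows and arrays in columns."""
    # Columns: every array of every experiment, in insertion order.
    arrays = [array_id for exp in exps.values() for array_id in exp]
    # Rows: the union of all genes measured on any array.
    genes = sorted({g for exp in exps.values() for arr in exp.values() for g in arr})
    lookup = {array_id: values for exp in exps.values() for array_id, values in exp.items()}
    # A gene not measured on an array is stored as None (missing value).
    matrix = [[lookup[a].get(g) for a in arrays] for g in genes]
    return genes, arrays, matrix


genes, arrays, matrix = build_compendium(experiments)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

In a meta-analysis setting, by contrast, each experiment would be analyzed on its own and only the per-experiment results would be compared, so no such joint matrix would be built.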
The biological conditions measured on a microarray are described with a set of formal terms, which are organized into a higher-level ontology. Such an ontology facilitates querying for related experiments or conditions.
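How an ontology helps retrieve related conditions can be sketched as follows. This is a toy illustration, not the COLOMBOS condition ontology: the terms, the hierarchy in `parent_of`, the array annotations, and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical term hierarchy (child -> parent); not the actual COLOMBOS ontology.
parent_of: Dict[str, str] = {
    "iron limitation": "metal availability",
    "zinc limitation": "metal availability",
    "metal availability": "medium composition",
    "heat shock": "temperature",
}

# Hypothetical condition annotations per microarray.
annotations: Dict[str, Set[str]] = {
    "array1": {"iron limitation"},
    "array2": {"zinc limitation", "heat shock"},
    "array3": {"heat shock"},
}


def ancestors(term: str) -> Set[str]:
    """Collect all higher-level terms above `term` in the hierarchy."""
    found: Set[str] = set()
    while term in parent_of:
        term = parent_of[term]
        found.add(term)
    return found


def arrays_for(term: str) -> List[str]:
    """Arrays annotated with `term` itself or with any more specific term below it."""
    return sorted(
        array_id
        for array_id, terms in annotations.items()
        if any(t == term or term in ancestors(t) for t in terms)
    )


print(arrays_for("metal availability"))
print(arrays_for("temperature"))
```

Querying a higher-level term returns arrays annotated with any of its more specific terms, which is the sense in which the hierarchy groups related conditions.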
Single versus multiple queries: query results can be retained in the COLOMBOS user workspace, where they can be organized and structured into larger ‘analysis projects’. This allows for integrative across-query analysis in which relations between single query results can be explored, e.g. by combining or differentiating single query results.
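Combining or differentiating query results can be pictured as set operations on the genes each query returned. The sketch below assumes that reading; it is not the COLOMBOS implementation. The locus tags are taken from the Fur case-study table, but their assignment to two queries is hypothetical, as are the function names.

```python
from typing import Set

# Hypothetical results of two separate queries, each a set of locus tags.
query_a: Set[str] = {"b1681", "b1683", "b2392", "b4291"}
query_b: Set[str] = {"b2392", "b4291", "b3337"}


def combine(a: Set[str], b: Set[str]) -> Set[str]:
    """Genes returned by either query (set union)."""
    return a | b


def differentiate(a: Set[str], b: Set[str]) -> Set[str]:
    """Genes returned by the first query but not the second (set difference)."""
    return a - b


def shared(a: Set[str], b: Set[str]) -> Set[str]:
    """Genes returned by both queries (set intersection)."""
    return a & b


print(sorted(combine(query_a, query_b)))
print(sorted(differentiate(query_a, query_b)))
print(sorted(shared(query_a, query_b)))
```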
Figure 1. Screenshots of COLOMBOS data analysis components.
The bottom part shows the two main panels of the data analysis page. The left-hand workspace panel is always visible and contains an overview of the modules and the main analysis controls. The content of the right-hand data analysis panel depends on the actions of the user. In this case it shows the overview page for a module selected in the workspace. This overview page not only provides some general information on the selected module, but also serves as a guide for further examination and analysis steps. These are illustrated in the top part of the figure and include visualization, content editing (demonstrated is the removal of genes based on expression profile similarity), splitting the module based on expression values (shown here in the gene direction), and exploration of gene and contrast information.
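Removing genes from a module based on expression profile similarity could look like the sketch below. The caption does not say which similarity measure or cut-off COLOMBOS uses, so the choice of Pearson correlation against the module's mean profile, the `min_corr` threshold, and all gene names and values are assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def filter_module(module: Dict[str, List[float]], min_corr: float = 0.5) -> Dict[str, List[float]]:
    """Keep genes whose profile correlates with the module mean profile.

    `min_corr` is a hypothetical threshold chosen for this example.
    """
    profiles = list(module.values())
    n_cols = len(profiles[0])
    mean_profile = [sum(p[j] for p in profiles) / len(profiles) for j in range(n_cols)]
    return {g: p for g, p in module.items() if pearson(p, mean_profile) >= min_corr}


# Hypothetical module: genes in rows, contrasts in columns.
module = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [1.1, 2.1, 2.9, 4.2],
    "gene3": [4.0, 3.0, 2.0, 1.0],  # opposite trend to the other two genes
}
kept = filter_module(module)
print(sorted(kept))
```

Here `gene3` runs opposite to the module's average trend and is dropped, while the two similar profiles are kept.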