Basel Abu-Jamous, Rui Fa, David J Roberts, Asoke K Nandi.
Abstract
BACKGROUND: The scale and complexity of genomic data lend themselves to analysis using sophisticated mathematical techniques to yield information that can generate new hypotheses and so guide further experimental investigations. An ensemble clustering method has the ability to perform consensus clustering over the same set of genes from different microarray datasets by combining results from different clustering methods into a single consensus result.
Year: 2014 PMID: 25267386 PMCID: PMC4262117 DOI: 10.1186/1471-2105-15-322
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The pipeline of steps in the Bi-CoPaM method.
Budding yeast microarray datasets
| ID | GEO accession | Year | N | Description | Ref. |
|---|---|---|---|---|---|
| D01 | GSE8799 | 2008 | 15 | Two mitotic cell-cycles (w/t). | [ |
| D02 | GSE8799 | 2008 | 15 | Two mitotic cell-cycles (mutated cyclins). | [ |
| D03 | E-MTAB-643* | 2011 | 15 | Response to an impulse of glucose. | [ |
| D04 | E-MTAB-643* | 2011 | 15 | Response to an impulse of ammonium. | [ |
| D05 | GSE54951 | 2014 | 6 | Response of | - |
| D06 | GSE25002 | 2014 | 9 | Osmotic stress response and treatment of transformants expressing the | - |
| D07 | GSE36298 | 2013 | 6 | Mutations of OPI1, INO2, and INO4 under carbon-limited growth conditions. | [ |
| D08 | GSE50728 | 2013 | 8 | 120-hour time-course during fermentation. | - |
| D09 | GSE36599 | 2013 | 5 | Stress adaptation and recovery. | [ |
| D10 | GSE47712 | 2013 | 6 | Combinations of the yeast mediator complex’s tail subunits mutations. | [ |
| D11 | GSE21870 | 2013 | 4 | Combinations of mutations in DNUP60 and DADA2. | - |
| D12 | GSE38848 | 2013 | 6 | Various strains under aerobic or anaerobic growth. | [ |
| D13 | GSE36954 | 2012 | 6 | Response to mycotoxic type B trichothecenes. | [ |
| D14 | GSE33276 | 2012 | 6 | Response to heat stress for three different strains. | - |
| D15 | GSE40399 | 2012 | 7 | Response to various perturbations (heat, myriocin treatment, and lipid supplement). | - |
| D16 | GSE31176 | 2012 | 6 | W/t, | [ |
| D17 | GSE26923 | 2012 | 5 | Varying levels of GCN5 F221A mutant expression. | [ |
| D18 | GSE30054 | 2012 | 31 | CEN.PK122 oscillating for two hours. | - |
| D19 | GSE30051 | 2012 | 32 | CEN.PK113-7D oscillating for two hours. | [ |
| D20 | GSE30052 | 2012 | 49 | CEN.PK113-7D oscillating for four hours. | [ |
| D21 | GSE32974 | 2012 | 15 | About 5 hours of cell-cycle (w/t). | [ |
| D22 | GSE32974 | 2012 | 15 | About 4 hours of cell-cycle (mutant lacking Cdk1 activity). | [ |
| D23 | GSE24888 | 2011 | 5 | Untreated yeast versus yeasts treated with | - |
| D24 | GSE19302 | 2011 | 6 | Response to degron induction for w/t and nab2-td mutant. | [ |
| D25 | GSE33427 | 2011 | 5 | Untreated w/t, and w/t, | [ |
| D26 | GSE17716 | 2011 | 7 | Effect of overexpression and deletion of MSS11 and FLO8. | [ |
| D27 | GSE31366 | 2011 | 4 | Presence and absence of multi-inhibitors for parental and tolerant strains. | - |
| D28 | GSE26171 | 2011 | 4 | Response to patulin and/or ascorbic acid. | [ |
| D29 | GSE22270 | 2011 | 4 | PY1 and Met30 strains at room temperature or 35 °C. | - |
| D30 | GSE29273 | 2011 | 4 | Time-series during yeast second fermentation. | - |
| D31 | GSE29353 | 2011 | 5 | Different haploid strains growing in low glucose medium. | [ |
| D32 | GSE21571 | 2011 | 8 | Different combinations of mutations in HTZ1, SWR1, SWC2, and SWC5. | [ |
| D33 | GSE17364 | 2010 | 4 | Untreated w/t and Slt2-deficient yeasts, or treated with sodium arsenate for two hours. | [ |
| D34 | GSE15352 | 2010 | 8 | 24-hour time-course of yeast grown at a low temperature (10 °C). | [ |
| D35 | GSE15352 | 2010 | 8 | 24-hour time-course of yeast grown at a normal temperature (28 °C). | [ |
| D36 | GSE15352 | 2010 | 8 | 24-hour time-course of yeast grown at a high temperature (37 °C). | [ |
| D37 | GSE16799 | 2009 | 21 | UV-C irradiation of w/t, | [ |
| D38 | GSE16346 | 2009 | 4 | BY474 cells grown to mid-log under presence versus absence of L-carnitine and/or H2O2. | - |
| D39 | GSE14227 | 2009 | 10 | Two hours of wild-type yeast growth. | [ |
| D40 | GSE14227 | 2009 | 9 | Two hours of | [ |
The first column shows the unique identifier used hereinafter to refer to each of these datasets. The second to the sixth columns respectively show the Gene Expression Omnibus (GEO) accession number, the year in which the dataset was published, the number of time-points or conditions after replicate summarisation (N), the dataset description, and the reference.
*D03 and D04 have accession numbers in the European Bioinformatics Institute (EBI) repository rather than GEO accession numbers.
Numbers of genes included in each of the 16 clusters at all of the considered δ values
| Tightness | δ | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Complementary | 0.0 | 1085 | 1457 | 610 | 655 | 592 | 268 | 303 | 175 | 175 | 154 | 143 | 92 | 51 | 49 | 29 | 10 |
| | 0.1 | 516 | 394 | 84 | 105 | 79 | 12 | 9 | 3 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| | 0.2 | 344 | 47 | 17 | 14 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.3 | 257 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.4 | 164 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.5 | 79 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.6 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 0.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Tightest | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
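The table above shows every cluster shrinking as δ rises from 0.0 to 1.0. As a rough illustration of why that can happen, the sketch below assumes that DTB is a difference-threshold binarisation: a gene stays in its best-scoring cluster only if that cluster's fuzzy consensus membership exceeds the runner-up's by at least δ, and is otherwise left unassigned. That rule, the non-strict comparison, the function name `dtb_assign`, the gene labels, and the membership values are all assumptions made for illustration; none of them is stated on this page.

```python
def dtb_assign(memberships, delta):
    """Assign each gene to at most one cluster under an assumed DTB rule.

    memberships: dict mapping a gene label to a list of fuzzy membership
                 values, one per cluster (hypothetical input format).
    delta:       required margin between the best and second-best membership.
    Returns a dict mapping each gene to a cluster index, or None if unassigned.
    """
    assignment = {}
    for gene, row in memberships.items():
        # Rank cluster indices by membership, highest first (ties keep order).
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best = ranked[0]
        runner_up = row[ranked[1]] if len(row) > 1 else 0.0
        margin = row[best] - runner_up
        # Assumed rule: keep the gene only if the margin reaches delta.
        assignment[gene] = best if margin >= delta else None
    return assignment


# Hypothetical memberships for three genes over two clusters.
memberships = {
    "geneA": [1.0, 0.0],    # unanimous
    "geneB": [0.75, 0.25],  # clear majority
    "geneC": [0.5, 0.5],    # evenly split
}

for delta in (0.0, 0.5, 1.0):
    print(delta, dtb_assign(memberships, delta))
```

Under this assumed rule, δ = 0.0 keeps every gene in some cluster and δ = 1.0 keeps only genes with unanimous membership, which is consistent with the labels "Complementary" and "Tightest" in the table.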
Figure 2. Average MSE values and the number of genes included in the tightest six clusters at all of the adopted δ values. (A) Average MSE values and (B) number of genes included.
Figure 3. Average expression profiles for the clusters C1 and C2 at DTB with the respective δ values of 0.3 and 0.2, based on all of the forty datasets. Each column of plots represents a cluster and each row represents a dataset.
Figure 4. Upstream sequence analysis for the cluster C1. (A), (B), and (C) show the motifs C1-1, C1-2, and C1-3 respectively and their highly matched known transcription factors’ binding sites. (D) is a Venn diagram that shows the numbers of genes’ upstream sequences in C1 that contain each of these three motifs.
Figure 5. Upstream sequence analysis for the cluster C2. (A) and (B) show the motifs C2-1 and C2-2 respectively and their highly matched known transcription factors’ binding sites. (C) is a Venn diagram that shows the numbers of genes’ upstream sequences in C2 that contain each of these two motifs.
Most enriched GO terms in the clusters C1 and C2 at various levels of tightness
| Cluster | GO process | Back. frequency | δ = 0.1 Freq. | δ = 0.1 P-val. | δ = 0.2 Freq. | δ = 0.2 P-val. | δ = 0.3 Freq. | δ = 0.3 P-val. | δ = 0.4 Freq. | δ = 0.4 P-val. | δ = 0.5 Freq. | δ = 0.5 P-val. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | Ribosome biogenesis | 411/7167 | 210/516 | E-140 | 183/344 | E-146 | 153/257 | E-129 | 124/164 | E-123 | 65/79 | E-66 |
| | Biological process unknown* | 1189/6334 | 46/516 | | 26/344 | | 17/257 | | 9/164 | | 4/79 | |
| C2 | Response to oxidative stress | 101/7167 | 23/394 | E-6 | 6/47 | E-3 | | | | | | |
| | Oxidation-reduction process | 174/7167 | 33/394 | E-7 | 3/47 | >E-1 | | | | | | |
| | Biological process unknown* | 1189/6334 | 114/394 | | 12/47 | | | | | | | |
*The enrichment of the “biological process unknown” term has been found by the GO Slim Mapper tool rather than the GO Term Finder tool. Note that the p-value is only provided by the GO Term Finder tool.
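To make the relationship between the frequency columns and the p-value columns more concrete, the sketch below computes a one-sided hypergeometric tail probability from a background frequency and a cluster frequency. This is a common way to score GO term enrichment, but treating it as the calculation behind the table is an assumption: the page does not say how the GO Term Finder derives its p-values, and any multiple-testing correction that tool applies is not reproduced here, so the values will not match the table exactly. The function name `hypergeom_tail` is hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the cluster,    k: cluster genes annotated with the term.
    Exact integer arithmetic is used, then converted to float at the end.
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return float(Fraction(hits, total))


# Ribosome biogenesis in C1 at delta = 0.1: 210/516 against 411/7167.
p_c1 = hypergeom_tail(210, 516, 411, 7167)

# Oxidation-reduction process in C2 at delta = 0.2: 3/47 against 174/7167.
p_c2 = hypergeom_tail(3, 47, 174, 7167)

print(f"C1 ribosome biogenesis (uncorrected): {p_c1:.3g}")
print(f"C2 oxidation-reduction (uncorrected): {p_c2:.3g}")
```

The first call yields a vanishingly small probability and the second a probability that is not small, which agrees in direction with the E-140 and >E-1 entries in the table.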
Figure 6. The distribution of the 47 genes included in C2 at DTB with δ = 0.2 based on the biological processes with which they have been associated and over the major cellular components. Note that any single gene might be found in multiple cellular components, and thus the total number of gene markers in the Figure does not directly correspond to the total number of genes considered.
Figure 7. Genetic interaction network between the genes in the APha-RiB regulon (C2 at DTB with δ = 0.2). A sub-network of eight genes is highlighted and the types of genetic interactions between its genes are labelled. This is the same sub-network which is highlighted in Figure 8. A genetic interaction exists between two genes if the impact of perturbing both genes is different from the additive impact of perturbing each gene individually. A positive genetic interaction is that in which perturbing both genes results in a higher fitness, i.e. a weaker defect, than the additive defect of perturbing each one individually. On the other hand, a negative genetic interaction exists when the defect caused by perturbing both genes is stronger than the additive defect caused by perturbing each gene individually. A similar profile (S) genetic interaction indicates high correlation between both genes’ genetic interaction profiles with the rest of the genes.
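The definitions of positive and negative genetic interactions in the Figure 7 caption can be sketched as a small scoring function. The sketch follows the caption's additive wording: the expected double-perturbation defect is the sum of the two single defects, and the score is the expected defect minus the observed one. The function name, the `tolerance` parameter for treating small deviations as no interaction, and the numeric defects are all hypothetical; the page does not say how interactions in the network were actually scored.

```python
def classify_genetic_interaction(defect_a, defect_b, defect_ab, tolerance=0.05):
    """Classify a gene pair under an additive expectation.

    defect_a, defect_b: fitness defects from perturbing each gene alone.
    defect_ab:          fitness defect from perturbing both genes together.
    tolerance:          hypothetical cut-off below which the deviation is
                        treated as no interaction.
    Returns (score, label); score > 0 means the double perturbation is less
    severe than the additive expectation.
    """
    expected = defect_a + defect_b
    score = expected - defect_ab
    if score > tolerance:
        label = "positive"   # weaker defect (higher fitness) than expected
    elif score < -tolerance:
        label = "negative"   # stronger defect than expected
    else:
        label = "none"       # consistent with the additive expectation
    return score, label


# Hypothetical defects on an arbitrary scale.
print(classify_genetic_interaction(0.25, 0.25, 0.25))
print(classify_genetic_interaction(0.25, 0.25, 0.75))
print(classify_genetic_interaction(0.25, 0.25, 0.5))
```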
Figure 8. Protein-protein physical interaction network between the products of the genes in the APha-RiB regulon (C2 at DTB with δ = 0.2). Each node represents a gene, and a link between any two nodes represents the existence of a physical interaction between the products of those genes, i.e. between the proteins which are encoded by those genes. A relatively highly connected sub-network of eight genes is highlighted for more discussion in the main text; this is the same sub-network highlighted in Figure 7.
Figure 9. Venn diagram showing the size of overlap between our novel APha-RiB cluster (C2 at DTB with δ = 0.2) and the subsets of genes with expression reported to be positively correlated with stress and negatively correlated with growth in three previous studies [22, 23, 60].
Figure 10. Regulation of the RRB cluster (C1) and the APha-RiB cluster (C2). Ticked dashed links have been detected in this study and were also previously identified in the literature, while dashed links with question marks have been detected only in this study. However, most of the previous studies consider one or a few stress conditions, in contrast to “generic stress conditions”. Notice that the cluster “C2 APha-RiB” is novel and that the links from the literature that point at it are based on the assumption that it is a stress response module.