Kyrylo Bessonov, Christopher J Walkey, Barry J Shelp, Hennie J J van Vuuren, David Chiu, George van der Merwe.
Abstract
Analyzing time-course expression data captured in microarray datasets is a complex undertaking because the vast data space is represented by a relatively small number of samples compared with the thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identification of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator of the expression of sulfur metabolism genes, the nuclear localization of the Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevance of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples.
Year: 2013 PMID: 24130853 PMCID: PMC3793944 DOI: 10.1371/journal.pone.0077192
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration of Correlation Clustering using an example graph G with {+} and {-} edges colored in black and red, respectively.
In graph G, the gray circles represent nodes (e.g. gene names) and the connecting lines represent edges (E) with {+} and {-} values. Green and blue circles represent putative clusters.
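The idea behind the figure can be sketched in a few lines. The snippet below is a minimal illustration of the standard Correlation Clustering objective, not the authors' implementation: it counts "disagreements", meaning {+} edges that cross clusters and {-} edges that fall inside a cluster. The function name `disagreements`, the toy graph, and the node labels `g1` to `g5` are all hypothetical.

```python
def disagreements(edges, clusters):
    """Count edges of a signed graph that conflict with a clustering.

    edges:    dict mapping frozenset({u, v}) -> "+" or "-"
    clusters: dict mapping node -> cluster label
    """
    count = 0
    for pair, sign in edges.items():
        u, v = tuple(pair)
        same_cluster = clusters[u] == clusters[v]
        # A "+" edge should join nodes in the same cluster,
        # a "-" edge should separate nodes into different clusters.
        if (sign == "+" and not same_cluster) or (sign == "-" and same_cluster):
            count += 1
    return count


# Hypothetical toy graph; node names are placeholders, not genes from the study.
edges = {
    frozenset({"g1", "g2"}): "+",
    frozenset({"g2", "g3"}): "+",
    frozenset({"g1", "g3"}): "+",
    frozenset({"g3", "g4"}): "-",
    frozenset({"g4", "g5"}): "+",
    frozenset({"g1", "g5"}): "-",
}
two_clusters = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
one_cluster = {node: 0 for node in two_clusters}

print(disagreements(edges, two_clusters))
print(disagreements(edges, one_cluster))
```

A clustering with fewer disagreements fits the signed graph better; in this toy case the two-cluster split respects every edge, whereas lumping all nodes together violates both {-} edges.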
Figure 2. Illustration of the expression profiles of x and y with the following patterns: A) x and y have high similarity based on both absolute difference and r(x,y); B) x and y have low similarity based on absolute difference but high r(x,y); C) x and y have very low similarity based on absolute difference but a high negative r(x,y).
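The three patterns can be reproduced numerically. The sketch below uses invented profiles (not data from the study) and two hypothetical helpers, `pearson_r` and `mean_abs_diff`, to show that a constant shift or a sign flip changes the absolute difference a great deal while leaving the magnitude of r high.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_abs_diff(x, y):
    """Mean absolute difference between two equal-length profiles."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Invented expression profiles over five time points (illustration only).
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y_a = [1.1, 2.1, 3.9, 3.0, 5.2]   # pattern A: close in value, same shape
y_b = [v + 10.0 for v in x]       # pattern B: same shape, shifted upwards
y_c = [20.0 - v for v in x]       # pattern C: mirrored shape

for label, y in (("A", y_a), ("B", y_b), ("C", y_c)):
    print(label, round(mean_abs_diff(x, y), 3), round(pearson_r(x, y), 3))
```

This is why a correlation-based measure can group genes that share a trend even when their absolute expression levels differ widely.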
Figure 3. Fermentation profile and the impact of NSF1 on the controlled expression of sulfur pathway genes during Riesling fermentation.
A) Fermentation profile of M2 and M2 nsf1∆ in Riesling grape must, measured by the amount of culture weight lost due to CO2 production. The arrow shows the datum after 85% sugar fermentation, the time at which differentially expressed genes (DEGs) were determined. Error bars represent standard deviation (SD). B) Heatmap of the sulfur-related genes from the MFD dataset microarray expression data corresponding to fermentation of 85% of the sugars. The expression values were normalized for each gene by converting them into z-scores, (expression value – mean expression across all samples) / SD across all samples, so that each gene has a mean expression value of zero across all samples. M2 triplicate samples are represented in columns A-C and those for M2 nsf1∆ in columns D-F.
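The per-gene normalization described in panel B can be sketched as follows. This is an illustration under stated assumptions: the sample SD (n - 1 denominator) is used because the caption does not say which SD was applied, and the gene labels, expression values, and the function name `zscore_rows` are hypothetical.

```python
import statistics


def zscore_rows(expr):
    """Convert each gene's expression values into z-scores across samples.

    expr: dict mapping gene name -> list of expression values (one per sample)
    """
    normalized = {}
    for gene, values in expr.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
        if sd == 0:
            # A flat profile carries no variation to scale.
            normalized[gene] = [0.0 for _ in values]
        else:
            normalized[gene] = [(v - mean) / sd for v in values]
    return normalized


# Invented values for six samples (columns A-F); not data from the study.
expr = {
    "geneA": [8.1, 8.3, 8.0, 10.2, 10.5, 10.1],
    "geneB": [5.0, 5.2, 4.9, 3.1, 3.0, 3.3],
}
z = zscore_rows(expr)
for gene, row in z.items():
    print(gene, [round(v, 2) for v in row])
```

After this step every gene is centered on zero with unit spread, so heatmap colors reflect relative change across samples and not absolute signal intensity.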
Figure 4. The overall ICC method workflow culminating in the formation of the largest Interconnected Correlation Gene Cluster (ICGC).
0.98, and a z-score distribution with relatively “light tails” based on the kurtosis value, the correlation threshold was lowered to r < -0.95 or r > 0.95, which is slightly outside the classical two-tailed α = 0.10 threshold. Although a lower threshold risks a higher number of false positive hits, our goal was to capture true positives even in the presence of some false positives. In addition, the ICGCs obtained for NSF1 at |r| > 0.95 and |r| > 0.98 had a 77.3% overlap in gene composition. This indicates that the threshold has little impact on the final results, with the ICGC preserving its initial core. Thus, selection of the threshold is mainly based on the desired size of the ICGC and the biological context. We recommend calculating the percent overlap between ICGCs obtained under different thresholds to judge the threshold's impact on the reliability and robustness of the final results, and selecting a threshold between |r| > 0.95 and |r| > 0.98.
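The recommended overlap check can be sketched as below. The text does not state how the 77.3% figure was computed, so this example assumes one plausible definition: shared genes as a percentage of the smaller cluster. The function name `percent_overlap` and the gene labels are hypothetical.

```python
def percent_overlap(cluster_a, cluster_b):
    """Shared genes as a percentage of the smaller of two gene clusters.

    This definition is an assumption; the source does not specify one.
    """
    a, b = set(cluster_a), set(cluster_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


# Placeholder gene sets standing in for ICGCs obtained at two thresholds.
icgc_095 = {"g1", "g2", "g3", "g4", "g5"}
icgc_098 = {"g1", "g2", "g3", "g6"}

print(percent_overlap(icgc_095, icgc_098))
```

A high value suggests that the core of the cluster is stable across thresholds; a low value would indicate that the result depends strongly on the chosen cutoff.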
Figure 5. Distribution of the z-scores corresponding to all genes (a total of 5667 genes) with the empirical probability density function plotted as a red line.
The probability (p) is the value of the probability density function at a particular z-score. The z-scores were derived from the r values obtained by comparing the NSF1 expression profile with that of every other gene in the dataset. The blue bars correspond to critical regions at 0.95
error of skewness (SES) was found to be 0.03253.

Selected genes from the largest ICGC by category from the MFD dataset.
| Category | Average p-value |
|---|---|
| stress response | 0.0235 |
| cell cycle control | 0.013 |
| carbohydrate metabolism/energy metabolism | 0.014 |
| ribosome assembly/protein synthesis | 0.040 |
| transcription/translation regulation | 0.022 |
| protein degradation | 0.030 |
| vesicle trafficking | 0.0038 |
| cell wall related proteins | 0.0074 |
| cell nucleus trafficking | 0.0020 |
| sulfur metabolism | 0.0012 |
Note: ‘Bolded’ and ‘non-bolded’ genes are up-regulated (PCC > 0) and down-regulated (PCC < 0), respectively, at the end of fermentation (85% sugars fermented, which represents fermentation progression) with respect to the 24 h time point. A complete list of MFD ICGC genes is provided in Table S1. The average p-value corresponds to the average p-value of GO terms linked to the category genes (see Methods).
Representative genes found in the largest ICGC from the VFD dataset.
| Category | Average p-value |
|---|---|
| Vesicle trafficking | 0.0053 |
| Post-translational protein modification | 0.011 |
| Stress response | |
| Sulfur Metabolism | 0.0152 |
| Ribosome Assembly / Transcription / Translation | 0.036 |
Note: ‘Bolded’ and ‘non-bolded’ genes are up-regulated and down-regulated, respectively, at the end of fermentation (85% sugars fermented, which represents fermentation progression) with respect to the 24 h time point. A complete list of VFD ICGC genes is provided in Table S2. The average p-value corresponds to the average p-value of GO terms linked to the category genes (see Methods).
Figure 6. NSF1 was needed for the controlled transcription of some sulfur pathway genes under defined sulfur conditions.
The indicated genes were assayed in M2, M2 nsf1∆, met4∆ and nsf1∆met4∆ mutants under sulfur-rich (S+) and sulfur-limiting (S-) conditions. An asterisk (*) denotes a statistically significant difference in gene expression at the 95% confidence level according to a one-sample t-test with population mean = 1 (no change in gene expression between assayed conditions).
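The significance test named in the caption can be sketched with the standard library alone. This is an illustration, not the authors' analysis: the fold-change values are invented, three replicates are assumed (the caption does not give the replicate count), and the decision uses tabulated two-tailed critical t values at α = 0.05. The names `one_sample_t`, `is_significant`, and `T_CRIT_95` are hypothetical.

```python
import math
import statistics

# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def one_sample_t(values, popmean=1.0):
    """t statistic for H0: the mean of `values` equals `popmean`."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - popmean) / standard_error


def is_significant(values, popmean=1.0):
    """True if |t| exceeds the tabulated critical value for n - 1 df."""
    df = len(values) - 1
    return abs(one_sample_t(values, popmean)) > T_CRIT_95[df]


# Invented relative expression values (fold change vs. reference condition).
induced = [2.1, 1.9, 2.0]
unchanged = [0.9, 1.2, 1.0]

print(round(one_sample_t(induced), 2), is_significant(induced))
print(round(one_sample_t(unchanged), 2), is_significant(unchanged))
```

Testing against a mean of 1 fits fold-change data, where a ratio of 1 means the gene is expressed equally in both conditions.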
Figure 7. Nsf1 localized to the nucleus under limiting sulfur conditions.
M2 NSF1-GFP cells transformed with pNIC96-mCherry-hphMX were pre-cultured in YNB S- medium to early log phase and shifted to fresh YNB S+ or YNB S- medium. Cells were monitored by fluorescence microscopy at the indicated times. The arrow (→) represents the medium shift.
Figure 8. Nsf1 was not nuclear under rich sulfur conditions.
M2 NSF1-GFP cells transformed with pNIC96-mCherry-hphMX were pre-cultured in YNB S- medium to early log phase and shifted to fresh YNB S+ or YNB S- medium. Cells were monitored by fluorescence microscopy at the indicated times. The arrow (→) represents the medium shift.