Linqi Zhou, Xiaotu Ma, Fengzhu Sun.
Abstract
BACKGROUND: Identifying the factors that affect gene expression variation is a challenging problem in genetics. Previous studies have shown that the presence of a TATA box, the number of cis-regulatory elements, gene essentiality, and protein interactions significantly affect gene expression variation. Nonetheless, a more complete understanding of these factors, and of how their interactions influence gene expression variation, is still lacking. The growth rates of yeast cells under several DNA-damaging conditions have been studied, and a gene's toxicity degree is defined as the number of such conditions under which the growth rate of the yeast deletion strain is significantly affected. Since toxicity degree reflects a gene's importance to cell survival under DNA-damaging conditions, we expect it to be negatively associated with gene expression variation. Mutations in both the cis-regulatory elements and the transcription factors (TFs) regulating a gene affect that gene's expression, so we also study the relationship between gene expression variation and the number of TFs regulating a gene. Most importantly, we study how these factors interact with each other to influence gene expression variation.
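The toxicity degree defined above is a simple count, which can be sketched as follows. This is a minimal illustration only: the gene names, condition labels, and the boolean "significantly affected" flags are made-up placeholders, and the function name `toxicity_degree` is hypothetical rather than anything used by the authors.

```python
def toxicity_degree(affected_by_condition):
    """Count the conditions under which the deletion strain's growth
    rate is flagged as significantly affected."""
    return sum(1 for affected in affected_by_condition.values() if affected)


# Made-up example input: one dict of condition -> flag per deletion strain.
strains = {
    "geneA": {"cond1": True, "cond2": False, "cond3": True},
    "geneB": {"cond1": False, "cond2": False, "cond3": False},
}

degrees = {gene: toxicity_degree(flags) for gene, flags in strains.items()}
print(degrees)
```

How "significantly affected" is decided for each condition is not stated in this abstract, so the sketch takes those flags as given.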
Year: 2008 PMID: 18582382 PMCID: PMC2474594 DOI: 10.1186/1752-0509-2-54
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Gene expression variation is negatively correlated with protein interaction degree. The x-axis represents protein physical interaction degree, and the y-axis represents gene expression variation. A) The LOWESS fit to the gene expression variation. B) Bar-plot of the expression variation of all the genes with a given protein interaction degree, together with the linear regression fit to the gene expression variation in relation to the interaction degree. The linear coefficient β = -0.0302, R2 = 1.41%, and p-value = 9.704e-14. The red dots are the mean expression variation of the genes given the protein physical interaction (PPI) degree. The bars represent the standard deviation of the gene expression variation given PPI degree. To keep the same scale for gene expression variation across the figures, the range of the y-axis is -2.5 to 0.5.
Figure 2. The effect of essentiality, toxicity degree, and protein interaction degree on gene expression variation. A) Bar-plot of the expression variation of all the genes with a given toxicity degree, together with the linear regression fit to the expression variation of the genes in relation to the toxicity degree. The linear coefficient β = -0.0629, R2 = 0.75%, and the p-value = 4.73e-08. B) The mean expression variation and the linear regression fit to the expression variation with respect to PPI degree for non-essential genes stratified according to toxicity degree and for the essential genes. The β values are -0.0172, -0.0230, -0.0304, -0.0164 and -0.0460 for toxicity degree 0, 1, 2, and 3, and the essential genes, respectively. The corresponding p-values are 0.0313, 0.0141, 0.0011, 0.2248 and 0.0013, respectively. R2 is 0.31%, 0.75%, 2.75%, 0.8% and 4.41%, respectively. The labels are the same as those in Figure 1.
Figure 3. The effect of TATA box, number of TFs, and toxicity degree on gene expression variation. A) The relationship between expression variation and toxicity degree stratified by the presence/absence of the TATA box (R2 = 2.59%, β = -0.1674, p-value = 7.174e-06 for the TATA-containing gene set; R2 = 0.13%, β = -0.0234, p-value = 0.0413 for the non-TATA-containing gene set). B) The relationship between expression variation and the number of TFs up to 25 (R2 = 8.28%, β = 0.0654, p-value < 2.2e-16). The labels are the same as those in Figure 1.
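The figure captions report a slope β and an R2 for each simple linear regression fit. As a rough sketch of how those two quantities are obtained by ordinary least squares, the snippet below fits a line to made-up (degree, variation) pairs. The function name `simple_linear_fit` and the data are illustrative assumptions, and the p-value is left out because it needs a t-distribution that the Python standard library does not provide.

```python
def simple_linear_fit(x, y):
    """Ordinary least squares for y = intercept + beta * x.

    Returns (beta, intercept, r_squared), where r_squared is the
    fraction of the variance in y explained by x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return beta, intercept, r_squared


# Made-up data: a degree-like predictor and an expression-variation-like response.
degree = [0, 1, 2, 3, 4]
variation = [0.5, 0.3, 0.4, 0.1, 0.0]

beta, intercept, r_squared = simple_linear_fit(degree, variation)
print(f"beta={beta:.4f}, R2={100 * r_squared:.2f}%")
```

A negative `beta` corresponds to the kind of negative association reported for PPI degree and toxicity degree; a small R2 alongside a tiny p-value, as in Figure 1, simply reflects a weak but consistent trend over many genes.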
The relationship between gene expression variation and the number of cis-elements.
| Gene expression data-set | Linear regression β | Linear regression p-value | R2 |
| Ca_Na exposure | 0.0589 | 1.17e-06 | 1.25% |
| Chemostat | 0.0188 | 0.0067 | 0.39% |
| Environmental Stress | 0.0507 | 5.48e-07 | 1.33% |
| Oxidative Stress | 0.0025 | 0.725 | 0.01% |
| Gene expression data-set | Linear regression β | Linear regression p-value | R2 |
| Ca_Na exposure | 0.0539 | 4.45e-08 | 1.33% |
| Chemostat | 0.0161 | 0.0046 | 0.36% |
| Environmental Stress | 0.0516 | 3.56e-10 | 1.74% |
| Oxidative Stress | 0.0013 | 0.826 | 0.00% |
| Gene expression data-set | Linear regression β | Linear regression p-value | R2 |
| Ca_Na exposure | 0.0393 | 1.05e-07 | 1.02% |
| Chemostat | 0.0151 | 0.0004 | 0.45% |
| Environmental Stress | 0.0346 | 1.68e-08 | 1.15% |
| Oxidative Stress | 0.0028 | 0.526 | 0.01% |
Cis-elements are identified with three different criteria according to their conservation in two other species. β is the linear coefficient in the linear model, the p-value is for testing the null hypothesis β = 0 against the alternative β ≠ 0, and R2 is the fraction of variation explained by the number of cis-elements.
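For reference, the quantities named in this legend can be written out for a simple linear regression as below. This is the textbook formulation, given under the assumption that the usual two-sided t-test on the slope is what produced the reported p-values; the source does not spell out the test.

```latex
% Model: expression variation y_i regressed on the number of cis-elements x_i
y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \dots, n

% Hypotheses tested by the reported p-value
H_0 : \beta = 0 \quad \text{versus} \quad H_1 : \beta \neq 0

% Test statistic, compared with a t distribution on n - 2 degrees of freedom
t = \frac{\hat{\beta}}{\operatorname{SE}(\hat{\beta})}

% Fraction of variation explained
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```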
Analysis of four factors and their interactions affecting expression variation using stepwise selection with AIC.
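| variable | Ca_Na_exposure: model | Ca_Na_exposure: p-value | Ca_Na_exposure: R2 | Chemostat: model | Chemostat: p-value | Chemostat: R2 | Environmental Stress: model | Environmental Stress: p-value | Environmental Stress: R2 | Oxidative Stress: model | Oxidative Stress: p-value | Oxidative Stress: R2 |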
| x1 | √ | 0.4252 | 0.05% | √ | 0.1582 | 0.16% | √ | 0.0647 | 0.28% | √ | 0.1075 | 0.21% |
| x2 | √ | 0.3721 | 0.06% | √ | 0.0635 | 0.28% | √ | 0.0861 | 0.24% | | 0.8729 | 0.002% |
| x3 | √ | 1.16E-20 | 6.76% | √ | 4.59E-09 | 2.73% | √ | < 2e-16 | 13.42% | √ | 7.30E-09 | 2.65% |
| x4 | √ | 3.22E-12 | 3.83% | √ | 7.96E-10 | 3.00% | √ | < 2e-16 | 6.92% | √ | 0.8841 | 0.002% |
| x1*x2 | ||||||||||||
| x1*x3 | ||||||||||||
| x1*x4 | √ | 0.0581 | 0.29% | √ | 0.0100 | 0.53% | √ | 0.0348 | 0.36% | |||
| x2*x3 | √ | 0.0422 | 0.33% | |||||||||
| x2*x4 | √ | 0.0226 | 0.41% | √ | 0.0036 | 0.68% | √ | 0.0108 | 0.53% | |||
| x3*x4 | √ | 0.0039 | 0.67% | √ | 0.0185 | 0.45% | √ | 6.2e-09 | 2.71% | |||
| R2 (model) | 16.36% | | | 12.73% | | | 22.39% | | | 4.43% | | |
The four main factors are protein interaction degree (x1), toxicity degree (x2; essential genes are treated as having toxicity degree 4), number of TFs (x3), and the presence of a TATA box (x4; 1 for TATA-containing genes, 0 for non-TATA-containing genes). The protein interaction data used in this analysis are from the MIPS dataset. A "√" in the model column indicates that the term is included in the final linear model for that dataset. The multiple linear regression for each dataset is based on its final linear model. The p-value is for testing the null hypothesis β = 0 against the alternative β ≠ 0. R2 is the variation explained by each independent variable and, in the last row, by the whole model.
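Model selection by AIC, as named in the table title, can be sketched with a greedy forward search: start from the intercept-only model and keep adding the candidate term that lowers AIC the most until no term helps. This is an illustration under stated assumptions, not the authors' procedure: the source does not say whether the search was forward, backward, or bidirectional, the data are made up, and the helper names (`solve`, `rss`, `aic`, `forward_select`) are hypothetical. AIC is computed as n·ln(RSS/n) + 2k, the usual form for Gaussian linear models up to an additive constant.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))


def aic(columns, y):
    """AIC of a Gaussian linear model, up to an additive constant."""
    n = len(y)
    k = len(columns) + 1  # intercept plus one coefficient per column
    return n * math.log(rss(columns, y) / n) + 2 * k


def forward_select(candidates, y):
    """Greedy forward selection: add the term that lowers AIC most; stop when none does."""
    selected, best = [], aic([], y)
    while True:
        trials = [(aic([candidates[s] for s in selected + [name]], y), name)
                  for name in candidates if name not in selected]
        if not trials:
            break
        score, name = min(trials)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected, best


# Made-up data: the response depends on x3 and x4 only, plus small fixed noise.
x1 = [1, 5, 2, 8, 3, 7, 4, 6, 2, 9, 1, 5]
x3 = [2, 4, 6, 8, 10, 12, 3, 5, 7, 9, 11, 1]
x4 = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
noise = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.005, -0.015, 0.01, -0.02, 0.0, 0.01]
y = [0.05 * t + 0.3 * d + e for t, d, e in zip(x3, x4, noise)]

candidates = {"x1": x1, "x3": x3, "x4": x4,
              "x3*x4": [t * d for t, d in zip(x3, x4)]}
selected, score = forward_select(candidates, y)
print(selected, round(score, 2))
```

Interaction terms such as x3*x4 enter the search as ordinary columns formed by multiplying the two factors, which is how the "√" rows for interactions in the table can be read.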
The effect of two factors on expression variation stratified by the presence/absence of TATA box.
| TATA dataset | ||||||||||||
| variable | Ca_Na_exposure: β | Ca_Na_exposure: p-value | Ca_Na_exposure: R2 | Chemostat: β | Chemostat: p-value | Chemostat: R2 | Environmental Stress: β | Environmental Stress: p-value | Environmental Stress: R2 | Oxidative Stress: β | Oxidative Stress: p-value | Oxidative Stress: R2 |
| x2 | -0.1436 | 0.0024 | 1.95% | -0.0764 | 0.0034 | 1.81% | -0.0510 | 0.156 | 0.43% | 0.0081 | 0.778 | 0.02% |
| x3 | 0.0312 | 5.4e-10 | 7.90% | 0.0168 | 1.5e-09 | 7.50% | 0.0283 | 2.6e-13 | 10.79% | 0.0138 | 4.8e-06 | 4.37% |
| R2 (model) | | | 9.47% | | | 8.97% | | | 11.09% | | | 4.39% |
| Non-TATA dataset | ||||||||||||
| variable | Ca_Na_exposure: β | Ca_Na_exposure: p-value | Ca_Na_exposure: R2 | Chemostat: β | Chemostat: p-value | Chemostat: R2 | Environmental Stress: β | Environmental Stress: p-value | Environmental Stress: R2 | Oxidative Stress: β | Oxidative Stress: p-value | Oxidative Stress: R2 |
| x2 | -0.0167 | 0.402 | 0.06% | -0.0026 | 0.8189 | 0.005% | 0.0258 | 0.086 | 0.26% | 0.0038 | 0.747 | 0.009% |
| x3 | 0.0502 | < 2e-16 | 8.70% | 0.0300 | < 2e-16 | 9.56% | 0.0468 | < 2e-16 | 12.6% | 0.0181 | 4.4e-10 | 3.40% |
| R2 (model) | | | 8.85% | | | 9.62% | | | 12.67% | | | 3.39% |
The linear model that includes the toxicity degree (x2) and the number of TFs (x3) is built separately for each of the four gene expression datasets. R2 is the variation explained by each independent factor and by the model. β is the linear coefficient in the linear model, and the p-value is for testing the null hypothesis β = 0 against the alternative β ≠ 0.
The effect of toxicity degree on expression variation stratified by membership in the environmental stress response (ESR) gene set.
| Gene group | Ca_Na_exposure: β | Ca_Na_exposure: p-value | Chemostat: β | Chemostat: p-value | Environmental Stress: β | Environmental Stress: p-value | Oxidative Stress: β | Oxidative Stress: p-value |
| ESR | -0.0710 | 0.0053 | -0.0314 | 0.0532 | -0.0176 | 0.3640 | 0.0026 | 0.8708 |
| Non-ESR | -0.0848 | 2.09e-13 | -0.0354 | 1.11e-07 | -0.0579 | 1.28e-11 | -0.0091 | 0.2086 |
The linear model is built separately for each of the four gene expression datasets. β is the linear coefficient in the linear model, and the p-value is for testing the null hypothesis β = 0 against the alternative β ≠ 0.
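The stratified analysis in this table amounts to splitting the genes by group and fitting the same regression within each group. A minimal sketch, assuming each record is a (group, toxicity degree, expression variation) triple; the records below are made up and the function names `slope` and `stratified_slopes` are hypothetical.

```python
from collections import defaultdict


def slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def stratified_slopes(records):
    """Group (group, x, y) records by group and fit one slope per group."""
    xs, ys = defaultdict(list), defaultdict(list)
    for group, x, y in records:
        xs[group].append(x)
        ys[group].append(y)
    return {group: slope(xs[group], ys[group]) for group in xs}


# Made-up records: (gene group, toxicity degree, expression variation).
records = [
    ("ESR", 0, 0.9), ("ESR", 1, 0.8), ("ESR", 2, 0.7),
    ("Non-ESR", 0, 0.2), ("Non-ESR", 2, 0.0), ("Non-ESR", 4, -0.6),
]

slopes = stratified_slopes(records)
print({group: round(b, 4) for group, b in slopes.items()})
```

Comparing the per-group slopes, as the table does for ESR and non-ESR genes, shows whether the association with toxicity degree holds within each stratum or is driven by one of them.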