Gajendra P S Raghava, Joon H Han.
Abstract
BACKGROUND: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, only a few studies have used expression data to examine the relation between expression level and the composition of nucleotide/protein sequences. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition.
Year: 2005 PMID: 15773999 PMCID: PMC1083413 DOI: 10.1186/1471-2105-6-59
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The average expression level of genes grouped by the length of the encoded protein.
| Protein length (residues) | Number of genes | Average expression level |
| 25–100 | 59 | 15.58 |
| 100–200 | 561 | 8.39 |
| 200–400 | 1168 | 3.71 |
| 400–800 | 1179 | 2.51 |
| 800–1200 | 327 | 1.85 |
| > 1200 | 168 | 2.13 |
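The table above groups genes by protein length and reports the gene count and mean expression in each range. A minimal sketch of that grouping, assuming half-open bins [low, high) and an open-ended last bin; the bin edges come from the table, while the function name `bin_by_length` and the toy data are hypothetical.

```python
from bisect import bisect_right

# Bin edges taken from the table; the last bin (>= 1200) is open-ended.
# Assumption: bins are half-open, i.e. [low, high).
EDGES = [25, 100, 200, 400, 800, 1200]

def bin_by_length(genes, edges=EDGES):
    """Group (protein_length, expression) pairs into length bins.

    Returns a list of (label, gene_count, mean_expression) tuples; empty
    bins get a mean of None. Lengths below the first edge are skipped.
    """
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])] + [f">{edges[-1]}"]
    sums = [0.0] * len(labels)
    counts = [0] * len(labels)
    for length, expression in genes:
        idx = bisect_right(edges, length) - 1  # index of the bin holding this length
        if idx < 0:
            continue  # shorter than the smallest edge
        sums[idx] += expression
        counts[idx] += 1
    return [(label, n, s / n if n else None) for label, n, s in zip(labels, counts, sums)]

# Hypothetical toy data, not values from the study.
toy = [(80, 20.0), (150, 9.0), (160, 7.0), (950, 1.5), (1500, 2.0)]
for row in bin_by_length(toy):
    print(row)
```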
Figure 1. An example plot of gene expression against the length of the encoded protein sequence for one subset of the reference data (692 genes). It was generated with the "Standard plot" option of the LGEpred server, with "gene expression on the X-axis" selected.
The correlation between the percent composition of each residue and gene expression level. Residues with a correlation above +0.2 are underlined, and residues with a negative correlation below -0.15 are shown in bold. The second column gives the percent composition of each amino acid in the whole yeast genome.
| C | 1.26 | -0.003 | -0.102 | 0.030 | 0.000 |
| E | 6.54 | -0.061 | -0.045 | -0.105 | -0.088 |
| F | 4.42 | -0.122 | -0.109 | -0.093 | -0.107 |
| H | 2.23 | -0.052 | -0.056 | -0.131 | -0.048 |
| I | 6.56 | -0.136 | -0.116 | -0.091 | -0.128 |
| K | 7.35 | 0.166 | 0.158 | -0.117 | 0.182 |
| M | 2.08 | -0.087 | -0.098 | -0.003 | -0.094 |
| P | 4.37 | -0.064 | -0.057 | 0.039 | -0.086 |
| Q | 3.96 | -0.052 | -0.061 | -0.072 | -0.065 |
| T | 5.92 | 0.008 | 0.003 | 0.279 | -0.036 |
| V | 5.56 | 0.269 | 0.298 | 0.214 | 0.294 |
| W | 1.04 | -0.072 | -0.077 | -0.043 | -0.055 |
| Y | 3.38 | -0.009 | -0.018 | -0.030 | 0.018 |
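Each correlation in the table relates the percent composition of one residue to expression level across genes. A minimal sketch, assuming Pearson's correlation coefficient and composition computed over the 20 standard amino acids only; the function names and toy sequences are hypothetical, not taken from the study.

```python
import math

# The 20 standard amino acids; other symbols (X, *, etc.) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def percent_composition(seq):
    """Percent of each standard amino acid in a protein sequence."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in AMINO_ACIDS}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * c / total for aa, c in counts.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residue_correlation(proteins, expression, residue):
    """Correlate one residue's percent composition with expression across genes."""
    comps = [percent_composition(p)[residue] for p in proteins]
    return pearson(comps, expression)

# Hypothetical toy data in which Ala content rises with expression.
proteins = ["AVVV", "AAVV", "AAAV"]
expression = [1.0, 2.0, 3.0]
print(residue_correlation(proteins, expression, "A"))
```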
The number of genes, and their average expression level, in each percent-composition bin for four positively correlated residues: Ala, Gly, Arg and Val.
| Percent composition | Ala genes* | Ala E. level** | Gly genes | Gly E. level | Arg genes | Arg E. level | Val genes | Val E. level |
| 1 – 3 | 163 | 2.59 | 339 | 2.96 | 558 | 3.77 | 106 | 2.93 |
| 3 – 5 | 1074 | 2.11 | 1180 | 2.43 | 1741 | 2.92 | 941 | 2.48 |
| 5 – 7 | 1212 | 2.80 | 1193 | 3.81 | 802 | 2.86 | 1498 | 3.00 |
| 7 – 9 | 626 | 5.36 | 523 | 6.55 | 201 | 8.03 | 705 | 5.37 |
| 9 – 11 | 246 | 9.41 | 158 | 7.82 | 58 | 20.34 | 171 | 13.22 |
| 11 – 13 | 64 | 15.25 | 36 | 12.79 | 27 | 17.56 | 24 | 16.45 |
| 13 – 15 | 31 | 15.76 | 15 | 12.73 | 9 | 29.74 | 5 | 22.04 |
| >15 | 35 | 15.78 | 7 | 13.54 | 3 | 23.90 | 2 | 19.55 |
* Total number of genes in this range
** Average expression level of genes in this range
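The composition bins in the table above are 2 percentage points wide, from 1% up to an open-ended ">15" bin. A minimal sketch of assigning genes to those bins and averaging expression per bin, assuming half-open bins; `composition_bin` and `summarize_by_composition` are hypothetical names.

```python
from collections import defaultdict

def composition_bin(percent):
    """Map a percent composition to the 2%-wide bin labels used in the table.

    Assumption: bins are half-open [low, low + 2); values below 1% fall
    outside the table (None), and values at or above 15% go to ">15".
    """
    if percent < 1:
        return None
    if percent >= 15:
        return ">15"
    low = 1 + 2 * int((percent - 1) // 2)
    return f"{low}-{low + 2}"

def summarize_by_composition(records):
    """records: iterable of (percent_composition, expression) pairs.

    Returns {bin_label: (gene_count, mean_expression)} for non-empty bins.
    """
    groups = defaultdict(list)
    for percent, expression in records:
        label = composition_bin(percent)
        if label is not None:
            groups[label].append(expression)
    return {label: (len(v), sum(v) / len(v)) for label, v in groups.items()}

# Hypothetical toy records: (percent Ala, expression level).
print(summarize_by_composition([(2.0, 3.0), (2.5, 5.0), (9.5, 10.0), (0.5, 99.0), (16.0, 20.0)]))
```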
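The number of genes, and their average expression level, in each percent-composition bin for four positively correlated residues (Ala, Gly, Arg and Val), restricted to genes whose proteins have more than 100 residues.
| Percent composition | Ala genes* | Ala E. level** | Gly genes | Gly E. level | Arg genes | Arg E. level | Val genes | Val E. level |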
| 1 – 3 | 152 | 2.15 | 333 | 2.82 | 543 | 3.70 | 543 | 3.70 |
| 3 – 5 | 1063 | 2.05 | 1166 | 2.39 | 1728 | 2.86 | 1728 | 2.86 |
| 5 – 7 | 1204 | 2.75 | 1176 | 3.61 | 798 | 2.85 | 798 | 2.85 |
| 7 – 9 | 613 | 4.86 | 510 | 6.14 | 194 | 7.52 | 194 | 7.52 |
| 9 – 11 | 242 | 9.19 | 155 | 7.39 | 55 | 19.58 | 55 | 19.58 |
| 11 – 13 | 61 | 15.19 | 33 | 12.64 | 23 | 17.30 | 23 | 17.30 |
| 13 – 15 | 30 | 15.81 | 15 | 12.73 | 5 | 29.86 | 5 | 29.86 |
| > 15 | 32 | 16.52 | 7 | 13.54 | 2 | 12.85 | 2 | 12.85 |
* Total number of genes in this range
** Average expression level of genes in this range
The number of genes, and their average expression level, in each percent-composition bin for the negatively correlated residues.
| 1 – 3 | 260 | 9.81 | 20 | 14.06 | 249 | 8.69 | 31 | 12.52 |
| 3 – 5 | 847 | 3.97 | 114 | 12.19 | 1195 | 5.21 | 264 | 7.85 |
| 5 – 7 | 1505 | 3.35 | 467 | 6.20 | 1323 | 2.96 | 907 | 5.39 |
| 7 – 9 | 657 | 3.10 | 1033 | 4.05 | 478 | 1.72 | 1202 | 3.41 |
| 9 – 11 | 117 | 3.15 | 1118 | 3.09 | 148 | 1.83 | 645 | 2.06 |
| 11 – 13 | 32 | 2.38 | 526 | 2.42 | 34 | 3.15 | 221 | 2.77 |
| 13 – 15 | 10 | 2.86 | 151 | 1.98 | 11 | 0.95 | 103 | 1.98 |
| > 15 | 5 | 1.22 | 32 | 2.57 | 9 | 1.14 | 87 | 3.16 |
* Total number of genes in this range
** Average expression level of genes in this range
The correlation between the percent composition of residues and gene expression level in alternate dataset 1 (untreated) and dataset 2 (treated). Residues that showed positive correlation in the reference dataset are underlined, and those that showed negative correlation are in bold.
| C | -0.052 | -0.062 | -0.045 | -0.051 |
| E | 0.004 | 0.016 | -0.012 | -0.004 |
| F | -0.072 | -0.069 | -0.057 | -0.052 |
| H | -0.075 | -0.059 | -0.064 | -0.051 |
| I | -0.075 | -0.085 | -0.060 | -0.071 |
| K | 0.070 | 0.062 | 0.017 | 0.007 |
| M | -0.046 | -0.057 | -0.053 | -0.060 |
| P | -0.026 | -0.035 | -0.006 | -0.010 |
| Q | -0.037 | -0.049 | -0.036 | -0.047 |
| T | 0.029 | 0.022 | 0.041 | 0.032 |
| W | -0.056 | -0.064 | -0.034 | -0.043 |
| Y | -0.031 | -0.040 | -0.002 | -0.009 |
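The number of genes, and their average expression level, in each percent-composition bin for four positively correlated residues (Ala, Gly, Arg and Val) in alternate datasets 1 and 2.
| Percent composition | Ala genes* | Ala E. level** (set 1) | Ala E. level (set 2) | Gly genes | Gly E. level (set 1) | Gly E. level (set 2) | Arg genes | Arg E. level (set 1) | Arg E. level (set 2) | Val genes | Val E. level (set 1) | Val E. level (set 2) |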
| 1 – 3 | 156 | 2.51 | 3.30 | 248 | 3.61 | 4.24 | 464 | 4.24 | 5.44 | 95 | 3.50 | 4.95 |
| 3 – 5 | 784 | 1.90 | 2.72 | 868 | 2.35 | 3.09 | 1317 | 3.15 | 4.26 | 715 | 2.90 | 3.59 |
| 5 – 7 | 927 | 2.44 | 3.43 | 965 | 3.44 | 4.47 | 650 | 2.40 | 3.27 | 1173 | 2.91 | 3.76 |
| 7 – 9 | 521 | 3.93 | 5.46 | 433 | 5.81 | 6.74 | 157 | 5.08 | 4.98 | 572 | 4.46 | 5.39 |
| 9 – 11 | 211 | 9.90 | 10.72 | 122 | 5.30 | 5.90 | 32 | 12.41 | 8.29 | 113 | 11.02 | 12.20 |
| 11 – 13 | 47 | 18.77 | 14.10 | 27 | 9.54 | 9.55 | 13 | 12.03 | 8.18 | 16 | 15.08 | 16.01 |
| 13 – 15 | 22 | 16.86 | 13.19 | 13 | 14.68 | 13.59 | 3 | 51.07 | 32.87 | 2 | 22.45 | 15.25 |
| > 15 | 16 | 26.69 | 21.81 | 5 | 20.46 | 20.50 | 2 | 16.75 | 15.25 | 2 | 36.30 | 25.65 |
* Number of genes in this range
** Average expression level of genes in this range
Analysis of genes in alternate datasets 1 and 2 whose expression changes fourfold or more upon treatment with an alkylating agent. Residues that showed positive and negative correlation in the reference dataset are underlined and in bold, respectively.
| C | -0.029 | 0.023 |
| E | -0.070 | -0.118 |
| F | -0.087 | 0.017 |
| H | 0.026 | -0.011 |
| I | -0.073 | 0.067 |
| K | 0.075 | -0.035 |
| M | -0.041 | 0.027 |
| P | -0.000 | 0.028 |
| Q | -0.159 | -0.109 |
| T | 0.002 | 0.108 |
| W | -0.131 | 0.035 |
| Y | -0.047 | 0.043 |
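A minimal sketch of the fourfold-change filter behind the table above, assuming "changes 4 folds or more" covers both up- and down-regulation, i.e. |log2(treated/untreated)| >= 2. The function name, dictionaries and gene IDs are hypothetical.

```python
import math

def fourfold_changed(untreated, treated, fold=4.0):
    """Return gene IDs whose expression changes by at least `fold` in either direction.

    untreated, treated: dicts mapping gene ID -> expression level.
    """
    threshold = math.log2(fold)
    changed = []
    for gene, before in untreated.items():
        after = treated.get(gene)
        if after is None or before <= 0 or after <= 0:
            continue  # skip genes missing from one dataset or with non-positive values
        if abs(math.log2(after / before)) >= threshold:
            changed.append(gene)
    return changed

# Hypothetical toy expression values.
untreated = {"g1": 1.0, "g2": 8.0, "g3": 1.0, "g4": 5.0}
treated = {"g1": 4.0, "g2": 2.0, "g3": 3.0}
print(fourfold_changed(untreated, treated))
```

Using the log ratio makes the threshold symmetric: a fourfold increase and a fourfold decrease both give an absolute value of 2.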
The correlation between amino acid composition and log(EC), where EC = (expression of treated genes)/(expression of untreated genes). Residues showing positive and negative correlations are shown in bold and underlined, respectively.
| C | 0.055 |
| D | 0.029 |
| E | -0.036 |
| H | -0.014 |
| M | 0.046 |
| P | 0.064 |
| Q | 0.068 |
| R | -0.036 |
| V | -0.068 |
| Y | 0.076 |
The correlation between predicted and experimentally determined gene expression. The value in bold is the average correlation over the five data sets used in 5-fold cross-validation.
* Correlation achieved for each set
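The reported value is the mean of the per-fold correlations from 5-fold cross-validation. A sketch of that evaluation loop, assuming a random split into five folds. The study's predictor is not reproduced here: an ordinary least-squares line on a single feature stands in as a hypothetical model, and all function names and data are illustrative.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Least-squares slope and intercept; a stand-in for the real predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def cross_validated_correlation(xs, ys, k=5, seed=0):
    """Return (per-fold correlations, their mean) from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    per_fold = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predicted = [slope * xs[i] + intercept for i in test]
        per_fold.append(pearson(predicted, [ys[i] for i in test]))
    return per_fold, sum(per_fold) / k

# Hypothetical toy data: one feature with a noisy linear link to expression.
rng = random.Random(1)
xs = [rng.uniform(1, 15) for _ in range(50)]
ys = [0.8 * x + rng.gauss(0, 1) for x in xs]
per_fold, mean_r = cross_validated_correlation(xs, ys)
print([round(r, 3) for r in per_fold], round(mean_r, 3))
```

Averaging per-fold correlations, rather than pooling all predictions, matches the description of the bold value as an average over the five sets.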
The performance of classification methods on a set of 2465 yeast genes, consisting of 121 cytoplasmic ribosomal genes (positive examples) and 2344 other genes (negative examples). GEM and AACM are SVM methods (RBF kernel) based on gene expression and amino acid composition, respectively. The SVM parameters for GEM and AACM were "-c 10 -g 0.03" and "-c 10 -g 0.55", respectively.
| Method | FP | FN | TP | TN | Cost savings |
| GEM | 4 | 6 | 115 | 2340 | 226 |
| AACM | 8 | 22 | 99 | 2336 | 190 |
| GEM + AACM | 4 | 2 | 119 | 2340 | 234 |
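Reading the columns as FP, FN, TP and TN is the only arrangement under which every row accounts for all 121 positives and 2344 negatives. The last column then matches a cost-savings score: with cost defined as FP + 2 × FN and compared against a null classifier that calls every gene negative (cost 2 × 121 = 242), the rows give 226, 190 and 234. A short check of that arithmetic; the cost definition is an assumption inferred from the numbers, not stated in the caption.

```python
POSITIVES = 121   # cytoplasmic ribosomal genes
NEGATIVES = 2344  # other genes

def cost_savings(fp, fn, positives=POSITIVES):
    """Cost savings relative to a null classifier that calls every gene negative.

    Assumption: cost = FP + 2 * FN, so the null classifier costs 2 * positives.
    """
    return 2 * positives - (fp + 2 * fn)

# (FP, FN, TP, TN) rows from the table above.
rows = {
    "GEM": (4, 6, 115, 2340),
    "AACM": (8, 22, 99, 2336),
    "GEM + AACM": (4, 2, 119, 2340),
}
for name, (fp, fn, tp, tn) in rows.items():
    # Each row should account for every positive and negative example.
    assert tp + fn == POSITIVES and tn + fp == NEGATIVES
    sensitivity = tp / POSITIVES
    print(f"{name}: cost savings = {cost_savings(fp, fn)}, sensitivity = {sensitivity:.3f}")
```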
Figure 2. An example plot of gene expression against the percent composition of Ala in the encoded proteins. Boxes along the X-axis show ranges of Ala composition, and the height of each box shows the average expression of genes in that range.