Age Tats, Maido Remm, Tanel Tenson.
Abstract
BACKGROUND: Although the sequence requirements for translation initiation regions have been analysed frequently, highly expressed genes are usually not treated as a separate dataset.
Year: 2006 PMID: 16483368 PMCID: PMC1397820 DOI: 10.1186/1471-2164-7-28
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Frequency of A at the beginning of . Average frequency per codon is shown. A. Compared with the all-genes dataset, highly expressed genes have a significant increase of A in codons 3–5 (nucleotides 7–15) but a decrease in the second codon. B. The preference for A in codon 2 (nucleotides 4–6) decreases as the expression level increases. In contrast, the frequency of A in codons 3–5 (nucleotides 7–15) correlates positively with the expression level. Error bars indicate 1.96 standard errors of the mean.
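The per-codon averages and error bars in Figure 1 can be sketched as follows. A 1.96-standard-error half-width corresponds to an approximate 95% interval under a normal approximation. This is a minimal sketch that assumes DNA coding sequences starting at the start codon. It also assumes that each gene's A fraction within a codon is the quantity being averaged, which the caption does not confirm. The function name `a_frequency_per_codon` is hypothetical.

```python
import math
from typing import List, Tuple


def a_frequency_per_codon(seqs: List[str], n_codons: int = 10) -> List[Tuple[float, float]]:
    """Mean A frequency and 1.96-SE half-width for codons 1..n_codons.

    Assumption: a gene's A frequency in a codon is the fraction of that
    codon's three nucleotides that are A; the mean is taken over all genes
    long enough to contain the codon.
    """
    results = []
    for k in range(n_codons):
        values = [
            s[3 * k:3 * k + 3].upper().count("A") / 3
            for s in seqs
            if len(s) >= 3 * k + 3
        ]
        n = len(values)
        if n == 0:
            results.append((float("nan"), float("nan")))
            continue
        mean = sum(values) / n
        # Sample variance (n - 1 denominator); SE of the mean is sqrt(var / n).
        var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
        results.append((mean, 1.96 * math.sqrt(var / n)))
    return results
```

Running this separately on an HEG set and on the full gene set gives the two curves of panel A. Each error bar spans mean ± half-width.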
Figure 2. Nucleotide usage in the second codon of HEG. Nucleotide frequencies at all three positions of the second codon of HEG are divided by the corresponding frequencies in all genes. Asterisks mark P < 0.01 (H0: nucleotide frequencies do not differ between all genes and HEG).
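The HEG/all ratios plotted in Figure 2 can be computed along these lines. Nucleotides 4–6 form the second codon, matching the numbering used for Figure 1. This is a minimal sketch that assumes each gene contributes one observation per position. The names `position_frequencies` and `heg_over_all` are hypothetical, and no significance test is included.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(seqs: List[str], pos: int) -> Dict[str, float]:
    """Fraction of genes carrying each nucleotide at 1-based position `pos`."""
    counts = Counter(s[pos - 1].upper() for s in seqs if len(s) >= pos)
    total = sum(counts.values())
    if total == 0:
        return {nt: float("nan") for nt in "ACGT"}
    return {nt: counts[nt] / total for nt in "ACGT"}


def heg_over_all(heg: List[str], all_genes: List[str]) -> Dict[int, Dict[str, float]]:
    """HEG/all frequency ratio of each nucleotide at positions 4-6 (codon 2)."""
    ratios = {}
    for pos in (4, 5, 6):
        f_heg = position_frequencies(heg, pos)
        f_all = position_frequencies(all_genes, pos)
        ratios[pos] = {
            # A ratio above 1 means the nucleotide is enriched in HEG.
            nt: f_heg[nt] / f_all[nt] if f_all[nt] > 0 else float("nan")
            for nt in "ACGT"
        }
    return ratios
```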
Figure 3. Frequency of A in the first 10 codons of HEG and in all genes.
Preference for codons at the beginning of HEG compared with the all-genes dataset (H0: codon frequencies do not differ between all genes and HEG).
| organism | codon position | |||||||||||
| 2 | 3 | 4 | 5 | |||||||||
| codon | P-value | %HEG/%all | codon | P-value | %HEG/%all | codon | P-value | %HEG/%all | codon | P-value | %HEG/%all | |
| 7.9E-06 | 15/3 | ACU | 1.1E-04 | 10/2 | AUU | 4.0E-05 | 16/4 | AAA | 0.001 | 15/5 | ||
| 6.0E-04 | 11/3 | AAG | 0.004 | 11/4 | ACU | 0.007 | 6/1 | |||||
| UCC | 7.1E-04 | 8/1 | ||||||||||
| 8.5E-05 | 15/3 | GUA | 0.004 | 9/2 | GGA | 0.002 | 9/2 | - | ||||
| 0.001 | 15/4 | - | - | AAG | 0.009 | 9/2 | ||||||
| ACA | 0.005 | 6/1 | ||||||||||
| 1.0E-04 | 17/4 | GUA | 0.005 | 6/1 | GAA | 0.009 | 11/4 | - | ||||
| 0.002 | 11/3 | |||||||||||
| UCU | 0.006 | 10/3 | ||||||||||
| 3.1E-04 | 11/2 | AAG | 2.9E-04 | 11/2 | AAG | 2.1E-04 | 11/2 | AAG | 8.5E-05 | 11/2 | ||
| 0.008 | 11/4 | ACU | 0.008 | 5/1 | ||||||||
| 6.9E-04 | 15/3 | AAG | 0.010 | 13/4 | - | - | ||||||
| - | - | - | - | |||||||||
| - | - | - | - | |||||||||
| 1.0E-04 | 15/2 | - | - | - | ||||||||
| UCA | 9.9E-04 | 10/1 | ||||||||||
| GGU | 0.004 | 8/1 | ||||||||||
| - | - | AAG | 0.004 | 15/5 | AAG | 0.002 | 15/4 | |||||
| UAU | 0.006 | 8/1 | ||||||||||
| AGA | 0.006 | 11/3 | ||||||||||
| 1.1E-05 | 14/2 | GGA | 0.003 | 9/2 | - | - | ||||||
| 0.002 | 9/2 | |||||||||||
| CCA | 0.008 | 7/1 | ||||||||||
| - | - | AUG | 0.004 | 27/12 | - | |||||||
| 7.3E-08 | 19/3 | AGA | 2.5E-07 | 18/3 | GUU | 8.6E-08 | 15/2 | AAG | 8.4E-04 | 13/4 | ||
| UCU | 7.2E-07 | 24/6 | CCA | 0.006 | 6/1 | GGU | 0.002 | 8/2 | ACU | 0.001 | 11/3 | |
| GGU | 1.0E-05 | 13/2 | CCA | 0.005 | 8/2 | GUU | 0.002 | 9/2 | ||||
| 0.006 | 6/1 | |||||||||||
| 2.0E-07 | 22/4 | CGU | 4.5E-06 | 13/1 | - | AAG | 8.7E-04 | 13/3 | ||||
| GGA | 0.002 | 8/1 | AUU | 0.001 | 13/3 | AUC | 0.009 | 7/1 | ||||
| GGA | 0.001 | 11/2 | - | CAA | 0.004 | 11/2 | - | |||||
| 0.001 | 11/2 | |||||||||||
| UCA | 0.002 | 11/2 | ||||||||||
| 0.002 | 11/2 | |||||||||||
| CCA | 0.006 | 6/1 | ||||||||||
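The %HEG/%all columns above compare how often a given codon occupies codon position k in each dataset. The source does not name the statistical test behind the P-values. This sketch assumes a two-sided Fisher's exact test on a 2×2 table of counts (codon present or absent, HEG or all genes), a common choice for such data. It also treats the two datasets as independent samples, which is a simplification if HEG are a subset of all genes. All function names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def codon_at(seq: str, k: int) -> str:
    """Codon number k (1-based) of a coding sequence, in RNA letters."""
    return seq[3 * (k - 1):3 * k].upper().replace("T", "U")


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def p(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one (small tolerance).
    total = sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def codon_preference(heg: List[str], all_genes: List[str],
                     codon: str, k: int) -> Tuple[float, float, float]:
    """(%HEG, %all, P-value) for `codon` (RNA letters) at codon position k."""
    in_heg = sum(codon_at(s, k) == codon for s in heg)
    in_all = sum(codon_at(s, k) == codon for s in all_genes)
    pct_heg = 100 * in_heg / len(heg)
    pct_all = 100 * in_all / len(all_genes)
    pval = fisher_exact_two_sided(in_heg, len(heg) - in_heg,
                                  in_all, len(all_genes) - in_all)
    return pct_heg, pct_all, pval
```

A row such as "AAG at position 5" would then correspond to `codon_preference(heg, all_genes, "AAG", 5)`. The input sequence lists are hypothetical.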
Ratio of G4 and C5 nucleotide frequencies between HEG and all genes. For the G4 test, genes with an NCN codon in the second position were removed from the datasets; for the C5 test, genes with a GNN codon in the second position were removed (H0: nucleotide frequencies do not differ between all genes and HEG).
| organism | G4 P-value | G4 HEG/all ratio | C5 P-value | C5 HEG/all ratio |
|---|---|---|---|---|
| | 1.000 | 1.0 | 0.048 | 1.5 |
| | 0.490 | 0.7 | 0.111 | 1.7 |
| | 0.644 | 1.2 | 0.004 | 2.6 |
| | 0.336 | 1.4 | 0.010 | 1.8 |
| | 0.010 | 0.0 | 0.495 | 1.2 |
| | 0.524 | 1.2 | 1.000 | 0.9 |
| | 0.012 | 2.8 | 0.010 | 1.9 |
| | 0.298 | 1.3 | 0.763 | 1.2 |
| | 0.011 | 2.1 | 0.011 | 2.6 |
| | 0.177 | 1.4 | 0.073 | 2.0 |
| | 0.282 | 1.3 | 0.033 | 2.5 |
| | 1.000 | 1.0 | 0.030 | 2.2 |
| | 0.006 | 1.8 | 2.8E-04 | 1.7 |
| | 0.011 | 1.7 | 0.010 | 1.6 |
| | 0.004 | 2.1 | 1.3E-05 | 3.9 |
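The filtering described in the caption can be sketched as follows. When testing G at nucleotide 4, second codons with C at nucleotide 5 (NCN) are dropped. When testing C at nucleotide 5, second codons with G at nucleotide 4 (GNN) are dropped. Either way, every GCN (Ala) codon is excluded, so the filtering presumably aims to keep an excess of Ala codons from driving the G4 or C5 signal on its own. This is a minimal sketch without the significance test, and all function names are hypothetical.

```python
from typing import Callable, List


def filtered_ratio(heg: List[str], all_genes: List[str], nt: str, pos: int,
                   exclude: Callable[[str], bool]) -> float:
    """HEG/all ratio of nucleotide `nt` at 1-based position `pos`, after
    dropping genes whose second codon (nucleotides 4-6) satisfies `exclude`."""
    def freq(seqs: List[str]) -> float:
        kept = [s.upper() for s in seqs
                if len(s) >= 6 and not exclude(s[3:6].upper())]
        if not kept:
            return float("nan")
        return sum(s[pos - 1] == nt for s in kept) / len(kept)

    f_all = freq(all_genes)
    return freq(heg) / f_all if f_all > 0 else float("nan")


def g4_ratio(heg: List[str], all_genes: List[str]) -> float:
    # Drop NCN second codons (C at nucleotide 5) before testing G at nucleotide 4.
    return filtered_ratio(heg, all_genes, "G", 4, lambda codon: codon[1] == "C")


def c5_ratio(heg: List[str], all_genes: List[str]) -> float:
    # Drop GNN second codons (G at nucleotide 4) before testing C at nucleotide 5.
    return filtered_ratio(heg, all_genes, "C", 5, lambda codon: codon[0] == "G")
```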
Frequency of GCN codons in the second position of HEG and across all codons of HEG in different organisms (H0: GCN codon frequencies do not differ between the second codon of HEG and all codons of HEG).
| organism | codon | 2nd position of HEG (%) | entire HEG (%) | P-value | organism | codon | 2nd position of HEG (%) | entire HEG (%) | P-value |
|---|---|---|---|---|---|---|---|---|---|
| | GCU | 57.1 | 41.8 | 0.184 | | GCU | 21.4 | 24.4 | 1.000 |
| | GCC | 0.0 | 8.8 | 0.250 | | GCC | 21.4 | 22.7 | 1.000 |
| | GCA | 42.9 | 25.7 | 0.082 | | GCA | 28.6 | 28.0 | 1.000 |
| | GCG | 0.0 | 23.7 | | | GCG | 28.6 | 24.8 | 0.758 |
| | GCU | 33.3 | 40.2 | 0.792 | | GCU | 38.5 | 48.3 | 0.582 |
| | GCC | 0.0 | 8.1 | 0.625 | | GCC | 0.0 | 4.3 | 1.000 |
| | GCA | 53.3 | 31.3 | 0.091 | | GCA | 61.5 | 44.7 | 0.268 |
| | GCG | 13.3 | 20.4 | 0.749 | | GCG | 0.0 | 2.8 | 1.000 |
| | GCU | 28.6 | 41.3 | 0.419 | | GCU | 0.0 | 38.0 | 0.049 |
| | GCC | 0.0 | 16.6 | 0.144 | | GCC | 42.9 | 22.6 | 0.199 |
| | GCA | 57.1 | 11.8 | | | GCA | 28.6 | 31.0 | 1.000 |
| | GCG | 14.3 | 30.2 | 0.251 | | GCG | 28.6 | 8.4 | 0.113 |
| | GCU | 40.0 | 26.1 | 0.199 | | GCU | 68.2 | 75.7 | 0.453 |
| | GCC | 0.0 | 4.9 | 0.621 | | GCC | 22.7 | 23.4 | 1.000 |
| | GCA | 60.0 | 50.4 | 0.502 | | GCA | 9.1 | 0.6 | |
| | GCG | 0.0 | 18.7 | 0.037 | | GCG | 0.0 | 0.3 | 1.000 |
| | GCU | 19.2 | 8.0 | 0.054 | | GCU | 17.6 | 62.5 | |
| | GCC | 26.9 | 49.4 | 0.029 | | GCC | 5.9 | 31.5 | 0.031 |
| | GCA | 26.9 | 7.6 | | | GCA | 76.5 | 5.3 | |
| | GCG | 26.9 | 35.0 | 0.535 | | GCG | 0.0 | 0.7 | 1.000 |
| | GCU | 21.4 | 20.3 | 1.000 | | GCU | 45.5 | 53.6 | 0.763 |
| | GCC | 0.0 | 12.2 | 0.397 | | GCC | 0.0 | 14.8 | 0.382 |
| | GCA | 50.0 | 24.7 | 0.055 | | GCA | 45.5 | 31.1 | 0.334 |
| | GCG | 28.6 | 42.8 | 0.416 | | GCG | 9.1 | 0.4 | 0.061 |
| | GCU | 44.4 | 45.4 | 1.000 | | | | | |
| | GCC | 11.1 | 4.8 | 0.365 | | | | | |
| | GCA | 44.4 | 42.2 | 1.000 | | | | | |
| | GCG | 0.0 | 7.6 | 1.000 | | | | | |
| | GCU | 33.3 | 48.9 | 0.507 | | | | | |
| | GCC | 0.0 | 7.2 | 1.000 | | | | | |
| | GCA | 66.7 | 39.8 | 0.168 | | | | | |
| | GCG | 0.0 | 4.1 | 1.000 | | | | | |
| | GCU | 22.2 | 50.3 | 0.177 | | | | | |
| | GCC | 0.0 | 8.0 | 1.000 | | | | | |
| | GCA | 77.8 | 36.5 | 0.015 | | | | | |
| | GCG | 0.0 | 5.2 | 1.000 | | | | | |
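The comparison in this table stays within HEG. It asks how the four synonymous Ala codons are distributed at codon 2, relative to their distribution over all codons of the same genes. A minimal sketch of that computation follows. It assumes in-frame DNA coding sequences and counts codon 2 as part of "entire HEG". The function names are hypothetical, and the test that produced the P-values is not reproduced.

```python
from collections import Counter
from typing import Dict, List, Tuple

GCN = ("GCU", "GCC", "GCA", "GCG")  # the four alanine codons


def split_codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence into RNA codons (trailing bases ignored)."""
    rna = seq.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]


def gcn_usage(codons: List[str]) -> Dict[str, float]:
    """Percentage of each Ala codon among the GCN codons in `codons`."""
    counts = Counter(c for c in codons if c in GCN)
    total = sum(counts.values())
    return {c: 100 * counts[c] / total if total else float("nan") for c in GCN}


def second_vs_entire(heg: List[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """GCN usage at codon 2 of HEG versus across all codons of HEG."""
    second = [split_codons(s)[1] for s in heg if len(s) >= 6]
    entire = [c for s in heg for c in split_codons(s)]
    return gcn_usage(second), gcn_usage(entire)
```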
Preference for amino acids at the beginning of highly expressed proteins compared with the all-proteins dataset (H0: amino acid frequencies do not differ between all proteins and highly expressed proteins).
| organism | amino acid position | |||||||||||
| 2 | 3 | 4 | 5 | |||||||||
| amino acid | P-value | %HEG/%all | amino acid | P-value | %HEG/%all | amino acid | P-value | %HEG/%all | amino acid | P-value | %HEG/%all | |
| 3.5E-06 | 26/9 | Lys | 0.004 | 23/11 | Ile | 7.2E-04 | 23/9 | Lys | 3.8E-05 | 21/7 | ||
| 3.6E-06 | 28/7 | - | Gly | 0.003 | 13/3 | - | ||||||
| 1.0E-04 | 26/8 | - | - | - | ||||||||
| 3.0E-06 | 29/9 | - | - | Val | 0.008 | 13/5 | ||||||
| Ser | 0.007 | 23/11 | ||||||||||
| 6.4E-09 | 43/12 | Lys | 1.3E-04 | 15/3 | Tyr | 0.005 | 8/2 | Lys | 2.5E-04 | 13/3 | ||
| Lys | 0.003 | 11/3 | Thr | 0.007 | 16/6 | |||||||
| 3.7E-05 | 29/8 | - | - | - | ||||||||
| 7.6E-04 | 18/5 | Thr | 0.003 | 18/5 | - | - | ||||||
| Ser | 0.007 | 27/11 | ||||||||||
| - | - | - | - | |||||||||
| 3.4E-04 | 19/4 | - | - | - | ||||||||
| Gly | 3.4E-04 | 15/3 | ||||||||||
| 8.6E-06 | 26/7 | - | - | - | ||||||||
| 1.8E-07 | 23/4 | - | Arg | 0.009 | 13/4 | - | ||||||
| 0.007 | 13/4 | - | Met | 0.004 | 27/12 | - | ||||||
| 3.5E-07 | 28/8 | Arg | 1.0E-05 | 21/6 | Val | 1.0E-04 | 18/5 | Lys | 0.002 | 20/9 | ||
| Gly | 0.002 | 14/5 | Leu | 0.002 | 0/9 | |||||||
| 6.4E-05 | 28/10 | - | - | - | ||||||||
| Gly | 6.3E-04 | 17/5 | ||||||||||
| 1.2E-05 | 23/5 | - | Gln | 0.008 | 11/3 | - | ||||||
| Asn | 0.003 | 0/12 | ||||||||||
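The amino acid table is the codon-level comparison collapsed through the genetic code. This sketch builds the standard genetic code (NCBI translation table 1) from the usual TCAG-ordered string. It then reports (%HEG, %all) for each residue found at position k in HEG. It uses one-letter residue codes rather than the table's three-letter names and assumes bacterial or standard-code genes. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_at(seq: str, k: int) -> str:
    """One-letter residue encoded by codon k (1-based); 'X' if not a valid codon."""
    return CODE.get(seq[3 * (k - 1):3 * k].upper().replace("U", "T"), "X")


def aa_enrichment(heg: List[str], all_genes: List[str], k: int) -> Dict[str, Tuple[float, float]]:
    """(%HEG, %all) for each residue observed at position k in HEG."""
    h = Counter(residue_at(s, k) for s in heg)
    a = Counter(residue_at(s, k) for s in all_genes)
    return {aa: (100 * h[aa] / len(heg), 100 * a[aa] / len(all_genes)) for aa in h}
```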
Percentage of proteins with Ala, Gly, Pro, Ser, Thr or Val in the second position, among highly expressed proteins and among all proteins (H0: the frequency of this group of amino acids in the second position does not differ between highly expressed proteins and all proteins).
| organism | HEG: % with Ala, Ser, Thr, Gly, Val or Pro at position 2 | all: % with Ala, Ser, Thr, Gly, Val or Pro at position 2 | P-value |
|---|---|---|---|
| | 60 | 40 | 4.8E-04 |
| | 58 | 32 | 8.9E-05 |
| | 56 | 28 | 4.1E-05 |
| | 64 | 36 | 3.2E-06 |
| | 80 | 63 | 0.005 |
| | 52 | 38 | 0.068 |
| | 71 | 31 | 2.8E-08 |
| | 44 | 29 | 0.040 |
| | 58 | 24 | 1.3E-06 |
| | 55 | 30 | 2.0E-04 |
| | 54 | 26 | 2.0E-05 |
| | 48 | 31 | 0.015 |
| | 85 | 54 | 1.0E-08 |
| | 90 | 52 | 6.6E-10 |
| | 79 | 28 | 1.5E-12 |
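The percentages in this table come from a simple grouping of the residue at position 2. A minimal sketch, assuming protein sequences given as one-letter strings that start with the initiator Met, is shown below. It returns the rounded percentages and the 2×2 count table that a significance test would take as input. The source does not name that test. The function names are hypothetical.

```python
from typing import List, Tuple

# Residue group from the table: Ala, Gly, Pro, Ser, Thr, Val (one-letter codes).
SMALL = set("AGPSTV")


def small_second_residue(proteins: List[str]) -> Tuple[int, int]:
    """(count with a group residue at position 2, count without)."""
    hits = sum(1 for p in proteins if len(p) > 1 and p[1].upper() in SMALL)
    return hits, len(proteins) - hits


def group_table(heg_proteins: List[str], all_proteins: List[str]):
    """Rounded percentages (HEG, all) and the 2x2 count table [[HEG], [all]]."""
    h_yes, h_no = small_second_residue(heg_proteins)
    a_yes, a_no = small_second_residue(all_proteins)
    pct = (round(100 * h_yes / len(heg_proteins)),
           round(100 * a_yes / len(all_proteins)))
    return pct, [[h_yes, h_no], [a_yes, a_no]]
```

The count table could then be passed to an exact or chi-square test, for example SciPy's `scipy.stats.fisher_exact`. Which test reproduces the reported P-values is not stated here.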