Mohammad-Hadi Foroughmand-Araabi, Bahram Goliaei, Kasra Alishahi, Mehdi Sadeghi.
Abstract
BACKGROUND: Codon degeneracy and codon usage by organisms pose an interesting and challenging problem. Researchers have demonstrated relations between codon usage and various functions or properties of genes and proteins, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Segments of proteins responsible for specific functions or structures in a family of proteins are usually represented as sequence patterns or motifs. We asked whether organisms use the same codons in pattern segments as in the rest of the sequence.
Year: 2014 PMID: 24410898 PMCID: PMC3896713 DOI: 10.1186/1742-4682-11-2
Source DB: PubMed Journal: Theor Biol Med Model ISSN: 1742-4682 Impact factor: 2.432
Figure 1. Example of a PROSITE pattern and its corresponding pattern genes and pattern regions. (a) The pattern contains four positions, “G”, “G”, “[AG]”, and “[FY]”, and has five positive hits; their corresponding genes make up the rows. The numbers of occurrences of codons (b) in pattern genes and (c) in pattern regions are also presented.
Percentages of the patterns for which the pattern codon usage is not random
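| Amino acid | Number of codons | Number of valid patterns | Number of patterns with non-random codon usage | Percentage |
|---|---|---|---|---|
| C | 2 | 24 | 15 | 62.5% |
| D | 2 | 43 | 29 | 67.4% |
| E | 2 | 41 | 25 | 61.0% |
| F | 2 | 40 | 27 | 67.5% |
| H | 2 | 33 | 16 | 48.5% |
| K | 2 | 35 | 21 | 60.0% |
| N | 2 | 38 | 15 | 39.5% |
| Q | 2 | 38 | 22 | 57.9% |
| Y | 2 | 37 | 15 | 40.5% |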
We excluded amino acids with exactly one codon. If an amino acid appears 10 times in a pattern region, and each of its codons appears at least once in this region, we consider the pattern as a valid pattern for the amino acid. A pattern with non-random codon usage is a pattern for which the randomness hypothesis with respect to a completely random background is rejected.
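The validity rule and the randomness test described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the note does not name the statistical test, so a chi-square goodness-of-fit test at the 0.05 level against a uniform background is assumed here, "appears 10 times" is read as "at least 10 times", and the function names and the codon counts in the example are hypothetical.

```python
# Chi-square critical values at the 0.05 level, keyed by degrees of freedom
# (number of synonymous codons minus one). The 0.05 level is an assumption.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 5: 11.070}


def is_valid_pattern(codon_counts, synonymous_codons, min_occurrences=10):
    """Validity rule from the note: the amino acid occurs often enough in the
    pattern region and every one of its codons is seen at least once."""
    counts = [codon_counts.get(c, 0) for c in synonymous_codons]
    return sum(counts) >= min_occurrences and all(n > 0 for n in counts)


def chi_square_statistic(codon_counts, synonymous_codons, background=None):
    """Goodness-of-fit statistic of observed codon counts against a background.
    With background=None a completely random (uniform) background is used."""
    total = sum(codon_counts.get(c, 0) for c in synonymous_codons)
    if background is None:
        background = {c: 1.0 / len(synonymous_codons) for c in synonymous_codons}
    stat = 0.0
    for c in synonymous_codons:
        expected = total * background[c]
        observed = codon_counts.get(c, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


def rejects_randomness(codon_counts, synonymous_codons, background=None):
    """True if the randomness hypothesis is rejected at the assumed 0.05 level."""
    df = len(synonymous_codons) - 1
    stat = chi_square_statistic(codon_counts, synonymous_codons, background)
    return stat > CHI2_CRIT_05[df]


# Hypothetical counts for cysteine (codons TGT and TGC) in one pattern region.
cys = ["TGT", "TGC"]
skewed = {"TGT": 18, "TGC": 6}
balanced = {"TGT": 7, "TGC": 5}
print(is_valid_pattern(skewed, cys), rejects_randomness(skewed, cys))
print(is_valid_pattern(balanced, cys), rejects_randomness(balanced, cys))
```

Passing a non-uniform `background`, for example codon frequencies estimated from the pattern genes, turns the same statistic into a comparison against that background instead of a completely random one.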
Percentages of the patterns with different “pattern gene codon usage” and “pattern region codon usage”
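| Amino acid | Number of codons | Number of valid patterns | Number of patterns with different pattern gene codon usage and pattern region codon usage | Percentage |
|---|---|---|---|---|
| C | 2 | 561 | 276 | 49.2% |
| D | 2 | 809 | 410 | 50.7% |
| E | 2 | 772 | 393 | 50.9% |
| F | 2 | 761 | 402 | 52.8% |
| H | 2 | 609 | 284 | 46.6% |
| K | 2 | 764 | 367 | 48.0% |
| N | 2 | 724 | 361 | 49.9% |
| Q | 2 | 639 | 310 | 48.5% |
| Y | 2 | 668 | 341 | 51.0% |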
We excluded amino acids with exactly one codon. If an amino acid appears 30 times in a pattern region, and each of its codons appears at least once in this region, we consider the pattern as a valid pattern for the amino acid.
Figure 2. The effect of the pattern length on the percentages of the patterns which have different “pattern region codon usage (RCU)” and “pattern gene codon usage (GCU)”.
Percentages of the patterns for which the hypothesis of equality of “pattern region codon usage (RCU)” and “pattern gene codon usage (GCU)” is rejected, grouped by the specificity of the pattern
| Pattern specificity | Number of valid patterns | Number of patterns with non-equal pattern region codon usage and pattern gene codon usage | Percentages of patterns with non-equal pattern region codon usage and pattern gene codon usage for amino acids |
|---|---|---|---|
| [9-12) | 12 | 10 | 83.3% |
| [12-15) | 709 | 529 | 74.6% |
| [15-18) | 2111 | 1420 | 67.3% |
| [18-21) | 3792 | 2503 | 66.0% |
| [21-24) | 3194 | 2112 | 66.1% |
| [24-27) | 1516 | 992 | 65.4% |
| [27-30) | 876 | 572 | 65.3% |
| [30-33) | 444 | 282 | 63.5% |
| [33-36) | 238 | 148 | 62.2% |
| [36-39) | 184 | 130 | 70.7% |
| [39-42) | 185 | 123 | 66.5% |
| [42-45) | 128 | 69 | 53.9% |
| [45-48) | 67 | 44 | 65.7% |
| [48-51) | 57 | 32 | 56.1% |
| [51-54) | 34 | 19 | 55.9% |
| [57-60) | 21 | 7 | 33.3% |
| [60-63) | 12 | 6 | 50.0% |
The range of pattern specificities is divided into subranges of length 3, and the results are provided for each subrange. If an amino acid appears 30 times in a pattern region, and each of its codons appears at least once in this region, we consider the pattern a valid pattern for the amino acid. The “number of patterns with non-equal pattern region codon usage and pattern gene codon usage” is the number of cases for which the hypothesis of equality of “pattern region codon usage (RCU)” and “pattern gene codon usage (GCU)” is rejected. The “Percentages of patterns with non-equal pattern region codon usage and pattern gene codon usage for amino acids” column gives the ratio of the number of cases with non-equal codon usages to the number of valid cases. Only rows with at least 10 valid patterns are presented.
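The grouping described in this note (half-open subranges of length 3, a percentage per subrange, and suppression of rows with fewer than 10 valid patterns) can be sketched as follows. The input format, a list of `(specificity, rejected)` pairs with one entry per valid case, and the function name are assumptions made for illustration, and the toy data are invented so that the first bin reproduces the arithmetic of a 10-out-of-12 row.

```python
from collections import defaultdict


def summarize_by_specificity(results, width=3, min_valid=10):
    """Group valid cases into half-open specificity bins [lower, lower + width)
    and report, per bin, the valid count, the rejected count, and the percentage.

    results: iterable of (specificity, rejected) pairs (hypothetical format).
    """
    valid = defaultdict(int)
    rejected = defaultdict(int)
    for specificity, was_rejected in results:
        lower = int(specificity // width) * width  # left edge of the bin
        valid[lower] += 1
        if was_rejected:
            rejected[lower] += 1

    rows = []
    for lower in sorted(valid):
        if valid[lower] < min_valid:
            continue  # rows with too few valid patterns are not reported
        pct = round(100.0 * rejected[lower] / valid[lower], 1)
        rows.append((lower, lower + width, valid[lower], rejected[lower], pct))
    return rows


# Toy data: 12 valid cases in [9, 12), 10 of them rejected, plus a sparse bin.
toy = [(9.5, True)] * 10 + [(11.0, False)] * 2 + [(13.0, True)] * 3
print(summarize_by_specificity(toy))
```

The sparse bin [12, 15) is dropped because it holds only three valid cases, which mirrors how a subrange can be absent from the table.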
Mutual information between “pattern gene codon usage (GCU)” and “pattern region codon usage (RCU)”
| C | 2 | 561 | 491 | 87.5% | 0.00009 |
| D | 2 | 809 | 735 | 90.9% | 0.00007 |
| E | 2 | 772 | 693 | 89.8% | 0.00005 |
| F | 2 | 761 | 686 | 90.1% | 0.00007 |
| H | 2 | 609 | 525 | 86.2% | 0.00007 |
| K | 2 | 764 | 689 | 90.2% | 0.00006 |
| N | 2 | 724 | 660 | 91.2% | 0.00005 |
| Q | 2 | 639 | 533 | 83.4% | 0.00009 |
| Y | 2 | 668 | 600 | 89.8% | 0.00007 |
| I | 3 | 860 | 715 | 83.1% | 0.00008 |
| A | 4 | 871 | 646 | 74.2% | 0.00015 |
| G | 4 | 937 | 659 | 70.3% | 0.00018 |
| P | 4 | 688 | 454 | 66.0% | 0.00023 |
| T | 4 | 824 | 601 | 72.9% | 0.00016 |
| V | 4 | 894 | 683 | 76.4% | 0.00012 |
| L | 6 | 831 | 499 | 60.0% | 0.00026 |
| R | 6 | 650 | 355 | 54.6% | 0.00035 |
| S | 6 | 724 | 491 | 67.8% | 0.00017 |
The mutual information is computed between two random variables, namely, “pattern gene codon usage (GCU)”, and “pattern region codon usage (RCU)”. We excluded amino acids with exactly one codon. If an amino acid appears 30 times in a pattern region, and each of its codons appears at least once in this region, we consider the pattern as a valid pattern for the amino acid.
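For readers unfamiliar with the quantity, mutual information between two discrete random variables can be computed from a table of joint counts as sketched below. How the joint distribution of GCU and RCU is constructed is not stated in this note, so the tables in the example are purely hypothetical, as are the function name and the choice of bits (base-2 logarithm) as the unit.

```python
import math


def mutual_information(joint_counts):
    """Mutual information, in bits, of two discrete variables X and Y.

    joint_counts[i][j] is the number of observations with X = i and Y = j.
    """
    total = sum(sum(row) for row in joint_counts)
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]

    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # a zero cell contributes nothing to the sum
            p_xy = n / total
            # p(x, y) / (p(x) * p(y)) written in terms of raw counts
            ratio = n * total / (row_sums[i] * col_sums[j])
            mi += p_xy * math.log2(ratio)
    return mi


# Hypothetical joint tables: the first is exactly independent, the second
# is perfectly dependent.
print(mutual_information([[10, 20], [20, 40]]))
print(mutual_information([[5, 0], [0, 5]]))
```

Values close to zero, like those in the last column of the table, indicate that knowing one variable says very little about the other.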