Mary J O'Connell, Aisling M Doyle, Thomas E Juenger, Mark T A Donoghue, Channa Keshavaiah, Reetu Tuteja, Charles Spillane.
Abstract
BACKGROUND: Synonymous codon usage bias has typically been correlated with, and attributed to, translational efficiency. However, other pressures on genomic sequence composition, such as mutational biases, can also affect codon usage patterns. This study provides an analysis of the codon usage patterns in Arabidopsis thaliana in relation to gene expression levels, codon volatility, mutational biases and selective pressures.
Year: 2012 PMID: 22805311 PMCID: PMC3502101 DOI: 10.1186/1756-0500-5-359
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Relationship between selection pressure (as measured by dN/dS) and codon volatility. The 2,181 significant codon volatility candidates are compared with their dN/dS values. Pairwise dN/dS calculations between A. thaliana and B. oleracea are shown on the x-axis, and the 2,181 significant codon volatility candidates (P < 0.050) on the y-axis.
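This record does not define codon volatility, so the following is only a minimal sketch under one stated assumption: that volatility follows the commonly used definition in which a codon's volatility is the fraction of its single-nucleotide, non-stop neighbours that encode a different amino acid. The function names `codon_volatility` and `mean_volatility` are hypothetical, and the sketch does not reproduce the per-gene volatility P-values reported here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_volatility(codon):
    """Fraction of non-stop point-mutation neighbours that change the amino acid.

    Assumed definition; not taken from this record.
    """
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    neighbours = 0
    changed = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            mutant_aa = CODON_TABLE[mutant]
            if mutant_aa == "*":
                continue  # stop codons are not counted as neighbours
            neighbours += 1
            changed += mutant_aa != amino_acid
    return changed / neighbours


def mean_volatility(cds):
    """Average codon volatility over the complete sense codons of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    values = [
        codon_volatility(c)
        for c in codons
        if CODON_TABLE.get(c, "*") != "*"  # skip stops and ambiguous codons
    ]
    if not values:
        raise ValueError("no sense codons found")
    return sum(values) / len(values)
```

Under this definition the two arginine codons CGA and AGA differ: `codon_volatility("CGA")` gives 0.5 and `codon_volatility("AGA")` gives 0.75, which illustrates how synonymous codon choice alone can shift a gene's volatility.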
Figure 2. Synonymous codon usage distribution in A. thaliana in comparison with codon volatility and highly expressed gene data. The two major axes, Axis 1 and Axis 2, are shown. The darker points on the plots represent the codon usage values for each of the 18,828 genes. (a) The lighter points overlaid on the codon usage distribution are the 2,181 genes with significant codon volatility P-values. (b) The lighter points overlaid are the genes that are highly expressed.
Highest volatility candidates in A. thaliana and dN/dS comparative sequence analysis to B. oleracea and A. lyrata
| Gene | Description | Length | Volatility P-value | dN/dS (B. oleracea) | | dN/dS (A. lyrata) | |
|---|---|---|---|---|---|---|---|
| At1g62240 | expressed protein | 684 | 0.000000006 | | | 0.7975* | (100) |
| At1g64370 | expressed protein | 537 | 0.000000323 | | | 0.2567 | (94) |
| At1g69440 | PAZ domain-containing protein | 2973 | 0.000008714 | 0.1098 | (15) | 0.0910 | (93) |
| At2g27380 | proline-rich family protein | 2286 | 0.000000000 | | | 0.0412 | (25) |
| At3g21420 | oxidoreductase | 1095 | | | (22) | 0.1173* | (88) |
| | | | | | | 0.0913 | (84) |
| At3g28780 | glycine-rich protein | 1845 | | | | 0.4217* | (95) |
| | | | | | | 0.2545 | (53) |
| At4g15430 | early-responsive to dehydration | 2271 | 0.000004120 | 0.3808 | (17) | | |
| At4g31590 | glycosyl transferase family 2 | 2079 | 0.000004420 | 0.2099 | (29) | 0.1028 | (100) |
| At4g32420 | peptidyl-prolyl cis-trans isomerase | 2514 | 0.000005314 | | | 0.2893 | (98) |
| At5g07570 | glycine/proline-rich protein | 4515 | 0.000001603 | | | 0.4833 | (71) |
| At5g59990 | expressed protein | 726 | | | (55) | 0.1869 | (90) |
| | | | | | | 0.2227 | (100) |
The 11 most volatile genes in the A. thaliana genome with their corresponding volatility P-values (P < 10⁻⁶). Conventional analyses used pairwise dN/dS ratios (CODEML M0 from PAML). Displayed are those comparisons that were possible between A. thaliana and homologous sequences, using shotgun sequencing reads (B. oleracea), near-full-length sequences (A. lyrata) and four cloned genes (A. lyrata) marked by *.
High codon volatility candidates and their paralogs
| Gene | Description | Volatility P-value |
|---|---|---|
| At4g15430 | early-responsive to dehydration protein-related | 0.0000041 |
| Paralog: At3g21620 | early-responsive to dehydration protein-related | 0.0720094 |
| At4g31590 | glycosyl transferase family 2 protein | 0.0000044 |
| Paralog: At2g24630 | glycosyl transferase family 2 protein | 0.8029084 |
Comparison of volatility P-values for the most volatile Arabidopsis genes and their paralogs. Paralog information was extracted from the ‘Paralogons in Arabidopsis thaliana’ database (http://wolfe.gen.tcd.ie/athal/dup). The first two columns display the candidate gene, its paralog, and a brief description of their function. The last column is the codon volatility P-value.
Figure 3. Synonymous codon usage and codon volatility compared with the composition of GC at the third position. Codon usage bias, the major contributing axis (Axis 1), is compared to the composition of G and C at the 3rd position of codons. Those genes with significant codon volatilities are overlaid in pale grey. The linear regression for codon usage compared to GC3 composition is shown as the solid line, R² = 0.2566 (equation shown on graph, y = −0.1805x + 0.0888). The linear regression for codon usage compared to GC3 composition for those genes with significant volatility scores is shown as a dashed line, R² = 0.2281 (equation shown on graph, y = −0.1864x + 0.0892).
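To make the quantities in Figure 3 concrete, the sketch below computes GC3 for a coding sequence and evaluates the two regression lines quoted in the caption. It assumes that x is GC3 expressed as a fraction between 0 and 1 and that y is the Axis 1 coordinate; the record does not state the scale of either axis, so treat that as an assumption. The function names `gc3` and `predicted_axis1` are hypothetical.

```python
def gc3(cds):
    """Fraction of complete codons whose third position is G or C."""
    third_positions = cds.upper()[2::3]  # indices 2, 5, 8, ... of the sequence
    if not third_positions:
        raise ValueError("sequence has no complete codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Regression coefficients as quoted in the Figure 3 caption.
ALL_GENES = {"slope": -0.1805, "intercept": 0.0888}
SIGNIFICANT_VOLATILITY = {"slope": -0.1864, "intercept": 0.0892}


def predicted_axis1(gc3_value, line=ALL_GENES):
    """Evaluate y = slope * x + intercept, assuming x is GC3 and y is Axis 1."""
    return line["slope"] * gc3_value + line["intercept"]
```

For example, `gc3("ATGGCCAAA")` considers the third positions G, C and A and gives 2/3. Because the two fitted lines have nearly identical slopes and intercepts, `predicted_axis1` gives very similar values for either set of coefficients across the whole 0 to 1 range.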