Xing Yang, Xusheng Ma, Xuenong Luo, Houjun Ling, Xichen Zhang, Xuepeng Cai.
Abstract
The tapeworm Taenia solium is an important zoonotic parasite of humans that causes great economic loss and endangers public health. At present, neither an effective vaccine to prevent infection nor chemotherapy free of side effects has been developed. In this study, codon usage patterns in the T. solium genome were examined across 8,484 protein-coding genes. Neutrality analysis showed that T. solium had a narrow GC distribution, and a significant correlation was observed between GC12 and GC3. Examination of an NC plot (ENC vs GC3s) showed a few genes on or close to the expected curve, but the majority of points with low ENC (effective number of codons) values fell below the expected curve, suggesting that mutational bias plays a major role in shaping codon usage. Parity Rule 2 (PR2) plot analysis showed that GC and AT were not used proportionally. We also identified 26 optimal codons in the T. solium genome, all of which ended with either a G or C residue. These optimal codons are likely consistent with the tRNAs that are highly expressed in the cell, suggesting that both mutational bias and translational selection probably drive codon usage bias in the T. solium genome.
Keywords: Taenia solium; codon usage bias; intron number; mutation bias; translation selection
Year: 2015 PMID: 26797435 PMCID: PMC4725240 DOI: 10.3347/kjp.2015.53.6.689
Source DB: PubMed Journal: Korean J Parasitol ISSN: 0023-4001 Impact factor: 1.341
Fig. 1. The distribution of GC contents in T. solium genes. The GC content of the 8,484 T. solium genes (shown in blue) is unimodal.
Codon usage table in T. solium
| AA | Codon | N | RSCU | AA | Codon | N | RSCU |
|---|---|---|---|---|---|---|---|
| Phe | UUU | 73863 | 0.96 | Ser | UCU | 66945 | 1.09 |
| | UUC | 80528 | 1.04 | | UCC | 74835 | 1.22 |
| Leu | UUA | 29081 | 0.45 | | UCA | 62935 | 1.03 |
| | UUG | 71331 | 1.10 | | UCG | 54145 | 0.88 |
| | CUU | 79963 | 1.23 | Pro | CCU | 64730 | 1.10 |
| | CUC | 91326 | 1.41 | | CCC | 61128 | 1.04 |
| | CUA | 40023 | 0.62 | | CCA | 68271 | 1.16 |
| | CUG | 77936 | 1.20 | | CCG | 42041 | 0.71 |
| Ile | AUU | 81264 | 1.29 | Thr | ACU | 67286 | 1.12 |
| | AUC | 73474 | 1.16 | | ACC | 69492 | 1.16 |
| | AUA | 34730 | 0.55 | | ACA | 59130 | 0.99 |
| Met | AUG | 85450 | 1.00 | | ACG | 43999 | 0.73 |
| Val | GUU | 70073 | 1.09 | Ala | GCU | 89508 | 1.19 |
| | GUC | 65525 | 1.02 | | GCC | 84269 | 1.12 |
| | GUA | 34933 | 0.54 | | GCA | 71732 | 0.96 |
| | GUG | 86210 | 1.34 | | GCG | 54139 | 0.72 |
| Tyr | UAU | 40934 | 0.80 | Cys | UGU | 42340 | 0.98 |
| | UAC | 61575 | 1.20 | | UGC | 44305 | 1.02 |
| TER | UAA | 2562 | 0.91 | TER | UGA | 3493 | 1.24 |
| | UAG | 2429 | 0.86 | Trp | UGG | 44634 | 1.00 |
| His | CAU | 47676 | 0.94 | Arg | CGU | 55022 | 1.30 |
| | CAC | 53846 | 1.06 | | CGC | 50562 | 1.19 |
| Gln | CAA | 77299 | 0.96 | | CGA | 51124 | 1.21 |
| | CAG | 83180 | 1.04 | | CGG | 30983 | 0.73 |
| Asn | AAU | 83583 | 1.06 | Ser | AGU | 58100 | 0.95 |
| | AAC | 74133 | 0.94 | | AGC | 50685 | 0.83 |
| Lys | AAA | 84273 | 0.91 | Arg | AGA | 34529 | 0.82 |
| | AAG | 99955 | 1.09 | | AGG | 31888 | 0.75 |
| Asp | GAU | 108334 | 1.09 | Gly | GGU | 78723 | 1.35 |
| | GAC | 90233 | 0.91 | | GGC | 63945 | 1.10 |
| Glu | GAA | 113616 | 0.91 | | GGA | 58560 | 1.00 |
| | GAG | 137061 | 1.09 | | GGG | 31858 | 0.55 |
Relative synonymous codon usage (RSCU) values were calculated by summing over all the genes. A total of 8,484 genes comprising 4,001,735 codons were analyzed. N is the number of codons, AA is the amino acid. Preferentially used codons are displayed in bold.
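The RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, using the Phe and Ile counts from the table above; the function name `rscu` is illustrative and does not come from the original study.

```python
def rscu(counts):
    """Return RSCU for each codon in one synonymous family.

    counts: dict mapping codon -> observed count (summed over all genes).
    RSCU = observed count / (family total / number of synonymous codons).
    """
    total = sum(counts.values())
    n_syn = len(counts)
    return {codon: n * n_syn / total for codon, n in counts.items()}


# Counts taken from the codon usage table above.
phe = rscu({"UUU": 73863, "UUC": 80528})
ile = rscu({"AUU": 81264, "AUC": 73474, "AUA": 34730})

for codon, value in {**phe, **ile}.items():
    print(codon, round(value, 2))
```

Rounded to two decimals, the results reproduce the tabulated RSCU values for these two amino acids.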
Fig. 2. Correspondence analysis of relative synonymous codon usage (RSCU) for all 8,484 T. solium genes. (A) This panel shows the distribution of genes on the primary and secondary axes (accounting for 16.7% and 14.1% of the total variation, respectively). The 2 classes of genes (High GC and Low GC) are color coded; the high GC genes are shown in red and the low GC genes are shown in blue. (B) This panel shows the underlying distribution of codons on the same 2 axes. Codons ending with G or C are shown in red, and codons ending with A or U are shown in green.
Fig. 3. GC12 or ENC vs GC3s plot for 8,484 genes of T. solium. (A) GC12 vs GC3 plot (neutrality plot analysis). The regression line is y=0.5513x+0.2196, R²=0.7309, OP=0.4894. The GC3 values ranged from 12.0% to 93.4%. The point where the regression line crosses the diagonal is defined as the optimum point (OP). (B) ENC vs GC3s plot (NC plot); the solid red line indicates the expected ENC. The ENC values of different genes ranged from 21.5 to 61.0; GC3s values ranged from 10.8 to 93.3.
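The two reference quantities in this caption can be checked with a short sketch. The optimum point follows directly from the stated regression line and the diagonal y=x. For the expected ENC curve, the caption does not give a formula, so the sketch assumes the standard approximation of Wright (1990), ENC = 2 + s + 29/(s² + (1−s)²), where s is GC3s as a fraction; the function names are illustrative.

```python
def optimum_point(slope, intercept):
    """Intersection of y = slope*x + intercept with the diagonal y = x."""
    return intercept / (1.0 - slope)


def expected_enc(gc3s):
    """Expected ENC when only GC3s composition constrains codon usage.

    Assumption: Wright's (1990) approximation; gc3s is a fraction in [0, 1].
    """
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Regression coefficients from the neutrality plot in panel A.
op = optimum_point(0.5513, 0.2196)
print(round(op, 4))

# Expected ENC at balanced third-position composition (GC3s = 0.5).
print(round(expected_enc(0.5), 1))
```

The computed intersection agrees with the OP of 0.4894 reported in the caption. Genes whose observed ENC falls well below `expected_enc` at their GC3s are the ones plotted beneath the red curve in panel B.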
Fig. 4. Parity rule 2 (PR2)-bias plot. The red open circle indicates the average position x=0.4340±0.0887 and y=0.4458±0.0927.
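A PR2 plot places each gene according to its purine/pyrimidine balance at the third codon position. The caption does not define the axes, so the sketch below assumes the common convention x = G3/(G3+C3) and y = A3/(A3+U3), where (0.5, 0.5) means no bias. As a simplification it counts every third position, whereas PR2 analyses are often restricted to four-fold degenerate codons; the function name and the toy sequence are hypothetical.

```python
def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+U3)) for one in-frame coding sequence.

    Accepts DNA or RNA letters in either case.
    Simplification: counts every third position rather than only
    four-fold degenerate codons.
    """
    seq = cds.upper().replace("T", "U")
    third = seq[2::3]  # third base of each codon
    g, c = third.count("G"), third.count("C")
    a, u = third.count("A"), third.count("U")
    x = g / (g + c) if g + c else float("nan")
    y = a / (a + u) if a + u else float("nan")
    return x, y


# Toy sequence (not from T. solium): third positions are G, C, C, A, U, A.
x, y = pr2_coordinates("CUGCUCGCCGCAGCUGGA")
print(x, y)
```

Averaging these coordinates over all genes would give a point comparable to the red open circle in the figure; values below 0.5 on both axes indicate that C is favored over G and U over A at the third position.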
Fig. 5. Plot of ENC versus hydropathicity index and aromaticity score for T. solium genes.
Fig. 6. Plot of ENC versus protein length for T. solium.
Fig. 7. Plot of ENC versus intron number for T. solium.
Optimal codons and tRNA abundance in T. solium
| AA | Codon | tRNA gene | High RSCU (N) | Low RSCU (N) | AA | Codon | tRNA gene | High RSCU (N) | Low RSCU (N) |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | AAA (NA) | 0.69 (2047) | 1.21 (2707) | Ser | UCU | AGA (4) | 0.88 (1710) | 1.21 (2609) |
| | UUC* | GAA (2) | 1.31 (3863) | 0.79 (1751) | | UCC* | GGA (NA) | 1.59 (3088) | 0.87 (1863) |
| Leu | UUA | TAA (4) | 0.21 (492) | 0.94 (1719) | | UCA | TGA (2) | 0.72 (1400) | 1.40 (3003) |
| | UUG | CAA (NA) | 0.87 (2090) | 1.32 (2422) | | UCG* | CGA (7) | 1.00 (1937) | 0.75 (1613) |
| | CUU | GAA (NA) | 0.99 (2362) | 1.32 (2405) | Pro | CCU | AGG (2) | 0.97 (1976) | 1.20 (2262) |
| | CUC* | GAG (NA) | 2.01 (4816) | 0.76 (1397) | | CCC* | GGG (NA) | 1.39 (2820) | 0.71 (1342) |
| | CUA | TAG (3) | 0.45 (1068) | 0.74 (1355) | | CCA | TGG (2) | 0.84 (1710) | 1.49 (2806) |
| | CUG* | CAG (1) | 1.47 (3525) | 0.92 (1675) | | CCG* | CGG (5) | 0.79 (1607) | 0.60 (1142) |
| Ile | AUU | AAT (3) | 1.04 (2444) | 1.39 (2828) | Thr | ACU | AGT (1) | 0.98 (2130) | 1.17 (2287) |
| | AUC* | GAT (NA) | 1.61 (3779) | 0.79 (1613) | | ACC* | GGT (NA) | 1.53 (3327) | 0.84 (1644) |
| | AUA | TAT (4) | 0.35 (813) | 0.82 (1668) | | ACA | TGT (3) | 0.73 (1593) | 1.31 (2576) |
| Met | AUG | CAT (11) | 1.00 (3403) | 1.00 (2759) | | ACG | CGT (1) | 0.76 (1661) | 0.68 (1341) |
| Val | GUU | AAC (1) | 0.75 (1794) | 1.32 (2544) | Ala | GCU | AGC (NA) | 1.05 (3038) | 1.32 (2669) |
| | GUC* | GAC (NA) | 1.20 (2893) | 0.75 (1437) | | GCC* | GGC (NA) | 1.44 (4176) | 0.79 (1590) |
| | GUA | UAC (2) | 0.38 (903) | 0.81 (1557) | | GCA | TGC (1) | 0.70 (2031) | 1.30 (2630) |
| | GUG* | CAC (6) | 1.68 (4040) | 1.12 (2156) | | GCG* | CGC (1) | 0.82 (2383) | 0.59 (1193) |
| Tyr | UAU | ATA (NA) | 0.52 (1043) | 1.10 (1609) | Cys | UGU | ACA (NA) | 0.82 (1326) | 1.11 (1650) |
| | UAC* | GTA (3) | 1.48 (2947) | 0.90 (1306) | | UGC* | GCA (1) | 1.18 (1899) | 0.89 (1314) |
| His | CAU | ATG (NA) | 0.69 (1259) | 1.19 (1850) | Trp | UGG | CCA (11) | 1.00 (1862) | 1.00 (1138) |
| | CAC* | GTG (2) | 1.31 (2371) | 0.81 (1260) | Arg | CGU* | ACG (4) | 1.47 (2200) | 0.88 (1097) |
| Gln | CAA | TTG (2) | 0.73 (2009) | 1.14 (3043) | | CGC* | GCG (NA) | 1.92 (2866) | 0.60 (746) |
| | CAG* | CTG (8) | 1.27 (3511) | 0.86 (2285) | | CGA | TCG (8) | 1.05 (1569) | 1.05 (1306) |
| Asn | AAU | ATT (NA) | 0.82 (2145) | 1.25 (3689) | | CGG | CCG (4) | 0.76 (1139) | 0.60 (742) |
| | AAC* | GTT (1) | 1.18 (3106) | 0.75 (2193) | Ser | AGU | ACT (NA) | 0.83 (1603) | 1.04 (2233) |
| Lys | AAA | TTT (4) | 0.67 (2071) | 1.11 (4356) | | AGC* | GCT (5) | 0.99 (1915) | 0.74 (1581) |
| | AAG* | CTT (1) | 1.33 (4138) | 0.89 (3489) | Arg | AGA | TCT (10) | 0.34 (501) | 1.70 (2115) |
| Asp | GAU | ATC (NA) | 0.86 (2935) | 1.26 (4243) | | AGG | CCT (10) | 0.46 (680) | 1.17 (1462) |
| | GAC* | GTC (1) | 1.14 (3888) | 0.74 (2502) | Gly | GGU* | ACC (NA) | 1.38 (3213) | 1.19 (2098) |
| Glu | GAA | TTC (2) | 0.62 (2561) | 1.12 (5859) | | GGC* | GCC (3) | 1.47 (3423) | 0.77 (1357) |
| | GAG* | CTC (3) | 1.38 (5755) | 0.88 (4598) | | GGA | TCC (6) | 0.67 (1548) | 1.39 (2450) |
| | | | | | | GGG | CCC (3) | 0.48 (1127) | 0.65 (1140) |
Codon usage in the highly and lowly expressed genes was compared using a chi-squared contingency test to identify optimal codons. An asterisk (*) denotes codons that occurred significantly more often in the highly expressed genes (P<0.01); these were designated as 'optimal' codons. The highly expressed genes comprised 143,953 codons and the lowly expressed genes comprised 129,698 codons. Thirteen (indicated by arrows) of the 26 optimal codons corresponded to the most abundant tRNAs in the T. solium genome.
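The test behind the optimal-codon designation can be illustrated for a single amino acid. The note above does not say how each contingency table was built, so this sketch assumes a 2×2 table of one codon against its synonymous alternative(s) in the high and low expression sets, uses the Pearson statistic without continuity correction, and compares it with the 1 d.f. critical value for P=0.01. The function and variable names are illustrative.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_P01 = 6.635  # chi-squared critical value for 1 d.f. at P = 0.01

# Phe counts from the table above: rows are high/low expression,
# columns are UUC / UUU.
high_uuc, high_uuu = 3863, 2047
low_uuc, low_uuu = 1751, 2707

stat = chi2_2x2(high_uuc, high_uuu, low_uuc, low_uuu)

# A codon is called optimal only if it is enriched in the high set,
# not merely different between the two sets.
more_frequent_in_high = (
    high_uuc / (high_uuc + high_uuu) > low_uuc / (low_uuc + low_uuu)
)
is_optimal = stat > CRITICAL_P01 and more_frequent_in_high
print(round(stat, 1), is_optimal)
```

Under these assumptions UUC passes the threshold, in line with its asterisk in the table. For amino acids with more than two synonymous codons, the second column would hold the summed counts of all other synonymous codons.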