Abstract
The codon adaptation index (CAI) is widely used to characterize gene expression in general and translation efficiency in particular. Current computational implementations have a number of problems that lead to various systematic biases. I illustrate these problems and provide an improved computer implementation that solves them. The improved CAI predicts protein production better than CAI values from other commonly used implementations.
Keywords: Codon usage bias; gene expression; tRNA; translation elongation
Year: 2007 PMID: 19461982 PMCID: PMC2684136
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. User interface for computing CAI in DAMBE. The top left panel lists the sequences in DAMBE’s buffer (5888 coding sequences from the Saccharomyces cerevisiae genome). The top right panel lists the sequences chosen for computing CAI. Clicking ‘Add all’ includes all sequences in the analysis. The set of reference sequences for each species is selected from the dropdown box labeled ‘Choose a species’. The reference codon usage can be viewed by clicking ‘View codon table’. To add one’s own reference codon usage table, click the ‘Add sp.’ button.
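Table 1. Data for evaluating the improved CAI from DAMBE (DCAI) and the CAI from EMBOSS.cai (ECAI).
| Gene | Length (nt) | DCAI | ECAI | mRNA (1) | Protein (1) |
|---|---|---|---|---|---|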
| APA1 | 966 | 0.452 | 0.405 | 0.7 | 8.7 |
| COR1 | 1374 | 0.378 | 0.380 | 0.7 | 2.5 |
| ENO1 | 1314 | 0.875 | 0.873 | 0.7 | 44.2 |
| FRS2 | 1512 | 0.396 | 0.373 | 0.7 | 2.3 |
| GYP6 | 1377 | 0.267 | 0.247 | 0.7 | 4.4 |
| HOR2 | 753 | 0.359 | 0.350 | 0.7 | 5.7 |
| IDP1 | 1287 | 0.414 | 0.382 | 0.7 | 7.7 |
| PRE8 | 753 | 0.231 | 0.220 | 0.7 | 6.9 |
| PUP2 | 783 | 0.268 | 0.226 | 0.7 | 4.4 |
| RPE1 | 717 | 0.391 | 0.357 | 0.7 | 5.8 |
| STI1 | 1770 | 0.354 | 0.363 | 0.7 | 13.1 |
| TFS1 | 660 | 0.222 | 0.226 | 0.7 | 8.1 |
| ZWF1 | 1518 | 0.263 | 0.256 | 0.7 | 5.6 |
| ACH1 | 1581 | 0.298 | 0.293 | 1.5 | 9.8 |
| ADE13 | 1449 | 0.412 | 0.398 | 1.5 | 6.3 |
| CCT8 | 1707 | 0.313 | 0.324 | 1.5 | 2.2 |
| PAB1 | 1734 | 0.535 | 0.515 | 1.5 | 30.4 |
| PRB1 | 1908 | 0.386 | 0.379 | 1.5 | 21.2 |
| SER1 | 1188 | 0.332 | 0.323 | 1.5 | 10.5 |
| YEL047C | 1413 | 0.355 | 0.331 | 1.5 | 3.8 |
| YNL134C | 1131 | 0.308 | 0.317 | 1.5 | 14.9 |
| ALD6 | 1503 | 0.615 | 0.551 | 2.2 | 44.3 |
| ATP1 | 1638 | 0.541 | 0.490 | 2.2 | 21.6 |
| LPD1 | 1500 | 0.330 | 0.321 | 2.2 | 18.9 |
| SOD2 | 702 | 0.283 | 0.291 | 2.2 | 12.6 |
| TOM40 | 1164 | 0.359 | 0.336 | 2.2 | 22.3 |
| YDR190C | 1392 | 0.294 | 0.273 | 2.2 | 4.8 |
| YHR049W | 732 | 0.508 | 0.497 | 2.2 | 18.4 |
| YMR226C | 804 | 0.287 | 0.285 | 2.2 | 14.5 |
| ARO8 | 1503 | 0.333 | 0.308 | 3 | 23.4 |
| CAR1 | 1002 | 0.306 | 0.302 | 3 | 5.2 |
| ILV6 | 930 | 0.354 | 0.301 | 3 | 13.9 |
| LEU4 | 1860 | 0.394 | 0.368 | 3 | 3.1 |
| PGM2 | 1710 | 0.392 | 0.374 | 3 | 2.2 |
| YEL071W | 1491 | 0.305 | 0.308 | 3 | 16.3 |
| GUK1 | 564 | 0.401 | 0.377 | 3.7 | 16.5 |
| IPP1 | 864 | 0.670 | 0.647 | 3.7 | 63.1 |
| LYS9 | 1341 | 0.418 | 0.376 | 3.7 | 16.2 |
| PRE4 | 801 | 0.250 | 0.257 | 3.7 | 3.4 |
| TAL1 | 1008 | 0.641 | 0.586 | 3.7 | 44.8 |
| THR4 | 1545 | 0.472 | 0.440 | 3.7 | 21.4 |
| VMA4 | 702 | 0.353 | 0.351 | 3.7 | 10.5 |
| YKL029C | 2010 | 0.329 | 0.308 | 3.7 | 2.8 |
| YNL010W | 726 | 0.434 | 0.386 | 3.7 | 31.6 |
| ERG10 | 1197 | 0.461 | 0.445 | 4.5 | 24.1 |
| HIS1 | 894 | 0.324 | 0.272 | 4.5 | 22.4 |
| HOM2 | 1098 | 0.502 | 0.462 | 4.5 | 60.3 |
| ILV3 | 1758 | 0.449 | 0.437 | 4.5 | 5.3 |
| ILV5 | 1188 | 0.857 | 0.809 | 4.5 | 76 |
| YDL124W | 939 | 0.282 | 0.277 | 4.5 | 6.4 |
| ADE1 | 921 | 0.332 | 0.291 | 5.2 | 8.7 |
| ADE3 | 2841 | 0.349 | 0.340 | 5.2 | 4.8 |
| DYS1 | 1164 | 0.520 | 0.487 | 5.2 | 15.8 |
| EGD2 | 525 | 0.625 | 0.587 | 5.2 | 20.1 |
| GSP1 | 660 | 0.647 | 0.632 | 5.2 | 26.3 |
| PRO2 | 1371 | 0.327 | 0.314 | 5.2 | 13.6 |
| AAT2 | 1257 | 0.330 | 0.301 | 6 | 11.7 |
| GLK1 | 1503 | 0.243 | 0.252 | 6 | 22.6 |
| SEC14 | 915 | 0.390 | 0.365 | 6 | 10.9 |
| URA5 | 681 | 0.379 | 0.365 | 6 | 25.4 |
| YBR025C | 1185 | 0.648 | 0.598 | 6 | 13.1 |
| IDH2 | 1110 | 0.328 | 0.300 | 6.7 | 29.4 |
| SPE3 | 882 | 0.450 | 0.425 | 6.7 | 15.1 |
| YER067W | 486 | 0.278 | 0.280 | 6.7 | 3.7 |
| YFR044C | 1446 | 0.378 | 0.342 | 6.7 | 30.2 |
| GRS1 | 2004 | 0.464 | 0.450 | 7.4 | 5.5 |
| HXK2 | 1461 | 0.681 | 0.664 | 7.4 | 26.5 |
| SHM2 | 1410 | 0.677 | 0.629 | 7.4 | 19.7 |
| TUB2 | 1374 | 0.342 | 0.337 | 7.4 | 11.2 |
| BAT2 | 1131 | 0.274 | 0.257 | 8.9 | 19 |
| CYS3 | 1185 | 0.505 | 0.470 | 8.9 | 6.7 |
| URA1 | 945 | 0.314 | 0.288 | 8.9 | 49.5 |
| VMA2 | 1554 | 0.455 | 0.437 | 8.9 | 33.7 |
(1) mRNA and protein abundance are from Table 1 in Gygi et al. (1999), with mRNA in units of mean copies/cell and protein in units of 10³ copies/cell. Only genes whose mRNA abundance is identical to that of at least three other genes are included.
Correlation between DCAI and protein abundance (rDCAI) and between ECAI and protein abundance (rECAI) within three ranges of mRNA abundance; Ngene is the number of genes in each range. Results are based on the data in Table 1.
| mRNA range | Ngene | rDCAI | rECAI |
|---|---|---|---|
| 0.7–2.2 | 29 | 0.7705 | 0.7680 |
| 3–4.5 | 21 | 0.8590 | 0.8376 |
| 5.2–8.9 | 23 | 0.0534 | 0.0478 |
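
The per-range comparison summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say which correlation coefficient or data transformation was used, so the sketch assumes a plain Pearson correlation on untransformed values. The function names `pearson_r` and `correlations_by_mrna_range` are hypothetical, and only a handful of rows from the data table are included, so the printed values will not reproduce the published coefficients.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlations_by_mrna_range(rows, ranges):
    """For each inclusive (low, high) mRNA range, return
    (number of genes, r for DCAI vs protein, r for ECAI vs protein).

    Each row is (gene, dcai, ecai, mrna, protein).
    """
    result = {}
    for low, high in ranges:
        subset = [r for r in rows if low <= r[3] <= high]
        protein = [r[4] for r in subset]
        r_dcai = pearson_r([r[1] for r in subset], protein)
        r_ecai = pearson_r([r[2] for r in subset], protein)
        result[(low, high)] = (len(subset), r_dcai, r_ecai)
    return result


# A small subset of rows copied from the data table (not the full data set).
rows = [
    ("APA1", 0.452, 0.405, 0.7, 8.7),
    ("ENO1", 0.875, 0.873, 0.7, 44.2),
    ("PAB1", 0.535, 0.515, 1.5, 30.4),
    ("ALD6", 0.615, 0.551, 2.2, 44.3),
    ("ARO8", 0.333, 0.308, 3.0, 23.4),
    ("IPP1", 0.670, 0.647, 3.7, 63.1),
    ("ILV5", 0.857, 0.809, 4.5, 76.0),
]
ranges = [(0.7, 2.2), (3.0, 4.5)]

summary = correlations_by_mrna_range(rows, ranges)
for (low, high), (n_gene, r_dcai, r_ecai) in summary.items():
    print(f"{low}-{high}: Ngene={n_gene} rDCAI={r_dcai:.4f} rECAI={r_ecai:.4f}")
```

Grouping genes by mRNA abundance before correlating holds transcript level roughly constant, so differences in protein abundance within a group can be related to CAI rather than to mRNA level. Running the sketch on all rows of the data table with the three published ranges would be the natural way to check it against the tabulated coefficients.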