John Athey, Aikaterini Alexaki, Ekaterina Osipova, Alexandre Rostovtsev, Luis V Santana-Quintero, Upendra Katneni, Vahan Simonyan, Chava Kimchi-Sarfaty.
Abstract
BACKGROUND: Due to the degeneracy of the genetic code, most amino acids can be encoded by multiple synonymous codons. Synonymous codons naturally occur with different frequencies in different organisms. The choice of codons may affect protein expression, structure, and function. Recombinant gene technologies commonly take advantage of the first of these effects through a technique termed codon optimization, in which codons are replaced with synonymous ones in order to increase protein expression. This technique relies on accurate knowledge of codon usage frequencies. Accurately quantifying codon usage bias for different organisms is useful not only for codon optimization, but also for evolutionary and translation studies: phylogenetic relations of organisms and host-pathogen co-evolution relationships may be explored through their codon usage similarities. Furthermore, codon usage has been shown to affect protein structure and function by interfering with translation kinetics and cotranslational protein folding.
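As an illustration of the codon optimization idea described in the abstract, the following minimal Python sketch replaces every codon with the most frequent synonymous codon from a usage table. The "most frequent synonym" rule is a simplifying assumption, not the method of any particular tool, and the names `TOY_USAGE`, `CODON_TO_AA`, and `optimize` as well as all frequency values are hypothetical.

```python
# Toy usage table in occurrences per 1000 codons.
# The values are invented for illustration and come from no database.
TOY_USAGE = {
    "GCT": 18.0, "GCC": 28.0, "GCA": 16.0, "GCG": 7.0,  # Ala
    "AAA": 24.0, "AAG": 32.0,                            # Lys
    "ATG": 22.0,                                         # Met
    "TAA": 1.0, "TAG": 0.8, "TGA": 1.6,                  # stop
}

# Partial codon-to-amino-acid map covering only the toy codons above.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "ATG": "M",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def best_codon_per_aa(usage, codon_to_aa):
    """Return the most frequent codon for each amino acid in the table."""
    best = {}
    for codon, freq in usage.items():
        aa = codon_to_aa[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds, usage, codon_to_aa):
    """Swap each codon for the most frequent synonymous codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = best_codon_per_aa(usage, codon_to_aa)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(best[codon_to_aa[c]] for c in codons)


def translate(cds, codon_to_aa):
    """Translate a coding sequence with the toy codon map."""
    return "".join(codon_to_aa[cds[i:i + 3]] for i in range(0, len(cds), 3))


original = "ATGGCGAAATAA"
optimized = optimize(original, TOY_USAGE, CODON_TO_AA)
print(original, "->", optimized)
```

The encoded protein is unchanged by construction, since only synonymous substitutions are made. Real optimization tools typically weigh further factors, which this sketch deliberately ignores.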
Keywords: Codon optimization; Codon usage bias; Recombinant protein therapeutics; Translational kinetics
Year: 2017 PMID: 28865429 PMCID: PMC5581930 DOI: 10.1186/s12859-017-1793-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
HIVE-CUT database size and statistics
| Measure | GenBank | RefSeq | Total |
|---|---|---|---|
| Number of tables | 781,595 | 73,817 | 855,412 |
| Number of species | 665,044 | 37,904 | 689,420 |
| Genomic tables | 353,423 | 73,553 | 426,976 |
| Mitochondrial tables | 316,820 | 220 | 317,040 |
| All plastid tables | 111,352 | 44 | 111,396 |
| Total number of sequences | 34,885,329 | 253,803,831 | 288,689,160 |
This table summarizes the contents of the database. The GenBank division contains far more tables, but the average number of sequences per table is much higher in RefSeq. When a RefSeq assembly is available, its structure makes it a better representation of an organism's genomic codon usage. The HIVE-CUTs database contains substantially more entries than other codon usage databases.
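The difference in sequences per table can be checked directly from the figures in the table. The short Python sketch below does that arithmetic; the dictionary layout and the function name `mean_sequences_per_table` are illustrative choices, and only the numbers come from the table.

```python
# Table counts and sequence counts copied from the statistics table.
stats = {
    "GenBank": {"tables": 781_595, "sequences": 34_885_329},
    "RefSeq": {"tables": 73_817, "sequences": 253_803_831},
}


def mean_sequences_per_table(entry):
    """Average number of sequences contributing to one codon usage table."""
    return entry["sequences"] / entry["tables"]


for name, entry in stats.items():
    print(f"{name}: {mean_sequences_per_table(entry):,.1f} sequences per table")

# The division totals should add up to the "Total" column.
total_tables = sum(e["tables"] for e in stats.values())
total_sequences = sum(e["sequences"] for e in stats.values())
print(total_tables, total_sequences)
```

On these figures the GenBank average is in the tens of sequences per table, while the RefSeq average is in the thousands.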
Fig. 1 HIVE Platform [36]. A client process submits the information request from the HTML form or web application into the HIVE server; this request is queued for execution and it is computed inside the distributed environment. The front end monitors the status of the request and once the computation is finished, data is retrieved and visualizations are prepared to be sent to the client’s web page
Fig. 2 Screenshot of HIVE-CUTs webpage with Homo sapiens results. Results include codon usage frequencies per 1000 codons as a plain text table (top left) and graph (bottom), in the default order specified by NCBI’s standard genetic code definition. The GC frequency in the genome and at each codon position is also presented in a graph (top right). The help panel is included (right)
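The two quantities shown in this view, codon frequency per 1000 codons and GC content at each codon position, can be derived from raw codon counts. The following Python sketch shows one way to compute them from a list of coding sequences. It is not the HIVE implementation; the function names and the two toy sequences are hypothetical, and the overall GC value here is computed over the counted codons only.

```python
from collections import Counter


def codon_counts(cds_list):
    """Count in-frame codons across coding sequences (trailing bases dropped)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def per_thousand(counts):
    """Convert raw codon counts to frequencies per 1000 codons."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons counted")
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def gc_by_position(counts):
    """GC fraction overall and at codon positions 1, 2 and 3."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons counted")
    gc = [0, 0, 0]
    for codon, n in counts.items():
        for pos, base in enumerate(codon):
            if base in "GC":
                gc[pos] += n
    return {
        "GC": sum(gc) / (3 * total),
        "GC1": gc[0] / total,
        "GC2": gc[1] / total,
        "GC3": gc[2] / total,
    }


# Two made-up coding sequences, for illustration only.
toy_cds = ["ATGGCCAAGTGA", "ATGGCGTAA"]
counts = codon_counts(toy_cds)
freqs = per_thousand(counts)
gc = gc_by_position(counts)
print(freqs)
print(gc)
```

Working from counts rather than from sequences makes it straightforward to merge tables, since counts from several sources can simply be added before normalizing.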
Fig. 3 Screenshots of HIVE-CUTs webpage with Candida albicans, Saccharomyces cerevisiae, and Aspergillus fumigatus results. (a) Taxonomy tree showing the evolutionary relationship between the species. (b) The GC frequency in the genome and at each position of the codon plotted for all three species for comparison. (c) Codon frequencies per 1000 codons plotted for all three species
Fig. 4 Differences in codon optimization based on the HIVE-CUT and the Kazusa codon usage tables. The HIVE-CUT and the Kazusa codon usage tables were entered in the codon optimization algorithm ATGme to determine the number of suboptimal codons [39]. The Venn diagram shows how many codons were determined to be sub-optimal in the human coagulation factor IX gene for expression in CHO (Cricetulus griseus) cells. The codon usage tables used appear in Additional file 2
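The kind of comparison summarized by the Venn diagram can be sketched as a set operation over codon positions. In the Python example below, a codon is treated as suboptimal when it is not the most frequent synonymous codon in a given table. That criterion is an assumption made for illustration, and ATGme's actual rules may differ. `TABLE_A`, `TABLE_B`, and the sequence are invented and are not the HIVE-CUT or Kazusa data.

```python
# Partial codon map and two invented usage tables (per 1000 codons).
CODON_TO_AA = {"GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K", "ATG": "M"}
TABLE_A = {"GCT": 18.0, "GCC": 28.0, "AAA": 24.0, "AAG": 32.0, "ATG": 22.0}
TABLE_B = {"GCT": 30.0, "GCC": 20.0, "AAA": 21.0, "AAG": 35.0, "ATG": 22.0}


def suboptimal_positions(cds, usage, codon_to_aa):
    """Codon indices whose codon is less frequent than its best synonym."""
    best_freq = {}
    for codon, freq in usage.items():
        aa = codon_to_aa[codon]
        best_freq[aa] = max(best_freq.get(aa, 0.0), freq)
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return {
        idx for idx, codon in enumerate(codons)
        if usage[codon] < best_freq[codon_to_aa[codon]]
    }


cds = "ATGGCTAAAGCCAAG"  # codons: ATG GCT AAA GCC AAG
sub_a = suboptimal_positions(cds, TABLE_A, CODON_TO_AA)
sub_b = suboptimal_positions(cds, TABLE_B, CODON_TO_AA)

# The three regions of a two-set Venn diagram.
only_a = sub_a - sub_b
only_b = sub_b - sub_a
both = sub_a & sub_b
print(sorted(only_a), sorted(both), sorted(only_b))
```

Positions flagged by only one table are the ones where the choice of codon usage table changes the optimization outcome.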
Fig. 5 Rare codon cluster distribution based on the HIVE-CUT and the Kazusa codon usage tables. The %MinMax algorithm [40] was implemented to generate results for the interferon beta-1b gene sequence of Homo sapiens and Gorilla gorilla gorilla. The human and gorilla proteins have similar amino acid sequences and show similar results with the HIVE-CUT; however, highly divergent results were observed with Kazusa CUTs. The codon usage tables for these species used in the calculation of the translation rate appear in Additional file 3
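For readers unfamiliar with %MinMax, the Python sketch below follows the general form in which the algorithm is usually described: within a sliding window, the summed usage of the actual codons is compared with the summed maximum, minimum, and mean usage of their synonymous codons. Positive scores indicate common codons and negative scores indicate rare codon clusters. The 18-codon default window, the function name `percent_min_max`, and the toy table are assumptions, and the details should be checked against [40] before any real use.

```python
def percent_min_max(cds, usage, codon_to_aa, window=18):
    """Sliding-window %MinMax-style scores (positive: common, negative: rare)."""
    # Collect the usage values of all synonymous codons per amino acid.
    by_aa = {}
    for codon, freq in usage.items():
        by_aa.setdefault(codon_to_aa[codon], []).append(freq)
    # Per amino acid: (minimum, maximum, unweighted mean) usage.
    stats = {
        aa: (min(f), max(f), sum(f) / len(f)) for aa, f in by_aa.items()
    }

    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    scores = []
    for start in range(len(codons) - window + 1):
        actual = sum_min = sum_max = sum_avg = 0.0
        for codon in codons[start:start + window]:
            lo, hi, avg = stats[codon_to_aa[codon]]
            actual += usage[codon]
            sum_min += lo
            sum_max += hi
            sum_avg += avg
        if actual >= sum_avg:
            # %Max branch: how far above the mean, relative to the maximum.
            denom = sum_max - sum_avg
            score = 100.0 * (actual - sum_avg) / denom if denom else 0.0
        else:
            # %Min branch: how far below the mean, relative to the minimum.
            denom = sum_avg - sum_min
            score = -100.0 * (sum_avg - actual) / denom if denom else 0.0
        scores.append(score)
    return scores


# Invented two-amino-acid table, for illustration only.
TOY_AA = {"GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K"}
TOY_FREQ = {"GCT": 10.0, "GCC": 30.0, "AAA": 10.0, "AAG": 30.0}

toy_cds = "GCCAAGGCTAAAGCCAAA"
scores = percent_min_max(toy_cds, TOY_FREQ, TOY_AA, window=2)
print(scores)
```

Because every score depends on the usage table supplied, running the same sequence against two different tables for one species can yield different profiles, which is the effect the figure illustrates.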