Abstract
Recent studies of geothermally heated aquatic ecosystems have found widely divergent viruses with unusual morphotypes. Archaeal viruses isolated from these hot habitats usually have linear or circular double-stranded DNA genomes and infect members of the domain Archaea. In this study, the synonymous codon usage bias (SCUB) and dinucleotide composition of the available complete archaeal virus genome sequences were investigated. SCUB varies significantly among archaeal virus species, and this variation is determined mainly by base composition. Correspondence analysis (COA) and Spearman's rank correlation analysis show that codon usage in the selected archaeal virus genes depends mainly on genomic GC content, while gene function also contributes, albeit with smaller effects. The aromaticity of each protein also affects the SCUB of these viral genes, although less strongly than mutational bias does. Specifically, mutational pressure may shape SCUB in SIRV1, SIRV2, ARV1, AFV1 and PhiCh1, whereas translational selection may play the leading role in HRPV1. These conclusions offer insight into codon usage bias in archaeal viruses and, in turn, into the possible relationship between archaeal viruses and their hosts; they may also help in understanding the evolution and gene classification of archaeal viruses, and in exploring the origin of life and biological evolution.
Keywords: Evolution; Gene function; Hierarchical cluster analysis; Mutational bias
Year: 2014 PMID: 24685889 PMCID: PMC7094158 DOI: 10.1016/j.jtbi.2014.03.022
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691
The eleven complete archaeal virus genome sequences under study.
| GSN | Virus | Abbreviation | Genome | Host [phylum] | Group | Length (bp or nt) | GC (%) |
|---|---|---|---|---|---|---|---|
| I | Sulfolobus (spindle-shaped) virus Kamchatka 1 | SSVK1 | dsDNA | Sulfolobus [Crenarchaeota] | 1 | 17,385 | 38 |
| II | Sulfolobus (spindle-shaped) virus 2 | SSV2 | dsDNA | Sulfolobus [Crenarchaeota] | 1 | 14,796 | 38 |
| III | Sulfolobus (spindle-shaped) virus 1 | SSV1 | dsDNA | Sulfolobus [Crenarchaeota] | 1 | 15,465 | 39 |
| IV | Sulfolobus (spindle-shaped) virus Ragged Hills | SSVRH | dsDNA | Sulfolobus [Crenarchaeota] | 1 | 16,473 | 37 |
| V | Sulfolobus islandicus rod-shaped virus 1 | SIRV1 | dsDNA | Sulfolobus [Crenarchaeota] | 2 | 32,308 | 25 |
| VI | Sulfolobus islandicus rod-shaped virus 2 | SIRV2 | dsDNA | Sulfolobus [Crenarchaeota] | 2 | 35,450 | 25 |
| VII | Sulfolobus spindle-shaped virus 4 | SSSV4 | dsDNA | Sulfolobus [Crenarchaeota] | 1 | 15,135 | 38 |
| VIII | Acidianus rod-shaped virus 1 | ARV1 | dsDNA | Acidianus [Crenarchaeota] | 1 | 24,655 | 39 |
| IX | Acidianus filamentous virus 1 | AFV1 | dsDNA | Acidianus [Crenarchaeota] | 1 | 20,869 | 36 |
| X | Halorubrum pleomorphic virus 1 | HRPV1 | ssDNA | Halorubrum [Euryarchaeota] | 3 | 7048 | 54 |
| XI | Natrialba phage PhiCh1 | PhiCh1 | dsDNA | Natrialba [Euryarchaeota] | 3 | 58,498 | 61 |
Note: GSN: genome serial number.
Host lineage: Archaea, Crenarchaeota, Thermoprotei, Sulfolobales, Sulfolobaceae, Sulfolobus.
Archaea, Crenarchaeota, Thermoprotei, Sulfolobales, Sulfolobaceae, Acidianus.
Archaea, Euryarchaeota, Halobacteria, Halobacteriales, Halobacteriaceae, Halorubrum.
Archaea, Euryarchaeota, Halobacteria, Halobacteriales, Halobacteriaceae, Natrialba.
Selected genes with significant matches in public sequence databases.
| SN | Group | GSN | ENC | GC3s | GC | Axis 1 | Axis 2 | Aromaticity | GA |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | I | 58.73 | 0.361 | 0.367 | 0.246892 | 0.178836 | 0.17316 | f |
| 2 | 1 | I | 47.89 | 0.439 | 0.408 | 0.192363 | 0.360043 | 0.088235 | g |
| 3 | 1 | I | 61 | 0.446 | 0.415 | 0.058542 | 0.478179 | 0.065041 | h |
| 4 | 1 | II | 57.57 | 0.489 | 0.415 | 0.100366 | 0.181638 | 0.141631 | f |
| 5 | 1 | II | 54.22 | 0.405 | 0.403 | 0.303551 | 0.142189 | 0.097561 | g |
| 6 | 1 | II | 60 | 0.459 | 0.393 | −0.01397 | 0.279834 | 0.071429 | h |
| 7 | 1 | III | 52.09 | 0.39 | 0.378 | 0.364101 | 0.206763 | 0.111554 | f |
| 8 | 1 | IV | 46.01 | 0.329 | 0.368 | 0.422873 | 0.128048 | 0.125506 | f |
| 9 | 1 | IV | 52.93 | 0.399 | 0.389 | 0.301396 | 0.147137 | 0.097561 | g |
| 10 | 2 | V | 36.11 | 0.083 | 0.214 | 0.740341 | −0.30574 | 0.149758 | g |
| 11 | 2 | V | 45.39 | 0.229 | 0.376 | 0.592535 | 0.071706 | 0.104478 | i |
| 12 | 2 | V | 32.45 | 0.15 | 0.264 | 0.645066 | −0.56159 | 0.132231 | j |
| 13 | 2 | VI | 31.77 | 0.084 | 0.219 | 0.754924 | −0.35846 | 0.149758 | g |
| 14 | 2 | VI | 37.62 | 0.183 | 0.358 | 0.64488 | −0.24913 | 0.104478 | i |
| 15 | 1 | VII | 58.31 | 0.434 | 0.377 | −0.02827 | 0.083677 | 0.065421 | h |
| 16 | 1 | VII | 55.91 | 0.457 | 0.413 | 0.073852 | 0.219034 | 0.137339 | f |
| 17 | 1 | VII | 59.74 | 0.447 | 0.416 | 0.127912 | 0.191922 | 0.087379 | g |
| 18 | 1 | VIII | 61 | 0.445 | 0.422 | 0.009764 | −0.04844 | 0.10628 | g |
| 19 | 1 | IX | 38.42 | 0.327 | 0.381 | 0.372225 | −0.27826 | 0.107623 | g |
| 20 | 3 | X | 46.01 | 0.546 | 0.548 | −0.32849 | −0.07058 | 0.046647 | k |
| 21 | 3 | XI | 35.83 | 0.86 | 0.63 | −0.96386 | −0.11877 | 0.08137 | l |
| 22 | 3 | XI | 37.08 | 0.833 | 0.625 | −0.85322 | −0.16631 | 0.058594 | m |
| 23 | 3 | XI | 33.7 | 0.883 | 0.625 | −1.01563 | −0.2054 | 0.048583 | n |
| 24 | 3 | XI | 38.28 | 0.857 | 0.641 | −0.85399 | −0.11729 | 0.081272 | o |
| 25 | 3 | XI | 40.96 | 0.784 | 0.629 | −0.79422 | −0.10627 | 0.082589 | p |
| 26 | 3 | XI | 34.71 | 0.86 | 0.627 | −1.00412 | −0.00395 | 0.081818 | q |
Note: SN: sequence number; GSN: genome serial number; ENC: effective number of codons; GA: GenBank annotation.
f: Distant but significant similarity to bacterial DnaA, a multifunctional DNA-binding protein (replication initiation, transcription regulation).
g: Putative RecB family exonuclease usually found in association with clustered regularly interspaced short palindromic repeats (CRISPR).
h: Putative helix-turn-helix transcription protein; part of the early transcription unit.
i: Major structural protein. Based on the virion structure and their high isoelectric point, these proteins likely interact directly with DNA.
j: Similar to archaeal Holliday junction resolvases.
k: Putative ATPase.
l: Putative portal protein.
m: Contains an ATP-binding motif.
n: Putative proliferating cell nuclear antigen (PCNA).
o: Putative C5-cytosine methyltransferase.
p: Putative N4-cytosine methyltransferase.
q: Putative protein with three transmembrane helices.
GC3s: frequency of G+C at synonymously variable third codon positions.
GC: frequency of G+C in the gene.
Axis 1: value of each gene on the first axis of the COA.
Axis 2: value of each gene on the second axis of the COA.
Aromaticity: aromaticity value of each encoded protein.
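The GC, GC3s and aromaticity columns can be computed directly from a coding sequence. The sketch below assumes the standard genetic code, excludes Met, Trp and stop codons from GC3s (the usual convention for "synonymously variable" third positions), and takes aromaticity as the fraction of Phe, Tyr and Trp in the translated protein, as in CodonW's Aromo index. The function names are hypothetical, and the record does not say which software the authors used.

```python
# Hypothetical helpers for per-gene composition indices (illustrative sketch).
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split a coding sequence into complete codons, reading U as T."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def gc_content(cds):
    """Fraction of G+C over the whole coding sequence."""
    seq = cds.upper()
    return sum(seq.count(b) for b in "GC") / len(seq)


def gc3s(cds):
    """G+C fraction at synonymously variable third codon positions.

    Met (ATG) and Trp (TGG) have no synonyms, so they are skipped, as are
    stop codons and codons containing ambiguous bases.
    """
    third = [c[2] for c in codons(cds) if CODON_TABLE.get(c, "*") not in "MW*"]
    return sum(b in "GC" for b in third) / len(third)


def aromaticity(cds):
    """Fraction of aromatic residues (Phe, Tyr, Trp) in the encoded protein."""
    protein = [CODON_TABLE.get(c, "X") for c in codons(cds)]
    protein = [aa for aa in protein if aa not in "*X"]
    return sum(aa in "FYW" for aa in protein) / len(protein)


gene = "ATGGCCTTTAAGTGGTAA"  # toy ORF: Met-Ala-Phe-Lys-Trp-stop
print(gc_content(gene), gc3s(gene), aromaticity(gene))
```

In the toy ORF only GCC, TTT and AAG contribute to GC3s, because ATG, TGG and the stop codon offer no synonymous third-position choice.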
Number of codons with RSCU > 1 for each amino acid in the three gene groups.
| Group | Codon ending | Phe | Leu | Ile | Val | Ser | Pro | Thr | Ala | Tyr | His | Gln | Asn | Lys | Asp | Glu | Cys | Arg | Gly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Group 1 | A- or U-ended | 8 | 17 | 12 | 16 | 25 | 12 | 21 | 17 | 11 | 7 | 6 | 4 | 8 | 11 | 10 | 4 | 17 | 12 |
| Group 1 | C- or G-ended | 5 | 18 | 4 | 6 | 9 | 10 | 6 | 6 | 2 | 1 | 4 | 10 | 4 | 2 | 2 | 0 | 15 | 13 |
| Group 2 | A- or U-ended | 5 | 10 | 5 | 8 | 9 | 9 | 9 | 8 | 5 | 3 | 4 | 5 | 5 | 5 | 5 | 3 | 5 | 9 |
| Group 2 | C- or G-ended | 0 | 2 | 0 | 1 | 3 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| Group 3 | A- or U-ended | 0 | 0 | 1 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 4 | 1 |
| Group 3 | C- or G-ended | 7 | 13 | 7 | 7 | 17 | 12 | 11 | 13 | 7 | 7 | 6 | 7 | 5 | 7 | 7 | 6 | 13 | 9 |
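RSCU (relative synonymous codon usage) divides the observed count of a codon by the count expected if all synonyms of its amino acid were used equally, so RSCU > 1 marks a codon used more often than expected. The sketch below, assuming the standard genetic code and skipping Met, Trp and stop codons, computes RSCU for one ORF and tallies the over-used codons by their third base, as in the table above. The function names are hypothetical, and the record does not state how per-gene tallies were aggregated into each group; summing them over the group's genes is one plausible reading.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "MW*":  # skip single-codon amino acids and stop codons
        SYNONYMS[aa].append(codon)


def rscu(cds):
    """RSCU of each codon: observed count / count expected under equal usage."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this ORF
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


def preferred_by_ending(cds):
    """Count codons with RSCU > 1, split into A/U-ended and C/G-ended."""
    tally = Counter({"A/U": 0, "C/G": 0})
    for codon, value in rscu(cds).items():
        if value > 1:
            tally["A/U" if codon[2] in "AT" else "C/G"] += 1
    return tally


print(preferred_by_ending("GCTGCTGCTGCCAAAAAG"))
```

In the toy ORF, GCT is used three times out of four Ala codons (RSCU 3), so it is the only over-used codon and it is U-ended; the two Lys codons are used equally (RSCU 1) and are not counted.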
Fig. 1. Plot of the first- and second-axis values of each gene in the COA. The first axis accounts for 48.81% of the total variation among ORFs, far more than the following axes (8.76%, 7.16% and 5.62%).
Fig. 2. Effective number of codons (ENC) used in each ORF plotted against GC3s. The curve shows the expected relationship between GC3s and ENC in the absence of selection. The box marks gene 20 in genome X.
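The null curve in an ENC-GC3s plot is conventionally Wright's expected ENC under composition-driven mutation alone, ENC = 2 + s + 29/(s² + (1 − s)²) with s = GC3s. Assuming that is the curve drawn in Fig. 2, the sketch below evaluates it together with the relative shortfall of an observed ENC; both helper names are hypothetical. Using the table values for gene 20 of genome X (HRPV1: GC3s 0.546, ENC 46.01), the expected ENC is about 60.06, so this gene sits well below the curve, which is consistent with the abstract's suggestion that translational selection matters in HRPV1.

```python
def expected_enc(gc3s):
    """Wright's expected ENC when codon usage reflects GC3s alone."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_shortfall(observed_enc, gc3s):
    """Relative distance below the null curve: (expected - observed) / expected."""
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# Gene 20 (genome X, HRPV1) from the table of selected genes.
print(round(expected_enc(0.546), 2))          # 60.06
print(round(enc_shortfall(46.01, 0.546), 3))  # 0.234
```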
Fig. 3. Effective number of codons (ENC) used in each ORF plotted against the first-axis values in the COA. The box marks gene 20 in genome X.
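ENC itself, on the vertical axis of Figs. 2 and 3, can be estimated from codon counts with Wright's homozygosity method: for each amino acid, F = (nΣp_i² − 1)/(n − 1); the F values are averaged within each degeneracy class; and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. The sketch below assumes the standard genetic code and uses the common convention of replacing a missing Ile (three-fold) class by the mean of F2 and F4. The record does not say which ENC implementation the authors used, and later variants of the estimator give somewhat different values.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families, excluding Met, Trp (one codon each) and stops.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "MW*":
        FAMILIES[aa].append(codon)

# Amino acids per degeneracy class in the standard code: 9 two-fold,
# 1 three-fold (Ile), 5 four-fold and 3 six-fold.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def homozygosity(counts):
    """Wright's F = (n * sum(p_i^2) - 1) / (n - 1) for one amino acid."""
    n = sum(counts)
    if n < 2:
        return None  # F is undefined with fewer than two observations
    return (n * sum((x / n) ** 2 for x in counts) - 1) / (n - 1)


def effective_number_of_codons(cds):
    """Wright's ENC for one ORF, capped at 61."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_by_class = defaultdict(list)
    for family in FAMILIES.values():
        f = homozygosity([counts[c] for c in family])
        if f is not None:
            f_by_class[len(family)].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Common fallback when Ile is absent: average the two- and four-fold F.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    missing = [k for k in CLASS_SIZES if mean_f.get(k, 0) <= 0]
    if missing:
        raise ValueError(f"cannot estimate F for degeneracy classes {missing}")
    enc = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(enc, 61.0)


# Maximal bias: every amino acid always uses the same codon.
biased = "".join(family[0] * 2 for family in FAMILIES.values())
print(effective_number_of_codons(biased))  # 20.0
```

The two extremes bracket the ENC axis: complete bias gives 20 (one codon per amino acid), while even usage drives the estimate to the cap of 61.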
Relative abundance of the 16 dinucleotides in the three gene groups.
| Range | 1.316−0.891 | 1.25−0.541 | 1.044−0.579 | 1.306−0.84 | 1.595−0.753 | 1.707−0.308 | 1.415−0.643 | 1.344−0.348 | |
| Mean±S.D | 1.125±0.124 | 0.912±0.165 | 0.91±0.116 | 1.053±0.136 | 1.169±0.231 | 0.859±0.378 | 1.014±0.218 | 0.926±0.244 | |
| Range | 1.165−1.064 | 1.441−0.652 | 1.094−0.64 | 1.481−0.979 | 1.12−0.824 | 1.292−0.867 | 1.395−1.103 | 0.715−0.154 | |
| Mean±S.D | 1.124±0.041 | 1.123±0.351 | 0.842±0.17 | 1.203±0.187 | 0.923±0.114 | 1.119±0.192 | 1.262±0.124 | 0.396±0.212 | |
| Range | 1.033−0.545 | 1.905−1.397 | 0.569−0.15 | 1.259−0.636 | 1.438−0.912 | 0.854−0.679 | 1.002−0.569 | 1.439−1.152 | |
| Mean±S.D | 0.82±0.196 | 1.678±0.187 | 0.351±0.161 | 0.891±0.206 | 1.082±0.182 | 0.765±0.059 | 0.796±0.147 | 1.323±0.103 | |
| Range | 1.192−0.797 | 1.369−0.625 | 1.213−0.895 | 1.149−0.784 | 1.087−0.677 | 1.507−0.81 | 1.196−0.879 | 1.261−0.854 | |
| Mean±S.D | 0.939±0.103 | 1.093±0.181 | 1.019±0.092 | 0.963±0.111 | 0.829±0.13 | 1.091±0.205 | 1.048±0.086 | 1.051±0.096 | |
| Range | 1.032−0.825 | 0.869−0.607 | 1.192−1.021 | 1.1−0.894 | 1.095−0.732 | 1.574−0.798 | 1.258−0.616 | 1.142−0.739 | |
| Mean±S.D | 0.953±0.078 | 0.729±0.119 | 1.081±0.066 | 1.04±0.085 | 0.929±0.171 | 1.293±0.342 | 0.952±0.304 | 0.929±0.181 | |
| Range | 1.822−0.861 | 1.13−1.02 | 1.416−0.441 | 1.031−0.589 | 1.045−0.834 | 0.896−0.763 | 1.814−1.457 | 0.894−0.656 | |
| Mean±S.D | 1.17±0.338 | 1.079±0.044 | 0.878±0.294 | 0.871±0.157 | 0.938±0.073 | 0.82±0.046 | 1.619±0.107 | 0.798±0.08 | |
Range of the relative dinucleotide abundance ratios in the three gene groups.
Mean ± S.D. of the relative dinucleotide abundance ratios in the three gene groups.
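The relative abundance of a dinucleotide XY is usually defined as the odds ratio ρ_XY = f_XY/(f_X f_Y), where f_X is the frequency of base X and f_XY the frequency of XY among overlapping dinucleotides; values near 1 indicate no bias. Assuming this standard definition is the one tabulated above, a minimal sketch follows. The function name is hypothetical, and an intercodon variant would count only the pairs that span codon position 3 and the next codon's position 1.

```python
from collections import Counter
from itertools import product


def relative_dinucleotide_abundance(seq):
    """rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides of a sequence.

    f_X is the mononucleotide frequency and f_XY the frequency among all
    overlapping dinucleotides. Returns None where f_X * f_Y is zero.
    """
    seq = seq.upper().replace("U", "T")
    bases = Counter(b for b in seq if b in "ACGT")
    n_bases = sum(bases.values())
    pairs = Counter(
        seq[i:i + 2] for i in range(len(seq) - 1) if set(seq[i:i + 2]) <= set("ACGT")
    )
    n_pairs = sum(pairs.values())
    rho = {}
    for x, y in product("ACGT", repeat=2):
        expected = (bases[x] / n_bases) * (bases[y] / n_bases)
        rho[x + y] = pairs[x + y] / n_pairs / expected if expected else None
    return rho


print(relative_dinucleotide_abundance("ATGCGCATTA"))
```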
Summary of the correlation analysis between the first two COA axes and the relative abundances of the sixteen dinucleotides and of the intercodon dinucleotides in the examined viruses.
| 0.543 | −0.607 | 0.586 | 0.474 | −0.323 | 0.621 | 0.602 | −0.824 | −0.113 | −0.32 | 0.391 | 0.459 | −0.049 | 0.451 | −0.566 | 0.307 | ||
| 0.004 | 0.001 | 0.002 | 0.014 | 0.107 | 0.001 | 0.001 | <0.001 | 0.582 | 0.111 | 0.048 | 0.018 | 0.813 | 0.021 | 0.003 | 0.127 | ||
| 0.238 | −0.48 | 0.587 | −0.232 | 0.551 | −0.331 | −0.352 | 0.179 | −0.249 | 0.356 | 0.124 | −0.073 | −0.486 | 0.24 | −0.28 | 0.571 | ||
| 0.241 | 0.013 | 0.002 | 0.254 | 0.004 | 0.099 | 0.077 | 0.38 | 0.221 | 0.074 | 0.546 | 0.724 | 0.012 | 0.238 | 0.166 | 0.002 | ||
| 0.683 | 0.322 | 0.636 | 0.711 | −0.538 | −0.094 | −0.581 | −0.739 | 0.805 | 0.329 | 0.578 | 0.851 | −0.544 | −0.013 | −0.751 | −0.324 | ||
| <0.001 | 0.109 | <0.001 | <0.001 | 0.005 | 0.648 | 0.002 | <0.001 | <0.001 | 0.101 | 0.002 | <0.001 | 0.004 | 0.951 | <0.001 | 0.106 | ||
| 0.124 | −0.293 | 0.488 | −0.147 | 0.484 | 0.192 | 0.082 | 0.011 | −0.144 | 0.454 | 0.079 | −0.183 | 0.199 | 0.238 | 0.321 | 0.701 | ||
| 0.546 | 0.146 | 0.011 | 0.473 | 0.012 | 0.348 | 0.689 | 0.959 | 0.483 | 0.020 | 0.701 | 0.371 | 0.329 | 0.241 | 0.110 | <0.001 | ||
Note: The vertical lines indicate the boundary between codons.
P-value≤0.05.
P-value≤0.01.
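The abstract names Spearman's rank correlation as the method relating codon usage to its candidate determinants. As an illustrative sketch in pure Python (function names hypothetical), the code below ranks both variables, averaging the ranks of ties, and takes the Pearson correlation of the ranks; it is applied to the GC3s and first-axis values of the 26 selected genes from the table of selected genes, which is an example chosen here, not a result reported in the record. The coefficient comes out strongly negative, in line with the abstract's statement that GC richness dominates codon usage. A P-value would additionally need a t-approximation or a library routine such as `scipy.stats.spearmanr`.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# GC3s and COA axis-1 values of the 26 selected genes (SN 1-26).
GC3S = [0.361, 0.439, 0.446, 0.489, 0.405, 0.459, 0.39, 0.329, 0.399,
        0.083, 0.229, 0.15, 0.084, 0.183, 0.434, 0.457, 0.447, 0.445,
        0.327, 0.546, 0.86, 0.833, 0.883, 0.857, 0.784, 0.86]
AXIS1 = [0.246892, 0.192363, 0.058542, 0.100366, 0.303551, -0.01397,
         0.364101, 0.422873, 0.301396, 0.740341, 0.592535, 0.645066,
         0.754924, 0.64488, -0.02827, 0.073852, 0.127912, 0.009764,
         0.372225, -0.32849, -0.96386, -0.85322, -1.01563, -0.85399,
         -0.79422, -1.00412]

print(round(spearman_rho(GC3S, AXIS1), 3))
```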
Fig. 4. Dendrogram of the hierarchical cluster analysis of the 26 archaeal virus genes under study.
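The record does not state which features, distance or linkage produced the dendrogram. As a purely illustrative sketch, assuming Euclidean distance on each gene's first two COA coordinates (from the table of selected genes) and average linkage (UPGMA), agglomerative clustering can be written in a few lines. The function name is hypothetical, and this is not expected to reproduce the published Fig. 4.

```python
from itertools import combinations
from math import dist

# (Axis 1, Axis 2) COA coordinates of the 26 selected genes, SN 1-26.
COA = [
    (0.246892, 0.178836), (0.192363, 0.360043), (0.058542, 0.478179),
    (0.100366, 0.181638), (0.303551, 0.142189), (-0.01397, 0.279834),
    (0.364101, 0.206763), (0.422873, 0.128048), (0.301396, 0.147137),
    (0.740341, -0.30574), (0.592535, 0.071706), (0.645066, -0.56159),
    (0.754924, -0.35846), (0.64488, -0.24913), (-0.02827, 0.083677),
    (0.073852, 0.219034), (0.127912, 0.191922), (0.009764, -0.04844),
    (0.372225, -0.27826), (-0.32849, -0.07058), (-0.96386, -0.11877),
    (-0.85322, -0.16631), (-1.01563, -0.2054), (-0.85399, -0.11729),
    (-0.79422, -0.10627), (-1.00412, -0.00395),
]


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage and Euclidean distance."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs of points.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


for members in average_linkage(COA, 3):
    print(sorted(sn + 1 for sn in members))  # report 1-based SN labels
```

Stopping at three clusters mirrors the three gene groups; recording each merge and its linkage distance instead would yield the full dendrogram.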