Shahana S Malik, Syeda Azem-E-Zahra, Kyung Mo Kim, Gustavo Caetano-Anollés, Arshan Nasir.
Abstract
Viruses can be classified into archaeoviruses, bacterioviruses, and eukaryoviruses according to the taxonomy of the infected host. The host-constrained perception of viruses implies preference of genetic exchange between viruses and cellular organisms of their host superkingdoms and viral origins from host cells either via escape or reduction. However, viruses frequently establish non-lytic interactions with organisms and endogenize into the genomes of bacterial endosymbionts that reside in eukaryotic cells. Such interactions create opportunities for genetic exchange between viruses and organisms of non-host superkingdoms. Here, we take an atypical approach to revisit virus-cell interactions by first identifying protein fold structures in the proteomes of archaeoviruses, bacterioviruses, and eukaryoviruses and second by tracing their spread in the proteomes of superkingdoms Archaea, Bacteria, and Eukarya. The exercise quantified protein structural homologies between viruses and organisms of their host and non-host superkingdoms and revealed likely candidates for virus-to-cell and cell-to-virus gene transfers. Unexpected lifestyle-driven genetic affiliations between bacterioviruses and Eukarya and eukaryoviruses and Bacteria were also predicted in addition to a large cohort of protein folds that were universally shared by viral and cellular proteomes and virus-specific protein folds not detected in cellular proteomes. These protein folds provide unique insights into viral origins and evolution that are generally difficult to recover with traditional sequence alignment-dependent evolutionary analyses owing to the fast mutation rates of viral gene sequences.
Keywords: comparative genomics; fold superfamily; horizontal gene transfer; protein structure; virus evolution; virus host
Year: 2017 PMID: 29163404 PMCID: PMC5671483 DOI: 10.3389/fmicb.2017.02110
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Demonstration of virus-to-cell and cell-to-virus HGT events. Ten genomes are displayed as colored closed disks each for Archaea (black), Bacteria (blue), Eukarya (green), and viruses. Seven out of 10 viral genomes encode different virus-hallmark FSFs (with incidence represented by different shades of red), such as those involved in virion synthesis and capsid assembly. If any of these virus-hallmark FSFs is detected in no more than 1/10 cellular genomes (real f-values are even lower), the event is determined to be virus-to-cell HGT. In turn, if any of the cellular FSFs that are widespread in cells (i.e., present in 9/10 cellular genomes) is detected in a viral genome, that event is determined to be cell-to-virus HGT.
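The decision rule in the Figure 1 caption can be written as a small function. This is only a minimal sketch, not the authors' pipeline: the function name `classify_hgt`, its arguments, and the default cutoffs of 0.1 and 0.9 (taken from the 1/10 and 9/10 toy values in the caption) are assumptions made for illustration. The caption itself notes that real f-values are even lower.

```python
def classify_hgt(is_virus_hallmark: bool, viral_f: float, cellular_f: float,
                 rare_cutoff: float = 0.1, widespread_cutoff: float = 0.9) -> str:
    """Label the likely HGT direction for one FSF (illustrative rule only).

    viral_f and cellular_f are f-values: the fraction of viral or cellular
    genomes that encode the FSF. The cutoffs mirror the toy numbers in the
    caption and are assumptions, not the thresholds used in the study.
    """
    # A virus-hallmark FSF seen in only a few cellular genomes suggests virus-to-cell transfer.
    if is_virus_hallmark and 0.0 < cellular_f <= rare_cutoff:
        return "virus-to-cell"
    # An FSF widespread in cells that also appears in a viral genome suggests cell-to-virus transfer.
    if cellular_f >= widespread_cutoff and viral_f > 0.0:
        return "cell-to-virus"
    return "undetermined"


# Toy values matching the caption: hallmark FSF in 7/10 viruses and 1/10 cells,
# and a cellular FSF in 9/10 cells that shows up in 1/10 viruses.
print(classify_hgt(True, 7 / 10, 1 / 10))
print(classify_hgt(False, 1 / 10, 9 / 10))
```

Anything that meets neither condition is returned as "undetermined" rather than forced into a direction.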
Figure 2. Sharing of protein structural domains between viral and cellular proteomes. The Venn diagrams illustrate the number of FSFs detected in the proteomes of archaeoviruses, bacterioviruses, and eukaryoviruses and their distributions in the proteomes of superkingdoms Archaea (A), Bacteria (B), and Eukarya (E). n = total number of viral proteomes, m = total number of FSFs detected in viral proteomes. V represents virus-specific FSFs (Table 2).
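The Venn grouping in Figure 2 amounts to labeling each viral FSF by the superkingdoms in which it is also detected. The sketch below assumes that "detected" means a non-zero f-value. The helper name `venn_group` and the pooling of four FSFs into one dictionary are illustrative choices, not the authors' procedure. The f-values are copied from the tables in this record; b.170.1 is listed as virus-specific, so its cellular f-values are taken to be zero.

```python
from collections import Counter


def venn_group(f_archaea: float, f_bacteria: float, f_eukarya: float) -> str:
    """Return A, B, E, AB, AE, BE, or ABE, or V if absent from all cellular proteomes."""
    label = ""
    if f_archaea > 0:
        label += "A"
    if f_bacteria > 0:
        label += "B"
    if f_eukarya > 0:
        label += "E"
    return label or "V"


# Cellular f-values (Archaea, Bacteria, Eukarya) for a few FSFs from the tables.
viral_fsfs = {
    "a.30.5": (0.0082, 0.0, 0.0),   # Hypothetical protein D-63
    "d.368.1": (0.0, 0.0018, 0.0),  # YonK-like
    "h.3.2": (0.0, 0.0, 0.0757),    # Virus ectodomain
    "b.170.1": (0.0, 0.0, 0.0),     # WSSV envelope protein-like (virus-specific)
}

groups = {ccs: venn_group(*f) for ccs, f in viral_fsfs.items()}
counts = Counter(groups.values())
print(groups)
print(counts)
```

Tallying the labels with `Counter` gives the per-group counts that a Venn diagram of this kind reports.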
Virus-specific FSFs (VSFs).

| SCOP ID | ccs | FSF description |
|---|---|---|
| 158974 | b.170.1 | WSSV envelope protein-like | ||
| 88648 | b.121.6 | Group I dsDNA viruses | ||
| 101089 | a.8.5 | Phosphoprotein XD domain | ||
| 69070 | a.150.1 | Anti-sigma factor AsiA | ||
| 89433 | b.127.1 | Baseplate structural protein gp8 | ||
| 160099 | d.346.1 | SARS Nsp1-like | ||
| 89428 | b.126.1 | Adsorption protein p2 | ||
| 143076 | d.302.1 | Coronavirus NSP8-like | ||
| 56502 | d.172.1 | gp120 core | ||
| 55671 | d.102.1 | Regulatory factor Nef | ||
| 56983 | f.10.1 | Viral glycoprotein, central and dimerisation domains | ||
| 50012 | b.31.1 | EV matrix protein | ||
| 118208 | e.58.1 | Viral ssDNA binding protein | ||
| 54957 | d.58.8 | Viral DNA-binding domain | ||
| 48493 | a.120.1 | gene 59 helicase assembly protein | ||
| 101816 | b.140.1 | Replicase NSP9 | ||
| 48145 | a.95.1 | Influenza virus matrix protein M1 | ||
| 140506 | a.30.8 | FHV B2 protein-like | ||
| 161240 | g.92.1 | T-antigen specific domain-like | ||
| 69922 | f.12.1 | Head and neck region of the ectodomain of NDV fusion glycoprotein | ||
| 101156 | a.30.3 | Nonstructural protein ns2, Nep, M1-binding domain | ||
| 143021 | d.299.1 | Ns1 effector domain-like | ||
| 49818 | b.19.1 | Viral protein domain | ||
| 75347 | d.13.2 | Rotavirus NSP2 fragment, C-terminal domain | ||
| 48345 | a.115.1 | A virus capsid protein alpha-helical domain | ||
| 141666 | b.164.1 | SARS ORF9b-like | ||
| 82046 | b.116.1 | Viral chemokine binding protein m3 | ||
| 56558 | d.182.1 | Baseplate structural protein gp11 | ||
| 103145 | d.255.1 | Tombusvirus P19 core protein, VP19 | ||
| 160892 | d.378.1 | Phosphoprotein oligomerization domain-like | ||
| 103068 | d.254.1 | Nucleocapsid protein dimerization domain | ||
| 51289 | b.85.5 | Tlp20, baculovirus telokin-like protein | ||
| 75574 | d.216.1 | Rotavirus NSP2 fragment, N-terminal domain | ||
| 49894 | b.28.1 | Baculovirus p35 protein | ||
| 161003 | e.75.1 | flu NP-like | ||
| 110304 | b.148.1 | Coronavirus RNA-binding domain | ||
| 48045 | a.84.1 | Scaffolding protein gpD of bacteriophage procapsid | ||
| 58030 | h.1.13 | Rotavirus nonstructural proteins | ||
| 69652 | d.199.1 | DNA-binding C-terminal domain of the transcription factor MotA | ||
| 58034 | h.1.14 | Multimerization domain of the phosphoprotein from sendai virus | ||
| 55064 | d.58.27 | Translational regulator protein regA | ||
| 50176 | b.37.1 | N-terminal domains of the minor coat protein g3p | ||
| 118173 | d.293.1 | Phosphoprotein M1, C-terminal domain | ||
| 47724 | a.54.1 | Domain of early E2A DNA-binding protein, ADDBP | ||
| 57917 | g.51.1 | Zn-binding domains of ADDBP | ||
| 143587 | d.318.1 | SARS receptor-binding domain-like | ||
| 75404 | d.213.1 | VSV matrix protein | ||
| 160957 | e.69.1 | Poly(A) polymerase catalytic subunit-like | ||
| 140367 | a.8.9 | Coronavirus NSP7-like | ||
| 160453 | d.361.1 | PB2 C-terminal domain-like | ||
| 56548 | d.180.1 | Conserved core of transcriptional regulatory protein vp16 | ||
| 49889 | b.27.1 | Soluble secreted chemokine inhibitor, VCCI | ||
| 144251 | g.87.1 | Viral leader polypeptide zinc finger | ||
| 89043 | a.178.1 | Soluble domain of poliovirus core protein 3a | ||
| 144246 | g.86.1 | Coronavirus NSP10-like | ||
| 47852 | a.62.1 | Hepatitis B viral capsid (hbcag) | ||
| 69903 | e.34.1 | NSP3 homodimer | ||
| 159936 | d.15.14 | NSP3A-like | ||
| 69908 | e.35.1 | Membrane penetration protein mu1 | ||
| 101257 | a.190.1 | Flavivirus capsid protein C | ||
| 111379 | f.47.1 | VP4 membrane interaction domain | ||
| 90246 | h.1.24 | Head morphogenesis protein gp7 | ||
| 57647 | g.34.1 | HIV-1 VPU cytoplasmic domain | ||
| 117066 | b.1.24 | Accessory protein X4 (ORF8, ORF7a) | ||
| 51332 | b.91.1 | E2 regulatory, transactivation domain |
FSFs are identified both by SCOP numeric IDs and alpha-numeric concise classification strings (ccs). See Nasir and Caetano-Anollés.
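Virus-host FSF sharing.

| SCOP ID | ccs | FSF description | A | B | E | AV | BV | EV |
|---|---|---|---|---|---|---|---|---|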
| 109801 | a.30.5 | Hypothetical protein D-63 | 0.0082 | 0.0000 | 0.0000 | 0.0968 | 0.0000 | 0.0000 |
| 160570 | d.368.1 | YonK-like | 0.0000 | 0.0018 | 0.0000 | 0.0000 | 0.0008 | 0.0000 |
| 64210 | d.186.1 | Head-to-tail joining protein W, gpW | 0.0000 | 0.0170 | 0.0000 | 0.0000 | 0.0123 | 0.0000 |
| 159865 | d.186.2 | XkdW-like | 0.0000 | 0.0054 | 0.0000 | 0.0000 | 0.0049 | 0.0000 |
| 54857 | d.57.1 | DNA damage-inducible protein DinI | 0.0000 | 0.0520 | 0.0000 | 0.0000 | 0.0090 | 0.0000 |
| 51327 | b.90.1 | Head-binding domain of phage P22 tailspike protein | 0.0000 | 0.0135 | 0.0000 | 0.0000 | 0.0074 | 0.0000 |
| 143749 | d.323.1 | Phage tail protein-like | 0.0000 | 0.0278 | 0.0000 | 0.0000 | 0.0098 | 0.0000 |
| 89064 | a.179.1 | Replisome organizer (g39p helicase loader/inhibitor protein) | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0016 | 0.0000 |
| 54328 | d.15.5 | Staphylokinase/streptokinase | 0.0000 | 0.0036 | 0.0000 | 0.0000 | 0.0041 | 0.0000 |
| 56826 | e.27.1 | Upper collar protein gp10 (connector protein) | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0147 | 0.0000 |
| 46575 | a.237.1 | DNA polymerase III theta subunit-like | 0.0000 | 0.0493 | 0.0000 | 0.0000 | 0.0025 | 0.0000 |
| 140919 | a.263.1 | DNA terminal protein | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0025 | 0.0000 |
| 159871 | d.230.6 | YdgH-like | 0.0000 | 0.0502 | 0.0000 | 0.0000 | 0.0016 | 0.0000 |
| 68918 | a.140.4 | Recombination endonuclease VII, C-terminal and dimerization domains | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0311 | 0.0000 |
| 160582 | d.100.2 | MbtH-like | 0.0000 | 0.1623 | 0.0000 | 0.0000 | 0.0008 | 0.0000 |
| 141658 | b.163.1 | Bacteriophage trimeric proteins domain | 0.0000 | 0.0027 | 0.0000 | 0.0000 | 0.0139 | 0.0000 |
| 51274 | b.85.2 | Head decoration protein D (gpD, major capsid protein D) | 0.0000 | 0.0072 | 0.0000 | 0.0000 | 0.0090 | 0.0000 |
| 58046 | h.1.17 | Fibritin | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0417 | 0.0000 |
| 58059 | h.2.1 | Tetramerization domain of the Mnt repressor | 0.0000 | 0.0027 | 0.0000 | 0.0000 | 0.0041 | 0.0000 |
| 50017 | b.32.1 | gp9 | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0581 | 0.0000 |
| 58091 | h.4.2 | Clostridium neurotoxins, “coiled-coil” domain | 0.0000 | 0.0018 | 0.0000 | 0.0000 | 0.0008 | 0.0000 |
| 57987 | h.1.4 | Inovirus (filamentous phage) major coat protein | 0.0000 | 0.0099 | 0.0000 | 0.0000 | 0.0074 | 0.0000 |
| 101059 | a.159.3 | B-form DNA mimic Ocr | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0123 | 0.0000 |
| 158668 | a.285.1 | MtlR-like | 0.0000 | 0.0753 | 0.0000 | 0.0000 | 0.0008 | 0.0000 |
| 103370 | d.262.1 | NinB | 0.0000 | 0.0386 | 0.0000 | 0.0000 | 0.0368 | 0.0000 |
| 118010 | d.64.2 | TM1457-like | 0.0000 | 0.2161 | 0.0000 | 0.0000 | 0.0057 | 0.0000 |
| 48657 | a.136.1 | FinO-like | 0.0000 | 0.1686 | 0.0000 | 0.0000 | 0.0033 | 0.0000 |
| 50610 | b.48.1 | mu transposase, C-terminal domain | 0.0000 | 0.0700 | 0.0000 | 0.0000 | 0.0139 | 0.0000 |
| 47681 | a.49.1 | C-terminal domain of B transposition protein | 0.0000 | 0.0135 | 0.0000 | 0.0000 | 0.0025 | 0.0000 |
| 58069 | h.3.2 | Virus ectodomain | 0.0000 | 0.0000 | 0.0757 | 0.0000 | 0.0000 | 0.0362 |
| 90229 | g.66.1 | CCCH zinc finger | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0074 |
| 101912 | b.69.12 | Sema domain | 0.0000 | 0.0000 | 0.3211 | 0.0000 | 0.0000 | 0.0060 |
| 57567 | g.22.1 | Serine protease inhibitors | 0.0000 | 0.0000 | 0.3316 | 0.0000 | 0.0000 | 0.0023 |
| 54117 | d.9.1 | Interleukin 8-like chemokines | 0.0000 | 0.0000 | 0.1540 | 0.0000 | 0.0000 | 0.0084 |
| 47836 | a.61.1 | Retroviral matrix proteins | 0.0000 | 0.0000 | 0.0366 | 0.0000 | 0.0000 | 0.0186 |
| 50353 | b.42.1 | Cytokine | 0.0000 | 0.0000 | 0.3264 | 0.0000 | 0.0000 | 0.0292 |
| 52087 | c.13.1 | CRAL/TRIO domain | 0.0000 | 0.0000 | 0.9948 | 0.0000 | 0.0000 | 0.0005 |
| 103417 | e.48.1 | Major capsid protein VP5 | 0.0000 | 0.0000 | 0.0026 | 0.0000 | 0.0000 | 0.0241 |
| 56994 | g.1.1 | Insulin-like | 0.0000 | 0.0000 | 0.3055 | 0.0000 | 0.0000 | 0.0009 |
| 57535 | g.18.1 | Complement control module/SCR domain | 0.0000 | 0.0000 | 0.3760 | 0.0000 | 0.0000 | 0.0125 |
| 57180 | g.3.8 | Cellulose-binding domain | 0.0000 | 0.0000 | 0.2846 | 0.0000 | 0.0000 | 0.0005 |
| 161008 | e.76.1 | Viral glycoprotein ectodomain-like | 0.0000 | 0.0000 | 0.0131 | 0.0000 | 0.0000 | 0.0390 |
| 54277 | d.15.2 | CAD & PB1 domains | 0.0000 | 0.0000 | 0.9530 | 0.0000 | 0.0000 | 0.0023 |
| 47195 | a.24.5 | TMV-like viral coat proteins | 0.0000 | 0.0000 | 0.0418 | 0.0000 | 0.0000 | 0.0190 |
| 82856 | e.42.1 | L-A virus major coat protein | 0.0000 | 0.0000 | 0.0104 | 0.0000 | 0.0000 | 0.0019 |
| 158235 | a.271.1 | SOCS box-like | 0.0000 | 0.0000 | 0.3159 | 0.0000 | 0.0000 | 0.0005 |
| 47943 | a.73.1 | Retrovirus capsid protein, N-terminal core domain | 0.0000 | 0.0000 | 0.0522 | 0.0000 | 0.0000 | 0.0190 |
| 47353 | a.28.3 | Retrovirus capsid dimerization domain-like | 0.0000 | 0.0000 | 0.1723 | 0.0000 | 0.0000 | 0.0125 |
| 101399 | a.206.1 | P40 nucleoprotein | 0.0000 | 0.0000 | 0.0104 | 0.0000 | 0.0000 | 0.0005 |
| 110132 | b.147.1 | BTV NS2-like ssRNA-binding domain | 0.0000 | 0.0000 | 0.0026 | 0.0000 | 0.0000 | 0.0046 |
| 49599 | b.8.1 | TRAF domain-like | 0.0000 | 0.0000 | 0.9974 | 0.0000 | 0.0000 | 0.0005 |
| 57302 | g.7.1 | Snake toxin-like | 0.0000 | 0.0000 | 0.3211 | 0.0000 | 0.0000 | 0.0005 |
| 50122 | b.34.7 | DNA-binding domain of retroviral integrase | 0.0000 | 0.0000 | 0.0235 | 0.0000 | 0.0000 | 0.0097 |
| 140809 | a.260.1 | Rhabdovirus nucleoprotein-like | 0.0000 | 0.0000 | 0.0183 | 0.0000 | 0.0000 | 0.0125 |
| 46919 | a.4.10 | N-terminal Zn binding domain of HIV integrase | 0.0000 | 0.0000 | 0.0261 | 0.0000 | 0.0000 | 0.0084 |
| 57924 | g.52.1 | Inhibitor of apoptosis (IAP) repeat | 0.0000 | 0.0000 | 0.7441 | 0.0000 | 0.0000 | 0.0376 |
| 57933 | g.53.1 | TAZ domain | 0.0000 | 0.0000 | 0.4700 | 0.0000 | 0.0000 | 0.0005 |
| 103575 | g.16.2 | Plexin repeat | 0.0000 | 0.0000 | 0.3316 | 0.0000 | 0.0000 | 0.0023 |
| 57059 | g.3.6 | omega toxin-like | 0.0000 | 0.0000 | 0.0444 | 0.0000 | 0.0000 | 0.0014 |
| 140586 | a.242.1 | Dcp2 domain-like | 0.0000 | 0.0000 | 0.9034 | 0.0000 | 0.0000 | 0.0005 |
| 57501 | g.17.1 | Cystine-knot cytokines | 0.0000 | 0.0000 | 0.3185 | 0.0000 | 0.0000 | 0.0046 |
| 69340 | b.80.5 | C-terminal domain of adenylylcyclase associated protein | 0.0000 | 0.0000 | 0.9765 | 0.0000 | 0.0000 | 0.0009 |
| 49830 | b.20.1 | ENV polyprotein, receptor-binding domain | 0.0000 | 0.0000 | 0.0313 | 0.0000 | 0.0000 | 0.0042 |
| 81382 | a.157.1 | Skp1 dimerisation domain-like | 0.0000 | 0.0000 | 0.9687 | 0.0000 | 0.0000 | 0.0032 |
FSFs shared exclusively between the proteomes of host superkingdoms, Archaea (A), Bacteria (B), and Eukarya (E), and the proteomes of their viruses, archaeoviruses (AV), bacterioviruses (BV), and eukaryoviruses (EV). FSFs are identified both by SCOP numeric IDs and alpha-numeric concise classification strings (ccs). FSF distributions are also listed as f-values (number of proteomes in a superkingdom or virus group encoding an FSF divided by the total number of proteomes in that superkingdom or virus group). FSF b.57.1 was also detected in eukaryoviruses in addition to Bacteria and bacterioviruses, and FSFs b.121.2 and b.121.5 were also detected in bacterioviruses in addition to Eukarya and eukaryoviruses, possibly indicating genetic crosstalk or ancient ancestry (see text). These FSFs are highlighted in bold.
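The f-value defined in the footnote is a simple proportion. As a minimal sketch, assuming each proteome is represented as a set of FSF identifiers, it can be computed as follows. The function name `f_value` and the four toy proteomes are hypothetical and are not data from the study.

```python
def f_value(proteomes: list[set[str]], fsf: str) -> float:
    """Fraction of proteomes in a group that encode the given FSF."""
    if not proteomes:
        raise ValueError("at least one proteome is required")
    # Count proteomes that contain the FSF, then divide by the group size.
    return sum(fsf in proteome for proteome in proteomes) / len(proteomes)


# Hypothetical group of four proteomes; two of them encode a.30.5.
toy_group = [
    {"a.30.5", "d.186.1"},
    {"d.186.1"},
    {"a.30.5"},
    set(),
]
print(f_value(toy_group, "a.30.5"))   # 2 of 4 proteomes
print(f_value(toy_group, "h.3.2"))    # absent from the group
```

An f-value of 0 means the FSF was not detected in the group, and a value near 1 means it is nearly universal there.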
Figure 3. The structural domain composition of the BE Venn group in bacterioviruses and eukaryoviruses. (A) The Venn diagram describes how the BE FSFs were shared between bacterioviruses and eukaryoviruses. (B) Boxplots display the distribution of f-values (number of proteomes in a superkingdom encoding an FSF divided by the total number of proteomes in that superkingdom) for BE FSFs unique to bacterioviruses, unique to eukaryoviruses, and common to both the bacterial and eukaryal proteomes (see also Table S5). P-values were calculated from two-sample Welch t-tests.
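For readers unfamiliar with the Welch t-test mentioned in the Figure 3 caption, the sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the two samples are hypothetical and are not values from the study. Turning the statistic into a P-value needs a t-distribution CDF, for example through `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (t statistic, degrees of freedom) for a two-sample Welch t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance divided by n).
    se2_x = variance(x) / nx
    se2_y = variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation; no equal-variance assumption.
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


# Hypothetical f-value samples for two groups of FSFs.
group_1 = [0.10, 0.20, 0.30, 0.40, 0.50]
group_2 = [0.20, 0.40, 0.60, 0.80, 1.00]
t_stat, dof = welch_t(group_1, group_2)
print(round(t_stat, 4), round(dof, 3))
```

Unlike Student's t-test, the Welch variant does not pool the variances, which suits groups of FSFs whose f-value spreads differ.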
Figure 4. Breakdown of the 489 FSFs detected in eukaryoviruses. Eukaryoviruses were divided into viruses of plants (included all plants, blue-green algae, and diatoms), metazoa (vertebrates and invertebrates), protozoa (animal-like protists), fungi, and a group that includes invertebrates and plants (IP). For each subgroup, bars indicate the percentage of FSFs present in one of the seven Venn groups listed on the right (see also Figure 2) and the percentage of virus-specific FSFs. Numbers on bars indicate actual count. n = total number of viral proteomes, m = total number of FSFs detected in viral proteomes.
Figure 5. ABE FSFs are widespread in the proteomes of Archaea, Bacteria, and Eukarya. The f-value (number of proteomes in a superkingdom encoding an FSF / total number of proteomes in that superkingdom) distribution is plotted for the ABE Venn group of FSF domains for archaeoviruses, bacterioviruses, and eukaryoviruses. The three boxplots in each viral group describe FSF spread individually for Archaea (A), Bacteria (B), and Eukarya (E).
Figure 6. Breakdown of pooled non-redundant 442 viral ABE FSFs (see text) into seven possible Venn groups for archaeoviruses, bacterioviruses, and eukaryoviruses. Numbers on bars indicate actual count (see also Table S6).