Literature DB >> 30660046

Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO).

Christian M Zmasek1, David M Knipe2, Philip E Pellett3, Richard H Scheuermann4.   

Abstract

We developed a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO) for the analysis of protein orthology by combining phylogenetic and protein domain-architecture information. Using DAIO, we performed a systematic study of the proteomes of all human Herpesviridae species to define Strict Ortholog Groups (SOGs). In addition to assessing the taxonomic distribution for each protein based on sequence similarity, we performed a protein domain-architecture analysis for every protein family and computationally inferred gene duplication events. While many herpesvirus proteins have evolved without any detectable gene duplications or domain rearrangements, numerous herpesvirus protein families do exhibit complex evolutionary histories. Some proteins acquired additional domains (e.g., DNA polymerase), whereas others show a combination of domain acquisition and gene duplication (e.g., betaherpesvirus US22 family), with possible functional implications. This novel classification system of SOGs for human Herpesviridae proteins is available through the Virus Pathogen Resource (ViPR, www.viprbrc.org).
Copyright © 2019. Published by Elsevier Inc.

Entities:  

Keywords:  Comparative genomics; Domain architecture; Evolution; Gene duplication; Herpesviridae; Nomenclature; Ortholog; Phylogenetics; Protein domain; Protein family

Mesh:

Substances:

Year:  2019        PMID: 30660046      PMCID: PMC6502252          DOI: 10.1016/j.virol.2019.01.005

Source DB:  PubMed          Journal:  Virology        ISSN: 0042-6822            Impact factor:   3.616


Introduction

Human herpesviruses

Herpesviruses comprise a large and diverse order (Herpesvirales) of double stranded DNA viruses that infect humans and a wide range of other hosts (Pellet and Roizman, 2007, Virus Taxonomy: The Classification and Nomenclature of Viruses The Online (10th) Report of the ICTV, 2017). Human diseases caused by herpesviruses range from vesicular rashes to cancer. The order Herpesvirales is subdivided into three families, including the Herpesviridae, which is further subdivided into three subfamilies, the Alpha-, Beta-, and Gammaherpesvirinae. Within subfamilies, groups of related herpesvirus species are classified into genera. The nine species of human herpesviruses are distributed across the three subfamilies and several genera ( Table 1); these viruses are the main focus of this work. Prior studies found that the Beta- and Gammaherpesvirinae are more closely related to each other than to Alphaherpesvirinae (Montague and Hutchison, 2000). In contrast to some other human viruses, the human herpesviruses have a long evolutionary history, with evidence suggesting that the primordial herpesvirus diverged into the Alpha-, Beta, and Gammaherpesvirinae approximately 180 million to 220 million years ago (McGeoch et al., 1995). Coupled with their genome complexity and the availability of numerous complete genome sequences, this deep evolutionary history makes herpesviruses a tractable and informative model to study virus genome evolution at the levels of gene duplication and protein domain rearrangement.
Table 1

Classification and properties of the human herpesviruses.

SubfamilyGenusSpeciesCommon nameGenome length (kb)RefSeq AccessionNumber of annotated proteinsa
Alphaherpesvirinae
SimplexvirusHuman alphaherpesvirus 1Herpes simplex 1 (HSV1)152NC_00180677
SimplexvirusHuman alphaherpesvirus 2Herpes simplex 2 (HSV2)155NC_00179877
VaricellovirusHuman alphaherpesvirus 3Varicella-zoster virus (VZV)125NC_00134873
Betaherpesvirinae
CytomegalovirusHuman betaherpesvirus 5Human cytomegalovirus (HCMV)236NC_006273169
RoseolovirusHuman betaherpesvirus 6AHuman herpesvirus 6A (HHV-6A)159NC_00166488
RoseolovirusHuman betaherpesvirus 6BHuman herpesvirus 6B (HHV-6B)162NC_000898104
RoseolovirusHuman betaherpesvirus 7Human herpesvirus 7 (HHV-7)153NC_00171686
Gammaherpesvirinae
LymphocryptovirusHuman gammaherpesvirus 4Epstein-Barr Virus (EBV)172NC_00760594
RhadinovirusHuman gammaherpesvirus 8Kaposi sarcoma-associated herpesvirus (KSHV); Human herpesvirus 8 (HHV-8)138NC_00933386

Protein numbers are based on CDS entries in the associated RefSeq.

Classification and properties of the human herpesviruses. Protein numbers are based on CDS entries in the associated RefSeq.

Phylogenomics

Homologs are genes that are evolutionarily related, regardless of the mechanism. Orthologs were defined by Fitch in 1970 as homologous genes in different species that diverged from a common ancestral gene by speciation. Genes that, either in the same or different species, diverged by a gene duplication have been termed paralogs (Fitch, 2000, Fitch, 1970). While the terms ortholog and paralog have no consistent functional implications (Jensen, 2001), orthologs are oftentimes considered more functionally similar than paralogs at the same level of sequence divergence. This has been termed the “ortholog conjecture”, which remains a topic of active research (Altenhoff et al., 2012, Chen and Zhang, 2012, Nehrt et al., 2011, Rogozin et al., 2014), due to its importance for computational sequence functional analysis (Eisen, 1998, Zmasek and Eddy, 2002) and the significance of gene duplications for biological evolution (Zhang, 2003). Orthologs (or groups/clusters of orthologs) have often been inferred by indirect methods based on (reciprocal) pairwise highest similarities [e.g. (Remm et al., 2001, Tatusov et al., 1997)]. In this work, we used explicit phylogenetic inference combined with comparison to a trusted species tree for orthology inference, as this approach is likely to yield more accurate results (Zmasek and Eddy, 2002, Zmasek and Eddy, 2001).

Protein domains and domain architectures

Many eukaryotic proteins, and by extension, proteins of eukaryotic viruses, are composed of multiple domains, components that can each have their own evolutionary history and functional implications. The architecture of a protein is a product of the ordered arrangement of its several domains and their overall tertiary structure. Evolutionarily, individual domains can combine with other partner domains, enabling formation of a vast number of domain combinations, even within the same species (Moore et al., 2008). Assembling multiple domains into a single protein creates a distinct entity that can be more than the sum of its constituent parts. The emergence of proteins with novel combinations of duplicated and then diverged domains is considered to be a major mechanism for rapid evolution of new functionality in eukaryotic genomes (Itoh et al., 2007, Peisajovich et al., 2010). It is especially important in the evolution of pathways, where novel linkages between existing domains may result in the rearrangement of pathways and their behaviors in the cell (Peisajovich et al., 2010). The modular structure of eukaryotic proteins provides a mechanism that enables evolutionarily-rapid differentiation and emergence of a multitude of novel protein functions from an initially limited array of functional domains. Proteins can gain (or lose) new domains via genome rearrangements, creating (or removing) domain combinations, in addition to modification of domains themselves by small-scale mutations (Patthy, 2003, Ye and Godzik, 2004). Here we present a systematic classification of proteins catalogued in the NCBI RefSeq entries for each of the nine human herpesviruses plus selected comparisons with homologs from non-human herpesviruses based on phylogenetic inferencing and domain architecture analysis using Domain-Architecture Aware Inference of Orthologs (DAIO). This analysis resulted in the classification of proteins into “Strict Ortholog Groups” (SOGs), in which all proteins are orthologous to each other (related by speciation events) and exhibit the same domain architecture. The SOG classification also enabled the development of an informative name convention for each SOG that includes information about the protein's function (if known) and a suffix indicating the taxonomic distribution of the protein. For example, an “aBG” suffix would indicate that proteins of this group are found in some (but not all) human Alphaherpesvirinae species (lowercase “a”), and all human Beta- and Gammaherpesvirinae species (uppercase “B” and “G”). Such suffixes allow for the quick understanding of presumed conserved protein function and minimal common genome across the Herpesviridae family. The SOG classification results have been made publicly available through the Virus Pathogen Resource (ViPR) (Pickett et al., 2012) at https://www.viprbrc.org.

Results and discussion

For this analysis, we developed a rational, phylogeny- and domain architecture-aware classification approach for human herpesvirus proteins, the Domain-architecture Aware Inference of Orthologs (DAIO) method, which produces Strict Ortholog Groups (SOGs) of proteins. Before we present genome-wide findings, we show results for a few instructive SOG examples, including protein groups that have evolved in a “simple” manner, recapitulating the Herpesviridae evolutionary tree without gene duplications or domain rearrangements, and protein groups in which domain rearrangements (domain gains) and/or gene duplications have occurred. Table 2 lists the 23 SOGs common to all nine human herpesviruses. For every SOG, a suggested name is provided, composed of a protein names and a suffix indicating the taxonomic distribution (A, B, G: present in all human members of the Alpha-, Beta-, Gammaherpesvirinae, respectively; a, b, g: present in some but not all human members of the Alpha-, Beta-, Gammaherpesvirinae, respectively). Gene names/symbols (a forward slash is either part of the accepted gene name or is used to separate multiple gene names) and Pfam domain architecture names are also included. The table is organized into three sections. The first section lists protein families that have apparently evolved without gene duplication or domain rearrangements [e.g., uracil DNA glycosylase and the capsid scaffolding protein protease (CSPP)]; the second section lists proteins that have evolved with domain rearrangements and or duplications [e.g., glycoprotein B (gB), DNA polymerase, and multifunctional regulator of expression proteins (mRE)], and the third section lists proteins that share some function (and even genome region) but have been formed from distantly or unrelated domains (e.g., gL, gN, and DNA polymerase processivity factor).
Table 2

Names of Herpesviridae Proteins Common to All 9 Herpesviruses Based on Strict Ortholog Groups.

Suggested NameAlpha
Beta
Gamma
DA (Pfam domains)
HSV-1/2
VZVCMVHHV-6AHHV-6BHHV-7EBVKSHV
UL/USOther
Uracil-DNA glycosidase_ABGUL2ORF59UL114U81U81U81BKRF3ORF46UDG
Helicase-primase ATPase subunit_ABGUL5ORF55UL105U77U77U77BBLF4ORF44Herpes_Helicase
Glycoprotein M_ABGUL10ORF50UL100U72U72U72BBRF3ORF39Herpes_glycop
Alkaline deoxyribonuclease_ABGUL12ORF48UL98U70U70U70BGLF5ORF37Herpes_alk_exo
Serine threonine protein kinase_ABGUL13ORF47UL97U69U69U69BGLF4ORF36UL97 Pfam domain (Beta)
Terminase_ABGUL15ORF42UL89U66/U60U66/U60U66/U60LMP2ORF29DNA_pack_N––DNA_pack_C
Tegument protein_ABGUL16ORF44UL94U65U65U65BGLF2ORF33Herpes_UL16
Capsid transport tegument protein_ABGUL17ORF43UL93U64U64U64BGLF1ORF32Herpes_UL17
Triplex dimer protein_ABGUL18VP23ORF41UL85U56U56U56BDLF1ORF26Herpes_V23
Major capsid protein_ABGUL19VP5/ICP5ORF40UL86U57U57U57BcLF1ORF25Herpes_MCP
Glycoprotein H_ABGUL22ORF37UL75U48U48U48BXLF2ORF22Herpes_glycop_H
UL24 Protein_ABGUL24ORF35UL76U49U49U49BXRF1ORF20Herpes_UL24
Portal capping protein_ABGUL25ORF34UL77U50U50U50BVRF1ORF19Herpes_UL25
Protease-scaffolding protein_ABGUL26ORF33UL80U53U53U53BVRF2ORF17Peptidase_S21
Terminase DNA binding subunit_ABGUL28ORF30UL56U40U40U40BALF3ORF7PRTP
Major DNA binding protein_ABGUL29ICP8ORF29UL57U41U41U41BALF2ORF6Viral_DNA_bp
Nuclear egress lamina protein_ABGUL31ORF27UL53U37U37U37BFLF2ORF69Herpes_UL31
Capsid transport nuclear protein_ABGUL32ORF26UL52U36U36U36BFLF1ORF68Herpes_env
Terminase binding protein_ABGUL33ORF25UL51U35U35U35BFRF1AORF67AHerpes_UL33
Nuclear egress membrane protein_ABGUL34ORF24UL50U34U34U34BFRF1ORF67Herpes_U34
Triplex monomer_ABGUL38VP19cORF20UL46U29U29U29BORF1ORF62Herpes_VP19C
Deoxyuridine 5′-triphosphate nucleotidohydrolase_ABGUL50ORF8UL72U45U45U45BLLF3ORF54dUTPase
DNA primase_ABGUL52ORF6UL70U43U43U43BSLF1ORF56Herpes_UL52
Portal protein_ABG.ABGUL6ORF54UL104U76U76U76LMP2ORF43Herpes_UL6
Encapsidation and egress protein_ABG.ABGUL7ORF53UL103U75U75U75BBRF2ORF42Herpes_UL7
Encapsidation and egress protein_ABG.gORF43Herpes_UL6––Herpes_UL7
Helicase primase subunit_ABG.ABgUL8ORF52UL102U74U74U74BBLF2/BBLF3ORF41Herpes_HEPA
Helicase primase subunit_ABG.gORF40Herpes_HEPA––Herpes_heli_pri
Glycoprotein B_ABG.AbGUL27ORF31U39U39U39BALF4ORF8Glycoprotein_B
Glycoprotein B_ABG.bUL55HCMVantigenic_N––Glycoprotein_B
DNA polymerase_ABG.aUL30DNA_pol_B_exo1––DNA_pol_B––DNAPolymera_Pol
DNA polymerase_ABG.aBGORF28UL54U38U38U38BALF5ORF9DNA_pol_B_exo1––DNA_pol_B
Large tegument protein_ABG.AUL36VP1–2ORF22Herpes_teg_N––Herpes_UL36
Large tegument protein_ABG.BGUL48U31U31U31BPLF1ORF64Herpes_teg_N
Ribonucleotide reductase large subunit_ABG.AGUL39ICP6ORF19BORF2ORF61Ribonuc_red_lgN––Ribonuc_red_lgC
Ribonucleotide reductase large subunit_ABG.BUL45UL28UL28UL28Ribonuc_red_lgC
Multifunctional regulator of expression_ABG.aUL54ICP27HHV− 1_VABD––Herpes_UL69
Multifunctional regulator of expression_ABG.aBGORF4UL69U42U42U42BSLF2/BMLF1ORF57Herpes_UL69
Cytoplasmic egress facilitator-1_AUL51ORF7Herpes_UL51
Cytoplasmic egress facilitator-1_BGU71U44U44U44BSRF1ORF55Herpes_U44
Cytoplasmic egress facilitator-2_AUL21ORF38Herpes_UL21
Cytoplasmic egress facilitator-2_BUL88U59U59U59Herpes_U59
Cytoplasmic egress facilitator-2_GBTRF1ORF23Herpes_BTRF1
Cytoplasmic egress tegument protein_AUL11ORF49UL11
Cytoplasmic egress tegument protein_CMVUL99
Cytoplasmic egress tegument protein_GBBLF1ORF38DUF2733
Cytoplasmic egress tegument protein_RU71U71U71
DNA polymerase processivity subunit_AUL42ORF16Herpes_UL42––Herpes_UL42
DNA polymerase processivity subunit_BUL44U27U27U27Herpes_PAP
DNA polymerase processivity subunit_GBMRF1ORF59Herpes_DNAp_acc
Tegument protein UL14_AUL14ORF46Herpes_UL14
Tegument protein UL14_BGUL95U67U67U67BGLF3ORF34Herpes_UL95
Glycoprotein L_A.aORF60Herpes_UL1
Glycoprotein L_A.SUL1Herpes_UL1––GlyL_C
Glycoprotein L_BUL115U82U82U82Cytomega_gL
Glycoprotein L_GBKRF2ORF47Phage_glycop_gL
Glycoprotein N_AUL49A
Glycoprotein N_BG.bUL73UL73_N––Herpes_UL73
Glycoprotein N_BG.BGUL73U46U46U46BLRF1ORF53Herpes_UL73
Glycoprotein N_aORF9AHerpes_UL49_5
Inner tegument protein UL37_AUL37ORF21Herpes_UL37_1
Inner tegument protein UL37_BGUL47U30U30U30BOLF1ORF63Herpes_U30
Small capsid protein_AUL35VP26ORF23Herpes_UL35
Small capsid protein_BUL48AU32U32U32HV_small_capsid
Small capsid protein_GBFRF3ORF65Herpes_capsid

Abbreviations.

ICP: infected cell protein.

VP: virion protein.

Names of Herpesviridae Proteins Common to All 9 Herpesviruses Based on Strict Ortholog Groups. Abbreviations. ICP: infected cell protein. VP: virion protein.

Uracil DNA glycosylase and capsid scaffolding protein protease: Evolution of a stable domain architecture without gene duplications

Uracil DNA glycosylases catalyze the first step – removal of the RNA base uracil from DNA – in base excision repair, the mechanism by which damaged bases in DNA are removed and replaced (Krusong et al., 2006). Uracil DNA glycosylases are found in eukaryotes, bacteria, and archaea, as well as in herpesviruses and poxviruses (Chen et al., 2002). Our phylogenomic analysis shows that for all nine human herpesviruses, uracil DNA glycosylase is well conserved and contains one Pfam domain, UDG (uracil DNA glycosylase superfamily). In addition, the gene tree for human herpesvirus uracil DNA glycosylases ( Fig. 1B) precisely recapitulates the herpesvirus species tree (Fig. 1A); therefore, this protein family can be inferred to have evolved from a single common ancestor and without any gene duplications or domain rearrangements (see Table 2 for virus-specific gene names).
Fig. 1

Proteins with conserved domain architectures that mirror the Herpesvirus species tree. (A) A current view of herpesvirus evolution. The human herpesvirus species tree is based on previous reports (McGeoch et al., 2000, McGeoch et al., 1995, Davison, 2010, Davison, 2002). (B) Maximum likelihood gene tree for uracil DNA glycosylase proteins based on an alignment for UDG Pfam domain amino acid sequences. (C) Maximum likelihood gene tree for capsid scaffolding protein proteases, based on Peptidase_S21 Pfam domain amino acid sequences. For the gene trees, bootstrap values are shown. Branch length distances are proportional to expected changes per site.

Proteins with conserved domain architectures that mirror the Herpesvirus species tree. (A) A current view of herpesvirus evolution. The human herpesvirus species tree is based on previous reports (McGeoch et al., 2000, McGeoch et al., 1995, Davison, 2010, Davison, 2002). (B) Maximum likelihood gene tree for uracil DNA glycosylase proteins based on an alignment for UDG Pfam domain amino acid sequences. (C) Maximum likelihood gene tree for capsid scaffolding protein proteases, based on Peptidase_S21 Pfam domain amino acid sequences. For the gene trees, bootstrap values are shown. Branch length distances are proportional to expected changes per site. Capsid scaffolding protein proteases are essential for herpesvirus capsid assembly and maturation, and have an essential serine protease activity (Liu and Roizman, 1993). These proteins contain one Pfam domain, Peptidase_S21. In contrast to uracil DNA glycosylases, currently available data indicate that protease-scaffolding proteins with a Peptidase_S21 domain are unique to Herpesvirales. Like uracil DNA glycosylases, CSPP evolved without domain architecture rearrangements or gene duplications (Fig. 1C, Table 2). Other examples of Herpesviridae genes that have evolved without any domain architecture rearrangements or gene duplications are listed in the first section of Table 2.

Molecular evolution of gB: A highly conserved protein required for viral fusion with a recent domain acquisition in one virus lineage

Herpesvirus virions have an envelope that consists of an outer lipid bilayer studded with 12 or more surface glycoproteins (originally defined in HSV). After virion glycoprotein engagement with cell surface receptors, the envelope fuses with the plasma membrane – a process which, for herpes simplex virus 1 (HSV-1), requires four of its 12 envelope glycoproteins, namely glycoproteins gB, gD, gH, and gL (Cai et al., 1988, Forrester et al., 1992, Ligas and Johnson, 1988, Roop et al., 1993, Spear and Longnecker, 2003). In contrast, for other herpesviruses, only glycoproteins gB, gH, and gL have been reported to be required for membrane fusion (AlHajri et al., 2017). gB and gH are highly conserved across all nine human herpesviruses (Table 2). A protein annotated as gL is also present in all nine human herpesviruses, yet its occurrences in members of the Alpha-, Beta- and Gammaherpesvirinae are homologous within, but not between subfamilies. gLs from different subfamilies contain unrelated protein domains (Pfam: Herpes_UL1, Cytomega_gL, and Phage_glycop_gL). gL is discussed in more detail below. Detailed phylogenetic analysis of the human herpesvirus gB family ( Fig. 2A), including proteins from selected non-human members of the Herpesviridae, shows a picture of a protein that has evolved without gene duplications (or, at the very least, duplicated genes have not been retained) and with nearly completely conserved domain architectures.
Fig. 2

Proteins in which an additional domain has been added during the course of evolution. (A) Maximum likelihood gene tree for glycoprotein B proteins based on an alignment for the main glycoprotein_B domain amino acid sequences. (B) Maximum likelihood gene tree for DNA polymerase proteins based on an alignment for DNA_pol_B_exo1––DNA_pol_B domain amino acid sequences. (C) Maximum likelihood gene tree for multifunctional regulator of expression proteins based on an alignment for Herpes_UL69 domain amino acid sequences. Bootstrap values larger than 50 are shown. Branch length distances are proportional to expected changes per site.

Proteins in which an additional domain has been added during the course of evolution. (A) Maximum likelihood gene tree for glycoprotein B proteins based on an alignment for the main glycoprotein_B domain amino acid sequences. (B) Maximum likelihood gene tree for DNA polymerase proteins based on an alignment for DNA_pol_B_exo1––DNA_pol_B domain amino acid sequences. (C) Maximum likelihood gene tree for multifunctional regulator of expression proteins based on an alignment for Herpes_UL69 domain amino acid sequences. Bootstrap values larger than 50 are shown. Branch length distances are proportional to expected changes per site. The one exception to this is that human cytomegalovirus (HCMV) glycoprotein B (gB) has a short region of about 40 amino acids near its N-terminus that comes in two forms that differ by approximately 50% at the amino acid level. This sequence variant was identified in HCMV strains isolated from Chinese patients (Shiu et al., 1994) and is identified in Pfam as “HCMVantigenic_N domain”. In our global hmmscan analysis (applying the same threshold of E = 10−6 for every Pfam domain) E-value support for presence of this domain in some strains is strong (E < 10–22) and matching over the entire Pfam model while other HCMV strains do not exhibit significant sequence similarity with this domain. It has been suggested that this domain polymorphism may be implicated in HCMV-induced immunopathogenesis, as well as in strain-specific behaviors, such as tissue-tropism and the ability to establish persistent or latent infections (Pignatelli et al., 2004). In our new systemic naming approach (see below) we term the SOG of the protein with HCMVantigenic_N domain “Glycoprotein B_ ABG.b”, whereas all other proteins fall into the “Glycoprotein B_ ABG.AbG” SOG.

Molecular evolution of DNA polymerase: A highly conserved protein with domain acquisition

All members of the Herpesviridae encode six conserved proteins that play essential roles at the replication fork during viral DNA replication: a single-strand DNA binding protein (major DNA binding protein), a DNA polymerase composed of two independently coded subunits (the catalytic DNA polymerase subunit and a DNA polymerase processivity factor encoded by three distantly related genes in members of the Alpha-, Beta-, and Gammaherpesvirinae, see below), and a three subunit helicase/primase complex (DNA replication helicase, DNA helicase primase complex associated protein, and DNA primase) (Pellet and Roizman, 2007). Our analysis shows that the catalytic DNA polymerase subunits of all members of the Herpesviridae contain two domains: an N-terminal DNA polymerase family B exonuclease domain, and a C-terminal polymerase domain from DNA polymerase family B (Fig. 2B). Cellular family B DNA polymerases are the main polymerases involved with nuclear DNA replication and repair in eukaryotes and prokaryotes, and include DNA polymerases II and B, and polymerases α, δ, and ε (Garcia-Diaz and Bebenek, 2007). Family B DNA polymerases are also found in other dsDNA viruses, such as the insect Ascoviridae, and members of the Iridoviridae (e.g., fish lymphocystis disease virus) and Phycoviridae (e.g., chlorella virus) (Villarreal and DeFilippis, 2000). In addition to these two large and ubiquitous domains, Simplexvirus (which include human simplex virus 1 and 2) and Mardivirus also possess a small C-terminal domain, called the DNA polymerase catalytic subunit Pol (DNAPolymera_Pol) domain in Pfam (Zuccola et al., 2000), and are longer by about 45 aa on average than DNA polymerase proteins from other Herpesviridae. According to currently available genomic data, DNAPolymera_Pol is found in members of the Simplexvirus genus of the Alphaherpesvirinae. While varicella-zoster virus (Human herpesvirus 3) and other members of the Varicellovirus genus of the Alphaherpesvirinae possesses DNA polymerases that also tend to be longer, similarity of these protein regions to the DNAPolymera_Pol domain is low, using the current Pfam model for DNAPolymera_Pol (Pfam version 31.0). The function of this third domain is to mediate interaction between DNA polymerase and its cognate processivity factor (Bridges et al., 2000, Loregian et al., 2000) based on the observation that a peptide corresponding to the 27 C-terminal amino acids of HSV-1 DNA polymerase has been shown to inhibit viral replication by disrupting the interaction between DNA polymerase and UL42 (Digard et al., 1995, Loregian et al., 1999). In this context, it is interesting to note that the DNA polymerase processivity factors are only distantly-related across the Alpha-, Beta-, and Gammaherpesvirinae (see below). It is therefore conceivable that the interactions of Beta-, and Gammaherpesvirinae DNA polymerase processivity factors with their corresponding DNA polymerases (which lack a DNAPolymera_Pol domain) is different in nature than for Alphaherpesvirinae. As for Varicellovirus it is unclear whether they possess a functional DNAPolymera_Pol domain, and a definitive answer will require similar biochemical assays as have been performed for HSV-1. Phylogenetic analysis of human herpesvirus DNA polymerase proteins, plus related proteins from selected mammalian herpesviruses, shows that, similar to the glycoprotein B family, DNA polymerases of the Herpesviride evolved without gene duplication. Nonetheless, in contrast to gB, DNA polymerases acquired a new domain early in Alphaherpesvirinae evolution. This domain might have been lost again, or underwent significant mutations, during Varicellovirus evolution. The presence of the longer domain in Varicelloviruses suggests that the longer domain emerged prior to the Varicellovirus/Simplexvirus split.

Evolution of viral multifunctional regulator of expression (mRE) proteins (homologs of HSV1 ICP27)

Multifunctional regulator of expression (mRE; also known as immediate-early protein IE63, infected cell protein 27, ICP27, and α27) is a protein with homologs in all human herpesviruses (for gene names see Table 2). Multifunctional regulator of expression is a regulatory protein that plays a role in the prevention of apoptosis during HSV1 infection (Aubert and Blaho, 1999). Multifunctional regulator of expression interacts directly with a number of proteins in performing its many roles. In particular, multifunctional regulator of expression protein contributes to host shut‐off by inhibiting pre‐mRNA splicing by interacting with essential splicing factors, termed SR proteins, and affecting their phosphorylation (Sciabica et al., 2003). Furthermore, the mRE protein has been shown to associate with cellular RNA polymerase II holoenzyme in a DNA- and RNA-independent manner and to recruit RNA polymerase II to viral transcription/replication sites (Dai-Ju et al., 2006, Zhou and Knipe, 2002). mRE also competes with some transport receptors, resulting in the inhibition of host pathways while supporting mRNA export factor-mediated transport of HSV-1 mRNAs (Malik et al., 2012). All of the multifunctional regulator of expression proteins analyzed here have a single copy of a Pfam “Herpesvirus transcriptional regulator family” (Herpes_UL69) domain that is specific to members of the Herpesviridae. In addition to the Herpes_UL69 domain, human Simplexvirus mRE have an additional N-terminal domain, the “Herpes viral adaptor-to-host cellular mRNA binding domain” (HHV-1_VABD) (Tunnicliffe et al., 2011). Besides human Simplexvirus, architectures with C-terminal HHV-1_VABD and N-terminal Herpes_UL69 domains are also found in Chimpanzee herpesviruses (e.g. NCBI Reference Sequence: YP_009011042 (Severini et al., 2013)), while other non-human Simplexviruses lack the HHV-1_VABD domain. Using currently available genomic data, we were unable to detect HHV-1_VABD domains outside of the Simplexvirus genus. Phylogenetic analysis of human herpesvirus mRE proteins, including proteins from selected herpesviruses of other mammals, shows that multifunctional regulator of expression proteins evolved without observable gene duplications (since this gene tree recapitulates the herpesvirus species tree).

Different domains performing the same, or similar, functions

Nine groups of human herpesviruses are annotated as performing the same, or very similar function, in the absence of discernable protein sequence similarity (Table 2, Fig. 3).
Fig. 3

Examples ofproteins composed of unrelated or only very distantly related proteins, annotated as performing the same, or very similar function. (A, B, C) Maximum likelihood gene trees for DNA polymerase processivity factor proteins from Alpha-, Beta-, and Gammaherpesvirinae based on alignments for Herpes_UL42 (A), Herpes_PAP (B), and Herpes_DNAp_acc (C) domain amino acid sequences, respectively. The internal domain duplication at the root the Herpes_UL42 tree is shown as a red square. (D, E, F) Maximum likelihood gene trees for gL proteins from human Alpha-, Beta-, and Gammaherpesvirinae based on alignments for Herpes_UL1 (D), Cytomega_gL (E), and Phage_glycop_gL (F) domain amino acid sequences, respectively. Bootstrap support values are shown. Branch length distances are proportional to expected changes per site.

Examples ofproteins composed of unrelated or only very distantly related proteins, annotated as performing the same, or very similar function. (A, B, C) Maximum likelihood gene trees for DNA polymerase processivity factor proteins from Alpha-, Beta-, and Gammaherpesvirinae based on alignments for Herpes_UL42 (A), Herpes_PAP (B), and Herpes_DNAp_acc (C) domain amino acid sequences, respectively. The internal domain duplication at the root the Herpes_UL42 tree is shown as a red square. (D, E, F) Maximum likelihood gene trees for gL proteins from human Alpha-, Beta-, and Gammaherpesvirinae based on alignments for Herpes_UL1 (D), Cytomega_gL (E), and Phage_glycop_gL (F) domain amino acid sequences, respectively. Bootstrap support values are shown. Branch length distances are proportional to expected changes per site. As mentioned above, DNA polymerase processivity factor is one of the six proteins that play essential roles at the replication fork during viral DNA replication. Processivity factors, also called clamp proteins, help to overcome the tendency of DNA polymerase to dissociate from the template DNA, and thus greatly enhance DNA polymerase processivity (Weisshart et al., 1999, Zhuang and Ai, 2010). In contrast to the protein families discussed so far, DNA polymerase processivity factors are only distantly-related across the Alpha-, Beta-, and Gammaherpesvirinae. In the Alphaherpesvirinae, the protein is composed of two tandem Herpes_UL42 domains; Betaherpesvirinae have a single Herpes_PAP domain; Gammaherpesvirinae have a single Herpes_DNAp_acc domain (Fig. 3A, B, C). These three domains are very distant homologs and are members of the DNA clamp superfamily (Pfam clan CL0060). gL (Fig. 3D, E, F) is another example of a protein function performed by different, probably non-homologous domains present in different Herpesviridae subfamilies (Pfam domains Herpes_UL1, GlyL_C, Cytomega_gL, and Phage_glycop_gL). Interestingly, the open reading frames for these seemingly unrelated proteins are located in analogous conserved genomic contexts, including open reading frame sizes and orientations relative to the surrounding conserved coding regions. The remaining seven groups with these characteristics are: cytoplasmic egress tegument protein, cytoplasmic egress facilitator-1, cytoplasmic egress facilitator-2, encapsidation chaperone protein, glycoprotein N Pfam clan Herpes_glyco, CL0146), LTP binding protein, and small capsid protein (Table 1 and Supplementary Table 1).

Gene duplication during viral 7-transmembrane receptor domain protein evolution

In contrast to the protein families discussed so far, the evolutionary history of human Herpesviridae proteins with 7-transmembrane receptor domains is more complex ( Fig. 4) (Spiess et al., 2015). By comparing this gene tree with a species tree for human Herpesviridae (Fig. 1A), we can infer three gene duplication events (marked as red squares in Fig. 4), resulting in four groups of orthologous genes: UL33/U12, US27, U51/ORF74, and US28. In our new nomenclature (see below), we call the first group “G-protein coupled receptor homolog UL33/U12_B” because it is found in all four human Betaherpesvirinae species (uppercase B suffix). The second group is called “G-protein coupled receptor homolog US27_b” as it is found in some human Betaherpesvirinae (lowercase b suffix). The third group is called “G-protein coupled receptor homolog U51/ORF74_bg” because it found in some human Betaherpesvirinae and in some human Gammaherpesvirinae (lowercase “bg” suffix). The fourth group is called “Envelope protein US28_b”. No orthologous genes were found in the human Alphaherpesvirinae. Whenever available, we base our names preferably on (Mocarski, 2007) or the “Recommended name” (under “Protein names”) from the UniProtKB database (Bateman et al., 2017). For reasons of consistency and objectivity, we used an automated approach to root all trees by mid-point rooting. It is possible, that the true root for the 7-transmembrane domain proteins tree is at the base of the U51-ORF74 subtree. In this case there would be only two duplications in the tree, but still the same four ortholog groups: U51/ORF74, US28, US27, UL33/U12. Functionally, all these proteins appear to be hijacked human proteins that are being used by the virus to modulate the host immune system. In particular, many of them appear to act as chemokine (orphan) receptors (Casarosa et al., 2003, Casarosa et al., 2001, Isegawa et al., 1998, Murphy, 2001, Zhen et al., 2005) ( Fig. 5).
Fig. 4

Gene tree for humanproteins with a 7-transmembrane receptor domain. This maximum likelihood tree is based on an alignment of 7tm_1 domain amino acid sequences. Bootstrap values are shown. Branch length distances are proportional to expected changes per site. Red squares indicate gene duplications.

Fig. 5

Gene tree for humanproteins with US22 domain(s). This maximum likelihood tree is based on an alignment of full length protein sequences. Pfam domains are shown with a E = 10−1 cutoff. Bootstrap values larger than 50 are shown. Branch length distances are proportional to expected changes per site. Red rectangles squares indicate the sometimes duplicated US22 domains. Green rectangles indicate the locations of Herpes_U5 domains.

Gene tree for humanproteins with a 7-transmembrane receptor domain. This maximum likelihood tree is based on an alignment of 7tm_1 domain amino acid sequences. Bootstrap values are shown. Branch length distances are proportional to expected changes per site. Red squares indicate gene duplications. Gene tree for humanproteins with US22 domain(s). This maximum likelihood tree is based on an alignment of full length protein sequences. Pfam domains are shown with a E = 10−1 cutoff. Bootstrap values larger than 50 are shown. Branch length distances are proportional to expected changes per site. Red rectangles squares indicate the sometimes duplicated US22 domains. Green rectangles indicate the locations of Herpes_U5 domains.

The complex evolution of US22 domain proteins

Proteins with US22 domains have the most complex evolutionary history of all Herpesviridae proteins, even though among the human herpesviruses, the US22 domain has been found only in betaherpesviruses (Hanson et al., 1999). US22 domain proteins are also present in Gallid herpesvirus 2 (a member of the Alphaherpesvirinae), in members of the Alloherpesviridae family, in other dsDNA viruses (e.g., Poxviridae and Iridoviridae), and in some animal species. Most proteins with US22 domains carry two copies of the domain. US22 is a member of a large group of distantly homologous proteins (the SUKH superfamily, Pfam clan CL0526), which, for example include bacterial Syd proteins. It has been suggested that a function of the US22 family is to act against various anti-viral responses by interacting with specific host proteins (Zhang et al., 2011). Here we summarize the results of our phylogenetic analysis of US22 domain proteins of the human bataherpesviruses. Unfortunately, the phylogenetic signal across this group of protens is weak, thus some support values are low. Two groups of US22 orthologs span all four human betaherpesviruses: CMV tegument protein UL23 is likely to have orthologs in HHV-6A, HHV-6B, HHV-7 (Roseolovirus) Protein U3 (“Tegument protein UL23/Protein U3_B”). Similarly, CMV Tegument protein UL43 is likely to be orthologous to HHV-6A, HHV-6B, HHV-7 (Roseolovirus) Protein U25 (“Tegument protein UL43/Protein U25_B”). U3 and U25 are paralogous towards each other, as they are connected by a gene duplication, as are HCMV UL23 and 43. Four groups of orthologs specific to Roseolovirus are Tegument protein DR1, Tegument protein DR6, Protein U7, and Protein U17/U16. In U17/U16 proteins, it is unclear whether they possess a second US22 domain, as the similarity to this domain is weak to the point of insignificance. In contrast, U7 proteins possess at least three US22 domains and an additional C-terminal Herpes_U5 domain. Proteins U7 are most closely related to CMV UL29, but differ in their domain architecture (lack of Herpes_U5 domain). Thus CMV UL29 forms its own species-specific group of orthologs. Numerous proteins with US22 domains are specific to CMV (and thus all paralogous to each other) given current data: apoptosis inhibitor UL38, early nuclear protein HWLF1, tegument protein UL26, US24, protein UL24, UL29, UL36, US23, US26, protein IRS1, and protein TRS1.

The inferred minimal proteomes of the human herpesviruses

As described above, we classified viral proteins into “strict ortholog groups,” requiring that all proteins exhibit the same domain architecture and are orthologous to each other. We attempted to give an informative name for each of these groups including a suffix that indicates the taxonomic distribution of a protein. For example, an “aG” suffix would indicate that proteins of this group are found in some (but not all) members of human alphaherpesvirus species (lowercase “a”), and members of both human gammaherpesvirus species (uppercase “G”). Families which have a (some) domain(s) in common but differ in their domain architectures, are more difficult to rationally name (we found 17 of these cases). An example of such a family is DNA polymerase. In such cases, the suffix is split by a period into two parts. The first part indicates overall presence of common domain(s) for all members of this SOG, the second part (after the period) relates to specific domain architectures. Thus, “DNA polymerase_ ABG.aBG” refers to the simpler DNA_pol_B_exo1––DNA_pol_B domain architecture present in nearly all Alphaherpesvirinae species. “DNA polymerase_ ABG.a” refers to the DNA_pol_B_exo1––DNA_pol_B—DNAPolymera_Pol DA that is present in a smaller subset of Alphaherpesvirinae species. The rationale behind this approach for labeling members of protein families that have different domain architectures is that it gives users a choice between “traditional” ortholog groups, which do not consider domain architectures (by ignoring the part after the period), and SOGs (taking the full name into account). In total, we were able to establish 169 SOGs (Supplementary Table 1). Of these, 40 (23 +8 +9) functionally similar groups (Table 2) are present in all 9 human Herpesviridae species and represent the core proteins of human herpesviruses. Besides proteins with clearly defined Pfam domains, we found 29 protein families for which Pfam domains have not been defined. Classification of these proteins was based on manual BLAST searches. An example of such a family is the virion host shutoff protein UL41. Another unusual case is the HSV1 UL13 serine threonine protein kinase. All nine human herpesviruses have homologs of this protein, but its associated Pfam domain UL97 only matches sequences in betaherpesviruses. Extension of the family to alpha- and gammaherpesviruses is thus based on manual BLAST searches. Finally, two protein families could not be classified due to lack of phylogenetic signal: protein B8 of HHV-6A and HHV-6B (associated gene names U92, U93, HN1, HN92D, B8) and protein UL28/UL29/U8 of HHV-6A, HHV-6B, and HHV-7. Proteins which are species or strain specific are listed in Supplementary Table 2.

Dissemination of SOG data through the ViPR database

In order to make the results of DAIO classification available to all Herpesvirus researchers for experimental hypothesis testing, we incorporated SOG data into the Virus Pathogen Resource (ViPR) at https://www.viprbrc.org (Pickett et al., 2012). Through ViPR, scientists can search, sort, and download SOG names (including taxonomic distribution), Pfam domain architecture data, and individual protein sequences belonging to selected SOGs. Fig. 6A shows an example of a search result table, which includes data for some of the protein families discussed above, namely glycoprotein B family members (associated with two distinct SOGs: “Glycoprotein B_ ABG.b” and “Glycoprotein B_ ABG.AbG”), DNA polymerase (“DNA polymerase_ ABG.a” and “DNA polymerase_ ABG.aBG”), and multifunctional regulator of expression (“Multifunctional regulator of expression_ABG.a” and “Multifunctional regulator of expression_ABG.aBG”). By clicking on the “Total # of Proteins” table entries, users can view and download the individual protein sequences belonging to a given SOG. Fig. 6B shows how SOG data, including domain architecture information, is part of protein annotations in ViPR (Simplexvirus “DNA polymerase_ABG.a” example). As new genome sequence data become available, the SOG data in ViPR is continuously updated in order to keep current with the ever expanding universe of Herpesvirus protein sequences. In addition, SOG annotations in ViPR will be expanded to include non-human Herpesviruses in the future. SOG data is also available for Pox- and Coronaviruses in ViPR, and will be applied to other virus families in the future.
Fig. 6

SOG data in the Virus Pathogen Resource (ViPR,www.viprbrc.org). (A) An example of a protein ortholog group search result is shown. Clicking on the “Total # of Proteins” table entries, allows users to view and download the individual protein sequences belonging to a given SOG. (B) The annotations of an individual protein (Simplexvirus “DNA polymerase_ABG.a” in this example), including SOG name and HMM/Pfam domain architectures, from the Human herpesvirus 1 KOS strain are shown.

SOG data in the Virus Pathogen Resource (ViPR,www.viprbrc.org). (A) An example of a protein ortholog group search result is shown. Clicking on the “Total # of Proteins” table entries, allows users to view and download the individual protein sequences belonging to a given SOG. (B) The annotations of an individual protein (Simplexvirus “DNA polymerase_ABG.a” in this example), including SOG name and HMM/Pfam domain architectures, from the Human herpesvirus 1 KOS strain are shown.

Conclusions

In this work, we used Domain-architecture Aware Inference of Orthologs (DAIO) to provide a classification for proteins of human herpesviruses, based on domain architecture and phylogenetic history. While the work presented here is limited to human herpesviruses, and thus does not take full advantage of all the sequence data that is currently available, we plan to extend our DAIO approach to all herpesviruses with a known phylogenetic history. A major contribution of our classification system to herpesvirus biology is that it provides a series of testable hypotheses for further experimental investigations. For example, it informs experimental reconstruction of minimal genome viruses. Such synthesized minimal genomes could prove useful for identification of genes responsible for pathogenic and other biological differences between viruses. Of particular interest in the field of molecular biology is the relationship between domain architecture and protein function. The detailed analysis of domain architectures presented here suggests studies that investigate the functional effects of removing or swapping domains in viral multidomain protein architectures The fact that Simplexvirus DNA polymerases contain the extra DNAPolymera_Pol domain and that this domain architecture is conserved among Simplexvirus isolates suggests that it may provide some unique function necessary for efficient replication of Simplexviruses. This hypothesis could be explored experimentally. Similarly, what would be the consequence of adding a C-terminal GlyL_C domain to the gL protein of VZV (which contains one Herpes_UL1 domain), and so making it similar to the gL protein found in HSV-1 and HSV-2 (which has a Herpes_UL1––GlyL_C architecture)? Interestingly, while it has been noted that domain loss is an important mechanism in eukaryote evolution (probably equally—and possibly even more—important than domain gain) (Zmasek and Godzik, 2011); and references therein), in herpesvirus evolution domain loss seems to play a lesser role, as most of the events we were able to detect are domain gains (according to the parsimony principle). Another implication of this work relates to the observation that in some cases proteins that share the same name are composed of either unrelated (e.g. gL) or very distantly related domains (e.g. DNA polymerase processivity factor) in different herpesvirus species. This raises the question - are such share named truly justified for proteins composed of unrelated domains? And to what extent has their putative shared function been experimentally validated. Our approach is also expected to facilitate the detection and subsequent experimental study of species- (and strain-) specific proteins (listed in Supplementary Table 2). Whereas HSV1 and HSV2 do not have any species specific proteins given current data, VZV has six, and CMV has by far the most with 130 proteins which are not found in any other species. Interestingly, many of these 130 proteins are specific to one strain (or isolate) of CMV. Unsurprisingly, many of these species- and strain-specific protein do not yet have a Pfam domain (and thus were analyzed by manual BLAST searches in this work). An example of such a protein is the ORF45 protein of KSHV (Zhu and Yuan, 2003). Our automated approach provides a starting point for the systematic computational and experimental study of these species- and strain-specific proteins—studies, which eventually will provide answers to such questions as: Are these species- and strain-specitic proteins essential under certain conditions? Do they result in altered pathology or clinical symptoms? Do they function in host interaction? Do they possess as of yet undiscovered, but shared protein domains? In summary, we developed a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO) for the classification of viral proteins into groups of orthologous proteins with identical domain architecures (SOGs). In addition, we established a nomenclature for SOGs that provides the user with information about the biological function and taxonomic distribution for the member proteins of a SOG. We applied this classification and nomenclature to the proteomes of all human Herpesviridae species and made the results publicly accessible via the ViPR database. The acquisition and retention of novel domain architectures suggests that some Herpesviridae proteins may have acquired novel functional characteristics, which can now be explored experimentally.

Materials and methods

We developed a semi automated software pipeline to analyze amino acid sequences for their protein domain based architectures and to infer multiple sequence alignments and phylogenetic trees for the molecular sequences corresponding to these architectures, followed by gene duplication inference. This pipeline contains the following five major steps: (1) sequence retrival; (2) domain architecture anlysis, including the inference of the taxonomic distributions of domain architectures – each of which corresponding to one preliminary SOG, and manual naming of domain architecures/preliminary SOGs (to be automated in future versions of this pipeline); (3) extraction of molecular sequences corresponding to domain architectures/preliminary SOGs; (4) multiple sequence alignment and phylogenetic inference; (5) gene duplication inference, to determine which preliminary SOGs contain sequences related by gene duplications and thus need to divided in multiple, final SOGs. Links to all custom software programs developed for this work are available here: https://sites.google.com/site/cmzmasek/home/software/forester/daio. In the following the tools and methods used are described in more detail.

Sequence retrieval

Individual protein sequences were downloaded from the ViPR database (Pickett et al., 2012), while entire proteomes were downloaded from UniProtKB (Bateman et al., 2017).

Multiple sequence alignments

Multiple sequence alignments were calculated using MAFFT version 7.313 (with “localpair” and “maxiterate 1000” options) (Katoh and Standley, 2013, Kuraku et al., 2013). Prior to phylogenetic inference, multiple sequence alignment columns with more than 50% gaps were deleted. For comparison we also performed the analyses based on alignments for which we only deleted columns with more than 90% gaps.

Protein domain analysis

Protein domains were analyzed using hmmscan from HMMER v3.1b2 (Eddy, 2011) and the Pfam 31.0 database (Finn et al., 2016).

Phylogenetic analyses

Phylogenetic trees were calculated for individual domain architectures (not full-length sequences) except for US22 domain proteins, because US22 domain alignments lack phylogeneticly sufficient signal. Distance-based minimal evolution trees were inferred by FastME 2.0 (Desper and Gascuel, 2002) (with balanced tree swapping and “GME” initial tree options) based on pairwise distances calculated by TREE-PUZZLE 5.2 (Schmidt et al., 2002) using the WAG substitution model (Whelan and Goldman, 2001), a uniform model of rate heterogeneity, estimation of amino acid frequencies from the dataset, and approximate parameter estimation using a Neighbor-Joining tree. For maximum likelihood approaches, we employed RAxML version 8.2.9 (Stamatakis et al., 2005) (using 100 bootstrapped data sets and the WAG substitution model). Tree and domain composition diagrams were drawn using Archaeopteryx [https://sites.google.com/site/cmzmasek/home/software/forester]. Rooting was performed by the midpoint rooting method. Unless otherwise noted, Pfam domains are displayed ith a E = 10−6 cutoff. Gene duplication inferences were performed using the SDI and RIO methods (Zmasek and Eddy, 2002, Zmasek and Eddy, 2001). Automated genome wide domain composition analysis was performed using a specialized software tool, Surfacing version 2.002 [Zmasek CM (2012), a tool for the functional analysis of domainome/genome evolution [available at https://sites.google.com/site/cmzmasek/home/software/forester/surfacing]. All conclusions presented in this work are robust relative to the alignment methods, the alignment processing, the phylogeny reconstruction methods, and the parameters used. All sequence, alignment, and phylogeny files are available upon request.

Phylogenomic analyses and development of novel naming schema using strict ortholog groups

The processes for defining and naming strict ortholog groups were formalized into a set of “rules” and then implemented into a semi-automatic domain-centric phyloinformatics pipeline. Any unique arrangement of single or multiple Pfam domains is considered a domain architecture (DA) (Zmasek and Godzik, 2012, Zmasek and Godzik, 2011). Most proteins of members of the Herpesviridae have DAs consisting of only a single domain. For example, the UDG domain of uracil DNA glycosylase is a single domain DA, whereas the combination of N-terminal DNA_pol_B_exo1 and C-terminal DNA_pol_B (denoted as DNA_pol_B_exo1––DNA_pol_B) of DNA polymerases is a DA with two domains. In this analysis, we consider a given DA “present” in a given Herpesviridae species S if the DA is present under a set of thresholds in at least one strain of the species S. The rationale for this is that it is possible to miss a DA in a genome, due to incomplete or erroneous sequences, erroneous assembly and gene-predication (false negatives), and even recent, actual gene loss. The opposite (false positive), on the other hand, is far less likely. For this work, we used two thresholds: a minimal domain length of 40% of the length set forth in the Pfam database (domain fragments are unlikely to be functionally equivalent to full length domains) and a hmmscan E-value cutoff of E = 10−6. For every domain architecture, a set of bootstrap resampled phylogenetic trees (gene trees) was calculated by RAxML (Stamatakis et al., 2005) using protein sequences from one representative for each of the nine human Herpesviridae species. For comparison and validation, we also calculated phylogenetic trees that included non-human hosted Herpesviridae. For illustrations, gene duplications were inferred by comparing the consensus gene trees to the species tree (Fig. 1) for Herpesviridae using the SDI (Speciation Duplication Inference) algorithm (Zmasek and Eddy, 2001). To obtain confidence values on orthology assignments (bootstrap support values), we employed the RIO approach (Resampled Inference of Orthologs) to compare sets of bootstrap resampled phylogenetic trees with the species tree for Herpesviridae (Zmasek and Eddy, 2002). In this work, we define a strict ortholog group (SOG) as sequences related by speciation events and exhibiting the same domain architecture (based on Pfam domains from Pfam 31.0, a length threshold of 40%, and E-value cutoff of E = 10−6). Based on this approach for defining SOGs, we developed the following naming syntax. For protein families such as uracil DNA glycosylase, which exhibit the same DA in all nine human Herpesviridae, and which are related by speciation events only, we base our names on (Mocarski and Edward, 2007) as the base name and add a case-sensitive suffix that indicates the taxonomic distribution - “ABG” in this case, since uracil DNA glycosylase appears in each human Alpha-, Beta-, and Gammaherpesvirinae species. Therefore, the full name is “uracil DNA glycosylase_ABG”. To indicate presence in some, but not all members of a subfamily, we use lower-case suffixes. “Replication origin-binding protein_ Ab” implies that members of this SOG are present in all human Alphaherpesvirinae species (“A”), and in some (but not all) Betaherpesvirinae (“b”). While most of the human Herpesviridae protein families fall into these basic cases, families which have a (some) domain(s) in common but differ in their DA, are more difficult to rationally name. An example of such a family is glycoprotein B described above. Because members of this family have different DAs, namely “Glycoprotein_B” and “HCMVantigenic_N—Glycoprotein_B”, it is composed of two SOGs (named “Glycoprotein B_ ABG.AbG” and “Glycoprotein B_ ABG.b”). In such cases, we split the suffix into two parts, separated by a period. The first part (“ABG”) indicates overall presence of common domain(s) for all members of this SOG, Glycoprotein_B in this case. The second part (after the period) relates to entire DAs. “. AbG” of “Glycoprotein B_ ABG.AbG” means that the Glycoprotein_B DA is present in all human Alpha- and Gamma-, and some Betaherpesvirinae. “.b” of “Glycoprotein B_ ABG.b” implies that the “HCMVantigenic_N—Glycoprotein_B” DA is present in some Betaherpesvirinae.
  68 in total

1.  A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach.

Authors:  S Whelan; N Goldman
Journal:  Mol Biol Evol       Date:  2001-05       Impact factor: 16.240

2.  Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle.

Authors:  Richard Desper; Olivier Gascuel
Journal:  J Comput Biol       Date:  2002       Impact factor: 1.479

3.  A comparative study of uracil-DNA glycosylases from human and herpes simplex virus type 1.

Authors:  Kuakarun Krusong; Elisabeth P Carpenter; Stuart R W Bellamy; Renos Savva; Geoff S Baldwin
Journal:  J Biol Chem       Date:  2005-11-22       Impact factor: 5.157

4.  The human herpesvirus 6 G protein-coupled receptor homolog U51 positively regulates virus replication and enhances cell-cell fusion in vitro.

Authors:  Zhu Zhen; Birgit Bradel-Tretheway; Sarah Sumagin; Jean M Bidlack; Stephen Dewhurst
Journal:  J Virol       Date:  2005-09       Impact factor: 5.103

5.  Multiple functions of DNA polymerases.

Authors:  Miguel Garcia-Diaz; Katarzyna Bebenek
Journal:  CRC Crit Rev Plant Sci       Date:  2007-03       Impact factor: 5.188

6.  A novel immunity system for bacterial nucleic acid degrading toxins and its recruitment in various eukaryotic and DNA viral systems.

Authors:  Dapeng Zhang; Lakshminarayan M Iyer; L Aravind
Journal:  Nucleic Acids Res       Date:  2011-02-08       Impact factor: 16.971

7.  ViPR: an open bioinformatics database and analysis resource for virology research.

Authors:  Brett E Pickett; Eva L Sadat; Yun Zhang; Jyothi M Noronha; R Burke Squires; Victoria Hunt; Mengya Liu; Sanjeev Kumar; Sam Zaremba; Zhiping Gu; Liwei Zhou; Christopher N Larson; Jonathan Dietrich; Edward B Klem; Richard H Scheuermann
Journal:  Nucleic Acids Res       Date:  2011-10-17       Impact factor: 16.971

8.  Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs.

Authors:  Adrian M Altenhoff; Romain A Studer; Marc Robinson-Rechavi; Christophe Dessimoz
Journal:  PLoS Comput Biol       Date:  2012-05-17       Impact factor: 4.475

9.  Herpes simplex virus processivity factor UL42 imparts increased DNA-binding specificity to the viral DNA polymerase and decreased dissociation from primer-template without reducing the elongation rate.

Authors:  K Weisshart; C S Chow; D M Coen
Journal:  J Virol       Date:  1999-01       Impact factor: 5.103

10.  Evolutionary history and functional implications of protein domains and their combinations in eukaryotes.

Authors:  Masumi Itoh; Jose C Nacher; Kei-ichi Kuma; Susumu Goto; Minoru Kanehisa
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

View more
  6 in total

1.  User-Friendly Reverse Genetics System for Modification of the Right End of Fowl Adenovirus 4 Genome.

Authors:  Bingyu Yan; Xiaohui Zou; Xinglong Liu; Jiaming Zhao; Wenfeng Zhang; Xiaojuan Guo; Min Wang; Yingtao Lv; Zhuozhuang Lu
Journal:  Viruses       Date:  2020-03-11       Impact factor: 5.048

2.  Ten Years of Collaborative Progress in the Quest for Orthologs.

Authors:  Benjamin Linard; Ingo Ebersberger; Shawn E McGlynn; Natasha Glover; Tomohiro Mochizuki; Mateus Patricio; Odile Lecompte; Yannis Nevers; Paul D Thomas; Toni Gabaldón; Erik Sonnhammer; Christophe Dessimoz; Ikuo Uchiyama
Journal:  Mol Biol Evol       Date:  2021-07-29       Impact factor: 16.240

3.  Genomic evolution of the Coronaviridae family.

Authors:  Christian M Zmasek; Elliot J Lefkowitz; Anna Niewiadomska; Richard H Scheuermann
Journal:  Virology       Date:  2022-03-30       Impact factor: 3.513

4.  Development of a high specificity typing method for the detection of herpes simplex virus.

Authors:  Zhu Chen; Kaixuan Zhao; Boyu Tan; Zengrui Tong; Ziyu He; Xiaofang Luo; Lei Cai; Hanming Wang; Polly H M Leung; Franklin Wang-Ngai Chow; Hui Chen; Yan Deng
Journal:  Front Bioeng Biotechnol       Date:  2022-08-17

5.  A Variant Allele in Varicella-Zoster Virus Glycoprotein B Selected during Production of the Varicella Vaccine Contributes to Its Attenuation.

Authors:  Tomohiko Sadaoka; Daniel P Depledge; Labchan Rajbhandari; Judith Breuer; Arun Venkatesan; Jeffrey I Cohen
Journal:  mBio       Date:  2022-08-02       Impact factor: 7.786

Review 6.  Two Sides to Every Story: Herpes Simplex Type-1 Viral Glycoproteins gB, gD, gH/gL, gK, and Cellular Receptors Function as Key Players in Membrane Fusion.

Authors:  Nithya Jambunathan; Carolyn M Clark; Farhana Musarrat; Vladimir N Chouljenko; Jared Rudd; Konstantin G Kousoulas
Journal:  Viruses       Date:  2021-09-16       Impact factor: 5.048

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.