Christian M Zmasek, David M Knipe, Philip E Pellett, Richard H Scheuermann.
Abstract
We developed a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO) for the analysis of protein orthology by combining phylogenetic and protein domain-architecture information. Using DAIO, we performed a systematic study of the proteomes of all human Herpesviridae species to define Strict Ortholog Groups (SOGs). In addition to assessing the taxonomic distribution for each protein based on sequence similarity, we performed a protein domain-architecture analysis for every protein family and computationally inferred gene duplication events. While many herpesvirus proteins have evolved without any detectable gene duplications or domain rearrangements, numerous herpesvirus protein families do exhibit complex evolutionary histories. Some proteins acquired additional domains (e.g., DNA polymerase), whereas others show a combination of domain acquisition and gene duplication (e.g., betaherpesvirus US22 family), with possible functional implications. This novel classification system of SOGs for human Herpesviridae proteins is available through the Virus Pathogen Resource (ViPR, www.viprbrc.org).
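The abstract describes DAIO only at a high level. As a minimal sketch of the domain-architecture half of the idea (the phylogenetic half is omitted, and the record layout and the function `split_by_architecture` are hypothetical, not DAIO's actual code), proteins already assigned to one homolog family can be split into subgroups whose members share an identical ordered domain architecture:

```python
from collections import defaultdict

# Hypothetical record layout: (genome, protein, homolog family, ordered Pfam domains).
Protein = tuple[str, str, str, tuple[str, ...]]


def split_by_architecture(proteins: list[Protein]) -> dict[tuple[str, tuple[str, ...]], list[str]]:
    """Split each homolog family into subgroups whose members share one domain architecture."""
    groups = defaultdict(list)
    for genome, protein, family, domains in proteins:
        # The grouping key is the family plus the exact, ordered list of domains.
        groups[(family, domains)].append(f"{genome}:{protein}")
    return dict(groups)


# Architectures copied from the ortholog table; the selection of genomes is illustrative.
proteins = [
    ("HSV1", "UL30", "DNA polymerase", ("DNA_pol_B_exo1", "DNA_pol_B", "DNAPolymera_Pol")),
    ("VZV", "ORF28", "DNA polymerase", ("DNA_pol_B_exo1", "DNA_pol_B")),
    ("HCMV", "UL54", "DNA polymerase", ("DNA_pol_B_exo1", "DNA_pol_B")),
    ("EBV", "BALF5", "DNA polymerase", ("DNA_pol_B_exo1", "DNA_pol_B")),
]

for (family, domains), members in split_by_architecture(proteins).items():
    print(family, "––".join(domains), members)
```

Here UL30 ends up in a subgroup of its own, which parallels the split between the "DNA polymerase_ABG.a" and "DNA polymerase_ABG.aBG" groups in the ortholog table. A real pipeline would also need the gene-tree evidence that this sketch leaves out.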
Keywords: Comparative genomics; Domain architecture; Evolution; Gene duplication; Herpesviridae; Nomenclature; Ortholog; Phylogenetics; Protein domain; Protein family
Year: 2019 PMID: 30660046 PMCID: PMC6502252 DOI: 10.1016/j.virol.2019.01.005
Source DB: PubMed Journal: Virology ISSN: 0042-6822 Impact factor: 3.616
Classification and properties of the human herpesviruses.
| Subfamily | Genus | Species | Common name | Genome length (kb) | RefSeq Accession | Number of annotated proteins |
|---|---|---|---|---|---|---|
|  |  |  | Herpes simplex 1 (HSV1) | 152 |  | 77 |
|  |  |  | Herpes simplex 2 (HSV2) | 155 |  | 77 |
|  |  |  | Varicella-zoster virus (VZV) | 125 |  | 73 |
|  |  |  | Human cytomegalovirus (HCMV) | 236 |  | 169 |
|  |  |  | Human herpesvirus 6A (HHV-6A) | 159 |  | 88 |
|  |  |  | Human herpesvirus 6B (HHV-6B) | 162 |  | 104 |
|  |  |  | Human herpesvirus 7 (HHV-7) | 153 |  | 86 |
|  |  |  | Epstein-Barr virus (EBV) | 172 |  | 94 |
|  |  |  | Kaposi sarcoma-associated herpesvirus (KSHV); Human herpesvirus 8 (HHV-8) | 138 |  | 86 |
Protein numbers are based on CDS entries in the associated RefSeq.
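As a minimal sketch of counting CDS entries, the example below works on a GenBank flat-file record held as plain text and uses only the standard library. The function `count_cds` and the toy record are invented for illustration; in practice a dedicated parser such as Biopython's `SeqIO` would be more robust.

```python
import re

# GenBank feature keys start in column 6 (five leading spaces); qualifier lines
# start in column 22, so they never match this pattern.
CDS_LINE = re.compile(r"^ {5}CDS {2,}", re.MULTILINE)


def count_cds(genbank_text: str) -> int:
    """Count CDS feature entries in the FEATURES table of a GenBank flat-file record."""
    return len(CDS_LINE.findall(genbank_text))


toy_record = (  # invented two-gene record, for illustration only
    "FEATURES             Location/Qualifiers\n"
    "     source          1..300\n"
    "     gene            1..90\n"
    '                     /gene="geneA"\n'
    "     CDS             1..90\n"
    '                     /gene="geneA"\n'
    "     CDS             complement(120..290)\n"
    '                     /gene="geneB"\n'
)
print(count_cds(toy_record))  # 2
```

Because this counts CDS features, a spliced coding sequence written as a single `join(...)` location is counted once.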
Names of Herpesviridae Proteins Common to All 9 Herpesviruses Based on Strict Ortholog Groups.
| SOG name | HSV1/HSV2 | HSV1/HSV2 alias | VZV | HCMV | HHV-6A | HHV-6B | HHV-7 | EBV | KSHV | Domain architecture |
|---|---|---|---|---|---|---|---|---|---|---|
| Uracil-DNA glycosidase_ABG | UL2 |  | ORF59 | UL114 | U81 | U81 | U81 | BKRF3 | ORF46 | UDG |
| Helicase-primase ATPase subunit_ABG | UL5 |  | ORF55 | UL105 | U77 | U77 | U77 | BBLF4 | ORF44 | Herpes_Helicase |
| Glycoprotein M_ABG | UL10 |  | ORF50 | UL100 | U72 | U72 | U72 | BBRF3 | ORF39 | Herpes_glycop |
| Alkaline deoxyribonuclease_ABG | UL12 |  | ORF48 | UL98 | U70 | U70 | U70 | BGLF5 | ORF37 | Herpes_alk_exo |
| Serine threonine protein kinase_ABG | UL13 |  | ORF47 | UL97 | U69 | U69 | U69 | BGLF4 | ORF36 | UL97 Pfam domain (Beta) |
| Terminase_ABG | UL15 |  | ORF42 | UL89 | U66/U60 | U66/U60 | U66/U60 | LMP2 | ORF29 | DNA_pack_N––DNA_pack_C |
| Tegument protein_ABG | UL16 |  | ORF44 | UL94 | U65 | U65 | U65 | BGLF2 | ORF33 | Herpes_UL16 |
| Capsid transport tegument protein_ABG | UL17 |  | ORF43 | UL93 | U64 | U64 | U64 | BGLF1 | ORF32 | Herpes_UL17 |
| Triplex dimer protein_ABG | UL18 | VP23 | ORF41 | UL85 | U56 | U56 | U56 | BDLF1 | ORF26 | Herpes_V23 |
| Major capsid protein_ABG | UL19 | VP5/ICP5 | ORF40 | UL86 | U57 | U57 | U57 | BcLF1 | ORF25 | Herpes_MCP |
| Glycoprotein H_ABG | UL22 |  | ORF37 | UL75 | U48 | U48 | U48 | BXLF2 | ORF22 | Herpes_glycop_H |
| UL24 Protein_ABG | UL24 |  | ORF35 | UL76 | U49 | U49 | U49 | BXRF1 | ORF20 | Herpes_UL24 |
| Portal capping protein_ABG | UL25 |  | ORF34 | UL77 | U50 | U50 | U50 | BVRF1 | ORF19 | Herpes_UL25 |
| Protease-scaffolding protein_ABG | UL26 |  | ORF33 | UL80 | U53 | U53 | U53 | BVRF2 | ORF17 | Peptidase_S21 |
| Terminase DNA binding subunit_ABG | UL28 |  | ORF30 | UL56 | U40 | U40 | U40 | BALF3 | ORF7 | PRTP |
| Major DNA binding protein_ABG | UL29 | ICP8 | ORF29 | UL57 | U41 | U41 | U41 | BALF2 | ORF6 | Viral_DNA_bp |
| Nuclear egress lamina protein_ABG | UL31 |  | ORF27 | UL53 | U37 | U37 | U37 | BFLF2 | ORF69 | Herpes_UL31 |
| Capsid transport nuclear protein_ABG | UL32 |  | ORF26 | UL52 | U36 | U36 | U36 | BFLF1 | ORF68 | Herpes_env |
| Terminase binding protein_ABG | UL33 |  | ORF25 | UL51 | U35 | U35 | U35 | BFRF1A | ORF67A | Herpes_UL33 |
| Nuclear egress membrane protein_ABG | UL34 |  | ORF24 | UL50 | U34 | U34 | U34 | BFRF1 | ORF67 | Herpes_U34 |
| Triplex monomer_ABG | UL38 | VP19c | ORF20 | UL46 | U29 | U29 | U29 | BORF1 | ORF62 | Herpes_VP19C |
| Deoxyuridine 5′-triphosphate nucleotidohydrolase_ABG | UL50 |  | ORF8 | UL72 | U45 | U45 | U45 | BLLF3 | ORF54 | dUTPase |
| DNA primase_ABG | UL52 |  | ORF6 | UL70 | U43 | U43 | U43 | BSLF1 | ORF56 | Herpes_UL52 |
| Portal protein_ABG.ABG | UL6 |  | ORF54 | UL104 | U76 | U76 | U76 | LMP2 | ORF43 | Herpes_UL6 |
| Encapsidation and egress protein_ABG.ABG | UL7 |  | ORF53 | UL103 | U75 | U75 | U75 | BBRF2 | ORF42 | Herpes_UL7 |
| Encapsidation and egress protein_ABG.g |  |  |  |  |  |  |  |  | ORF43 | Herpes_UL6––Herpes_UL7 |
| Helicase primase subunit_ABG.ABg | UL8 |  | ORF52 | UL102 | U74 | U74 | U74 | BBLF2/BBLF3 | ORF41 | Herpes_HEPA |
| Helicase primase subunit_ABG.g |  |  |  |  |  |  |  |  | ORF40 | Herpes_HEPA––Herpes_heli_pri |
| Glycoprotein B_ABG.AbG | UL27 |  | ORF31 |  | U39 | U39 | U39 | BALF4 | ORF8 | Glycoprotein_B |
| Glycoprotein B_ABG.b |  |  |  | UL55 |  |  |  |  |  | HCMVantigenic_N––Glycoprotein_B |
| DNA polymerase_ABG.a | UL30 |  |  |  |  |  |  |  |  | DNA_pol_B_exo1––DNA_pol_B––DNAPolymera_Pol |
| DNA polymerase_ABG.aBG |  |  | ORF28 | UL54 | U38 | U38 | U38 | BALF5 | ORF9 | DNA_pol_B_exo1––DNA_pol_B |
| Large tegument protein_ABG.A | UL36 | VP1–2 | ORF22 |  |  |  |  |  |  | Herpes_teg_N––Herpes_UL36 |
| Large tegument protein_ABG.BG |  |  |  | UL48 | U31 | U31 | U31 | BPLF1 | ORF64 | Herpes_teg_N |
| Ribonucleotide reductase large subunit_ABG.AG | UL39 | ICP6 | ORF19 |  |  |  |  | BORF2 | ORF61 | Ribonuc_red_lgN––Ribonuc_red_lgC |
| Ribonucleotide reductase large subunit_ABG.B |  |  |  | UL45 | U28 | U28 | U28 |  |  | Ribonuc_red_lgC |
| Multifunctional regulator of expression_ABG.a | UL54 | ICP27 |  |  |  |  |  |  |  | HHV-1_VABD––Herpes_UL69 |
| Multifunctional regulator of expression_ABG.aBG |  |  | ORF4 | UL69 | U42 | U42 | U42 | BSLF2/BMLF1 | ORF57 | Herpes_UL69 |
| Cytoplasmic egress facilitator-1_A | UL51 |  | ORF7 |  |  |  |  |  |  | Herpes_UL51 |
| Cytoplasmic egress facilitator-1_BG |  |  |  | UL71 | U44 | U44 | U44 | BSRF1 | ORF55 | Herpes_U44 |
| Cytoplasmic egress facilitator-2_A | UL21 |  | ORF38 |  |  |  |  |  |  | Herpes_UL21 |
| Cytoplasmic egress facilitator-2_B |  |  |  | UL88 | U59 | U59 | U59 |  |  | Herpes_U59 |
| Cytoplasmic egress facilitator-2_G |  |  |  |  |  |  |  | BTRF1 | ORF23 | Herpes_BTRF1 |
| Cytoplasmic egress tegument protein_A | UL11 |  | ORF49 |  |  |  |  |  |  | UL11 |
| Cytoplasmic egress tegument protein_CMV |  |  |  | UL99 |  |  |  |  |  |  |
| Cytoplasmic egress tegument protein_G |  |  |  |  |  |  |  | BBLF1 | ORF38 | DUF2733 |
| Cytoplasmic egress tegument protein_R |  |  |  |  | U71 | U71 | U71 |  |  |  |
| DNA polymerase processivity subunit_A | UL42 |  | ORF16 |  |  |  |  |  |  | Herpes_UL42––Herpes_UL42 |
| DNA polymerase processivity subunit_B |  |  |  | UL44 | U27 | U27 | U27 |  |  | Herpes_PAP |
| DNA polymerase processivity subunit_G |  |  |  |  |  |  |  | BMRF1 | ORF59 | Herpes_DNAp_acc |
| Tegument protein UL14_A | UL14 |  | ORF46 |  |  |  |  |  |  | Herpes_UL14 |
| Tegument protein UL14_BG |  |  |  | UL95 | U67 | U67 | U67 | BGLF3 | ORF34 | Herpes_UL95 |
| Glycoprotein L_A.a |  |  | ORF60 |  |  |  |  |  |  | Herpes_UL1 |
| Glycoprotein L_A.S | UL1 |  |  |  |  |  |  |  |  | Herpes_UL1––GlyL_C |
| Glycoprotein L_B |  |  |  | UL115 | U82 | U82 | U82 |  |  | Cytomega_gL |
| Glycoprotein L_G |  |  |  |  |  |  |  | BKRF2 | ORF47 | Phage_glycop_gL |
| Glycoprotein N_A | UL49A |  |  |  |  |  |  |  |  |  |
| Glycoprotein N_BG.b |  |  |  | UL73 |  |  |  |  |  | UL73_N––Herpes_UL73 |
| Glycoprotein N_BG.BG |  |  |  | UL73 | U46 | U46 | U46 | BLRF1 | ORF53 | Herpes_UL73 |
| Glycoprotein N_a |  |  | ORF9A |  |  |  |  |  |  | Herpes_UL49_5 |
| Inner tegument protein UL37_A | UL37 |  | ORF21 |  |  |  |  |  |  | Herpes_UL37_1 |
| Inner tegument protein UL37_BG |  |  |  | UL47 | U30 | U30 | U30 | BOLF1 | ORF63 | Herpes_U30 |
| Small capsid protein_A | UL35 | VP26 | ORF23 |  |  |  |  |  |  | Herpes_UL35 |
| Small capsid protein_B |  |  |  | UL48A | U32 | U32 | U32 |  |  | HV_small_capsid |
| Small capsid protein_G |  |  |  |  |  |  |  | BFRF3 | ORF65 | Herpes_capsid |
Abbreviations.
ICP: infected cell protein.
VP: virion protein.
Fig. 1. Proteins with conserved domain architectures that mirror the herpesvirus species tree. (A) A current view of herpesvirus evolution. The human herpesvirus species tree is based on previous reports (McGeoch et al., 2000, McGeoch et al., 1995, Davison, 2010, Davison, 2002). (B) Maximum likelihood gene tree for uracil-DNA glycosylase proteins, based on an alignment of UDG Pfam domain amino acid sequences. (C) Maximum likelihood gene tree for capsid scaffolding protein proteases, based on an alignment of Peptidase_S21 Pfam domain amino acid sequences. Bootstrap values are shown on the gene trees. Branch lengths are proportional to the expected number of changes per site.
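Bootstrap values like the ones on these gene trees are conventionally obtained in three steps. First, alignment columns are resampled with replacement. Second, a tree is inferred from each pseudo-replicate alignment. Third, support for a clade is reported as the percentage of replicate trees that contain it. The sketch below covers only the resampling and the support calculation; tree inference is left out. The function names are hypothetical, and the sequences and replicate clades are toy data, not values from this study.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One nonparametric bootstrap replicate: resample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n = lengths.pop()
    cols = [rng.randrange(n) for _ in range(n)]  # the same column draw is applied to every sequence
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(clade: frozenset[str], replicate_clades: list[set[frozenset[str]]]) -> float:
    """Percentage of replicate trees (each given as its set of clades) that contain `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


aln = {"HSV1": "MKV-LA", "HSV2": "MKVALA", "VZV": "MRV-LS"}  # toy alignment
replicate = bootstrap_replicate(aln, random.Random(1))
# Toy clade sets standing in for trees inferred from four replicates.
replicate_clades = [
    {frozenset({"HSV1", "HSV2"})},
    {frozenset({"HSV1", "HSV2"})},
    {frozenset({"HSV2", "VZV"})},
    {frozenset({"HSV1", "HSV2"})},
]
print(bootstrap_support(frozenset({"HSV1", "HSV2"}), replicate_clades))  # 75.0
```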
Fig. 2. Proteins that acquired an additional domain during the course of evolution. (A) Maximum likelihood gene tree for glycoprotein B proteins, based on an alignment of the main Glycoprotein_B domain amino acid sequences. (B) Maximum likelihood gene tree for DNA polymerase proteins, based on an alignment of DNA_pol_B_exo1––DNA_pol_B domain amino acid sequences. (C) Maximum likelihood gene tree for multifunctional regulator of expression proteins, based on an alignment of Herpes_UL69 domain amino acid sequences. Bootstrap values larger than 50 are shown. Branch lengths are proportional to the expected number of changes per site.
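Each of the three panels corresponds to a pair of architectures in the ortholog table in which one subgroup carries an extra domain. The sketch below shows that comparison in simplified form. It treats an architecture as an ordered list of Pfam names joined by the table's `––` separator and ignores domain positions. The function names are hypothetical and not part of DAIO.

```python
from collections import Counter

SEP = "––"  # two en dashes, the separator used in the table's domain-architecture column


def parse_architecture(arch: str) -> list[str]:
    """Split an architecture string into its ordered Pfam domain names."""
    return [d.strip() for d in arch.split(SEP) if d.strip()]


def added_domains(reference: str, variant: str) -> list[str]:
    """Domains in `variant` not accounted for by `reference` (repeats counted), in variant order."""
    remaining = Counter(parse_architecture(reference))
    extra = []
    for domain in parse_architecture(variant):
        if remaining[domain] > 0:
            remaining[domain] -= 1  # matched by a domain of the reference architecture
        else:
            extra.append(domain)    # absent from the reference, or present there fewer times
    return extra


# Architecture pairs taken from the ortholog table (panels A, B, C).
print(added_domains("Glycoprotein_B", "HCMVantigenic_N––Glycoprotein_B"))
print(added_domains("DNA_pol_B_exo1––DNA_pol_B", "DNA_pol_B_exo1––DNA_pol_B––DNAPolymera_Pol"))
print(added_domains("Herpes_UL69", "HHV-1_VABD––Herpes_UL69"))
```

Counting repeats means an internal duplication such as Herpes_UL42––Herpes_UL42 is reported as one added Herpes_UL42 domain relative to a single-domain reference.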
Fig. 3. Examples of proteins that are annotated as performing the same, or a very similar, function but are unrelated or only very distantly related. (A, B, C) Maximum likelihood gene trees for DNA polymerase processivity factor proteins from Alpha-, Beta-, and Gammaherpesvirinae, based on alignments of Herpes_UL42 (A), Herpes_PAP (B), and Herpes_DNAp_acc (C) domain amino acid sequences, respectively. The internal domain duplication at the root of the Herpes_UL42 tree is shown as a red square. (D, E, F) Maximum likelihood gene trees for gL proteins from human Alpha-, Beta-, and Gammaherpesvirinae, based on alignments of Herpes_UL1 (D), Cytomega_gL (E), and Phage_glycop_gL (F) domain amino acid sequences, respectively. Bootstrap support values are shown. Branch lengths are proportional to the expected number of changes per site.
Fig. 4. Gene tree for human herpesvirus proteins with a 7-transmembrane receptor domain. This maximum likelihood tree is based on an alignment of 7tm_1 domain amino acid sequences. Bootstrap values are shown. Branch lengths are proportional to the expected number of changes per site. Red squares indicate gene duplications.
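The red squares mark inferred gene duplications. A common way to infer duplications is to reconcile a rooted gene tree with the species tree. Each gene-tree node is mapped to the lowest common ancestor (LCA) of the species below it, and a node counts as a duplication when it maps to the same species-tree node as one of its children. The sketch below implements that rule and assumes binary, rooted trees written as nested tuples. It is not presented as DAIO's exact procedure. The function names are hypothetical, and the toy species tree contains only beta- and gammaherpesviruses.

```python
from typing import Union

# A rooted binary tree: a leaf is a species name, an internal node a 2-tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    """Species names at the leaves of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(species_tree: Tree) -> list:
    """Leaf sets of every node in the species tree (each node identified by its clade)."""
    out = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            out.extend(clades(child))
    return out


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count gene-tree nodes that map to the same species node as one of their children."""
    all_clades = clades(species_tree)

    def lca(species: frozenset) -> frozenset:
        # Clades containing `species` are nested, so the smallest one is the LCA.
        return min((c for c in all_clades if species <= c), key=len)

    def visit(node: Tree) -> tuple:
        # Returns (LCA mapping of `node`, duplications in its subtree).
        if isinstance(node, str):
            return lca(frozenset([node])), 0
        (m1, d1), (m2, d2) = visit(node[0]), visit(node[1])
        mapping = lca(m1 | m2)
        is_dup = mapping == m1 or mapping == m2
        return mapping, d1 + d2 + int(is_dup)

    return visit(gene_tree)[1]


SPECIES = (("HCMV", (("HHV-6A", "HHV-6B"), "HHV-7")), ("EBV", "KSHV"))
print(count_duplications((("HCMV", "HCMV"), ("HHV-6A", "HHV-7")), SPECIES))  # 1
```

In the usage line, the two HCMV paralogs map to the same species node as their parent, so exactly one duplication is reported.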
Fig. 5. Gene tree for human herpesvirus proteins with US22 domain(s). This maximum likelihood tree is based on an alignment of full-length protein sequences. Pfam domains are shown using an E-value cutoff of 10⁻¹. Bootstrap values larger than 50 are shown. Branch lengths are proportional to the expected number of changes per site. Red rectangles indicate the US22 domains, which are sometimes duplicated. Green rectangles indicate the locations of Herpes_U5 domains.
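The E-value cutoff determines which Pfam hits count toward the architecture drawn for each protein. The sketch below shows that filtering step and assumes the hits are already available as (name, start, end, E-value) records. The `DomainHit` type, the coordinates, and the E-values are all invented, and overlapping hits are not resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One hypothetical Pfam hit on a protein (coordinates are 1-based residues)."""
    name: str
    start: int
    end: int
    evalue: float


def architecture(hits: list[DomainHit], max_evalue: float = 1e-1) -> str:
    """Join the names of hits with E-value <= max_evalue, ordered N- to C-terminal."""
    kept = sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.start)
    return "––".join(h.name for h in kept)


hits = [
    DomainHit("US22", 250, 360, 2e-5),
    DomainHit("US22", 40, 150, 3e-8),
    DomainHit("Herpes_U5", 400, 470, 0.5),
]
print(architecture(hits))                  # US22––US22
print(architecture(hits, max_evalue=1.0))  # US22––US22––Herpes_U5
```

Treating the cutoff as inclusive (≤) is an assumption of this sketch. With the default cutoff, the weak Herpes_U5 hit is dropped and only the duplicated US22 domains remain.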
Fig. 6. SOG data in the Virus Pathogen Resource (ViPR, www.viprbrc.org). (A) An example of a protein ortholog group search result. Clicking on the “Total # of Proteins” table entries allows users to view and download the individual protein sequences belonging to a given SOG. (B) The annotations of an individual protein from the Human herpesvirus 1 KOS strain (Simplexvirus “DNA polymerase_ABG.a” in this example), including its SOG name and HMM/Pfam domain architecture, are shown.