Pathogenicity-associated protein domains: The fiercely-conserved evolutionary signatures.

Seema Patel

Abstract

Proteins have highly conserved domains that determine their function. Out of the thousands of domains discovered so far across all living forms, some of the predominant clinically relevant domains include IENR1, HNHc, HELICc, Pro-kuma_activ, Tryp_SPc, Lactamase_B, PbH1, ChtBD3, CBM49, acidPPc, G3P_acyltransf, RPOL8c, KbaA, HAMP, HisKA, Hr1, Dak2, APC2, Citrate_ly_lig, DALR, VKc, YARHG, WR1, PWI, ZnF_BED, TUDOR, MHC_II_beta, Integrin_B_tail, Excalibur, DISIN, Cadherin, ACTIN, PROF, Robl_LC7, MIT, Kelch, GAS2, B41, Cyclin_C, Connexin_CCC, OmpH, Bac_rhodopsin, AAA, Knot1, NH, Galanin, IB, Elicitin, ACTH, Cache_2, CHASE, AgrB, PRP, IGR, and Antimicrobial21. These domains are distributed in nucleases/helicases, proteases, esterases, lipases, glycosylases, GTPases, phosphatases, methyltransferases, acyltransferases, acetyltransferases, polymerases, kinases, ligases, synthetases, oxidoreductases, protease inhibitors, nucleic acid-binding proteins, adhesion- and immunity-related proteins, cytoskeletal component-manipulating proteins, lipid biosynthesis and metabolism proteins, membrane-associated proteins, hormone-like and signaling proteins, etc. These domains are ubiquitous stretches or folds in the proteins of pathogens and allergens. Efforts to alleviate pathogenesis can benefit enormously from knowing the characteristics of these domains. Hence, this review catalogs and discusses the roles of such pivotal domains, suggesting hypotheses for a better understanding of pathogenesis at the molecular level.
© 2017 Elsevier Inc. All rights reserved.

Keywords:  CARDs, caspase activation and recruitment domains; CBM, carbohydrate binding module; CTD, C-terminal domain; ChtBD, chitin-binding domain; Diversification; HNHc, homing endonucleases; HTH, helix-turn-helix; IENR1, intron-encoded endonuclease repeat; Immune manipulation; PAMPs, pathogen associated molecular patterns; Pathogenesis; Phylogenetic conservation; Protein domains; SMART, Simple Modular Architecture Research Tool; Shuffling; UDG, uracil DNA glycosylase

Year:  2017        PMID: 32363241      PMCID: PMC7185390          DOI: 10.1016/j.genrep.2017.04.004

Source DB:  PubMed          Journal:  Gene Rep        ISSN: 2452-0144


Introduction

Proteins are prone to stress-driven modifications in their primary sequences (Marchler-Bauer et al., 2014). Owing to this accumulated heterogeneity, homology analysis tools such as the Basic Local Alignment Search Tool (BLAST) assign low identity scores even to closely related proteins (Pearson, 2013). In addition, in vitro conditions prevent the expression of some pathogenically critical proteins, which can mislead interpretation. Such experimentally undetected proteins are termed ‘hypothetical proteins’ and are dismissed as unimportant from lists of potential drug targets. Overcoming these gaps calls for a different mode of analysis. Proteins have strictly conserved motifs that can reveal their phylogenetic links, diversification paths and functions (Marchler-Bauer et al., 2014). These conserved sites or folds are termed domains (Marchler-Bauer et al., 2014). Protein databases vary in the number and nomenclature of the domains they catalog. The SMART (Simple Modular Architecture Research Tool) database catalogs more than a thousand protein domains (Ponting et al., 1999), which belong to different categories of protein. In silico analysis of several viral, bacterial and allergen (of animal and plant origin) proteins has identified frequently occurring domains. A majority of these pathogenesis-mediating domains are shared between pathogens and allergens. It is important to understand these crucial domains that facilitate the establishment of pathogenesis, as they are potential druggable targets. This review discusses some of these protein domains that manipulate human components and lead to morbidity and mortality.
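Why divergent sequences score poorly can be made concrete with percent identity, the share of identical residues in an alignment. The minimal Python sketch below assumes the two sequences are already aligned (gaps written as '-') and divides identical positions by the full alignment length, gap columns included, which is the denominator BLAST uses in its "Identities" line; other tools use other conventions. The function names and the two toy peptides are hypothetical illustrations, not sequences from the studies cited here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences, with '-' marking gaps.

    Identical residue pairs are divided by the full alignment length,
    gap columns included (the denominator of BLAST's "Identities" line).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def shared_positions(aligned_a: str, aligned_b: str) -> list[int]:
    """1-based alignment columns where both sequences carry the same residue."""
    return [
        i
        for i, (x, y) in enumerate(zip(aligned_a, aligned_b), start=1)
        if x == y and x != "-"
    ]


# Two hypothetical toy peptides, already aligned (no gaps needed here).
seq_a = "MKHNHLEVAGRSTWQ"
seq_b = "MRHNHIDVSGKSTFQ"
print(percent_identity(seq_a, seq_b))  # 60.0
print(shared_positions(seq_a, seq_b))  # [1, 3, 4, 5, 8, 10, 12, 13, 15]
```

Here 9 of 15 aligned positions match, so the pair scores only 60% identity even though both keep an invariant His-Asn-His stretch at positions 3 to 5; a motif- or domain-level comparison would flag that conserved stretch directly.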

Domains and their functions

For ease of understanding, the frequently appearing domains can be clustered into categories, although the boundaries are not crisp and often overlap. Many proteins are modular, containing domains that belong to more than one of the six enzyme classes: hydrolases, transferases, lyases, oxidoreductases, isomerases and ligases (Cai and Chou, 2005). Other proteins and peptides harboring critical domains include protease inhibitors, nucleic acid-binding proteins, adhesion- and immunity-related proteins, proteins that manipulate cytoskeletal proteins, lipid biosynthesis and metabolism proteins, membrane-associated proteins, hormone-like and signaling proteins, etc. The relevance of these frequently appearing domains is discussed below.

Hydrolase

Nuclease/helicase

HNHc (histidine-asparagine-histidine) is a domain found in homing endonucleases (DNA- and RNA-targeting enzymes), inteins and introns. Homing endonucleases are vital for recombination, genome rearrangement and virulence (Mehta et al., 2004). This domain of roughly 50 aa has conserved Asn and His residues (Veluchamy et al., 2009). The IENR1 (intron-encoded endonuclease repeat) domain, located at the C terminus of HNH family nucleases, comprises an HTH (helix-turn-helix) subdomain and a globular ββααβ fold (as in type II KH domains) and binds DNA (Landthaler and Shub, 2003, Oddone et al., 2007). The YaeQ domain is a variant of the PD-(D/E)XK motif of nucleases (and phosphodiesterases) and generally occurs in hypothetical proteins (Guzzo et al., 2007). This domain shows homology to the transcription elongation protein RfaH and can compensate for its activity (Wong et al., 1998). YqgFc is a ribonuclease domain present in ribosomal and RNA-associated proteins (Jin and Pawson, 2012). YqgF proteins are substitutes for RuvC, the nuclease that resolves Holliday junctions during recombination (Bennett et al., 1993). Spt6p, a key chromatin regulator of Saccharomyces cerevisiae, has a domain homologous to YqgF (Ponting, 2002). HELICc (helicase domain near the C terminus) is part of RNA helicases such as RIG-I (retinoic acid-inducible gene I), MDA5 (melanoma differentiation-associated gene 5) and LGP2 (laboratory of genetics and physiology 2) (Zou et al., 2009). This domain is involved in the recognition of viral PAMPs (pathogen-associated molecular patterns) (Bhat et al., 2015). It co-occurs with other critical domains such as DExD/H (found in proteins instrumental in controlling splicing fidelity) and CARDs (caspase activation and recruitment domains) (Liu and Cheng, 2015, Zou et al., 2009).

Protease

Pro-kuma_activ (named after pro-kumamolisin, an extracellular proteinase) is a propeptide with an α/β sandwich fold, present in proteases (such as trypsin, M28 peptidases and pyrolysins), whose cleavage activates the enzyme (Comellas-Bigler et al., 2004, Muszewska et al., 2011). Tryp_SPc (trypsin-like serine protease) domains are present in serine protease zymogens, which are activated by partial cleavage. Serine proteases serve as offensive and defensive proteins across living organisms, from viruses and bacteria to plants, invertebrates and humans, occurring as surface proteins, secreted molecules, plant latex components, digestive enzymes, venom gland components, etc. (Gasteiger et al., 2003, Tripathi and Sowdhamini, 2008). This family, which encompasses trypsin, chymotrypsin, collagenase, elastase, thrombin and subtilisin, among others, performs a plethora of functions such as digestion, blood coagulation, immune regulation, protein metabolism and apoptosis (Di Cera, 2009, Gohara and Di Cera, 2011). Aberrant activation of these enzymes leads to inflammation, neural diseases and cancers (Jirásková-Vaníčková et al., 2011, Patel, 2017a, Patel, 2017b).

Esterase

Lactamase_B is a domain in metal-dependent hydrolases, which include β-lactamases, thiolesterases, members of the glyoxalase II family, glutathione hydrolases and competence proteins (Bradford, 2001). Most Lactamase_B-containing proteins bind two zinc ions as cofactors, and those acting as β-lactamases confer resistance to β-lactam antibiotics (Kong et al., 2010). A majority of the bacterial proteins containing this domain are hypothetical proteins (van Tonder et al., 2014). The absence of suitable in vitro conditions might explain the lack of experimental evidence for Lactamase_B-containing proteins.

Lipase

The DDHD domain (named after four conserved residues) contains conserved Asp and His residues, modification of which abolishes phospholipase and membrane trafficking activity (Inoue et al., 2012). The DDHD domain-containing phospholipase A1 family proteins are required for organelle biogenesis and brain function (Inoue et al., 2012, Yamashita et al., 2010). Mutations affecting this domain have been associated with hereditary spastic paraplegia, a neurological disease characterized by slowly progressive weakness (Gonzalez et al., 2013). The acidPPc (acid phosphatase) domain is present in phosphatidate phosphatase, a critical enzyme that acts on phosphate monoesters, liberating diacylglycerol and inorganic phosphate (Carman and Han, 2006). This domain has been detected in HCV and in virulent strains of dengue virus. In humans, perturbation of this enzyme has been associated with diseases such as prostate cancer and osteoporosis, among other pathologies (Araujo and Vihko, 2013).

Glycosylase

The UDG (uracil DNA glycosylase) domain occurs in proteins of the uracil DNA glycosylase superfamily. These enzymes remove uracil (generated by deamination of cytosine) from DNA, preventing mutations and errors in the transfer of genetic information (Lucas-Lledó et al., 2011). A study on Vaccinia virus revealed that uracil DNA glycosylase is crucial for viral DNA replication (De Silva and Moss, 2003). In silico analysis has also detected the UDG domain in hepatitis C virus (HCV). PbH1 (parallel beta-helix repeats) are motifs present in many carbohydrate-cleaving enzymes such as pectate lyases and rhamnogalacturonases (Heffron et al., 1998). These repeats are present in SHCBP1, a protein that binds centralspindlin (a complex of the motor protein MKLP1 and the GTPase-activating protein MgcRacGAP) and is involved in the initiation of cytokinesis (Asano et al., 2014, Pavicic-Kaltenbrunner et al., 2007). They are also abundant in the exopolygalacturonase allergens of Platanus acerifolia (London plane tree) and in pectate lyase 1 of Juniperus ashei (Ashe juniper). These repeats occur in highly N-glycosylated proteins and are involved in the recognition and/or modification of carbohydrate moieties (Heffron et al., 1998). Polyductin, the product of the pkhd1 (polycystic kidney and hepatic disease 1) gene, contains these repeats; defects in polyductin cause autosomal recessive polycystic kidney disease (ARPKD) (Onuchic et al., 2002), and the protein has been associated with kidney disease (Igarashi, 2002, Menezes and Onuchic, 2006) and congenital hepatic fibrosis (Gunay-Aygun et al., 2010). ChtBD3 (chitin-binding domain type 3) has been associated with host pathogenesis (Tran et al., 2011). ChtBD3 is present in some Ebola virus strains (such as some isolates from the Mayinga-76 outbreak and isolate A0A0F7IMH5 from the Liberia-14 outbreak) as well as in dengue virus serotype 3 strains (Messina et al., 2014).
A number of pathogenic bacteria, including Vibrio cholerae, produce the enzyme chitin oligosaccharide deacetylase, which contains a ChtBD3 domain, and the role of this domain in virulence is well substantiated. CBM49 is a carbohydrate-binding module (CBM) found at the C terminus of cellulases (Guillén et al., 2010, Shoseyov et al., 2006). The binding of CBM domains to complex glycans has been linked to pathogenesis. Some dengue virus serotype 2 isolates, such as P14337 and Q9WDA6, contain a CBM49, whereas isolate Q9WDA6 contains a CBM25 (a starch-binding domain found in bacterial amylases).

Guanosine triphosphatases

RAB is a domain in the Rab subfamily of small guanosine triphosphatases (GTPases) (Diekmann et al., 2011). These proteins are widely distributed, some in a tissue-specific manner, and direct vesicle trafficking across membranes to their destined targets. These GTPases interact with numerous other components, such as sorting adaptors, tethering factors, kinases and phosphatases, to ensure proper vesicular transport; defects in this process can lead to immunodeficiencies, inflammation, neural pathologies and cancers (Stenmark, 2009). RUN is an N-terminal domain present in proteins that interact with Ras-like GTPases (especially members of the Rap and Rab families), and thus plays a role in signaling pathways (Callebaut et al., 2001, Terawaki et al., 2015). Proteins harboring this domain regulate cytoskeletal organization, autophagy, endocytosis and endosomal maturation, functions that suggest a role in pathogenesis. Further, this domain is often associated with the DUF4206 domain (Callebaut et al., 2001, Patel and Côté, 2013). DUFs (domains of unknown function), as their name suggests, are divergent, poorly annotated domains (Goodacre et al., 2014). Tubulin is a domain in tubulin proteins, which belong to the GTPase family and polymerize into microtubules (Prigozhina et al., 2001). Tubulins show considerable heterogeneity at their C-terminal ends (Redeker et al., 1992). Bacteria have a tubulin homolog, FtsZ (filamenting temperature-sensitive mutant Z), which plays a role in cell division. FtsZ is recruited to the membrane by the actin-related protein FtsA, and together the two proteins form the Z ring, initiating bacterial cytokinesis (Loose and Mitchison, 2014). EFh (EF-hand) domains are Ca2+-binding α-helical domains of Miro GTPases, the Ca2+ sensors that maintain mitochondrial homeostasis (Suzuki et al., 2014). Trematode tegument proteins carry this domain, which has been shown to have Ig (immunoglobulin)-binding properties (Wu et al., 2015).
ARF (ADP-ribosylation factor) domains are present in ARF GTPases, members of the Ras superfamily, and their homologues. This domain is involved in post-Golgi vesicular transport (Boman et al., 2002). The tyrosine kinase Pyk2 regulates Arf1 activity through the protein ASAP1 (an Arf GTPase-activating protein) (Inoue et al., 2008, Kruljac-Letunic et al., 2003).

Transferases

PreSET domains are the N-terminal, cysteine-rich, Zn2+-binding part of SET (Su(var)3–9, Enhancer-of-zeste, Trithorax) domains in histone lysine methyltransferases (HMTases) (Binda et al., 2010, Dillon et al., 2005). The PreSET domain has been detected in plant pollen allergens (such as Lig v and Bet v). The G3P_acyltransf domain is present in glycerol-3-phosphate acyltransferase, a rate-limiting enzyme in triacylglycerol biosynthesis (Wendel et al., 2009). This enzyme is required for the host immune response, as observed in Coxsackievirus-infected mice (Karlsson et al., 2009). Some aggressive viral pathogens, such as dengue virus serotype 2 isolates, lack this domain, although other serotypes harbor it. CAT (chloramphenicol acetyltransferase) is a domain of the trimeric bacterial enzyme chloramphenicol acetyltransferase, which inactivates the antibiotic chloramphenicol and thereby confers drug resistance (Biswas et al., 2012, Yao et al., 1999). The CTD (C-terminal domain) of RNA polymerase II plays a role in pre-mRNA processing, including splicing, via phosphorylation (at Tyr residues) (Millhouse and Manley, 2005). Conversely, transcription termination can occur through dephosphorylation of Tyr residues in the CTD (Schreieck et al., 2014). The CTD interacts with the transcription elongation factor complex SPT4/SPT5 to regulate transcription (Dürr et al., 2014). RPOL8c corresponds to a subunit shared by RNA polymerases I, II and III, with roles in transcription (Cramer et al., 2001) and microRNA gene regulation (Wang et al., 2010).

Kinases

KbaA is a key domain of the protein required for activation of the KinB signaling pathway during sporulation. The KinA kinase regulates the initiation of sporulation in Bacillus subtilis by controlling phosphate supply to the phosphorelay system (Dartois et al., 1996). Based on this literature, the KbaA domain in dengue virus might also be involved in signaling pathways. Although most dengue virus isolates contain this domain, it is missing in the isolate P27909. The TPK_B1_binding domain of thiamine pyrophosphokinase binds vitamin B1 (thiamine) while the enzyme transfers a pyrophosphate group from ATP to it, forming the coenzyme thiamine pyrophosphate (Baker et al., 2001). This coenzyme is required for the activity of cytosolic transketolase and of the mitochondrial enzymes that oxidatively decarboxylate pyruvate, α-ketoglutarate and the branched-chain α-keto acids derived from branched-chain amino acids (Mayr et al., 2011). TyrKc is the catalytic domain of tyrosine-specific protein kinases, many of which are cell surface receptors. This domain often co-occurs with FN3 (fibronectin type III), IG (immunoglobulin) and Igc2 (immunoglobulin C-2 type) domains (Bernsel and Von Heijne, 2005). The UBA (ubiquitin-associated) domain is present at the C terminus of proteins such as p62, BMSC-UbP, HHR23A, Rad23 and SNF1-like kinases, and plays a role in inter- and intramolecular communication (Chang et al., 2006, Raasi et al., 2004). These proteins bind ubiquitin, which targets proteins for proteasomal degradation and helps maintain optimal protein levels in cells (Su and Lau, 2009). HAMP (Histidine kinases, Adenylyl cyclases, Methyl-accepting chemotaxis proteins, Phosphatases) domains are approximately 55 aa long and occur in signal-transducing proteins (Kishii et al., 2007). This domain, containing helices and coiled-coil regions, often undergoes conformational changes that relay signals for chemotaxis, pathogenesis and biofilm formation (Airola et al., 2013, Airola et al., 2010, Hulko et al., 2006, Matamouros et al., 2015).
The HWE_HK domain is present in HWE-type histidine kinases, which mediate environmental signaling (Galperin, 2005, Karniol and Vierstra, 2004, Lavín et al., 2007). HisKA, the phosphoacceptor domain of histidine kinases, is crucial for sensor kinases in pathogenic bacteria, including the plant pathogen Pseudomonas syringae (Willett and Kirby, 2012). It has also been detected in the pathogenic Ebola virus. The Hr1 (homology region 1) domain is the N-terminal part of Rho effectors, or protein kinase C-related serine/threonine kinases (PKN/PRK), which occur in multiple isozyme forms. PKN1 (protein kinase N1) isoforms abound in neural cells, playing roles in cytoskeletal organization and neuronal differentiation. Malfunction of PKN1 has been linked to neuropathologies such as amyotrophic lateral sclerosis (ALS) and Alzheimer's disease. The Hr1 domain interacts with the small GTPases Rho and Rac, regulating actin dynamics (Flynn et al., 1998, Thauerer et al., 2014, Watson et al., 2016). The Dak2 (di-Mg2+ ATP-binding) domain is found in the dihydroxyacetone kinase family, which helps bacteria incorporate host fatty acids into their membrane phospholipids through phosphotransferase activity (Kinch et al., 2005, Parsons et al., 2014). YARHG is a 70 amino acid-long extracellular domain named after the corresponding conserved motif in its sequence. This domain is detected in peptidases and kinases, and is predicted to bind the bacterial cell wall or adjacent components such as outer membrane lipids or lipopolysaccharide (Coggill et al., 2013, Coggill and Bateman, 2012).

Ligase and synthetases

APC2 (a cullin homology protein) is a subunit of the anaphase-promoting complex or cyclosome, a ubiquitin ligase that regulates phase transitions during mitosis (Puliyappadamba et al., 2011, Zhang et al., 2010, Zhou et al., 2011). Some dengue virus isolates (such as P14337 and P29990) contain this domain. Citrate_ly_lig is the C-terminal domain of the cytosolic enzyme citrate lyase ligase, which catalyzes the cleavage of citrate into acetyl-CoA and oxaloacetate, coupled with ATP hydrolysis. Apart from its role in lipid biosynthesis, this domain has been found to bind DNA (Meyer et al., 1997) and to contribute to tumor growth following acetylation of lysine residues (Lin et al., 2013, Zaidi et al., 2012). DALR is an anticodon-binding domain of arginyl- and cysteinyl-tRNA synthetases, made of α helices. In humans, the DALR domain-containing protein DALRD3 interacts with WDR6 (WD repeat domain 6) and C3orf60 (chromosome 3 open reading frame 60), which are involved in autophagy and protein assembly, respectively (Grinchuk et al., 2010, Schyth et al., 2015). The DALR_1 domain detected in pollen might play a role in manipulating gene expression. The DALR_2 domain is found in cysteinyl-tRNA synthetases, which attach the amino acid cysteine to its cognate transfer RNA (Tveit et al., 2014). VKc is the catalytic domain of vitamin K epoxide reductase. This enzyme regenerates vitamin K from vitamin K epoxide, a recycling step required for the vitamin K-dependent activation of blood coagulation factors (Oldenburg et al., 2006).

Protease inhibitors

WR1 is a domain in worm-specific repeat type 1 proteins. This cysteine-rich domain was detected in the nematode Caenorhabditis elegans (Marchler-Bauer et al., 2014); however, many pathogenic viruses possess this domain or its homologues. This domain often co-occurs with KU (BPTI/Kunitz family of serine protease inhibitors) domains.

Nucleic acid binding proteins

PWI (proline-tryptophan-isoleucine) domains are present in pre-mRNA processing components of the spliceosome and are known to bind both RNA and DNA (Szymczyna et al., 2003). PWI-like domains are present at the N terminus of helicases such as Brr2 (Absmeier et al., 2015). Zinc fingers are DNA-binding motifs of many types, such as BED (named after the Drosophila proteins BEAF and DREF), UBR1 (Ubiquitin Protein Ligase E3 Component N-Recognin 1), UBP (ubiquitin-binding domain), U1, LIM (named after the LIN-11, ISL-1 and MEC-3 proteins in Caenorhabditis elegans), TTF (transcription termination factor), DBF, CHCC, CDGSH (Cys-Asp-Gly-Ser-His), ZZ, PMZ (plant mutator transposase), and C4 (Gupta et al., 2012). ZnF_BED is a zinc finger domain in chromatin-boundary-element-binding proteins and transposases that is required for binding terminal inverted repeats (TIRs) and sub-terminal repeats, facilitating autonomous transposition (Smith et al., 2012). The ZnF_A20 domain at the N terminus of the ZNF216 protein is a zinc finger resembling that of A20, an inhibitor of cell death. Through crosstalk with the IKKgamma, RIP and TRAF6 proteins, this domain is involved in ubiquitin-mediated, IL-1-induced NF-kappaB activation, apoptosis and proteasomal degradation (Huang et al., 2004, Searle et al., 2012). ZnF_NFX (nuclear transcription factor, X-box binding-like 1) is a zinc finger domain found in several proteins, including the blast resistance protein Pi54 of rice (Gupta et al., 2012). Homologues of all these zinc finger motifs have been detected in pathogenic viruses such as HCV, HIV and dengue virus. ZM (ZASP-like motif, where ZASP stands for Z-band alternatively spliced PDZ-containing protein) is a pattern of about 26 aa in the α-actinin-binding protein ZASP and its homologues (Lin et al., 2014). The ZM domain plays a role in cytoskeletal protein-protein interactions and provides structural integrity to sarcomeres (Klaavuniemi et al., 2004).
Because numerous proteins involved in ion channel interactions, cytoplasmic and nuclear signaling, enzymatic reactions and cytoskeletal organization bind to the Z-line, mutations in ZASP lead to muscular diseases (Martinelli et al., 2014). Mutations and aberrant isoforms of ZASP can cause myofibrillar myopathy and cardiomyopathy, among other disorders. In humans, ZASP binds the mechanosensing protein Ankrd2 (ankyrin repeat domain 2) and the tumor suppressor protein p53. The TUDOR domain, a motif of about 60 aa, is present in RNA-binding proteins and is involved in RNA metabolism and interactions. Several copies have been detected in arthropods such as Drosophila (23 instances), and their epigenetic role in chromatin modification and gene expression has emerged (Altschul et al., 1997). Binding of these domains to methylated arginine/lysine residues, ligands, microRNPs, small RNAs and PIWI (named after P-element Induced WImpy testis in Drosophila) proteins has been reported. The literature documents their presence in fungi, protozoa, plants and metazoans (Ying and Chen, 2012), and in silico analyses suggest their presence in viruses as well. TUDOR domain-containing proteins 1, 4 and 5 are antigens expressed in testis cells and have been proposed as hallmarks of cancer (Yoon et al., 2011).

Adhesion and immunity-related proteins

Domains such as MHC_II_beta and Integrin_B_tail indicate an ability to manipulate human cell membranes. The MHC_II_beta (class II histocompatibility antigen beta) domain is part of the MHC II glycoproteins expressed on antigen-presenting cells (APCs) such as macrophages, dendritic cells and B lymphocytes. These components are critical because they display antigen fragments for recognition by helper T cells, initiating subsequent immune responses (Vyas et al., 2008). The Integrin_B_tail (integrin beta subunit cytoplasmic) domain is involved in cell adhesion (Bodeau et al., 2001). The Flo11 domain, made mostly of β sheets, occurs at the N terminus of the Flo11 protein, a flocculin-family adhesin of yeast (Saccharomyces cerevisiae). This protein mediates pseudohyphal formation and invasive growth, and plays a role in intercellular communication (Goossens and Willaert, 2012, Kraushaar et al., 2015). Excalibur (extracellular calcium-binding region) domains occur in bacterial surface proteins and resemble the Ca2+-binding loop of calmodulin-like EFh domains (Rigden et al., 2003). SVWC (single domain von Willebrand factor type C) proteins are a group of adhesins. These cysteine-rich proteins play roles in immunity and disease. Bone morphogenetic protein (BMP) is regulated by VWC domain-containing proteins such as chordin, CHL2 (chordin-like 2) and CV2 (crossveinless 2) (Fujisawa et al., 2009, Zhang et al., 2007). TNFR (tumor necrosis factor receptor/nerve growth factor receptor) domains are repeat-rich extracellular domains involved in growth factor and cytokine binding. TNF-α (tumor necrosis factor-alpha) is a cytokine mediating diverse inflammatory conditions, and its pathological action involves binding to TNFR (Deng, 2007). Upon ligand-mediated activation, TNFR-1 acts as a death receptor, leading to apoptosis (Park et al., 2014). The TSP1 (thrombospondin) domain regulates cell interactions in vertebrates.
Thrombospondins are glycoproteins with calcium-dependent anti-angiogenic properties (Iruela-Arispe et al., 2004, Lawler and Lawler, 2012). The SCPU (Spore Coat Protein U) domain is found in a bacterial protein family that includes spore coat proteins, adhesive pili proteins and biofilm-forming proteins (Chin et al., 2015). The Myxococcus xanthus mcu gene cluster, a CU (chaperone/usher) gene cluster, functions in spore coat formation (Cao et al., 2015). SCP (sperm coating protein) is a member of the large SCP/Tpx-1/Ag5/PR-1/Sc7 family, whose members contain extracellular domains. This domain, which spans 120–170 aa and adopts an α-β-α sandwich conformation, has been identified in nematode secretomes, insect allergens and semen. During pathogenesis, the genes coding for this protein are upregulated, contributing to immune exacerbation and chronic disease (Chalmers et al., 2008). DISIN (disintegrin) domains inhibit ligand-receptor association. Proteins carrying both disintegrin and metalloprotease domains are termed ADAMs (a disintegrin and metalloprotease), which mediate cellular adhesion and recognition (Huang et al., 2003). ADAMTS13 (an ADAM with thrombospondin type 1 repeats, member 13) inhibits platelet aggregation and arterial thrombosis by cleaving von Willebrand factor (VWF) (Xiao et al., 2011). The DISIN domain of the canary grass (Phalaris canariensis) pollen allergen Pha a 1 is likely to induce pathogenesis by interfering with integrin-mediated adhesion. The Amb_V domain is found in the Amb V pollen allergen of ragweed (Ambrosia sp.). A C-terminal helix is the key T cell epitope that triggers immune reactions, although free sulfhydryl groups also play a role (Canis et al., 2012). The presence of a similar domain in HCV suggests strong conservation of this domain. C4 is the C-terminal domain of type IV procollagens, which are found in basement membranes, including those of the skin. This domain, present as a tandem repeat, renders the triple-helical collagen kinked and sheet-like.
Type IV collagen is implicated in the autoimmune disease Goodpasture's syndrome (inflammation of the kidneys and lungs), and mutations in it cause Alport syndrome, an inherited kidney disease (Abreu-Velez and Howard, 2012). Cadherins mediate calcium-dependent cell-cell adhesion and control of synapses in the CNS (central nervous system). The Cadherin_pro domain occurs at the N terminus of cadherins. This prodomain cannot mediate cadherin-cadherin interactions, but cleavage of the prosequence in the endoplasmic reticulum (ER) and Golgi apparatus can activate the adhesive function of the cadherin, conferring the ability to control synapses (Koch et al., 2004, Latefi et al., 2009, Reinés et al., 2012). The CCP (complement control protein) domain, comprising SUSHI repeats (about 60 aa long and cysteine-rich), has been identified in pathogenic viruses, where it plays a role in complement activation. The literature also reports CCP domains in arthropods such as mosquitoes (including Aedes sp.) and fruit flies (Drosophila sp.), where they act as analogues of human complement and eliminate bacteria (Xiao et al., 2014). CHAD (conserved histidine alpha-helical domain) is an α-helical domain with conserved His residues that chelate metals. It interacts with the CYTH domain present in adenylyl cyclases and the mammalian thiamine triphosphatases (Iyer and Aravind, 2002). Cell adhesion requires the binding of integrins to their ligands, which can be influenced by multiple domains. B_lectin is a domain present in mannose-specific proteins. Apart from mannose, it recognizes N-acetylglucosamine, which can activate the classical complement pathway (Muto et al., 2001).

Cytoskeletal protein binding

The ACTIN domain is characteristic of the actin subfamily within the actin/MreB/sugar kinase/Hsp70 superfamily, whose members are grouped together by their common ATPase domain. Cortactin is an actin-binding protein (binding F-actin and the Arp2/3 complex) that regulates cytoskeletal dynamics and cortical actin assembly (Shvetsov et al., 2009). The PROF (profilin) domain binds actin monomers, membrane polyphosphoinositides and poly-L-proline (Michaelsen-Preusse et al., 2016). Robl_LC7 (Roadblock/LC7 family) domains regulate the motor protein dynein and mediate several other adaptive functions. Mgl is a Robl_LC7-type protein whose gene co-occurs with genes encoding small GTPases (such as members of the Ras superfamily involved in signal transduction) (Miertzschke et al., 2011, Wuichet and Søgaard-Andersen, 2015). Robl_LC7 domains are also grouped with the PROF domain in the profilin-like clan. The MIT domain, involved in microtubule manipulation, is present in virulent Ebola virus strains (Zaire and Sudan) but missing in the avirulent strain (Reston). Kelch is a conserved domain with a β-propeller topology. This repeat-rich domain is widespread across organisms, from viruses (Wang et al., 2014, Wilton et al., 2008) and plants to humans, and mediates protein-protein interactions. The kelch-like (KLHL) gene family is spread across multiple human chromosomes, and several of the encoded proteins bind cullin 3, a component of E3 ubiquitin ligases; they play roles in ubiquitination, signaling (such as inhibition of the NF-κB pathway), gene expression and actin binding, and are involved in several diseases (Dhanoa et al., 2013). GAS2 (growth-arrest-specific protein 2) domains manipulate actin microfilaments, bind microtubules and lead to cell division arrest (Zhang et al., 2011). The Bet v protein contains a GAS2 domain. B41, a plasma membrane-binding domain, appears to be critical for pathogenesis, which points to a role in attachment to host membranes.
A conserved neuronal protein, GRP1-associated scaffolding protein (GASP), has a B41 domain (as part of a FERM domain), implicated in binding to membranes as well as to cytoskeletal elements such as actin (MacNeil and Pohajdak, 2009). DPBB (double-psi beta-barrel) domains are N-terminal motifs present in proteins such as expansins (e.g., Phl p) (Kerff et al., 2008).

Lipid biosynthesis and metabolism

TLC (TRAM, LAG1 and CLN8 homology) is a domain in membrane proteins that has been linked to ceramide synthesis, lipid regulation and neural processes (Winter and Ponting, 2002). The dengue virus polyprotein has this domain, which seems plausible given that the virus affects the human nervous system. The AAI (alpha-amylase inhibitor) domain is a four-helix fold that forms part of LTPs (lipid transfer proteins) (Zottich et al., 2011). Polyketide synthases are a large group of multifunctional enzymes responsible for producing myriad secondary metabolites, the polyketides, including antibiotics (Anand et al., 2010, Ansari et al., 2004). The diverse array of polyketides is assembled by the successive addition of chain extension units. These enzymes occur in bacteria, fungi and plants. Apart from the acyl carrier protein (ACP), acyltransferase (AT) and ketosynthase (KS) domains, a variety of β-carbon processing domains (such as ER and KR) occur in these long, modular proteins (Cane, 2010), some of which are discussed here. PKS_ER is the enoylreductase domain of polyketide synthases (Gu et al., 2009). PKS_KR is a ketoreductase domain that reduces a keto group to a hydroxyl group. Studies have also found epimerase activity in PKS_KR, which lies in the conserved Tyr or Ser, flanked by either Tyr or the triad of Leu, Asp and Asp residues (Bonnett et al., 2013, Xie et al., 2016). PKS_TE domains are thioesterases, which catalyze the non-ribosomal synthesis of cyclic peptide antibiotics (Heathcote et al., 2001). PLP (proteolipid protein) is a transmembrane myelin protein, or lipophilin, that stabilizes myelin sheaths and supports axonal survival. Mutant forms of this protein cause neurological disorders such as Pelizaeus-Merzbacher disease and spastic paraplegia type 2 (Arvanitis et al., 2002, Garbern et al., 1997, Miller et al., 2009).
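The assembly-line logic of polyketide synthases, in which each module adds a chain extension unit and its β-carbon processing domains decide what is left at the β-carbon, can be sketched as a toy model. In this hedged Python sketch, a module without KR leaves a keto group, KR yields a hydroxyl, KR plus a dehydratase (DH) yields a double bond, and KR, DH and ER together yield a fully reduced methylene. The DH domain is an assumption added here because it is not discussed in the text, and the acetyl (two-carbon) starter, fixed two-carbon extender units and the hypothetical names Module, beta_state and extend are simplifications; real synthases use varied starter and extender units and may carry inactive domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One hypothetical PKS extension module and its optional β-carbon processing domains."""
    kr: bool = False  # ketoreductase (PKS_KR): β-keto -> β-hydroxyl
    dh: bool = False  # dehydratase (assumed here; not discussed in the text): hydroxyl -> alkene
    er: bool = False  # enoylreductase (PKS_ER): alkene -> saturated CH2


def beta_state(module: Module) -> str:
    """β-carbon functionality left by a module under the canonical KR -> DH -> ER order.

    Each step acts on the product of the previous one, so a domain whose
    substrate is never formed (e.g. ER without DH) has no effect in this toy rule.
    """
    if not module.kr:
        return "keto"
    if not module.dh:
        return "hydroxyl"
    if not module.er:
        return "alkene"
    return "methylene"


def extend(modules, starter_carbons=2, unit_carbons=2):
    """Walk the assembly line, recording chain length and β-carbon state after each module.

    Assumes every module adds one extender unit of `unit_carbons` carbons
    (e.g. a malonyl-derived C2 unit) to a starter of `starter_carbons` carbons.
    """
    length = starter_carbons
    history = []
    for module in modules:
        length += unit_carbons  # condensation adds one extender unit
        history.append((length, beta_state(module)))
    return length, history


if __name__ == "__main__":
    line = [
        Module(),
        Module(kr=True),
        Module(kr=True, dh=True),
        Module(kr=True, dh=True, er=True),
    ]
    total, steps = extend(line)
    for carbons, state in steps:
        print(carbons, state)
    print("chain length:", total)
```

Running the example prints a chain that grows 4, 6, 8, 10 carbons while its β-carbon goes from keto to hydroxyl, alkene and methylene. Modeling domains as booleans on a module keeps the point of the modular architecture visible: changing which processing domains a module carries changes the product without changing the chain-extension machinery.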
COLIPASE is a domain of a small pancreatic protein with five conserved disulfide bonds that plays a role in lipid metabolism (Berton et al., 2007). In humans, colipase-dependent pancreatic triglyceride lipase digests fat into fatty acids and monoacylglycerols (Johnson et al., 2013). The COLIPASE domain also occurs in the flavivirus polyprotein propeptide (amino acids 119–204).

Transcriptional regulators and stress-related proteins

UreE_C is the C-terminal domain of UreE, an accessory protein required for the activation of urease, the enzyme that hydrolyses urea into ammonia and carbamic acid. Studies report that the catalytic site of Klebsiella aerogenes urease binds nickel ions and that activation requires interaction with accessory proteins, including UreE (Merloni et al., 2014, Song et al., 2001). Skp1 (S-phase kinase-associated protein 1) is a component of the kinase complex and binds to F-box-containing proteins such as Cdc4, Skp2 and cyclin F. These adapter proteins act as transcription elongation factors and, as part of the Skp1-Cul1-F-box protein (SCF) E3 ubiquitin ligase, mark target proteins for proteasomal degradation by ubiquitylation (Chandra Dantu et al., 2016, Yumimoto et al., 2013). The FoP_duplication domain (C-terminal duplication domain of Friend of PRMT1) is a target of arginine methyltransferases in humans (van Dijk et al., 2010). Fop is associated with chromatin and activates estrogen receptor target genes (van Dijk et al., 2010). Most isolates of dengue virus have this domain in their polyprotein. H4 is the domain of histone H4, which is likely to be involved in manipulating the human interferon-beta (IFN-β) gene, given reports that virus infection causes localized hyperacetylation of histones H3 and H4 at the IFN-β promoter (Parekh and Maniatis, 1999). A yeast study has reported a role for H4 in nucleosome assembly during replication (Shibahara et al., 2000). This domain is conserved in pathogenic viruses such as dengue virus. HALZ is a homeobox-associated leucine zipper domain present in transcription factors. The homeodomain binds to DNA, while the leucine zipper mediates protein-protein interactions. Owing to its prominent role in growth regulation, this domain has been widely studied in plants (Elhiti and Stasolla, 2009). The HTH_MARR domain is a helix-turn-helix motif found in MarR-family transcriptional regulators, which control multiple antibiotic resistance.
This DNA-binding domain is also frequently found in hypothetical proteins, for example in Staphylococcus aureus (Mohan and Venugopal, 2012), and is therefore likely to be present in hypothetical proteins of other bacteria as well. Cyclin_C is a domain of the cyclin family of proteins, which are critical for cell cycle progression. Cyclins and cyclin-dependent kinases (Cdks) work together to phosphorylate RNA polymerase II. Cyclin C, together with Cdk8 (recruited by the multiprotein Mediator complex), also responds to stress: it regulates transcription and gene expression, and it recruits the mitochondrial fission machinery, promoting mitochondrial fragmentation and programmed cell death (Strich and Cooper, 2014). Connexin_CCC is a cysteine-rich domain of the gap junction channel protein connexin (Riquelme et al., 2013), which has many subtypes. The C-terminal region of connexin43 regulates channel assembly, gating and binding to regulatory proteins through phosphorylation by kinases (Shin et al., 2001). The BAG domain of heat shock protein regulators acts as a co-chaperone of Hsp70 chaperones, linking protein folding to quality control and degradation pathways (Bracher and Verghese, 2015). The role of this domain in regulating heat shock protein quality-control pathways may be linked to the pathogenicity of the isolates harboring it. BTP (found in bromodomain transcription factors and PHD domain-containing proteins) is a domain of histone-like transcription factors (chromatin-associated proteins and histone acetyltransferases) (Koutelou et al., 2010). These proteins recognize and acetylate lysine residues in histones, changing chromatin configuration and gene expression in ways that can lead to viral replication, cancer and inflammation (Sanchez and Zhou, 2009).
BTAD (bacterial transcriptional activator domain) is present in Actinobacteria (Huang et al., 2015). BRLZ (basic region leucine zipper, also known as bZIP) is a domain of DNA-binding transcription regulators. This domain performs myriad critical gene-regulatory tasks, aided by a certain degree of flexibility in its configuration (Miller, 2009). Well-characterized members of this group include CREB (cAMP response element-binding protein) (Thiel et al., 2005) and MafK (MAF bZIP transcription factor K) (Töröcsik et al., 2002). The WHy (water stress and hypersensitive response) domain plays a role in adaptation to stresses such as cold and desiccation (Ciccarelli and Bork, 2005, Jaspard and Hunault, 2014). This domain is found in LEA (late embryogenesis abundant) proteins, which have been detected in bacteria, archaea, plants and nematodes and are typically induced by exposure to stress (Anderson et al., 2015).

Membrane-associated proteins

LANC_like domains are present in membrane-associated lanthionine synthetase C-like proteins (LanC) (Chen and Ellis, 2008). The proteins LANCL1 (P40 seven-transmembrane-domain protein) and LANCL2 (testis-specific adriamycin sensitivity protein) are abundantly produced in the brain and testes and have an immune defense role (Landlinger et al., 2006, Mayer et al., 2001). Lanthionines (macrocyclic thioethers) are present in lantibiotics, a class of antimicrobial peptides produced by some bacterial strains. The OmpH (outer membrane protein H) domain has been found to be critical for pathogenesis. Pasteurella multocida, a Gram-negative bacterium that causes fatal diseases in animals (porcine atrophic rhinitis, bovine diseases), birds (fowl cholera) and occasionally humans, has OmpH as a major surface antigen (Lee et al., 2007, Okay et al., 2012). Other enteric pathogens such as Salmonella typhimurium, Escherichia coli and Yersinia enterocolitica also carry ompH genes, which can be located on the chromosome or on plasmids. OmpH shares homology with an α-helix of HLA-B27 (a human leukocyte antigen subtype), which is suspected to play a role in inflammatory arthritis (Singh and Karrar, 2014). The presence of this domain in grass pollen suggests a similar pathogenic mechanism. Pathogenicity-associated motifs were readily identified in several pollens; for example, OmpH occurs in the KBG41 allergen of Kentucky bluegrass. The Bac_rhodopsin (bacteriorhodopsin-like protein for sensory function) domain is present in photoreceptors of the G protein-coupled receptor type (Palfi et al., 2010). The Cg6151-P domain is part of a conserved membrane protein about 190–200 aa long that has received little functional characterization (Yao et al., 2009). The 7TM_GPCR_Srsx domain occurs in Srsx, a serpentine-type seven-transmembrane G-protein-coupled chemoreceptor of the Srg superfamily (Nagarathnam et al., 2012). This domain has been detected in pathogenic viruses, including dengue virus.
AAA (ATPases associated with diverse cellular activities) domains occur in membrane-tethered metalloprotease ATPases, enzymes with a wide array of functions, including degradation of misfolded proteins, membrane quality control, membrane fusion and DNA replication (Krzywda et al., 2002). The AAA domain is located in the middle of the protein and often co-occurs with a C-terminal Zn2+-dependent protease domain (Scharfenberg et al., 2015). The viral non-structural protein NS4A also contains this domain.

Hormone-like and signaling proteins

Knot1 is a domain of knottins, a broad group of proteins that includes plant lectins, antimicrobial peptides (e.g., the plant cyclotide kalata B1), plant proteinase/amylase inhibitors, plant γ-thionins and arthropod defensins. These proteins have a multitude of functions, including inhibitory, cytotoxic, antiviral or hormone-like activity (Gracy et al., 2008). The cysteine-rich domain derives its name from its knot-like topology. Three interlocking disulfide bridges in the knot make the protein highly stable against temperature, pH and chemicals (Herzig and King, 2015). Knottins have shown potential to block insect voltage-gated calcium channels (Herzig and King, 2015). NH (neurohypophysial hormone) is a domain of proteins of the vasopressin/oxytocin gene family. The neurophysin protein with this domain serves as a carrier for the peptide hormone oxytocin, whose receptor signals through the phosphatidylinositol-calcium second messenger system (de Bree and Burbach, 1998, Elphick and Rowe, 2009, Van Kesteren et al., 1995). Galanin is a 29 aa-long neuropeptide that controls the secretion of growth hormone, insulin and somatostatin, as well as adrenal secretion and smooth muscle activity. Through this endocrine regulation, it influences pain, inflammation, memory, learning, mood, feeding and sexual behavior (Kask et al., 1996). A role for this peptide in neural diseases, angiogenesis, cancer, obesity and diabetes has also emerged (Poritsanos et al., 2009, Stevenson et al., 2012). Pathogenesis via stimulation of phospholipase C through the GAL2 receptor has been recognized (Lang et al., 2015). IB (insulin-like growth factor-binding) domain-containing proteins are growth factors that bind to receptors to exert their functions (Siegfried et al., 1992). Elicitins are a group of proteins that induce necrosis in plants and are secreted by phytopathogenic fungi and oomycetes such as Phytophthora, Pythium, Hyaloperonospora and Albugo (Uhlíková et al., 2016).
This PAMP domain is rich in disulfide-forming cysteines (about six) and has versatile functions, including the manipulation of host signaling pathways. Some of this domain's sulfur residues have been identified as glycosylation sites. Pythium insidiosum, the causative agent of pythiosis, produces elicitin, which might mediate pathogenesis in humans. Elicitin is a sterol-carrying protein and might sequester human cholesterol (Lerksuthirat et al., 2015). The β isoform of elicitin has a higher affinity for the plasma membrane than the α isoform. The genes inf2A and inf2B were identified as inducers of elicitin activity (Huitema et al., 2005). Interestingly, pathogenic bacteria such as Mycobacterium tuberculosis also have three inf genes. CT or CTCK (C-terminal cystine knot-like) domains are present in growth factors such as TGFβ (transforming growth factor-beta), NGF (nerve growth factor), PDGF (platelet-derived growth factor) and hCG (human chorionic gonadotropin). The knot formed by six cysteines is conserved in the CT domain, while the proteins harboring it can assume multimeric forms and mediate an array of functions such as cell growth, embryonic development, organogenesis, intercellular communication, differentiation, and tissue repair and remodeling. This domain also occurs in von Willebrand factor (VWF), the glycoprotein involved in cell adhesion and hemostasis (Zhou and Springer, 2014), and in mucins (Iyer and Acharya, 2011). The ACTH_domain, present in corticotropins, has been linked to viral immune evasion. According to the literature, SARS (severe acute respiratory syndrome) coronavirus and influenza virus manipulate the host corticosteroid stress response to circumvent the immune response by expressing proteins homologous to host ACTH (adrenocorticotropic hormone) (Wheatland, 2004). When the host produces antibodies against the viral ACTH, these antibodies also bind host ACTH, leading to adrenal gland injury and hampering corticosteroid secretion (Wheatland, 2004).
Adrenal insufficiency and ACTH deficiency have also been reported in HIV patients (Shashidhar and Shashikala, 2012). The ACTH_domain has been detected only in dengue serotype 2 isolates (e.g., P14337, P29990 and Q9WDA6). This ACTH-based molecular mimicry might be linked to the higher virulence of this serotype. Amelins (ameloblastin precursors) are a group of proteins found in the mammalian enamel matrix. Ameloblastin acts as a growth factor in tooth enamel crystal formation; it has also been detected in the extracellular matrix during embryogenesis and found to play a role in bone repair (Tamburstuen et al., 2011). Ameloblastin binds calcium and is sensitive to matrix proteases such as enamelysin and kallikrein. A study has found that ameloblastin can regulate immune response-related genes through cytokine expression and induction of STAT (signal transducer and activator of transcription) in the interferon pathway (Tamburstuen et al., 2010). Interestingly, analysis has shown that HCV and HIV carry the amelin domain in their glycoproteins. The DEP domain (named after the proteins Dishevelled, Egl-10 and Pleckstrin) is present in regulators of G-protein signaling. This globular domain, made of a three-helix bundle, a β-hairpin and two additional β-strands, modulates signal transduction by influencing GTPase activity (Capelluto et al., 2014, Wong et al., 2000). DDRGK is a domain occurring widely in plant and vertebrate proteins and is named after its conserved amino acid motif. Studies have revealed its role in multiple cell signaling pathways, including NF-kappaB signaling (Wu et al., 2010). Cache_2 is an extracellular domain involved in signaling via the recognition of small molecules. Voltage-gated Ca2+ channel proteins and bacterial chemotaxis receptors possess this domain. This domain has been well studied in Vibrio cholerae (Upadhyay et al., 2016).
CHASE (cyclase/histidine kinase-associated sensing extracellular) is a conserved extracellular sensory domain that helps cells perceive environmental changes. As the name indicates, this domain is present in signal-transducing proteins such as histidine kinases, adenylate cyclases, diguanylate cyclases, serine/threonine protein kinases, phosphodiesterases and methyl-accepting chemotaxis proteins. CHASE domains are divided into several types by function, of which CHASE2, CHASE3 and CHASE6 are well studied. CHASE2 domains occur in serine/threonine kinases, where they are followed by transmembrane helices (Mascher et al., 2006, Zhulin et al., 2003). Numerous studies report their presence in signal-sensing proteins of bacteria (e.g., Cyanobacteria); however, their presence in viruses and pollen is a relatively new discovery. The AgrB (accessory gene regulator B) family includes AgrB from Staphylococcus aureus and FsrB from Enterococcus faecalis, both of which regulate the expression of virulence genes (Robinson et al., 2005). These proteins are part of the bacterial quorum-sensing apparatus, which coordinates bacterial communication (Hsieh et al., 2008). These signaling genes have also recently been discovered in bacteriophage genomes (Hargreaves et al., 2014). This domain is presumed to play a regulatory role in dengue virus. Fig. 1 illustrates the pathogenic mechanisms of these domains.
Fig. 1

Pathogenic domains with hormone manipulation properties (A) and (B).

Other proteins

The PRP (prion protein) domain occurs in prion proteins, which are known to cause neurodegenerative diseases in animals and humans, such as scrapie, bovine spongiform encephalopathy (BSE), kuru and Creutzfeldt-Jakob disease. The disease progresses when the cellular, α-helix-rich prion protein converts into a β-sheet-rich, amyloid fibril-forming form (Krammer et al., 2008, Kupfer et al., 2009). The IGR domain is found in fungal and plant proteins; however, it is sparsely annotated and its function is unknown. Antimicrobial21 is a plant peptide whose two disulfide bonds give it an α-helical hairpin fold. This antimicrobial and antifungal peptide binds to fungal conidia, penetrates them and accumulates in the cytoplasm, leading to fungal death (Gautam et al., 2012, Nolde et al., 2011). Several DUFs (domains of unknown function), though poorly annotated, frequently occur in pathogenesis-related proteins. DUF1237 occurs in an Ebola virus isolate of the Zaire strain (Q6V1Q2). This domain overlaps with the B41 domain, and the domains preceding this DUF (IENR1, DEP, LamG, Lipid_DES, YqgFc) are exactly the same as in another Zaire isolate (A0A0G2Y8I7), which indicates that DUF1237 might simply be a modified form of the B41 domain. DUF1338 in DENV-1 (P17763) and DENV-4 (Q2YHF0 and Q5UCB8) has a zinc-binding function and is part of a putative metal hydrolase (Marchler-Bauer et al., 2014). DUF1866 in Ebola virus Reston isolates (Q8JPX5 and Q91DD4) is likely to be either a Cyt-b5 (cytochrome b5-like heme/steroid binding) or a CASc (caspase, interleukin-1 beta converting enzyme homologue) domain. DUF862 in an Ebola virus Reston isolate (Q8JPX5) lies just before the HisKA domain and might be a Telomerase_RBD domain. DUF4208 in an Ebola virus Sudan isolate (Q5XX01) could be a PLCYc, RUN or Cyclin_C domain.
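The architecture comparison behind the DUF1237 argument (the DUF overlaps B41, and the domains preceding it match those of a second Zaire isolate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in this review: the `Domain` record and the `overlaps` and `preceding_names` helpers are hypothetical, and the residue coordinates are invented placeholders rather than values from Q6V1Q2 or A0A0G2Y8I7.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    """A domain hit on a protein: name plus 1-based inclusive residue coordinates."""
    name: str
    start: int
    end: int


def overlaps(a: Domain, b: Domain) -> bool:
    """True if the two inclusive residue intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def preceding_names(architecture: list[Domain], target: str) -> list[str]:
    """Names of the domains lying N-terminal to the first hit named `target`."""
    ordered = sorted(architecture, key=lambda d: d.start)
    names = [d.name for d in ordered]
    idx = names.index(target)  # raises ValueError if the target domain is absent
    return names[:idx]


# Hypothetical coordinates for illustration only; NOT taken from the real isolates.
isolate_a = [
    Domain("IENR1", 10, 40), Domain("DEP", 60, 130), Domain("LamG", 150, 280),
    Domain("Lipid_DES", 300, 330), Domain("YqgFc", 350, 450),
    Domain("DUF1237", 470, 600), Domain("B41", 480, 590),
]
isolate_b = [
    Domain("IENR1", 12, 42), Domain("DEP", 62, 132), Domain("LamG", 152, 282),
    Domain("Lipid_DES", 302, 332), Domain("YqgFc", 352, 452),
    Domain("B41", 482, 592),
]

duf = next(d for d in isolate_a if d.name == "DUF1237")
b41 = next(d for d in isolate_a if d.name == "B41")
# Same upstream context: the DUF in isolate A sits where B41 sits in isolate B.
same_context = preceding_names(isolate_a, "DUF1237") == preceding_names(isolate_b, "B41")
print(overlaps(duf, b41), same_context)  # True True
```

Treating coordinates as inclusive intervals keeps the overlap test to two comparisons; in practice the coordinates would come from a domain-annotation resource such as SMART.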
In addition, many other catalytically active motifs present in protozoans and metazoans were detected in the viral polyproteins, namely Amb_V_allergen, AWS, BTAD, CBD_II, CHAD, CHASE2, ClpB_D2-small, CT, CTD, CxxC_CXXC_SSSS, Dak2, DALR_1, DHDPS, DISIN, DPBB_1, DUF1907, EB_dh, EFh, Excalibur, FA58C, FCD, Galanin, GAS2, GCK, Gp_dh_N, HAMP, HELICc, HWE_HK, Kelch, Knot1, KR, Lig_chan-Glu_bd, LU, MA, MBD, NH, OmpH, PKS_TE, PROF, PWI, RAB, RL11, Robl_LC7, Romo1, SAA, ShKT, Skp1, SMC_hinge, SRP54, TNFR, YaeQ and ZnF_C3H1. The SMART database can be consulted for further information on these protein domains (Ponting et al., 1999). Even if some of these domains are non-functional, the findings indicate homology and phylogenetic conservation among organisms. Moreover, the positional shuffling of the domains supports the mosaic nature of viral genomes, which has already been demonstrated for some DNA viruses (Iyer et al., 2002). Table 1 lists all the pathogenically critical protein domains outlined above.
Table 1

Protein classes, subclasses, the protein domains falling under them and the functions of the domains.

| Protein class or subclass | Protein domain | Functions |
|---|---|---|
| Hydrolase: nuclease/helicase | HNHc | For recombination, genome rearrangement, and virulence |
| | IENR1 | Binds to DNA |
| | YaeQ | Shows homology to transcription elongation protein |
| | YqgFc | Role in recombination |
| | HELICc | Recognizes viral PAMPs (pathogen-associated molecular patterns) |
| Protease | Pro-kuma_activ | Protein hydrolysis |
| | Tryp_SPc | Protein hydrolysis |
| Esterase | Lactamase_B | Ester hydrolysis; hydrolyses β-lactam antibiotics |
| Lipase | DDHD | Phospholipase and membrane trafficking activity |
| | acidPPc | Present in phosphatidate phosphatase |
| Glycosylase | UDG | Required for virus DNA replication |
| | PbH1 | Present in pectate lyases and rhamnogalacturonases; involved in cytokinesis initiation |
| | ChtBD3 | In virulence factors of viruses |
| | CBM | In virulence factors of viruses and bacteria |
| Guanosine triphosphatase (GTPase) | RAB | Role in vesicle trafficking across membranes |
| | RUN | Role in signaling pathways; role in regulation of cytoskeletal organization, autophagy, endocytosis, and endosomal maturation |
| | Tubulin | Role in polymer formation |
| | FtsZ | Role in cell division |
| | EFh | Part of Ca2+ sensors maintaining mitochondrial homeostasis |
| | ARF | Involved in post-Golgi vesicular transport |
| Transferases | PreSET | Detected in plant pollen allergens |
| | G3P_acyltransf | Required for triacylglycerol biosynthesis; required for immune response of the host |
| | CAT | Metabolizes the antibiotic chloramphenicol |
| | CTD | Role in pre-mRNA processing |
| | RPOL8c | Role in transcription; role in microRNA gene regulation |
| Kinases | KbaA | Role in sporulation |
| | TPK_B1_binding | Required for the functions of transketolase and mitochondrial enzymes |
| | TyrKc | Present in cell surface receptors |
| | UBA | Plays role in inter- and intramolecular communications |
| | HAMP | Relays signals for chemotaxis, pathogenesis, and biofilm formation |
| | HWE_HK | Mediates environmental signaling |
| | HisKA | A crucial sensor kinase in pathogens |
| | Hr1 | Part of Rho effector or PKN enzyme; role in cytoskeletal organization and neuronal differentiation; helps bacteria to imbibe host fatty acids into their membrane phospholipids |
| | YARHG | Binds to bacterial cell wall or its adjacent components such as outer membrane lipid or lipopolysaccharide |
| Ligases and synthetases | APC2 | Regulates phase transition of mitosis |
| | Citrate_ly_lig | Role in lipid biosynthesis; binds to DNA; role in tumor growth |
| | DALR | Involved in autophagy and protein assembly |
| | VKc | Role in blood coagulation |
| Protease inhibitors | WR1 | Inhibits certain proteins |
| Nucleic acid binding proteins | PWI | Binds to RNA as well as DNA |
| | ZnF_BED | Present in transposases |
| | ZnF_A20 | Role in apoptosis and proteasomal degradation |
| | ZM | Provides structural integrity to sarcomeres |
| | TUDOR | Involved in RNA metabolism and interactions; modulates gene expression; biomarker of cancer |
| Adhesion and immunity-related proteins | MHC_II_beta | Evokes immune response |
| | Integrin_B_tail | Involved in cell adhesion |
| | Flo11 | Mediates hyphal formation and invasive growth; plays role in intercellular communication |
| | Excalibur | In bacterial surface proteins |
| | SVWC | Role in immunity and diseases |
| | TNFR | Role in growth factor and cytokine binding |
| | TSP1 | Regulates cell interactions in vertebrates |
| | SCPU | Present in spore coat proteins, adhesive pili proteins and biofilm-forming proteins |
| | SCP | Present in nematode secretome, insect allergen, and semen; plays role in immune exacerbation and chronic conditions |
| | DISIN | Inhibits ligand-receptor association |
| | Amb_V | Triggers immune reactions |
| | C4 | Component of collagens |
| | Cadherin_pro | Controls synapses |
| | CCP | Role in complement activation by pathogens |
| | CHAD | Chelates metals |
| | B_lectin | Can activate classical complement pathway |
| Cytoskeletal protein binding | ACTIN | Regulates cytoskeleton dynamics and cortical actin assembly |
| | PROF | Regulates cytoskeleton dynamics by binding to actin monomers |
| | Robl_LC7 | Regulates the functions of dynein, a motor protein |
| | MIT | Manipulates microtubules |
| | Kelch | Mediates protein-protein interactions |
| | GAS2 | Manipulates actin microfilaments, binds to microtubules and leads to cell division arrest |
| | B41 | Attaches to host membrane and controls actin |
| | DPBB | High homology to expansins and GH45 enzymes; part of plant allergens |
| Lipid biosynthesis and metabolism | TLC | Role in ceramide synthesis, lipid regulation and neural processes |
| | AAI | Forms part of LTP (lipid transfer protein), a plant allergen |
| | PKS_KR, PKS_TE | Polyketide synthase enzymes |
| | PLP | Role in stabilization of myelin sheaths and axonal survival |
| | COLIPASE | Role in lipid metabolism |
| Transcriptional regulators and stress-related proteins | UreE_C | Role in urea hydrolysis |
| | Skp1 | Role as transcription elongation factor; role in proteasomal degradation |
| | FoP_duplication | Activation of estrogen receptor target genes |
| | H4 | Manipulation of human IFN-β genes |
| | HALZ | Role in protein-protein interactions |
| | HTH_MARR | Part of transcriptional regulators; confers multiple antibiotic resistance |
| | Cyclin_C | Important for cell cycle progression |
| | Connexin_CCC | Regulates assembly, gating, and binding to regulatory proteins |
| | BAG | Acts as co-chaperone |
| | BTP | Part of histone-like transcription factors |
| | BTAD | Role as transcription regulator |
| | BRLZ | Role as transcription regulator |
| | WHy | Plays role in adaptation to stress, including cold temperature and desiccation |
| Membrane-associated proteins | LANC_like | Immune defense role |
| | OmpH | Highly pathogenic (from bacteria to plant pollens) |
| | Bac_rhodopsin | Present in photoreceptors |
| | 7TM_GPCR_Srsx | Present in pathogenic viruses |
| | AAA | Part of ATPases; plays role in degradation of misfolded proteins, membrane quality control, membrane fusion, DNA replication |
| Hormone-like and signaling proteins | Knot1 | Inhibitory, cytotoxic, antiviral or hormone-like activity |
| | NH | Part of vasopressin/oxytocin |
| | Galanin | Controls growth hormone, insulin, somatostatin, adrenal secretion, smooth muscle activity |
| | IB | Part of growth factors |
| | Elicitin | May sequester human cholesterol; manipulates host signaling pathways |
| | CT or CTCK | Part of growth factors |
| | ACTH_domain | Present in corticotropins |
| | Amelin | Part of growth factors; regulates immune responses |
| | DEP | Modulates signal transduction |
| | DDRGK | Plays role in NF-kappaB signaling |
| | CHASE | Plays role in perception of environmental changes |
| | AgrB | Regulates the expression of virulence genes; controls quorum sensing |
| Other proteins | PRP | Malfunction can cause neural diseases |
| | IGR | |
| | Antimicrobial21 | Exerts antimicrobial and antifungal properties |

Discussion

The diverse repertoires of domains have arisen through stepwise or drastic reshuffling, depending on the stressors encountered. Several domains often co-occur in one protein and interact to perform critical functions. Many of the domains discussed above manipulate host actin, hormones and neurons. It is striking that, despite occupying different levels of the evolutionary hierarchy, organisms share a significant number of domains. The number of identified domains is vast and ever increasing, and domains are frequently re-annotated. Yet the domains characterized here constitute the core of the pathogenesis mechanisms exploited by most pathogens and allergens. Many proteins are intrinsically unstructured (such as stress-response proteins), yet they contain highly conserved, structured domains. These domains are clues to the phylogenetic origin, evolutionary trajectories and permutation paths leading to the origin of other protein domains. Many of these critical domains occur in hypothetical proteins of pathogenic bacteria, which are usually overlooked in the search for drug targets. It can be hypothesized that hypothetical proteins carrying any of these domains are likely to be virulence factors and are therefore candidate drug targets. Patel has analyzed and reviewed this area extensively, shedding light on the conserved domains and their obligatory role in pathogenesis (Patel, 2016a, Patel, 2016b, Patel, 2016c, Patel, 2017a, Patel, 2017b, Patel and Patel, 2016). The current work is a further contribution in this direction.
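The hypothesis that hypothetical proteins carrying these domains are candidate virulence factors suggests a simple prioritization filter. The Python sketch below assumes that domain annotations (for example, from a SMART search) have already been reduced to a mapping from protein identifier to domain names; the protein identifiers and the `flag_candidates` helper are hypothetical, and the marker set is only a subset of the domains named in this review's Conclusion, not a validated list.

```python
# Hypothetical marker set: a subset of the domains named in this review's Conclusion.
PATHOGENICITY_DOMAINS = frozenset({
    "IENR1", "B41", "Hr1", "H4", "HAMP", "HELICc", "Kelch",
    "Excalibur", "DISIN", "GAS2", "Galanin", "MHC_II_beta",
})


def flag_candidates(annotations: dict[str, list[str]],
                    marker_domains: frozenset[str] = PATHOGENICITY_DOMAINS
                    ) -> dict[str, list[str]]:
    """Map each protein carrying at least one marker domain to its sorted, de-duplicated hits.

    Proteins without any marker domain are left out of the result.
    """
    flagged = {}
    for protein_id, domains in annotations.items():
        hits = sorted(set(domains) & marker_domains)
        if hits:
            flagged[protein_id] = hits
    return flagged


# Hypothetical annotations for three uncharacterized proteins.
hypothetical = {
    "hyp_0001": ["HTH_MARR", "HAMP"],
    "hyp_0002": ["DUF1338"],
    "hyp_0003": ["Kelch", "Kelch", "B41"],
}
print(flag_candidates(hypothetical))
# {'hyp_0001': ['HAMP'], 'hyp_0003': ['B41', 'Kelch']}
```

A hit only prioritizes a protein for follow-up; it does not by itself establish that the protein is a virulence factor.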

Conclusion

In conclusion, the most conserved domains in pathogens and allergens include VWC, YARHG, WH1, RICTOR_M, Pro-kuma_activ, IENR1, B41, Y1_Tnp, HOX, HOLI, PLCYc, Hr1, H4, GGDEF, LPD_N, CHASE2, Galanin, Dak2, DALR_1, HAMP, PWI, EFh, Excalibur, CT, PbH1, HELICc, Kelch, Robl_LC7, YaeQ, PreSET, Bet_v_1, GAS2, CHAD, Integrin_B_tail, MHC_II_beta and DISIN. Using these domains as clues, virulence agents and inflammation mediators can be identified. The genes encoding these protein sequences deserve attention.

Compliance with ethical standards

The author declares that there is no competing interest. This work does not involve human participants or animal models. There are no coauthors, so consent is not required.
References (233 in total)

1. Eri Asano, Hitoki Hasegawa, Toshinori Hyodo, Satoko Ito, Masao Maeda, Dan Chen, Masahide Takahashi, Michinari Hamaguchi, Takeshi Senga. SHCBP1 is required for midbody organization and cytokinesis completion. Cell Cycle, 2014.

2. Martin Canis, Sven Becker, Moritz Gröger, Matthias F Kramer. IgE reactivity patterns in patients with allergic rhinoconjunctivitis to ragweed and mugwort pollens. Am J Rhinol Allergy, 2012.

3. Umberto Zottich, Maura Da Cunha, André O Carvalho, Germana B Dias, Nádia C M Silva, Izabela S Santos, Viviane V do Nacimento, Emílio C Miguel, Olga L T Machado, Valdirene M Gomes. Purification, biochemical characterization and antifungal activity of a new lipid transfer protein (LTP) from Coffea canephora seeds with α-amylase inhibitor properties. Biochim Biophys Acta, 2010.

4. Yoan Diekmann, Elsa Seixas, Marc Gouw, Filipe Tavares-Cadete, Miguel C Seabra, José B Pereira-Leal. Thousands of rab GTPases for the cell biologist. PLoS Comput Biol, 2011.

5. Tuula Klaavuniemi, Annina Kelloniemi, Jari Ylänne. The ZASP-like motif in actinin-associated LIM protein is required for interaction with the alpha-actinin rod and for targeting to the muscle Z-line. J Biol Chem, 2004.

6. Jin-Li Zhang, Yi Huang, Li-Yan Qiu, Joachim Nickel, Walter Sebald. von Willebrand factor type C domain-containing proteins regulate bone morphogenetic protein signaling through different recognition mechanisms. J Biol Chem, 2007.

7. K Kask, M Berthold, U Kahl, G Nordvall, T Bartfai. Delineation of the peptide binding site of the human galanin receptor. EMBO J, 1996.

8. Mark S Searle, Thomas P Garner, Joanna Strachan, Jed Long, Jennifer Adlington, James R Cavey, Barry Shaw, Robert Layfield. Structural insights into specificity and diversity in mechanisms of ubiquitin recognition by ubiquitin-binding domains. Biochem Soc Trans, 2012.

9. Ramadevi Mohan, Subhashree Venugopal. Computational structural and functional analysis of hypothetical proteins of Staphylococcus aureus. Bioinformation, 2012.

10. Mandy Miertzschke, Carolin Koerner, Ingrid R Vetter, Daniela Keilberg, Edina Hot, Simone Leonardy, Lotte Søgaard-Andersen, Alfred Wittinghofer. Structural analysis of the Ras-like G protein MglA and its cognate GAP MglB and implications for bacterial polarity. EMBO J, 2011.
