
Molecular phylogeny and missense mutations of envelope proteins across coronaviruses.

Sk Sarif Hassan1, Pabitra Pal Choudhury2, Bidyut Roy3.   

Abstract

Envelope (E) protein is one of the structural viroporins (76-109 amino acids) present in coronaviruses. Sixteen distinct E-protein sequences were observed among the 4917 complete SARS-CoV-2 genomes available in the NCBI database as of 18 June 2020. Missense mutations in the E protein across various coronaviruses of the genus β were analyzed to infer the immediate parental origin of the E protein of SARS-CoV-2. This evolutionary origin is also supported by phylogenetic analysis of the E proteins based on sequence homology as well as amino acid conservation.
Copyright © 2020. Published by Elsevier Inc.


Keywords:  Amino acid conservation; COVID-19; Envelope protein; Phylogeny; SARS-CoV2; Viroporin

Year:  2020        PMID: 32927009      PMCID: PMC7486180          DOI: 10.1016/j.ygeno.2020.09.014

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


Introduction

Since December 2019, the world has been experiencing an ongoing, life-threatening pandemic caused by a novel coronavirus [1]. Coronaviruses (CoVs), which carry positive-sense RNA as their genetic material, primarily cause respiratory infections in humans and in a broad range of animals. Several new human coronaviruses, including severe acute respiratory syndrome coronavirus (SARS-CoV), MERS-CoV and SARS-CoV-2, have been identified in recent years, prompting efforts to understand these viruses comprehensively and to identify antiviral targets for therapeutic development. A CoV encodes several classes of proteins (structural, non-structural, accessory, etc.); the two major structural proteins are the spike (S) and membrane (M) glycoproteins [2]. Every coronavirus of the genus β also contains an envelope (E) protein of 75 to 84 amino acids, which plays essential roles in virus assembly, budding, morphogenesis, entry into the host cell and regulation of other cellular functions [3]. The E protein is an integral membrane protein found mainly in the ERGIC (Endoplasmic Reticulum-Golgi Intermediate Compartment) of cells transfected with a plasmid encoding the E protein or infected with SARS-CoV [4]. The E protein of SARS-CoV-2 is 75 amino acids long and has three domains: a hydrophilic N-terminal region of 7–9 residues, a transmembrane domain (TMD) of 29 residues with a high leucine/isoleucine/valine content (hydrophobic region), and a hydrophilic C-terminal region (Fig. 1) [5].
Fig. 1

Domains of the envelope protein of β-CoVs.

The envelope (E) protein of β-CoVs forms ion channels [6]. The transmembrane domain (TMD) of the E protein is responsible for the observed ion channel activity, and missense mutations in the E protein that inhibited this activity resulted in attenuation [7,8]. The TMD forms stable pentamers, as confirmed by molecular simulation and in vitro oligomerization [9]. Replacing hydrophobic residues in the TMD with charged amino acids significantly alters the migration properties of the E protein [3]. Y. Liao et al. (2006) established that the TMD is essential for the membrane-permeabilizing activity of the protein and showed that missense mutations in the TMD disrupt the function of the protein [3]. The E proteins of both SARS-CoV and SARS-CoV-2 contain three cysteine residues, at positions 40, 43 and 44 [10]. The first and third of these (positions 40 and 44) were previously reported to play roles in oligomerization of the E protein [11]. Biochemical characterization further showed that the protein undergoes post-translational modification by palmitoylation on all three cysteine residues [12]. Mutagenesis studies likewise indicate that the transmembrane domain is responsible for the membrane-permeabilizing activity of the SARS-CoV E protein [13]. The C-terminal domain of the SARS-CoV-2 E protein binds to human PALS1, a tight junction-associated protein that is essential for the establishment and maintenance of epithelial polarity in mammals [14]. Over the past few months, almost all SARS-CoV-2 proteins have been observed to accumulate mutations [[15], [16], [17]].
It is hard to infer whether mutations in the E protein cause COVID-19 to infect and sicken people differentially. To understand the effect of mutations on the various proteins, one needs to collect all mutations in these proteins from a large number of SARS-CoV-2 genomes available worldwide. Meanwhile, the source or proximal origin of SARS-CoV-2 remains the most unsettled and controversial issue. The pattern of genetic differences and the protein motifs present in SARS-CoV-2 distinguish it from any other known coronavirus [18,19]. Zhang, Wu et al. (2020) showed that bats and pangolins are natural reservoirs of SARS-CoV-2 [20]. More recently, based on genomic and protein sequences from a few coronaviruses of different hosts, including human, it was reported that the pangolin may not be an intermediate host for coronavirus transmission from bat to human [21]. Here, we address the transmission question by analyzing mutations in one of the most conserved proteins, the E protein, across SARS-CoV-2 and other host-CoV genomes. Using protein sequences from a large number of coronaviruses from different hosts, including human, we analyzed the phylogenetic relationships among them. We compared the E proteins of CoVs of the genus β, including SARS-CoV-2, in terms of missense mutations and the molecular organization of their amino acids, in order to gain insight into possible intermediate hosts.

Materials and methods

This study considered all the envelope proteins of coronaviruses from different hosts, viz. Bat, Camel, Cat, Cattle, Pangolin, Chimpanzee and human (SARS-CoV-2). Table 1 gives the total number of available CoV genomes for each host and the number of distinct envelope proteins among them; the distinct proteins are listed in Table 2.
Table 1

Envelope protein of different host-CoVs.

Host | Total genomes | Distinct E proteins | Variability of the E protein (%)
Bat | 79 | 25 | 31.646
Camel | 269 | 9 | 3.346
Cat | 42 | 17 | 40.476
Cattle | 22 | 2 | 9.090
Pangolin | 1 | 1 | 0
Chimpanzee | 1 | 1 | 0
SARS-CoV-2 | 4917 | 19 | 0.3864
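Table 1 does not define its variability measure, but for every host with more than one genome the reported value is reproduced by the ratio of distinct E proteins to total genomes (for example, 25/79 ≈ 31.646% for Bat). The sketch below assumes that definition; the helper `variability_percent` and the rule that single-genome hosts score 0% are our assumptions, chosen to match the Pangolin and Chimpanzee rows.

```python
from typing import Dict, Tuple

# (total genomes, distinct E proteins) per host, copied from Table 1
TABLE_1: Dict[str, Tuple[int, int]] = {
    "Bat": (79, 25),
    "Camel": (269, 9),
    "Cat": (42, 17),
    "Cattle": (22, 2),
    "Pangolin": (1, 1),
    "Chimpanzee": (1, 1),
    "SARS-CoV-2": (4917, 19),
}


def variability_percent(total_genomes: int, distinct_proteins: int) -> float:
    """Distinct E proteins as a percentage of available genomes.

    Assumption: a host with a single genome is reported as 0%, because no
    variation can be observed; this matches the Pangolin and Chimpanzee rows.
    """
    if total_genomes <= 1:
        return 0.0
    return 100.0 * distinct_proteins / total_genomes


for host, (total, distinct) in TABLE_1.items():
    print(f"{host}: {variability_percent(total, distinct):.4f}%")
```

Cattle comes out as 9.0909%, which Table 1 reports truncated to 9.090%.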
Table 2

List of distinct envelope (E) proteins from different host CoVs and their respective protein ID.

Protein ID | Host | Protein ID | Host | Protein ID | Host
AIA62357 | Bat-CoV | AIA62302 | Bat-CoV | ADO39821 | Feline-CoV
AIA62348 | Bat-CoV | AVP78044 | Bat-CoV | ACT10858 | Feline-CoV
AHY61342 | Bat-CoV | AVI15004 | Bovine-CoV | ACT10869 | Feline-CoV
ASL68958 | Bat-CoV | AVZ61113 | Bovine-CoV | ACT10909 | Feline-CoV
ASL68947 | Bat-CoV | ALA50082 | Camel-CoV | ACT10941 | Feline-CoV
ATQ39391 | Bat-CoV | QCI31474 | Camel-CoV | ACT10974 | Feline-CoV
AUM60029 | Bat-CoV | QBM11741 | Camel-CoV | ACT10920 | Feline-CoV
QDF43841 | Bat-CoV | ASU89926 | Camel-CoV | AWW13513 | Chimpanzee-CoV
YP_009072442 | Bat-CoV | ASU90554 | Camel-CoV | QIG55947 | Pangolin-CoV
YP_009273007 | Bat-CoV | ANI69894 | Camel-CoV | QHZ00381 | Human-SARS-CoV-2
ABD75324 | Bat-CoV | ALA49346 | Camel-CoV | QKI36855 | Human-SARS-CoV-2
AGC74167 | Bat-CoV | ALA49390 | Camel-CoV | QKG87268 | Human-SARS-CoV-2
AKZ19089 | Bat-CoV | ASU90334 | Camel-CoV | QKE45838 | Human-SARS-CoV-2
ADK66843 | Bat-CoV | QDM36990 | Feline-CoV | QJR88103 | Human-SARS-CoV-2
QDF43816 | Bat-CoV | AYF53097 | Feline-CoV | YP_009724392 | Human-SARS-CoV-2
ATO98160 | Bat-CoV | AXE71624 | Feline-CoV | QKI36831 | Human-SARS-CoV-2
ATO98184 | Bat-CoV | ASU62492 | Feline-CoV | QJS53352 | Human-SARS-CoV-2
QDF43821 | Bat-CoV | ASU62503 | Feline-CoV | QJA42107 | Human-SARS-CoV-2
ATO98135 | Bat-CoV | AUG98123 | Feline-CoV | QJQ84210 | Human-SARS-CoV-2
AHX37560 | Bat-CoV | AMD11134 | Feline-CoV | QJR89447 | Human-SARS-CoV-2
AIA62280 | Bat-CoV | AGT52084 | Feline-CoV | QJI54124 | Human-SARS-CoV-2
ABD75313 | Bat-CoV | AEK25514 | Feline-CoV | QKU31207 | Human-SARS-CoV-2
AIA62312 | Bat-CoV | AEK25525 | Feline-CoV | QKU37035 | Human-SARS-CoV-2
 |  |  |  | QKV07065 | Human-SARS-CoV-2
 |  |  |  | QKU32371 | Human-SARS-CoV-2
 |  |  |  | QKU28584 | Human-SARS-CoV-2
 |  |  |  | QKU52835 | Human-SARS-CoV-2
 |  |  |  | QKV06741 | Human-SARS-CoV-2
From the NCBI Virus database, the protein sequences of all 4917 complete SARS-CoV-2 genomes available as of 18 June 2020, as well as those of the other host-CoV genomes, were retrieved. The amino acid sequences of the envelope proteins of the CoVs from the different hosts (Bat, Camel, Cat, Cattle, Pangolin, Chimpanzee and Human) were then exported in FASTA format using the file-management operations of MATLAB ver. R2020a [22]. Table 2 lists all seventy-four distinct envelope (E) proteins from the different host CoVs together with their protein IDs.

Amino acid conservation (Shannon entropy). For each E protein, the Shannon entropy (SE) of amino acid conservation over its sequence is computed as follows [23]. For an E-protein sequence of length l,

$$SE = -\sum_{s} p_s \log_{20} p_s, \qquad p_s = \frac{k_s}{l},$$

where k_s is the number of occurrences of amino acid s in the sequence, and the sum runs over the amino acids that occur in it.
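As a check on the Shannon entropy formula in the Methods, the minimal Python sketch below computes SE with logarithms to base 20 (the helper name `shannon_conservation` is ours, not the authors'). Applied to the SARS-CoV-2 and Chimpanzee-CoV E sequences in Table 3, it gives 0.846 and 0.918, the values that Table 10 reports for YP_009724392 (and Pangolin-CoV QIG55947) and for AWW13513.

```python
import math
from collections import Counter


def shannon_conservation(sequence: str, alphabet_size: int = 20) -> float:
    """Normalised Shannon entropy SE = -sum_s p_s * log_20(p_s), with p_s = k_s / l.

    Only residues that actually occur (k_s > 0) enter the sum, so log(0)
    never arises. With base 20, SE runs from 0 (a single repeated residue)
    to 1 (all 20 amino acids equally frequent).
    """
    length = len(sequence)
    counts = Counter(sequence)
    return -sum((k / length) * math.log(k / length, alphabet_size) for k in counts.values())


# E-protein sequences copied from Table 3
SARS_COV_2_E = "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"
CHIMPANZEE_COV_E = "MFMADAYLADTVWYVGQIIFIVAICLLVTIVVVAFLATFKLCIQLCGMCNTLVLSPSIYVFNRGRQFYEFYNDIKPPVLDVDDV"

print(f"{shannon_conservation(SARS_COV_2_E):.3f}")      # 0.846
print(f"{shannon_conservation(CHIMPANZEE_COV_E):.3f}")  # 0.918
```

Base 20 is what makes SE comparable across proteins of different lengths: it is the natural-log entropy divided by ln 20, the maximum possible entropy over the 20 standard amino acids.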

Results

Mutations in the E protein of CoVs

Only one genome each is available for Pangolin-CoV and Chimpanzee-CoV (Table 1), so their E proteins show no variability and no mutation could be observed. To detect missense mutations, we performed a multiple sequence alignment of the E-protein sequences (Table 3) using the Clustal Omega server [24,25]. Table 4 describes the color and property assigned to each amino acid residue; the same notation is used in Figs. 2–6.
Table 4

Amino acid residues and their respective color and property used in Fig. 2.

Residue | Color | Property
A, V, F, P, M, I, L and W | Red | Hydrophobic (incl. aromatic, except Y)
D and E | Blue | Acidic
R and K | Magenta | Basic (except H)
S, T, Y, H, C, N, G and Q | Green | Hydroxyl + sulfhydryl + amine + G
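The grouping in Table 4 can be written as a lookup table, so that the R-group change of each substitution in Tables 5 and 7 is classified consistently. A minimal sketch; `RESIDUE_CLASS` and `class_change` are illustrative names, and the group labels simply paraphrase the Property column of Table 4. Under this grouping, for example, E8K moves from the acidic to the basic group, while P71L stays within the hydrophobic group.

```python
import re
from typing import Dict, Tuple

# Residue groups from Table 4; labels paraphrase its "Property" column
GROUPS: Dict[str, str] = {
    "AVFPMILW": "hydrophobic",
    "DE": "acidic",
    "RK": "basic",
    "STYHCNGQ": "hydroxyl/sulfhydryl/amine/G",
}
RESIDUE_CLASS: Dict[str, str] = {aa: label for residues, label in GROUPS.items() for aa in residues}

# A substitution such as "E8K": reference residue, 1-based position, new residue
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def class_change(mutation: str) -> Tuple[str, str]:
    """Return the Table 4 groups of the residue before and after a substitution.

    Letters outside the 20 standard amino acids raise KeyError.
    """
    match = MUTATION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"expected a substitution like 'E8K', got {mutation!r}")
    before, _position, after = match.groups()
    return RESIDUE_CLASS[before], RESIDUE_CLASS[after]


for m in ["E8K", "P71L", "L37H", "C44A"]:
    print(m, class_change(m))
```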
Fig. 2

Sequence alignment of the E protein of Bat CoV.

An asterisk (*) indicates positions that have a single, fully conserved residue; a colon (:) indicates conservation between groups of strongly similar properties; and a period (.) indicates conservation between groups of weakly similar properties [25].

Missense mutations of the E protein of bat CoV

Among the 79 available complete Bat-CoV genomes, twenty-five distinct E-protein sequences carry various mutations in the three domains of the protein (Fig. 2). The missense mutations in the E proteins of Bat-CoV, which are highly varied, are listed with their respective domains in Table 5.
Table 5

Missense mutations in the envelope protein of the Bat CoV.

Protein ID | Mutation | Domain
ATQ39391, AUM60029, AHY61342, ASL68958, ASL68947, AIA62357, AIA62348 | Y2L | N-terminal
YP_009072442, AUM60029 | E7Q | N-terminal
QDF43841 | E7A | N-terminal
YP_009273007 | E7T | N-terminal
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | E8Q | N-terminal
QDF43841, YP_009273007 | E8D | N-terminal
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | T9I | N-terminal
AIA62348 | T11A | N-terminal
QDF43841, YP_009273007 | T11V | N-terminal
AIA62348 | F20S | TMD
ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | F20T | TMD
YP_009072442 | A22G | TMD
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391, QDF43841, YP_009273007 | F23C | TMD
YP_009273007 | V25C | TMD
AIA62348, ASL68947, AIA62357, ASL68958, ATQ39391 | F26T | TMD
AKZ19089, YP_009072442 | T30A | TMD
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | T30C | TMD
QDF43841, YP_009273007 | T30G | TMD
QDF43841, YP_009273007 | L31C | TMD
QDF43841, YP_009273007 | T35L | TMD
YP_009072442 | A36C | TMD
ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | L37T | C-terminal
QDF43841 | C40V | C-terminal
YP_009273007 | C40I | C-terminal
ASL68947, ASL68958 | A41M | C-terminal
AIA62348 | C44V | C-terminal
AIA62357, ASL68958, AHY61342, AUM60029 | C44A | C-terminal
ATQ39391 | C44I | C-terminal
AIA62357, ASL68958, AHY61342 | N45I | C-terminal
AUM60029 | N45V | C-terminal
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | I46G | C-terminal
YP_009072442, AUM60029 | I46C | C-terminal
AIA62348 | V47C | C-terminal
YP_009072442 | N48D | C-terminal
YP_009072442, AUM60029 | N48F | C-terminal
YP_009072442 | V49Q | C-terminal
AIA62348, ASL68947, AIA62357, ASL68958 | V49T | C-terminal
QDF43841 | V49N | C-terminal
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | S50L | C-terminal
QDF43841 | S50I | C-terminal
QDF43841, YP_009273007 | V52C | C-terminal
AIA62348 | K53L | C-terminal
AIA62357 | K53V | C-terminal
YP_009072442 | V56R | C-terminal
QDF43841 | Y57L | C-terminal
YP_009072442 | S60L | C-terminal
ASL68958 | S60I | C-terminal
YP_009072442 | R61Q | C-terminal
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | R61T | C-terminal
AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | V62G | C-terminal
YP_009072442 | K63Q | C-terminal
YP_009072442 | N64A | C-terminal
YP_009273007 | L65D | C-terminal
QDF43841 | L65E | C-terminal
AIA62348, ASL68947, ASL68958, AHY61342, AUM60029, ATQ39391 | S67V | C-terminal
AIA62357 | S67F | C-terminal
QDF43841, YP_009273007 | S67L | C-terminal
ATO98160, AIA62280 | S68A | C-terminal
YP_009072442, AIA62348, ASL68947, AIA62357, ASL68958, AHY61342, AUM60029, ATQ39391 | S68K | C-terminal
QDF43841, YP_009273007 | S68L | C-terminal
AGC74167 | E69V | C-terminal
ATO98135 | E69Q | C-terminal
AVP78044 | E69R | C-terminal
YP_009072442 | E69L | C-terminal
AIA62348, ASL68947, ASL68958, AHY61342, AUM60029, ATQ39391 | E69F | C-terminal
QDF43841, YP_009273007 | E69N | C-terminal
QDF43841, YP_009273007 | G70E | C-terminal
AIA62348, ASL68947, ASL68958, AHY61342, AUM60029, ATQ39391 | V71E | C-terminal
QDF43841, YP_009273007 | V71Q | C-terminal
AIA62348, ASL68947, ASL68958, AHY61342, AUM60029, ATQ39391 | P72S | C-terminal
AIA62357 | P72N | C-terminal
QDF43841, YP_009273007 | P72E | C-terminal
AIA62348, AIA62357 | L73D | C-terminal
ASL68947, ASL68958, AHY61342, AUM60029, ATQ39391 | L73E | C-terminal
QDF43841 | L73G | C-terminal
Most of the frame-shift mutations occurred in the C-terminal domain of the protein, although the TMD and N-terminal domain also carry mutations. Changes in the R-group property of residues in any of the three domains (for example, from hydrophobic or acidic to hydrophilic or basic) may affect the function of the envelope protein. The E proteins QDF43841, YP_009273007, AIA62348 and ATQ39391 carry mutations at cysteine residues, namely C40V, C40I, C44V and C44I, respectively, and the E proteins AIA62357, ASL68958, AHY61342 and AUM60029 carry the mutation C44A. Such missense mutations at cysteine residues may affect virus growth, release, entry, protein transport and stability [26]. A notable mutation, V25C, found in the TMD of the E protein YP_009273007, might abolish ion channel activity and lead to in vivo attenuation. The TMD of the E proteins AIA62348, ASL68947, AIA62357, ASL68958 and ATQ39391 carries the mutation F26T, which may likewise abolish ion channel activity [[27], [28], [29]]. Mutations in the motif "DFLV" might also affect binding to the PALS1 protein and thereby influence replication and/or infectivity of the virus [30].

Missense mutations of the E protein of camel CoV

Among the 269 available complete Camel-CoV genomes, only nine carry mutations in the E protein (Fig. 3).
Fig. 3

Sequence alignment of the E protein of Camel CoV.

Most of the E proteins of Camel-CoV contain no mutations. Only three missense mutations were found: F17S in the TMD of ALA49346, and S64L and D79H in the C-terminal domain of QBM11741 and ANI69894, respectively. The C-terminal motif "DEWV" is absolutely conserved among the Camel-CoV E proteins except in ANI69894.

Missense mutations of the E protein of cat CoV

Cat-CoV shows the highest variability among E proteins (40.476%; Table 1), although the mutations are limited to seven positions, i.e. 8.536% of the 82-residue protein (Fig. 4).
Fig. 4

Sequence alignment of the E protein of Cat CoV.

The missense mutations in the TMD and C-terminal domain of the Cat-CoV E protein are listed in Table 6. Notably, although the variability of the Cat-CoV E proteins is comparatively high, the N-terminal domain is absolutely conserved in every E protein.
Table 6

Missense mutation of the envelope protein of the Cat CoV.

Protein ID | Missense mutations | Domain
AXE71624 | K51N, L81S | C-terminal
AEK25514 | W22L | TMD
ADO39821 | N48D | C-terminal
ACT10869 | V19G, R59C | TMD, C-terminal
ACT10909 | L81M | C-terminal
ACT10941 | V19G | TMD
These mutations in the TMD and C-terminal domain may affect the functions of the Cat-CoV E protein; in particular, mutations in the TMD may affect its ion channel activity.

Missense mutations of the E protein of cattle CoV

Among the 22 available complete Cattle-CoV genomes, only two show variation in the E protein, due to frame-shifts (Fig. 5).
Fig. 5

Sequence alignment of the E protein of Cattle CoV.

The E proteins of Cattle-CoV are highly conserved; the only differences are two frame-shifts in the N-terminal sequence.

Missense mutations of the E protein of human SARS-CoV-2

The E protein is present in all 4917 SARS-CoV-2 genomes available in the NCBI database as of 18 June 2020, and only sixteen distinct E proteins occur among them. The mutations in these E proteins (Table 7) were determined from the multiple sequence alignment shown in Fig. 6. Mutations in the C-terminal domain of the E protein between SARS-CoV and SARS-CoV-2 have already been described in an unpublished article [31].
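Once the E proteins are aligned (the study used Clustal Omega for this step), each missense mutation can be read off column by column against a reference sequence. A minimal sketch, assuming the SARS-CoV-2 E sequence from Table 3 as the reference, '-' as the gap character, and that gap columns (insertions and deletions) are skipped rather than reported; `missense_mutations` and `residue_positions` are hypothetical helpers, not part of Clustal Omega.

```python
from typing import List

# SARS-CoV-2 E protein, copied from Table 3 (ungapped)
REFERENCE_E = "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"


def missense_mutations(reference: str, variant: str) -> List[str]:
    """Return substitutions such as 'P71L', numbered by reference position.

    Both strings must come from the same alignment (equal length, '-' for
    gaps). Columns with a gap in either sequence are skipped, so insertions
    and deletions are not reported.
    """
    if len(reference) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no position to report
        ref_pos += 1
        if var_aa != "-" and var_aa != ref_aa:
            mutations.append(f"{ref_aa}{ref_pos}{var_aa}")
    return mutations


def residue_positions(sequence: str, residue: str) -> List[int]:
    """1-based positions of a residue in an ungapped sequence."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == residue]


# Usage: build a variant carrying S68F and P71L (both listed in Table 7) and call them back.
variant = REFERENCE_E[:67] + "F" + REFERENCE_E[68:70] + "L" + REFERENCE_E[71:]
print(missense_mutations(REFERENCE_E, variant))  # ['S68F', 'P71L']
print(residue_positions(REFERENCE_E, "C"))       # [40, 43, 44]
```

The second call also confirms the three cysteine positions (40, 43 and 44) discussed in the Introduction.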
Table 7

Protein ID and respective location of mutation of the E proteins over SARS-CoV-2.

Protein ID (geo-location) | Mutation | Domain | R-group change
QKO24093 (USA: San Diego, California) | E8K | N-terminal | Acidic to Basic
QKU52835 (USA: WA) | E7Q | N-terminal | Acidic to Polar (uncharged)
QKN20885 (USA), QJQ84210 (USA: New Orleans, LA) | F26L | TMD | Hydrophobic to Hydrophobic
QKI36831 (China: Guangzhou) | D72Y | C-terminal | Hydrophilic to Hydrophobic
QKI36855 (China: Guangzhou) | S68C | C-terminal | Hydrophilic to Hydrophobic
QKG87268, QKG88576 (USA: Massachusetts) | S68F | C-terminal | Hydrophilic to Hydrophobic
QKE45838 (USA: CA), QKE45886 (USA: CA) | P71L | C-terminal | Hydrophobic to Hydrophobic
QKE45898 (USA: CA), QKE45910 (USA: CA) | P71L | C-terminal | Hydrophobic to Hydrophobic
QJE38284 (USA: CA), QIU81527 (USA: WA), QKV06741 (USA: WA) | P71L | C-terminal | Hydrophobic to Hydrophobic
QKU32371 (USA: CA) | P71L | C-terminal | Hydrophobic to Hydrophobic
QJS53352 (Greece: Athens) | L39M | TMD | Hydrophobic to Hydrophobic
QJR88103 (Australia: Victoria) | L73F | C-terminal | Hydrophobic to Hydrophobic
QJA42107 (USA: VA) | A36V | TMD | Hydrophobic to Hydrophobic
QHZ00381 (South Korea) | L37H | TMD | Hydrophobic to Hydrophilic
QKU31207 (USA: CA) | T9I | TMD | Hydrophilic to Hydrophobic
QKU37035 (Saudi Arabia: Jeddah) | L19F | TMD | Hydrophobic to Hydrophobic
QKV07065 (USA: WA) | S55F | C-terminal | Hydrophilic to Hydrophobic
QKU28584 (USA: WA) | A41S | C-terminal | Hydrophobic to Hydrophilic
Fig. 6

Sequence alignment of the E protein of SARS-CoV-2.

Most of the missense mutations occurred in the C-terminal domain. The E proteins of QKN20885 (USA) and QJQ84210 (USA: New Orleans, LA) carry the mutation F26L in the TMD, which may abolish ion channel activity and lead to in vivo attenuation. The E proteins of QJS53352 (Greece: Athens), QJA42107 (USA: VA) and QHZ00381 (South Korea) carry the TMD mutations L39M, A36V and L37H, respectively, which may likewise abolish ion channel activity and lead to in vivo attenuation. Several mutations were found in the C-terminal domain of the SARS-CoV-2 E proteins, and some of them change the R-group property of the affected residue, which might affect the interaction of the E protein with host proteins. The mutation data from the different host CoVs suggest that the mutations in the E proteins of SARS-CoV-2, Pangolin-CoV and Bat-CoV are similar in nature. In terms of variability, the SARS-CoV-2 E protein is much closer to that of Pangolin-CoV, and this closeness is also supported by sequence homology. Fig. 7 illustrates the phylogenetic relationships among the E proteins of the different CoVs (Table 3) based on sequence homology.
Table 3

Envelope proteins across different host CoVs.

Host-CoV | E protein sequence (N- to C-terminus) | Length
Human SARS-CoV-2 | MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV | 75
Chimpanzee-CoV | MFMADAYLADTVWYVGQIIFIVAICLLVTIVVVAFLATFKLCIQLCGMCNTLVLSPSIYVFNRGRQFYEFYNDIKPPVLDVDDV | 84
Pangolin-CoV | MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV | 75
Feline or Cat-CoV | MMFPRAFTIIDDHGMVVSVFFWLLLIIILILFSIALLNVIKLCMVCCNLGKTIIVLPARHAYDAYKNFMHIKAYDPDEAFLV | 82
Camel-CoV | MLPFVQERIGLFIVNFFIFTVVCAITLLVCMAFLTATRLCVQCITGFNTLLVQPALYLYNTGRSVYVKFQDSKPPLPPDEWV | 82
Cattle or Bovine-CoV | MFMADAYFADTVWYVGQIIFIVAICLLVIIVVVAFLATFKLCIQLCGMCNTLVLSPSIYVFNRGRQFYEFYNDVKPPVLDVDDV | 84
Bat-CoV | MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPTVYVYSRVKNLNSSEGVPDLLV | 76
Fig. 7

Sequence homology based phylogeny of the envelope protein of different host-CoVs.

From the phylogeny in Fig. 7, the E proteins of Pangolin-CoV and SARS-CoV-2 are the closest to each other among all the host CoVs. To examine the phylogenetic relationships among the E proteins in more depth, we also built an amino acid frequency-based phylogeny. The amino acid frequencies of the E protein of each host CoV listed in Table 3 are tabulated in Table 8. Pairwise Euclidean distances were calculated between these frequency vectors, and the phylogeny was derived from them (Fig. 8).
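The text states that Fig. 8 was derived from pairwise Euclidean distances between the Table 8 frequency vectors, but it does not name the tree-building algorithm. The sketch below therefore assumes raw counts as the frequency vectors and UPGMA (average linkage) as the clustering rule; both are our assumptions, and `composition` and `upgma` are illustrative helpers. It starts from the Table 3 sequences, whose counts reproduce Table 8.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order used in Table 8

# One E protein per host, copied from Table 3
E_PROTEINS: Dict[str, str] = {
    "SARS-CoV-2": "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV",
    "Chimpanzee-CoV": "MFMADAYLADTVWYVGQIIFIVAICLLVTIVVVAFLATFKLCIQLCGMCNTLVLSPSIYVFNRGRQFYEFYNDIKPPVLDVDDV",
    "Pangolin-CoV": "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV",
    "Cat-CoV": "MMFPRAFTIIDDHGMVVSVFFWLLLIIILILFSIALLNVIKLCMVCCNLGKTIIVLPARHAYDAYKNFMHIKAYDPDEAFLV",
    "Camel-CoV": "MLPFVQERIGLFIVNFFIFTVVCAITLLVCMAFLTATRLCVQCITGFNTLLVQPALYLYNTGRSVYVKFQDSKPPLPPDEWV",
    "Bovine-CoV": "MFMADAYFADTVWYVGQIIFIVAICLLVIIVVVAFLATFKLCIQLCGMCNTLVLSPSIYVFNRGRQFYEFYNDVKPPVLDVDDV",
    "Bat-CoV": "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPTVYVYSRVKNLNSSEGVPDLLV",
}

Cluster = FrozenSet[str]


def composition(sequence: str) -> List[int]:
    """Amino acid count vector in Table 8 column order."""
    counts = Counter(sequence)
    return [counts[aa] for aa in AMINO_ACIDS]


def upgma(vectors: Dict[str, List[int]]) -> List[Tuple[Cluster, float]]:
    """UPGMA (average-linkage) clustering on Euclidean distances.

    Returns the merges in order, each as (merged cluster, merge distance).
    """
    clusters = {frozenset([name]) for name in vectors}
    # Distance between every unordered pair of current clusters
    dist: Dict[FrozenSet[Cluster], float] = {
        frozenset([a, b]): math.dist(vectors[next(iter(a))], vectors[next(iter(b))])
        for a, b in combinations(clusters, 2)
    }
    merges: List[Tuple[Cluster, float]] = []
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters at minimum distance
        a, b = tuple(closest)
        merged = a | b
        merges.append((merged, dist.pop(closest)))
        clusters -= {a, b}
        for c in clusters:
            # UPGMA update: size-weighted average of the two old distances
            d_ac = dist.pop(frozenset([a, c]))
            d_bc = dist.pop(frozenset([b, c]))
            dist[frozenset([merged, c])] = (len(a) * d_ac + len(b) * d_bc) / (len(a) + len(b))
        clusters.add(merged)
    return merges


vectors = {host: composition(seq) for host, seq in E_PROTEINS.items()}
merges = upgma(vectors)
for cluster, height in merges:
    print(sorted(cluster), round(height, 3))
```

With these inputs, SARS-CoV-2 and Pangolin-CoV merge first at distance 0, Chimpanzee-CoV and Bovine-CoV next at distance 2, and Bat-CoV then joins the SARS-CoV-2/Pangolin-CoV cluster at √7 ≈ 2.65, in line with the groupings described in the text. A neighbour-joining tree on the same distance matrix would be an equally plausible reading of the Methods.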
Table 8

Amino acid counts over the envelope proteins over the different host CoVs.

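Host-CoV | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V
SARS-CoV-2 | 4 | 3 | 5 | 1 | 3 | 0 | 2 | 1 | 0 | 3 | 14 | 2 | 1 | 5 | 2 | 8 | 4 | 0 | 4 | 13
Chimpanzee-CoV | 6 | 2 | 3 | 6 | 4 | 3 | 1 | 3 | 0 | 8 | 9 | 2 | 3 | 7 | 3 | 2 | 4 | 1 | 5 | 12
Pangolin-CoV | 4 | 3 | 5 | 1 | 3 | 0 | 2 | 1 | 0 | 3 | 14 | 2 | 1 | 5 | 2 | 8 | 4 | 0 | 4 | 13
Cat-CoV | 7 | 2 | 3 | 5 | 3 | 0 | 1 | 2 | 3 | 11 | 11 | 4 | 5 | 7 | 3 | 2 | 2 | 1 | 3 | 7
Camel-CoV | 4 | 3 | 3 | 2 | 4 | 4 | 2 | 3 | 0 | 5 | 11 | 2 | 2 | 8 | 6 | 2 | 7 | 1 | 3 | 10
Bovine-CoV | 6 | 2 | 3 | 6 | 4 | 3 | 1 | 3 | 0 | 8 | 8 | 2 | 3 | 8 | 3 | 2 | 3 | 1 | 5 | 13
Bat-CoV | 4 | 2 | 5 | 1 | 3 | 0 | 3 | 2 | 0 | 3 | 14 | 2 | 1 | 4 | 2 | 7 | 5 | 0 | 4 | 14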
Fig. 8

Phylogenetic relationship among the different host CoVs with respect to amino acid conservation of the envelope protein.

The amino acid frequency-based phylogeny again suggests that the E proteins of Bat-CoV and SARS-CoV-2 share a common origin. It also shows that the E proteins of Pangolin-CoV and SARS-CoV-2 have identical amino acid composition (Table 8). Chimpanzee-CoV and Bovine-CoV carry the closest pair of E proteins according to both sequence homology and amino acid conservation.

Phylogeny of the envelope proteins of host-CoVs

Fig. 9 presents the sequence homology-based phylogeny of the 74 distinct E proteins across the different host CoVs.
Fig. 9

Phylogenetic relationship among envelope proteins of the different host CoVs with respect to the sequence based homology.

The E proteins of Bat-CoV, Pangolin-CoV and SARS-CoV-2 fall exclusively on the left-hand side of the cladogram (from the root), as shown in Fig. 9, while the other side contains the E proteins of the other host CoVs. All sixteen distinct E proteins of SARS-CoV-2 and that of Pangolin-CoV lie in a nearby neighbourhood. Table 9 gives the frequency (count) of each amino acid in every E protein, from which the amino acid conservation-based phylogeny (Fig. 10) is derived.
Table 9

Frequency of amino acids over the envelope proteins across the seven different host-CoVs.

NameHostARNDCQEGHILKMFPSTWYV
ASL68947Bat CoV72314533068237636139
ASU90554Camel CoV4332432314112386271310
ASL68958Bat CoV72214533078237636139
ANI69894Camel CoV4331442314112386271310
AXE71624Feline CoV7244311221110357333137
ALA49346Camel CoV4332442304112376371310
AGT52084Feline CoV7234311221111456333137
AUM60029Bat CoV623146230682376351311
ACT10909Feline CoV7334311221011366323138
AHY61342Bat CoV723145330792276341310
AIA62348Bat CoV722154333681166341313
ACT10941Feline CoV7234311321112456323136
AYF53097Feline CoV7234311221011457323138
QCI31474Camel CoV4332442304112386271310
ASU62492Feline CoV7234311221111457323137
ACT10974Feline CoV6234311221111457323138
ALA49390Camel CoV4332442304122376271310
ASU89926Camel CoV4332442305112286271310
ACT10869Feline CoV7134411321112456323136
AIA62357Bat CoV635143331881186161310
ASU90334Camel CoV4332442304112286271310
ASU62503Feline CoV7234311221112456323137
ADO39821Feline CoV7225311221111457323137
QDM36990Feline CoV7244311221111447322138
ACT10858Feline CoV7244311221112456322137
AWW13513Chimpanzee CoV623643130892373241512
ACT10920Feline CoV7244311221111357222137
ATQ39391Bat CoV423045430582376371312
QBM11741Camel CoV4332442304122386171310
AVI15004Bovine CoV623643130882383231513
AUG98123Feline CoV7334301221211457323136
AMD11134Feline CoV7235301231111457322137
AEK25525Feline CoV7234301221111557323137
AVZ61113Bovine CoV623643130882273231513
ALA50082Camel CoV623643130982283231513
AEK25514Feline CoV7234311221112457323037
YP_009072442Bat CoV6331453305121142350513
QDF43841Bat CoV31525254110152151430210
YP_009273007Bat CoV2032624207123141560412
QHZ00381SARS-CoV-24351302113132152840413
ATO98135Bat CoV4251312203142142750414
AIA62302Bat CoV4252402103122242750415
QJS53352SARS-CoV-24351302103132252840413
QDF43816Bat CoV4251303204142142750413
ABD75324Bat CoV4251303203132152750414
ADK66843Bat CoV4240314103132162850413
ATO98160Bat CoV5251303203142142650414
AIA62280Bat CoV5251303203132142650415
QJR88103SARS-CoV-24351302103132162840413
AKZ19089Bat CoV5251303203142142740414
QDF43821Bat CoV4251303203142142750414
QKI36855SARS-CoV-24351402103142152740413
AIA62312Bat CoV4251303203132142750415
ABD75313Bat CoV4252402103132142750415
QKG87268SARS-CoV-24351302103142162740413
AHX37560Bat CoV4241303203142142850414
AGC74167Bat CoV4251302203132152750415
AVP78044Bat CoV4351302103142152840413
QIG55947Pangolin CoV4351302103142152840413
YP_009724392SARS-CoV-24351302103142152840413
QJR89447SARS-CoV-24351302103142152840412
QJQ84210SARS-CoV-24351302103152142840413
QJA42107SARS-CoV-23351302103142152840414
ATO98184Bat CoV4251303203152141750414
QKE45838SARS-CoV-24351302103152151840413
QKI36831SARS-CoV-24350302103142152840513
QJI54124SARS-CoV-24350302103132152840413
QKU31207SARS-CoV-24351302104142152830413
QKU37035SARS-CoV-24351302103132162840413
QKV07065SARS-CoV-24351302103142162740413
QKU28584SARS-CoV-23351302103142152940413
QKU52835SARS-CoV-24351311103142152840413
Fig. 10

Phylogenetic relationship among envelope proteins of the different host CoVs with respect to the amino acids conservation.

From the sequence homology (Fig. 9), the Bat-CoV E proteins AIA62312 and ABD75324 are very close. The amino acid conservation-based phylogeny (Fig. 10) further places the Bat-CoV E proteins QDF43821, ATO98184, AHX37560, AGC74167, AKZ19089, AIA62280, ATO98160 and ATO98135 in the same branch at the same level. In Fig. 9, the SARS-CoV-2 E proteins QJR89447 and QJQ84210 are very close. In the amino acid conservation-based phylogeny (Fig. 10), the E proteins AVP78044 (Bat-CoV), QIG55947 (Pangolin-CoV) and YP_009724392 (SARS-CoV-2) are close to QJR89447 (SARS-CoV-2), and the SARS-CoV-2 E proteins QKI36855, QJA42107, QKE45838, QKI36831 and QJI54124 lie close to QJQ84210 (SARS-CoV-2). In the homology-based phylogeny, the SARS-CoV-2 E proteins QKU31207 and YP_009724392 are also close. Almost all E proteins of SARS-CoV-2, Bat-CoV and Pangolin-CoV lack tryptophan, glutamine and histidine, and the E proteins of all the host CoVs are rich in leucine and valine, as seen in Tables 8 and 9.
Table 10

Shannon entropy of the amino acid conservation of the E protein of the host CoVs.

Name | Host | SE | Name | Host | SE | Name | Host | SE
ASL68947 | Bat CoV | 0.933 | ACT10858 | Feline CoV | 0.919 | AIA62280 | Bat CoV | 0.851
ASU90554 | Camel CoV | 0.932 | AWW13513 | Chimpanzee CoV | 0.918 | QJR88103 | SARS-CoV-2 | 0.850
ASL68958 | Bat CoV | 0.930 | ACT10920 | Feline CoV | 0.916 | QKU37035 | SARS-CoV-2 | 0.850
ANI69894 | Camel CoV | 0.929 | ATQ39391 | Bat CoV | 0.916 | AKZ19089 | Bat CoV | 0.850
AXE71624 | Feline CoV | 0.928 | QBM11741 | Camel CoV | 0.915 | QDF43821 | Bat CoV | 0.850
ALA49346 | Camel CoV | 0.927 | AVI15004 | Bovine CoV | 0.914 | QKI36855 | SARS-CoV-2 | 0.850
AGT52084 | Feline CoV | 0.926 | AUG98123 | Feline CoV | 0.912 | AIA62312 | Bat CoV | 0.849
AUM60029 | Bat CoV | 0.926 | AMD11134 | Feline CoV | 0.912 | ABD75313 | Bat CoV | 0.848
ACT10909 | Feline CoV | 0.926 | AEK25525 | Feline CoV | 0.912 | QKG87268 | SARS-CoV-2 | 0.848
AHY61342 | Bat CoV | 0.925 | AVZ61113 | Bovine CoV | 0.912 | QKV07065 | SARS-CoV-2 | 0.848
AIA62348 | Bat CoV | 0.925 | ALA50082 | Camel CoV | 0.909 | AHX37560 | Bat CoV | 0.847
ACT10941 | Feline CoV | 0.924 | AEK25514 | Feline CoV | 0.908 | AGC74167 | Bat CoV | 0.847
AYF53097 | Feline CoV | 0.924 | YP_009072442 | Bat CoV | 0.888 | AVP78044 | Bat CoV | 0.846
QCI31474 | Camel CoV | 0.923 | QDF43841 | Bat CoV | 0.881 | QIG55947 | Pangolin CoV | 0.846
ASU62492 | Feline CoV | 0.922 | YP_009273007 | Bat CoV | 0.868 | YP_009724392 | SARS-CoV-2 | 0.846
ACT10974 | Feline CoV | 0.922 | QHZ00381 | SARS-CoV-2 | 0.862 | QKU31207 | SARS-CoV-2 | 0.846
ALA49390 | Camel CoV | 0.921 | ATO98135 | Bat CoV | 0.858 | QJR89447 | SARS-CoV-2 | 0.843
ASU89926 | Camel CoV | 0.921 | AIA62302 | Bat CoV | 0.857 | QKU28584 | SARS-CoV-2 | 0.842
ACT10869 | Feline CoV | 0.920 | QJS53352 | SARS-CoV-2 | 0.856 | QJQ84210 | SARS-CoV-2 | 0.841
AIA62357 | Bat CoV | 0.920 | QDF43816 | Bat CoV | 0.856 | QJA42107 | SARS-CoV-2 | 0.840
ASU90334 | Camel CoV | 0.920 | ABD75324 | Bat CoV | 0.855 | ATO98184 | Bat CoV | 0.840
ASU62503 | Feline CoV | 0.920 | ADK66843 | Bat CoV | 0.852 | QKE45838 | SARS-CoV-2 | 0.836
ADO39821 | Feline CoV | 0.920 | QKU52835 | SARS-CoV-2 | 0.852 | QKI36831 | SARS-CoV-2 | 0.835
QDM36990 | Feline CoV | 0.919 | ATO98160 | Bat CoV | 0.851 | QJI54124 | SARS-CoV-2 | 0.824
The Shannon entropy (SE) computed from the amino acid frequency vector of each protein is listed in Table 10. The SE of amino acid conservation indicates how close E proteins are at the molecular level. Table 10 shows that amino acid conservation in the Bat-CoV E proteins is highly diverse, with SE values between 0.84 and 0.94, whereas the SE of the most common SARS-CoV-2 E protein and that of the Pangolin-CoV E protein are identical (0.846). The Bat-CoV E protein AVP78044 also has an SE of 0.846. The remaining fifteen distinct SARS-CoV-2 E proteins, having accumulated various missense mutations, are close to other Bat-CoV E proteins. The SE of the SARS-CoV-2 E proteins lies between 0.824 and 0.862, and some of them have SE values tightly bounded by those of the Pangolin-CoV and Bat-CoV E proteins. For example, the E proteins ADK66843 (Bat-CoV) and QKU52835 (SARS-CoV-2) have identical SE (0.852), and Table 10 contains other such examples. This phylogenetic relationship is thus supported by amino acid conservation and the associated SE values (Table 10). We also derived the phylogenetic relationship among the E proteins of Bat-CoV, Pangolin-CoV and SARS-CoV-2 from amino acid conservation and the associated SE (Fig. 11).
Fig. 11

Phylogenetic relationship among the envelope proteins of SARS-CoV-2, Bat-CoV and Pangolin-CoV with respect to amino acid conservation.
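
Neither the text nor the caption states how the relationship in Fig. 11 was constructed from the SE values. Purely as an illustration, and not as the authors' method, the sketch below groups E proteins by average-linkage (UPGMA) clustering on the absolute difference of their SE values. It uses only SE values quoted in this section; the labels, including SARS-CoV-2_most_common and Pangolin-CoV for the two entries reported without accession numbers, are ours, and the function name `upgma` is hypothetical.

```python
from itertools import combinations

# SE values quoted in this section (rounded to three decimals).
# Labels are illustrative; the two 0.846 entries without accessions
# stand for the most common SARS-CoV-2 and the Pangolin-CoV E proteins.
SE_VALUES = {
    "SARS-CoV-2_most_common": 0.846,
    "Pangolin-CoV": 0.846,
    "AVP78044_Bat-CoV": 0.846,
    "ADK66843_Bat-CoV": 0.852,
    "QKU52835_SARS-CoV-2": 0.852,
    "QJR89447_SARS-CoV-2": 0.843,
    "ATO98135_Bat-CoV": 0.858,
}


def upgma(values: dict[str, float]) -> str:
    """Average-linkage (UPGMA) clustering on |SE_i - SE_j|.

    Returns the tree topology as a Newick string (branch lengths omitted).
    """
    # Each cluster is (Newick label, list of member SE values).
    clusters = [(name, [se]) for name, se in values.items()]

    def distance(a, b):
        # UPGMA: mean of all pairwise distances between the two clusters.
        return sum(abs(x - y) for x in a[1] for y in b[1]) / (len(a[1]) * len(b[1]))

    while len(clusters) > 1:
        # Pick the closest pair; ties go to the first pair enumerated.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        merged = (f"({a[0]},{b[0]})", a[1] + b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0] + ";"


print(upgma(SE_VALUES))
```

Because each protein is reduced to a single entropy value, proteins with identical SE (such as ADK66843 and QKU52835 at 0.852) are merged first regardless of where their sequences differ; a tree built this way reflects similarity of amino acid composition, not an alignment-based phylogeny.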

Conclusions

Here, we performed a phylogenetic analysis of E protein sequences of coronaviruses from different hosts. Other investigators have also carried out phylogenetic analyses using the genomic and protein sequences of a few coronaviruses from different hosts [21]. However, because proteins are the functional units of the cell, a phylogenetic analysis based on a large number of E protein sequences may give a clearer picture of the relationships among host coronaviruses, particularly with respect to the intermediate host between bats and humans. This study, based on protein sequence variation, may therefore offer a clue as to why some hosts are resistant or susceptible to COVID-19. We observed sequence variations in the E proteins of human SARS-CoV-2, Bat-CoV, Camel-CoV and other CoVs. Based on the mutation characteristics and amino acid conservation of the E proteins across various host CoVs, this report identifies Pangolin-CoV and Bat-CoV as potential close kin of human SARS-CoV-2, as was also reported in a recent study [21]. The analysis in this study also identifies Pangolin-CoV as the closest kin of SARS-CoV-2. Missense mutations in the E protein across various host CoVs may impair the usual functions of the envelope protein, and consequently the virus may become less infectious. We believe that the accumulation of missense mutations in the E protein could weaken SARS-CoV-2 and help us eventually overcome COVID-19, since a virus that destroys its host undermines its own long-term survival.
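
The conclusions rest on missense mutations, that is, single amino acid substitutions, in the E protein. As a minimal sketch (not the authors' pipeline), substitutions between a reference and a query E protein that have already been aligned to the same length can be listed in the conventional <reference residue><position><alternate residue> notation. The function name is hypothetical, the reference sequence is the SARS-CoV-2 reference E protein as we understand it from NCBI records (verify before reuse), and the L37H and P71L changes in the example are hypothetical, introduced only to exercise the function.

```python
# SARS-CoV-2 reference E protein (75 aa), supplied from NCBI records
# rather than from this article; verify before reuse.
SARS_COV_2_E = (
    "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLC"
    "AYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"
)


def missense_substitutions(ref: str, query: str) -> list[str]:
    """List single amino acid substitutions between two pre-aligned,
    equal-length protein sequences as <ref><position><alt> strings.

    Positions are 1-based alignment columns, which equal residue numbers
    only when `ref` has no gaps. Gaps ('-') and stops ('*') are skipped,
    since indels and nonsense changes are not missense mutations.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(ref, query), start=1)
        if r != q and r not in "-*" and q not in "-*"
    ]


# Hypothetical variant with two substitutions, built only for illustration.
variant = list(SARS_COV_2_E)
variant[36] = "H"  # position 37: L -> H
variant[70] = "L"  # position 71: P -> L
variant = "".join(variant)

print(missense_substitutions(SARS_COV_2_E, variant))  # ['L37H', 'P71L']
```

Requiring a prior alignment keeps the sketch simple; for E proteins of different lengths from different host CoVs, the sequences would first have to be aligned with a dedicated tool.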

Data availability

The protein sequences of SARS-CoV-2 and the other host CoVs used in this study are available in the NCBI Virus database at https://www.ncbi.nlm.nih.gov/labs/virus/vssi/.

Author contributions

SH conceived the problem. SH determined the mutations. SH, PPC and BR analyzed the data and results. SH wrote the initial draft, which was checked and edited by all other authors to produce the final version.

Declaration of Competing Interest

The authors do not have any conflicts of interest to declare.
References (28 in total)

1.  Relative von Neumann entropy for evaluating amino acid conservation.

Authors:  Fredrik Johansson; Hiroyuki Toh
Journal:  J Bioinform Comput Biol       Date:  2010-10       Impact factor: 1.122

2.  44-amino-acid E5 transforming protein of bovine papillomavirus requires a hydrophobic core and specific carboxyl-terminal amino acids.

Authors:  B H Horwitz; A L Burkhardt; R Schlegel; D DiMaio
Journal:  Mol Cell Biol       Date:  1988-10       Impact factor: 4.272

3.  Channel-Inactivating Mutations and Their Revertant Mutants in the Envelope Protein of Infectious Bronchitis Virus.

Authors:  Janet To; Wahyu Surya; To Sing Fung; Yan Li; Carmina Verdià-Bàguena; Maria Queralt-Martin; Vicente M Aguilella; Ding Xiang Liu; Jaume Torres
Journal:  J Virol       Date:  2017-02-14       Impact factor: 5.103

4.  The SARS coronavirus E protein interacts with PALS1 and alters tight junction formation and epithelial morphogenesis.

Authors:  Kim-Tat Teoh; Yu-Lam Siu; Wing-Lim Chan; Marc A Schlüter; Chia-Jen Liu; J S Malik Peiris; Roberto Bruzzone; Benjamin Margolis; Béatrice Nal
Journal:  Mol Biol Cell       Date:  2010-09-22       Impact factor: 4.138

5.  The human and simian immunodeficiency virus envelope glycoprotein transmembrane subunits are palmitoylated.

Authors:  C Yang; C P Spies; R W Compans
Journal:  Proc Natl Acad Sci U S A       Date:  1995-10-10       Impact factor: 11.205

6.  Severe acute respiratory syndrome coronavirus envelope protein ion channel activity promotes virus fitness and pathogenesis.

Authors:  Jose L Nieto-Torres; Marta L DeDiego; Carmina Verdiá-Báguena; Jose M Jimenez-Guardeño; Jose A Regla-Nava; Raul Fernandez-Delgado; Carlos Castaño-Rodriguez; Antonio Alcaraz; Jaume Torres; Vicente M Aguilella; Luis Enjuanes
Journal:  PLoS Pathog       Date:  2014-05-01       Impact factor: 6.823

7.  Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)?

Authors:  Ping Liu; Jing-Zhe Jiang; Xiu-Feng Wan; Yan Hua; Linmiao Li; Jiabin Zhou; Xiaohu Wang; Fanghui Hou; Jing Chen; Jiejian Zou; Jinping Chen
Journal:  PLoS Pathog       Date:  2020-05-14       Impact factor: 6.823

8.  Structural insights into SARS coronavirus proteins.

Authors:  Mark Bartlam; Haitao Yang; Zihe Rao
Journal:  Curr Opin Struct Biol       Date:  2005-11-02       Impact factor: 6.809

9.  Molecular conservation and differential mutation on ORF3a gene in Indian SARS-CoV2 genomes.

Authors:  Sk Sarif Hassan; Pabitra Pal Choudhury; Pallab Basu; Siddhartha Sankar Jana
Journal:  Genomics       Date:  2020-06-12       Impact factor: 5.736

10.  Improved binding of SARS-CoV-2 Envelope protein to tight junction-associated PALS1 could play a key role in COVID-19 pathogenesis.

Authors:  Flavio De Maio; Ettore Lo Cascio; Gabriele Babini; Michela Sali; Stefano Della Longa; Bruno Tilocca; Paola Roncada; Alessandro Arcovito; Maurizio Sanguinetti; Giovanni Scambia; Andrea Urbani
Journal:  Microbes Infect       Date:  2020-09-04       Impact factor: 2.700
