Qingfa Wu, Yilin Zhang, Hong Lü, Jing Wang, Ximiao He, Yong Liu, Chen Ye, Wei Lin, Jianfei Hu, Jia Ji, Jing Xu, Jie Ye, Yongwu Hu, Wenjun Chen, Songgang Li, Jun Wang, Jian Wang, Shengli Bi, Huanming Yang.
Abstract
The E (envelope) protein is the smallest structural protein in all coronaviruses and is the only viral structural protein in which no variation has been detected. We conducted genome sequencing and phylogenetic analyses of SARS-CoV. Based on genome sequencing, we predicted that the E protein is a transmembrane (TM) protein characterized by a TM region with strong hydrophobicity and an alpha-helix conformation. We identified a segment (NH2-_L-Cys-A-Y-Cys-Cys-N_-COOH) in the carboxyl-terminal region of the E protein that appears to form three disulfide bonds with another segment of corresponding cysteines in the carboxyl-terminus of the S (spike) protein. These bonds point to a possible structural association between the E and S proteins. Our phylogenetic analyses of the E protein sequences in all published coronaviruses place SARS-CoV in an independent group in Coronaviridae and suggest a non-human animal origin.
Year: 2003 PMID: 15626343 PMCID: PMC5172412 DOI: 10.1016/s1672-0229(03)01017-9
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Fig. 1 The predicted TM region in the E protein of SARS-CoV by TMHMM.
Fig. 2 Predicted secondary structures in the E protein of SARS-CoV. Software programs: I. PSIpred; II. NNPredict; III. SPLIT; IV. 1-3. Antheprot 5.0.
The Genomic and Biochemical Features of the Entire E Protein and Its Three Subregions
| | TM region | N-terminus | C-terminus | E protein |
|---|---|---|---|---|
| G+C (%) | 42.0 | 35.7 | 40.8 | 40.2 |
| A | 11 (15.9%) | 13 (31.0%) | 30 (25.0%) | 54 (23.4%) |
| U | 29 (42.1%) | 14 (33.3%) | 41 (34.2%) | 84 (36.4%) |
| C | 18 (26.1%) | 6 (14.3%) | 23 (19.2%) | 47 (20.3%) |
| G | 11 (15.9%) | 9 (21.4%) | 26 (21.6%) | 46 (19.9%) |
| Total (nt) | 69 (100%) | 42 (100%) | 120 (100%) | 231 (100%) |
| Leu | 8 (10.6) | 1 (1.3) | 5 (6.6) | 14 (18.4) |
| Val | 4 (5.3) | 2 (2.6) | 8 (10.6) | 14 (18.4) |
| Phe | 3 (3.9) | 1 (1.3) | 0 | 4 (5.3) |
| Ala | 3 (3.9) | 0 | 1 (1.3) | 4 (5.3) |
| Thr | 2 (2.7) | 2 (2.6) | 1 (1.3) | 5 (6.6) |
| Ile | 1 (1.3) | 1 (1.3) | 1 (1.3) | 3 (3.9) |
| Asn | 1 (1.3) | 0 | 4 (5.3) | 5 (6.6) |
| Ser | 1 (1.3) | 2 (2.6) | 4 (5.3) | 7 (9.2) |
| Glu | 0 | 2 (2.6) | 1 (1.3) | 3 (3.9) |
| Tyr | 0 | 1 (1.3) | 3 (3.9) | 4 (5.3) |
| Gly | 0 | 1 (1.3) | 1 (1.3) | 2 (2.6) |
| Met | 0 | 1 (1.3) | 0 | 1 (1.3) |
| Cys | 0 | 0 | 3 (3.9) | 3 (3.9) |
| Lys | 0 | 0 | 2 (2.6) | 2 (2.6) |
| Arg | 0 | 0 | 2 (2.6) | 2 (2.6) |
| Pro | 0 | 0 | 2 (2.6) | 2 (2.6) |
| Asp | 0 | 0 | 1 (1.3) | 1 (1.3) |
| Total | 23 (30.3) | 14 (18.4) | 39 (51.3) | 76 (100) |
| Molecular Weight (a.a.) | 2491 | 1576 | 4330 | 8361 |
| pI | 5.52 | 3.79 | 8.61 | 6.01 |
| Net Charge | 0 | -2 (-2.7%) | +2 (-2.7%, +5.4%) | 0 (-5.4%, +5.4%) |
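The G+C percentages in the table follow directly from the nucleotide counts listed beneath them. The following is a minimal sketch of that arithmetic, using the counts copied from the table; the dictionary layout and the function name `gc_percent` are illustrative assumptions and not part of the original analysis.

```python
# Nucleotide counts copied from the table above (A, U, C, G per region).
COUNTS = {
    "TM region":  {"A": 11, "U": 29, "C": 18, "G": 11},
    "N-terminus": {"A": 13, "U": 14, "C": 6,  "G": 9},
    "C-terminus": {"A": 30, "U": 41, "C": 23, "G": 26},
    "E protein":  {"A": 54, "U": 84, "C": 47, "G": 46},
}


def gc_percent(counts):
    """Return G+C content as a percentage of all nucleotides in the region."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


for region, counts in COUNTS.items():
    total = sum(counts.values())
    print(f"{region}: {total} nt, G+C = {gc_percent(counts):.1f}%")
```

With these counts the three subregions come out at the tabulated 42.0%, 35.7% and 40.8%. The whole-protein value computes to roughly 40.26%, which rounds to 40.3% rather than the 40.2% listed, so the table may have truncated rather than rounded that entry.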
Fig. 3-I, 3-II, 3-III Predicted TM regions in the E protein of the three groups of coronaviruses.
Fig. 4-I, 4-II, 4-III The distribution of GC content (I), charge (II) and hydrophobicity (III) in the E protein of SARS-CoV.
Codon Usage Frequency of the TM Region and the Entire E Protein
| Amino acid | TM region (a.a.) | E protein (a.a.) | Codon | TM region (codon) | E protein (codon) |
|---|---|---|---|---|---|
| Ala | 3 (4.00%) | 4 (5.33%) | GCA | 0 | 0 |
| | | | GCC | 1 (1.33%) | 1 (1.33%) |
| | | | GCG | 1 (1.33%) | 2 (2.67%) |
| | | | GCU | 1 (1.33%) | 1 (1.33%) |
| Cys | 0 | 3 (4.00%) | UGC | 0 | 2 (2.67%) |
| | | | UGU | 0 | 1 (1.33%) |
| Asp | 0 | 1 (1.33%) | GAC | 0 | 0 |
| | | | GAU | 0 | 1 (1.33%) |
| Glu | 0 | 3 (4.00%) | GAA | 0 | 3 (4.00%) |
| | | | GAG | 0 | 0 |
| Phe | 3 (4.00%) | 4 (5.33%) | UUC | 2 (2.67%) | 3 (4.00%) |
| | | | UUU | 1 (1.33%) | 1 (1.33%) |
| Gly | 0 | 2 (2.67%) | GGA | 0 | 1 (1.33%) |
| | | | GGC | 0 | 0 |
| | | | GGG | 0 | 0 |
| | | | GGU | 0 | 1 (1.33%) |
| His | 0 | 0 | CAC | 0 | 0 |
| | | | CAU | 0 | 0 |
| Ile | 1 (1.33%) | 3 (4.00%) | AUA | 0 | 1 (1.33%) |
| | | | AUC | 1 (1.33%) | 1 (1.33%) |
| | | | AUU | 0 | 1 (1.33%) |
| Lys | 0 | 2 (2.67%) | AAA | 0 | 2 (2.67%) |
| | | | AAG | 0 | 0 |
| Leu | 8 (10.67%) | 14 (18.67%) | CUA | 2 (2.67%) | 2 (2.67%) |
| | | | CUC | 0 | 0 |
| | | | CUG | 0 | 2 (2.67%) |
| | | | CUU | 5 (6.67%) | 6 (8.00%) |
| | | | UUA | 0 | 2 (2.67%) |
| | | | UUG | 1 (1.33%) | 2 (2.67%) |
| Met | 0 | 1 (1.33%) | AUG | 0 | 1 (1.33%) |
| Asn | 1 (1.33%) | 5 (6.67%) | AAC | 0 | 2 (2.67%) |
| | | | AAU | 1 (1.33%) | 3 (4.00%) |
| Pro | 0 | 2 (2.67%) | CCA | 0 | 1 (1.33%) |
| | | | CCC | 0 | 0 |
| | | | CCG | 0 | 0 |
| | | | CCU | 0 | 1 (1.33%) |
| Gln | 0 | 0 | CAA | 0 | 0 |
| | | | CAG | 0 | 0 |
| Arg | 0 | 2 (2.67%) | AGA | 0 | 0 |
| | | | AGG | 0 | 0 |
| | | | CGA | 0 | 1 (1.33%) |
| | | | CGC | 0 | 0 |
| | | | CGG | 0 | 0 |
| | | | CGU | 0 | 1 (1.33%) |
| Ser | 1 (1.33%) | 7 (9.33%) | AGC | 1 (1.33%) | 1 (1.33%) |
| | | | AGU | 0 | 1 (1.33%) |
| | | | UCA | 0 | 1 (1.33%) |
| | | | UCC | 0 | 0 |
| | | | UCG | 0 | 2 (2.67%) |
| | | | UCU | 0 | 2 (2.67%) |
| Thr | 2 (2.67%) | 5 (6.67%) | ACA | 1 (1.33%) | 2 (2.67%) |
| | | | ACC | 0 | 0 |
| | | | ACG | 0 | 2 (2.67%) |
| | | | ACU | 1 (1.33%) | 1 (1.33%) |
| Val | 4 (5.33%) | 14 (18.67%) | GUA | 2 (2.67%) | 3 (4.00%) |
| | | | GUC | 1 (1.33%) | 3 (4.00%) |
| | | | GUG | 1 (1.33%) | 2 (2.67%) |
| | | | GUU | 0 | 6 (8.00%) |
| Trp | 0 | 0 | UGG | 0 | 0 |
| Tyr | 0 | 4 (5.33%) | UAC | 0 | 4 (5.33%) |
| | | | UAU | 0 | 0 |
| STOP | 0 | 1 (1.33%) | UAA | 0 | 1 (1.33%) |
| | | | UAG | 0 | 0 |
| | | | UGA | 0 | 0 |
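A codon usage table of this kind can be produced by splitting a coding sequence into in-frame triplets and counting them. The sketch below shows one way to do that. The function names `codon_usage` and `usage_table` and the toy sequence are hypothetical, and the percentage denominator used here (all codons counted, including the stop codon) is an assumption that may differ from the one used for the table above.

```python
from collections import Counter


def codon_usage(rna):
    """Count in-frame codons in an RNA string, ignoring a trailing partial codon."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    usable = len(rna) - len(rna) % 3     # drop 1-2 leftover bases, if any
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def usage_table(rna):
    """Return {codon: (count, percent of all counted codons)}."""
    counts = codon_usage(rna)
    total = sum(counts.values())
    return {codon: (n, 100.0 * n / total) for codon, n in counts.items()}


# Hypothetical toy sequence for illustration only; NOT the SARS-CoV E gene.
example = "AUGUACUCAUUCGUUUAA"
for codon, (n, pct) in sorted(usage_table(example).items()):
    print(f"{codon}: {n} ({pct:.2f}%)")
```

Restricting the input to the nucleotides of a subregion, such as the 69 nt of the TM region, would give the per-region columns in the same way.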
Fig. 5 Multi-alignment of amino acid sequences of the E protein between SARS-CoV and other coronaviruses. AIBV: avian infectious bronchitis virus; BCoV: bovine coronavirus; CcoV: canine coronavirus; FCoV: feline coronavirus; HCoV-229E: human coronavirus 229E; HCoV-OC43: human coronavirus OC43; MHV: murine hepatitis virus; PEDV: porcine epidemic diarrhea virus; PHEV: porcine hemagglutinating encephalomyelitis virus; RCoV: rat coronavirus; TCoV: turkey coronavirus; TGV: transmissible gastroenteritis virus.
Fig. 6 Pairwise similarity between coronaviruses based on the E protein. The numbers indicate the percentage of similarity between each listed coronavirus and BJ01. (HCoV-E: HCoV-229E; HCoV-O: HCoV-OC43.)
Fig. 7 An unrooted phylogenetic tree of the coronaviruses based on the amino acid sequence of the E protein by ClustalW and TreeView.
Fig. 8 The predicted disulfide bonds between the E and S proteins of SARS-CoV.