Jing Xu, Jianfei Hu, Jing Wang, Yujun Han, Yongwu Hu, Jie Wen, Yan Li, Jia Ji, Jia Ye, Zizhang Zhang, Wei Wei, Songgang Li, Jun Wang, Jian Wang, Jun Yu, Huanming Yang.
Abstract
Annotation of the genome sequence of SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable for understanding its evolution and pathogenesis. We performed a full annotation of SARS-CoV genome sequences using annotation programs that are publicly available or that we developed ourselves. In total, 21 open reading frames (ORFs) of genes or putative uncharacterized proteins (PUPs) were predicted. Seven PUPs had not been reported previously, and two of them were predicted to contain transmembrane regions. Eight ORFs partially overlapped with, or were embedded in, those of known genes, showing that the SARS-CoV genome is small and compact, with overlapping coding regions. The most striking finding is that one ORF is located on the minus strand. We also annotated non-coding regions and identified the transcription regulating sequences (TRS) in the intergenic regions. The TRS analysis supports the minus-strand extending transcription mechanism of coronaviruses. SNP analysis of different isolates shows that sequence mutations do not affect the ORF prediction results.
Year: 2003 PMID: 15629035 PMCID: PMC5172239 DOI: 10.1016/s1672-0229(03)01028-3
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Predicted ORFs and Their Physicochemical Characteristics in the SARS-CoV Genome (Isolate BJ01)
| ORF | Position | Length (nt) | GC content (%) | Average MW (kDa) | pI | Hydrophobicity (%) | Hydrophilicity (%) | Charge (+)(%) | Charge (−)(%) |
|---|---|---|---|---|---|---|---|---|---|
| R | 246-13,379 | 21,222 | 40.8 | 790.28 | 6.3 | 30.8 | 44.3 | 11.8 | 10.5 |
| BGI-PUP-R-1 | 715-1,206 | 492 | 46.7 | 17.74 | 11.5 | 33.1 | 49.1 | 16.0 | 1.2 |
| S | 21,473-25,240 | 3,768 | 38.7 | 139.17 | 5.5 | 30.4 | 44.8 | 9.1 | 9.2 |
| BGI-PUP-S-1 | 21,936-22,082 | 147 | 32.6 | 5.64 | 9.7 | 47.9 | 37.5 | 12.5 | 2.1 |
| BGI-PUP-S-2 | 22,461-22,595 | 135 | 36.2 | 4.99 | 9.6 | 47.7 | 38.6 | 11.4 | 2.3 |
| BGI-PUP-S-3 | 23,238-23,384 | 147 | 40.1 | 5.75 | 9.3 | 50.0 | 35.4 | 12.5 | 2.1 |
| BGI-PUP-S-4 | 24,798-24,998 | 201 | 38.8 | 7.43 | 11.0 | 39.4 | 53.0 | 16.7 | 0.0 |
| BGI-PUP-S-5 | 25,188-25,310 | 123 | 34.9 | 4.91 | 9.2 | 30.0 | 50.0 | 15.0 | 2.5 |
| PUP1 | 25,249-26,073 | 825 | 40.3 | 30.90 | 5.6 | 34.7 | 39.1 | 8.4 | 8.0 |
| PUP2 | 25,670-26,134 | 465 | 40.6 | 17.72 | 11.0 | 37.0 | 51.9 | 19.5 | 0.6 |
| E | 26,098-26,328 | 231 | 40.3 | 8.36 | 6.0 | 47.4 | 32.9 | 5.3 | 5.3 |
| M | 26,379-27,044 | 666 | 45.2 | 25.06 | 9.3 | 40.7 | 36.2 | 10.9 | 5.9 |
| PUP3 | 27,055-27,246 | 192 | 31.2 | 7.54 | 4.7 | 47.6 | 42.9 | 11.1 | 15.9 |
| PUP4 | 27,254-27,622 | 369 | 40.1 | 13.94 | 8.3 | 33.6 | 42.6 | 13.1 | 8.2 |
| BGI-PUP4-1 | 27,619-27,753 | 135 | 31.8 | 5.30 | 3.9 | 61.4 | 27.3 | 2.3 | 13.6 |
| PUP-Int-1 | 27,760-27,879 | 120 | 39.1 | 4.38 | 9.1 | 35.9 | 43.6 | 17.9 | 5.1 |
| PUP-Int-2 | 27,845-28,099 | 255 | 40.0 | 9.56 | 9.4 | 31.0 | 41.7 | 15.5 | 3.6 |
| N | 28,101-29,369 | 1,269 | 48.4 | 46.03 | 10.1 | 17.3 | 54.0 | 15.4 | 8.5 |
| PUP5 | 28,111-28,407 | 297 | 51.8 | 10.80 | 4.9 | 32.7 | 46.9 | 9.2 | 11.2 |
| PUP-N-1 | 28,564-28,776 | 213 | 53.5 | 7.85 | 6.3 | 34.3 | 35.7 | 12.9 | 10.0 |
| BGI-PUP-Neg-1 | 29,523-29,678 | 156 | 44.2 | 5.90 | 11.8 | 52.9 | 29.4 | 13.7 | 0.0 |
MW: molecular weight; nt: nucleotide; pI: isoelectric point.
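The Position and Length columns above are related by simple interval arithmetic, and the same coordinates show which ORFs overlap or sit inside another. The following is a minimal sketch, assuming the positions are 1-based and inclusive (which matches the listed lengths for the rows used here). The names `ORFS`, `orf_length`, and `relation` are illustrative only and are not taken from the paper.

```python
# Coordinates copied from a few rows of the table above.
# Assumption: positions are 1-based and inclusive.
ORFS = {
    "S": (21473, 25240),
    "PUP1": (25249, 26073),
    "PUP2": (25670, 26134),
    "E": (26098, 26328),
    "N": (28101, 29369),
    "PUP5": (28111, 28407),
}


def orf_length(start, end):
    """Length in nucleotides of an inclusive interval."""
    return end - start + 1


def relation(a, b):
    """Classify interval b relative to interval a.

    Returns 'embedded' if b lies entirely inside a, 'overlap' if the two
    intervals share at least one position otherwise, and 'disjoint' if not.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start <= b_start and b_end <= a_end:
        return "embedded"
    if b_start <= a_end and a_start <= b_end:
        return "overlap"
    return "disjoint"


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        print(name, orf_length(start, end))
    print("PUP5 vs N:", relation(ORFS["N"], ORFS["PUP5"]))
    print("PUP2 vs PUP1:", relation(ORFS["PUP1"], ORFS["PUP2"]))
```

Note that `relation` only tests whether the second interval is embedded in the first; swapping the arguments checks the other direction.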
Fig. 1 The genome organization of the SARS-CoV (Isolate BJ01).
Fig. 2 Predicted transmembrane structure of PUP1 (TMHMM). Red blocks on the top line are predicted transmembrane domains. The abscissa represents the position in the sequence, and the ordinate represents the probability of the prediction.
Fig. 3 Predicted transmembrane structure of PUP4 (TMHMM).
Fig. 4 Predicted transmembrane structure of BGI-PUP4-1 (TMHMM).
Fig. 5 Predicted transmembrane structure of BGI-PUP-Neg-1 (TMHMM).
Substitution Status of ORFs in the Genome of SARS-CoV
| ORF | Size (nt) | Substitutions | Non-synonymous substitutions | Substitution rate (%) | Ka | Ks | Ka/Ks |
|---|---|---|---|---|---|---|---|
| R | 21,222 | 223 | 171 | 1.05 | 0.91 | 0.98 | 0.93 |
| BGI-PUP-R-1 | 492 | 3 | 1 | 0.61 | 0.57 | 0.52 | 1.09 |
| S | 3,768 | 47 | 38 | 1.25 | 1.14 | 0.94 | 1.21 |
| BGI-PUP-S-1 | 147 | 0 | 0 | 0 | 0.00 | 0.00 | 0 |
| BGI-PUP-S-2 | 135 | 2 | 1 | 1.48 | 3.72 | 0.79 | 4.70 |
| BGI-PUP-S-3 | 147 | 0 | 0 | 0 | 0.00 | 0.00 | 0 |
| BGI-PUP-S-4 | 201 | 3 | 0 | 1.49 | 0.00 | 1.89 | 0 |
| BGI-PUP-S-5 | 123 | 5 | 1 | 4.07 | 3.18 | 3.69 | 0.86 |
| PUP1 | 825 | 25 | 21 | 3.03 | 2.88 | 1.93 | 1.49 |
| PUP2 | 465 | 14 | 10 | 3.01 | 2.45 | 3.33 | 0.73 |
| E | 231 | 2 | 2 | 0.87 | 1.02 | 0.00 | 0 |
| M | 666 | 8 | 4 | 1.2 | 0.69 | 2.23 | 0.31 |
| PUP3 | 192 | 8 | 7 | 4.17 | 3.99 | 2.34 | 1.71 |
| PUP4 | 369 | 3 | 3 | 0.81 | 0.94 | 0.00 | 0 |
| PUP4-1 | 135 | 0 | 0 | 0 | 0.00 | 0.00 | 0 |
| PUP-Int-1 | 120 | 5 | 0 | 4.17 | 0.00 | 4.62 | 0 |
| PUP-Int-2 | 255 | 2 | 0 | 0.78 | 0.00 | 0.91 | 0 |
| N | 1,269 | 9 | 4 | 0.71 | 0.36 | 1.49 | 0.24 |
| PUP-N-1 | 213 | 2 | 0 | 0.94 | 0.00 | 1.29 | 0 |
| PUP5 | 297 | 2 | 2 | 0.67 | 0.77 | 0.00 | 0 |
| BGI-PUP-Neg-1 | 156 | 0 | 0 | 0 | 0.00 | 0.00 | 0 |
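The percentage column in this table can be reproduced from the two columns before it, and the last column appears to equal the ratio of the two columns preceding it (consistent with a Ka/Ks-style ratio, though that reading is an inference from the arithmetic, not something stated in the surviving header). A minimal sketch of both calculations follows; the function names are illustrative.

```python
def substitution_rate(substitutions, size_nt):
    """Substitutions per 100 nucleotides (percent)."""
    return 100.0 * substitutions / size_nt


def rate_ratio(numerator, denominator):
    """Ratio of two rates, or None when the denominator is zero.

    Assumption: the table prints 0 in that case; None is returned here so
    that an undefined ratio is not mistaken for a measured zero.
    """
    if denominator == 0:
        return None
    return numerator / denominator


if __name__ == "__main__":
    # Row S: 47 substitutions over 3,768 nt.
    print(round(substitution_rate(47, 3768), 2))
    # Row S: sixth and seventh columns are 1.14 and 0.94.
    print(round(rate_ratio(1.14, 0.94), 2))
```

Rows where the seventh column is 0.00 (for example E and PUP4) have no defined ratio, which is why the sketch returns `None` there.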
Fig. 6 The TRS sequences in the SARS-CoV genome (Isolate BJ01). *This refers to the number of nucleotides between the first nucleotide of the TRSs and the first letter of the start codon of the corresponding ORFs.
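The distance defined in this caption can be sketched as a motif search followed by a lookup of the next start codon. This is only an illustration under stated assumptions: the core motif is passed in as a parameter, the toy sequence is invented, "between" is read as strictly between the two named positions, and the nearest downstream AUG is taken as the start codon. The function name `trs_to_start_distances` is hypothetical.

```python
def trs_to_start_distances(genome, core, start_codon="AUG"):
    """Find each occurrence of `core` and the nearest downstream start codon.

    Returns a list of (trs_position, start_position, distance) tuples with
    1-based positions. `distance` counts the nucleotides strictly between
    the first nucleotide of the motif and the first letter of the start
    codon (an assumed reading of the caption).
    """
    hits = []
    pos = genome.find(core)
    while pos != -1:
        start = genome.find(start_codon, pos + len(core))
        if start != -1:
            hits.append((pos + 1, start + 1, start - pos - 1))
        pos = genome.find(core, pos + 1)
    return hits


if __name__ == "__main__":
    # Invented toy sequence: 2 nt, a 9-nt motif, a 4-nt spacer, then AUG.
    toy = "GG" + "CUAAACGAA" + "CUUU" + "AUGGCC"
    print(trs_to_start_distances(toy, "CUAAACGAA"))
```

On a real genome the nearest AUG is not necessarily the annotated start of the corresponding ORF, so annotated start positions would be the safer input.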
Fig. 7 Homology comparison of the 5’ end and 3’ end of the SARS-CoV genome (Isolate BJ01).
Comparison of Prediction and Annotation of SARS-CoV (Isolate BJ01)
| FGENESV (prediction) | Glimmer (prediction) | ZCURVE_CoV (prediction) | BGFV (prediction) | BJ01 (combined result) | Tor2 (annotation) | Urbani (annotation) | SIN2500 (annotation) |
|---|---|---|---|---|---|---|---|
| R | ORF1 | R | R | ||||
| F1 | G1 | orf1a | B1 | ORF1a | ORF1a | orf1a | orf1 |
| F2 | | | | BGI-PUP-R-1 | | | |
| F3 | G2 | orf1b | B2 | ORF1b | ORF1b | | |
| F4 | G3 | S | B3 | S | S | S | S |
| F5 | G4 | Sars274 | B4 | PUP1 | ORF3 | X1 | PUP1 |
| F6 | | | | PUP2 | ORF4 | X2 | PUP2 |
| F7 | | E | | E | E | E | E |
| F8 | G5 | M | B5 | M | M | M | M |
| F9 | | Sars63 | B6 | PUP3 | ORF7 | X3 | PUP3 |
| F10 | G6 | Sars122 | B7 | PUP4 | ORF8 | X4 | PUP4 |
| | G7 | Sars44 | | PUP4-1 | ORF9 | | |
| F11 | | Sars39 | | PUP-Int-1 | ORF10 | | |
| F12 | G8 | Sars84 | B8 | PUP-Int-2 | ORF11 | X5 | |
| F13 | G9 | N | B9 | N | N | N | N |
| | | | | PUP5 | ORF13 | | PUP5 |
| | | | | PUP-N-1 | ORF14 | | |
| F14 | | | | BGI-PUP-Neg-1 | | | |
The ORF on the minus-strand, predicted by FGENESV.
Glimmer (Version 2) predicted ORFs, not starting with AUG.
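To illustrate what it means for an ORF to lie on the minus strand, here is a toy ORF scanner that reads both strands and reports coordinates on the plus strand. It is not FGENESV, Glimmer, or any program used in the paper. It assumes a DNA alphabet, accepts only ATG as a start codon, and takes the first ATG in each frame, so it would miss ORFs that do not start with AUG (as noted for Glimmer above). All names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=9):
    """Return (start, end, strand) for ATG...stop ORFs on both strands.

    Coordinates are 1-based, inclusive, and always given on the plus
    strand; the stop codon is included in the reported interval.
    """
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # exclusive index just past the stop codon
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start + 1, end, "+"))
                        else:
                            # Map minus-strand indices back to plus-strand
                            # coordinates.
                            orfs.append((n - end + 1, n - start, "-"))
                    start = None
    return orfs


if __name__ == "__main__":
    plus_example = "CCATGAAATTTTAGCC"
    print(find_orfs(plus_example))
    print(find_orfs(reverse_complement(plus_example)))
```

The second call scans the reverse complement of the first sequence, so the same ORF is found again, this time reported on the minus strand.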
Fig. 8 Comparison of TRS in SARS-CoV Isolate BJ01 (minus sense). Core segments (CUAAACGAA) are marked in bold. Distance in part B is the number of nucleotides between the last letter of each TRS and the terminal codon of its corresponding ORF.