Jianfei Hu, Jing Wang, Jing Xu, Wei Li, Yujun Han, Yan Li, Jia Ji, Jia Ye, Zhao Xu, Zizhang Zhang, Wei Wei, Songgang Li, Jun Wang, Jian Wang, Jun Yu, Huanming Yang.
Abstract
Knowledge of pathogen evolution is of great medical and biological significance for the prevention, diagnosis, and therapy of infectious diseases. To understand the origin and evolution of SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected the complete genome sequences of all viruses available in GenBank and compared them with SARS-CoV. Genomic signature analysis demonstrates that all coronaviruses except SARS-CoV have TGTT as their most abundant tetranucleotide. A detailed analysis of the forty-two complete SARS-CoV genome sequences revealed two distinct genotypes and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in SARS-CoV, although many mutations have made it difficult to recognize.
Year: 2003 PMID: 15629034 PMCID: PMC5172238 DOI: 10.1016/s1672-0229(03)01027-1
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Fig. 1 Comparison of the GC content and genome size of SARS-CoV with those of other viruses.
Fig. 2 Conservation map created by comparing the SARS-CoV genome sequence against a database of non-coronavirus sequences. Only matches with a segment length greater than 20 nt were extracted.
Fig. 3 Homology comparison of BJ01 with other coronaviruses. The darkness of a pixel corresponds to the strength (identities value) of the match between a SARS-CoV fragment and a coronavirus genome, and the width of a rectangle corresponds to the length of the match.
Fig. 4 Dinucleotide motif frequency profile. The darkness of a pixel corresponds to the frequency: the darker the pixel, the greater the frequency.
Fig. 5 Tetranucleotide motif frequency profile.
The Count and Percentage of the Two Most Abundant Tetranucleotides
| Most abundant (count, %) | Second most abundant (count, %) | Virus |
|---|---|---|
| TGTT(67, 1.45%) | TGGT(62, 1.34%) | BMV |
| TGTT(42, 1.14%) | GTTA(39, 1.06%) | CMV |
| TGTT(353, 1.23%) | TTGT(329, 1.15%) | TGEV |
| TGTT(387, 1.40%) | TTGT(354, 1.28%) | AIBV |
| TGTT(397, 1.42%) | TTTT(337, 1.20%) | PEDV |
| TGTT(397, 1.42%) | TTTT(337, 1.20%) | PEDV_strainC |
| TGTT(413, 1.32%) | TTGT(355, 1.14%) | MHV_strain2 |
| TGTT(417, 1.33%) | TTGT(369, 1.18%) | MHV |
| TGTT(418, 1.34%) | TTGT(371, 1.19%) | MHV_ML-10 |
| TGTT(426, 1.37%) | TTGT(365, 1.17%) | MHV_Penn |
| TGTT(499, 1.61%) | TTTT(456, 1.47%) | Bovine_CoV |
| TGTT(501, 1.83%) | TTGT(402, 1.47%) | HCoV-229E |
| TGTT(502, 1.61%) | TTTT(466, 1.50%) | BCoV_Quebac |
BMV: beet mosaic virus (Accession number: AF061869; Beet soil-borne mosaic virus, ssRNA positive-strand viruses; Benyvirus).
CMV: cereal mosaic virus (Accession number: AJ132577; Soil-borne cereal mosaic virus, ssRNA positive-strand viruses; Furovirus).
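The tetranucleotide ranking in the table above can be illustrated with a minimal Python sketch. It counts overlapping 4-mers and reports the two most abundant ones in the same `word(count, percent)` form. The function names, the toy sequence, and the choice of denominator (the total number of valid overlapping windows) are assumptions made for illustration; the source does not state how its percentages were normalized.

```python
from collections import Counter


def kmer_profile(seq, k=4):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA letters
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def richest(seq, k=4, n=2):
    """Return the n most abundant k-mers as (word, count, percent) tuples.

    Assumption: percent is relative to the number of valid overlapping
    windows, which may differ from the normalization used in the paper.
    """
    counts = kmer_profile(seq, k)
    total = sum(counts.values())
    return [(w, c, 100.0 * c / total) for w, c in counts.most_common(n)]


# Hypothetical toy sequence, not taken from any genome in the table.
toy = "TGTTTGTTACGTTGTT"
for word, count, pct in richest(toy):
    print(f"{word}({count}, {pct:.2f}%)")
```

Applied to a complete genome sequence read from a FASTA file, the same two functions would produce one table row per virus.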
Fig. 6 The codon usage frequency of mouse, human, human papillomavirus and six coronaviruses.
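A codon usage frequency of the kind compared in Fig. 6 could be computed roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the input is taken to be a single in-frame coding sequence, and the frequency is defined as the fraction of all counted codons, which may not match the definition used for the figure.

```python
from collections import Counter


def codon_usage(cds):
    """Return {codon: fraction} for an in-frame coding sequence.

    Assumptions: reading starts at position 0, trailing bases that do not
    fill a codon are ignored, and codons with non-ACGT symbols are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy coding sequence: ATG GCT GCT TAA
usage = codon_usage("ATGGCTGCTTAA")
for codon in sorted(usage):
    print(codon, f"{usage[codon]:.2f}")
```

Summing the counts over all annotated coding regions of a genome before normalizing would give a genome-wide profile for each species.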
The Statistics of Type and Codon Phase of Nucleotide Substitution in Coding Region
| Codon phase | Transition AG | Transition CT | Transversion AC | Transversion AT | Transversion GC | Transversion GT | Total | Percent |
|---|---|---|---|---|---|---|---|---|
| 1 | 33 | 28(5) | 17 | 8 | 2 | 10 | 98(5) | 29.79% |
| 2 | 35 | 40 | 11 | 12 | 5 | 9 | 112 | 34.04% |
| 3 | 24(23) | 57(57) | 12(7) | 15(10) | 2 | 9(4) | 119(101) | 36.17% |
| Total | 92(23) | 125(63) | 40(7) | 35(10) | 9 | 28(4) | 329(106) | 100% |
| Percent | 27.96% | 37.99% | 12.16% | 10.64% | 2.74% | 8.51% | 100% | |
The numbers in parentheses are the numbers of synonymous substitutions.
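The classification used in the table above (substitution pair, transition or transversion, codon phase, synonymous or not) can be sketched in Python as below. The function names and toy sequences are hypothetical. The sketch assumes two aligned, gap-free, in-frame coding sequences of equal length, uses the standard genetic code, and judges each substitution on its own against the reference codon; the paper's actual procedure is not described in this section.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def substitution_kind(a, b):
    """Transition if both bases are purines or both are pyrimidines."""
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


def tally_substitutions(ref_cds, alt_cds):
    """Count substitutions by (codon phase, pair, kind, synonymous)."""
    if len(ref_cds) != len(alt_cds):
        raise ValueError("sequences must be aligned and of equal length")
    table = Counter()
    for i, (a, b) in enumerate(zip(ref_cds, alt_cds)):
        if a == b or a not in BASES or b not in BASES:
            continue
        phase = i % 3 + 1  # 1-based position within the codon
        start = i - i % 3
        ref_codon = ref_cds[start:start + 3]
        # Assumption: apply only this one change to the reference codon.
        alt_codon = ref_codon[:i % 3] + b + ref_codon[i % 3 + 1:]
        ref_aa = CODON_TABLE.get(ref_codon)
        synonymous = ref_aa is not None and ref_aa == CODON_TABLE.get(alt_codon)
        pair = "".join(sorted(a + b))
        pair = "GC" if pair == "CG" else pair  # match the table's label
        table[(phase, pair, substitution_kind(a, b), synonymous)] += 1
    return table


# Hypothetical toy alignment: GCT (Ala) -> ACC changes phase 1 and phase 3.
for key, n in sorted(tally_substitutions("ATGGCT", "ATGACC").items()):
    print(key, n)
```

Summing the resulting counts per phase and per pair would reproduce the row and column layout of the table, with the synonymous counts corresponding to the values in parentheses.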
Fig. 7 The phylogenetic tree of 42 SARS-CoV isolates, with each sequence constructed from the nucleotides at the 338 substitution sites.
The Large Segment Insertions and Deletions of SARS-CoV
| Isolate | Genome size (nt) | Indel (ref. to BJ01) | Source |
|---|---|---|---|
| GD01 | 29,757 | 29 nt insertion (27,863–27,864) | Guangdong |
| GD02 | 29,753 | 29 nt insertion (27,863–27,864) | Guangdong |
| GD05 | 29,757 | 29 nt insertion (27,863–27,864) | Guangdong |
| GD06 | 29,675 | 54 nt deletion (27,837–27,900) | Guangdong |
| HK02 | 29,339 | 386 nt deletion (27,698–28,083) | Hong Kong |
The Complete Genome Sequences of 12 Isolates of Coronavirus
| Isolate | Accession number | Genome size (nt) | Modification date |
|---|---|---|---|
| BJ01 | AY278488 | 29,726 | 12-May-03 |
| AIBV | NC_001451.1 | 27,608 | 19-Nov-02 |
| HCoV-229E | NC_002645.1 | 27,317 | 19-Apr-03 |
| PEDV | NC_003436.1 | 28,033 | 26-Apr-03 |
| PEDV_strainC | AF353511.1 | 28,033 | 29-Nov-01 |
| TGEV | NC_002306.2 | 28,586 | 28-Apr-03 |
| Bovine_CoV | NC_003045.1 | 31,026 | 25-Apr-03 |
| BCoV_Quebac strain | AF220295.1 | 31,100 | 1-Apr-03 |
| MHV_Penn 97-1 | AF208066 | 31,112 | 11-May-00 |
| MHV_ML-10 | AF208067 | 31,233 | 3-Jan-02 |
| MHV_strain2 | AF201929.1 | 31,276 | 3-Jan-02 |
| MHV | NC_001846 | 31,357 | 7-Jan-03 |
The Complete Genome Sequences of 32 SARS-CoV Isolates in GenBank (17-Aug-03 update)
| Isolate | Genome size (nt) | Accession number | Modification date |
|---|---|---|---|
| BJ01 | 29,725 | AY278488.2 | 1-May-03 |
| BJ02 | 29,745 | AY278487.3 | 5-Jun-03 |
| BJ03 | 29,740 | AY278490.3 | 5-Jun-03 |
| BJ04 | 29,732 | AY279354.2 | 5-Jun-03 |
| GD01 | 29,757 | AY278489.2 | 5-Jun-03 |
| ZMY1 | 29,749 | AY351680.1 | 3-Aug-03 |
| ZJ01 | 29,715 | AY297028.1 | 19-May-03 |
| TOR2 | 29,751 | NC_004718.3 | 13-Aug-03 |
| Urbani | 29,727 | AY278741.1 | 12-Aug-03 |
| CUHK-Su10 | 29,736 | AY282752.1 | 7-May-03 |
| CUHK-W1 | 29,736 | AY278554.2 | 31-Jul-03 |
| HKU-39849 | 29,742 | AY278491.2 | 18-Apr-03 |
| Frankfurt1 | 29,727 | AY291315.1 | 11-Jun-03 |
| FRA | 29,740 | AY310120.1 | 12-Aug-03 |
| HSR1 | 29,751 | AY323977.2 | 22-Jul-03 |
| Sin2500 | 29,711 | AY283794.1 | 12-Aug-03 |
| Sin2677 | 29,705 | AY283795.1 | 12-Aug-03 |
| Sin2679 | 29,711 | AY283796.1 | 12-Aug-03 |
| Sin2748 | 29,706 | AY283797.1 | 12-Aug-03 |
| Sin2774 | 29,711 | AY283798.1 | 12-Aug-03 |
| TC1 | 29,573 | AY338174.1 | 28-Jul-03 |
| TC2 | 29,573 | AY338175.1 | 28-Jul-03 |
| TC3 | 29,573 | AY348314.1 | 29-Jul-03 |
| TW1 | 29,729 | AY291451.1 | 14-May-03 |
| TWC | 29,725 | AY321118.1 | 26-Jun-03 |
| TWC2 | 29,727 | AY362698.1 | 13-Aug-03 |
| TWC3 | 29,727 | AY362699.1 | 13-Aug-03 |
| TWH | 29,727 | AP006557.1 | 2-Aug-03 |
| TWJ | 29,725 | AP006558.1 | 2-Aug-03 |
| TWK | 29,727 | AP006559.1 | 2-Aug-03 |
| TWS | 29,727 | AP006560.1 | 2-Aug-03 |
| TWY | 29,727 | AP006561.1 | 2-Aug-03 |
Ten Complete SARS-CoV Genomes Newly Sequenced by the Beijing Genomics Institute
| Isolate | Genome size (nt) | Source |
|---|---|---|
| HK01 | 29,720 | Hong Kong |
| HK02 | 29,339 | Hong Kong |
| HK03 | 29,721 | Hong Kong |
| HK04 | 29,723 | Hong Kong |
| GD02 | 29,753 | Guangdong |
| GD03 | 29,720 | Guangdong |
| GD04 | 29,725 | Guangdong |
| GD05 | 29,757 | Guangdong |
| GD06 | 29,675 | Guangdong |
| GD07 | 29,725 | Guangdong |
Fig. 8 CGR (chaos game representation) arrangement of motif or codon.
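How a motif is assigned a position in a chaos game representation (CGR) can be sketched as follows. Each base moves the current point halfway toward that base's corner of the unit square, so every k-mer ends in its own cell of a 2^k by 2^k grid. The corner assignment below (A bottom-left, C top-left, G top-right, T bottom-right) is a common convention and an assumption here; the arrangement actually drawn in Fig. 8 may differ. The function names are hypothetical.

```python
# Assumed corner layout; the figure's own layout is not given in the text.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a sequence, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            raise ValueError(f"unsupported symbol: {base!r}")
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the corner
        points.append((x, y))
    return points


def cgr_cell(word):
    """Return the (column, row) grid cell of a k-mer in a 2^k x 2^k grid."""
    if not word:
        raise ValueError("word must not be empty")
    x, y = cgr_points(word)[-1]
    n = 2 ** len(word)
    return int(x * n), int(y * n)


for word in ("A", "TG", "TGTT"):
    print(word, cgr_cell(word))
```

With this layout the last base of a motif selects the quadrant, the second-to-last base selects the sub-quadrant, and so on, which is why motifs sharing a suffix cluster together in a frequency profile.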