Gordana M Pavlović-Lazetić, Nenad S Mitić, Andrija M Tomović, Mirjana D Pavlović, Milos V Beljanski.
Abstract
A dataset of 103 SARS-CoV isolates (101 from human patients and 2 from palm civets) was investigated with respect to genome polymorphism and isolate classification. The number and distribution of single nucleotide variations (SNVs) and of insertions and deletions were determined relative to a "profile" (a sequence containing the most represented letter at each position) and are discussed. The distribution of substitution categories per codon position, as well as synonymous and non-synonymous substitutions in coding regions of annotated isolates, was determined, along with amino acid (a.a.) property changes. A similar analysis was performed for the spike (S) protein in all the isolates (55 of them being predicted for the first time). The ratio Ka/Ks confirmed that the S gene was subject to Darwinian selection during virus transmission from animals to humans. Isolates from the dataset were classified according to genome polymorphism and genotypes. Genome polymorphism yields two groups, one with a small number of SNVs and another with a large number of SNVs, with up to four subgroups with respect to insertions and deletions. We identified three basic nine-locus genotypes: TTTT/TTCGG, CGCC/TTCAT, and TGCC/TTCGT, with four subgenotypes. Both proposed classifications are in accordance with the new insights into possible epidemiological spread, both in space and time.
Year: 2005 PMID: 16144519 PMCID: PMC5172477 DOI: 10.1016/s1672-0229(05)03004-4
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
SARS-CoV Genome Polymorphism
Shaded entries correspond to annotated isolates. Identification (Label and ID) is given in accordance with the labels and identifiers from Table S1. The four SNVs columns correspond to: the total number of SNVs, the number of SNVs in genes, in 5’ and 3’ UTRs, and in IGR. The seven columns named INDELs include the number of deletions at the 5’ end (5’ del), the length of long insertions (longIns) and long deletions (longDel), the number and length of short insertions (shortIns) and short deletions (shortDel) in the form a × b, where a denotes the number of occurrences and b denotes the length, the number of deletions at the 3’ end (3’ del), and the length of a poly-A sequence at the 3’ end (3’ poly-A). Classification includes two columns. The Type column gives the nine-locus nucleotides in the form NNNN/NNNNN; these are the nucleotides at (relative to CLUSTAL X output) positions 9,420, 17,604, 22,274, 27,891 / 3,861, 9,495, 11,514, 21,773, 26,534, respectively (absolute HSR 1 positions 9,404, 17,564, 22,222, 27,827 / 3,852, 9,479, 11,493, 21,721, 26,477). The last column, Group, reflects the grouping of isolates.
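The Type column described above can be illustrated with a minimal Python sketch. It assumes a sequence already expressed in absolute HSR 1 coordinates (1-based); the function names, the toy genome builder, and the arbitrary length of 30,000 are hypothetical and are not taken from the paper.

```python
# Absolute HSR 1 positions of the nine loci, as listed in the table legend.
FIRST_LOCI = (9404, 17564, 22222, 27827)
SECOND_LOCI = (3852, 9479, 11493, 21721, 26477)


def nine_locus_type(seq):
    """Return the genotype string NNNN/NNNNN for a sequence in HSR 1 coordinates."""
    first = "".join(seq[p - 1] for p in FIRST_LOCI)    # 1-based -> 0-based index
    second = "".join(seq[p - 1] for p in SECOND_LOCI)
    return f"{first}/{second}"


def toy_genome(genotype, length=30000):
    """Build a placeholder sequence of 'N' carrying the given genotype (illustration only)."""
    first, second = genotype.split("/")
    bases = ["N"] * length
    for pos, nt in zip(FIRST_LOCI + SECOND_LOCI, first + second):
        bases[pos - 1] = nt
    return "".join(bases)


for genotype in ("TTTT/TTCGG", "CGCC/TTCAT", "TGCC/TTCGT"):
    print(nine_locus_type(toy_genome(genotype)))
```

For real isolates, the positions would first have to be mapped from the alignment columns (the CLUSTAL X relative positions) to each sequence, a step this sketch leaves out.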
Fig. 1 Density distribution of SNVs (B) and INDELs (C), mapped onto the gene map of the HSR 1 isolate, coinciding with the “profile” (A). The central region of the genome is rather conserved (a lower density of SNVs is exhibited in the second third of the genome, ORF 1b), while the rest of the genome features a high SNV density. SNV peaks are present at (absolute HSR 1) positions 3,852, 9,404, 9,479, 11,493, 17,564 (ORF 1ab), 21,721, 22,222 (S protein), 26,477 (M protein), and 27,827 (ORF 8a).
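The "profile" against which SNVs are counted is defined in the abstract as the sequence holding the most represented letter at each position. A minimal Python sketch of that idea follows; it assumes the isolates are already aligned to equal length, and the toy sequences, function names, and tie-breaking rule are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter


def build_profile(aligned):
    """Return the most frequent letter of each alignment column."""
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Ties go to the letter seen first in the column (an arbitrary choice).
        profile.append(counts.most_common(1)[0][0])
    return "".join(profile)


def snv_positions(seq, profile):
    """1-based positions where seq differs from the profile by a substitution.

    Gap characters ('-') are skipped, since they indicate INDELs, not SNVs.
    """
    return [
        i + 1
        for i, (a, b) in enumerate(zip(seq, profile))
        if a != b and a != "-" and b != "-"
    ]


# Toy alignment (hypothetical data, not from the 103-isolate dataset).
toy = ["ACGTAC", "ACGTAC", "ATGTAC", "ACGTGC"]
profile = build_profile(toy)
print(profile)
for seq in toy:
    print(seq, snv_positions(seq, profile))
```

Binning the returned positions along the genome would give a density curve of the kind shown in panel B.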
Fig. 2 Distribution of nucleotide substitution categories. The most represented are the substitutions C↔T and the least represented are the substitutions C↔G.
Fig. 3 Comparison of nucleotide structures of SARS-CoV complete genome isolates, represented in parts A and B of the figure according to similarity in their SNV or INDEL positions.
Distribution of Nucleotides on Distance 1 Left and Right to SNV Sites
| Nt | (−1)num | (−1)% | (−1)diff% | (+1)num | (+1)% | (+1)diff% |
|---|---|---|---|---|---|---|
| A | 358 | 35.59% | 7.17% | 283 | 28.13% | −0.29% |
| C | 179 | 17.79% | −2.16% | 203 | 20.18% | 0.23% |
| G | 230 | 22.86% | 2.05% | 215 | 21.37% | 0.56% |
| T | 238 | 23.66% | −7.13% | 302 | 30.02% | −0.77% |
The distribution of nucleotides at distance 1 to the left (−1) and to the right (+1) of SNV sites is presented as the total number of nucleotides, the percentage, and the difference from their overall percentage in the dataset.
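The three quantities in this table (count, percentage, and difference from the overall percentage) can be sketched in Python as follows. This is a simplified illustration: it uses a single sequence as the background composition, whereas the table refers to the whole dataset, and the function name and toy input are hypothetical.

```python
from collections import Counter


def flank_distribution(seq, snv_sites, offset):
    """Tabulate nucleotides at `offset` (-1 or +1) from each 1-based SNV site.

    Returns {nucleotide: (count, percent, percent minus background percent)}.
    """
    background = Counter(seq)
    total_bg = sum(background.values())
    flanks = Counter()
    for pos in snv_sites:
        idx = pos - 1 + offset          # convert to 0-based, then shift
        if 0 <= idx < len(seq):         # ignore flanks that fall off the ends
            flanks[seq[idx]] += 1
    total = sum(flanks.values())
    rows = {}
    for nt in "ACGT":
        pct = 100 * flanks[nt] / total if total else 0.0
        bg_pct = 100 * background[nt] / total_bg
        rows[nt] = (flanks[nt], round(pct, 2), round(pct - bg_pct, 2))
    return rows


# Toy input (hypothetical): three SNV sites in a ten-letter sequence.
seq = "ACGTACGTAA"
sites = [2, 6, 10]
left = flank_distribution(seq, sites, -1)
right = flank_distribution(seq, sites, +1)
print(left)
print(right)
```

A positive difference, as for A at position −1 in the table, means the nucleotide occurs next to SNV sites more often than its overall frequency would suggest.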
Mutation Analysis of the S Protein: Categories of Nucleotide Substitutions
| Type | Pair | Substitution | 1.pos | 2.pos | 3.pos | Total No. | Pair total No. | 1.pos% | 2.pos% | 3.pos% | Total% | Silent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transitions | A-G | A→G | 6/15 | 2/8 | 3/20 | 11/43 | 16/73 | 7.21% | 3.85% | 9.62% | 20.68% | 3/20 |
| | | G→A | 2/7 | 2/22 | 1/1 | 5/30 | | 3.37% | 10.58% | 0.48% | 14.43% | 1/1 |
| | C-T | C→T | 3/7 | 6/25 | 6/19 | 15/51 | 24/94 | 3.37% | 12.02% | 9.13% | 24.52% | 6/19 |
| | | T→C | 2/3 | 3/29 | 4/11 | 9/43 | | 1.44% | 13.94% | 5.29% | 20.67% | 4/11 |
| | Total | | 13/32 | 13/84 | 14/51 | 40/167 | | 15.38% | 40.38% | 24.52% | 80.28% | 14/51 |
| Transversions | A-C | A→C | 2/2 | 1/1 | 2/2 | 5/5 | 8/10 | 0.96% | 0.48% | 0.96% | 2.40% | 2/2 |
| | | C→A | 1/1 | 1/2 | 1/2 | 3/5 | | 0.48% | 0.96% | 0.96% | 2.40% | 0 |
| | A-T | A→T | 0 | 0 | 0 | 0 | 6/7 | 0 | 0 | 0 | 0 | 0 |
| | | T→A | 2/2 | 1/1 | 3/4 | 6/7 | | 0.96% | 0.48% | 1.92% | 3.37% | 1/1 |
| | G-C | G→C | 1/1 | 0 | 0 | 1/1 | 3/5 | 0.48% | 0 | 0 | 0.48% | 0 |
| | | C→G | 0 | 2/4 | 0 | 2/4 | | 0 | 1.92% | 0 | 1.92% | 0 |
| | G-T | G→T | 0 | 1/1 | 1/1 | 2/2 | 5/19 | 0 | 0.48% | 0.48% | 0.96% | 1/1 |
| | | T→G | 2/14 | 0 | 1/3 | 3/17 | | 6.73% | 0 | 1.44% | 8.17% | 1/3 |
| | Total | | 8/20 | 6/9 | 8/12 | 22/41 | | 9.62% | 4.33% | 5.77% | 19.72% | 5/7 |
| Total | | | 21/52 | 19/93 | 22/63 | 62/208 | | 25.00% | 44.71% | 30.29% | 100% | 19/58 |
S proteins in 91 isolates are considered. The number of transition and transversion sites and the number of SNVs (in the form N1/N2) per position in codon and per mutation type, as well as the percentage of SNVs, and the number of silent mutation sites and silent SNVs (in the form N1/N2), are presented.
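The categories tabulated above (transition versus transversion, codon position, silent versus non-silent) can be sketched in Python for a single substitution. This assumes the standard genetic code and a coding sequence that starts in frame; the function names and the example codons are illustrative and do not come from the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def substitution_class(ref, alt):
    """Classify a nucleotide substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def codon_position(cds_offset):
    """1-based position within the codon for a 0-based offset in the coding sequence."""
    return cds_offset % 3 + 1


def translate(codon):
    """Translate one DNA codon using the standard genetic code."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def is_silent(codon, position, alt):
    """True if replacing the base at `position` (1-3) with `alt` keeps the amino acid."""
    mutated = codon[: position - 1] + alt + codon[position:]
    return translate(codon) == translate(mutated)


print(substitution_class("C", "T"), is_silent("GCT", 3, "C"))  # third-position change in an Ala codon
print(substitution_class("T", "G"), is_silent("ATG", 3, "A"))  # Met -> Ile
```

Applying these three functions to every SNV in the S gene and tallying the results would produce counts laid out like the table, with sites and SNVs reported separately in the N1/N2 form.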
Fig. 4 Positions of synonymous and non-synonymous a.a. substitutions plotted against the S protein primary structure. The y-axis represents the number of SNVs per position. SP, signal peptide; ED, external domain; TM, trans-membrane domain; and ID, internal domain (http://expasy.org/). A. RBD determined by: 1. Babcock et al., 2. Xiao et al., 3. Wong et al., 4. Zhao et al., and 5. Zhou et al.; B. epitope regions determined by: 1. Wang et al., 2. Chou et al., 3. Greenough et al., 4. Sui et al., 5. van den Brink et al., 6. Lu et al., 7. Hua et al., 8. Ren et al., 9. He et al., 10. Zhou et al., 11. Zhang et al., and 12. Keng et al.
Mutation Analysis of the S Protein: Coefficients Ka, Ks, and the Ratio Ka/Ks with An Outgroup
| Outgroup | Ka | Ks | Ka/Ks |
|---|---|---|---|
| SZ16 (AY304488) | 0.006257 | 0.004930 | 1.26935>1 |
| SZ3 (AY304486) | 0.005889 | 0.003803 | 1.54856>1 |
Coefficients Ka and Ks are calculated for all the human patients’ isolates, with one of the palm civet isolates as an outgroup.
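The ratio column can be checked directly from the Ka and Ks values in the table. The short Python sketch below only divides the tabulated coefficients; it does not estimate Ka or Ks from sequences, and the function name is hypothetical. Because the tabulated coefficients are rounded, the recomputed ratios may differ from the published ones in the later decimals.

```python
# Ka and Ks values copied from the table above, keyed by outgroup.
COEFFICIENTS = {
    "SZ16 (AY304488)": (0.006257, 0.004930),
    "SZ3 (AY304486)": (0.005889, 0.003803),
}


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks; a value above 1 is conventionally read as positive selection."""
    if ks == 0:
        raise ZeroDivisionError("Ks is zero, so the ratio is undefined")
    return ka / ks


for outgroup, (ka, ks) in COEFFICIENTS.items():
    ratio = ka_ks_ratio(ka, ks)
    print(f"{outgroup}: Ka/Ks = {ratio:.4f}, greater than 1: {ratio > 1}")
```

Both ratios exceed 1, which is the basis for the abstract's statement that the S gene was subject to Darwinian selection.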
Fig. 5 Three-level classification of 103 SARS-CoV genome isolates. Grouping of isolates is based on genome polymorphism, and classification is based on nine distinguished loci, mapped onto the bootstrapped phylogenetic tree obtained using CLUSTAL X with the Neighbor-Joining method and drawn using the PhyloDraw program. Bootstrapping was performed with random number generator seed 111 and 1,000 bootstrap trials. The two basic groups, A and B, are represented in yellow and blue, respectively. Types obtained according to the nine genome loci (9,404, 17,564, 22,222, 27,827 / 3,852, 9,479, 11,493, 21,721, 26,477) are labeled along the left edge of the figure and have the form NNNN / NNNNN, where N represents any nucleotide. Different subtypes are denoted by the corresponding substituted nucleotides in red. Dotted lines distinguish between the three epidemiological phases.