Canping Huang1, William J Liu1,2, Wen Xu3, Tao Jin4, Yingze Zhao1, Jingdong Song1, Yi Shi5, Wei Ji1, Hao Jia1,2, Yongming Zhou3, Honghua Wen6, Honglan Zhao1, Huaxing Liu6, Hong Li3, Qihui Wang7, Ying Wu5, Liang Wang5, Di Liu5,8, Guang Liu4, Hongjie Yu9, Edward C Holmes10, Lin Lu3, George F Gao1,2,5,11,12,13.
Abstract
The emergence of severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002 and Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 has generated enormous interest in the biodiversity, genomics and cross-species transmission potential of coronaviruses, especially those from bats, the second most speciose order of mammals. Herein, we identified a novel coronavirus, provisionally designated Rousettus bat coronavirus GCCDC1 (Ro-BatCoV GCCDC1), in the rectal swab samples of Rousettus leschenaulti bats by using pan-coronavirus RT-PCR and next-generation sequencing. Although the virus is similar to Rousettus bat coronavirus HKU9 (Ro-BatCoV HKU9) in genome characteristics, it is sufficiently distinct to be classified as a new species according to the criteria defined by the International Committee on Taxonomy of Viruses (ICTV). More strikingly, Ro-BatCoV GCCDC1 contained a unique gene integrated into the 3'-end of the genome that has no homologs in any known coronavirus, but which sequence and phylogeny analyses indicated most likely originated from the p10 gene of a bat orthoreovirus. Subgenomic mRNA and cellular-level observations demonstrated that the p10 gene is functional and induces the formation of cell syncytia. Therefore, here we report a putative heterologous inter-family recombination event between a single-stranded, positive-sense RNA virus and a double-stranded segmented RNA virus, providing insights into the fundamental mechanisms of viral evolution.
Year: 2016 PMID: 27676249 PMCID: PMC5038965 DOI: 10.1371/journal.ppat.1005883
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Fig 1. Genome organization and phylogenetic history of Ro-BatCoV GCCDC1.
Genome organization of Ro-BatCoV GCCDC1. Nonstructural genes and putative mature nonstructural proteins, structural genes, and the 5’- and 3’-UTRs are shown in yellow, dark blue and light blue, respectively. The remarkable p10 gene is shown in red. The potential origin of the p10 gene is indicated by a dotted arrow and a question mark. The leader sequence and the leader transcription regulatory sequence (TRS) are shown as nucleotide sequences. The bat, Rousettus leschenaulti, represents the host species in which Ro-BatCoV GCCDC1 was discovered. The schematic coronavirus virion represents the virus identified in the present study. The schematic orthoreovirus virion and the S1 segment of its genome illustrate the possible origin of the p10 gene.
Coding potential, transcription regulatory sequences and sequence comparisons of Ro-BatCoV GCCDC1 with Ro-BatCoV HKU9 strains, SARS-CoV, BatCoV HKU3 strains, MERS-CoV, BatCoV HKU4 strains and BatCoV HKU5 strains.
| ORF | Nucleotide positions (start to end) | Predicted size (aa) of protein | Pairwise amino acid identity (%) with HKU9s | Pairwise amino acid identity (%) with SARS-CoV | Pairwise amino acid identity (%) with HKU3s | Pairwise amino acid identity (%) with MERS-CoV | Pairwise amino acid identity (%) with HKU4s | Pairwise amino acid identity (%) with HKU5s | Leader TRS region and intergenic TRS | Distance (nt) from TRS to ATG |
|---|---|---|---|---|---|---|---|---|---|---|
| ORF1ab | 235–21155 | 6973 | 74.6–75.4 | 47.1 | 47.3–47.4 | 46.3 | 45.7–45.8 | 46.1–46.2 | TCTA | 156 |
| S | 21121–24993 | 1290 | 58.8–66.4 | 32.1 | 31.3–31.4 | 27.9 | 28.4–28.6 | 27.7–27.8 | CTGA | 42 |
| | 24990–25676 | 228 | 48.9–50.7 | | | | | | GAAA | 3 |
| E | 25676–25906 | 76 | 61.8–69.7 | 32.9 | 32.9 | 23.7 | 26.3 | 26.3 | GATG | 4 |
| M | 25914–26579 | 221 | 75.6–80.5 | 43.2 | 42.3–42.7 | 41.1 | 41.1 | 43.6 | TCTA | 27 |
| N | 26627–27958 | 443 | 68.0–71.7 | 39.1 | 39.5–40.0 | 41.7 | 41.4–41.7 | 38.4–38.7 | GGAA | 5 |
| p10 | 27994–28269 | 91 | | | | | | | AATG | 97 |
| NS7a | 28287–28850 | 187 | 40.0 | | | | | | TCTA | 5 |
| NS7b | 28863–29444 | 193 | 32.0–42.0 | | | | | | TCTA | 1 |
| NS7c | 29444–29893 | 149 | 30.0–53.0 | | | | | | GTAG | 4 |
a: Calculated with MEGA6 [21] using a pairwise deletion option [13].
b: Underlined and bold type indicates the conserved nucleotide in the TRS core sequence.
c: GenBank accession numbers of the viruses used in this analysis: NC_009021, EF065514, EF065515, EF065516, HM211098, HM211099, HM211100, HM211101.
d: GenBank accession number of the virus used in this analysis: NC_004718.
e: GenBank accession numbers of the viruses used in this analysis: DQ022305, DQ084199, DQ084200, GQ153539, GQ153540, GQ153541, GQ153542, GQ153543, GQ153544, GQ153545, GQ153546, GQ153547, GQ153548.
f: GenBank accession number of the virus used in this analysis: NC_019843.
g: GenBank accession numbers of the viruses used in this analysis: NC_009019, EF065506, EF065507, EF065508.
h: GenBank accession numbers of the viruses used in this analysis: NC_009020, EF065510, EF065511, EF065512.
i: In the genomes of Ro-BatCoV GCCDC1, Ro-BatCoV HKU9 strains, SARS-CoV, BatCoV HKU3 strains, MERS-CoV, BatCoV HKU4 strains and BatCoV HKU5 strains, the organization of the region between the spike and E genes is divergent; there is no comparability of ORFs in this region.
j: As the p10 gene is specific to Ro-BatCoV GCCDC1, there are no homologous genes in the genomes of Ro-BatCoV HKU9, SARS-CoV, BatCoV HKU3, MERS-CoV, BatCoV HKU4 or BatCoV HKU5.
k: The NS7a protein of Ro-BatCoV GCCDC1 shares 40% amino acid identity (87% coverage) with the NS7a protein of Bat coronavirus HKU9-1; there is no comparability with any ORF of other coronaviruses.
l: The NS7b protein of Ro-BatCoV GCCDC1 shares 32%–42% amino acid identity (88%–91% coverage) with the NS7a proteins of Bat coronavirus HKU9-3, HKU9-4, HKU9-10-1 and HKU9-10-2.
m: The NS7c protein of Ro-BatCoV GCCDC1 shares 30%–53% amino acid identity (90%–99% coverage) with the NS7b proteins of Bat coronavirus HKU9-1, HKU9-2, HKU9-3, HKU9-4, HKU9-10-1 and HKU9-10-2.
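The predicted protein sizes in the table above can be cross-checked against the nucleotide coordinates. The sketch below is illustrative only: it assumes that each coordinate range is 1-based, inclusive and ends with the stop codon (an assumption, not something stated in the table), and the function name `predicted_aa` is hypothetical. Only four rows are copied in, identified by their coordinates.

```python
def predicted_aa(start: int, end: int) -> int:
    """Protein length for a 1-based, inclusive ORF range.

    Assumption: the range includes the stop codon, so the protein is
    one residue shorter than the number of codons.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length_nt // 3 - 1


# (start, end, size listed in the table) for four rows of the table above.
ROWS = [
    (21121, 24993, 1290),
    (25676, 25906, 76),
    (25914, 26579, 221),
    (26627, 27958, 443),
]

for start, end, listed in ROWS:
    # Print the coordinates, the computed size and the listed size side by side.
    print(start, end, predicted_aa(start, end), listed)
```

Under that assumption the computed and listed sizes agree for these four rows, which is a quick way to catch transcription errors in coordinate tables of this kind.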
Prediction of the putative pp1a/pp1ab cleavage sites of Ro-BatCoV GCCDC1 based on comparison with prototype coronaviruses.
| NSP | Amino acid positions in ORF1a/ORF1b | Predicted size (aa) of protein | C-end putative cleavage site | Putative functional domain(s) |
|---|---|---|---|---|
| nsp1 | M1-G174 | 174 | RG\|GN | Unknown |
| nsp2 | G175-G772 | 598 | GG\|GK | Unknown |
| nsp3 | G773-G2653 | 1881 | VG\|GN | ADRP, PL2pro |
| nsp4 | G2654-Q3147 | 494 | LQ\|AG | Hydrophobic domain |
| nsp5 | A3148-Q3453 | 306 | LQ\|SR | 3CLpro |
| nsp6 | S3454-Q3741 | 288 | IQ\|SN | Hydrophobic domain |
| nsp7 | S3742-Q3824 | 83 | LQ\|AV | Unknown |
| nsp8 | A3825-Q4024 | 200 | LQ\|NN | Primase |
| nsp9 | N4025-H4136 | 112 | | Unknown |
| nsp10 | A4137-H4275 | 139 | | Unknown |
| nsp11 | A4276-S4289 | 14 | | Unknown (short peptide at the end of ORF1a) |
| nsp12 | A4276-Q5207 | 932 | LQ\|SV | RdRp |
| nsp13 | S5208-Q5808 | 601 | TQ\|SA | HEL, NTPase |
| nsp14 | S5809-Q6338 | 530 | LQ\|SL | ExoN, NMT |
| nsp15 | S6339-Q6680 | 342 | LQ\|SK | NendoU |
| nsp16 | S6681-V6973 | 293 | | O-MT |
a: GenBank accession numbers of the viruses used in this analysis: SARS-CoV, NC_004718; HCoV HKU1, NC_006577; IBV, NC_001451; TCoV, NC_010800; BCoV, NC_003045; MHV, NC_001846; and PEDV, NC_003436.
b: ADRP, ADP-ribose 1''-phosphatase; PL2pro, papain-like protease 2; 3CLpro, coronavirus nsp5 protease; HEL, helicase; NTPase, nucleoside triphosphatase; ExoN, exoribonuclease; NMT, N7 methyltransferase; NendoU, endoribonuclease; O-MT, 2’-O-methyltransferase.
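The residue ranges in the cleavage-site table follow the pattern residue letter, position, hyphen, residue letter, position (for example `M1-G174`). As a minimal sketch, the helpers below parse that notation, recompute each predicted size and check that consecutive products abut. The names `span_bounds`, `span_size` and `is_contiguous` are hypothetical, and only the first five rows of the table are used.

```python
import re

# One-letter residue, position, hyphen, one-letter residue, position.
SPAN = re.compile(r"^([A-Z])(\d+)-([A-Z])(\d+)$")


def span_bounds(span: str) -> tuple[int, int]:
    """Return (first, last) residue positions of a span such as 'M1-G174'."""
    match = SPAN.match(span)
    if match is None:
        raise ValueError(f"unrecognised span: {span!r}")
    return int(match.group(2)), int(match.group(4))


def span_size(span: str) -> int:
    """Number of residues covered by the span, both ends included."""
    first, last = span_bounds(span)
    return last - first + 1


def is_contiguous(spans: list[str]) -> bool:
    """True if every span starts one residue after the previous one ends."""
    bounds = [span_bounds(s) for s in spans]
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(bounds, bounds[1:]))


# First five rows of the table above: (span, listed size in aa).
ROWS = [
    ("M1-G174", 174),
    ("G175-G772", 598),
    ("G773-G2653", 1881),
    ("G2654-Q3147", 494),
    ("A3148-Q3453", 306),
]

for span, listed in ROWS:
    print(span, span_size(span), listed)
print(is_contiguous([span for span, _ in ROWS]))
```

A contiguity check of this kind is only meaningful within a single polyprotein reading frame; the short peptide at the end of ORF1a and the first ORF1b product share a start position and would need separate handling.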
Comparison of amino acid identities of seven conserved replicase domains of Ro-BatCoV GCCDC1 for species classification.
| Coronavirus GCCDC1 strain 356 | Amino acid identity (%) with HKU9s | Amino acid identity (%) with SARS-CoV | Amino acid identity (%) with HKU3s | Amino acid identity (%) with MERS-CoV | Amino acid identity (%) with HKU4s | Amino acid identity (%) with HKU5s |
|---|---|---|---|---|---|---|
| | 65.8–68.3 | 48.8 | 49.4–50.0 | 45.0 | 42.5–43.1 | 39.4 |
| | 83.3–84.3 | 52.3 | 52.0–52.3 | 49.2 | 49.8 | 49.5–49.8 |
| | 89.8–90.3 | 72.3 | 72.3–72.4 | 69.7 | 70.1–70.2 | 70.5–70.6 |
| | 90.8–91.5 | 73.7 | 73.7–74.0 | 72.7 | 73.1 | 72.9–73.1 |
| | 82.1–82.8 | 61.4 | 61.8–62.1 | 61.1 | 60.1 | 60.7 |
| | 70.9–73.3 | 49.7 | 49.4–49.7 | 46.6 | 46.0 | 47.5–47.8 |
| | 80.1–84.2 | 63.8 | 63.1–63.5 | 61.8 | 62.1 | 62.5 |
| | 84.4–84.8 | 64.4 | 64.4–64.6 | 62.4 | 62.3–62.4 | 62.5–62.6 |
a: Calculated with MEGA6 [21] using a pairwise deletion option [13].
b: GenBank accession numbers: NC_009021, EF065514, EF065515, EF065516, HM211098, HM211099, HM211100, HM211101.
c: GenBank accession number: NC_004718.
d: GenBank accession numbers: DQ022305, DQ084199, DQ084200, GQ153539, GQ153540, GQ153541, GQ153542, GQ153543, GQ153544, GQ153545, GQ153546, GQ153547, GQ153548.
e: GenBank accession number: NC_019843.
f: GenBank accession numbers: NC_009019, EF065506, EF065507, EF065508.
g: GenBank accession numbers: NC_009020, EF065510, EF065511, EF065512.
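Identity cells in the table above are either a single value or a range, so comparing them against a demarcation threshold requires parsing first. The sketch below is a hedged illustration: the 90% cut-off is an assumed value chosen for demonstration, since this section does not state the ICTV threshold, and `upper_bound` is a hypothetical helper. The cells are the HKU9s column copied in row order.

```python
def upper_bound(cell: str) -> float:
    """Upper end of an identity cell such as '65.8–68.3' or '48.8'."""
    # Accept a plain hyphen as well as the en dash used in the table.
    return float(cell.replace("-", "–").split("–")[-1])


# HKU9s column of the table above, in row order.
HKU9S = [
    "65.8–68.3",
    "83.3–84.3",
    "89.8–90.3",
    "90.8–91.5",
    "82.1–82.8",
    "70.9–73.3",
    "80.1–84.2",
    "84.4–84.8",
]

THRESHOLD = 90.0  # assumed, illustrative cut-off; not taken from this section

# Rows whose highest identity to any HKU9 strain stays below the cut-off.
below = [cell for cell in HKU9S if upper_bound(cell) < THRESHOLD]
print(f"{len(below)} of {len(HKU9S)} rows fall below {THRESHOLD}%")
```

The count alone does not decide species status; it only shows how the tabulated ranges would be compared once a threshold and the relevant rows are fixed.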
Fig 2. Phylogenetic analyses of representative coronaviruses, including Ro-BatCoV GCCDC1.
All trees (A: RdRp; B: S and C: N) were inferred using the maximum likelihood method available in PhyML. Bootstrap values are shown at relevant nodes. The GenBank accession numbers used in this analysis are listed in S2 Table.
Fig 3. Phylogenetic analyses of p10 from representative reoviruses and Ro-BatCoV GCCDC1.
The tree was inferred using the maximum likelihood method available in PhyML. Bootstrap values are shown at relevant nodes. The GenBank accession numbers used in this analysis are listed in S3 Table.
Fig 4. Identification of the recombinant p10 gene and its TRS.
(A) Confirmation of the “exotic” p10 gene. The sequences that cover the upstream junction site between the N and p10 genes, and the downstream junction site between the p10 and NS7a genes, are illustrated with sequencing patterns. The length of the intergenic sequence between the N and p10 genes is indicated with a number. The TRS preceding the NS7a gene in the intergenic sequence is marked with a red arrow. (B) Identification of the TRS of the p10 gene. The TRS of the p10 gene in the N gene is illustrated with a sequencing pattern. The distance from the TRS to the AUG codon of the p10 gene is indicated with a number. The lengths of the intergenic sequences between the N gene and the genes just downstream of the N gene are indicated with numbers. The TRSs of the genes just downstream of the N gene are marked with red arrows.
Fig 5. Comparison of the 3'-terminus of the N gene of Ro-BatCoV GCCDC1 with those of Ro-BatCoV HKU9 strains and Ro-BatCoV Kenya.
Alignment of the nucleotide and amino acid sequences of the 3'-terminus of the N gene among Ro-BatCoV GCCDC1, Ro-BatCoV HKU9 strains and Ro-BatCoV Kenya. The eight-amino-acid truncation and two-amino-acid deletion at the C-terminus of the N protein of Ro-BatCoV GCCDC1 are illustrated.
Fig 6. Subgenomic structures of Ro-BatCoV GCCDC1.
(A) Schematic of the Ro-BatCoV GCCDC1 genome. The genome is represented by a black line; ORFs and the 5’- and 3’-UTRs are indicated by yellow and grey arrows, respectively. The TRSs are marked with small red triangles. The genomic locations of the leader and body TRS(s) are shown with blue and red arrows, respectively. (B) Schematic structures of the putative transcribed subgenomic mRNAs. Subgenomic mRNAs are represented by black rectangles, and the common leader sequence is denoted by a blue box. The target sites of the forward and reverse primers are marked with the letters F and R, respectively. Two numbers are shown in front of each subgenomic mRNA: the black number to the right of the slash indicates the potential number of fragments that could be amplified using that set of primers, while the red number to the left is the number of fragments actually obtained in this experiment, which corresponds to the number of bands marked with red arrows in each lane of the agarose gel. (C) Agarose gel electrophoresis of the PCR products of subgenomic mRNA. The lowest band marked with a red arrow in each lane is the specific amplicon of that subgenomic mRNA. Other marked bands are amplicons of upper subgenomic mRNAs, as shown in Fig 6B. (D) mRNA junctions of the detected subgenomic mRNAs. The TRSs and fusion sites are shown in a black frame. The bias of the TRS of the p10 gene is highlighted with a yellow block. The leader sequence and CDS are indicated. The lengths of the intergenic sequences are shown with numbers.
Fig 7. Comparison of the p10 protein of Ro-BatCoV GCCDC1 with those of avian and bat origin orthoreoviruses.
The absolutely, highly, moderately and non-conserved amino acids of p10 proteins, as defined previously [26], are shown in red, blue, green and black, respectively. The motifs and domains in the p10 molecule are represented as previously reported [26]. Motifs present in the ectodomain (HP, hydrophobic patch; CM, conserved motif), the endodomain (PB, polybasic) and the central transmembrane domain (TMD) are depicted with yellow rectangles. The four conserved cysteine residues (C) are shown. The two cysteines in the ectodomain form an intra-molecular disulfide bond. In the comparison of the p10 protein of Ro-BatCoV GCCDC1 with those of avian and bat origin orthoreoviruses, the 8 amino acids that differ (including a 2-amino-acid deletion) among the 28 absolutely conserved amino acids are marked with red stars.
Fig 8. Syncytium formation and functional analyses of the Ro-BatCoV GCCDC1 p10 gene.
(A) Construction of the transient expression plasmid of the p10 gene, based on a pCAGGS vector. (B) Transient expression of the p10 gene and syncytium formation. Top: syncytium formation observed with Wright-Giemsa staining of monolayer BHK-21 cells transfected with the recombinant plasmid of the Pulau virus p10 gene, the recombinant plasmid of the Ro-BatCoV GCCDC1 p10 gene, or the empty pCAGGS vector; Bottom: syncytium formation observed with indirect immunofluorescence staining of cells treated as described above. (C) Construction of the subgenomic plasmid of the p10 gene. The putative subgenome of p10 was cloned into a pcDNA3.0-derived vector. (D) Transient expression of the p10 gene and syncytium formation with the recombinant subgenomic p10 plasmid. Top: syncytium formation observed with Wright-Giemsa staining of monolayer BHK-21 cells transfected with the recombinant plasmid of the Pulau virus p10 gene, the recombinant plasmid of the p10 subgenome of Ro-BatCoV GCCDC1, or the empty pcDNA3.0 vector; Bottom: syncytium formation observed with indirect immunofluorescence staining of cells treated as described above. (Wright-Giemsa staining: stained monolayers were imaged using an Olympus IX51FL+DP70 microscope under 100× magnification, scale bars = 200 μm; indirect immunofluorescence staining: stained monolayers were imaged using a Nikon DIAPHOT-TMD microscope under 200× magnification, scale bars = 50 μm).