Christian Jean Michel, Claudine Mayer, Olivier Poch, Julie Dawn Thompson.
Keywords: Accessory genes; COVID-19; Circular code motifs; Coronavirus; ORF prediction; SARS-CoV; SARS-CoV-2
Year: 2020 PMID: 32854725 PMCID: PMC7450977 DOI: 10.1186/s12985-020-01402-1
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Genome sequences selected for the current study. Note that the SARS-CoV strain hTor02 is from humans infected during the middle and late phases of the SARS epidemic of 2003, and has a deletion of 29 nucleotides in the region of ORF8
| | Description | Genbank accession number |
|---|---|---|
| Bat-CoV | Bat SARS-like coronavirus isolate As6526 | KY417142 |
| Civet-CoV | Civet SARS coronavirus civet007 | AY572034 |
| SARS-CoV | Human severe acute respiratory syndrome-related coronavirus strain hTor02 | NC_004718 |
| Pangolin-CoV | Pangolin coronavirus isolate PCoV_GX-P2V | MT072864 |
| SARS-CoV-2 | Human severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 | MT072688 |
CDS annotations extracted from Genbank, with ORF names standardized according to the SARS-CoV-2 nomenclature
| Name | Start | Stop | Length |
|---|---|---|---|
| **Bat-CoV** | | | |
| ORF1a<sup>a</sup> | 265 | 13,398 | 13,134 |
| ORF1b<sup>a</sup> | 13,398 | 21,485 | 8086 |
| S | 21,492 | 25,217 | 3726 |
| ORF3a | 25,227 | 26,051 | 825 |
| ORF3b | 25,648 | 25,992 | 345 |
| E | 26,076 | 26,306 | 231 |
| M | 26,357 | 27,022 | 666 |
| ORF6 | 27,033 | 27,224 | 192 |
| ORF7a | 27,232 | 27,600 | 369 |
| ORF7b | 27,597 | 27,731 | 135 |
| ORF8 | 27,738 | 28,103 | 366 |
| N | 28,118 | 29,386 | 1269 |
| **Pangolin-CoV** | | | |
| ORF1a | 249 | 13,427 | 13,179 |
| ORF1b | 13,427 | 21,514 | 8086 |
| S | 21,522 | 25,331 | 3810 |
| ORF3a | 25,341 | 26,168 | 828 |
| E | 26,193 | 26,420 | 228 |
| M | 26,468 | 27,136 | 669 |
| ORF6 | 27,147 | 27,332 | 186 |
| ORF7a | 27,339 | 27,704 | 366 |
| ORF7b | 27,701 | 27,832 | 132 |
| ORF8 | 27,839 | 28,204 | 366 |
| N | 28,218 | 29,471 | 1254 |
| **Civet-CoV** | | | |
| ORF1a | 239 | 13,366 | 13,128 |
| ORF1b | 13,366 | 21,459 | 8092 |
| S | 21,466 | 25,233 | 3768 |
| ORF3a | 25,242 | 26,066 | 825 |
| ORF3b | 25,663 | 26,127 | 465 |
| E | 26,091 | 26,321 | 231 |
| M | 26,372 | 27,037 | 666 |
| ORF6 | 27,048 | 27,239 | 192 |
| ORF7a | 27,247 | 27,615 | 369 |
| ORF7b | 27,612 | 27,746 | 135 |
| ORF8 | 27,753 | 28,121 | 369 |
| N | 28,123 | 29,391 | 1269 |
| ORF9b | 28,133 | 28,429 | 297 |
| ORF9c | 28,586 | 28,798 | 213 |
| **SARS-CoV-2** | | | |
| ORF1a | 251 | 13,453 | 13,203 |
| ORF1b | 13,453 | 21,538 | 8086 |
| S | 21,521 | 25,369 | 3849 |
| ORF3a | 25,378 | 26,205 | 828 |
| ORF3b<sup>c</sup> | 25,509 | 25,680 | 172 |
| E | 26,230 | 26,457 | 228 |
| M | 26,508 | 27,176 | 669 |
| ORF6 | 27,187 | 27,372 | 186 |
| ORF7a | 27,379 | 27,744 | 366 |
| ORF7b<sup>c</sup> | 27,741 | 27,872 | 132 |
| ORF8 | 27,879 | 28,244 | 366 |
| N | 28,259 | 29,518 | 1260 |
| ORF9b<sup>c</sup> | 28,269 | 28,562 | 294 |
| ORF9c<sup>c</sup> | 28,719 | 28,940 | 222 |
| ORF10 | 29,543 | 29,659 | 117 |
| **SARS-CoV** | | | |
| ORF1a | 265 | 13,398 | 13,134 |
| ORF1b | 13,398 | 21,485 | 8086 |
| S | 21,492 | 25,259 | 3768 |
| ORF3a | 25,268 | 26,092 | 825 |
| ORF3b | 25,689 | 26,153 | 465 |
| E | 26,117 | 26,347 | 231 |
| M | 26,398 | 27,063 | 666 |
| ORF6 | 27,074 | 27,265 | 192 |
| ORF7a | 27,273 | 27,641 | 369 |
| ORF7b | 27,638 | 27,772 | 135 |
| ORF8a | 27,779 | 27,898 | 120 |
| ORF8b | 27,864 | 28,118 | 255 |
| N | 28,120 | 29,388 | 1269 |
| ORF9b | 28,130 | 28,426 | 297 |
| ORF9c<sup>b</sup> | 28,583 | 28,793 | 211 |
<sup>a</sup> For convenience, ORF1ab is split into 2 regions corresponding to the ORF1ab gene regions upstream and downstream of the frameshift
<sup>b</sup> SARS-CoV annotation for ORF9c was propagated from Genbank entry AY274119: SARS-CoV isolate Tor2, where it is annotated as ORF14
<sup>c</sup> SARS-CoV-2 annotations for ORF3b, ORF7b, ORF9b and ORF9c were propagated from Genbank entry MN985325: Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV/USA-WA1/2020
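The coordinates in the table above are 1-based and inclusive, so most lengths follow from stop − start + 1, and the frame of an overlapping ORF relative to its host gene follows from the difference of the start positions modulo 3. The following is a minimal sketch of those two checks on a few SARS-CoV rows; the helper names `orf_length` and `relative_frame` are hypothetical and are not part of the authors' software.

```python
# Minimal sketch (assumption: coordinates are 1-based and inclusive,
# as in Genbank CDS annotations). Helper names are hypothetical.

def orf_length(start: int, stop: int) -> int:
    """Length in nucleotides of an ORF spanning start..stop inclusive."""
    return stop - start + 1


def relative_frame(host_start: int, orf_start: int) -> int:
    """Reading frame (0, 1 or 2) of an ORF relative to the frame of a host gene."""
    return (orf_start - host_start) % 3


# A few SARS-CoV rows copied from the table: name -> (start, stop)
sars_cov = {
    "ORF3a": (25268, 26092),
    "ORF3b": (25689, 26153),
    "N": (28120, 29388),
    "ORF9b": (28130, 28426),
}

for name, (start, stop) in sars_cov.items():
    print(name, orf_length(start, stop))

# ORF3b relative to ORF3a, and ORF9b relative to N
print(relative_frame(sars_cov["ORF3a"][0], sars_cov["ORF3b"][0]))  # 1
print(relative_frame(sars_cov["N"][0], sars_cov["ORF9b"][0]))      # 1
```

Both overlapping ORFs come out in frame +1 of their host gene. The ORF1b rows do not satisfy the simple length formula, presumably because of how the region around the frameshift is counted, so a check of this kind should treat them separately.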
Fig. 1 X motif enrichment (XME) scores in the three frames f = 0, 1 and 2 (green, blue, yellow respectively) of the SARS-CoV genome, using a sliding window of length 150 nucleotides. Genomic organization of known ORFs is shown underneath the plots. a Polyprotein gene ORF1ab. b Spike protein. c C-terminal structural and accessory proteins. The colors used in the enrichment plot and in the boxes representing ORFs (green, blue, yellow) indicate the three frames 0, 1 and 2 respectively
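The per-frame sliding-window scan described in the caption of Fig. 1 can be illustrated with a small sketch. This section does not give the XME formula, so the score below is a stand-in, not the authors' measure: it is simply the fraction of codons in each window, read in a given frame, that belong to a caller-supplied codon set. The function names, the step size of 3 and the toy codon set are assumptions made for illustration; only the window length of 150 nucleotides comes from the caption.

```python
from typing import Dict, List, Set


def window_fraction(seq: str, start: int, window: int, frame: int,
                    codons: Set[str]) -> float:
    """Fraction of complete codons in seq[start:start+window], read in genome
    frame `frame` (0, 1 or 2), that belong to `codons`. Stand-in score only."""
    end = min(start + window, len(seq))
    # First position >= start that lies in the requested genome frame
    first = start + ((frame - start) % 3)
    hits = total = 0
    for i in range(first, end - 2, 3):
        total += 1
        if seq[i:i + 3] in codons:
            hits += 1
    return hits / total if total else 0.0


def sliding_profile(seq: str, codons: Set[str], window: int = 150,
                    step: int = 3) -> Dict[int, List[float]]:
    """Score every window position in each of the three frames."""
    starts = range(0, max(len(seq) - window, 0) + 1, step)
    return {f: [window_fraction(seq, s, window, f, codons) for s in starts]
            for f in (0, 1, 2)}


# Toy input (hypothetical): a repeat that matches the codon set only in frame 0
toy_seq = "AAC" * 60
profile = sliding_profile(toy_seq, {"AAC"})
print(profile[0][0], profile[1][0], profile[2][0])  # 1.0 0.0 0.0
```

On the toy sequence, only frame 0 scores above zero, which is the kind of frame-specific signal the three colored curves in the figure are meant to separate.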
Fig. 2 XME scores calculated by GOFIX for potential ORFs in the 3′ terminal region of the SARS-CoV genome, in the three frames f = 0, 1 and 2 (green, blue, yellow respectively). For clarity, only Genbank annotated ORFs or new ORFs predicted in this work are shown. The red line represents the threshold value XME<sub>f</sub> = 5 (where f is the reading frame) for the prediction of a functional ORF. Known ORFs are indicated below the histogram using the color corresponding to the ORF reading frame. Known ORFs not predicted to be functional by GOFIX are outlined in red. Novel ORFs predicted by GOFIX are outlined in blue
Prediction performance of the GOFIX method on the set of known ORFs in the SARS-CoV genome
| | Predicted: YES | Predicted: NO | Total |
|---|---|---|---|
| Known ORF | 11 | 2 | 13 |
| Unknown ORF | 2 | 10 | 12 |
| Total | 13 | 12 | 25 |
| | Sensitivity = 0.85 | Specificity = 0.83 | |
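The sensitivity and specificity reported with the table follow directly from its four counts. A minimal sketch of the arithmetic, using the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function name is hypothetical:

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from the counts of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # known ORFs that were predicted
    specificity = tn / (tn + fp)  # unknown ORFs that were not predicted
    return sensitivity, specificity


# Counts from the table: 11 known ORFs predicted, 2 known ORFs missed,
# 2 unknown ORFs predicted, 10 unknown ORFs not predicted
sens, spec = sensitivity_specificity(tp=11, fn=2, fp=2, tn=10)
print(f"Sensitivity = {sens:.2f}, Specificity = {spec:.2f}")
# Sensitivity = 0.85, Specificity = 0.83
```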
Fig. 3 Prediction of ORFs in representative SARS-like coronavirus genomes. A schema is provided for each genome, showing the Genbank annotated ORFs and new ORFs predicted in this work. The numbers in the tables below each schema indicate the XME scores in the reading frame of each ORF. Genbank annotated ORFs that are not predicted to be functional by the GOFIX method are highlighted in red. Novel ORFs predicted by GOFIX are shown in blue. ORFs with conflicting annotations in Genbank, but predicted by GOFIX, are shown in brown. Note that ORF3b in Civet-CoV and SARS-CoV is not homologous to ORF3b in Pangolin-CoV and SARS-CoV-2
Fig. 4 a Schematic view of genome organization of ORF3a, ORF3b and E gene. b Multiple alignment of ORF3a, ORF3b sequences, with X motifs in the reading frame of ORF3a shown in blue. The start and stop codons of the overlapping ORF3b sequences (in the +1 reading frame of ORF3a) are indicated by purple and red boxes respectively. X motifs in the reading frame of ORF3b are shown in green
Fig. 5 a Schematic view of genome organization of ORF8, highlighting the 29-nt deletion in SARS-CoV, resulting in 2 ORFs: ORF8a and ORF8b. b Multiple alignment of ORF8 sequences, with X motifs in the reading frame of ORF8 shown in blue. The start and stop codons of the SARS-CoV ORF8a and ORF8b sequences are indicated by purple and red boxes respectively. The X motif corresponding to the 29-nt deletion is shown in green
Fig. 6 a Schematic view of genome organization of ORF N, with overlapping genes ORF9b, 9c and the novel predicted 9d. b Multiple alignment of ORF N sequences, with X motifs in the reading frame of ORF N shown in blue, in ORF9b in green, in ORF9c in yellow. Start and stop codons of the overlapping genes are indicated by violet and red boxes, respectively. c The novel ORF9d predicted in Pangolin-CoV with X motifs in the reading frame shown in orange
Fig. 7 Multiple alignment of ORF10 sequences, with X motifs in the reading frame shown in blue. Stop codons are indicated by red boxes
Fig. 8 a Multiple alignment of ORFSa sequences, with X motifs in the reading frame of ORFS shown in blue and ORFSa in green. Start and stop codons of the overlapping genes are indicated by violet and red boxes, respectively. Bat-CoV (WIV16) sequence is from Genbank: KT444582. b Nucleotide and amino acid sequences of the novel ORF predicted to overlap the Spike protein in the genome of SARS-CoV. The nucleotide sequence segment (SARS-CoV: nt 22,732–22,926) encodes part (residues 414–478) of the RBD (residues 323–502) of the Spike protein (normal characters), while the reading frame +1 encodes a potential overlapping ORF (italics), which we named Sa