Hunduma Dinka, Ashenafi Milkesa.
Abstract
SARS-CoV-2 is a new virus responsible for an outbreak of respiratory illness known as COVID-19, which has spread to several countries around the world, and a global effort is being undertaken to characterize the molecular features and evolutionary origins of this virus. In silico analysis of the transcription start sites (TSSs), promoter regions, transcription factors and their binding sites, gene ontology, and CpG islands of the SARS-CoV-2 genome is a first step toward understanding the mechanisms that regulate gene expression and their association with genetic variation in the genome. For this purpose, we first computationally surveyed all SARS-CoV-2 genes with open reading frames in the NCBI database and found eleven sequences, which we analyzed for the features above using bioinformatics tools. Our analysis revealed that all (100%) of the SARS-CoV-2 genes have more than one TSS. Taking, for each gene, the TSS with the highest predictive score, we determined promoter regions and identified five common candidate motifs (MVI, MVII, MVIII, MVIV and MVV), of which MVI was found to be shared by all promoter regions of SARS-CoV-2 genes with the lowest E-value (3.8e-056, statistically highly significant). Further analysis showed that MVI serves as a binding site for a single transcription factor (TF) family, EXPREG, involved in the regulatory mode of these genes. Within the EXPREG family, four TFs that belong to the cyclic AMP (cAMP) receptor protein (CRP) and catabolite control protein A (CcpA) groups mostly serve as transcriptional activators, whereas two TFs that belong to the LexA group always serve as transcriptional repressors in different kinds of cellular processes and molecular functions. Therefore, we unfolded the SARS-CoV-2 genome to shed light on the regulation of its gene expression, which could help to design and evaluate diagnostic tests, to track and trace the ongoing outbreak and to identify potential intervention options.
Keywords: COVID-19; CpG Island; Motif; Promoter; SARS-CoV-2; Transcription factor
Year: 2020 PMID: 32473977 PMCID: PMC7256514 DOI: 10.1016/j.meegid.2020.104386
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
Number and predictive score value for SARS-CoV-2 virus gene TSSs.
| Name/Gene ID | Corresponding promoter region name | Number of TSS identified | Predictive score at cutoff value of 0.8 | Location of the best TSS from start codon |
|---|---|---|---|---|
| orf1ab/43740578 | pro-43,740,578 | 3 | 0.92, 0.93, 0.88 | −77 |
| ORF8/43740577 | pro-43,740,577 | 13 | 0.84, 0.98, 0.90, 0.99, 0.85, 0.81, 0.89, 1.00, 0.93, 0.95, 0.99, 0.94, 0.96 | −376 |
| ORF10/43740576 | pro-43,740,576 | 4 | 0.90, 0.88, 0.82, 0.97 | −50 |
| N/43740575 | pro-43,740,575 | 14 | 0.99, 0.85, 0.81, 0.89, 1.00, 0.93, 0.95, 0.99, 0.94, 0.96, 0.81, 0.99, 0.83, 0.99 | −756 |
| ORF7b/43740574 | pro-43,740,574 | 10 | 0.84, 0.98, 0.90, 0.99, 0.85, 0.81, 0.89, 1.00, 0.93, 0.95 | −238 |
| ORF7a/43740573 | pro-43,740,573 | 12 | 0.99, 0.96, 0.84, 0.86, 0.91, 0.94, 0.84, 0.98, 0.90, 0.99, 0.85, 0.81 | −855 |
| ORF6/43740572 | pro-43,740,572 | 8 | 0.81, 0.99, 0.96, 0.84, 0.86, 0.91, 0.94, 0.84 | −663 |
| M/43740571 | pro-43,740,571 | 12 | 0.98, 0.88, 0.88, 0.93, 0.81, 0.90, 0.96, 0.96, 0.99, 0.98, 0.96, 0.81 | −422 |
| E/43740570 | pro-43,740,570 | 14 | 0.89, 0.97, 0.81, 0.98, 0.88, 0.88, 0.93, 0.81, 0.90, 0.96, 0.96, 0.99, 0.98, 0.96 | −144 |
| ORF3a/43740569 | pro-43,740,569 | 11 | 0.98, 0.97, 0.92, 0.84, 0.98, 0.82, 0.99, 0.90, 0.91, 0.83, 0.99 | −508 |
| S/43740568 | pro-43,740,568 | 21 | 0.95, 0.88, 0.94, 0.86, 0.81, 0.89, 0.98, 0.91, 0.86, 0.85, 0.99, 0.89, 0.94, 0.86, 0.99, 0.88, 0.82, 0.99, 0.99, 0.81, 0.81 | −617 |
Identified common candidate motifs in SARS-CoV-2 virus gene promoter regions.
| Discovered motif | Number (%) of promoters containing each one of the motifs | E-value | Motif width | Total no. of binding sites |
|---|---|---|---|---|
| MVI | 7(87.5) | 3.8e-056 | 50 | 7 |
| MVII | 7(87.5) | 5.7e-056 | 50 | 7 |
| MVIII | 7(87.5) | 2.6e-052 | 50 | 7 |
| MVIV | 7(87.5) | 1.3e-051 | 50 | 7 |
| MVV | 7(87.5) | 3.6e-050 | 50 | 7 |
E-value: the probability of finding an equally well-conserved motif in random sequences.
Fig. 1. Block diagrams showing the relative positions of candidate motifs in different SARS-CoV-2 virus gene promoter sequences relative to TSSs. The nucleotide positions are indicated at the bottom of the graph from +1 (beginning of TSSs) to the upstream 1000 (−1000) bp.
Fig. 2. Sequence logos for the identified best motif (MVI) for SARS-CoV-2 virus gene promoter regions. The analysis was carried out using MEME Suite.
Candidate transcription factors from the EXPREG family that could bind to motif MVI.
| Candidate transcription factors | Statistical significance | Regulatory mode: activation (%) | Regulatory mode: repression (%) | Regulatory mode: dual (%) | Regulatory mode: not specified (%) |
|---|---|---|---|---|---|
| CRP ( | 4.18e+00 | 76.0 | 23.0 | 0.0 | |
| CRP ( | 7.34e+00 | 82.0 | 17.0 | 0.0 | 0.0 |
| CcpA ( | 2.07e+00 | 69.0 | 30.0 | 0.0 | |
| CcpA ( | 9.82e+00 | 68.0 | 31.0 | 0.0 | 0.0 |
| CcpA ( | 9.68e+00 | 6.0 | 93.0 | 0.0 | 0.0 |
| Fur ( | 1.50e+00 | 0.0 | 13.0 | 0.0 | 85.0 |
| LexA ( | 5.41e+00 | 0.0 | 100.0 | 0.0 | 0.0 |
| LexA ( | 8.23e+00 | 0.0 | 100.0 | 0.0 | 0.0 |
| GlnR ( | 2.36e+00 | 35.0 | 38.0 | 0.0 | 26.0 |
| ArcA ( | 9.62e+00 | 0.0 | 0.0 | 0.0 | 100.0 |
CcpA - Catabolite control protein A; ArcA - aerobic respiration response regulator; CRP - cAMP receptor protein; Fur - Ferric uptake regulation protein; GlnR - DNA-binding response OmpR family regulator; LexA - locus for X-ray sensitivity A.
Statistical significance for the binding of given transcription factors to MVI motif.
Fig. 3. GO terms associated with the MVI motif. No gene ontology was identified.
MspI cutting sites and fragment sizes in the promoter and gene body regions of the eleven SARS-CoV-2 virus gene sequences.
| Region | Name of corresponding SARS-CoV-2 virus gene | Nucleotide positions of MspI cutting sites | Fragment sizes (between 40 and 220 bp) |
|---|---|---|---|
| Promoter region | Pro-43,740,578 | No cut | – |
| Promoter region | Pro-43,740,577 | Single cut (at 235) | |
| Promoter region | Pro-43,740,576 | No cut | |
| Promoter region | Pro-43,740,575 | Single cut (at 235) | |
| Promoter region | Pro-43,740,574 | Single cut (at 234) | |
| Promoter region | Pro-43,740,573 | Single cut (at 435) | |
| Promoter region | Pro-43,740,572 | Single cut (at 612) | |
| Promoter region | Pro-43,740,571 | No cut | |
| Promoter region | Pro-43,740,570 | No cut | |
| Promoter region | Pro-43,740,569 | No cut | |
| Promoter region | Pro-43,740,568 | Single cut (at 201) | 201 |
| Gene body region | ORF8/43740577 | No cut | – |
| Gene body region | ORF10/43740576 | No cut | – |
| Gene body region | N/43740575 | No cut | – |
| Gene body region | ORF7b/43740574 | No cut | – |
| Gene body region | ORF7a/43740573 | No cut | – |
| Gene body region | ORF6/43740572 | No cut | – |
| Gene body region | M/43740571 | Single cut (at 229) | – |
| Gene body region | E/43740570 | No cut | – |
| Gene body region | ORF3a/43740569 | Single cut (at 757) | 71 |
| Gene body region | S/43740568 | Single cut (at 1423) | – |
| Gene body region | orf1ab/43740578 | Multiple cuts (at 3987, 12466, 12933, 14409, 15693, 20411) | – |
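
The MspI table can be read as the output of a simple in silico digest: locate every MspI recognition site (CCGG, cut after the first C) in a sequence, split the sequence at those cuts, and keep fragments of 40 to 220 bp. The Python sketch below illustrates that idea under stated assumptions. The function names, the toy sequence, and the convention that a cut position is the number of bases to the left of the cut are illustrative choices, not the tool or coordinate convention used in the study.

```python
import re

# MspI recognizes CCGG and cuts after the first C (C^CGG).
MSPI_SITE = "CCGG"
CUT_OFFSET = 1  # bases of the site that stay on the left-hand fragment


def mspi_cut_positions(seq):
    """Return cut positions, each given as the number of bases left of the cut.

    Assumption: this position convention is illustrative and may differ
    from the one used in the table above.
    """
    seq = seq.upper()
    return [m.start() + CUT_OFFSET for m in re.finditer(MSPI_SITE, seq)]


def fragment_sizes(seq_len, cuts):
    """Return the sizes of all fragments produced by cutting at `cuts`."""
    bounds = [0] + sorted(cuts) + [seq_len]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def fragments_in_window(seq, lo=40, hi=220):
    """Return fragment sizes that fall within [lo, hi] bp, inclusive."""
    cuts = mspi_cut_positions(seq)
    if not cuts:
        return []  # an uncut sequence yields no digest fragments
    return [s for s in fragment_sizes(len(seq), cuts) if lo <= s <= hi]


# Toy sequence (hypothetical, not viral): one CCGG site after 50 bases.
toy = "A" * 50 + "CCGG" + "T" * 300
print(mspi_cut_positions(toy))   # cut positions
print(fragments_in_window(toy))  # fragments within 40-220 bp
```

A sequence with no CCGG site is reported as having no fragments, which matches the "No cut" rows; a single cut yields two fragments, of which only those inside the size window are kept.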