
Genetic Engineering Systems to Study Human Viral Pathogens from the Coronaviridae Family.

S O Galkin^1,2, A N Anisenko^1,2,3, O A Shadrina^2,3, M B Gottikh^2,3.

Abstract

The COVID-19 pandemic caused by the previously unknown SARS-CoV-2 Betacoronavirus made it extremely important to develop simple and safe cellular systems which allow manipulation of the viral genome and high-throughput screening of its potential inhibitors. In this review, we attempt to summarize the currently available data on genetic engineering systems used to study not only SARS-CoV-2, but also other viruses from the Coronaviridae family. In addition, the review covers the basic knowledge about the structure and the life cycle of coronaviruses. © Pleiades Publishing, Inc. 2022, ISSN 0026-8933, Molecular Biology, 2022, Vol. 56, No. 1, pp. 72–89.


Keywords: COVID-19; SARS-CoV-2; pseudoviruses; replicons

Year: 2022 PMID: 35194246 PMCID: PMC8853348 DOI: 10.1134/S0026893322010022

Source DB: PubMed Journal: Mol Biol ISSN: 0026-8933 Impact factor: 1.374

INTRODUCTION

COVID-19 (coronavirus disease 2019), an infectious disease caused by a new betacoronavirus, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has already claimed millions of lives worldwide. Owing to the high pathogenicity and contagiousness of the pathogen, the World Health Organization declared COVID-19 a pandemic. Research on SARS-CoV-2 is actively pursued in scientific centers all around the world, extensive work has been done to create vaccines, and significant efforts are aimed at finding drugs for the effective therapy and prevention of COVID-19. Over the past 20 years, coronaviruses have been the etiological agents of three large-scale outbreaks of severe human respiratory infections: severe acute respiratory syndrome (SARS) in 2002, Middle East respiratory syndrome (MERS) in 2012, and the 2019 coronavirus infection (COVID-19), which affected almost every country and caused a worldwide health crisis [1, 2]. SARS-CoV-2 differs from SARS-CoV-1 and MERS-CoV in its long incubation period, high percentage of asymptomatic carriers, and ability to be transmitted from person to person in the absence of clinical symptoms in infected individuals [3]. This set of features allowed a local outbreak in Wuhan, China, to grow into a pandemic. The lethality of COVID-19 strongly depends on the age and sex of the individual: the risk of death is highest in elderly people, especially males [4]. A significant correlation has been found between the severity of COVID-19 and concomitant chronic diseases or excess body weight [5]. Success in the study of highly pathogenic coronaviruses and, in particular, in the search for potential drugs for the therapy of the diseases they cause largely depends on understanding the peculiarities of the virus infectious cycle and, consequently, on the development of technologies used for research.
These are mainly genetically engineered systems, which are numerous and varied, and each has its own advantages and disadvantages. The present review attempts to systematize the currently available systems for studying coronaviruses, and details the strengths and weaknesses of each of them.

CORONAVIRUS BIOLOGY

Members of the Coronaviridae family infect mammals and birds, causing respiratory diseases of varying severity. Until 2019, six coronaviruses circulated in the human population, of which Betacoronavirus 1 (HCoV-OC43), HCoV-229E, HCoV-NL63, and HCoV-HKU1 are considered seasonal respiratory viruses. The coronavirus associated with severe acute respiratory syndrome (SARS-CoV, recommended name from 2020: SARS-CoV-1; genus Betacoronavirus, subgenus Sarbecovirus), which emerged in 2002, and the Middle East respiratory syndrome coronavirus (MERS-CoV; genus Betacoronavirus, subgenus Merbecovirus), which emerged in 2012, have proven highly pathogenic to humans. In 2019, humanity encountered another member of the subgenus Sarbecovirus, SARS-CoV-2 [6]. This section describes the main details of the structure and life cycle of the members of the Coronaviridae family in general and the seven human pathogens in particular.

VIRION STRUCTURE

Coronaviruses are enveloped (+)RNA viruses belonging to group IV according to the Baltimore classification [7]. The average virion diameter is 80–120 nm. The virions are spherically shaped with characteristic spikes formed by the S protein, resembling a stellar corona. The schematic structure of a coronavirus virion is shown in Fig. 1. On the outside, the virion is covered by a supercapsid, a cell membrane derivative with integrated viral proteins. The following viral proteins can be found in the supercapsid of most human coronaviruses: the spike-forming glycoprotein (S) which interacts with the host cell surface and initiates the entry of virion components inside the cell; the membrane protein (M) which facilitates the interaction of virion components with each other and is crucial for successful assembly of viral particles [8], and the envelope protein (E), which forms pentameric transmembrane channels and acts as a viroporin [9]. The incorporation of the E-protein into the virion is not strictly necessary for the formation of infectious viral particles, but viruses which lack it are attenuated relative to the wild-type viruses [8]. HCoV-OC43 and HCoV-HKU1 viruses additionally contain a hemagglutinin-esterase (HE) protein in the envelope, which disrupts the S-protein receptor and thus promotes more efficient budding of new virions [10].

Fig. 1.

Schematic representation of the Coronaviridae family virion structure. The helical symmetry nucleocapsid made up by the nucleocapsid (N) protein and one RNA genome molecule is surrounded by the supercapsid of cellular origin with embedded virus S, M, and E proteins. The hemagglutinin esterase (HE) protein is included in the virions of the human coronaviruses HCoV-OC43 and HCoV-HKU1, but is not found in HCoV-229E, HCoV-NL63, MERS-CoV, SARS-CoV-1, and SARS-CoV-2.

The coronavirus virion contains about 300 S protein molecules, 2000 M protein molecules, 1000 N protein molecules, and 100 E protein molecules [11]. Inside the outer envelope there is a helical symmetry nucleocapsid consisting of N protein and genomic (+)-chain RNA. N protein is bound to the viral RNA and carries out structural functions including packaging and stabilization of the viral genome. In addition, N protein modulates the host cellular immune response to viral infection [8], and promotes more efficient synthesis of 3'-distal genes [12].

GENOME STRUCTURE

The genomic RNA of coronaviruses is unsegmented, capped at the 5'-end, and polyadenylated at the 3'-end. A distinctive feature of the Coronaviridae genome is its great length of about 26 000–32 000 bases [13], which is unique among RNA viruses. On average, a single-stranded unsegmented RNA genome in human viruses does not exceed 10 000–12 000 bases [14]. It is supposed that the larger genome size of coronaviruses as compared to other viruses is due to the presence of editing activity in the RNA-dependent RNA polymerase (RdRp) complex, which reduces the mutation rate [15]. Generally, the coronavirus genome can be subdivided into two principal parts. The first one contains the open reading frame (ORF) ORF1a/b, which encodes proteins mainly responsible for replication and transcription. The second part encodes structural and accessory proteins responsible for interaction with the cell surface and entry into the cell, as well as for adaptation to the host. The results of comparative analysis of the genome sequences of various coronaviruses confirm the functional significance of this division: in coronaviruses, the ORF1a/b conservation level is higher than that of the rest of the genome [16]. The number of ORFs in the genome differs between coronavirus species and ranges from 6 to 11 [17]. The first ORF, ORF1a/b, makes up about 2/3 of the genome and encodes two polyproteins, pp1a and pp1ab. The pp1ab polyprotein is formed as a result of a ribosomal frameshift of one nucleotide backward (–1) just upstream of the ORF1a stop codon. Proteolytic processing of pp1a and pp1ab yields the non-structural proteins (NSPs) responsible for replication and transcription, suppression of cellular mRNA synthesis, and modulation of the cellular immune response [18].
The analysis of a large body of data revealed that many NSPs simultaneously perform several functions in the viral replicative cycle, which allows the virus to cope with a wide range of tasks, from its own replication to counteracting the cellular immune response, without further enlarging the genome [19-22]. Table 1 provides up-to-date information on the most studied functions of NSPs. Several functional blocks can be distinguished among them: proteins responsible for the formation of the RdRp complex (NSP7–NSP16), proteins involved in the formation of double membrane-bound replicative organelles (NSP3–NSP6), proteins involved in the processing of pp1a and pp1ab polyproteins (NSP3, NSP5), and those involved in modulating cellular processes (NSP1, NSP2). The rest of the genome encodes the necessary structural proteins S, N, E, M, (HE) as well as a number of species-specific accessory proteins [23, 24]. It is believed that accessory proteins are not essential for viral replication, although they contribute to more efficient virion assembly both by suppressing intracellular immunity and by acting as scaffold proteins in virion assembly [25].
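The –1 ribosomal frameshift that produces pp1ab from ORF1a/b, described above, can be illustrated with a minimal Python sketch. The RNA sequence, the slip position, and the function names are invented for illustration and do not correspond to any real coronavirus sequence; the sketch only shows how stepping back one nucleotide lets translation bypass a stop codon.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G ("*" marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna, start=0):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_minus1_shift(rna, slip_pos):
    """Read in frame 0 up to `slip_pos`, then resume one nucleotide back.

    Assumes (hypothetically) that no stop codon occurs before `slip_pos`.
    """
    upstream = translate(rna[:slip_pos])
    downstream = translate(rna, start=slip_pos - 1)
    return upstream + downstream


# Toy RNA (invented): frame 0 reaches a stop codon shortly after position 12.
TOY_RNA = "AUGGCUUUAAACGGGUAACCUGA"
SLIP_POS = 12  # hypothetical position at which the ribosome slips back by one

short_product = translate(TOY_RNA)                             # pp1a-like
long_product = translate_with_minus1_shift(TOY_RNA, SLIP_POS)  # pp1ab-like
print(short_product, long_product)
```

In this toy case the unshifted reading stops early, whereas the shifted reading re-reads one nucleotide and continues past the frame-0 stop codon, giving a longer product that shares its N-terminal part with the short one.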

Table 1.

Coronavirus nonstructural protein functions

Protein name	Key functions
NSP1	Represses the translation of cellular mRNAs by binding the 18S rRNA of the 40S ribosome subunit [26]; promotes cellular mRNA cleavage in 5'UTR [27]
NSP2	Presumably, regulates cell survival signaling pathways; interacts with the prohibitin 1 and prohibitin 2 cell proteins [28]; not necessarily required for a successful infection [29]
NSP3	Processes pp1a and pp1ab, in particular NSP1–NSP4, owing to the presence of the papain-like protease-2 domain [30]; is a pore component in double-membrane-bound organelles which carry out viral genome and subgenomic RNA export into the cytoplasm [31]; possesses deubiquitination activity; cleaves mono-ADP-ribose from proteins thus suppressing cell interferon response [30]
NSP4	Participates in double-membrane-bound replicative organelle formation [32]
NSP5	3-Chymotrypsin-like protease, processes pp1a and pp1ab, in particular, NSP4–NSP11/NSP4–NSP16 [33]
NSP6	Prevents viral protein-containing vesicle fusion with lysosomes; participates in double-membrane-bound replicative organelle formation [34]
NSP7	Forms a hexadecameric complex with NSP8, which apparently works as a processivity factor [35] and primase [36, 37]
NSP8	Forms a hexadecameric complex with NSP7, which apparently works as a processivity factor [35] and primase [36, 37]; inhibits protein embedding into cell membrane via complex formation with signal recognition particle (SRP) 7SL RNA and 28S rRNA [26]
NSP9	Inhibits protein embedding into the cell membrane via complex formation with SRP 7SL RNA [26]; RNA polymerase complex component [19]
NSP10	Required for (‒)-chain RNA synthesis during virus replication [38]; involved in pp1a/ab processing [39]; regulates the activity of the RdRp complex components NSP14 and NSP16 [40, 41]
NSP11	Short peptide formed in the course of pp1a processing. Function unknown [19]
NSP12	RdRp component with polymerase activity [19]
NSP13	Possesses helicase activity [42]
NSP14	Increases the accuracy of viral RNA synthesis due to the presence of proof-reading 3'→5' exonuclease activity [43]; participates in 5'-capping of viral RNAs, possesses N7-methyl transferase activity [44]
NSP15	Inhibits the activity of cellular double-stranded RNA (dsRNA) sensors; possesses functionally relevant endoribonuclease activity, with an unknown precise mechanism of action [45]
NSP16	Disturbs splicing through binding with U1/U2 small nuclear RNA [26]; participates in 5'-capping of viral RNAs, possesses 2'-O-methyl transferase activity [40]

A schematic representation of the genome structure of human-infecting Coronaviridae is shown in Fig. 2. It should be noted that the order of the genes ORF1a/b→S→E→M→N is conserved in all coronaviruses, but apparently carries no functional significance, since the infectivity of the resulting viruses did not change when the gene order was rearranged [46].

Fig. 2.

Genome structure of coronaviruses circulating in human populations. In the figure, encoded virus proteins are indicated in capital letters to the right, virus-specific ORF nos. are indicated with numbers and small letters. UTR—untranslated region. Numbers to the left indicate encoded NSP nos.

LIFE CYCLE

Cell Entry

The life cycle of coronaviruses begins with virion binding to receptors on the host cell surface (Fig. 3). The viral surface glycoprotein S plays a key role in this process. It is this protein, the variability of which is higher than that of other viral proteins [16], that determines the species and tissue tropism of the pathogen. On the cell side, various surface components take part in the interaction with viral particles [47]. The HCoV-229E coronavirus uses aminopeptidase N (ANPEP) as a receptor. It is a metalloproteinase present on the surface of intestinal, lung, and kidney epithelial cells. In the intestine, this enzyme is involved in the hydrolysis of peptides produced by gastric and pancreatic proteases. The function of ANPEP in the lungs and kidneys is not fully established; most likely, it metabolizes various regulatory peptides [48]. For HCoV-NL63, the identity of the cell receptors is still unclear. The S protein of this virus has been shown to interact with angiotensin-converting enzyme-2 (ACE2) [49], and the M protein binds to heparan sulfate proteoglycans on the cell surface [50]. This M protein interaction may be a complementary factor for viral entry or may increase the local concentration of virus on the cell surface. When S protein was removed from the studied virions, they retained the ability to infect cells even when ACE2 was blocked with antibodies, from which it could be concluded that there are alternative pathways for HCoV-NL63 entry into the cell, for example through the above-mentioned interaction of M protein with heparan sulfate proteoglycans [50]. The HCoV-OC43 and HCoV-HKU1 viruses use 9-O-acetylated neuraminic acid-containing glycoproteins as a receptor [25]. These viruses, as mentioned earlier, have a hemagglutinin-esterase protein in their virion, which promotes more effective interaction with the cell surface and budding of new viral particles from the infected cell [51].
MERS-CoV uses dipeptidyl peptidase-4 (DPP4) as a receptor, a dimeric cell-surface protein that normally processes hormones and chemokines [52]. SARS-CoV-1 and SARS-CoV-2 use ACE2 as the main receptor [25], although SARS-CoV-2 also uses heparan sulfate proteoglycans as a cofactor: interaction with them causes conformational changes in the S protein that contribute to its more efficient binding to ACE2 [53].
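The receptor usage described above can be condensed into a small lookup table. The following Python sketch simply transcribes the main receptor named in the text for each human coronavirus; the variable and function names are illustrative, and cofactors such as heparan sulfate are deliberately left out.

```python
from collections import defaultdict

# Main receptor per virus, as stated in the text (cofactors omitted).
MAIN_RECEPTOR = {
    "HCoV-229E": "ANPEP",
    "HCoV-NL63": "ACE2",
    "HCoV-OC43": "9-O-acetylated neuraminic acid",
    "HCoV-HKU1": "9-O-acetylated neuraminic acid",
    "MERS-CoV": "DPP4",
    "SARS-CoV-1": "ACE2",
    "SARS-CoV-2": "ACE2",
}


def group_by_receptor(receptors):
    """Invert the virus -> receptor mapping into receptor -> sorted viruses."""
    groups = defaultdict(list)
    for virus, receptor in receptors.items():
        groups[receptor].append(virus)
    return {receptor: sorted(viruses) for receptor, viruses in groups.items()}


BY_RECEPTOR = group_by_receptor(MAIN_RECEPTOR)
for receptor, viruses in BY_RECEPTOR.items():
    print(f"{receptor}: {', '.join(viruses)}")
```

Grouping by receptor makes it easy to see which viruses could, in principle, be studied in the same receptor-expressing cell line.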

Fig. 3.

Schematic representation of the life cycle of Coronaviridae family viruses. Infection starts with the virus particle binding to its receptor on the host cell surface, which leads to fusion of the viral and cell membranes (early entry pathway), or to virion uptake into endosomes followed by fusion of the viral and endosomal membranes (late entry pathway). Viral RNA in complex with N protein enters the cytoplasm. Translation of the first open reading frame, ORF1a/b, yields the pp1a and pp1ab polyproteins, which then undergo autoproteolysis to give the NSPs. NSPs form the RdRp complex which carries out viral RNA replication and transcription. This process takes place in the double-membrane-bound organelles which are formed from the rough endoplasmic reticulum in the presence of pp1a and pp1ab components. Replication/transcription results in full-size and subgenomic viral RNAs. Structural and accessory proteins are translated from subgenomic RNAs. Structural proteins are assembled into an intermediate structure in the endoplasmic reticulum–Golgi intermediate compartment (ERGIC), which then forms new virions together with the N protein/genomic viral RNA complex. Newly assembled virions accumulate in intracellular vesicles and then leave the infected cell by exocytosis [modified from 54].

After binding to the receptor, the S protein is processed by cell-surface proteases and undergoes conformational changes that promote fusion between the viral and cellular membranes. This mode of entry is called early entry. If surface proteases are for some reason unavailable, the late pathway of entry into the cell is activated. In this case, the virus first undergoes endocytosis. In the endosome, the cellular protease cathepsin L is activated under low pH conditions and cleaves the S protein, causing the viral and endosomal membranes to fuse [55].
After this, the viral nucleocapsid consisting of genomic RNA and N protein appears in the cytoplasm, and the processes of viral transcription and replication begin.

Replication and Transcription

As mentioned above, coronavirus RNA contains a large number of ORFs (from 6 to 11) [24], which complicates the translation of 3'-distal genes. Coronaviruses address this problem by forming subgenomic RNAs [56, 57]. Among (+)RNA viruses, two main mechanisms for their synthesis are found: through internal RdRp binding sequences in the genome (Caliciviridae, Bromoviridae, Virgaviridae, Togaviridae families) or through premature transcription termination sites (Nodaviridae, Togaviridae, Roniviridae, Tombusviridae families) (Figs. 4a and 4b) [58]. In coronaviruses, however, subgenomic RNA synthesis is discontinuous [40, 41, 43]: during synthesis of the (–)RNA chain, the RdRp complex dissociates from the (+)RNA template at certain sites, rebinds in the 5'UTR region, and continues RNA synthesis (Fig. 4c) [59]. The positions of dissociation and subsequent rebinding of RdRp are determined by transcription regulatory sequences (TRS). In some cases RdRp skipping is also possible during (+)RNA-chain synthesis: RdRp then dissociates from the TRS located in the 5'UTR and rebinds to a TRS located upstream of the start of a protein-coding sequence (Fig. 4c) [60]. The shortened RNAs obtained by discontinuous transcription serve as templates for subsequent translation of viral proteins. Interestingly, it is discontinuous transcription that allows the virus to use specific mechanisms of cellular translation repression without affecting the translation of viral mRNAs. It is believed that the presence of the leader sequence in all viral mRNAs protects them from degradation by the NSP1 protein, while cellular mRNAs are cleaved by it near their 5'-end [56].
If the RdRp-complex does not dissociate during RNA chain elongation, full-length RNA complementary to the genomic RNA is synthesized, and genomic (+)RNAs are then synthesized using it as a template, which are further used for the assembly of new viral particles and the synthesis of pp1a and pp1ab polyproteins. It has been shown that active synthesis of viral proteins starts as early as five hours after infection [61]. At the same time, the content of viral RNA in the cell significantly increases, which is caused not only by replication and discontinuous transcription, but also by the suppression of cellular mRNA synthesis by the viral NSP1 protein [62].
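The outcome of discontinuous transcription described above can be sketched in a few lines of Python. This is a toy string model under stated assumptions, not a simulation of the RdRp complex: the genome, the TRS motif, and the function names are placeholders chosen for illustration, and only the resulting (+)-sense subgenomic RNAs are modeled, each consisting of the 5' leader (up to and including the leader TRS) joined to everything downstream of one body TRS.

```python
def find_all(seq, motif):
    """Return the start positions of every occurrence of `motif` in `seq`."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def subgenomic_rnas(genome, trs):
    """Join the leader to the region downstream of each body TRS.

    The first TRS occurrence is treated as the leader TRS; every later
    occurrence is treated as a body TRS at which the polymerase may switch.
    """
    sites = find_all(genome, trs)
    if len(sites) < 2:
        return []  # no body TRS, so only full-length RNA can be made
    leader = genome[:sites[0] + len(trs)]
    return [leader + genome[pos + len(trs):] for pos in sites[1:]]


# Toy genome (invented): leader, then three blocks separated by the motif.
TRS = "ACGAAC"  # placeholder motif used only for this illustration
GENOME = "AUUA" + TRS + "GGGGCCCC" + TRS + "UUUU" + TRS + "AAGG"

SG_RNAS = subgenomic_rnas(GENOME, TRS)
for rna in SG_RNAS:
    print(rna)
```

Every product in this model starts with the same leader and ends at the same 3' end as the genome, which mirrors the statement that the leader sequence is present in all viral mRNAs.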

Fig. 4.

Schematic representation of coronavirus subgenomic RNA synthesis strategies. (a) Internal initiation model. IBS—internal RdRp binding site. (b) Premature transcription termination model. Termination may occur both during (+)-chain and (–)-chain RNA synthesis. PTS—premature termination site for RdRp. (c) Discontinuous transcription model. Transcription may be interrupted both during (+)-chain and (–)-chain RNA synthesis [58]. LS–leader sequence; TRS—transcription regulatory sequence; L-TRS—leader TRS, and B-TRS—TRS in the coding region.

Virion Assembly and Budding

A distinctive feature of the coronavirus replication cycle is the site where new virions are assembled. Most enveloped viruses are assembled at the cell plasma membrane, whereas coronavirus virions are assembled in the intermediate compartment between the endoplasmic reticulum and the Golgi, which is referred to as the endoplasmic reticulum–Golgi intermediate compartment (ERGIC). It was shown that after translation the structural S, M, E, and N proteins accumulate in ERGIC [63]. Genomic (+)RNA binds to the N protein to form the nucleocapsid. M protein participates in the recognition of a special genome packaging signal (package sequence, PS) in the viral RNA [64]. The ability of N protein to recognize PS depends on the virus species: for example, the packaging of mouse hepatitis virus (MHV) genomic RNA occurs via an N-independent mechanism [65], whereas that of the SARS-CoV-1 genome is N-dependent [66]. The packaging signal is located within ORF1a/b, which ensures that only full-length RNAs are incorporated into newly assembled virions [67, 68]. The assembly center of new viral particles is the M protein, which interacts with the S, N, and E proteins and genomic (+)RNA [69]. After the M, S, and E proteins and the N protein–RNA complex appear in ERGIC, mature viral particles assemble and bud. The assembled virions accumulate in intracellular vesicles, which, in turn, are released into the extracellular space as a result of exocytosis or lysis of the infected cell [70].

EXPERIMENTAL SYSTEMS TO STUDY CORONAVIRUS REPLICATION

Success in the study of highly pathogenic viruses, which include some coronaviruses, and in the search for potential drugs for their therapy depends on the development of technologies that allow manipulation of viruses and, in particular, of their genomes. Traditionally, full-length viral systems, i.e., infectious virus particles, are used to study viral replication. The advantage of such systems is that they correspond as closely as possible to the conditions of a real infection. During infection, viral proteins come into contact with each other and with a multitude of cellular factors, forming complex networks of protein interactions that are recreated in these systems. However, the use of infectious viruses carries a high risk of infecting the operator, so only certified laboratories with appropriate equipment are allowed to work with them. Moreover, in the case of RNA viruses, it is quite difficult to obtain virus variants carrying mutations in the genes of interest when working with full-size virus systems. In the case of coronaviruses, manipulations with their genomes are also extremely difficult due to the large size of their genomic RNA, which, as mentioned above, ranges from 26 000 to 32 000 bases [13]. In this context, much attention is paid both to the modification of full-length viral systems and to the development of simpler and safer systems that allow the study of individual stages of the coronavirus life cycle and the search for their inhibitors. We will consider in detail the currently available variants of modified full-length viral systems that can be used to obtain viral RNA containing the necessary mutations, as well as the safer, although often less informative, pseudoviral systems and replicons.

MODIFIED FULL-SIZE VIRUS SYSTEMS

Two approaches are used to obtain modified coronavirus genomes: direct modification of genomic RNA (defective interfering RNA, DI-RNA) or production of cDNA containing the necessary modifications, which is then used for synthesis of genomic RNA and subsequent assembly of viral particles.

DI-RNA-Systems

The study of coronavirus replication mechanisms began with the use of viral DI-RNAs containing only parts of the viral genome: viral regulatory sequences required to maintain RNA replication and sequences encoding parts of viral proteins. Such RNAs are capable of replication, but exclusively in cells infected with the corresponding coronavirus, and in some cases they are also packaged into non-infectious virus-like particles (VLP) [67, 71, 72] (Fig. 5). In this way, new variants of viral particles carrying the necessary modifications can be obtained. DI-RNAs can also contain a reporter gene; if the DI-RNA is replicated and transcribed, reporter expression can be used to assess the efficiency of the viral enzymes. This is one of the first and most primitive systems, and a large body of data on the replication and packaging mechanisms of coronavirus genomes has been obtained with it. For example, the packaging signals of MHV [73-75], transmissible gastroenteritis virus (TGEV) [67], and SARS-CoV-2 [68] were determined using DI-RNA. M protein interaction with PS RNA has been demonstrated for MHV using this system [64]. Using SARS-CoV-2 as an example, it was shown that when virus-infected cells are transfected with defective RNA, the RNA begins to replicate actively and becomes incorporated into virions, reducing the production of infectious viral particles by approximately half 24 h after DI-RNA transfection. Based on these data, it was suggested that DI-RNA could be used as a therapeutic agent to reduce the viral load [68].

Fig. 5.

DI-RNA system functioning principle. In coronavirus-infected cells, DI-RNAs may be included in the virion (1), and/or replicated (2), and/or express reporter proteins (3) in the presence of viral proteins translated from genomic coronavirus RNA.

Infectious cDNA Inserted in Bacterial Artificial Chromosomes

The development of molecular cloning methods made it possible to assemble full-length viral cDNAs piece by piece. Full-length cDNA of a coronavirus genome was first obtained for TGEV by cloning into a bacterial artificial chromosome (BAC) [76]. Within the BAC, the viral cDNA was placed under the control of the cytomegalovirus (CMV) promoter, and at the 3′-end it was flanked by the poly(A) sequence, the hepatitis delta virus (HDV) ribozyme sequence, and the BGH transcription terminator (Fig. 6a). The HDV ribozyme catalyzes cleavage of the RNA chain between the last nucleotide of the coronavirus genome and the first nucleotide of the ribozyme, which makes it possible to obtain RNA with a 3'-end sequence identical to the viral 3'-end sequence [77]. This precise matching is necessary because the 3'UTR sequence of coronaviruses is crucial for successful replication and transcription of viral RNA [78]. After BAC transfection, full-genome viral RNA and proteins are synthesized in cells and assembled into infectious viral particles. Using BAC to work with the cDNA of coronavirus genomes has several advantages. First, it becomes possible to efficiently generate unlimited amounts of the required cDNA in Escherichia coli. Second, it is possible to transfect mammalian cells with this bacterial chromosome with high efficiency, which allows intracellular expression of viral RNA without resorting to more complicated techniques for its production, such as in vitro transcription. Third, the BAC sequence is relatively easy to modify using Red recombinase and the endonuclease I-SceI, which makes it possible to modify the virus genes under study [79].
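The order of elements in the BAC expression cassette, and the reason the HDV ribozyme is placed immediately after the poly(A) sequence, can be sketched as a simple data model. All sequences below are short invented placeholders, the element and function names are illustrative, and the model makes the simplifying assumption that the primary transcript ends with the ribozyme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    sequence: str      # placeholder sequence, invented for illustration
    transcribed: bool  # True if the element is part of the primary transcript


# Cassette order as described in the text: promoter, cDNA, poly(A), ribozyme,
# terminator. Promoter and terminator are treated as non-transcribed here.
CASSETTE = [
    Element("CMV promoter", "ggcgcg", False),
    Element("viral cDNA", "AUGGCUUAG", True),
    Element("poly(A)", "AAAAAAAA", True),
    Element("HDV ribozyme", "GGCCGGCAUGG", True),
    Element("BGH terminator", "cugugc", False),
]


def primary_transcript(cassette):
    """Concatenate every transcribed element in cassette order."""
    return "".join(e.sequence for e in cassette if e.transcribed)


def ribozyme_processed_rna(cassette, ribozyme_name="HDV ribozyme"):
    """Return the RNA left after self-cleavage just 5' of the ribozyme."""
    parts = []
    for element in cassette:
        if element.name == ribozyme_name:
            break
        if element.transcribed:
            parts.append(element.sequence)
    else:
        raise ValueError(f"no element named {ribozyme_name!r} in cassette")
    return "".join(parts)


print(primary_transcript(CASSETTE))
print(ribozyme_processed_rna(CASSETTE))
```

Because cleavage is modeled as occurring exactly at the ribozyme boundary, the processed RNA ends with the viral sequence and its poly(A) tract and carries no vector-derived nucleotides at its 3'-end.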

Fig. 6.

Schematic representation of full-size virus cDNA production using bacterial artificial chromosomes (a), TAR cloning technique (b), in vitro ligation technique (c), or recombinant poxviruses (d). CMV, cytomegalovirus promoter; pA, poly(A) tail; Rz, HDV ribozyme; MCS, polylinker.

Subsequently, the approach of cloning viral cDNA in BAC, in addition to TGEV, was applied to create infectious clones of the following human pathogens: HCoV-OC43 [80], MERS-CoV [81], SARS-CoV-1 [82], and SARS-CoV-2 [83] (Table 2).

Table 2.

Approaches used to obtain full-size cDNAs of coronaviruses capable of infecting humans

Virus	BAC	TAR-cloning	In vitro ligation	Poxvirus vectors
HCoV-OC43	+ [80]^a	‒	‒	‒
HCoV-229E	‒	+ [84]	‒	+ [85]
HCoV-NL63	‒	‒	+ [86]	‒
HCoV-HKU1	‒	+ [84]	‒	‒
MERS-CoV	+ [81]	+ [84]	+ [87]	‒
SARS-CoV-1	+ [82]	‒	+ [88]	+ [89]
SARS-CoV-2	+ [83]	+ [84]	+ [90]	‒

^a The reference is indicated in brackets.


Infectious cDNA Obtained Using TAR-Cloning

Transformation-associated recombination (TAR) is a cloning method that exploits the specific features of Saccharomyces cerevisiae cells, in which homologous recombination of overlapping DNA fragments occurs with high frequency (Fig. 6b) [91]. The use of yeast cells to obtain full-length coronavirus cDNA has several advantages over the bacterial system. First, long DNA sequences are generally less stable in bacteria than in yeast. Second, it has been shown that fragments of the ORF1a/b sequence can be toxic to bacteria [76]. Third, the high efficiency of TAR cloning makes it possible to introduce mutations into the viral genome quickly and easily: all that is required is to change one or more of the transformed fragments. This method has been used to assemble and modify the long genomes of the DNA viruses CMV (genome size ~236 kbp) [92, 93] and herpes simplex virus (genome size ~152 kbp) [94, 95]. As for the human pathogens in the Coronaviridae family, cDNAs of the MERS-CoV, HCoV-229E, HCoV-HKU1, and SARS-CoV-2 genomes were obtained by TAR-cloning (Table 2) [84]. Note that the SARS-CoV-2 cDNA was obtained by this method in just two months [84].
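The principle of assembling a genome from overlapping fragments, and of introducing a mutation by swapping a single fragment, can be sketched as follows. This is a toy string model that only mimics the end result of homologous recombination in yeast; the fragment sequences, the minimum overlap length, and the function names are assumptions made for illustration.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def assemble(fragments, min_overlap=4):
    """Join ordered fragments through their shared terminal overlaps."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        n = overlap_length(assembled, fragment, min_overlap)
        if n == 0:
            raise ValueError("adjacent fragments share no usable overlap")
        assembled += fragment[n:]  # write the overlapping region only once
    return assembled


# Toy fragments (invented) with 6-7 nt overlaps between neighbours.
F1 = "ATGGCTAGCTTGACC"
F2 = "TTGACCGGTAAGCTTG"
F3 = "AAGCTTGGATCCTAA"
F2_MUTANT = "TTGACCGATAAGCTTG"  # one substitution outside the overlaps

WILD_TYPE = assemble([F1, F2, F3])
MUTANT = assemble([F1, F2_MUTANT, F3])
DIFFERENCES = [i for i, (a, b) in enumerate(zip(WILD_TYPE, MUTANT)) if a != b]
print(WILD_TYPE)
print(MUTANT, DIFFERENCES)
```

Only the fragment that carries the change has to be replaced; the other fragments and their overlaps are reused unchanged, which is the practical advantage the text attributes to TAR cloning.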

Infectious cDNA Obtained by In Vitro Ligation

Another approach to obtain coronavirus genomes is to assemble a full-length viral cDNA from smaller fragments flanked by unique restriction sites. Sequences recognized by type II or type IIS restriction endonucleases can be used as such sites. In the latter case the restriction sites are lost after ligation, so the cDNA cannot be modified further at these positions, because the recognition site and the cleavage site of type IIS restriction endonucleases are spatially separated. In vitro ligation can use both restriction sites that already exist in the genome under study and artificially created ones. Artificial restriction sites are introduced using synonymous substitutions that do not change the amino acid sequence of viral proteins. Also, during cDNA preparation, the T7 promoter sequence required for RNA production in a cell-free system using T7 RNA polymerase is introduced at the start of the assembled genome [88, 96] (Fig. 6c). The resulting genomic RNA is delivered to the target cells by transfection or electroporation, followed by viral transcription and replication, as well as the assembly of infectious viral particles. The disadvantages of this method include the need, in some cases, to modify the native nucleotide sequence of the viral genome in order to introduce additional restriction sites or to remove T7 terminator sequences that occur by chance in the coronavirus genome. However, the use of type IIS restriction endonucleases, which recognize asymmetric DNA sequences and cut at a defined distance from the binding site, minimizes the need to introduce synonymous substitutions. An undoubted advantage of this approach is the ability to fragment the viral genome in such a way that the genes of proteins that are toxic when produced in bacteria are split between separate fragments.
Full-length cDNAs of human HCoV-NL63 [86], SARS-CoV-1 [88], MERS-CoV [87], and SARS-CoV-2 [90] coronaviruses have been obtained using in vitro ligation (Table 2).
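The directional assembly that underlies in vitro ligation can be sketched as follows. The model assumes, for illustration only, that each digested fragment carries a unique 4-nt overhang at each end and that two fragments ligate only when the right overhang of one matches the left overhang of the next; the sequences, the overhang length, and the function names are all hypothetical.

```python
def order_fragments(fragments, start_overhang):
    """Chain (left, body, right) fragments by matching overhangs."""
    by_left = {fragment[0]: fragment for fragment in fragments}
    if len(by_left) != len(fragments):
        raise ValueError("left overhangs must be unique for ordered assembly")
    ordered = []
    current = start_overhang
    while current in by_left:
        fragment = by_left.pop(current)
        ordered.append(fragment)
        current = fragment[2]  # the next fragment must start with this overhang
    if by_left:
        raise ValueError("some fragments could not be joined to the chain")
    return ordered


def ligate(ordered):
    """Join ordered fragments, writing each shared overhang only once."""
    sequence = ordered[0][0]
    for _left, body, right in ordered:
        sequence += body + right
    return sequence


# Toy fragments (invented), supplied in scrambled order.
FRAGMENTS = [
    ("CGTA", "TTGTAA", "GCAA"),
    ("TAGG", "ATGGCT", "ACCT"),
    ("ACCT", "GGATCC", "CGTA"),
]

ORDERED = order_fragments(FRAGMENTS, start_overhang="TAGG")
FULL_LENGTH = ligate(ORDERED)
print(FULL_LENGTH)
```

Because every junction is defined by its own overhang, the fragments can join in only one order and orientation, regardless of the order in which they are mixed.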

Infectious cDNA Obtained Using Poxvirus-Based Vectors

An alternative strategy for obtaining and modifying infectious cDNAs is the use of vectors based on vaccinia virus (Poxviridae family). A full-length coronavirus cDNA of about 30 000 bp can easily be accommodated within the vaccinia virus genome, which is extremely large (about 200 000 bp). In this approach, the first step involves in vitro ligation of coronavirus cDNA fragments to obtain full-length cDNA, which is incorporated into the poxvirus vector after a purification step (Fig. 6d). However, in some cases, it is possible to clone several large cDNA fragments into separate vectors [85]. For example, when SARS-CoV-1 cDNA was obtained from eight initial fragments, two poxviral vectors were produced that contained fragments 1–20288 and 20272–29727 of the SARS-CoV-1 genome; the first vector also included the T7 promoter sequence. This was followed by another in vitro ligation of DNA from the obtained recombinant viruses, and then by in vitro transcription and transfection of eukaryotic cells with the generated RNA to obtain fully infectious SARS-CoV-1 viral particles [89]. The introduction of an additional step using poxviral vectors, although it complicates the described in vitro ligation approach, still has several advantages. For example, the genomes of recombinant poxviruses are stable and replicate efficiently in cell culture, which makes it possible to overcome the problem of toxicity of some coronavirus sequences to bacteria [97]. It is also important that mutations can be introduced efficiently into the genome of recombinant poxviruses by homologous recombination, which opens up wide possibilities for modifying the viral genes under study. Coronavirus HCoV-229E [85] and SARS-CoV-1 [89] cDNAs have been obtained using recombinant poxviruses (Table 2).
However, this approach is rather complicated, and perhaps this can explain the absence of articles that use poxvirus-based vectors to obtain full-length SARS-CoV-2 cDNA.
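
The coordinates quoted above for the two SARS-CoV-1 fragments can be checked with a few lines of arithmetic. The Python sketch below computes the overlap between the two cloned fragments and the total span they cover. The names `Fragment` and `overlap_length` are hypothetical, and the coordinates are assumed to be 1-based and inclusive, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A cloned genome fragment; coordinates assumed 1-based and inclusive."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_length(a: Fragment, b: Fragment) -> int:
    """Number of positions shared by two fragments (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# Coordinates taken from the text; the variable names are illustrative only.
frag_a = Fragment(1, 20288)
frag_b = Fragment(20272, 29727)

overlap = overlap_length(frag_a, frag_b)
covered = frag_a.length + frag_b.length - overlap

print(f"overlap: {overlap} nt, covered span: {covered} nt")
```

Under this assumption the two fragments share 17 nucleotides and together span positions 1–29727.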

PSEUDOVIRUS SYSTEM

As an alternative to full-length viruses, so-called pseudoviruses are often used in research. Pseudoviruses, also referred to as virus-like particles (VLPs), are replication-incompetent versions of the original viruses. In general terms, a pseudoviral system consists of at least two DNA fragments. The first encodes a viral genome from which one or more genes for important structural proteins have been deleted and replaced with a reporter gene, which allows effects on viral replication to be evaluated in cells; the second encodes the missing segment of the genome. Co-expression of the two fragments in special packaging cells leads to the assembly of viral particles that can infect target cells and reproduce certain stages of the coronavirus life cycle but cannot produce new viral particles in transduced cells, because they carry a shortened genome. The most important advantage of pseudoviruses is their safety: they do not require special approvals, and the level of protection needed to work with them is lower. Pseudoviruses are used in both basic and applied research. VLP-based antiviral vaccines have already been developed (see review [98]). VLPs are also used as vectors for targeted drug delivery and gene therapy [99], as positive controls in various test systems for the diagnosis of viral infections [100], and as models for studying various stages of the virus life cycle under near-physiological conditions.
Many different pseudoviral systems have been developed for pathogens of the Coronaviridae family. Most studies using VLPs address various aspects of coronavirus structure, as well as the mechanisms of virion entry into the cell and release from it. Pioneering studies were performed on MHV [101]. It was shown that expression of only two viral proteins, M and E, is required for the assembly and efficient budding of pseudoviral particles. The S protein, although incorporated into the VLPs, was not necessary for particle production. Surprisingly, the N protein not only had no effect on the assembly of pseudoviral particles but also was not incorporated into them when co-expressed with the M and E proteins [65]. These unusual results are consistent with data showing that the MHV genomic RNA packaging signal interacts with the M rather than the N protein [64]. VLPs were also obtained for HCoV-NL63; the minimal set of structural proteins for pseudovirus formation in this case likewise comprised only the M and E proteins [102]. For SARS-CoV-1, the data on the minimal set of proteins required for pseudovirus formation are contradictory. According to one study, expression of two proteins, M and N, is required for the formation of pseudovirus particles in the cell, but efficient budding occurred only when three proteins, S, M, and N, were co-expressed. The E protein, although incorporated into VLPs when co-expressed with the M and N proteins, was not necessary for pseudovirus formation [103]. According to another study, SARS-CoV-1-based pseudoviruses require expression of the M, E, and N proteins for efficient assembly and budding [104]. This discrepancy is probably due to the different packaging cells used: HEK-293 and Vero E6, respectively. For SARS-CoV-2, the M, S, and E proteins are required for VLP assembly in HEK-293T cells [105]. Interestingly, differences in VLP assembly between HEK-293T and Vero E6 cells were also observed for SARS-CoV-2: when the S, E, M, and N proteins were co-expressed, the S protein was incorporated into VLPs more efficiently in Vero E6 cells than in HEK-293T cells [106].
Fewer studies are devoted to the production of VLPs for investigating the amplification and transcription of coronavirus RNA. For example, a special cellular system was developed for SARS-CoV-2 that allows many fundamental aspects of the biology of the virus to be studied [107]. In this work, the N protein gene in the SARS-CoV-2 genome was replaced with a green fluorescent protein (GFP) gene, and the N protein was expressed ectopically in the packaging cells to assemble VLPs. As a result, VLPs cannot be generated in cells that do not express the N protein. Culturing the N protein-expressing packaging cells for one month did not give rise to recombination events that restored the full-length SARS-CoV-2 genome or led to infectious virus production, which supports the safety of such systems. Nevertheless, further measures were proposed to improve the safety of this pseudoviral system: the N protein gene was split into two parts that were inserted into different vectors, and the complete N protein was formed by intein-mediated protein splicing [107]. The resulting pseudoviral particles can be used for a wide range of tasks: fundamental studies of SARS-CoV-2 biology; screening of potential inhibitors of virus entry, transcription, and replication; vaccine development; and detection of neutralizing antibodies in patients. This system has already yielded new information about SARS-CoV-2. For example, the N protein was found to interact with the cellular stress granule components G3BP1 and G3BP2, and potential inhibitors of virus replication were identified [107].
Pseudoviruses based on members of the Coronaviridae family also have important practical applications. A large number of studies are devoted to the development of VLP-based vaccines [102, 106–108]. A VLP vector capable of transducing human dendritic cells was created based on the neurotropic coronavirus HCoV-229E [109]. The RNA in this pseudovirus encodes the complete ORF1a/b gene as well as three reporter genes, which can subsequently be replaced with target genes. Through discontinuous transcription, all three genes can be expressed in transduced cells. The distinctive feature of this vector is that several reporter genes with a total length of about 6000 bases fit into a single pseudovirus. However, the system has a significant drawback: in addition to the target genes, the entire pp1a/ab polyprotein is expressed, some fragments of which are cytotoxic [110].
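
The minimal protein sets reported above can be collected in a small lookup structure to make the comparison between viruses explicit. The following Python sketch is only a bookkeeping aid. The record layout and the function names are hypothetical, and for reference [103] it records the set reported for particle formation (M and N) rather than the set reported for efficient budding (S, M, and N).

```python
# Protein sets reported as required for VLP assembly, transcribed from the text.
# The record layout is illustrative; reference numbers follow the review.
REPORTED_SETS = [
    {"virus": "MHV", "proteins": {"M", "E"}, "ref": 101},
    {"virus": "HCoV-NL63", "proteins": {"M", "E"}, "ref": 102},
    # [103]: set for particle formation; efficient budding needed S, M, and N.
    {"virus": "SARS-CoV-1", "proteins": {"M", "N"}, "ref": 103},
    {"virus": "SARS-CoV-1", "proteins": {"M", "E", "N"}, "ref": 104},
    {"virus": "SARS-CoV-2", "proteins": {"M", "S", "E"}, "ref": 105},
]


def common_proteins(reports):
    """Proteins that appear in every reported set."""
    sets = [r["proteins"] for r in reports]
    return set.intersection(*sets) if sets else set()


def reports_disagree(reports, virus):
    """True if more than one distinct protein set is reported for a virus."""
    distinct = {frozenset(r["proteins"]) for r in reports if r["virus"] == virus}
    return len(distinct) > 1


shared = common_proteins(REPORTED_SETS)
conflicting = sorted(
    {r["virus"] for r in REPORTED_SETS if reports_disagree(REPORTED_SETS, r["virus"])}
)
print(shared, conflicting)
```

With these entries, M is the only protein common to every reported set, and SARS-CoV-1 is the only virus for which the reports conflict.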

REPLICONS

Replicons are self-sustaining RNAs containing all the regulatory elements of the viral genome. In the presence of the viral nonstructural proteins that ensure viral replication and transcription, new RNA molecules are synthesized from the replicons [111]. The required nonstructural proteins can be encoded in the replicon itself or expressed in the cell from additional vectors. Replicons do not encode the full set of structural proteins and do not produce infectious viral particles, so they represent a safe alternative to full-length viral systems. Reporter genes are often encoded within the replicons, which makes it relatively easy to assess the efficiency of RNA synthesis under the conditions of interest. Pseudoviruses are also well suited for studying individual stages of the coronavirus life cycle, including genome replication, transcription, and translation. Nevertheless, replicons are more widely used in this kind of research because they are much easier to handle and safer than VLPs, although they are further from the conditions of a real infection. In general, all coronavirus replicons are arranged similarly: their 5′- and 3′-terminal sequences coincide completely with the 5′- and 3′-UTRs of coronaviruses, respectively. For studies of discontinuous transcription, the sequence that corresponds to the intergenic region of the coronavirus under study and regulates RdRp complex skipping is placed inside the replicon [112–118]. As can be seen from Fig. 7, in the vast majority of cases the replicons contain the large ORF1a and ORF1b reading frames, which encode the nonstructural proteins involved in viral RNA replication, as well as the N protein gene, which is required for efficient viral RNA synthesis [119, 120]. Replicons often carry reporter protein genes in place of the genes for various structural proteins of the virus.
Such reporters are under the control of viral regulatory TRS sequences taken from the genes of the corresponding structural proteins: TRS-S (from the S protein gene), TRS-N (from the N protein gene), and TRS-M (from the M protein gene). In the case of minimal replicons (Fig. 7h), which do not encode pp1a/ab or the N protein, the viral proteins that carry out replication and transcription must be expressed in the cells under study from additional vectors.
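
The general replicon layout described above can be written down as a simple data structure. The Python sketch below is a toy model, not a description of any specific published construct. The `Replicon` class, its methods, and the two example element orders are hypothetical, and the helper-vector rule is a simplification that checks only for the presence of ORF1a/b.

```python
from dataclasses import dataclass, field


@dataclass
class Replicon:
    """Toy model of a replicon as an ordered list of element names."""
    name: str
    elements: list = field(default_factory=list)

    def has_regulatory_elements(self) -> bool:
        # The text states that the 5'UTR, the 3'UTR and a TRS are always present.
        has_trs = any(e.startswith("TRS") for e in self.elements)
        return (
            bool(self.elements)
            and self.elements[0] == "5'UTR"
            and self.elements[-1] == "3'UTR"
            and has_trs
        )

    def needs_helper_vectors(self) -> bool:
        # Simplification: replication proteins must be supplied from additional
        # vectors whenever the replicon itself lacks ORF1a/b.
        return "ORF1a/b" not in self.elements


# Hypothetical layouts for illustration only.
full = Replicon(
    "full-length example",
    ["5'UTR", "ORF1a/b", "TRS-S", "GFP", "TRS-N", "N", "3'UTR"],
)
minimal = Replicon("minimal example", ["5'UTR", "TRS-N", "LUC", "3'UTR"])

for r in (full, minimal):
    print(r.name, r.has_regulatory_elements(), r.needs_helper_vectors())
```

In this model both layouts carry the regulatory elements, but only the minimal one requires the replication proteins to be supplied from additional vectors.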

Fig. 7.

Key structural elements of coronavirus replicons. (a) Schematic structure of the SARS-CoV-2 genome (provided as a reference). (b–h) Structures of several different replicons [112–118]; the regulatory elements (5′UTR, 3′UTR, TRS) are always encoded in a replicon, whereas pp1a/ab components are absent only in minimal replicons (h) [118]. The coding part (body) of a replicon may be placed under the control of either the T7 promoter or a cellular promoter. In the first case, it is possible either to carry out in vitro transcription and use the resulting RNA for cell transfection or to transfect cells with the DNA construct while simultaneously expressing T7 RNA polymerase [112–118]. Encoded proteins are indicated with letters across the diagram. TRSs are indicated with red lines. GFP, green fluorescent protein; LUC, luciferase; BSD, blasticidin-resistance protein; NEO, neomycin-resistance protein; IRES, internal ribosome entry site.

Replicons are frequently used for high-throughput screening of inhibitors of viral replication and transcription [81, 112, 113, 115, 117, 118, 121–125]. SARS-CoV-1-based replicons have been used to test inhibitors of the viral 3CLpro and PLpro proteases and of the viral helicase, as well as inhibitors of reading frame shifting during translation [126–128]. In some studies, stable replicon-containing cell lines were obtained by incorporating into the replicon, in addition to the reporter genes, an antibiotic resistance gene, most commonly for blasticidin or zeocin resistance [115, 123]. Replicons are convenient for assessing the effect of the expression of various proteins on viral genome replication and transcription. They make it possible to show reliably that a given effect on the viral life cycle acts on replication or transcription rather than on virion assembly and budding. For example, using replicons, Wang et al. [117] showed that expression of NSP16, but not of NSP1 or NSP2, is required for efficient SARS-CoV-1 RNA replication. More recently, Luo et al. [118] found that the replication efficiency of SARS-CoV-2 RNA increased significantly in the presence of the ORF6 expression product.

CONCLUSIONS

In the present review, we attempted to summarize the data on the genetically engineered systems developed to date for studying human pathogens of the Coronaviridae family. For each approach, we provided examples of possible uses and outlined its advantages and disadvantages. We do not claim to have covered all the systems used to study coronaviruses or all the results obtained with them: coronavirus research, and SARS-CoV-2 research above all, is being actively carried out worldwide, and new data on the mechanisms of pathogenesis of the new coronavirus infection, on SARS-CoV-2 replication inhibitors, and on drugs and vaccines under development appear every day. We hope that the data presented in this review will be of interest to those involved in coronavirus research and will guide them in choosing the right system for their specific scientific tasks.

127 in total

Review 1. Continuous and Discontinuous RNA Synthesis in Coronaviruses.

Authors: Isabel Sola; Fernando Almazán; Sonia Zúñiga; Luis Enjuanes
Journal: Annu Rev Virol Date: 2015-11 Impact factor: 10.431

2. Analyses of Coronavirus Assembly Interactions with Interspecies Membrane and Nucleocapsid Protein Chimeras.

Authors: Lili Kuo; Kelley R Hurst-Hess; Cheri A Koetzner; Paul S Masters
Journal: J Virol Date: 2016-04-14 Impact factor: 5.103

3. Sequence motifs involved in the regulation of discontinuous coronavirus subgenomic RNA synthesis.

Authors: Sonia Zúñiga; Isabel Sola; Sara Alonso; Luis Enjuanes
Journal: J Virol Date: 2004-01 Impact factor: 5.103

4. The SARS-coronavirus nsp7+nsp8 complex is a unique multimeric RNA polymerase capable of both de novo initiation and primer extension.

Authors: Aartjan J W te Velthuis; Sjoerd H E van den Worm; Eric J Snijder
Journal: Nucleic Acids Res Date: 2011-10-29 Impact factor: 16.971