Quadruplex structures have been identified in a plethora of organisms, where they play important roles in the regulation of molecular processes, and hence have been proposed as therapeutic targets for many diseases. In this paper we report an extensive bioinformatic analysis of the SARS-CoV-2 genome and related viruses using an upgraded version of the open-source algorithm G4-iM Grinder. This version improves the functionality of the software, including an easy way to determine the potential biological features affected by the candidates found. The quadruplex definitions of the algorithm were optimized for SARS-CoV-2. Using a lax quadruplex definition ruleset, which accepts, amongst other parameters, two-residue G- and C-tracks, 512 potential quadruplex candidates were discovered. These sequences were evaluated by their in vitro formation probability, their position in the viral RNA, their uniqueness and their conservation rates (calculated across more than seventeen thousand COVID-19 clinical cases sequenced at different times and locations during the ongoing pandemic). These results were then compared to other Coronaviridae members, other Group IV (+)ssRNA viruses and the entire viral realm. Sequences found in common with other viral species were further analyzed and characterized. High-scoring sequences unique to SARS-CoV-2 were studied to investigate the variations amongst similar species. Quadruplex formation of the best candidates was then confirmed experimentally. Using NMR and CD spectroscopy, we found several highly stable RNA quadruplexes that may be suitable therapeutic targets for SARS-CoV-2.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense single-stranded RNA virus from the Betacoronavirus genus, within the Coronaviridae family of the Nidovirales order. Although it is believed to have originated from a bat-borne coronavirus [1-5], SARS-CoV-2 can spread between humans with no need of other vectors or reservoirs for its transmission. The virus is responsible for the ongoing COVID-19 pandemic that has caused hundreds of thousands of deaths, millions of infected and a disastrous strain on the economy of most countries and citizens worldwide.
The origin of the virus has been traced back to the Chinese city of Wuhan, where the first cases of infected individuals were reported amongst the workers of the Huanan Seafood Market [6, 7]. This wet exotic animal market, where wild animals including bats and pangolins are sold and prepared for consumption, offers ample opportunities for pathogenic bacteria and viruses to adapt and thrive [8, 9]. Such circumstances led Cheng and colleagues to predict the current pandemic back in 2007 [10]. In their own words: “the presence of a large reservoir of SARS-CoV-like viruses in horseshoe bats, together with the culture of eating exotic mammals in southern China, is a time bomb. The possibility of the re-emergence of SARS and other novel viruses from animals or laboratories and therefore the need for preparedness should not be ignored”.
SARS-CoV-2 has now become a global problem. In this current scenario, the scientific community is playing a fundamental role in minimizing the number of victims. Their work includes, to name a few, the development of fast and reliable detection methods, the identification of therapeutic targets within the virus, and the development of active drugs and vaccines to cure and to prevent infections, respectively.
G-Quadruplexes (G4s) and i-Motifs (iMs) have been proposed as therapeutic targets in many disease aetiologies.
G4s are Guanine (G)-rich DNA or RNA nucleic acid sequences where successive Gs stack in a planar fashion via Hoogsteen bonds to form four-stranded structures, stabilized by monovalent cations [11]. iMs, on the contrary, are Cytosine (C)-rich regions that fold into tetrameric structures of stranded duplexes [12-14]. These are sustained by hydrogen bonds between the intercalated nucleotide base pairs when under acidic physiological conditions.
The importance of these genomic secondary structures has been abundantly studied in recent years [15-20]. They have been found to be regulatory elements in the human genome implicated in key functions such as telomere maintenance and genome transcription regulation, replication and repair [21]. G4 structures have also been identified in fungi [22-25], bacteria [26-30] and parasites [31-36]. Their occurrence is also known in many viruses that infect humans. These include the HIV-1 [37-39], Epstein-Barr [40, 41], human and manatee papilloma [42, 43], herpes simplex 1 [44, 45], Hepatitis B [46], Ebola [47] and Zika [48] viruses. Here they can regulate viral replication, recombination and virulence [32, 49, 50].
iMs have been less studied in general, especially outside of the human context. With regard to viruses, Ruggiero et al. recently reported the formation of an iM in HIV-1 [51], whilst we reported the presence of the known cMyb.S [52] iM within the Epstein-Barr virus [53]. Despite the lack of reports, iMs are interesting potential therapeutic targets for viruses. For example, the in silico analysis of the rubella virus revealed an extremely dense genome of potential iMs (density as counts per genomic length) that surpassed its human counterpart by over an order of magnitude [53].
In the same study, other viruses such as the measles and hepacivirus C presented potential iM densities similar to the human genome.
In this work, we wished to contribute to the ongoing research efforts related to the COVID-19 pandemic by investigating SARS-CoV-2 for the presence of quadruplex structures. With this aim, we analysed the prevalence, distribution and relationships of Potential G4 Sequences (PQS) and Potential iM Sequences (PiMS) in its genome. These PQS and PiMS have been evaluated according to their potential to form quadruplex structures in vitro and their localization within the genome. The presence of confirmed quadruplex-forming sequences and the candidates’ frequency, uniqueness and conservation rates between 17312 different SARS-CoV-2 clinical cases were also analyzed. The study of SARS-CoV-2 and its quadruplex results was expanded to integrate the Coronaviridae family, Group IV of the Baltimore classification and the entire virus realm, so as to allow a wider range of interpretation. With all this information at hand, our final objective was to identify biologically important PQS and PiMS candidates in the virus. To substantiate our bioinformatic analysis, we analysed some of these sequences experimentally by CD and NMR spectroscopy. Our in vitro results confirmed the formation of stable quadruplexes that can form in the viral genome, suggesting that they may be suitable targets for new therapeutic or diagnostic agents [50, 54]. Hence, our analysis of SARS-CoV-2, and by extension of the entire virus realm, may provide useful insights into using quadruplex structures as targets in future anti-viral treatments.
Materials and methods
G4-iM Grinder and G4-iM Grinder’s parameter configuration
In this work, we have used an upgraded version of the G4-iM Grinder package (GiG, https://github.com/EfresBR/G4iMGrinder) for the analysis of all viruses (S1 File, section 1). GiG is an R-based algorithm that locates, quantifies and qualifies PQS, PiMS and their potential higher-order versions in RNA and DNA genomes [53]. We retrieved the SARS-CoV-2 reference sequence (GCF_009858895.2) from the NCBI database [55]. For comparison, we also downloaded those of 18 other viruses that can cause mortal illness in humans, including six other pathogenic coronaviruses (S1 File, section 2).
As a workflow, we applied the functions GiG.Seq.Analysis (to study their G- and C-run characteristics), G4iMGrinder (to locate quadruplex candidates) and GiGList.Analysis (to compare quadruplex results between genomes) from the GiG package to all the viruses. The ‘size-restricted overlapping search and frequency count’ method (Method 2, M2A and M2B) was used to locate all the candidates. Then, these PQS and PiMS were evaluated by the presence within them of in vitro confirmed G4 or iM sequences, their frequency of appearance in the corresponding genome, and their probability of quadruplex-formation score (as the mean of G4Hunter [56] and the adaptation of the PQSfinder algorithm [57]). To compare between virus species, we calculated the density of potential quadruplex sequences per 100000 nucleotides ().
We previously saw that viruses have a wider range of PQS and PiMS densities than the human, fungi, bacteria and parasite genomes [53]. Some were totally void whilst others were very rich in candidates. So, we explored different quadruplex definitions to determine the most useful configurations for the analysis of the viruses at hand. These different definitions control the characteristics of what the algorithm considers a quadruplex.
They include the acceptable size of G- or C-repetitions to be considered a run, the acceptable number of bulges within these runs, the acceptable loop sizes between runs, the acceptable number of runs to constitute a PQS or PiMS, and the total acceptable length of the sequence (Fig 1A). A flexible configuration of quadruplex definitions will detect more candidates at the expense of requiring more computing power and accepting sequences that are more ambiguous in forming quadruplex structures in vitro (as determined by their score; with longer loops, smaller runs, more bulges and more complementary G/C %, ). More constrained definitions result in the opposite. Hence, for the analysis, we chose three different configurations: a Lax configuration (which accepts run bulges and longer ranges of runs, loops and total sizes), the Predefined configuration of the package (which restricts sizes but still accepts run bulges), and the original Folding Rule [58, 59] (which restricts length and does not accept run bulges) ( Left).
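To make the role of these definitions concrete, the following is a minimal Python sketch of a run/loop-based candidate search. It is not GiG's implementation (GiG is an R package with its own search methods); the function name `find_candidates`, its parameter names and its default values are hypothetical, and bulges within runs are deliberately not modelled. It only illustrates how run size, loop size, number of runs and total length jointly decide what counts as a candidate.

```python
import re

def find_candidates(seq, base="G", min_run=2, max_run=5,
                    min_loop=1, max_loop=10, n_runs=4, max_len=33):
    """Return (start, sequence) pairs for run-loop-run... candidates.

    Illustrative assumptions: one candidate per start position,
    overlapping starts allowed, no bulges inside runs.
    """
    run = f"{base}{{{min_run},{max_run}}}"          # e.g. G{2,5}
    loop = f"[ACGTU]{{{min_loop},{max_loop}}}?"     # lazy: prefer short loops
    body = f"{run}(?:{loop}{run}){{{n_runs - 1}}}"  # run (loop run) x (n_runs-1)
    pattern = re.compile(f"(?=({body}))")           # lookahead allows overlaps
    hits = []
    for match in pattern.finditer(seq.upper()):
        candidate = match.group(1)
        if len(candidate) <= max_len:               # total-length restriction
            hits.append((match.start(), candidate))
    return hits

# A two-residue-run sequence is only found when runs of two are accepted.
lax_hits = find_candidates("GGAGGAGGAGG", min_run=2)
strict_hits = find_candidates("GGAGGAGGAGG", min_run=3)
```

In this toy case the lax call returns one candidate and the stricter call returns none, which mirrors the trade-off described above: looser definitions find more candidates, including ones that are less certain to fold.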
Fig 1
A. Results with G4-iM Grinder depend on the quadruplex definitions introduced to the algorithm. Sizes of G- or C-runs, loops and the entire sequence, together with an acceptable number of bulges within the runs are part of the definitions. B. The structures found with GiG under the definitions proposed by the user can be evaluated for their in vitro probability of formation. More positive scores mean that the sequence is more capable of forming G4s, whilst more negative values mean that it is more capable of forming iMs. C. Left, Quadruplex definitions used by GiG’s search engine in this work. C. Right, Total results found within the SARS-CoV-2 by configuration and score criteria. D. PQS and PiMS densities (per 100000 nucleotides) found per different configuration and score criteria for 19 viruses. The G and C content (as a percentage) is shown under each virus. X scale is in logarithmic scale (base 10). Results are categorized by their |score|: intense colours (blue for PQS, yellow for PiMS) are the most probable to form in vitro (|score| ≥ 40), lighter bars are the density of structures with at least a |score| ≥ 20 and grey bars are the densities without the score filter.
Then, we calculated the PQS and PiMS densities of each virus to allow a direct size-independent comparison between them all (), and filtered the results by their in vitro probability of formation score. The |score| filters were set to 20 and 40 to allow us to study both the medium (PQS score ≥ 20; PiMS score ≤ -20) and the high probability candidates (PQS score ≥ 40; PiMS score ≤ -40; ) within the results. These score filters are important because they qualify the sequences and grant specificity to the results of GiG’s extremely flexible search engine (which was designed solely for sensitivity), as highlighted by the results of a recent review [60].
For the viruses analysed, the best configuration to obtain a significant number of candidates was the Lax set-up.
This was also relevant for the reference genome of SARS-CoV-2 ( Right). Given the small size of the viral genomes, the increase in computational cost was deemed acceptable and hence, we established this Lax configuration as the default configuration for all subsequent searches with GiG. Although some authors have reported the unfeasibility of forming iMs with tracks of only two Cs [19], this claim was later rebutted [61], allowing the use of this configuration for potential iMs as well.
The search was then expanded to 17312 different SARS-CoV-2 genomes sequenced during the pandemic (from December 2019 to January 2021, by different laboratories worldwide and downloaded from the GISAID database [62]), other Coronaviridae family members and the entire virus realm (6678 other viruses) using the methodologies described previously and in the S1 File, section 1. To validate the in silico findings, the most interesting candidates were selected and confirmed by NMR and CD spectroscopy.
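As an illustration of how a signed formation score and the |score| filters interact, the sketch below implements the basic G4Hunter scoring idea (each G in a run of n Gs contributes min(n, 4), each C contributes the negative equivalent, and the score is the mean over the sequence) together with the 20/40 thresholds used in this work. This is an assumption-laden sketch: the score used here by GiG is the mean of G4Hunter and an adapted PQSfinder on GiG's own scale, so the raw value returned by the hypothetical `g4hunter_raw` is not directly comparable to the thresholds, and `classify` expects a GiG-scale score.

```python
from itertools import groupby

def g4hunter_raw(seq):
    """Mean per-nucleotide G4Hunter-style score (positive: G-rich, negative: C-rich)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0
    for base, group in groupby(seq):
        n = len(list(group))
        if base == "G":
            total += n * min(n, 4)   # every G in the run scores min(n, 4)
        elif base == "C":
            total -= n * min(n, 4)   # C-runs score symmetrically negative
    return total / len(seq)

def classify(score):
    """Label a GiG-scale score with the filters used in the text (|score| >= 20, >= 40)."""
    kind = "PQS" if score > 0 else "PiMS" if score < 0 else "none"
    level = "high" if abs(score) >= 40 else "medium" if abs(score) >= 20 else "low"
    return kind, level
```

The sign convention matches the one described for Fig 1B: more positive values point towards G4 formation and more negative values towards iM formation.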
In silico methodology
To analyse these genomes, we employed the workflow described in the parameter configuration section above, using the Lax parameter configuration. We investigated the biological features potentially affected by candidates using the function GiG.df.GenomicFeatures of the GiG package. The conservation of each PQS and PiMS found in the reference genome was calculated as $\mathrm{Conservation}\ (\%) = 100 \times \sum N_{g+} / \sum N_{g}$, where $N_{g}$ is the number of genomes and $N_{g+}$ is the number of genomes with the PQS or PiMS candidate. The genomic pairwise alignments, used to study the similarity between viruses and detect PQS and PiMS variations between species, were done using the pairwiseAlignment function (global alignment type) from the Biostrings package in the Bioconductor repository. We calculated the divergence from the reference genome per clade (or lineage) as, {} where is the clade/lineage’s mean number of PQS or PiMS that |score| at least 20, is the number of PQS or PiMS that |score| at least 20 in the reference genome, and is the mean number variants of PQS or PiMS that |score| at least 20 per Lineage. To compare potential quadruplex presence and prevalence between genomic groupings (species, families, groups and the entire virus realm), we also calculated the genomic density of several arguments. These were calculated using the GiGList.Analysis function of the GiG package (density per 100000 nucleotides). The arguments were the density of results (PQS and PiMS), density of results with |score| filters (with at least 20 or 40), density of already confirmed sequences that form G4 or iM within, and uniqueness (as $\mathrm{Uniqueness}\ (\%) = 100 \times \sum N_{s,f=1} / \sum N_{s}$, where $N_{s}$ is the number of sequences and $N_{s,f=1}$ is the number of sequences with a frequency of appearance of 1 in its respective genome). For the G- and C-runs density analysis of the viruses, we used the function GiG.Seq.Analysis from the GiG package.
The arguments here were: densities of runs with different sizes (two or three to five long G- or C-runs) and with different bulges per run (zero and/or one). All of these results can be found in the S1 File, section 5.
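The density, conservation and uniqueness measures defined in this section reduce to simple ratios. The Python sketch below shows one way to compute them; it is not the GiGList.Analysis code, the function names are hypothetical, and two assumptions are made explicit in the comments: a candidate counts as conserved in a genome when its exact sequence occurs there, and uniqueness is computed over distinct candidate sequences.

```python
from collections import Counter

def density_per_100k(n_candidates, genome_length):
    """Candidates per 100000 nucleotides, for size-independent comparison."""
    return 100000 * n_candidates / genome_length

def conservation_pct(candidate, genomes):
    """Percentage of genomes containing the candidate.

    Assumption: presence means an exact substring match.
    """
    hits = sum(1 for genome in genomes if candidate in genome)
    return 100 * hits / len(genomes)

def uniqueness_pct(candidates):
    """Percentage of distinct candidate sequences that appear exactly once.

    Assumption: `candidates` lists every occurrence found in one genome,
    so repeated sequences appear several times in the list.
    """
    frequency = Counter(candidates)
    unique = sum(1 for count in frequency.values() if count == 1)
    return 100 * unique / len(frequency)
```

For instance, `conservation_pct("GGAGG", genomes)` over a list of genome strings returns the share of those genomes in which that exact candidate is still present.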
Candidate selection
PQS and PiMS candidates were selected according to their potential to form quadruplex structures in vitro, uniqueness, frequency of appearance, conservation between 17312 different SARS-CoV-2 clinical case genomes, confirmed quadruplex presence and localization within the genome.
NMR experiments
Oligonucleotides (0.3 mM) for NMR experiments were purchased from IDT, and suspended in 200 μl of H2O/D2O 9:1 in 25 mM KH2PO4 and 25 mM KCl buffer, pH 7. Samples at acidic pH were prepared by adding aliquots of concentrated HCl. Spectra were acquired on Bruker Avance spectrometers operating at 600 MHz, and processed with Topspin software. Experiments were carried out at temperatures ranging from 5.1 to 45°C and pH from 5 to 7. NOESY spectra in H2O were acquired with a 150 ms mixing time. Water suppression was achieved by including a WATERGATE module in the pulse sequence prior to acquisition.
Circular Dichroism (CD)
Circular dichroism (CD) studies were performed on a JASCO J-810 spectropolarimeter using a 1 mm path length cuvette. Spectra were recorded in a 320–220 nm range at a scan rate of 100 nm min−1 and a response time of 4.0 s with four acquisitions recorded for each spectrum. Data were smoothed using the means-movement function within the JASCO graphing software. Melting transitions were recorded by monitoring the decrease of the CD signal at 264 nm. Heating rates were 30°C/h. Transitions were evaluated using a nonlinear least squares fit assuming a two-state model with sloping pre- and post-transitional baselines. Oligonucleotide solutions for CD measurements were prepared under the same buffer conditions as the NMR experiments. The oligonucleotide concentration was 50 μM.
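A two-state model with sloping baselines can be written down compactly. The sketch below gives one common van 't Hoff parameterisation in Python; it is an assumption about the functional form, not the fitting code used in this work, and the parameter names (`tm`, `dh`, and the baseline intercepts and slopes) are hypothetical. Temperatures are in kelvin, so values in °C need 273.15 added first. The fit itself is not shown; in practice these parameters would be passed to a nonlinear least squares routine.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)

def fraction_folded(temp, tm, dh):
    """Folded fraction for a two-state transition.

    temp, tm: temperature and melting temperature in kelvin.
    dh: unfolding enthalpy in J/mol (positive), assumed temperature independent.
    """
    k_unfold = math.exp(-dh / GAS_CONSTANT * (1.0 / temp - 1.0 / tm))
    return 1.0 / (1.0 + k_unfold)

def cd_signal(temp, tm, dh, folded_a, folded_b, unfolded_a, unfolded_b):
    """CD signal as a population-weighted mix of two linear baselines."""
    theta = fraction_folded(temp, tm, dh)
    folded = folded_a + folded_b * temp        # pre-transitional baseline
    unfolded = unfolded_a + unfolded_b * temp  # post-transitional baseline
    return theta * folded + (1.0 - theta) * unfolded
```

By construction the folded fraction equals 0.5 at the melting temperature, so the modelled signal there lies midway between the two baselines.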
Results and discussion
A detailed analysis of the results of SARS-CoV-2, Coronaviridae family and the entire virus realm with G4-iM Grinder can be found in the S1 File, Section 3.
G4-iM Grinder and settings
The genome of SARS-CoV-2, and those of many other viruses, were analysed with G4-iM Grinder in search of potential quadruplex (both G4 and iM) therapeutic targets. To do so, we first expanded G4-iM Grinder’s quadruplex identification and characterization repertoire with two new functions, GiG.Seq.Analysis and GiG.df.GenomicFeatures. Other functions such as G4iMGrinder and GiGList.Analysis were upgraded to better analyse and summarise the quadruplex results obtained. Furthermore, over 2800 quadruplex-related sequences were searched for in the literature and included in G4-iM Grinder’s database to rapidly identify confirmed G4s and iMs within all results.
An initial study of the SARS-CoV-2 genome and 18 other pathogenic viruses revealed the special characteristics that need to be considered for quadruplex-related examinations in these organisms. For most, the original folding rule (which accepts no bulges within the runs and is very constrained in its quadruplex definitions) and the predefined parameters of G4-iM Grinder (which allow more liberty by accepting bulges and longer loops) are too strict to find associated runs that can give rise to quadruplexes. Although other organisms such as Plasmodium falciparum or Entamoeba histolytica may be poorer in G and C content [53], the size of these genomes enables finding rich G- or C-tracks that can ultimately form potential quadruplexes. In most viruses, however, this does not take place because of the small size of the genomes (in the range of tens to hundreds of thousands of nucleotides versus the tens of millions for the parasites mentioned, and thousands of millions for humans). Furthermore, most of the G4s found in viruses are complex sequences, with short runs and bulges (for example, HIV-1 [37, 39] and Ebola [47]), which elude detection when following traditional quadruplex definitions.
To overcome these problems, we took advantage of the great adaptability of G4-iM Grinder, and developed, tested and successfully employed a lax quadruplex definition configuration for the analysis. With these settings, the number of candidates found increased greatly and included the complex sequences expected in viruses, at the expense of needing more computational power.
SARS-CoV-2
With all these updates and configurations at hand, we focused on the reference SARS-CoV-2 genome and located 323 PQS and 189 PiMS, all unique (occurring only once in the genome) and dispersed unevenly along it (). 20% of these candidates had at least a medium probability of formation (|score| ≥ 20), and 7 PQS and 10 PiMS had a |score| ≥ 30 (). Candidates with at least a medium probability of formation concentrate in the N, S and especially in the orf1ab gene (in the nsp 1 and 3 regions for PQS and in the nsp 3, 4 and 12 regions for PiMS). The orf3a, orf8 and UTR regions also presented these candidates. Other genes, such as orf7a, orf7b and orf10, were found totally void of them.
We calculated the SARS-CoV-2 candidates’ quadruplex conservation rates and quadruplex-related region variability under three different scopes.
First, attention was focused exclusively on the virus in an intra-species analysis comprising 17312 genomes of SARS-CoV-2 sequenced at different places and times of the pandemic. Here, we found that the least conserved candidates were located in the 5’UTR, orf1ab and N regions with conservation as low as 9.8%. On the other hand, most of the sequences analysed that |scored| ≥ 20 presented conservation rates of over 99% (46/71 PQS and 21/35 PiMS). Of these, only 18 PQS and 7 PiMS had rates that surpassed the mean sequence identity percentage between the 17312 SARS-CoV-2 genomes and the reference genome (99.83%). To further investigate these differences, we first identified the 5429 new PQS and 3298 new PiMS variants that |scored| ≥ 20 amongst all the SARS-CoV-2 genomes and then associated them with the versions found in the reference genome. In this manner, we identified for one of the highest-scoring PQS found in the N gene (entry 7, and entry 1, ) a variant with the same probability of formation (entry 3, ), which is exclusive to the lineages within B.1.1/clade GR and B.1.160/clade G. These have a substitution of a C for a U in the first loop, and together with several other less frequent variations with similar modifications in the loops, partially explain its 99.08% conservation rate.
Furthermore, a nearby four-membered G-run may influence this PQS, to the point of potentially being a fifth domain [63, 64] or forming an alternative G4 (entry 2, ). This extra G-run is separated from the PQS by a 19-nucleotide long loop that has a conservation rate of only 35%. The most frequent variant found for this poorly conserved area was also the substitution of a C for a U, as seen before (entry 4, ). Variants of lineage A/Clade S displayed a different substitution, where a C mutates to a G and becomes an additional G-run, which can further influence the PQS (entry 5, ). How this affects the known activity of the PQS and the N gene is yet to be determined [65]. Variants of specific lineages with heightened quadruplex formation probability were also detected for several other high scoring candidates, including a PQS found in the 5’UTR area (entry 1, and entry 6, ) and a PiMS in the orf1ab gene (entry 14, and entry 8, ), both of which are the only results found in SARS-CoV-2 with a high probability of forming quadruplexes (|score| ≥ 40).
Fig 2
A. Top. Percentage of conservation of each PQS found along the genome of the SARS-CoV-2. Each point represents one PQS. The PQS score is given by the fill colour of the points, where lower |scores| are greyer, and bluer points have higher |scores|. Bottom. PQS count density plot related to the genome position (counts per 200 nucleotides). Grey coloured density plots are all the results found, whilst blue density plots are the results found with at least a |score| ≥ 20. B. Distribution of the biological features of the SARS-CoV-2 by its genomic position. UTR regions are in red, CDS and gene regions are in green, and nsps of the orf1ab gene are in purple. Orange dots are mature protein regions of the CDS. C, Top. PiMS count density plot related to the genome position (counts per 200 nucleotides). Grey coloured density plots are all the results found, whilst yellow density plots are the results found with at least a |score| ≥ 20. Bottom. Percentage of conservation of each PiMS found along the genome of the SARS-CoV-2. Each point represents one PiMS. The PiMS score is given by the fill colour of the points, where lower |scores| are greyer, and higher |scores| are more yellow. D. Top scoring PQS (Score ≥ 30, entry 1 to 7) and PiMS (Score ≤ -30, entry 8 to 17) found in the SARS-CoV-2 ordered by their localization in the genome. G-runs are in blue, C-runs are in yellow, loops are in red and bulges within the runs are in green. For each entry, the biological feature column lists the genomic landmark that hosts the potential quadruplex. The percentage of conservation is also given.
Fig 3
A. Sequences found in the SARS-CoV-2 reference genome (those with a starting position) and some of the variants identified in specific lineages for four high scoring candidates. Mutations are underlined. B. Centre, SARS-CoV-2 phylogenetic tree by clade and lineage of the sequences analysed. Lineages with less than 100 genomes were grouped (suffix x). Inner segment, Lineage: Mean PQS count (A), Mean PQS count with |score| ≥ 20 (B) and PQS divergence from the reference genome (C). Centre segment, Mean lineage percentage sequence identity with the reference genome (dots) compared to the overall mean found for the 17312 sequences analysed (black line). Outer segment, Clade: Mean PQS count (A), Mean PQS count with |score| ≥ 20 (B) and PQS divergence from the reference genome (C). R-packages used: ggtree [66] and circlize [67].
We observed significant differences between the SARS-CoV-2 lineages and clades when considering the overall PQS differences. On the one hand, the GR clade displayed a reduced number of PQS, fewer PQS that |scored| at least 20 and the lowest number of variants per genome analysed (). On the other hand, the S clade presented, on average, additional PQS in its genomes and a higher number of variants per genome analysed. In either case, both clades differed significantly from the reference genome, as well as from each other. The rest of the clades presented fewer differences, although some specific lineage groupings (B.1.1/Clade O and B.1.1/Clade G) also displayed a lower number of PQS overall. For PiMS, the differences between clades were smaller and more homogeneous (S1 File, section 3, ).
The search was then expanded to the rest of the Coronaviridae family. 53 SARS-CoV-2 PQS and PiMS candidates were found in common with the SARS-CoV and/or Bat coronavirus BM48-31/BGR/2008 (Bat-CoV-BM), all of which are suspected of having had bats as hosts during their evolution (S1 File, section 3, ).
These common sequences were located in the 3’UTR, N and E genes of SARS-CoV-2, although most were positioned in the orf1ab gene, and especially in the 5’UTR region. Paradoxically, the candidates found in the 5’UTR site (which regulates the translation of the RNA transcript) include the least conserved group of candidates of the intra-species analysis (with conservation rates as low as 9%), while also hosting a group of candidates that is highly conserved across the family. On the one hand, high conservation in candidates (maintained through natural selection) may be an important factor for the survival of the virus. This importance may extend beyond SARS-CoV-2 and into other related species where PQS and PiMS were found in common. On the other hand, variability in the region may also play a vital role in the ability of the virus to adapt to new hosts, situations and environments.
The highest |scoring| candidates found in SARS-CoV-2 were, however, not common to any other Coronaviridae member species. So, we investigated the differences between them through genome alignments and found that most of the sequence versions amongst species (6 out of 8) were still able to form potential quadruplex structures even with modifications. Therefore, these PQS and PiMS, although different from those in SARS-CoV-2, maintain their potential biological role and importance.
Expanding the search for common candidates to the entire virus realm, we matched one PQS and PiMS from SARS-CoV-2 with the potential quadruplexes found in four viruses from Group I belonging to the Herpesviridae, Podoviridae and Siphoviridae families (all dsDNA), which cannot be explained by the number of sequences analysed.
SARS-CoV-2 and the virus realm
We analysed the entire virus realm in a similar fashion to other studies in the literature [68, 69]. However, we employed the lax definition of quadruplexes to detect G- and C-structures and searched for verified G4 and iM sequences already described in the literature. These results were then matched and compared to SARS-CoV-2.
Whilst the SARS-CoV-2 did not present any of the published quadruplex sequences listed in the GiG.DB (as of V2.5.0) within its genome, other viruses, including a wigeon-afflicting coronavirus, did. In the entire virus realm, 1725 viruses presented at least one confirmed G4 sequence in their genome, while 195 presented at least one confirmed iM sequence (the difference in magnitude between both results may partially be due to the difference in the number of G4 and iM entries in the database; 2568 and 283 respectively). The sheer volume of species with confirmed quadruplex structures in all groups of viruses suggests that quadruplexes may be common and necessary genomic regulatory elements for viruses, as seen in other organisms such as humans. However, the prevalence is not homogeneous and varies broadly at the group level, although less so at the family level. For example, some families like Group I’s Herpesviridae and Sphaerolipoviridae, Group IV’s Matonaviridae and Flaviviridae and Group II’s Spiroviridae presented the highest PQS densities; whilst Group V’s Aspiriviridae and Fimoviridae, Group IV’s Mononiviridae and Mesoniviridae and Group I’s Mimiviridae displayed the lowest. PiMS showed a similar tendency, with Group I (Sphaerolipoviridae and Herpesviridae) and especially Group IV (Tymoviridae, Matonaviridae and Gammaflexiviridae) families being the densest in candidates; whilst Group IV (Monoviridae and Yueviridae), Group V (Fimoviridae and Phasmaviridae) and Group I families (Mimiviridae) displayed the lowest.
These results indicate that viruses/families (and particularly single-stranded ones) are probably more oriented towards one kind of quadruplex structure in a group/genome-type independent manner, whilst being contingent upon cation concentration and pH of the environment for formation.
Altogether, the SARS-CoV-2 genome displayed a scarcity of quadruplex candidates when compared from a macroscopic perspective to the virus realm. Its PQS and PiMS densities were at the lower end of the results for the Coronaviridae family, which itself was at the lower end of the (+)ssRNA Group IV (in an approximate ratio of 1:2:4 for PQS and 1:2:8 for PiMS). When put into the context of the entire virus realm, the SARS-CoV-2 PQS density was lower than that of 5813 other viruses analysed (out of 6680), whilst its PiMS density was lower than that of 6125. Furthermore, when we compared the SARS-CoV-2 reference genome results with the results of five hundred randomly shuffled genomic sequences of size and composition equal to that of SARS-CoV-2, the number of candidates found in the SARS-CoV-2 was significantly lower than the mean expected number of candidates for the genome’s size and composition. Whilst 362 ± 42 PQSs were expected, only 323 were found in the SARS-CoV-2. Similarly, 97 ± 22 PQSs that score over 20 and 3.0 ± 2.6 candidates that score over 40 were expected with this genomic size and composition, whilst 71 and 0 were found in the virus, respectively. For PiMS, the total number of expected candidates for the SARS-CoV-2 size and genome composition was 250 ± 33, whilst the expected numbers of candidates that score -20 or less and -40 or less were 60 ± 16 and 1.5 ± 1.6, respectively. However, SARS-CoV-2 presented only 189, 32 and 0 PiMSs for each of these respective groups.
Although the SARS-CoV-2 genomic organization limits the number of potential quadruplex structures almost to its minimum, other viruses with similarly low quadruplex densities were identified here to possess confirmed G4 and iM sequences within their genomes, supporting the potential these structures have for targeting the SARS-CoV-2.
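The shuffled-genome control described above can be sketched as follows. This is a minimal illustration in Python and not the G4-iM Grinder implementation: the regular expression is a hypothetical, simplified stand-in for the lax ruleset (four runs of at least two G residues separated by loops of 1 to 12 nucleotides, an assumed loop range), and the function names `count_candidates`, `shuffled_null` and `z_score` are introduced here for illustration only. Shuffling the residues preserves genome size and composition, so the mean and standard deviation of the shuffled counts give an expected number of candidates against which an observed count can be compared.

```python
import random
import re
import statistics

# Hypothetical, simplified PQS pattern (an assumption, not the G4-iM Grinder
# ruleset): four runs of >= 2 G separated by loops of 1-12 nt. The lookahead
# lets finditer report every start position, including overlapping ones.
PQS_PATTERN = re.compile(r"(?=(G{2,}(?:[ACGTU]{1,12}?G{2,}){3}))")


def count_candidates(seq: str) -> int:
    """Count start positions at which the simplified pattern matches."""
    return sum(1 for _ in PQS_PATTERN.finditer(seq.upper()))


def shuffled_null(seq: str, n_shuffles: int = 500, seed: int = 0):
    """Return (mean, sd) of candidate counts over shuffled copies of seq.

    Each copy keeps the length and base composition of seq; only the
    order of the residues changes.
    """
    rng = random.Random(seed)
    bases = list(seq.upper())
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # in-place permutation, composition unchanged
        counts.append(count_candidates("".join(bases)))
    return statistics.mean(counts), statistics.stdev(counts)


def z_score(observed: float, mean: float, sd: float) -> float:
    """Distance of the observed count from the shuffled mean, in sd units."""
    return (observed - mean) / sd
```

For instance, `z_score(323, 362, 42)` expresses the PQS count reported above in units of the standard deviation obtained from the shuffled genomes.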
Candidate confirmation in vitro
We therefore selected the best candidates to evaluate in vitro. NMR spectra of CoVID-RNA.G4-1 and CoVID-RNA.G4-2 exhibited imino signals in the 10.5–12.0 ppm region, characteristic of guanine imino protons involved in G-tetrads (Fig 4). In both cases, CD spectra also showed the characteristic positive band of parallel G-quadruplexes, which together with the NMR results confirmed the formation of very stable structures. The highly conserved CoVID-RNA.G4-1 located in the N-gene can possibly interact with the viral RNA packaging, transcription and replication functions [70]. In fact, it has been shown in a recent study that a known G4-ligand can interact with this sequence and reduce the expression of the N protein [65]. Although CoVID-RNA.G4-2 also formed a stable parallel quadruplex, the signals in the NMR spectra were broader than for CoVID-RNA.G4-1. This might be due to the formation of higher order structures through self-association between G-quadruplex units. CoVID-RNA.G4-2 is located in the nsp3 region of orf1ab very near its SUD domain. This area has been associated with the increased pathogenicity of the virus compared to other Coronaviridae that do not present it [71]. Additionally, it has been suggested that the SUD domain interacts with G-quadruplexes of the host. These results, however, open the possibility of an intrinsic gene modulation that may be linked with an increased virulence. Such a hypothesis can be extended to the SARS-CoV, as another stable PQS candidate was found in its genome in the same location (S1 File, Section 3, Fig 3B1).
Fig 4
A. The candidates examined in vitro through biophysical assays. The in vitro column states if the sequence forms a quadruplex (Y for Yes, N for No). B, NMR spectra of the two RNA-G4s analyzed at different temperatures (pH 7.0). C, NMR spectra of the DNA-iM analyzed at different temperatures (pH 5.3). D, CD analysis of the two RNA-G4 analyzed.
For PiMS, we used NMR to confirm that the DNA version of a candidate located in the orf1ab gene of the SARS-CoV-2 and with a 99.54% conservation rate formed an iM at almost neutral pH (Fig 4 and S1 File, Section 3, Fig 5). However, the SARS-CoV version of the iM (which differs by one nucleotide in the first loop, from TT to TG) was unable to form even at pH 5.1. As TT base pairs are common capping positions, the substitution of the T might prevent the folding in SARS-CoV. Additionally, the presence of C in G4s lowers the overall stability of the quadruplex, as C can base pair with G and ultimately hinder G-quartet formation [72]. Similarly, the pairing of C with G may also impede the formation of the C-based structures. When we analysed the RNA version of the SARS-CoV-2 iM, it did not form an iM. Although the sequences found in SARS-CoV-2 have an intermediate probability of formation, RNA iMs are known to be less stable than their DNA versions [73]. Still, the G4-iM Grinder methodology identified several more candidates with the potential to form iMs in the virus.
PQS result comparison
The results of G4-iM Grinder were compared to other recent reports of quadruplex-related analysis in the single strand of SARS-CoV-2. QGRS mapper [74] was the main tool used for the search in these reports because of its browser-based interface, its predefined capability to detect two-residue G-runs and its design, which returns all the PQSs found independently of their score [65, 75–78]. Other search engines such as G4Hunter and PQSfinder automatically filter their results by a score threshold, which makes criterion optimization fundamental to successfully execute the analysis. For example, one PQS was found with a threshold of ≥ 1.2 and none with higher thresholds when using G4Hunter in the virus (in its scale of -4 to 4) [75]. In contrast, 25 candidates have been reported using QGRS mapper with very small scores (mean QGRS Score of 12 ± 5 in QGRS mapper’s scale of ≈ 0 to 100; mean G4Hunter score of 0.6 ± 0.2 in G4Hunter scale). G4catchall [79], PQSfinder and QGRS mapper methodologies were also combined to select 15 PQSs, 13 of which were part of the original QGRS mapper results [80]. Except for one, all of these sequences reported to date in SARS-CoV-2 have been found with G4-iM Grinder and are part of the analysis made here. These are (mainly) part of the 71 sequences with a medium probability of forming G4 (scored between 20 and 40 in G4-iM Grinder’s scale). G4-iM Grinder, however, found 47 extra PQSs that have not been previously reported for the SARS-CoV-2 with the same probability of forming G4s. Additionally, over 5000 different variants of these PQSs were also identified with the same probability in the analysis of the 17312 different SARS-CoV-2 genomes.
Overall, these results complement the current knowledge we have regarding quadruplexes and the SARS-CoV-2. They also broaden the way for targeting viruses in general, and the SARS-CoV-2 in particular, through the use of these nucleic acid sequences as therapeutic targets in future anti-viral treatments.
G4-ligands based on small molecules that can stabilize G4s have recently been proposed to be viable antivirus strategies for viruses such as Ebola, HIV and HCV (reviewed in [50]). For the SARS-CoV-2, G4-ligands have already been reported to significantly reduce protein translation levels in vivo and in vitro [65]. Another report highlighted the existing evidence indicating that helicase inhibitors may also exert antiviral activity as another therapeutic approach for SARS-CoV-2 [78].
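As a point of reference for the score scales compared above, the sketch below shows how a G4Hunter-style score is commonly described: every G in a run of n consecutive G residues contributes min(n, 4), every C in a run of n consecutive C residues contributes -min(n, 4), all other residues contribute 0, and the score is the mean over the sequence, which bounds it between -4 and 4. The function name `g4hunter_style_score` is hypothetical, and the code is a simplified illustration rather than the G4Hunter or G4-iM Grinder implementation (windowing and thresholding are omitted).

```python
import re


def g4hunter_style_score(seq: str) -> float:
    """Mean per-residue score of a sequence on a -4 to 4 scale.

    G-runs score positively and C-runs negatively; the per-residue value
    is the run length capped at 4. Simplified sketch, no sliding window.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    total = 0
    for run in re.finditer(r"G+|C+", seq):
        length = len(run.group())
        per_residue = min(length, 4)            # cap the run contribution at 4
        sign = 1 if run.group()[0] == "G" else -1
        total += sign * per_residue * length    # every residue in the run counts
    return total / len(seq)
```

Under this description, a candidate built only from two-residue G-tracks cannot exceed a score of 2, which is one reason why score thresholds need to be adjusted when two-residue tracks are accepted.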
Transfer Alert
This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.
14 Jan 2021
PONE-D-20-34438
Exploring G- and C-quadruplex structures as potential targets for the severe acute respiratory syndrome coronavirus 2
PLOS ONE
Dear Dr. Belmonte Reche,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
After careful review of the comments, it appears that the reviewers have several issues with the manuscript. One of the main concerns was the use of the 'lax' setting for the G4-iM Grinder and the resulting prediction of G-quadruplexes based on this setting, as well as comparison with alternative G-quadruplex tools such as cGcC or G4Hunter.
Please address these concerns, along with the other comments made by the reviewers, in your re-submission.
Please submit your revised manuscript by Feb 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version.
You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols
We look forward to receiving your revised manuscript.
Kind regards,
Eric Charles Dykeman, Ph.D.
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions.
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Partly
Reviewer #2: Partly
Reviewer #3: Partly
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: Exploring G- and C-quadruplex structures as potential targets for the severe acute respiratory syndrome coronavirus 2
By Efres Belmonte-Reche* et al (*Corresponding author)
Submitted to PLoS One (Editorial No. PONE-D-20-34438)
General Comments
G- and C-quadruplex secondary structures (G4s and iMs, respectively) are found in the cellular genomes of many animal species and have complex roles in the regulation of metabolic pathways, including the biochemistry of telomeres and of oncogenic promoters (genome stability). Bacteria and many viruses have recently been explored for the presence of such structures in their genomes. Here, the genomes of SARS-CoV-2 isolates (of the reference strain, GCF_009858895.2, and of >3000 subsequent isolates), of other coronaviruses (CoVs) and of other (+)ssRNA viruses were investigated for the presence of such structures using the G4-iM Grinder (GiG) algorithm with some modifications. The quadruplex sequence structures identified were evaluated for the probability of occurring in infected cells, their position in the genome, and the degree of conservation in closely related viruses of the Coronaviridae family. The ‘best’ candidates for potential quadruplex formation were explored for stability by biophysical techniques (NMR, CD spectroscopy). Several relatively stable quadruplexes were identified which may be considered as possible targets for the development of novel antivirals.
The data observed are interesting. However, their presentation is rather confusing:
- The Materials and Methods section frequently refers to the Suppl. Mat. component of the manuscript for information which should be in the main text;
- The modifications of the GiG method used are not properly explained;
- The ‘detailed analysis of the Results’ is moved into Suppl. Mat. which is rather confusing and also has led to some duplications in panels of figures in the main text and in Suppl. Mat.;
- The frequency of quadruplex structure candidates is very high at the lax quadruplex definition, but the relevance of such data in biological terms is unclear;
- The significance of differences in conservation of potential quadruplex structures is not clear;
- Text and Legend of Fig. 2 differ in the nomenclature of structures assessed;
- The findings of this manuscript are claimed to ‘greatly expand the current knowledge regarding quadruplexes and the SARS-CoV-2 in particular [refs 66-68]’ (lines 320/321), but no details are given;
- No particular information is given on compounds which could potentially react with (and disturb) quadruplex structures, using them as therapeutic targets.
Numerous clarifications are requested, and the presentation of this potentially very important information should be improved.
Specific Comments
Line
1 Reconsider title, e.g., ‘Potential G- and C-quadruplex structures of SARS-CoV-2 as potential targets for the development of antivirals’, or similar. [Since no antiviral candidates are even mentioned, consider: ‘G- and C-quadruplex structures of SARS-CoV-2 and their potential biological relevance’]
18 A short paragraph should introduce the topic.
28 … the entire realm… Please clarify. …. in common with other species… Please specify.
32 … may be suitable targets for… Please clarify (see comment above).
38 Consider citation of: Lau SKP, et al. Possible Bat Origin of Severe Acute Respiratory Syndrome Coronavirus 2. Emerg Infect Dis. 2020 Jul;26(7):1542-1547. Andersen KG, et al. The proximal origin of SARS-CoV-2. Nat Med. 2020 Apr;26(4):450-452.
45 … pathogenic bacteria and viruses…
47 Consider citation of: Menachery VD, et al. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat Med. 2015 Dec;21(12):1508-13. Peck KM, et al. Coronavirus Host Range Expansion and Middle East Respiratory Syndrome Coronavirus Emergence: Biochemical Mechanisms and Evolutionary Perspectives. Annu Rev Virol. 2015 Nov;2(1):95-117.
57 … DNA or RNA [read RNA throughout the ms; ARN is the French abbreviation.]
74 Consider reading: … great potential as targets for virus inhibition…
77 Clarify the description of data from ref. [49]. Correct the site of publication to: NAR Genomics and Bioinformatics, Volume 2, Issue 1, March 2020, lqz005, https://doi.org/10.1093/nargab/lqz005
78 Support the statement by refs.
79 Rephrase sentence: … to the ongoing research efforts related to the COVID-19 pandemic by investigating SARS-CoV-2 for the presence of quadruplex structures…
84 Rephrase sentence.
111 … presence of known-to-form quadruplex structures… Please clarify.
133 and 158f. Please clarify what ‘positive’ and ‘negative’ mean in this context. Are these designations just used for presentational purposes?
154 Fig. 1A. Consider showing a quadruplex structure.
172f This summarizing referral of the reader to Suppl. Mat. is confusing. In addition, in detail there are partial duplications in figures of Suppl. Mat. and the main text. It should be considered to transfer essential components of Suppl. Mat. into the main text (including relevant figures). See comments below.
208 to 213. The meaning of this text is not clear. The concluding sentence ‘Here they may play their biological role if formed’ is speculative and should be omitted as long as no hard data are available.
220 This interpretation of Fig. 2A does not describe what is shown. Please adjust accordingly.
224 and 230. The citation of components of Suppl. Mat. is out of order and confusing.
247 Fig. 2, panel C. Explain how COVID DNA.iM-1 was obtained and what this means in context.
260f Clarify sentence.
267 … quadruplexes may be common and… for viruses to ‘live’, thrive and adapt… This statement is highly speculative and should be considered for omission.
293 and 296. Clarify: … CoVID-RNA G4-1… CoVID-RNA G4-2… The interpretation of the latter structure is highly speculative and should be omitted.
310 … the opposite but with the same effect might also be happening… Please spell out more clearly what you want to say.
318f This paragraph should contain details of how the data presented here augment those of refs. [60, 61, 63, 66-68]. Furthermore, the potential of developing novel antivirals interfering with quadruplex structure formation should be assessed in more detail.
476 Correct ref. , see comment above.
520 Qu X is last, not first author.
Suppl. Mat.
Page
3 The new functions developed and their exact website locations should be mentioned under Methods in the main text.
4 The subheadings of Suppl. Fig. 1 could be incorporated into Fig. 1, making Suppl. Fig. 1 redundant.
5 Essential information of ‘in silico methodology’ should be in the Methods section of the main text.
6 The procedure of NMR and circular dichroism experiments should be in the Methods section of the main text.
7 A condensed version of this text should be transferred to the Methods section of the main text. Paragraph 2, line 3. Omit ref [2].
8f Suppl. Fig. 2 panel A duplicates Fig. 2 panel A. Suppl. Fig. 2 panels B-D could remain in Suppl. Mat.
11f Abbreviated versions of these data should be transferred to the main text.
14f and Suppl. Fig. 4 could remain in Suppl. Mat. in a condensed form.
18f Biophysical experiments. The Suppl. Fig. 5 duplicates data of Fig. 2. The section should be transferred to the main text and abandoned in Suppl. Mat.
21f ‘Other bioinformatics figures’. Suppl. Figures 6-11 and 13-16 are not considered to be essential. Suppl. Figs. 12 and 17 could be considered for the main text.
33f ‘Other biophysical figures’. Suppl. Figs. 18 and 19 could be considered for Suppl. Mat., preferentially Suppl. Fig. 19.
35 to 37. Condensed components of this information (including relevant additional refs.) should be transferred to Methods of the main text.
Reviewer #2: In their manuscript the authors present a genome wide screen for putative G- and C-quadruplex sites (PQS and PiMS, respectively) in SARS-CoV-2 and other viral genomes. For that purpose, the authors updated the pre-existing R-based scanning tool G4-iM Grinder.
The genome wide screen in the SARS-CoV-2 reference genome resulted in about 300 PQS and about 200 PiMS. However, when the resulting scores were filtered by a previously suggested value of |score| > 40, none of the predicted sites remained. Therefore, the authors concentrated on those results with |score| > 20 which consisted of 71 PQS and 32 PiMS. The authors then compared these sites to other viral genomes within the Coronaviridae, the entire group IV and a selection of all virus genomes to find sites conserved within different virus species. Furthermore, the authors conducted NMR and CD experiments on a handful of selected high-scoring PQS and PiMS found in SARS-CoV-2 reference genome. Here they found that both of the PQS tested do form a G-quadruplex. The single PiMS, however, did not show formation of a C-quadruplex. However, for the latter the authors also tested a DNA variant instead and found that in DNA this sequence indeed adopts a C-quadruplex conformation. Interestingly, a single point mutation in the first loop region of this DNA variant disrupts the ability to form the C-quadruplex.
Overall, the article is very well written, and the analysis performed by the authors is comprehensive and sound. Nevertheless, there are some minor points that require more attention or need to be described in more detail. Furthermore, in some instances the authors draw conclusions that I can't follow or agree with.
In particular, I have the following remarks and questions:
1. In the methods section, the authors state that in their analysis they chose 3 different configurations of G4-iM Grinder, the lax, the default, and the 'original folding rule'.
However, all Results are only for the lax configuration. If I understand correctly, this was the only configuration that yielded any results for SARS-CoV-2. If none of the results presented in this study (and the corresponding supplement), please correct the Methods section accordingly.
Along with that, I wonder how the authors justify their threshold of |score|>20. From the original G4-iM Grinder publication I took that |score|>40 is suggested for good prediction performance. Here, I am missing some more discussion about this lower threshold, especially in combination with the lax configuration, as I assume that this might greatly increase the number of false positive predictions!
2. The authors should consider comparing their result of 73 PQS to those found in another previous study by Ji et al. 2020 "Discovery of G-quadruplex-forming sequences in SARS-CoV-2", Briefings in Bioinformatics (https://doi.org/10.1093/bib/bbaa114)
3. 3rd paragraph of the Materials and Methods section, line 125ff. The authors state that a more flexible configuration requires 'more computing power'. Maybe I overlooked this, but I wonder why this is the case and how the computation time scales with the choice of the different 'flexibility' parameters. Does the analysis scale linearly, quadratically, or even exponentially in the number of candidate sites? One can only speculate at this point. Especially, since the underlying scoring schemes (cGcC, G4Hunter, etc) seem to be independent of the flexibility of the sequence constraint.
4. Results and discussion SARS-CoV-2, 5th paragraph, lines 242-245, as well as Supplement Section 3, 'SARS-CoV-2, the Virus Realm and quadruplexes'. The authors matched PQS and PiMS found in SARS-CoV-2 in other viruses and claim that these findings 'cannot be explained by chance'. I argue against that, since there is no obvious relation between dsDNA viruses and the ss+virus SARS-CoV-2.
Given the probability in the order of 1e-13 as derived in the supplement and the vast size of the entirety of viral genomes, I would expect a few sequences harboring common subsequences in the size of the PQS tested. Note that most viruses mutate much faster than bacteria, eukaryotes, or archaea, so chances are that subsequences of length of about 30 develop independently. Especially if the other hypothesis is that dsDNA and +strand ssRNA viruses are somehow related and have conserved these small pieces of subsequence during evolution.
5. For the conservation of the PQS found in SARS-CoV-2 reference and the remaining 3297 SARS-CoV-2 genomes, please also state the overall sequence identity. Otherwise, simply showing that the PQS are conserved by 98.6% ± 7.4% renders it difficult to assess whether this is expected, or unexpected.
6. Suppl. Figure 17 and related text. The examples given do not seem well chosen. First, the linker sequences between the G-runs are quite large compared to the number of layers (2). So I'd assume that, if at all, the resulting quadruplexes would be exceptionally weak. Second, the regions of Cluster 1 and Cluster 2 are located in the 5'UTR which is known to be well structured and conserved throughout all betacoronavirus and even the remaining coronaviruses. In particular, Cluster 2 overlaps the well annotated SL5C which is required for replication. Cluster 2 resides in a highly complementary region that is known to form well conserved secondary structure. This might also explain the low conservation of 27% among the coronaviruses, if I'm not mistaken, since secondary structure, not G-quadruplex, seems to play the most important role here. The authors should therefore relate their findings to known annotation of SARS-CoV-2.
Along with that, the authors should add to their discussion a paragraph about the reliability of their predictions, if such can be assessed.
Especially the PQS (and PiMS) with just 2 layers and/or bulges are known to be less stable, thus potentially do not form at all (in vitro or in vivo). Moreover, even when NMR and CD measurements of the PQS suggest their formation, they still have to compete against regular secondary structure formation when they reside in their viral sequence context. The authors should elaborate on that problem, at least in the discussion.
General remarks:
- There are multiple occurrences of ANR instead of RNA throughout the manuscript and Supplement. Please correct them.
- In the 4th paragraph of the introduction, there is a 'nucleic acid sequences' right after DNA and RNA. This is redundant, since the NA in both already stands for Nucleic Acid
- In the second paragraph of Materials and Methods, line 112. What do the authors attempt to convey with 'presence of known-to-form quadruplex structures sequences'? Maybe a simple 'presence of known-to-form quadruplexes' would suffice?
Reviewer #3: Thank you for the opportunity to review the manuscript: “Exploring G- and C-quadruplex structures as potential targets for the severe acute respiratory syndrome coronavirus 2". This manuscript is presenting analyses of potential G-quadruplex-forming sequences (PQS) in the genome of SARS-CoV-2 and related viruses. It was interesting to read this, but there are several points which must be improved remarkably before its publication. The main problems are 1. Conclusions of the analyses are wrongly interpreted, 2. The various methods must be used for their results evaluation and 3. The used methods must be described in manuscript with details. Although I found this manuscript interesting the major revision is necessary before its publication. I recommend a major revision of this manuscript.
Major points:
1. Conclusions are misleading. The authors find that compared to other tested viruses, there are no potential G-quadruplex forming (PQS) sequences in SARS-cov2 genome.
Then they changed the algorithm for PQS search – “lax” configuration – with loop size 20bp – and claimed that these PQScould be may be suitable targets. This approach is wrong. The stability of these G-quadruplexes will be extremely low, so their formation in vivo is very doubtful. The right conclusion will be that compare to other genomes – the G-quadruplexes are probably very rare (if presented) in SARS-cov2.2. The alternative algorithms must be used as a control and discussed (for example QGRS Mapper and G4Hunter web are freely available and easily accessible for quick evaluations).3. The Title should be improved, the authors did not target any structures, just predicted them.4. Description of experimental methods in Material and Method section is missing.5. How the PQS score is set? What these number means? Please compare with G4Hunter score and QGRS score.6. The comparison of PQS in real genomes must be compared statistically with scrambled/random sequences – it is possible than the same amount or maybe more PQS will be in scrambled sequence. With this control your analyses do not prove anything.7. Do not use suggestion the the Result part. Just described your results (e.g. line 231: Here they may play their biological role if formed., line: 266, 277: „may be“. We are not interested what „may be“ in Results, but we would like to see the facts only. Keep your theories and “maybe” into Discussion part.8. Figure 2B. Compare results with already in vivo proved sequences.9. Figure 2C, Show the same result with confirmed iM and compare the spectra.Minor points:1. Please polish the terminology. I-motifs are formed by four strands, however to use C-quadruplex is unusual, because the structure is completely different from G-quadruplexes. Fours strand structures: G-quadruplexes and i-motifs.2. Please do not repeat the same first sentence in Methods and Results section.3. Line 195 and 196 – do not discuss in Result section.4. 
Lines 294 and 295: again, this is not your result, so move it to the Discussion part.
5. Lines 316 and 317: I do not understand. How did your results on RNA viruses prove "the especially for DNA, G4-iM Grinder can be used"? Any algorithm can be used if you use "lax" settings.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org.
Please note that Supporting Information files do not need this step.

24 Feb 2021

A rebuttal letter that responds to each point raised by the academic editor and reviewer has been submitted.
A marked-up copy of the manuscript that highlights changes made to the original version has been submitted.
An unmarked version of the revised paper without tracked changes has been submitted.
Figures that comply with PACE have been submitted within the RAR file.
The supplementary material has been submitted.

Submitted filename: DV.2.docx

25 Mar 2021

PONE-D-20-34438R1
Potential G-quadruplexes and i-Motifs in the SARS-CoV-2
PLOS ONE

Dear Dr. Belmonte Reche,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

In particular, one of the referees notes a technical issue with the way that you have performed your analysis on random 16-nt long sequences and the probability that they would form an exact match. Specifically,

"The authors matched PQS and PiMS found in SARS-CoV-2 in other viruses and claim that these findings 'cannot be explained by chance'."

Although you have addressed this in your response, the second referee notes that, because the SARS-CoV-2 genome has a length of > 25000 nt, this presents multiple chances for alignment, while your analysis only gives statistics when comparing 16 nt against 16 nt (not 16 nt against 25,000+ nt), and thus the referee is concerned that the presence of the PQS and PiMS is more common than you expect. I would be glad to re-consider your revised manuscript after you have responded to this issue.

==============================

Please submit your revised manuscript by May 09 2021 11:59PM.
If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
EricCharles Dykeman, Ph.D.
Academic Editor
PLOS ONE

[Note: HTML markup is below.
Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes
Reviewer #3: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A
Reviewer #3: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #2: Yes
Reviewer #3: Yes

**********

5.
Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes
Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors revised their manuscript and properly addressed all the comments and questions raised in my previous report.

Reviewer #3:
1. The experiment with the "shuffled" or "random" sequence was performed wrongly: "To complement the analysis, we randomly shuffled sequences (16 nucleotides long)." Please take the complete virus sequence, shuffle it, and analyse this 29903 nt long sequence, then compare the number of PQS in the real virus sequence and in the "shuffled" sequence.
2. The abstract does not correspond to the results. It was found that there are fewer PQS in Coronaviridae compared to other viruses. This fact must be part of the abstract and discussed accordingly.
3. "hundreds of potential quadruplex candidates were discovered": please use exact numbers and significance in the abstract statements.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No
Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

1 Apr 2021

Please find the response to Reviewer 3's second comments attached on submission (as it has graphs which cannot be just pasted here).

Submitted filename: Reviewer.answers.2.March.docx

12 Apr 2021

Potential G-quadruplexes and i-Motifs in the SARS-CoV-2
PONE-D-20-34438R2

Dear Dr. Belmonte Reche,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
EricCharles Dykeman, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4.
Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: Thank you. The authors have improved the manuscript and added the requested analyses. I recommend this manuscript for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

27 May 2021

PONE-D-20-34438R2
Potential G-quadruplexes and i-Motifs in the SARS-CoV-2

Dear Dr. Belmonte-Reche:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. EricCharles Dykeman
Academic Editor
PLOS ONE