
Next generation sequencing of SARS-CoV-2 genomes: challenges, applications and opportunities.

Matteo Chiara¹, Anna Maria D'Erchia², Carmela Gissi², Caterina Manzari³, Antonio Parisi⁴, Nicoletta Resta⁵, Federico Zambelli¹, Ernesto Picardi⁶, Giulio Pavesi⁷, David S Horner¹, Graziano Pesole⁸.

Abstract

Various next generation sequencing (NGS)-based strategies have been used successfully in the recent past to trace the origins and understand the evolution of infectious agents, to investigate the spread and transmission chains of outbreaks, to facilitate the development of effective and rapid molecular diagnostic tests and to contribute to the search for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and for devising effective measures and strategies to mitigate the spread of the pandemic. As testified by the number of available sequences, diverse approaches and sequencing methods can be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In this review, we provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also outline current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data will contribute to this objective.


Keywords: COVID-19; SARS-CoV-2; data deposition; data integration; omics data; sequencing technologies


Year: 2021 PMID: 33279989 PMCID: PMC7799330 DOI: 10.1093/bib/bbaa297

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Introduction

In January 2020, a novel betacoronavirus, subsequently designated SARS-CoV-2, was identified as the etiological agent of a cluster of pneumonia cases in Wuhan City, Hubei Province, China [1-4]. COVID-19 (coronavirus disease 2019), the disease caused by infection with this novel pathogen, spread rapidly. On 11 March 2020, with 118 000 cases reported from 110 countries, the World Health Organization (WHO) declared a pandemic [5]. At the time of writing (25 September 2020), COVID-19 has affected more than 200 countries worldwide, with more than 33 million confirmed infections and a death toll of about 1 million. It poses the greatest global health and socioeconomic threat since World War II [6]. SARS-CoV-2 is primarily transmitted between humans through respiratory droplets and physical contact [7], although some airborne transmission seems probable [8]. The incubation period ranges between 2 and 14 days, but longer intervals have been reported [9]. Fever, dry cough and general fatigue are the most common symptoms. Less common symptoms include muscle pain, nasal congestion, runny nose, sore throat and diarrhea [10, 11]. A minority of patients develop pneumonia, severe acute respiratory syndrome and/or kidney failure [12, 13]. Estimated fatality rates vary greatly between countries [14], probably due to differences in testing strategies, demographic factors [15, 16], background comorbidities and other factors. The pandemic has prompted an unprecedented global effort to find therapeutic targets and develop treatments and vaccines [17, 18], but decisive remedies are still lacking. The first complete genomic sequences of the novel betacoronavirus were obtained in late December 2019 through metatranscriptomic approaches, supplemented by PCR and Sanger sequencing [2-4]. The availability of a reference genome assembly facilitated the development of diagnostic tests based on real-time PCR [19].
SARS-CoV-2 falls into the severe acute respiratory syndrome-related coronavirus (SARSr-CoV) group defined by the International Committee on Taxonomy of Viruses (ICTV) [20]. Along with numerous isolates from bats and other mammals, the SARSr-CoV group contains SARS-CoV-1. This is the causal agent of a large epidemic of viral pneumonia (Severe Acute Respiratory Syndrome, SARS) that affected China and 25 other countries in 2003 and 2004 [21]. Phylogenetic analyses demonstrate that SARS-CoV-1 and SARS-CoV-2 are relatively distantly related, and that their spill-overs into humans were distinct events [22]. The positive-sense RNA genome of SARS-CoV-2 is approximately 30 000 nt long. It shows the highest level of genome identity (96%) with a SARSr-CoV (denoted RaTG13) isolated from a bat in Yunnan province, China [2]. SARSr-CoVs closely related to SARS-CoV-2 (genome identity 91%) were recently isolated from Malayan pangolins illegally imported into China, which indicates that many similar coronaviruses circulate among mammals [23, 24]. Indeed, various studies have suggested 'intermediate' hosts in the zoonotic process [25]. However, the exact chain of events that allowed SARS-CoV-2 to acquire the molecular features required for human-to-human transmission remains unclear [26]. Further environmental sampling and metatranscriptomic sequencing will be required to conclusively resolve these issues. The organization of the SARS-CoV-2 genome is typical of coronaviruses. The replicase gene consists of two long, overlapping open reading frames, ORF1a and ORF1b [27, 28], and occupies the 5′-terminal two thirds of the genome. ORF1a is translated into polyprotein 1a (pp1a), while polyprotein 1ab (pp1ab) is generated by −1 ribosomal frameshifting [29]. These polyproteins are subsequently processed into 16 nonstructural proteins (nsps), which are required for viral genome replication and transcription (Figure 1A).
The 3′-terminal third of the genome encodes the four structural proteins (S, E, M and N) required for the assembly of the viral particle. It also encodes six accessory proteins that are less well characterized and are not universally conserved among coronaviruses (Figure 1B).

Figure 1

Architecture of the genome of SARS-CoV-2. (A) SARS-CoV-2 genome structure. Labels indicate gene names. The red circle indicates the TRS-L. The lower panel depicts the nsps derived from processing of the pp1a and pp1ab polyproteins. (B) sgmRNAs. Dotted lines link the TRS-L with the body of each individual sgmRNA. The specific gene product obtained from each individual sgmRNA is indicated by the colored boxes and the corresponding labels.

These genes are transcribed through a complex mechanism of discontinuous transcription that generates a set of nested subgenomic transcripts, called subgenomic mRNAs (sgmRNAs). Synthesis of antisense RNAs can terminate prematurely at specific transcription regulatory sequences (TRSs) upstream of each of these genes. These RNAs are then directed to continue synthesis of the complement of the 67–72 nt 'leader' at the extreme 5′ end of the positive-sense genomic RNA. Transcription of these negative-sense subgenomic RNAs produces positive-sense sgmRNAs that are 5′ and 3′ coterminal with the genome sequence. Discontinuous transcription is mediated by sequence identity between a donor RNA (body, TRS-B) and hairpin structures present in the acceptor RNA (leader, Transcription Regulatory Leader Sequence [TRS-L]). It is probably also modulated by long-distance RNA–RNA interactions (see also Figure 1B). For a complete review of coronavirus replication and transcription mechanisms, we refer readers to [27, 28].

Recent experience with emerging infectious diseases, such as SARS, MERS, Zika and Ebola, has demonstrated that NGS technologies are powerful tools for tracing the origins, spread and transmission chains of outbreaks, as well as for monitoring the evolution of the etiological agents [30-34].
Accordingly, the COVID-19 pandemic has triggered unprecedented efforts to develop effective real-time surveillance strategies based on sequencing the genome of its causative agent [35-40]. More than 100 000 complete or near-complete SARS-CoV-2 genomes have been deposited in dedicated repositories such as EpiCov [41] and others [35, 42]. These data have already fostered several studies on the evolutionary dynamics of the virus and the identification of variants of potential clinical relevance [43-45]. A critical need for consistent handling, labeling and deposition of sequence data has become apparent. This reflects our incomplete understanding of the complexity of virus replication and gene expression, and the possibility of RNA modifications of either RNA strand during replication or transcription. Not least, consistent deposition is needed to facilitate access to coherent and relevant metadata. These challenges can only be addressed through shared and coordinated efforts [46]. Data standards represent a recurring theme in the 'omics' era [47-49]. In the case of SARS-CoV-2, however, the need to guarantee straightforward, unrestricted and rapid access to large volumes of processed and, in many cases, raw molecular data is unprecedented. This review provides a brief, but hopefully comprehensive, summary of the state of the art of NGS applications in SARS-CoV-2 genomics. Along with detailed descriptions of currently available sequencing approaches, we present an overview of the repositories and databases that provide access to SARS-CoV-2 genomic data and metadata. We also give general advice for their correct sharing and deposition. We offer a clear and detailed vademecum for the production and handling of COVID-19-related sequencing data, and a detailed picture of the state of the art. With these, we hope to contribute to more efficient and informative curation, integration and exploitation of SARS-CoV-2 sequencing data and metadata.

High-throughput sequencing for COVID-19 pandemic

Sample collection

Available SARS-CoV-2 sequence data derive mainly from clinical diagnostic samples with high viral loads. Such loads permit the extraction of enough RNA for the sequencing and reconstruction of complete or nearly complete viral genomes. The WHO (Interim guidance; [50]) lists several types of clinical specimens that can be collected for laboratory diagnosis of COVID-19 [51], mostly derived from the upper or lower respiratory tract. Some studies report that specimens from the lower respiratory tract may contain a higher viral load than those from the upper respiratory tract (see [51] and references therein). However, during the course of infection, the viral load changes dynamically between different sites of the respiratory tract, as well as between respiratory and non-respiratory tissues [52-57]. SARS-CoV-2 genome assemblies have also been obtained from non-respiratory clinical specimens, including urine and feces (see Supplementary Table S1 available online at https://academic.oup.com/bib). However, to our knowledge, they have not so far been generated from blood or serum, probably due to the low viral loads associated with these samples [58]. Viral genetic material can also be isolated from the supernatant of infected cell lines. However, viral populations grown in cell lines often accumulate novel genetic variants during laboratory passage [59]. They also show marked differences in the composition of viral quasispecies with respect to matched clinical samples, for both SARS-CoV-2 [60] and SARS-CoV-1 [61]. These factors have profound implications for the study of viral evolution and for the suitability of laboratory-adapted viruses in downstream applications.
A very limited number of complete or nearly complete SARS-CoV-2 genomes have been obtained from environmental specimens, such as wastewater, air samples and undefined 'environmental swabs'. In these cases, the low viral load and the consequent scarcity and poor quality of viral RNA greatly influence the choice of sequencing strategy and technology [62-64]. Specific protocols for the sequencing of SARS-CoV-2 from wastewater are of emerging importance for epidemiological studies [65, 66]. They can be used as a proxy to monitor viral prevalence in a population, and also to genotype the predominant genomic variant circulating in a specific geographical area [63]. Supplementary Table S1, available online at https://academic.oup.com/bib, lists the isolation source of the 23 791 SARS-CoV-2 genome sequences available in the NCBI Virus database [67] (on 25 September 2020), although the list is not exhaustive. Clinical respiratory specimens clearly predominate. For many entries, however, the isolation source is not mentioned or is insufficiently or unclearly described, underlining the widespread incompleteness of metadata associated with viral genomes (see also Data Deposition and Access).
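The sketch below illustrates the kind of metadata check this implies. It tallies isolation-source annotations across a set of genome records and reports the fraction that lack a usable value. It assumes that each record is a plain Python dict, and the `isolation_source` key is a hypothetical field name chosen for this example. The set of placeholder strings treated as "missing" is also our own illustrative choice, not a standard vocabulary.

```python
from collections import Counter

# Illustrative (assumed) placeholder strings that count as "no isolation source".
MISSING_VALUES = {"", "missing", "not applicable", "not collected", "unknown", "na", "n/a"}


def normalize_source(raw):
    """Lower-case and trim an isolation-source string; map placeholders to None."""
    if raw is None:
        return None
    value = raw.strip().lower()
    return None if value in MISSING_VALUES else value


def tally_isolation_sources(records):
    """Count normalized isolation sources; records without one are counted under None."""
    return Counter(normalize_source(r.get("isolation_source")) for r in records)


# Hypothetical records, for illustration only.
records = [
    {"accession": "A1", "isolation_source": "Nasopharyngeal swab"},
    {"accession": "A2", "isolation_source": "nasopharyngeal swab "},
    {"accession": "A3", "isolation_source": "missing"},
    {"accession": "A4"},
    {"accession": "A5", "isolation_source": "feces"},
]
counts = tally_isolation_sources(records)
missing_fraction = counts[None] / sum(counts.values())
print(counts.most_common())
print(f"records lacking an isolation source: {missing_fraction:.0%}")
```

Normalizing case and whitespace before counting keeps trivially different spellings of the same source from being reported as separate categories.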

RNA extraction

A schematic of the common wet-lab workflow used for SARS-CoV-2 RNA extraction is shown in Supplementary Figure S1, available online at https://academic.oup.com/bib. Viral RNA extraction requires biosafety level (BSL) 2 laboratories. RNA can be extracted and purified from clinical specimens, cultured isolates or environmental samples using any of a large variety of commercially available kits for total RNA extraction or enrichment of viral RNA (see Supplementary Table S2 available online at https://academic.oup.com/bib). Standard methodologies include the use of guanidine salts, which inhibit nucleases and thus prevent degradation of viral RNA. They also use phenol, which denatures and dissolves proteins, effectively inactivating the virus. Viral RNA extraction protocols usually recommend the addition of carrier RNA, such as poly(A) RNA, to increase RNA recovery. Carrier RNA does not affect SARS-CoV-2 genome sequencing methods based on amplicons or hybrid capture, but it may notably bias metatranscriptomic methods (as described below), so its use should be carefully evaluated. Alternatively, the addition of linear polyacrylamide to the lysis buffer has been proposed for viral RNA extraction [68]. A DNase treatment during or after RNA extraction is also recommended, especially for metatranscriptomic library preparations. RNA quality can be assessed with the Agilent 2100 Bioanalyzer using a high-sensitivity RNA assay (RNA 6000 Pico Kit). RNA can be quantified with NanoDrop spectrophotometers (ThermoFisher) or a Qubit fluorometer (ThermoFisher), and stored at −80°C until use. Before sequencing, the presence and quantity of SARS-CoV-2 RNA can be evaluated by qRT-PCR targeting one or more viral genes (e.g. RdRp, orf1ab, E and N [69]). This provides a Ct (cycle threshold) value for each target. Ct values are inversely correlated with the viral load in the sample (i.e. the lower the Ct value, the higher the viral titer), and their interpretation is specific to each amplicon.
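Each PCR cycle ideally doubles the amount of product. A difference in Ct between two samples measured with the same assay can therefore be converted into an approximate fold difference in target abundance, (1 + E)^ΔCt, where E is the amplification efficiency. The sketch below assumes that both Ct values come from the same target and assay, consistent with the amplicon-specific interpretation noted above. The function name and the default efficiency of 1.0 (perfect doubling) are illustrative assumptions.

```python
def fold_difference(ct_a, ct_b, efficiency=1.0):
    """Approximate fold difference in target abundance of sample A relative to sample B.

    Assumes both Ct values were obtained with the same assay (same amplicon), and that
    each cycle multiplies the product by (1 + efficiency); efficiency=1.0 means perfect
    doubling. A lower Ct in sample A gives a fold difference greater than 1.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


# A sample with Ct 20 compared with a sample with Ct 30 on the same target.
print(fold_difference(20.0, 30.0))  # 1024.0, i.e. 2**10
# The same comparison assuming a less efficient (90%) reaction.
print(fold_difference(20.0, 30.0, efficiency=0.9))
```

Under perfect efficiency, a difference of about 3.3 cycles corresponds to roughly a tenfold difference in template. Less efficient reactions yield smaller folds per cycle, which is one reason why Ct values from different assays should not be compared directly.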

Sequencing strategies

NGS technologies have rapidly become the method of choice for various applications in virology. These include the identification of novel viruses from metagenomic samples [70], the reconstruction of complete or nearly complete viral genome sequences [71], and the analysis of viral evolution and quasispecies [72] (see [73] for a recent review). One of the most relevant advantages of NGS-based approaches is that full-length viral genomes can be reconstructed even for unknown or poorly characterized viruses. This is possible starting either from culture-enriched viral preparations or directly from clinical samples. In the case of SARS-CoV-2, both second- and third-generation sequencing technologies have been successfully applied. Several specific library preparation protocols have also been developed independently by different manufacturers [74-78]. The final objectives of the project and the type of biological sample at hand are key considerations informing the choice of the most appropriate sequencing strategy. Several practical factors must all be reconciled with the experimental objectives:

- the type of sample (e.g. clinical specimens, environmental samples, infected cultured cells);
- the viral load (often related to the sample source);
- the RNA extraction procedure and RNA quality;
- requirements for parallelization/automation, and other considerations.

Typical objectives include the investigation of inter- or intra-sample variation of the viral genome, the study of the viral and host transcriptome and epitranscriptome, and single-cell studies. To date, four conceptually different approaches have been applied: (i) shotgun metatranscriptomics, (ii) hybrid capture-enrichment, (iii) amplicon sequencing and (iv) direct RNA sequencing (Table 1). In the following sections, we discuss the merits and limitations of each of these strategies and their application using different sequencing platforms.

Table 1

Characteristics of SARS-CoV-2 sequencing approaches

	Shotgun metatranscriptomics	Amplicon-based	Hybrid capture-enrichment	Direct RNA sequencing^a
Goals	SARS-CoV-2, host microbiota, and host response to infection	SARS-CoV-2 genome	SARS-CoV-2 genome	SARS-CoV-2 and host transcriptome and epitranscriptome
Co-infection detection	Yes	No	No/yes (depending on gene panel)	Yes
Minimum number of reads	20–50 M	5–20 M	5–20 M	0.5 M
Genome Coverage	≥99%	≥95–99%	≥95–99%	≥99%
Accuracy in SNV identification	High	High	Moderate	Low
Recommended sample viral load (Ct) [87]	<24–28	≥24–28	≥24–28	<24–28
Sample RNA input (ng)	10–200	1–50	10–50	≥1000
Sample type	Patient specimens	Patient specimens, environmental samples	Patient specimens, environmental samples	Viral cell cultures
Cost	High	Low	Moderate	High
NGS sequencing platforms	High- or ultra high-throughput platforms	Mid-throughput platforms	Mid- or high-throughput platforms	ONT

^a Only one dataset from direct RNA sequencing is currently available in public repositories (Kim et al. [95]).


Shotgun metatranscriptomics

Shotgun metagenomic sequencing is a culture-independent technique that can interrogate all of the DNA in a sample. It allows the characterization of complex communities of microorganisms without any prior knowledge of their genome sequences [79]. Metagenomic sequencing is an extremely powerful tool for the identification of previously uncharacterized pathogens (see [80, 81] for recent reviews). By offering detailed and quantitative information on the composition of microbial communities, this approach also provides added value in clinical microbiology, where it can be used to inform therapeutic strategies. Shotgun metatranscriptomics (saturation RNA sequencing) has been successfully applied to obtain complete or nearly complete assemblies of the SARS-CoV-2 genome from several types of clinical samples. Since metagenomics/metatranscriptomics can also identify other viral and bacterial DNAs/RNAs, these methods can provide information regarding secondary infections. Such information can potentially inform treatment decisions and help predict patient outcomes. Metatranscriptomics can also recover host transcripts from infected epithelial and activated immune cells. This approach can therefore provide an accurate snapshot of the immune response in patients, potentially informing studies of virus–host interactions [82, 83] and even enabling limited genotyping of patients. Most RNA sequencing protocols were originally developed to monitor host gene expression and employ either enrichment of the poly(A)+ RNA fraction or depletion of host rRNA. Full-length SARS-CoV-2 genomes and mature transcripts are polyadenylated [27, 28] and can thus be enriched using poly(T) oligonucleotides. However, such approaches may be less appropriate if the experimental objectives include the characterization and, potentially, the quantification of negative-strand intermediates of coronavirus transcription and genome replication.
In such cases, the adoption of strand-specific RNA-seq libraries should be considered. A typical workflow consists of RNA fragmentation, first- and second-strand cDNA synthesis, and library preparation according to the NGS technology of choice. Supplementary Table S3, available online at https://academic.oup.com/bib, reports a selection of protocols and NGS platforms that have been used for metatranscriptomic SARS-CoV-2 sequencing. Most studies have employed the Illumina platform, but Oxford Nanopore Technologies (ONT) sequencing has also been exploited for shotgun metatranscriptomics [84]. This was achieved by modifying a protocol designed for sequencing influenza viruses from clinical samples [85]. A sequence-independent single-primer amplification (SISPA) step [68, 86] is employed to meet the requirement of ≥1 μg of cDNA for ONT library preparation. Notwithstanding possible biases introduced by SISPA, this approach allows rapid generation of complete SARS-CoV-2 genome assemblies, even from low amounts of RNA [84]. Pacific Biosciences (PacBio) technology is also suitable for shotgun metatranscriptomics of SARS-CoV-2, although its use has been limited to date (e.g. [23]). The shotgun metatranscriptomics approach was employed in the discovery of SARS-CoV-2 [2-4]. It is, in many senses, the method of choice for sequencing emerging SARS-CoV-2 strains. It requires no prior knowledge of the viral sequence, and avoids the potential effects of divergent regions on capture and amplicon approaches. In principle, shotgun metatranscriptomics can also be used to study the following, in addition to the viral genome:

- viral subgenomic RNAs (derived from discontinuous transcription);
- possible post-transcriptional modifications;
- negative-strand intermediates, depending on the library preparation workflow.
Moreover, when adequate levels of genome coverage are obtained, this approach can provide an accurate evaluation of intra-sample virus variants, arising from quasispecies or coinfections. As previously mentioned, it also allows insights into host gene expression patterns during infection. The major limitation of shotgun metatranscriptomics is the requirement for a high viral load to obtain complete virus assemblies. In addition, a substantially higher sequencing depth (>2 Gb) is required compared with targeted enrichment-based approaches. Viral load varies enormously between clinical specimens, owing both to differences in sampling technique and to inherent differences between patients. The proportion of reads derived from SARS-CoV-2 can vary greatly between samples, even where viral loads (as measured by Ct values) are similar [82, 83, 87]. High coverage can easily be obtained from viral cell cultures, prepared by infecting cell cultures with viruses derived from clinical samples. However, this approach has several drawbacks:

- it is time-consuming and labor-intensive;
- it requires access to a BSL3 laboratory environment (https://www.cdc.gov/coronavirus/2019-ncov/lab/lab-biosafety-guidelines.html);
- it carries the risk of identifying variants of questionable physiological origin (see the previous section on sample collection).
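Simple arithmetic makes the link between viral read fraction and required sequencing depth concrete. Suppose only a fraction f of all sequenced bases derive from SARS-CoV-2. Reaching a mean coverage c over a genome of length G then requires about c × G / f bases in total. The sketch assumes a genome length of about 30 000 nt and uniform coverage along the genome. The function name and example fractions are illustrative, not values taken from the cited studies.

```python
GENOME_LENGTH = 30_000  # approximate SARS-CoV-2 genome length in nucleotides


def total_bases_needed(target_coverage, viral_fraction, genome_length=GENOME_LENGTH):
    """Total sequencing output (in bases) needed for viral reads to reach
    `target_coverage`, if a fraction `viral_fraction` of all bases is viral.

    Assumes viral reads are spread uniformly along the genome.
    """
    if not 0.0 < viral_fraction <= 1.0:
        raise ValueError("viral_fraction must be in (0, 1]")
    return target_coverage * genome_length / viral_fraction


# Illustrative viral fractions, from a purely viral library down to 0.01% viral bases.
for fraction in (1.0, 0.01, 0.0001):
    gigabases = total_bases_needed(30, fraction) / 1e9
    print(f"viral fraction {fraction:g}: ~{gigabases:.4g} Gb for 30x coverage")
```

With one viral base in ten thousand, the requirement (about 9 Gb for 30x) already exceeds the >2 Gb figure cited above. This helps explain why samples with low viral loads are poorly suited to the metatranscriptomic approach.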

Amplicon-based sequencing

Amplicon sequencing enables researchers to restrict the scope of their analysis to a limited number or type of sequences of choice. This approach is highly specific, but requires significant a priori knowledge of the sequence to be 'targeted'. Diagnostic RT-PCR tests detect SARS-CoV-2 nucleic acids in clinical specimens using very specific primers that amplify discrete regions of the viral genome. Such tests could be considered a specialized form of amplicon sequencing. Amplicon-based approaches for the sequencing of SARS-CoV-2 adopt an enrichment workflow consisting of first-strand cDNA synthesis followed by genome amplification with multiplex PCRs. The objective is to produce pools of amplicons that cover either the entire length or discrete portions of the viral genome (see Supplementary Table S3 available online at https://academic.oup.com/bib). Several multiplex PCR designs, differing in the number and size of amplicons, have been proposed for SARS-CoV-2. Amplicon sequencing is highly specific and robust to low amounts of RNA and to degraded samples. Less sequencing is also required than for the metatranscriptomic approach, since non-viral reads are rare. Although amplicon sequencing is theoretically convenient and cheap, it presents some limitations that should be considered. Firstly, amplification across the genome can be biased because of differences in primer efficiency or possible variants in primer annealing regions. This can decrease coverage in specific genomic regions (see the V1 version of the ARTIC protocol [88, 89]), and the 5′ and 3′ UTRs may be missed altogether (see Supplementary Table S3 available online at https://academic.oup.com/bib). Both effects lead to an incomplete assembly.
Moreover, since the primers are designed on the reference SARS-CoV-2 genome sequence, this approach may not identify large structural variants. It can also present systematic limitations in the presence of high levels of genomic divergence. The amplicon-based approach is highly dependable for reconstructing the most prevalent genome variant in a viral population. However, a recent study suggests that it provides a highly biased representation of minor allele frequencies compared with metatranscriptomics experiments performed on the same samples [87]. Several commercial kits and non-commercial protocols are available for SARS-CoV-2 amplicon preparation, some of which are tailored to particular NGS platforms (see Supplementary Table S3, available online at https://academic.oup.com/bib, in the Additional Supporting File). Since high sequencing depth is not required, libraries can be sequenced on mid-throughput benchtop platforms (e.g. Illumina NextSeq and MiSeq, Ion Torrent platforms). Additionally, amplicon sequencing can be combined with the short turnaround times of Single Molecule Sequencing (SMS) technologies such as ONT and PacBio. In this form, amplicon sequencing of SARS-CoV-2 can be used for rapid surveillance of transmission chains. An example is the approach adopted by the ARTIC network for real-time monitoring of the COVID-19 outbreak in the United Kingdom [35]. There, a fast, amplicon-based protocol successfully applied to previous viral outbreaks (see https://artic.network/ncov-2019 for a complete list of protocols and methods) was adapted to SARS-CoV-2. Wang et al. [90] established a rapid in-house tiling multiplex PCR protocol for the simultaneous detection and sequencing of several respiratory viruses, which covers a large part of the SARS-CoV-2 genome. The Wang protocol has also been suggested for diagnostic use, as it shows higher sensitivity than approved RT-qPCR tests [90].
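Allele frequencies of the kind compared in [87] are typically derived from per-position base counts in aligned reads. The sketch below shows that computation for a single genome position, assuming the counts have already been extracted from a pileup. The minimum-depth threshold of 100 is an illustrative assumption. No correction for sequencing errors or PCR-induced skew is attempted, so values computed from amplicon data inherit whatever bias the amplification introduced.

```python
def allele_frequencies(base_counts, min_depth=100):
    """Per-base frequencies at one position, or None if depth is below min_depth.

    base_counts: mapping such as {"A": 950, "G": 50} taken from a pileup at that position.
    min_depth is an illustrative threshold, not a recommended value.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too few reads for a reliable estimate
    return {base: n / depth for base, n in base_counts.items() if n > 0}


def minor_allele_frequency(base_counts, min_depth=100):
    """Frequency of the second most common base (0.0 if the position is monomorphic)."""
    freqs = allele_frequencies(base_counts, min_depth)
    if freqs is None:
        return None
    ranked = sorted(freqs.values(), reverse=True)
    return ranked[1] if len(ranked) > 1 else 0.0


print(minor_allele_frequency({"A": 950, "G": 50}))  # 0.05
print(minor_allele_frequency({"C": 500}))           # 0.0
print(minor_allele_frequency({"A": 30, "G": 10}))   # None (depth 40 < 100)
```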
Several SARS-CoV-2 genome sequencing protocols using tiled amplicons are available for the PacBio platform (see https://www.pacb.com/research-focus/microbiology/COVID-19-sequencing-tools-and-resources/). To our knowledge, they have rarely been used so far, although a major study of the introduction and spread of SARS-CoV-2 in the New York City area used both PacBio and Illumina technologies [37]. Studies of environmental specimens show the robustness of amplicon sequencing to degraded and low-concentration RNA. Amplification was followed by sequencing with Ion Torrent [62] or ONT [63] for wastewater samples. Sanger sequencing was used for a patient breathing-air sample and a door handle swab ([64] and J.A. Lednicky, personal communication).
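A simplified layout sketch can illustrate the tiled multiplex PCR designs mentioned above. Consecutive amplicons overlap, and alternating amplicons are assigned to two primer pools so that overlapping products are not generated in the same reaction. This two-pool layout is used, for example, in ARTIC-style schemes. The second function shows why the failure of a single amplicon, for instance due to a variant in a primer-binding site, leaves a gap in coverage. This is a hypothetical sketch with toy parameters. Real primer design also accounts for melting temperature, primer dimers and the positions of the primers themselves. Real schemes also often do not reach the genome ends.

```python
def tile_amplicons(genome_length, amplicon_length, overlap):
    """Tile a genome with overlapping amplicons split into two alternating pools.

    Returns (start, end, pool) tuples with 0-based, end-exclusive coordinates.
    Requiring overlap < amplicon_length / 2 guarantees that amplicons in the same
    pool never overlap each other. The last amplicon is truncated at the genome end.
    """
    if not 0 <= 2 * overlap < amplicon_length:
        raise ValueError("overlap must satisfy 0 <= overlap < amplicon_length / 2")
    step = amplicon_length - overlap
    amplicons, start, index = [], 0, 0
    while True:
        end = min(start + amplicon_length, genome_length)
        amplicons.append((start, end, 1 + index % 2))  # pools alternate 1, 2, 1, ...
        if end == genome_length:
            return amplicons
        start += step
        index += 1


def uncovered_intervals(amplicons, genome_length, failed=frozenset()):
    """Return (start, end) gaps left when the amplicons with indices in `failed` drop out."""
    covered = [False] * genome_length
    for i, (start, end, _pool) in enumerate(amplicons):
        if i not in failed:
            covered[start:end] = [True] * (end - start)
    gaps, gap_start = [], None
    for pos, is_covered in enumerate(covered):
        if not is_covered and gap_start is None:
            gap_start = pos
        elif is_covered and gap_start is not None:
            gaps.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, genome_length))
    return gaps


# Toy example: a 1000-nt "genome" tiled with 400-nt amplicons overlapping by 100 nt.
scheme = tile_amplicons(1000, 400, 100)
print(scheme)  # [(0, 400, 1), (300, 700, 2), (600, 1000, 1)]
print(uncovered_intervals(scheme, 1000, failed={1}))  # [(400, 600)]
```

Losing the middle amplicon exposes exactly the stretch that only it covered. This mirrors the region-specific coverage drops described for early amplicon schemes.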

Hybrid capture-enrichment sequencing

Like amplicon-based sequencing, hybrid capture enables researchers to target only predefined sequences or regions of a genome that are relevant to their specific interests. Target-enrichment strategies using hybrid capture were originally developed for human genomic studies. Their purpose was to enable the rapid and cost-effective sequencing of the exons of protein-coding genes (exome sequencing) [91]. Exome sequencing is still considered the method of choice for the study of genetic variation in protein-coding loci in humans [92]. It achieves a good trade-off between the specificity of amplicon-based enrichment and the sensitivity of shotgun sequencing to different types of genetic variants, at significantly lower cost. Hybrid capture enriches targeted genetic material through hybridization to specific biotinylated probes. This allows a considerably reduced sequencing depth compared with shotgun metatranscriptomics. Libraries can be sequenced on benchtop platforms (e.g. Illumina NextSeq and MiSeq, Ion Torrent). In general, hybrid capture-enrichment methods are based on a larger number of fragments/probes than amplicon-based methods (see Supplementary Table S3 available online at https://academic.oup.com/bib), and provide more complete profiling of the target sequences. Moreover, the capture of target regions is less dependent on perfect complementarity than PCR amplicon generation. Capture by hybridization is therefore generally more robust to genomic variability. Xiao et al. [87] found that hybrid capture sequencing is less sensitive than amplicon-based methods for the sequencing of SARS-CoV-2 genomes. They did not recommend its application to challenging samples with low viral loads. In other studies, however, enrichment by hybridization has been successful even for samples with very low viral loads [93]. Capture-based methods may also offer an unbiased representation of intra-sample variants. Xiao et al. [87] reported high levels of concordance between the allele frequency distributions estimated by shotgun metatranscriptomics and by hybrid capture on the same sample. The SARS-CoV-2 genome capture-enrichment workflow developed by Illumina is noteworthy. It includes probes for the simultaneous detection of SARS-CoV-2 and other respiratory viruses (see Supplementary Table S3, available online at https://academic.oup.com/bib, of Additional Supporting Material).
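Concordance between allele frequencies estimated by two library preparations on the same sample can be summarized over the sites reported by both. Two simple summaries are a correlation coefficient and a mean absolute difference. The sketch below computes both, assuming allele frequencies keyed by hypothetical (position, alternative base) tuples. It is one possible summary chosen for illustration, not the procedure used in [87], and the example frequencies are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def compare_allele_frequencies(afs_a, afs_b):
    """Summarize agreement between allele frequencies from two library preparations.

    afs_a, afs_b: dicts mapping a site key, e.g. (position, alt_base), to a frequency
    in [0, 1]. Only sites present in both dicts are compared.
    """
    shared = sorted(set(afs_a) & set(afs_b))
    xs = [afs_a[site] for site in shared]
    ys = [afs_b[site] for site in shared]
    r = pearson(xs, ys)  # raises ValueError if fewer than 2 shared sites
    mean_abs_diff = sum(abs(x - y) for x, y in zip(xs, ys)) / len(shared)
    return {"n_sites": len(shared), "pearson_r": r, "mean_abs_diff": mean_abs_diff}


# Made-up frequencies for one sample, for illustration only.
metatranscriptomic = {(100, "T"): 0.98, (200, "G"): 0.97, (300, "A"): 0.12, (400, "C"): 0.45}
capture = {(100, "T"): 0.96, (200, "G"): 0.99, (300, "A"): 0.15, (500, "T"): 0.05}
print(compare_allele_frequencies(metatranscriptomic, capture))
```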

Direct RNA sequencing

The strategies described above all require reverse transcription of RNA, and a greater or lesser degree of manipulation of nucleic acids prior to library construction. This can result in the loss of information, including post-transcriptional modifications and an accurate representation of transcript stoichiometry. SMS is a relatively recent development in sequencing technologies. It allows the sequence of single nucleic acid molecules to be determined directly, without amplification and, in some cases (e.g. direct RNA sequencing by ONT), without reverse transcription. SMS technologies usually provide longer reads than 'classic' NGS methods, but with reportedly higher error rates [94]. The direct RNA sequencing protocol developed by ONT potentially permits the detection of post-transcriptional modifications (see the following section). Additionally, thanks to their long reads, these technologies can provide very accurate reconstructions of single mature and precursor transcripts. They can also resolve complex transcriptional patterns such as those taking place during coronavirus infection (recombination, alternative transcript maturation, rare transcriptional isoforms, etc.). In a recent study, Kim et al. [95] combined ONT direct RNA sequencing with DNA nanoball sequencing to obtain a complete representation of the SARS-CoV-2 transcriptome and epitranscriptome (see section below). They used RNA from SARS-CoV-2-infected cultures and SARS-CoV-2 RNA fragments produced by in vitro transcription.

SARS-CoV-2 transcriptome and epitranscriptome

Current large-scale SARS-CoV-2 transcriptome investigations, mostly based on ONT direct RNA sequencing and DNA nanoball sequencing, have confirmed that transcription in SARS-CoV-2 is a discontinuous and highly controlled process (Figure 1B). During the synthesis of subgenomic negative-strand RNA, a template switch adds a copy of the leader sequence [27, 28, 95]. Counting RNA-seq reads spanning template-switch sites allows quantification of individual sgmRNAs [96]. Bulk and single-cell RNA-seq data from infected human cell lines have revealed hierarchies of viral and host gene expression over time that appear to be linked to innate antiviral responses [96]. Epitranscriptome modifications may play relevant roles in host–virus interactions [97, 98]. These include transient changes such as N6-methyladenosine (m6A) and 5-methylcytosine (5mC), and non-transient changes such as RNA editing. ONT direct RNA sequencing of SARS-CoV-2-infected Vero cells revealed an 'AAGAA-like' motif enriched in the 3′ region of the viral genome, which is strongly associated with probable post-transcriptional modifications [95]. Putative post-transcriptional modifications are more frequent in longer viral transcripts and are associated with shorter poly(A) tails, indicating an involvement in the control of viral RNA stability [95]. Consistent patterns of 5mC have been detected in HCoV-229E-infected cells by ONT direct RNA sequencing [99]. RNA-seq and metatranscriptome sequencing of SARS-CoV-2-infected cell lines and clinical samples have shown strong signatures of A-to-I and C-to-U RNA editing. These are likely mediated by ADAR and APOBEC enzymes, respectively [100, 101]. Interestingly, computational analyses of RNA-seq data from infected human cell lines detected A-to-I hyper-edited regions distributed along the entire viral genome, responsible for multiple nonsynonymous changes [101].
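Counting reads that span leader–body junctions can be sketched as follows. A split alignment whose first segment ends within the 5′ leader, and whose second segment starts just upstream of an annotated ORF, is assigned to that ORF's sgmRNA. The ORF start coordinates below are those we recall from the NC_045512.2 reference annotation and should be verified before use. The leader cut-off (LEADER_END) and the maximum junction-to-ORF distance (MAX_GAP) are illustrative assumptions. Extracting split alignments from real alignment files is not shown, and the example junctions are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# 1-based ORF start coordinates on reference NC_045512.2, as recalled from its
# annotation (assumption: verify against the GenBank record before use).
ORF_STARTS = [
    (21563, "S"), (25393, "ORF3a"), (26245, "E"), (26523, "M"),
    (27202, "ORF6"), (27394, "ORF7a"), (27894, "ORF8"), (28274, "N"),
]
STARTS = [start for start, _ in ORF_STARTS]
LEADER_END = 80  # assumption: leader is ~67-72 nt, plus a small tolerance
MAX_GAP = 200    # assumption: max distance from junction to the downstream ORF start


def assign_sgmrna(leader_end, body_start):
    """Return the ORF whose sgmRNA a leader-body split read supports, or None.

    leader_end: last reference position of the read's 5' (leader) segment.
    body_start: first reference position of the read's downstream (body) segment.
    """
    if leader_end > LEADER_END:
        return None  # 5' segment does not end in the leader: not a leader-body junction
    i = bisect_left(STARTS, body_start)  # first ORF starting at or after the junction
    if i == len(STARTS) or STARTS[i] - body_start > MAX_GAP:
        return None  # no annotated ORF just downstream: treat as non-canonical
    return ORF_STARTS[i][1]


def count_sgmrnas(junctions):
    """Count canonical sgmRNA junction reads from (leader_end, body_start) pairs."""
    counts = Counter()
    for leader_end, body_start in junctions:
        orf = assign_sgmrna(leader_end, body_start)
        if orf is not None:
            counts[orf] += 1
    return counts


# Hypothetical junction coordinates, as might be parsed from split alignments.
example = [(70, 28259), (72, 28259), (70, 21556), (500, 28259), (70, 10000)]
print(count_sgmrnas(example))
```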

Data analysis, deposition and access

Guidelines for the generation of SARS-CoV-2 genome assemblies

Since the genome of SARS-CoV-2 is relatively compact and does not contain any large repetitive sequences, the assembly of the viral genome is per se a relatively straightforward process. Provided that the sequencing reaction offers a complete and accurate representation of the genome, any state-of-the-art method for the assembly of NGS data, whether based on Overlap Layout Consensus, on de Bruijn graphs or, more generally, on reference-based assembly (see [102, 103] for an up-to-date review), should be capable of producing highly contiguous and accurate assemblies. Since a theoretical coverage of 30x is generally considered sufficient to generate a high-quality assembly, SARS-CoV-2 genomes, of approximately 30 kb, should be tractable with as little as a megabase of sequencing data. However, depending on the sequencing platform and, most importantly, on the sequencing strategy, different considerations may apply. In principle, data obtained from library preparation methods based on targeted enrichment, such as hybrid capture and amplicon sequencing, should be highly enriched for viral genomic reads. Nonetheless, variable levels of ‘contaminant’ sequences have been reported [104]. Moreover (see below), these strategies often generate uneven genome coverage, which can confound several assemblers. Data derived from metagenomics sequencing protocols tend to provide more uniform coverage, but the proportion of viral reads can vary depending (although not linearly) on the viral load of the sample. Moreover (see above), these data might also contain reads derived from viral subgenomic RNAs and replication intermediates.
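The megabase estimate follows from the expected theoretical coverage, C = N × L / G, where N reads of length L are spread over a genome of size G. The small sketch below assumes the 29 903 nt length of the Wuhan-Hu-1 reference (NC_045512.2) and adds a hypothetical `viral_fraction` parameter to show how off-target reads, as in metagenomics data, increase the amount of sequencing required.

```python
GENOME_LENGTH = 29_903  # nt, Wuhan-Hu-1 reference genome (NC_045512.2)


def bases_needed(target_depth, genome_length=GENOME_LENGTH):
    """Sequenced bases required for a given mean theoretical depth (C * G)."""
    return target_depth * genome_length


def expected_depth(n_reads, read_length, viral_fraction=1.0,
                   genome_length=GENOME_LENGTH):
    """Mean theoretical depth C = N * L / G, counting only viral reads.

    viral_fraction is a hypothetical share of on-target (viral) reads: close to 1
    for ideal targeted libraries, potentially far lower for metagenomics.
    """
    return n_reads * read_length * viral_fraction / genome_length


print(bases_needed(30))                                            # 897090 bases, under 1 Mb
print(round(expected_depth(2_000_000, 150, viral_fraction=0.01)))  # 100
```

With only 1% viral reads, two million 150 nt reads still yield roughly 100x, but the same run with 0.1% viral reads would fall to about 10x, below the 30x rule of thumb.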
Although highly efficient software tools for the assembly of metagenomics reads are currently available [105], this process is in general considerably more complex and computationally intensive than the assembly of a single genome, and can be confounded by several factors, including the relative abundance of different species/transcripts in the sample. For these reasons, we strongly suggest that ‘non-viral’ reads be filtered out prior to assembly, a step that can also be beneficial in the assembly of reads derived from targeted sequencing approaches. Simple similarity filters can be applied by mapping the complete collection of reads against the reference genome assembly of SARS-CoV-2 and retaining only SARS-CoV-2-like reads. However, to avoid the systematic loss of reads at polymorphic loci, relatively relaxed similarity filters should be used. For shotgun metagenomic libraries, prior alignment of SARS-CoV-2-like reads to the reference genome can also be useful for identifying and filtering reads, or pairs of reads, derived from subgenomic mRNAs, by excluding reads with discontiguous mappings or read pairs mapping at an aberrant distance (with respect to the insert size) on the genome. In a similar vein, filtering of PCR duplicates can help to obtain a more uniform coverage profile of the genome, particularly for libraries derived from targeted enrichment. If the aim of the study is to obtain an accurate representation of the genomic sequence of a novel strain of SARS-CoV-2, de novo assembly should always be preferred to reference-guided assembly methods, as this type of approach is in general more sensitive to possible (although unlikely) large-scale rearrangement events [106, 107].
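The filtering steps described above can be sketched in Python as follows. The `Alignment` record is a hypothetical, simplified stand-in for what a SAM/BAM parser would provide (mapping status, identity, number of aligned segments, mate position, insert size), and the default thresholds (`min_identity=0.90`, `max_insert=1000`) are illustrative assumptions rather than recommended values.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Alignment:
    """Hypothetical, simplified view of one read aligned to the SARS-CoV-2 reference."""
    name: str
    mapped: bool
    identity: float                    # fraction of aligned bases matching the reference
    n_blocks: int                      # number of aligned segments; >1 means split mapping
    start: int                         # leftmost reference position of the alignment
    reverse: bool                      # True if aligned to the reverse strand
    mate_start: Optional[int] = None   # mate position for paired reads
    insert_size: Optional[int] = None  # observed template length for paired reads


def keep_viral(aln: Alignment, min_identity: float = 0.90, max_insert: int = 1000) -> bool:
    """Relaxed similarity filter plus heuristics against sgmRNA-derived reads."""
    if not aln.mapped or aln.identity < min_identity:
        return False  # non-viral, or too divergent even for a relaxed filter
    if aln.n_blocks > 1:
        return False  # discontiguous mapping: possible leader-body junction
    if aln.insert_size is not None and abs(aln.insert_size) > max_insert:
        return False  # mates at an aberrant distance on the genome
    return True


def drop_duplicates(alns: Iterable[Alignment]) -> Iterator[Alignment]:
    """Keep the first read per (start, strand, mate start): a coordinate proxy for PCR duplicates."""
    seen = set()
    for aln in alns:
        key = (aln.start, aln.reverse, aln.mate_start)
        if key not in seen:
            seen.add(key)
            yield aln


def filter_reads(alns: Iterable[Alignment], **thresholds) -> List[Alignment]:
    """Apply the viral filter, then coordinate-based duplicate removal."""
    return list(drop_duplicates(a for a in alns if keep_viral(a, **thresholds)))
```

Coordinate-based deduplication of this kind suits shotgun and hybrid-capture libraries. In amplicon libraries, reads share primer-defined ends, so the same rule would discard most genuine reads.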
However, reference-guided approaches, or approaches based on variant calling, may provide a clear advantage if the objective of the study is to obtain a fine-grained representation of the viral population in a sample, including rare variants, or to study viral quasi-species. In such cases, a VCF file reporting the occurrence and the frequency of all the genetic variants observed in a sample is probably the most relevant output that should be generated and provided. In this respect, it should be noted that, in the presence of co-infection by more than one viral strain, de novo assembly of viral genomes based on short (second generation) NGS reads cannot provide an accurate reconstruction of the different viral haplotypes, whereas, by virtue of their greater length, long SMS reads should in principle make this possible.
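As a minimal illustration of this kind of output, the sketch below turns per-position base counts (for example, from a pileup of reads against the reference) into VCF-style records that carry depth (DP) and allele frequency (AF) in the INFO field. The `min_depth` and `min_af` thresholds are hypothetical. Only single-nucleotide variants are handled and no error model is applied, so at this level low-frequency calls cannot be distinguished from sequencing errors. Dedicated low-frequency variant callers remain the appropriate tool in practice.

```python
from typing import Dict, List


def call_variants(ref_seq: str, counts: Dict[int, Dict[str, int]],
                  min_depth: int = 50, min_af: float = 0.02,
                  chrom: str = "NC_045512.2") -> List[str]:
    """Return minimal VCF lines (SNVs only) with depth (DP) and allele frequency (AF).

    ref_seq: reference sequence; counts: {1-based position: {base: read count}}.
    """
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">',
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for pos in sorted(counts):
        base_counts = counts[pos]
        depth = sum(base_counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate frequencies
        ref = ref_seq[pos - 1]
        alts = sorted(b for b, n in base_counts.items()
                      if b != ref and n / depth >= min_af)
        if not alts:
            continue  # only reference (or sub-threshold) bases observed
        af = ",".join(f"{base_counts[b] / depth:.3f}" for b in alts)
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{','.join(alts)}\t.\tPASS\tDP={depth};AF={af}")
    return lines


ref = "ACGTACGTAC"  # toy reference sequence
pileup = {2: {"C": 95, "T": 5}, 4: {"T": 100}, 6: {"C": 10}, 8: {"T": 60, "G": 40}}
for line in call_variants(ref, pileup):
    print(line)
```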

Currently available resources and guidelines for data deposition

Particularly during the current pandemic, timely deposition of available information and straightforward access to open data are essential and enabling elements for implementing effective mitigation strategies, supporting pharmaceutical and vaccine development, and understanding the disease and its effects [46, 108]. Careful curation and deposition of SARS-CoV-2 sequencing data and associated metadata has profound implications both for epidemiological studies and for enabling extensive association studies and, in future, follow-up studies [109]. While the first half of 2020 has seen a boom in the release of COVID-19-related scientific manuscripts, questions have been raised concerning the quantity and quality of data sharing [110, 111]. However, the pandemic has also seen a renewed effort by open-data-aware scientific entities and communities towards the dissemination of best practices and recommendations for COVID-19 data sharing (e.g. [112]) and analysis (e.g. [113]), and for the effective coordination of national scientific infrastructures (e.g. [46]). At present, the GISAID [41] EpiCoV portal represents the most widely used repository of SARS-CoV-2 genomic data. It provides a collection of over 100 000 complete SARS-CoV-2 genomes, isolated in over 80 countries (data collected on 25 September 2020). Limited metadata, including the type of sample, the sequencing technology and the sequencing protocol, are associated with each viral genome, and basic clinical annotations, such as patient status (e.g. hospitalized or released), are available for a subset of ~5000 genomes. Other potentially important patient information (e.g. gender, age) is not collected systematically. Although data in EpiCoV are publicly accessible, users must register and agree not to redistribute data to third parties, data use is limited to research purposes, raw sequencing data cannot be deposited, and programmatic access is not available.
For these reasons, we welcome recommendations, such as those from the Research Data Alliance [112], that, in addition to GISAID, SARS-CoV-2 genomes and sequencing data should be submitted to repositories that comply more closely with the FAIR principles [47]. In particular, raw and processed viral sequence data should be made available in one of the International Nucleotide Sequence Database Collaboration (INSDC) [114] repositories. Gene expression data should be deposited in ArrayExpress [115] or Gene Expression Omnibus [116], while EGA [117] and the GWAS Catalog [118] should be the choice for genome association data. We underline that human genetic data must always be managed in compliance with applicable laws and regulations and, where possible, made available through dedicated secure repositories such as EGA and dbGaP [119]. It should also be noted that, for all omics data types, careful adherence to relevant metadata standards is essential for maximizing the utility and future reusability of datasets [120].

Development and reporting of computational methods

As with the reporting and availability of raw data and metadata, the reproducibility of bioinformatics analyses and workflows is a crucial issue in modern biology [121]. For this reason, we strongly recommend that all tools and workflows used in the analysis of COVID-19 data be made readily available through dedicated infrastructures and repositories. In this respect, the set of best practices and principles outlined in [122] represents an excellent guideline for software developers and bioinformaticians developing and applying software tools for COVID-19 data. However, these considerations extend to the analysis of clinical microbiology data in general. Highly curated catalogs of bioinformatics software and applications, such as https://bio.tools/ [123], are important resources for the discovery and advertising of novel bioinformatics methods. Moreover, the use of well-established workflow managers, for example those provided by the Galaxy platform [124] or the Microreact [125] portal, can foster collaborative data analysis and the development of standard operating protocols and pipelines. Finally, deposition of software tools and methods in specialized repositories developed specifically for the COVID-19 community, for example the OpenAIRE COVID-19 gateway [126], links relevant expertise and know-how and can greatly improve discussion within the COVID-19 bioinformatics community, further facilitating the development of new software and methods. All in all, as for the sharing and integration of data and metadata, a wealth of repositories and platforms is already available for sharing and integrating software tools and methods. We strongly believe that the promotion of best practices in software development and usage will be critical in the fight against COVID-19.

Secondary analysis of the data and specialized repositories

Notwithstanding relevant limitations in the type and extent of data shared at different levels by the SARS-CoV-2 research community [127], and the need for more thorough and systematic sharing of primary data, many dedicated computational infrastructures have been established to facilitate access to and retrieval of COVID-19 omics data. By allowing a smooth integration of different types of data, these platforms have greatly facilitated complex meta-analyses, including the monitoring of adaptive evolution in the SARS-CoV-2 genome and fine-grained tracking of the prevalence of different viral strains in different geographic regions. In this respect, the system developed by Korber et al. [44] for the identification of emerging mutations in the S protein of SARS-CoV-2 is probably one of the most remarkable examples. In brief, by monitoring the prevalence of different missense substitutions in the S protein, the authors observed a systematic increase in the prevalence of a specific amino acid substitution, D614G, at the regional level in distinct geographic locations. Retrospective analyses of viral loads, as measured by Ct values, indicated a relatively modest, but statistically significant, increase in viral load (decrease in Ct) in patients infected by viruses carrying the D614G haplotype. This suggests a likely association of this variant with increased infectivity. However, no relevant differences were observed in the severity of the symptoms manifested by the patients. While a detailed discussion of the functional relevance of the D614G substitution lies outside the scope of this review (we refer readers to [45] for a more detailed discussion), we would like to underline the importance of this and similar approaches for the generation of testable biological hypotheses and for monitoring the evolution of SARS-CoV-2.
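The core bookkeeping behind this kind of surveillance is a tabulation of allele prevalence by region and time period. The sketch below assumes that each genome has already been reduced to a hypothetical (region, period, residue at spike position 614) tuple. It computes only proportions and does not reproduce the statistical tests used by Korber et al. [44].

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def g614_prevalence(records: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], float]:
    """Share of genomes with G at spike position 614, per (region, period).

    records: (region, period, residue) tuples, where residue is the amino acid
    observed at spike position 614 in one genome ("D" or "G").
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    g614: Dict[Tuple[str, str], int] = defaultdict(int)
    for region, period, residue in records:
        if residue not in ("D", "G"):
            continue  # skip missing or ambiguous calls
        totals[(region, period)] += 1
        if residue == "G":
            g614[(region, period)] += 1
    return {key: g614[key] / n for key, n in sorted(totals.items())}


# Invented toy records: region "A" shifts towards G614 between two periods.
toy = [("A", "2020-03", "D"), ("A", "2020-03", "G"),
       ("A", "2020-04", "G"), ("A", "2020-04", "G"), ("A", "2020-04", "D"),
       ("A", "2020-04", "X"), ("B", "2020-03", "D")]
print(g614_prevalence(toy))
```

A consistent rise of this proportion across consecutive periods in several independent regions is the pattern described above. Establishing that such a rise is not driven by sampling or founder effects requires the additional analyses reported in [44].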
The Nextstrain [128] and Hyphy COVID-19 [129] portals are further notable examples of highly flexible and interactive systems for real-time monitoring of the evolution of SARS-CoV-2 strains. By providing real-time information on the worldwide distribution of different clades and lineages of SARS-CoV-2 (Nextstrain) and detailed phylogenetic analyses of SARS-CoV-2 protein-coding genes (Hyphy), these systems provide, respectively, a one-stop shop for monitoring the prevalence of SARS-CoV-2 strains worldwide and for identifying amino acid residues that are possibly under selection. In this respect, we stress that any initiative aiming to apply well-established standards and protocols for the sharing of SARS-CoV-2 genetic/genomic data, such as the application or adaptation of the Beacon [130] protocol, as available from [131], should be fully supported by the SARS-CoV-2 research community. Finally, we emphasize the importance of developing highly curated resources and databases that allow the seamless integration of different types of data and/or the execution of complex queries, which could represent an important added value for data mining and meta-analyses, as exemplified by [132]. By allowing the seamless and rapid integration of different types of data and metadata, these and similar resources can, at least in part, mitigate some of the most important limitations to rapid and widespread access to COVID-19 data.
The EBI COVID-19 Data Portal (https://www.covid19dataportal.org/) and the equivalent SARS-CoV-2 resource portal at the NCBI (https://www.ncbi.nlm.nih.gov/sars-cov-2/) probably provide the most complete catalog of resources to navigate, access and retrieve SARS-CoV-2 data from open access repositories, including bioinformatics tools and online resources. The ViPR portal [133] is an integrated system that facilitates the retrieval of SARS-CoV-2 genomic sequence data and provides access to a set of sophisticated tools for detailed comparative genomic analyses. COV3D [134] is a centralized resource for spike and other coronavirus protein structures, which provides effective yet simple tools for visualizing protein structures, along with annotation of relevant functional elements or genomic variants. The Galaxy Europe server [135] incorporates a highly curated collection of tools and expert-made workflows for the analysis of COVID-19 data, along with pointers to many relevant datasets. Portals and resources for sharing COVID-19-related knowledge are not limited to bioinformatics methods and applications, but also include sites that disseminate wet-lab and sequencing protocols. The open source protocols.io portal (https://www.protocols.io/) provides access to a collection of more than 150 wet-lab and in silico protocols for the generation, handling and deposition of SARS-CoV-2 data in public repositories. Similar initiatives at the national level, e.g. the COVID-19 Genomics UK Consortium page (https://www.cogconsortium.uk) and COVID-19 Data Portal Sweden (https://www.covid19dataportal.se/), or those made available by Research Infrastructures, such as the ELIXIR COVID-19 support page (https://elixir-europe.org/services/covid-19), provide pointers to a wealth of resources including guidelines, protocols, best practices, data analysis tools and computational platforms. Similarly, a detailed list of lab protocols, bioinformatics methods and primary repositories of SARS-CoV-2 sequencing data is also provided through a publicly accessible GitHub repository by the US Centers for Disease Control and Prevention (https://github.com/CDCgov/SARS-CoV-2_Sequencing).

Data integration and exploratory analyses of currently available data

Although the aforementioned resources provide access to a wealth of sequencing data and metadata for SARS-CoV-2, their integration is not straightforward. Exploratory analyses of currently available genomic sequences, as obtained from three of the most popular resources for SARS-CoV-2 genome data, COG-UK [35], GISAID EpiCoV [41] and the NCBI Virus portal [67], highlight apparent inconsistencies between databases. For example, analyses of strain identifiers and available metadata suggest that, of the more than 100 000 genomes currently available in GISAID EpiCoV, 22 599 are derived from the COG-UK database. However, these assemblies do not represent the entirety of COG-UK, which currently contains over 48 000 sequences. Similarly, only about 10% (1695 out of 17 106) of the genomic assemblies contained in the NCBI Virus database can be linked directly or indirectly (through strain identifiers or BioSample metadata) to sequences also deposited in GISAID EpiCoV. At present, even establishing the level of overlap between data stored in different repositories is challenging.
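One reason overlap is hard to measure is that the same isolate is often named differently across repositories. The sketch below is a hypothetical heuristic, not the procedure used to obtain the figures above. It strips two naming prefixes commonly seen in strain names (`hCoV-19/` in GISAID and `SARS-CoV-2/human/` in NCBI Virus), lowercases the result, and then intersects the identifier sets. The example identifiers are invented.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical normalization rule: drop naming prefixes commonly seen in GISAID
# ("hCoV-19/") and NCBI Virus ("SARS-CoV-2/human/") strain names, then lowercase.
_PREFIX = re.compile(r"^(hcov-19/|sars-cov-2/human/)", re.IGNORECASE)


def normalize_strain(name: str) -> str:
    """Canonical form of a strain name for cross-repository matching."""
    return _PREFIX.sub("", name.strip()).lower()


def overlap(db_a: Iterable[str], db_b: Iterable[str]) -> Tuple[Set[str], float]:
    """Normalized identifiers shared by two repositories, and the share of db_b linked to db_a."""
    a = {normalize_strain(n) for n in db_a}
    b = {normalize_strain(n) for n in db_b}
    shared = a & b
    return shared, (len(shared) / len(b) if b else 0.0)


# Invented identifiers, for illustration only.
gisaid = ["hCoV-19/England/ABCD-123/2020", "hCoV-19/USA/CA-1/2020"]
ncbi = ["SARS-CoV-2/human/USA/CA-1/2020", "SARS-CoV-2/human/ITA/XYZ/2020"]
print(overlap(gisaid, ncbi))
```

Real data also require handling of country-code conventions, laboratory-specific suffixes and links through BioSample accessions, which this string-level rule ignores.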
Currently, INSDC repositories collectively provide access to more than 65 000 distinct depositions of raw sequencing data for SARS-CoV-2. Of these, 43 577 can be associated with a genome assembly. As outlined in Table 2, the majority of these raw sequencing data records (37 279) have been deposited by the COG-UK Consortium and result from the application of the ARTIC amplicon protocol combined with either Illumina (21 142) or Nanopore (16 137) sequencing. The remaining data offer a less biased representation of the approaches used to sequence SARS-CoV-2, and include metatranscriptomics libraries (1987 distinct depositions), amplicon libraries (3843) and a small number (468) of libraries based on hybrid capture protocols.

Table 2

Summary statistics of methods applied in the sequencing of SARS-CoV-2

Library preparation	Sequencing technology	Records	Notes
Amplicon	Illumina	24 311	21 142 from COG-UK (ARTIC)
	Oxford Nanopore	16 811	16 137 from COG-UK (ARTIC)
Hybrid capture	Illumina	468
Metatranscriptomics	Illumina	1987

Data are derived from records in INSDC public databases for which an associated genome assembly is available.
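The totals quoted in the text can be recovered from Table 2. The short Python tally below simply re-adds the table entries: the overall number of records, the COG-UK ARTIC subset and the amplicon libraries deposited by other groups.

```python
# Records per (library preparation, sequencing technology), copied from Table 2.
TABLE2 = {
    ("Amplicon", "Illumina"): 24_311,
    ("Amplicon", "Oxford Nanopore"): 16_811,
    ("Hybrid capture", "Illumina"): 468,
    ("Metatranscriptomics", "Illumina"): 1_987,
}
# COG-UK (ARTIC) records, from the Notes column of Table 2.
COG_UK_ARTIC = {"Illumina": 21_142, "Oxford Nanopore": 16_137}

total = sum(TABLE2.values())
cog_total = sum(COG_UK_ARTIC.values())
amplicon_total = sum(n for (prep, _tech), n in TABLE2.items() if prep == "Amplicon")
non_cog_amplicon = amplicon_total - cog_total

print(total, cog_total, non_cog_amplicon)  # 43577 37279 3843
```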

Although these data provide an incomplete representation of sequencing protocols and strategies, visualization of their respective outputs and of the completeness of the associated genome assemblies offers some relevant observations. As outlined in Figure 2A, metatranscriptomics and hybrid capture approaches seem to provide, on average, more complete representations of the SARS-CoV-2 genome. For amplicon-based sequencing, ONT assemblies tend to be slightly more complete than those obtained with Illumina sequencing technologies. As shown in Figure 2B, the quantity of data generated by each sequencing approach for which raw data depositions are available is in line with expectations: metatranscriptomics sequencing datasets typically contain on the order of 10x more reads than those from targeted sequencing approaches. Interestingly, metatranscriptomics libraries show a highly uniform profile of genome coverage (Figure 2C), although a considerable reduction in coverage is observed at both ends of the genome, in particular at the 3′ UTR, where 53% of assemblies are incomplete. Hybrid capture-based methods also provide relatively uniform and reproducible coverage. Finally, amplicon-based approaches generally provide a more skewed coverage of the genome, with spikes in coverage corresponding to the overlaps between different amplicons.

Figure 2

Overview of the properties of different approaches for SARS-CoV-2 genome sequencing. (A) Violin plot of the size of SARS-CoV-2 genome assemblies obtained through different sequencing approaches. Assembly size in Knt (kilonucleotides) is reported on the x-axis. (B) Violin plot of the sequencing depth (log10 of the total number of sequenced bases) obtained by different sequencing approaches. (C) Profile of normalized coverage levels of the genome of SARS-CoV-2 as obtained from different sequencing approaches. Coverage profiles were calculated on 300 non-overlapping genomic windows of 100 nt. A subset of 100 distinct records available from public repositories of raw sequencing data was used to estimate the coverage profile of each sequencing approach. Coverage values were normalized using upper quartile normalization and averaged for every data point (genomic window).
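The coverage-profile procedure described in the caption of Figure 2C can be sketched as follows. The caption does not state how coverage was summarized within each window or over which values the upper quartile was computed. This sketch therefore assumes mean per-base depth per window and a per-record upper quartile taken over that record's 300 window values. The input is a hypothetical per-position depth list for each record.

```python
from statistics import mean, quantiles
from typing import List, Sequence

WINDOW = 100     # nt per window, as in the Figure 2 caption
N_WINDOWS = 300  # 300 x 100 nt spans the ~29.9 kb genome


def window_coverage(depth: Sequence[float]) -> List[float]:
    """Mean per-base depth in consecutive non-overlapping windows (index 0 = position 1)."""
    out = []
    for w in range(N_WINDOWS):
        chunk = depth[w * WINDOW:(w + 1) * WINDOW]
        out.append(mean(chunk) if chunk else 0.0)  # the last window may be partial
    return out


def upper_quartile_normalize(values: List[float]) -> List[float]:
    """Divide every window by the 75th percentile of that record's window values."""
    uq = quantiles(values, n=4)[2]
    return [v / uq for v in values] if uq > 0 else [0.0] * len(values)


def coverage_profile(depth_tracks: List[Sequence[float]]) -> List[float]:
    """Average the UQ-normalized window coverage across records, one value per window."""
    normalized = [upper_quartile_normalize(window_coverage(d)) for d in depth_tracks]
    return [mean(column) for column in zip(*normalized)]


# Toy records: flat coverage, and coverage present only on the second half of the genome.
flat = [10] * 29_903
half = [0] * 15_000 + [100] * 14_903
profile = coverage_profile([flat, half])
print(profile[0], profile[-1])
```

Upper quartile scaling makes records with very different sequencing depths comparable while being less sensitive than total-count scaling to a few extreme windows, such as amplicon-overlap spikes.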

Conclusions

In recent decades, significant policy attention has focused on the need to identify and limit emerging outbreaks that might lead to pandemics and to expand and sustain investment to build preparedness and health capacity [136-139]. In this context, ultra-rapid and cost-effective methods for the reconstruction of the genomic sequences of emerging pathogens represent important tools for monitoring and countering the spread of novel human infectious diseases, as exemplified by recent experience with SARS, MERS, Zika and Ebola [30-34]. NGS methods have been rapidly adapted to the SARS-CoV-2 paradigm and shown to be applicable to a wide variety of associated biological questions [35, 69, 82, 88, 90, 93, 99]. The rate of data production and analysis has been unprecedented and would have been inconceivable only a few years ago. In just a few months, genome sequence data have allowed the reconstruction of the probable time of spillover of SARS-CoV-2 into the human population [140-142], the development of systems for the classification of viral strains, which have been fundamental for monitoring the spread of the virus [41, 140, 142], and the identification of sites in the genome of SARS-CoV-2 that might be under the influence of various selective pressures [129, 143]. High-throughput transcriptomics has provided novel mechanistic insights into SARS-CoV-2 gene expression, the stoichiometry of the viral gene products, and possible molecular mechanisms of regulation of viral gene expression, including post-transcriptional modifications [95, 100, 101]. Several authors have already highlighted genetic variants in the SARS-CoV-2 genome that could possibly be linked with increased/decreased virulence or adaptation to human hosts [43-45]. The integration of host and virus genome-wide variant information, ideally with other clinical, demographic and social parameters, might both provide mechanistic hints and add predictive value for clinical outcomes.
However, substantial numbers of individuals need to be incorporated in association studies to obtain the required statistical power. Indeed, notwithstanding some remarkable initiatives [144, 145], few large-scale association studies on COVID-19 have been presented so far. Here, we have attempted to provide a concise summary of the relative merits and applications of different sequencing strategies and platforms for SARS-CoV-2-related applications, emphasizing the considerations that should be borne in mind when establishing an experimental pipeline. A wealth of databases and resources providing access to SARS-CoV-2 sequence data is already available. However, to maximize their utility, associated raw data and metadata (which must be as extensive as possible, presented in standard formats and ideally available through FAIR-compliant databases) are critical elements. Importantly, highly curated resources for the secondary analysis of the data and the integration of different types of metadata are already available, which can greatly facilitate complex meta-analyses and/or retrospective cross-sectional studies. The challenge of fully exploiting this ongoing deluge of COVID-19-related sequence data lies ahead. It is clear that an equally unprecedented, widespread acceptance of data standards will be required to fully capitalize on the productivity already attained in data production. Availability and integration of (in many cases) publicly funded data are fundamental for open science and the progress of humanity at the best of times, but time is currently short. Winter is coming.

The application of ‘omics technologies’ to SARS-CoV-2 has been fundamental in epidemiological and other aspects of the fight against COVID-19. Different approaches, with different advantages and limitations, can be applied to the sequencing of SARS-CoV-2 genomes. Various considerations should influence the choice of approach in different clinical and research contexts. While more than 100 thousand complete SARS-CoV-2 genomes are currently available in public repositories, the integration of these data and of associated metadata is, at present, problematic. Coordinated efforts are required to promote the principles of open science and data sharing in order to facilitate more efficient and comprehensive analyses of SARS-CoV-2 data.
