Matteo Chiara, Anna Maria D'Erchia, Carmela Gissi, Caterina Manzari, Antonio Parisi, Nicoletta Resta, Federico Zambelli, Ernesto Picardi, Giulio Pavesi, David S Horner, Graziano Pesole.
Abstract
Various next-generation sequencing (NGS) based strategies have been used successfully in the recent past to trace the origins and understand the evolution of infectious agents, to investigate the spread and transmission chains of outbreaks, to facilitate the development of effective and rapid molecular diagnostic tests, and to contribute to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and for devising effective measures and strategies to mitigate the diffusion of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data will contribute to this objective.
Keywords: COVID-19; SARS-CoV-2; data deposition; data integration; omics data; sequencing technologies
Year: 2021 PMID: 33279989 PMCID: PMC7799330 DOI: 10.1093/bib/bbaa297
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Architecture of the genome of SARS-CoV-2. (A) SARS-CoV-2 genome structure. Labels indicate gene names. The red circle indicates the TRS-L. The lower panel depicts the nsps derived from processing of the pp1a and pp1ab polyproteins. (B) sgmRNAs. Dotted lines link the TRS-L with the body of each individual sgmRNA. The specific gene product obtained from each individual sgmRNA is indicated by the colored boxes and the corresponding labels.
Characteristics of SARS-CoV-2 sequencing approaches
| | Shotgun metatranscriptomics | Amplicon-based | Hybrid capture-enrichment | Direct RNA sequencing |
|---|---|---|---|---|
| Goals | SARS-CoV-2, host microbiota, and host response to infection | SARS-CoV-2 genome | SARS-CoV-2 genome | SARS-CoV-2 and host transcriptome and epitranscriptome |
| Co-infection detection | Yes | No | No/yes (depending on gene panel) | Yes |
| Minimum number of reads | 20–50 M | 5–20 M | 5–20 M | 0.5 M |
| Genome Coverage | ≥99% | ≥95–99% | ≥95–99% | ≥99% |
| Accuracy in SNV identification | High | High | Moderate | Low |
| Sample viral load (Ct) required (ref Xiao) | <24–28 | ≥24–28 | ≥24–28 | <24–28 |
| Sample RNA input (ng) | 10–200 | 1–50 | 10–50 | ≥1000 |
| Sample type | Patient specimens | Patient specimens, environmental samples | Patient specimens, environmental samples | Viral cell cultures |
| Cost | High | Low | Moderate | High |
| NGS sequencing platforms | High- or ultra high-throughput platforms | Mid-throughput platforms | Mid- or high-throughput platforms | ONT |
a Only 1 dataset from direct RNA sequencing is currently available in public repositories (Kim et al. [95]).
Summary statistics of methods applied in the sequencing of SARS-CoV-2
| Library preparation | Sequencing technology | Records | Notes |
|---|---|---|---|
| Amplicon | Illumina | 24 311 | 21 142 from COG-UK (ARTIC) |
| Amplicon | Oxford Nanopore | 16 811 | 16 137 from COG-UK (ARTIC) |
| Hybrid capture | Illumina | 468 | |
| Metatranscriptomics | Illumina | 1987 | |
Data are derived from records in INSDC public databases for which an associated genome assembly is available.
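To put the record counts above in proportion, the following minimal sketch computes the share of amplicon-based records and the share of those that come from COG-UK. It assumes that the Oxford Nanopore row belongs to the amplicon category (the ARTIC note suggests this); the dictionary names are illustrative and not part of the source.

```python
# Record counts copied from the summary table above.
# Assumption: the Oxford Nanopore row is an amplicon-based library.
records = {
    ("Amplicon", "Illumina"): 24311,
    ("Amplicon", "Oxford Nanopore"): 16811,
    ("Hybrid capture", "Illumina"): 468,
    ("Metatranscriptomics", "Illumina"): 1987,
}

# COG-UK (ARTIC) contributions, from the Notes column.
cog_uk = {
    ("Amplicon", "Illumina"): 21142,
    ("Amplicon", "Oxford Nanopore"): 16137,
}

total = sum(records.values())
amplicon_total = sum(n for (prep, _), n in records.items() if prep == "Amplicon")
cog_uk_total = sum(cog_uk.values())

# Fraction of all records produced with an amplicon-based protocol.
amplicon_share = amplicon_total / total
# Fraction of amplicon-based records deposited by COG-UK.
cog_uk_share = cog_uk_total / amplicon_total

print(f"total records: {total}")
print(f"amplicon share of all records: {amplicon_share:.1%}")
print(f"COG-UK share of amplicon records: {cog_uk_share:.1%}")
```

Under these assumptions, amplicon-based sequencing accounts for the large majority of the records listed, and most of those records originate from a single consortium.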
Figure 2. Overview of the properties of different approaches for SARS-CoV-2 genome sequencing. (A) Violin plot of the size of SARS-CoV-2 genome assemblies obtained through different sequencing approaches. Assembly size, in knt (kilonucleotides), is reported on the x-axis. (B) Violin plot of the sequencing depth (log10 of the total number of sequenced bases) obtained by different sequencing approaches. (C) Profile of normalized coverage levels of the genome of SARS-CoV-2 as obtained from different sequencing approaches. Coverage profiles were calculated on 300 non-overlapping genomic windows of 100 nt in size. For every sequencing approach, a subset of 100 distinct records available from public repositories of raw sequencing data was used to estimate the coverage profile. Coverage values were normalized using upper quartile normalization and averaged for every data point (genomic window).
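The coverage profile described for panel (C) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes per-base depth vectors are already available (for example from a read aligner), that "upper quartile normalization" means dividing each windowed profile by its 75th percentile, and that the last window may be shorter than 100 nt because the genome is slightly under 30 000 nt. All function names are hypothetical.

```python
import numpy as np

WINDOW = 100     # window size in nt, as stated in the caption
N_WINDOWS = 300  # number of non-overlapping windows, as stated in the caption


def window_means(depth, window=WINDOW, n_windows=N_WINDOWS):
    """Mean depth in each consecutive, non-overlapping window.

    The final window may hold fewer than `window` positions.
    """
    depth = np.asarray(depth, dtype=float)
    if depth.size <= (n_windows - 1) * window:
        raise ValueError("depth vector too short for the requested windows")
    starts = range(0, n_windows * window, window)
    return np.array([depth[s:s + window].mean() for s in starts])


def upper_quartile_normalize(profile):
    """Divide a profile by its 75th percentile (assumed definition)."""
    profile = np.asarray(profile, dtype=float)
    uq = np.percentile(profile, 75)
    if uq == 0:
        return np.zeros_like(profile)
    return profile / uq


def average_profile(depth_vectors):
    """Normalize each record's windowed profile, then average per window."""
    profiles = [upper_quartile_normalize(window_means(d)) for d in depth_vectors]
    return np.mean(profiles, axis=0)


if __name__ == "__main__":
    # Simulated depth vectors stand in for the 100 records per approach.
    rng = np.random.default_rng(0)
    simulated = [rng.poisson(lam=200, size=29903) for _ in range(100)]
    profile = average_profile(simulated)
    print(profile.shape)
```

Normalizing each record before averaging keeps deeply sequenced samples from dominating the profile, which is presumably why the caption applies normalization per record.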