Tao Hu, Juan Li, Hong Zhou, Cixiu Li, Edward C Holmes, Weifeng Shi.
Abstract
In early January 2020, the novel coronavirus (SARS-CoV-2) responsible for a pneumonia outbreak in Wuhan, China, was identified using next-generation sequencing (NGS) and readily available bioinformatics pipelines. In addition to virus discovery, these NGS technologies and bioinformatics resources are currently being employed for ongoing genomic surveillance of SARS-CoV-2 worldwide, tracking its spread, evolution and patterns of variation on a global scale. In this review, we summarize the bioinformatics resources used for the discovery and surveillance of SARS-CoV-2. We also discuss the advantages and disadvantages of these bioinformatics resources and highlight areas where additional technical developments are urgently needed. Solutions to these problems will be beneficial not only to the prevention and control of the current COVID-19 pandemic but also to infectious disease outbreaks of the future.
Keywords: COVID-19; SARS-CoV-2; bioinformatics; next-generation sequencing; pathogen discovery; phylogenetic analysis
Year: 2021 PMID: 33416890 PMCID: PMC7929396 DOI: 10.1093/bib/bbaa386
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. The workflow of different NGS sequencing approaches currently available for virus discovery and genomic surveillance. The library construction scheme employed in (A) metatranscriptomic sequencing, (B) a hybrid capture-based approach based on a metatranscriptomic library, (C) multiplex PCR amplification for NGS platforms and (D) the Oxford Nanopore sequencing platform.
Figure 2. A schematic workflow and the bioinformatics resources used in novel virus discovery. Each key step in the workflow is shown with different backgrounds. Computational tools used in the SARS-CoV-2 discovery by our group are colored in orange.
Summary of the available bioinformatics resources for SARS-CoV-2 discovery and genomic surveillance
| Databases and software | URL | Reference |
|---|---|---|
| Data quality control | | |
| Trimmomatic | | |
| Cutadapt | | |
| SOAPnuke | | |
| AfterQC | | |
| Fastp* | | |
| Cut_Multi_Primer.py | | - |
| NanoPack | | |
| Porechop | | - |
| Read mapping | | |
| Hisat2 | | |
| BWA | | |
| Bowtie2* | | |
| KMA | | |
| SortmeRNA | | |
| Minimap2 | | |
| NGMLR | | |
| MarginAlign | | |
| De novo assembly | | |
| Trinity* | | |
| Megahit | | |
| SPAdes | | |
| Trans-ABySS | | |
| PEHaplo | | |
| SAVAGE | | |
| coronaSPAdes | | |
| Blast | | |
| Diamond* | | |
| Blastn* | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST | |
| Phyre2 | | |
| Canu | | |
| Falcon | | - |
| Miniasm | | |
| Genome visualization | | |
| IGV | | |
| Geneious* | | - |
| QUAST | | |
| SEQMAN | | - |
| Database | | |
| GISAID* | | |
| NCBI* | | |
| CNCB/NGDC database | | |
| Genome Warehouse (GWH) | | - |
| Virus Pathogen Resource (ViPR) | | |
| Sequence alignment | | |
| CLUSTALW | | |
| MAFFT* | | |
| MUSCLE | | |
| T-Coffee | | |
| ProbCons | | |
| PRANK | | |
| Bali-Phy | | |
| StatAlign | | |
| JABAWS | | |
| EMBL-EBI | | |
| webPRANK | | |
| Jalview | | |
| MSAViewer | | |
| AliView | | |
| Bioedit* | | |
| Phylogenetic analysis | | |
| jMODELTEST | | |
| ProtTest | | |
| TempEst | | |
| BIONJ | | |
| PhyML | | |
| RAxML* | | |
| IQ-TREE | | |
| MrBayes | | |
| PhyloBayes | | |
| BEAST1 | | |
| BEAST2 | | |
| PAUP | | |
| MEGA | | |
| PhyloSuite | | |
| Tree visualization | | |
| Dendroscope | | |
| FigTree* | | - |
| ggtree | | |
| iTOL | | |
| Evolview | | |
| Genomic analysis | | |
| Pangolin COVID-19 Lineage Assigner | | - |
| Nextstrain analysis platform* | | |
| Conserved Domain Database* | | |
| UCSC | | - |
| GFF2PS | | |
| Vector NTI | | |
| IBS | | |
| PHYLIP | | |
| SimPlot* | | |
| RDP | | |
| Swiss-Model program* | | |
| PyMOL* | | |
*Computer programs used by us in the discovery of SARS-CoV-2 [2, 118].
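
To make the order of the table's sections concrete, the sketch below chains several of the starred programs (Fastp, Bowtie2, Trinity, Blastn, MAFFT and RAxML) from quality control through to phylogenetic analysis. It is a minimal illustration, not the authors' pipeline: the helper `build_discovery_commands`, all file names, the index and database names, and every parameter value are hypothetical, and the flags are common options of each tool rather than the settings used in the SARS-CoV-2 discovery. The function only assembles command lines and runs nothing.

```python
import shlex
from typing import Dict, List


def build_discovery_commands(r1: str, r2: str, ref_index: str, blast_db: str,
                             threads: int = 8) -> Dict[str, List[str]]:
    """Build (but do not execute) one command line per workflow step.

    Every file name below is a hypothetical placeholder.
    """
    clean1, clean2 = "clean_R1.fq.gz", "clean_R2.fq.gz"
    contigs = "contigs.fasta"        # assumed location of the assembled contigs
    genomes = "genomes.fasta"        # assumed: new genome plus reference genomes
    alignment = "genomes.aln.fasta"  # assumed: MAFFT stdout redirected to this file
    t = str(threads)
    return {
        # Data quality control: trim adapters and low-quality bases.
        "qc": ["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2, "-w", t],
        # Read mapping: align the cleaned read pairs to a reference index.
        "mapping": ["bowtie2", "-p", t, "-x", ref_index,
                    "-1", clean1, "-2", clean2, "-S", "mapped.sam"],
        # De novo assembly of the cleaned read pairs.
        "assembly": ["Trinity", "--seqType", "fq", "--left", clean1,
                     "--right", clean2, "--CPU", t, "--max_memory", "32G",
                     "--output", "trinity_out"],
        # Blast: compare contigs with a nucleotide database, tabular output.
        "blast": ["blastn", "-query", contigs, "-db", blast_db,
                  "-outfmt", "6", "-evalue", "1e-5",
                  "-num_threads", t, "-out", "contigs_vs_db.tsv"],
        # Sequence alignment: MAFFT writes the alignment to stdout.
        "alignment": ["mafft", "--auto", "--thread", t, genomes],
        # Phylogenetic analysis: rapid bootstrap plus ML search in RAxML.
        "phylogeny": ["raxmlHPC", "-f", "a", "-m", "GTRGAMMA",
                      "-p", "12345", "-x", "12345", "-#", "100",
                      "-s", alignment, "-n", "discovery_tree"],
    }


if __name__ == "__main__":
    commands = build_discovery_commands(
        "sample_R1.fq.gz", "sample_R2.fq.gz", "ref_index", "nt")
    for step, cmd in commands.items():
        print(f"{step}: {shlex.join(cmd)}")
```

Keeping each step as an argument list makes it straightforward to swap in another program from the same table section (for example Megahit in place of Trinity) without touching the other steps.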