Manisha Ray, Mukund Namdev Sable, Saurav Sarkar, Vinaykumar Hallur.
Abstract
SARS-CoV-2, the emerging pathogen that causes COVID-19, has produced a global pandemic. Its novel genetic makeup has created hurdles for biological research, and as a result the scientific community has not yet discovered effective drug or vaccine candidates. Meanwhile, bioinformatics has been a milestone advance in viral research over the last few decades, and bioinformatics tools and techniques have been used successfully to interpret the genomic architecture of this virus. Major in silico approaches, including next-generation sequencing, genome-wide association studies and computer-aided drug design, have been applied effectively in COVID-19 research and have yielded novel information on SARS-CoV-2. In silico studies have not only sequenced the SARS-CoV-2 genome within a very short time but have also analyzed sequencing errors, evolutionary relationships, genetic variations and putative drug candidates against SARS-CoV-2 viral genes. These findings will be valuable for further research on the COVID-19 pandemic and essential for developing vaccines against SARS-CoV-2 to protect public health.
Keywords: Bioinformatics; COVID-19; Drug design; Genome wide association study; Next generation sequencing; SARS-CoV-2
Year: 2020 PMID: 33349792 PMCID: PMC7744275 DOI: 10.1016/j.mgene.2020.100844
Source DB: PubMed Journal: Meta Gene ISSN: 2214-5400
Application of metagenomics in different experimental studies on SARS-CoV-2.
| Author and publication year | Objectives of the study | Sequencing platform | Findings |
|---|---|---|---|
| | Studied the SARS-CoV-2 epidemic using laboratory-confirmed positive and negative samples from Seattle, Washington | Illumina | Bat betacoronaviruses are the species most closely related to SARS-CoV-2; colonization with human parainfluenza virus 3 together with SARS-CoV-2 |
| | Investigated two pneumonia patients who developed acute respiratory syndromes after independent contact history with the Wuhan seafood market | Illumina MiSeq | 2019-nCoV was closely related to strains bat-SL-CoVZXC21 and bat-SL-CoVZC45 at the ORF1a, S, and N genes; SARS-CoV-2 was identified in the pneumonia patients; no other pathogens were identified in the infected samples |
| | Quick characterization of Cambodia's first case of COVID-19 | Illumina iSeq 100 | All human SARS-CoV-2 genomes are very similar, including the SARS-CoV-2 genome from the Cambodian case; an SNP was noted at position 25,654 in ORF3a, resulting in a valine-to-leucine substitution |
| | Isolation of other pathogen co-infections in people with COVID-19 | Illumina MiSeq | Several nonsynonymous substitutions in the obtained SARS-CoV-2 genomes; SARS-CoV-2 co-infection with rhinovirus |
| Tsan-Yuk- | Identification of any intermediate host for transmission of SARS-CoV-2 infection to humans | Illumina HiSeq | Malayan pangolin-associated coronaviruses belong to sublineages of SARS-CoV-2, with strong similarity to SARS-CoV-2 in the receptor-binding domain; pangolins should be considered possible hosts in the emergence of new coronaviruses |
| | Examined close matches to the severe acute respiratory syndrome coronavirus 2 | NA | A similar viral sequence was found in pangolin lung, leading to the hypothesis that the pangolin is the intermediate host for infection |
| | Investigated temporal transcriptional activity of SARS-CoV-2 and its association with longitudinal faecal microbiome alterations in patients with COVID-19 | Illumina NextSeq 550 | Faecal samples with a signature of high SARS-CoV-2 infectivity had higher abundances of the bacterial species Collinsella aerofaciens, Collinsella tanakaei, Streptococcus infantis and Morganella morganii |
Basic bioinformatics databases/tools useful in COVID-19 next-generation sequencing data analysis (metagenomics and whole genome sequencing).
| Databases/Tools | Applications | References |
|---|---|---|
| Sequence Read Archive (SRA) Database | The largest publicly available repository of high-throughput sequencing data; stores raw sequencing data and alignment information. | |
| European Nucleotide Archive (ENA) | Provides a comprehensive record of DNA and RNA raw sequencing and assembly data. | |
| Metagenomics | | |
| FastQC | Used for quality control of raw sequences generated by high-throughput sequencing pipelines. | |
| Cutadapt | Used to clean the sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. | |
| QIIME | An open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. It performs demultiplexing and quality filtering, OTU picking, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations through the command line. | |
| Whole genome sequencing | | |
| FastQC | Used for quality control of raw sequences generated by high-throughput sequencing pipelines. | |
| Cutadapt | Used to clean the sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. | |
| MaSuRCA | Genome assembler. | |
| Ragout | A reference-assisted assembly tool. Orders contigs to create high-quality scaffolds by using a genome rearrangement approach and multiple closely related genome references as a guide. | |
| Prokka | Rapid annotation of prokaryotic genomes. | |
| AUGUSTUS | A tool to predict genes in eukaryotic genome sequences. | |
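
To make the quality-control and read-cleaning steps listed above more concrete, the following is a minimal pure-Python sketch of the kind of check these tools perform. It is not the FastQC or Cutadapt implementation: the function names, thresholds and exact-match trimming are illustrative assumptions, and the quality string is assumed to use Phred+33 encoding.

```python
from typing import Optional, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read (assumes Phred+33 ASCII encoding)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter (toy 3' trimming)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual  # adapter not present, read unchanged
    return seq[:pos], qual[:pos]


def clean_read(seq: str, qual: str, adapter: str,
               min_mean_q: float = 20.0, min_len: int = 10) -> Optional[Tuple[str, str]]:
    """Trim the adapter, then drop reads that are too short or of low mean quality.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    seq, qual = trim_adapter(seq, qual, adapter)
    if len(seq) < min_len or mean_phred(qual) < min_mean_q:
        return None
    return seq, qual


if __name__ == "__main__":
    read = "ACGTACGTACGTAGATCGGAAGAGC"
    quals = "I" * len(read)
    print(clean_read(read, quals, adapter="AGATCGGAAGAGC"))
```

Real tools tolerate mismatches and partial adapter matches and report per-position quality distributions; the sketch only shows the basic logic of trimming followed by filtering.
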
Whole genome sequencing (WGS) of SARS-CoV-2 strains in different COVID-19 research studies.
| Author and Publication Year | Objectives of the Study | Platform | Findings |
|---|---|---|---|
| | Whole genome sequencing of SARS-CoV-2 specimen isolated from COVID-19 patients in Nepal | Illumina MiSeq | Identical sequence between BetaCoV/Nepal/61/2020 and 2019-nCoV WHU01; silent mutations in the coding regions of the Spike, ORF1a, ORF1b and ORF8b proteins |
| | Characterization of SARS-CoV-2 sequences isolated in India from cases with travel history to China | Illumina MiniSeq | Sequence heterogeneity within SARS-CoV-2 globally; mutations in the Spike protein; B and T cell epitope prediction on the Spike protein |
| | Characterization of SARS-CoV-2 genome isolated in Japan from a case with travel history to Egypt | Illumina | Observed close lineage and single nucleotide variations in genomic isolates |
| | Whole genome sequencing and analysis of SARS-CoV-2 isolated from Malaysia | Illumina iSeq | Unique mutations; 16 nucleotide substitutions in the Malaysian strain; 4 unique nucleotide substitutions in nonstructural genes of SARS-CoV-2 |
| | To describe the first isolation and sequencing of SARS-CoV-2 in Australia and rapid sharing of the isolate | Oxford Nanopore Technologies and Illumina short-read | >99.9% sequence identity between BetaCoV/Australia/VIC01/2020 and publicly available SARS-CoV-2 genomes; SNPs and nucleotide deletions in the 3' UTR |
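
Several of the findings above are reported as percent sequence identity between an isolate and reference genomes. As a rough illustration of how such a figure can be computed, here is a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with `-` marking gaps; the function name and the choice to skip gap columns are assumptions, not the method used in the cited studies.

```python
def percent_identity(ref: str, query: str) -> float:
    """Percent of ungapped aligned columns at which two sequences agree."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" or q == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if r == q:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences, not real SARS-CoV-2 data.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Different tools treat gaps and ambiguous bases differently, so identity values are only comparable when the same convention is used throughout.
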
Interpretation of genome wide association studies (GWAS) for characterization of SARS-CoV-2 genomes.
| Author and Publication Year | Objective | Findings |
|---|---|---|
| | Understand the genomic structure and variations in SARS-CoV-2 complete genome sequences | 116 mutations found; the 3 most common mutations were 8782C > T in ORF1ab, 28144T > C in ORF8 and 29095C > T in the N gene |
| | Identification of potential genetic factors involved in the development of COVID-19 | Analyzed 8,582,968 SNPs; a 3p21.31 gene cluster as a genetic susceptibility locus in COVID-19 patients; potential involvement of the ABO blood group |
| | Identification of genetic variation associated with COVID-19 severity | Nucleotide variation at genomic position 11,083; 11083G variant in symptomatic patients; 11083T variant in asymptomatic patient; miR-485-3p, miR-539-3p and miR-3149 differentially target the variants |
| | Elucidation of nucleotide polymorphisms in whole genome sequences of SARS-CoV-2 | SNPs in the S (22224G, 22224T) and N (28792G, 28792C) proteins of Indian and Nepalese strains, respectively; lower case fatality rate in India and Nepal |
| | Investigate and track SARS-CoV-2 in Iranian COVID-19 patients | Iranian isolates are closely related to the Wuhan reference sequence; no polymorphism found in the assessed regions of nsp-2, nsp-12 and Spike |
| | Investigation of the source of origin of this novel coronavirus | The Wuhan-Hu-1 genome showed an evolutionary relationship with the Bat CoV RaTG13 genome sequence, with 96.12% sequence similarity |
| | Highlight the similarities and changes observed in the submitted Indian viral strains | Novel non-synonymous mutation C > T (NSP3); 14408C > T (RNA primase), 23403A > G (S) and 3037C > T (NSP3, synonymous) in genes of the SARS-CoV-2 Indian strain |
| | Analyse the evolution and variation of SARS-CoV-2 during the epidemic starting at the end of 2019 | SARS-CoV-2 belonged to the Sarbecovirus subgenus of Betacoronavirus, together with BetaCoV/bat/Yunnan/RaTG13/2013, bat-SL-CoVZC45, bat-SL-CoVZXC21 and SARS-CoV; no positive time evolution signal between SARS-CoV-2 and BetaCoV/bat/Yunnan/RaTG13/2013 |
| | Investigate bats and pangolins as hosts in SARS-CoV-2 cross-species transmission | SARS-CoV-2-like strains that infected pangolins and bats are close to SARS-CoV-2; pangolin ACE2 has lower evolutionary divergence from humans and is more diverged from bat |
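
The variants above are written in a position-reference-alternate form such as 8782C > T, and some are described as synonymous or non-synonymous. The following Python sketch shows, under simplifying assumptions, how substitutions can be listed from a pairwise alignment and how a change inside a coding sequence can be classified with the standard genetic code. The function names are hypothetical, positions refer to the toy sequences passed in rather than to SARS-CoV-2 genome coordinates, and gaps are marked with `-`.

```python
from typing import List, Tuple

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def list_substitutions(ref: str, sample: str) -> List[str]:
    """List substitutions as 'posREF>ALT' using 1-based reference positions."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    pos = 0
    for r, s in zip(ref.upper(), sample.upper()):
        if r == "-":
            continue  # insertion relative to the reference has no reference position
        pos += 1
        if r in "ACGT" and s in "ACGT" and r != s:
            subs.append(f"{pos}{r}>{s}")
    return subs


def classify_in_cds(cds: str, cds_pos: int, alt: str) -> Tuple[str, str]:
    """Classify a substitution at 1-based position cds_pos of an in-frame coding sequence."""
    cds = cds.upper()
    start = (cds_pos - 1) // 3 * 3          # first base of the affected codon
    codon = cds[start:start + 3]
    if cds_pos < 1 or len(codon) < 3:
        raise ValueError("position is outside a complete codon")
    offset = (cds_pos - 1) % 3
    mutated = codon[:offset] + alt.upper() + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "nonsynonymous"
    return kind, f"{before}{start // 3 + 1}{after}"


if __name__ == "__main__":
    print(list_substitutions("ATGGTTTAA", "ATGCTTTAA"))
    print(classify_in_cds("ATGGTTTAA", 4, "C"))
```

In the toy coding sequence, changing the first base of the GTT codon to C turns valine into leucine, the same kind of amino acid change reported for ORF3a in the metagenomics table.
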
List of reported studies on in silico drug design (CADD) against viral proteins of SARS-CoV-2.
| Author and Publication Year | Objective of the Study | Target Protein | Findings |
|---|---|---|---|
| | Identification of potential inhibitors from cinnamon against the main protease and spike glycoprotein of SARS-CoV-2 | Mpro and Spike | Tenufolin (TEN) and Pavetannin C1 (PAV) are hit compounds against Mpro and Spike protein |
| | Identification of effective inhibitors against the Spike glycoprotein and 3CL protease of SARS-CoV-2 | Spike and 3CLpro | Zanamivir, Indinavir, Saquinavir, and Remdesivir show potential inhibitory effects on S and 3CLpro |
| | Selection of potential molecules that can target viral spike proteins | Spike protein | Raltegravir has a relatively high binding score against S protein; Forsythiae fructus and Isatidis radix herbs are widely used for treating COVID-19 |
| | Studied the effects of Chloroquine and Hydroxychloroquine for treating COVID-19 | Spike protein | CLQ and CLQ-OH inhibit the binding of the viral S protein at the ganglioside binding site |
| | Screening of small molecules that bind the ACE2-specific RBD on the Spike glycoprotein of SARS-CoV-2 | Spike protein | Glycyrrhizic acid of plant origin may be repurposed for SARS-CoV-2 intervention |
| | Docking-based screening of approved drugs and compounds undergoing clinical trials against three SARS-CoV-2 target proteins | Spike, Mpro, papain-like protease | Pralatrexate, Carumonam, Aclerastide, Granotapide (S protein), Tiracizine (PLpro) and Ritonavir (Mpro) are the effective compounds among approved drugs and drugs in clinical trials |
| | Virtual screening of phytochemicals against viral proteins of SARS-CoV-2 | Spike, Mpro, 3CLpro, PLpro, ACE2, RdRp | Glycyrrhizic acid, limonin, 7-deacetyl-7-benzoylgedunin, maslinic acid, corosolic acid, obacunone and ursolic acid are effective against the target proteins of SARS-CoV-2 |
| | Structure-based drug design | Spike glycoprotein, Mpro, ACE2 | Zanamivir and Lopinavir showed stronger binding affinity against S protein and Mpro, respectively |
| | Homology-assisted identification of inhibitors against the RNA-binding domain of N protein | Nucleocapsid protein | Theophylline and pyrimidone derivatives are possible inhibitors |
| | Identification of potential drug compounds against COVID-19 | Nucleocapsid protein | The natural compounds glycyrrhizic acid and theaflavin showed the best binding energy against N protein |
| | Identify potential drug candidates against SARS-CoV-2 structural proteins | Membrane, envelope and nucleocapsid proteins | Rutin against envelope protein; caffeic acid and ferulic acid against membrane protein; simeprevir and grazoprevir against N protein |
| | Stabilization of non-native protein-protein interactions (PPIs) of the nucleocapsid protein to inhibit viral replication in SARS-CoV-2 | Nucleocapsid protein | Catechin might be used to stabilize PPIs of N protein |
| | Detection of inhibitors of the SARS-CoV-2 ion channel to control COVID-19 | Envelope protein | Belachinal, Macaflavanone E and Vibsanol B showed inhibitory effects on the envelope protein ion channel |
| | Screening of flavonoids against 3CLpro of SARS-CoV-2 | 3CLpro | Baicalin showed effective inhibitory activity against SARS-CoV-2 3CLpro |
| | Inhibitor screening and drug discovery against the main protease (Mpro) of SARS-CoV-2 | Mpro | Lopinavir-Ritonavir, Tipranavir, and Raltegravir show the best molecular interaction with the main protease of SARS-CoV-2 |
Basic bioinformatics databases/tools useful for COVID-19 genomics research.
| Databases/ Tools | Application | References |
|---|---|---|
| GEO (Gene Expression Omnibus) database | A repository of functional genomics data generated from experiments; stores curated gene expression profiles. | |
| NCBI Gene database | Repository of gene-related information from a wide range of species. | |
| UCSC Genome Browser | Broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading genomic data. | |
| UniProt | Resource of protein sequence and functional information. | |
| CD (Conserved Domain) Search | Conserved domain search through multiple and pairwise sequence alignments. | |
| DAVID (Database for Annotation, Visualization and Integrated Discovery) | Functional annotation of genes (biological process, molecular function, cellular component). | |
| KEGG (Kyoto Encyclopedia of Genes and Genomes) | Metabolic pathway analysis. | |
| Discovery of Single Nucleotide Polymorphisms | | |
| dbSNP | A crucial repository for single-base nucleotide substitutions and short deletion and insertion polymorphisms. | |
| SIFT | Predicts the effect of an amino acid substitution on protein function based on sequence homology and the physical properties of amino acids. | |
| PredictSNP1 | Consensus classifier for prediction of disease-related amino acid mutations. | |
| PredictSNP2 | Platform for prediction of the effects of SNPs in genomic regions. | |
| PolyPhen2 | Predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. | |
| PROVEAN | Predicts the impact of an amino acid substitution or indel on the biological function of a protein. | |
| SNAP2 | Predicts functional effects of sequence variants. | |
| Phylogenetic Analysis | | |
| MEGA (Molecular Evolutionary Genetics Analysis) | Multiple sequence alignment, phylogenetic tree generation and statistical analyses. | |
| | Reconstruct and analyse phylogenetic relationships between molecular sequences. | |
| PAUP | Reconstruct and analyse phylogenetic relationships between molecular sequences using the parsimony method. | |
| DnaSP | Analyses DNA polymorphisms using data from a single locus, and also estimates haplotype diversity between the sequences. | |
| PopArt | Population genetics software that visualizes haplotype diversity networks. | |
| Primer Design | | |
| Primer3 | Primer design, often in high-throughput genomics applications. | |
| NCBI Primer-BLAST | Designs new target-specific primers in one step, checks the specificity of pre-existing primers, places primers based on exon/intron locations and excludes single nucleotide polymorphism (SNP) sites from primers. | |
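
As a small illustration of the checks behind the primer design entries above, the Python sketch below computes GC content, a rule-of-thumb melting temperature and the reverse complement of a candidate primer. It is only a sketch: the Wallace rule used here (2 °C per A/T and 4 °C per G/C) is a rough approximation suited to short oligonucleotides and is swapped in for the nearest-neighbour thermodynamic models that dedicated primer design software uses. All function names and the example primer are hypothetical.

```python
def _validated(primer: str) -> str:
    """Upper-case the primer and reject anything other than A, C, G, T."""
    p = primer.upper()
    if not p or set(p) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return p


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = _validated(primer)
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius (Wallace rule)."""
    p = _validated(primer)
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to derive a reverse primer from the template strand."""
    return _validated(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    candidate = "ACGTACGTAC"  # toy sequence, not a validated SARS-CoV-2 primer
    print(gc_fraction(candidate), wallace_tm(candidate), reverse_complement(candidate))
```
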
Basic bioinformatics databases/tools useful for COVID-19 in silico drug design.
| Databases/ Tools | Application | References |
|---|---|---|
| BLAST (Basic Local Alignment Search Tool) | Finds regions of local similarity between sequences by comparing nucleotide or protein sequences to sequence databases, and calculates the statistical significance of matches. | |
| PDB (Protein Data Bank) | Database of three-dimensional protein structures; it contains information about the 3D shapes of proteins, nucleic acids, and complex assemblies. | |
| PubChem | Chemical structure database; contains information on chemical compounds including name, molecular formula, chemical and physical properties, biological activities, toxic effects, literature etc. | |
| DrugBank | Contains information on FDA-approved drugs and drug targets. It is both a bioinformatics and a cheminformatics resource. | |
| Modeller | Used for homology or comparative modeling of protein three-dimensional structures by aligning a query sequence with a known structure. | |
| AutoDock | Molecular docking between protein and ligand (small compound) molecules. | |
| AutoDock Vina | An open-source program for molecular docking that significantly improves the average accuracy of binding mode predictions compared to AutoDock 4. | |
| ZDOCK | An automatic online protein docking server, which simply interprets the protein structures. | |
| SwissDock | A web service to predict the molecular interactions between a target protein and a small molecule. | |
| PatchDock | A simple molecular docking algorithm based on shape complementarity principles. | |
| Glide | Offers a full range of speed versus accuracy options, from a high-throughput virtual screening mode for efficiently enriching million-compound libraries to modes for reliably docking tens to hundreds of thousands of ligands with high accuracy, advanced scoring, and higher enrichment of results. | |
| PyMOL | Molecular structure visualization and editing tool. | |
| Discovery Studio Visualizer | Structure visualization and analysis of 3D molecules. | |
| UCSF Chimera | Visualization and analysis of molecular structures and related data, including density maps, trajectories, and sequence alignments. Also used for energy minimization of molecules. | |
| Open Babel | A chemical toolbox designed to search, convert file formats, analyse, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas. | |
| GROMACS | Molecular dynamics simulation tool. | |
| NAMD | Parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. | |
| VMD | Molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. | |
Fig. 1. Graphical representation of the interconnected bioinformatics applications implemented in COVID-19 research.