
Essential interpretations of bioinformatics in COVID-19 pandemic.

Manisha Ray¹, Mukund Namdev Sable², Saurav Sarkar³, Vinaykumar Hallur³.

Abstract

The emerging pathogen SARS-CoV-2 has caused a global pandemic crisis through COVID-19. The novel genetic makeup of SARS-CoV-2 has created hurdles for biological research, and as a result the scientific community has not yet discovered effective drug or vaccine candidates. Meanwhile, bioinformatics has marked milestones in viral research over the last few decades. Bioinformatics tools and techniques have been used successfully to interpret the genomic architecture of this virus. Major in silico approaches, including next-generation sequencing, genome-wide association studies and computer-aided drug design, have been applied effectively in COVID-19 research and have yielded novel information on SARS-CoV-2 in several ways. In silico studies have not only sequenced the SARS-CoV-2 genome but also analyzed sequencing errors, evolutionary relationships, genetic variations and putative drug candidates against SARS-CoV-2 viral genes within a very short time. These findings will be valuable for further research on the COVID-19 pandemic and essential for developing vaccines against SARS-CoV-2 to protect public health.


Keywords: Bioinformatics; COVID-19; Drug design; Genome wide association study; Next generation sequencing; SARS-CoV-2

Year: 2020 PMID: 33349792 PMCID: PMC7744275 DOI: 10.1016/j.mgene.2020.100844

Source DB: PubMed Journal: Meta Gene ISSN: 2214-5400

Introduction

Because of their small genome size, viruses use complex strategies to maximize the coding potential of their genomes (Gautam et al., 2019). Meanwhile, genomics and bioinformatics have contributed enormously to the understanding of infectious disease, from pathogenesis, mechanisms and the spread of antimicrobial resistance to host immune responses (Bah et al., 2018). SARS-CoV-2 has created a worldwide pandemic, affecting not only public health but also the socio-economic status of humankind. The genome of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is between 29.8 kb and 29.9 kb in size, and its sequence differs substantially from those of previously identified human coronaviruses, including the SARS and Middle East respiratory syndrome (MERS) coronaviruses (Khailany et al., 2020; Chaw et al., 2020). Proper investigation of the epidemiological, virological and pathogenic characteristics of SARS-CoV-2 is therefore crucial for introducing novel treatment approaches and developing effective prevention strategies (Messina et al., 2020). Bioinformatics tools and techniques have been implemented for these purposes.

Next-generation sequencing

Advances in next-generation sequencing (NGS) have brought about a remarkable expansion of genomic sequence data (Suwinski et al., 2019). NGS has revolutionized the scale and depth of the biomedical sciences. During an outbreak in a health care system, fast and effective identification of the causative pathogen, together with epidemiological surveys, is needed to permit a focused disease control response. The accuracy of NGS makes it possible to analyze and quantify the extremely high diversity within viral quasispecies, and many low-frequency drug- or vaccine-resistance mutations of therapeutic importance have been discovered in this way (Lu et al., 2020). High-throughput sequencing technologies, including whole-genome sequencing (WGS) and metagenomics, make it possible to rapidly obtain the full sequence of pathogen genomes.

Metagenomics

In silico virus sequencing is often based on alignment (mapping) of reads against a reference sequence (Maurier et al., 2019). Metagenomics, by contrast, is a simple, cost-effective approach that does not require a reference sequence for analysis. It is a powerful means of identifying pathogens in environmental samples and of directly accessing the genetic content of an organism during emerging pandemics (Peddu et al., 2020; Thomas et al., 2012). Metagenomics has also been applied in the COVID-19 pandemic to reveal critical novel information about SARS-CoV-2. It has been used for rapid identification and characterization of the first few cases of COVID-19 (Chen et al., 2020; Manning et al., 2020), for examining SARS-CoV-2 together with other co-infections in nasopharyngeal throat swabs of patients (Vardhan and Sahoo, 2020), for identifying the intermediate host that transferred the infection to humans (Lam et al., 2020), for screening for sequences homologous to SARS-CoV-2 in other organisms (Wahba et al., 2020), for studying the effect of SARS-CoV-2 on alterations of the human faecal microbiome (Zuo et al., 2020), and for characterizing clinical SARS-CoV-2 infection with bacterial co-infections (Peddu et al., 2020). These findings have helped, and continue to help, clinicians to better isolate COVID-19 patients with different symptoms (Table 1). Several software tools and databases have reportedly been used to interpret metagenomics data (Table 2).
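
The contrast between read mapping and alignment-free analysis can be illustrated with a small k-mer comparison. This is only a sketch of the general idea and not the method used in any of the studies cited above; the function names, the toy sequences and the choice of k are assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=4):
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(read, candidates, k=4):
    """Return (name, score) for the candidate sequence most similar to read."""
    scored = ((name, jaccard(read, seq, k)) for name, seq in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy sequences (hypothetical, far shorter than real reads or genomes).
toy_candidates = {"x": "ACGTACGTAC", "y": "TTTTGGGGCC"}
print(best_match("CGTACG", toy_candidates))
```

No alignment is computed here: the read is assigned to whichever candidate shares the largest fraction of its k-mers, which is why approaches of this kind can work without a single reference genome chosen in advance.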

Table 1

Application of metagenomics in different experimental studies on SARS-CoV-2.

Author and publication year	Objectives of the study	Sequencing platform	Findings
Peddu et al., 2020	Studied laboratory-confirmed positive and negative samples from the SARS-CoV-2 epidemic in Seattle, Washington	Illumina MiSeq	• Bat betacoronaviruses are the species most closely related to SARS-CoV-2 • Co-colonization of human parainfluenza virus 3 with SARS-CoV-2
Chen et al., 2020	Investigated two pneumonia patients who developed acute respiratory syndromes after independent contact history with Wuhan sea food market	Illumina MiSeq	• 2019-nCoV was closely related to strains bat-SL-CoVZXC21 and bat-SL-CoVZC45 at ORF1a, S, and N genes • Identified presence of SARS-CoV-2 in pneumonia patients • No other pathogens were identified in the infected samples
Manning et al., 2020	Quick characterization of Cambodia's first case of COVID-19	Illumina iSeq 100	• All human SARS-CoV-2 genomes are very similar, including the SARS-CoV-2 genome from the Cambodian case • SNP was noted at position 25,654 in ORF3a resulting in a valine-to-leucine substitution
Van Tan et al., 2020	Isolation of other pathogen co-infections in people with COVID-19	Illumina MiSeq	• Several nonsynonymous substitutions in the obtained genomes • SARS-CoV-2 co-infection with rhinovirus
Tsan-Yuk-Lam et al., 2020	Identification of any intermediate host for SARS-CoV-2 infection transmission to humans	Illumina HiSeq	• Malayan pangolin associated coronaviruses belong to sub-lineages of SARS-CoV-2 with strong similarity in the receptor binding domain to SARS-CoV-2 • Pangolins should be considered as possible hosts in the emergence of new coronaviruses
Wahba et al., 2020	Examined close matches to the severe acute respiratory syndrome coronavirus 2	NA	• Similar viral sequence found in pangolin lung, suggesting the pangolin as the intermediate host for infection
Zuo et al., 2020	Investigated temporal transcriptional activity of SARS-CoV-2 and its association with longitudinal faecal microbiome alterations in patients with COVID-19	Illumina NextSeq 550	• Faecal samples with signature of high SARS-CoV-2 infectivity had higher abundances of bacterial species Collinsella aerofaciens, Collinsella tanakaei, Streptococcus infantis, Morganella morganii

Table 2

Basic Bioinformatics Databases/Tools useful in COVID19 Next Generation Sequencing Data Analysis (Meta Genomics and Whole Genome Sequencing).

Databases/Tools	Applications	References
Sequence Read Archive (SRA) Database (https://www.ncbi.nlm.nih.gov/sra)	It is the largest publicly available repository of high throughput sequencing data, stores raw sequencing data and alignment information.	Leinonen et al., 2011a
European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/)	Provides a comprehensive record on DNA and RNA raw sequencing and assembly data.	Leinonen et al., 2011a, Leinonen et al., 2011b

Metagenomics
FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)	Used to check quality control on raw sequences generated from high throughput sequencing pipelines.	Brown et al., 2017
Cutadapt (https://cutadapt.readthedocs.io/en/stable/)	Used to clean the sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from the high-throughput sequencing reads.	Martin, 2011
Qiime (http://qiime.org/)	An open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. It performs demultiplexing and quality filtering, OTU picking, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations from the command line.	Kuczynski et al., 2011

Whole genome sequencing
FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)	Used to check quality control on raw sequences generated from high throughput sequencing pipelines.	Brown et al., 2017
Cutadapt (https://cutadapt.readthedocs.io/en/stable/)	Used to clean the sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from the high-throughput sequencing reads.	Martin, 2011
MaSuRCA (https://github.com/alekseyzimin/masurca)	Genome Assembler	Zimin et al., 2013
Ragout (https://github.com/fenderglass/Ragout)	A reference assisted assembly tool. Records contigs to create high quality scaffolds by using a genome rearrangement approach and multiple closely related genome references as a guide.	Kolmogorov et al., 2014
Prokka (https://kbase.us/applist/apps/ProkkaAnnotation/annotate_contigs/release?gclid=Cj0KCQiAzZL-BRDnARIsAPCJs729c42yhrdcRV0tbPIaJ5NVefVzYHwx5kDILF1ndoV-P5_Ue1qstiYaAgWrEALw_wcB)	Rapid annotation of prokaryotic genomes.	Seemann, 2014
AUGUSTUS (http://augustus.gobics.de/)	A tool to predict genes in eukaryote genome sequences.	Stanke and Morgenstern, 2005
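
The quality control and cleaning steps listed in Table 2 can be pictured with a minimal read filter. The sketch below is not how FastQC or Cutadapt are implemented; it only assumes four-line FASTQ records with Phred+33 quality encoding, and the function names and the threshold of 20 are illustrative choices.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle, min_mean_q=20):
    """Keep (name, sequence) for reads whose mean quality reaches the threshold."""
    return [(name, seq) for name, seq, qual in read_fastq(handle)
            if qual and mean_phred(qual) >= min_mean_q]


# Two toy reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
print(filter_reads(example))
```

In practice the dedicated tools in Table 2 report far richer statistics (per-base quality, adapter content, duplication), so a filter like this is best read as an explanation of what a quality threshold means rather than a replacement for them.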


Whole genome sequencing

Obtaining a virus genome sequence directly from clinical samples remains challenging because of the low load of viral genetic material compared with host DNA and the difficulty of obtaining an accurate genome assembly (Maurier et al., 2019). Over time, virus genome sequencing has become a convenient method for better understanding virus pathogenicity and for epidemiological surveillance. Whole-genome sequencing (WGS) is a potent tool for studying virus evolution and genetic associations with disease, and for tracking outbreaks. The depth of the sequencing data and the quality of the resulting sequences make this approach particularly efficient in this context (Kremer et al., 2017). For the early understanding and diagnosis of COVID-19, whole-genome sequencing of SARS-CoV-2 was performed on samples collected in different countries using NGS platforms such as Illumina MiSeq and Roche (Sah et al., 2020; Yadav et al., 2020; Sekizuka et al., 2020; Chong et al., 2020; Caly et al., 2020) (Table 2). Nanopore sequencing has also been used for genome sequencing of SARS-CoV-2 (Caly et al., 2020) (Table 3). The whole-genome sequences of SARS-CoV-2 available in various online databases, together with data analysis software, support further genomic data analysis aimed at offering better medication to patients (Table 2).
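
The effect of a low viral load on sequencing depth can be made concrete with the usual back-of-the-envelope estimate, mean depth = (number of reads × read length × viral fraction) / genome length. The sketch below uses the roughly 29.9 kb genome size quoted in the Introduction; the read length, read counts and viral fraction are hypothetical values chosen only for illustration.

```python
import math

SARS_COV_2_GENOME_LENGTH = 29_900  # bases, approximate size from the Introduction


def mean_depth(n_reads, read_length, genome_length, viral_fraction=1.0):
    """Expected mean depth if viral_fraction of the reads come from the virus."""
    return n_reads * viral_fraction * read_length / genome_length


def reads_needed(target_depth, read_length, genome_length, viral_fraction=1.0):
    """Total reads required to reach target_depth on the viral genome."""
    return math.ceil(target_depth * genome_length / (read_length * viral_fraction))


# Hypothetical run: 20,000 reads of 150 bases each.
print(mean_depth(20_000, 150, SARS_COV_2_GENOME_LENGTH))        # all reads viral
print(mean_depth(20_000, 150, SARS_COV_2_GENOME_LENGTH, 0.01))  # 1% of reads viral
```

When only a small share of reads is viral, the same run yields proportionally lower depth on the viral genome, which is one way to see why host background makes accurate assembly from clinical samples difficult.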

Table 3

Whole genome sequencing (WGS) of SARS-CoV-2 strains in different COVID19 research studies.

Author and Publication Year	Objectives of the Study	Platform	Findings
Sah et al., 2020	Whole genome sequencing of SARS-CoV-2 specimen isolated from COVID-19 patients of Nepal	Illumina MiSeq	• Identical sequence between BetaCoV/Nepal/61/2020 and 2019-nCoV WHU01 • Silent mutations in the coding regions of Spike, ORF1a, ORF1b and ORF8b proteins
Yadav et al., 2020	Characterization of SARS-CoV-2 sequences isolated in India from cases with a travel history to China	Illumina MiniSeq	• Sequence heterogeneity within SARS-CoV-2 globally • Mutations in Spike protein • B and T cell epitope prediction on Spike protein
Sekizuka et al., 2020	Characterization of SARS-CoV-2 genome isolated in Japan from a case with a travel history to Egypt	Illumina	• Observed close lineage and single nucleotide variations in genomic isolates
Chong et al., 2020	Whole genome sequencing and analysis of SARS-CoV-2 isolated from Malaysia	Illumina iSeq	• Unique mutations • 16 nucleotide substitutions in Malaysian strain • 4 unique nucleotide substitutions in nonstructural genes of SARS-CoV-2
Caly et al., 2020	To describe the first isolation and sequencing of SARS-CoV-2 in Australia and rapid sharing of the isolate	Oxford Nanopore Technologies and Illumina short-read	• >99.9% sequence identity between BetaCoV/Australia/VIC01/2020 and publicly available SARS-CoV-2 genomes • SNPs and nucleotide deletions in 3’UTR


Genome-wide association study

GWAS has made the genetics of complex diseases more tractable by providing convincing links between complex human traits and disease. Comprehensive and accurate detection of variants from whole-genome sequencing is a prerequisite for translational genomic research (Hwang et al., 2019). GWAS involves screening genetic variants across the genomes of many individuals to identify genotype-phenotype associations. Genetic variants discovered by GWAS are used to identify individuals at high risk of deadly diseases, which supports early detection and prevention (Tam et al., 2019). A genome-wide association study (GWAS) is an extensive genetic analysis of disease-associated alleles in the host or pathogen, observed as single nucleotide polymorphisms (SNPs) (Patron et al., 2019). GWAS-related applications, including sequence analysis, alignment, analysis of genetic/nucleotide variations in the form of SNPs, analysis of genomic structure and alterations, and primer design, have provided novel insights in SARS-CoV-2 experiments by accurately detecting and quantifying rare viral variants within the species (Khailany et al., 2020; Ellinghaus et al., 2020; Aiewsakun et al., 2020; Ray et al., 2020a) (Table 4). In addition to SNP analysis, haplotype diversity analysis combined with phylogenetic analysis has frequently been used in SARS-CoV-2 research to study the evolution and population demography of SARS-CoV-2 globally (Ramírez et al., 2020; Fang et al., 2020). Phylogenetic studies have been used effectively to analyze the molecular and evolutionary relationship with other coronavirus species and to identify closely related species. This provides additional data for proper genomic assessment of SARS-CoV-2 (Ray et al., 2020b; Tabibzadeh et al., 2020; Satpathy, 2020; Joshi and Paul, 2020; Zhou et al., 2020; Lopes et al., 2020) (Table 4).
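
Nucleotide variants of the kind reported in these studies (for example, 8782C > T) are typically read off an alignment between a sample genome and a reference. The following sketch assumes the two sequences are already aligned to equal length with '-' marking gaps; it is an illustration of the notation, not the pipeline of any cited study, and the function names are hypothetical.

```python
def call_snps(reference, sample):
    """Return (position, ref_base, alt_base) for each substitution.

    Both inputs must be pre-aligned strings of equal length. Positions are
    1-based and counted along the reference, so alignment columns where the
    reference has a gap do not advance the position.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base in "ACGT" and alt_base in "ACGT" and ref_base != alt_base:
            snps.append((ref_pos, ref_base, alt_base))
    return snps


def format_snp(snp):
    """Format a SNP tuple in position-reference-alternate style, e.g. 5A>T."""
    pos, ref_base, alt_base = snp
    return f"{pos}{ref_base}>{alt_base}"


# Toy alignment (hypothetical): one substitution at reference position 5.
print([format_snp(s) for s in call_snps("ACGTACGT", "ACGTTCGT")])
```

Gaps and ambiguous bases are skipped rather than reported, so insertions and deletions would need separate handling in a fuller implementation.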

Table 4

Interpretation of genome wide association studies (GWAS) for characterization of SARS-CoV-2 genomes.

Author and Publication Year	Objective	Findings
Khailany et al., 2020	Understand the genomic structure and variations in SARS-CoV-2 complete genome sequences	• 116 mutations found • 3 most common mutations: 8782C > T in ORF1ab, 28144T > C in ORF8 and 29095C > T in N gene
Ellinghaus et al., 2020	Identification of potential genetic factors involved in the development of Covid-19	• Analyzed 8,582,968 SNPs • A 3p21.31 gene cluster as a genetic susceptibility locus in COVID-19 patients • Potential involvement of ABO blood group
Aiewsakun et al., 2020	Identification of genetic variation associated with COVID-19 severity	• Nucleotide variation at genomic position 11,083 • 11083G variant in symptomatic patients • 11083T variant in asymptomatic patients • miR-485-3p, miR-539-3p, miR-3149 differentially target the variants
Ray et al., 2020b	Elucidation of nucleotide polymorphisms in whole genome sequences of SARS-CoV-2	• SNPs in S (22224G, 22224T) and N (28792G, 28792C) protein genes of Indian and Nepalese isolates, respectively • Lower case fatality rate in India and Nepal
Tabibzadeh et al., 2020	Investigate and track SARS-CoV-2 in Iranian COVID-19 patients	• Iranian isolates are closely related to Wuhan reference sequence • No polymorphism found in assessed regions of nsp-2, nsp-12, Spike
Satpathy, 2020	Investigation on source of origin of this novel coronavirus	• Wuhan-Hu-1 genome showed evolutionary relationship with Bat CoV RaTG13 genome sequence with 96.12% sequence similarity
Joshi and Paul, 2020	Highlight the similarities and changes observed in the submitted Indian viral strains	• Novel non-synonymous mutation C > T (NSP3), 14408C > T (RNA primase), 23403A > G (S), 3037C > T (NSP3 synonymous) in genes of SARS-CoV-2 Indian strain.
Zhou et al., 2020	Analyse the evolution and variation of SARS-CoV-2 during the epidemic starting at the end of 2019	• SARS-CoV-2 belonged to the Sarbecovirus subgenus of Beta coronavirus, Beta CoV/Bat/Yunnan/RaTG13/2013,bat-SL-CoVZC45, bat-SL-CoVZXC21 and SARS-CoV • No positive time evolution signal between SARS-CoV-2 and BetaCoV/bat/Yunnan/RaTG13/2013
Lopes et al., 2020	Investigate bats and pangolin as hosts in SARS-CoV-2 cross-species transmission	• SARS-like-CoV-2 strains that infected pangolin and bats are close to SARS-CoV-2 • Pangolin has yet lower ACE2 evolutionary divergence with humans and more diverged from bat

In addition, to prevent false-positive results in COVID-19 testing by real-time polymerase chain reaction (rtPCR) and to reduce the need for standardization across different PCR protocols, some primers have been designed with in silico algorithms that target conserved segments of the viral genome (Lanza et al., 2020; Lopez-Rincon et al., 2020; Toms et al., 2020).
The novel information generated on SARS-CoV-2 genes is helping researchers develop vaccines against SARS-CoV-2, based on the identified viral coding regions, genetic sequence variations and molecular differences between isolates from around the world. All the reported genomic experiments and analyses, including SNP studies, phylogenetic analysis and primer design, have been carried out with high-throughput bioinformatics tools and techniques, which provide an appropriate pipeline for data analysis and annotation (Table 5).
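
The idea of designing primers against conserved segments can be sketched as a scan for windows that are identical across an alignment, followed by simple filters. This is not the algorithm of the cited primer-design studies or of Primer3; the window length, the GC range and the use of the Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C, a rough estimate intended for short oligonucleotides) are assumptions for illustration.

```python
def conserved_windows(aligned, length):
    """Yield (start, window) for gap-free windows identical in every sequence.

    aligned is a list of pre-aligned, equal-length sequences; start is 0-based.
    """
    first = aligned[0].upper()
    others = [seq.upper() for seq in aligned[1:]]
    for start in range(len(first) - length + 1):
        window = first[start:start + length]
        if set(window) <= set("ACGT") and all(
            other[start:start + length] == window for other in others
        ):
            yield start, window


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def candidate_primers(aligned, length=20, gc_range=(0.4, 0.6)):
    """Return (start, sequence, gc, tm) for conserved windows within gc_range."""
    low, high = gc_range
    hits = []
    for start, window in conserved_windows(aligned, length):
        gc = gc_fraction(window)
        if low <= gc <= high:
            hits.append((start, window, gc, wallace_tm(window)))
    return hits


# Toy alignment (hypothetical): the sequences differ only at index 8.
toy_alignment = ["ACGTACGTAA", "ACGTACGTCA"]
print(candidate_primers(toy_alignment, length=4))
```

Real primer design also checks specificity, secondary structure and primer-pair compatibility, which tools such as Primer3 and Primer-BLAST handle; the sketch covers only the conservation step that the text emphasizes.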

Table 5

List of researches reported on in silico drug design (CADD) against viral proteins of SARS-CoV-2.

Author and Publication Year	Objective of the Study	Target Protein	Findings
Prasanth et al., 2020	Identification of potential inhibitors from Cinnamon against main protease and spike glycoprotein of SARS-CoV-2	Mpro and Spike	• Tenufolin (TEN) and Pavetannin C1 (PAV) are hit compounds against Mpro and Spike protein
Hall Jr and Ji, 2020	Identification of effective inhibitors against Spike glycoprotein and 3CL protease of SARS-CoV-2	Spike and 3CL Pro	• Zanamivir, Indinavir, Saquinavir, and Remdesivir show potential inhibitory effects on S and 3CLpro
Wei et al., 2020	Selection of potential molecules that can target viral spike proteins	Spike protein	• Raltegravir has a relatively high binding score against S protein • Forsythiae fructus and Isatidis radix herbs are widely used for treating Covid-19
Fantini et al., 2020	Studied the effects of Chloroquine and Hydroxychloroquine for treating Covid-19	Spike Protein	• CLQ and CLQ-OH inhibit the binding of viral S protein to the ganglioside binding site
BR et al., 2020	Screening of small molecules to bind ACE2 specific RBD on Spike glycoprotein of SARS-CoV-2	Spike protein	• Glycyrrhizic Acid of plant origin may be repurposed for SARS-CoV-2 intervention
Cavasotto and Di Filippo, 2020	Docking-based screening from approved drugs and compounds undergoing clinical trials, against three SARS-CoV-2 target proteins	Spike, M pro, Papain like protease	• Pralatrexate, Carumonam, Aclerasteride, Granotapide (S protein), Tiracizine (PL Pro), Ritonavir (M pro) are the effective compounds and drugs undergoing clinical trials
Vardhan and Sahoo, 2020	Virtual screening of phytochemicals against viral proteins of SARS-CoV-2	Spike, Mpro, 3CL pro, PL pro, ACE2, RdRp	• Glycyrrhizic acid, limonin, 7-deacetyl-7-benzoylgedunin, maslinic acid, corosolic acid, obacunone and ursolic acid are effective against the target proteins of SARS-CoV-2
Panda et al., 2020	Structure-based drug designing and immunoinformatics approach for SARS-CoV-2	Spike glycoprotein, M pro, ACE2	• Zanamivir and Lopinavir showed stronger binding affinity against S protein and M pro respectively
Sarma et al., 2020	Homology assisted identification of inhibitor against RNA binding domain of N protein	Nucleocapsid protein	• Theophylline and pyrimidone derivatives are possible inhibitors
Ray et al., 2020a	Potential drug compound identification against Covid-19	Nucleocapsid protein	• Glycyrrhizic acid and Theaflavin natural compound showed best binding energy against N protein
Bhowmik et al., 2020	Identify potential drug candidates against SARS-CoV-2 structural proteins	Membrane, Envelope and Nucleocapsid protein	• Rutin against envelope protein • Caffeic acid and ferulic acid against membrane protein • Simeprevir and grazoprevir against N protein
Lavecchia and Fernandez, 2020	Stabilization of non-native Protein-Protein Interactions (PPIs) of the nucleocapsid protein to inhibit viral replication in SARS-CoV-2	Nucleocapsid Protein	• Catechin might be used to stabilize PPIs of N protein
Gupta et al., 2020	Detection of inhibitors of SARS-CoV-2 ion channel to control Covid-19	Envelope protein	• Belachinal, Macaflavanone E & Vibsanol B showed inhibitory effects for envelope protein ion channel
Jo et al., 2020	Screening of flavonoids against 3CL pro of SARS-CoV-2	3CL pro	• Baicalin showed an effective inhibitory activity against SARS-CoV-2 3CLpro
Kumar et al., 2020	Inhibitors screening and drug discovery against main protease (Mpro) of SARS-CoV-2	Mpro	• Lopinavir-Ritonavir, Tipranavir, and Raltegravir show the best molecular interaction with the main protease of SARS-CoV-2


Computer aided drug design

Drug design is a challenging, expensive and time-consuming integrated discipline (Bisht and Singh, 2019). Bioinformatics has become a crucial part of drug design and plays a vital role in the validation of drug targets. It helps in understanding complex biological processes and thereby improves drug discovery (Choudhury and Saikia, 2018). In silico screening, or computer-aided drug design (CADD), has become a dominant practice in drug discovery and development because of its algorithms and resources, including digital repositories for studying chemical interaction relationships, computer programs for designing compounds with unusual physicochemical characteristics, and tools for systematic assessment of potential lead candidates (Song et al., 2009). Additional benefits, such as cost savings, shorter time to market, insight into drug-receptor interactions and faster drug discovery and development, have increased its popularity in scientific research (Ramírez et al., 2020). The potential of CADD has been exploited to the fullest in the search for a solution to the COVID-19 outbreak.
Researchers have taken advantage of CADD approaches, including structure-based and network-based drug design, to identify potential drug candidates against the identified viral proteins of SARS-CoV-2, including the Spike (S) protein (Prasanth et al., 2020; Hall Jr and Ji, 2020; Wei et al., 2020; Fantini et al., 2020; BR et al., 2020; Cavasotto and Di Filippo, 2020; Vardhan and Sahoo, 2020; Panda et al., 2020), Nucleocapsid (N) protein (Sarma et al., 2020; Ray et al., 2020a; Bhowmik et al., 2020; Lavecchia and Fernandez, 2020), Envelope (E) protein (Bhowmik et al., 2020; Lavecchia and Fernandez, 2020; Gupta et al., 2020), Membrane (M) protein (Bhowmik et al., 2020), main protease (M pro) (Prasanth et al., 2020; Cavasotto and Di Filippo, 2020; Vardhan and Sahoo, 2020; Panda et al., 2020; Kumar et al., 2020) and 3CL protease (Hall Jr and Ji, 2020; Vardhan and Sahoo, 2020; Jo et al., 2020), using bioinformatics tools and software (Table 6). This immediate and effective action has not only predicted novel putative natural inhibitors but also re-examined previously used synthetic drugs, such as chloroquine (malaria), hydroxychloroquine (malaria), zanamivir (influenza A and B viruses), indinavir (HIV), saquinavir (HIV), remdesivir (SARS-CoV), raltegravir (HIV), streptomycin and ciprofloxacin, as well as glycyrrhizic acid (anti-inflammatory), against SARS-CoV-2 (Hall Jr and Ji, 2020; Fantini et al., 2020; BR et al., 2020; Panda et al., 2020; Ray et al., 2020b) (Table 6). For the successful completion of CADD, various bioinformatics tools and databases have been used over the last decades and will continue to be used in further research (Table 7).
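
Docking studies such as those listed above usually rank compounds by a predicted binding energy, where a more negative value suggests tighter binding. As a rough aid to reading such numbers, the sketch below applies the standard thermodynamic relation ΔG = RT ln Kd to convert a binding free energy into a dissociation constant. Docking scores only approximate true binding free energies, and the ligand names and energy values here are hypothetical, not results from any cited study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def rank_by_energy(scores):
    """Sort (ligand, energy) pairs from most to least favourable (most negative first)."""
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical docking energies in kcal/mol, for illustration only.
toy_scores = {"ligand_A": -6.5, "ligand_B": -8.0, "ligand_C": -7.2}
for name, energy in rank_by_energy(toy_scores):
    print(name, energy, f"{kd_from_delta_g(energy):.2e} M")
```

Because the relation is exponential, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in Kd, which is why small differences in docking score should be interpreted cautiously.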

Table 6

Basic Bioinformatics Databases/Tools useful for COVID19 genomics research.

Databases/ Tools	Application	References
GEO (Gene Expression Omnibus) database (https://www.ncbi.nlm.nih.gov/geo/)	It is a repository of functional genomics data generated from experiments and stores curated gene expression profiles.	Clough and Barrett, 2016
NCBI Gene database (https://www.ncbi.nlm.nih.gov/gene/)	Repository of gene related information from a wide range of species.	Brown et al., 2015
UCSC genome Browser (https://genome.ucsc.edu/)	Broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading genomic data.	Karolchik et al., 2009
UniProt (https://www.uniprot.org/)	Resource of protein sequence and functional information	UniProt Consortium, 2008
CD (Conserved Domain) Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)	Conserved domain search through multiple and pair wise sequence alignments.	Ray et al., 2020a
DAVID (Database for Annotation, Visualization and Integrated Discovery)	Functional annotation of genes (Biological process, Molecular function, Cellular component)	Huang et al., 2007
KEGG (Kyoto Encyclopaedia of Genes and Genome)	Metabolic pathway analysis	Kanehisa and Goto, 2000

Discovery of Single Nucleotide Polymorphisms
dbSNP (https://www.ncbi.nlm.nih.gov/snp/)	A central repository for single-base nucleotide substitutions and short deletion and insertion polymorphisms	Sherry et al., 2001
SIFT (https://sift.bii.a-star.edu.sg/)	Predicts effects of an amino acid substitution on protein function based on sequence homology and the physical properties of amino acids.	Sim et al., 2012
PredictSNP1 (https://loschmidt.chemi.muni.cz/predictsnp1/)	Consensus classifier for prediction of disease related amino acid mutations.	Rath et al., 2020
PredictSNP2 (https://loschmidt.chemi.muni.cz/predictsnp2/)	Platform for prediction of effects of SNPs in genomic region.	Bendl et al., 2016
PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/)	Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.	Ray et al., 2019
PROVEAN (http://provean.jcvi.org/index.php)	Predicts impact of an amino acid substitution or indel on the biological function of a protein.	Ray et al., 2019
SNAP2 (https://rostlab.org/services/snap/)	Predicts functional effects of sequence variants.	Ray et al., 2019

Phylogenetic Analysis
MEGA (Molecular Evolutionary Genetics Analysis) (https://www.megasoftware.net/)	Multiple sequence alignment, phylogenetic tree generation and statistical analyses.	Kumar et al., 2008
Phylogeny.fr (https://www.phylogeny.fr/)	Reconstruct and analyse phylogenetic relationships between molecular sequences.	Dereeper et al., 2008
PAUP (https://paup.phylosolutions.com/)	Reconstruct and analyse phylogenetic relationships between molecular sequences using parsimony method.	Wilgenbusch and Swofford, 2003
DnaSP (http://www.ub.edu/dnasp/)	Analyse DNA polymorphisms using data from a single locus, and also generate haplotype diversity between the sequences.	Rozas et al., 2017
PopArt (http://popart.otago.ac.nz/index.shtml)	Population genetic software which visualizes haplotype diversity network.	Leigh and Bryant, 2015

Primer Design
Primer3 (https://bioinfo.ut.ee/primer3-0.4.0/)	Primer design, often in high-throughput genomics applications.	Untergasser et al., 2012
NCBI Primer-Blast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/)	Designs new target-specific primers in one step, checks the specificity of pre-existing primers, and can place primers based on exon/intron locations and exclude single nucleotide polymorphism (SNP) sites from primers.	Ye et al., 2012
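
As a minimal illustration of the kind of comparison that SNP discovery starts from, the following Python sketch lists single-base substitutions between two sequences that are assumed to be already aligned. The function name find_substitutions and the toy sequences are hypothetical and are not taken from any of the tools listed above; those tools additionally rely on curated annotations and statistical models that this sketch does not attempt.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are pre-aligned and of equal length.
    Positions holding a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    variants = []
    for pos, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base in skip or alt_base in skip:
            continue
        if ref_base != alt_base:
            variants.append((pos, ref_base, alt_base))
    return variants


# Toy sequences for illustration only (not real SARS-CoV-2 data).
reference = "ATGGCTTACGATCG"
sample = "ATGGCCTACGNTCA"
for pos, ref_base, alt_base in find_substitutions(reference, sample):
    print(f"{ref_base}{pos}{alt_base}")
```

In practice the alignment itself would come from a dedicated aligner, and each reported substitution would then be passed to a predictor such as those in the table to estimate its functional effect.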

Table 7

Basic bioinformatics databases/tools useful for COVID-19 in silico drug design.

Databases/ Tools	Application	References
BLAST (Basic local alignment search tool) (https://blast.ncbi.nlm.nih.gov/Blast.cgi)	Used for local similarity between sequences by comparing nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.	Boratyn et al., 2013
PDB (Protein Data Bank) (https://www.rcsb.org/)	Three-dimensional structure database; it contains information about the 3D shapes of proteins, nucleic acids, and complex assemblies.	Berman et al., 2000
PubChem (https://pubchem.ncbi.nlm.nih.gov/)	Chemical structure database; contains information on chemical compounds including name, molecular formula, chemical and physical properties, biological activities, toxic effects, literature etc.	Kim et al., 2016
DrugBank (https://www.drugbank.ca/)	Contains information on FDA-approved drugs and drug targets. It is both a bioinformatics and a chemoinformatics resource.	Wishart et al., 2018
Modeller (https://salilab.org/modeller/)	Used for homology or comparative modeling of protein three-dimensional structures by aligning query sequence with known structure.	Eswar et al., 2006
AutoDock (http://autodock.scripps.edu/)	Molecular docking between protein and ligand (small compounds) molecules.	Forli et al., 2016
AutoDock Vina (http://vina.scripps.edu/)	An open-source molecular docking program that significantly improves the average accuracy of binding mode predictions compared to AutoDock 4.	Trott and Olson, 2010
ZDOCK (http://zdock.umassmed.edu/)	An automated online protein-protein docking server that takes protein structures as input.	Pierce et al., 2011
SwissDock (http://www.swissdock.ch/)	A web service to predict the molecular interactions between a target protein and a small molecule.	Grosdidier et al., 2011
PatchDock (https://bioinfo3d.cs.tau.ac.il/PatchDock/)	A simple molecular docking algorithm based on shape complementarity principles.	Schneidman-Duhovny et al., 2005
Glide (https://www.schrodinger.com/glide)	Offers a range of speed vs. accuracy options, from a high-throughput virtual screening mode for efficiently enriching million-compound libraries to modes for reliably docking tens to hundreds of thousands of ligands with high accuracy, advanced scoring, and higher enrichment of results.	Richard et al., 2004
PyMol (https://pymol.org/2/)	Molecular structure visualization and editing tool.	Seeliger and de Groot, 2010
Discovery Studio Visualizer (https://discover.3ds.com/discovery-studio-visualizer-download)	Structure visualization, and analysis of 3D molecules.	Ray et al., 2020a
UCSF Chimera (https://www.cgl.ucsf.edu/chimera/)	Visualization and analysis of molecular structures and related data, including density maps, trajectories, and sequence alignments. Also used for energy minimization of molecules.	Pettersen et al., 2004
Open Babel (http://openbabel.org/wiki/Main_Page)	A chemical toolbox designed to search, convert file format, analyse, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.	O'Boyle et al., 2011
Gromacs (http://www.gromacs.org/About_Gromacs)	Molecular dynamics simulation tool	Abraham et al., 2015
NAMD (https://www.ks.uiuc.edu/Research/namd/)	Parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.	Phillips et al., 2005
VMD (https://www.ks.uiuc.edu/Research/vmd/)	Molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.	Hsin et al., 2008
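
To show how the structure and docking resources in this table connect in practice, the sketch below reads HETATM records from PDB-format text and derives a search box around a bound ligand, which is a common preparatory step before docking. The helper names (hetatm_line, ligand_box), the residue name LIG, the toy coordinates and the 5 Å padding are all assumptions made for illustration. The printed keys follow the center_x/size_x naming used in AutoDock Vina configuration files, but the box definition itself is only one of several reasonable choices and is not the procedure of any cited study.

```python
def hetatm_line(serial: int, name: str, resname: str, x: float, y: float, z: float) -> str:
    """Build one fixed-column PDB HETATM record (toy helper for this example)."""
    return (
        f"HETATM{serial:>5} {name:<4} {resname:>3} A{1:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def ligand_box(pdb_text: str, resname: str, padding: float = 5.0):
    """Return (center, size) of an axis-aligned box around one ligand.

    Coordinates are read from the fixed PDB columns 31-38, 39-46 and 47-54;
    the residue name is read from columns 18-20.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError(f"no HETATM records found for residue {resname!r}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple(round((lo + hi) / 2, 3) for lo, hi in zip(mins, maxs))
    size = tuple(round(hi - lo + 2 * padding, 3) for lo, hi in zip(mins, maxs))
    return center, size


# Toy ligand with three atoms (coordinates are invented, in angstroms).
toy_pdb = "\n".join([
    hetatm_line(1, "C1", "LIG", 10.0, 20.0, 30.0),
    hetatm_line(2, "C2", "LIG", 12.0, 24.0, 30.0),
    hetatm_line(3, "O1", "LIG", 14.0, 22.0, 36.0),
])

center, size = ligand_box(toy_pdb, "LIG")
for axis, c, s in zip("xyz", center, size):
    print(f"center_{axis} = {c}")
    print(f"size_{axis} = {s}")
```

A real workflow would read the coordinates from a structure downloaded from the PDB and would normally convert the receptor and ligand to the docking program's input format, for example with Open Babel, before running the docking itself.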

The overall in silico processes are arranged in order so that each task is performed sequentially. From metagenomics at the beginning to CADD at the end, the interconnected applications of bioinformatics are represented in a single flow diagram (Fig. 1).

Fig. 1

The graphical representation of interconnected bioinformatics applications implemented in COVID-19 research.

Limitations

The wide application of robust algorithm-based tools and the information obtained from several public repositories have enriched modern life science research. Many of the available bioinformatics tools and techniques are simple, accurate, cost-effective and freely available on the internet, enabling their universal use for different research purposes. The online repositories mentioned above, including PDB, PubChem, DrugBank, the NCBI gene/genome databases, the UCSC Genome Browser, UniProt, dbSNP, GEO, SRA and ENA (Table 2), (Table 5), (Table 7), are updated frequently with large novel datasets and provide authenticated and useful information to users. However, certain tools have limitations, particularly those used for drug design. The 3D protein structure generated by Modeller (Table 7) or any other software is approximate and needs to be validated by crystallographic methods before further study. Docking parameters analyzed with the predefined algorithms of AutoDock (Table 7) should be followed by simulation to assess the stability of the interaction between the target and the drug candidate. Likewise, some software, including Schrodinger, Discovery Studio (Table 7) and PAUP (Table 5), limits researchers during data analysis and access because it is customized or paid software. Apart from these major limitations, some minor issues are associated with the use of tools and software, e.g. errors during software installation, software dependencies (particularly on the type of operating system), and the need for a high-speed internet connection and multi-core computing facilities. The tools and software are designed for specific analyses; the user cannot modify the algorithms and outputs according to their own interest and needs to use different software for different purposes to obtain authentic results.
Knowledge of programming languages such as Perl, R and Python and of the Linux operating system is necessary to work with different bioinformatics software and to rewrite the code needed to solve a particular biological problem computationally, in particular for software used for next-generation sequencing analyses.
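
The kind of scripting referred to here can be illustrated with a short Python sketch that filters sequencing reads by mean base quality. It assumes uncompressed FASTQ text with exactly four lines per record and Phred+33 quality encoding; the function names, the toy reads and the threshold of 30 are hypothetical choices for illustration, not settings from any pipeline discussed in this review.

```python
from typing import Iterator


def parse_fastq(text: str) -> Iterator[tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected exactly 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Two toy reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
toy_fastq = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n"

min_mean_quality = 30  # hypothetical threshold
kept = [
    name
    for name, seq, qual in parse_fastq(toy_fastq)
    if mean_phred(qual) >= min_mean_quality
]
print(kept)
```

Dedicated quality-control tools handle compressed input, paired reads and per-base trimming; a script like this is mainly useful for quick checks or for adapting a step that an existing tool does not expose.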

Future aspects

Observations on SARS-CoV-2 will continue to be explored extensively through bioinformatics and its varied applications. Researchers can also elucidate SNPs in the host after COVID-19 infection. Based on the mutated nucleotides/genes, novel primers for polymerase chain reaction can be designed through computational primer design algorithms. Apart from drug design, putative inhibitory peptides can be designed against SARS-CoV-2 viral genes. These ideas would yield much more de novo information on SARS-CoV-2, which will help clinicians add novel medication insights to diagnostic procedures.
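
As a rough illustration of what a computational primer check involves, the Python sketch below computes the GC content, an approximate melting temperature and the reverse complement of a candidate primer. The melting temperature uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a coarse estimate for short oligonucleotides and is a deliberate simplification; it is not the thermodynamic model used by dedicated primer design software. The primer sequence and function names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical 14-nt candidate primer (not derived from a real target).
primer = "ATGCGTACGTTAGC"
print(f"GC content: {gc_percent(primer):.1f}%")
print(f"Wallace Tm: {wallace_tm(primer)} C")
print(f"Reverse complement: {reverse_complement(primer)}")
```

Candidate primers that pass such basic checks would still need specificity screening against the target and host genomes, which is the part that tools with database search are built for.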

Conclusion

The worldwide outbreak of COVID-19 is a major challenge to overcome. Advances in bioinformatics have proved to be among the most advanced and effective techniques in biomedical research, made possible by high-throughput screening and accurate data analysis. In the current pandemic, computational approaches have been used effectively from the preliminary stage of viral sample identification to the end stage of drug design, discovering novel information on SARS-CoV-2 genomic content, variation and diversity within the species, and predicting potential drug/vaccine candidates against the viral genes within a very short period. In the present economic downturn, the successful implementation of bioinformatics approaches against SARS-CoV-2 is a great achievement for the scientific community.

Funding

No funding has been received for this work.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

81 in total

1. The Gene Expression Omnibus Database.

Authors: Emily Clough; Tanya Barrett
Journal: Methods Mol Biol Date: 2016

2. DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets.

Authors: Julio Rozas; Albert Ferrer-Mata; Juan Carlos Sánchez-DelBarrio; Sara Guirao-Rico; Pablo Librado; Sebastián E Ramos-Onsins; Alejandro Sánchez-Gracia
Journal: Mol Biol Evol Date: 2017-12-01 Impact factor: 16.240

3. The UCSC Genome Browser.

Authors: Donna Karolchik; Angie S Hinrichs; W James Kent
Journal: Curr Protoc Bioinformatics Date: 2009-12

4. PatchDock and SymmDock: servers for rigid and symmetric docking.

Authors: Dina Schneidman-Duhovny; Yuval Inbar; Ruth Nussinov; Haim J Wolfson
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

5. Genomewide Association Study of Severe Covid-19 with Respiratory Failure.

Authors: David Ellinghaus; Frauke Degenhardt; Luis Bujanda; Maria Buti; Agustín Albillos; Pietro Invernizzi; Javier Fernández; Daniele Prati; Guido Baselli; Rosanna Asselta; Marit M Grimsrud; Chiara Milani; Fátima Aziz; Jan Kässens; Sandra May; Mareike Wendorff; Lars Wienbrandt; Florian Uellendahl-Werth; Tenghao Zheng; Xiaoli Yi; Raúl de Pablo; Adolfo G Chercoles; Adriana Palom; Alba-Estela Garcia-Fernandez; Francisco Rodriguez-Frias; Alberto Zanella; Alessandra Bandera; Alessandro Protti; Alessio Aghemo; Ana Lleo; Andrea Biondi; Andrea Caballero-Garralda; Andrea Gori; Anja Tanck; Anna Carreras Nolla; Anna Latiano; Anna Ludovica Fracanzani; Anna Peschuck; Antonio Julià; Antonio Pesenti; Antonio Voza; David Jiménez; Beatriz Mateos; Beatriz Nafria Jimenez; Carmen Quereda; Cinzia Paccapelo; Christoph Gassner; Claudio Angelini; Cristina Cea; Aurora Solier; David Pestaña; Eduardo Muñiz-Diaz; Elena Sandoval; Elvezia M Paraboschi; Enrique Navas; Félix García Sánchez; Ferruccio Ceriotti; Filippo Martinelli-Boneschi; Flora Peyvandi; Francesco Blasi; Luis Téllez; Albert Blanco-Grau; Georg Hemmrich-Stanisak; Giacomo Grasselli; Giorgio Costantino; Giulia Cardamone; Giuseppe Foti; Serena Aneli; Hayato Kurihara; Hesham ElAbd; Ilaria My; Iván Galván-Femenia; Javier Martín; Jeanette Erdmann; Jose Ferrusquía-Acosta; Koldo Garcia-Etxebarria; Laura Izquierdo-Sanchez; Laura R Bettini; Lauro Sumoy; Leonardo Terranova; Leticia Moreira; Luigi Santoro; Luigia Scudeller; Francisco Mesonero; Luisa Roade; Malte C Rühlemann; Marco Schaefer; Maria Carrabba; Mar Riveiro-Barciela; Maria E Figuera Basso; Maria G Valsecchi; María Hernandez-Tejero; Marialbert Acosta-Herrera; Mariella D'Angiò; Marina Baldini; Marina Cazzaniga; Martin Schulzky; Maurizio Cecconi; Michael Wittig; Michele Ciccarelli; Miguel Rodríguez-Gandía; Monica Bocciolone; Monica Miozzo; Nicola Montano; Nicole Braun; Nicoletta Sacchi; Nilda Martínez; Onur Özer; Orazio Palmieri; Paola Faverio; Paoletta Preatoni; Paolo Bonfanti; Paolo 
Omodei; Paolo Tentorio; Pedro Castro; Pedro M Rodrigues; Aaron Blandino Ortiz; Rafael de Cid; Ricard Ferrer; Roberta Gualtierotti; Rosa Nieto; Siegfried Goerg; Salvatore Badalamenti; Sara Marsal; Giuseppe Matullo; Serena Pelusi; Simonas Juzenas; Stefano Aliberti; Valter Monzani; Victor Moreno; Tanja Wesse; Tobias L Lenz; Tomas Pumarola; Valeria Rimoldi; Silvano Bosari; Wolfgang Albrecht; Wolfgang Peter; Manuel Romero-Gómez; Mauro D'Amato; Stefano Duga; Jesus M Banales; Johannes R Hov; Trine Folseraas; Luca Valenti; Andre Franke; Tom H Karlsen
Journal: N Engl J Med Date: 2020-06-17 Impact factor: 91.245

6. Complete Genome Sequences of SARS-CoV-2 Strains Detected in Malaysia.

Authors: Yoong Min Chong; I-Ching Sam; Sasheela Ponnampalavanar; Sharifah Faridah Syed Omar; Adeeba Kamarulzaman; Vijayan Munusamy; Chee Kuan Wong; Fadhil Hadi Jamaluddin; Han Ming Gan; Jennifer Chong; Cindy Shuan Ju Teh; Yoke Fun Chan
Journal: Microbiol Resour Announc Date: 2020-05-14

7. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

Authors: Jaroslav Bendl; Miloš Musil; Jan Štourač; Jaroslav Zendulka; Jiří Damborský; Jan Brezovský
Journal: PLoS Comput Biol Date: 2016-05-25 Impact factor: 4.475

8. Structure-based drug designing and immunoinformatics approach for SARS-CoV-2.

Authors: Pritam Kumar Panda; Murugan Natarajan Arul; Paritosh Patel; Suresh K Verma; Wei Luo; Horst-Günter Rubahn; Yogendra Kumar Mishra; Mrutyunjay Suar; Rajeev Ahuja
Journal: Sci Adv Date: 2020-07-10 Impact factor: 14.136

9. Full-genome sequences of the first two SARS-CoV-2 viruses from India.

Authors: Pragya D Yadav; Varsha A Potdar; Manohar Lal Choudhary; Dimpal A Nyayanit; Megha Agrawal; Santosh M Jadhav; Triparna D Majumdar; Anita Shete-Aich; Atanu Basu; Priya Abraham; Sarah S Cherian
Journal: Indian J Med Res Date: 2020 Feb & Mar Impact factor: 2.375

10. Isolation and rapid sharing of the 2019 novel coronavirus (SARS-CoV-2) from the first patient diagnosed with COVID-19 in Australia.

Authors: Leon Caly; Julian Druce; Jason Roberts; Katherine Bond; Thomas Tran; Renata Kostecki; Yano Yoga; William Naughton; George Taiaroa; Torsten Seemann; Mark B Schultz; Benjamin P Howden; Tony M Korman; Sharon R Lewin; Deborah A Williamson; Mike G Catton
Journal: Med J Aust Date: 2020-04-01 Impact factor: 12.776
