Tapan Behl1, Ishnoor Kaur1, Aayush Sehgal1, Sukhbir Singh1, Saurabh Bhatia2,3, Ahmed Al-Harrasi3, Gokhan Zengin4, Elena Emilia Babes5, Ciprian Brisc5, Manuela Stoicescu5, Mirela Marioara Toma6,7, Cristian Sava5, Simona Gabriela Bungau6,7.
Abstract
With the development of advanced technology, bioinformatics has become one of the avant-garde fields, making remarkable progress in the pharmaceutical and medical fields by modeling the infrastructural dimensions of healthcare and integrating computing tools into drug innovation. This facilitates the prevention, detection and more accurate diagnosis, and treatment of disorders, while saving time and money. Together, bioinformatics and pharmacovigilance have advanced both sample analyses and the interpretation of drug side effects, with a focus on drug discovery and development (DDD), in which systems biology, a personalized approach, and drug repositioning are considered together with translational medicine. The role of bioinformatics has been highlighted in DDD, proteomics, genetics, modeling, miRNA discovery and assessment, and clinical genome sequencing. The authors collated significant data from the best-known online databases and publishers and narrowed the diverse applications down to four major areas (a tetrad): DDD, antimicrobial research, genomic sequencing, and miRNA research, together with its significance in the management of the current pandemic. Our analysis aims to provide optimal data in the field by stratifying the published information across these key sectors, and to capture the attention of researchers interested in bioinformatics, a field that has advanced the healthcare paradigm by introducing new techniques and the multiple database platforms addressed in the manuscript.
Keywords: COVID-19; bioinformatics; microRNA; microbiology; pharmacovigilance; public health
Year: 2021 PMID: 34201152 PMCID: PMC8227524 DOI: 10.3390/ijms22126184
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Flow chart presenting the methodology of published data selection.
Figure 2. The multiple stages of drug discovery and the events targeted in each of the discussed steps.
Techniques of translational bioinformatics for discovery and development of drugs.
| Diseased States | Translational Bioinformatics Techniques | Refs. |
|---|---|---|
| Osteoporosis drug targets | Functional pathway enrichment, gene expression profiles from GEO, dysfunctional pathways | [ |
| NeuroAIDS and drug abuse | Public-domain database for evaluating molecular relationships | [ |
| Repositioning of drugs | Evaluation of transcriptomic information for relationships between drugs and diseases | [ |
| Drug resistance in ovarian cancer therapy | Evaluation of protein interactions and of methylated genes related to drug resistance; enrichment of biological processes | [ |
| Drugs for AIDS and drug resistance | Drug resistance calculator; evaluation of protein residues with digital signal processing | [ |
| Repositioning of drugs in organ transplantation | Microarray dataset profiling; meta-analysis of genomic and drug information; identification of redundant molecular processes | [ |
| HCV drug discovery | Combination of dictionary-based filtering and a gene mention tagger; knowledge discovery process for literature mining | [ |
| Off-label selection of drugs for TNBC | Evaluation of TNBC patient data; integration of databases of cancer drugs and their targets; evaluation of individual molecular profiles | [ |
| Glycomics and drug targets | Tree-based algorithmic models for glycan structure data, collation of data, public databases for glycome informatics, such as KEGG | [ |
Legend: AIDS—acquired immunodeficiency syndrome; GEO—Gene Expression Omnibus; HCV—hepatitis C virus; KEGG—Kyoto Encyclopedia of Genes and Genomes; TNBC—triple-negative breast cancer.
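Functional pathway enrichment, listed in the first row above, is commonly assessed with a one-sided hypergeometric test: given N measured genes, K of which belong to a pathway, how surprising is it to find k or more pathway genes among n selected genes? The sketch below assumes this hypergeometric model; it is not necessarily the method used in the cited studies, the function names are illustrative, and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size: int, pathway_size: int,
                       selected_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe_size: N, all genes measured (e.g., on the microarray)
    pathway_size:  K, genes annotated to the pathway
    selected_size: n, genes of interest (e.g., differentially expressed)
    overlap:       k, genes of interest that belong to the pathway
    """
    total = comb(universe_size, selected_size)
    upper = min(pathway_size, selected_size)
    # Sum the probabilities of observing k, k+1, ..., min(K, n) pathway genes.
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def pathway_overlap(selected_genes, pathway_genes) -> int:
    """Number of selected genes that are annotated to the pathway."""
    return len(set(selected_genes) & set(pathway_genes))


if __name__ == "__main__":
    # Hypothetical counts: 20,000 genes measured, 100 in the pathway,
    # 200 differentially expressed, 8 of them in the pathway.
    print(enrichment_p_value(20_000, 100, 200, 8))
```

In practice, the p-values for many pathways are then corrected for multiple testing before a pathway is called enriched.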
Figure 3. Bioinformatics in multiple areas of microbiology: proteomics, bacterial functional genomics, gene and drug discovery, siderophores, marine natural products, sequence data analysis, multi-drug-resistant tuberculosis (TB) drugs, and prophylactic agents.
Bioinformatics tools for genotyping and drug-resistant TB.
| Bioinformatics Tools | Type | Genotyping |
|---|---|---|
| PhyTB | Online | SNP |
| CASTB | Online | 4a |
| TGS-TB | Online | 4b |
| KvarQ | Stand-alone | SNP/Spol |
| TB-Profiler | Online | SNP |
| PhyResSE | Online | SNP |
Legend: PhyTB—phylogenetic tree visualization and sample positioning for Mycobacterium tuberculosis; CASTB—comprehensive analysis server for the M. tuberculosis complex; TGS-TB—total genotyping solution for M. tuberculosis; KvarQ—tool that directly scans FASTQ files of bacterial genome sequences for known variants; TB-Profiler—profiling tool for M. tuberculosis that detects drug resistance and lineage from whole-genome sequencing data; PhyResSE—web tool delineating M. tuberculosis antibiotic resistance and lineage from whole-genome sequencing data; SNP—single nucleotide polymorphism; Spol—spoligotyping (spacer oligonucleotide typing).
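KvarQ is described above as scanning FASTQ files directly for known variants. The idea can be illustrated by building, for each catalogued SNP, a short probe made of flanking sequence plus the reference or alternate base, and counting the reads that contain each probe on either strand. This is a minimal sketch under simplifying assumptions (exact matching, a single SNP, hypothetical flanks, illustrative function names and calling thresholds); KvarQ's own matching is more tolerant, and its variant catalogue is not reproduced here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def fastq_sequences(lines):
    """Yield the sequence (second) line of each four-line FASTQ record."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        if lines[i].startswith("@"):
            yield lines[i + 1]


def count_allele_support(reads, left_flank, ref_base, alt_base, right_flank):
    """Count reads containing the reference or alternate probe on either strand.

    Each probe is left flank + allele + right flank, matched exactly.
    """
    ref_probe = left_flank + ref_base + right_flank
    alt_probe = left_flank + alt_base + right_flank
    counts = {"ref": 0, "alt": 0}
    for read in reads:
        read = read.upper()
        both_strands = (read, reverse_complement(read))
        if any(ref_probe in s for s in both_strands):
            counts["ref"] += 1
        elif any(alt_probe in s for s in both_strands):
            counts["alt"] += 1
    return counts


def call_allele(counts, min_reads=2, min_fraction=0.9):
    """Return 'ref', 'alt', 'mixed', or 'no call' from probe counts (hypothetical thresholds)."""
    total = counts["ref"] + counts["alt"]
    if total < min_reads:
        return "no call"
    if counts["alt"] / total >= min_fraction:
        return "alt"
    if counts["ref"] / total >= min_fraction:
        return "ref"
    return "mixed"
```

A "mixed" result corresponds to a site where both alleles are well supported, which in TB genotyping may point to heteroresistance or a mixed infection.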
Bioinformatics tools in the microbiological paradigm of gene and drug discovery.
| Bioinformatics Tools | Role | Refs. |
|---|---|---|
| AntiBP server | Anti-bacterial peptides prediction in a protein sequence | [ |
| BACTIBASE incorporated with MODELLER | Prediction of the 3D structure of a user peptide by homology to known bacteriocins; such a relational database is applicable to the in silico design of new AMPs | [ |
| Titanium | Generates 400–600 million bases per run with read lengths of about 400 bp | [ |
| 454 sequencing technology | Used in 16S rRNA profiling studies of microbiomes | [ |
| Illumina sequencing platform | Rapid output with high accuracy and improved read lengths | [ |
| MiSeq and HiSeq | Production of 100 GB of sequencing data in just 6 days | [ |
| ANNs | Mathematical modeling algorithms that provide an effective and reliable method for in silico detection of new AMPs | [ |
| Fuzzy logic modeling | Accurate evaluation of AMPs; supports antimicrobial research and development through quantitative structure–activity relationship (QSAR) modeling | [ |
| Machine learning | Evaluation of high-dimensional gene expression datasets for gene selection | [ |
| CHARMM | Program used to simulate AMP interactions with lipid bilayers | [ |
| GROMACS | Free tool used in MD studies to generate trajectories | [ |
Legend: AMPs—antimicrobial peptides; ANNs—artificial neural networks; AntiBP—anti-bacterial peptides; BACTIBASE—database dedicated to bacteriocins; CHARMM—biomolecular simulation program; GB—gigabyte; GROMACS—Groningen machine for chemical simulations; MD—molecular dynamics; MiSeq and HiSeq—sequencing platforms developed by Illumina; QSAR—quantitative structure–activity relationship.
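The ANN, fuzzy-logic, and machine-learning rows describe models that score peptides from numerical descriptors. Those models are not reproduced here; the sketch below only shows how a peptide might be turned into a simple descriptor vector (length, approximate net charge, hydrophobic fraction, amino acid composition) that such a model could take as input. The charge rule (K/R = +1, D/E = -1, histidine and termini ignored), the hydrophobic residue set, and the function name are simplifying assumptions, not the descriptors used by the cited tools.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIVE = set("KR")          # assumption: only Lys/Arg counted as +1
NEGATIVE = set("DE")          # assumption: only Asp/Glu counted as -1
HYDROPHOBIC = set("AILMFWV")  # assumption: one simple hydrophobic residue set


def peptide_descriptors(peptide: str) -> list:
    """Descriptor vector: [length, net charge, hydrophobic fraction, 20 composition values]."""
    seq = peptide.strip().upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if not seq or unknown:
        raise ValueError(f"not a standard peptide sequence: {peptide!r}")
    length = len(seq)
    # Approximate net charge at neutral pH; His and the termini are ignored.
    net_charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in seq) / length
    # Fraction of each standard amino acid, in AMINO_ACIDS order.
    composition = [seq.count(aa) / length for aa in AMINO_ACIDS]
    return [length, net_charge, hydrophobic_fraction] + composition
```

Cationic, amphipathic character is a recurring property of AMPs, which is why charge and hydrophobicity are typical starting features; a trained ANN or fuzzy model would map vectors like this to an activity score.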
Figure 4. Use of bioinformatics-based tools for alignment, validation, prediction modeling, BLAST searches, and protein sequence databases in the multiple steps of 3D homology model generation. Legend: BLAST—basic local alignment search tool; EBI—European Bioinformatics Institute; EMBL—European Molecular Biology Laboratory; HMMER—biosequence analysis using profile hidden Markov models; MAFFT—multiple alignment using fast Fourier transform; MUSCLE—multiple sequence comparison by log-expectation; NCBI—National Center for Biotechnology Information; Phyre 2—Protein Homology/AnalogY Recognition Engine; T-COFFEE—tree-based consistency objective function for alignment evaluation.
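Figure 4 places sequence alignment at the start of homology modeling, where the identity between a target and a candidate template guides template choice. As a minimal sketch, assuming a simple match/mismatch score and a linear gap penalty (rather than the substitution matrices and affine gaps that BLAST, MUSCLE, MAFFT, or T-COFFEE actually use), global alignment by dynamic programming (Needleman–Wunsch) and percent identity can be written as follows; the function names and default scores are illustrative.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    if not aligned_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)
```

For example, aligning "ACGT" with "AGT" under these scores yields "ACGT" over "A-GT" with a score of 2 and 75% identity.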
Bioinformatics-based databases and software for discovery and evaluation of miRNAs.
| Tool | Category | Significance | Refs. |
|---|---|---|---|
| DIANA-microT-CDS | Predicted miRNA target evaluation | A web-based application that facilitates data interpretation, evaluates the functions of ncRNAs in physiological processes and diseases, and examines expression regulation datasets and miRNA regulatory elements. | [ |
| miRBase | Search for miRNAs | Catalogs newly identified miRNA genes, provides comprehensive data on immature and mature miRNAs, and gives immediate access to all published miRNA data and resources | [ |
| miRDB | Predicted miRNA target evaluation | Predicts target genes and evaluates their roles, offers comprehensive data, provides screening options for function-based prediction across different miRNAs, and facilitates sequence alignment | [ |
| MiRscan | Search for miRNAs | A web-based application that identifies miRNA genes and compares them across the genomic sequences of more than one organism | [ |
| microTar | Predicted miRNA target evaluation | A Windows application that evaluates the effect of miRNA binding on the whole mRNA molecule | [ |
| miReader | Search for miRNAs | A Linux and Windows application that identifies mature miRNA sequences without requiring a reference genome sequence | [ |
| miRmap | Predicted miRNA target evaluation | A Windows/web-based application that combines features of PITA, TargetScan, PACMIT, and miRanda, and offers user-friendly access to precomputed predictions and modeling of miRNA targets | [ |
| miRanalyzer | Search for miRNAs | A Windows/web-based application, part of an sRNA toolbox, that identifies or evaluates miRNA features from next-generation sequencing output | [ |
| miRNAPath | miRNAs and metabolic pathways | Depicts the associations between genes, miRNAs, and metabolic pathways; used to study miRNA-regulated metabolic pathways | [ |
| miRMaid | Search for miRNAs | A web-based application that comprises all the features of the miRNA database | [ |
| miRecords | Assessment of confirmed and estimated targets | Search and review of miRNA targets | [ |
| Pharmaco-miR | miRNAs, genes, and drugs | Significant in pharmacogenomics; links miRNA expression profiles to functional status using experimentally validated evidence and computational techniques | [ |
| miRWalk | Assessment of confirmed and estimated targets | Comprehensive database; validation of new miRNA targets | [ |
| ViTa | Viruses and miRNAs | Curates miRNA target sites in chicken, mouse, human, and rat, together with known viral miRNA genes from miRBase | [ |
| TMREC | miRNA regulatory network | Evaluation of regulatory processes controlled by interactions between transcription factors and miRNAs in disease states | [ |
| miR2GO | Mutations and miRNAs | Evaluates how mutations and single-nucleotide changes in the miRNA seed (central core) sequence, and in its pairing with the target mRNA, affect miRNA function; compares miRNA pairs and estimates the percentage similarity of their functions at the cellular and molecular level | [ |
| DAVID | miRNA regulatory network | Effective interpretation of changes in large numbers of genes; evaluation of gene expression and the generation of functional products | [ |
| PolymiRTS | Mutations and miRNAs | Evaluation of genetic polymorphisms in the central core (seed) binding site or the target-pairing site | [ |
| PhenomiR | Diseases and miRNAs | Relationship between miRNAs and diseases | [ |
| miREnvironment | Environment and miRNAs | Interaction between environmental factors and miRNAs | [ |
| miRCancer | Diseases and miRNAs | Relationship between miRNAs and cancer | [ |
| CircuitsDB | Transcription factors and miRNAs | Interaction between miRNAs and transcription factors in the joint regulation of target genes | [ |
| miR2Disease | Diseases and miRNAs | Lists validated and predicted target genes affected by miRNA changes in diseases | [ |
| PuTmiR | Transcription factors and miRNAs | Interaction between miRNAs and transcription factors in the regulation of gene expression | [ |
| miRGator | miRNA–miRNA interaction | Deep-sequencing miRNA database that supports large-scale data analysis, comprehensive evaluation of target genes, expression profiles of miRNA–miRNA interactions, and representation of the chromosomal location of miRNA genes | [ |
| DIANA-mirExTra | miRNA–miRNA interaction | Functional evaluation of miRNA expression profiles and targets | [ |
Legend: Bioinformatics-based databases and software for discovery and evaluation of miRNAs. Significance of tools targeting the following categories: 1. predicted miRNA target evaluation, 2. miRNA search, 3. miRNAs and metabolic pathways, 4. assessment of confirmed and estimated targets, 5. miRNAs, genes, and drugs, 6. viruses and miRNAs, 7. miRNA regulatory network, 8. mutations and miRNAs, 9. diseases and miRNAs, 10. environment and miRNAs, 11. transcription factors and miRNAs, 12. miRNA–miRNA interaction. DIANA—DNA intelligent analysis; miR—microRNA; miRDB—miRNA database; microTar—miRNA target prediction program; miRNAPath—miRNA and metabolic pathway; Pharmaco-miR—pharmacogenomics and miRNA; ViTa—visual interpretations with three-dimensional annotations; TMREC—database for transcription factor and miRNA regulatory cascades in human diseases; miR2GO—comparative functional analysis for microRNAs; DAVID—database for annotation, visualization, and integrated discovery; PolymiRTS—polymorphisms in microRNA target sites; PhenomiR—phenotypic analysis of miRNA; PuTmiR—putative transcription factor and microRNA.
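Several target-prediction tools in the table (for example, miRmap, which combines features of TargetScan and miRanda) start from complementarity between the miRNA seed region, conventionally nucleotides 2–8 from the 5′ end, and the target mRNA. A minimal sketch of that first step, assuming only perfect Watson–Crick seed pairing (no G:U wobble, site context, or conservation scoring, which real predictors add), is shown below; the function names are illustrative, and the let-7a sequence serves only as an example input.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case and convert DNA letters to RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    """Watson-Crick reverse complement of an RNA (or DNA) sequence, as RNA."""
    return "".join(PAIRS[base] for base in reversed(to_rna(seq)))


def seed_match_sites(mirna: str, utr: str, seed_start: int = 2, seed_end: int = 8) -> list:
    """0-based positions in `utr` that pair perfectly with the miRNA seed.

    seed_start and seed_end are 1-based, inclusive positions counted from the
    miRNA 5' end; the default 2-8 corresponds to a 7mer-m8-style site.
    """
    seed = to_rna(mirna)[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)
    target = to_rna(utr)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p, used only as an example

if __name__ == "__main__":
    # Hypothetical 3' UTR fragment containing one let-7a seed match.
    print(seed_match_sites(LET_7A, "AAACUACCUCAAA"))
```

The target site is the reverse complement of the seed because the mRNA pairs antiparallel with the miRNA.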
Figure 5. Role of bioinformatics in primary, secondary, and tertiary analysis in clinical genome sequencing, employing tools for the sequencing machine, alignment, variant calling, annotation, filtering, interpretation, and preparation of clinical reports.
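Figure 5 includes a filtering step between variant calling and interpretation. One simple form is hard filtering of VCF records on call quality (the QUAL column) and read depth (the DP key in the INFO column). The sketch below assumes a standard tab-delimited VCF that reports depth as INFO DP; the function names and thresholds are illustrative placeholders, not clinical recommendations, and production pipelines usually perform this step with dedicated tools such as BCFtools or GATK.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column (key=value pairs and flags separated by ';')."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag without a value
    return fields


def passes_hard_filter(record: str, min_qual: float = 30.0, min_depth: int = 10) -> bool:
    """True if a VCF data line meets illustrative QUAL and DP thresholds."""
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    cols = record.rstrip("\n").split("\t")
    qual, info = cols[5], parse_info(cols[7])
    if qual == "." or "DP" not in info:
        return False  # missing values are treated as failing (a conservative choice)
    return float(qual) >= min_qual and int(info["DP"]) >= min_depth


def filter_vcf(lines, min_qual: float = 30.0, min_depth: int = 10):
    """Yield header lines unchanged and only the data lines that pass the filter."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") or passes_hard_filter(line, min_qual, min_depth):
            yield line
```

Keeping the header lines intact means the output remains a valid VCF that downstream annotation tools can read.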
Software used for secondary analysis and variant annotation in clinical genomic sequencing.
| Name | Application | Refs. |
|---|---|---|
| Secondary analysis | | |
| GATK-UnifiedGenotyper | Being progressively superseded by HaplotypeCaller | [ |
| BWA-SW | Alignment of longer reads (about 70 bp to 1 Mbp) | [ |
| FreeBayes | Bayesian haplotype-based variant caller, efficient in evaluating genomes with specific properties | [ |
| Novoalign | Commercial alignment software | [ |
| BCFtools and VCFtools | Processing of variant call files; distinct analytical features, with some common functions | [ |
| BWA-MEM | Alignment of longer reads (>100 bp) | [ |
| FastQC | Evaluation of sequencing quality | [ |
| Picard Tools | QC evaluation for multiple secondary analysis stages | [ |
| GenomeStrip | Evaluation of read length, read depth, and read mate pairing | [ |
| VerifyBamID | Detection of sample contamination | [ |
| BreakDancer | SV detection | [ |
| Vcfeval | Comparison of two variant call files; useful for validation | [ |
| Pindel | Detection of large deletions and medium-sized insertions | [ |
| Bedtools | Manipulation of BED files | [ |
| Manta | Germline and somatic structural variant calling | [ |
| VisCap | CNV calling for gene panel data | [ |
| XHMM | CNV calling for exome data | [ |
| Variant annotation | | |
| VEP | Annotation and prediction of impact of variant on genes | [ |
| WGSA | Integrates results from SnpEff, VEP, and ANNOVAR for gene-model-based annotation, together with numerous epigenomics projects, conservation scores, disease-associated databases, multiple prediction scores, and allele frequencies from SNV-centric resources | [ |
| DANN | Integration of multiple annotations into one metric, annotation of non-coding and coding variants | [ |
| CADD | Integration of multiple annotations into one metric, annotation of non-coding and coding variants | [ |
| ANNOVAR | Gene-based, region-based, and filter-based annotation | [ |
| Oncotator | Aggregation of genomic, protein, cancer-variant, and non-cancer-variant annotations | [ |
| SnpEff | Annotation and prediction of the impact of variants on genes | [ |
Legend: Software used in clinical genomic sequencing for secondary analysis (alignment or mapping of the sequence reads to the reference genome sequence) and variant annotation (assignment of information to DNA variants). GATK—Genome Analysis Toolkit; BWA-SW—Burrows–Wheeler Aligner with Smith–Waterman alignment; BCF—binary variant call format; VCF—variant call format; FastQC—fast quality control software; QC—quality control; SV—structural variant; CNV—copy number variant; XHMM—eXome-Hidden Markov Model; VEP—variant effect predictor; WGSA—whole-genome sequencing annotator; SNV—single nucleotide variant; DANN—deleterious annotation of genetic variants using neural networks; CADD—combined annotation-dependent depletion; ANNOVAR—annotate variation; SnpEff—single nucleotide polymorphism annotator.
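Vcfeval is listed for comparing two variant call files during validation. The core bookkeeping of such a comparison (true positives, false positives, and false negatives against a truth set, and the precision and recall derived from them) can be sketched with exact matching on chromosome, position, REF, and ALT. This is deliberately naive and the function names are illustrative: vcfeval itself performs haplotype-aware matching, so equivalent variants written in different representations (for example, shifted indels) that it would reconcile are counted as mismatches here.

```python
def variant_keys(vcf_lines) -> set:
    """Set of (chrom, pos, ref, alt) keys, one per ALT allele, from VCF data lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def compare_calls(truth_lines, query_lines) -> dict:
    """Naive exact-match comparison of a query call set against a truth set."""
    truth = variant_keys(truth_lines)
    query = variant_keys(query_lines)
    tp = len(truth & query)
    fp = len(query - truth)
    fn = len(truth - query)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Precision reflects how many reported calls are supported by the truth set, and recall how many truth variants were recovered; clinical validation typically reports both.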
Bioinformatics-based tools and databases with the potential to combat the COVID-19 pandemic.
| Bioinformatics-Based Tools/Databases | Role | Refs. |
|---|---|---|
| SRA database | High-throughput sequencing data repository | [ |
| FastQC | Quality control check on raw sequences in WGS | [ |
| AUGUSTUS | Gene prediction in eukaryotic genome sequencing | [ |
| MaSuRCA | Genome assembly | [ |
| Prokka | Prokaryotic genome annotation in WGS | [ |
| Cutadapt | Identification and removal of adapter sequences, poly-A tails, primers, and other unwanted sequences in WGS and metagenomics | [ |
| Ragout | Reference-assisted assembly tool in WGS | [ |
| Gene Expression Omnibus (GEO) database | Repository of functional genomics data | [ |
| dbSNP | Repository of single-nucleotide substitutions, used in SNP discovery | [ |
| UCSC Genome Browser | Collection and analysis of model organism annotations in genomics | [ |
| PROVEAN | Prediction of the effect of amino acid substitutions on the biological function of a protein, used in SNP discovery | [ |
| Kyoto Encyclopedia of Genes and Genomes (KEGG) | Metabolic pathway analysis | [ |
| SIFT | Prediction of the effect of amino acid substitutions on protein function | [ |
| Conserved Domain (CD) Search | Identification of conserved protein domains by sequence alignment | [ |
| NCBI Gene database | Genetic data repository | [ |
| PAUP | Evaluation of phylogenetic relationships between molecular sequences using the parsimony method | [ |
| UniProt | Stores functional data on proteins | [ |
| PopART | Phylogenetic evaluation with visualization of haplotype networks | [ |
| Molecular Evolutionary Genetics Analysis (MEGA) | Multiple sequence alignment; construction and statistical evaluation of phylogenetic relationships | [ |
| Primer3 | Primer design in high-throughput genomics | [ |
| PubChem | Chemical structure database for drug design | [ |
| Basic Local Alignment Search Tool (BLAST) | Finding similarity between sequences | [ |
| AutoDock, PatchDock, SwissDock, ZDOCK | Molecular docking tools | [ |
| Protein Data Bank (PDB) | 3D protein structure database | [ |
| DrugBank | Contains data on FDA-approved drugs | [ |
| MODELLER | Homology modeling of 3D protein structures | [ |
| PyMOL | Editing and visualization of molecular structures | [ |
| GROMACS | Tool for simulation of molecular dynamics | [ |
| NAMD | Parallel molecular dynamics code | [ |
| Open Babel | Chemical toolbox aiding in drug designing | [ |
| VMD | Built-in scripting and 3D graphics-based visualization program | [ |
Legend: Bioinformatics-based tools and databases with the potential to combat the COVID-19 pandemic: techniques and databases associated with high-throughput sequencing, whole-genome sequencing, data repositories, drug design, molecular docking, phylogenetic evaluation, metabolic pathway analysis, sequence alignment, 3D homology modeling, molecular dynamics simulation, etc. SRA—sequence read archive; FastQC—fast quality control; WGS—whole-genome sequencing; Ragout—reference-assisted genome ordering utility; Prokka—prokaryotic genome annotation; Cutadapt—cutting adaptor sequences; GEO—Gene Expression Omnibus; dbSNP—single nucleotide polymorphism database; UCSC Genome Browser—University of California Santa Cruz Genome Browser; PROVEAN—protein variation effect analyzer; KEGG—Kyoto Encyclopedia of Genes and Genomes; SIFT—sorting intolerant from tolerant; CD—conserved domain; NCBI—National Center for Biotechnology Information; PAUP—phylogenetic analysis using parsimony; UniProt—Universal Protein Resource; PopART—population analysis with reticulate trees; MEGA—Molecular Evolutionary Genetics Analysis; BLAST—basic local alignment search tool; PDB—Protein Data Bank; PyMOL—Python-based molecular graphics system; GROMACS—Groningen machine for chemical simulations; NAMD—nanoscale molecular dynamics; VMD—visual molecular dynamics.
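Cutadapt is listed for removing adapter sequences, poly-A tails, and other unwanted sequence before assembly or analysis. Its core task for 3′ adapters is to cut an adapter found inside a read and also a partial adapter hanging off the read end. The sketch below shows that logic with exact matching only; Cutadapt itself tolerates sequencing errors through error-tolerant alignment, and the function names, adapter, minimum overlap, and poly-A length used here are illustrative assumptions.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # full adapter found: drop it and everything after it
    lowest = max(min_overlap, 1)
    # Partial adapter at the read end: try the longest adapter prefix first.
    for k in range(min(len(adapter) - 1, len(read)), lowest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_poly_a(read: str, min_length: int = 5) -> str:
    """Strip a trailing run of A's if it is at least min_length bases long."""
    stripped = read.rstrip("A")
    return stripped if len(read) - len(stripped) >= min_length else read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", adapter))
    print(trim_3prime_adapter("ACGTACGTAGATC", adapter))
```

The minimum overlap is a trade-off: very short partial matches (one or two bases) occur by chance in genuine sequence, so requiring a few bases avoids trimming real data.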