Literature DB >> 27848897

Bioinformatics and Drug Discovery.

Abstract

Bioinformatic analysis can not only accelerate drug target identification and drug candidate screening and refinement, but also facilitate characterization of side effects and predict drug resistance. High-throughput data such as genomic, epigenetic, genome architecture, cistromic, transcriptomic, proteomic, and ribosome profiling data have all made significant contribution to mechanismbased drug discovery and drug repurposing. Accumulation of protein and RNA structures, as well as development of homology modeling and protein structure simulation, coupled with large structure databases of small molecules and metabolites, paved the way for more realistic protein-ligand docking experiments and more informative virtual screening. I present the conceptual framework that drives the collection of these high-throughput data, summarize the utility and potential of mining these data in drug discovery, outline a few inherent limitations in data and software mining these data, point out news ways to refine analysis of these diverse types of data, and highlight commonly used software and databases relevant to drug discovery. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

Entities: CellLine Chemical Disease Gene Mutation Species

Keywords: Drug candidate; Drug screening; Drug target; Epigenetics; Genomics; Proteomics; Structure; Transcriptomics

Mesh：

Year: 2017 PMID： 27848897 PMCID： PMC5421137 DOI： 10.2174/1568026617666161116143440

Source DB: PubMed Journal: Curr Top Med Chem ISSN： 1568-0266 Impact factor: 3.295

INTRODUCTION

Drug discovery starts with diagnosis of a disease with well characterized symptoms that reduce the quality of life. Conventionally, a desirable drug is a chemical (which could be a simple chemical or a complicated protein) or a combination of chemicals that reduces the symptoms without causing severe side effects in the patient. Other properties of a desirable drug include affordability and profit for drug companies [1, 2], low chance of drug resistance [3] leading to dramatic decrease in the commercial value of the drug, low deleterious effect on the environment, e.g., no re-activation by bacterial species after human use [4]. Thus, a desirable drug is one that not only is efficacious with little side effects, but also has minimal long-term negative effect on the patient, the society and the environment. This review will focus on how bioinformatics can facilitate the discovery of such desirable drugs. Bioinformatics is an interdisciplinary science spanning genomics, transcriptomics, proteomics, population genetics and molecular phylogenetics. Bioinformaticians in drug discovery use high-throughput molecular data (Fig. ) in comparisons between symptom-carriers (patients, animal disease models, cancer cell lines, etc.) and normal controls. The key objectives of such comparisons are to 1) connect disease symptoms to genetic mutations, epigenetic modifications, and other environmental factors modulating gene expression, 2) identify drug targets that can either restore cellular function or eliminate malfunctioning cells, e.g., cancer cells, 3) predict or refine drug candidates that can act upon the drug target to achieve the designed therapeutic result and minimize side effects, and 4) assess the impact on environmental health and the potential of drug resistance.

GENOMIC SEQUENCE AND EXOME DATA IN DRUG DISCOVERY

One of the early contributions from bioinformatics to drug target discovery is the identification of sequence homology between simian sarcoma virus onc gene, v-sis, and a platelet-derived growth factor (PDGF) by simple string matching [5, 6]. This finding not only resulted in PDGF being used as a cancer drug target [7-9], but also led to two new lines of thinking. First, the viral transforming factor may work simply by changing transient expression of a growth factor to constitutive expression, suggesting growth factors as targets for anti-cancer drug development. Second, any factors modulating gene expression patterns can potentially contribute to cancer. This new conceptual framework of cancer biology contributed to the progress of mechanism-based anti-cancer drug development in the following years [10-12].

Genetic Diseases

Genomic and whole exome sequencing of patients with inherited disorders have recovered many somatic mutations which are associated with genetic diseases [13-15] and could be potential drug targets. The main difficulty concerning bioinformatic research on somatic mutations lies in the identification of disease-causing mutations among many observed genetic differences between matched patient and normal control [16]. Some diseases such as cancer exhibit high genetic heterogeneity [17], even among cells within a single tumor [18]. Many of these somatic mutations could be the consequence rather than the cause of cellular malfunction [16]. Effort must be made to distinguish three types of somatic mutations: 1) those that cause the disease and may serve as drug targets, 2) those that are closely linked to the disease gene and consequently are associated with the disease, and 3) those not associated with the disease but happen to occur in the patient group and not in the control group. The second type of mutations can be used for disease diagnosis, but not as drug target. The third type can be excluded in two ways. The first is by increasing sample size. If thousands of breast cancers all share the same somatic mutation, then the relevance of the mutation to breast cancer is high relative to a somatic mutation occurring in only one breast cancer [19]. The second is by collecting longitudinal data, recognizing that many diseases may have a genetic determinant long before the manifestation of the disease [20]. Suppose mutation X predisposes a person to Alzheimer’s disease (AD). If we compare one groups of AD patients with a non-AD control group, and if the control group has people who already have mutation X but have not developed AD yet, we may fail to recognize the importance of mutation X simply because it is not unique in the AD group. Only if we follow patients or relevant animal models over time can we come to the conclusion that whoever has mutation X eventually develop AD. It is much more difficult to distinguish between the first and second type of genetic differences between patient and control without an understanding of disease mechanism. A loss-of-function mutation can happen in the coding sequence (CDS), in the regulatory motif (e.g., response elements for ligand-activated nuclear receptors) or in an enhancer that could be up to 1 million bases away from the CDS. Bioinformaticians will typically take three approaches to check if the mutation has major impact on gene function: 1) whether the mutation replaces an amino acid by a very different one (e.g., non-polar uncharged glycine by a positively charged arginine) at a typically conserved site, 2) whether the mutation occurs in a highly conserved non-coding sequence (which is typically done by comparing genomes between human and non-human primates.), and 3) whether the mutation occurs in a known signal (e.g., regulatory motif, splice sites, transcription initiation and termination sites) for cellular machinery (e.g., ribosome, spliceosome, degradosome). The last approach is facilitated by the availability of extensively compiled and curated databases of known regulatory motifs [21-23]. Bioinformatic tools are often used to scan genomes for regulatory motifs. Such tools include position weight matrix (PWM) to find the genomic location of a known motif, Gibbs sampler for de novo motif discovery [24, 25] and support vector machines [26, 27] that can be used to extract differences between two groups of sequences (e.g., motif-present and motif-absent) and to use the resulting information to detect/scan motifs in genomes. The regulatory motifs could be response elements of nuclear receptors whose identification often leads to refinement of drug targets [28]. Such studies are facilitated by software such as DAMBE [29] which, when given an annotated genomic sequence, can extract coding sequences, rRNAs, tRNAs, introns, exons, 5’ and 3’ splice sites, upstream or downstream sequences of gene features, etc., with just a few mouse clicks. In addition to functions for PWM, Gibbs sampler, and minimum folding energy estimation, DAMBE can also compute protein isoelectric point and indices of protein translation efficiency. If a deleterious mutation is identified to be a loss-of-function mutation, then bioinformatics can help identify a paralogous gene or an alternative cellular pathway that can compensate for the mutation effect. Functional redundancy or partial redundancy is common in mammals, e.g., the function of paralogous genes USP4 and USP15 in mice are partially redundant [30]. Human adrenoleukodystrophy (ALD) is caused by partial deletion of the 10-exon gene ABCD1 resulting in the accumulation of very long chain fatty acids [31], which suggests not only diet limitation of very long chain fatty acids (VLCFA) in disease management, but also activation of alternative metabolic pathways for VLCFA through regulating another gene involved in fatty acid metabolism (ABCD2) and suppression of the activity of elongase involved in generating VLCFA [32]. Another example of activating alternative biological pathways or genes with partial functional redundancy involves sickle-cell anemia [33] caused by a single amino acid replacement in human beta-globin gene [34, 35]. Fetal hemoglobin gene (HbF) is a promising drug target because HbF reduces hemoglobin polymerization and clumping. A drug that could revive the silenced HbF would alleviate the symptoms of sickle-cell anemia and thalassemia in adults [36, 37]. Interestingly, some β-thalassemia patients have the correct version of the β-globin gene but the gene is not expressed because of mutations that occurred far away from it [38, 39]. Such long-range gene regulation will be addressed later on epigenetic modification and genome architecture.

Human Diseases Caused by Pathogens

Well annotated genomes are essential for target-based drug discovery against pathogens. The general bioinformatic approach involves three essential steps. The first is to identify essential genes in the pathogen as drug targets. A genome, especially a well-annotated one, can facilitate identification of such essential genes. For example, genes involved in nucleotide synthesis are well known, but are often missing in pathogenic species because they use salvage pathway instead of de novo synthesis pathway to procure nucleotides. In, Trypanosoma brucei, genes for de novo synthesis of ATP, GTP and TTP have gone missing, but the pathogen retains limited capacity for de novo synthesis of CTP [40], presumably because CTP generally has much lower centration than the other three nucleotides in the cell and cannot be reliably obtained through salvage. This points to CTP synthesis pathway as a drug target. Indeed, inhibiting CTP synthesis arrests the growth and replication of the pathogen [40]. Essential genes are often highly conserved and can be revealed by genomic comparisons between pathogens and their phylogenetic relatives. Sometimes they may also inferred from experimental data from model organisms such as Escherichia coli, Bacillus subtilis or Saccharomyces cerevisiae whose genes have been systematically and individually knocked out. Genes essential for the two bacterial species are likely to be essential in another bacterial species. The second step in developing drugs against pathogen is to check if such essential genes have homologues in the host. If they do, then inhibiting such essential genes in the pathogen may have adverse effect on the function of the host homologue, and we consequently need to perform sequence and structural comparisons between the pathogen and host homologues to identify unique part in the pathogen homologue to assist in the design of pathogen-specific drugs. Third, to minimize the chance of pathogen developing drug resistance, it is important for the drug to target at specific pathogen and not its phylogenetic relatives that are not pathogenic. For this reason, pathogenicity islands that are unique in pathogenic bacteria but not in their non-pathogenic relatives have increasingly become the preferred source of drug targets [41-43]. Bioinformatic analysis revealed a glutamate transport system that is present in the pathogen Clostridium perfringens but absent in mammals and birds [44]. Drugs developed against such a transport system will protect not only humans, but also domesticated mammals and fowls. In the human parasite Giardia intestinalis, the phosphoinositide-3 kinase (PI3K) signaling pathways are essential and could serve as a drug target. However, the PI3K pathway is also essential in many eukaryotes so it is important to identify what is unique in the PI3K homologues (Gipi3k1 and Gipi3k2) in G. intestinalis relative to mammals. Sequence comparisons revealed a unique insertion only in the parasite that can serve as a pathogen-specific drug target [45]. The same approach is used in targeting Pseudomonas aeruginosa [46]. Similarly, in developing anti-HIV-1 drugs, one can target genes involved in reverse transcription and protease digestion of its translated polyprotein because these processes not only are essential for viral survival and transmission, but also have no close homologues in human so their inhibition should have minimal side effect on human. Genomic analysis can also help in repurposing existing drugs against other pathogens. Galactofuranose (Galf) is an important constituent on the cell surface of a variety of bacterial pathogens [47, 48], and its synthesis requires UDP-galactopyranose mutase (UGM). Because Galf is absent in human [44], UGM has been used as a desirable drug target [49]. UGM coded by gene GLF was later found in several eukaryotic unicellular pathogens [50] as well as in nematodes [51]. Can we repurposing drugs developed against bacterial pathogens to fight eukaryotic unicellular pathogens [50]? Drug repurposing is cost-effective in drug development [52]. Genomic analysis shows that eukaryotic UGMs, while similar to each other, is quite different from prokaryotic UGMs, suggesting difficulty in drug repurposing from bacterial pathogen to eukaryotic pathogens. However, if one develops an effective drug against one eukaryotic UGM, the drug would have a very good chance of being repurposed for another eukaryotic pathogen. Genomics has also contributed to understanding drug actions. The venom protein PcFK1 of spider Psalmopoeus cambridgei was able to inhibit the growth of Plasmodium falciparum, but the mechanism was unknown. A sequence analysis revealed sequence homology between PcFK1 and the protein substrate of P. falciparum enzyme PfSUB1, leading to the hypothesis that PcFK1 is an antagonist of PfSUB1. Subsequent docking prediction and in vitro experiments confirm this hypothesis, pointing to PfSUB1 as a drug target [53]. Essential cellular processes are often functionally redundant, and understanding such functional redundancy is crucial in developing effective drugs against pathogens. In Mycobacterium tuberculosis, arabinofuranosyltransferases Mt-EmbA and Mt-EmbB contribute to the synthesis of cell wall mycolyl-arabinogalactan-peptidoglycan complex and are targeted by the drug ethambutol. Bioinformatic analyses revealed another arabinofuranosyltransferase, Mt-AftA, which is not inhibited by ethambutol and consequently would serve as a drug target [54]. A combination of drugs against all three arabinofuranosyltransferases will not only be more effective against the pathogen, but also reduce the chance of the pathogen developing drug resistance. Activating alternative biological pathways to satisfy the need of growth and survival has been known in bacterial species since the discovery of the lac operon and the glucose/lactose genetic switch [55], and a drug cannot be effective against a pathogen or a cancer cell unless we know how cells do things with alternative pathways that can be activated in response to the drug. Bioinformatics, with its inherent evolutionary perspective and its integration of molecular phylogenetics [56, 57], can often contribute to resolving controversies on molecular mechanisms. One such example involves the causal interpretation of CpG methylation causing CpG deficiency through subsequent C→T mutation mediated by spontaneous deamination. A controversy arose when both Mycoplasma genitalium and M. pneumoniae genomes were found to lack DNA CpG methyltransferase, yet M. genitalium genome exhibits much stronger CpG deficiency than M. pneumoniae genome, suggesting a conclusion that the difference in CpG deficiency between the two species is irrelevant to CpG methylation [58, 59]. Such a conclusion from genomic studies without an evolutionary perspective is often wrong. A comprehensive phylogenetic study using software DAMBE [29] showed that the ancestors of the two species should have multiple CpG methyltransferases because M. pulmonis and other relatives that branch off earlier than M. genitalium and M. pneumoniae have multiple CpG methyltransferases. After the loss of the CpG methyltransferases in the ancestor of M. genitalium and M. pneumoniae, both species began to gain CpG frequency, but M. pneumoniae evolved much faster (with a much longer branch) and regained CpG much faster than M. genitalium [60]. These findings restored the validity of causal relationship between CpG-specific DNA methylation and CpG deficiency, and illustrate the importance of having an evolutionary perspective in understanding biological processes. Because many such studies involve highly diverged bacterial or viral species, and because it is often difficult to obtain reliable multiple sequence alignment with highly divergent sequences, a new phylogenetic method based on pairwise sequence alignment has recently been developed [61] to facilitate phylogenomic studies involving highly diverged species.

EPIGENETICS, GENOME ARCHITECTURE AND CISTROMES IN DRUG DISCOVERY

Monozygotic twins carrying the same deleterious mutations such as the aforementioned ALD mutation often differ much in phenotype [62-65]. Such observations serve to highlight the relationship between epigenetic modifications and human diseases [66, 67]. Epigenetic modification includes two interrelated events, DNA methylation and histone modification. The maintenance of DNA methylation pattern in mammals is accomplished by the mammalian DNA methyltransferase 1 (DNMT1) whose CatD domain recognizes hemi-methylated CpG sites [68] so that DNA methylation pattern can be maintained from parental to daughter cells. In mammals, the methylated CpG recruits proteins with a methyl-CpG binding domain such as MBD1, MBD2, MBD3 and MeCP2 which then recruit histone deacetylase to remove the acetyl group and restore the positive charge of lysine residues (or histone N-terminal) in histone so that the negatively charged backbone of DNA can wrap tightly around the positively charged histone to silence the gene [69]. A silenced gene is in many ways equivalent to a loss-of-function mutation. Because some cancers appear to be caused by permanent silencing of genes involved in apoptosis pathway through DNA methylation and histone deacetylation [70-71], histone deacetylase has been used as a drug target with its inhibitors aiming to reactivate the apoptosis pathway [72]. The main problem in this approach is specificity because deacetylase inhibitors often have profound effect on the regulation of many other genes, which may explain why such drugs often do not enter clinical trials [73]. Methods for precise editing of the epigenome, involving components for DNA-binding and specific sequence recognition and modification are currently being developed [74]. The conventional view that DNA methylation and histone deacetylation mainly serve the purpose of permanent gene silencing has now been replaced by a more general conceptual framework of epigenetic modification and gene regulation (Fig. ). This conceptual shift demands integrated analysis of several types of high-throughput data: methylation pattern from bisulfite sequencing [75-76], DNA/protein binding data (cistrome) from ChIP-on-chip and ChIP-Seq [77], and genome architecture data from Hi-C [78] or its derivatives. DNA methylation alters DNA/protein binding which in turn alters genome architecture, i.e., two DNA segments far apart along the linear DNA can be brought together. Genome architecture data pave the way for studying spatial interaction between enhancers and promoters that can be up to one million bases apart. That gene expression depends on gene location on the genome is known since 1930 through studies of translocation [79], but empirical evidence accumulated much later to demonstrate that protein/DNA interactions resulted in nucleosome reconfiguration and interaction between enhancer and promoter [80-84]. This had spawned the formulation of the enhancer hub model of gene regulation [85, 86]. That is, the hub contains one or more enhancers and a gene with its promotor looping close to the hub will be expressed; deletion of such a hub will silence the expression of all genes that depends on their physical proximity to the hub to be expressed. From a bioinformatics point of view, the key question concerns what is the methylation signal on DNA and whether it is possible to modulate such a signal to alter epigenetic modifications. I have mentioned before that some β-thalassemia patients have the correct version of the β-globin gene but the gene is not expressed because of mutations that occurred far away from it [38, 39]. One may formulate two hypotheses. First, the enhancer that controls the expression of β-globin gene is mutated or deleted in the patient [38, 87]. Second, the enhancer that is brought close to the promotor of β-globin gene in normal genome architecture is relocated somewhere else due to abnormal epigenetic modifications and protein/DNA binding. Testing these hypotheses, which has become possible only with the availability of high-throughput data of genome architecture, methylation patterns and cistromes (the set of all protein/DNA binding sites), would shed light on how we can reposition the enhancer and the β-globin promotor so that the gene will be expressed [88-90]. Similarly, if the β-globin gene is silenced through DNA methylation, then the knowledge of how to modulate the signal to modify the methylation pattern would bring us closer to reactivating the silenced β-globin gene. Along the same line of reasoning, if the fetal globin genes are silenced by methylation, and if reactivation of these fetal globin genes can alleviate the problem caused by mutations in adult globin genes, then the knowledge of site-specific demethylation would be highly desirable [74]. Given that some CpG are methylated and some are not in mammalian genomes, one straightforward bioinformatic analysis would be to compare the flanking sites of these two groups of CpG dinucleotides to detect if flanking nucleotides contributes to methylation signals. Equivalent analyses of splice sites have revealed strong splice signal in flanking sequences of the 5’ and 3’ splice sites [91, 92], but such comparisons of flanking regions between methylated and unmethylated CpG, although done in a limited scale [93-95], have not yielded clear-cut results. Equally disappointing is that, while the concept of imprinting center (IC) has been known for many years [96], the physical basis of IC, either at the sequence level or structural level, remains elusive. Because monozygotic twins carrying the same genetic defect often differ much in manifestation of the associated disease [62-65], one naturally wishes to identify environmental contributions such as diet to epigenetic modification [97, 98]. As methylation needs S-adenosyl L-methionine (SAM) as the methyl donor, a deficiency in methionine most likely will, and indeed has been confirmed to, affect DNA methylation [99, 100]. Similarly, one would predict that any major perturbation on methionine, such as the deletion of methylthioadenosine phosphorylase (MTAP) crucial in the methionine salvage pathway, would also affect DNA methylation, gene regulation and cancer. Indeed, MTAP deletion is common in cancer cells [101]. Thus, all genes that affect methionine metabolism could be drug targets, and bioinformatics, with databases such as KEGG [102-104] can identify such genes effectively. If wrong DNA methylation pattern has formed, then an ideal drug (or an epigenome-repairing nano-machine) should be able to specifically identify the wrong pattern and correct it [74]. To develop such a drug or nano-machine, we first have to know the correct methylation pattern or ideally discover a set of molecules that encode such a correct pattern. Experimental results have accumulated in support of RNA’s role in epigenetic modification [105]. Given that DNA in the zygote undergoes demethylation to regain pluripotency [106], the epigenomic code is perhaps not on DNA. As proteins do not seem to be good in writing code in and because most core histones are replaced by protamine in male germ cells [107], the epigenetic codes, especially the ones that specify de novo DNA methylation, is unlikely to be found in proteins. However, such codes may exist in a set of highly conserved and structurally stable RNA molecules that might be present as early as the oocyte and sperm stage. Long noncoding RNAs (lncRNAs) can participate in epigenetic modification and regulate chromatin state. Characterization of lncRNAs bound to DNA and protein by the ChiRP-seq method [108, 109] revealed numerous sequence-specific binding sites on DNA, and the binding of lncRNA such as HOTAIR [108, 110] and Kcnq1ot1 [111] to such sites facilitates the recruitment of Polycom Repressive Complex 2 (PRC2) for mediating histone H3 lysine-27 trimethylation. Short RNAs can also modulate epigenetic changes. Mature sperm contain a number of small RNA species [97, 98, 112, 113], and these small RNAs do affect offspring phenotype [113, 114]. Furthermore, these small RNAs on offspring appear to contribute to epigenetic modification [97, 98, 113, 114]. The ENCODE pilot project shows that “the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts” [115]. Those non-coding transcripts may be a treasure trove for bioinformaticians to discover epigenome-modifying RNAs as drug targets. Epigenetic modification has an early origin. Many bacterial species modify their own DNA by methylation to protect against endogenic type II restriction endonucleases. Some Bacteriophage have their own methyltransferase that can modify their own genome against host restriction digestion [116], and human viral pathogens such as HIV-1 can induce profound alteration in host epigenetic pattern [117]. It is now known that some of the host defense mechanisms against pathogens are implemented through epigenetic modifications [118, 119] and many pathogens can modify host epigenetic patterns in favour of their survival and reproduction in the host [118]. What is the eventual fate of such pathogen-mediated epigenetically modified host cells remains unclear. Do they defeat the pathogen invasion, restore the normal epigenetic pattern and reassume normal function again or do they initiate certain apoptosis pathway and perish? What epigeneticists need is a model organism or a cell line in which the epigenetic pattern can be perturbed by extrinsic factors and then restored back to normal.

TRANSCRIPTOMICS AND DRUG DISCOVERY

Transcriptomic data have been increasingly used to identify differentially regulated genes, alternatively spliced isoforms and different transcription start and termination sites between patient and matched control [120-125]. Transcriptomic data analysis contributes to drug discovery mainly in two ways, one in phenotypic screening to identify and refine drug candidates, and the other in drug target identification.

Phenotypic Screening

There has been debates on what constitutes phenotypic screening, but recently proposed definitions [12, 126] converge in five points: 1) the screening involves a large number of compounds (drug candidates) ideally chosen systematically, 2) phenotypic changes in response to each compound is monitored, 3) a criterion of desirability is formulated and used in ranking the compounds, 4) those compounds generating desirable biological effects (phenotypes) are kept as drug candidates for further testing and validation, and 5) the mechanism of action is unknown and not the focus of the screening. Phenotypic screening can be quite effective in identifying active ingredients in traditional medicine, with one of the success stories being the discovery of artemisinin which is the most effective drug against the malaria parasite Plasmodium falciparum [127]. While the target-based approach is effective in developing drugs against diseases with relatively simple mechanisms such as single-gene genetic diseases, phenotypic screening is more effective in drug development against diseases with multiple causes such as multi-gene genetic diseases [128-129]. Cancer is composed of heterogeneous genetic background [17], with extremely high genetic diversity among cells within a single tumor [18]. For such complex diseases, phenotypic screening designed specifically for cancer has been used widely in cancer drug development [11]. The identification of an efficacious chemical by screening often shed lights on the molecular mechanism of action [130]. Phenotypic screening of FDA-approved drugs for drug repurposing is cost-effective because these drugs have already gone through the difficult path of regulatory authorities. This approach has resulted in promising inhibitors against Enteroviruses [131], anti-aging therapeutics [132], anti-cancer drugs [133], and allosteric Bcr-Abl inhibitors in the fight against chronic myeloid leukemia [134]. How does bioinformatics contribute to phenotypic screening? The answer lies in the fact that many modern phenotypic screening studies, especially in screening for anti-cancer drugs, typically define phenotype, either implicitly or explicitly, as a gene expression (transcripts or protein) profile [11] or a metabolomic profile [135-137]. From this perspective, there are two alternative approaches to treat cancer cells. The first is to restore the gene expression of cancer cells to that of normal cells. The second, when the first is not achievable, is to kill cancer cells by inducing apoptosis [11-12]. These two approaches imply two criteria in phenotypic screening for anti-cancer drugs: 1) increased similarity in gene expression between cancer cells and normal cells, and 2) increased similarity in gene expression between cancer cells and apoptotic cells. Bioinformatics can contribute to gene expression and drug discovery by formulating an objective and rational index of drug desirability (Idd) in phenotypic screening studies with gene expression profiles as phenotypes. Such an Idd would complement therapeutic indices [138, 139] based on various pharmacokinetic models for evaluating drug effects and safety under various drug concentrations [140-142]. The lack of an explicit Idd may have contributed to the low rate of successful drugs discovered through phenotypic screen [126]. For this reason, I will take a rare step in a review article to initiate the effort of developing an index of drug desirability integrating both symptom reduction and side effect. Designate gene expression profile of a “patient” (which could be an animal disease model or cancer cell line) as Gp, that of a normal control as Gn, and that of a patient after the use of a candidate drug as Gd. It is now easy to compute a variety of pairwise distances [143] between Gn and Gp, between Gd and Gp and between Gn and Gd (designated Dnp, Ddp, and Dnd, respectively, Fig. ). Dnp is a measure of severity of the symptoms, and (Dnp – Dnd) a measure of symptom reduction by the application of the candidate drug, equivalent to drug efficacy (Emax) in pharmacodynamics models [141-142]. Side effect could be measured by the difference between (Dnd + Ddp) and Dnp, i.e., (Dnd + Ddp - Dnp), which implies that the side effect is greater for Drug B in Fig. () than for Drug A in Fig. (). With these definitions, we can formulate an index of drug desirability (Idd) as: The application of Eq. (1) is illustrated in Fig. () where Drug A in Fig. (), with Idd = 2.71, is more desirable than Drug B, with Idd = 1.03, in Fig. (). One potential problem with Eq. (1) is that the denominator would be zero when Gd = Gn or Gd = Gp, although this is not expected to happen in practice. However, one may add a small pseudo number (c) to the equation so that; The only requirement for c is that it should be small relative to (Dnp – Dnd) so that its effect on Idd is small. One may set c = 0.01*(Dnp – Dnd). The application of Idd is not limited to gene expression or metabolomics profiles, but can be applied to any laboratory data in which the patient before the drug use, the normal control and the patient after the drug use can be represented by a vector of values such as blood ferritin and transferrin concentrations, calcium and iron levels, etc. It can be used not only to evaluate desirability of different drugs, but also to evaluate drugs applied at different concentrations or administered through different routes (e.g. oral, subcutaneous injection, etc.). Idd for the second criterion, i.e., how much can a drug induce apoptosis in cancer cells, can be obtained by simply replacing Gn by gene expression of apoptotic cells. Effective application of the two criteria depends on accurate characterization of gene expression. Development of bioinformatic methods and software has followed the development of high-throughput technologies, such as microarray in the past [143, 144] and next-generation sequencing now [145-153]. Unfortunately, the fundamental problem encountered in allocating sequence reads to paralogous genes, which has previously plagued microarray data analysis, remains unsolved, with nearly all software offering two simple but unsatisfactory options, i.e., either ignoring sequence reads matching multiple genes or allocating such sequence reads equally among paralogous genes. Because a large number of genes are duplicated in multicellular eukaryotes, the lack of proper allocation of sequence reads to paralogous genes implies that the expression of a large number of genes cannot be properly characterized. The method implemented in the software Tuxedo [154], which I outline in the section on ribosomal profiling, may serve as a good starting point.

Drug Target Identification

Transcriptomic data obtained from RNA-Seq can be used to identify alternative splicing isoforms and differential gene expression and regulation between patient and control. Alteration of spatial and temporal distributions of different splicing isoforms often leads to diseases [155] such as Alzheimer’s disease (AD) associated with abnormal splicing of the amyloid precursor protein (APP). Proteolytic processing of APP generates Amyloid β which contributes to the formation of the extracellular neuritic plaques commonly believed to be the causal factor of AD [156]. APP is a multi-exon gene with exon 7 (E7) encoding a Kunitz protease inhibitor. At least eight isoforms are formed by alternative splicing of APP pre-mRNA, with three isoforms expressed in mammalian brain (one lacking E7 and two others containing E7). The E7-lacking isoform (APP695) is normally prevalent in neurons while the E7-containing isoforms (APP770 and APP751) are expressed mainly in astrocytes and microglial cells [157]. The secreted E7-containing APPs form stable, non-covalent, inhibitory complexes with trypsin, whereas the secreted E7-lacking isoform does not [158]. Increased E7-containing isoforms is associated with AD symptoms in both human and mouse [159]. Expression of U2AF is down-regulated during cellular differentiation of neural tissues [160], which is likely responsible for the E7-skipping in APP695. However, a recent study [156] suggested that E7-skipping is directly linked to the RBFox1 protein with its binding motif (U)GCAUG found both upstream of E7 and within E7. RBFox1 [161] is a neuron- and muscle-specific splicing factor that induces exon skipping of several genes including APP [156]. Thus, both U2AF and RBFox1 could be potential drug targets for AD, i.e., a drug candidate that downregulates U2AF or upregulates RBFox1 specifically in neural tissues could reduce the risk of developing AD. These transcriptomic studies have significantly contributed to our understanding of pathology of not only AD, but many other human diseases associated with alternative splicing. Abnormal changes in gene expression or regulation is often associated with diseases. The main difficulty is in the interpretation of cause and effects because a gene may have its disease-causing expression occurring at time t1, causing differential expression of many other genes at time t2, where t2 may be years away from t1. Thus, comparing gene expression patterns between a disease group and a control group almost always lead to many false positives [162]. Longitudinal data collected over time will help narrow down to the real culprit of the disease, which is illustrated well by a bioinformatic meta-analysis of 84 kidney transplant biopsies collected at different stage of kidney injury progression [163]. Unfortunately, while it is easy to take a piece of wood from a tree at regular time intervals, it is much more difficult to take a piece of liver out of the patient weekly or monthly. Transcriptomic data analysis has revealed that most of the human genome is transcribed [115]. Because RNA interference can modulate many cellular processes and that RNA has been recognized as a new type of drugs [164-166], mining transcriptomic data may uncover many RNA molecules either as drugs or as drug targets. Among the numerous unannotated transcripts in human, which are functionally important in human biology? From an evolutionary point of view, a functionally important sequence is one that is expected to be conserved among related species, such as within apes or primates. One can identify functionally important RNAs among millions of different transcripts by checking sequence conservation with one of numerous bioinformatics tools. Any functionally important RNA species may be a potential drug target.

PROTEOMIC DATA AND DRUG DISCOVERY

Proteins are the workhorses in living cells and their abnormal abundance is often associated with diseases. A transcribed gene may be differentially translated [167, 168] or not translated [169], and different proteins have different degradation rate, so transcriptomic data is often not a good predictor of protein abundance. For this reason, characterizing and comparing proteomes between patient and control is often more effective in identifying drug targets than genomic or transcriptomic data. Proteomic data have been obtained from nearly all model organisms and deposited in public databases such as PaxDB [170]. Such data have greatly facilitated the development [168] and application of indices predicting translation efficiency [171-173]. Bioinformatic tools used for proteomic data analysis is similar to those in transcriptomic data, i.e., using proteomic data for phenotypic screening and for drug target discovery. Most proteomic data are used in comparisons either between treatment and control animals [174-176] or between patients and matched normal control [177]. For example, caffeine-treated rats differ in protein expression from control rats [175]. Numerous such relationships between drugs and protein targets have been reported and stored in databases [178-180] to facilitate retrieval of possible interactions of a query drug with proteins. Proteomic data, without following a cohort over time, suffer from the same problem as genomic and transcriptomic data in the causal interpretation as I have mentioned before. In particular, from differential expression observed in many proteins, it is difficult to identify which is truly disease-causing. Different proteins change their abundance at different cell cycle phases. Without taking temporal and spatial heterogeneity of cells into consideration, comparison of protein profiles (or transcriptomic profiles) between matched patient/normal pairs will continue to pump out false positives that have little relevance to drug discovery. In animal models, it is possible to sample cells over different periods [176]. Performing single-cell characterization of transcriptomes and proteomes [181-183] over time to reconstruct a cell cycle profile of gene expression (i.e., reorder cell expression profiles characterized at phases 3, 1, 2, 2, 4 to phases 1, 2, 2, 3, 4) should yield much more informative results.

RIBOSOME PROFILING AND DRUG DISCOVERY

Protein abundance data have limitations because 1) low-concentration proteins, short peptides, or transient proteins often cannot be detected, 2) membrane proteins, which often serve as essential components in signal transduction, are difficult to isolate, separate and purify. Transcriptomic data once spawned the hope that proteomic data can be predicted from transcriptomic data, but differential translation efficiencies among mRNA [168, 184] and degradation efficiencies among proteins distort the relationship between mRNA abundance and protein abundance. However, ribosome profiling data, coupled with transcriptomic data, are expected to generate good predictions of protein production rate. Transcriptomic and ribosome profiling data provide information on mRNA abundance and translation efficiency, respectively. If genes A and B have mRNA abundance values NA and NB, respectively, from transcriptomic data, and their translation efficiency is RA and RB, respectively, from ribosome profiling data, then their relative protein production rate is NA*RA and NB*RB, respectively. Differences between such predicted protein abundance and observed protein abundance can be used to measure protein degradation rate. Such prediction should be facilitated by obtaining transcriptomic and proteomic data in the same experiment [185], ideally from a single cell [181-183]. Ribosome profiling data, traditionally from microarray [186-187], is now almost exclusively from deep sequencing of ribosome-protected fragments (RPF, ~30 nucleotides) of mRNA [188-190]. The two approaches, however, exhibit high concordance with data from the yeast [167]. The sequenced RPFs can be mapped to protein-coding genes to obtain the location of the ribosome on mRNA. Ribosomal density may be taken as a proxy of translation efficiency [167]. However, for an mRNA with poor codon usage, ribosomes may move slowly and become densely packed along the mRNA. For this reason, elongation efficiency needs to be controlled for, e.g., by regressing ribosome density on the index of translate elongation [168]. Ribosome profiling data are useful in characterizing regulatory motifs such as poly(A) tract that modulate translation efficiency [167], e.g., short poly(A) at 5’ UTR may facilitate the recruitment of translation initiation factors and enhance translation, but long poly(A) may bind to poly(A)-binding proteins and inhibit translation. Such regulatory motifs can serve easily identifiable drug targets that can be easily manipulated. There are four major models of translation initiation cross-classified by two variables. The first is whether the translation machinery starts scanning for the start codon from the 5’ end of mRNA [191, 192] or from internal ribosome entry sites [169, 193-196]. The second is whether the small ribosomal subunit does the scanning for the start codon or a fully formed ribosome can also perform the scan. While there is little controversy now on the occurrence of internal ribosome entry, only recent ribosome profiling data have offered strong empirical support for fully formed ribosomes along 5’ UTR of mRNAs [197], suggesting that fully formed ribosomes may also scan for the start codon. In contrast to eukaryotic internal ribosomal entry sites (IRESs) whose IRES activity decreases with the stability of secondary structure [198], many viral IRESs have strong secondary structure. Cricket paralysis virus (CrPV) has an IRES located at the intercistronic region that is capable of directly interacting with the ribosome via its complex secondary structure without any translation initiation factors [199, 202]. The hepatitis C virus (HCV) has an IRES that can mimic the translation initiation complex so that it does not need initiation factors essential for cap-dependent translation [203, 204]. The IRES mechanism of translation initiation allows viruses to carry on their translation while the host cap-dependent translation has been shut down, and viral IRESs, especially those with relatively rigid secondary and tertiary structure such as in HCV, have consequently been recognized as promising drug targets [205]. Translation regulation represents an important cellular mechanism capable of responding to extracellular environment. In the yeast Saccharomyces cerevisiae, a dozen or so genes are transcribed but not normally translated; they are translated when the surface nutrients have been depleted and their products enable yeast cells to burrow down into the culture medium to extract nutrients for growth [169]. Ribosome profiling data can reveal the translation status of these translation regulated messages, and consequently help us understand how organisms use translation regulation in response to environmental changes. Ribosome profiling is the ultimate tools to discover new protein-coding genes many of which could be drug targets. That many protein-coding genes may remain unannotated is highlighted by the finding that even the extensively studied phage lambda may have unannotated protein-coding genes [206]. In human and mouse, ribosomes are frequently found on transcripts not annotated as coding sequences, with the consequent production of polypeptides [207]. Given that the majority of the human genome are in fact transcribed [115], many new protein-coding genes may be discovered by bioinformatic analysis of ribosome profiling data [208]. One fundamental problem in analyzing ribosome profiling data is with assigning RPFs to paralogous genes when an RPF matches multiple genes equally well. This problem is shared with transcriptomic and proteomic data where protein identification is typically done with peptide mass fingerprinting and a peptide fragment can match multiple proteins equally well [56, pp. 293-308]. Most programs offer two unsatisfactory options: 1) ignore sequence reads that match to multiple paralogous genes, and 2) allocate such reads equally among the matched paralogous genes. A recent program (MMR: Multiple Mapper Resolution) available at https://github.com/ratschlab/mmr intends to solve this problem but offers no methodological details. Because of the large number of duplicated genes in multicellular eukaryotes, inappropriate assignment of RPFs to paralogous genes will render all downstream analysis untrustworthy. I will outline the approach for assigning RPFs to three or more paralogous genes implemented in the computer program Tuxedo [154]. When a gene family has only two members, the assignment is relatively simple and will not be discussed here. A phylogenetic tree is needed for proper allocation of sequence reads with three or more paralogous genes in a gene family. I illustrate the allocation principle with a gene family with three paralogous genes A, B, and C idealized into three segments in Fig. (). The three genes shared one identical middle segment with 23 matched reads (that necessarily match equally well to all three paralogues). Genes B and C share an identical first segment to which 20 reads matched. Gene A has its first segment different from that of B and C and got four matched reads. The three genes also have a diverged third segment where paralogous gene A matched 3 reads, B matched 6 and C matched 12. Our task is then to allocate the 23 reads shared by all three and 20 reads shared by B and C to the three paralogues. TUXEDO uses a simple counting approach by applying the following: Thus, we allocate the 23 equally matched reads to paralogous genes A, B and C according to PA, PB and PC, respectively. For the 20 reads that matched B and C equally well, we allocate 20*6/(6+12) to B and 20*12/(6+12) to C. This gives the estimated number of matches to each gene as:

STRUCTURAL BIOLOGY AND DRUG DISCOVERY

An ideal bioinformatic platform for drug discovery based on structural biology should allow one to 1) predict 3-D structure of a protein or RNA based on the cellular environment where it is translated or transcribed, 2) “BLAST” a known protein/RNA structure against databases of protein/RNA structures to retrieve all protein/RNA with similar structures to facilitate structure-function interpretations and assessment of functional redundancy of the query protein in the cell, and to understand structural convergence, e.g., nonhomologous proteins or RNAs with similar structures [209-210], 3) identify and retrieve all potential binding partners of a given query structure to facilitate the assessment of the query’s potential as a drug target or drug candidate, i.e., its efficiency and side effects as a consequence of physical interactions with other cellular components, 4) automatically identify proteins and RNA that can form a complex and assemble such complexes (e.g., ribosome and spliceosome) through structural modeling and simulation, 5) predict the function of protein/RNA with known structure, either alone or as a component in a complex, and 6) suggest new structures that can physically interact with the query to activate/deactivate the query protein/RNA function in the cell. Almost all of these can be done, although not perfectly, by databases and software tools compiled at http://www.click2drug.org/ . When one has a protein of interest, the first is to check if its structure already exists in PDB [211-212]. If not, then one can use tools such as homology modeling to infer its structure based on one or more close homologues with known structure. Such tools include SWISS-MODEL [213] and TASSER [214] and its derivatives. Once the structure is refined, one can use UCSF Chimera [215] or PyMOL (The PyMOL Molecular Graphics System, Version 1.8 Schrödinger, LLC) to visualize the structure and use automated screening software such as SwissSimilarity [216] to identify potential drug candidate that can interact with the protein of interest. Such screening approach is greatly enhanced by metabolic and ligand databases such ChEMBL [217] and SuperSite [218]. Such analyses not only shed lights on identifying drug-target interactions, but also facilitate the identification of side effects of individual drugs, e.g., a drug that can bind to many biologically important enzymes in human is almost surely to have many side effects, but a drug that does the same in a pathogen would be quite desirable. One may also use docking software such as SwissDock [219] to study physical interactions between protein and small molecules, or use SwissBioisostere [220] to design and refine ligands. Such structural studies improve the design of drugs against HIV-1 protease [221]. The protease is a homodimer each with 99 amino acids in each monomer, and an inhibitor typically needs to squeeze its way between the two monomers to disrupt the protease function [222-224]. Given one well documented protein-ligand interaction, it is natural to infer that other proteins with similar sequence or structure may also bind to the ligand. Such similarity-based approach [52, 225] is the conceptual foundation for the software SwissTargetPrediction [226]. It is important to keep in mind that a structure determined by X-ray crystallography or by NMR represents only a snapshot of structural dynamics, and that protein structure can change in response to different cellular environment. The software CHARMM [227] and its derivatives facilitate the characterization of such dynamic interactions of proteins with their binding partners. Such studies are facilitated by general databases of drug-target interactions [180] and special databases documenting protein interactions in cancer cells [179] or in membranes involving GPCR-ligand associations [178] or organism-specific databases such as that for Mycobacterium tuberculosis [228].

BIOINFORMATICS AND DRUG RESISTANCE

Bacterial resistance to penicillin became known soon after its discovery in 1928 and its regular medical applications in 1940 [229, 230]. Such resistance can also develop quickly in eukaryotic pathogens, e.g., in malaria parasite Plasmodium falciparum against the most effective anti-malaria drug artemisinin, soon after the large-scale application of artemisinin in Asian countries [231, 233]. Drug resistance often renders a costly developed drug useless, contributing to the high cost of drug development [1-2]. The rapid development of drug resistance in HIV-1 [234, 235] highlights the importance of understanding drug resistance. Modern drug development against pathogens demands high specificity against the pathogen. If a drug is toxic to a specific bacterial pathogen, then drug-mediated selection will operate only in this particular bacterial pathogen population to favour individuals with drug resistance. However, if the drug is also toxic to 100 other non-pathogenic bacterial species, drug resistance may develop in all these species, often with subsequent transmission of drug resistance from a non-pathogenic species to a pathogenic one. Pathogenicity islands [41-43], i.e., distinct DNA segment in a large number of bacterial pathogens but not in their avirulent counterparts, serve as specific drug targets against pathogens, and bioinformaticians have created databases [236, 237] to facilitate the identification pathogenicity islands as drug targets. Modern bioinformatic analysis and innovative experiments have shed light on how fast microbial pathogens can evolve drug resistance. In one experiment [238], error-prone PCR was used to introduce random mutations in Streptococcus pneumoniae genes. These mutated amplicons were then used to transform S. pneumoniae with some resulting colonies exhibiting resistance against antibiotic fusidic acid. DNA sequence analysis revealed a single mutation in the fusA gene accounting for the drug resistance. Many cases have been documented in HIV-1 protease in which a single mutation can significantly change the susceptibility of the protease to its inhibitors [239, 240]. Such studies allow us to estimate the proportion of mutations that confer drug resistance among all random mutations. How rapid can bacterial and eukaryotic pathogens respond to drug resistance depends mainly on mutation rate, parasite population size and genetic diversity. Lack of genetic diversity implies that drug resistance need to arise de novo, in which case mutation rate becomes a major limiting factor in pathogens evolving drug resistance. Spontaneous mutation rate traditionally was measured in mutation accumulation experiments which are tedious and, for practical reasons, have been done mainly on viruses and a few rapidly replicating bacterial species [241, 242]. Dating the origin of pseudogenes and then comparing their divergence against their functional counterparts [243-246] allow for an estimation of spontaneous mutation rate (approximated by the neutral substitution rate) and mutation spectrum. High mutation rate and large population size increase the chance of parasites developing drug resistance. For many years, it has been assumed that point mutations occur independent of each other, each being a separate mutation event. For this reason, two serine codons in the standard genetic code (UCU and AGU) are extremely unlikely to mutation into each other because they have to go through two nonsynonymous substitutions which are typically subject to strong purifying selection. However, bioinformatic research and modeling effort have revealed that multiple mutation events can happen in “clusters and showers” in a single generation not only in viruses and bacterial species [247-249], but also in eukaryotes [250, 251]. Genomics sequence analysis and phylogenetics have been frequently used to identify conserved sequence or structure that can guide the development of vaccine [252-254] and ligand designed as inhibitors against bacterial or viral pathogens because sequence and structural similarities often imply similarity in ligand binding. However, strongly conserved sites in a gene does not imply that mutations at these sites will necessarily cripple the gene function. Many amino acid sites in HIV-1 protease are invariant among subtypes of the M group suggesting that they are functionally important. However, drugs designed to inhibit HIV-1 protease quickly leads to mutations at these highly conserved sites, resulting in reduced susceptibility to protease inhibitors [239-240]. This adaptation to the drug-induced selection works in similar way as the development of antibiotics. In the absence of antibiotics, plasmids (regardless of whether they carrying antibiotic-resistant genes or not) in a bacterial species such as E. coli constitute a replication burden. They are consequently selected against and quickly lost in E. coli cultures. However, in the presence of antibiotics, the cost of a replication burden is more than offset by the benefit of antibiotic resistance, and the plasmids carrying the antibiotic-resistant genes will spread. If we know the population size of the pathogen under the drug effect, the random mutation rate of the pathogen, the proportion of drug-resistant mutations among all random mutations, then it is possible to estimate the probability that a drug-resistant mutation will occur in the first generation after drug application, the probability of no such mutation until the second generation, or in general the probability of no such mutation until the Nth generation. One may also estimate the average number of generations for the first drug resistance mutation to occur. Such estimation is within the domain of population genetics.

BIOINFORMATIC SOFTWARE AND DATABASES

An extensive compilation of software, databases and web services directly related to drug discovery can be found at http://click2drug.org/ maintained by Swiss Institute of Bioinformatics. These are roughly grouped into 1) databases, 2) chemical structure representations, 3) molecular modeling and simulation, 4) homology modeling to infer the structure of a protein guided by a homologue of known structure, 5) binding site prediction, 6) docking, 7) screening for drug candidates, 8) drug target prediction, 9) ligand design, 10) binding free energy estimation, 11) QSAR, 12) ADME Toxicity. Many software packages are powerful and free, and supported by well-known institutions. These include databases such as ChEMBL [217] and SwissSidechain [255], software tools such as UCSF Chimera [215] which is not only a 3D visualization tool but also a platform for software developers interested in structural biology, SwissSimilarity for virtual screening [216], SwissBioisostere for ligand design [220], SwissTargetPrediction [226], SwissSideChain to facilitate experiments that expand the protein repertoire by introducing non-natural amino acids, and SwissDock [219] for docking drug candidates (small molecules) on proteins. Although some software are commercial, e.g., CHARMM [227] and PyMOL (Schrödinger), they typically have free versions for students and teachers.

Conclusion

Bioinformatics is a data-driven branch of science, with many of the algorithms and databases developed or adapted in response to new types of data. For this reason, bioinformatic tools often lag behind high-throughput data acquisition technologies. However, a large number of molecular biologists, computer scientists and mathematicians have dedicated their extensive effort to develop new and powerful software packages and databases to extend our views of nature, just as microscopes and telescopes extend our views to see patterns that we have never seen before. Taking a close look at this effort by pharmaceutical scientists may prove to be highly beneficial not only to pharmaceutical industry, but also to bioinformatics research community as well.

243 in total

Review 1. Pathophysiology of sickle cell disease: role of cellular and genetic modifiers.

Authors: M H Steinberg; G P Rodgers
Journal: Semin Hematol Date: 2001-10 Impact factor: 3.851

2. Gene expression analyzed by high-resolution state array analysis and quantitative proteomics: response of yeast to mating pheromone.

Authors: Vivian L MacKay; Xiaohong Li; Mark R Flory; Eileen Turcott; G Lynn Law; Kyle A Serikawa; X L Xu; Hookeun Lee; David R Goodlett; Ruedi Aebersold; Lue Ping Zhao; David R Morris
Journal: Mol Cell Proteomics Date: 2004-02-06 Impact factor: 5.911

3. A deletion of the human beta-globin locus activation region causes a major alteration in chromatin structure and replication across the entire beta-globin locus.

Authors: W C Forrester; E Epner; M C Driscoll; T Enver; M Brice; T Papayannopoulou; M Groudine
Journal: Genes Dev Date: 1990-10 Impact factor: 11.361

Review 4. Pathogenicity islands: a molecular toolbox for bacterial virulence.

Authors: Ohad Gal-Mor; B Brett Finlay
Journal: Cell Microbiol Date: 2006-08-24 Impact factor: 3.715

Review 5. Quantitative pharmacology or pharmacokinetic pharmacodynamic integration should be a vital component in integrative pharmacology.

Authors: J Gabrielsson; A R Green
Journal: J Pharmacol Exp Ther Date: 2009-09-24 Impact factor: 4.030

Review 6. Understanding the dose-effect relationship: clinical application of pharmacokinetic-pharmacodynamic models.

Authors: N H Holford; L B Sheiner
Journal: Clin Pharmacokinet Date: 1981 Nov-Dec Impact factor: 6.447

7. Pseudogenes as a paradigm of neutral evolution.

Authors: W H Li; T Gojobori; M Nei
Journal: Nature Date: 1981-07-16 Impact factor: 49.962

8. Inhibition of HIV-1 protease: the rigidity perspective.

Authors: J W Heal; J E Jimenez-Roldan; S A Wells; R B Freedman; R A Römer
Journal: Bioinformatics Date: 2012-02-01 Impact factor: 6.937

9. Monozygotic twins exhibit numerous epigenetic differences: clues to twin discordance?

Authors: Arturas Petronis; Irving I Gottesman; Peixiang Kan; James L Kennedy; Vincenzo S Basile; Andrew D Paterson; Violeta Popendikyte
Journal: Schizophr Bull Date: 2003 Impact factor: 9.306

10. Evolution of the highly networked deubiquitinating enzymes USP4, USP15, and USP11.

Authors: Caitlyn Vlasschaert; Xuhua Xia; Josée Coulombe; Douglas A Gray
Journal: BMC Evol Biol Date: 2015-10-26 Impact factor: 3.260

24 in total

1. Draft genome and secondary metabolite biosynthetic gene clusters of Streptomyces sp. strain 196.

Authors: Prateek Kumar; Anjali Chauhan; Munendra Kumar; Bijoy K Kuanr; Renu Solanki; Monisha Khanna Kapur
Journal: Mol Biol Rep Date: 2020-09-04 Impact factor: 2.316

2. Potential Therapeutic Candidates against Chlamydia pneumonia Discovered and Developed In Silico Using Core Proteomics and Molecular Docking and Simulation-Based Approaches.

Authors: Roqayah H Kadi; Khadijah A Altammar; Mohamed M Hassan; Abdullah F Shater; Fayez M Saleh; Hattan Gattan; Bassam M Al-Ahmadi; Qwait AlGabbani; Zuhair M Mohammedsaleh
Journal: Int J Environ Res Public Health Date: 2022-06-15 Impact factor: 4.614

3. PDAUG: a Galaxy based toolset for peptide library analysis, visualization, and machine learning modeling.

Authors: Jayadev Joshi; Daniel Blankenberg
Journal: BMC Bioinformatics Date: 2022-05-28 Impact factor: 3.307

Review 4. Virtual screening of substances used in the treatment of SARS-CoV-2 infection and analysis of compounds with known action on structurally similar proteins from other viruses.

Authors: Paul Andrei Negru; Denisa Claudia Miculas; Tapan Behl; Alexa Florina Bungau; Ruxandra-Cristina Marin; Simona Gabriela Bungau
Journal: Biomed Pharmacother Date: 2022-07-18 Impact factor: 7.419

5. Exploration of inhibitory action of Azo imidazole derivatives against COVID-19 main protease (M^pro): A computational study.

Authors: Abhijit Chhetri; Sailesh Chettri; Pranesh Rai; Biswajit Sinha; Dhiraj Brahman
Journal: J Mol Struct Date: 2020-08-31 Impact factor: 3.196

Review 6. One Century of Study: What We Learned about Paracoccidioides and How This Pathogen Contributed to Advances in Antifungal Therapy.

Authors: Erika Seki Kioshima; Patrícia de Souza Bonfim de Mendonça; Marcus de Melo Teixeira; Isis Regina Grenier Capoci; André Amaral; Franciele Abigail Vilugron Rodrigues-Vendramini; Bruna Lauton Simões; Ana Karina Rodrigues Abadio; Larissa Fernandes Matos; Maria Sueli Soares Felipe
Journal: J Fungi (Basel) Date: 2021-02-02

Review 7. Drug targets for COVID-19 therapeutics: Ongoing global efforts.

Authors: Ambrish Saxena
Journal: J Biosci Date: 2020 Impact factor: 1.826

8. Chemical Composition and Antifungal In Vitro and In Silico, Antioxidant, and Anticholinesterase Activities of Extracts and Constituents of Ouratea fieldingiana (DC.) Baill.

Authors: José Eranildo Teles do Nascimento; Ana Livya Moreira Rodrigues; Daniele Silva de Lisboa; Hortência Ribeiro Liberato; Maria José Cajazeiras Falcão; Cecília Rocha da Silva; Hélio Vitoriano Nobre Júnior; Raimundo Braz Filho; Valdir Ferreira de Paula Junior; Daniela Ribeiro Alves; Selene Maia de Morais
Journal: Evid Based Complement Alternat Med Date: 2018-11-07 Impact factor: 2.629

Review 9. Highlights on the Application of Genomics and Bioinformatics in the Fight Against Infectious Diseases: Challenges and Opportunities in Africa.

Authors: Saikou Y Bah; Collins Misita Morang'a; Jonas A Kengne-Ouafo; Lucas Amenga-Etego; Gordon A Awandare
Journal: Front Genet Date: 2018-11-27 Impact factor: 4.599

10. mRNA-miRNA-lncRNA Regulatory Network in Nonalcoholic Fatty Liver Disease.

Authors: Marwa Matboli; Shaimaa H Gadallah; Wafaa M Rashed; Amany Helmy Hasanin; Nada Essawy; Hala M Ghanem; Sanaa Eissa
Journal: Int J Mol Sci Date: 2021-06-24 Impact factor: 5.923