The drug target genes show higher evolutionary conservation than non-target genes.

Wenhua Lv1, Yongdeng Xu1, Yiying Guo1, Ziqi Yu1, Guanglong Feng2, Panpan Liu1, Meiwei Luan1, Hongjie Zhu1, Guiyou Liu3, Mingming Zhang1, Hongchao Lv1, Lian Duan1, Zhenwei Shang1, Jin Li1, Yongshuai Jiang1, Ruijie Zhang1.   

Abstract

Although evidence indicates that drug target genes share some common evolutionary features, few studies have analyzed the evolutionary features of drug targets at a global level. We therefore investigated the evolutionary characteristics of drug target genes. We compared the evolutionary conservation of human drug target genes and non-target genes by combining evolutionary features with topological properties of the human protein-protein interaction network. The evolutionary features were the evolutionary rate, the conservation score and the percentage of orthologous genes in 21 species. The topological features were the average shortest path length, betweenness centrality, clustering coefficient and degree. Compared with non-drug target genes, drug target genes had 1) lower evolutionary rates; 2) higher conservation scores; 3) higher percentages of orthologous genes; and 4) a tighter network structure, with higher degrees, betweenness centrality and clustering coefficients and lower average shortest path lengths. These results indicate that drug target genes are more evolutionarily conserved than non-drug target genes. We hope that our study will provide valuable information for other researchers who are interested in the evolutionary conservation of drug targets.

Keywords:  drug target; evolutionary conservation; topological properties

Year:  2016        PMID: 26716901      PMCID: PMC4826257          DOI: 10.18632/oncotarget.6755

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Drug targets are a class of biological targets: in vivo binding sites that include receptors, enzymes, ion channels and nucleic acids. Drugs bind to their corresponding targets to produce the desired therapeutic effects [1]. To date, thousands of drug targets have been identified and stored in databases such as DrugBank [2], the Therapeutic Target Database (TTD) [3], the Potential Drug Target Database (PDTD) [4] and the TDR Targets Database [5]. Previous research has shown that evolutionary features offer fresh views on many fields related to drug discovery, including immunology [6], physiology [7, 8], epidemiology [9] and neuroscience [10]. Wang et al. [11] showed that some targeted genes share common evolutionary features, which suggests that evolutionary information might provide novel insights for characterizing drug targets. However, most current studies of evolutionary conservation focus on a single gene or on several genes belonging to the same protein family, rather than on a large group of genes with the same or similar features [12-16]. Compared with these conventional analyses, gene sets containing a large number of genes can better reflect evolutionary characteristics. In addition, evolutionary conservation is reflected not only by general features such as the evolutionary rate, the percentage of orthologous genes and protein sequence identity, but also by network features [17, 18]. We therefore asked whether evolutionary features differ between drug target genes and non-target genes. To investigate the evolutionary conservation of drug target genes from a global perspective, we compared the two gene sets using both regular evolutionary features and network features.
All the features were categorized into two groups: (1) evolutionary features in 21 species, including the evolutionary rate, the conservation score and the percentage of orthologous genes; (2) topological features of the human protein-protein interaction network, including the average shortest path length, betweenness centrality, clustering coefficient and degree. With this research, we hope to explore the evolutionary conservation features of drug targets and to help improve the efficiency of target identification.

RESULTS

Drug target genes had lower evolutionary rates than non-target genes

For each of the 21 species, we calculated the evolutionary rate (dN/dS) of both the drug target genes and the non-target genes. We then calculated the median dN/dS of each gene set in each species and compared the medians using a line chart (Figure 1A). The median dN/dS of drug target genes was significantly lower than that of non-target genes (P = 6.41E-05). For each species, a box plot displays the difference in dN/dS between the two groups of genes (Figure 1B). The box plots and Wilcoxon rank sum tests showed that the evolutionary rate of drug target genes was lower than that of non-target genes in each of the 21 species. Detailed information about the dN/dS for each species is given in Table 1.
Figure 1

Evolutionary rates (dN/dS ratios) for the drug target genes and non-target genes

(A) Line chart of the drug target genes and non-target genes. (B) Box plots of the drug target genes against non-target genes for each of the 21 species.

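Table 1

Summary statistics for the comparison of dN/dS in each species

Target: dN/dS of approved drug target genes. Non-target: dN/dS of non-target genes. P-values are from the Wilcoxon rank sum test.

| Species | Target median | Target upper quartile | Target lower quartile | Non-target median | Non-target upper quartile | Non-target lower quartile | P-value |
|---|---|---|---|---|---|---|---|
| amel | 0.1104 | 0.1831 | 0.0555 | 0.1280 | 0.2426 | 0.0608 | 7.03E-07 |
| btau | 0.1028 | 0.1851 | 0.0535 | 0.1246 | 0.2344 | 0.0564 | 7.93E-06 |
| cfam | 0.1057 | 0.1857 | 0.0576 | 0.1270 | 0.2408 | 0.0591 | 2.94E-06 |
| cjac | 0.1584 | 0.2733 | 0.0779 | 0.1893 | 0.3575 | 0.0838 | 9.80E-07 |
| cpor | 0.1026 | 0.1800 | 0.0534 | 0.1211 | 0.2247 | 0.0578 | 3.11E-06 |
| ecab | 0.1177 | 0.1984 | 0.0613 | 0.1352 | 0.2528 | 0.0595 | 5.50E-05 |
| itri | 0.1027 | 0.1817 | 0.0538 | 0.1181 | 0.2212 | 0.0487 | 0.0063 |
| lafr | 0.1173 | 0.1990 | 0.0645 | 0.1400 | 0.2551 | 0.0684 | 4.43E-07 |
| mdom | 0.0757 | 0.1308 | 0.0425 | 0.0943 | 0.1692 | 0.0451 | 3.08E-08 |
| mfur | 0.0975 | 0.1736 | 0.0502 | 0.1233 | 0.2235 | 0.0537 | 5.02E-07 |
| mluc | 0.1281 | 0.2104 | 0.0684 | 0.1407 | 0.2547 | 0.0693 | 0.00172 |
| mmul | 0.1578 | 0.2966 | 0.0709 | 0.1970 | 0.3870 | 0.0730 | 2.12E-06 |
| mmus | 0.0910 | 0.1558 | 0.0479 | 0.1125 | 0.2100 | 0.0497 | 4.12E-09 |
| nleu | 0.1735 | 0.3260 | 0.0781 | 0.2235 | 0.4261 | 0.0881 | 1.94E-08 |
| ocun | 0.1014 | 0.1662 | 0.0510 | 0.1178 | 0.2184 | 0.0570 | 1.84E-07 |
| ogar | 0.1163 | 0.1950 | 0.0604 | 0.1395 | 0.2482 | 0.0593 | 5.43E-06 |
| pabe | 0.1561 | 0.3096 | 0.0743 | 0.2022 | 0.4018 | 0.0792 | 1.70E-07 |
| ptro | 0.1718 | 0.3559 | 0.0578 | 0.2184 | 0.4715 | 0.0574 | 2.73E-06 |
| rnor | 0.0931 | 0.1616 | 0.0487 | 0.1159 | 0.2105 | 0.0521 | 6.80E-08 |
| shar | 0.0756 | 0.1326 | 0.0426 | 0.0938 | 0.1676 | 0.0451 | 4.92E-08 |
| sscr | 0.1130 | 0.1944 | 0.0585 | 0.1321 | 0.2378 | 0.0595 | 0.0006 |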
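
The signed rank comparison of the per-species medians can be illustrated with the median values in Table 1. The following Python sketch is not the R code used in the study; it is a minimal standard-library implementation, the function name `signed_rank_test` is hypothetical, and it assumes there are no tied differences. Because the drug target median is lower in all 21 species, the exact two-sided P-value is 2/2^21 (about 9.54E-07), while the normal approximation with continuity correction gives about 6.4E-05, which is close to the P-value reported for Figure 1A.

```python
import math

# Median dN/dS per species from Table 1: (drug target genes, non-target genes).
MEDIANS = {
    "amel": (0.1104, 0.1280), "btau": (0.1028, 0.1246), "cfam": (0.1057, 0.1270),
    "cjac": (0.1584, 0.1893), "cpor": (0.1026, 0.1211), "ecab": (0.1177, 0.1352),
    "itri": (0.1027, 0.1181), "lafr": (0.1173, 0.1400), "mdom": (0.0757, 0.0943),
    "mfur": (0.0975, 0.1233), "mluc": (0.1281, 0.1407), "mmul": (0.1578, 0.1970),
    "mmus": (0.0910, 0.1125), "nleu": (0.1735, 0.2235), "ocun": (0.1014, 0.1178),
    "ogar": (0.1163, 0.1395), "pabe": (0.1561, 0.2022), "ptro": (0.1718, 0.2184),
    "rnor": (0.0931, 0.1159), "shar": (0.0756, 0.0938), "sscr": (0.1130, 0.1321),
}


def signed_rank_test(pairs):
    """Two-sided Wilcoxon signed rank test for paired values.

    Returns (W_plus, exact_p, normal_p). Assumes no tied differences.
    """
    diffs = [a - b for a, b in pairs if a != b]
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # Exact null distribution: counts[s] = number of rank subsets summing to s.
    total = n * (n + 1) // 2
    counts = [1] + [0] * total
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    lower = sum(counts[: w_plus + 1]) / 2 ** n
    upper = sum(counts[w_plus:]) / 2 ** n
    exact_p = min(1.0, 2 * min(lower, upper))

    # Normal approximation with continuity correction.
    mean = total / 2
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = max(abs(w_plus - mean) - 0.5, 0.0) / sd
    normal_p = math.erfc(z / math.sqrt(2))
    return w_plus, exact_p, normal_p


W_PLUS, EXACT_P, NORMAL_P = signed_rank_test(list(MEDIANS.values()))
print(f"W+ = {W_PLUS}, exact P = {EXACT_P:.3g}, normal-approximation P = {NORMAL_P:.3g}")
```

With 21 paired values that all differ in the same direction, no smaller two-sided P-value is possible for this test, so the result reflects the consistency of the direction across species rather than the size of the differences.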

Drug target genes had higher conservation scores than non-target genes

We aligned the protein sequences of the human drug target genes and non-target genes to the orthologous protein sequences of the 21 other species using BLAST and took the conservation scores from the BLAST results. The median conservation scores of the two gene sets in the 21 species are displayed in a line chart (Figure 2A), which shows that the median conservation score of drug target genes was higher than that of non-target genes. The Wilcoxon signed rank test gave a P-value of 6.40E-05, confirming a significant difference in conservation scores between human drug target genes and non-target genes. In each of the 21 species, the conservation scores of drug target genes were significantly higher than those of the non-target genes (Figure 2B). Detailed information about the conservation score for each species is given in Table 2.
Figure 2

Conservation scores for the drug target genes and non-target genes

(A) Line chart of the drug target genes and non-target genes. (B) Box plots of the drug target genes against non-target genes for each of the 21 species.

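Table 2

Summary statistics for the comparison of conservation scores in each species

Target: sequence identity of approved drug target genes. Non-target: sequence identity of non-target genes. P-values are from the Wilcoxon rank sum test.

| Species | Target median | Target upper quartile | Target lower quartile | Non-target median | Non-target upper quartile | Non-target lower quartile | P-value |
|---|---|---|---|---|---|---|---|
| amel | 838.00 | 1213.00 | 548.00 | 613.00 | 957.00 | 361.00 | 2.44E-34 |
| btau | 840.00 | 1257.50 | 571.50 | 615.00 | 965.00 | 373.50 | 6.18E-38 |
| cfam | 859.00 | 1279.00 | 557.25 | 622.00 | 988.00 | 371.00 | 1.11E-34 |
| cjac | 905.00 | 1299.50 | 620.00 | 655.00 | 1054.25 | 394.00 | 3.59E-37 |
| cpor | 828.00 | 1221.00 | 545.00 | 587.00 | 919.50 | 352.00 | 1.83E-40 |
| ecab | 845.00 | 1228.00 | 552.50 | 608.00 | 952.00 | 360.25 | 3.47E-36 |
| itri | 817.50 | 1153.00 | 553.25 | 594.00 | 909.00 | 367.50 | 1.47E-36 |
| lafr | 831.50 | 1205.25 | 555.00 | 591.50 | 926.00 | 359.00 | 9.60E-41 |
| mdom | 773.00 | 1135.75 | 472.00 | 514.50 | 808.25 | 314.00 | 1.37E-41 |
| mfur | 856.50 | 1238.75 | 576.00 | 636.00 | 981.25 | 389.00 | 5.23E-33 |
| mluc | 823.50 | 1197.00 | 525.00 | 582.00 | 925.00 | 354.00 | 4.33E-32 |
| mmul | 895.00 | 1315.75 | 613.00 | 644.00 | 1023.50 | 390.00 | 1.04E-41 |
| mmus | 852.00 | 1271.50 | 565.00 | 602.00 | 932.00 | 361.00 | 1.81E-46 |
| nleu | 900.00 | 1290.50 | 610.00 | 669.00 | 1064.00 | 403.00 | 2.05E-31 |
| ocun | 845.00 | 1233.25 | 568.50 | 608.00 | 949.75 | 360.00 | 1.31E-37 |
| ogar | 863.00 | 1272.25 | 580.00 | 628.00 | 974.00 | 382.00 | 8.81E-40 |
| pabe | 877.00 | 1257.50 | 595.50 | 655.00 | 1038.00 | 399.00 | 3.41E-32 |
| ptro | 925.50 | 1332.00 | 611.00 | 682.00 | 1087.00 | 410.00 | 1.47E-33 |
| rnor | 804.00 | 1141.00 | 541.50 | 569.00 | 876.00 | 343.00 | 8.96E-38 |
| shar | 701.00 | 1007.00 | 432.50 | 499.00 | 796.00 | 305.00 | 3.34E-28 |
| sscr | 768.50 | 1098.75 | 482.25 | 565.00 | 876.00 | 328.00 | 2.25E-28 |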

Drug target genes had higher percentages of orthologous genes than non-target genes

We calculated the percentage of orthologous genes among the drug target genes and the non-target genes for each species. The line chart of this evolutionary feature (Figure 3) shows that the drug target genes had a higher percentage of orthologous genes than the non-target genes. The Wilcoxon signed rank test gave a P-value of 9.54E-07, confirming a significant difference in the percentage of orthologous genes between the two groups of genes.
Figure 3

Line chart of the percentage of orthologous genes for the drug target genes and non-target genes

Drug target genes had a tighter network topology structure than non-target genes

We further analyzed the topological properties of the human protein-protein interaction network downloaded from HPRD, extracted the network features of both drug target genes and non-target genes, and compared these features between the two gene sets. We obtained the following results: 1) the average shortest path length of drug target genes was significantly smaller than that of non-target genes (Figure 4A); and 2) the betweenness centrality, clustering coefficient and degree of drug target genes were significantly higher than those of non-target genes (Figure 4B–4D). These results show that drug target genes have a tighter topological structure than non-target genes in the human protein-protein interaction network.
Figure 4

Network topological properties for the drug target genes and non-target genes

DISCUSSION

Investigating the evolutionary conservation of drug target genes is an important task that helps to characterize drug targets. In this study we analyzed the evolutionary conservation of drug target genes by comparing multiple evolutionary characteristics, including both classical features (evolutionary rate, conservation score and percentage of orthologous genes) and network topological properties (average shortest path length, betweenness centrality, clustering coefficient and degree). These analyses gave consistent results, supporting the view that drug target genes have been more conserved over evolutionary history. Previous studies of drug targets have identified genes or genomic regions with higher evolutionary conservation as potential or candidate drug targets. For instance, the nucleoprotein (NP) of the influenza A virus, a highly conserved protein, was identified as a potential target of universally effective antivirals [19]. Heat shock proteins (HSPs), a ubiquitous group of evolutionarily conserved proteins involved in binding antigens and presenting them to the immune system, were proposed as possible therapeutic targets [20]. Nonstructural protein 3 (NS3) is a component of the flavivirus polyprotein. Shiryaev et al. [21] studied the structural and functional characteristics of the flaviviral protease. They found that the N-terminal and C-terminal parts of NS3 comprise a serine protease and an RNA helicase, respectively. Cleavage of the polyprotein by the protease yields the individual viral proteins, allowing new viral progeny to be assembled. Since both the protease and the RNA helicase are conserved among flaviviruses, NS3 was identified as a promising drug target in flaviviral infections.
Furthermore, genes or proteins involved in evolutionarily conserved cellular processes, such as DNA replication and apoptosis, can also be identified as potential drug targets. For instance, Robinson et al. [22] explored the architecture and conservation of the bacterial DNA replication machinery and found that genes or proteins involved in maintaining this machinery had the greatest potential as drug targets. The mitochondrial permeability transition (mPT) is a mechanism that enables the release of Cytochrome-c (Cyt-c), Apoptosis Inducing Factor (AIF) and other pro-apoptotic proteins that initiate and promote apoptosis. A study by Hellebrand et al. [23] suggested that some mPT inhibitory agents might become promising drug targets against apoptosis. With the rapid development of computer technology and machine learning theory, evolutionary information has been used to identify and prioritize drug targets. Ludin et al. [24] predicted antimalarial drug target candidates using evolutionary information and found 40 candidate drug targets with high evolutionary conservation. Another study of drug target identification and prioritization also indicated that many potential drug target genes could be predicted from orthology information [25]. Taken together, our comparison results and these previous studies suggest that drug targets are closely associated with evolutionary conservation and are more conserved than non-target genes. This agreement supports the reliability of our results, which may help to expand the understanding of the evolutionary characteristics of drug target genes.

MATERIALS AND METHODS

Human drug target genes

The human drug target gene set used in our study came from the DrugBank database, a bioinformatics and cheminformatics resource combining detailed drug data with comprehensive drug target information [2]. We downloaded the data on Food and Drug Administration (FDA) approved drugs and their corresponding drug targets, which contained a total of 1857 terms for multiple species. We then extracted the human drug targets from the original data and obtained 1347 FDA-approved drug target genes for the following analyses.

Non-target genes

To obtain the non-target gene set, we downloaded protein family data from the Pfam database (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/Pfam27.0/), a collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs) [26], and extracted the human protein family information. After filtering out the protein families to which drug targets belonged, we obtained a non-target gene set containing 4181 non-redundant genes. Note that non-targets here are proteins that share no similar domains with target proteins.
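
The family-based filtering described above can be sketched with set operations. The following Python example is only an illustration under stated assumptions: the gene and family identifiers are hypothetical placeholders rather than real Pfam accessions, the function name `build_non_target_set` is not from the study, and genes without any family annotation are excluded because the non-target set is derived from Pfam family data.

```python
def build_non_target_set(gene_families, target_genes):
    """Return genes that share no protein family with any drug target.

    gene_families: dict mapping gene -> set of family identifiers.
    target_genes: set of drug target genes.
    """
    # Collect every family that contains at least one drug target.
    target_families = set()
    for gene in target_genes:
        target_families |= gene_families.get(gene, set())

    # Keep annotated genes that are not targets and share no target family.
    return {
        gene
        for gene, families in gene_families.items()
        if gene not in target_genes
        and families
        and not (families & target_families)
    }


# Hypothetical toy annotation (placeholder identifiers, not real data).
GENE_FAMILIES = {
    "GENE_A": {"FAM_1"},           # drug target
    "GENE_B": {"FAM_1", "FAM_2"},  # shares FAM_1 with a target: excluded
    "GENE_C": {"FAM_2"},           # no target family: kept
    "GENE_D": {"FAM_3"},           # no target family: kept
    "GENE_E": set(),               # no family annotation: excluded (assumption)
}
NON_TARGETS = build_non_target_set(GENE_FAMILIES, {"GENE_A"})
print(sorted(NON_TARGETS))
```

Excluding whole families rather than individual genes makes the non-target set conservative: a gene is dropped as soon as one of its domains also occurs in a known target.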

Calculation of evolutionary rate, percentage of orthologous genes and conservation score

We downloaded the orthologous gene data for 21 species from the Ensembl database [27-29] (ftp://ftp.ensembl.org/pub/release-69/mysql/ensembl_mart_69). The full names and abbreviations of the 21 species are given in Table 3. We then extracted the one-to-one orthologous genes [30] with non-null dN (rate of non-synonymous substitutions) and dS (rate of synonymous substitutions) values and calculated the evolutionary rate as the ratio dN/dS.
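
The filtering and ratio calculation described above can be sketched as follows. This Python example is an illustration, not the Perl or R code used in the study. The record layout, the function name `dn_ds_ratios` and the toy values are hypothetical; the label `ortholog_one2one` is assumed to mark one-to-one orthologs, and records with dS equal to zero are skipped to avoid division by zero, which is an added assumption.

```python
from statistics import median


def dn_ds_ratios(records):
    """Group dN/dS ratios by species for one-to-one orthologs.

    Each record is a dict with keys: gene, species, type, dn, ds.
    """
    ratios = {}
    for rec in records:
        if rec["type"] != "ortholog_one2one":
            continue  # keep one-to-one orthologs only
        dn, ds = rec["dn"], rec["ds"]
        if dn is None or ds is None or ds == 0:
            continue  # require non-null dN and dS (and non-zero dS)
        ratios.setdefault(rec["species"], []).append(dn / ds)
    return ratios


# Hypothetical toy records (not real Ensembl data).
RECORDS = [
    {"gene": "G1", "species": "mmus", "type": "ortholog_one2one", "dn": 0.125, "ds": 0.5},
    {"gene": "G2", "species": "mmus", "type": "ortholog_one2one", "dn": 0.5, "ds": 1.0},
    {"gene": "G3", "species": "mmus", "type": "ortholog_one2many", "dn": 0.1, "ds": 0.2},
    {"gene": "G4", "species": "mmus", "type": "ortholog_one2one", "dn": None, "ds": 0.4},
    {"gene": "G5", "species": "mmus", "type": "ortholog_one2one", "dn": 0.75, "ds": 1.0},
]
RATIOS = dn_ds_ratios(RECORDS)
MEDIAN_MMUS = median(RATIOS["mmus"])
print(RATIOS["mmus"], MEDIAN_MMUS)
```

Running the same function separately on the drug target and non-target gene sets would give the per-species medians and quartiles of the kind summarized in Table 1.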
Table 3

Full names and abbreviations

| Class | Abbreviation | Full name |
|---|---|---|
| Species 1 | amel | Ailuropoda melanoleuca |
| Species 2 | btau | Bos taurus |
| Species 3 | cfam | Canis familiaris |
| Species 4 | cjac | Callithrix jacchus |
| Species 5 | cpor | Cavia porcellus |
| Species 6 | ecab | Equus caballus |
| Species 7 | itri | Ictidomys tridecemlineatus |
| Species 8 | lafr | Loxodonta africana |
| Species 9 | mdom | Monodelphis domestica |
| Species 10 | mfur | Mustela putorius furo |
| Species 11 | mluc | Myotis lucifugus |
| Species 12 | mmul | Macaca mulatta |
| Species 13 | mmus | Mus musculus |
| Species 14 | nleu | Nomascus leucogenys |
| Species 15 | ocun | Oryctolagus cuniculus |
| Species 16 | ogar | Otolemur garnettii |
| Species 17 | pabe | Pongo abelii |
| Species 18 | ptro | Pan troglodytes |
| Species 19 | rnor | Rattus norvegicus |
| Species 20 | shar | Sarcophilus harrisii |
| Species 21 | sscr | Sus scrofa |

For both drug target genes and non-target genes, we counted the number of one-to-one orthologous genes in each of the 21 species and then calculated the percentage of orthologous genes for each species. The conservation score is a score assigned to each orthologous gene by sequence alignment between species, and it indicates how conserved the gene is. Here the sequence conservation score measures the degree of similarity between a human sequence and the sequence of the orthologous gene in another species. Higher scores indicate a higher degree of conservation. To compute the sequence conservation score, we downloaded the pairwise protein sequences of human and the other species from BioMart [31] (http://www.ensembl.org/biomart/martview) and performed the alignment using the BLASTP program with the BLOSUM62 matrix [32].
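
The percentage of orthologous genes can be illustrated with a short Python sketch. It assumes that the percentage for a species is the share of a gene set that has a one-to-one ortholog in that species; the function name `ortholog_percentages` and the toy identifiers are hypothetical and do not come from the study.

```python
def ortholog_percentages(gene_set, orthologs_by_species):
    """Percentage of genes in gene_set with a one-to-one ortholog, per species.

    orthologs_by_species: dict mapping species -> set of human genes that
    have a one-to-one ortholog in that species.
    """
    genes = set(gene_set)
    return {
        species: 100.0 * len(genes & set(found)) / len(genes)
        for species, found in orthologs_by_species.items()
    }


# Hypothetical toy data: four target genes and two species.
TARGETS = {"T1", "T2", "T3", "T4"}
ORTHOLOGS = {
    "mmus": {"T1", "T2", "T3", "N1"},
    "shar": {"T1", "N2"},
}
PERCENTAGES = ortholog_percentages(TARGETS, ORTHOLOGS)
print(PERCENTAGES)
```

Applying the function to the drug target set and to the non-target set gives one pair of percentages per species, which is the paired input a signed rank test across the 21 species would need.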

Calculation of topological properties of human protein-protein interaction network

We downloaded the protein-protein interaction (PPI) network data, containing 39240 interaction pairs, from the Human Protein Reference Database (HPRD) [33]. In the PPI network, a node denotes a protein and a path denotes a finite sequence of edges connecting proteins. We then calculated four topological properties, namely the average shortest path length, betweenness centrality, clustering coefficient and degree [34], using MCODE, a plug-in of the Cytoscape software [35]. The average shortest path length, which reflects how tightly one node is connected to the other nodes in a network, is defined as the average length of all shortest paths passing through a certain node. The normalized betweenness centrality of node v is defined, up to a normalizing constant, as $C_B(v) = \sum_{i \neq v \neq j} \sigma_{ivj} / \sigma_{ij}$, where $\sigma_{ij}$ is the number of shortest paths from node i to node j and $\sigma_{ivj}$ is the number of those $\sigma_{ij}$ paths that pass through node v. The betweenness centrality is an indicator of a node's centrality in a network. The clustering coefficient in an undirected network is defined as $C(v) = n / C_k^2 = 2n / (k(k-1))$, where n is the number of edges connecting the k direct neighbors of node v and $C_k^2 = k(k-1)/2$ is the maximum possible number of edges between k nodes. The clustering coefficient represents the degree to which nodes in a network tend to cluster together. The degree of node v is the number of nodes directly connected to node v. To compare the network features of the drug target genes and non-target genes, we extracted the topological properties for the two gene sets.
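
The four node-level properties can be illustrated on a small toy network. The following standard-library Python sketch is a stand-in for the Cytoscape-based calculation, not a reproduction of it. The edge list and the function names are hypothetical. Two interpretations are assumptions: the average shortest path length of a node is taken as its mean distance to every other reachable node, and the betweenness is left unnormalized and counted over unordered node pairs.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]
    return dist, paths


def topology(edges):
    """Degree, clustering coefficient, betweenness and mean distance per node."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    info = {v: bfs(adj, v) for v in adj}
    result = {}
    for v in adj:
        k = len(adj[v])
        # n = edges among the k neighbors; clustering = n / (k choose 2).
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        clustering = links / (k * (k - 1) / 2) if k > 1 else 0.0
        dist_v, paths_v = info[v]
        betweenness = 0.0
        for i, j in combinations([u for u in adj if u != v], 2):
            dist_i, paths_i = info[i]
            if v not in dist_i or j not in dist_i:
                continue  # i cannot reach v or j
            if dist_i[v] + dist_v[j] == dist_i[j]:
                # sigma_ivj / sigma_ij for the pair (i, j)
                betweenness += paths_i[v] * paths_v[j] / paths_i[j]
        others = [d for u, d in dist_v.items() if u != v]
        result[v] = {
            "degree": k,
            "clustering": clustering,
            "betweenness": betweenness,
            "avg_shortest_path": sum(others) / len(others) if others else 0.0,
        }
    return result


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
PROPS = topology([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
for node in sorted(PROPS):
    print(node, PROPS[node])
```

In this toy network node C has the highest degree and betweenness and the shortest mean distance to the other nodes, which is the pattern described as a tighter network structure for drug target genes.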

Statistical analysis

We used the Wilcoxon rank sum test to evaluate the statistical significance of the difference in an evolutionary feature or a network feature between the drug target genes and non-target genes. We used the Wilcoxon signed rank test to check whether the median of an evolutionary feature of drug target genes was significantly different from that of the non-target genes for each species. In our study, Perl scripts were used for data processing (http://www.activestate.com/activeperl) and R scripts were used for statistical graphics and calculations (http://cran.r-project.org).
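
The rank sum comparison of two independent gene sets can be sketched as follows. This standard-library Python example is an illustration, not the R code used in the study. The function name `rank_sum_test` and the toy values are hypothetical, and the P-value uses the normal approximation with tie and continuity corrections rather than an exact calculation.

```python
import math
from collections import Counter


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic of x, P-value).
    """
    counts = Counter(x + y)
    # Assign each distinct value the average rank of its tied group.
    rank_of, position = {}, 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = position + (ties - 1) / 2
        position += ties

    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2

    # Variance of U with correction for ties.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
    return u, math.erfc(z / math.sqrt(2))


# Hypothetical toy feature values for two gene sets (not real data).
SET_A = [1, 2, 3, 4, 5]
SET_B = [6, 7, 8, 9, 10]
U_STAT, P_VALUE = rank_sum_test(SET_A, SET_B)
print(f"U = {U_STAT}, P = {P_VALUE:.4f}")
```

The rank sum test treats the two gene sets as independent samples within one species, whereas the signed rank test pairs the two sets across species, which is why the study uses both.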
  35 in total

1.  Structural and functional parameters of the flaviviral protease: a promising antiviral drug target.

Authors:  Sergey A Shiryaev; Alex Y Strongin
Journal:  Future Virol       Date:  2010-09-01       Impact factor: 1.831

2.  Evolutionary conservation of glucose-dependent insulinotropic polypeptide (GIP) gene regulation and the enteroinsular axis.

Authors:  Michelle C Musson; Lisa I Jepeal; Torfay Sharifnia; M Michael Wolfe
Journal:  Regul Pept       Date:  2010-06-02

Review 3.  HSP60 as a drug target.

Authors:  Hiroyuki Nakamura; Hidemitsu Minegishi
Journal:  Curr Pharm Des       Date:  2013       Impact factor: 3.116

Review 4.  Human Protein Reference Database and Human Proteinpedia as resources for phosphoproteome analysis.

Authors:  Renu Goel; H C Harsha; Akhilesh Pandey; T S Keshava Prasad
Journal:  Mol Biosyst       Date:  2011-12-08

5.  Development of mitochondrial permeability transition inhibitory agents: a novel drug target.

Authors:  E E Hellebrand; G Varbiro
Journal:  Drug Discov Ther       Date:  2010-04

6.  Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes.

Authors:  Maria A Doyle; Robin B Gasser; Ben J Woodcroft; Ross S Hall; Stuart A Ralph
Journal:  BMC Genomics       Date:  2010-04-03       Impact factor: 3.969

Review 7.  Genomic-scale prioritization of drug targets: the TDR Targets database.

Authors:  Fernán Agüero; Bissan Al-Lazikani; Martin Aslett; Matthew Berriman; Frederick S Buckner; Robert K Campbell; Santiago Carmona; Ian M Carruthers; A W Edith Chan; Feng Chen; Gregory J Crowther; Maria A Doyle; Christiane Hertz-Fowler; Andrew L Hopkins; Gregg McAllister; Solomon Nwaka; John P Overington; Arnab Pain; Gaia V Paolini; Ursula Pieper; Stuart A Ralph; Aaron Riechers; David S Roos; Andrej Sali; Dhanasekaran Shanmugam; Takashi Suzuki; Wesley C Van Voorhis; Christophe L M J Verlinde
Journal:  Nat Rev Drug Discov       Date:  2008-10-17       Impact factor: 84.694

Review 8.  Lessons from comparative physiology: could uric acid represent a physiologic alarm signal gone awry in western society?

Authors:  Richard J Johnson; Yuri Y Sautin; William J Oliver; Carlos Roncal; Wei Mu; L Gabriela Sanchez-Lozada; Bernardo Rodriguez-Iturbe; Takahiko Nakagawa; Steven A Benner
Journal:  J Comp Physiol B       Date:  2008-07-23       Impact factor: 2.200

9.  DrugBank: a comprehensive resource for in silico drug discovery and exploration.

Authors:  David S Wishart; Craig Knox; An Chi Guo; Savita Shrivastava; Murtaza Hassanali; Paul Stothard; Zhan Chang; Jennifer Woolsey
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

10.  Evolutionary conservation and network structure characterize genes of phenotypic relevance for mitosis in human.

Authors:  Marek Ostaszewski; Serge Eifes; Antonio del Sol
Journal:  PLoS One       Date:  2012-05-02       Impact factor: 3.240
