TiPs: a database of therapeutic targets in pathogens and associated tools.

Rosalba Lepore, Anna Tramontano, Allegra Via.

Abstract

MOTIVATION: The need for new drugs and new targets is particularly compelling in an era that is witnessing an alarming increase of drug resistance in human pathogens. The identification of new targets of known drugs is a promising approach, which has proven successful in several cases. Here, we describe a database that includes information on 5153 putative drug-target pairs for 150 human pathogens derived from available drug-target crystallographic complexes.
AVAILABILITY AND IMPLEMENTATION: The TiPs database is freely available at http://biocomputing.it/tips.
CONTACT: anna.tramontano@uniroma1.it or allegra.via@uniroma1.it.

Year: 2013 PMID: 23698860 PMCID: PMC3702258 DOI: 10.1093/bioinformatics/btt289

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Novel mechanisms to escape therapy are constantly emerging among human pathogen populations. This calls for the development of new drugs to treat the resulting diseases and of rapid, effective methods to expand the landscape of available treatment options (Hopkins ). In this context, computational studies are called on to help identify novel therapeutic targets and characterize their interactions, and a number of such efforts are described in the literature (Aguero ; Kinnings ; Lepore ; Orti ). However, these are mostly devoted to the analysis of single targets or specific tropical disease pathogens. The TiPs database has been developed to facilitate the identification of new therapeutic targets in >150 organisms responsible for human infections. We performed a large-scale analysis to systematically identify candidate targets in the proteomes of such organisms. The rationale of our approach is the intrinsic polypharmacological behaviour of compounds targeting homologous proteins (Paolini ). We considered all drug–target pairs for which the 3D structure of the complex is experimentally known and used the sequence of the target to identify its homologues in human pathogens. The evolutionary conservation of such homologues and their 3D structures (available or predicted) were used to verify whether the original drug could in principle bind them as it does the original target. To this end, stringent filters were applied to ensure that predicted binding sites and their interactions with the drug are as accurate as possible. Pathogen proteins predicted with high confidence to be therapeutic targets, together with the putative drugs interacting with them, were collected and annotated in TiPs.

2 METHODS

More than 400 human pathogen species were obtained from ‘The Approved List of Biological Agents’ provided by the Advisory Committee on Dangerous Pathogens. To assign an unambiguous identifier (ID) to each human pathogen, the names of the organisms were mapped onto the NCBI Taxonomy Database records (http://www.ncbi.nlm.nih.gov/Taxonomy/). Drug compounds and information on their molecular targets were obtained from DrugBank (http://www.drugbank.ca). The SMILES strings of drugs annotated as ‘inhibitor’, ‘agonist’ or ‘antagonist’ were used to associate them with ligands present in the PDB structure entries (Berman ). Only identical compounds were considered (Tanimoto coefficient = 1). A total of 308 distinct drugs were observed in complex with at least one PDB structure. About 40% of these (119/308) occur in complex with their actual pharmaceutical target. These were used as starting points to predict potential drug targets in pathogens. The search for homologues in pathogens was performed using BLAST+ (Camacho ) with default parameters against the nr database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/). We only retained highly reliable hits, i.e. those showing at least 40% sequence identity to the original target and an e-value < 10⁻⁶. Pathogen taxonomic IDs were retrieved by matching the gi numbers of BLAST hits to the NCBI Taxonomy database. For each known drug–target complex, we defined the binding site as the subset of target residues having at least one atom within 3.5 Å of any atom of the drug. The drug-binding site residues in the predicted pathogen sequences were identified through a multiple sequence alignment (MSA) of the original target sequence with its homologues, generated with T-Coffee (Taly ).
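The hit filter and the distance-based binding site definition described above can be illustrated with a minimal sketch. It is not the TiPs implementation: the function names, the dictionary keys (`id`, `identity`, `evalue`) and the atom representation are assumptions made for this example, and a real pipeline would parse BLAST+ output and PDB coordinates instead of using hand-written values.

```python
import math


def filter_hits(hits, min_identity=40.0, max_evalue=1e-6):
    """Keep hits with >= min_identity % sequence identity and e-value < max_evalue.

    `hits` is a list of dicts with hypothetical keys 'id', 'identity', 'evalue'.
    """
    return [
        h for h in hits
        if h["identity"] >= min_identity and h["evalue"] < max_evalue
    ]


def binding_site_residues(target_atoms, drug_atoms, cutoff=3.5):
    """Return sorted residue IDs with at least one atom within `cutoff` Å of any drug atom.

    target_atoms: iterable of (residue_id, (x, y, z)) tuples, one per protein atom
    drug_atoms:   iterable of (x, y, z) tuples, one per drug atom
    """
    drug_atoms = list(drug_atoms)
    site = set()
    for residue_id, coord in target_atoms:
        if residue_id in site:
            continue  # residue already selected through another of its atoms
        if any(math.dist(coord, d) <= cutoff for d in drug_atoms):
            site.add(residue_id)
    return sorted(site)


# Toy data, invented for illustration only.
hits = [
    {"id": "hitA", "identity": 55.2, "evalue": 1e-30},
    {"id": "hitB", "identity": 38.0, "evalue": 1e-40},  # fails the identity threshold
    {"id": "hitC", "identity": 62.0, "evalue": 1e-3},   # fails the e-value threshold
]
target_atoms = [
    (10, (0.0, 0.0, 0.0)),
    (10, (1.5, 0.0, 0.0)),
    (11, (3.0, 0.0, 0.0)),
    (12, (10.0, 0.0, 0.0)),
]
drug_atoms = [(0.0, 3.0, 0.0), (5.0, 0.0, 1.0)]

print([h["id"] for h in filter_hits(hits)])             # ['hitA']
print(binding_site_residues(target_atoms, drug_atoms))  # [10, 11]
```

The all-against-all distance loop is adequate for a single complex. For large-scale use, a spatial index such as a k-d tree would be the usual choice.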
The number and type of aligned residues were used to classify the binding site local conservation, both in terms of sequence coverage (percentage of binding site residues in the original target that could be aligned to the pathogen sequence) and identity (percentage of identical residues among the aligned binding site residues). Coverage and identity percentages were calculated separately for each pathogen sequence in the alignment. Only pathogen proteins showing at least 80% coverage in their binding sites were considered further (4215). Among these 4215 reliable putative targets, only 41 have a solved structure in the PDB. Homology modelling (Kopp and Schwede, 2004) was used to predict the structure of the remaining ones as follows: for each pathogen sequence, an MSA was generated using three iterations of HHblits (Remmert ) (with default parameters) on the non-redundant UniProt database. The MSA was used as the HHsearch query to search for templates in the PDB70 database. We only selected templates with at least 40% sequence identity (and e-value < 10⁻⁵) with the pathogen query sequence. If more than one template was found, the one with the highest coverage of the pathogen sequence was selected. Models were generated using the Modeller software. Note that the best template used to build the model corresponds to the original structure in the drug–target complex in only 153 cases; in all other cases, the best template was a different structure. The binding site residues of the original complex and of the predicted target were structurally superimposed using the LGA software (Zemla, 2003). Subsequently, the ligands were transferred into the structure or model of those pathogen proteins that could be successfully superimposed onto the known target (<5 Å distance). Binding sites in the modelled structures were analysed for the occurrence of nearby insertions/deletions. These cases are highlighted in the TiPs database search output, allowing users to judge how likely the insertions/deletions are to affect the conformation of the binding site.
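As an illustration of the binding site coverage and identity measures defined above, the following sketch computes both percentages from two aligned rows of an MSA. It is a simplified reading of the definitions, not the authors' code: the function name, the 0-based indexing of binding site positions on the ungapped target sequence, and the toy sequences are all assumptions of this example.

```python
def binding_site_conservation(target_aln, pathogen_aln, site_positions):
    """Return (coverage %, identity %) of a binding site for one pathogen sequence.

    target_aln, pathogen_aln: equal-length aligned rows, '-' marking gaps
    site_positions: 0-based indices of binding site residues in the
                    ungapped target sequence (an assumed convention)
    """
    if len(target_aln) != len(pathogen_aln):
        raise ValueError("aligned sequences must have the same length")
    site = set(site_positions)
    if not site:
        raise ValueError("binding site must contain at least one residue")

    aligned = identical = 0
    target_index = -1  # position in the ungapped target sequence
    for t, p in zip(target_aln, pathogen_aln):
        if t == "-":
            continue  # insertion in the pathogen sequence, no target residue here
        target_index += 1
        if target_index in site and p != "-":
            aligned += 1
            if t.upper() == p.upper():
                identical += 1

    coverage = 100.0 * aligned / len(site)
    identity = 100.0 * identical / aligned if aligned else 0.0
    return coverage, identity


# Toy alignment, invented for illustration only.
target_aln = "MKT-AYIAK"
pathogen_aln = "MRTGAY-AK"
site_positions = [1, 2, 4, 5, 7]  # K, T, Y, I, K in the ungapped target

coverage, identity = binding_site_conservation(target_aln, pathogen_aln, site_positions)
print(coverage, identity)  # 80.0 75.0
print(coverage >= 80.0)    # True: passes the 80% coverage threshold
```

In this toy case, four of the five binding site residues align to a pathogen residue (80% coverage) and three of those four are identical (75% identity), so the sequence would just pass the coverage filter.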

3 RESULTS

TiPs currently contains 4071 candidate pathogen target structures involved in 5153 different drug–target complexes in 150 pathogens. All entries are thoroughly annotated with both sequence and functional information. The database can be queried by organism name (genus or species name), protein family or function (EC number, GO terms and Pfam), as well as UniProt ID. The query returns a sortable table providing information about both known and predicted drug–target pairs and links to visualize specific information on the drug(s) (physicochemical properties, structure, indication and side effects), the target(s) [UniProt annotation and PDB structure(s)] and to visually analyse or download their 3D complexes. Ligplot (Laskowski and Swindells, 2011) drawings of both the known and inferred binding sites in complex with the drug are available as well (Fig. 1).

Fig. 1.

The figure shows the results of an ‘all pathogens’ query filtered by the ‘ATP binding’ GO term in the TiPs database. The output table lists all putative pathogen targets. Each table row reports the known and predicted target UniProt IDs, their overall sequence identity, their binding site identity and rmsd, whether there are clashes between the known drug and the predicted target, and whether there are insertions or deletions near the binding site in the alignment used to model the protein. For each hit, the system also shows details of the structure(s) and the binding site(s) in a Jmol window and the corresponding Ligplot drawings.

REFERENCES

1. The Protein Data Bank at 40: reflecting on the past to prepare for the future.

Authors: Helen M Berman; Gerard J Kleywegt; Haruki Nakamura; John L Markley
Journal: Structure Date: 2012-03-07 Impact factor: 5.006

2. LGA: A method for finding 3D similarities in protein structures.

Authors: Adam Zemla
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

3. Automated protein structure homology modeling: a progress report.

Authors: Jurgen Kopp; Torsten Schwede
Journal: Pharmacogenomics Date: 2004-06 Impact factor: 2.533

4. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures.

Authors: Jean-Francois Taly; Cedrik Magis; Giovanni Bussotti; Jia-Ming Chang; Paolo Di Tommaso; Ionas Erb; Jose Espinosa-Carrasco; Carsten Kemena; Cedric Notredame
Journal: Nat Protoc Date: 2011-11 Impact factor: 13.491

5. Global mapping of pharmacological space.

Authors: Gaia V Paolini; Richard H B Shapland; Willem P van Hoorn; Jonathan S Mason; Andrew L Hopkins
Journal: Nat Biotechnol Date: 2006-07 Impact factor: 54.908

6. LigPlot+: multiple ligand-protein interaction diagrams for drug discovery.

Authors: Roman A Laskowski; Mark B Swindells
Journal: J Chem Inf Model Date: 2011-10-05 Impact factor: 4.956

7. BLAST+: architecture and applications.

Authors: Christiam Camacho; George Coulouris; Vahram Avagyan; Ning Ma; Jason Papadopoulos; Kevin Bealer; Thomas L Madden
Journal: BMC Bioinformatics Date: 2009-12-15 Impact factor: 3.169

8. Genomic-scale prioritization of drug targets: the TDR Targets database.

Authors: Fernán Agüero; Bissan Al-Lazikani; Martin Aslett; Matthew Berriman; Frederick S Buckner; Robert K Campbell; Santiago Carmona; Ian M Carruthers; A W Edith Chan; Feng Chen; Gregory J Crowther; Maria A Doyle; Christiane Hertz-Fowler; Andrew L Hopkins; Gregg McAllister; Solomon Nwaka; John P Overington; Arnab Pain; Gaia V Paolini; Ursula Pieper; Stuart A Ralph; Aaron Riechers; David S Roos; Andrej Sali; Dhanasekaran Shanmugam; Takashi Suzuki; Wesley C Van Voorhis; Christophe L M J Verlinde
Journal: Nat Rev Drug Discov Date: 2008-10-17 Impact factor: 84.694