Identification of potential drug targets by subtractive genome analysis of Escherichia coli O157:H7: an in silico approach.

Shakhinur Islam Mondal¹, Sabiha Ferdous², Nurnabi Azad Jewel², Arzuba Akter³, Zabed Mahmud², Md Muzahidul Islam², Tanzila Afrin⁴, Nurul Karim⁵.

Abstract

Bacterial enteric infections resulting in diarrhea, dysentery, or enteric fever constitute a huge public health problem, with more than a billion episodes of disease annually in developing and developed countries. In this study, Escherichia coli O157:H7, the deadly agent of hemorrhagic diarrhea and hemolytic uremic syndrome, was investigated with extensive computational approaches aimed at identifying novel and broad-spectrum antibiotic targets. A systematic in silico workflow consisting of comparative genomics, metabolic pathway analysis, and additional drug prioritizing parameters was used to identify novel drug targets that were essential for the pathogen's survival but absent in its human host. Comparative genomic analysis of Kyoto Encyclopedia of Genes and Genomes annotated metabolic pathways identified 350 putative target proteins in E. coli O157:H7 which showed no similarity to human proteins. Further bioinformatic approaches including prediction of subcellular localization, calculation of molecular weight, and web-based investigation of 3D structural characteristics greatly aided in filtering the potential drug targets from 350 to 120. Ultimately, 44 non-homologous essential proteins of E. coli O157:H7 were prioritized and found to be eligible as novel broad-spectrum antibiotic targets, and DNA polymerase III alpha (dnaE) was the top-ranked among these targets. Moreover, the druggability of each of the identified drug targets was evaluated using the DrugBank database. In addition, the 3D structure of dnaE was modeled and explored further by in silico docking with ligands having potential druggability. Finally, the compounds N-coeleneterazine and N-(1,4-dihydro-5H-tetrazol-5-ylidene)-9-oxo-9H-xanthene-2-sulfonamide were found to be the most suitable ligands of dnaE and are hence proposed as potential inhibitors of this target protein. The results of this study could facilitate the discovery and release of new and effective drugs against E. coli O157:H7 and other deadly human bacterial pathogens.

Keywords: DNA polymerase III alpha; E. coli O157:H7; KEGG metabolic pathways; homology modeling; novel and broad-spectrum antibiotic targets

Year: 2015 PMID: 26677339 PMCID: PMC4677596 DOI: 10.2147/AABC.S88522

Source DB: PubMed Journal: Adv Appl Bioinform Chem ISSN: 1178-6949

Introduction

Enteropathogenic Escherichia coli and enterohemorrhagic E. coli (EHEC) infections in humans are a major source of morbidity and mortality in both developing and developed countries.1 Among the various pathogenic E. coli strains that cause intestinal or extra-intestinal diseases in humans, the most devastating are the Shiga toxin-producing EHEC strains, because they cause not only diarrhea and hemorrhagic colitis but also life-threatening hemolytic uremic syndrome and encephalopathy.2 Over 100 serotypes of Shiga toxin-producing E. coli have been associated with human infections, and the most common serotype is E. coli O157:H7. Several deadly outbreaks of E. coli O157:H7 have been reported in Canada, the United States, Great Britain, and Japan.3–5 The massive outbreak in Sakai city, Japan, in 1996 is of particular concern, as a number of deaths from the infection were reported.6 The genome of the EHEC O157:H7 Sakai strain was sequenced in 2001.7 The sequence analysis revealed that this strain contains 18 prophages (Sp1 to Sp18), six prophage-like elements (SpLE1 to SpLE6), and two plasmids (pO157 and pOSAK1). To understand the mechanism underlying the pathogenicity of this bacterium, a substantial number of virulence-related genes or functions associated with various stages of infection have been identified.7 However, the lack of detailed functional annotation often limits the possibility of using them as targets for designing new drugs against this pathogen. The treatment of E. coli O157:H7 mostly relies on conventional antibiotic therapy, although some studies have highlighted that there is no evidence that this improves the course of disease and that antibiotic treatment of patients with E. coli O157:H7 infection increases the risk of hemolytic uremic syndrome.8,9 Moreover, an alarming increase in antibiotic resistance has been reported in E. coli O157:H7 over the last 30 years.10–14 The accumulated results strongly suggest that there is an urgent and continuing need to find new drug and vaccine candidates to tackle this deadly pathogen.

Drug target identification is the first step in the drug discovery process.15 However, traditional drug discovery methods are time-consuming, expensive, and often yield few drug targets. In contrast, advances in complete genome sequencing, bioinformatics, and cheminformatics represent an attractive alternative approach to identify drug targets worthy of experimental follow-up. Because of the availability of both pathogen and host genome sequences, it has become easier to identify drug targets at the genomic level for any given pathogen.16,17 In recent years, computational methods have been used widely for the identification of potential drug and vaccine targets in different pathogenic microorganisms.18–21 A subtractive and comparative genomics approach combined with metabolic pathway analysis was found to be an efficient way to identify the set of proteins essential for the pathogen’s survival but absent in the host.22 Subtraction of the host genome from the essential genes of a pathogen helps in searching for targets with no human homolog, which reduces the chance that drugs interact with human targets. On the other hand, the comparative genomics method emphasizes the selection of proteins conserved amongst several species as the most favorable targets.23–26 The use of advanced bioinformatics tools with integrated genomics, proteomics, and metabolomics may enable the discovery of potential drug targets for most infectious diseases. Once the target(s) have been identified, in silico virtual screening of different chemical databases could provide an unprecedented opportunity to select and design the best possible inhibitor(s).27

In this study, we took an in-depth in silico approach to identify novel therapeutic targets in the E. coli O157:H7 Sakai strain by combining the analysis of metabolomics and genomics data. Instead of analyzing the whole genome, we specifically considered the key essential or survival proteins of the pathogen which are non-homologous to the host. We elucidated a good number of novel targets in E. coli O157:H7 that could be used to design effective drugs against a broad spectrum of pathogenic bacteria. Moreover, we provided a modeled 3D structure of DNA polymerase III alpha (dnaE), which was selected as the best possible target for inhibition and for designing potential drugs. To the best of our knowledge, this was the first in silico identification of drug targets in E. coli O157:H7.

Materials and methods

Pathway analysis and protein retrieval

Figure 1 shows the strategy used for the identification of suitable drug targets and the prediction of inhibitors. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database28,29 was searched for the metabolic pathways of both human and the E. coli O157:H7 Sakai strain. The identification numbers of all pathways from both organisms were listed. A manual comparison was made, and pathways that did not appear in human but were present in the pathogen, according to the KEGG database annotations, were selected as unique to E. coli O157:H7, while the remaining pathways were listed as common. Amino acid sequences of proteins from common and unique pathways were obtained from UniProtKB.30
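The manual pathway comparison described above amounts to a set difference on KEGG map numbers. The following is a minimal Python sketch of that idea, not the authors' procedure: it assumes each pathway ID is an organism prefix followed by a five-digit map number (`ecs` for the Sakai strain as in Table 1, `hsa` for human), and the function names and the short example lists are illustrative only.

```python
import re


def map_number(pathway_id: str) -> str:
    """Strip the organism prefix, e.g. 'ecs00010' -> '00010'."""
    match = re.fullmatch(r"[a-z]{2,4}(\d{5})", pathway_id)
    if match is None:
        raise ValueError(f"unexpected pathway ID: {pathway_id!r}")
    return match.group(1)


def split_pathways(pathogen_ids, host_ids):
    """Return (common, unique) pathogen pathway IDs, each sorted."""
    host_maps = {map_number(p) for p in host_ids}
    common = sorted(p for p in pathogen_ids if map_number(p) in host_maps)
    unique = sorted(p for p in pathogen_ids if map_number(p) not in host_maps)
    return common, unique


# Illustrative subset only; the full lists would come from KEGG.
pathogen = ["ecs00010", "ecs00020", "ecs00540", "ecs00550"]
host = ["hsa00010", "hsa00020", "hsa00030"]
common, unique = split_pathways(pathogen, host)
print(common)  # ['ecs00010', 'ecs00020']
print(unique)  # ['ecs00540', 'ecs00550']
```

Comparing on the map number rather than the full ID is what makes the host and pathogen lists comparable, since the same reference pathway carries a different prefix in each organism.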

Figure 1

A schematic representation of the workflow of computational drug target identification and prediction of putative inhibitors of the selected target.

Abbreviations: BLASTP, Protein Basic Local Alignment Search Tool; TTD, Therapeutic Target Database.

Identification of essential proteins and non-homologous proteins in humans

The Database of Essential Genes (DEG)31 was used to identify the essential proteins involved in host-pathogen pathways. The DEG 6.8 database was retrieved from http://www.tubic.tju.edu.cn/deg/. The BioEdit Sequence Alignment Editor (version 7.1.3) was used for Protein Basic Local Alignment Search Tool (BLASTP) searches to screen for the probable essential proteins of the organism, setting the e-value cut-off at 10⁻⁴, sequence identity >35%, bit score >100, and other parameters at default. All human protein sequences were retrieved from the RefSeq database ftp site (ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/protein/), and the essential proteins were subjected to a BLASTP search against the human proteins with BioEdit. Only the proteins with no hit at an e-value cut-off of 10⁻¹⁰ were selected as non-homologous proteins, to avoid any functional similarity with the host proteome.
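The two-stage filter described above (keep proteins with a qualifying DEG hit, then drop those with a human hit) can be sketched in Python as follows. This is an illustration under stated assumptions, not the BioEdit workflow used in the study: it assumes the BLASTP results are available as tab-separated rows in the common 12-column layout (query, subject, percent identity, ..., e-value, bit score), it treats the e-value cut-offs as inclusive, and all function names and example rows are made up.

```python
from typing import Iterable, List, NamedTuple, Set


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST rows (12-column layout assumed)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11])))
    return hits


def essential_queries(deg_hits, max_evalue=1e-4, min_identity=35.0,
                      min_bits=100.0) -> Set[str]:
    """Queries with at least one DEG hit passing all three cut-offs."""
    return {h.query for h in deg_hits
            if h.evalue <= max_evalue
            and h.identity > min_identity
            and h.bitscore > min_bits}


def non_homologous(essential, human_hits, max_evalue=1e-10) -> Set[str]:
    """Essential queries with no human hit at or below the e-value cut-off."""
    with_human_hit = {h.query for h in human_hits if h.evalue <= max_evalue}
    return set(essential) - with_human_hit


# Made-up example rows, for illustration only.
deg_rows = [
    "protA\tDEG_x\t62.0\t300\t114\t0\t1\t300\t1\t300\t1e-80\t310",
    "protB\tDEG_y\t30.0\t120\t84\t0\t1\t120\t1\t120\t1e-6\t60",
    "protC\tDEG_z\t55.0\t250\t112\t0\t1\t250\t1\t250\t1e-60\t240",
]
human_rows = [
    "protC\thum_1\t48.0\t240\t125\t0\t1\t240\t1\t240\t1e-40\t180",
]
essential = essential_queries(parse_tabular(deg_rows))
print(sorted(essential))  # ['protA', 'protC']
print(sorted(non_homologous(essential, parse_tabular(human_rows))))  # ['protA']
```

Note that the second stage uses a stricter e-value (10⁻¹⁰) than the first (10⁻⁴), as in the text, so a weak human hit does not by itself exclude a protein.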

Subcellular localization prediction and targets’ prioritization

Subcellular localization of the essential non-human proteins was predicted with PSORTb version 3.0.2,32 which here distinguishes three types of localization for Gram-negative bacteria: cytoplasmic, membrane, and extracellular. The potential drug targets were evaluated by several molecular and structural criteria33 for prioritizing suitable drug targets. Drug target prioritization involved calculation of molecular weight (MW) using computational tools and review of the drug target-associated literature available in the Swiss-Prot database. The Protein Data Bank and ModBase (http://www.salilab.org/modbase) databases were searched to identify experimentally and computationally solved 3D structures, respectively.34 The selected protein was searched for any structural identity with the 3D ligand binding site of any human protein structure on the web server SMAP-WS at a cut-off value of 30% sequence identity.35 Moreover, druggability is another important prioritization criterion for therapeutic targets; it is defined as the likelihood of being able to modulate the activity of the therapeutic target protein with a small-molecule drug.36,37 The druggability of the identified drug targets was measured by mining DrugBank contents. BLASTP with default parameters was performed against the list of targets of compounds found within DrugBank to align the potential drug targets from E. coli O157:H7. Alignments with e-values less significant than 10⁻²⁵ were removed, as described previously as a selection criterion for filtering BLAST results when identifying drug targets in bacterial genomes.38
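As an illustration of the MW and localization criteria, the sketch below estimates a protein's average molecular weight from its sequence and applies the <100 kDa and cytoplasmic preferences stated in the Results. It is a hedged example rather than the tool used in the study: the residue masses are standard average values, the localization label `"Cytoplasmic"` is assumed to match the predictor's output, and the function names are hypothetical.

```python
# Average residue masses in Da (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Average MW in kDa; raises KeyError on non-standard residue letters."""
    seq = sequence.strip().upper()
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def passes_prioritization(sequence: str, localization: str,
                          max_kda: float = 100.0) -> bool:
    """True for cytoplasmic proteins below the MW cut-off (assumed labels)."""
    return localization == "Cytoplasmic" and molecular_weight_kda(sequence) < max_kda


print(passes_prioritization("MKTAYIAKQR", "Cytoplasmic"))          # True
print(passes_prioritization("MKTAYIAKQR", "CytoplasmicMembrane"))  # False
```

Ambiguity codes such as X or B are deliberately not handled; a real pipeline would need a policy for them.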

Identification of novel targets and searching for common proteins

To identify novel targets among the potential targets, the DrugBank, SuperTarget, and Therapeutic Target Database databases were searched for similarity with the cytoplasmic proteins.39–41 Parameters were set as e-value <10⁻⁵, sequence identity >35%, and bit score >100. The proteins with no hit at these threshold values were selected as novel drug targets. To search for proteins common amongst pathogenic bacteria, all protein sequences of 73 different strains of pathogenic bacteria were retrieved from the PATRIC database.42 The novel targets were subjected to BLASTP against these proteomes at an e-value cut-off of 10⁻⁵, sequence identity >35%, and bit score >100 with BioEdit software. The proteins that were found to be common to at least 40 pathogenic strain proteomes were listed as broad-spectrum targets. Different bacterial species were used as references.
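The broad-spectrum criterion (a qualifying hit in at least 40 of the 73 proteomes) reduces to counting distinct strains per target. A minimal Python sketch, assuming the BLASTP hits have already been filtered by the cut-offs above and reduced to (target, strain) pairs; the identifiers below are synthetic placeholders, not data from the study:

```python
from collections import defaultdict


def broad_spectrum(hits, min_strains=40):
    """Targets with a qualifying hit in at least `min_strains` distinct strains.

    `hits` is an iterable of (target_id, strain_id) pairs that already
    passed the e-value, identity, and bit score cut-offs.
    """
    strains_by_target = defaultdict(set)
    for target, strain in hits:
        strains_by_target[target].add(strain)
    return sorted(t for t, s in strains_by_target.items() if len(s) >= min_strains)


# Synthetic example: target_1 hits 45 strains, target_2 only 12.
hits = [("target_1", f"strain_{i}") for i in range(45)]
hits += [("target_2", f"strain_{i}") for i in range(12)]
hits += [("target_2", "strain_0")]  # duplicate hits in one strain count once
print(broad_spectrum(hits))  # ['target_1']
```

Using a set per target means multiple hits within the same proteome (for example, paralogs) do not inflate the strain count.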

Homology modeling

As no exact structure for dnaE was available in the Protein Data Bank (PDB), its sequence was subjected to a BLAST search against PDB structures using an e-value cut-off of 0.001. The template for homology modeling was chosen considering X-ray diffraction resolution and highest sequence similarity. Homology modeling was done on the ESyPred3D server.43

Structure validation and active site prediction

The modeled structure was assessed with the SWISS-MODEL structure assessment tool44 and ANOLEA (atomic non-local environment assessment),45 which assess the packing quality of models. The PROCHECK suite of programs46 was used to check the stereochemical quality of the protein structure. Energy minimization was carried out with GROMOS96 using default parameters as implemented in Swiss PDB Viewer (version 4.0.4).47 The active site of the modeled structure was determined with the CASTp server.48

Virtual screening, drug-likeness, and toxicity analysis

For visual analysis and comparison of the active site interaction with the ligands, 24 ligand molecules of the E. coli K-12 dnaE subunit were extracted from the BindingDB database and docked with the subject receptor.49 Virtual screening was done with a total of 6,460 molecules, 5,040 experimental and 1,447 approved molecules deposited in DrugBank, which were docked into the selected active site of dnaE. Virtual screening was performed on a Linux (Ubuntu 10.04) based cluster with 32 core systems. The top 100 molecules were selected based on lowest binding energy after being docked several times. The selected molecules were analyzed by Lipinski’s rule of five. The ligand interaction analysis and visualization were done with the help of PyMOL and Discovery Studio (Accelrys, San Diego, CA, USA). Absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction was carried out with the PreADMET server. PreADMET predicts the mutagenicity and carcinogenicity of a compound and helps to avoid toxic compounds. Oral bioavailability was predicted with the FAF-Drugs2 program of the Mobyle@RPBS server.50
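For readers unfamiliar with Lipinski's rule of five, the check applied to the top-ranked molecules can be sketched as below. The four thresholds are the standard ones (MW ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 acceptors); tolerating one violation is a common convention assumed here, since the text does not state the tolerance used, and the descriptor values in the example are hypothetical rather than taken from the screened compounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float  # Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500.0,
        d.logp > 5.0,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: accept compounds with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values, for illustration only.
small = Descriptors(mol_weight=423.5, logp=3.2, h_donors=2, h_acceptors=7)
large = Descriptors(mol_weight=712.9, logp=6.1, h_donors=6, h_acceptors=12)
print(lipinski_violations(small), is_drug_like(small))  # 0 True
print(lipinski_violations(large), is_drug_like(large))  # 4 False
```

In practice the descriptors would be computed by a cheminformatics toolkit; the sketch only covers the rule itself.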

Results and discussion

Identification of pathogen-specific pathways

Here we report the first computational comparative and subtractive genomics analysis of different metabolic pathways from E. coli O157:H7, for the identification of potential drug targets. A systematic workflow was defined involving several bioinformatics tools, databases, and drug target prioritization parameters (Figure 1), with the goal of obtaining information about proteins that were involved in various metabolic pathways of E. coli O157:H7, but absent in its host, therefore avoiding any potential side effects. When we searched the KEGG database for pathogen metabolic pathways, a total of 105 different pathways appeared. In order to identify drug targets involved in pathogen-specific metabolic pathways, comparative analysis of the metabolic pathways of the host and pathogen was performed. Detailed pathways analyses revealed that a total of 35 pathways were present only in the pathogen and termed as pathogen-specific pathways, and the remaining 70 pathogen pathways were defined as common host-pathogen pathways as listed in Table 1.

Table 1

Host–pathogen common and pathogen-specific pathways from KEGG database

Pathway IDs	Pathway names	Pathway IDs	Pathway names
Host–pathogen common pathway
ecs00010	Glycolysis/Gluconeogenesis	ecs00860	Porphyrin and chlorophyll metabolism
ecs00020	Citrate cycle (TCA cycle)	ecs00900	Terpenoid backbone biosynthesis
ecs00030	Pentose phosphate pathway	ecs00920	Sulfur metabolism
ecs00040	Pentose and glucuronate interconversions	ecs00970	Aminoacyl-tRNA biosynthesis
ecs00051	Fructose and mannose metabolism	ecs01040	Biosynthesis of unsaturated fatty acids
ecs00052	Galactose metabolism	ecs02010	ABC transporters
ecs00053	Ascorbate and aldarate metabolism	ecs03010	Ribosome
ecs00061	Fatty acid biosynthesis	ecs03018	RNA degradation
ecs00071	Fatty acid metabolism	ecs03020	RNA polymerase
ecs00130	Ubiquinone and other terpenoid-quinone biosynthesis	ecs03030	DNA replication
ecs00190	Oxidative phosphorylation	ecs03060	Protein export
ecs00230	Purine metabolism	ecs03410	Base excision repair
ecs00240	Pyrimidine metabolism	ecs03420	Nucleotide excision repair
ecs00250	Alanine, aspartate and glutamate metabolism	ecs03430	Mismatch repair
ecs00260	Glycine, serine and threonine metabolism	ecs03440	Homologous recombination
ecs00270	Cysteine and methionine metabolism	ecs04122	Sulfur relay system
ecs00280	Valine, leucine and isoleucine degradation	ecs00561	Glycerolipid metabolism
ecs00290	Valine, leucine and isoleucine biosynthesis	ecs00562	Inositol phosphate metabolism
ecs00300	Lysine biosynthesis	ecs00564	Glycerophospholipid metabolism
ecs00310	Lysine degradation	ecs00590	Arachidonic acid metabolism
ecs00330	Arginine and proline metabolism	ecs00592	alpha-Linolenic acid metabolism
ecs00340	Histidine metabolism	ecs00600	Sphingolipid metabolism
ecs00350	Tyrosine metabolism	ecs00620	Pyruvate metabolism
ecs00360	Phenylalanine metabolism	ecs00630	Glyoxylate and dicarboxylate metabolism
ecs00380	Tryptophan metabolism	ecs00640	Propanoate metabolism
ecs00400	Phenylalanine, tyrosine and tryptophan biosynthesis	ecs00650	Butanoate metabolism
ecs00410	beta-Alanine metabolism	ecs00670	One carbon pool by folate
ecs00430	Taurine and hypotaurine metabolism	ecs00730	Thiamine metabolism
ecs00450	Selenocompound metabolism	ecs00740	Riboflavin metabolism
ecs00460	Cyanoamino acid metabolism	ecs00750	Vitamin B6 metabolism
ecs00471	D-Glutamine and D-glutamate metabolism	ecs00760	Nicotinate and nicotinamide metabolism
ecs00480	Glutathione metabolism	ecs00770	Pantothenate and CoA biosynthesis
ecs00500	Starch and sucrose metabolism	ecs00780	Biotin metabolism
ecs00511	Other glycan degradation	ecs00785	Lipoic acid metabolism
ecs00520	Amino sugar and nucleotide sugar metabolism	ecs00790	Folate biosynthesis
Pathogen-specific pathway
ecs00281	Geraniol degradation	ecs00627	Aminobenzoate degradation
ecs00361	Chlorocyclohexane and chlorobenzene degradation	ecs00633	Nitrotoluene degradation
ecs00362	Benzoate degradation	ecs00642	Ethylbenzene degradation
ecs00363	Bisphenol degradation	ecs00660	C5-Branched dibasic acid metabolism
ecs00364	Fluorobenzoate degradation	ecs00680	Methane metabolism
ecs00401	Novobiocin biosynthesis	ecs00903	Limonene and pinene degradation
ecs00440	Phosphonate and phosphinate metabolism	ecs00910	Nitrogen metabolism
ecs00473	D-Alanine metabolism	ecs00930	Caprolactam degradation
ecs00521	Streptomycin biosynthesis	ecs01053	Biosynthesis of siderophore group non-ribosomal peptides
ecs00523	Polyketide sugar unit biosynthesis	ecs01110	Biosynthesis of secondary metabolites
ecs00540	Lipopolysaccharide biosynthesis	ecs01120	Microbial metabolism in diverse environments
ecs00550	Peptidoglycan biosynthesis	ecs02020	Two-component system
ecs00621	Dioxin degradation	ecs02030	Bacterial chemotaxis
ecs00622	Xylene degradation	ecs02040	Flagellar assembly
ecs00623	Toluene degradation	ecs02060	Phosphotransferase system
ecs00624	Polycyclic aromatic hydrocarbon degradation	ecs03070	Bacterial secretion system
ecs00625	Chloroalkane and chloroalkene degradation	ecs05130	Pathogenic Escherichia coli infection
ecs00626	Naphthalene degradation

Abbreviation: KEGG, Kyoto Encyclopedia of Genes and Genomes.

Identification of non-homologous essential proteins

To be an effective drug target, a protein should be crucial for the survival of the pathogen in the host body but non-homologous to human proteins; this criterion is a prerequisite for avoiding cross-binding of drugs with human proteins and for reducing the probability of side effects.51 Unique pathways are those that are specific to the pathogen but absent in its host. Proteins in these pathways can also be considered as unique to the pathogen and might serve as potential drug and vaccine targets.34 Moreover, several unique proteins are known to be present in common pathways, as identified during our analysis of E. coli O157:H7 (data not shown) and in several previous studies on other bacteria.19,20,52 We also found that a single unique protein can take part in multiple pathways. Proteins that are involved in more than one pathway could be more effective drug targets when, in addition, they are non-homologous proteins. Nevertheless, being unique or non-human and involved in metabolic pathways are not the sole criteria for selecting favorable drug targets. It is possible that a bacterial protein showing no similarity to host proteins might be involved in multiple metabolic pathways, but its disruption might be of no therapeutic benefit. The reasons may include the presence of paralogs or isoenzymes and, most importantly, being non-essential for the pathogen’s survival. Conversely, not all essential proteins are non-homologous. Therefore, pathogen proteins that fulfill the criteria of being unique and essential at the same time represent more attractive drug targets.34

Based on the criteria discussed above, we identified 780 probable essential proteins from host–pathogen common pathways, and 234 from pathogen unique pathways (Supplementary materials). These proteins showed good similarity with the experimentally proven essential proteins recorded in the DEG database. The BLASTP search against the human proteome narrowed down the target proteome to 220 and 130 proteins from common pathways and unique pathways respectively, resulting in 350 proteins which were essential for the pathogen’s survival and non-homologous to the host (Supplementary materials).

Subcellular localization and prediction of drug target prioritization

Localization of the proteins in the cell is an important factor for identification of suitable and effective drug targets. Membrane localized proteins are difficult to purify and assay53 and therefore, cytoplasmic proteins are more favorable as drug targets. Other major factors are: accessibility value of a target protein; preferably low MW (<100 kDa); whether a potential drug target is a transmembrane protein; and availability of 3D structural information.20 Based on these essential features, the identified non-homologous essential proteins of E. coli O157:H7 were further characterized. Most of the proteins had an MW of less than 100 kDa, indicating that these proteins can be studied experimentally for drug development. From the common pathways, 152 proteins were found to be cytoplasmic, 54 proteins to be membrane localized, and the other 14 proteins to be of unknown localization. From pathogen unique pathways, 70, 52, and eight proteins were found to be cytoplasmic, membrane localized, and of unknown localization respectively (Supplementary materials). Based on these results, 152 and 70 cytoplasmic proteins from common pathways and unique pathways respectively were considered for further analysis to identify suitable drug targets (Figure 2).

Figure 2

Comparative subcellular localization of proteins from the common host-pathogen pathways and pathogen-specific pathways.
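The counts reported so far can be cross-checked with simple arithmetic. The short Python tally below uses only the numbers stated in the text (probable essential, non-homologous, and localization counts per pathway group) and verifies that the localization classes add up to the non-homologous totals; the dictionary layout and function names are just one convenient way to organize the check.

```python
# Counts as reported in the text, per pathway group.
counts = {
    "common": {"essential": 780, "non_homologous": 220,
               "cytoplasmic": 152, "membrane": 54, "unknown": 14},
    "unique": {"essential": 234, "non_homologous": 130,
               "cytoplasmic": 70, "membrane": 52, "unknown": 8},
}


def total(key: str) -> int:
    """Sum one count across both pathway groups."""
    return sum(group[key] for group in counts.values())


def localization_consistent(group: dict) -> bool:
    """Localization classes should add up to the non-homologous count."""
    parts = group["cytoplasmic"] + group["membrane"] + group["unknown"]
    return parts == group["non_homologous"]


print(total("essential"))       # 1014
print(total("non_homologous"))  # 350
print(total("cytoplasmic"))     # 222
print(all(localization_consistent(g) for g in counts.values()))  # True
```

The 222 cytoplasmic proteins (152 + 70) are the pool carried forward to the druggability and novelty analyses.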

DrugBank is a bioinformatics and cheminformatics resource which combines detailed drug data with comprehensive information about drug targets. Using the DrugBank database, the druggability of the non-homologous essential proteins of E. coli O157:H7 was measured by sequence similarity to the targets of small-molecule drugs: a BLASTP search was performed to align the non-homologous essential proteins to the list of drug-targeted proteins from DrugBank. A total of 129 proteins of E. coli O157:H7 showed high similarity with the binding partners of US Food and Drug Administration (FDA)-approved drugs, experimental small-molecule compounds, or nutraceutical compounds, supporting the potential of comparative genomics in drug discovery. Among these, 80 proteins and 49 proteins were from common and unique pathways respectively (Supplementary materials). This comparison with drug-targeted proteins additionally yielded a list of approved drugs and drug-like compounds that bind to proteins with sequences similar to those of E. coli O157:H7. Careful filtering of this set could plausibly reveal a number of compounds that are primed for optimization and derivatization using traditional medicinal chemistry, although protein sequence similarity does not guarantee identical structures or binding pockets.38

We also searched for 3D structures of the non-host essential proteins of E. coli O157:H7. Such information could greatly facilitate structure-based drug design, including homology modeling, docking, virtual screening, or pharmacophore-based screening.54 The PDB and ModBase were used as sources for the 3D structural information. Out of 350 non-host essential proteins, ten proteins were identified as having experimentally determined 3D structures in PDB and 312 were found to have 3D models in ModBase (Supplementary materials).

Novel targets’ identification

Proteins which showed significant similarity with entries in the drug target databases were discarded, and the remaining protein sequences were taken as novel targets. Forty-four proteins from pathogen unique pathways and 76 proteins from common pathways, totaling 120 proteins, were defined as novel proteins (Supplementary materials). Metabolic pathway analysis indicated that these 120 proteins were involved in 12 biological processes unique to the pathogen and 49 biological processes that were common to both host and pathogen. Moreover, all of these 61 biological processes were classified into 12 classes: amino acid metabolism, carbohydrate metabolism, energy metabolism, glycan biosynthesis and metabolism, lipid metabolism, metabolism of cofactors and vitamins, metabolism of other amino acids, nucleotide metabolism, genetic information processing, environmental information processing, cellular processes, and others (Supplementary materials). Figure 3 shows the percentage distribution of novel drug targets involved in different biological processes.

Figure 3

Percentage distribution of novel drug targets involved in different metabolic pathways or biological processes.

Novel targets in pathogens’ unique pathways

Our study revealed that 44 proteins were involved in 12 pathogen-specific pathways: lipopolysaccharide (LPS) biosynthesis, peptidoglycan biosynthesis, methane metabolism, C5-branched dibasic acid metabolism, nitrogen metabolism, phosphonate and phosphinate metabolism, bacterial secretion system, phosphotransferase system (PTS), flagellar assembly, two-component system, biosynthesis of siderophore group non-ribosomal peptides, and bacterial chemotaxis. Three enzymes of LPS core biosynthesis and three enzymes of the lipid A pathway were found to be uniquely present in the LPS biosynthesis pathway (KEGG Pathway: map00540). In the LPS of the Enterobacteriaceae, the core oligosaccharide is responsible for many of the biological properties of the antigenic O-polysaccharide.55 On the other hand, enzymes of the lipid A pathway required for bacterial growth could be excellent targets for the development of new antibiotics.56 Moreover, murF (EC 6.3.2.10) was the only enzyme present in peptidoglycan biosynthesis (KEGG Pathway: map00550). In Gram-positive bacteria, the cell wall, composed of peptidoglycan macromolecules and many surface proteins, is thought to be important for survival within an infected host.57 In methane metabolism (KEGG Pathway: map00680), AckA (EC 2.7.2.1) was found to be uniquely present. Methanotrophs involved in the global methane cycle consume methane as their sole source of carbon and energy for growth,58 whereas methanogens can obtain energy for growth by converting a limited number of substrates to methane.59 The protein ilvH, acetolactate synthase III small subunit, is a unique protein present in C5-branched dibasic acid metabolism (KEGG Pathway: map00660), which provides alternative sources of carbon and energy.60 Four proteins were uniquely present in nitrogen metabolism (KEGG Pathway: map00910). In the nitrogen cycle, different reductive or oxidative reactions are utilized by prokaryotes for energy conservation.61 Under oxygen-deprived growth conditions, nitrate is the preferred electron acceptor for anaerobic respiration in E. coli.62 The putative resistance protein was identified as a unique protein present in phosphonate and phosphinate metabolism (KEGG Pathway: map00440). Natural products containing carbon-phosphorus bonds, so-called C-P compounds, have been found in many organisms, but only protists and bacteria, mostly actinobacteria, have the biosynthetic capacity. Moreover, secB (protein-export protein SecB) and etpN (type 4 prepilin-like proteins leader peptide-processing enzyme), which are parts of the bacterial secretion system (KEGG Pathway: map03070), were found to be unique. Many proteins implicated in the efflux of different toxins and drugs, in virulence, and in the biogenesis of different organelles (pili and flagella) are secreted.63 The secB protein is involved in the efficient export of proteins across the cytoplasmic membrane in E. coli.64 In PTS (KEGG Pathway: map02060), nine proteins were found to be uniquely present. PTS is known to be involved in the transport of more than 20 carbohydrates in bacteria and plays a major role in the phosphorylation and uptake of carbohydrates and in controlling their metabolism.65,66 The flagellar FliJ protein, which is part of flagellar assembly (KEGG Pathway: map02040) as well as a putative general chaperone and cytoplasmic protein,67 was identified as unique to the pathogen. The bacterial flagellum, extending from the cytoplasm to the cell exterior, serves as both a motor organelle and a protein export/assembly apparatus.68 A total of 19 proteins were identified as unique to the pathogen in the two-component system pathway (KEGG Pathway: map02020). The bacterial two-component system is required for adaptation to external stimuli and can effect changes in cellular physiology.69 A single protein was identified as being involved in the biosynthesis of siderophore group non-ribosomal peptides pathway. Polyketide synthases and non-ribosomal polypeptide synthetases are known to be responsible for the biosynthesis of several siderophores, such as enterobactin in E. coli.70 Two proteins were detected in bacterial chemotaxis. In chemotaxis, bacteria sense chemical gradients in the environment and move toward favorable conditions. The pathway is arguably best characterized in E. coli.71

Novel targets in common host-pathogen pathways

We also identified 76 proteins in 49 host–pathogen common metabolic pathways as novel targets. The pathways were grouped as metabolism of amino acid, carbohydrate, energy, lipid, other amino acids, nucleotide, cofactors and vitamins, and genetic information processing, environmental information processing, cellular processes, and others (Supplementary materials).

Identification of broad-spectrum targets

Proteins common to several species would make good broad-spectrum antibiotic targets.72 We used 73 species as references, and proteins which were common to at least 40 different species were listed as broad-spectrum targets (Supplementary materials). Forty-four proteins were identified as broad-spectrum targets (Supplementary materials). In addition, broad-spectrum proteins involved in multiple pathways would be better targets, as inhibition of their activity will hamper more than one system in the pathogen.73 The dnaE subunit and AckA were involved in the largest number of pathways. AckA catalyzes the reversible formation of acetyl phosphate from acetate and ATP, whereas dnaE participates in some critical pathways of the pathogen such as purine metabolism, pyrimidine metabolism, DNA replication, mismatch repair, and homologous recombination. No X-ray crystal structure has been resolved for either dnaE or AckA of the E. coli O157:H7 Sakai strain, so we intended to use homology modeling. The dnaE subunit (UniProt ID: Q8X8X5) was preferred over AckA (UniProt ID: P0A6A5) based on its suitability for homology modeling and docking studies. During the BLAST search against PDB structures (threshold e-value <0.001), the dnaE subunit showed 98% sequence identity with the E. coli replicative dnaE subunit (PDB ID: 2HNH), whereas AckA showed 94% identity with Salmonella enterica subspecies enterica serovar Typhimurium AckA (PDB ID: 3SK3 and 3SLC) (data not shown). Furthermore, an inter-domain motion during ligand binding, as well as a ligand binding pocket located at the dimeric interface of form-II AckA, has been reported,74 which is very hard to address with in silico docking software. In the case of the dnaE subunit, a literature search helped us to identify the pocket for ligand binding as well as conserved residues within the pocket that are important for catalytic activity.75

Moreover, we conducted a BLAST search for dnaE (UniProt ID: Q8X8X5) against UniProtKB with an e-value cut-off of 0.001. We selected the organisms whose proteins showed at least 75% sequence identity with dnaE. The organisms were searched in the PATRIC database to check their host and pathogenicity. Only organisms with human hosts were considered, and an organism was predicted to be pathogenic if it was involved in disease(s) according to the PATRIC database.42 We found that all the organisms are pathogenic except Lelliottia, Kluyvera, Hafnia, Ewingella, Cedecea, and Yokenella. However, a literature search helped us to conclude that Kluyvera, Yokenella, Ewingella, Hafnia, and Cedecea are also involved in diseases.76–81 We did not find sufficient information about Lelliottia. Thus, at this identity threshold, E. coli O157:H7 dnaE was not found to share sequence similarity with dnaE from bacteria known to be non-pathogenic. Therefore, due to the above advantages, dnaE was selected for homology modeling and subsequent structure-based drug design.

DNA polymerase III holoenzyme has ten different peptides arranged in an asymmetric dimer and contains a 3′-5′ exonuclease activity. The alpha subunit is part of the core enzyme and mainly functions in the polymerase activity. The best hit of the similarity search identified the crystal structure of the catalytic alpha subunit of E. coli replicative dnaE DNA polymerase III (PDB ID: 2HNH).82 It showed 99.98% sequence similarity with our target sequence and was used as the template for homology modeling. The modeled structure is shown in Figure 4.

Figure 4

Homology-modeled structure of the Escherichia coli O157:H7 Sakai strain DNA polymerase III alpha, modeled with the ESyPred3D server.
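The conservation threshold used in this section (a protein counts as a broad-spectrum target when it is common to at least 40 of the 73 reference species) amounts to a presence-count filter. The Python below is a minimal sketch of that idea only: the protein names, species labels, and the `homolog_presence` mapping are hypothetical placeholders, not data from the study, and it assumes that homolog presence per species has already been determined.

```python
# Minimal sketch of a broad-spectrum conservation filter.
# ASSUMPTION: homolog presence per species is already known (for example
# from a similarity search); all data below are invented placeholders.

MIN_SPECIES = 40    # threshold stated in the text
TOTAL_SPECIES = 73  # number of reference species stated in the text


def broad_spectrum(presence, min_species=MIN_SPECIES):
    """Return proteins found in at least `min_species` species, most conserved first."""
    kept = {p: len(s) for p, s in presence.items() if len(s) >= min_species}
    return sorted(kept, key=lambda p: (-kept[p], p))


# Hypothetical toy input: protein -> set of species carrying a homolog.
species = [f"species_{i:02d}" for i in range(TOTAL_SPECIES)]
homolog_presence = {
    "protein_A": set(species[:65]),  # present in 65 species
    "protein_B": set(species[:40]),  # exactly at the threshold
    "protein_C": set(species[:12]),  # too narrowly distributed
}

print(broad_spectrum(homolog_presence))
```

Making the cutoff a parameter shows how sensitive the target list is to the choice of 40 species; the study itself reports results only for that value.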

Structure validation and energy minimization

Progress in in silico drug design with protein models depends largely on the quality of the models. The Psi/Phi Ramachandran plot showed that the model built by ESyPred3D has 92.4% of residues in the most favored regions, 7.1% in additional allowed regions, and 0.3% in generously allowed regions. A good quality model would be expected to have over 90% of residues in the most favored regions (Figure 5A).
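The Ramachandran figures quoted above can be checked with a few lines of arithmetic. This sketch applies the "over 90% in the most favored regions" criterion and computes the share of residues not covered by the three reported categories, which the text does not itemize. The variable and function names are illustrative.

```python
# Percentages of residues reported in the text for the ESyPred3D model.
ramachandran = {
    "most_favored": 92.4,
    "additional_allowed": 7.1,
    "generously_allowed": 0.3,
}

GOOD_MODEL_CUTOFF = 90.0  # rule of thumb quoted in the text


def passes_quality_cutoff(regions, cutoff=GOOD_MODEL_CUTOFF):
    """True when the most-favored share exceeds the cutoff."""
    return regions["most_favored"] > cutoff


# Share of residues outside the three reported categories.
remainder = round(100.0 - sum(ramachandran.values()), 1)

print("passes cutoff:", passes_quality_cutoff(ramachandran))
print("unreported remainder (%):", remainder)
```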

Figure 5

Structure validation and energy minimization.

Notes: (A) Result of the PROCHECK verification program, showing the number and percentage of residues in the most favored regions (red), additional allowed regions (yellow), generously allowed regions (creamy white), and disallowed regions (white). Based on an analysis of 118 structures with a resolution of at least 2.0 angstroms and an R-factor no greater than 20%, a good quality model would be expected to have over 90% of residues in the most favored regions. (B) Result of the 3D structure verification tool ANOLEA, showing residues in a favorable energy environment (green) and residues in an unfavorable energy environment (red).

Abbreviation: ANOLEA, atomic non-local environment assessment.

This structure was also verified with ANOLEA. The y-axis of the plot represents the energy of each amino acid in the protein chain, and the plot showed that most residues are in a favorable energy environment (Figure 5B). The modeled structure was therefore considered to be of good quality for further analysis. To obtain a further refined model, energy minimization was carried out with Swiss-PdbViewer, which minimizes energy using the GROMOS96 force field. The force-field energies of the overall structure before and after minimization were −13,816.546 kJ/mol and −36,224.391 kJ/mol, respectively.
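The size of the energy drop follows directly from the two reported values. The short calculation below uses only the numbers stated above; the variable names are illustrative.

```python
# Force-field energies reported in the text (kJ/mol).
energy_before = -13816.546
energy_after = -36224.391

# A negative change means the minimized structure has lower energy.
delta = energy_after - energy_before
# Ratio of the minimized energy to the starting energy.
ratio = energy_after / energy_before

print(f"change in energy: {delta:.3f} kJ/mol")
print(f"ratio after/before: {ratio:.2f}")
```

The change is −22,407.845 kJ/mol, so the minimized energy is roughly 2.6 times the starting value in magnitude.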

Active site analysis

The active site is the region on the surface of an enzyme to which a specific substrate (ligand) or set of substrates (ligands) binds. Its properties are determined by the amino acid sequence and the 3D arrangement of the polypeptide chains of the enzyme. The active site of E. coli O157:H7 Sakai strain dnaE was identified with the CASTp server, which calculates the surface area and volume of the pockets in a given structure and reports the residues lining them. The active site residues, in a pocket with a volume of 17,116, are shown in Figure 6. A literature search helped us to identify the important residues within the pocket. The catalytic alpha subunit of E. coli is known to have three conserved aspartate residues (Asp 402, Asp 404, and Asp 556 in our model, confirmed by alignment with the reference structure; data not shown) that are important in catalysis83 and four conserved positively charged residues (Arg 391, Arg 397, Arg 710, and Arg 711 in our model; Arg 390, Arg 396, Arg 709, and Arg 710 in the reference structure) that are proposed to interact with the negatively charged triphosphate tail of the incoming nucleotide. Moreover, two aromatic residues that could potentially interact with the nascent base pair, Tyr 754 and Phe 756 in the reference structure, correspond to Tyr 755 and Phe 757 in our modeled structure. These residues were therefore considered important targets for inhibiting DNA binding and catalysis by the dnaE subunit.
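The residue pairs quoted above imply a constant numbering shift between the reference structure and the model. The sketch below checks that the quoted arginine, tyrosine, and phenylalanine pairs all share one offset and then applies it to the aspartates. The reference-numbering positions it prints for the aspartates are inferred from that offset and are an assumption, since the text gives only their model numbering.

```python
# Residue pairs quoted in the text: (reference numbering, model numbering).
quoted_pairs = {
    "Arg": [(390, 391), (396, 397), (709, 710), (710, 711)],
    "Tyr": [(754, 755)],
    "Phe": [(756, 757)],
}

# Collect every distinct shift between model and reference numbering.
offsets = {model - ref for pairs in quoted_pairs.values() for ref, model in pairs}
if len(offsets) != 1:
    raise ValueError(f"numbering shift is not constant: {sorted(offsets)}")
offset = next(iter(offsets))


def to_reference(model_position, shift=offset):
    """Convert a model residue number to reference numbering."""
    return model_position - shift


# Aspartates are given in model numbering only; the reference positions
# computed here are INFERRED from the constant shift, not stated in the text.
asp_model = [402, 404, 556]
asp_reference = [to_reference(p) for p in asp_model]

print("offset:", offset)
print("inferred reference positions for Asp:", asp_reference)
```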

Figure 6

Active site residues (shown in green) of the Escherichia coli O157:H7 Sakai strain DNA polymerase III alpha.

Note: Figure prepared by CASTp server.

Twenty-four ligand molecules of E. coli K-12 obtained from the BindingDB database were docked on the receptor. The molecules showing the lowest binding energies are listed in Table 2. The 251D molecule is a potent inhibitor of the bacterial replicative dnaE.84 Information on the binding interactions of these molecules was taken as a reference to predict suitable inhibitors of dnaE of the E. coli O157:H7 Sakai strain from the molecules in DrugBank (Figure 7A). The top 100 DrugBank hits, which showed the lowest binding energies after being docked five times, were filtered by Lipinski’s rule of five,85 which reduced the set to 59 compounds. To avoid off-target binding, compounds with human targets were excluded. The top two hits showed stable or nearly stable binding energies and good clustering performance in the docking results (Table 3); their structures are shown in Figure 8. Among the final high-affinity molecules, DB04118 (N-coeleneterazine) and DB04698 (N-(1,4-dihydro-5H-tetrazol-5-ylidene)-9-oxo-9H-xanthene-2-sulfonamide) were found to interact with important residues (Figure 7B) required for DNA binding and catalysis. According to the DrugBank database, both molecules are experimental drugs. DB04118 has no specific target yet, whereas the target of DB04698 is 3-dehydroquinate dehydratase of Helicobacter pylori (American Type Culture Collection strain 700392/26695), which catalyzes a trans-dehydration via an enolate intermediate. These molecules could be proposed as potential inhibitors of dnaE of the E. coli O157:H7 Sakai strain. Human intestinal absorption is an important property for potential drug candidates.86 Using PreADMET, we found that human intestinal absorption was 95.36% and 88.10% for DB04118 and DB04698, respectively, indicating well-absorbed compounds (70%–100% according to PreADMET), which is desirable for drug candidates.
As only the unbound drug is available for diffusion or transport across cell membranes and for interaction with the target, we predicted the percentage of drug bound to plasma protein. The predictions showed 99.28% and 100.00% plasma protein binding for DB04118 and DB04698, respectively, indicating strongly bound chemicals, which is not desirable. The Caco-2 cell model serves as a reliable in vitro model for predicting oral drug absorption. PreADMET predicted Caco-2 cell permeabilities of 17.45 nm/second and 0.36 nm/second for DB04118 and DB04698, which are considered middle and low permeability, respectively. Both compounds were predicted to have good oral bioavailability and mostly negative Ames test results by FAF-Drugs2 and PreADMET toxicity prediction (data not shown), and the PreADMET rodent carcinogenicity prediction showed no evidence of carcinogenic activity for DB04698 only. We also analyzed other ADMET properties, such as blood–brain barrier penetration, skin permeability, MDCK cell permeability, and P-gp inhibition, and in all of these cases the results were positive for our proposed compounds (data not shown).
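The rule-of-five filter applied to the DrugBank hits earlier in this section can be sketched as follows. This is a generic illustration of Lipinski's criteria, not the authors' pipeline: the compound names and descriptor values are invented, and the number of violations tolerated (`max_violations`) is an assumption, since the text does not say how strictly the rule was applied.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d):
    """Count how many of the four rule-of-five limits a compound exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d, max_violations=1):
    # ASSUMPTION: at most one violation is tolerated (a common convention).
    return lipinski_violations(d) <= max_violations


# Hypothetical docking hits with invented descriptor values.
hits = [
    Descriptors("hit_1", 423.5, 3.2, 2, 6),  # no violations
    Descriptors("hit_2", 612.7, 5.8, 3, 9),  # weight and logP too high
    Descriptors("hit_3", 510.0, 4.1, 1, 8),  # weight slightly too high
]

kept = [h.name for h in hits if passes_rule_of_five(h)]
print(kept)
```

Setting `max_violations=0` gives the strict form of the rule, under which `hit_3` would also be dropped.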

Table 2

Lowest docking energies and important residues of the binding site observed to be interactive with the ligands from BindingDB database

No	Compounds from BindingDB	Important amino acid residues involved	Docking energy (kcal/mol)
1	ZINC 5117079	SER365, PHE392, ARG391, ARG397, MET400, ASP402, ASP404, ARG711	−8.8
2	CID 9809878	SER365, ARG391, ARG711, PHE392, ASP402, ASP404, GLY364, MET400, ARG397, SER545, GLY56, VAL52, LYS30, LYS53, ALA57, TYR549	−8.7
3	ZINC 28356629	SER365, PHE392, ARG391, ARG397, MET400, ASP402, ASP404, ARG711, PHE757, ASN758, HIS761	−8.5
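Because the reference ligands were used as a guide for picking DrugBank inhibitors, it is useful to see which contact residues all three reference ligands share and how many of those the two final hits also touch. The sketch below uses the residue lists transcribed from Table 2 and Table 3; the set-overlap comparison itself is an illustration and is not described as the authors' selection criterion.

```python
# Contact residues transcribed from Table 2 (BindingDB reference ligands).
reference_ligands = {
    "ZINC 5117079": {"SER365", "PHE392", "ARG391", "ARG397", "MET400",
                     "ASP402", "ASP404", "ARG711"},
    "CID 9809878": {"SER365", "ARG391", "ARG711", "PHE392", "ASP402",
                    "ASP404", "GLY364", "MET400", "ARG397", "SER545",
                    "GLY56", "VAL52", "LYS30", "LYS53", "ALA57", "TYR549"},
    "ZINC 28356629": {"SER365", "PHE392", "ARG391", "ARG397", "MET400",
                      "ASP402", "ASP404", "ARG711", "PHE757", "ASN758",
                      "HIS761"},
}

# Contact residues transcribed from Table 3 (DrugBank hits).
drugbank_hits = {
    "DB04118": {"LYS33", "SER365", "ARG391", "ARG397", "VAL398", "ASP402",
                "MET400", "GLU548", "TYR549", "SER545"},
    "DB04698": {"SER365", "PHE392", "ARG391", "ARG397", "MET400", "ASP402",
                "GLU548", "TYR549", "LYS554", "ARG711"},
}

# Residues contacted by every reference ligand.
common = set.intersection(*reference_ligands.values())
print("shared by all reference ligands:", sorted(common))

# How many of those shared residues each DrugBank hit also contacts.
overlap = {name: sorted(res & common) for name, res in drugbank_hits.items()}
for name, shared in overlap.items():
    print(name, len(shared), shared)
```

All eight residues contacted by ZINC 5117079 are shared by the other two reference ligands; DB04118 contacts five of them and DB04698 contacts seven.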

Figure 7

Three-dimensional representation.

Notes: Three-dimensional representation of the interactive residues on the binding site of the protein when it interacts with (A) the active inhibitors (ligands) CID 9809878, ZINC 5117079, and ZINC 28356629 and (B) the top binding-affinity molecules DB04118 and DB04698. The color indicator on the left side shows the types of interaction of particular residues.

Table 3

Lowest docking energies, important residues of the binding site observed to be interactive with the ligands from DrugBank, percentage of human intestinal absorption and plasma protein binding, Caco-2 cell permeability and carcinogenicity in rats

No	DrugBank compounds	Important amino acid residues involved in interactions	Docking energy (kcal/mol)	Human intestinal absorption %	Plasma protein binding %	Caco-2 cell permeability (nm/second)	Carcinogenicity (Rats)
1	DB04118	LYS33, SER365, ARG391, ARG397, VAL398, ASP402, MET400, GLU548, TYR549, SER545	−9.8	95.36	99.28	17.4462	Negative
2	DB04698	SER365, PHE392, ARG391, ARG397, MET400, ASP402, GLU548, TYR549, LYS554, ARG711	−9.9	88.10	100.00	0.362186	Positive
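The qualitative labels used in the text for the Table 3 values (well absorbed, strongly bound, middle or low permeability) can be reproduced with simple threshold functions. Only the 70% to 100% "well absorbed" band is stated in the text. The other cutoffs below (20% for absorption, 90% for plasma protein binding, 4 and 70 nm/second for Caco-2) are assumptions chosen to match commonly quoted PreADMET ranges and should be checked against the tool's documentation before reuse.

```python
def classify_hia(percent):
    """Human intestinal absorption class."""
    if percent >= 70:   # band stated in the text
        return "well absorbed"
    if percent >= 20:   # ASSUMED lower cutoff
        return "moderately absorbed"
    return "poorly absorbed"


def classify_ppb(percent):
    """Plasma protein binding class (ASSUMED 90% cutoff)."""
    return "strongly bound" if percent > 90 else "weakly bound"


def classify_caco2(nm_per_second):
    """Caco-2 permeability class (ASSUMED cutoffs of 4 and 70 nm/second)."""
    if nm_per_second < 4:
        return "low"
    if nm_per_second <= 70:
        return "middle"
    return "high"


# Values transcribed from Table 3: (HIA %, PPB %, Caco-2 nm/second).
table3 = {
    "DB04118": (95.36, 99.28, 17.4462),
    "DB04698": (88.10, 100.00, 0.362186),
}

labels = {
    name: (classify_hia(hia), classify_ppb(ppb), classify_caco2(caco2))
    for name, (hia, ppb, caco2) in table3.items()
}
for name, row in labels.items():
    print(name, row)
```

With these cutoffs the labels agree with the text: both compounds are well absorbed and strongly bound, with middle permeability for DB04118 and low permeability for DB04698.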

Figure 8

Structure of top hit compounds by in silico screening.

Notes: (A) DB04118 (N-Coeleneterazine), (B) DB04698 (N-(1,4-Dihydro-5H-tetrazol-5-ylidene)-9-oxo-9H-xanthene-2-sulfonamide).

Conclusion

The overall picture emerging from this study is the identification of a broad-spectrum antibiotic target and the prediction of potential inhibitors of DNA polymerase III alpha of the deadly pathogen E. coli O157:H7 using extensive in silico tools. Bacterial replicative dnaE is a member of the C family of polymerases,87,88 which are unique in terms of sequence. Bacterial replicative DNA polymerase III shares no sequence similarity with, and is strikingly different from, canonical DNA polymerases, including eukaryotic replicative polymerases.89 It has been reported that DNA polymerases are responsible for pathogen survival and drug resistance, so they have been considered as drug targets in a broad group of Gram-positive pathogens such as Staphylococcus, Streptococcus, Enterococcus, and Mycoplasma.90 It has also been reported that DNA polymerase III inactivation would impede survival of Mycobacterium tuberculosis within the host.91 Since bacteria evolve resistance against single or multiple antibiotics, which ensures their survival in the environment, the development not only of new conventional antibiotics but also of novel compounds and alternative strategies to combat bacterial infections is becoming a topical and widely recognized need. In this study, a number of criteria, such as essentiality of proteins, dissimilarity with the host, conservation among pathogens, availability in drug databases, and virtual screening, were used to explore potential drug targets and to predict a drug that can block dnaE of the deadly E. coli O157:H7 strain. This in silico strategy can be used to screen novel and alternative targets for the design and development of new drugs against other emerging human pathogens.

88 in total

1. Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structure-activity relationship (QSAR) with the Abraham descriptors.

Authors: Y H Zhao; J Le; M H Abraham; A Hersey; P J Eddershaw; C N Luscombe; D Butina; G Beck; B Sherborne; I Cooper; J A Platts; D Boutina
Journal: J Pharm Sci Date: 2001-06 Impact factor: 3.534

2. Functional characterization in vitro of all two-component signal transduction systems from Escherichia coli.

Authors: Kaneyoshi Yamamoto; Kiyo Hirao; Taku Oshima; Hirofumi Aiba; Ryutaro Utsumi; Akira Ishihama
Journal: J Biol Chem Date: 2004-11-02 Impact factor: 5.157

Review 3. Recent advances and method development for drug target identification.

Authors: Janet N Y Chan; Corey Nislow; Andrew Emili
Journal: Trends Pharmacol Sci Date: 2009-12-07 Impact factor: 14.819

Review 4. Genomic-scale prioritization of drug targets: the TDR Targets database.

Authors: Fernán Agüero; Bissan Al-Lazikani; Martin Aslett; Matthew Berriman; Frederick S Buckner; Robert K Campbell; Santiago Carmona; Ian M Carruthers; A W Edith Chan; Feng Chen; Gregory J Crowther; Maria A Doyle; Christiane Hertz-Fowler; Andrew L Hopkins; Gregg McAllister; Solomon Nwaka; John P Overington; Arnab Pain; Gaia V Paolini; Ursula Pieper; Stuart A Ralph; Aaron Riechers; David S Roos; Andrej Sali; Dhanasekaran Shanmugam; Takashi Suzuki; Wesley C Van Voorhis; Christophe L M J Verlinde
Journal: Nat Rev Drug Discov Date: 2008-10-17 Impact factor: 84.694

5. Atrophic rhinitis caused by Cedecea davisae with accompanying mucocele.

Authors: Ömer Bayır; Gökçe Aksoy Yıldırım; Güleser Saylam; Elvan Yüksel; Ali Özdek; Mehmet Hakan Korkmaz
Journal: Kulak Burun Bogaz Ihtis Derg Date: 2015

Review 6. Pathogenic Escherichia coli.

Authors: James B Kaper; James P Nataro; Harry L Mobley
Journal: Nat Rev Microbiol Date: 2004-02 Impact factor: 60.633

7. UniProt Knowledgebase: a hub of integrated protein data.

Authors: Michele Magrane
Journal: Database (Oxford) Date: 2011-03-29 Impact factor: 3.451

8. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. Computational prediction of essential genes in an unculturable endosymbiotic bacterium, Wolbachia of Brugia malayi.

Authors: Alexander G Holman; Paul J Davis; Jeremy M Foster; Clotilde K S Carlow; Sanjay Kumar
Journal: BMC Microbiol Date: 2009-11-28 Impact factor: 3.605

10. Comparative genomics analysis of Mycobacterium ulcerans for the identification of putative essential genes and therapeutic candidates.

Authors: Azeem Mehmood Butt; Izza Nasrullah; Shifa Tahir; Yigang Tong
Journal: PLoS One Date: 2012-08-13 Impact factor: 3.240
