Syed Shah Hassan, Sandeep Tiwari, Luís Carlos Guimarães, Syed Babar Jamal, Edson Folador, Neha Barve Sharma, Siomar de Castro Soares, Síntia Almeida, Amjad Ali, Arshad Islam, Fabiana Dias Póvoa, Vinicius Augusto Carvalho de Abreu, Neha Jain, Antaripa Bhattacharya, Lucky Juneja, Anderson Miyoshi, Artur Silva, Debmalya Barh, Adrian Gustavo Turjanski, Vasco Azevedo, Rafaela Salgado Ferreira.
Abstract
Corynebacterium pseudotuberculosis (Cp) is a pathogenic bacterium that causes caseous lymphadenitis (CLA), ulcerative lymphangitis, mastitis, and edema in a broad spectrum of hosts, including ruminants, thereby threatening economic and dairy industries worldwide. Currently, there is no effective drug or vaccine available against Cp. To identify new targets, we adopted a novel integrative strategy, which began with the prediction of the modelome (three-dimensional protein structures for the proteome of an organism, generated through comparative modeling) for 15 previously sequenced C. pseudotuberculosis strains. This pan-modelomics approach identified a set of 331 conserved proteins with 95-100% intra-species sequence similarity. Next, we combined subtractive proteomics and modelomics to reveal a set of 10 Cp proteins that may be essential for the bacterium. Of these, 4 proteins (tcsR, mtrA, nrdI, and ispH) were essential, had no homologs in the hosts (considering human, horse, cow and sheep as hosts) and satisfied all criteria for putative targets. Additionally, we subjected these 4 proteins to virtual screening against a drug-like compound library. In all cases, molecules that were predicted to form favorable interactions and showed high complementarity to the target were found among the top-ranking compounds. The remaining 6 essential proteins (adk, gapA, glyA, fumC, gnd, and aspA) have homologs in the host proteomes. Their active-site cavities were compared with the respective cavities in the host proteins. We propose that some of these proteins can be selectively targeted using structure-based drug design (SBDD) approaches. Our results facilitate the selection of putative C. pseudotuberculosis proteins for developing novel broad-spectrum drugs and vaccines. A few of the targets identified here have been validated in other microorganisms, suggesting that our modelome strategy is effective and may also be applicable to other pathogens.
Year: 2014 PMID: 25573232 PMCID: PMC4243142 DOI: 10.1186/1471-2164-15-S7-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Strains of C. pseudotuberculosis employed in the pan-modelome study, with their genome statistics, disease manifestations and hosts.
| Strains | GPID | NCBI Accession | Genome Size (Mb) | Number of Proteins | G+C% | Host/Country | Nitrate Reduction | Clinical Manifestation | Sequencing Technology |
|---|---|---|---|---|---|---|---|---|---|
| Cp1/06-A | 73235 | NC_017308.1 | 2.28 | 1,963 | 52.2 | Horse/USA | Positive/ | Abscess | Illumina |
| Cp31 | 73223 | NC_017730.1 | 2.3 | 2,063 | 52.2 | Buffalo/Egypt | Positive/ | Abscess | Ion Torrent, |
| Cp258 | 157069 | NC_017945.1 | 2.31 | 2,088 | 52.1 | Horse/Belgium | Positive/ | Ulcerative lymphangitis | SOLiD v3 |
| Cp316 | 71591 | NC_016932.1 | 2.31 | 2,106 | 52.1 | Horse/USA | Positive/ | Abscess | Ion Torrent |
| CpCIP52.97 | 61117 | NC_017307.1 | 2.32 | 2,057 | 52.1 | Horse/Kenya | Positive/ | Ulcerative Lymphangitis | SOLiD v2 |
| Cp162 | 89445 | NC_018019.1 | 2.29 | 2,002 | 52.2 | Camel/UK | Positive/ | Neck Abscess | SOLiD v3 |
| CpP54B96 | 77871 | NC_017031.1 | 2.34 | 2,084 | 52.2 | Antelope/S. Africa | Negative/ | CLA Abscess | Ion Torrent, |
| Cp267 | 73515 | NC_017462.1 | 2.34 | 2,148 | 52.2 | Llama/USA | Negative/ | CLA Abscess | SOLiD v3 |
| Cp1002 | 40687 | NC_017300.1 | 2.34 | 2,097 | 52.2 | Goat/Brazil | Negative/ | CLA Abscess | 454, Sanger |
| Cp42/02-A | 73233 | NC_017306.1 | 2.34 | 2,051 | 52.2 | Sheep/Australia | Negative/ | CLA Abscess | Illumina |
| CpC231 | 40875 | NC_017301.1 | 2.33 | 2,095 | 52.2 | Sheep/Australia | Negative/ | CLA Abscess | 454, Sanger |
| CpI19 | 52845 | NC_017303.1 | 2.34 | 2,099 | 52.2 | Bovine/Israel | Negative/ | Bovine Mastitis Abscess | SOLiD v2 |
| Cp3/99-5 | 73231 | NC_016781.1 | 2.34 | 2,142 | 52.2 | Sheep/Scotland | Negative/ | CLA | Illumina |
| CpPAT10 | 61115 | NC_017305.1 | 2.34 | 2,089 | 52.2 | Sheep/Argentina | Negative/ | Lung Abscess | SOLiD v2 |
| CpFRC41 | 48979 | NC_014329.1 | 2.34 | 2,104 | 52.2 | Human/France | Negative/ | Necrotizing lymphadenitis | SOLiD v3 |
Figure 1. Throughput (efficiency) of the MHOLline biological workflow for genome-scale modelome (3D model) prediction. Predicted proteomes from the genomes of 15 C. pseudotuberculosis strains were fed to the MHOLline workflow in FASTA format. The blue line represents the number of input sequences (left-hand y-axis). The bars show the MHOLline output counts (right-hand y-axis): sequences that were not aligned (G0, green bars); sequences for which a template structure is available at RCSB PDB (yellow bars); sequences with acceptable template structures that were modeled in the MHOLline workflow (G2, red bars); sequences with predicted transmembrane regions (HMMTOP, purple bars); and sequences that were predicted as enzymes in each genome and assigned an EC number (ECNGet, gray bars). The x-axis represents the C. pseudotuberculosis genomes used in this study.
Figure 2. Overview of the computational steps employed to identify putative essential targets (non-host homologous and host homologous) for drugs and vaccines from the core-proteome of 15 C. pseudotuberculosis strains. Figure 2b. Intra-species subtractive modelomics workflow for identifying conserved targets in C. pseudotuberculosis. The table (from left to right) shows the total number of protein sequences fed to the MHOLline workflow as input in FASTA format (upper forward arrow). The remaining columns show the output data of group G2 (upper backward arrow), processed first by the BATS tool and then by the Filter tool of the MHOLline workflow. Columns 4 to 7 give the number of protein sequences of different qualities for all 15 Cp strains; the sequences of 14 Cp strains were compared, using BLASTp, with those of the reference strain Cp1002 to identify conserved protein targets (core-modelome). The funnel shows how this workflow processes and filters a large quantity of genomic data to identify putative drug and vaccine targets of a pathogen.
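The conserved-target step in this caption (14 strains compared by BLASTp against Cp1002 as reference) can be read as a set intersection. The following is only a minimal sketch under stated assumptions: the input layout, the function name `core_proteins`, the locus `Cp1002_9999` and all identity values are hypothetical, and the 95% default simply mirrors the 95-100% range quoted in the abstract. It is not the MHOLline implementation.

```python
from typing import Dict, Set

# Hypothetical layout: best_hits[strain][reference_locus] = best BLASTp
# percent identity of that strain's protein against the Cp1002 protein.
BestHits = Dict[str, Dict[str, float]]


def core_proteins(best_hits: BestHits, min_identity: float = 95.0) -> Set[str]:
    """Return reference loci with a hit >= min_identity in every strain."""
    strains = list(best_hits)
    if not strains:
        return set()
    # Start from the loci that pass in the first strain, then intersect.
    core = {locus for locus, ident in best_hits[strains[0]].items()
            if ident >= min_identity}
    for strain in strains[1:]:
        hits = best_hits[strain]
        # A locus with no hit in this strain counts as 0% identity.
        core = {locus for locus in core
                if hits.get(locus, 0.0) >= min_identity}
    return core


# Toy data (illustrative values only, not taken from the study).
example = {
    "CpC231": {"Cp1002_0515": 100.0, "Cp1002_0742": 99.1, "Cp1002_9999": 80.0},
    "Cp258": {"Cp1002_0515": 98.7, "Cp1002_0742": 96.4},
}
print(sorted(core_proteins(example)))
```

Raising `min_identity` shrinks the core set, which is one way to explore how sensitive the conserved-protein count is to the chosen cutoff.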
Drug and/or vaccine targets prioritization parameters and functional annotation of the four essential non-host homologous putative targets.
| Gene and protein codes | Official full name | Number of cavities with Drug Score (a) | Number of cavities with Drug Score (a) | Mol. Wt (kDa) (b) | Functions (c) | Cellular component (d) | Pathways (e) | Virulence (f) |
|---|---|---|---|---|---|---|---|---|
| Cp1002_0515 | DNA-binding response regulator mtrA | 1 | 2 | 25.97 | | Intracellular/ | Two-component signaling systems | Yes |
| Cp1002_0742 | 4-hydroxy-3-methylbut-2-enyl diphosphate reductase | 1 | 4 | 36.59 | | Cytoplasm | Inositol phosphate metabolism/ Pentose phosphate pathway/ Terpene metabolism | Yes |
| Cp1002_1648 | Two-component system transcriptional regulatory protein | 3 | 2 | 21.93 | | Intracellular/ | Two-component system | Yes |
| Cp1002_1676 | Ribonucleoside-diphosphate reductase alpha chain | 1 | 1 | 88.02 | | Cytoplasm | Pyrimidine metabolism/ Purine metabolism | Yes |
(a) Druggability was predicted with the DoGSiteScorer software. A druggability score above 0.60 is considered good, but a score above 0.80 is favored [48].
(b) Molecular weight was determined using the ProtParam tool (http://web.expasy.org/protparam/).
(c) Molecular function (MF) and biological process (BP) for each target protein were determined using UniProt.
(d) Cellular localization of pathogen targets was predicted using CELLO.
(e) KEGG was used to find the role of these targets in different cellular pathways.
(f) PAIDB was used to check whether the putative targets are involved in the pathogen's virulence.
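The two cutoffs quoted in the druggability footnote (above 0.60 considered good, above 0.80 favored) can be expressed as a small helper. This is a hedged sketch: the function name, the label strings and the assumption that scores lie between 0 and 1 are illustrative choices, not part of DoGSiteScorer itself.

```python
def druggability_class(score: float) -> str:
    """Label a drug score using the cutoffs quoted in the table footnote."""
    # Assumption: drug scores are reported on a 0-1 scale.
    if not 0.0 <= score <= 1.0:
        raise ValueError("drug score is expected to lie between 0 and 1")
    if score > 0.80:
        return "favored"
    if score > 0.60:
        return "good"
    return "below threshold"


# Illustrative scores only (not values reported in the study).
for s in (0.55, 0.72, 0.91):
    print(s, druggability_class(s))
```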
Drug and/or vaccine targets prioritization parameters and functional annotation of the six essential host homologous putative targets.
| Gene and protein codes | Official full name | Number of cavities with Drug Score (a) | Number of cavities with Drug Score (a) | Mol. Wt (kDa) (b) | Functions (c) | Cellular component (d) | Pathways (e) | Virulence (f) |
|---|---|---|---|---|---|---|---|---|
| Cp1002_0385 | Adenylate kinase | 0 | | 24.120 | | Cytoplasm | Purine metabolism; AMP biosynthesis via salvage pathway | |
| Cp1002_0692 | Glyceraldehyde-3-phosphate dehydrogenase A | 1 | | 51.918 | | Cytoplasm | Glycolysis/Gluconeogenesis | |
| Cp1002_0728 | Serine hydroxymethyltransferase | 1 | | 46.187 | | Cytoplasm | Amino-acid biosynthesis; glycine biosynthesis; one-carbon metabolism; tetrahydrofolate interconversion | |
| Cp1002_0738 | Fumarate hydratase class II | 0 | | 49.767 | | Cytoplasm | Carbohydrate metabolism; tricarboxylic acid cycle | |
| Cp1002_1005 | 6-phosphogluconate dehydrogenase | 5 | | 53.669 | | Cytoplasm | Carbohydrate degradation; pentose phosphate pathway | |
| Cp1002_1042 | Aspartate ammonia-lyase | 4 | | 52.277 | | Cytoplasm | Alanine, aspartate and glutamate metabolism; nitrogen metabolism | |
(a) Druggability was predicted with the DoGSiteScorer software. A druggability score above 0.60 is usually considered good, but a score above 0.80 is favored [48].
(b) Molecular weight was determined using the ProtParam tool (http://web.expasy.org/protparam/).
(c) Molecular function (MF) and biological process (BP) for each target protein were determined using UniProt.
(d) Cellular localization of pathogen targets was predicted using CELLO.
(e) KEGG was used to find the role of these targets in different cellular pathways.
(f) PAIDB was used to check whether the putative targets are involved in the pathogen's virulence.
ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds selected among the top ranking 200 molecules against Cp1002_0515 (MtrA, DNA-binding response regulator).
| ZINC IDs | MolDock score | Number of H-bonds/ residues interacting with the compound |
|---|---|---|
| 75109074 | -130.402 | 3 |
| 12117405 | -115.838 | 3 |
| 02546720 | -113.761 | 3 |
| 40266587 | -116.119 | 2 |
| 71405274 | -113.264 | 2 |
| 05687366 | -111.376 | 2 |
| 04730243 | -109.609 | 2 |
| 19720976 | -109.061 | 2 |
| 72342680 | -108.299 | 2 |
ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds selected among the top ranking 200 molecules against Cp1002_0742 (IspH, 4-hydroxy-3-methylbut-2-enyl diphosphate reductase).
| ZINC IDs | MolDock score | Number of H-bonds/ residues interacting with the compound |
|---|---|---|
| 00510419 | -151.376 | 7 |
| 00529019 | -129.348 | 5 |
| 04344036 | -135.156 | 8 |
| 04632419 | -136.984 | 6 |
| 04730243 | -129.414 | 10 |
| 05479451 | -129.963 | 9 |
| 05775454 | -161.806 | 3 |
| 16941408 | -126.163 | 6 |
| 04622741 | -127.816 | 12 |
| 14017317 | -129.664 | 8 |
ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds selected among the top ranking 200 molecules against Cp1002_1648 (TcsR, two-component transcriptional regulator).
| ZINC IDs | MolDock score | Number of H-bonds/ residues interacting with the compound |
|---|---|---|
| 00510419 | -167.633 | 3 |
| 01617096 | -146.178 | 3 |
| 32911447 | -148.424 | 3 |
| 00091802 | -143.287 | 3 |
| 67847806 | -156.655 | 4 |
| 19399766 | -160.743 | 3 |
| 16980834 | -147.631 | 4 |
| 06269029 | -145.277 | 4 |
| 05934077 | -145.785 | 3 |
| 01647971 | -167.152 | 3 |
ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds selected among the top ranking 200 molecules against Cp1002_1676 (NrdI).
| ZINC IDs | MolDock score | Number of H-bonds/ residues interacting with the compound |
|---|---|---|
| 01585114 | -151.406 | 6 |
| 04721321 | -144.134 | 7 |
| 17023683 | -140.718 | 6 |
| 00510419 | -154.064 | 4 |
| 01417445 | -138.997 | 4 |
| 00042420 | -135.363 | 6 |
| 00408361 | -133.535 | 6 |
| 15830653 | -153.83 | 4 |
| 00032839 | -139.327 | 6 |
| 48212336 | -137.675 | 6 |
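Some ZINC IDs recur across the four docking tables above. A quick way to surface them is to invert the per-target lists, as in the sketch below. The ZINC IDs are copied from the tables; the dictionary layout and the function name `shared_compounds` are illustrative assumptions, and a shared ID is only a pointer for closer inspection, not evidence of multi-target activity.

```python
from collections import defaultdict

# ZINC IDs copied from the four docking tables above.
top_hits = {
    "MtrA": ["75109074", "12117405", "02546720", "40266587", "71405274",
             "05687366", "04730243", "19720976", "72342680"],
    "IspH": ["00510419", "00529019", "04344036", "04632419", "04730243",
             "05479451", "05775454", "16941408", "04622741", "14017317"],
    "TcsR": ["00510419", "01617096", "32911447", "00091802", "67847806",
             "19399766", "16980834", "06269029", "05934077", "01647971"],
    "NrdI": ["01585114", "04721321", "17023683", "00510419", "01417445",
             "00042420", "00408361", "15830653", "00032839", "48212336"],
}


def shared_compounds(hits):
    """Map each ZINC ID listed for more than one target to those targets."""
    targets_by_id = defaultdict(list)
    for target, zinc_ids in hits.items():
        for zinc_id in zinc_ids:
            targets_by_id[zinc_id].append(target)
    return {z: t for z, t in targets_by_id.items() if len(t) > 1}


for zinc_id, targets in shared_compounds(top_hits).items():
    print(zinc_id, targets)
```

IDs are kept as strings so that leading zeros in the ZINC codes are preserved.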
Percentage of sequence identity between C. pseudotuberculosis and host homologous proteins.
| Protein Locus tag | Official full name | Identity (%)#, HS* | Identity (%), EC* | Identity (%), BT* | Identity (%), OA* |
|---|---|---|---|---|---|
| Cp1002_0385 | Adenylate kinase | 38 | 36 | 35 | 35 |
| Cp1002_0692 | Glyceraldehyde-3-phosphate dehydrogenase A | 39 | 40 | 41 | 41 |
| Cp1002_0728 | Serine hydroxymethyltransferase | 43 | 45 | 45 | 45 |
| Cp1002_0738 | Fumaratehydratase class II | 54 | 54 | No Hits | No Hits |
| Cp1002_1005 | 6-phosphogluconate dehydrogenase | 48 | 48 | 48 | 48 |
| Cp1002_1042 | Aspartate ammonia-lyase | 39 | 39 | 39 | 39 |
Comparison of the residues from druggable cavities in C. pseudotuberculosis proteins and the corresponding residues in structurally aligned host protein cavities.
| Bacterial Residues for the Most Druggable Cavity Predicted by DGSS Server# | HS* | EC* | BT* | OA* | Protein Loci |
|---|---|---|---|---|---|
| Lys157 | Asp35 | Asp33 | Asp33 | Asp33 | |
| Val174 | Thr52 | Thr50 | Thr50 | Thr50 | |
| Arg229 | Thr103 | Thr101 | Thr101 | Thr101 | |
| Asn311 | Ala183 | Ala181 | Ala181 | Ala181 | |
| Phe35 | Leu50 | Leu52 | Leu52 | Leu43 | |
| Ile53 | Met68 | Met70 | Met70 | Met61 | |
| Thr64 | Val79 | Val81 | Val81 | Val72 | |
| Cys70 | Ala88 | Thr86 | Thr86 | Thr86 | |
| Ala99 | Ser121 | Ser119 | Ser119 | Ser119 | |
| Ala101 | Ser123 | Ser121 | Ser121 | Ser121 | |
| Trp177 | Thr204 | Thr202 | Thr202 | Thr202 | |
| Pro361 | Ala397 | Ala395 | Ala395 | Ala395 | |
| Ser55 | Thr35 | Thr161 | Thr35 | Thr35 | |
| Met94 | Leu74 | Leu200 | Leu74 | Leu74 | |
| Gln96 | Lys76 | Lys202 | Lys76 | Lys76 | |
| Val104 | Phe84 | Phe210 | Phe84 | Phe84 | |
| Ile148 | Val128 | Val254 | Val128 | Val128 | |
| Gln268 | Lys248 | Lys374 | Lys248 | Lys248 | |
| Pro269 | His249 | Tyr375 | His249 | His249 | |
| Gln193 | His235 | His257 | His235 | His235 | |
| Ile428 | Lys470 | Lys492 | Lys470 | Lys470 | |
| His447 | Leu489 | Leu511 | Leu489 | Leu489 | |
#Drug score ≥ 0.80
*HS = Homo sapiens, EC = Equus caballus, BT = Bos taurus, OA = Ovis aries
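The cavity comparison above can be summarized by counting positions where the bacterial residue type differs from every aligned host residue. The sketch below uses the first four rows of the table; it assumes that the first column is the C. pseudotuberculosis residue and that the remaining four columns are the host residues. The function names are hypothetical.

```python
import re

# Three-letter residue code followed by its sequence number, e.g. "Lys157".
RESIDUE = re.compile(r"^([A-Z][a-z]{2})(\d+)$")


def residue_type(label: str) -> str:
    """Return the three-letter residue code from a label such as 'Lys157'."""
    match = RESIDUE.match(label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1)


def differing_positions(rows) -> int:
    """Count rows whose bacterial residue type differs from all host residues."""
    count = 0
    for bacterial, *hosts in rows:
        b_type = residue_type(bacterial)
        if all(residue_type(h) != b_type for h in hosts):
            count += 1
    return count


# First four rows of the table (bacterial residue, then four host residues).
rows = [
    ("Lys157", "Asp35", "Asp33", "Asp33", "Asp33"),
    ("Val174", "Thr52", "Thr50", "Thr50", "Thr50"),
    ("Arg229", "Thr103", "Thr101", "Thr101", "Thr101"),
    ("Asn311", "Ala183", "Ala181", "Ala181", "Ala181"),
]
print(differing_positions(rows), "of", len(rows), "positions differ")
```

Comparing only residue types ignores physicochemical similarity (for example Ile versus Val), so a substitution-aware score would be a natural refinement.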