Kitiporn Plaimas, Roland Eils, Rainer König.
Abstract
BACKGROUND: Identifying essential genes in bacteria helps to identify potential drug targets and to understand the minimal requirements for a synthetic cell. However, experimentally assaying the essentiality of their coding genes is resource intensive and not feasible for all bacterial organisms, in particular if they are infective.
Year: 2010 PMID: 20438628 PMCID: PMC2874528 DOI: 10.1186/1752-0509-4-56
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. The workflow. Essentiality data for an organism were taken from an experimental genome-wide knock-out screen. They were used to train the machine learning system using features based on the topology of the metabolic network and on genomic and transcriptomic data. The trained classifier was then applied to another organism (the query organism), for which the essentiality of each gene in the metabolism was predicted.
List of all features.
| Short form | Explanation |
|---|---|
| RUP | Reachable/Unreachable Products (RUP): equals one if all products could be produced when blocking the reaction, otherwise zero |
| PUP | Percentage of Unreachable Products (PUP): the percentage of products which cannot be produced when blocking the reaction |
| ND | Number of Deviations (ND) |
| APL | Average Path Length (APL): the average path length of the deviations |
| LSP | Length of the Shortest Path (LSP): the length of the shortest path of the deviations |
| NS | Number of Substrates (NS) |
| NP | Number of Products (NP) |
| NNR | Number of Neighboring Reactions (NNR) |
| NNNR | Number of Neighbors of Neighboring Reactions (NNNR) |
| CCV | Clustering Coefficient Value (CCV): clustering coefficient of a reaction |
| DIR | Directionality of a reaction (DIR) |
| CP | Choke Point (CP): a reaction is a choke point or not (Rahman |
| LS | Load Score (LS): load score of a reaction (Rahman |
| NDR | Number of Damaged Reactions (NDR) (Lemke |
| NDC | Number of Damaged Compounds (NDC) (Lemke |
| NDRD | Number of Damaged Reactions having no Deviations (NDRD): the number of damaged reactions that have no other alternative paths to be reached after blocking a reaction |
| NDCD | Number of Damaged Compounds having no Deviations (NDCD): the number of damaged compounds that have no other alternative paths to be reached after blocking a reaction |
| NDCR | Number of Damaged Choke point Reactions (NDCR) |
| NDCC | Number of Damaged Choke point Compounds (NDCC) |
| NDCRD | Number of Damaged Choke point Reactions having no Deviations (NDCRD): the number of damaged choke point reactions that have no other alternative paths to be reached after blocking a reaction |
| NDCCD | Number of Damaged Choke point Compounds having no Deviations (NDCCD): the number of damaged choke point compounds that have no other alternative paths to be reached after blocking a reaction |
| BW | Betweenness centrality |
| CN | Closeness centrality |
| EC | Eccentricity centrality |
| EV | Eigenvector centrality |
| NAR | Number of Associated Reactions (NAR): the number of reactions that depend on the knocked-out gene |
| Hn | Homology at different expectation values: the number of homologous genes with e-value cutoffs 10^-30, 10^-20, 10^-10, 10^-7, 10^-5, 10^-3 (H30, H20, H10, H7, H5, H3) |
| NGSE | Number of Genes having Similar Expression (NGSE): the number of genes that have similar expression (correlation coefficient >0.8) |
| MCC | Maximum of Correlation Coefficients (MCC): maximum value of the correlation coefficients for all neighboring genes |
| PR | Phyletic Retention (PR): the number of orthologs in the other prokaryotes |
| Nc | Number of codons |
| N3s | Base composition at silent sites (T3s, C3s, A3s, G3s) |
| glt | The frequency of the amino acid glutamine (given as an example) |
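
To make a few of the simpler topology features concrete, the following is a minimal sketch on a toy reaction network. The network, the function name `topology_features`, and the exact definitions of "neighboring reaction" and "choke point" used here are assumptions for illustration. The choke point definition (a reaction that is the only consumer of a substrate or the only producer of a product) is the commonly used one, not necessarily the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy network: reaction id -> (substrates, products).
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C", "D"}),
    "R3": ({"B"}, {"D"}),
    "R4": ({"C", "D"}, {"E"}),
}


def topology_features(reactions):
    """Compute NS, NP, NNR and CP for every reaction (illustrative only)."""
    consumers = defaultdict(set)  # metabolite -> reactions consuming it
    producers = defaultdict(set)  # metabolite -> reactions producing it
    for rid, (subs, prods) in reactions.items():
        for m in subs:
            consumers[m].add(rid)
        for m in prods:
            producers[m].add(rid)

    features = {}
    for rid, (subs, prods) in reactions.items():
        # Assumed definition of a neighbor: a reaction that consumes one of
        # our products or produces one of our substrates.
        neighbours = set()
        for m in prods:
            neighbours |= consumers[m]
        for m in subs:
            neighbours |= producers[m]
        neighbours.discard(rid)

        # Assumed choke point definition: sole consumer of a substrate
        # or sole producer of a product.
        choke = any(len(consumers[m]) == 1 for m in subs) or any(
            len(producers[m]) == 1 for m in prods
        )
        features[rid] = {
            "NS": len(subs),
            "NP": len(prods),
            "NNR": len(neighbours),
            "CP": int(choke),
        }
    return features


features = topology_features(reactions)
for rid, values in features.items():
    print(rid, values)
```

In this toy network R3 is the only reaction that is not a choke point, because its substrate B is also consumed by R2 and its product D is also produced by R2.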
Figure 2. ROC curves of the prediction performances. (A) 100 Support Vector Machines were trained with the datasets ecoB and ecoG, respectively, and were then queried using the datasets from P. aeruginosa (union of the datasets paeL and paeJ). The number of machines predicting essentiality was summed up (voting score). Results from varying thresholds of the voting score were compared to the experimental results of paeL and paeJ, yielding the ROC curves (area under the curve: 0.80 and 0.79, respectively). (B) As in (A), except that the machines were trained with the datasets of P. aeruginosa and queried with the datasets of E. coli, resulting in ROC curves with AUC = 0.81 and 0.75 for the datasets ecoB and ecoG, respectively.
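
The voting score and the threshold sweep described in the caption can be sketched as follows. This is a dependency-free illustration, not the authors' code: the function names and all numbers are hypothetical, and the per-machine predictions are assumed to be 0/1 calls.

```python
def voting_scores(ensemble_predictions):
    """Sum the 0/1 essentiality calls of all machines for each gene."""
    return [sum(calls) for calls in zip(*ensemble_predictions)]


def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by varying the score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a gene is called essential
    # when its voting score is >= the threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical example: three machines, two genes.
small_votes = voting_scores([[1, 0], [1, 1], [0, 0]])

# Hypothetical voting scores (out of 100 machines) and experimental labels.
votes = [97, 88, 75, 60, 42, 30, 12, 5]
essential = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(votes, essential)
print("AUC =", auc(curve))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of essential from non-essential genes.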
Figure 3. Correlation of the features with essentiality. The feature values of each gene were correlated with the essentiality of the gene (1 = essential, 0 = non-essential). (A) shows the correlation coefficients for the topology features, (B) for the genomic and transcriptomic features. High values indicate that the feature was positively correlated with essentiality (see Additional file 2: Supplement S2 for all correlation coefficients). These values were obtained for all gold standards (ecoB, ecoG for E. coli and paeJ, paeL for P. aeruginosa).
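
Correlating a numeric feature with a 0/1 essentiality label can be sketched with a plain Pearson coefficient, as below. The caption does not state which correlation coefficient was used, so Pearson is an assumption here. The feature names and values are made up and do not reproduce any result from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical labels: 1 = essential, 0 = non-essential.
essentiality = [1, 1, 0, 0, 1, 0]

# Hypothetical feature values, one entry per gene.
feature_values = {
    "feature_a": [40, 35, 10, 12, 38, 9],
    "feature_b": [1, 1, 3, 2, 1, 4],
}

correlations = {
    name: pearson(values, essentiality)
    for name, values in feature_values.items()
}
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

A positive coefficient means that larger feature values tend to go with essential genes, and a negative one means the opposite.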
Predicted essential genes and potential drug targets.
| ORF | Gene Symbol | EC | Enzyme | Evidence |
|---|---|---|---|---|
| STM0123 | murE | 6.3.2.13 | UDP-N-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase | ** |
| STM0128 | murG | 2.4.1.227 | N-acetylglucosaminyl transferase | * |
| STM0129 | murC | 6.3.2.8 | UDP-N-acetylmuramate-L-alanine ligase | ** |
| STM0154 | lpdA | 1.8.1.4 | Dihydrolipoamide dehydrogenase | |
| STM0218 | pyrH | 2.7.4.22 | Uridylate kinase | * |
| STM0221 | uppS | 2.5.1.31 | Undecaprenyl pyrophosphate synthase | ** |
| STM0222 | cdsA | 2.7.7.41 | CDP-diglyceride synthase | |
| STM0228 | lpxA | 2.3.1.129 | UDP-N-acetylglucosamine acyltransferase | |
| STM0232 | accA | 6.4.1.2 | Acetyl-CoA carboxylase | ** |
| STM0489 | hemH | 4.99.1.1 | Ferrochelatase | * |
| STM0535 | lpxH | | UDP-2,3-diacylglucosamine hydrolase | |
| STM0542 | folD | 1.5.1.5, 3.5.4.9 | Bifunctional 5,10-methylene-tetrahydrofolate dehydrogenase | |
| STM0988 | kdsB | 2.7.7.38 | CTP:CMP-KDO cytidylyltransferase | * |
| STM1194 | fabD | 2.3.1.39 | Acyl carrier protein S-malonyltransferase | * |
| STM1195 | fabG | 1.1.1.100 | 3-ketoacyl-(acyl-carrier-protein) reductase | ** |
| STM1200 | tmk | 2.7.4.9 | Thymidylate kinase | |
| STM1700 | fabI | 1.3.1.10 | Enoyl-(acyl carrier protein) reductase | |
| STM2483 | dapE | 3.5.1.18 | Succinyl-diaminopimelate desuccinylase | |
| STM2652 | pssA | 2.7.8.8 | Phosphatidylserine synthase | * |
| STM3090 | metK | 2.5.1.6 | | |
| STM3415 | rpoA | 2.7.7.6 | DNA-directed RNA polymerase subunit alpha | |
| STM3724 | kdtA | | 3-deoxy-D-manno-octulosonic-acid transferase | * |
| STM3730 | dfp | 4.1.1.36 | Pantothenate kinase | ** |
| STM3912 | rep | 3.6.1.- | ATP-dependent DNA helicase Rep | * |
| STM3978 | yigC | | 3-octaprenyl-4-hydroxybenzoate decarboxylase | |
| STM4153 | rpoB | 2.7.7.6 | DNA-directed RNA polymerase subunit beta | * |
| STM4154 | rpoC | 2.7.7.6 | DNA-directed RNA polymerase subunit beta' | |
| STM0049 | ispH, lytB | 1.17.1.2 | 4-hydroxy-3-methylbut-2-enyl diphosphate reductase | * |
| STM0220 | dxr | 1.1.1.267 | 1-deoxy-D-xylulose 5-phosphate reductoisomerase | * |
| STM0422 | dxs | 2.2.1.7 | 1-deoxy-D-xylulose-5-phosphate synthase | ** |
| STM0423 | ispA | 2.5.1.10 | geranyltranstransferase | * |
| STM1779 | ispE, ipk | 2.7.1.149 | 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase | ** |
| STM2523 | ispG, gcpE | 1.17.7.1 | 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase | * |
| STM2929 | ispF | 4.6.1.12 | 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase | * |
| STM2930 | ispD | 2.7.7.60 | 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase | * |
** clear evidence to be a drug target
* reasonable evidence to serve as a good drug target
Figure 4. The non-mevalonate pathway. The non-mevalonate pathway produces isopentenyl diphosphate (IPP). It is an alternative pathway in bacteria and does not exist in the human host, which uses the mevalonate pathway to produce IPP. The non-mevalonate pathway was highly enriched with genes that were predicted to be essential. Reactions are given by their EC number and the gene symbols of the genes encoding the corresponding enzymes.
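
One common way to quantify whether a pathway is enriched with predicted essential genes is a one-sided hypergeometric test. The sketch below shows that test. The caption does not say which statistic the authors used, so the choice of test, the function name `hypergeom_sf`, and all counts are assumptions for illustration only.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability that a pathway of n genes contains at least
    k predicted essential genes, when K of all N genes are predicted
    essential and pathway membership is independent of the prediction."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


# Hypothetical counts, not taken from the study.
all_genes = 1000          # genes in the metabolic network
predicted_essential = 100  # genes predicted to be essential
pathway_genes = 8          # genes in the pathway of interest
pathway_essential = 7      # of these, predicted to be essential

p_value = hypergeom_sf(
    pathway_essential, all_genes, predicted_essential, pathway_genes
)
print(f"enrichment p-value = {p_value:.2e}")
```

A small p-value indicates that the pathway contains more predicted essential genes than expected by chance under these assumptions.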