Masaaki Kotera, Yasuo Tabei, Yoshihiro Yamanishi, Ai Muto, Yuki Moriya, Toshiaki Tokimatsu, Susumu Goto.
Abstract
MOTIVATION: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, biosynthetic and biodegradation pathways are known for only a small portion of metabolites, and a vast number of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome scale.
Year: 2014 PMID: 24931980 PMCID: PMC4058936 DOI: 10.1093/bioinformatics/btu265
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of training models and predictions of new compound–compound pairs. (a) Flowchart of training models using diff-common feature vectors. The same procedure is conducted for diff-only feature vectors as well. See Sections 3.1 and 3.2 for more details. (b) Flowchart of predicting the k-step reaction sequences. The k-th step is predicted by whether . See Sections 3.1 and 3.3 for more details.
Fig. 2. k-step reaction sequences and intermediate compound prediction.
Cross-validation on the 1-step reaction sequence-likeness prediction (enzymatic-reaction likeness)
| Chemical fingerprints/descriptors | Diff-common L1SVM AUC | Diff-common L1SVM AUPR | Diff-only L1SVM AUC | Diff-only L1SVM AUPR | Baseline AUC | Baseline AUPR | Random AUC | Random AUPR |
|---|---|---|---|---|---|---|---|---|
| CDK extended | 0.6917 | 0.0603 | 0.6742 | 0.0468 | 0.6161 | 0.0289 | 0.5000 | 0.0199 |
| MACCS | 0.6837 | 0.0489 | 0.6582 | 0.0342 | 0.5914 | 0.0189 | 0.5000 | 0.0199 |
| PubChem | 0.7170 | 0.0531 | 0.7026 | 0.0422 | 0.6752 | 0.0307 | 0.5000 | 0.0199 |
| NS-descriptor | 0.8858 | 0.2134 | 0.8429 | 0.0968 | 0.6566 | 0.0446 | 0.5000 | 0.0199 |
| KCF-S descriptor | 0.9659 | 0.3943 | 0.9610 | 0.2801 | 0.6945 | 0.0755 | 0.5000 | 0.0199 |
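
To help read the AUC and AUPR columns, the following Python sketch computes both metrics on synthetic data. It is not the authors' evaluation code: the function names, the synthetic labels and scores, and the 2% positive rate are assumptions chosen for illustration, and average precision is used here as one common estimator of AUPR (the exact estimator used in the paper is not stated in this record). The sketch illustrates why a random scorer gives an AUC near 0.5 and an AUPR near the fraction of positive pairs, which is consistent with the constant values in the Random columns.

```python
import random


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of the precision values at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Tiny hand-checkable example.
small_labels = [0, 0, 1, 1]
small_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(small_labels, small_scores))                      # 0.75
print(round(average_precision(small_labels, small_scores), 4))  # 0.8333

# Synthetic, heavily imbalanced data (assumed 2% positives, not the paper's data).
rng = random.Random(0)
n, prevalence = 20000, 0.02
labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
random_scores = [rng.random() for _ in range(n)]
informative_scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

print("positive fraction:", sum(labels) / n)
print("random scorer      AUC, AUPR:",
      roc_auc(labels, random_scores), average_precision(labels, random_scores))
print("informative scorer AUC, AUPR:",
      roc_auc(labels, informative_scores), average_precision(labels, informative_scores))
```

Because AUPR depends on class balance while AUC does not, AUPR values are best compared against the Random AUPR of the same table rather than across tables.
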
Cross-validation on the 2-step reaction sequence-likeness prediction (with one intermediate compound)
| Chemical fingerprints/descriptors | Diff-common L1SVM AUC | Diff-common L1SVM AUPR | Diff-only L1SVM AUC | Diff-only L1SVM AUPR | Baseline AUC | Baseline AUPR | Random AUC | Random AUPR |
|---|---|---|---|---|---|---|---|---|
| CDK extended | 0.7747 | 0.1730 | 0.7178 | 0.1352 | 0.4815 | 0.0576 | 0.5000 | 0.0665 |
| MACCS | 0.7474 | 0.1418 | 0.6634 | 0.1152 | 0.4465 | 0.0502 | 0.5000 | 0.0665 |
| PubChem | 0.7674 | 0.1589 | 0.7270 | 0.1357 | 0.5732 | 0.0710 | 0.5000 | 0.0665 |
| NS-descriptor | 0.8898 | 0.2937 | 0.8673 | 0.2651 | 0.6187 | 0.0937 | 0.5000 | 0.0665 |
| KCF-S descriptor | 0.9411 | 0.4493 | 0.9419 | 0.4473 | 0.6621 | 0.0635 | 0.5000 | 0.0665 |
Cross-validation on the 3-step reaction sequence-likeness prediction (with two intermediate compounds)
| Chemical fingerprints/descriptors | Diff-common L1SVM AUC | Diff-common L1SVM AUPR | Diff-only L1SVM AUC | Diff-only L1SVM AUPR | Baseline AUC | Baseline AUPR | Random AUC | Random AUPR |
|---|---|---|---|---|---|---|---|---|
| CDK extended | 0.8103 | 0.1436 | 0.7542 | 0.0959 | 0.5474 | 0.0368 | 0.5000 | 0.0367 |
| MACCS | 0.7608 | 0.0986 | 0.6770 | 0.0713 | 0.4959 | 0.0309 | 0.5000 | 0.0367 |
| PubChem | 0.8097 | 0.1239 | 0.7656 | 0.0910 | 0.6365 | 0.0489 | 0.5000 | 0.0367 |
| NS-descriptor | 0.9284 | 0.2638 | 0.9028 | 0.1989 | 0.7069 | 0.0807 | 0.5000 | 0.0367 |
| KCF-S descriptor | 0.9624 | 0.4232 | 0.9585 | 0.4094 | 0.6621 | 0.0635 | 0.5000 | 0.0367 |
Cross-validation on the 4-step reaction sequence-likeness prediction (with three intermediate compounds)
| Chemical fingerprints/descriptors | Diff-common L1SVM AUC | Diff-common L1SVM AUPR | Diff-only L1SVM AUC | Diff-only L1SVM AUPR | Baseline AUC | Baseline AUPR | Random AUC | Random AUPR |
|---|---|---|---|---|---|---|---|---|
| CDK extended | 0.8577 | 0.1062 | 0.7867 | 0.0649 | 0.5863 | 0.0172 | 0.5000 | 0.0156 |
| MACCS | 0.7663 | 0.0582 | 0.6898 | 0.0351 | 0.5187 | 0.0141 | 0.5000 | 0.0156 |
| PubChem | 0.8536 | 0.0818 | 0.7962 | 0.0481 | 0.6590 | 0.0234 | 0.5000 | 0.0156 |
| NS-descriptor | 0.9535 | 0.2058 | 0.9304 | 0.1341 | 0.7521 | 0.0436 | 0.5000 | 0.0156 |
| KCF-S descriptor | 0.9772 | 0.3283 | 0.9837 | 0.3202 | 0.7039 | 0.0315 | 0.5000 | 0.0156 |
Fig. 3. Self-rank distributions for the intermediate compounds in the 2-step reaction sequences (upper left), 3-step reaction sequences (upper middle and upper right) and 4-step reaction sequences (bottom left, bottom middle and bottom right).
Fig. 4. Extracted substructures specific to k-step reaction sequences (green) and reaction center-related substructures (red).
Fig. 5. Examples of falsely predicted reaction sequences. Colors represent structural changes during the reaction sequences. (a) The intermediate was possibly correct, but the number of steps was possibly wrong. Stereoisomerization was not considered. (b) Setting aside the distinction between geometric isomers (in purple), the number of steps was possibly correct, but the intermediates were possibly wrong.