Xinmeng Li, Li-Ping Liu, Soha Hassoun.
Abstract
MOTIVATION: Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates remains largely unexplored and underdocumented. Computational tools for exploring the enzyme-substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying products of metabolism on ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be used to recommend enzymes for substrates, and vice versa. The performance of collaborative-filtering (CF) RSs, however, hinges on the quality of the embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge.
Year: 2022 PMID: 35561204 PMCID: PMC9113267 DOI: 10.1093/bioinformatics/btac201
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Boost-RS framework for enzyme–substrate recommendation prediction. (A) Interaction matrix construction from enzymatic reactions. For example, for E1, three positive interactions are added to the matrix. (B) The Boost-RS framework, which integrates the main task of interaction prediction with related auxiliary tasks. (C) Collaborative filtering models used as baselines.
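Panel (A) can be illustrated with a minimal sketch of how a binary interaction matrix might be assembled from reaction records. The enzyme and compound IDs, the `reactions` dictionary, the function name `build_interaction_matrix`, and the choice of enzymes as rows are all assumptions made for illustration; the source does not specify the data layout.

```python
# Hypothetical toy data: enzyme ID -> compounds it is known to act on.
# E1, E2, C1..C4 are illustrative placeholders, not real database entries.
reactions = {
    "E1": ["C1", "C2", "C3"],  # three positive interactions for E1
    "E2": ["C2", "C4"],
}

def build_interaction_matrix(reactions):
    """Return (enzymes, compounds, matrix), where matrix[i][j] == 1
    marks a known interaction and 0 marks an unobserved pair."""
    enzymes = sorted(reactions)
    compounds = sorted({c for cs in reactions.values() for c in cs})
    column = {c: j for j, c in enumerate(compounds)}
    matrix = [[0] * len(compounds) for _ in enzymes]
    for i, enzyme in enumerate(enzymes):
        for compound in reactions[enzyme]:
            matrix[i][column[compound]] = 1
    return enzymes, compounds, matrix

enzymes, compounds, matrix = build_interaction_matrix(reactions)
for enzyme, row in zip(enzymes, matrix):
    print(enzyme, row)
```

Zeros here mean only that no interaction has been recorded, which is why a recommender is asked to rank them rather than treat them as confirmed negatives.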
Interaction prediction performance evaluation. Boost-RS performance is bolded.
| Model | Overall: AP | Overall: R-Precision | Overall: AUC | Enzymes: MAP | Enzymes: R-Precision | Enzymes: MAP@3 | Enzymes: Precision@1 | Compounds: MAP | Compounds: R-Precision | Compounds: MAP@3 | Compounds: Precision@1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A. Baselines and their boosted models | |||||||||||
| DMF | 0.154 | 0.344 | 0.869 | 0.328 | 0.251 | 0.261 | 0.255 | 0.281 | 0.282 | 0.334 | 0.332 |
| Boost-DMF | 0.192 | 0.401 | 0.954 | 0.374 | 0.258 | 0.273 | 0.294 | 0.309 | 0.296 | 0.362 | 0.374 |
| NGCF | 0.169 | 0.328 | 0.810 | 0.333 | 0.274 | 0.278 | 0.261 | 0.277 | 0.297 | 0.347 | 0.326 |
| Boost-NGCF | 0.223 | 0.503 | 0.959 | 0.552 | 0.295 | 0.446 | 0.393 | 0.405 | 0.488 | 0.566 | 0.490 |
| NMF | 0.280 | 0.380 | 0.880 | 0.339 | 0.322 | 0.286 | 0.309 | 0.332 | 0.320 | 0.362 | 0.378 |
| Boost-RS | | | | | | | | | | | |
| B. Group data treated as individual attributes and incorporated into RS via either multi-tasking or concatenation | |||||||||||
| Boost-RS_Multi-label | 0.404 | 0.492 | 0.936 | 0.419 | 0.411 | 0.353 | 0.421 | 0.452 | 0.395 | 0.441 | 0.490 |
| NMF-Concat_Multi-label | 0.396 | 0.485 | 0.950 | 0.430 | 0.406 | 0.363 | 0.413 | 0.441 | 0.408 | 0.454 | 0.480 |
| C. Interaction prediction with each auxiliary task using Boost-RS | |||||||||||
| Boost-RS(KO) | 0.296 | 0.377 | 0.857 | 0.321 | 0.315 | 0.280 | 0.321 | 0.349 | 0.319 | 0.347 | 0.381 |
| Boost-RS(FP) | 0.309 | 0.402 | 0.914 | 0.370 | 0.333 | 0.307 | 0.327 | 0.350 | 0.340 | 0.388 | 0.395 |
| Boost-RS(EC) | 0.344 | 0.432 | 0.880 | 0.337 | 0.377 | 0.294 | 0.372 | 0.399 | 0.325 | 0.364 | 0.438 |
| Boost-RS(CC) | 0.419 | 0.548 | 0.936 | 0.527 | 0.447 | 0.447 | 0.467 | 0.492 | 0.487 | 0.553 | 0.546 |
| D. Interaction prediction with each type of auxiliary data using NMF-Concat_Multi-label | |||||||||||
| NMF-Concat(KO) | 0.285 | 0.386 | 0.872 | 0.346 | 0.327 | 0.296 | 0.319 | 0.343 | 0.337 | 0.375 | 0.384 |
| NMF-Concat(FP) | 0.287 | 0.386 | 0.870 | 0.338 | 0.329 | 0.286 | 0.318 | 0.344 | 0.324 | 0.368 | 0.386 |
| NMF-Concat(EC) | 0.292 | 0.390 | 0.879 | 0.339 | 0.333 | 0.289 | 0.324 | 0.349 | 0.320 | 0.361 | 0.392 |
| NMF-Concat(CC) | 0.322 | 0.408 | 0.868 | 0.351 | 0.343 | 0.297 | 0.351 | 0.372 | 0.342 | 0.380 | 0.412 |
| E. Interaction prediction comparing the Boost-RS framework against a similarity-based method | |||||||||||
| GRGMF(FP+EC) | 0.189 | 0.407 | 0.946 | 0.362 | 0.282 | 0.266 | 0.293 | 0.307 | 0.281 | 0.361 | 0.387 |
| Boost-RS(FP+EC) | 0.349 | 0.456 | 0.931 | 0.376 | 0.377 | 0.314 | 0.385 | 0.408 | 0.352 | 0.398 | 0.453 |
Note: The best model (Boost-RS) is based on NMF and it exploits auxiliary data via multi-task learning, including hierarchical learning on EC, individual attribute learning on FP and contrastive viewing of KO and CC.
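The ranking metrics in the table columns (Precision@1, R-Precision, and the average precision underlying MAP and MAP@3) can be sketched as below. This is an illustrative implementation under common textbook definitions, not the authors' evaluation code; in particular, normalizing AP@k by `min(len(relevant), k)` is an assumption, and the compound IDs are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0

def average_precision(ranked, relevant, k=None):
    """Average of the precision values at each rank holding a relevant item.
    With k set, only the top-k ranks are considered (AP@k).
    Assumption: the sum is normalized by min(len(relevant), cutoff)."""
    cutoff = len(ranked) if k is None else k
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:cutoff], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), cutoff)
    return total / denom if denom else 0.0

# Hypothetical ranking of compounds for one enzyme, best score first.
ranked = ["C2", "C5", "C1", "C4", "C3"]
relevant = {"C1", "C2", "C3"}

p_at_1 = precision_at_k(ranked, relevant, 1)
r_prec = r_precision(ranked, relevant)
ap = average_precision(ranked, relevant)
ap_at_3 = average_precision(ranked, relevant, k=3)
```

MAP and MAP@3 would then be the mean of `average_precision` over all enzymes (or over all compounds, for the compound-side columns).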
Fig. 2. Visualization using t-SNE for learned representation of enzymes and compounds, shown to the left and right of each sub-panel, respectively. (A) Baseline NMF. (B) Baseline with multi-label KO and CC concatenation (NMF-Concat_Multi-label). (C) Boost-RS with the auxiliary task of learning multi-label KO and CC (Boost-RS_Multi-label). (D) Boost-RS with triplet loss on KO and CC (Boost-RS).
Fig. 3. Example that shows how Boost-RS exploits CC relationships derived from RClass relationships in KEGG. (A) Legend and KEGG data. RClass RC00017 is associated with multiple CC pairs, including C00676 and C02269. (B) NMF prediction is not aware of the relationships between C00676 and C02269, and results in a 0.01 likelihood of interaction between C02269 and enzyme 3.1.3.89. (C) Boost-RS exploits the CC relationships and results in an improved prediction.
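How a triplet loss could pull related compounds such as C00676 and C02269 closer in embedding space can be sketched as follows. The source states only that Boost-RS uses a triplet loss on KO and CC relationships; the Euclidean distance, the margin of 1.0, the 2-D embedding values, and the unrelated compound `C_other` are all assumptions chosen for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: reaches zero once the negative is at
    least `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical embeddings before training: the related pair is far apart.
emb = {
    "C00676": [0.0, 0.0],   # anchor
    "C02269": [3.0, 4.0],   # positive: shares an RClass with the anchor
    "C_other": [1.0, 0.0],  # negative: hypothetical unrelated compound
}
loss_before = triplet_loss(emb["C00676"], emb["C02269"], emb["C_other"])

# Hypothetical embeddings after training: the related pair is now close.
emb["C02269"] = [0.0, 0.5]
emb["C_other"] = [3.0, 4.0]
loss_after = triplet_loss(emb["C00676"], emb["C02269"], emb["C_other"])

print(loss_before, loss_after)
```

In a multi-task setup, a term like this would be added to the interaction-prediction loss so that both objectives shape the same compound embeddings.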