Eleni E Litsa1, Payel Das2,3, Lydia E Kavraki1.
Abstract
Metabolic processes in the human body can alter the structure of a drug, affecting its efficacy and safety. As a result, investigating the metabolic fate of a candidate drug is an essential part of drug design studies. Computational approaches have been developed for the prediction of possible drug metabolites in an effort to assist the traditional and resource-demanding experimental route. Current methodologies are based upon metabolic transformation rules, which are tied to specific enzyme families and therefore generalize poorly, and which may additionally require manual work from experts, limiting scalability. We present a rule-free, end-to-end learning-based method for predicting possible human metabolites of small molecules, including drugs. The metabolite prediction task is approached as a sequence translation problem, with chemical compounds represented using the SMILES notation. We perform transfer learning on a deep learning transformer model for sequence translation, originally trained on chemical reaction data, to predict the outcome of human metabolic reactions. We further build an ensemble model to account for multiple and diverse metabolites. Extensive evaluation reveals that the proposed method generalizes well to different enzyme families, as it can correctly predict metabolites through phase I and phase II drug metabolism as well as other enzymes. Compared to existing rule-based approaches, our method has equivalent performance on the major enzyme families while it additionally finds metabolites through less common enzymes. Our results indicate that the proposed approach can provide a comprehensive study of drug metabolism that is not restricted to the major enzyme families and does not require the extraction of transformation rules. This journal is © The Royal Society of Chemistry.
Year: 2020 PMID: 34094473 PMCID: PMC8162519 DOI: 10.1039/d0sc02639e
Source DB: PubMed Journal: Chem Sci ISSN: 2041-6520 Impact factor: 9.825
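To make the "sequence translation over SMILES" framing in the abstract concrete, the sketch below splits a SMILES string into tokens that a sequence-to-sequence model could consume. The regular expression is a simplified assumption for illustration and is not taken from the paper; the function name `tokenize` is hypothetical.

```python
import re
from typing import List

# Simplified token pattern (an assumption, not the authors' tokenizer):
# bracket atoms, two-letter halogens, two-digit ring closures, then any
# remaining single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens for a translation model."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: joining the tokens must reproduce the input exactly.
    assert "".join(tokens) == smiles
    return tokens


# Paracetamol as an example parent molecule (source sequence).
source_tokens = tokenize("CC(=O)Nc1ccc(O)cc1")
```

In this framing the parent drug is the source sequence and each metabolite is a target sequence, so one parent can be paired with several targets.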
Fig. 1 Drug metabolite prediction (b) as opposed to reaction outcome prediction (a). In drug metabolism, multiple outcomes are possible and transferred structures (highlighted in red) are not known in advance.
Fig. 2 MetaTrans is derived through fine-tuning the molecular transformer on metabolic reactions. During inference, the ensemble MetaTrans model outputs the metabolites predicted by 6 fine-tuned models.
Fig. 3 The composition of the dataset regarding (a) the data sources and (b) the metabolizing enzymes based on the EC classification (discarding the cases with no specified enzymes), in terms of pairs of parent molecules and metabolites.
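The ensemble step described in the Fig. 2 caption can be pictured as pooling the candidate lists of the individual fine-tuned models. The sketch below is one plausible reading, assuming the pooled output is a de-duplicated union and that all SMILES strings are already canonicalized; the record does not state how the authors merge or order the candidates, and `merge_predictions` is a hypothetical name.

```python
from typing import Iterable, List


def merge_predictions(per_model: Iterable[List[str]]) -> List[str]:
    """Union of per-model candidate lists, keeping first-seen order."""
    seen = set()
    merged = []
    for candidates in per_model:
        for smiles in candidates:
            # Assumes canonical SMILES, so string equality means same molecule.
            if smiles not in seen:
                seen.add(smiles)
                merged.append(smiles)
    return merged


# Hypothetical outputs of three fine-tuned models for one parent molecule.
pooled = merge_predictions([
    ["CCO", "CC=O"],
    ["CC=O", "CC(=O)O"],
    ["CCO"],
])
```

A union like this increases the number of distinct candidates per drug, which is consistent with the ensemble trading a smaller beam size for a comparable output size in the tables.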
Prediction performance of the pre-trained model, average performance and standard deviation of the individual fine-tuned models that comprise the ensemble, and performance of the ensemble, for comparable output sizes. The table indicates the percentage of drugs for which at least one, at least half and all reference metabolites have been correctly identified, as well as the total number of identified metabolites, precision and recall.
| Model | Output size | At least one metabolite (%) | At least half metabolites (%) | All metabolites (%) | Total identified metabolites | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|
| Pre-trained (beam 15) | 9.1 | 39.3 | 27.4 | 13.1 | 49 | 6.4 | 22.6 |
| Average (beam 15) | 9.3 ± 0.4 | 78.8 ± 4.6 | 61.7 ± 5.7 | 33.1 ± 4.1 | 102.3 ± 8.0 | 13.1 ± 0.8 | 47.2 ± 3.7 |
| Ensemble (beam 5) | 10.2 | 90.5 | 77.4 | 42.9 | 125 | 14.5 | 57.6 |
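The columns of the table above can be computed as in the following sketch. It assumes that precision is the number of identified metabolites divided by the total number of predictions and that recall divides by the total number of reference metabolites; this reading matches the reported figures (for example, 54 identified out of 325 predictions gives 16.6% for GLORYx at top 5). The function name `evaluate` and the toy inputs are hypothetical, and every drug is assumed to have at least one reference metabolite.

```python
from typing import Dict, List, Set


def evaluate(predicted: Dict[str, List[str]],
             reference: Dict[str, Set[str]]) -> Dict[str, float]:
    """Per-drug coverage plus pooled precision and recall, in percent."""
    one = half = full = hits = n_pred = n_ref = 0
    for drug, ref in reference.items():
        pred = set(predicted.get(drug, []))
        found = len(pred & ref)
        one += found >= 1
        half += 2 * found >= len(ref)   # at least half of the references
        full += found == len(ref)
        hits += found
        n_pred += len(pred)
        n_ref += len(ref)
    n = len(reference)
    return {
        "at_least_one": 100.0 * one / n,
        "at_least_half": 100.0 * half / n,
        "all": 100.0 * full / n,
        "total_identified": float(hits),
        "precision": 100.0 * hits / n_pred if n_pred else 0.0,
        "recall": 100.0 * hits / n_ref,
    }


# Toy example with placeholder identifiers instead of real SMILES.
scores = evaluate(
    {"d1": ["m1", "x"], "d2": ["y"]},
    {"d1": {"m1", "m2"}, "d2": {"m3"}},
)
```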
Assessment of the ranking capability of the ensemble model regarding the percentage of drugs for which at least one, at least half and all known metabolites have been identified, as well as the total number of identified metabolites. The average output size per input is also indicated.
| Beam size | Average output size | At least one metabolite (%) | At least half metabolites (%) | All metabolites (%) | Total identified metabolites | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|
| 2 | 5.0 | 77.4 | 60.7 | 27.4 | 93 | 22.2 | 42.9 |
| 5 | 10.2 | 90.5 | 77.4 | 42.9 | 125 | 14.5 | 57.6 |
| 10 | 20.0 | 91.7 | 82.1 | 45.2 | 139 | 8.3 | 64.1 |
| 15 | 29.0 | 94.0 | 84.5 | 48.8 | 147 | 6.0 | 67.7 |
Prediction performance of the ensemble model broken down based on the source of the data for beam size 5
| Dataset | At least one metabolite (%) | At least half metabolites (%) | All metabolites (%) |
|---|---|---|---|
| Glory | 93.1 | 65.5 | 34.5 |
| DrugBank | 89.1 | 83.6 | 47.3 |
| All | 90.5 | 77.4 | 42.9 |
Comparison between MetaTrans and GLORYx, SyGMa and BioTransformer for various prediction windows
| Window | Method | At least one metabolite (%) | At least half metabolites (%) | All metabolites (%) | Total identified metabolites | Output size | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|---|
| Top 5 | MetaTrans | | | 29.2 | | 324 | 23.5 | 42.5 |
| | GLORYx | 64.6 | 35.4 | 16.9 | 54 | 325 | 16.6 | 30.2 |
| | SyGMa | 72.3 | 55.4 | 29.2 | 76 | 325 | 23.4 | 42.4 |
| Top 10 | MetaTrans | | | 44.6 | 103 | 687 | 15.0 | 57.5 |
| | GLORYx | 80.0 | 64.6 | 27.7 | 93 | 650 | 14.3 | 51.9 |
| | SyGMa | 87.7 | 75.4 | 43.1 | 105 | 650 | 16.2 | 58.7 |
| Top 13 | MetaTrans | | | 46.2 | 109 | 908 | 12.0 | 60.9 |
| | GLORYx | 86.2 | 76.9 | 41.5 | 108 | 851 | 12.8 | 60.3 |
| | SyGMa | 89.2 | 78.5 | 44.6 | 115 | 842 | 13.6 | 64.2 |
| | BioTransformer | 87.7 | 78.5 | 44.6 | 115 | 842 | 13.5 | 64.2 |
| Top 20 | MetaTrans | | 86.2 | 46.2 | 116 | 1334 | 8.7 | 64.8 |
| | GLORYx | 92.3 | 86.2 | 52.3 | 132 | 1259 | 10.5 | 73.7 |
| | SyGMa | 90.8 | 84.6 | 49.2 | 127 | 1284 | 9.9 | 70.9 |
Comparison per enzyme family
| | Oxidation | UDP-GT | Sulfotransferases | Other transferases | Hydrolases | Unspecified | All |
|---|---|---|---|---|---|---|---|
| Total | 118 | 11 | 4 | 3 | 6 | 37 | 179 |
| MetaTrans | 70 | 7 | 3 | | 4 | 23 | 109 |
| GLORYx | 70 | 8 | 3 | 1 | 4 | 22 | 108 |
| SyGMa | 80 | 8 | 2 | 0 | 5 | 20 | 115 |
| BioTransformer | 81 | 7 | 2 | 0 | 5 | 20 | 115 |
Fig. 4 Correctly identified metabolites through uncommon enzymes.
Fig. 5 Drug structure, actual metabolite and closest prediction for a small number of challenging test cases.