Lewis H Mervin¹, Krishna C Bulusu¹,², Leen Kalash¹, Avid M Afzal¹, Fredrik Svensson¹, Mike A Firth³, Ian Barrett³, Ola Engkvist⁴, Andreas Bender¹.
Abstract
Motivation: In silico approaches often fail to utilize bioactivity data available for orthologous targets, owing to insufficient evidence of the benefit of such an approach. Deeper investigation into orthologue chemical space and its influence on expanding compound and target coverage is necessary to improve confidence in this practice.
Year: 2018 PMID: 28961699 PMCID: PMC5870859 DOI: 10.1093/bioinformatics/btx525
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Model evaluation and target deconvolution. Bioactivity data are represented as a matrix of compounds (rows) and targets (columns). Target prediction models are typically trained and evaluated per target, i.e. column-wise (blue): the performance of each target model is calculated over the compounds retrieved for it. When the models are deployed for target deconvolution (red), they are interpreted per compound, i.e. row-wise, to identify the targets predicted for that compound. In this work, predictive models are evaluated row-wise, by assessing the predictions made for each new compound (shown).
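The contrast between per-target (column-wise) and per-compound (row-wise) scoring can be illustrated with a small sketch. The activity matrix and predictions below are hypothetical toy data, and `f1`, `per_target_f1` and `per_compound_f1` are illustrative helper names, not code from the study; F1 is used because it is the metric reported in the external validation table.

```python
from statistics import mean
from typing import List

Matrix = List[List[int]]  # rows = compounds, columns = targets; 1 = active


def f1(true: List[int], pred: List[int]) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(true, pred) if t and p)
    fp = sum(1 for t, p in zip(true, pred) if p and not t)
    fn = sum(1 for t, p in zip(true, pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def per_target_f1(y_true: Matrix, y_pred: Matrix) -> List[float]:
    """Column-wise evaluation: one score per target model, computed over all compounds."""
    n_targets = len(y_true[0])
    return [
        f1([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_targets)
    ]


def per_compound_f1(y_true: Matrix, y_pred: Matrix) -> List[float]:
    """Row-wise evaluation: one score per compound over all targets (target deconvolution view)."""
    return [f1(t, p) for t, p in zip(y_true, y_pred)]


# Hypothetical toy activity matrix and predictions (4 compounds x 3 targets)
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print("per target:", per_target_f1(y_true, y_pred))
print("per compound:", per_compound_f1(y_true, y_pred))
print("mean per compound:", mean(per_compound_f1(y_true, y_pred)))
```

The same predictions give different summaries depending on the axis (here the mean per-target F1 is about 0.72, while the mean per-compound F1 is 0.75), which is why the choice of evaluation axis matters when models are used for deconvolution.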
Fig. 2. Nearest neighbor analysis. Similarity was defined as the Tanimoto coefficient between ECFP_4 fingerprints. (a) Nearest neighbor similarity between human and orthologue compounds indicates that the chemical spaces covered by the two datasets are dissimilar, suggesting that orthologue data could effectively extend the chemical space of the models. (b) Nearest neighbor analysis indicates that orthologue compounds are frequently similar to each other. High intra-group similarity is well suited to modelling, since it enables models to better identify key features.
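A minimal sketch of the two nearest-neighbor comparisons in Fig. 2, assuming each binary fingerprint is stored as a set of on-bit indices. In practice ECFP_4 bits would come from a cheminformatics toolkit (an RDKit Morgan fingerprint of radius 2 is a common ECFP_4 analogue); the bit sets below are hypothetical toy data and the function names are illustrative.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints given as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbor_similarity(queries: List[Set[int]], references: List[Set[int]]) -> List[float]:
    """Between-group NN: for each query, the similarity to its most similar reference."""
    return [max(tanimoto(q, r) for r in references) for q in queries]


def intra_group_nn(fps: List[Set[int]]) -> List[float]:
    """Within-group NN: each fingerprint is compared with every other one, excluding itself.
    Requires at least two fingerprints."""
    return [
        max(tanimoto(fp, other) for j, other in enumerate(fps) if j != i)
        for i, fp in enumerate(fps)
    ]


# Hypothetical toy fingerprints (sets of on-bit indices), not data from the study
human_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
ortho_fps = [{10, 11, 12}, {10, 11, 13}, {1, 2, 10}]

print("human -> orthologue NN:", nearest_neighbor_similarity(human_fps, ortho_fps))
print("orthologue -> orthologue NN:", intra_group_nn(ortho_fps))
```

Low human-to-orthologue values combined with higher orthologue-to-orthologue values correspond to the pattern described in panels (a) and (b).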
Fig. 3. Correlation of human and orthologue pChEMBL values. Bioactivity correlation varies between units of measurement and assay types. 21 446 compounds were tested against both human and orthologue targets in binding (blue) and functional (green) assays. The linear regression line is shown with a 95% confidence interval; R² reflects the correlation for each unit.
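The quantities behind Fig. 3 can be sketched as follows. ChEMBL's pChEMBL value is the negative log10 of a molar activity (e.g. IC50, Ki), so an activity in nM maps to 9 − log10(value). R² of a simple least-squares line equals the squared Pearson correlation. Grouping by activity type (Ki, IC50, ...) is an assumption about what "Unit" refers to in the caption, and the paired values below are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pchembl_from_nM(value_nM: float) -> float:
    """Negative log10 of a molar activity supplied in nM: -log10(x * 1e-9) = 9 - log10(x)."""
    return 9.0 - math.log10(value_nM)


def r_squared(x: List[float], y: List[float]) -> float:
    """R^2 of the least-squares line y ~ a + b*x, equal to the squared Pearson r.
    Requires at least two distinct values in both x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r_squared_by_unit(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Group (unit, human_pchembl, orthologue_pchembl) records by unit; fit each group separately."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for unit, human, ortho in records:
        groups[unit][0].append(human)
        groups[unit][1].append(ortho)
    return {unit: r_squared(h, o) for unit, (h, o) in groups.items()}


# Hypothetical paired pChEMBL values, not data from the study
records = [
    ("Ki", 5.0, 5.0), ("Ki", 6.0, 7.0), ("Ki", 7.0, 6.0), ("Ki", 8.0, 8.0),
    ("IC50", 6.0, 6.5), ("IC50", 7.0, 7.5), ("IC50", 8.0, 8.5),
]
print(pchembl_from_nM(100.0))
print(r_squared_by_unit(records))
```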
Fig. 4. Five-fold time-split cross-validation (CV) performance. The influence of orthologue inclusion varies between algorithms and hyper-parameter settings.
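The caption does not state how the time split was constructed, so the sketch below assumes one common scheme, forward chaining (the same idea as scikit-learn's `TimeSeriesSplit`): compounds are ordered by a hypothetical "year first reported" field and cut into six contiguous blocks, and fold k trains on all earlier blocks and tests on block k. The record schema and `time_split_folds` are illustrative, not the study's code.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, int]  # hypothetical schema: (compound_id, year first reported)


def time_split_folds(
    records: List[Record], n_folds: int = 5
) -> Iterator[Tuple[List[Record], List[Record]]]:
    """Forward-chaining time split: sort by year, cut into n_folds + 1 contiguous blocks,
    then for fold k train on blocks 0..k-1 and test on block k."""
    ordered = sorted(records, key=lambda r: r[1])
    n_blocks = n_folds + 1
    size, rem = divmod(len(ordered), n_blocks)
    blocks, start = [], 0
    for k in range(n_blocks):
        end = start + size + (1 if k < rem else 0)  # spread any remainder over the first blocks
        blocks.append(ordered[start:end])
        start = end
    for k in range(1, n_blocks):
        train = [r for block in blocks[:k] for r in block]
        yield train, blocks[k]


# Hypothetical compounds with distinct years 2000-2011
records = [(f"cpd{i}", 2000 + (i * 7) % 12) for i in range(12)]
for fold, (train, test) in enumerate(time_split_folds(records), start=1):
    print(fold, len(train), [year for _, year in test])
```

Because test compounds always post-date the training compounds, this kind of split is usually harder than a random split. If several compounds share a year, a block boundary can fall inside that year, so a stricter variant would split on year boundaries instead.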
Fig. 5. AstraZeneca external validation performance. The decreased performance toward the bottom left of the plots arises from difficult-to-classify external test compounds that are dissimilar to the training set. Consistent with internal CV, the influence of orthologue inclusion on performance varies between algorithms and hyper-parameter settings.
Averaged F1-Score results for AstraZeneca external validation
| Learner | Hyper-parameter | Without orthologues | With orthologues |
|---|---|---|---|
| BNB | Alpha = 0.1 | 0.33 ± 0.29 | 0.34 ± 0.29 |
| | Alpha = 1.0 | 0.35 ± 0.29 | 0.36 ± 0.29 |
| SVM | C = 1.0E-02 | 0.44 ± 0.29 | 0.46 ± 0.30 |
| | C = 1.0E+00 | 0.38 ± 0.28 | 0.42 ± 0.28 |
| | C = 1.0E+02 | 0.39 ± 0.27 | 0.42 ± 0.27 |
| RFC | Trees = 5 | 0.33 ± 0.27 | 0.36 ± 0.27 |
| | Trees = 50 | 0.37 ± 0.28 | 0.40 ± 0.28 |
| | Trees = 500 | 0.37 ± 0.28 | 0.41 ± 0.28 |