Somin Wadhwa, Aishwarya Gupta, Shubham Dokania, Rakesh Kanji, Ganesh Bagler.
Abstract
Prediction of adverse drug reactions is an important problem in drug discovery that can be addressed with data-driven strategies. SIDER is one of the most reliable and frequently used datasets for identifying key features and for building machine learning models for side-effect prediction. The inherently unbalanced nature of these data presents a difficult multi-label, multi-class problem for the prediction of drug side effects. We highlight the intrinsic issue with SIDER data and the methodological flaws in relying on performance measures such as AUC when attempting to predict side effects. We argue for the use of metrics that are robust to class imbalance when evaluating classifiers. Importantly, we present a 'hierarchical anatomical classification schema' that aggregates side effects into organs, sub-systems, and systems. Using a weighted performance measure and 5-fold cross-validation, we show that this strategy facilitates biologically meaningful side-effect prediction at different levels of the anatomical hierarchy. By implementing various machine learning classifiers, we show that the Random Forest model yields the best classification accuracy at each level of coarse-graining. The manually curated, hierarchical schema for side effects can also serve as the basis for future studies on the prediction of adverse reactions and the identification of key features linked to specific organ systems. Our study provides a strategy for hierarchical classification of side effects rooted in anatomy and can pave the way for calibrated expert systems for multi-level prediction of side effects.
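The aggregation idea can be illustrated with a minimal sketch, assuming each drug is annotated with a list of side-effect terms and that each level of the hierarchy is a simple many-to-one mapping. The mappings, drug names, and the `aggregate` helper below are hypothetical and are not the authors' curated schema: a drug receives a coarser label if any of its finer labels maps to it (a logical OR), so redundant fine-grained labels collapse into fewer, more populated classes. A sub-system level would be one more mapping applied the same way.

```python
from typing import Dict, List

# Hypothetical, illustrative mappings; these are NOT the authors' curated schema.
SIDE_EFFECT_TO_ORGAN: Dict[str, str] = {
    "hepatitis": "liver",
    "jaundice": "liver",
    "pancreatitis": "pancreas",
    "tachycardia": "heart",
}
ORGAN_TO_SYSTEM: Dict[str, str] = {
    "liver": "digestive",
    "pancreas": "digestive",
    "heart": "cardiovascular",
}


def aggregate(labels: Dict[str, List[str]], mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Coarse-grain each drug's label set by one level of the hierarchy.

    A drug receives a higher-level label if at least one of its lower-level
    labels maps to it (logical OR). Labels missing from `mapping` are dropped.
    """
    return {
        drug: sorted({mapping[label] for label in drug_labels if label in mapping})
        for drug, drug_labels in labels.items()
    }


# Toy drug -> side-effect annotations (hypothetical drugs).
drugs = {
    "drug_a": ["hepatitis", "jaundice"],
    "drug_b": ["tachycardia", "pancreatitis"],
}
organs = aggregate(drugs, SIDE_EFFECT_TO_ORGAN)
systems = aggregate(organs, ORGAN_TO_SYSTEM)
print(organs)   # {'drug_a': ['liver'], 'drug_b': ['heart', 'pancreas']}
print(systems)  # {'drug_a': ['digestive'], 'drug_b': ['cardiovascular', 'digestive']}
```

Chaining one mapping per level keeps each step auditable, which matters when the schema is curated by hand.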
Year: 2018 PMID: 29494708 PMCID: PMC5832387 DOI: 10.1371/journal.pone.0193959
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The hierarchical coarse-graining schema for anatomical aggregation of side effects, used to address redundancy and to achieve improved, meaningful classification.
Fig 2. The schema for hierarchical anatomical classification of side effects by aggregating them into their associated organs, sub-systems, and systems.
Fig 3. (A) Distribution of side effects across drugs, showing the presence of drugs associated with an exceptionally large number of side effects. (B) Distribution of drugs across side effects, depicting the presence of adverse reactions that are linked to a large number of drugs.
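Both distributions in Fig 3 are marginal counts of the drug by side-effect annotation matrix. The sketch below, assuming annotations are stored as a hypothetical dict of drug name to side-effect list (toy values, not SIDER data), computes side effects per drug (panel A), drugs per side effect (panel B), and the positive-class prevalence of each label, which makes the per-label class imbalance explicit.

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_distributions(labels: Dict[str, List[str]]) -> Tuple[Dict[str, int], Counter]:
    """Return (side effects per drug, drugs per side effect); duplicates are ignored."""
    per_drug = {drug: len(set(effects)) for drug, effects in labels.items()}
    per_effect = Counter(e for effects in labels.values() for e in set(effects))
    return per_drug, per_effect


# Toy annotations (hypothetical), not SIDER data.
drugs = {
    "drug_a": ["nausea", "headache", "rash", "hepatitis"],
    "drug_b": ["nausea"],
    "drug_c": ["nausea", "headache"],
}
per_drug, per_effect = label_distributions(drugs)
n_drugs = len(drugs)
# Fraction of drugs positive for each side-effect label; values far below 0.5
# indicate strong per-label class imbalance.
prevalence = {e: count / n_drugs for e, count in per_effect.items()}
print(per_drug)                   # {'drug_a': 4, 'drug_b': 1, 'drug_c': 2}
print(per_effect.most_common(1))  # [('nausea', 3)]
```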
Interpretation of classifier performance based on AUC-ROC and F score (or any other point metric) in the presence of class imbalance.
A criterion requiring both a high F score and a high AUC-ROC provides a better measure of classification performance.
| F Score | AUC-ROC | Classification Performance |
|---|---|---|
| Low | Low | Bad performance; altering the threshold may help. |
| Low | High | Bad performance; performance may be satisfactory for extreme thresholds. |
| High | Low | Satisfactory performance for a specific threshold, but bad performance for other thresholds. |
| High | High | Good performance independent of threshold; a desirable criterion. |
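The "Low F score, High AUC-ROC" row can be reproduced with a toy example. The sketch below, assuming F1 as the F score and 0.5 as the default decision threshold (both assumptions, with hypothetical helper names), computes AUC-ROC as the probability that a random positive outscores a random negative, and F1 at a single threshold. On an imbalanced label where the classifier ranks perfectly but assigns low scores, AUC is 1.0 while F1 at 0.5 is 0.0; lowering the threshold recovers performance.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(random positive outscores random negative); ties count half.

    Assumes both classes are present. O(P*N), fine for illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 after binarising scores at one threshold (a point metric)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Imbalanced toy label: 2 positives, 8 negatives; ranking is perfect,
# but every score is below 0.5.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
s = [0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.04, 0.03, 0.02]
print(roc_auc(y, s))     # 1.0
print(f1_at(y, s, 0.5))  # 0.0
print(f1_at(y, s, 0.3))  # 1.0
```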
Fig 4. Comparison of the prediction performance of classifiers at different levels of coarse-graining.
Aggregation of side effects rooted in anatomy significantly improves classification performance compared to arbitrary aggregation. The Random Forest model yielded the best performance across the anatomical hierarchy.
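One way to build an "arbitrary aggregation" baseline is to shuffle side effects across groups while keeping each group's size fixed, so that any performance difference reflects anatomical coherence rather than group granularity. This record does not describe the authors' exact randomization procedure; the size-preserving shuffle, the `random_grouping` helper, and the group contents below are assumptions for illustration.

```python
import random
from typing import Dict, List


def random_grouping(groups: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Reassign side effects to groups at random, preserving each group's size.

    Hypothetical baseline: same number of groups and same sizes as the
    anatomical grouping, but membership is arbitrary.
    """
    effects = [e for members in groups.values() for e in members]
    rng = random.Random(seed)  # seeded for reproducibility
    rng.shuffle(effects)
    shuffled: Dict[str, List[str]] = {}
    start = 0
    for name, members in groups.items():
        shuffled[f"random_{name}"] = effects[start:start + len(members)]
        start += len(members)
    return shuffled


# Hypothetical anatomical groups (illustrative only).
anatomical = {
    "liver": ["hepatitis", "jaundice", "hepatomegaly"],
    "heart": ["tachycardia", "arrhythmia"],
}
baseline = random_grouping(anatomical, seed=42)
print({k: len(v) for k, v in baseline.items()})  # {'random_liver': 3, 'random_heart': 2}
```

Drug labels for the random groups can then be formed with the same OR-aggregation used for anatomical groups, and both label sets scored with the same classifier and metric.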
Fig 5. Comparison of the prediction performance of classifiers in terms of AUC score at different levels of the hierarchy.
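Reporting one AUC per hierarchy level requires averaging over labels. With rare labels, a cross-validation test fold may contain no positives for some label, leaving AUC undefined. The sketch below assumes an unweighted (macro) mean that skips such labels; this convention and the helper names are assumptions, since the record does not state how the authors averaged AUC.

```python
from typing import List, Optional, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Pairwise AUC-ROC; returns None when a label lacks positives or negatives."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(Y: List[List[int]], S: List[List[float]]) -> float:
    """Unweighted mean of per-label AUCs; Y[i][j], S[i][j] are drug i, label j."""
    per_label = []
    for j in range(len(Y[0])):
        auc = roc_auc([row[j] for row in Y], [row[j] for row in S])
        if auc is not None:  # skip labels where AUC is undefined
            per_label.append(auc)
    if not per_label:
        raise ValueError("no label has both positive and negative examples")
    return sum(per_label) / len(per_label)


Y = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]  # third label has no positives
S = [[0.9, 0.2, 0.1], [0.4, 0.5, 0.3], [0.1, 0.6, 0.2]]
print(macro_auc(Y, S))  # 0.75
```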
Fig 6. Comparison of the prediction performance of classifiers in terms of F2 score at different levels of the hierarchy.
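The F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision:

$$F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP}.$$

The abstract mentions a weighted performance measure; the sketch below assumes that per-label F2 scores are averaged with weights equal to each label's support (its number of positive examples), analogous to support-weighted averaging in common ML libraries. The weighting scheme and the helper names are assumptions, not the authors' stated method.

```python
from typing import List


def fbeta_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta > 1 favours recall over precision.

    Returns 0.0 when there are no positives and no predicted positives.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def weighted_fbeta(Y: List[List[int]], P: List[List[int]], beta: float = 2.0) -> float:
    """Per-label F-beta, averaged with weights = number of positive examples per label."""
    total, weight_sum = 0.0, 0
    for j in range(len(Y[0])):
        ys = [row[j] for row in Y]
        ps = [row[j] for row in P]
        tp = sum(1 for y, p in zip(ys, ps) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(ys, ps) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(ys, ps) if y == 1 and p == 0)
        support = tp + fn
        total += support * fbeta_counts(tp, fp, fn, beta)
        weight_sum += support
    return total / weight_sum if weight_sum else 0.0


Y = [[1, 0], [1, 1], [0, 0], [1, 0]]  # true labels, drug x label
P = [[1, 0], [0, 1], [0, 1], [1, 0]]  # binary predictions
print(round(weighted_fbeta(Y, P), 4))  # 0.744
```

Here label 0 has F2 = 10/14 with support 3 and label 1 has F2 = 5/6 with support 1, so the weighted mean is 125/168, or about 0.744.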
Fig 7. Enumeration of performance enhancement at different levels of coarse-graining as compared to random aggregation.