Gaurav Kandoi, Marcio L Acencio, Ney Lemke.
Abstract
The emergence of -omics technologies has allowed the collection of vast amounts of data on biological systems. Although the pace of such collection has been exponential, the impact of these data remains small on many critical biomedical applications such as drug development. Limited resources, high costs, and low hit-to-lead ratios have led researchers to search for more cost-effective methodologies. A possible alternative is to incorporate computational methods for predicting potential drug targets early in the drug discovery workflow. Computational methods based on systems approaches have the advantage of taking into account the global properties of a molecule, not only its sequence, structure, or function. Machine learning techniques are powerful tools that can extract relevant information from massive and noisy data sets. In recent years, the scientific community has explored the combined power of these fields to develop increasingly accurate and low-cost methods for proposing interesting drug targets. In this mini-review, we describe promising approaches based on the simultaneous use of systems biology and machine learning to assess gene and protein druggability. Moreover, we discuss the state of the art of this emerging and interdisciplinary field, including data sources, algorithms, and the performance of the different methodologies. Finally, we indicate interesting avenues of research and some remaining open challenges.
Keywords: drug targets; druggability; machine learning; network topology; review; sequence properties; structural properties; systems biology
Year: 2015 PMID: 26696900 PMCID: PMC4672042 DOI: 10.3389/fphys.2015.00366
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566
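Summary of the papers analyzed in this mini-review.

| Reference | Drug target source | Network/data source | Features | Machine learning method | Performance measures | Reported performance |
|---|---|---|---|---|---|---|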
| Zhu et al. | DrugBank | BioGRID | Connectivity degree, cluster coefficient, distance-based measures, topological coefficient | Support Vector Machine | AUC | AUC: 69.21% |
| Jeon et al. | DrugBank, Therapeutics Target Database | Bossi and Lehner | GARP score, RMA intensity, row chromosomal copy number, mutation occurrence and closeness centrality (combined or isolated) | SVM-recursive feature elimination (SVM-RFE) method for feature selection; SVM with RBF kernels for predictions | Accuracy, Specificity, AUC | Avg. accuracy: 91.69%; Avg. specificity: 91.91%; Avg. AUC: 78% (combined) |
| Li et al. | DrugBank | HIPPIE | Combination of various network distance-based measures and sequence features of proteins | Random Forest with minimum Redundancy Maximum Relevance (mRMR) feature selection | Accuracy, Sensitivity, Specificity, Precision, Matthews correlation coefficient | Accuracy: 87.05%; Sensitivity: 90.28%; Specificity: 83.83%; Precision: 84.82%; Matthews correlation coefficient: 0.7427 (avg. of 10 random samples) |
| Laenen et al. | PubChem, ChEMBL and BindingDB | STRING, GEO (Edgar et al.) | Combination of kernel and correlation diffusion and differential gene expression | Rank-based method | AUC | Kernel: 76–91%; Correlation: 89–92% |
| Emig et al. | Integrity | metaBase (Bureeva et al.) | Combination of neighborhood scoring, interconnectivity, network propagation, random walk and differential gene expression | Logistic regression model | AUC | AUC: 63.27–93.19% |
| Yao and Rzhetsky | DrugBank | HPRD | Combination of connectivity, betweenness, tissue expression entropy, constant corrected ratio of non-synonymous and synonymous mutations and functional family assignment | Naive Bayes, logistic regression, radial basis function (RBF) network, Bayesian networks | AUC | Naive Bayes: 70.43%; Logistic regression: 72.57%; RBF network: 60.93%; Bayesian network: 72.31% |
| Costa et al. | Yildirim et al. | BioGRID, DIP, HPRD, IntAct, MINT, MIPS-MPPI, TRED, human metabolic model Recon 1 | Combination of several network measures, tissue expression profile and subcellular localization | Decision tree-based meta-classifier | AUC, Recall, Precision | AUC: 82%; Recall: 78.2%; Precision: 74.8% |
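
The performance measures reported in the table (accuracy, sensitivity/recall, specificity, precision, Matthews correlation coefficient, and AUC) can be illustrated with a minimal Python sketch. The function names, the confusion-matrix counts, and the scores below are hypothetical and are not taken from any of the reviewed papers; the sketch assumes a binary target/non-target labeling and nonzero denominators.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


def auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy example: 45 known targets and 55 non-targets (made-up numbers).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"AUC: {auc([0.9, 0.8, 0.4], [0.7, 0.3]):.4f}")
```

The pairwise AUC above is quadratic in the number of examples, which is fine for illustration; larger data sets would normally use a rank-based computation instead.
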
Brief descriptions of the common terms used in this mini-review.

| Term | Description |
|---|---|
| Druggability | The property of a druggable molecule (i.e., a biological target) by virtue of which it elicits a favorable clinical response when it contacts a drug-like compound |
| Systems Biology | Study of complex biological systems using mathematical and computational modeling |
| Machine Learning | Subfield of computer science devoted to the development and utilization of algorithms that can learn from and make predictions on data |
| Network Measures | Numerical attributes used to describe the role and position of every node in a network |
| Ensemble algorithms | Collection of machine learning algorithms in which the final consensus prediction is made using results from each component algorithm |
| Support Vector Machines (SVM) | A model that maps the training data points into a feature space and then finds a hyperplane that separates the data into their respective classes |
| Decision Tree | Machine learning algorithms based on decision support tools that make use of a graph of conditions and their possible consequences |
| Random Forest | Ensemble learning algorithm that combines results from multiple decision trees and outputs the consensus prediction |
| Closeness Centrality | Network measure that indicates how close each node is to every other node in the network |
| Betweenness Centrality | Fraction of shortest paths between all pairs of nodes that pass through the given node |
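
To make the two centrality definitions above concrete, the following is a minimal Python sketch for an unweighted, undirected network stored as an adjacency dictionary. The toy graph and function names are hypothetical, the closeness variant shown considers only nodes reachable from the given node, and betweenness is computed with Brandes' algorithm and normalized by the number of node pairs; the reviewed papers may use other normalizations or tools.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(graph, node):
    """(Number of reachable nodes) / (sum of distances to them); 0 if isolated."""
    dist = bfs_distances(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness_centrality(graph):
    """Brandes' algorithm; returns, per node, the summed fraction of shortest
    paths through it, divided by the number of other node pairs."""
    n = len(graph)
    raw = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                raw[w] += delta[w]
    # Each unordered pair is visited twice; (n-1)(n-2) ordered pairs exclude v.
    scale = (n - 1) * (n - 2)
    return {v: (b / scale if scale > 0 else 0.0) for v, b in raw.items()}


# Hypothetical toy interaction network: a simple chain P1 - P2 - P3 - P4.
toy_graph = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]}
print({v: closeness_centrality(toy_graph, v) for v in toy_graph})
print(betweenness_centrality(toy_graph))
```

In this chain, the two inner nodes lie on every shortest path between the ends, so they receive higher values on both measures than the two end nodes.
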