Hashem Koohy.
Abstract
In an era of rapidly growing biological data, machine learning techniques are becoming more popular in the life sciences, including biology and medicine. This research note examines the rise and fall of the most commonly used machine learning techniques in the life sciences over the past three decades.
Keywords: deep neural network; hierarchical clustering; linear regression; machine learning; principal component; random forest; support vector machine; t-SNE
Year: 2017 PMID: 29375816 PMCID: PMC5760972 DOI: 10.12688/f1000research.13016.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Common Machine Learning Techniques in Life Sciences.
This table shows the 12 machine learning techniques whose popularity in the life sciences has been investigated in this study. Technical note: Supervised means that the model requires labeled training data to learn its parameters. A supervised model is used to predict future instances. An unsupervised model doesn’t require labeled training data and is used to detect patterns within a dataset. Dimensionality reduction models are used to project high-dimensional datasets into a lower-dimensional space where the new variables are more interpretable.
| Technique | Abbreviation | Category |
|---|---|---|
| Random Forest | RF | Supervised |
| Support Vector Machine | SVM | Supervised |
| Artificial Neural Network | ANN | Supervised |
| Deep Neural Network | DNN | Supervised & |
| Principal Component Analysis | PCA | Dimensionality reduction |
| Linear Regression | LR | Supervised |
| Markov Model | MM | Unsupervised |
| Decision Tree | DT | Supervised |
| Hierarchical Clustering | HC | Unsupervised |
| t-Distributed Stochastic Neighbor Embedding | t-SNE | Dimensionality reduction |
| Logistic Regression Model | LogReg | Supervised |
| Naïve Bayes Classifier | NBC | Supervised |
Figure 1. Cumulative usage of all 12 machine-learning techniques used in this manuscript.
Two separate linear regression models have been fitted to these data. The first covers the years 1990 to 2000. The second, which shows a threefold increase in slope, covers 2001 to 2017. The y-axis shows the number of hits per 100 publications.
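The two-period fit described in this caption can be sketched as follows. This is a minimal illustration, not the author's code: the helper names `fit_line` and `fit_two_periods` are hypothetical, and the series is synthetic, built with known slopes so the result can be checked by eye. It is not the data behind Figure 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_two_periods(years, values, split_year):
    """Fit one line to years <= split_year and a second line to later years."""
    early = [(x, y) for x, y in zip(years, values) if x <= split_year]
    late = [(x, y) for x, y in zip(years, values) if x > split_year]
    early_fit = fit_line([x for x, _ in early], [y for _, y in early])
    late_fit = fit_line([x for x, _ in late], [y for _, y in late])
    return early_fit, late_fit


# Synthetic, made-up series (NOT the paper's data): slope 0.1 up to 2000,
# slope 0.3 afterwards, chosen only so the two fits are easy to verify.
years = list(range(1990, 2018))
values = [
    1.0 + 0.1 * (y - 1990) if y <= 2000 else 2.0 + 0.3 * (y - 2000)
    for y in years
]

(early_slope, _), (late_slope, _) = fit_two_periods(years, values, split_year=2000)
slope_ratio = late_slope / early_slope
print(f"early slope={early_slope:.3f}, late slope={late_slope:.3f}, "
      f"ratio={slope_ratio:.2f}")
```

Fitting the two periods independently, rather than fitting a single continuous piecewise model, is an assumption here. The caption only states that two models were fitted.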
Figure 2. A: Trends of individual machine-learning techniques, with the y-axis showing hits per million. B: As in A, but without the two most heavily used techniques, Linear Regression and Principal Component Analysis, to make the usage of the less commonly used techniques that they overshadow easier to see.
Figure 3. An illustration of the popularity rate (PR) of all 12 techniques used in this manuscript.
The PR has been defined, for each model, as the difference in HPYM between each pair of consecutive years. These numbers have been further rescaled to vary only between -1 and 1.
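A rough sketch of this calculation is shown below. The caption does not say how the rescaling was done, so dividing by the largest absolute year-over-year difference is an assumption made here. It keeps the sign of each change and maps zero change to zero. The function name `popularity_rate` and the input values are hypothetical.

```python
def popularity_rate(hpym_by_year):
    """Year-over-year differences, scaled into [-1, 1].

    Assumption: scaling divides by the largest absolute difference,
    which preserves the sign of each change.
    """
    diffs = [b - a for a, b in zip(hpym_by_year, hpym_by_year[1:])]
    largest = max((abs(d) for d in diffs), default=0.0)
    if largest == 0:
        # Flat series (or fewer than two years): no change to report.
        return [0.0 for _ in diffs]
    return [d / largest for d in diffs]


# Made-up HPYM values for five consecutive years (illustration only).
example_hpym = [10.0, 14.0, 22.0, 20.0, 16.0]
rates = popularity_rate(example_hpym)
print(rates)
```

A min-max mapping of the differences onto [-1, 1] would also satisfy the caption, but it would shift the zero point, so a year with no change would not necessarily get a PR of 0.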