Sarul Malik, Rajesh Khadgawat, Sneh Anand, Shalini Gupta.
Abstract
Machine learning techniques such as logistic regression (LR), support vector machine (SVM) and artificial neural network (ANN) were used to detect fasting blood glucose levels (FBGL) in a mixed Indian population of healthy and diseased individuals. The occurrence of elevated FBGL was estimated non-invasively from an individual's salivary electrochemical parameters such as pH, redox potential, conductivity and the concentrations of sodium, potassium and calcium ions. The samples were obtained from 175 randomly selected volunteers, half of them healthy and half diabetic patients. The models were trained using 70% of the total data and tested on the remaining 30%. For each algorithm, data points were cross-validated by randomly shuffling them three times prior to implementing the model. The performance of each machine learning technique was reported in terms of four statistical parameters: accuracy, precision, sensitivity and F1 score. SVM with an RBF kernel showed the best performance for classifying high FBGLs, with approximately 85% accuracy, 84% precision, 85% sensitivity and 85% F1 score. This study was approved by the ethical committee of All India Institute of Medical Sciences, New Delhi, India (reference number: IEC/NP-278/01-08-2014, RP-29/2014).
Keywords: Artificial neural network; Diabetes; Logistic regression; Machine learning; Saliva; Support vector machine
Year: 2016 PMID: 27350930 PMCID: PMC4899397 DOI: 10.1186/s40064-016-2339-6
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Fig. 1 Algorithm applied for the detection of FBGL using salivary electrochemical parameters
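The evaluation protocol summarized in the abstract (shuffle the data, train on 70%, test on the rest, repeat three times) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins with two made-up features, the classifier is a plain gradient-descent logistic regression, and every function name, the learning rate and the epoch count are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_samples(n, rng):
    """Hypothetical stand-in data: two features per sample, label 1 = elevated FBGL."""
    samples = []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label else -1.0
        features = [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]
        samples.append((features, label))
    return samples


def train_logistic_regression(train, epochs=200, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(train[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in train:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    return w, b


def accuracy(model, test):
    """Fraction of test samples classified correctly at a 0.5 threshold."""
    w, b = model
    hits = 0
    for x, y in test:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        hits += (1 if p >= 0.5 else 0) == y
    return hits / len(test)


def evaluate(samples, n_shuffles=3, train_fraction=0.7, seed=0):
    """Shuffle, split 70/30, train and score; repeat n_shuffles times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(train_fraction * len(shuffled))
        model = train_logistic_regression(shuffled[:cut])
        scores.append(accuracy(model, shuffled[cut:]))
    return scores


# 175 synthetic samples mirror the study's sample size only; the values are invented.
samples = make_synthetic_samples(175, random.Random(42))
scores = evaluate(samples)
print([round(s, 2) for s in scores])
```

With 175 samples the split gives 122 training and 53 test points per shuffle. Swapping in an SVM or ANN would only change the training and prediction functions; the shuffle-and-split loop stays the same.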
Fig. 2 a Layout of the confusion matrix showing various statistical performance indices used for validating our model fitting process. b General description of the ROC performance
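The four indices reported in the abstract follow directly from the cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard indices from a 2x2 confusion matrix.

    tp/fn: diabetic samples predicted as diabetic/normal,
    fp/tn: normal samples predicted as diabetic/normal.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts for a 53-sample test set (illustration only).
metrics = classification_metrics(tp=22, fp=4, fn=4, tn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and recall are the same quantity, which is why the abstract lists sensitivity while the results table lists recall.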
Fig. 3 A sigmoidal probability distribution curve obtained by logistic regression fitting of the test data
Final output of the CPI parameters obtained after twenty iterations of linear logistic regression, ANN, linear- and RBF-SVM models. Computational parameters: accuracy, precision, recall and F1 score
| S. no. | Machine learning technique | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|
| 1 | Linear logistic regression | 75.86 ± 2.3 | 76.76 ± 3.8 | 75.48 ± 5.4 | 75.71 ± 2.6 |
| 2 | ANN | 80.7 ± 2.1 | 81.2 ± 1.7 | 79.3 ± 3.4 | 80.2 ± 2.2 |
| 3 | Linear-SVM | 77.93 ± 2.7 | 77.59 ± 3.5 | 79.43 ± 4.7 | 78.11 ± 2.7 |
| 4 | RBF-SVM | 84.09 ± 2.8 | 83.75 ± 3.3 | 84.92 ± 4.5 | 84.06 ± 2.9 |
Fig. 4 a The ROC plots for the linear logistic regression model. The coordinates for the normal and diabetic populations were (0.69, 0.16) and (0.82, 0.31), respectively. b The ROC plots for the ANN model, with closely placed coordinates at (0.84, 0.2) for the normal class and (0.8, 0.16) for the diabetic class. c The ROC plots for the linear-SVM model. d The ROC plots for the RBF-SVM model. The coordinates for the curves were (0.72, 0.16) for the normal class and (0.84, 0.28) for the diabetic class in linear-SVM, and (0.8, 0.1) for the normal class and (0.9, 0.2) for the diabetic class in RBF-SVM. The RBF-SVM ROC coordinates were closer to (1, 0), suggesting a better fit than the linear model
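The "closer to (1, 0)" comparison in the caption can be made concrete by measuring the Euclidean distance from each reported coordinate to the ideal corner. The sketch below assumes each pair is ordered as (sensitivity, false-positive rate), which is what the (1, 0) reference point implies; the dictionary layout and function names are illustrative, and only the coordinates come from the caption.

```python
import math

# Coordinates copied from the Fig. 4 caption, assumed to be
# (sensitivity, false-positive rate) pairs.
roc_points = {
    "Linear logistic regression": {"normal": (0.69, 0.16), "diabetic": (0.82, 0.31)},
    "ANN": {"normal": (0.84, 0.2), "diabetic": (0.8, 0.16)},
    "Linear-SVM": {"normal": (0.72, 0.16), "diabetic": (0.84, 0.28)},
    "RBF-SVM": {"normal": (0.8, 0.1), "diabetic": (0.9, 0.2)},
}


def distance_to_ideal(point, ideal=(1.0, 0.0)):
    """Euclidean distance from an ROC coordinate to the ideal corner."""
    return math.hypot(point[0] - ideal[0], point[1] - ideal[1])


def mean_distance(points_by_class):
    """Average distance to the ideal corner over the reported classes."""
    distances = [distance_to_ideal(p) for p in points_by_class.values()]
    return sum(distances) / len(distances)


mean_distances = {model: mean_distance(pts) for model, pts in roc_points.items()}
for model, dist in sorted(mean_distances.items(), key=lambda item: item[1]):
    print(f"{model}: {dist:.3f}")
```

Under this assumption, a smaller distance means the operating point combines higher sensitivity with a lower false-positive rate, so ranking the models by distance expresses the caption's comparison numerically.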