Andrew Maxwell¹, Runzhi Li², Bei Yang², Heng Weng³, Aihua Ou⁴, Huixiao Hong⁵, Zhaoxian Zhou¹, Ping Gong⁶, Chaoyang Zhang⁷.
Abstract
BACKGROUND: Multi-label classification remains a challenging problem. Because of the complexity of the data, it can be difficult to infer information about classes that are not mutually exclusive. In medical data, a patient may show symptoms of several diseases at the same time, so it is important to develop tools that help identify problems early. Intelligent health risk prediction models built with deep learning architectures give physicians a powerful tool for identifying patterns in patient data that indicate risks associated with certain chronic diseases.
Keywords: Deep learning; Deep neural networks; Intelligent health risk prediction; Medical health records; Multi-label classification
Year: 2017 PMID: 29297288 PMCID: PMC5751777 DOI: 10.1186/s12859-017-1898-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Results of the classifiers on the single-label, multi-class dataset
| Algorithm | Accuracy (%) | Precision | Recall | F-Score |
|---|---|---|---|---|
| LibSVM | 49.89 | 0.422 | 0.499 | 0.416 |
| MLP | 74.94 | 0.744 | 0.749 | 0.744 |
| SMO | 69.67 | 0.691 | 0.697 | 0.670 |
| J48 | 77.26 | 0.771 | 0.773 | 0.771 |
| DNN | 71.10 | 0.757 | 0.711 | 0.726 |
| RF | 81.51 | 0.810 | 0.815 | 0.808 |
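In every row of the table above, the recall column matches the accuracy (for example, 0.499 against 49.89% for LibSVM). That is exactly what support-weighted averaging of per-class recall produces, and it is how Weka reports these figures; the source does not state its averaging scheme, so treat this as an inference. A minimal sketch of support-weighted precision, recall, and F-score in plain Python, using made-up labels (the function name `weighted_metrics` is hypothetical):

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus precision, recall, and F-score averaged over classes,
    each class weighted by its support (number of true instances)."""
    n = len(y_true)
    support = Counter(y_true)
    precision = recall = f_score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        p_c = tp / predicted_c if predicted_c else 0.0
        r_c = tp / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = n_c / n  # class weight = share of true instances
        precision += w * p_c
        recall += w * r_c
        f_score += w * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return accuracy, precision, recall, f_score


# Hypothetical toy labels using the class names from Fig. 1.
y_true = ["N", "N", "H", "H", "D", "FL"]
y_pred = ["N", "H", "H", "H", "FL", "FL"]
print(weighted_metrics(y_true, y_pred))
```

Weighted recall reduces to (sum of true positives) / n, which is accuracy. The weighted F-score, by contrast, averages per-class F-scores, so it generally differs from the harmonic mean of the weighted precision and recall.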
Results of the classifiers on the multi-label dataset
| Base Classifier | Accuracy (%) | Precision | Recall | F-Score |
|---|---|---|---|---|
| RAkEL-LibSVM | 59.47 | 0.697 | 0.603 | 0.630 |
| RAkEL-MLP | 81.63 | 0.854 | 0.838 | 0.837 |
| RAkEL-SMO | 59.47 | 0.697 | 0.603 | 0.630 |
| RAkEL-J48 | 83.64 | 0.864 | 0.865 | 0.856 |
| RAkEL-RF | 85.67 | 0.884 | 0.880 | 0.874 |
| MLkNN | 51.03 | 0.602 | 0.530 | 0.547 |
| DNN | 92.07 | 0.915 | 0.867 | 0.823 |
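The source does not define how accuracy, precision, recall, and F-score are computed for the multi-label results. One widely used family is the example-based metrics: per-record accuracy as the Jaccard overlap between true and predicted label sets, plus per-record precision, recall, and F-score, each averaged over records. The sketch below assumes those definitions, plus a hypothetical convention for Normal records with empty label sets; it is an illustration, not necessarily the paper's computation. (For the DNN row, the reported F-score of 0.823 is below the harmonic mean of its precision and recall, about 0.89, which suggests the F-score is averaged per record or per label rather than derived from the averaged precision and recall.)

```python
def example_based_metrics(true_sets, pred_sets):
    """Example-based multi-label metrics averaged over records.

    Assumed convention: a record whose true and predicted label sets are
    both empty (e.g. a Normal record predicted as Normal) scores 1.0.
    """
    n = len(true_sets)
    acc = prec = rec = f1 = 0.0
    for t, p in zip(true_sets, pred_sets):
        inter = len(t & p)
        union = len(t | p)
        acc += inter / union if union else 1.0
        prec += inter / len(p) if p else (1.0 if not t else 0.0)
        rec += inter / len(t) if t else (1.0 if not p else 0.0)
        denom = len(t) + len(p)
        f1 += 2 * inter / denom if denom else 1.0
    return acc / n, prec / n, rec / n, f1 / n


# Hypothetical records over Hypertension (H), Diabetes (D), Fatty Liver (FL).
true_sets = [{"H"}, {"H", "D"}, set(), {"FL"}]
pred_sets = [{"H"}, {"H"}, set(), {"D", "FL"}]
print(example_based_metrics(true_sets, pred_sets))
```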
Fig. 1 The distribution of physical examination records for chronic diseases. The chronic diseases are Fatty Liver (FL), Diabetes (D), Hypertension (H), and combinations of these diseases (DFL, HFL, HD, HDFL); records with none of these diseases are classified as Normal (N)
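In Fig. 1 each combination of diseases appears as its own category, which is one natural reading of the single-label, multi-class dataset. For a multi-label view, each record can instead carry one binary indicator per disease. A minimal sketch of that conversion, assuming the indicator order Hypertension, Diabetes, Fatty Liver used for Classes 0 to 2 in Figs. 4 and 5; the function name `to_indicator` is hypothetical:

```python
# Assumed indicator order: Class 0 = Hypertension, 1 = Diabetes, 2 = Fatty Liver.
DISEASES = ("H", "D", "FL")


def to_indicator(label: str) -> list[int]:
    """Convert a combined label such as 'HDFL' or 'N' into a 0/1 vector over DISEASES."""
    if label == "N":
        return [0] * len(DISEASES)  # Normal: no disease present
    present = set()
    i = 0
    while i < len(label):
        if label.startswith("FL", i):  # two-letter code must be checked first
            present.add("FL")
            i += 2
        elif label[i] in ("H", "D"):
            present.add(label[i])
            i += 1
        else:
            raise ValueError(f"unrecognized label: {label!r}")
    return [int(d in present) for d in DISEASES]


for label in ["N", "H", "D", "FL", "HD", "DFL", "HFL", "HDFL"]:
    print(label, to_indicator(label))
```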
Fig. 2 Performance comparison of activation functions. The sigmoid and ReLU activation functions are compared in the DNN architecture
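For reference, the two activation functions compared in Fig. 2 are sigmoid(x) = 1 / (1 + e^(-x)) and ReLU(x) = max(0, x). The short sketch below (plain Python, not the paper's code) evaluates both along with their derivatives. It shows the usual argument for ReLU in deeper networks: the sigmoid's gradient never exceeds 0.25 and shrinks toward zero for large |x|, while ReLU's gradient is exactly 1 for every positive input.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable form: avoids overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # the subgradient at x = 0 is taken as 0


for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f} (grad {sigmoid_grad(x):.4f})  "
          f"relu={relu(x):.1f} (grad {relu_grad(x):.1f})")
```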
Fig. 3 A comparison of adding hidden layers to the MLP. The hyperparameters are: 1000 epochs, a learning rate of 0.1, 35 units per hidden layer, 1 to 10 hidden layers, and dropout ranging from none, to a single dropout layer, to dropout on every layer. These parameters were chosen because they gave the best overall performance for an MLP with 1 or 2 layers
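Fig. 3 varies the number of hidden layers (1 to 10, with 35 units each) and where dropout is applied (nowhere, on one layer, or on every layer). The sketch below builds such a stack and runs a forward pass in plain Python. It illustrates the architecture family only and is not the paper's implementation: the input width, the dropout rate of 0.5, placing the single dropout layer after the first hidden layer, ReLU hidden units, He-style initialization, and three sigmoid outputs (one per disease) are all hypothetical choices, as are the names `build_mlp` and `forward`.

```python
import math
import random


def _sigmoid(v):
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def build_mlp(n_inputs, n_hidden_layers, units=35, n_outputs=3, dropout="none", seed=0):
    """Return a list of (weights, biases, use_dropout) tuples, one per layer.

    dropout: "none", "one" (after the first hidden layer only), or "all".
    """
    rng = random.Random(seed)
    sizes = [n_inputs] + [units] * n_hidden_layers + [n_outputs]
    layers = []
    for i in range(len(sizes) - 1):
        fan_in, fan_out = sizes[i], sizes[i + 1]
        weights = [[rng.gauss(0.0, (2.0 / fan_in) ** 0.5) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        is_hidden = i < n_hidden_layers
        use_dropout = is_hidden and (dropout == "all" or (dropout == "one" and i == 0))
        layers.append((weights, biases, use_dropout))
    return layers


def forward(layers, x, train=False, rate=0.5, seed=0):
    """Forward pass; dropout is only active when train=True."""
    rng = random.Random(seed)
    for idx, (weights, biases, use_dropout) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in z]  # ReLU on hidden layers (assumed)
            if train and use_dropout:
                # Inverted dropout: drop with probability `rate`, rescale survivors.
                x = [0.0 if rng.random() < rate else v / (1.0 - rate) for v in x]
        else:
            x = [_sigmoid(v) for v in z]  # independent per-label probabilities
    return x


layers = build_mlp(n_inputs=10, n_hidden_layers=4, dropout="one")
print([round(p, 3) for p in forward(layers, [0.1] * 10)])
```

Sigmoid outputs (rather than a softmax) let each disease be predicted independently, which is what a multi-label setting requires.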
DNN results on the multi-label data for different numbers of units per layer
| Units Per Layer | Accuracy (%) | Precision | Recall | F-Score |
|---|---|---|---|---|
| 35 | 92.07 | 0.915 | 0.867 | 0.823 |
| 256 | 91.34 | 0.919 | 0.854 | 0.798 |
| 512 | 91.80 | 0.917 | 0.865 | 0.819 |
Fig. 4 The Precision-Recall (PR) curve for the testing dataset, which contained 10% of the data (11,030 instances). Class 0 is Hypertension, Class 1 is Diabetes, and Class 2 is Fatty Liver
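Each curve in a per-class PR plot like Fig. 4 treats one disease as a binary (one-vs-rest) problem. A minimal sketch of how the points of such a curve can be computed from per-record scores for one label, by lowering the decision threshold one record at a time; the scores and labels are made up, the function name `pr_curve` is hypothetical, and tied scores are not merged:

```python
def pr_curve(y_true, scores):
    """Return (recall, precision) points for one label as the threshold is lowered.

    y_true holds 0/1 labels and must contain at least one positive.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    points = []
    for _, label in pairs:
        # Each step admits one more record as "predicted positive".
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Hypothetical scores for one class (e.g. Class 0, Hypertension).
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
for recall, precision in pr_curve(y_true, scores):
    print(f"recall={recall:.3f}  precision={precision:.3f}")
```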
Fig. 5 The Receiver Operating Characteristic (ROC) curve for the testing data. Class 0 is Hypertension, Class 1 is Diabetes, and Class 2 is Fatty Liver
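A per-class ROC curve is built the same way but plots the true positive rate against the false positive rate, and is often summarized by the area under it (AUC). A minimal one-vs-rest sketch with a trapezoidal AUC, assuming made-up scores for a single label; the function name `roc_auc` is hypothetical, and tied scores are not merged:

```python
def roc_auc(y_true, scores):
    """ROC points (FPR, TPR) for one label and the trapezoidal area under them.

    y_true holds 0/1 labels and must contain both classes.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


points, auc = roc_auc([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
print(points)
print(f"AUC = {auc:.4f}")
```

The AUC also equals the fraction of (positive, negative) pairs in which the positive record receives the higher score; here 4 of the 6 pairs are ordered correctly, giving 2/3.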