Automatic disease diagnosis using optimised weightless neural networks for low-power wearable devices.

Ramalingaswamy Cheruku¹, Damodar Reddy Edla¹, Venkatanareshbabu Kuppili¹, Ramesh Dharavath², Nareshkumar Reddy Beechu³.

Abstract

Low-power wearable devices for disease diagnosis can be used anytime and anywhere. They are non-invasive and pain-free, which improves the quality of life. However, these devices are resource constrained in terms of memory and processing capability: the memory constraint limits the number of patterns they can store, and the processing constraint delays their response. Designing a robust and highly accurate classification system under these constraints is challenging. To resolve this problem, this Letter proposes a novel architecture for weightless neural networks (WNNs). It uses variable sized random access memories to optimise memory usage and a modified binary TRIE data structure to reduce test time. In addition, a bio-inspired genetic algorithm is employed to improve accuracy. The proposed architecture is evaluated on various disease datasets using both software and hardware realisations. The experimental results show that it achieves better accuracy, memory saving and test time than standard WNNs, and higher accuracy than conventional neural network-based classifiers. The proposed architecture can therefore serve as a powerful component of many low-power wearable devices, addressing their memory, accuracy and time issues.

Keywords: automatic disease diagnosis; bioinspired based genetic algorithm; biology computing; diseases; genetic algorithms; low power wearable devices; memory constraint; modified binary TRIE data structure; neural nets; noninvasive devices; optimised weightless neural networks; pain free devices; patient diagnosis; quality of life; random-access storage; variable sized random access memories

Year: 2017 PMID: 28868148 PMCID: PMC5569931 DOI: 10.1049/htl.2017.0003

Source DB: PubMed Journal: Healthc Technol Lett ISSN: 2053-3713

Introduction

In the present era, machine learning techniques have been used by several researchers in the medical domain to diagnose various diseases. The literature describes many wearable diagnostic devices, such as Gluco Track [1], Dexcom G5 [2], QuantuMDx [3], Gluco Beam [4] and GeneXpert [5]. These devices are non-invasive and pain-free, which improves the quality of life [4]. However, it is hard to embed machine learning techniques in low-power hardware devices because of their memory, power and speed constraints. In this direction, Bledsoe and Browning invented the weightless neural network (WNN) in 1959 [6]. It is a breakthrough among neural network techniques suitable for hardware implementation [7]. The major advantages of this kind of network are its ease of implementation and its ability to learn in a single iteration. WNNs, also called n-tuple classifiers, offer fast training and testing. A standard multi-layer feed-forward neural network (MLFFNN) stores knowledge in the form of network weights, whereas a WNN stores knowledge in random access memory (RAM). Since Bledsoe and Browning's first version, various improved WNNs [8] have been developed by many researchers. All these WNN implementations share the following properties: (i) the interconnections in a WNN do not carry weights; (ii) a WNN can only take binary inputs; and (iii) the knowledge of the network is stored in binary look-up tables (LUTs). In a simple WNN implementation, a RAM is used as a LUT. The synapses (address lines) of each neuron (RAM) are supplied with a binary bit string from the input pattern, and this string is used as the address into the RAM. In the training phase, the desired output is stored at that address. In the testing phase, an unseen binary test pattern is supplied as the address to retrieve the previously learned contents from the RAM.
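As a minimal sketch of the single-RAM procedure just described, the hypothetical `RamNeuron` class below (our own name, not from the Letter) uses the n-bit input string directly as an address into a list of 2^n stored outputs. It is illustrative only and assumes the desired output is a small integer such as a class bit.

```python
class RamNeuron:
    """Hypothetical single-RAM WNN neuron used as a look-up table."""

    def __init__(self, n_address_bits: int) -> None:
        self.n = n_address_bits
        # A RAM with n address lines has 2**n locations, so its size
        # grows exponentially with the number of input bits.
        self.memory = [0] * (2 ** n_address_bits)

    def _address(self, bits: str) -> int:
        # The binary input string itself is the RAM address.
        assert len(bits) == self.n, "pattern length must match address width"
        return int(bits, 2)

    def train(self, bits: str, desired_output: int) -> None:
        # Single-pass training: write the desired output at that address.
        self.memory[self._address(bits)] = desired_output

    def test(self, bits: str) -> int:
        # Testing: read back whatever was learned at that address.
        return self.memory[self._address(bits)]


neuron = RamNeuron(4)
neuron.train("1011", 1)
print(neuron.test("1011"), neuron.test("0000"), len(neuron.memory))  # prints: 1 0 16
```

Widening the address from 4 to 8 bits would grow the list from 16 to 256 entries, which illustrates the exponential growth that motivates WiSARD.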
These training and testing procedures show that both training and testing in WNNs are performed in a single iteration. They also show that the neuron (RAM) size grows exponentially with the size of the input vector, because a RAM with n address lines has $2^n$ locations. To resolve this memory problem of the WNN proposed by Bledsoe and Browning, Wilkes and co-workers [9] proposed the WiSARD WNN. WiSARD is identical to the simplest WNN except that the input pattern is partitioned into multiple segments. Every WiSARD WNN is made up of several RAM-discriminators, and each RAM-discriminator consists of Y one-bit word RAMs. Each one-bit RAM receives an n-bit portion of the binary input pattern. Along with the Y one-bit word RAMs, each RAM-discriminator also contains a summing device (Σ). The number of RAM-discriminators equals the number of distinct classes in the dataset. In the training phase, a binary pattern of Y*n bits is partitioned randomly into Y equal-sized segments, and these segments are used as RAM addresses to store the desired output. In the testing phase, the binary input pattern is supplied to every discriminator, and each discriminator responds with its number of matches. These responses are evaluated according to the majority voting principle. Schematic representations of WiSARD and of a RAM-discriminator are shown in Fig. 1 [7].
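A compact sketch of WiSARD training and testing follows, under our own assumptions: one random input permutation shared by all discriminators, sparse one-bit RAMs represented as Python sets holding the addresses whose bit is 1, and the winning class taken as the discriminator with the largest summed response. Class and method names are hypothetical.

```python
import random
from typing import Dict, List, Set


class Discriminator:
    """One RAM-discriminator: Y one-bit RAMs, each addressed by n bits."""

    def __init__(self, mapping: List[int], n: int) -> None:
        self.mapping = mapping  # random permutation of input bit positions
        self.n = n
        self.y = len(mapping) // n
        # Sparse one-bit RAMs: each set holds the addresses whose bit is 1.
        self.rams: List[Set[str]] = [set() for _ in range(self.y)]

    def _segments(self, pattern: str) -> List[str]:
        shuffled = "".join(pattern[i] for i in self.mapping)
        return [shuffled[k * self.n:(k + 1) * self.n] for k in range(self.y)]

    def train(self, pattern: str) -> None:
        for ram, address in zip(self.rams, self._segments(pattern)):
            ram.add(address)  # write 1 at this address

    def response(self, pattern: str) -> int:
        # Summing device: number of RAMs that output 1 for this pattern.
        return sum(addr in ram for ram, addr in zip(self.rams, self._segments(pattern)))


class WiSARD:
    def __init__(self, input_bits: int, n: int, classes: List[str], seed: int = 0) -> None:
        assert input_bits % n == 0, "input length must be Y*n"
        mapping = list(range(input_bits))
        random.Random(seed).shuffle(mapping)
        # One discriminator per class, all sharing the same mapping.
        self.discriminators: Dict[str, Discriminator] = {
            c: Discriminator(mapping, n) for c in classes
        }

    def train(self, pattern: str, label: str) -> None:
        self.discriminators[label].train(pattern)

    def predict(self, pattern: str) -> str:
        return max(self.discriminators,
                   key=lambda c: self.discriminators[c].response(pattern))


w = WiSARD(input_bits=8, n=2, classes=["neg", "pos"])
w.train("11110000", "pos")
w.train("00001111", "neg")
print(w.predict("11110001"))  # prints: pos
```

Each discriminator here holds Y = 4 RAMs of 2^2 locations instead of one RAM of 2^8, which is the memory saving WiSARD trades against generalisation.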

Fig. 1

Illustration of both RAM-discriminator and WiSARD WNN

As shown in Fig. 1, WiSARD overcomes this limitation of simple WNNs by dividing the Y*n-sized input vector into Y segments. As a result, the memory required by each discriminator is reduced from $2^{Y n}$ to $Y \cdot 2^{n}$ bits. On the other hand, this partitioning lowers the generalisation capability of WiSARD, so applications using WiSARD have performed worse than those using the virtual generalising RAM (VG-RAM) WNN [10, 11]. Nevertheless, WiSARD has been used to solve problems in automatic video surveillance [12], robotics [13], 3D video animation [14] and text categorisation [15]. The VG-RAM WNN is a different type of WNN, in which the memory size of each neuron is proportional to the size of the training set [10, 16]. During training, VG-RAM neurons store each binary input pattern along with its associated class information. During testing, each VG-RAM neuron searches for the closest learned pattern using a distance measure such as the Hamming distance, and the class of the closest stored pattern is the neuron's output. This search is sequential and requires scanning each neuron's whole memory, which is costly in time when there are many training patterns. Moreover, the memory size of each VG-RAM neuron grows with the number of training patterns. Despite these limitations, VG-RAM WNNs are used in many applications, such as face recognition [17], text categorisation [18] and traffic sign detection [18]. To deal with these problems, this Letter proposes a novel architecture for WNNs. It uses variable sized RAMs (neurons) to optimise memory usage and a modified binary TRIE data structure to reduce test time [19, 20]. In addition, a genetic algorithm (GA) is used to improve the accuracy of WNNs by optimising the mapping function [21].
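The sequential nearest-pattern search of a VG-RAM neuron can be sketched as follows. This is a hypothetical illustration that assumes the Hamming distance and breaks ties by storage order; it makes the two costs discussed above visible: memory that grows with every stored pair, and a test step that scans the whole memory.

```python
from typing import List, Tuple


class VGRamNeuron:
    """Hypothetical VG-RAM neuron: stores every (binary pattern, class) pair."""

    def __init__(self) -> None:
        # Memory grows linearly with the number of training patterns.
        self.memory: List[Tuple[str, str]] = []

    def train(self, bits: str, label: str) -> None:
        self.memory.append((bits, label))

    @staticmethod
    def hamming(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))

    def test(self, bits: str) -> str:
        # Sequential scan over all stored patterns; min() keeps the first
        # entry among equally close patterns.
        _, label = min(self.memory, key=lambda entry: self.hamming(bits, entry[0]))
        return label


neuron = VGRamNeuron()
neuron.train("1100", "A")
neuron.train("0011", "B")
print(neuron.test("1110"))  # prints: A
```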

Proposed architecture

The proposed architecture focuses on improving classification accuracy, reducing memory usage and reducing test time. To accomplish these tasks, different techniques are employed, as explained in Sections 2.1–2.3. The proposed architecture is shown in Fig. 2. The input binary pattern is mapped to a set of VG-RAM neurons using a mapping function chosen to improve accuracy. During training, the patterns in each VG-RAM neuron are managed with a TRIE data structure (for simplicity, the diagram shows only the prefixes and per-class access counts in each TRIE node, not the class information). This saves memory and speeds up access to the contents. During testing, each neuron outputs a class label, and the final class is determined by majority voting.
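The final majority-voting step can be sketched as below. Representing a neuron's output as a (class, per-class access counts) pair and breaking vote ties on summed access counts are our assumptions, loosely following the Letter's remark that access counts help resolve ties; `combine` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical neuron output: (predicted class, per-class access counts).
NeuronOutput = Tuple[str, Dict[str, int]]


def combine(outputs: List[NeuronOutput]) -> str:
    votes = Counter(label for label, _ in outputs)
    access: Counter = Counter()
    for _, counts in outputs:
        access.update(counts)
    # Majority vote over neuron labels; when classes tie on votes, prefer
    # the class with the larger summed access count (assumed rule).
    return max(votes, key=lambda c: (votes[c], access[c]))


print(combine([("pos", {"pos": 2}), ("neg", {"neg": 7})]))  # prints: neg
```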

Fig. 2

Proposed architecture for the classification task

Memory reduction using variable sized VG-RAM (VVG-RAM) neuron

A VG-RAM neuron stores both the patterns and their class information, so its size is proportional to the training set. Moreover, the number of VG-RAM neurons needed is usually not determined in advance. In this Letter, VVG-RAMs are proposed to optimise memory usage. The size of each VVG-RAM neuron is determined by the range of values that the features can take, and the number of such neurons is determined by the dimensionality of the data. The motivation behind this approach is that every dataset contains some repetitive subpatterns. The VVG-RAM neuron is constructed by eliminating these repetitions, so that each subpattern is stored only once. The proposed VVG-RAM neuron keeps extra information about the repeated subpatterns in the form of access counts (see Fig. 3). This extra information plays a vital role in the decision process, especially in the case of ties. The architectures of the VG-RAM and VVG-RAM neurons are shown in Figs. 3a and b, respectively. Further, the proposed architecture employs the binary TRIE data structure [19, 20] to manage the patterns inside each VVG-RAM neuron.
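The idea of storing each repeated subpattern once, together with per-class access counts, can be sketched with a dictionary keyed by subpattern. This is a simplified, hypothetical illustration using exact matching only (the Letter manages these entries with a binary TRIE instead), and `VVGRamNeuron` is our own name.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict


class VVGRamNeuron:
    """Hypothetical variable-sized neuron: one entry per distinct subpattern."""

    def __init__(self) -> None:
        self.table: DefaultDict[str, Counter] = defaultdict(Counter)

    def train(self, subpattern: str, label: str) -> None:
        # A repeated subpattern only increments its per-class access count.
        self.table[subpattern][label] += 1

    def counts(self, subpattern: str) -> Dict[str, int]:
        # .get avoids inserting empty entries for unseen subpatterns.
        return dict(self.table.get(subpattern, Counter()))

    def size(self) -> int:
        # Grows with the number of distinct subpatterns, not training patterns.
        return len(self.table)


neuron = VVGRamNeuron()
for sub, label in [("01", "A"), ("01", "A"), ("01", "B"), ("10", "B"), ("10", "B")]:
    neuron.train(sub, label)
print(neuron.size(), neuron.counts("01"))  # prints: 2 {'A': 2, 'B': 1}
```

Five training subpatterns occupy two entries here, whereas a VG-RAM neuron would store all five.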

Fig. 3

Comparison of VG-RAM and VVG-RAM neuron structures

a VG-RAM neuron

b Proposed VVG-RAM neuron

Performance improvement using GA

The mapping function maps the input pattern to the synapses of the VVG-RAM neurons such that each synapse is mapped to exactly one neuron. To improve network performance, an optimal mapping between the input pattern and the neuron synapses has to be defined. For an N-bit binary pattern, the number of possible mappings grows combinatorially with N, and selecting an optimal or near-optimal mapping by exhaustive search is an NP-hard problem. Hence, this combinatorial optimisation problem is solved with a standard GA that maximises an objective function. The GA parameters have to be tuned properly to obtain an optimal or near-optimal solution, and their values are dataset specific. Sensitivity focuses only on positive-class predictions and captures no information about how well the WNN handles negative-class cases; similarly, specificity focuses only on negative-class predictions and captures no information about how well the WNN handles positive cases. To balance positive and negative class predictions, the objective function is defined as the geometric mean of sensitivity and specificity [22]:

$$f = \sqrt{\text{Sensitivity} \times \text{Specificity}} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \qquad (1)$$

where TP (true positives) is the number of positive-class records that the WNN predicts as positive, TN (true negatives) is the number of negative-class records that the WNN predicts as negative, FP (false positives) is the number of negative-class records that the WNN incorrectly classifies as positive, and FN (false negatives) is the number of positive-class records that the WNN incorrectly classifies as negative [23].
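The geometric-mean objective is straightforward to compute from confusion-matrix counts. The sketch below is illustrative; returning 0 when a class is absent from the test fold is our assumption. The example uses the PID class sizes (500 negative, 268 positive) to show why this measure is preferred over plain accuracy for imbalanced data: a trivial classifier that predicts every record as negative is about 65% accurate but scores 0.

```python
import math


def g_mean(tp: int, tn: int, fp: int, fn: int) -> float:
    """Geometric mean of sensitivity and specificity, as in (1)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# A trivial classifier that labels all 768 PID records as negative.
tp, tn, fp, fn = 0, 500, 0, 268
print(round((tp + tn) / (tp + tn + fp + fn), 3), g_mean(tp, tn, fp, fn))  # prints: 0.651 0.0
```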

Faster neuron memory search with modified TRIE data structure

In standard VG-RAM networks, measuring the closeness of a test pattern to the patterns stored in a neuron requires computing distances sequentially, which increases test time. Because hashing is well suited to search and insertion, Forechi et al. [24] used a hash table to reduce test time. However, hash tables are not an efficient solution here, because the chance of collisions increases as the number of entries grows. Moreover, an effective hash function must be designed to handle these collisions, which adds computational cost. Hence, to address these problems, this Letter proposes a modified version of the binary TRIE data structure, shown in Fig. 4.

Fig. 4

Modified binary TRIE data structure

According to Fig. 4, every node of the binary TRIE stores a pattern prefix and the class information, along with the access count of each class accumulated during training. During testing, the TRIE finds the longest prefix match for the test pattern. The output of each neuron is the class information, along with each class's access count, associated with the longest prefix match. In case of a tie, the per-class access counts are used to determine the class of the test pattern on a majority basis. Unlike a standard VG-RAM neuron, the proposed VVG-RAM neuron requires no distance computations to measure similarity, and unlike a hash-based neuron, it requires no hash function calculations. Hence, the proposed VVG-RAM neuron is computationally more efficient than both standard VG-RAM and hash-based neurons.
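A minimal sketch of the modified binary TRIE follows, under our reading of the description: every node keeps per-class access counts of the training patterns whose prefix passes through it, testing walks down to the longest matching prefix, and the neuron outputs the majority class of the counts at that node. Class names are hypothetical, and the Letter's exact node layout may differ. Note that a lookup costs at most one step per input bit, with no distance or hash computations.

```python
from collections import Counter
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Per-class access counts of training patterns through this prefix.
        self.counts: Counter = Counter()


class BinaryTrieNeuron:
    def __init__(self) -> None:
        self.root = TrieNode()

    def train(self, bits: str, label: str) -> None:
        node = self.root
        node.counts[label] += 1
        for b in bits:
            node = node.children.setdefault(b, TrieNode())
            node.counts[label] += 1

    def longest_prefix_counts(self, bits: str) -> Counter:
        # Walk down while the test pattern matches a stored prefix.
        node = self.root
        for b in bits:
            nxt = node.children.get(b)
            if nxt is None:
                break
            node = nxt
        return node.counts

    def test(self, bits: str) -> Optional[str]:
        counts = self.longest_prefix_counts(bits)
        if not counts:
            return None  # nothing has been trained yet
        # Majority class among the access counts at the deepest match.
        return counts.most_common(1)[0][0]


neuron = BinaryTrieNeuron()
for bits, label in [("1100", "A"), ("1101", "A"), ("1010", "B")]:
    neuron.train(bits, label)
print(neuron.test("1111"), neuron.test("1011"))  # prints: A B
```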

Results

Experimental setup

All software and hardware experiments are performed on an Intel(R) Core i7 processor running at 3.60 GHz with 8 GB RAM. Three categories of datasets, namely more data with low dimensionality (MDLD), small data with high dimensionality (SDHD) and more data with high dimensionality (MDHD), are chosen from the UCI machine learning repository [25] to validate the proposed WNN. The datasets used in the experiments are listed in Table 1 and are partitioned according to the 10-fold cross-validation (10-FCV) method [26]. In 10-FCV, every dataset is partitioned into ten equal folds. In each round, nine folds form the training set and the remaining fold is used as the test set. During the training phase, the training set is used to store contents in the RAM nodes (neurons). During the testing phase, the test set is used to evaluate the trained model with performance measures such as accuracy and sensitivity. This process is repeated ten times, so that each fold serves once as the test set, and the values are averaged at the end of the tenth round. Because the GA parameter values are dataset specific, the fine-tuned parameter values for the three datasets are shown in Table 2.
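The 10-FCV protocol can be sketched as follows. The random shuffle, the fixed seed and the `evaluate(train_idx, test_idx)` callback are our assumptions; `evaluate` stands in for training a WNN on one index set and scoring it on the other.

```python
import random
from typing import Callable, List


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Strided split: fold sizes differ by at most one sample.
    return [idx[i::k] for i in range(k)]


def cross_validate(n_samples: int,
                   evaluate: Callable[[List[int], List[int]], float],
                   k: int = 10) -> float:
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # The other k-1 folds form the training set for this round.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k  # average over the k rounds


folds = k_fold_indices(768)  # PID size
print(sorted(len(f) for f in folds))  # prints: [76, 76, 77, 77, 77, 77, 77, 77, 77, 77]
```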

Table 1

Datasets used in this Letter

Dataset	Number of patterns	Dimensionality
PID	768	8
LCD	32	56
DRD	1151	19

Table 2

GA tuning parameter values for three medical datasets

Dataset	Parameter	Value	Explanation
PID	PopSize	150	initial population size
	cross-over rate	0.82	cross-over rate
	MutRate	0.1	mutation rate
	selection rate	0.6	fraction of the population that survives each generation
	MaxGen	10,000	maximum number of generations
LCD	PopSize	250	initial population size
	cross-over rate	0.32	cross-over rate
	MutRate	0.01	mutation rate
	selection rate	0.7	fraction of the population that survives each generation
	MaxGen	25,000	maximum number of generations
DRD	PopSize	100	initial population size
	cross-over rate	0.8	cross-over rate
	MutRate	0.14	mutation rate
	selection rate	0.6	fraction of the population that survives each generation
	MaxGen	10,000	maximum number of generations
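A GA over input-to-synapse mappings, wired to the Table 2 parameter names, might look like the sketch below. Several choices are our assumptions rather than the Letter's: a mapping is a permutation of bit positions, crossover is order crossover (OX1), mutation swaps two positions, and the selection rate is read as the fraction of top-ranked individuals kept each generation. Defaults follow the PID row, except that `max_gen` is cut from 10,000 to keep the example fast. In practice `fitness` would be, for example, the cross-validated objective (1) of a WNN built with that mapping; here a toy objective is used.

```python
import random
from typing import Callable, List

Mapping = List[int]  # a permutation of input-bit positions


def order_crossover(p1: Mapping, p2: Mapping, rng: random.Random) -> Mapping:
    """OX1: copy a slice from p1, fill the rest in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [-1] * n
    child[a:b] = p1[a:b]
    used = set(p1[a:b])
    fill = [g for g in p2 if g not in used]
    for i, g in zip([i for i in range(n) if child[i] == -1], fill):
        child[i] = g
    return child


def evolve_mapping(n_bits: int, fitness: Callable[[Mapping], float],
                   pop_size: int = 150, crossover_rate: float = 0.82,
                   mut_rate: float = 0.1, selection_rate: float = 0.6,
                   max_gen: int = 100, seed: int = 0) -> Mapping:
    rng = random.Random(seed)
    pop = [rng.sample(range(n_bits), n_bits) for _ in range(pop_size)]
    for _ in range(max_gen):
        pop.sort(key=fitness, reverse=True)
        # Truncation selection: the best individuals survive unchanged.
        survivors = pop[:max(2, int(selection_rate * pop_size))]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng) if rng.random() < crossover_rate else p1[:]
            if rng.random() < mut_rate:
                i, j = rng.sample(range(n_bits), 2)
                child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


def toy_fitness(m: Mapping) -> float:
    # Toy objective for illustration only: number of bits left in place.
    return float(sum(i == g for i, g in enumerate(m)))


best = evolve_mapping(8, toy_fitness, pop_size=30, max_gen=50)
print(sorted(best) == list(range(8)))  # prints: True
```

Because survivors are carried over unchanged, the best fitness never decreases from one generation to the next.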

To further validate its performance, the proposed WNN is compared with conventional neural networks, namely the multi-layer perceptron network (MLPN) [27], MLFFNN [27], probabilistic neural network (PNN) [28], radial basis function neural network (RBFNN) [27] and time delay network (TDN) [29]. The MLPN and MLFFNN used in these experiments have one input layer, two hidden layers and one output layer. The hidden layers of the MLFFNN consist of 21 and 19 neurons, whereas those of the MLPN consist of 9 and 5 neurons. The MLPN and MLFFNN are trained using the back-propagation and scaled conjugate gradient algorithms, respectively. The learning rate and maximum number of epochs for both training algorithms are set to 0.01 and 1000, respectively. Because the outputs of both networks lie between 0 and 1, a transformation with a cut-off of 0.5 is applied to convert them into binary values. Similarly, the RBFNN and PNN are configured with a spread value of 1.60. Finally, the TDN is given eight positive vectors as input delays.

Performance measures

We measure the memory usage of WiSARD, standard VG-RAM and the proposed VVG-RAM using separate equations. For WiSARD, S is the number of synapses, N is the number of neurons and C is the number of classes. For VG-RAM, the memory depends on the number of training samples, an integer B and the number of neurons N. For VVG-RAM, D is the dataset dimensionality, and the memory depends on the number of bits required to represent the values of each feature and the number of bits used to represent each class access count of a feature, which is governed by a parameter λ with 0 < λ ≤ 1; the same value of this parameter is used in all the experiments. Apart from memory usage, accuracy and test time are also considered for evaluation. Accuracy measures the overall predictive performance of a model on unseen data and is calculated according to (4):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (4)$$

Test time measures how quickly the model responds; it is measured using the tic and toc functions of Matlab.
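Accuracy and test time can be computed as in the sketch below, where `time.perf_counter` plays the role of Matlab's tic/toc pair. Function names are hypothetical, and `predict` stands for any trained classifier.

```python
import time
from typing import Callable, List, Sequence, Tuple


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Percentage of correctly classified test records."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def timed_predictions(predict: Callable[[str], str],
                      patterns: Sequence[str]) -> Tuple[List[str], float]:
    start = time.perf_counter()  # like tic
    outputs = [predict(p) for p in patterns]
    elapsed = time.perf_counter() - start  # like toc, in seconds
    return outputs, elapsed


print(accuracy(tp=80, tn=90, fp=10, fn=20))  # prints: 85.0
```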

Software realisation

The proposed WNN and two standard WNNs, namely WiSARD and VG-RAM, are implemented in Matlab 2015a. These WNNs are evaluated on various datasets to obtain performance measures such as memory usage, accuracy and test time. The results are presented in the following subsections.

More data with low dimensionality

In this category, the Pima Indians Diabetes (PID) dataset is selected. The PID dataset consists of 768 patient records, of which 500 are negative and 268 are positive [25]. It has eight predictive attributes and one decision attribute. The experimental results on this dataset, in terms of memory usage, accuracy and test time, are given in Table 3. Table 3 shows that the proposed WNN performs better than the standard WNNs in memory usage, accuracy and test time. The proposed method is also compared with conventional neural network classifiers in Table 4, where the best values are highlighted in bold. The proposed method achieves the highest accuracy, which is attributed to the GA, whose parameters are tuned specifically for each dataset (see Table 2).

Table 3

Comparison results on PID dataset

Type of WNN	Memory, KBs	Accuracy, %	Test time, s
WiSARD	35	64.72	79.23
VG-RAM	31.5	70.15	61.95
proposed	21.25	78.23	47.27

Table 4

Accuracy comparison on PID dataset

S. no	Type of neural network classifier	Accuracy, %
1	MLPN	75.20
2	MLFFNN	74.00
3	PNN	67.20
4	RBFN	68.53
5	TDN	66.54
6	proposed WNN	78.23

Small data with high dimensionality

In this category, the Lung Cancer Dataset (LCD) is chosen. This dataset has 32 instances with 57 attributes (1 decision, 56 predictive) [25]. The results on this dataset in terms of memory, accuracy and time are given in Table 5, with the best values highlighted in bold. These results show that the proposed WNN requires more memory than the standard VG-RAM WNN (although far less than WiSARD), because the data are high dimensional and the features take a wide range of values. From (7), the memory required by the proposed architecture is proportional to the dimensionality of the data (D) and to the range of values the features can take. In terms of accuracy, the proposed method performs on par with VG-RAM (86.73% versus 86.75%) and better than WiSARD, which is attributed to the dataset-specific GA parameter tuning (see Table 2).

Table 5

Comparison results on LCD

Type of WNN	Memory, KBs	Accuracy, %	Test time, s
WiSARD	512	82.31	148.41
VG-RAM	4.3125	86.75	43.86
proposed	6.3	86.73	44.72

Further, the proposed WNN is compared against five popular neural network-based classifiers, namely MLPN, MLFFNN, PNN, RBFN and TDN. These comparison results are shown in Table 6, with the best values highlighted in bold. The proposed WNN achieves higher accuracy than all the other neural network-based classifiers.

Table 6

Accuracy comparison on LCD

S. no	Type of neural network classifier	Accuracy, %
1	MLPN	70.20
2	MLFFNN	60.30
3	PNN	60.40
4	RBFN	49.10
5	TDN	75.45
6	proposed WNN	86.73

More data with high dimensionality

In this category, the Diabetic Retinopathy Debrecen (DRD) dataset is selected. It has 1151 instances with 20 attributes (1 class attribute, 19 predictive) [25]. The experimental results on this dataset are given in Table 7, with the best values highlighted in bold. These results show that the proposed WNN outperforms the standard WNNs in memory usage, accuracy and test time.

Table 7

Comparison results on DRD dataset

Type of WNN	Memory, KBs	Accuracy, %	Test time, s
WiSARD	512	68.72	191.33
VG-RAM	121	70.15	173.61
proposed	22.93	72.86	63.73

Further, the proposed WNN is compared with five popular neural network-based classifiers, namely MLPN, MLFFNN, PNN, RBFN and TDN. These comparison results are shown in Table 8, with the best values highlighted in bold. The proposed WNN outperforms all the other classifiers.

Table 8

Accuracy comparison on DRD dataset

S. no	Type of neural network classifier	Accuracy, %
1	MLPN	53.17
2	MLFFNN	66.10
3	PNN	60.69
4	RBFN	45.08
5	TDN	49.70
6	proposed WNN	72.86

Tables 3–8 show that the proposed WNN achieves higher accuracy than the conventional neural network classifiers on all three dataset categories (MDLD, SDHD and MDHD), and higher accuracy than the standard WNNs on the MDLD and MDHD datasets, while matching VG-RAM on the SDHD dataset. For datasets in which the number of patterns is smaller than the dimensionality (SDHD), the proposed method suffers from a memory problem, which in turn delays its response. The results also show that the number of features and the range of values each feature can take determine the memory usage of the proposed model, and that proper tuning of the mapping function using the GA affects model performance. The differences in test times are small because all the experiments were carried out on a high-configuration system. Low-power devices generally use 2- or 4-bit processors with low processing capability, on which the differences in test times are expected to be significant.
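The memory trade-off across the three categories can be quantified directly from Tables 3, 5 and 7. The short calculation below (a worked example of ours, not part of the Letter) expresses the proposed WNN's memory relative to standard VG-RAM.

```python
# Memory in KB (standard VG-RAM, proposed WNN) from Tables 3, 5 and 7.
memory_kb = {
    "PID (MDLD)": (31.5, 21.25),
    "LCD (SDHD)": (4.3125, 6.3),
    "DRD (MDHD)": (121.0, 22.93),
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` with respect to `baseline`."""
    return 100.0 * (value - baseline) / baseline


for name, (vg_ram, proposed) in memory_kb.items():
    print(f"{name}: {relative_change(vg_ram, proposed):+.1f}%")
```

This prints reductions of 32.5% (PID) and 81.0% (DRD) but an increase of 46.1% for the high-dimensional LCD, consistent with the memory problem noted for SDHD data.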

Hardware realisation

This section presents a hardware implementation of the proposed WNN architecture for disease diagnosis, based on an FPGA SPARTAN 3E tool kit (shown in Fig. 5). Two conventional WNN architectures, namely WiSARD and VG-RAM, are also realised in hardware for comparison. Three benchmark disease datasets are used for testing. For each dataset, we obtain the performance measures, namely power consumption, area (memory) and test time.

Fig. 5

FPGA SPARTAN 3E tool kit

Area (memory) is measured as the number of LUTs. A LUT consists of a block of SRAM that is indexed by the LUT's inputs, and the output of the LUT is whatever value is stored at the indexed location in its SRAM. The LUT is actually implemented using a combination of the SRAM bits and a multiplexer (MUX). The LUT size is 8, 16, 32 or 64 words. Power consumption is the power consumed by the operations, measured in watts. Test time is the delay in the response, measured in seconds. The performance measures over the three datasets are averaged and normalised to the range 0–500, and the results are shown in Fig. 6. The figure shows that the proposed model outperforms the standard WNNs in power, area and test time.
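The LUT behaviour described above can be sketched in software: a k-input LUT stores 2^k bits, and its inputs select one of them, as the select lines of a multiplexer would. The `Lut` class is a hypothetical illustration, not an FPGA vendor API; under this model the 8-, 16-, 32- and 64-word sizes correspond to 3 to 6 inputs.

```python
from typing import List


class Lut:
    """Hypothetical k-input look-up table: 2**k stored bits indexed by inputs."""

    def __init__(self, truth_table: List[int]) -> None:
        k = len(truth_table).bit_length() - 1
        assert len(truth_table) == 2 ** k, "table size must be a power of two"
        self.k = k
        self.sram = truth_table  # contents fixed at configuration time

    def __call__(self, *inputs: int) -> int:
        assert len(inputs) == self.k, "one input per select line"
        # The inputs act as multiplexer select lines over the SRAM bits.
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.sram[index]


# 3-input majority function: output 1 when at least two inputs are 1.
majority = Lut([0, 0, 0, 1, 0, 1, 1, 1])
print(majority(1, 1, 0), majority(1, 0, 0))  # prints: 1 0
```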

Fig. 6

Average performance of WiSARD, VG-RAM and proposed model over three medical datasets

The results obtained by this hardware realisation are used to verify the software implementation and to compare the software and hardware solutions. The results show that the memory requirement in hardware is high, because the hardware area depends on the experiments and on the FPGA platform. The time requirements are also slightly higher because of delays in the RAM nodes. Overall, the software realisation in Matlab and the hardware realisation on the FPGA platform produced similar responses.

Conclusion

In this Letter, a new architecture is proposed for WNNs to overcome the limitations of low-power wearable devices, such as limited memory and delayed response. The proposed architecture uses variable sized RAMs (neurons) to optimise memory usage and a modified binary TRIE data structure to reduce test time. It also uses a bio-inspired GA to improve the accuracy of WNNs by optimising the mapping function. The proposed architecture has been validated against standard WNNs using both software and hardware realisations on various categories of disease datasets. For MDLD and MDHD, the proposed architecture reduced memory usage and test time and increased classification accuracy compared with standard WNNs. For SDHD, the proposed architecture achieved accuracy on par with VG-RAM but suffered from a memory problem, because the memory required by the proposed model is a function of the number of features in the dataset and the range of values each feature can take. The proposed WNN was also compared with five popular neural network-based classifiers and outperformed them in accuracy for MDLD, SDHD and MDHD. Hence, the proposed architecture can serve as a powerful component of various low-power wearable devices, such as Gluco Track [1], Dexcom G5 [2], QuantuMDx [3], Gluco Beam [4] and GeneXpert [5], addressing memory, accuracy and time issues. This opens up applications of low-power wearable diagnostic devices for diagnosing diseases such as diabetes, cancer, HIV and malaria.

Funding and declaration of interests

Conflict of interest: none declared.
