M Thilagaraj, B Dwarakanath, V Pandimurugan, P Naveen, M S Hema, S Hariharasitaraman, N Arunkumar, Petchinathan Govindan.
Abstract
The volume of medical data to be processed is growing rapidly because of demand from many applications. Healthcare data in particular is growing dynamically: large numbers of sensing devices and data-collection units are now deployed worldwide to generate and collect medical data, and these devices produce large real-time data streams. Healthcare big data analytics and monitoring have therefore gained considerable importance, but they still need improvement. Recently, machine learning and deep learning algorithms have been used to analyze large amounts of medical data, extract information, predict future disease trends, and cope with the sheer volume of data. However, applying learning models to big medical data streams remains a challenge for researchers. This paper proposes a novel deep learning electronic record search engine algorithm (ERSEA), together with a firefly-optimized long short-term memory (LSTM) model, for improved data analytics and monitoring. Experiments were carried out on Apache Spark using different medical respiratory datasets, and the results of the proposed framework were compared with those of existing models. The framework achieves accuracy, sensitivity, and specificity of 94%, 93.5%, and 94%, respectively, for datasets smaller than 5 GB, and 94%, 92%, and 93% for datasets larger than 5 GB, demonstrating the strong performance of the proposed framework.
Year: 2022 PMID: 35341015 PMCID: PMC8947900 DOI: 10.1155/2022/7120983
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Big healthcare data sources.
Survey of medical big data analytics methods and their limitations.
| S. no. | Author(s) and year | Model | Recent application in healthcare | Reported performance | Limitation |
|---|---|---|---|---|---|
| 1 | Yamashita et al. (2018) | CNN | Radiology | 99.3% confidence | Needs large amounts of labeled data for classification |
| 2 | Humayun et al. (2018) | CNN | Detection of abnormal heart sounds | Cross-fold Macc of 87.10, an absolute improvement of 9.54% over the baseline CNN system | |
| 3 | Ismail et al. (2020) | CNN | Health model for regular analysis of health factors | Accuracy of 95.60% | Only two layers are used to classify positively and negatively correlated factors |
| 4 | Choi et al. (2017) | RNN | Detection of the onset of heart failure | AUC of the RNN model increased to 0.883 | Requires a massive volume of data |
| 5 | Khodabakhshi et al. (2018) | RNN | Classification of lung abnormalities | Classification accuracy of 91% | |
| 6 | Maragatham et al. (2019) | RNN | Prediction of heart failure in big data | AUC of 0.894 | Delineates the training time of two different LSTM models |
| 7 | Gharehbaghi et al. (2018) | DNN | Phonocardiography | Accuracy of 92.60% | The learning process is too slow |
| 8 | Chen et al. (2018) | DBN | Detection of type 1 diabetes | 71.5%, recall of 60.2%, and … | The training process is computationally expensive |
| 9 | Seeliger et al. (2018) | GAN | Reconstruction of natural images from brain activity | 72.2% correct identifications | Hard to learn to generate discrete data |
| 10 | Emami et al. (2018) | GAN | Generation of synthetic brain CTs | PSNR of 26.6 ± 1.2 and SSIM of 0.83 ± 0.03 | Very hard to train |
| 11 | San et al. (2016) | DBN | Detection of hypoglycemic episodes in children with type 1 diabetes | Sensitivity of 80% | The initialization process incurs high computational overhead |
Spark components and their functionalities.
| Sl. no. | Spark component | Functionality |
|---|---|---|
| 1 | Spark SQL | Successor to Shark. Spark SQL is a distributed query framework for working with different categories of structured data. |
| 2 | Spark Streaming | Used for scalable processing of real-time data streams. |
| 3 | Spark ML (MLlib) | Provides scalable machine learning algorithms for big data analytics. It can be programmed in Python or Java, as well as in Scala and R. |
| 4 | SparkR | An R package that provides a frontend for using Spark from R for data analytics. |
| 5 | GraphX | Spark's component for building graphs from various data and running graph-parallel computations on them. |
| 6 | Spark Core | The underlying execution engine of Spark, on which the other components and the deployed models run. |
Figure 2. Overall framework for the proposed architecture.
Figure 3. LSTM structure.
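Figure 3 itself is not reproduced in this excerpt, and the excerpt does not state which LSTM variant the authors use. As a reference point only, the widely used LSTM cell with input, forget, and output gates can be written as follows, where $x_t$ is the input at step $t$, $h_{t-1}$ and $c_{t-1}$ are the previous hidden and cell states, $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and the $W$, $U$, and $b$ terms are learned weights and biases:

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{hidden state}
\end{aligned}
```

The additive update of $c_t$ is what lets gradients flow across many time steps, which is why LSTMs are commonly chosen for long sequential records such as EHR streams.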
Figure 4. LSTM training networks.
Figure 5. Working flowchart for the proposed firefly-optimized LSTM for big EHR analysis.
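Figure 5 describes tuning the LSTM with the firefly algorithm, but the excerpt does not say which hyperparameters are tuned or how the objective is defined. The sketch below is therefore only a generic implementation of Xin-She Yang's basic firefly algorithm (a dimmer firefly moves toward a brighter one with attractiveness `beta0 * exp(-gamma * r^2)` plus a decaying random step). It minimizes a hypothetical `surrogate_loss` over two assumed hyperparameters (log learning rate and dropout rate); in a real pipeline the objective would train an LSTM and return its validation loss. All names, bounds, and coefficients are illustrative assumptions, not the authors' settings.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(
    cost: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_fireflies: int = 15,
    n_iter: int = 50,
    alpha: float = 0.2,   # scale of the random step (fraction of each range)
    beta0: float = 1.0,   # attractiveness at distance 0
    gamma: float = 1.0,   # light absorption coefficient
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Basic firefly algorithm: a lower cost means a brighter firefly."""
    rng = random.Random(seed)

    def clip(x: List[float]) -> List[float]:
        # Keep every coordinate inside its (lo, hi) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    costs = [cost(x) for x in swarm]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = clip([
                        xi + beta * (xj - xi) + alpha * (rng.random() - 0.5) * (hi - lo)
                        for xi, xj, (lo, hi) in zip(swarm[i], swarm[j], bounds)
                    ])
                    costs[i] = cost(swarm[i])
        alpha *= 0.97  # gradually reduce randomness as the swarm converges

    best = min(range(n_fireflies), key=lambda k: costs[k])
    return swarm[best], costs[best]


def surrogate_loss(params: Sequence[float]) -> float:
    # Hypothetical stand-in for "train an LSTM and return its validation loss".
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.2) ** 2


best, best_cost = firefly_minimize(surrogate_loss, bounds=[(-5.0, -1.0), (0.0, 0.8)])
print("best (log10 learning rate, dropout):", best, "cost:", best_cost)
```

Because each cost evaluation would mean a full LSTM training run, the swarm size and iteration count dominate the tuning budget; a fixed seed makes the search reproducible.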
Algorithm 1. Electronic record search engine algorithm.
Figure 6. Validation curves for the proposed ERSEA with an increasing number of datasets at dropout = 0.2.
Figure 7. Validation curves for the proposed ERSEA with an increasing number of datasets at dropout = 0.4.
Figure 8. Validation curves for the proposed ERSEA with an increasing number of datasets at dropout = 0.6.
Figure 9. Validation curves for the proposed ERSEA with an increasing number of datasets at dropout = 0.8.
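Figures 6 to 9 vary the dropout rate from 0.2 to 0.8. The excerpt does not say where dropout is applied in the network, so the sketch below only illustrates standard inverted dropout on a vector of activations, in pure Python. The function name `inverted_dropout` is hypothetical; frameworks provide this built in (for example, the `dropout` and `recurrent_dropout` arguments of the Keras `LSTM` layer).

```python
import random
from typing import List, Optional


def inverted_dropout(values: List[float], rate: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate).

    Scaling during training keeps the expected activation unchanged, so no
    rescaling is needed at inference time, when dropout is switched off.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(42)
activations = [0.5, 1.2, -0.3, 0.8, 2.0]
for rate in (0.2, 0.4, 0.6, 0.8):
    print(rate, inverted_dropout(activations, rate, rng))
```

At a rate of 0.8 only about one unit in five survives and each survivor is scaled by roughly 5, so higher rates inject much more noise into training, which is the trade-off the validation curves explore.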
Performance metrics of the different algorithms for a data size of 5 GB.
| Algorithm | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| SVM | 88 | 85 | 84.5 |
| NB | 82 | 81.5 | 80 |
| KNN | 83 | 80 | 78 |
| DNN | 87.4 | 86.5 | 77 |
| LSTM | 89 | 88.5 | 88 |
| Proposed ERSEA | 94 | 93.5 | 94 |
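The tables report accuracy, sensitivity, and specificity without the underlying confusion matrices. As a reminder of how these three metrics relate for a binary classifier, the sketch below computes them from true/false positive and negative counts. The class and function names, and the counts in the example, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # positive cases correctly flagged
    tn: int  # negative cases correctly cleared
    fp: int  # negative cases wrongly flagged
    fn: int  # positive cases missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """True-positive rate (recall): TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


counts = ConfusionCounts(tp=90, tn=80, fp=20, fn=10)  # hypothetical counts
print(accuracy(counts), sensitivity(counts), specificity(counts))
```

Accuracy alone can hide an imbalance between missed cases and false alarms, which is why sensitivity and specificity are reported alongside it.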
Performance metrics of the different algorithms for data sizes greater than 5 GB.
| Algorithm | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| SVM | 76 | 75 | 74 |
| NB | 70 | 69 | 70 |
| KNN | 74 | 68 | 69 |
| DNN | 73 | 67 | 72 |
| LSTM | 79 | 78.5 | 77 |
| Proposed ERSEA | 94 | 92 | 93 |
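The two performance tables can be compared directly. The short script below only transcribes the reported percentages and computes how many percentage points each metric falls between the 5 GB setting and the larger-than-5 GB setting; no new data is involved, and the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

Metrics = Tuple[float, float, float]  # (accuracy, sensitivity, specificity) in %

SMALL: Dict[str, Metrics] = {  # data size of 5 GB
    "SVM": (88.0, 85.0, 84.5),
    "NB": (82.0, 81.5, 80.0),
    "KNN": (83.0, 80.0, 78.0),
    "DNN": (87.4, 86.5, 77.0),
    "LSTM": (89.0, 88.5, 88.0),
    "Proposed ERSEA": (94.0, 93.5, 94.0),
}
LARGE: Dict[str, Metrics] = {  # data size greater than 5 GB
    "SVM": (76.0, 75.0, 74.0),
    "NB": (70.0, 69.0, 70.0),
    "KNN": (74.0, 68.0, 69.0),
    "DNN": (73.0, 67.0, 72.0),
    "LSTM": (79.0, 78.5, 77.0),
    "Proposed ERSEA": (94.0, 92.0, 93.0),
}


def drop_in_points(small: Dict[str, Metrics], large: Dict[str, Metrics]) -> Dict[str, Metrics]:
    """Percentage-point drop per metric when moving to the larger data size."""
    return {
        name: tuple(round(s - l, 1) for s, l in zip(small[name], large[name]))
        for name in small
    }


print(f"{'Algorithm':15s} {'Acc':>5s} {'Sens':>5s} {'Spec':>5s}  (drop in percentage points)")
for name, (d_acc, d_sens, d_spec) in drop_in_points(SMALL, LARGE).items():
    print(f"{name:15s} {d_acc:5.1f} {d_sens:5.1f} {d_spec:5.1f}")
```

On these reported figures, ERSEA's accuracy is unchanged (a drop of 0.0 points), while the baselines lose between 9.0 and 14.4 points of accuracy.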
Figure 10. Throughput analysis for the proposed DL-based streaming architecture for different volumes of data.
Figure 11. Computational latency analysis for the proposed ERSEA-based Spark architecture and a traditional streaming architecture.
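Figures 10 and 11 report throughput and computational latency, but the excerpt does not define how either was measured. One common convention, assumed here, is throughput as records processed per second of wall-clock time and latency as the time taken per micro-batch. The sketch below measures both for an arbitrary batch-processing function; `measure_stream` and `score_batch` are hypothetical names, and in an actual Spark deployment comparable figures come from the engine itself (for example, `processedRowsPerSecond` in a Structured Streaming query's progress report).

```python
import time
from typing import Callable, Iterable, List, Sequence, Tuple


def measure_stream(
    process_batch: Callable[[Sequence[dict]], object],
    batches: Iterable[Sequence[dict]],
) -> Tuple[float, List[float]]:
    """Return (throughput in records/s, per-batch latencies in seconds)."""
    latencies: List[float] = []
    n_records = 0
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        process_batch(batch)
        latencies.append(time.perf_counter() - t0)
        n_records += len(batch)
    elapsed = time.perf_counter() - start
    throughput = n_records / elapsed if elapsed > 0 else float("inf")
    return throughput, latencies


def score_batch(batch: Sequence[dict]) -> float:
    # Hypothetical per-batch work standing in for model inference on EHR records.
    return sum(record["value"] for record in batch)


batches = [[{"value": float(i)} for i in range(1000)] for _ in range(5)]
throughput, latencies = measure_stream(score_batch, batches)
print(f"throughput: {throughput:.0f} records/s, max latency: {max(latencies) * 1000:.3f} ms")
```

Reporting the maximum (or a high percentile) latency rather than the mean is usually more informative for real-time monitoring, since a single slow batch delays every record in it.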