Leveraging Deep Learning for Designing Healthcare Analytics Heuristic for Diagnostics.

Sarah Shafqat^1,2, Maryyam Fayyaz³, Hasan Ali Khattak⁴, Muhammad Bilal⁵, Shahid Khan⁶, Osama Ishtiaq⁶, Almas Abbasi¹, Farzana Shafqat^2,6, Waleed S Alnumay⁷, Pushpita Chatterjee^8,9.

Abstract

Healthcare informatics has been widely discussed since the early 21st century. New computing technologies produce huge amounts of healthcare data, opening several research areas. The main concerns today are managing the sheer volume of this data and extracting knowledge from it for decision making. To this end, researchers are exploring big data analytics, deep learning (an advanced form of machine learning based on deep neural nets), predictive analytics and various other algorithms to bring innovation to healthcare. With these innovations, disease prediction with anticipation of its cure is no longer unrealistic. First Dengue Fever (DF) and then Covid-19 emerged as outbreaks of lethal infectious diseases, and diagnosis at all stages is crucial to decrease the mortality rate. In the case of diabetes, clinicians and experts find timely diagnosis, and analysis of the chances of developing underlying diseases, challenging. In this paper, Louvain Mani-Hierarchical Fold Learning healthcare analytics, a hybrid deep learning technique, is proposed for medical diagnostics. It is tested and validated using a real-time dataset of 104 instances of patients with dengue fever made available by Holy Family Hospital, Pakistan, and 810 instances found on GitHub for infectious diseases, including prognosis of Covid-19, SARS, ARDS, Pneumocystis, Streptococcus, Chlamydophila, Klebsiella, Legionella, Lipoid, etc. The technique showed a maximum Spearman correlation of 0.952 between two clusters when applied to 240 instances extracted from a comorbidities diagnostic data model derived from 15696 endocrine records of multiple visits of 100 patients, each identified by a unique ID. Accuracy of the induced rules, evaluated by Laplace (Fig. 8), is 0.727, 0.701 and 0.203 for 41, 18 and 24 rules, respectively. The endocrine diagnostic data was made available by Shifa International Hospital, Islamabad, Pakistan. Our results show that in future this algorithm may be tested for diagnostics on healthcare big data.

Keywords: Big data; Deep learning algorithm; Endocrine diseases; Healthcare analytics; Infectious diseases; Learning healthcare system; Medical diagnostics; Neural nets

Year: 2021 PMID: 33551665 PMCID: PMC7852051 DOI: 10.1007/s11063-021-10425-w

Source DB: PubMed Journal: Neural Process Lett ISSN: 1370-4621 Impact factor: 2.908

Introduction

Healthcare big data, in its heterogeneous form, is now being analyzed for knowledge discovery and decision making. Advanced machine learning and deep learning (neural net) techniques [1] for analysis are being researched for deployment over the cloud, moving towards a Smart Health System [2, 3]. The Learning Healthcare System [4] revolves around shifting traditional healthcare processes to expert diagnosis and treatment of various diseases [5]. A medical diagnosis is formed by considering common associated risks and precautions [3]. Overcoming the problems of traditional healthcare through computation [3, 6] is a complex task, especially in diagnosis [4, 7]. Advances are still being made with the help of deep learning neural net techniques [8] under the umbrella of artificial intelligence (AI). An intelligent diagnostic system would shift the load of routine clinical tasks away from doctors, freeing them to focus on serious patients and complex cases after initial screening and diagnosis feedback from the Smart Health System. Multi-class classification [9] of types of Dengue Fever (DF) and of comorbidities derived from endocrine data is our problem domain for evaluating the proposed healthcare diagnostic analytical technique using deep learning heuristics [8]. This multi-class classification of diverse diseases, diagnosed under different event settings, makes our problem heterogeneous [10] in nature. The World Health Organization (WHO) [11] reports that around 50 million dengue infections occur globally each year. Rigorous research in diagnostics [12] has been done over time on disease prediction, treatment, prevention and control. To assist this research, a diagnostics solution is provided for the three main types of DF: DF, dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS).
In this paper, 17 parameters are considered in forming the diagnosis problem with hyper parameterization [13], and the types of diagnosis are broadened to 8 classes: DF, DF (D/C), DHF, DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS, and DSS. Missing parameters include NS3, virology, and active/passive infection [14]. For the recent outbreak of Coronavirus (Covid-19), diagnosis is challenging [15]. Patients with the usual known symptoms of cough, fever or pneumonia are referred for an RT-PCR test, which is found to be 30–60% accurate [15]. Diabetes Mellitus (DM) is a non-communicable disease and a major health hazard in developing countries, including Pakistan. A predictive methodology for analyzing diabetes patients' data is therefore devised to predict the prevalent types of diabetes and the complications associated with it [16]. A patient diagnosed with diabetes has to keep blood sugar carefully controlled; otherwise, long-term diabetes may lead to complications in the form of known chronic conditions (mayoclinic.org), mainly cardiovascular disease, brain stroke, nerve damage (neuropathy), kidney damage (nephropathy), eye damage (retinopathy), foot damage, skin conditions, hearing impairment, Alzheimer's disease, etc. Furthermore, diabetic patients are at high risk of catching Covid-19. Therefore, to predict the diagnosis of these diseases, researchers are inclined towards deep learning heuristics that are trained on previous diagnosis data and predict the diagnosis for similar undiagnosed cases [3]. The contribution of this paper is an evaluation of different facets of the proposed diagnostic analytics model, a hybridization of three high-level deep learning algorithms, on phenotypically and symptomatically rich datasets of multiple diseases. The related study is covered in Sect. 2, which addresses the problems found.
Section 3 describes the formation and modeling of our datasets for the experiments done in the Orange framework. Section 4 discusses the heuristics of our algorithmic model, in which the three most successful algorithms were selected for their properties after analyzing previous results achieved in similar scenarios. Section 5 elaborates the results when our model is applied to different dimensions of the endocrine diagnostic dataset, with a limited view of it as big data analytics. Section 6 documents the observations from the experiments. Section 7 summarizes and concludes with an analysis of the model's future usefulness.

Related Study On Healthcare Analytics Applied For Diagnostics

Advances are seen in constrained clustering as it is formulated over algorithms such as k-means, spectral clustering and other mixture models. The extended model [17] overcomes three significant limitations: (i) it handles instance-level constraints of higher difficulty in addition to the standard together/apart constraints, (ii) it resolves cluster-level constraints by balancing cluster size, and (iii) it handles triplet constraints by ordering pairwise constraints through side information. A good feature set based on similarity is the core requirement for analyzing a complex dataset. There are two basic challenges in constrained clustering. First, while constraints improve the performance of some algorithms when averaged over multiple constraint sets, an individual constraint set can give worse results than using no constraints. Second, constraints are sometimes limited, and expert guidance is supplied through side information. When constrained clustering is formulated over deep learning principles it gains scalability, and due to hyper parameterization the negative effect of individual constraints is diminished, because triplet constraints separate negative instances from positive ones. Data-driven healthcare [18] is a highly sought-after research area for transforming personalized care for treatment purposes. The EHR is the best representation for data-driven healthcare so far, but it has a lot of noise and sparseness. Extracting good features for phenotypic assessment of patients is thus challenging. A four-layered convolutional neural network is applied to extract phenotypes for making predictions. From the temporal EHR matrix of each patient, all phenotypes are extracted and then filtered for the most significant ones, which finally help in predictions. Deep learning is applied on discrete patients for best interpretation in a four-dimensional temporal EHR matrix.
Naïve Bayes [19] is also a well-known and widely used machine learning algorithm, but it becomes more efficient when applied in combination with other algorithms.

Machine Learning Evolving into Deep Learning Neural Nets

The related study [20-25], organized as outlined in Fig. 1, covers the latest trend in machine learning and analytics, namely the rise of deep learning. Here we look into the history of deep learning and the reasons for its wide use and adoption in smart health solutions.

Fig. 1

Related study organization structure

In 2015, researchers expressed the view in [26] that deep learning, as an advanced form of machine learning, is taking over medical diagnosis, based on the latest studies on the topic [1, 3–5, 8, 9, 13, 16, 26] (Fig. 1). It uses different combinations of neural network algorithms to abstract multiple levels of representation in medical data, whether speech, images, or text. Researchers found it highly useful for its superior accuracy in interpreting medical images using convolutional neural networks (CNN) with back propagation. The very first architecture for deep learning, given by Ivakhnenko [26] in 1965, is clearly based on the logic of multilayer neural nets. Considerable work on deep learning started in 1986, when Rina Dechter [26] proposed the term ‘deep learning’ for the given architecture. It is also known that genetic and differential evolutionary algorithms, a branch of machine learning that forms a baseline for deep learning, perform better than particle swarm optimization algorithms when high computation is required, as in big data analytics [27]. Results for 33 different algorithms on 13 problems showed the self-optimized Successful Parent Selecting Linear Population Size Reduction eigenvector-based (SPS-L-SHADE-EIG) algorithm [28] winning the most problems, and clearly winning the problems with the most function calls (shown in Table 11 of [27]). Other algorithms have lately appeared for evaluation [29-31].

Challenges in Analysis of Healthcare Data

Managing healthcare data for analysis to predict or diagnose a patient's condition is challenging, and more so if a clinic uses the traditional method of filing patients' profiles [32]. Storing medical data electronically as Electronic Medical Records (EMR) in an international standard format used by SNOMED, ICD-10, etc. requires a lot of computational and intellectual resources, which makes the whole process expensive, difficult and time consuming. Only when healthcare data is structured according to international standards can it be evaluated and interpreted everywhere on similar grounds. Therefore, until now, the analysis and visualization of healthcare data has been done in isolation or using datasets made available online.

Deep Learning and Heuristics for Modeling Analytics

In analytics, data mining and machine learning approaches are mainly used. Segmentation, clustering, rule-based association, and classification lie within the branch of data mining. The approach is named machine learning when deciding whether clustering or classification is done on a labeled or unlabeled dataset, and whether analysis is done at runtime or on previously stored data [33]. It is used to name our learning data model for analytics as supervised (learning from labeled offline data), unsupervised (learning on data entered at runtime) or semi-supervised (using both known data and new unlabeled data that is being added to storage). Machine learning further transforms into deep learning when self-learning multilayer neural networks take over optimization [12], forming a data hierarchy for unlabeled complex datasets to represent the best features in the data through exploration [34].

Deep Learning Models for Complex EHRs

Deep learning is associated with analysis of complex datasets that have huge volume or an excessive number of features, from which abstraction is done through hyper parameterization [11]. Hyper-parameters include the number of layers, the number of hidden units, the activation function, the arrangement of layers in a network, etc. Hyper-parameter tuning on a new dataset is a difficult task. Hyper-parameters that are efficient in simple networks often fail to perform in complex scenarios. Results change for every dataset; therefore MENNDL was proposed for selecting optimal hyper-parameters for large compute nodes. It used the Convolutional Architecture for Fast Feature Embedding (CAFFE) package for its deep learning model. The work on deep fair clustering with multi-state protected variables [35] also shaped our understanding of the complexity of managing and visualizing clusters of classes with a large feature set, which in our case has 18 features. Forming equal-size clusters [35] is not the problem we are looking at here. For our problem, fair clustering is defined by the accuracy of diagnosing patients and associating them with the right class, and the distribution may vary. Our fairness measure mostly revolves around the distance metrics we choose to bind similar diagnoses in one cluster.

Datasets Formation and Modeling

Three datasets are used for the experiments in this paper: (i) DF diagnostic data of 104 patients classified into eight labels, DF, DF (D/C), DHF, DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS, and DSS; (ii) a diagnostic dataset of 810 instances of other infectious diseases, Covid-19, SARS, ARDS, Pneumocystis, Streptococcus, Chlamydophila, Klebsiella, Legionella, Lipoid, etc., downloaded from a GitHub repository; and (iii) a dataset of 15696 records of 100 endocrine patients with diagnoses of DM and its comorbidities. Further, a correlated data model with 240 instances was extracted from the 15696 records, with associations between DM and its comorbidities. We tested the dataset of 15696 endocrine diagnostic records using Z-score and ANOVA metrics of Ordinary Least Squares (OLS) to find the best features, as in Fig. 2. The Z-score plot for the independent features ‘age’ and ‘test results’ gives a 99% confidence level, while the ANOVA test shows a positive correlation of ‘GF’ (Glucose Fasting) with the diagnosis of endocrine disease. Best feature selection is better visualized in Figs. 4, 7, 9 and 11 within the Orange framework.
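To make the two feature-screening statistics concrete, the following is a minimal sketch of a z-score and a one-way ANOVA F statistic using only the Python standard library. It is not the authors' OLS-based procedure; the function names and the sample readings are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical fasting-glucose readings grouped by diagnosis label (not study data).
gf_by_diagnosis = [[90, 95, 100], [95, 100, 105], [110, 115, 120]]
f_stat = anova_f(gf_by_diagnosis)

# Hypothetical values for a single numeric feature (not study data).
feature_z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```

A large F statistic indicates that the feature's mean differs across diagnosis groups more than its spread within groups would explain, which is the sense in which a feature such as GF can be ranked as informative.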

Fig. 2

z-score plot for ’age’ and ’test results’ with Anova chart for Glucose Fasting (GF) test for disease diagnosis

Fig. 4

Selected best 7 features for dengue diagnosis

Fig. 7

11 best features selected of 20 features with single target variable ‘finding’ in infectious diseases Dataset

Fig. 9

7 best features selected for Diagnosis of Diabetes and its Comorbidities

Fig. 11

a Cluster of Diagnosis of Diabetes Mellitus (DM), b Diagnostic Clusters for Comorbidities of DM on probability scale (A distorted and vague for larger dataset with variable parameters)

Deep Learning Algorithms Used for Proposed Heuristic

We explored the dataset of 104 patients diagnosed with or without DF, in which the 7 most important of 18 parameters result in different forms of diagnosis, from non-critical to fatal cases, categorized into 8 classes. The algorithms used are Louvain Clustering, Manifold Learning, and Hierarchical Clustering, and results were projected through scatter plots and linearly. In parallel, Multidimensional Scaling (MDS) with weights is applied for clarity of visualization, followed by the CN2 Rule Induction classifier for a single target class label (Diagnosed) to establish rules while omitting outliers. Finally, labeled clusters are formed on the inliers and visualized with probabilities of occurrence.

Problems Addressed to Select Winning Algorithms

In [17], we find the global size constraint limiting for our problem, as diagnostic classes may not have the same number of nodes and the size of clusters will vary. Exploring a discrete patient's [18] symptoms or temporality is also not our concern, as we focus on each diagnosis made during a patient's visit or admission, filtering key phenotypes. KNN is the most commonly known artificial intelligence algorithm; therefore, it was tested on the endocrine dataset of 15,696 records. The key features PatientID, gender, test, result, note, practitioner comment and ICD-10-CM were made fuzzy. Accuracy achieved on training was 0.52, and the resulting visualizations are shown in Fig. 2. The KNN model in Fig. 3 showed a mean absolute error (the mean of the absolute values of the errors) of 22.84, a mean squared error (MSE, the mean of the squared errors, reported as the residual sum of squares) of 569.58, and a negative R2 score. The higher the R-squared, the better the model fits the data. The best possible R2 score is 1.0; it is negative here because the model can be arbitrarily worse. The visualizations are also not as significant as those demonstrated in Sect. 4 for LMHFL.
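The three error measures quoted for the KNN model can be written out directly. The sketch below is a plain-Python illustration with made-up numbers, not a reproduction of the reported experiment; it shows in particular how R2 becomes negative when predictions are worse than always predicting the mean.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot              # 1.0 is a perfect fit
    return mae, mse, r2


# Hypothetical targets and deliberately poor predictions (not study data).
y_true = [1.0, 2.0, 3.0, 4.0]
bad_pred = [4.0, 3.0, 2.0, 1.0]
mae, mse, r2 = regression_metrics(y_true, bad_pred)

# Predicting the mean for every instance gives R2 = 0 by construction.
_, _, r2_mean = regression_metrics(y_true, [2.5] * 4)
```

Here the reversed predictions give an R2 of -3.0: the residual sum of squares is four times the total sum of squares, so the model is worse than the constant mean predictor.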

Fig. 3

KNN applied on endocrine dataset with accuracy of 0.52 to predict diagnosis

Naïve Bayes [19] may be a winning algorithm, but it may fail when used on its own on a dataset that is complex and heterogeneous in its features. We find approaches like Multi-Dimensional Scaling (MDS) faster for visual representation of a dataset. Therefore, we explored a combination of algorithms, (i) Louvain clustering, (ii) Manifold Learning, and (iii) Hierarchical clustering, on datasets of different sizes, with variations in hyper parameters for large to limited numbers of features, which can be tuned to get maximum accuracy in labeling classes.

Tools for Interpretation and Visualization

Weka is a widely used analytics tool, but there are others for when a more precise and clear representation of data is required, using different data modeling and combinations of analytics algorithms from the deep learning family. Other analytics tools used in [12] besides Weka are RapidMiner and Orange. Our paper demonstrates results from the Deep Analytical Hybrid Model run in the Orange framework, explained in this section.

Louvain Clustering

Louvain clustering [36] is a refinement of Greedy Modularity Optimization (GMO). It optimizes modularity locally, starting by defining multiple communities and then shifting nodes to neighboring clusters whenever doing so increases modularity [37, 38]. The accuracy of Louvain Clustering was compared with other similar supervised learning algorithms and was found to be 94% [39].
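The quantity that Louvain clustering optimizes can be illustrated on a toy graph. The sketch below computes Newman modularity for a given partition of an unweighted, undirected graph and compares three candidate partitions; it is only the scoring step, not the full Louvain procedure, and the graph and community labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    # Sum of node degrees per community.
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community_of[node]] += d
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles (0-1-2 and 3-4-5) joined by a single bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
moved = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}  # node 2 shifted to "b"
merged = {n: "a" for n in range(6)}

q_split = modularity(edges, split)
q_moved = modularity(edges, moved)
q_merged = modularity(edges, merged)
```

A Louvain-style local move would reject shifting node 2 into community "b", because that partition scores lower than keeping each triangle as its own community.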

Manifold Learning

The authors of [28, 33] show the strength of the Manifold Learning algorithm, which is recognized for comparison and anomaly detection across multiple machine learning models and known for its interpretation of subsets of diverse features. Conventional methods focus on the inner structure of a model, for example in deep neural networks, but here more complex cases are solved by integrating multiple models to study inputs and outputs as an unsupervised approach. Its results are better visualized through a scatter plot and a tabular view customized to show feature discrimination. Its interactive design is the result of efforts contributed by a body of researchers and engineers. It is highly effective in refining and using machine learning models in parallel to articulate individual feature characteristics in subsets, which diminishes the chances of the coding errors found in other learning models. Manifold learning is used for both 2D and 3D feature classification, for example in diagnosing diabetic retinopathy (DR) [40] and in predicting IQ from functional MRI (fMRI) [36] on a real-time dataset.

Hierarchical Clustering

The machine learning paradigm is greatly assisted by visual analytics (VA), as in VA-Assisted-ML (VIS4ML) [41]. VIS4ML comprises high-level methodologies that use VA capabilities to assist machine learning and may be used to optimize model development. Hierarchical decomposition of a network of nodes is one significant method for visualizing gaps and similarity in data, as representation through clustering is a known conventional approach for deep learning [42]. Hierarchical clustering has always been a measure for qualitative analysis of quantifiable data. Hierarchically Clustered Representation Learning (HCRL) [42] is proposed to keep the hierarchical structure in a deep neural architecture. A three-layered approach [43] was presented for categorizing similar patients in a deep metric learning framework using the ICD-10 coding scheme.
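As a small illustration of how hierarchical (agglomerative) clustering builds its hierarchy, the sketch below repeatedly merges the two closest clusters until a requested number remains. The Manhattan distance mirrors the metric listed for hierarchical clustering in this paper; the single-linkage rule, the function names and the sample points are assumptions made for the example and are not taken from the authors' setup.

```python
def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the smallest pairwise distance between members
    (single linkage). Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    manhattan(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 2D feature vectors (not study data).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
three = single_linkage(points, 3)
two = single_linkage(points, 2)
```

Cutting the same hierarchy at different levels (here 3 and 2 clusters) corresponds to the varying cluster counts explored for the models in this paper.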

Louvain Mani-Hierarchical Fold Learning for Diagnosis of Dengue Fever (DF)

We have devised a semi-supervised learning model with a heuristic (exploratory) approach based on three layers embedding high-level deep learning approaches, for visual interpretation of diagnostics of DF categorized as DF, DF (D/C), dengue hemorrhagic fever (DHF), DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS and dengue shock syndrome (DSS). Louvain clustering, manifold learning and hierarchical clustering are applied to model our proposed heuristic, named Louvain Mani-Hierarchical Fold Learning (LMHFL), in the Orange framework. The seven best features are selected and ranked (Fig. 4) using various statistical measures to calculate their influence on the inferences extracted from the provided data. Other known machine learning methods are also applied, such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Support Vector Machine (SVM) with a nonlinear kernel, with varying distance metrics (Table 1). In Tables 1, 2 and 3, several interpretations are drawn from differing models based on altered hyper parameters (visualizations are displayed in Figs. 5, 6). Cluster formations are visualized and elaborated in detail in Table 3.

Table 1

Louvain mani-hierarchical fold learning shown in various model interpretations due to its flexibility

Models/Hyperparameters	Method	Component	K	C	Evaluation metric	Learning rate	Iterations	Outliers
Louvain Clustering	PCA	7	5, 2	3	Euclidean	NA	NA	62
Manifold Learning	t-SNE	7	NA	NA	Euclidean	150	2000	NA
Hierarchical Clustering	Normalized/One class SVM with non-linear kernel (RBF)	NA	NA	7, 10, 14, 20	Manhattan	NA	NA	NA
CN2 Rule Induction Classifier	MDS/Unordered/Weighted	NA	NA	NA	Entropy	NA	NA	NA
Louvain Clustering	PCA	7	4, 0	3	Euclidean	NA	NA	62
Manifold Learning	t-SNE	6	NA	NA	Euclidean	150	2000	NA
Hierarchical Clustering	Normalized/One class SVM with non-linear kernel (RBF)	NA	NA	8, 12, 16, 20	Manhattan	NA	NA	NA
CN2 Rule Induction Classifier	MDS/Unordered/Weighted	NA	NA	NA	Entropy	NA	NA	NA
Louvain Clustering	NA	7	3, 3	3	Euclidean	NA	NA	62
Manifold Learning	t-SNE	4	NA	NA	Euclidean	150	2000	NA
Hierarchical Clustering	Normalized/One class SVM with non-linear kernel (RBF)	NA	NA	9, 15, 18, 20	Manhattan	NA	NA	NA
CN2 Rule Induction Classifier	MDS/Unordered/Weighted	NA	NA	NA	Entropy	NA	NA	NA
Louvain Clustering	NA	6	2, 5	4	Euclidean	NA	NA	62
Manifold Learning	t-SNE	3	NA	NA	Euclidean	150	2000	NA
Hierarchical Clustering	Normalized/One class SVM with non-linear kernel (RBF)	NA	NA	7, 11, 16, 20	Manhattan	NA	NA	NA
CN2 Rule Induction Classifier	MDS/Unordered/Weighted	NA	NA	NA	Entropy	NA	NA	NA
Louvain Clustering	NA	6	1, 2	7	Euclidean Man	NA	NA	62

Table 2

Detailed structure of clusters formation with df class labels for model/s with different parameter settings in multiple iterations (interpret)

Clusters	After pruning outliers	Focused clusters	DF	DF (D/C)
7	C1–C6	C6=DF	C2, C4–C6	–
10	C1–C10	C3=DSS	C2, C4–C10	C8, C9
14	C1–C14	C4=DSS, C6=DHF	C2, C3, C5, C7, C9–C14	C11, C12
20	C1–C20	C5=DF, C6=DSS, C8=DHF, C9=DF	C3–C5, C7, C9, C10, C13–C20	C15, C18
8	C1–C8	–	C1–C8	C2, C5
12	C1–C12	C3=DF	C1–C12	C2, C7
20	C1–C20	C5=DF, C1, C10, C12=DHF, C16=DSS	C2, C3, C5–C7, C9, C11, C13–C15, C17, C19, C20	C3, C11
9	C2–C9	C2=DF	C2–C4, C6–C9	–
15	C2–C5, C8, C10–C15	C2, C3, C10=DF, C14=DSS	C2–C5, C10–C13, C15	–
18	C2–C6, C9, C11–C18	C2, C3, C11, C14=DF, C13=DHF, C17=DSS	C2–C4, C6, C11, C12, C14, C16, C18	–
20	C3–C7, C10, C11, C13–C20	C3, C4, C13, C16=DF, C10, C19=DSS, C11, C15=DHF	C3–C5, C7, C13, C14, C16, C18, C20	–
7	C1, C3-C7	C1=DF	C1, C4–C7	–
11	C1, C3, C4, C6, C8–C11	C1, C10=DF	C1, C4, C6, C8–C11	–
16	C1, C4, C5, C7, C9–C11, C13–C16	C1, C13=DF, C10=DHF, C15=DSS	C1, C5, C7, C9, C11, C13, C14	–
20	C1, C2, C5–C8, C10, C12–C15, C17–C20	C1, C2, C8, C14, C17=DF, C6, C19=DSS, C13, C15=DHF	C1, C2, C7, C8, C10, C12, C14, C17, C18	–
8	C1–C8	C7=DHF	C1, C3–C6	–
13	C1–C6, C8–C13	C11=DHF	C1, C4–C6, C8	–
17	C2–C8, C10–C12, C14–C17	C6=DF, C14, C17=DHF, C16=DHF (D/C)	C2, C5–C8, C10	–
20	C2–C9, C12–C14, C16, C18–C20	C7=DF, C16, C20=DHF, C19=DHF (D/C)	C2, C5, C7–C9, C12	–

Table 3

Detailed structure of clusters formation with df (types) class labels for model/s with different parameter settings in multiple iterations (interpret)

Clusters	DHF	DHF (D/C)	DHF (HD)	DHF (Leak)	DHF/DSS	DSS
7	C1-C5	–	C2	C2	–	C1, C2, C3, C5
10	C1, C2, C4-C10	C8-C10	C2	C2	C9	C1-C6, C8-C10
14	C1, C2, C5-C12, C14	C11, C12, C14	C3	C2	C12	C1, C2, C4, C5, C7-C9, C11-C13
20	C1-C3, C7, C8, C10-C18, C20	C16, C18, C20	C4	C3	C17	C1-C4, C6, C7, C10-C13, C15, C16, C18, C19
8	C1-C8	C2-C4	C7	C8	C2	C2, C4-C8
12	C1, C2, C4-C8, C10-C12	C2, C4, C5	C10	C12	C2	C2, C5, C6, C8-C12
20	C1-C4, C6-C13, C15, C17, C18, C20	C4, C6, C7	C15	C18	C4	C3, C7-C9, C13-C20
9	C3-C5, C7-C9	–	C6	C4	–	C3-C6, C8, C9
15	C4, C5, C8, C12, C13, C15	–	C11	C5	–	C4, C5, C8, C11, C13-C15
18	C4, C5, C9, C13, C15, C18	–	C12	C5	–	C4, C6, C9, C12, C15-C18
20	C5, C6, C17, C20	–	C14	C6	–	C5, C7, C10, C14, C17-C20
7	C3-C7	–	C7	C7	–	C3-C5, C7
11	C3, C4, C8, C9, C11	–	C11	C11	–	C3, C4, C6, C8, C11
16	C4, C5, C9-C11, C16	–	C14	C16	–	C4, C5, C7, C9, C14, C15
20	C5, C7, C12, C13, C15, C20	–	C18	C20	–	C5-C7, C10, C12, C18, C19
8	C1-C8	C4, C8	–	C5	–	C2, C3, C5, C6, C8
13	C1-C6, C9-C13	C5, C13	–	C6	–	C2-C4, C6, C8-C10, C12
17	C2-C5, C7, C8, C11, C12, C14, C15, C17	C7, C16	–	C8	–	C3-C5, C8, C10-C12, C15
20	C2-C4, C6, C8, C9, C13, C14, C16, C18, C20	C8, C19	–	C9	–	C3-C6, C9, C12-C14, C18

Fig. 5

Louvain Mani-Hierarchical Fold Learning Model–1 to classify DF data having multiple diagnosis in 7 focused clusters for 8 classes. Classes: DF, DF (D/C), dengue hemorrhagic fever (DHF), DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS and dengue shock syndrome (DSS)

Fig. 6

a 7 Clusters formed in Model-1, b Final Clusters filtering Outliers (Model-1) on probability scale. A clearer view for 100 records of Dengue Fever Patients

Louvain Mani-Hierarchical Fold Learning for Diagnosis of Covid-19 and other infectious diseases

Another infectious disease that has hit the world hard is Coronavirus (Covid-19), along with other infectious diseases such as SARS. In this paper, we used a text/tabular dataset in which diseases are diagnosed symptomatically, based on recommended tests and their results, for Covid-19 and other infectious diseases. The single target class label (finding) depended on the 11 best features shown in Fig. 7. Multiple target class labels, such as finding and survival, were considered, but the proposed model lacked a rule induction feature for multiple classes.

Features: offset, age, sex, RT_PCR_positive, intubated, intubation_present, went_icu, in_icu, needed_supplemental_O2, extubated, temperature, pO2_saturation, leukocyte_count, neutrophil_count, lymphocyte_count, view, modality, date, folder, survival (total: 20 features). Meta attributes: patientid, location, clinical_notes, other_notes. Target: finding.

Out of several views, the rules below were induced with the maximum accuracy of 0.701 for diagnosis of Covid-19 and other infectious diseases (Fig. 8). Parameters for extracting rules were: (i) Rule ordering: ordered, (ii) Covering algorithm: exclusive, (iii) Gamma: 0.7, (iv) Evaluation measure: laplace, (v) Beam width: 5, (vi) Minimum rule coverage: 1, (vii) Maximum rule length: 11, (viii) Default alpha: 1.0, (ix) Parent alpha: 1.0.
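For readers unfamiliar with the Laplace evaluation measure named in the rule parameters, the following sketch shows the classic Laplace accuracy estimate used in CN2-style rule induction: (covered examples of the predicted class + 1) / (all covered examples + number of classes). The exact variant implemented by the tool used in this paper is not stated here, so this form and the example counts are assumptions.

```python
def laplace_accuracy(n_target, n_covered, n_classes):
    """Laplace-corrected accuracy of a rule.

    n_target  : covered examples that belong to the class the rule predicts
    n_covered : all examples covered by the rule
    n_classes : number of classes in the problem
    """
    return (n_target + 1) / (n_covered + n_classes)


# A hypothetical rule covering 10 examples, 9 of them in the predicted class,
# in a two-class problem.
broad_rule = laplace_accuracy(9, 10, 2)

# A hypothetical rule covering a single example: raw accuracy would be 1.0,
# but the Laplace estimate is pulled towards 1 / n_classes.
narrow_rule = laplace_accuracy(1, 1, 2)
```

The correction penalizes rules with very small coverage, which matters when the minimum rule coverage is set as low as 1.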

Fig. 8

18 rules extracted for diagnosis of COVID-19 and other infectious diseases with highest accuracy of 0.701 having rule length of 3 (Truncated View)

Louvain Mani-Hierarchical Fold Learning (LMHFL) as Big Data Analytics Technique

The heuristic model for the LMHFL algorithm was then tried on around 15696 records of endocrine patients, with the 7 best features selected using the Orange framework (Fig. 9). The results lacked clarity across different parameter settings (Table 4). The ambiguous visualizations in Fig. 10 were due to the huge volume of data, which took almost 48 hours of processing time. For better results with the LMHFL algorithm on healthcare big data, a high-performance cloud platform is needed.

Table 4

Set parameters for LMHFL

Louvain Clustering	Manifold Learning	Data	Detection
Normalize data: Yes, PCA preprocessing: Yes, 7 components, Metric: Euclidean, K neighbors: 100, Resolution: 2.0, 33 Clusters	Method: t-SNE, n_components: 7, metric: euclidean, perplexity: 30, early_exaggeration: 12, learning_rate: 100, n_iter: 3000, initialization: PCA	Input instances: 9646; Features: PatientID, VAN, Appointments, Test_Date, Assessment, Age, Gender; Meta attributes: Note, ICD-10-CM, PC, Result, Cluster; Target: Examination, Test, Diagnosis; Inliers: 3023; Outliers: 6623	Detection method: One class SVM with non-linear kernel (RBF); Regularization (nu): 50; Kernel coefficient: 0.01

Fig. 10

Multi-Dimensional Scaling (MDS) in LMHFL for Endocrine dataset having 15696 records for 8 classes of Diabetes Mellitus (DM or dm) and its Comorbidities; Breast Cancer (as Ca Breast), Hormonel, Hypertension (HTN), Hyper Lipidemia, Thyroid, Insuficiencia Renal Cronica (IRC) and Other; in 11 clusters as 3 clusters of (CA BREAST, Ca Breast, ca breast), 2 clusters of (DM and dm), HORMONEL, HTN, Hyper lipidemia, THYROID, IRC and Other


Louvain Mani-Hierarchical Fold Learning (LMHFL) for Diagnosing Diabetes and its Comorbidities

LMHFL gave clearer visual results for the 240 instances (records) extracted using the DM comorbidities data model, which links diabetes and its associated comorbidities to each patient profile through a limited set of features (PatientID, TestID, Test and Result) and a single target class, Disease (diagnosed). In addition to the probability chart (Fig. 11), LMHFL was evaluated using Spearman correlation, which gave its highest values, 0.952, 0.942, 0.916 and 0.817, for four combinations of the features reduced to C0, C1, C2, C3, C4, C5 and C6. Finally, CN2 Rule Induction was applied to view the induced rules, which numbered 28 for the diagnosis of diabetes mellitus and its comorbidities.

Fig. 11

a Cluster of Diagnosis of Diabetes Mellitus (DM), b Diagnostic Clusters for Comorbidities of DM on probability scale (A distorted and vague for larger dataset with variable parameters)

Insight on Experimental Results

Several valuable observations were documented from multiple experiments that applied the same algorithmic model, LMHFL, with different hyperparameter settings to multiple diseases (Fig. 12).

Fig. 12

Features evaluation matrix

LMHFL may be applied for diagnosing multiple diseases. The accuracy of a diagnosis is validated by its probability of occurrence under the given parameters (as in Figs. 6b, 11, 13 and 14). The best rule-induction quality for the DF, Covid-19 with other infectious diseases, and DM comorbidities datasets, evaluated by Laplace (Fig. 8), was 0.727, 0.701 and 0.203 for 41, 18 and 24 rules, respectively. The entropy measure showed 0 quality with the given algorithmic parameters (Fig. 14), with 28 rules for endocrine diseases and 173 rules for DF classes.
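For orientation, the two rule-quality measures named above can be sketched in plain Python. This assumes the standard Laplace accuracy estimate, (covered examples of the predicted class + 1) / (all covered examples + number of classes), and Shannon entropy over the class counts a rule covers. The exact evaluators used by the Orange CN2 widget in the study are not shown in the source, and the counts below are hypothetical.

```python
import math


def laplace_accuracy(n_target, n_covered, n_classes):
    """Laplace estimate of rule accuracy: (n_target + 1) / (n_covered + n_classes)."""
    return (n_target + 1) / (n_covered + n_classes)


def class_entropy(class_counts):
    """Shannon entropy (in bits) of the class distribution covered by a rule."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return sum(p * math.log2(1 / p) for p in probs)


# Hypothetical rule: covers 10 records, 9 of them in the predicted class,
# in a problem with 2 classes.
quality = laplace_accuracy(n_target=9, n_covered=10, n_classes=2)
print(round(quality, 3))

# A rule covering a single class has entropy 0; an even 50/50 split has 1 bit.
print(class_entropy([10]), class_entropy([5, 5]))
```

The Laplace estimate pulls the quality of rules that cover few records toward 1 / n_classes, which is why a short rule covering many records of one class scores higher than a rule that is perfectly accurate on only one or two records.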

Fig. 13

DM and its Comorbidity Diseases are related by specific Test (shown as frequencies and probability of occurrence)

Fig. 14

28 Induced rules for diabetes and its comorbidities using entropy measure (truncated view)

A DM comorbidities data model was extracted with 240 instances, with probabilities of occurrence shown in Fig. 13; the maximum Spearman correlation achieved was 0.952. Visualizations for DF patients in Fig. 5 classify each patient's diagnosis based on the results of different tests. The diagnosis depends on specific features (Figs. 5, 6 and 11 show the relation with the tests conducted and the classification based on results). A more detailed view of the induced rules was also observed. The final diagnosis is still left to doctors, who judge the probabilistic scale of disease occurrence using the rules deduced from the available features. Visualizations lack clarity on endocrine big data and are not yet patient specific; this is platform dependent, and the algorithmic model may be validated later on a high-performance cloud.
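The Spearman correlation reported above is the Pearson correlation of rank-transformed values. A minimal standard-library sketch follows, assuming ties receive their average rank; the function names and the input vectors are hypothetical and are not taken from the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical component values for two clusters.
c_a = [1, 2, 3, 4, 5]
c_b = [1, 2, 3, 5, 4]
rho = spearman(c_a, c_b)
print(round(rho, 3))
```

Because only ranks enter the calculation, any monotonic relationship between two components gives a coefficient of 1 or -1, whether or not the relationship is linear.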

Analysis for Future Recommendation

In Sect. 4, previous machine learning algorithms were referred to and tried for better representations, but they failed. LMHFL is a hybrid deep learning heuristic model for visual analytics that may be applied to imperfect training data, available in huge amounts and normalized into a structured datasheet complying with HL7 FHIR v4, to get accurate inferences. In [44], a small dataset of only 80 diabetic patients was considered and grouped as healthy, diabetes Type 2 without depression and diabetes Type 2 with depression to find inferences based on biomarkers. These groups were evaluated on training data with a CI of 95%, and 50% of the sample had depression as a comorbid disease. The dataset in [44] was characterized through t-test and p-value. That study [44] is said to have several limitations. We took larger datasets in different variations, with feature-based selection, for our study. To establish the concreteness of our results for DF (104 instances), infectious diseases including COVID-19 (810 instances), DM (15696 instances) and endocrine comorbidities (240 instances selected from the 15696 rows), we passed test data through focused clusters that were labeled as single or multiple target classes [9, 45]. Test data classified into other clusters would gain confidence from the most appropriate probabilistic ratio (as in Figs. 6, 11 or 13) and would be finalized through the intervention of a qualified doctor. The flexibility to tune this algorithm to fit any dataset is its strength, but for better quality and timely processing of results a high-performance cloud platform is required, which may be RapidMiner or a much better version for speed and accuracy. In this paper, the proposed heuristic is limited in its ability to label multi-dimensional data, rules were induced only for a single target class, and images were not taken as features for diagnosis. In future, the authors intend to find in-depth associations [46] of the mentioned comorbidities with diabetes mellitus depending on features or biomarkers. The heuristic [46] would be tested and validated on custom rules extracted from experts’ opinions in the endocrine knowledge domain.


