Sarah Shafqat1,2, Maryyam Fayyaz3, Hasan Ali Khattak4, Muhammad Bilal5, Shahid Khan6, Osama Ishtiaq6, Almas Abbasi1, Farzana Shafqat2,6, Waleed S Alnumay7, Pushpita Chatterjee8,9.
Abstract
Healthcare informatics has been widely discussed since the early 21st century. As new computing technologies evolve, healthcare produces huge amounts of data, opening several research areas. Managing the sheer volume of this data is necessary, but the main concern today is extracting knowledge from it for decision making. To this end, researchers are exploring big data analytics, deep learning (an advanced form of machine learning based on deep neural nets), predictive analytics and various other algorithms to bring innovation to healthcare. Given these innovations, it is reasonable to say that predicting a disease and anticipating its cure is no longer unrealistic. Dengue fever (DF) first, and then Covid-19, are recent outbreaks of lethal infectious diseases, and diagnosis at every stage is crucial to reducing the mortality rate. In the case of diabetes, clinicians and experts find it challenging to diagnose the disease in time and to assess the chances of developing underlying diseases. In this paper, Louvain Mani-Hierarchical Fold Learning healthcare analytics, a hybrid deep learning technique, is proposed for medical diagnostics. It is tested and validated on a real-time dataset of 104 instances of patients with dengue fever made available by Holy Family Hospital, Pakistan, and on 810 instances of infectious diseases found on GitHub, covering the prognosis of Covid-19, SARS, ARDS, Pneumocystis, Streptococcus, Chlamydophila, Klebsiella, Legionella, Lipoid, etc. The technique showed a maximum Spearman correlation of 0.952 between two clusters when applied to 240 instances extracted from a comorbidities diagnostic data model derived from 15696 endocrine records of multiple visits of 100 patients, each identified by a unique ID. The accuracy of the induced rules, evaluated by Laplace (Fig. 8), is 0.727, 0.701 and 0.203 for 41, 18 and 24 rules, respectively.
The endocrine diagnostic data was made available by Shifa International Hospital, Islamabad, Pakistan. Our results show that in future this algorithm may be tested for diagnostics on healthcare big data.
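The abstract names two evaluation measures, Spearman correlation between clusters and Laplace accuracy for induced rules, without defining them. The sketch below assumes the standard definitions: Spearman's coefficient as the Pearson correlation of average ranks, and the Laplace estimate commonly used with CN2 rule induction, (n_correct + 1) / (n_covered + n_classes). The function names and all input values are hypothetical and chosen only for illustration; they are not the paper's data or implementation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def laplace_accuracy(n_correct, n_covered, n_classes):
    """Laplace estimate of rule accuracy (assumed CN2-style definition)."""
    return (n_correct + 1) / (n_covered + n_classes)


# Illustrative values only (not taken from the paper).
rho = spearman([1, 2, 3, 4], [10, 20, 30, 40])
acc = laplace_accuracy(n_correct=7, n_covered=8, n_classes=2)
```

The Laplace correction pulls the accuracy of rules that cover few examples toward 1 / n_classes, so a rule covering a handful of instances cannot score as highly as a rule with the same hit rate over many instances.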
Keywords: Big data; Deep learning algorithm; Endocrine diseases; Healthcare analytics; Infectious diseases; Learning healthcare system; Medical diagnostics; Neural nets
Year: 2021 PMID: 33551665 PMCID: PMC7852051 DOI: 10.1007/s11063-021-10425-w
Source DB: PubMed Journal: Neural Process Lett ISSN: 1370-4621 Impact factor: 2.908
Fig. 1 Related study organization structure
Fig. 2 z-score plot for ‘age’ and ‘test results’ with ANOVA chart for Glucose Fasting (GF) test for disease diagnosis
Fig. 4 Selected best 7 features for dengue diagnosis
Fig. 7 Eleven best features selected out of 20 features with single target variable ‘finding’ in the infectious diseases dataset
Fig. 9 Seven best features selected for diagnosis of diabetes and its comorbidities
Fig. 11 a Cluster of diagnosis of Diabetes Mellitus (DM), b Diagnostic clusters for comorbidities of DM on probability scale (a distorted and vague view for a larger dataset with variable parameters)
Fig. 3 KNN applied on endocrine dataset with accuracy of 0.52 to predict diagnosis
Louvain mani-hierarchical fold learning shown in various model interpretations due to its flexibility
| Models/Hyperparameters | Method | Component | K | C | Evaluation metric | Learning rate | Iterations | Outliers |
|---|---|---|---|---|---|---|---|---|
| Louvain Clustering | PCA | 7 | 5, 2 | 3 | Euclidean | NA | NA | 62 |
| Manifold Learning | t-SNE | 7 | NA | NA | Euclidean | 150 | 2000 | NA |
| Hierarchical Clustering | Normalized/One class SVM with non-linear kernel (RBF) | NA | NA | 7, 10, 14, 20 | Manhattan | NA | NA | NA |
| CN2 Rule Induction Classifier | MDS/Unordered/Weighted | NA | NA | NA | Entropy | NA | NA | NA |
| Louvain Clustering | PCA | 7 | 4, 0 | 3 | Euclidean | NA | NA | 62 |
| Manifold Learning | t-SNE | 6 | NA | NA | Euclidean | 150 | 2000 | NA |
| Hierarchical Clustering | Normalized/One class SVM with non-linear kernel (RBF) | NA | NA | 8, 12, 16, 20 | Manhattan | NA | NA | NA |
| CN2 Rule Induction Classifier | MDS/Unordered/Weighted | NA | NA | NA | Entropy | NA | NA | NA |
| Louvain Clustering | NA | 7 | 3, 3 | 3 | Euclidean | NA | NA | 62 |
| Manifold Learning | t-SNE | 4 | NA | NA | Euclidean | 150 | 2000 | NA |
| Hierarchical Clustering | Normalized/One class SVM with non-linear kernel (RBF) | NA | NA | 9, 15, 18, 20 | Manhattan | NA | NA | NA |
| CN2 Rule Induction Classifier | MDS/Unordered/Weighted | NA | NA | NA | Entropy | NA | NA | NA |
| Louvain Clustering | NA | 6 | 2, 5 | 4 | Euclidean | NA | NA | 62 |
| Manifold Learning | t-SNE | 3 | NA | NA | Euclidean | 150 | 2000 | NA |
| Hierarchical Clustering | Normalized/One class SVM with non-linear kernel (RBF) | NA | NA | 7, 11, 16, 20 | Manhattan | NA | NA | NA |
| CN2 Rule Induction Classifier | MDS/Unordered/Weighted | NA | NA | NA | Entropy | NA | NA | NA |
| Louvain Clustering | NA | 6 | 1, 2 | 7 | Euclidean Man | NA | NA | 62 |
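The hierarchical clustering rows above pair the Manhattan metric with several values such as 7, 10, 14 and 20, which appear to correspond to the cluster counts in the table that follows. As a minimal sketch of that step only, the code below performs agglomerative clustering with Manhattan distance and stops at a requested number of clusters. Average linkage is an assumption, since the source does not name the linkage, and the function names and toy points are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(p - q) for p, q in zip(a, b))


def average_linkage(c1, c2, points):
    """Mean pairwise Manhattan distance between two clusters of point indices."""
    total = sum(manhattan(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data (illustrative only): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
three = agglomerate(points, 3)
```

Calling `agglomerate` with different `n_clusters` values on the same data mirrors the way the table reports one model cut at several cluster counts. This brute-force version is cubic or worse in the number of points and is meant for reading, not for the dataset sizes reported here.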
Detailed structure of cluster formation with DF class labels for models with different parameter settings over multiple iterations (interpret)
| Clusters | After pruning outliers | Focused clusters | DF | DF (D/C) |
|---|---|---|---|---|
| 7 | C1–C6 | C6=DF | C2, C4–C6 | – |
| 10 | C1–C10 | C3=DSS | C2, C4–C10 | C8, C9 |
| 14 | C1–C14 | C4=DSS, C6=DHF | C2, C3, C5, C7, C9–C14 | C11, C12 |
| 20 | C1–C20 | C5=DF, C6=DSS, C8=DHF, C9=DF | C3–C5, C7, C9, C10, C13–C20 | C15, C18 |
| 8 | C1–C8 | – | C1–C8 | C2, C5 |
| 12 | C1–C12 | C3=DF | C1–C12 | C2, C7 |
| 20 | C1–C20 | C5=DF, C1, C10, C12=DHF, C16=DSS | C2, C3, C5–C7, C9, C11, C13–C15, C17, C19, C20 | C3, C11 |
| 9 | C2–C9 | C2=DF | C2–C4, C6–C9 | – |
| 15 | C2–C5, C8, C10–C15 | C2, C3, C10=DF, C14=DSS | C2–C5, C10–C13, C15 | – |
| 18 | C2–C6, C9, C11–C18 | C2, C3, C11, C14=DF, C13=DHF, C17=DSS | C2–C4, C6, C11, C12, C14, C16, C18 | – |
| 20 | C3–C7, C10, C11, C13–C20 | C3, C4, C13, C16=DF, C10, C19=DSS, C11, C15=DHF | C3–C5, C7, C13, C14, C16, C18, C20 | – |
| 7 | C1, C3–C7 | C1=DF | C1, C4–C7 | – |
| 11 | C1, C3, C4, C6, C8–C11 | C1, C10=DF | C1, C4, C6, C8–C11 | – |
| 16 | C1, C4, C5, C7, C9–C11, C13–C16 | C1, C13=DF, C10=DHF, C15=DSS | C1, C5, C7, C9, C11, C13, C14 | – |
| 20 | C1, C2, C5–C8, C10, C12–C15, C17–C20 | C1, C2, C8, C14, C17=DF, C6, C19=DSS, C13, C15=DHF | C1, C2, C7, C8, C10, C12, C14, C17, C18 | – |
| 8 | C1–C8 | C7=DHF | C1, C3–C6 | – |
| 13 | C1–C6, C8–C13 | C11=DHF | C1, C4–C6, C8 | – |
| 17 | C2–C8, C10–C12, C14–C17 | C6=DF, C14, C17=DHF, C16=DHF (D/C) | C2, C5–C8, C10 | – |
| 20 | C2–C9, C12–C14, C16, C18–C20 | C7=DF, C16, C20=DHF, C19=DHF (D/C) | C2, C5, C7–C9, C12 | – |
Detailed structure of cluster formation with DF (types) class labels for models with different parameter settings over multiple iterations (interpret)
| Clusters | DHF | DHF (D/C) | DHF (HD) | DHF (Leak) | DHF/DSS | DSS |
|---|---|---|---|---|---|---|
| 7 | C1-C5 | – | C2 | C2 | – | C1, C2, C3, C5 |
| 10 | C1, C2, C4-C10 | C8-C10 | C2 | C2 | C9 | C1-C6, C8-C10 |
| 14 | C1, C2, C5-C12, C14 | C11, C12, C14 | C3 | C2 | C12 | C1, C2, C4, C5, C7-C9, C11-C13 |
| 20 | C1-C3, C7, C8, C10-C18, C20 | C16, C18, C20 | C4 | C3 | C17 | C1-C4, C6, C7, C10-C13, C15, C16, C18, C19 |
| 8 | C1-C8 | C2-C4 | C7 | C8 | C2 | C2, C4-C8 |
| 12 | C1, C2, C4-C8, C10-C12 | C2, C4, C5 | C10 | C12 | C2 | C2, C5, C6, C8-C12 |
| 20 | C1-C4, C6-C13, C15, C17, C18, C20 | C4, C6, C7 | C15 | C18 | C4 | C3, C7-C9, C13-C20 |
| 9 | C3-C5, C7-C9 | – | C6 | C4 | – | C3-C6, C8, C9 |
| 15 | C4, C5, C8, C12, C13, C15 | – | C11 | C5 | – | C4, C5, C8, C11, C13-C15 |
| 18 | C4, C5, C9, C13, C15, C18 | – | C12 | C5 | – | C4, C6, C9, C12, C15-C18 |
| 20 | C5, C6, C17, C20 | – | C14 | C6 | – | C5, C7, C10, C14, C17-C20 |
| 7 | C3-C7 | – | C7 | C7 | – | C3-C5, C7 |
| 11 | C3, C4, C8, C9, C11 | – | C11 | C11 | – | C3, C4, C6, C8, C11 |
| 16 | C4, C5, C9-C11, C16 | – | C14 | C16 | – | C4, C5, C7, C9, C14, C15 |
| 20 | C5, C7, C12, C13, C15, C20 | – | C18 | C20 | – | C5-C7, C10, C12, C18, C19 |
| 8 | C1-C8 | C4, C8 | – | C5 | – | C2, C3, C5, C6, C8 |
| 13 | C1-C6, C9-C13 | C5, C13 | – | C6 | – | C2-C4, C6, C8-C10, C12 |
| 17 | C2-C5, C7, C8, C11, C12, C14, C15, C17 | C7, C16 | – | C8 | – | C3-C5, C8, C10-C12, C15 |
| 20 | C2-C4, C6, C8, C9, C13, C14, C16, C18, C20 | C8, C19 | – | C9 | – | C3-C6, C9, C12-C14, C18 |
Fig. 5 Louvain Mani-Hierarchical Fold Learning Model-1 to classify DF data having multiple diagnoses in 7 focused clusters for 8 classes. Classes: DF, DF (D/C), dengue hemorrhagic fever (DHF), DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS and dengue shock syndrome (DSS)
Fig. 6 a Seven clusters formed in Model-1, b Final clusters after filtering outliers (Model-1) on probability scale. A clearer view for 100 records of dengue fever patients
Fig. 8 Eighteen rules extracted for diagnosis of COVID-19 and other infectious diseases with highest accuracy of 0.701 having rule length of 3 (truncated view)
Set parameters for LMHFL
| Louvain Clustering | Manifold Learning | Data | Detection |
|---|---|---|---|
| Normalize data: Yes, PCA preprocessing: Yes, 7 components, Metric: Euclidean, K neighbors: 100, Resolution: 2.0, 33 Clusters | Method: t-SNE, n_components: 7, metric: euclidean, perplexity: 30, early_exaggeration: 12, learning_rate: 100, n_iter: 3000, initialization: PCA | Input instances: 9646; Features: PatientID, VAN, Appointments, Test_Date, Assessment, Age, Gender; Meta attributes: Note, ICD-10-CM, PC, Result, Cluster; Target: Examination, Test, Diagnosis; Inliers: 3023; Outliers: 6623 | Detection method: One class SVM with non-linear kernel (RBF); Regularization (nu): 50; Kernel coefficient: 0.01 |
Fig. 10 Multi-Dimensional Scaling (MDS) in LMHFL for the Endocrine dataset of 15696 records with 8 classes: Diabetes Mellitus (DM or dm) and its comorbidities Breast Cancer (as Ca Breast), Hormonel, Hypertension (HTN), Hyper Lipidemia, Thyroid, Insuficiencia Renal Cronica (IRC) and Other. The 11 clusters comprise 3 clusters of (CA BREAST, Ca Breast, ca breast), 2 clusters of (DM and dm), HORMONEL, HTN, Hyper lipidemia, THYROID, IRC and Other
Fig. 12 Features evaluation matrix
Fig. 13 DM and its comorbidity diseases are related by specific Test (shown as frequencies and probability of occurrence)
Fig. 14 Twenty-eight induced rules for diabetes and its comorbidities using entropy measure (truncated view)
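The tables and the caption above name entropy as the evaluation measure for CN2 rule induction but do not spell it out. As a hedged sketch, the usual reading is the Shannon entropy of the class distribution among the instances a rule covers, where a lower value means a purer rule. The function name and the class counts below are hypothetical and not taken from the paper.

```python
from math import log2


def rule_entropy(class_counts):
    """Shannon entropy (in bits) of the classes covered by one rule."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # a rule covering nothing carries no class information
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


# Illustrative counts only: instances per class covered by three candidate rules.
pure_rule = rule_entropy([10, 0])         # all covered instances share one class
mixed_rule = rule_entropy([5, 5])         # evenly split between two classes
four_way_rule = rule_entropy([1, 1, 1, 1])
```

Under this measure a rule search would prefer `pure_rule` over `mixed_rule`, since an entropy of zero means every covered instance belongs to the same diagnosis class.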