Ahmed Al-Shammari1, Rui Zhou2, Mehdi Naseriparsaa3, Chengfei Liu3. 1. Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia; University of Al-Qadisiyah, Al Diwaniyah, Iraq. Electronic address: aalshammari@swin.edu.au. 2. Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia. Electronic address: rzhou@swin.edu.au. 3. Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia.
Abstract
BACKGROUND: Medical data stream clustering has become an integral part of medical decision systems since it extracts highly-sensitive information from a tremendous flow of medical data. However, clustering and maintaining of medical data streams is still a challenging task. That is because the evolving of medical data streams imposes various challenges for clustering such as the ability to discover the arbitrary shape of a cluster, the ability to group data streams without a predefined number of clusters, and the ability to maintain the data clusters dynamically. OBJECTIVE: To support the online medical decisions, there is a need to address the clustering challenges. Therefore, in this paper, we propose an effective density-based clustering and dynamic maintenance framework for grouping the patients with similar symptoms into meaningful clusters and monitoring the patients' status frequently. METHODS: For clustering, we generate a set of initial medical data clusters based on the combination of Piece-wise Aggregate Approximation and the density-based spatial clustering of applications with noise called (PAA+DBSCAN) algorithm. For maintenance, when new medical data streams arrive, we maintain the initially generated medical data clusters dynamically. Since the incremental cluster maintenance is time-consuming, we further propose an Advanced Cluster Maintenance (ACM) approach to improve the performance of the dynamic cluster maintenance. RESULTS: The experimental results on real-world medical datasets demonstrate the effectiveness and efficiency of our proposed approaches. The PAA+DBSCAN algorithm is more efficient and effective than the exact DBSCAN algorithm. Moreover, the ACM approach requires less running time in comparison with the Baseline Cluster Maintenance (BCM) approach using different tuning parameter values in all datasets. That is because the BCM approach tracks all the data points in the cluster. CONCLUSION: The proposed framework is capable of clustering and maintaining the medical data streams effectively by means of grouping the patients who share similar symptoms and tracking the patients status that naturally tends to be changing over time.
BACKGROUND: Medical data stream clustering has become an integral part of medical decision systems since it extracts highly-sensitive information from a tremendous flow of medical data. However, clustering and maintaining of medical data streams is still a challenging task. That is because the evolving of medical data streams imposes various challenges for clustering such as the ability to discover the arbitrary shape of a cluster, the ability to group data streams without a predefined number of clusters, and the ability to maintain the data clusters dynamically. OBJECTIVE: To support the online medical decisions, there is a need to address the clustering challenges. Therefore, in this paper, we propose an effective density-based clustering and dynamic maintenance framework for grouping the patients with similar symptoms into meaningful clusters and monitoring the patients' status frequently. METHODS: For clustering, we generate a set of initial medical data clusters based on the combination of Piece-wise Aggregate Approximation and the density-based spatial clustering of applications with noise called (PAA+DBSCAN) algorithm. For maintenance, when new medical data streams arrive, we maintain the initially generated medical data clusters dynamically. Since the incremental cluster maintenance is time-consuming, we further propose an Advanced Cluster Maintenance (ACM) approach to improve the performance of the dynamic cluster maintenance. RESULTS: The experimental results on real-world medical datasets demonstrate the effectiveness and efficiency of our proposed approaches. The PAA+DBSCAN algorithm is more efficient and effective than the exact DBSCAN algorithm. Moreover, the ACM approach requires less running time in comparison with the Baseline Cluster Maintenance (BCM) approach using different tuning parameter values in all datasets. That is because the BCM approach tracks all the data points in the cluster. CONCLUSION: The proposed framework is capable of clustering and maintaining the medical data streams effectively by means of grouping the patients who share similar symptoms and tracking the patients status that naturally tends to be changing over time.