Ashutosh Kumar Dubey1, Umesh Gupta2, Sonal Jain2. 1. JK Lakshmipat University, Near Mahindra SEZ, P.O. Mahapura Ajmer Road, Jaipur, Rajasthan, 302 026, India. ashutoshdubey123@gmail.com. 2. JK Lakshmipat University, Near Mahindra SEZ, P.O. Mahapura Ajmer Road, Jaipur, Rajasthan, 302 026, India.
Abstract
PURPOSE: Breast cancer is one of the most common cancers found worldwide and most frequently found in women. An early detection of breast cancer provides the possibility of its cure; therefore, a large number of studies are currently going on to identify methods that can detect breast cancer in its early stages. This study was aimed to find the effects of k-means clustering algorithm with different computation measures like centroid, distance, split method, epoch, attribute, and iteration and to carefully consider and identify the combination of measures that has potential of highly accurate clustering accuracy. METHODS: K-means algorithm was used to evaluate the impact of clustering using centroid initialization, distance measures, and split methods. The experiments were performed using breast cancer Wisconsin (BCW) diagnostic dataset. Foggy and random centroids were used for the centroid initialization. In foggy centroid, based on random values, the first centroid was calculated. For random centroid, the initial centroid was considered as (0, 0). RESULTS: The results were obtained by employing k-means algorithm and are discussed with different cases considering variable parameters. The calculations were based on the centroid (foggy/random), distance (Euclidean/Manhattan/Pearson), split (simple/variance), threshold (constant epoch/same centroid), attribute (2-9), and iteration (4-10). Approximately, 92 % average positive prediction accuracy was obtained with this approach. Better results were found for the same centroid and the highest variance. The results achieved using Euclidean and Manhattan were better than the Pearson correlation. CONCLUSIONS: The findings of this work provided extensive understanding of the computational parameters that can be used with k-means. The results indicated that k-means has a potential to classify BCW dataset.
PURPOSE:Breast cancer is one of the most common cancers found worldwide and most frequently found in women. An early detection of breast cancer provides the possibility of its cure; therefore, a large number of studies are currently going on to identify methods that can detect breast cancer in its early stages. This study was aimed to find the effects of k-means clustering algorithm with different computation measures like centroid, distance, split method, epoch, attribute, and iteration and to carefully consider and identify the combination of measures that has potential of highly accurate clustering accuracy. METHODS: K-means algorithm was used to evaluate the impact of clustering using centroid initialization, distance measures, and split methods. The experiments were performed using breast cancer Wisconsin (BCW) diagnostic dataset. Foggy and random centroids were used for the centroid initialization. In foggy centroid, based on random values, the first centroid was calculated. For random centroid, the initial centroid was considered as (0, 0). RESULTS: The results were obtained by employing k-means algorithm and are discussed with different cases considering variable parameters. The calculations were based on the centroid (foggy/random), distance (Euclidean/Manhattan/Pearson), split (simple/variance), threshold (constant epoch/same centroid), attribute (2-9), and iteration (4-10). Approximately, 92 % average positive prediction accuracy was obtained with this approach. Better results were found for the same centroid and the highest variance. The results achieved using Euclidean and Manhattan were better than the Pearson correlation. CONCLUSIONS: The findings of this work provided extensive understanding of the computational parameters that can be used with k-means. The results indicated that k-means has a potential to classify BCW dataset.
Entities:
Keywords:
Breast cancer; Breast cancer Wisconsin (BCW) diagnostic dataset; Foggy and random centroid; K-means
Authors: Jacques Ferlay; Isabelle Soerjomataram; Rajesh Dikshit; Sultan Eser; Colin Mathers; Marise Rebelo; Donald Maxwell Parkin; David Forman; Freddie Bray Journal: Int J Cancer Date: 2014-10-09 Impact factor: 7.396
Authors: Carla Floricel; Nafiul Nipu; Mikayla Biggs; Andrew Wentzel; Guadalupe Canahuate; Lisanne Van Dijk; Abdallah Mohamed; C David Fuller; G Elisabeta Marai Journal: IEEE Trans Vis Comput Graph Date: 2021-12-24 Impact factor: 4.579
Authors: Jesus A Basurto-Hurtado; Irving A Cruz-Albarran; Manuel Toledano-Ayala; Mario Alberto Ibarra-Manzano; Luis A Morales-Hernandez; Carlos A Perez-Ramirez Journal: Cancers (Basel) Date: 2022-07-15 Impact factor: 6.575