Shuo Liu (1), Jinshu Zeng (2), Huizhou Gong (3), Hongqin Yang (4), Jia Zhai (5), Yi Cao (6), Junxiu Liu (7), Yuling Luo (8), Yuhua Li (9), Liam Maguire (7), Xuemei Ding (10).
1. Faculty of Mathematics and Informatics, Fujian Normal University, Qishan Fuzhou, 350108, China.
2. Department of Ultrasonic Medical, The First Affiliated Hospital of Fujian Medical University, Fuzhou, 350005, China.
3. College of Foreign Languages, Fujian Normal University, Cangshan Fuzhou, 350007, China.
4. Fujian Provincial Key Laboratory for Photonics Technology, Key Laboratory of OptoElectronic Science and Technology for Medicine of Ministry of Education, Fujian Normal University, Cangshan Fuzhou, 350007, China. Electronic address: hqyang@fjnu.edu.cn.
5. Business School, University of Salford, Manchester, M5 4WT, UK.
6. Department of Business Transformation and Sustainable Enterprise, Surrey Business School, University of Surrey, Surrey, GU2 7XH, UK.
7. Faculty of Computing, Engineering and Built Environment, Ulster University, Londonderry, BT48 7JL, UK.
8. Faculty of Electronic and Engineering, Guangxi Normal University, Guilin, 541004, China.
9. School of Computing, Science and Engineering, University of Salford, Manchester, M5 4WT, UK.
10. Faculty of Mathematics and Informatics, Fujian Normal University, Qishan Fuzhou, 350108, China; Faculty of Computing, Engineering and Built Environment, Ulster University, Londonderry, BT48 7JL, UK. Electronic address: xuemeid@fjnu.edu.cn.
Abstract
BACKGROUND: Breast cancer is the most prevalent cancer among women in most countries. Many computer-aided diagnostic methods have been proposed, but few studies quantify the probabilistic dependencies among breast cancer data features or identify the contribution of each feature to the diagnosis. METHODS: This study aims to fill this gap with a Bayesian network (BN) modelling approach. The K2 learning algorithm and statistical computation methods are used to construct the BN structure and to assess the resulting BN model. The data comprise a clinical ultrasound dataset collected at a local hospital in China and a fine-needle aspiration cytology (FNAC) dataset from the UCI Machine Learning Repository. RESULTS: Our study suggests that, for the ultrasound data, cell shape is the most significant feature for breast cancer diagnosis, and the resistance index shows a strong probabilistic dependency on blood signals. For the FNAC data, bare nuclei are the most important feature for discriminating malignant from benign breast tumours, and uniformity of cell size and uniformity of cell shape are tightly interdependent. CONTRIBUTIONS: The BN modelling approach can support clinicians in making diagnostic decisions based on the significant features identified by the model, especially when other features are missing for a specific patient. The approach is also applicable to other healthcare data analytics and to data modelling for disease diagnosis.
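The abstract names the K2 algorithm but does not describe its implementation, so the following is only a minimal sketch of the standard K2 procedure: a greedy parent search scored with the Cooper-Herskovits metric. The function names `log_k2_score` and `k2_search`, the node ordering, the parent limit, and the binary toy variables `a`, `b`, `c` are all illustrative assumptions; the data are synthetic and are not taken from the study's ultrasound or FNAC datasets.

```python
import math
from collections import Counter


def log_k2_score(data, child, parents, arity):
    """Log of the Cooper-Herskovits (K2) metric for one node and one parent set."""
    r = arity[child]
    n_ijk = Counter()  # (parent configuration, child value) -> count
    n_ij = Counter()   # parent configuration -> count
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_ijk[(cfg, row[child])] += 1
        n_ij[cfg] += 1
    score = 0.0
    for cfg, total in n_ij.items():
        # log[(r - 1)! / (N_ij + r - 1)!], using lgamma(n) = log((n - 1)!)
        score += math.lgamma(r) - math.lgamma(total + r)
        # sum over child values of log(N_ijk!)
        for k in range(r):
            score += math.lgamma(n_ijk[(cfg, k)] + 1)
    return score


def k2_search(data, order, arity, max_parents=2):
    """Greedy K2 search: each node may only take parents that precede it in `order`."""
    parents = {v: [] for v in order}
    for idx, child in enumerate(order):
        best = log_k2_score(data, child, parents[child], arity)
        candidates = set(order[:idx])
        while candidates and len(parents[child]) < max_parents:
            scored = [
                (log_k2_score(data, child, parents[child] + [c], arity), c)
                for c in sorted(candidates)
            ]
            top_score, top = max(scored)
            if top_score <= best:
                break  # no candidate improves the score; stop adding parents
            parents[child].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Synthetic toy data (NOT from the paper): b mostly copies a, c is unrelated.
data = []
for i in range(40):
    a = i % 2
    b = a if i % 10 != 0 else 1 - a
    c = (i // 2) % 2
    data.append({"a": a, "b": b, "c": c})

arity = {"a": 2, "b": 2, "c": 2}  # number of discrete states per variable
learned = k2_search(data, ["a", "b", "c"], arity)
print(learned)
```

On this toy data the search should attach `a` as a parent of `b` and leave `c` without parents, which mirrors in miniature how a learned structure can expose dependent feature pairs. K2 depends on the supplied node ordering and on discretised inputs, so a real analysis would need to justify both choices.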