
Applying Naive Bayesian Networks to Disease Prediction: a Systematic Review.

Mostafa Langarizadeh¹, Fateme Moghbeli¹.

Abstract

INTRODUCTION: Naive Bayesian networks (NBNs) are one of the most effective and simplest Bayesian networks for prediction.
OBJECTIVE: This paper reviews published evidence on the application of NBNs in predicting disease and aims to show that NBNs, as a fundamental algorithm, perform best in comparison with other algorithms.
METHODS: PubMed was searched electronically for articles published between 2005 and 2015. A comprehensive electronic search was conducted to identify eligible articles. Inclusion criteria were determined based on NBN and its effects on disease prediction. A total of 99 articles were found. After excluding the duplicates (n = 5), the titles and abstracts of 94 articles were screened according to the inclusion criteria. Finally, 38 articles remained. They were reviewed in full text and 15 articles were excluded. Eventually, 23 articles met our eligibility criteria and were included in this study.
RESULTS: The use of NBNs in predicting diseases is described. Results are reported in terms of accuracy, sensitivity, specificity and area under the ROC curve (AUC). The last column in Table 2 shows the differences between NBNs and other algorithms.
DISCUSSION: This systematic review (23 studies, 53,725 patients) indicates that predicting diseases based on an NBN had the best performance in most diseases in comparison with the other algorithms. In most cases, NBN works better than other algorithms based on the reported accuracy.
CONCLUSION: The NBN method can efficiently construct a prediction model for disease.


Keywords: Naive Bayes Algorithms; Naive Bayes Models; Naive Bayesian Network; Naive Bayesian Network and disease prediction

Year: 2016 PMID: 28077895 PMCID: PMC5203736 DOI: 10.5455/aim.2016.24.364-369

Source DB: PubMed Journal: Acta Inform Med ISSN: 0353-8109

1. INTRODUCTION

Bayesian theory and probability are named after the 18th-century British mathematician Thomas Bayes. Bayesian logic combines the result of a patient’s test with a pre-test probability (that of the population) to predict or determine the chance of finding a disease. Bayes’ theorem can thus be used as a rule for inferring or updating the amount of ‘belief’ in the light of new information. Bayesian networks can be seen as a substitute for logistic regression models in which the dependency or independency of variables can be formulated (1, 2). An NBN consists of only one parent node and several child nodes, as in Figure 1, and is based on Bayes’ theorem in machine learning. Let D be a training set of samples. Each sample is represented by an n-dimensional attribute vector $X = (x_1, x_2, \ldots, x_n)$ comprising n attributes. Suppose there are m classes $c_1, c_2, \ldots, c_m$; classification then amounts to finding the class with the maximal $p(c_i \mid X)$. This can be derived from Bayes’ theorem as equation 1:

$$p(c_i \mid X) = \frac{p(X \mid c_i)\, p(c_i)}{p(X)} \qquad (1)$$

Figure 1

The structure of the naïve Bayes network

For an attribute vector $X = (x_1, \ldots, x_n)$ and classes $c_1, \ldots, c_m$, only the numerator $p(X \mid c_i)\, p(c_i)$ needs to be maximized, because $p(X)$ has the same value for all classes, as in equation 2:

$$p(c_i \mid X) \propto p(X \mid c_i)\, p(c_i) \qquad (2)$$

A simplifying assumption in NBN is that the attributes are conditionally independent given the class (i.e. there is no dependency among the attributes). The class assignment of a test sample therefore follows equations (3) and (4):

$$p(X \mid c_i) = \prod_{k=1}^{n} p(x_k \mid c_i) \qquad (3)$$

$$\hat{c} = \arg\max_{c_i}\; p(c_i) \prod_{k=1}^{n} p(x_k \mid c_i) \qquad (4)$$

For example, if for a new sample the posterior probability $p(c_i \mid X)$ has the highest value among the posteriors of all m classes, the sample is assigned to class $c_i$ based on the NBN rule (3, 4). NBNs can estimate the post-test probability given the values of various predictive variables. Surprisingly, the performance of an NBN is competitive even though the independence assumption is obviously unrealistic (5, 6).

The structure of an NBN can be shown as a directed acyclic graph, as in Figure 1, where the nodes represent variables and the edges between the nodes show dependency among the variables. Within a node, a variable can take many distinct values, each with its own probability. One key point in this model is that the root node of the network has a connection to all predictive variables and does not depend on any other variable (7). In an NBN, as used in this study, there is no inter-dependency among the different feature variables. They are thus regarded as conditionally independent, hence the term ‘naive’. An example of a classification problem addressed with naïve Bayesian networks is the article published by Price et al. on the classification of cervical cancer patients (1).

This study was designed to perform a comprehensive review of the naïve Bayesian network and its use in predicting diseases. Its purpose was to review published evidence of using NBN in disease prediction in order to show the power of this method in comparison with other methods. To the best of our knowledge, it is the first study that directly compares NBN with other models for disease prediction.
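The decision rule in equations (3) and (4) can be illustrated with a minimal Python sketch. The toy data set (two binary attributes, here called fever and cough, and the classes "disease" and "healthy"), the function names and the Laplace smoothing constant `alpha` are hypothetical choices made for illustration only; none of them come from the reviewed studies.

```python
from collections import Counter
from math import exp, log


def train_naive_bayes(samples, labels, alpha=1.0):
    """Count class and per-attribute value frequencies from categorical data.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    n_features = len(samples[0])
    class_counts = Counter(labels)
    # value_counts[c][k][v] = number of class-c samples whose k-th attribute is v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    # distinct values observed for each attribute, used in the smoothing term
    domains = [set() for _ in range(n_features)]
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            value_counts[c][k][v] += 1
            domains[k].add(v)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "domains": domains,
        "alpha": alpha,
        "n": len(labels),
    }


def posterior(model, x):
    """Return p(c | x) for every class, normalised to sum to 1."""
    log_scores = {}
    for c, n_c in model["class_counts"].items():
        score = log(n_c / model["n"])  # log prior, log p(c)
        for k, v in enumerate(x):
            count = model["value_counts"][c][k][v]
            denom = n_c + model["alpha"] * len(model["domains"][k])
            score += log((count + model["alpha"]) / denom)  # log p(x_k | c)
        log_scores[c] = score
    # Normalising corresponds to dividing by p(X); it does not change the argmax.
    top = max(log_scores.values())
    unnorm = {c: exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


def predict(model, x):
    """Assign x to the class with the highest posterior probability."""
    post = posterior(model, x)
    return max(post, key=post.get)


# Hypothetical toy data: each sample is (fever, cough).
samples = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
           ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]

model = train_naive_bayes(samples, labels)
new_patient = ("yes", "yes")
post = posterior(model, new_patient)
predicted = predict(model, new_patient)
print(predicted, post)
```

Working in log space avoids numerical underflow when many attributes are multiplied together, and the smoothing term keeps an attribute value that was never observed for a class from forcing the whole product to zero.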

2. METHODS

2.1 Inclusion and exclusion criteria

Inclusion criteria were determined based on the topic of the study and the effects of naïve Bayesian networks in predicting different diseases. The full text of articles needed to be available, and only articles written in English were selected. The types of disease predicted by naïve Bayesian networks and included in this review are shown in Table 1.

Table 1

Frequency of disease

NBNs aim to improve disease treatment and to support diagnosis in the early stages, allowing faster and better treatment. Therefore, any NBN attempts to:
* Make a faster and more accurate disease prediction.
* Help physicians make a reliable decision.

We classified the studies according to disease prediction and method comparison. Original articles were included in this study. Studies that used NBN for feature selection (n = 16), as well as those that reported survival of disease based on NBN (n = 2), were excluded. The titles and abstracts of the articles found were screened against the inclusion criteria described above, and the full texts of the published articles were then reviewed. Two reviewers independently carried out the review process and summarized the information in every article. Any disagreements between the reviewers were resolved by referring to the original articles. Exclusion criteria were as follows: a) studies that used naïve Bayesian networks for feature selection, b) studies lacking disease prediction, c) studies that used naïve Bayesian networks to predict anything other than disease. We did not include studies in languages other than English.

As shown in Figure 2, a total of 99 articles were identified by the PubMed search. Only one database was searched because it was considered the reliable medical source. After the duplicates were excluded, we checked the titles and abstracts of 94 articles against our inclusion criteria. Finally, 23 articles met our eligibility criteria and were included in this study.

Figure 2

Selection process for study inclusion

2.2 Search strategy

A literature search was carried out on 26 July 2015 using the PubMed database in order to identify the relevant studies published in the past ten years, from 2005 to 2015. We used only one valid database, PubMed, in order to retrieve original and relevant articles, since PubMed is the main database for medical articles. Searching more databases returned more articles, but irrelevant ones, so we preferred to use only one database. A combination of the following MeSH terms and keywords was used: ((“Naïve Bayes models”[Mesh] or (disease prediction) or (predicting disease), (“Naïve Bayes algorithms” [Mesh]), (“Naïve Bayesian network” [Mesh]). We also set limits to our search according to the study result and language.

2.3 Data extraction

Titles and abstracts of all selected articles were reviewed independently by two reviewers. Papers categorized as relevant or irrelevant based on the abstracts were again independently assessed by both reviewers. Any disagreements between the two reviewers were resolved through discussion. Reasons for exclusion were recorded according to the exclusion criteria. Data extraction and quality assessment of the articles were done by the first author and checked by the coauthor for accuracy and to identify any missing information. The data were extracted independently. The extraction covered the following items:
* Article properties, e.g. title and publication year,
* Subject,
* Illnesses,
* Number of variables (features),
* Results in terms of accuracy, sensitivity, specificity or area under the ROC (receiver operating characteristic) curve (AUC) of NBNs,
* The comparison with other methods mentioned in the related article.

All the extracted data are categorized in Table 2.

Table 2

Summary of important factors in NBN predictors. PPV: positive predictive value; NPV: negative predictive value

2.4 Data synthesis and analysis

A narrative synthesis was carried out based on the classification of diseases. The outcomes were the effects on the following items:
* Accuracy = (TP + TN) / (TP + FP + FN + TN),
* Sensitivity = TP / (TP + FN),
* Specificity = TN / (TN + FP),
* Area under the ROC curve (AUC): accuracy is measured by the area under the ROC curve (as shown in Figure 3).

Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. A rough guide for classifying the accuracy of a diagnostic test is the traditional academic point system:

Figure 3

ROC Curves

* 0.9-1 = excellent (A),
* 0.8-0.9 = good (B),
* 0.7-0.8 = fair (C),
* 0.6-0.7 = poor (D),
* 0.5-0.6 = fail (F) (8).
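The outcome measures and the grading scale above can be sketched in Python as follows. The confusion-matrix counts and ROC points are hypothetical numbers chosen for illustration, not data from the reviewed studies. Computing the AUC with the trapezoidal rule is one common approach and is an assumption of this sketch, as are the handling of values that fall exactly on a boundary (assigned to the higher grade) and the treatment of values below 0.5 (graded as fail).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def auc_trapezoid(roc_points):
    """Area under a ROC curve given (false positive rate, true positive rate) pairs.

    The points should include (0, 0) and (1, 1); they are sorted by
    false positive rate before the trapezoidal rule is applied.
    """
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_grade(auc):
    """Map an AUC value onto the academic point system quoted in the text."""
    if auc >= 0.9:
        return "excellent (A)"
    if auc >= 0.8:
        return "good (B)"
    if auc >= 0.7:
        return "fair (C)"
    if auc >= 0.6:
        return "poor (D)"
    # The quoted scale stops at 0.5; lower values are treated as fail here.
    return "fail (F)"


# Hypothetical example: 100 patients, 50 with the disease.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)

# Hypothetical ROC points as (false positive rate, true positive rate).
roc = [(0.0, 0.0), (0.1, 0.7), (0.4, 0.9), (1.0, 1.0)]
auc = auc_trapezoid(roc)
grade = auc_grade(auc)
print(metrics, round(auc, 3), grade)
```

Keeping the three pieces as separate functions mirrors how the reviewed articles report them: some give only accuracy, some give sensitivity and specificity, and some give only AUC.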

3. RESULTS

3.1 Description of the articles included

Articles were published between 2005 and 2015. A total of 15 articles reported accuracy and 12 of them reported sensitivity and specificity, while only 11 of the 23 studies reported AUC. Five studies used naïve Bayesian networks for predicting brain disease. The other diseases are presented in Table 1, which shows the frequency of the diseases covered in the articles. The mean duration was approximately 9 years and the mean number of variables (features) used by the NBNs was about 17. Results are explained based on the outcomes of naïve Bayesian network performance. Table 2 briefly presents the results and the important factors that influenced our outcomes. In the following, we explain the effects of NBNs on predicting different diseases and compare them with other methods such as logistic regression (LR), support vector machine (SVM), neural network (NN), decision tree (DT), tree augmented NBN (TAN) and Bayesian network (BN). In most cases, NBNs work better than the others, which shows that the fundamental algorithm can work well and is really useful for disease prediction. In some cases it may not be necessary to enhance the method, because NBNs already give accurate predictions.

3.1.1 Using NBN as a brain disease predictor

Of the articles in this study, 21.7 percent concerned brain diseases such as brain tumor, metastasis, trauma and Alzheimer’s disease (5, 9-11). Computed tomography (CT) is widely accepted as an effective method to diagnose and detect rare but clinically significant findings in patients suffering minor head injury or brain disease. As such, it has been increasingly utilized as a simple and available test for these patients (12). However, a seminal study conducted by Brenner and Hall (13) warns against its harmful effects (particularly for children) due to the radiation exposure associated with CT. Independent CT imaging studies (12, 14, 15) advocate the adoption of a comprehensive approach that targets physicians’ education to decrease the over-reliance on CT imaging for those suffering from brain disease. Therefore, to reduce the harm caused by CT, it can be beneficial to use data mining methods, especially NBNs, as a predictor. The final results are presented in Table 2.

3.1.2 Naïve Bayesian network predictors in cancer

Three articles belonged to this group. Two of them used naïve Bayesian networks for predicting breast cancer (16, 17), one of the most important cancers in women (18, 19), and one discussed prostate cancer, the most common cancer in men (20). Cancer is a widespread disease that accounts for many deaths all over the world. In 2008, the World Health Organization (WHO) estimated that the number of new cancer cases worldwide was increasing, up to 7.5 million (21). Among all types of cancer, prostate cancer is the most frequent one among men. In 2008, around 900,000 new cases of prostate cancer were diagnosed, and approximately 260,000 men died as a result over the same period (16). The studies in this group consider the use of naïve Bayesian network techniques to improve the prediction of pathological stage in prostate cancer and breast cancer. The two articles on breast cancer chose different attributes for prediction with NBNs and consequently achieved different results, as shown in Table 2.

4. DISCUSSION

This systematic review (23 studies, 53,725 patients) indicates that predicting diseases based on NBNs had the best performance in most diseases, and that the prediction model depends on the attributes defined for the NBN. The results show that in some cases NBNs work better than the other methods reported in the last column of Table 2. Their use can improve physicians’ decisions in disease diagnosis. The results of the studies were categorized in 6 groups, as shown in Table 2. This study follows the clear search methods described earlier. Our findings show that 12 of the 15 articles (80%) that reported accuracy had an accuracy higher than 75%, which shows that NBN performs well. Six of the 11 articles (54.5%) that reported AUC had an AUC higher than 80%. Previous studies have explained that using machine learning, especially NBNs, for predicting diseases can be useful and can help physicians reach a better diagnosis (38). Additionally, our review shows that NBNs are helpful as a predictor, especially for the diseases categorized in Table 1. Prediction models are increasingly important in clinical practice (39, 40), as indicated by the number of new publications describing their development. One of their purposes is to aid clinical decision-making by combining patient characteristics in order to calculate the probability of a certain disorder or problem (diagnosis and prognosis) (41, 42). Our findings suggest that using NBNs can improve disease prediction more than other methods.

5. CONCLUSION

Many machine learning strategies are currently used in predicting diseases. This systematic review has shown that NBNs, the simplest type, can be useful for predicting diseases and in some respects can be better than other methods, and that this model can help health practitioners make decisions more confidently.

REFERENCES (36 in total)

1. Subhagata Chattopadhyay; Rima M Davis; Daphne D Menezes; Gautam Singh; Rajendra U Acharya; Toshio Tamura. Application of Bayesian classifier for the diagnosis of dental pain. J Med Syst. 2010-10-13.

2. David J Brenner; Eric J Hall. Computed tomography--an increasing source of radiation exposure. N Engl J Med. 2007-11-29.

3. Dinora A Morales; Yolanda Vives-Gilabert; Beatriz Gómez-Ansón; Endika Bengoetxea; Pedro Larrañaga; Concha Bielza; Javier Pagonabarraga; Jaime Kulisevsky; Idoia Corcuera-Solano; Manuel Delfino. Predicting dementia development in Parkinson's disease using Bayesian network classifiers. Psychiatry Res. 2012-11-11.

4. Olivier Regnier-Coudert; John McCall; Robert Lothian; Thomas Lam; Sam McClinton; James N'dow. Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers. Artif Intell Med. 2011-12-27.

5. Julian Wolfson; Sunayan Bandyopadhyay; Mohamed Elidrisi; Gabriela Vazquez-Benitez; David M Vock; Donald Musgrove; Gediminas Adomavicius; Paul E Johnson; Patrick J O'Connor. A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data. Stat Med. 2015-05-18.

6. Joshua Seth Broder. CT utilization: the emergency department perspective. Pediatr Radiol. 2008-09-23.

7. Joanna Kazmierska; Julian Malicki. Application of the Naïve Bayesian Classifier to optimize treatment decisions. Radiother Oncol. 2007-11-26.

8. Yugal Kumar; G Sahoo. Prediction of different types of liver diseases using rule based classification model. Technol Health Care. 2013.

9. Duarte Ferreira; Abílio Oliveira; Alberto Freitas. Applying data mining techniques to improve diagnosis in neonatal jaundice. BMC Med Inform Decis Mak. 2012-12-07.

10. Ross K K Leung; Ying Wang; Ronald C W Ma; Andrea O Y Luk; Vincent Lam; Maggie Ng; Wing Yee So; Stephen K W Tsui; Juliana C N Chan. Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case-control cohort analysis. BMC Nephrol. 2013-07-23.
