Bayesian Inference on Malignant Breast Cancer in Nigeria: A Diagnosis of MCMC Convergence

Ropo Ebenezer Ogunsakin, Lougue Siaka

Abstract

Background: No previous study has classified malignant breast tumors in detail using Markov chain Monte Carlo (MCMC) methods, with attention to convergence, in Western Nigeria. This study therefore aims to profile women with benign and malignant breast tumors in two hospitals in Western Nigeria, with a focus on prognostic factors and MCMC convergence.
Materials and Methods: Hospital-based records were used to identify prognostic factors for malignant breast cancer among women of Western Nigeria. This paper describes Bayesian inference and demonstrates its use in estimating the parameters of a logistic regression via a Markov chain Monte Carlo (MCMC) algorithm. The results of the Bayesian approach are compared with those of classical statistics.
Results: The mean age of the respondents was 42.2 ± 16.6 years, with 52% of the women aged 35-49 years. The results of both techniques suggest that age and having at least a high school education are associated with a significantly higher risk of being diagnosed with a malignant rather than a benign breast tumor. The results also indicate that the coefficients obtained from the Bayesian approach have smaller standard errors. In addition, the simulation results reveal that women with at least a high school education are 1.3 times more at risk of having a malignant than a benign breast lesion in Western Nigeria.
Conclusion: More effort is required towards awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women. The Bayesian approach produces precise estimates for modeling malignant breast cancer.
License: Creative Commons Attribution License

Keywords:  Bayesian; malignant breast cancer; MCMC

Year:  2017        PMID: 29072396      PMCID: PMC5747394          DOI: 10.22034/APJCP.2017.18.10.2709

Source DB:  PubMed          Journal:  Asian Pac J Cancer Prev        ISSN: 1513-7368


Introduction

Breast cancer is the commonest cause of mortality and morbidity among women worldwide, and is currently the most common cancer among Nigerian women (Adebamowo and Ajayi, 1999; Ebughe et al., 2013; Ojewusi and Arulogun, 2016; Oladimeji et al., 2015; Banjo, 2004). The human breast is a pair of mammary glands composed of specialized epithelium and stroma in which both benign and malignant lesions can occur (Dauda et al., 2011). Benign lesions constitute the larger share of breast lesions, but more concern is given to malignant lesions because breast cancer is the most frequent malignancy among women (Uwaezuoke and Udoye, 2014). Globally, breast cancer accounts for 18.4% of cancers in women. Jedy-Agba et al. (2012) reported that the incidence of breast cancer in Nigeria has risen significantly, with the incidence in 2009-2010 reported at 54.3 per 100,000, representing a 100% increase within the last decade. A report on patients diagnosed with breast lesions in eastern Nigeria suggested that about 1 in 5 (23%) are malignant in nature (Yusufu et al., 2003). From the literature, we found that previous studies focused only on benign breast lesions (Abudu et al., 2007; Adesunkanmi and Agbakwuru, 2000; Forae et al., 2014; Guray and Sahin, 2006; Kumar et al., 2014; Anyikam et al., 2008; Godwins et al., 2011; Ochicha et al., 2002). The application of Bayesian techniques to the analysis of cancer data has proliferated in recent years. Acquah (2013) compared the Bayesian and classical approaches and found that the Bayesian approach gave better results than classical statistics. Other studies have shown similar results (Yu and Wang, 2011; Mila and Michailides, 2006; Albert, 1996; Congdon, 2014; Marrelec et al., 2003; Daíz and Batanero, 2016; Lozano-Fernández, 2008; Gordóvil-Merino et al., 2010).
In general, studies comparing the two methods find that the Bayesian technique offers a better solution than classical statistics. The Bayesian technique treats model parameters as random variables rather than constants, and the probability distribution of the unknown parameters is obtained via Bayes' theorem (Congdon, 2005; O’Neill, 2002; O’Neill et al., 2000; Wong and Ismail, 2016). This provides information on parameter uncertainty that may be very difficult to obtain with the classical technique. The classical technique fits the logistic regression iteratively and, in some cases, convergence of this iterative procedure may be difficult to achieve. The robustness and accuracy of the results produced by the Bayesian approach have made it increasingly popular in data analysis. This paper therefore investigates the significant predictors and characterizes patients diagnosed with benign and malignant breast lesions, using both the classical and the Bayesian approach.

Materials and Methods

Data Collection

Ethical approval was obtained from the ethics committee of the Federal Medical Teaching Hospital, Ekiti State, Nigeria. The data were extracted from the cancer registry of the Federal Medical Teaching Hospital. We accessed 237 records with 20 variables of breast cancer data. Some of these variables describe socio-demographic and cancer-specific information for each case of breast cancer. Extensive variable selection procedures were performed on the 20 variables. The records of patients aged 20 years and above were selected for this analysis. Information collected includes age, marital status, educational level, religion, race, type of breast cancer, occupation, laboratory number, case number, site of the female breast cancer, type of diagnosis and histological type. Other information recorded was the modality of treatment received: surgery, chemotherapy, hormonal therapy or radiotherapy. R was used for the classical statistical analysis and WinBUGS14 for the Bayesian analysis. As required by the Bayesian approach, several diagnostic tests were performed to assess convergence of the Markov chain Monte Carlo (MCMC) algorithm and to check that the output truly reflects the posterior distribution.

Bayesian Binary Logistic Regression

Bayesian logistic regression applies Bayesian inference to the logistic model and accommodates both continuous and categorical explanatory variables. A binary regression model describes the probability of a binary response variable as a function of covariates, and the logistic regression model belongs to the class of generalized linear models, which generalize the standard linear model. Let $y_i = 1$ if the response under consideration is observed for the $i$th individual and $y_i = 0$ otherwise, and let $\pi_i = P(y_i = 1)$ be the probability that the $i$th individual presents the response. The binary regression model is represented as

$$\pi_i = G(x_i^{T}\lambda),$$

where $\lambda$ is the vector of unknown parameters, $x_i$ is the vector of known covariates associated with the $i$th individual, and $G$ is any transformation taking values between 0 and 1. Although $G$ can be any cumulative distribution function, this study considers only the logistic form. Hence,

$$\pi_i = \frac{\exp(x_i^{T}\lambda)}{1 + \exp(x_i^{T}\lambda)}.$$

The link function defines the linear predictor $\eta_i = x_i^{T}\lambda$. With $\pi_i$ denoting the probability of a malignant rather than a benign breast lesion, the logit transformation is expressed as

$$\eta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right) = x_i^{T}\lambda.$$

The likelihood function for the data $y = (y_1, \ldots, y_n)^{T}$ is expressed as

$$L(\lambda \mid y) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i}.$$

For the Bayesian analysis, it is important to provide a joint prior distribution over the parameter space. The preferred prior for logistic regression parameters is a multivariate normal distribution (Ojo et al., 2017; Ntzoufras, 2011):

$$\lambda \sim N(\mu_\lambda, \Sigma_\lambda),$$

where $\mu_\lambda$ and $\Sigma_\lambda$ are the prior mean vector and covariance matrix; their specification involves the maximum likelihood (ML) estimate. Hence the prior density satisfies

$$p(\lambda) \propto \exp\left\{-\tfrac{1}{2}(\lambda-\mu_\lambda)^{T}\Sigma_\lambda^{-1}(\lambda-\mu_\lambda)\right\}.$$

Therefore, the posterior distribution is represented as

$$p(\lambda \mid y) \propto \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i} \times \exp\left\{-\tfrac{1}{2}(\lambda-\mu_\lambda)^{T}\Sigma_\lambda^{-1}(\lambda-\mu_\lambda)\right\}. \qquad (8)$$

The latter factor of expression (8) is the normal prior density for the parameters $\lambda$; the posterior itself has no closed form. The posterior distribution is usually of high dimension and analytically intractable, and evaluating it directly would require difficult numerical integration.
To overcome these difficulties, a Markov chain Monte Carlo (MCMC) algorithm is needed (Ngesa et al., 2014). MCMC is one of the techniques used to generate estimates of the unknown parameters; it successively corrects the values generated so as to approximate the desired posterior distribution, $p(\lambda \mid y)$, more closely (Ojo et al., 2017; Ntzoufras, 2011). When MCMC is used to generate a sample from $p(\lambda \mid y)$, it is necessary to check that the algorithm has converged to the desired posterior distribution (Ojo et al., 2017).
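As an illustration of how draws from a posterior of this kind can be generated, the following is a minimal sketch of a random-walk Metropolis sampler for a logistic regression with independent normal priors. It is not the WinBUGS implementation used in the study: the synthetic data, the step size, the prior settings (mean 0, precision 0.0001) and the function names `log_posterior` and `metropolis` are all assumptions chosen for illustration.

```python
import numpy as np

def log_posterior(lam, X, y, prior_mean, prior_prec):
    """Unnormalised log posterior: Bernoulli log-likelihood plus an
    independent normal log-prior (hypothetical prior settings)."""
    eta = X @ lam  # linear predictor
    # sum of y*eta - log(1 + exp(eta)), evaluated in a numerically stable way
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * prior_prec * np.sum((lam - prior_mean) ** 2)
    return loglik + logprior

def metropolis(X, y, n_iter=5000, step=0.2, prior_mean=0.0, prior_prec=1e-4, seed=0):
    """Random-walk Metropolis sampler; returns the draws and the acceptance rate."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    lam = np.zeros(p)
    current = log_posterior(lam, X, y, prior_mean, prior_prec)
    draws = np.empty((n_iter, p))
    accepted = 0
    for t in range(n_iter):
        proposal = lam + step * rng.standard_normal(p)
        candidate = log_posterior(proposal, X, y, prior_mean, prior_prec)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            lam, current = proposal, candidate
            accepted += 1
        draws[t] = lam
    return draws, accepted / n_iter

# Synthetic data (not the study data): an intercept and one binary covariate.
rng = np.random.default_rng(1)
n = 237
X = np.column_stack([np.ones(n), rng.integers(0, 2, size=n)])
true_lam = np.array([0.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_lam))))

draws, acc_rate = metropolis(X, y)
posterior_mean = draws[1000:].mean(axis=0)  # discard the first 1000 draws as burn-in
```

In practice several such chains would be run from dispersed starting values so that convergence diagnostics can be applied to them.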

Assessing Bayesian Markov Chain Monte Carlo and Convergence

In this study, a non-informative prior was assumed so as not to influence the posterior distribution; specifically, it was assumed that λ ~ N(1, 0.0001). Assuming the WinBUGS convention, in which the second argument of the normal distribution is a precision, 0.0001 corresponds to a variance of 10,000. All Bayesian analyses were carried out using WinBUGS 14 (Ntzoufras, 2011). We ran 1,500,000 Markov chain Monte Carlo (MCMC) iterations, with the initial 200,000 discarded as burn-in. The 5,000 iterations left were used for assessing convergence of the MCMC. We assessed MCMC convergence of the model parameters using the Heidelberger-Welch diagnostic, autocorrelation plots, Gelman-Rubin plots (Gelman et al., 2014a), and running quantiles of the MCMC output.

Gelman-Rubin

The Gelman and Rubin diagnostic requires two or more chains started from over-dispersed initial points and compares the within-chain and between-chain variability. A large discrepancy between the two variances implies non-convergence of the chains. If all the chains have converged, the estimate of the posterior marginal variance is expected to be very close to the within-chain variance. In its standard form, the test statistic for $m$ chains can be computed as follows (Lesaffre and Lawson, 2012):

$$\hat{V} = \frac{k-1}{k}W + \left(1+\frac{1}{m}\right)\frac{B}{k}, \qquad \hat{R} = \frac{\hat{V}}{W},$$

where $k$ is the number of iterations of the chains, $W$ is the within-chain variance and $B/k$ is the between-chain variance of the chain means. Convergence is indicated when $\hat{R} \to 1$. $\hat{R}$ is called the estimated potential scale reduction factor (PSRF). Brooks and Gelman (Gelman et al., 2014a) proposed an alternative approach that generalizes the initial method to consider more than one parameter concurrently. The estimate of the posterior variance-covariance matrix is now computed as

$$\hat{V} = \frac{k-1}{k}W + \left(1+\frac{1}{m}\right)\frac{B}{k},$$

where $W$ and $B/k$ denote the $q$-dimensional within-chain and between-chain covariance matrix estimates of the $q$-variate parameter vector $\lambda$. If $\delta_1$ is the largest eigenvalue of $W^{-1}B/k$, then

$$\hat{R}_c = \frac{k-1}{k} + \frac{m+1}{m}\,\delta_1,$$

where $\hat{R}_c$ is the multivariate potential scale reduction factor (MPSRF). Convergence is attained when the multivariate shrink factor approaches 1.

Heidelberger and Welch

To test the hypothesis of stationarity, suppose we have a sequence $\{X_t : t = 1, 2, \ldots, K\}$ from a covariance stationary process with unknown spectral density $S(\omega)$. Define the partial sums $Y_0 = 0$ and $Y_k = \sum_{t=1}^{k} X_t$ for $k \ge 1$, and the sample mean $\bar{X} = Y_K / K$. Then

$$Z_K(s) = \frac{Y_{\lfloor Ks \rfloor} - \lfloor Ks \rfloor\,\bar{X}}{\sqrt{K\,\hat{S}(0)}}, \qquad 0 \le s \le 1,$$

is approximately distributed as a Brownian bridge for large $K$, where $\hat{S}(0)$ is an estimate of the spectral density at frequency zero. The null hypothesis of stationarity is then tested using the Cramér-von Mises statistic.
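The Gelman-Rubin and Brooks-Gelman statistics described above can be sketched numerically as follows. This is an illustrative computation on simulated chains, not the CODA/BOA or WinBUGS routines used in the study; it uses the variance-ratio form without the degrees-of-freedom correction that some software applies, and the function names `psrf` and `mpsrf` are hypothetical.

```python
import numpy as np

def psrf(chains):
    """Univariate potential scale reduction factor V_hat / W.
    chains: array of shape (m, k), m chains of k iterations each."""
    chains = np.asarray(chains, dtype=float)
    m, k = chains.shape
    W = chains.var(axis=1, ddof=1).mean()            # within-chain variance
    B_over_k = chains.mean(axis=1).var(ddof=1)       # variance of the chain means
    V_hat = (k - 1) / k * W + (1 + 1 / m) * B_over_k
    return V_hat / W

def mpsrf(chains):
    """Multivariate PSRF: (k-1)/k + (m+1)/m * largest eigenvalue of W^{-1} B / k.
    chains: array of shape (m, k, q)."""
    chains = np.asarray(chains, dtype=float)
    m, k, q = chains.shape
    W = np.atleast_2d(np.mean([np.cov(c, rowvar=False) for c in chains], axis=0))
    B_over_k = np.atleast_2d(np.cov(chains.mean(axis=1), rowvar=False))
    delta1 = np.max(np.linalg.eigvals(np.linalg.solve(W, B_over_k)).real)
    return (k - 1) / k + (m + 1) / m * delta1

# Simulated example: three well-mixed chains, and three chains stuck at different levels.
rng = np.random.default_rng(0)
good = rng.standard_normal((3, 2000, 2))
bad = good.copy()
bad[:, :, 0] += np.array([0.0, 3.0, 6.0])[:, None]  # shift each chain's first parameter

psrf_good = psrf(good[:, :, 0])
psrf_bad = psrf(bad[:, :, 0])
mpsrf_good = mpsrf(good)
mpsrf_bad = mpsrf(bad)
```

For well-mixed chains both statistics should be close to 1, while chains centred at different values give values well above 1. With a single parameter (q = 1) the multivariate statistic reduces to the univariate one.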

Results

Socio-demographic profile of participants

The main goal of this paper is to investigate the significant predictors, to characterize patients diagnosed with benign and malignant breast lesions, and to present a diagnosis of MCMC convergence in western Nigeria, comparing the classical and Bayesian approaches. The prognostic factors considered are: intercept (λ0); marital status: separated (λ1); level of education: at least high school (λ2); religion: Christian (λ3); tribe: Yoruba (λ4); age: 35-49 (λ5), 50-69 (λ6), 70+ (λ7); occupation: retired (λ8), self-employed (λ9). Data on a total of 237 breast cancer patients were extracted for analysis in the current study. Of these, 192 cases (81.01%) were malignant breast lesions and 45 cases (18.99%) were benign, giving a malignant-to-benign ratio of 4.3:1. The mean age of the respondents was 42.2 ± 16.6 years, with 52% of the women aged 35-49 years. Table 1 shows the Heidelberger and Welch stationarity tests for the Bayesian Markov chain Monte Carlo. It shows the stationarity and convergence during the burn-in period.
Table 1

Heidelberger and Welch Stationarity and Half-Width Tests for the Bayesian Chains Used in the Diagnosis of MCMC

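| Param. | Stationarity test | P-value C1 | P-value C2 | P-value C3 | Half-width test | Mean C1 | Mean C2 | Mean C3 | Half-width C1 | Half-width C2 | Half-width C3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| λ0 | passed | 0.927 | 0.888 | 0.308 | passed | 0.117 | 0.132 | 0.129 | 0.054 | 0.054 | 0.055 |
| λ1 | passed | 0.591 | -0.219 | 0.364 | passed | -1.148 | -1.144 | -1.148 | 0.01 | 0.009 | 0.01 |
| λ2 | passed | 0.0572 | 0.818 | 0.204 | passed | 1.348 | 1.343 | 1.35 | 0.01 | 0.011 | 0.011 |
| λ3 | passed | 0.394 | 0.51 | 0.994 | passed | 0.836 | 0.838 | 0.839 | 0.013 | 0.012 | 0.012 |
| λ4 | passed | 0.915 | 0.987 | 0.112 | passed | 1.22 | 1.216 | 1.22 | 0.016 | 0.016 | 0.016 |
| λ5 | passed | 0.893 | 0.815 | 0.313 | passed | -0.216 | -0.226 | -0.228 | 0.044 | 0.044 | 0.044 |
| λ6 | passed | 0.808 | 0.914 | 0.237 | passed | -0.977 | -0.977 | -0.983 | 0.038 | 0.037 | 0.038 |
| λ7 | passed | 0.824 | 0.94 | 0.507 | passed | -1.536 | -1.55 | -1.551 | 0.044 | 0.044 | 0.044 |
| λ8 | passed | 0.64 | 0.954 | 0.163 | passed | 0.6168 | 0.613 | 0.612 | 0.02 | 0.042 | 0.019 |
| λ9 | passed | 0.4 | 0.896 | 0.092 | passed | 1.054 | 1.058 | 1.053 | 0.009 | 0.009 | 0.009 |

C1, C2 and C3 denote the three chains.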
Table 2 presents the MCMC results for the patients diagnosed with benign and malignant breast lesions. The posterior means were obtained after a burn-in period of 5,000 with Monte Carlo error less than 2%. The posterior means and medians of the coefficients indicate which effects are significant. The posterior results provide some evidence about the important variables to be selected when profiling patients diagnosed with malignant breast cancer. For example, Table 2 shows that those with at least high school education are 1.3 times more likely than others to have a malignant rather than a benign breast lesion. The results indicate that women aged ≥ 35 years were at a higher risk of being diagnosed with malignant breast cancer than those aged < 35 years.
Table 2

WinBUGS Posterior Summaries for Breast Cancer Patients

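| Param. | Mean | SD | MC error | 2.50% | Median | 97.50% | Start | Sample |
|---|---|---|---|---|---|---|---|---|
| λ0 | 0.126 | 1.973 | 0.01568 | -3.514 | 0.049 | 4.251 | 5,000 | 49,749 |
| λ1 | -1.147 | 0.669 | 0.003012 | -2.453 | -1.146 | 0.176 | 5,000 | 49,749 |
| λ2 | 1.347 | 0.637 | 0.002987 | 0.18 | 1.316 | 2.695 | 5,000 | 49,749 |
| λ3 | 0.838 | 0.815 | 0.003789 | -0.637 | 0.796 | 2.561 | 5,000 | 49,749 |
| λ4 | 1.219 | 0.92 | 0.004805 | -0.633 | 1.224 | 3.012 | 5,000 | 49,749 |
| λ5 | -0.223 | 1.672 | 0.001283 | 3.91 | -0.102 | 4.685 | 5,000 | 49,749 |
| λ6 | -0.979 | 1.523 | 0.01116 | -4.418 | -0.836 | 1.603 | 5,000 | 49,749 |
| λ7 | -1.545 | 1.686 | 0.01269 | -5.25 | -1.415 | 1.371 | 5,000 | 49,749 |
| λ8 | 0.614 | 1.133 | 0.005761 | -1.454 | 0.554 | 3.002 | 5,000 | 49,749 |
| λ9 | 1.055 | 0.59 | 0.00283 | -0.093 | 1.047 | 2.241 | 5,000 | 49,749 |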
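Under the usual reading of logistic regression coefficients as log-odds, a posterior summary such as those in Table 2 can be moved to the odds-ratio scale by exponentiation. The short sketch below does this for λ2 (at least high school education) using the values printed in Table 2; the variable names are illustrative, and exponentiating a posterior mean is only an approximation to the posterior mean of the odds ratio (quantiles, by contrast, transform exactly).

```python
import math

# Posterior mean and 95% credible limits for lambda2 as printed in Table 2.
mean, lower, upper = 1.347, 0.18, 2.695

odds_ratio = math.exp(mean)                         # point estimate on the odds-ratio scale
interval = (math.exp(lower), math.exp(upper))       # credible limits on the odds-ratio scale
excludes_zero = lower > 0 or upper < 0              # True when the interval for the coefficient excludes 0

print(f"odds ratio ~ {odds_ratio:.2f}, 95% interval ~ ({interval[0]:.2f}, {interval[1]:.2f})")
```

The exponentiated value (about 3.85) is a different quantity from the coefficient itself (about 1.3), so it is worth being explicit about which scale a reported effect refers to.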
Table 3 shows the result of the classical logistic regression analysis. The results indicate that malignancy was strongly associated with age and educational status: women with at least high school education have a significantly higher risk of being diagnosed with malignant breast tumors.
Table 3

Result of Classical Logistic Regression for Patients Diagnosed of Benign and Malignant

| Param. | Estimate | Std. error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| λ0 | -2.421 | 1.2308 | -1.967 | 0.0492 |
| λ1 | 1.2479 | 0.5976 | 2.088 | 0.4459 |
| λ2 | 0.5926 | 0.7774 | 0.762 | 0.0368 |
| λ3 | 1.0782 | 0.6392 | 1.687 | 0.0916 |
| λ4 | 1.2048 | 0.8515 | 1.415 | 0.1571 |
| λ5 | 1.1952 | 0.5732 | 2.085 | 0.0371 |
| λ6 | 0.5034 | 0.8188 | 0.615 | 0.5387 |
| λ7 | 1.0534 | 1.4439 | 0.73 | 0.4656 |
| λ8 | 0.4823 | 1.0432 | 0.462 | 0.6439 |
| λ9 | 0.9898 | 0.5658 | 1.749 | 0.0802 |
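The z values and p-values in a classical logistic regression table follow from the estimates and standard errors through the Wald statistic, z = estimate / standard error, with a two-sided normal p-value. The sketch below recomputes them for a few rows of Table 3 as a consistency check; the helper name `wald_test` is hypothetical, and it is an assumption that the printed columns are Wald statistics of this kind (as in standard R output).

```python
import math

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# (estimate, standard error) pairs as printed in Table 3.
rows = {
    "lambda0": (-2.421, 1.2308),
    "lambda1": (1.2479, 0.5976),
    "lambda2": (0.5926, 0.7774),
    "lambda5": (1.1952, 0.5732),
}
results = {name: wald_test(est, se) for name, (est, se) in rows.items()}

for name, (z, p) in results.items():
    print(f"{name}: z = {z:.3f}, p = {p:.4f}")
```

Recomputed this way, the p-values for λ1 and λ2 come out at roughly 0.037 and 0.446 respectively, which a reader may wish to compare with the printed column.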

Assessing the performance of Markov Chain Monte Carlo (MCMC) chains in WinBUGS

Once the model results have been computed, it is necessary to check the stationarity of the Markov chain Monte Carlo algorithm. Figure 1 and Figure 2 are presented to demonstrate that there is no problem of autocorrelation in the MCMC chains. The blue and red lines in Figure 4 denote the variance within and between chains. To support convergence of the chains, the ratio must converge to one and the blue and red lines must converge to a stable value. Figure 4 also displays red lines representing the potential scale reduction factor, denoted by R̂. Figure 4 indicates that all the R̂ → 1, which suggests that the algorithm converges. Figure 3 and Figure 4 convey the same information, but one is obtained through CODA/BOA and the other through WinBUGS.
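To make the autocorrelation check concrete, the following minimal sketch computes the sample autocorrelation function of a chain, which is the quantity displayed in an autocorrelation plot. The two simulated series (independent draws and a strongly dependent AR(1) series) are stand-ins chosen for illustration, not output from the study, and the function name `autocorrelation` is hypothetical.

```python
import numpy as np

def autocorrelation(chain, max_lag=50):
    """Sample autocorrelation of a 1-D chain at lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([
        1.0 if lag == 0 else np.dot(x[:-lag], x[lag:]) / denom
        for lag in range(max_lag + 1)
    ])

rng = np.random.default_rng(0)

# A chain with no serial dependence: autocorrelations near zero beyond lag 0.
white = rng.standard_normal(5000)

# A slowly mixing chain, simulated as an AR(1) series with coefficient 0.9.
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()

acf_white = autocorrelation(white)
acf_ar = autocorrelation(ar)
```

Autocorrelations that decay quickly towards zero, as for the first series, indicate that successive draws carry nearly independent information; slowly decaying autocorrelations suggest that longer runs or thinning may be needed.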
Figure 1

Running Quantiles for the Posterior Parameters in the Case of Female Benign and Malignant Breast Cancer Patients

Figure 2

Auto-Correlation Plots for the Female Benign and Malignant Breast Cancer Patients

Figure 4

Gelman Rubin Convergence Diagnosis for Independent Variables

Figure 3

The Plot of the Brooks-Gelman MPSRF for Three Chains of 49,749 Iterations

Discussion

The present study confirms findings from studies conducted in Nigeria over the past years on breast cancer among women in western Nigeria (Olugbenga et al., 2012; Abudu et al., 2007). All these studies showed that age could be a risk factor for malignant breast lesions. Similar findings have been documented in other parts of Africa and the rest of the world (Arora and Simmons, 2009). Patients aged 35-49 years constituted the majority of patients (52%) in our study, suggesting that women aged 35-49 have a higher risk of developing malignant breast cancer than the other age groups. More attention to breast cancer treatment is therefore necessary for this age group. This agrees with the breast cancer facts and figures released for 2011-2012. This age group corresponds to the working-class population and is also the childbearing age for many women. The pattern may result from contraceptive use and hormonal imbalance, which are common among these women (Onyeanusi, 2015; Olugbenga et al., 2012). In this study, malignant breast lesions appeared to be more frequent among those who had at least high school education, an observation that supports previous studies (Yuksel et al., 2017; Ibrahim et al., 2015; Yusufu et al., 2003; Ntekim et al., 2009). This result was supported by the descriptive statistics, which show that 52.3% of those diagnosed had at least high school education, suggesting that those who are educated were more inclined to present their health problems to health facilities rather than consulting fake medical doctors. The high proportion of malignant breast lesions might also be attributed to lifestyle changes among the educated. It may also result from exposures associated with advancement in life, such as the nature of occupation and diet, without due attention to health management.
We found that the mean age of breast cancer patients in western Nigeria is 42 years; this is similar to several Nigerian institution-based studies: Adebamowo reported 43 years (Adebamowo and Adekunle, 1999), Ikpatt et al. 42.7 years (Ikpat et al., 2002) and Ebughe et al. 44.9 years (Ebughe et al., 2013). Although our analysis did not treat age and educational status as components of socioeconomic status (SES), our findings are similar to those of studies showing that higher SES is associated with higher breast cancer incidence (Pudrovska and Anikputa, 2012; Krieger et al., 2010; Vainshtein, 2008). Additional studies have offered a possible explanation: women with high SES are more likely to obtain routine breast cancer screening because of better access to preventive healthcare associated with their level of education and increasing age, which increases the detection of breast cancer (Akinyemiju et al., 2015). Although our study did not investigate the risk factors for breast cancer by molecular subtype, a recent study by Akinyemiju et al. (2015) evaluated the association between SES and breast cancer subtypes using a valid measure of SES and the Surveillance, Epidemiology and End Results (SEER) database. Socioeconomic status, based on measures of income, occupational class, education and house value, was categorized into quintiles and explored. Their findings showed that the positive association between SES and breast cancer incidence is primarily driven by hormone receptor positive lesions. Malignant breast lesions, which can be subdivided into non-invasive and invasive tumors, are documented to be more commonly diagnosed in postmenopausal women (Lehmann-Che et al., 2013).
A molecular classification of breast cancer, with more than five reproducible subtypes (basal-like, ERBB2, normal-like, luminal A, and luminal B), has been defined through gene expression profiling and microarray analysis (Lønning et al., 2007). In addition, using gene set enrichment analysis (GSEA), a gene set linked to growth factor (GF) signaling was observed to be significantly enriched in luminal B tumors (Loi et al., 2009). Another study identified multiple pathways by mapping gene sets defined in Gene Ontology Biological Process (GOBP) for estrogen receptor positive (ER+) or estrogen receptor negative (ER-) tumors; among them, pathways related to apoptosis and cell division, or to G-protein coupled receptor signal transduction, were associated with the metastatic capability of ER+ or ER- tumours, respectively (Jack et al., 2007). The Gelman-Rubin convergence plot in Figure 4 suggests that the MCMC sequence has converged on the posterior density, as the red line falls towards one for all parameters. Our findings are similar to the results obtained by Salameh et al. (2014) and Jackman (2000). Figure 3 shows a plot of the Brooks-Gelman MPSRF along with the maximum PSRF for successively larger segments of the chains. This plot suggests that although the chains differ significantly for the first few thousand iterations, they mix together after that, and three chains of 1,500,000 iterations each are probably sufficient to ensure convergence of the chains. It also suggests using a burn-in of about 200,000 each. The results in Table 1 show that each parameter passes the stationarity and half-width tests. This suggests that, for the current study, the Markov chains are stationary and the sample size obtained is adequate for estimating the mean values in the three chains.
Findings from Bayesian and classical inference are not significantly different, which could be due to the non-informative prior used in the Bayesian model. When both techniques produce similar results, the Bayesian findings are given more attention because the approach is more robust than the classical one. The model used in this paper updates quickly, although adding complexity would increase the time required for updating. These diagnostics are necessary to ensure that we are actually sampling from a chain that has converged after a suitable burn-in. Using the posterior mean as a point estimate, Tables 2 and 3 allow the classical estimates to be compared with the simulation (MCMC) results. The estimated means and standard errors appear quite close, and the results show a reduction of the standard errors associated with the coefficients obtained from the Bayesian approach, hence greater stability of the coefficients. Other studies have shown similar results (Gordóvil-Merino et al., 2010; Acquah, 2013). The findings of this study show that age and having at least a high school education are associated with a higher risk of being diagnosed with a malignant rather than a benign breast lesion in Western Nigeria. The higher proportion of those affected by malignant breast lesions is found among the educated and younger women. This suggests that non-educated women do not patronize these services. More effort is required towards awareness and advocacy campaigns on how the prevalence of malignant breast lesions can be reduced, especially among women.
We recommend that governments, non-governmental organizations and other sectors involved in policy making put in place policies, strategies and sensitization programmes that target non-educated women to increase their use of breast cancer screening in health facilities, so that they can access appropriate health assessment and management, and that financially supported treatment be provided for breast cancer patients.

Statement of Conflict of Interest

The authors have declared no conflict of interest.

References

1. Philip D O'Neill. A tutorial introduction to Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods. Math Biosci. 2002 Nov-Dec.

2. Merih Guray; Aysegul A Sahin. Benign breast diseases: classification, diagnosis, and management. Oncologist. 2006-05.

3. Jeffrey Vainshtein. Disparities in breast cancer incidence across racial/ethnic strata and socioeconomic status: a systematic review. J Natl Med Assoc. 2008-07.

4. A M Dauda; M A Misauno; E O Ojo. Histopathological types of breast cancer in Gombe, North Eastern Nigeria: a seven-year review. Afr J Reprod Health. 2011-03.

5. Adanna Anyikam; Martin A Nzegwu; Ben C Ozumba; Ifeoma Okoye; Daniel B Olusina. Benign breast lesions in Eastern Nigeria. Saudi Med J. 2008-02.

6. Stanley Chibuzo Uwaezuoke; Ezenwa Patrick Udoye. Benign breast lesions in Bayelsa State, Niger Delta Nigeria: a 5 year multicentre histopathological audit. Pan Afr Med J. 2014-12-18.

7. Oluwatobi Blessing Ojo; Siaka Lougue; Woldegebriel Assefa Woldegerima. Bayesian generalized linear mixed modeling of Tuberculosis using informative priors. PLoS One. 2017-03-03.

8. Jack X Yu; Anieta M Sieuwerts; Yi Zhang; John W M Martens; Marcel Smid; Jan G M Klijn; Yixin Wang; John A Foekens. Pathway analysis of gene signatures predicting metastasis of node-negative primary breast cancer. BMC Cancer. 2007-09-25.

9. Rowena Syn Yin Wong; Noor Azina Ismail. An Application of Bayesian Approach in Modeling Risk of Death in an Intensive Care Unit. PLoS One. 2016-03-23.

10. Serpil Yüksel; Gülay Altun Uğraş; İkbal Çavdar; Atilla Bozdoğan; Sibel Özkan Gürdal; Neriman Akyolcu; Ecem Esencan; Gamze Varol Saraçoğlu; Vahit Özmen. A Risk Assessment Comparison of Breast Cancer and Factors Affected to Risk Perception of Women in Turkey: A Cross-sectional Study. Iran J Public Health. 2017-03.
