
Classification of Death Rate due to Women's Cancers in Different Countries.

M Farhadian, H Mahjub, A Moghimbeigi, J Poorolajal, Gh Sadri.

Abstract

BACKGROUND: The two most frequently diagnosed cancers among women worldwide are breast and cervical cancers. The objective of the present study was to classify countries based on death rates from sex-specific cancers.
METHODS: In this cross-sectional study, we used a dataset of death rates from breast, cervical, uterine, and ovarian cancers in 190 countries worldwide, as reported by the World Health Organization. Normal mixture models with different numbers of components were fitted to these data. The model parameters were estimated using the EM algorithm. The appropriate number of components was then determined, and the best-fit model was selected using the Bayesian information criterion (BIC). Next, model-based clustering was used to allocate the countries to clusters based on the distribution of women's cancers. The MIXMOD program running in MATLAB was used for data analysis.
RESULTS: The best model had four components. Countries were allocated to four clusters: 43 (23%) in the first cluster, 28 (14%) in the second, 75 (39%) in the third, and 44 (24%) in the fourth. Most countries in South America were allocated to the first cluster. Most countries in Africa and in Central and Southeast Asia were allocated to the third cluster. The fourth cluster consisted of Pacific, North American, and European countries.
CONCLUSION: Considering the benefits of clustering based on normal mixture models, this method appears applicable in a wide variety of medical and public health contexts.


Keywords:  BIC criteria; Finite mixture models; Model-based clustering; Neoplasm

Year:  2012        PMID: 23113194      PMCID: PMC3468994     

Source DB:  PubMed          Journal:  Iran J Public Health        ISSN: 2251-6085            Impact factor:   1.429


Introduction

The distribution of diseases varies across populations according to social, cultural, racial, geographical, and nutritional factors. Cancer imposes a major disease burden worldwide, with variation among countries and regions. Worldwide, the two most frequently diagnosed cancers among women are breast and cervical cancers. Breast cancer is the leading cause of cancer death among females, accounting for 23% of total cancer cases and 14% of cancer deaths, more than double the share of cervical cancer, the second most common, which made up 10%. Other common cancer sites among women include colorectal, respiratory, ovarian, and stomach cancers (1, 2).

The study of cancer distribution at regional and global levels is difficult because cancers are related to a wide variety of phenomena. Thus, the first step is to detect and classify regions with common characteristics. Several studies have investigated the geographical distribution of various cancers. However, studies on the simultaneous distribution of several cancers are limited. Most studies have considered a single type of cancer and clustered different areas of countries on that basis (3, 4). If the goal is instead to show the pattern of several types of cancer across countries, multivariate statistical methods must be applied. Cluster analysis is one of the prevailing methods widely used for classifying such phenomena (5).

Multivariate statistical methods have been used in different studies to classify diseases and various indicators. For example, Farhadian et al. used factor analysis to examine the relation between socioeconomic indicators and child mortality indicators in the provinces of Iran (6). Yazdi et al. used cluster analysis to classify the provinces of Iran based on maternal health indicators (7). Mahjub et al. also used factor analysis to assess women’s health needs in the provinces of Iran (8).

Clustering methods are widely used to group similar objects into categories. Cluster analysis comprises different algorithms and methods such as fuzzy clustering, K-means clustering, hierarchical clustering, and model-based clustering (5). One drawback of many clustering methods is that they ignore the distribution of the variables. Model-based clustering was proposed to address this problem. It was first introduced by Wolfe and later revised by McLachlan and Peel in 2000 (9) and by Fraley and Raftery in 2002 (10). Model-based clustering is widely used in multivariate analysis such as density estimation and discriminant analysis (10), magnetic resonance imaging (11, 12), and microarray data analysis (13–15).

Classifying the distribution of a disease among countries based on a single factor is straightforward. Classifying countries based on several indices, however, requires multivariate methods such as clustering. Model-based clustering is a powerful multivariate method based on mixture distributions, which can efficiently classify countries considering several indices (9). Furthermore, we found no literature classifying death rates from women’s cancers in different countries of the world. Therefore, the main objective of the present study is to classify countries based on death rates from women’s cancers using model-based clustering.

Materials and Methods

In this study, which was performed in 2009, we used a dataset of death rates from breast, cervical, uterine, and ovarian cancers in 192 countries worldwide reported by the World Health Organization (WHO) (16). Kiribati and San Marino were excluded because of missing observations in some variables, leaving 190 countries for analysis.

To perform model-based clustering, normal mixture models with different numbers of components were first fitted to the data in order to determine the optimal number of clusters. The model parameters were estimated by maximum likelihood via the EM algorithm, and the best-fitted model, with the appropriate number of components, was selected using the Bayesian Information Criterion (BIC). Finally, based on the best-fitted model, the countries were allocated to four distinct clusters based on death rates from women’s common cancers. In this way, regions with homogeneous characteristics were determined.

An important aspect of model-based clustering is that it is based on a statistical model. In other words, a statistical model is postulated for the population from which the sample data are obtained. Accordingly, the probability density function of any observation $x_i$ can be written as a mixture distribution:

$$f(x_i) = \sum_{k=1}^{g} p_k \, f_k(x_i; \theta_k)$$

where $g$ is the number of components in the mixture model, $f(x_i)$ is the mixture density, $f_k(x_i; \theta_k)$ is the density of the $k$th component with parameters $\theta_k$, and $p_k$ is the probability that an observation comes from the $k$th mixture component, called the mixing proportion ($p_k \ge 0$, $\sum_{k=1}^{g} p_k = 1$). The parameters of the model ($p_k$ and $\theta_k$) were estimated using the EM algorithm. Once the mixture model is fitted, the data can be categorized into $g$ clusters. The posterior probability that observation $x_i$ belongs to component $k$ is:

$$\tau_k(x_i) = \frac{p_k \, f_k(x_i; \theta_k)}{\sum_{h=1}^{g} p_h \, f_h(x_i; \theta_h)}$$

For each observation, the posterior probability was calculated for each cluster, and the observation was assigned to the cluster with the highest posterior probability. Accordingly, cluster $k$ includes the observations assigned to component $k$.

For multivariate continuous data, attention has concentrated on mixtures with normal component distributions because of their computational convenience. In that case, the probability density function of component $k$ with mean $\mu_k$ and covariance matrix $\Sigma_k$ is (9):

$$f_k(x_i; \mu_k, \Sigma_k) = (2\pi)^{-d/2} \, |\Sigma_k|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x_i - \mu_k)^{T} \Sigma_k^{-1} (x_i - \mu_k) \right\}$$

where $d$ is the number of variables (here $d = 4$). The EM algorithm is the reference tool for deriving maximum likelihood estimates in a mixture model (17).

Geometric features of the clusters, including shape, volume, and orientation, are determined by the covariance matrices $\Sigma_k$. Banfield and Raftery in 1993 (18) and Celeux and Govaert in 1995 (19) suggested a general framework for geometric constraints in multivariate normal mixtures. They parameterized the covariance matrices through the eigenvalue decomposition:

$$\Sigma_k = \lambda_k D_k A_k D_k^{T}$$

where $\lambda_k$ is a constant of proportionality, $D_k$ is the orthogonal matrix of eigenvectors, and $A_k$ is a diagonal matrix whose elements are proportional to the eigenvalues of $\Sigma_k$. The parameters $\lambda_k$, $D_k$, and $A_k$ determine the size, orientation, and shape of cluster $k$, respectively. By constraining or varying these elements, 28 different models were obtained (18, 19).

For the present data, we fitted normal mixture models with $g = 2$ to $6$ components, considering 28 different forms of decomposition of the variance-covariance matrices. Model selection was based on the minimum value of the BIC as well as the minimum value of the Entropy Index (EI) across the different numbers of mixture components in the 28 suggested models (20). We used MIXMOD version 2.1.1 and MATLAB version 7.0 for data analysis.
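The role of the three factors in the eigenvalue decomposition can be illustrated with a small two-dimensional sketch. This is an illustration in plain Python, not the MIXMOD implementation: the function name `covariance_from_geometry`, the restriction to two variables, and the numeric values are assumptions made for the example, and the shape matrix A is normalised here so that its determinant equals 1.

```python
import math

def covariance_from_geometry(volume, angle, shape):
    """Build a 2x2 covariance matrix Sigma = volume * D * A * D^T.

    volume : positive scalar (lambda), controls the size of the cluster
    angle  : rotation in radians, defines the eigenvector matrix D
    shape  : ratio a > 0; A = diag(a, 1/a), so det(A) = 1
    """
    c, s = math.cos(angle), math.sin(angle)
    d = [[c, -s], [s, c]]        # orthogonal matrix of eigenvectors
    a = [shape, 1.0 / shape]     # diagonal of the shape matrix A
    # Sigma[i][j] = volume * sum_m D[i][m] * A[m] * D[j][m]
    return [[volume * sum(d[i][m] * a[m] * d[j][m] for m in range(2))
             for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical values: an elongated cluster rotated by 30 degrees.
sigma = covariance_from_geometry(volume=3.0, angle=math.pi / 6, shape=4.0)
```

Because D is orthogonal and det(A) = 1 in this sketch, the determinant of the resulting matrix depends only on the volume parameter, while `angle` and `shape` rotate and stretch the cluster without changing its volume. Constraining each of the three factors to be equal or free across components is what generates the family of candidate models.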

Results

A total of 140 models with different numbers of components and different decompositions of the variance-covariance matrix were fitted to the data. Among these models, the highest and lowest BIC values were 3553.04 and 3118.66, respectively. A four-component mixture model with different mixing proportions, shapes, and orientations, which had the minimum BIC (3118.66) and an Entropy Index of 24.99, was therefore selected. The estimated parameters of the selected model are shown in Table 1.
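The selection procedure (fit each candidate model by EM, then keep the one with the smallest BIC) can be sketched as follows. This is a simplified illustration, not the analysis performed in MIXMOD: it assumes a univariate mixture instead of the four-variate models of the study, it uses the "smaller is better" form BIC = -2 log L + (number of parameters) ln(n), and the function names and toy data are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, props, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(math.log(sum(p * normal_pdf(x, m, v)
                            for p, m, v in zip(props, means, variances)))
               for x in data)

def fit_em(data, g, n_iter=100, var_floor=1e-6):
    """Fit a g-component univariate normal mixture by EM.

    Returns (proportions, means, variances, log-likelihood trace).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means from g consecutive blocks of the sorted data.
    blocks = [ordered[k * n // g:(k + 1) * n // g] for k in range(g)]
    means = [sum(b) / len(b) for b in blocks]
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * g
    props = [1.0 / g] * g
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp, loglik = [], 0.0
        for x in data:
            w = [props[k] * normal_pdf(x, means[k], variances[k]) for k in range(g)]
            total = sum(w)
            loglik += math.log(total)
            resp.append([v / total for v in w])
        trace.append(loglik)
        # M-step: weighted updates of proportions, means, and variances.
        for k in range(g):
            nk = sum(r[k] for r in resp)
            if nk <= 0.0:
                continue  # empty component: keep its previous parameters
            props[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, var_floor)
    trace.append(log_likelihood(data, props, means, variances))
    return props, means, variances, trace

def bic(loglik, n_params, n_obs):
    """BIC in the 'smaller is better' form."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_by_bic(data, candidates):
    """Fit one mixture per candidate g and return (best g, {g: BIC})."""
    scores = {}
    for g in candidates:
        _, _, _, trace = fit_em(data, g)
        # Free parameters: (g - 1) proportions + g means + g variances.
        scores[g] = bic(trace[-1], 3 * g - 1, len(data))
    return min(scores, key=scores.get), scores

# Hypothetical toy values, not the WHO data.
toy = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
best_g, scores = select_by_bic(toy, candidates=[1, 2, 3])
```

In the study, the same loop would run over g = 2 to 6 and over the 28 covariance structures, which gives the 140 fitted models (5 x 28); the parameter count in the penalty term would then depend on the chosen covariance structure.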
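Table 1: Estimated parameters of the best-fit model

| Component | Mixing proportion | Variable | Mean | Covariance: Breast cancer | Covariance: Cervical cancer | Covariance: Uterine cancer | Covariance: Ovary cancer |
|---|---|---|---|---|---|---|---|
| Component 1 | 0.24 | Breast | 9.05 | 25.58 | −3.51 | 4.04 | 4.41 |
| | | Cervical | 6.32 | −3.51 | 15.05 | −2.47 | 0.53 |
| | | Uterine | 2.75 | 4.04 | −2.47 | 3.09 | 0.51 |
| | | Ovary | 1.78 | 4.41 | 0.53 | 0.51 | 1.26 |
| Component 2 | 0.14 | Breast | 4.67 | 2.50 | 2.59 | 0.31 | 1.45 |
| | | Cervical | 4.00 | 2.59 | 4.16 | 0.08 | 2.50 |
| | | Uterine | 1.07 | 0.31 | 0.08 | 0.17 | 0.05 |
| | | Ovary | 2.21 | 1.45 | 2.50 | 0.05 | 2.03 |
| Component 3 | 0.38 | Breast | 4.77 | 4.03 | −0.50 | 0.17 | 0.31 |
| | | Cervical | 4.66 | −0.50 | 6.20 | −0.02 | 0.36 |
| | | Uterine | 0.33 | 0.17 | −0.02 | 0.02 | 0.03 |
| | | Ovary | 1.23 | 0.31 | 0.36 | 0.03 | 0.27 |
| Component 4 | 0.23 | Breast | 18.24 | 15.40 | −0.48 | 1.03 | 4.10 |
| | | Cervical | 2.68 | −0.48 | 2.15 | 1.02 | 1.05 |
| | | Uterine | 3.57 | 1.03 | 1.02 | 1.16 | 1.13 |
| | | Ovary | 5.60 | 4.10 | 1.05 | 1.13 | 2.73 |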
Based on the parameters in Table 1, posterior probabilities were computed for each country. Each country was then allocated to the mixture component with the highest posterior probability. Accordingly, the countries were allocated to four distinct clusters: 43 (23%) countries in cluster 1, 28 (14%) in cluster 2, 75 (39%) in cluster 3, and 44 (24%) in cluster 4, as shown in Table 2.
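The allocation rule described above can be sketched in plain Python using the estimates reported in Table 1. This is a minimal illustration and not the MIXMOD code: the function names are hypothetical, the variable order (breast, cervical, uterine, ovarian) is assumed from the table layout, and the probe vectors are simply the component means, used here in place of actual country data.

```python
import math

# (mixing proportion, mean vector, covariance matrix), copied from Table 1.
# Assumed variable order: breast, cervical, uterine, ovarian.
COMPONENTS = [
    (0.24, [9.05, 6.32, 2.75, 1.78],
     [[25.58, -3.51, 4.04, 4.41], [-3.51, 15.05, -2.47, 0.53],
      [4.04, -2.47, 3.09, 0.51], [4.41, 0.53, 0.51, 1.26]]),
    (0.14, [4.67, 4.00, 1.07, 2.21],
     [[2.50, 2.59, 0.31, 1.45], [2.59, 4.16, 0.08, 2.50],
      [0.31, 0.08, 0.17, 0.05], [1.45, 2.50, 0.05, 2.03]]),
    (0.38, [4.77, 4.66, 0.33, 1.23],
     [[4.03, -0.50, 0.17, 0.31], [-0.50, 6.20, -0.02, 0.36],
      [0.17, -0.02, 0.02, 0.03], [0.31, 0.36, 0.03, 0.27]]),
    (0.23, [18.24, 2.68, 3.57, 5.60],
     [[15.40, -0.48, 1.03, 4.10], [-0.48, 2.15, 1.02, 1.05],
      [1.03, 1.02, 1.16, 1.13], [4.10, 1.05, 1.13, 2.73]]),
]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_normal_density(x, mean, cov):
    """Log of the multivariate normal density at x."""
    low = cholesky(cov)
    d = len(x)
    # Forward substitution: solve low * z = (x - mean); then z.z is the
    # squared Mahalanobis distance.
    z = []
    for i in range(d):
        z.append((x[i] - mean[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + sum(v * v for v in z))

def posterior(x):
    """Posterior probability of each component for observation x."""
    logs = [math.log(p) + log_normal_density(x, mu, cov) for p, mu, cov in COMPONENTS]
    top = max(logs)                      # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [v / total for v in w]

def allocate(x):
    """Cluster number (1 to 4) with the highest posterior probability."""
    post = posterior(x)
    return post.index(max(post)) + 1

# Illustrative probe: the mean vector of component 4.
cluster = allocate(COMPONENTS[3][1])
```

Working on the log scale avoids underflow when a component with a very small variance (such as the uterine variance of 0.02 in component 3) is evaluated far from its mean.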
Table 2: Classification of women’s cancers in different countries worldwide

| Component | Countries |
|---|---|
| Component 1 | Albania, Antigua, Argentina, Azerbaijan, Bahamas, Barbados, Belize, Bosnia, Bolivia, Brazil, Burkina Faso, Cuba, Dominica, Dominican, Ecuador, El Salvador, Georgia, Grenada, Guatemala, Guyana, Haiti, Indonesia, Jamaica, Lesotho, Mauritius, Nauru, Nicaragua, Paraguay, Rep of Korea, Romania, Saint Kitts, Saint Lucia, Saint Vincent, Sao Tome, Seychelles, Singapore, Serbia, South Africa, Suriname, Swaziland, Trinidad, Uruguay, Venezuela |
| Component 2 | Cameroon, Chile, Colombia, Cook Islands, Costa Rica, Ethiopia, Fiji, Honduras, Kyrgyzstan, Maldives, Marshall, Micronesia, Niger, Niue, Palau, Panama, Philippines, Moldova, Samoa, Solomon, Sri Lanka, Syrian Arab, Tajikistan, Timor-Leste, Tonga, Tuvalu, Uzbekistan, Vanuatu |
| Component 3 | Afghanistan, Algeria, Angola, Bahrain, Bangladesh, Benin, Bhutan, Botswana, Burundi, Brunei, Cambodia, Cape Verde, Central African, Chad, China, Comoros, Congo, Côte d’Ivoire, Demo Korea, Djibouti, Egypt, Eritrea, Equatorial, Guinea, Gabon, Gambia, Ghana, Guinea, Guinea Bissau, India, Iran, Iraq, Jordan, Kenya, Kuwait, Lao, Lebanon, Liberia, Libyan, Madagascar, Malawi, Malaysia, Mali, Mauritania, Mexico, Mongolia, Morocco, Mozambique, Myanmar, Namibia, Nepal, Nigeria, Oman, Pakistan, Papua, Peru, Qatar, Rwanda, Saudi Arabia, Senegal, Sierra Leone, Somalia, Sudan, Thailand, Togo, Tunisia, Turkey, Turkmenistan, Uganda, United Emirates, Viet Nam, Yemen, United Tanzania, Zambia, Zimbabwe |
| Component 4 | Andorra, Armenia, Australia, Austria, Belarus, Belgium, Bulgaria, Canada, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Kazakhstan, Latvia, Lithuania, Luxembourg, Malta, Monaco, Netherlands, New Zealand, Norway, Poland, Portugal, Russian, Slovakia, Slovenia, Spain, Sweden, Switzerland, Yugoslav, Ukraine, UK, USA |
According to the results of model-based clustering, most countries in South America were allocated to the first cluster. In addition, most countries in Africa and in Central and Southeast Asia were allocated to the third cluster. The fourth cluster consisted of Pacific, North American, and European countries (Fig. 1).
Fig. 1: Model-based clustering of the countries worldwide according to the death rates from women’s common cancers (breast, uterine, cervical, and ovarian)

Discussion

Various studies have indicated that model-based clustering methods perform better than other methods when clusters overlap and differ in shape and size (21). In addition, model-based clustering is increasingly preferred over other procedures because the variance-covariance matrices of the model simplify interpretation of the results (22). We could not find any study that used model-based clustering to classify regions or countries based on cancer data. However, several studies have used other clustering methods, such as hierarchical clustering, K-means, and fuzzy clustering, to classify regions. For example, Abadi et al. used cluster analysis to classify universities of medical sciences and faculties of medicine (23). Babaee et al. used fuzzy and hierarchical clustering to classify provinces based on population and health indicators (24). Vahedi et al. applied hierarchical and non-hierarchical clustering methods to DNA microarray data to classify patients with breast cancer (25).

In addition, there is an increasing preference worldwide for model-based clustering over other methods. Mar et al. in 2003 applied model-based clustering to cluster genes associated with breast cancer (26). Pan et al. in 2002 applied model-based clustering to analyze gene microarray data, using the log-likelihood ratio and the BIC to select the number of components of the mixture model (27). McLachlan in 2002 used the EMMIX-GENE software for model-based clustering of microarray gene data (28), whereas we used the MIXMOD software. Furthermore, Chen et al. in 2008 applied model-based clustering to the diagnosis of cancer patients (29), while we classified the observations into more than two groups. More recent publications in the field include the study by Haibe-Kains, which used model-based clustering to identify molecular subtypes in breast cancer (30), and that of Mun et al., who in 2008 applied model-based clustering to adolescent problem behaviors and young adult outcomes (31).

One limitation of model-based clustering is the large number of parameters that need to be estimated, which means that relatively more data points are required in each component (32). Despite this limitation, a main contribution of the present study is the introduction of an appropriate and flexible clustering method that might be used in a wide variety of public health contexts. One advantage of model-based clustering is its simplicity and flexibility. Another is that, as with other statistical models, restrictions can be imposed on the parameters to obtain a more parsimonious model (21). A third advantage is that no decision is needed on the scaling of the observed variables, whereas in standard non-hierarchical methods such as K-means, scaling of the observed variables is always an important issue (22).

In conclusion, we showed that model-based clustering can readily be used to classify geographical regions based on various sample data. Considering the benefits of clustering based on normal mixture models over other conventional clustering methods, this method appears applicable in a wide variety of medical and public health contexts.

Ethical considerations

Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.
References

1.  Model-based clustering and data transformations for gene expression data.

Authors:  K Y Yeung; C Fraley; A Murua; A E Raftery; W L Ruzzo
Journal:  Bioinformatics       Date:  2001-10       Impact factor: 6.937

2.  A mixture model-based approach to the clustering of microarray expression data.

Authors:  G J McLachlan; R W Bean; D Peel
Journal:  Bioinformatics       Date:  2002-03       Impact factor: 6.937

3.  A geographic analysis of prostate cancer mortality in the United States, 1970-89.

Authors:  Ahmedin Jemal; Martin Kulldorff; Susan S Devesa; Richard B Hayes; Joseph F Fraumeni
Journal:  Int J Cancer       Date:  2002-09-10       Impact factor: 7.396

4.  Classification models for breast cancer molecular subtyping: what is the best candidate for a translation into clinic?

Authors:  Benjamin Haibe-Kains
Journal:  Womens Health (Lond)       Date:  2010-09

5.  Mixture modelling for cluster analysis.

Authors:  G J McLachlan; S U Chang
Journal:  Stat Methods Med Res       Date:  2004-10       Impact factor: 3.021

6.  Donuts, scratches and blanks: robust model-based segmentation of microarray images.

Authors:  Qunhua Li; Chris Fraley; Roger E Bumgarner; Ka Yee Yeung; Adrian E Raftery
Journal:  Bioinformatics       Date:  2005-04-21       Impact factor: 6.937

7.  A model-based cluster analysis approach to adolescent problem behaviors and young adult outcomes.

Authors:  Eun Young Mun; Michael Windle; Lisa M Schainker
Journal:  Dev Psychopathol       Date:  2008

8.  Model-based region-of-interest selection in dynamic breast MRI.

Authors:  Florence Forbes; Nathalie Peyrard; Chris Fraley; Dianne Georgian-Smith; David M Goldhaber; Adrian E Raftery
Journal:  J Comput Assist Tomogr       Date:  2006 Jul-Aug       Impact factor: 1.826

9.  A mixture model approach for the analysis of small exploratory microarray experiments.

Authors:  W M Muir; G J M Rosa; B R Pittendrigh; S Xu; S D Rider; M Fountain; J Ogas
Journal:  Comput Stat Data Anal       Date:  2009-03-15       Impact factor: 1.681

10.  Local clustering in breast, lung and colorectal cancer in Long Island, New York.

Authors:  Geoffrey M Jacquez; Dunrie A Greiling
Journal:  Int J Health Geogr       Date:  2003-02-17       Impact factor: 3.918
