
Classification of Death Rate due to Women's Cancers in Different Countries.

M Farhadian¹, H Mahjub, A Moghimbeigi, J Poorolajal, Gh Sadri.

Abstract

BACKGROUND: The two most frequently diagnosed cancers among women worldwide are breast and cervical cancers. The objective of the present study was to classify countries based on death rates from sex-specific cancers.
METHODS: In this cross-sectional study, we used a dataset of death rates from breast, cervical, uterine, and ovarian cancers in 190 countries worldwide, reported by the World Health Organization. Normal mixture models with different numbers of components were fitted to these data. The model parameters were estimated using the EM algorithm. The appropriate number of components was then determined and the best-fit model selected using the BIC. Next, model-based clustering was used to allocate countries into clusters based on the distribution of women's cancers. The MIXMOD program running under MATLAB was used for data analysis.
RESULTS: The best model had four components. Countries were allocated into four clusters: 43 (23%) in the first cluster, 28 (14%) in the second, 75 (39%) in the third, and 44 (24%) in the fourth. Most countries in South America were allocated to the first cluster. Most countries in Africa and in Central and Southeast Asia were allocated to the third cluster. The fourth cluster consisted of Pacific, North American, and European countries.
CONCLUSION: Considering the benefits of clustering based on normal mixture models, this method appears applicable in a wide variety of medical and public health contexts.


Keywords: BIC criteria; Finite mixture models; Model-based clustering; Neoplasm

Year: 2012 PMID: 23113194 PMCID: PMC3468994

Source DB: PubMed Journal: Iran J Public Health ISSN: 2251-6085 Impact factor: 1.429

Introduction

The distribution of diseases across populations varies with associated factors such as social, cultural, racial, geographical, and nutritional characteristics. Cancer imposes a major disease burden worldwide, with variation among countries and regions. Worldwide, the two most frequently diagnosed cancers among women are breast and cervical cancers. Breast cancer is the leading cause of cancer death among females, accounting for 23% of total cancer cases and 14% of cancer deaths, more than double the share of cervical cancer, the second most common, which made up 10%. Other common cancers among women include colorectal, respiratory, ovarian, and stomach cancers (1, 2).

Studying the distribution of cancer at regional and global levels is difficult because it is related to a wide variety of phenomena. Thus, the first step is to detect and classify regions with common characteristics. Several studies have investigated the geographical distribution of various cancers. However, studies of the simultaneous distribution of several cancers are limited. Most studies have considered a single type of cancer and clustered different areas within countries (3, 4). If the goal is to show the pattern of various types of cancer across countries, multivariate statistical methods must be applied. Cluster analysis is one of the prevailing methods widely used for classifying such phenomena (5).

Multivariate statistical methods have been used in different studies to classify diseases and various indicators. For example, Farhadian et al. used factor analysis to examine the relation between socioeconomic indicators and indicators of child mortality in different provinces of Iran (6). Yazdi et al. used cluster analysis to classify provinces of Iran based on maternal health indicators (7). Mahjub et al. also used factor analysis to assess women's health needs in different provinces of Iran (8).

Clustering methods are widely used to group similar objects into categories. Cluster analysis comprises different algorithms and methods, such as fuzzy clustering, K-means clustering, hierarchical clustering, and model-based clustering (5). One drawback of many clustering methods is that they ignore the distribution of the variables. To solve this problem, model-based clustering was proposed. It was first introduced by Wolfe and was later revised by McLachlan and Peel in 2000 (9) and by Fraley and Raftery in 2002 (10). Model-based clustering is widely used in multivariate analysis, for example in density estimation and discriminant analysis (10), magnetic resonance imaging (11, 12), and microarray data analysis (13–15).

Classifying the distribution of a disease among countries based on a single factor is straightforward. However, classifying countries based on several indices requires multivariate methods such as clustering. Model-based clustering is a powerful multivariate method based on mixture distributions that can efficiently classify countries on several indices (9). Furthermore, no literature was found on classifying death rates from women's cancers in different countries of the world. Therefore, the main objective of the present study is to classify countries based on death rates from women's cancers using model-based clustering.

Materials and Methods

In this study, performed in 2009, we used a dataset of death rates from breast, cervical, uterine, and ovarian cancers in 192 countries worldwide, reported by the World Health Organization (WHO) (16). Kiribati and San Marino were excluded because of missing observations in some variables, leaving 190 countries for analysis.

To perform model-based clustering, normal mixture models with different numbers of components were first fitted to the data in order to determine the optimal number of clusters. The best-fitted model, with the appropriate number of components, was then selected using the Bayesian Information Criterion (BIC). The model parameters were estimated by maximum likelihood via the EM algorithm. Finally, based on the best-fitted model, the countries were allocated into four distinct clusters according to death rates from women's common cancers, which identified regions with homogeneous characteristics.

An important aspect of model-based clustering is that it is based on a statistical model. In other words, a statistical model is postulated for the population from which the sample data are obtained. Accordingly, the probability density function of any observation $x_i$ can be written as a mixture distribution:

$$f(x_i) = \sum_{k=1}^{g} p_k \, f_k(x_i; \theta_k)$$

where $g$ is the number of components in the mixture model, $f(x_i)$ is the mixture density, $f_k$ is the density of the $k$th component with parameters $\theta_k$, and $p_k$ is the probability that an observation comes from the $k$th mixture component. The $p_k$ are called mixing proportions and sum to 1. The parameters of the model ($p_k$ and $\theta_k$) were estimated using the EM algorithm. Once the mixture model is fitted, the data can be categorized into $g$ clusters. The posterior probability of membership of each observation is obtained from:

$$P(k \mid x_i) = \frac{p_k \, f_k(x_i; \theta_k)}{\sum_{j=1}^{g} p_j \, f_j(x_i; \theta_j)}$$

For each observation, this posterior probability was calculated for each cluster, and the observation was assigned to the cluster with the highest posterior probability. Accordingly, cluster $k$ includes the observations assigned to component $k$.

For multivariate data of a continuous nature, attention has concentrated on mixtures with normal component distributions because of their computational convenience. In that case, the probability density function of a component with mean $\mu_k$ and covariance matrix $\Sigma_k$ is (9):

$$f_k(x; \mu_k, \Sigma_k) = (2\pi)^{-d/2} \, |\Sigma_k|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k) \right\}$$

where $d$ is the number of variables (here $d = 4$). The EM algorithm is the reference tool for deriving the maximum likelihood estimates in a mixture model (17).

Geometric features of the clusters, including shape, volume, and orientation, can be specified through the covariance matrices $\Sigma_k$. Banfield and Raftery in 1993 (18) and Celeux and Govaert in 1995 (19) suggested a basic design for geometric constraints in multivariate normal mixtures. They parameterized the covariance matrices through the eigenvalue decomposition:

$$\Sigma_k = \lambda_k D_k A_k D_k^{T}$$

where $\lambda_k$ is an associated constant of proportionality, $D_k$ is the orthogonal matrix of eigenvectors, and $A_k$ is a diagonal matrix. The parameters $\lambda_k$, $D_k$, and $A_k$ determine the size, direction, and shape of cluster $k$, respectively. By constraining these elements in different ways, 28 different models were obtained (18, 19).

For the present data, we fitted normal mixture models with $g$ = 2 to 6 components, considering the 28 different forms of decomposition of the variance-covariance matrix. Model selection was based on the minimum value of the BIC as well as the minimum value of the Entropy Index (EI) across the different numbers of mixture components and the 28 candidate models (20). We used MIXMOD version 2.1.1 and MATLAB version 7.0 for data analysis.
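
The E-step and M-step of the EM algorithm mentioned above can be illustrated with a minimal sketch. This is not the MIXMOD implementation used in the study: it fits a two-component univariate normal mixture to synthetic data, and the function names (`em_two_component`, `simulate`), the median-split initialisation, and the simulated values are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_component(data, n_iter=50):
    """Fit a two-component univariate normal mixture by EM.

    Returns mixing proportions, means, variances, and the log-likelihood
    evaluated at the start of every iteration.
    """
    n = len(data)
    xs = sorted(data)
    half = n // 2
    # Illustrative initialisation (an assumption): split sorted data at the median.
    p = [0.5, 0.5]
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in xs) / n
    var = [pooled_var, pooled_var]
    loglik_history = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        loglik = 0.0
        for x in data:
            weighted = [p[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = weighted[0] + weighted[1]
            loglik += math.log(total)
            resp.append([w / total for w in weighted])
        loglik_history.append(loglik)
        # M-step: weighted updates of proportions, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            p[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
    return p, mu, var, loglik_history


def simulate(seed=0):
    """Synthetic data (hypothetical): 100 draws each from N(2, 1) and N(10, 1.5^2)."""
    rng = random.Random(seed)
    low = [rng.gauss(2.0, 1.0) for _ in range(100)]
    high = [rng.gauss(10.0, 1.5) for _ in range(100)]
    return low + high


if __name__ == "__main__":
    proportions, means, variances, history = em_two_component(simulate())
    print(proportions, means, variances, history[-1])
```

The log-likelihood recorded in `loglik_history` should not decrease from one iteration to the next, which is the property that makes EM a convenient tool for maximum likelihood estimation in mixture models. The multivariate case used in the study replaces the scalar variance with a covariance matrix per component.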

Results

A total of 140 models with different numbers of components and different decompositions of the variance-covariance matrix were fitted to the data. Among these models, the highest and lowest BIC values were 3553.04 and 3118.66, respectively. A four-component mixture model with different mixing proportions, shapes, and directions, having the minimum BIC = 3118.66 and Entropy Index = 24.99, was therefore selected. The estimated parameters of the selected model are shown in Table 1.
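
The model count and the BIC comparison can be sketched as follows. Two points are assumptions rather than statements from the source: the BIC is written as -2 log L + ν ln n (lower is better, which is consistent with selecting the minimum BIC), and the parameter count shown is for the least constrained case only (free mixing proportions and a free full covariance matrix per component), not necessarily the model that was selected.

```python
import math


def n_free_parameters(g, d):
    """Free parameters of a g-component, d-variate normal mixture with
    free proportions and an unconstrained covariance matrix per component."""
    proportions = g - 1              # mixing proportions sum to 1
    means = g * d                    # one mean vector per component
    covariances = g * d * (d + 1) // 2  # symmetric d x d matrix per component
    return proportions + means + covariances


def bic(loglik, n_params, n_obs):
    """Assumed convention: BIC = -2 log L + (number of parameters) * ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Candidate models: g = 2..6 components times 28 covariance decompositions.
component_counts = range(2, 7)
n_covariance_forms = 28
n_models = len(component_counts) * n_covariance_forms

if __name__ == "__main__":
    print(n_models)                 # number of fitted candidate models
    print(n_free_parameters(4, 4))  # least constrained four-component model
```

With 190 countries and four variables, the least constrained four-component model already has 59 free parameters, which illustrates why constrained covariance forms and a penalised criterion such as the BIC are useful here.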

Table 1:

Estimated parameters of the best-fit model

Component	Mixing proportion	Variable	Mean	Variance-covariance: Breast	Cervical	Uterine	Ovary
Component 1	0.24	Breast	9.05	25.58	−3.51	4.04	4.41
		Cervical	6.32	−3.51	15.05	−2.47	0.53
		Uterine	2.75	4.04	−2.47	3.09	0.51
		Ovary	1.78	4.41	0.53	0.51	1.26
Component 2	0.14	Breast	4.67	2.50	2.59	0.31	1.45
		Cervical	4.00	2.59	4.16	0.08	2.50
		Uterine	1.07	0.31	0.08	0.17	0.05
		Ovary	2.21	1.45	2.50	0.05	2.03
Component 3	0.38	Breast	4.77	4.03	−0.50	0.17	0.31
		Cervical	4.66	−0.50	6.20	−0.02	0.36
		Uterine	0.33	0.17	−0.02	0.02	0.03
		Ovary	1.23	0.31	0.36	0.03	0.27
Component 4	0.23	Breast	18.24	15.40	−0.48	1.03	4.10
		Cervical	2.68	−0.48	2.15	1.02	1.05
		Uterine	3.57	1.03	1.02	1.16	1.13
		Ovary	5.60	4.10	1.05	1.13	2.73
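
As an illustration of the eigenvalue decomposition of a covariance matrix into size, direction, and shape described in Materials and Methods, the sketch below decomposes the component 4 covariance matrix from Table 1. It is not taken from the study: the helper names (`jacobi_eigh`, `decompose`, `recompose`) are hypothetical, and the normalisation (size taken as the d-th root of the determinant, so that the shape matrix has determinant 1) is an assumption following the Celeux and Govaert convention.

```python
import math

# Covariance matrix of component 4, transcribed from Table 1
# (variable order: breast, cervical, uterine, ovary).
SIGMA_4 = [
    [15.40, -0.48, 1.03, 4.10],
    [-0.48, 2.15, 1.02, 1.05],
    [1.03, 1.02, 1.16, 1.13],
    [4.10, 1.05, 1.13, 2.73],
]


def jacobi_eigh(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns) of a symmetric matrix, by Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # right-multiply a and v by the rotation
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p] = c * mp - s * mq
                        m[k][q] = s * mp + c * mq
                for k in range(n):  # left-multiply a by the transposed rotation
                    ap, aq = a[p][k], a[q][k]
                    a[p][k] = c * ap - s * aq
                    a[q][k] = s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def decompose(sigma):
    """Return (size, D, shape) with sigma = size * D * diag(shape) * D^T."""
    eigvals, vectors = jacobi_eigh(sigma)
    d = len(eigvals)
    order = sorted(range(d), key=lambda i: -eigvals[i])  # decreasing eigenvalues
    ordered = [eigvals[i] for i in order]
    D = [[vectors[r][i] for i in order] for r in range(d)]
    size = math.exp(sum(math.log(e) for e in ordered) / d)  # det(sigma) ** (1/d)
    shape = [e / size for e in ordered]
    return size, D, shape


def recompose(size, D, shape):
    """Rebuild the covariance matrix from its three geometric parts."""
    d = len(shape)
    return [[size * sum(D[i][k] * shape[k] * D[j][k] for k in range(d))
             for j in range(d)] for i in range(d)]


if __name__ == "__main__":
    size, D, shape = decompose(SIGMA_4)
    print(size, shape)
```

Constraining the size, direction, or shape to be equal across components, or to take simpler forms, is what generates the family of candidate covariance models compared in the study.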

Based on the information in Table 1, posterior probabilities were computed for each country. Each country was then allocated to the mixture component with the highest posterior probability. Accordingly, the countries were allocated to four distinct clusters: 43 (23%) countries in cluster 1, 28 (14%) in cluster 2, 75 (39%) in cluster 3, and 44 (24%) in cluster 4, as shown in Table 2.
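
The allocation step can be sketched with the rounded estimates from Table 1. This is an illustrative reimplementation, not the MIXMOD code used by the authors; the function names are hypothetical, and the input vector in the usage example is an invented set of death rates (breast, cervical, uterine, ovary), not a value from the WHO dataset.

```python
import math

# (mixing proportion, mean vector, covariance matrix) per component, from Table 1.
# Variable order: breast, cervical, uterine, ovary.
COMPONENTS = [
    (0.24, [9.05, 6.32, 2.75, 1.78],
     [[25.58, -3.51, 4.04, 4.41], [-3.51, 15.05, -2.47, 0.53],
      [4.04, -2.47, 3.09, 0.51], [4.41, 0.53, 0.51, 1.26]]),
    (0.14, [4.67, 4.00, 1.07, 2.21],
     [[2.50, 2.59, 0.31, 1.45], [2.59, 4.16, 0.08, 2.50],
      [0.31, 0.08, 0.17, 0.05], [1.45, 2.50, 0.05, 2.03]]),
    (0.38, [4.77, 4.66, 0.33, 1.23],
     [[4.03, -0.50, 0.17, 0.31], [-0.50, 6.20, -0.02, 0.36],
      [0.17, -0.02, 0.02, 0.03], [0.31, 0.36, 0.03, 0.27]]),
    (0.23, [18.24, 2.68, 3.57, 5.60],
     [[15.40, -0.48, 1.03, 4.10], [-0.48, 2.15, 1.02, 1.05],
      [1.03, 1.02, 1.16, 1.13], [4.10, 1.05, 1.13, 2.73]]),
]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_normal_density(x, mean, cov):
    """Log density of a multivariate normal, computed via the Cholesky factor."""
    d = len(x)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = []  # forward substitution: solve low * z = diff
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    mahalanobis = sum(value * value for value in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis)


def posterior_probabilities(x):
    """Posterior probability of each component for observation x."""
    logs = [math.log(p) + log_normal_density(x, mean, cov)
            for p, mean, cov in COMPONENTS]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def allocate(x):
    """Cluster number (1-4) with the highest posterior probability."""
    post = posterior_probabilities(x)
    return post.index(max(post)) + 1


if __name__ == "__main__":
    hypothetical_rates = [18.0, 2.5, 3.5, 5.5]  # invented example, near the cluster 4 mean
    print(posterior_probabilities(hypothetical_rates), allocate(hypothetical_rates))
```

Because Table 1 reports values rounded to two decimals, posterior probabilities computed this way may differ slightly from those obtained with the unrounded estimates.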

Table 2:

Classification of women’s cancers in different countries worldwide

Component 1
Albania	Brazil	Guatemala	Paraguay	Serbia
Antigua	Burkina Faso	Guyana	Rep of Korea	South Africa
Argentina	Cuba	Haiti	Romania	Suriname
Azerbaijan	Dominica	Indonesia	Saint Kitts	Swaziland
Bahamas	Dominican	Jamaica	Saint Lucia	Trinidad
Barbados	Ecuador	Lesotho	Saint Vincent	Uruguay
Belize	El Salvador	Mauritius	Sao Tome	Venezuela
Bosnia	Georgia	Nauru	Seychelles
Bolivia	Grenada	Nicaragua	Singapore
Component 2
Cameroon	Fiji	Niger	Samoa	Tonga
Chile	Honduras	Niue	Solomon	Tuvalu
Colombia	Kyrgyzstan	Palau	Sri Lanka	Uzbekistan
Cook Islands	Maldives	Panama	Syrian Arab	Vanuatu
Costa Rica	Marshall	Philippines	Tajikistan
Ethiopia	Micronesia	Moldova	Timor-Leste
Component 3
Afghanistan	Comoros	Iran	Mongolia	Sierra Leone
Algeria	Congo	Iraq	Morocco	Somalia
Angola	Côte d’Ivoire	Jordan	Mozambique	Sudan
Bahrain	Demo Korea	Kenya	Myanmar	Thailand
Bangladesh	Djibouti	Kuwait	Namibia	Togo
Benin	Egypt	Lao	Nepal	Tunisia
Bhutan	Eritrea	Lebanon	Nigeria	Turkey
Botswana	Equatorial	Liberia	Oman	Turkmenistan
Burundi	Guinea	Libyan	Pakistan	Uganda
Brunei	Gabon	Madagascar	Papua	United Emirates
Cambodia	Gambia	Malawi	Peru	Viet Nam
Cape Verde	Ghana	Malaysia	Qatar	Yemen
Central African	Guinea	Mali	Rwanda	United Tanzania
Chad	Guinea Bissau	Mauritania	Saudi Arabia	Zambia
China	India	Mexico	Senegal	Zimbabwe
Component 4
Andorra	Cyprus	Iceland	Malta	Slovenia
Armenia	Czech Republic	Ireland	Monaco	Spain
Australia	Denmark	Israel	Netherlands	Sweden
Austria	Estonia	Italy	New Zealand	Switzerland
Belarus	Finland	Japan	Norway	Yugoslav
Belgium	France	Kazakhstan	Poland	Ukraine
Bulgaria	Germany	Latvia	Portugal	UK
Canada	Greece	Lithuania	Russian	USA
Croatia	Hungary	Luxembourg	Slovakia

According to the results of model-based clustering, most countries in South America were allocated to the first cluster. Most countries in Africa and in Central and Southeast Asia were allocated to the third cluster. The fourth cluster consisted of Pacific, North American, and European countries (Fig. 1).

Fig. 1:

Model-based clustering of the countries worldwide according to the death rates from women’s common cancers (Breast, Uterine, Cervix, and Ovary)

Discussion

Various studies have indicated that model-based clustering methods perform better than other methods when clusters overlap and differ in shape and size (21). In addition, model-based clustering is increasingly preferred over other procedures because the variance-covariance matrices of the model simplify interpretation of the results (22). We could not find any study that used model-based clustering to classify regions or countries based on cancer data. However, several studies have used other clustering methods, such as hierarchical clustering, K-means, and fuzzy clustering, to classify different regions. For example, Abadi et al. used cluster analysis to classify universities of medical sciences and faculties of medicine (23). Babaee et al. used fuzzy and hierarchical clustering to classify provinces based on population and health indicators (24). Vahedi et al. applied hierarchical and non-hierarchical clustering methods to DNA microarray data to classify patients with breast cancer (25). Model-based clustering is also increasingly preferred over other methods worldwide. Mar et al. in 2003 applied model-based clustering to genes associated with breast cancer (26). Pan et al. in 2002 applied model-based clustering to gene microarray data, using the log-likelihood ratio and the BIC to select the number of components of the mixture model (27). McLachlan in 2002 used the EMMIX-GENE software for model-based clustering of gene microarray data (28), whereas we used the MIXMOD software for data analysis. Furthermore, Chen et al. in 2008 applied model-based clustering to the diagnosis of cancer patients (29), while we classified the observations into more than two groups. 
More recent publications in the field of model-based clustering include the study by Haibe-Kains, which used model-based clustering to identify molecular subtypes of breast cancer (30), and that of Muna et al., who in 2008 applied model-based clustering to adolescent problem behaviors and young adult outcomes (31). One limitation of model-based clustering is the large number of parameters that need to be estimated, which means that relatively more data points are required in each component (32). Despite this limitation, a main contribution of the present study was the introduction of an appropriate and flexible clustering method that might be used in a wide variety of public health contexts. One advantage of model-based clustering is its simplicity and flexibility. Another advantage is that, as with other statistical models, restrictions can be imposed on the parameters to obtain a more parsimonious model (21). A third advantage is that no decision is needed on the scaling of the observed variables, whereas in standard non-hierarchical cluster methods such as K-means, scaling of the observed variables is always an important issue (22). In conclusion, we showed that model-based clustering can readily be used to classify geographical regions based on various sample data. Considering the benefits of clustering based on normal mixture models over other conventional clustering methods, this method appears applicable in a wide variety of medical and public health contexts.

Ethical considerations

Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.

