Breast Cancer and Modifiable Lifestyle Factors in Argentinean Women: Addressing Missing Data in a Case-Control Study

Julia Becaria Coquet, Natalia Tumas, Alberto Ruben Osella, Matteo Tanzi, Isabella Franco, Maria Del Pilar Diaz.

Abstract

A number of studies have shown effects of modifiable lifestyle factors such as diet, breastfeeding and nutritional status on breast cancer risk. However, none has addressed the missing data problem in nutritional epidemiologic research in South America. Missing data are a frequent problem in breast cancer studies and in epidemiological settings in general, and effect estimates obtained from these studies may be biased if no appropriate method for handling missing data is applied. We performed multiple imputation of missing covariate values in a breast cancer case-control study in Córdoba (Argentina) to optimize risk estimates. Data were obtained from a breast cancer case-control study conducted from 2008 to 2015 (318 cases, 526 controls). Complete case analysis and multiple imputation using chained equations were applied to estimate the effects of a Traditional dietary pattern and other recognized breast cancer risk factors. Physical activity and socioeconomic status were imputed, and logistic regression models were fitted. Complete case analysis retained only 31% of the women. Although a positive association between the Traditional dietary pattern and breast cancer was observed with both approaches (complete case analysis OR=1.3, 95%CI=1.0-1.7; multiple imputation OR=1.4, 95%CI=1.2-1.7), the effects of other covariates, such as BMI and breastfeeding, were identified only when multiple imputation was used. A Traditional dietary pattern, BMI and breastfeeding are associated with the occurrence of breast cancer in this Argentinean population when multiple imputation is appropriately performed. Multiple imputation is suggested for future epidemiologic studies in Latin America to optimize effect estimates.

Creative Commons Attribution License

Keywords: Body mass index; breastfeeding; cancer epidemiology; dietary pattern; multiple imputation

Year: 2016 PMID: 27892664 PMCID: PMC5454599 DOI: 10.22034/apjcp.2016.17.10.4567

Source DB: PubMed Journal: Asian Pac J Cancer Prev ISSN: 1513-7368

Introduction

Cancer is the second leading cause of death by disease in Argentina, preceded only by cardiovascular diseases (Dirección de Estadísticas e Información de Salud, 2013). A large body of literature has identified several socio-cultural and biological risk factors associated with the cancers with the highest incidence in Argentina (Niclis et al., 2015; Pou et al., 2014; Román et al., 2014; Tumas et al., 2014), most of it produced by the Group of Environmental Epidemiology of Cancer in Córdoba (GEECC). Diet is a recognized modifiable factor associated with these cancers. The traditional Argentinean diet is characterized by high consumption of animal protein and fat (obtained mainly from red meat) and low intakes of fish, fruits and vegetables (Navarro et al., 2003; Pou et al., 2014). In Argentina, breast cancer is the most commonly diagnosed cancer both in the total population and in women, and a strong diet-breast cancer relationship has been reported (Tumas et al., 2014). Other well-studied modifiable factors associated with the disease are breastfeeding and nutritional status (Chan and Norat, 2015; Islami et al., 2015). In general, health sciences researchers do not report how much data are missing in their studies or how they handle them (Klebanoff and Cole, 2008). Even though recommendations on how to address this problem have been published in recent years (Klebanoff and Cole, 2008; Sterne et al., 2009; Von Elm et al., 2014), epidemiologic studies that quantify this weakness remain scarce. This may be partly because health researchers avoid such analyses: they lack confidence in the practice of bias analysis and, in some cases, do not apply appropriate methods to tackle specific problems such as missing data. Moreover, researchers sometimes do not realize that missing data can bias results.
A review of cohort studies in major medical journals showed that most papers (66%) excluded participants with missing data and performed a complete-case analysis, while only 7% applied multiple imputation or Bayesian methods; the remaining studies used other methods (Karahalios et al., 2013). Wood et al. (Wood, White, and Thompson, 2004) reviewed randomized controlled trials and found that fewer than a quarter of them included a sensitivity analysis for missing data. The case-control design is frequently used in analytical cancer epidemiology. It requires careful planning to avoid bias, and the information it yields is important for identifying risk factors for diseases with long exposure periods, such as cancer. Case-control studies often have to deal with missing data, especially missing information on multiple covariates. Missing information in predictors can lead to biased and/or inefficient parameter estimates and biased standard errors, resulting in incorrect confidence intervals and significance tests. Consequently, effort must be put into obtaining valid and precise risk estimates before translating them into recommendations for the population (Lash et al., 2009). One way of improving validity is to address missing data, a common source of bias in biomedical research. Every statistical analysis makes some assumptions about the missing data mechanism, and the validity of results obtained after applying imputation methods depends on whether those assumptions hold (White et al., 2011). Little and Rubin's framework is often used to classify missing data as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR) (Acock, 2005; White et al., 2011): under MCAR the probability of missingness is unrelated to any data, under MAR it depends only on observed data, and under MNAR it depends on the unobserved values themselves.
Most statistical software handles incomplete data by default with a complete case analysis, excluding subjects with missing information on one or more variables. Several relatively simple methods have been developed in recent decades, such as substitution by the mean or median, or by a value predicted from a linear regression. These are single imputation methods: each missing value is replaced by a single value, and the completed dataset is used in the subsequent analysis (Allison, 2009). To obtain valid estimates, the uncertainty associated with the imputed value must be taken into account, because the imputed value is not the real value that would have been observed had all variables in the dataset been complete. The multiple imputation method was developed to solve this problem (Rubin, 1976). In Argentina, most epidemiologic studies probing the causal pathways of cancer are case-control studies; however, none of them has addressed the missing data problem. The main objective of this study was to optimize the risk estimates associated with an identified dietary pattern by using multiple imputation for missing covariate values in a breast cancer case-control study.
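Why single imputation understates uncertainty can be seen in a short simulation. The Python sketch below (stdlib only, simulated hypothetical data, not study data) fills gaps in a BMI-like variable with the observed mean: the filled-in variable looks less variable than the observed data, and any later analysis would treat the invented values as if they had been measured.

```python
import random
import statistics
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Single imputation: replace every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical simulated data (not study data): a BMI-like variable.
rng = random.Random(1)
complete = [rng.gauss(26.0, 4.0) for _ in range(1000)]
# Delete about 30% of values completely at random.
with_gaps = [None if rng.random() < 0.3 else v for v in complete]
observed = [v for v in with_gaps if v is not None]
imputed = mean_impute(with_gaps)

print(f"SD, complete data:     {statistics.stdev(complete):.2f}")
print(f"SD, observed values:   {statistics.stdev(observed):.2f}")
print(f"SD, mean-imputed data: {statistics.stdev(imputed):.2f}")
```

Every imputed point sits exactly at the mean, so it adds sample size without adding spread; multiple imputation instead draws several plausible values per gap and carries their disagreement into the final variance.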

Materials and Methods

Study Design

Data come from an ongoing breast cancer case-control study conducted in the adult female population of Córdoba province. Three hundred and eighteen cases under 85 years old with a histopathologically confirmed incident primary diagnosis of breast cancer (International Classification of Diseases, 10th revision, ICD-10: C50), identified by the Córdoba Tumor Registry, were enrolled between 2008 and 2015. Over the same period, 526 controls were randomly selected; these were healthy women matched to cases by age (± 5 years) and place of residence. All women gave informed consent, and ethics approval was obtained (RePIS 058/10/E).

Data

Data were collected by trained interviewers using a structured questionnaire that covered self-reported sociodemographic and anthropometric characteristics, physical activity, smoking habits, family and personal disease history, and dietary habits. Dietary data referring to the 5 years before the interview (controls) or diagnosis (cases) (Ambrosini et al., 2008) were obtained using a food frequency questionnaire and a photographic atlas, both previously validated (Navarro et al., 2001, 2007).

Method

Multivariate imputation using chained equations (MICE) is a practical approach for handling missing data (Acock, 2005; White et al., 2011). In this study, a MAR missingness mechanism was assumed. The imputation process has been described elsewhere (White et al., 2011). Briefly, MICE proceeds in several steps. First, all missing values are initially filled in at random; each incomplete variable is then imputed in turn from an imputation model built with all variables included in the subsequent analysis, as well as other variables that may predict the missing values, and cycling through these models generates multiple imputed datasets. Second, each imputed dataset is analyzed separately. Finally, the separate estimates are combined into an overall estimate. In this paper, 20 datasets were generated (Sterne et al., 2009), and variables with more than 10% missing values were imputed (Bennett, 2001). The final model was selected from a set of candidate imputation models on the basis of the average relative variance increase (RVI) obtained (Acock, 2014). In addition, diagnostic plots comparing the distributions of imputed and observed values were produced for the imputed continuous variable (Eddings and Marchenko, 2012).
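The final combining step (Rubin's rules) and the RVI diagnostic can be written out in a few lines. The Python sketch below is a minimal, stdlib-only illustration of those standard formulas, not the Stata code used in the study (Stata's `mi estimate` performs this pooling internally). The input numbers are hypothetical, and `average_rvi` assumes that the model-level average RVI is the plain mean of the per-coefficient RVIs. Because T = W(1 + RVI), an RVI of 0.07 means the total variance is 7% larger than the within-imputation (complete-data) variance.

```python
import math
import statistics
from typing import NamedTuple, Sequence


class Pooled(NamedTuple):
    estimate: float  # pooled point estimate: mean over the m imputed datasets
    within: float    # W: average within-imputation variance
    between: float   # B: between-imputation variance of the estimates
    total: float     # T = W + (1 + 1/m) * B
    rvi: float       # relative variance increase: (1 + 1/m) * B / W
    df: float        # Rubin's large-sample degrees of freedom


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Pooled:
    """Combine one coefficient's estimates and squared SEs from m imputed datasets."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)
    w = statistics.fmean(variances)
    b = statistics.variance(estimates)  # sample variance, divisor m - 1
    extra = (1 + 1 / m) * b
    rvi = extra / w
    df = (m - 1) * (1 + 1 / rvi) ** 2 if rvi > 0 else math.inf
    return Pooled(q_bar, w, b, w + extra, rvi, df)


def average_rvi(results: Sequence[Pooled]) -> float:
    """Mean RVI across the coefficients of one model (assumed summary)."""
    return statistics.fmean(r.rvi for r in results)


# Hypothetical log-odds ratios and squared SEs for one coefficient, m = 5.
res = rubin_pool([0.30, 0.34, 0.32, 0.36, 0.28], [0.01] * 5)
print(f"pooled OR = {math.exp(res.estimate):.2f}, SE = {math.sqrt(res.total):.3f}")
print(f"RVI = {res.rvi:.2f}, df = {res.df:.0f}")
```

With m = 20, as in this study, the (1 + 1/m) factor is close to 1, so B contributes almost fully to the total variance.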

Imputed Variables

Physical activity (PA) and socioeconomic status (SES) variables were imputed and then used as predictors in the final logistic regression risk model. These variables had a substantial amount of missing data, mainly because they were added to the questionnaire after the study began. PA information was obtained with the International Physical Activity Questionnaire (IPAQ, short form) (The International Physical Activity Questionnaire, n.d.) and included as a continuous variable. PA volume was computed by weighting each type of activity by its energy requirement in METs (multiples of the resting metabolic rate). The SES variable is built from eight variables in the dataset (Asociación Argentina de Marketing, 2002): education level, number of economic providers in the household, occupation of the main provider, having a computer at home, having internet at home, having a debit card, having health care and having cars. If any of these variables is missing, SES is missing too. Therefore, the component variables were imputed first and SES was calculated afterwards, which prevents loss of information. Six of the eight components were imputed; education level and occupation of the main provider were not, because they had less than 10.0% missing values. In total, seven variables were imputed (PA and six SES components).
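The MET weighting of PA can be sketched as follows. The weights (3.3 for walking, 4.0 for moderate and 8.0 for vigorous activity) are taken from the IPAQ short-form scoring protocol and are an assumption here, since the paper does not list the weights it applied; the protocol's truncation and categorical scoring rules are deliberately left out. The function returns `None` when any item is unanswered, which is the kind of missing PA value that MICE would then impute.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# MET weights from the IPAQ short-form scoring protocol; assumed here, since the
# paper does not list the weights it applied.
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


@dataclass
class ActivityReport:
    days_per_week: int
    minutes_per_day: float


def met_minutes_per_week(reports: Dict[str, Optional[ActivityReport]]) -> Optional[float]:
    """Total PA volume in MET-minutes/week, or None if any item is unanswered.

    A None result is what the imputation step would treat as a missing PA value.
    The protocol's truncation and categorical scoring rules are not applied.
    """
    total = 0.0
    for activity, weight in MET_WEIGHTS.items():
        report = reports.get(activity)
        if report is None:
            return None
        total += weight * report.minutes_per_day * report.days_per_week
    return total


# Hypothetical respondent: 30 min walking on 5 days, 20 min moderate on 2 days.
answers = {
    "walking": ActivityReport(days_per_week=5, minutes_per_day=30),
    "moderate": ActivityReport(days_per_week=2, minutes_per_day=20),
    "vigorous": ActivityReport(days_per_week=0, minutes_per_day=0),
}
print(met_minutes_per_week(answers))
```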

Models

The outcome was the presence or absence of breast cancer. The exposure covariate was the Traditional dietary Pattern, previously identified in Córdoba's population through principal component factor analysis and characterized by high positive loadings of fatty meats, bakery products, and vegetable oil and mayonnaise (Tumas et al., 2014). Other recognized risk factors for breast cancer were included: age (Benz, 2008), SES (Bigby and Holmes, 2005), body mass index (BMI) (Chan and Norat, 2015), PA (Amadou, Torres-Mejía, Hainaut, and Romieu, 2014), and reproductive variables (having children, breastfeeding, age at menarche, menopausal status) (Sisti et al., 2015; Zhou et al., 2015). A multiple logistic regression model was fitted using Stata 13.0 (StataCorp LP, USA).
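A logistic model estimates coefficients on the log-odds scale; odds ratios and Wald 95% confidence intervals such as those in Table 3 follow by exponentiation. The sketch below shows only that standard arithmetic with a hypothetical coefficient and SE; Stata's `logistic` command reports ORs directly. For MI results the SE would be the square root of Rubin's total variance, and Stata uses a t reference distribution with Rubin's degrees of freedom rather than the normal 1.96 used here, so MI intervals would be slightly wider.

```python
import math
from typing import Tuple


def wald_odds_ratio(beta: float, se: float, z_crit: float = 1.959964) -> Tuple[float, float, float, float]:
    """Return (OR, lower 95% limit, upper 95% limit, two-sided p) for one coefficient.

    beta and se are on the log-odds scale, as estimated by a logistic model.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = abs(beta) / se
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(|z|)) for a standard normal
    return odds_ratio, lower, upper, p_value


# Hypothetical coefficient and SE, chosen only to illustrate the arithmetic.
or_, lo, hi, p = wald_odds_ratio(beta=math.log(1.4), se=0.08)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, p = {p:.4f}")
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of its two confidence limits, which is a quick consistency check for rounded tables.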

Results

Tables 1 and 2 show the distribution of subjects by variable and the distribution of missing data; variables with more than 10% missing values are shaded. There were 318 breast cancer cases and 526 controls. About half of the women were older than 60 years. More than a third of women showed high adherence to the Traditional dietary Pattern identified in the female population, and more than half were overweight or obese (51.3%). Regarding PA, 21.6% of women were sedentary, although the percentage of missing values for this variable was high (29.1%). Regarding education, half of the women had not finished high school, and 29.5% had higher education (university or tertiary education, completed or not) (Table 1).

Table 1

Subjects and Missing Data: Absolute and Relative Distributions of Outcome, Exposure and Other Covariates, Breast Cancer Case-Control Study Córdoba, Argentina 2008-2015

			% of missing values
Variable	n	%	Physical Activity	Health Care	N° of providers	Computer	Internet	Debit Card	Cars
Total	844	100
Breast Cancer
No	526	62.3	29.7	26.0	11.2	11.2	11.2	11.2	11.2
Yes	318	37.7	28.3	21.4	8.8	8.8	8.8	8.8	9.4
Traditional dietary Pattern
Tertile 1	245	29	31	24.1	6.9*	6.9*	6.9*	6.9*	7.3*
Tertile 2	277	32.8	28.5	24.2	10.5	10.1	10.1	10.1	10.1
Tertile 3	322	38.2	28.4	24.5	12.7	13.0	13.0	13.0	13.3
Age
<45 years	137	16.2	26.3	21.9	9.5	9.5	9.5	9.5	10.2
45-60 years	307	36.4	24.4	26.4	11.4	11.4	11.4	11.4	11.4
>60 years	400	47.4	33.7	23.5	9.7	9.7	9.7	9.75	10
BMI
<25 kg/m2	400	47.4	29	27.2	8.5*	8.7*	8.7*	8.75*	8.75*
25-30 kg/m2	268	31.7	27.9	22	9.3	9.3	9.3	9.33	9.33
>30 kg/m2	165	19.6	31.5	22.4	15.8	15.1	15.1	15.1	16.4
Unknown	11	1.3	27.3	0.0	18.2	18.2	18.2	18.2	18.2
Physical Activity
Sedentary	182	21.6	_	52.7*	4.9*	4.4*	4.4*	4.4*	4.9*
Moderate	228	27	_	28.1	8.8	8.8	8.8	8.8	8.8
Vigorous	188	22.3	_	10.6	23.9	24.5	24.5	24.5	25
Unknown	246	29.1	_	10.2	5.3	5.3	5.3	5.3	5.3
Education
No Studies	5	0.6	80.0*	0.0*	20.0*	20.0*	20*	20*	20*
Incomplete primary	63	7.5	33.3	0.0	17.6	15.9	15.9	15.9	17.9
Complete primary	281	33.3	22.1	46.6	9.2	9.2	9.2	9.2	9.2
Incomplete high school	79	9.4	34.2	0.0	8.9	10.1	10.1	10.1	10.1
Complete high school	145	17.2	21.4	19.3	10.3	10.3	10.3	10.3	10.3
Higher education	249	29.5	36.9	13.6	6.0	6.0	6.0	6.0	6.4
Unknown	22	2.6	40.9	54.5	54.5	54.5	54.5	54.5	54.5

* Statistically significant differences (p < 0.05).

Table 2

Subjects and Missing Data: Absolute and Relative Distributions of Gynecologic and Socioeconomic Variables, Breast Cancer Case-Control Study Córdoba, Argentina 2008-2015

			% of missing values
Variable	n	%	Physical Activity	Health Care	N° of providers	Computer	Internet	Debit Card	Cars
Total	844	100
Having Children
No	120	14.2	32.5*	24.2*	10.8	10.8	10.83	10.8	11.7
Yes	666	78.9	29.9	26.3	10.4	10.4	10.36	10.4	10.5
Unknown	58	6.9	13.8	1.7	8.6	8.6	8.62	8.6	8.6
Breastfeeding
No	266	31.5	31.2	21.8*	7.5	7.9	7.89	7.9	8.3
Yes	521	61.7	29.4	28.2	11.7	11.5	11.52	11.5	11.7
Unknown	57	6.8	17.5	0.0	10.5	10.5	10.53	10.5	10.5
Menopause
No	201	23.8	25.4*	24.9	9.5	9.5	9.45	9.4	9.5
Yes	589	69.8	31.9	26.3	11.4	11.4	11.38	11.4	11.5
Unknown	54	6.4	12.9	0.0	1.8	1.8	1.85	1.8	3.7
Menarche
<12 years	147	17.4	29.9	22.4	11.6	11.6	11.56	11.6	11.6*
>=12 years	676	80.1	29.0	25.0	9.6	9.6	9.62	9.6	9.7
Unknown	21	2.5	28.6	14.3	23.8	23.8	23.81	23.81	28.6
Health Care
No	68	8.1	33.8*	_	14.7*	14.7*	14.7*	14.7*	14.7*
Yes	571	67.6	34.7	_	11.4	11.4	11.4	11.4	11.4
Unknown	205	24.3	12.2	_	5.9	5.8	5.8	5.8	5.8
N° of providers
One	364	43.1	34.1*	23.1*	0.0	0.0*	0.0*	0.0*	0.5*
Two or three	379	44.9	27.9	28.0	0.0	0.3	0.3	0.3	0.3
More than three	14	1.7	21.4	21.4	0.0	0.0	0.0	0.0	0.0
Unknown	87	10.3	14.9	13.8	100.0	98.8	98.8	98.8	98.8
Computer
No	293	34.7	28.0*	32.8*	0.0*	_	0.0	0.0	0.3*
Yes	464	55.0	32.5	20.9	0.2	_	0.0	0.0	0.2
Unknown	87	10.3	14.9	13.8	98.8	_	100.0	100.0	100.0
Internet
No	346	41	27.5*	32.7*	0.3*	0.0	_	0.0	0.6*
Yes	411	48.7	33.6	19.5	0.0	0.0	_	0.0	0.0
Unknown	87	10.3	14.9	13.8	98.8	100.0	_	100.0	100.0
Debit Card
No	354	41.9	28.8*	26.0*	0.3*	0.0	0.0	_	0.6*
Yes	403	47.8	32.5	25.1	0.0	0.0	0.0	_	0.0
Unknown	87	10.3	14.9	13.8	98.8	100.0	100.0	_	100
Cars
None	361	42.8	35.2*	19.7*	0.3*	0.0	0.0	0.0	_
One	332	39.3	26.5	30.4	0.0	0.0	0.0	0.0	_
Two	58	6.9	29.3	36.2	0.0	0.0	0.0	0.0	_
Three or more	4	0.5	25.0	0.0	0.0	0.0	0.0	0.0	_
Unknown	89	10.5	14.6	13.5	96.6	97.7	97.7	97.7	_

* Statistically significant differences (p < 0.05).

Regarding gynecologic variables, around 79% of the women had children, 61.7% had breastfed, 69.8% were postmenopausal at the time of diagnosis (cases) or interview (controls), and 17.4% reported menarche before the age of 12. Regarding SES variables, around 68% reported having health care, with 24.3% missing values; around half of the women interviewed had a computer, internet and a debit card, and 43% did not have a car. All of these SES variables had about 10% missing values (Table 2; occupation of the main provider is shown in the Supplement and Supporting Data, SSD). In general, the distribution of missing values differs among the categories of the variables included in the analysis, although not all of these differences are statistically significant. For example, the percentage of missing values in the Health Care variable differed among PA categories: sedentary women had 52.7% missing values in Health Care, compared with 10.6% in the highest PA category. The differential distribution of missing values was also statistically significant across categories of education, having children and breastfeeding, and similar patterns occurred across categories of the SES variables. The percentage of missing values in the SES variables seems to decline as the participants' education level increases, and the percentage of missing data increases with greater adherence to the Traditional dietary Pattern. Because missingness is related to observed variables, these patterns argue against an MCAR mechanism and are consistent with (though they cannot confirm) a MAR mechanism.
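The descriptive steps used for Tables 1 and 2 (percentage missing per variable, percentage missing by category of another variable, the >10% imputation threshold, the share of complete cases and the most frequent missing data patterns) can be sketched as below. The records and variable names are hypothetical toy data, not study data, and the sketch omits the significance tests reported in the tables.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[object]]  # None marks a missing answer


def pct_missing(records: Sequence[Record], var: str) -> float:
    return 100.0 * sum(r.get(var) is None for r in records) / len(records)


def vars_to_impute(records: Sequence[Record], variables: Sequence[str], threshold: float = 10.0) -> List[str]:
    """Variables whose share of missing values exceeds the threshold (in %)."""
    return [v for v in variables if pct_missing(records, v) > threshold]


def pct_missing_by(records: Sequence[Record], var: str, by: str) -> Dict[object, float]:
    """% missing in `var` within each category of `by` (missing `by` goes to 'Unknown')."""
    groups: Dict[object, List[Record]] = defaultdict(list)
    for r in records:
        key = r.get(by)
        groups["Unknown" if key is None else key].append(r)
    return {k: pct_missing(g, var) for k, g in groups.items()}


def missing_patterns(records: Sequence[Record], variables: Sequence[str]) -> Counter:
    """Frequency of each pattern, coded 1 = missing and 0 = observed, per variable."""
    return Counter(tuple(int(r.get(v) is None) for v in variables) for r in records)


def complete_case_pct(records: Sequence[Record], variables: Sequence[str]) -> float:
    n_complete = sum(all(r.get(v) is not None for v in variables) for r in records)
    return 100.0 * n_complete / len(records)


# Hypothetical toy records; variable names are illustrative only.
records: List[Record] = [
    {"pa": 600, "health_care": "yes", "computer": "yes"},
    {"pa": None, "health_care": "yes", "computer": "no"},
    {"pa": None, "health_care": None, "computer": "yes"},
    {"pa": 1200, "health_care": None, "computer": "no"},
    {"pa": 300, "health_care": "no", "computer": "yes"},
    {"pa": 900, "health_care": "yes", "computer": None},
    {"pa": None, "health_care": "yes", "computer": "yes"},
    {"pa": 450, "health_care": "yes", "computer": "no"},
    {"pa": 150, "health_care": "no", "computer": "no"},
    {"pa": 2000, "health_care": "yes", "computer": "yes"},
]
variables = ["pa", "health_care", "computer"]
print(vars_to_impute(records, variables))
print(pct_missing_by(records, "pa", "computer"))
print(complete_case_pct(records, variables))
print(missing_patterns(records, variables).most_common(2))
```

In this toy data `computer` has exactly 10% missing values and is therefore not selected, mirroring the strict "more than 10%" rule described in the Methods.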
The most frequent missing data patterns, combining all variables included in the analysis except the Traditional dietary Pattern (which had no missing values), are shown in the SSD. Only 31% of women had complete information on all variables. The most frequent pattern, observed in 21% of subjects, was missing values on PA only, and 20% of women were missing information on Health Care only (see Table in SSD). Table 3 shows the estimated effects (OR, 95% CI) of covariates from the logistic regression model under the Complete Case (CC) analysis and the Multiple Imputation (MI) method. The CC analysis uses only 31% of subjects (n=265). Although a significant positive association of the Traditional dietary Pattern was observed with both approaches, the effects of other covariates, such as BMI and breastfeeding, were identified only when MI was used. Even though the uncertainty associated with the imputation process is taken into account in the estimates, narrower (more precise) 95% confidence intervals are obtained after the MI method is applied.

Table 3

Association Measurements (Odds Ratio), Confidence Intervals and P-Values, Breast Cancer Case-Control Study Córdoba, Argentina 2008-2015

Variable	Complete Case Analysis (n=265)			Multiple Imputation Analysis (n=703)
	Odds Ratio	95% CI	p value	Odds Ratio	95% CI	p value
Traditional dietary Pattern	1.3	1.0-1.8	0.039	1.4	1.2-1.6	<0.01
BMI	1.1	0.9-1.1	0.262	1.1	1.0-1.1	0.03
Breastfeeding	0.6	0.3-1.1	0.087	0.5	0.4-0.8	0.003
SES
Low-low	0.4	0.1-1.8	0.260	1.1	0.5-2.9	0.766
Upper-low	0.9	0.3-2.9	0.985	1.3	0.6-2.8	0.454
Middle	1.1	0.4-3.3	0.872	1.9	0.9-4.1	0.112
Upper middle	1.5	0.5-4.1	0.443	1.6	0.8-3.2	0.213
Upper	1.3	0.4-3.7	0.673	1.4	0.7-2.9	0.369
Menopause	1.2	0.6-2.6	0.633	1.5	0.9-2.5	0.102
Physical Activity	0.9	0.9-1.0	0.694	1.0	0.9-1.0	0.510
Age	0.9	0.9-1.0	0.961	0.9	0.9-1.0	0.571
Menarche	0.9	0.8-1.1	0.458	1.0	0.9-1.1	0.653
Having Children	1.5	0.6-3.7	0.328	1.6	0.9-2.7	0.102

Finally, the imputation diagnostic measure (RVI) of the final risk model was 0.07, indicating that the estimated sampling variance for this set of covariates was only 7% larger than it would have been with complete covariate values. Figure 1 shows some of the distributions of imputed and observed values for the physical activity covariate, as an example of the behavior of the imputation model. For all imputed datasets, the imputed and observed distributions were similar.

Figure 1

Diagnostic Plots for Physical Activity After Imputation for the Imputed Dataset 2 and 17, Breast Cancer Case-Control Study Córdoba, Argentina 2008-2015.

Discussion

Estimates obtained with CC analysis and MI differ from each other. The Traditional dietary Pattern, BMI and breastfeeding were the variables whose estimated effects on the occurrence of breast cancer changed notably when the imputation method was applied: the Traditional dietary Pattern association was estimated more precisely, while BMI and breastfeeding became statistically significant. Furthermore, the imputation models performed well, based on the value of the RVI diagnostic measure together with the comparison of imputed and observed distributions. The eating habits of women with high adherence to the Traditional dietary Pattern may be linked to breast cancer through different pathways. Fat and carbohydrate intake may influence circulating levels of sex hormones and/or growth factors (Amadou et al., 2014; Lajous et al., 2005; Renehan et al., 2015). Fatty meat intake may be associated with the disease through its high lipid content and the production of heterocyclic amines (Ronco et al., 2010), among other mechanisms, and the low content of dietary fiber and antioxidant vitamins in this dietary pattern may also be partly related to the disease (Karimi et al., 2014). The association between BMI and breast cancer has long been reported in the literature (Chan and Norat, 2015). Three hormonal candidate mechanisms have been proposed for the adiposity-cancer link, related to sex hormone, insulin and insulin-like growth factor 1 (IGF1), and adipokine pathophysiology (Renehan et al., 2015). In a recent study, the risk ratio of incidence per 5 kg/m2 increase in BMI showed a significantly stronger association between BMI and breast cancer incidence in Asia-Pacific patients than in European-Australian and North American patients (Wang et al., 2016). Similarly, the present study showed a 3.0% increase in the odds of breast cancer per 1 kg/m2 increase in BMI. Tumas et al. (2014) have already observed an association between BMI and breast cancer in the same South American population.
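Under the logistic model's assumption that BMI acts linearly on the log-odds scale, the per-unit estimate can be rescaled to the per-5 kg/m2 contrast used by Wang et al. (2016). The arithmetic below is only an illustration derived from the reported 3.0% increase per 1 kg/m2, not an additional result of the study, and comparing it with a risk ratio assumes that the odds ratio approximates the risk ratio.

```latex
\beta_{\mathrm{BMI}} = \ln(1.03) \approx 0.0296
\qquad\Longrightarrow\qquad
\mathrm{OR}_{+5\,\mathrm{kg/m^2}} = e^{5\beta_{\mathrm{BMI}}} = 1.03^{5} \approx 1.16
```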
Several studies have reported an inverse association between breastfeeding and breast cancer (Anothaisintawee et al., 2013; Collaborative Group on Hormonal Factors in Breast Cancer, 2002). Proposed mechanisms include decreased frequency and intensity of ovulation, mobilization of endogenous carcinogens from the ductal and lobular epithelial cell environment, and facilitated excretion of organochlorides (Lodha et al., 2011). Recently, a meta-analysis showed a strong protective effect of ever breastfeeding against hormone receptor-negative breast cancers (Islami et al., 2015). In our study, breast cancer was not classified by hormone receptor status; however, breastfeeding became significantly associated with breast cancer risk only in the MI analysis. Although Table 2 shows that more than 60.0% of women had breastfed six months or more, the CC model failed to identify this effect. When MI was applied, the precision of the estimates improved noticeably, resulting in a significant effect. Modeling risk factors in epidemiologic studies is always a multidimensional assessment. We used models that include lifestyle variables associated with breast cancer occurrence. BMI and breastfeeding are two of these relevant variables reported in the literature, and they became significantly associated with the disease only after MI was applied to other covariates. This highlights the importance of MI for revealing effects or associations in moderately sized studies that conventional methods may not detect. The protective role of PA has been documented (Goodwin et al., 2015). Potential anticancer effects of PA include reductions in endogenous sex hormone concentrations, insulin resistance and chronic low-grade inflammation (Harvie et al., 2015). A recent meta-analysis identified a significant reduction in breast cancer incidence associated with PA in European and American patients, and in premenopausal and/or postmenopausal women as well.
Furthermore, there was a significant nonlinear dose-response relationship: the higher the PA, the lower the breast cancer incidence (Liu et al., 2016). In our study, PA was not associated with breast cancer risk in either the CC or the MI analysis. Participants were mainly sedentary or moderately active, and PA was imputed for around 30% of women. In theory, the observed and imputed distributions should not differ from each other; accordingly, the imputation model appears to have imputed values homogeneously across all PA categories. At the population level, almost 60% of Argentinean women report low PA (Instituto Nacional de Estadísticas y Censos, 2013). Missing values are frequent in epidemiological studies and pose a problem for statistical analyses. Although using only complete cases is simpler, the estimates obtained may be affected when participants with missing values are omitted. Excluding observations with missing values also ignores possible systematic differences between complete and incomplete cases, so the resulting inference might not apply to the entire population, especially when the number of complete cases is small (National Research Council, 2010). The present work assesses the reliability of estimates from a case-control study on breast cancer conducted to identify risk factors for the disease. Its sample size is not very large, and only a third of subjects are included in the CC analysis, so this issue must be taken into account. Compared with MI, CC analysis may yield unreliable p values and an inaccurate assessment of the importance of covariates (Ibrahim et al., 2012). Furthermore, MI can be advantageous even for the coefficient of a relatively complete covariate when other covariates are incomplete (White and Carlin, 2010). CC analysis may therefore also give misleading results regarding the exposure effect.
In addition, CC analysis wastes the time and resources invested in collecting information, because part of it is discarded at the analysis stage. It is noteworthy that MI is not intended to recover each individual missing value; rather, the simulated values represent a random sample of the missing values. This process yields valid statistical inferences when the assumed missing data mechanism is suitable for the dataset (Molenberghs and Kenward, 2007). Our work assumed that the information was missing at random (MAR); that is, for a variable X, the probability that an observation is missing depends only on the observed values of other variables, not on the unobserved values of X. Unfortunately, the MAR assumption cannot be verified, since the missing values are not observed; nevertheless, the RVI diagnostic measure, calculated after fitting, indicated good performance of the modeling approach. In Latin America, a few health studies have applied MI to address missing data (Benjet et al., 2008; Camargos et al., 2011; Fries et al., 2013; Nunes et al., 2009; Rubinstein et al., 2010). We did not find any nutritional or cancer epidemiologic study in the region that used the MI approach to deal with this source of bias. In Argentina, only one study, related to cardiovascular diseases (Rubinstein et al., 2010), was found to address missing data. Even though MI has been used in the region in recent years, to our knowledge no cancer epidemiology paper applying this method has been published in Latin America. Moreover, none of these studies presented any information about the quality of the proposed imputation models. The small average RVI reported in our study is an estimate of the average relative inflation in the variance of the estimates caused by the missing values; ideally, it should be close to zero (Acock, 2014). In our opinion, efforts should be made to strengthen the quality of studies in the region, mainly in the Southern Cone.
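The MAR definition above can be made concrete with a small simulation. In the hypothetical setup below (simulated data, not the study's), a variable y is missing more often when a fully observed covariate x equals 1, but never as a function of y itself. The complete-case mean of y is then biased, while averaging the observed y within levels of x and weighting by the full-sample share of x recovers the true mean. This stratification is only a simple stand-in for the conditioning on observed data that an imputation model performs; it is not MI itself and not the analysis used in the study.

```python
import random
import statistics
from typing import List, Optional, Tuple

Row = Tuple[int, float, Optional[float]]  # (x, y as generated, y as recorded or None)


def simulate_mar(n: int, seed: int = 2016) -> List[Row]:
    """Hypothetical setup: y is missing more often when x = 1, never depending on y itself."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        y = rng.gauss(10.0 + 5.0 * x, 2.0)
        p_missing = 0.5 if x == 1 else 0.1  # depends only on the observed x, so MAR holds
        rows.append((x, y, None if rng.random() < p_missing else y))
    return rows


def complete_case_mean(rows: List[Row]) -> float:
    return statistics.fmean(y_obs for _, _, y_obs in rows if y_obs is not None)


def stratified_mean(rows: List[Row]) -> float:
    """Mean of observed y within each level of x, weighted by the full-sample share of x."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        observed = [y_obs for _, _, y_obs in stratum if y_obs is not None]
        total += len(stratum) / len(rows) * statistics.fmean(observed)
    return total


rows = simulate_mar(20_000)
full_mean = statistics.fmean(y for _, y, _ in rows)  # knowable only in a simulation
cc_mean = complete_case_mean(rows)
mar_mean = stratified_mean(rows)
print(f"full {full_mean:.2f}  complete-case {cc_mean:.2f}  conditioning on x {mar_mean:.2f}")
```

If missingness depended on y itself (MNAR), conditioning on x would no longer remove the bias, which is why the MAR assumption matters and why it cannot be checked from the observed data alone.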
In the Southern Cone, epidemiological studies on cancer are not very large, and the possibility that missing data bias the results should be evaluated. The limitations of this study include its size, which makes it imperative to use as much of the available information as possible, and the lack of information on tumor classification by hormone receptor status. This study has shown that the Traditional dietary Pattern, BMI and breastfeeding are associated with the occurrence of breast cancer in this Argentinean population when MI is appropriately performed. It also shows the benefits of performing MI on cancer epidemiology datasets with high proportions of missing covariate data.

Financial Support

We would like to thank the Science and Technology National Agency, FONCyT grant PICT 2012-1019 for financial support and the National Scientific and Technical Research Council (CONICET) for JBC and NT fellowships.

Conflicts of interest

None.

Authorship contribution

MPD and JBC designed, drafted and critically revised the article. NT participated in data acquisition and made novel contributions to the discussion. JBC explored the missing data mechanism, analyzed the dataset and interpreted the results. MPD, ARO, MT and IF critically revised the article for intellectual content. All authors approved the final version of the article.

Ethical Standards Disclosure

This study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human subjects were approved by the Ethical Committee of the Faculty of Medical Sciences, University of Córdoba. Written informed consent was obtained from all subjects.

References (39 in total)

1. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values.

Authors: Ian R White; John B Carlin
Journal: Stat Med Date: 2010-12-10

2. Missing data in clinical studies: issues and methods.

Authors: Joseph G Ibrahim; Haitao Chu; Ming-Hui Chen
Journal: J Clin Oncol Date: 2012-05-29

3. Leisure time physical activity and cancer risk: evaluation of the WHO's recommendation based on 126 high-quality epidemiological studies.

Authors: Li Liu; Yun Shi; Tingting Li; Qin Qin; Jieyun Yin; Shuo Pang; Shaofa Nie; Sheng Wei
Journal: Br J Sports Med Date: 2015-10-23

4. Obesity and breast cancer: not only a risk factor of the disease.

Authors: Doris S M Chan; Teresa Norat
Journal: Curr Treat Options Oncol Date: 2015-05

5. Use of multiple imputation in the epidemiologic literature.

Authors: Mark A Klebanoff; Stephen R Cole
Journal: Am J Epidemiol Date: 2008-06-30

6. Breast cancer in Latin America: global burden, patterns, and risk factors.

Authors: Amina Amadou; Gabriela Torres-Mejía; Pierre Hainaut; Isabelle Romieu
Journal: Salud Publica Mex Date: 2014 Sep-Oct

7. [Cancer and its association with dietary patterns in Córdoba (Argentina)].

Authors: Sonia Alejandra Pou; Camila Niclis; Laura Rosana Aballay; Natalia Tumas; María Dolores Román; Sonia Edith Muñoz; Julia Becaria Coquet; María del Pilar Díaz
Journal: Nutr Hosp Date: 2014-03-01

8. Associations of body mass index with cancer incidence among populations, genders, and menopausal status: A systematic review and meta-analysis.

Authors: Jun Wang; Dong-Lin Yang; Zhong-Zhu Chen; Ben-Fu Gou
Journal: Cancer Epidemiol Date: 2016-03-03

9. Dietary patterns identified using factor analysis and prostate cancer risk: a case control study in Western Australia.

Authors: Gina Leslie Ambrosini; Lin Fritschi; Nicholas Hubert de Klerk; Dorothy Mackerras; Justine Leavy
Journal: Ann Epidemiol Date: 2008-02-08

10. Characterization of meat consumption and risk of colorectal cancer in Cordoba, Argentina.

Authors: Alicia Navarro; María P Díaz; Sonia E Muñoz; María J Lantieri; Aldo R Eynard
Journal: Nutrition Date: 2003-01
