
A comparison of case-control and case-only designs to investigate gene-environment interactions using breast cancer data.

Jafar Hassanzadeh, Rahmatollah Moradzadeh, Abdolreza Rajaee Fard, Sedigheh Tahmasebi, Parvaneh Golmohammadi.

Abstract

BACKGROUND: Traditional methods of studying gene-environment interactions require a control group. However, selecting an appropriate control group is often problematic. Therefore, newer methods, such as the case-only design, have been developed to study such interactions. The objective of this study was to compare the case-only and case-control designs using data from patients with breast cancer.
METHODS: The interaction between genetic and environmental factors, as well as the ratio of the control to the population odds ratio, was calculated for the case-only (300 patients with breast cancer) and case-control (300 cases of breast cancer and 300 matched controls) designs.
RESULTS: The confidence intervals and -2log likelihood for all variables were smaller in the case-only design than in the matched case-control design. In the case-only design, the standard errors of variables such as age at menarche, the first delivery at the age of 35 yrs and more or no delivery, the history of having live birth, use of oral contraceptive pills, and breastfeeding history were smaller than those in the matched case-control design.
CONCLUSION: The findings indicate that the case-only design is an efficient method to investigate the interaction of genetic and environmental factors.

Keywords:  Case-control; breast cancer; gene-environment interaction

Year:  2012        PMID: 23115440      PMCID: PMC3470067     

Source DB:  PubMed          Journal:  Iran J Med Sci        ISSN: 0253-0716


Introduction

Many common diseases are caused by the interaction of genetic and environmental factors.[1] In epidemiology, interaction describes a situation in which two or more risk factors modify each other's effects with respect to the level of a certain outcome.[2] In other words, when the incidence of a disease in the presence of two factors differs from the incidence of that disease in the presence of either factor alone, the two factors are understood to interact.[3] Although case-control designs are appropriate for examining gene-environment interactions, they have some limitations, including the high cost and time needed to select the control group, the large sample size needed to estimate interactions, and the difficulty of selecting an appropriate control group.[4] Recently, several methods have been developed to study the genetic factors of diseases that rely on internal rather than external control groups. One of these is the case-only design, in which researchers use only cases to assess the magnitude of the association between a specific exposure and a susceptibility genotype.[5] This design avoids many of the limitations of the analogous case-control studies.[4]

Breast cancer is the most frequently diagnosed cancer in women[6] and is a worldwide concern.[7] It constitutes about one-third of all cancers among women.[8] Approximately one out of nine women is affected by breast cancer during her life.[9] The well-established risk factors of breast cancer, such as age at menarche, age at the first delivery, age at menopause, and alcohol consumption, may be indicators of cumulative exposure of the breast epithelium to estrogenic substances.[10] Previous studies have shown that family history and its genetic polymorphisms may point to familial patterns of endogenous estrogen levels.[10]

Several studies have assessed fertility factors and familial predisposition to breast cancer to investigate gene-environment interactions.[4],[11]-[14] In these studies, pedigree information, namely the family history of breast cancer in the first degree relatives of the cases, was used as a proxy for genetic susceptibility. Case-only designs may also be used to examine gene-drug interactions, survival, and some other questions. A major issue in genetic epidemiology is that diseases result from interactions between genetic and environmental factors.[5] The aim of this study was to compare the case-only and case-control designs using data from patients with breast cancer in the city of Shiraz, Iran.

Materials and Methods

The study included patients with breast cancer referred to the Shahid Mottahari Breast Cancer Clinic in Shiraz. Patients had been identified through breast cancer screening programs in Health Care Centers and had then been referred to the clinic. A questionnaire comprising demographic data, reproductive factors, care, and treatment had been completed for every patient. By the time of the present study, two thousand questionnaires had been completed by physicians and nurses in the clinic. Quanto 1.2 software (January 2007)[15] was used to determine the sample size of 300 subjects. The samples were selected randomly from the files of patients registered at the center from 2003 to 2007. The study sample consisted of individuals who had been residing in the city of Shiraz for at least five years. The Ethics Committee of Shiraz University of Medical Sciences approved the study.

The data of the 300 patients were used as cases in both the case-only and the case-control designs. For the case-control design, the files of 300 people without breast or ovarian cancer who had been referred to other divisions of the Shahid Mottahari Clinic, such as the internal and surgical divisions, were selected by convenience sampling. Referrals to the other parts of the clinic may be considered population-based, because almost all socioeconomic groups attend the clinic for specialist medical care.

Matched Case-Control Design

An important concern in case-control studies is the difference that may exist between the case and control groups in individual and exposure variables other than those being studied. One way to overcome this problem is to design the study so that cases and controls are matched on the factors considered. For this reason, individual matching was conducted. As age is a major risk factor for breast cancer and is recognized as a potential confounder, matching was conducted on age. Matching also increases the efficiency of the study.[16] In case-control studies, the main effects of non-matched variables and the interactions between these effects may be assessed. Therefore, we used conditional logistic regression analysis. First, the case files were selected randomly from the Cancer Registry Center in the Breast Cancer Clinic. Then, for each case, a control woman whose age was within ±3 years of the case’s age and who had no ovarian or breast cancer was selected.

Case-Only Design

The use of the case-only design to assess gene-environment interaction was suggested by Piegorsch et al.[17] Cases are selected in case-only designs in the same way as in case-control studies. The case-only design alone cannot assess the independent effects of exposure and genotype, and the assumption that environmental exposure and genotype are independent is the basic premise of this design.[5] The entire sample used for the case-only analysis consisted of people with the disease.[17] The effect size obtained from the case-only technique is interpreted as a departure from a multiplicative relationship.[5] A sizable number of prevalent diseases result from interactions between genetic and environmental factors.[1] The major advantages of case-only designs in genetic epidemiology and in the assessment of gene-environment interaction are the simplicity of collecting the required data and the reduction in calculations and financial costs. Case-only designs appear to be more efficient than traditional case-control designs, because they give more precise estimates, which is the result of less dispersion and more homogeneity.
Therefore, to detect a specific odds ratio (OR) for interaction, case-only designs need fewer cases than case-control studies. Moreover, controls often have less motivation to participate in a study; the case-only design therefore helps to minimize potential participation bias. In case-only designs, data analysis is also more straightforward than in case-control designs. Although the case-only design is not population-based, it uses simple sampling methods.[18] The standard case-control analysis often has low power to detect multiplicative interactions, which results from the low numbers of cases and controls in the cells of the genotype-by-exposure table. Assuming gene-environment independence yields a more precise estimate of the interaction. However, violation of this assumption results in an increased Type II error.[19] The case-only OR is the product of the interaction OR (ORint) and the gene-environment OR in the control group. If the assumption of independence between gene and exposure in the control group is valid and the disease is rare, the case-only OR measures the interaction effect under a multiplicative model, as in conventional case-control studies.[17] To impose the independence assumption, Weinberg and Umbach suggested a maximum likelihood method based on a log-linear model. They showed that their method may need fewer than half as many individuals as an analysis that does not make the gene-environment independence assumption.[20] In studies of gene-environment interactions, a specific genotype might be used. When genetic marker data are not available, family history data may be used as a proxy for genetic susceptibility; however, such use may result in substantial misclassification.[21]

Independence Assumption

As Gatto et al. stated clearly, independence between gene and environment is central to the valid interpretation of a case-only study.[17] In practice, controlling for non-independence is not always easy. For example, it requires knowledge of the sources of non-independence, which can be difficult or impossible to identify in some situations. Because it is difficult to control for sources of bias in cohort and case-control studies, it may also be difficult to do so in case-only studies. However, sensitivity analysis, whose benefits have been shown in case-control and cohort studies, may be used in case-only studies. As non-independence can be accounted for in the analysis, the case-only design may be a useful epidemiological instrument for examining gene-environment interactions.[17] A formula (equation 1) describes how the OR for the gene-environment association in the control group relates to that in the source population; it can be used to estimate the gene-environment OR in the source population. In a previous study,[22] the gene-environment independence assumption was examined using correlation or chi-square tests. In the present study, this assumption was investigated using standard statistical multivariable techniques.[17] In the equation, G-EORinD-1 represents the OR of gene-environment in the control group and G-EORinPop represents the OR of gene-environment in the population. If C/PROR is equal to one, the control group can be used to estimate the OR of interaction of genetic and environmental factors in the population. In the equation, P(D|G-E-) represents the baseline risk of disease, that is, the likelihood of disease occurrence in people who have neither the gene nor the environmental factor. The RRGE or ORGE represents the OR of the disease in those who have both the gene and the environmental factor (ie, two-factor interactions).
The RRG or ORG represents the OR of the disease in people who have only the gene factor, and RRE or ORE represents the OR of the disease in people who have only the environmental factor.[17] The case-control and case-only designs were compared in terms of the standard error, -2log likelihood, and 95% confidence interval of the gene-environment interactions.[5] Statistical analyses were performed using STATA 8.0 statistical software. The measured independent variables included continuous use of oral contraceptives for the past five years, breastfeeding history, number of pregnancies, age at menarche, age at the first delivery, and the history of breast cancer in the family. The family history of breast cancer in first degree relatives, including mother or sister, was used as a proxy for disease susceptibility.[11] Conditional logistic regression was used for data analysis of the matched case-control design.
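The equation referred to in this section is not reproduced in the text. A form consistent with the definitions given here can be sketched as follows, assuming that controls are sampled from the non-diseased members of the source population; the symbol p_0 and the subscript notation are chosen for illustration and are not taken from the original.

```latex
\mathrm{C/P\,ROR}
  = \frac{\mathrm{OR}_{G\text{-}E \mid \text{controls}}}
         {\mathrm{OR}_{G\text{-}E \mid \text{population}}}
  = \frac{\bigl(1 - p_0\,RR_{GE}\bigr)\bigl(1 - p_0\bigr)}
         {\bigl(1 - p_0\,RR_{G}\bigr)\bigl(1 - p_0\,RR_{E}\bigr)},
\qquad p_0 = P(D \mid G^{-}E^{-}).
```

Each factor is the probability of remaining disease-free in one genotype-by-exposure cell, so the ratio stays close to one when the baseline risk p_0 is small.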

Results

The mean age of the case group was 51.8±0.5 years. The ORs of interaction between family history of breast cancer in the first degree relatives and other variables, such as the first delivery at the age of 35 years and more or no delivery, the history of having live birth, breastfeeding history, and oral contraceptive use, were not statistically significant.

Assessing Independency Assumption and Case-Only Study

The findings regarding the gene-environment independence assumption are shown in table 1. The genetic and environmental factors were independent in the control group; therefore, in the second step, the interaction could be analyzed with the case-only design for all variables.
Table 1:

The ratio of control to population odds ratio for the relation of family history to other variables of the participants

Variables                                     RRGE       RRG        RRE        C/PROR
Family history and age at menarche            1.43 [12]  1.43 [12]  0.88 [12]  0.9988
Family history and age at the first delivery  1.03 [4]   1.43 [12]  1.02 [12]  1.0028
Family history and live birth history         0.77 [12]  1.43 [12]  0.93 [12]  1.0039
Family history and breastfeeding history      0.22 [12]  1.43 [12]  0.64 [12]  1.0056
Family history and contraceptive pills usage  1.01       1.43 [12]  0.83 [23]  1.0017

RRGE: Relative risk of joint effects of genetic and environmental factors; RRG: Relative risk of genetic factor; RRE: Relative risk of environmental factors; C/PROR: The ratio of odds ratio of control group to that of population. If C/PROR is equal to one, the control group can be used to estimate the odds ratio of interaction of genetic and environmental factors in the population.
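As an illustration of how a control-to-population ratio of this kind can be computed, here is a minimal Python sketch. It assumes the disease-free-probability form of the ratio (controls drawn from non-diseased people), the baseline risk of 0.0066 quoted in the Discussion, and a reading of Table 1 in which the trailing digits fused to the relative risks are citation markers (for example 1.43 from reference 12). The function name and variable names are hypothetical.

```python
def control_to_population_or_ratio(p0, rr_ge, rr_g, rr_e):
    """Ratio of the G-E odds ratio among non-diseased controls to that
    in the source population, given baseline risk p0 = P(D | G-E-)."""
    numerator = (1 - p0 * rr_ge) * (1 - p0)          # disease-free: G+E+ and G-E-
    denominator = (1 - p0 * rr_g) * (1 - p0 * rr_e)  # disease-free: G+E- and G-E+
    return numerator / denominator


P0 = 0.0066  # baseline risk quoted in the Discussion

# (RR_GE, RR_G, RR_E) as read from Table 1; the split between value and
# citation marker is an assumption about the flattened source table.
table1_inputs = {
    "age at menarche": (1.43, 1.43, 0.88),
    "age at the first delivery": (1.03, 1.43, 1.02),
    "live birth history": (0.77, 1.43, 0.93),
    "breastfeeding history": (0.22, 1.43, 0.64),
    "contraceptive pills usage": (1.01, 1.43, 0.83),
}

ratios = {
    name: control_to_population_or_ratio(P0, *rr)
    for name, rr in table1_inputs.items()
}

for name, ratio in ratios.items():
    print(f"Family history and {name}: C/PROR = {ratio:.4f}")
```

Under these assumptions, the computed ratios fall within about 0.0005 of the C/PROR column of Table 1, which is what one would expect for a rare disease: every ratio is close to one.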

The findings of the assessment of the gene-environment OR in the control group as a surrogate for the gene-environment OR in the population are shown in table 1. All control-to-population ratios of ORs were close to 1.

The Efficiency Comparison of Case-Control and Case-Only Designs

The ORs for gene-environment interaction and the comparison of the efficiency of the case-control and case-only designs are shown in table 2. Although none of the ORs was significant, the standard errors in estimating the interaction of family history of breast cancer in the first degree relatives with other variables, such as age at menarche, the first delivery at the age of 35 years and more or no delivery, the history of having live birth, oral contraceptive use, and breastfeeding history, were smaller in the case-only design than in the case-control design.
Table 2:

The efficiency comparison of case-control and case-only designs in estimating the genetic-environment interactions

Design          Odds ratio ± SE    Confidence interval    -2 log likelihood
Interaction of family history and age of more than 12 yrs at menarche
  Case-control  0.85 ± 1.05        0.08-9.6               384
  Case-only     0.76 ± 0.43        0.24-2.34              247.8
Interaction of family history and the first delivery at ages between 25-34 yrs
  Case-control  0.37 ± 0.34        0.06-2.25              365
  Case-only     0.88 ± 0.37        0.38-2.04              233.4
Interaction of family history and the first delivery at the age of 35 yrs and more or no delivery
  Case-control  87160.61±          .-0                    365
  Case-only     0.95 ± 0.74        0.21-4.4               233.4
Interaction of family history and the history of having live birth
  Case-control  5.27e+32           .-0                    388.2
  Case-only     0.95 ± 0.74        0.21-4.39              232.8
Interaction of family history and breastfeeding history
  Case-control  1.39 ± 1.62        0.14-13.62             420.2
  Case-only     0.8 ± 0.32         0.4-1.7                266.8
Interaction of family history and oral contraceptive use
  Case-control  1.01 ± 0.009       0.996-1.00             437.8
  Case-only     0.999 ± 0.002      0.993-1.00             267
The confidence intervals for the interaction of family history of breast cancer in the first degree relatives with the other variables were narrower in the case-only design than in the case-control design. Moreover, the -2log likelihood values for these interactions were smaller in the case-only design than in the case-control design. The P values obtained for the interaction of family history of breast cancer in the first degree relatives with variables such as age at menarche, the first delivery at the age of 35 years and more or no delivery, the history of having live birth, and breastfeeding history were smaller in the case-only design than in the case-control design (table 2).
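The relation stated in Materials and Methods, that the case-only OR is the product of the interaction OR and the gene-environment OR in the control group, can be illustrated with a small Python sketch. It uses an unmatched 2x2 analysis with Wald standard errors on the log-odds scale, which is a simplification and not the conditional logistic regression used in this study. All counts are hypothetical and are not data from the paper.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with a Wald interval on the log scale.

    a = G+E+, b = G+E-, c = G-E+, d = G-E- (counts within one group).
    Returns (odds ratio, SE of log OR, lower limit, upper limit).
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, se_log, lower, upper


# Hypothetical genotype-by-exposure counts, for illustration only.
case_counts = (12, 28, 90, 170)
control_counts = (10, 25, 95, 170)

# Case-only estimate: G-E odds ratio among cases alone.
case_only_or, case_only_se, co_lower, co_upper = odds_ratio_wald(*case_counts)

# G-E odds ratio among controls (close to 1 if G and E are independent).
control_or, control_se, _, _ = odds_ratio_wald(*control_counts)

# Unmatched case-control interaction OR: ratio of the two odds ratios.
interaction_or = case_only_or / control_or
# Its log-scale SE combines the sampling error of both groups.
interaction_se = math.sqrt(case_only_se**2 + control_se**2)

print(f"case-only OR    = {case_only_or:.3f} (SE log OR {case_only_se:.3f})")
print(f"control G-E OR  = {control_or:.3f}")
print(f"interaction OR  = {interaction_or:.3f} (SE log OR {interaction_se:.3f})")
```

In this sketch the case-control interaction estimate always has the larger log-scale standard error, because it carries the sampling error of the controls as well as that of the cases. The case-only estimate gains precision only by assuming that the control-group OR equals one.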

Discussion

Based on a previous study,[24] the baseline risk of breast cancer among people without the genetic and environmental factors, P(D|G-E-), was 0.0066. This value is one of the inputs of equation 1 for the assessment of gene-environment independence presented in table 1. Consistent with the findings of previous studies,[11],[12] our study showed that, in the interaction analysis of family history of breast cancer and age at menarche, the confidence intervals, standard error, and -2log likelihood in the case-only design were better than those in the case-control design. Moreover, similar to the finding of Ardalan and colleagues,[12] the P value in the case-only design was lower than that in the case-control design. However, such findings do not agree with those of Becher et al.,[11] who showed otherwise. Consistent with previous studies, the independence assumption of gene with age at menarche was established.[11],[12] The confidence intervals and -2log likelihood of the interaction of family history of breast cancer in the first degree relatives and the first delivery at ages between 25-34 yrs in the case-only design were better than those in the case-control design. However, the standard error in the case-control design was somewhat higher. Similar to the findings of Yavari et al.,[4] the present study established the independence assumption of gene with age at the first delivery. They assessed the independence assumption using an independence test and suggested standard statistical multivariable techniques to investigate the gene-environment independence assumption. Therefore, similar to a study done in Japan,[14] we used this method to investigate the gene-environment independence assumption. There was no significant interaction between family history and the first delivery at the age of 35 yrs and more or no delivery.
However, our study shows that a sample size of 300 subjects is not enough to investigate interaction in case-control studies, and that the case-only design might be a better design to examine such an interaction with such a sample size. The independence assumption of the two factors was not established in a study by Ardalan and colleagues,[12] and the interaction was therefore removed from their case-only analysis. In agreement with the findings of Yavari et al.,[4] the present study showed that the case-only design was more efficient than the matched case-control design. The interaction of family history of breast cancer and the history of having live birth was not statistically significant in the case-control design. Since the independence assumption was not established, the interaction of these two factors in the case-only design was not estimated in the study by Yavari and colleagues,[4] and was not reported in the study by Becher et al.[11] The present study showed that the confidence interval, standard error, and -2log likelihood estimates in the case-only design were smaller than those in the case-control analogues. These findings are similar to those of Ardalan and colleagues,[12] and indicate that case-only designs are more efficient than case-control designs. The confidence interval, standard error, -2log likelihood, and P value estimates in the case-only design were smaller than those in the case-control design. This shows that the case-only design was more efficient than the case-control one. The confidence interval, standard error, and -2log likelihood estimates in the case-only design were smaller than those in the case-control design. Such findings are a sign that case-only designs are more efficient than the case-control design. However, the P value in the case-only design was higher than that in the case-control design.
The reason for not detecting the interaction between the family history of breast cancer and other variables might be the small sample size of the case-control design. In case-only studies, the basic assumption of gene-environment independence in the non-diseased group should be established. In studies before 2004, this assumption was assessed using classic statistical tests such as the chi-square test, the correlation coefficient, and so on. However, such an approach was criticized[25] because the GE-OR in control groups could not be used to estimate the GE-OR in populations. Gatto et al.[17] described a modification of the methods. To establish the independence assumption, they introduced standard statistical multivariable techniques, which have resolved the previous shortcoming of the design and have led to widespread use of the techniques. The studies of Yavari et al.[4] and Becher et al.[11] used classic methods to test the independence assumption, whereas this study used the new method to assess it. The present study did not identify genes involved in breast cancer; therefore, similar to previous studies,[4],[11],[12] the family history of breast cancer was used as a proxy for genetic mutations. Our approach is similar to that of other studies that used family history as a proxy for genetic factors to study gene-environment interactions in lung[14] and colon[13] cancers.

Conclusion

Matched case-control studies require a sample larger than the 300 subjects used in the present study to examine an interaction reliably. Considering the smaller standard error and -2log likelihood of the case-only design compared with those of the case-control design, we might be able to suggest that the case-only design is a better method to examine the interactions between the genetic and environmental variables involved in breast cancer.
  13 in total

Review 1.  Detection of interaction involving identified genes: available study designs.

Authors:  A M Goldstein; N Andrieu
Journal:  J Natl Cancer Inst Monogr       Date:  1999

2.  Sample size requirements for association studies of gene-gene interaction.

Authors:  W James Gauderman
Journal:  Am J Epidemiol       Date:  2002-03-01       Impact factor: 4.897

3.  Commentary: Case-control-family designs: a paradigm for future epidemiology research?

Authors:  John L Hopper
Journal:  Int J Epidemiol       Date:  2003-02       Impact factor: 7.196

4.  Reproductive factors and familial predisposition for breast cancer by age 50 years. A case-control-family study for assessing main effects and possible gene-environment interaction.

Authors:  Heiko Becher; Silke Schmidt; Jenny Chang-Claude
Journal:  Int J Epidemiol       Date:  2003-02       Impact factor: 7.196

Review 5.  Further development of the case-only design for assessing gene-environment interaction: evaluation of and adjustment for bias.

Authors:  Nicolle M Gatto; Ulka B Campbell; Andrew G Rundle; Habibul Ahsan
Journal:  Int J Epidemiol       Date:  2004-09-09       Impact factor: 7.196

Review 6.  Gene-environment interactions for complex traits: definitions, methodological requirements and challenges.

Authors:  Astrid Dempfle; André Scherag; Rebecca Hein; Lars Beckmann; Jenny Chang-Claude; Helmut Schäfer
Journal:  Eur J Hum Genet       Date:  2008-06-04       Impact factor: 4.246

7.  Tests for gene-environment interaction from case-control data: a novel study of type I error, power and designs.

Authors:  Bhramar Mukherjee; Jaeil Ahn; Stephen B Gruber; Gad Rennert; Victor Moreno; Nilanjan Chatterjee
Journal:  Genet Epidemiol       Date:  2008-11       Impact factor: 2.135

Review 8.  Applications of the case-control method in genetic epidemiology.

Authors:  M J Khoury; T H Beaty
Journal:  Epidemiol Rev       Date:  1994       Impact factor: 6.222

9.  Family history and environmental risk factors for colon cancer.

Authors:  Esteve Fernandez; Silvano Gallus; Carlo La Vecchia; Renato Talamini; Eva Negri; Silvia Franceschi
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2004-04       Impact factor: 4.254

10.  Consumption of commercial whole and non-fat milk increases the incidence of 7,12-dimethylbenz(a)anthracene-induced mammary tumors in rats.

Authors:  Li-Qiang Qin; Jia-Ying Xu; Hideo Tezuka; Jue Li; Jun Arita; Kazuhiko Hoshi; Akio Sato
Journal:  Cancer Detect Prev       Date:  2007