Literature DB >> 23094028

Genetic association analysis of complex diseases incorporating intermediate phenotype information.

Yafang Li¹, Jian Huang, Christopher I Amos.

Abstract

Genetic researchers often collect disease related quantitative traits in addition to disease status because they are interested in understanding the pathophysiology of disease processes. In genome-wide association (GWA) studies, these quantitative phenotypes may be relevant to disease development and serve as intermediate phenotypes or they could be behavioral or other risk factors that predict disease risk. Statistical tests combining both disease status and quantitative risk factors should be more powerful than case-control studies, as the former incorporates more information about the disease. In this paper, we proposed a modified inverse-variance weighted meta-analysis method to combine disease status and quantitative intermediate phenotype information. The simulation results showed that when an intermediate phenotype was available, the inverse-variance weighted method had more power than did a case-control study of complex diseases, especially in identifying susceptibility loci having minor effects. We further applied this modified meta-analysis to a study of imputed lung cancer genotypes with smoking data in 1154 cases and 1137 matched controls. The most significant SNPs came from the CHRNA3-CHRNA5-CHRNB4 region on chromosome 15q24-25.1, which has been replicated in many other studies. Our results confirm that this CHRNA region is associated with both lung cancer development and smoking behavior. We also detected three significant SNPs--rs1800469, rs1982072, and rs2241714--in the promoter region of the TGFB1 gene on chromosome 19 (p = 1.46×10(-5), 1.18×10(-5), and 6.57×10(-6), respectively). The SNP rs1800469 is reported to be associated with chronic obstructive pulmonary disease and lung cancer in cigarette smokers. The present study is the first GWA study to replicate this result. Signals in the 3q26 region were also identified in the meta-analysis. We demonstrate the intermediate phenotype can potentially enhance the power of complex disease association analysis and the modified meta-analysis method is robust to incorporate intermediate phenotype or other quantitative risk factor in the analysis.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2012 PMID： 23094028 PMCID： PMC3477105 DOI： 10.1371/journal.pone.0046612

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Genome-wide association (GWA) studies have identified hundreds of common genetic variants associated with complex diseases and provided valuable insight into their genetic architecture. However, most of these variants confer relatively low risk effects and explain only a small proportion of the heritability of most complex diseases. For most of these diseases, less than 10% of the genetic variance is explained by the identified common variants, leaving the bulk of heritability unexplained [1]. One important reason for the unexplained heritability is that most of the genetic variants that have been identified have small odds ratios (around 1.1 for the heterozygous genotypes and 1.5–1.6) for the homozygous genotypes; latent variants likely have even less of a disease effect [2]. Researchers estimated that hundreds of genetic variants are involved in the development of complex diseases but that together they would explain only about 20% of genetic variance [3]. Investigators have used imputation of genetic loci from the Hapmap and other referent populations to boost the power of case-control association studies for complex diseases [4]–[5]. However, traditional case-control studies still have limited power to detect genetic variants with low risk effect so new statistical analysis methods are needed in the study. Disease status is often the ultimate result of influences from multiple genotypes and environmental factors. Many “intermediate” phenotypes reflect the pathway leading to disease development. An intermediate phenotype may reflect more directly the effects from causal genes than disease status and be less genetically complex and more strongly associated with susceptibility loci. Analysis of intermediate phenotypes has the potential to capture the underlying heritable trait variation that may be missed in case-control studies, thus increasing the statistical power in genetic association studies [6]–[7]. Studying intermediate phenotypes would also provide insight into the complicated etiologic disease pathways. Behavioral and other quantitative measures of increased risk for disease may also help to improve the power of studies to detect associations of genetic factors with disease risk if these behavioral or other risk factors involve the same genetic factors in their etiology as the disease. Intermediate phenotypes, also known as endophenotypes, were first used in psychiatric disorders studies as they were easier to measure and less complicated than disease status [6]. Endophenotypes have been successfully applied in unraveling the complex etiology of mental disease. For example, neurological soft signs have been used as an endophenotype in analysis of Schizophrenia. In 2012, Greenwood et al. [8] found 94 candidate genes associated with Schizophrenia-relate endophenotypes. Simons et al. [9] identified VMAT2 as a candidate gene for psychotic disorder and neurocognition using measurement of cognitive functioning as the intermediate phenotype. Researchers also have adapted endophenotype for use in other complex disease studies. For example, high mammographic density is one of the strongest known risk factors for breast cancer and is an intermediate phenotype that can help elucidate the genetic factors that contribute to development of breast cancer [10]. In 2011, researchers identified the gene ZNF365 on chromosome 10 as being associated with both breast cancer and mammographic density [11]. That same year, researchers used neuropathology and cognitive function proximate to death as the intermediate phenotypes for Alzheimer disease and identified two genes—ZNF224 and PCK1—involved in the development of Alzheimer disease [12]. In these two studies, the researchers performed linear regression analysis of the quantitative intermediate phenotype with the marker genotype as the covariates. Their findings suggested successful use of intermediate phenotypes in genetic association analysis of complex diseases. Meta-analysis is a powerful method in GWA studies, as it can combine information from independent populations, thus increasing the sample size and overcoming the lack of power in most common disease studies [13]–[16]. The combined information from multiple populations is either disease status or quantitative trait, not both of them. The most widely used meta-analysis techniques are Fisher's combined probability test [17] and inverse-variance weighting [18]. When intermediate phenotypes and disease status are both available in a study, a meta-analysis method combining disease status and intermediate phenotypes should be more powerful than either a case-control study or linear regression analysis of quantitative traits alone, as meta-analysis incorporates more information from the patients. In the present study, we demonstrated that meta-analysis can be used to examine a combination of the disease status and intermediate phenotype information from a single population in a complex disease study and a modified inverse-variance weighted method was proposed for the analysis. Simulation was conducted to evaluate the performance of Fisher's combined probability test, the modified inverse-variance weighted method, and the traditional case-control method. The results showed that inverse-variance weighting was the best of the three methods. We then applied the meta-analysis to a study of imputed lung cancer genotypes with smoking data. The results validated previous findings regarding the CHRNA3-CHRNA5-CHRNB4 region on chromosome 15q24–25.1 [19]–[23] and the promoter region of the TGFB1 gene on chromosome 19 [24]–[25], which suggested the modified inverse-variance weighting was a reliable method to do the meta-analysis within a study. A new region—3q26.1—was also identified; no genes are located in this region, and deletion of the region has been reported to be associated with some cancers [26]–[27].

Results

Simulation study of the novel method for combining results from disease and intermediate phenotype association studies

Table 1 lists the parameters for the medium- and low-risk susceptibility loci in simulations. The results for the medium- and low-risk variants are shown in Figures 1 and 2. The x-axis in each graph denotes the correlation coefficient between the SNP marker and disease locus, which increased from 0 to 0.8. The y-axis in each graph denotes the power of each test. When the SNP marker was directly associated with the disease status but the disease-related quantitative trait was not associated with the SNP marker of interest, we obtained no useful information about the quantitative trait pertaining to the SNP marker studied (lines 1, 3, and 5 in Figures 1–2). Logistic regression analysis was the most powerful method to detect the association between the SNP marker and disease status followed by Fisher's combined probability test. The power of modified inverse-variance weighted method was only about half of that of logistic regression. When the quantitative trait was an intermediate phenotype between the SNP marker and disease status, linear regression analysis of the quantitative trait provided valuable information for the association analysis. The power of the tests increased as the correlation coefficient between the SNP marker and disease locus increased (x-axis). Also, as the heritability of the quantitative trait explained by the SNP increased from 0.002 to 0.010 (columns 1–5 in Figures 1–2), the power of the linear regression analysis increased, as did the power of the meta-analysis methods, because they rely on the information from linear regression analysis. The modified inverse-variance weighted method was more powerful than Fisher's combined probability test in the meta-analysis (lines 2, 4, and 6 in Figures 1–2). Using the recessive model, logistic regression analysis had little power, and the linear regression analysis had the predominant effect in the meta-analysis. The performance of Fisher's combined probability test and the modified inverse-variance weighted method were almost equal to that of the linear regression analysis.

Table 1

Parameters for Medium- and Low-Risk Models in simulation.

	Genetic Model	β₁	γ₀	γ₁	OD_hetero	OD_homo
Medium	Add₁ a	0	−3.36	log1.20	1.20	1.44
	Add₂ b	1	−3.1	log1.15	1.20	1.43
	Dom₁	0	−3.36	log1.14	1.30	1.30
	Dom₂	1	−3.16	log1.07	1.31	1.31
	Rec₁	0	−3.13	log1.14	1.00	1.30
	Rec₂	1	−3.04	log1.07	1.00	1.31
Low	Add₁	0	−3.16	log1.10	1.10	1.21
	Add₂	1	−3.05	log1.05	1.10	1.22
	Dom₁	0	−3.18	log1.08	1.17	1.17
	Dom₂	1	−3.07	log1.04	1.17	1.17
	Rec₁	0	−3.05	log1.08	1.00	1.17
	Rec₂	1	−3.00	log1.04	1.00	1.17

ODhetero: odds ratio for heterozygous genotypes; ODhomo: odds ratio for homozygous genotypes; Add: additive; Dom: dominant; Rec: recessive.

Disease model 1 in Figure 1–2.

Disease model 3 in Figure 1–2.

Figure 1

Power Plots for the Medium-Risk Model.

Figure 2

Power Plots for the Low-Risk Model.

ODhetero: odds ratio for heterozygous genotypes; ODhomo: odds ratio for homozygous genotypes; Add: additive; Dom: dominant; Rec: recessive. Disease model 1 in Figure 1–2. Disease model 3 in Figure 1–2. The type I error rate in this simulation was set at 0.01. To obtain an accurate estimation of the type I error rate, we carried out 10,000 simulations for each set of conditions under the null hypothesis of no association between the SNP marker and disease locus. We did not observe an inflated type I error rate in this simulation for any of the methods (Table S1 and S2).

Application of the modified inverse-variance weighted meta-analysis method to imputed lung cancer genotypes with smoking data

The −log10(p)s for logistic regression analysis of disease status, linear regression analysis of cigarettes per day (CPD) with adjustment for disease status, Fisher's combined probability test, and our modified inverse-variance weighted method are plotted in Figure 3. The SNPs with −log10(p)s greater than 5 are highlighted as the SNPs potentially associated with lung cancer. Although the SNPs do not meet the commonly accepted criterion of −log10(p)>8 because of our limited sample size, they are still very promising signals that can be further validated. The inflation factors λ in the tests were ranged from 1.01–1.02, indicating no spurious association caused by population stratification in the analyses. Consecutive significant SNPs in a chromosomal region are listed in Table S3, and we identified three significant regions in our meta-analysis. The most significant region was AGPHD1-CHRNA3-CHRNA5-CHRNB4 on chromosome 15q24–25.1 (Figure 4). When we used a traditional case-control method, no SNP in this region had a −log10(p) greater than 5 because of the limited sample size. In the meta-analysis, this region became very significant with the strongest signal at rs12914385 with a p-value 1.98×10−9. This result confirmed that the CHRNA3-A5 region on 15q24–25.1 is associated with both lung cancer development and smoking behavior, which several other independent studies have already proven [19]–[23], and that CPD is an intermediate phenotype for lung cancer.

Figure 3

Manhattan Plot of GWA Studies of Lung Cancer and CPD Data.

Figure 4

−Log10(P) Plot of Significant SNPs on Chromosomes 3, 15, and 19 in Meta-analysis of imputed lung cancer genotypes with smoking data.

Manhattan Plot of GWA Studies of Lung Cancer and CPD Data.

1–4: case-control method (λ = 1.018), linear regression analysis with adjustment for disease status (λ = 1.010), Fisher's combined probability test (λ = 1.013), and the modified inverse-variance weighted method (λ = 1.011). −log10(p)>4.5 was used as the cutoff in plot 1 to match with the previous GWA study published in 2008 (Nat Genet, 40.5: 616–622). −log10(p)>5 was used as the cutoff in plot 2–4 to reduce false discovery rate. Another significant region is the B9D2 gene, which encodes a protein that lies partially within the TGFB1 promoter on chromosome 19 [28]. SNP rs1800469, rs1982072, and rs2241714 had a p value less than 0.001 in our case-control study and less than 0.01 in quantitative trait analysis. In the modified inverse-variance weighted meta-analysis, these three SNPs had a p-value of 1.46×10−5, 1.18×10−5, and 6.57×10−6, respectively. Previous authors reported that the SNPs rs1800469 and rs2241712 in the promoter of the TGFB1 gene are associated with chronic obstructive pulmonary disease and lung cancer in cigarette smokers [24]–[25]. Our results supported the signal at rs1800469; rs2241712 was not present in our genotype data, but rs2241714 (about 350 bp away from rs2241712) was also significant (Table S3). The evidence from a GWA study supports that the TGFB1 gene is associated with tobacco-induced lung cancer. The significant SNPs in the TGFB1 promoter region may be related to abnormal TGFB1 gene transcription levels in lung cancer patients. We also identified a large region on 3p26 (4.139–4.258 Mb) associated with both lung cancer development and smoking behavior, a total of 74 SNPs with a p-values around 1.0×10−5 were detected, and only two of them were from the genotyped SNPs, they were rs1444056 (4214953 bp) and rs1403124 (4188033 bp). No genes with known functions reside in this region although deletion of the region has been reported to be associated with some cancers [26]–[27]. For the significant SNPs identified in the three regions, the modified inverse-variance weighted method always produced a stronger signal than did Fisher's combined probability test. To further compare the performance of Fisher's combined probability test and the modified inversed-variance weighted method in association analysis, we plotted the −log10(p) in the case-control method versus Fisher's combined probability test, and the case-control method versus the inverse-variance weighted method (Figure 5). The plot on the left in the figure shows that Fisher's combined probability test tended to produce more significant signals for non-significant SNPs (−log10(p)<4) in the case-control study, which may have introduced a higher false-discovery rate than the inverse-variance weighted method in real data analysis. The reason for this finding is that Fisher's combined probability test is based on the p values from the logistic and linear regression tests, and so cannot tell the directions of the association effect from these two regression tests. Fisher's combined probability test can produce a significant result even when the effects in logistic and linear regression analyses are in opposite directions. This should not be true when the quantitative trait is an intermediate phenotype for the disease being studied. The modified inverse-variance weighted method does not have this problem because it is based on linear combination of the two effect sizes. Therefore, it is a better method than Fisher's test for a single- population meta-analysis.

Figure 5

P-Value Comparisons between the tests.

X-axis, −log10(p) from logistic regression analysis. Y-axis, −log10(p) from Fisher's combined probability test (left); −log10(p) from the modified inverse-variance weighted method (right).

P-Value Comparisons between the tests.

X-axis, −log10(p) from logistic regression analysis. Y-axis, −log10(p) from Fisher's combined probability test (left); −log10(p) from the modified inverse-variance weighted method (right).

Discussion

Researchers have widely used meta-analysis in genetic association studies to combine information from different populations and increase sample sizes. However, it is rarely used to combine different types of data in a single population. Genetic researchers will often collect phenotypic information in addition to disease status to better understand the pathophysiology of disease development and to maximize study findings; in many of these instances, the information is on intermediate phenotypes. A meta-analysis method incorporating both the disease status and intermediate phenotype should be more powerful than a traditional case-control study method. In the present study, we examined a modified inverse-variance weighted meta-analytical method. Simulation studies showed that this method is more powerful than the traditional case-control method in association analysis of complex diseases, especially for identification of disease loci having very minor effects. Also, compared with Fisher's combined probability test, inverse-variance weighted meta-analysis is more robust as it has a bigger power and a lower type I error rate. We set the MAFs of the SNP marker and disease locus as equal in our simulation studies and we observed that the results of the tests were similar when the MAFs were set differently (results not shown). In addition, the intermediate phenotypes in both patients and controls were available in this study. This phenotype is sometimes only available in patients because either the quantitative trait is expressed in them only or the cost of measuring the quantitative trait in controls is too high. Our simulation study showed that the meta-analysis was still better than the case-control study method when the quantitative trait was only available for patients (results not shown). We further applied meta-analysis to empirical data analysis. Smoking behavior, which can be quantified as smoking duration or smoking quantity, is the most important risk factor for lung cancer development. In 2008, several replicated studies showed that there was a strong association between the nicotinic acetylcholine receptor subunit cluster of genes (CHRNA) on chromosome 15q25.1 and lung cancer. But there was no conclusion on whether the association was direct or mediated via smoking behavior. Hung's group [23] observed an increased risk even in non-smokers, which implied at least some of the risk was not mediated via smoking. Thorgeirsson et al. [29] suggested that the association with lung cancer was mainly mediated through smoking behavior. In 2010, researchers using genome-wide approaches provided conclusive evidence for a strong association between CHRNA genes and smoking behavior [22]. There is reason to believe that CHRNA genes are associated with both smoking and lung cancer. Smoking behavior is an attribute associating with increased lung cancer risk. The method that we derive can be applied equally well to either intermediate phenotypes or to behavioral attributes that associate with increased risk for a disease. To address this comment, we revised our paper by inserting discussion about modeling either intermediate phenotypes or other quantitative risk factors into the model. The GWA study incorporating the quantitative trait of CPD with the imputed genotype data detected significant SNPs on chromosomes 3, 15, and 19. The signal in the CHRNA3-CHRNA5-CHRNB4 region was much stronger in the meta-analysis than in the case-control study. The highest p value was 1.98×10−9, which was a very strong signal in our small sample size (1154 cases and 1146 controls). Many independent studies have replicated the finding of association of CHRNA3-CHRNA5-CHRNB4 on 15q24 with lung cancer and smoking behavior. Our results further confirmed this finding. Also, it suggested that CPD is abehaviorally mediated risk factor for lung cancer or an intermediate phenotype that is involved in lung cancer risk. Whether or not the genetic effects of the nicotinic receptor variants on chromosome 15q25.1 directly contribute to lung cancer risk or only contribute through their effects on smoking behavior is a topic of ongoing debate and further study. Mediation analyses [30]–[31] have shown both direct and indirect (through smoking behavior) effects of the SNPs in this region on lung cancer risk. In contrast, a study of the SNP effects on cigarette per day use versus cotinine levels among smokers shows a much stronger effect on cotinine levels [32]–[33]. This finding suggests that reported cigarettes per day is inadequately capturing the actual exposure individuals experience to nicotine, but this observation still does not indicate yet the exact pattern of relationship of the genetic effects on smoking versus lung cancer risk [34]. The SNPs rs1800469 and rs2241712 in the promoter of the TGFB1 gene on chromosome 19 were associated with chronic obstructive pulmonary disease and lung cancer in smokers in previous studies. These polymorphisms can only be detected in our study using meta-analysis. Thus, meta-analysis combining an intermediate phenotype and the disease status is a powerful tool for detecting genetic variants in complex disease association studies, especially when the effects of the susceptibility loci are minor. The significant SNPs detected in these verified regions demonstrate that our modified inverse-variance weighted meta-analysis is a reliable method for genetic association studies when an intermediate phenotype is available. In the lung cancer study, the intermediate or behaviorally related phenotype, smoking quantity, has a positive relationship with disease status. This positive correlation may not always be true. For example, there is a negative relationship between brain size and Alzheimer's disease. In this case, the quantitative trait can be specified as the measurement of the overall brain shrinkage from the patient's normal brain size, which has a positive relationship with the disease. Researchers may use prior studies to assess correlations between the intermediate phenotype and the disease of interest to help determine how this information should be combined in the joint analysis. In this study, the modified inverse-variance based test was applied when only one intermediate phenotype is available. Statistically, it can also be applied when multiple intermediate phenotypes are available in the data as this method is based on the combination of estimators from several regression tests with the modified inverse variance as the weights. However, consideration is needed on the complicated disease model when multiple intermediate phenotypes are existent. The disease model could include multiple disease pathways with each one having an intermediate phenotype in it, or one pathway with more than one intermediate phenotypes in it, or even a mixture of them. Further investigation is needed for the application of this method in a more complicated situation. Whereas an intermediate phenotype is very useful in GWA studies, it also has potential to help researchers understand the intricate interactions among the disease associated genes and elucidate the complicated mechanism underlying the human diseases. The rapid development of microarray technology has made genome-wide gene expression profiles available to researchers. The gene expression levels are closely linked with both the genetic variants and disease status, providing a large number of intermediate phenotypes for complex diseases. Meta-analysis combining the disease status and gene expression data will be very powerful in identifying the functional genetic variants associated with complex diseases. This modified inverse-variance weighted meta-analytic approach is a promising tool in deciphering complex disease codes.

Materials and Methods

Simulation study

Given a disease locus A having two alleles A1 and A2 with allele frequencies q1 and q2 (q1+q2 = 1), an SNP marker M has two alleles M1 and M2 with allele frequencies m1 and m2 (m1+m2 = 1). The SNP marker and disease locus are closely linked so that they are in linkage disequilibrium, which can be quantified using the correlation coefficient (r). The pathway to a complex disease has an intermediate phenotype that can be measured as a quantitative trait (Y). If X denotes the genotype at the SNP marker, then the relationship between Y and X can be expressed using the linear equation Y = β0+β1X+ε, in which ε represents the error term following N(0,σ2 E). The genotypes at X were coded as 0, 1, and 2 for an additive effect; 0, 1, and 1 for a dominant effect; 0, 0, and 1 for a recessive effect. The relationship between the disease status and SNP marker and quantitative trait can be expressed using the equation P(D|X,Y) = exp(γ0+γ1X+γ2Y)/(1+exp(γ0+γ1X+γ2Y)).The minor allele frequencies (MAFs) of the SNP marker (m1) and disease allele are both set at 0.3 for a common allele frequency. The values for β and γ parameters in the logistic equation are chosen to fix the disease incidence rate at 0.05 (Table 1). The value of σ2 E represents the residual effect in the regression analysis, which includes the effect of environmental factors and impact of other genetic loci. The heterozygous and homozygous odds ratios at the SNP marker range from 1.2 to 1.4 for a medium-risk model and 1.1 to 1.2 for a low-risk model. The genetic variance of the quantitative trait explained by the SNP marker can be expressed as σ2 A = 2m1m2α2, σ2 D = (2m1m2ak)2, in which α = a[1+k(m1−m2)] [35]. In this equation, a and k represent additive and dominant effects of the SNP marker respectively. In this simulation, the heritability of the quantitative trait explained by the SNP marker ranges from 0.002 to 0.010, with a step size of 0.002; these numbers are based on the estimation of common variants effect-size distribution from recent GWA studies in complex diseases [3]. The type I error rate in the simulation study was set at 0.01. Additive, dominant, and recessive effects were simulated at the SNP marker, although the additive model is the common model assumed in quantitative trait association analyses. For the medium-risk variants, we simulated 2000 cases and 2000 controls for the additive and dominant models and 6000 cases and 6000 controls for the recessive model. For the low-risk variants, we simulated 4000 cases and 4000 controls for the additive and dominant models and 8000 cases and 8000 controls for the recessive model. The analysis program is coded in R which is available upon request, and we have posted the code to SourceForge (http://sourceforge.net/p/modifiedinverse/wiki/Home/).

Disease models in simulations

Figure 6 lists the possible relationships among the disease susceptibility locus, disease status, and quantitative trait given this trait is an intermediate phenotype. In model 1, the quantitative trait and SNP marker are independently associated with the disease status, and the disease locus is not related to the quantitative trait. Models 2 and 3 are different from each other etiologically. Model 2 is a single pathway, whereas model 3 is a dual pathway between the disease susceptibility locus and disease status. Mediation analysis can be used to differentiate these two models [36], but it is impossible to separate models 2 and 3 when the disease status and quantitative trait are evaluated separately in the analysis. The results from disease models 1 and 3 were reported, with the correlation coefficient (r) between the SNP and disease susceptibility locus increasing from 0 to 0.8 [Figure 1–2]. When r equals 0, the SNP is not associated with the disease locus, which is the null hypothesis in the simulation.

Figure 6

Three possible disease models for one disease locus with an intermediate phenotype.

G, disease susceptibility locus; D, disease status; QT, quantitative trait (an intermediate phenotype).

Three possible disease models for one disease locus with an intermediate phenotype.

G, disease susceptibility locus; D, disease status; QT, quantitative trait (an intermediate phenotype).

Analysis of simulated data

Two statistical tests were conducted for the analysis in step 1. Test 1 is logistic regression analysis of the disease status with the SNP marker genotype as the covariate, and test 2 is linear regression analysis of the quantitative trait with adjustment for the disease status. For the three proposed disease models under the assumption that the quantitative trait is an intermediate phenotype, no association between the SNP and disease status, either directly or indirectly, means there is no association between the SNP and quantitative trait and no association between the SNP and disease status. So test 1 and test 2 are independent under the null hypothesis. Step 2 is to combine the results from test 1 and 2 using meta-analysis methods. If p1 and p2 are the p values for test 1 and 2, Fisher's combined probability test can be used to combine the values to provide an overall p value using the formula χ2 = −2[ln(p1)+ln(p2)], which follows a chi-square distribution with four degrees of freedom [17]. Two Z-score statistics from tests 1 and 2 were obtained from score tests, which can be combined using inverse-variance weighting. Suppose the two Z-scores from tests 1 and 2 are Z1 and Z2*, both follow an approximately normal distribution as follows:Under the null hypothesis, Z1 and Z2* are independent (μ1 = μ2* = 0). The estimator of effect size of the SNP from linear regression analysis of the quantitative trait can be arbitrary depending on the subjective selection of the measurement unit and normalization procedure of the trait, which will affect the combination result from the inverse-variance weighted method. However, if the quantitative trait is an intermediate phenotype involved in the development of disease, the estimators of the coefficients for SNP marker from logistic and linear regression analysis and their standard errors should be close to a consistent unit, as these are two different tests for the same associations, i.e., the association between the SNP marker and disease status. Therefore, Z2* can be scaled so that it has the same unit as Z1. In the present study, the standard error of Z2* is scaled so that it is the same as that of Z1, which is the same as multiplying each quantitative trait by a constant c, where c = sqrt(σ2 1/σ2 2*). This produces the new Z-score statistic Z2 = Z1c∼N(μ2,σ2 1). Let L = bZ1+(1−b)Z2, 0

Lung cancer and smoking data with imputed marker data

The study examined 1154 ever-smokers with lung cancer and 1137 control ever-smokers. The patients and controls were frequency-matched by age and sex, and they were all of European origin. Their genotype data came from Illumina HumanHap300 v1.1 BeadChips, and the GWA study results were published in 2008. The genotypes were further imputed using the MACH (version 1.0.15) [37] with the HapMap 2 database (release 21), which contained 2,557,253 tagging SNPs. The statistical tests were conducted on imputed genotypes. Smoking cigarettes per day (CPD) was used as a quantitative trait in the analysis. We used the smoking data of CPD as the intermediate phenotype in our analysis. The box plot and histogram in Figure S1 show the distribution of the CPD data, and the Q-Q plot in Figure S1 shows the normality of the CPD data. We used a square root transformation to normalize the CPD data. SNPs with a −log10(p) greater than 5 were regarded as promising significant SNPs with adjustment for multiple comparisons in the association analysis. The normally accepted −log10(p)>8 was not used because of our limited sample size. Top, descriptive plots of CPD before aquare root transformation Bottom, descriptive plots of CPD after square root transformation. (DOC) Click here for additional data file. Type I Error Rates for Medium-Risk Variants in simulation. Add: additive; Dom: dominant; Rec: recessive. aDisease locus and quantitative trait are independently associated with the disease. bQuantitative trait is intermediate between the disease locus and disease status. cTest 1, logistic regression; test 2, linear regression; test 3, Fisher's combined probability test; test 4, modified inverse-variance weighted method. (DOC) Click here for additional data file. Type I Error Rates for Low-Risk Variants in simulation. Add: additive; Dom: dominant; Rec: recessive. aDisease locus and quantitative trait are independently associated with the disease. bQuantitative trait is intermediate between the disease locus and disease status. cTest 1, logistic regression; test 2, linear regression; test 3, Fisher's combined probability test; test 4, modified inverse-variance weighted method. (DOC) Click here for additional data file. Significant SNPs on Chromosomes 3, 15, and 19 in the Association Analysis. CHR: chromosome; NA: not available. Test 1, logistic regression; test 2, linear regression analysis of CPD with adjustment for disease status; test 3, Fisher's combined probability test; test 4, modified inversx10-variance weighted method. (DOC) Click here for additional data file. Inverse variance weighted combination of Z1 and Z2 has global minimum variance value. (DOC) Click here for additional data file.

34 in total

1. Genetic variants on 15q25.1, smoking, and lung cancer: an assessment of mediation and interaction.

Authors: Tyler J VanderWeele; Kofi Asomaning; Eric J Tchetgen Tchetgen; Younghun Han; Margaret R Spitz; Sanjay Shete; Xifeng Wu; Valerie Gaborieau; Ying Wang; John McLaughlin; Rayjean J Hung; Paul Brennan; Christopher I Amos; David C Christiani; Xihong Lin
Journal: Am J Epidemiol Date: 2012-02-03 Impact factor: 4.897

2. Cotinine conundrum--a step forward but questions remain.

Authors: Margaret R Spitz; Christopher I Amos; Laura J Bierut; Neil E Caporaso
Journal: J Natl Cancer Inst Date: 2012-04-25 Impact factor: 13.506

3. Intermediate phenotype analysis of patients, unaffected siblings, and healthy controls identifies VMAT2 as a candidate gene for psychotic disorder and neurocognition.

Authors: Claudia J P Simons; Ruud van Winkel
Journal: Schizophr Bull Date: 2012-04-24 Impact factor: 9.306

Review 4. Mammographic breast density as an intermediate phenotype for breast cancer.

Authors: Norman F Boyd; Johanna M Rommens; Kelly Vogt; Vivian Lee; John L Hopper; Martin J Yaffe; Andrew D Paterson
Journal: Lancet Oncol Date: 2005-10 Impact factor: 41.316

5. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.

Authors: Andre Franke; Dermot P B McGovern; Jeffrey C Barrett; Kai Wang; Graham L Radford-Smith; Tariq Ahmad; Charlie W Lees; Tobias Balschun; James Lee; Rebecca Roberts; Carl A Anderson; Joshua C Bis; Suzanne Bumpstead; David Ellinghaus; Eleonora M Festen; Michel Georges; Todd Green; Talin Haritunians; Luke Jostins; Anna Latiano; Christopher G Mathew; Grant W Montgomery; Natalie J Prescott; Soumya Raychaudhuri; Jerome I Rotter; Philip Schumm; Yashoda Sharma; Lisa A Simms; Kent D Taylor; David Whiteman; Cisca Wijmenga; Robert N Baldassano; Murray Barclay; Theodore M Bayless; Stephan Brand; Carsten Büning; Albert Cohen; Jean-Frederick Colombel; Mario Cottone; Laura Stronati; Ted Denson; Martine De Vos; Renata D'Inca; Marla Dubinsky; Cathryn Edwards; Tim Florin; Denis Franchimont; Richard Gearry; Jürgen Glas; Andre Van Gossum; Stephen L Guthery; Jonas Halfvarson; Hein W Verspaget; Jean-Pierre Hugot; Amir Karban; Debby Laukens; Ian Lawrance; Marc Lemann; Arie Levine; Cecile Libioulle; Edouard Louis; Craig Mowat; William Newman; Julián Panés; Anne Phillips; Deborah D Proctor; Miguel Regueiro; Richard Russell; Paul Rutgeerts; Jeremy Sanderson; Miquel Sans; Frank Seibold; A Hillary Steinhart; Pieter C F Stokkers; Leif Torkvist; Gerd Kullak-Ublick; David Wilson; Thomas Walters; Stephan R Targan; Steven R Brant; John D Rioux; Mauro D'Amato; Rinse K Weersma; Subra Kugathasan; Anne M Griffiths; John C Mansfield; Severine Vermeire; Richard H Duerr; Mark S Silverberg; Jack Satsangi; Stefan Schreiber; Judy H Cho; Vito Annese; Hakon Hakonarson; Mark J Daly; Miles Parkes
Journal: Nat Genet Date: 2010-12 Impact factor: 38.330

6. Genetic polymorphisms in 15q25 and 19q13 loci, cotinine levels, and risk of lung cancer in EPIC.

Authors: Maria N Timofeeva; James D McKay; George Davey Smith; Mattias Johansson; Graham B Byrnes; Amélie Chabrier; Caroline Relton; Per Magne Ueland; Stein Emil Vollset; Øivind Midttun; Ottar Nygård; Nadia Slimani; Isabelle Romieu; Françoise Clavel-Chapelon; Marie-Christine Boutron-Ruault; Guy Fagherazzi; Rudolf Kaaks; Birgit Teucher; Heiner Boeing; Cornelia Weikert; H Bas Bueno-de-Mesquita; Carla van Gils; Petra H M Peeters; Antonio Agudo; Aurelio Barricarte; Jose-Maria Huerta; Laudina Rodríguez; Maria-José Sánchez; Nerea Larrañaga; Kay-Tee Khaw; Nick Wareham; Naomi E Allen; Ruth C Travis; Valentina Gallo; Teresa Norat; Vittorio Krogh; Giovanna Masala; Salvatore Panico; Carlotta Sacerdote; Rosario Tumino; Antonia Trichopoulou; Pagona Lagiou; Dimitrios Trichopoulos; Torgny Rasmuson; Göran Hallmans; Elio Riboli; Paolo Vineis; Paul Brennan
Journal: Cancer Epidemiol Biomarkers Prev Date: 2011-08-23 Impact factor: 4.254

7. A systematic review and meta-analysis of neurological soft signs in relatives of people with schizophrenia.

Authors: Kishen Neelam; Deepak Garg; Max Marshall
Journal: BMC Psychiatry Date: 2011-08-22 Impact factor: 3.630

8. Common variants in ZNF365 are associated with both mammographic density and breast cancer risk.

Authors: Sara Lindström; Celine M Vachon; Jingmei Li; Jajini Varghese; Deborah Thompson; Ruth Warren; Judith Brown; Jean Leyland; Tina Audley; Nicholas J Wareham; Ruth J F Loos; Andrew D Paterson; Johanna Rommens; Darryl Waggott; Lisa J Martin; Christopher G Scott; V Shane Pankratz; Susan E Hankinson; Aditi Hazra; David J Hunter; John L Hopper; Melissa C Southey; Stephen J Chanock; Isabel dos Santos Silva; JianJun Liu; Louise Eriksson; Fergus J Couch; Jennifer Stone; Carmel Apicella; Kamila Czene; Peter Kraft; Per Hall; Douglas F Easton; Norman F Boyd; Rulla M Tamimi
Journal: Nat Genet Date: 2011-01-30 Impact factor: 38.330

9. Association analysis of 94 candidate genes and schizophrenia-related endophenotypes.

Authors: Tiffany A Greenwood; Gregory A Light; Neal R Swerdlow; Allen D Radant; David L Braff
Journal: PLoS One Date: 2012-01-13 Impact factor: 3.240

10. Association between genetic variants on chromosome 15q25 locus and objective measures of tobacco exposure.

Authors: Marcus R Munafò; Maria N Timofeeva; Richard W Morris; David Prieto-Merino; Naveed Sattar; Paul Brennan; Elaine C Johnstone; Caroline Relton; Paul C D Johnson; Donna Walther; Peter H Whincup; Juan P Casas; George R Uhl; Paolo Vineis; Sandosh Padmanabhan; Barbara J Jefferis; Antoinette Amuzu; Elio Riboli; Mark N Upton; Paul Aveyard; Shah Ebrahim; Aroon D Hingorani; Graham Watt; Tom M Palmer; Nicholas J Timpson; George Davey Smith
Journal: J Natl Cancer Inst Date: 2012-04-25 Impact factor: 13.506

6 in total

1. Genetic and nongenetic variation revealed for the principal components of human gene expression.

Authors: Anita Goldinger; Anjali K Henders; Allan F McRae; Nicholas G Martin; Greg Gibson; Grant W Montgomery; Peter M Visscher; Joseph E Powell
Journal: Genetics Date: 2013-09-11 Impact factor: 4.562

2. Optical coherence tomography combined with videokeratography to differentiate mild keratoconus subtypes.

Authors: Yaron S Rabinowitz; Xiaohui Li; Ana Laura Caiado Canedo; Renato Ambrósio; Yelena Bykhovskaya
Journal: J Refract Surg Date: 2014-02 Impact factor: 3.573

Review 3. Missing heritability of complex diseases: Enlightenment by genetic variants from intermediate phenotypes.

Authors: Adrián Blanco-Gómez; Sonia Castillo-Lluva; María Del Mar Sáez-Freire; Lourdes Hontecillas-Prieto; Jian Hua Mao; Andrés Castellanos-Martín; Jesus Pérez-Losada
Journal: Bioessays Date: 2016-05-31 Impact factor: 4.345

Review 4. Heterogeneity of chronic obstructive pulmonary disease: from phenotype to genotype.

Authors: Xu Chen; Xiaomao Xu; Fei Xiao
Journal: Front Med Date: 2013-12 Impact factor: 4.592

Review 5. A Decade of GWAS Results in Lung Cancer.

Authors: Yohan Bossé; Christopher I Amos
Journal: Cancer Epidemiol Biomarkers Prev Date: 2017-06-14 Impact factor: 4.254

6. Bayesian and frequentist analysis of an Austrian genome-wide association study of colorectal cancer and advanced adenomas.

Authors: Philipp Hofer; Michael Hagmann; Stefanie Brezina; Erich Dolejsi; Karl Mach; Gernot Leeb; Andreas Baierl; Stephan Buch; Hedwig Sutterlüty-Fall; Judith Karner-Hanusch; Michael M Bergmann; Thomas Bachleitner-Hofmann; Anton Stift; Armin Gerger; Katharina Rötzer; Josef Karner; Stefan Stättner; Melanie Waldenberger; Thomas Meitinger; Konstantin Strauch; Jakob Linseisen; Christian Gieger; Florian Frommlet; Andrea Gsur
Journal: Oncotarget Date: 2017-10-09

6 in total