Thomas J Hoffmann, Lori C Sakoda, Ling Shen, Eric Jorgenson, Laurel A Habel, Jinghua Liu, Mark N Kvale, Maryam M Asgari, Yambazi Banda, Douglas Corley, Lawrence H Kushi, Charles P Quesenberry, Catherine Schaefer, Stephen K Van Den Eeden, Neil Risch, John S Witte.
Abstract
An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10⁻⁴) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects.
[corrected].
Year: 2015 PMID: 25629170 PMCID: PMC4309593 DOI: 10.1371/journal.pgen.1004930
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Confirmation of HOXB13 G84E mutation status from classification and regression tree.
The top of the figure shows three CART trees produced for the computationally phased haplotypes of the enriched reference panel of 93 individuals (22 carriers) plus 1000 Genomes data (2 carriers). Listed in the trees are the splits that classify the G84E mutation. The leaves in the tree contain the best guess classification of G84E on the top, and the number of reference alleles on the left and the number of G84E mutations on the right. The first tree, in black, is formed from selecting amongst all 57 SNPs +/− 3 crossovers. The second tree, in green, is formed from selecting from the same set of SNPs except excluding the 3 found in the first tree. The third tree, in blue, is formed from selecting amongst the same set of SNPs except excluding the 7 found in the first and second trees. Below the trees is a local chromosome plot of the region in reference to the surrounding genes and recombination rate of the region, with the color of the rs# for each SNP indicating the tree from which it was derived. KGW, 1000 Genomes white race/ethnicity individuals; frq, frequency.
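The successive-tree idea in this caption (fit a tree, set aside the SNPs it used, refit on the remainder) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the tiny Gini-based tree, the SNP names `snpA`, `snpB`, `snpC`, and the six toy haplotypes are all hypothetical, and a real CART fit would also involve pruning and many more SNPs.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(rows, labels, candidates):
    """Return (snp, weighted_gini) for the candidate SNP that best separates labels."""
    best = None
    for snp in candidates:
        left = [y for r, y in zip(rows, labels) if r[snp] == 0]
        right = [y for r, y in zip(rows, labels) if r[snp] == 1]
        if not left or not right:
            continue  # this SNP does not split the haplotypes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (snp, score)
    return best

def grow_tree(rows, labels, candidates, max_depth=3):
    """Tiny CART-style tree; leaves hold (n_reference_alleles, n_G84E_mutations)."""
    counts = (labels.count(0), labels.count(1))
    split = None
    if max_depth > 0 and 0 not in counts:
        split = best_split(rows, labels, candidates)
    if split is None or split[1] >= gini(labels):
        return {"leaf": counts}
    snp = split[0]
    rest = [c for c in candidates if c != snp]
    branches = {}
    for allele in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[snp] == allele]
        branches[allele] = grow_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest, max_depth - 1)
    return {"snp": snp, "branches": branches}

def snps_used(tree):
    """Set of SNPs appearing in any split of the tree."""
    if "leaf" in tree:
        return set()
    used = {tree["snp"]}
    for sub in tree["branches"].values():
        used |= snps_used(sub)
    return used

def successive_trees(rows, labels, snps, n_trees=3):
    """Fit n_trees trees, each barred from SNPs used by the earlier ones."""
    trees, remaining = [], list(snps)
    for _ in range(n_trees):
        tree = grow_tree(rows, labels, remaining)
        trees.append(tree)
        remaining = [s for s in remaining if s not in snps_used(tree)]
    return trees

# Hypothetical phased haplotypes (1 = alternate allele) and G84E status (1 = mutation).
haplotypes = [
    {"snpA": 1, "snpB": 1, "snpC": 1},
    {"snpA": 1, "snpB": 1, "snpC": 1},
    {"snpA": 0, "snpB": 0, "snpC": 1},
    {"snpA": 0, "snpB": 0, "snpC": 0},
    {"snpA": 0, "snpB": 0, "snpC": 0},
    {"snpA": 0, "snpB": 0, "snpC": 0},
]
g84e = [1, 1, 0, 0, 0, 0]
trees = successive_trees(haplotypes, g84e, ["snpA", "snpB", "snpC"])
```

In this toy set the first two trees each find a perfectly tagging SNP, while the third must fall back on `snpC`, whose carrier leaf is mixed (one reference allele, two mutations). Agreement across independent trees is what lends confidence to the imputed status.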
Figure 2. Genotyping cluster plot of the G84E variant.
A subset of the RPGEH GERA cohort, together with the CMHS cohort, was additionally genotyped at the G84E variant. All carriers are imputed correctly, but some individuals are falsely identified as carriers (r² = 0.57, 95% CI = 0.37–0.77), because the ancestral haplotype is not specific to mutation carriers. Counts of (exome array genotype call, GWAS imputation call) categories are given in brackets [.] for the RPGEH GERA and Men’s Health cohorts combined, and in parentheses (.) for RPGEH GERA alone. The most likely (best-guess) genotypes are given for the imputed data. Discordant calls are marked with larger points.
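To make the concordance measure concrete, the sketch below computes a squared Pearson correlation between directly genotyped and best-guess imputed carrier calls. It assumes that r² here denotes the squared Pearson correlation; the ten toy calls are hypothetical and are not the study's counts.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)

# Hypothetical calls: 0 = non-carrier, 1 = carrier.
# Both true carriers are imputed as carriers; one non-carrier is a false positive.
genotyped = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
imputed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
r2 = squared_correlation(genotyped, imputed)
print(round(r2, 3))
```

Even with every true carrier recovered, a single false positive among ten individuals pulls r² well below 1, which is the pattern the caption describes.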
Figure 3. Ancestry of the HOXB13 G84E variant.
Using the first two principal components (PCs), we created a smoothed estimate of the G84E carrier frequency: for each individual, the carrier frequency at that location was calculated from the expected additive coding of the 2,000 closest individuals (Euclidean distance), excluding individuals with >25% Ashkenazi ancestry. Labels at the center of each Human Genome Diversity Project (HGDP) population are shown to aid interpretation; the mutation is most prevalent in northwestern European and Russian groups. To further adjust for incomplete LD, we multiplied the imputation carrier frequency by the r² estimate of 0.57.
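The smoothing described in this caption amounts to a k-nearest-neighbor average in PC space followed by the r² scaling. The sketch below is one possible reading under stated assumptions: each individual counts among its own neighbors, the Ashkenazi-ancestry exclusion is taken to have been applied beforehand, and the six toy individuals with k = 3 are hypothetical (the caption uses k = 2,000).

```python
from math import dist  # Euclidean distance, Python 3.8+

def smoothed_carrier_frequency(pcs, carrier, k, r2=0.57):
    """For each individual, average carrier status over the k nearest
    individuals in PC space, then scale by r2 to adjust for incomplete LD."""
    if not 1 <= k <= len(pcs):
        raise ValueError("k must be between 1 and the number of individuals")
    smoothed = []
    for point in pcs:
        # Brute-force neighbor search; a spatial index would be needed at cohort scale.
        order = sorted(range(len(pcs)), key=lambda j: dist(point, pcs[j]))
        nearest = order[:k]
        frequency = sum(carrier[j] for j in nearest) / k
        smoothed.append(r2 * frequency)
    return smoothed

# Hypothetical (PC1, PC2) coordinates forming two well-separated clusters.
pcs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
carrier = [1, 1, 0, 0, 0, 0]  # imputed carrier status (could also be dosages)
smoothed = smoothed_carrier_frequency(pcs, carrier, k=3)
```

In the toy data, individuals in the first cluster get a smoothed frequency of 0.57 × 2/3 and those in the second cluster get 0.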
Figure 4. Age-specific risk of prostate cancer by HOXB13 G84E mutation carrier status.
One minus the usual Kaplan-Meier survival curve, with the probability of prostate cancer on the y-axis. The risk for G84E carriers is significantly higher than that for non-carriers. (a) Unadjusted. (b) Adjusted for incomplete LD.
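The curve in this figure is the complement of the Kaplan-Meier product-limit estimate. The following is a minimal sketch of that estimator on hypothetical ages; it ignores delayed entry into the cohort and does not model the LD adjustment of panel (b).

```python
def cumulative_incidence(ages, events):
    """Return [(age, 1 - S(age))] at each distinct event age, where S is the
    Kaplan-Meier survival estimate.

    ages[i]   : age at diagnosis, or age at censoring if never diagnosed
    events[i] : True if individual i was diagnosed at ages[i]
    """
    data = sorted(zip(ages, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        diagnosed = 0
        leaving = 0
        # Gather everyone whose follow-up ends at this age.
        while i < len(data) and data[i][0] == age:
            diagnosed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if diagnosed:
            survival *= 1 - diagnosed / at_risk
            curve.append((age, 1 - survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data for six men.
ages = [60, 65, 65, 70, 72, 80]
events = [True, False, True, True, False, True]
curve = cumulative_incidence(ages, events)
```

Running the same function separately on carriers and non-carriers would give the two curves compared in the figure.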
Pleiotropic effect of G84E mutation on risk of cancer.
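| Cancer | Carriers (best guess) | Cases | Carrier frequency (LD-adjusted) | OR (95% CI) | Corrected OR (95% CI) | P-value (one-sided) |
|---|---|---|---|---|---|---|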
| Kidney | 5 | 303 | 0.94% | 2.32 (0.94, 5.76) | 3.32 (0.89, 9.99) | 0.034 |
| Bladder | 5 | 335 | 0.85% | 2.05 (0.81, 5.22) | 2.84 (0.70, 8.94) | 0.065 |
| Non-Hodgkin’s Lymphoma | 10 | 846 | 0.67% | 1.81 (0.98, 3.36) | 2.42 (0.96, 5.52) | 0.029 |
| Melanoma | 15 | 1301 | 0.66% | 1.50 (0.89, 2.55) | 1.88 (0.81, 3.95) | 0.066 |
| Pancreas | 2 | 149 | 0.77% | 1.49 (0.28, 7.83) | 1.86 (0.18, 13.70) | 0.32 |
| Breast | 38 | 3183 | 0.68% | 1.48 (1.04, 2.10) | 1.84 (1.07, 3.16) | 0.014 |
| Endometrium | 6 | 578 | 0.59% | 1.44 (0.64, 3.24) | 1.77 (0.49, 5.22) | 0.19 |
| Colon | 11 | 1119 | 0.56% | 1.42 (0.77, 2.60) | 1.74 (0.65, 4.06) | 0.13 |
| Thyroid | 3 | 217 | 0.79% | 1.35 (0.35, 5.22) | 1.61 (0.23, 8.81) | 0.33 |
| Ovary | 2 | 198 | 0.58% | 1.32 (0.32, 5.49) | 1.56 (0.20, 9.26) | 0.35 |
| Multiple Myeloma | 1 | 135 | 0.42% | 1.26 (0.20, 7.92) | 1.46 (0.12, 13.75) | 0.40 |
| Lung | 5 | 667 | 0.43% | 1.10 (0.44, 2.72) | 1.18 (0.30, 4.22) | 0.42 |
| Oral | 2 | 283 | 0.40% | 0.98 (0.23, 4.10) | 0.97 (0.14, 6.78) | NA |
| Lymphocytic Leukemia | 1 | 218 | 0.26% | 0.53 (0.05, 5.24) | 0.39 (0.03, 9.06) | NA |
| Any | 123 | 10493 | 0.67% | 1.36 (1.13, 1.63) | 1.63 (1.22, 2.29) | 5.8×10⁻⁴ |
| Any single cancer | 106 | 9532 | 0.63% | 1.41 (1.14, 1.75) | 1.72 (1.23, 2.51) | 8.9×10⁻⁴ |
| Two or more cancers | 17 | 961 | 1.01% | 2.21 (1.35, 3.61) | 3.12 (1.60, 6.14) | 0.0011 |
Estimates of the association between the G84E mutation and risk of fourteen cancer(s) in the RPGEH GERA cohort. An overall association was detected between the G84E mutation and the fourteen cancers grouped together (odds ratio = 1.36).
a. Carrier frequency among individuals without cancer, computed as (number of carriers without cancer / total number of individuals without cancer) × 0.57: (430 / 54,482) × 0.57 = 0.45% for both genders and (263 / 31,965) × 0.57 = 0.47% for females.
b. The expected number of carriers according to the best-guess genotype (association tested using additive dosages).
c. Adjusted for incomplete LD by multiplying the imputation carrier frequency by the r² estimate of 0.57.
d. Adjusted for age and genetic ancestry. One-sided p-value (see Methods) and OR from the non-corrected analysis (with two-sided 95% CI), not adjusted for multiple comparisons. Some ORs are larger than 1.0 even though the carrier frequency in cases is not higher than in controls, because of the adjustment for covariates.
e. Because the imputed variant is not perfectly correlated with the genotyped variant, the OR is underestimated, so we correct the estimate for incomplete LD (see Methods).
f. Any = any of the fourteen cancers, using a meta-analysis approach that takes into account shared controls and cases.
g. Analysis using an indicator variable for presence of any single cancer of the fourteen cancers.
h. Analysis using an indicator variable for presence of any two or more of the fourteen cancers.
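The LD-adjusted carrier frequencies in the table and its footnotes follow from one line of arithmetic: the best-guess carrier count divided by the group size, multiplied by the r² estimate of 0.57. The short sketch below reproduces three of the reported values; the function name is illustrative only.

```python
R2 = 0.57  # imputation r2 estimate reported in the text

def ld_adjusted_frequency(carriers, total, r2=R2):
    """Imputed carrier frequency scaled by r2 to adjust for incomplete LD."""
    return carriers / total * r2

controls_all = ld_adjusted_frequency(430, 54482)     # individuals without cancer, both genders
controls_female = ld_adjusted_frequency(263, 31965)  # females without cancer
kidney_cases = ld_adjusted_frequency(5, 303)         # kidney cancer row of the table

print(f"{controls_all:.2%}")     # 0.45%
print(f"{controls_female:.2%}")  # 0.47%
print(f"{kidney_cases:.2%}")     # 0.94%
```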