Nilanjan Chatterjee, Montserrat Garcia-Closas, Yan Dora Zhang, Amber N Hurson, Haoyu Zhang, Parichoy Pal Choudhury, Douglas F Easton, Roger L Milne, Jacques Simard, Per Hall, Kyriaki Michailidou, Joe Dennis, Marjanka K Schmidt, Jenny Chang-Claude, Puya Gharahkhani, David Whiteman, Peter T Campbell, Michael Hoffmeister, Mark Jenkins, Ulrike Peters, Li Hsu, Stephen B Gruber, Graham Casey, Stephanie L Schmit, Tracy A O'Mara, Amanda B Spurdle, Deborah J Thompson, Ian Tomlinson, Immaculata De Vivo, Maria Teresa Landi, Matthew H Law, Mark M Iles, Florence Demenais, Rajiv Kumar, Stuart MacGregor, D Timothy Bishop, Sarah V Ward, Melissa L Bondy, Richard Houlston, John K Wiencke, Beatrice Melin, Jill Barnholtz-Sloan, Ben Kinnersley, Margaret R Wrensch, Christopher I Amos, Rayjean J Hung, Paul Brennan, James McKay, Neil E Caporaso, Sonja I Berndt, Brenda M Birmann, Nicola J Camp, Peter Kraft, Nathaniel Rothman, Susan L Slager, Andrew Berchuck, Paul D P Pharoah, Thomas A Sellers, Simon A Gayther, Celeste L Pearce, Ellen L Goode, Joellen M Schildkraut, Kirsten B Moysich, Laufey T Amundadottir, Eric J Jacobs, Alison P Klein, Gloria M Petersen, Harvey A Risch, Rachel Z Stolzenberg-Solomon, Brian M Wolpin, Donghui Li, Rosalind A Eeles, Christopher A Haiman, Zsofia Kote-Jarai, Fredrick R Schumacher, Ali Amin Al Olama, Mark P Purdue, Ghislaine Scelo, Marlene D Dalgaard, Mark H Greene, Tom Grotmol, Peter A Kanetsky, Katherine A McGlynn, Katherine L Nathanson, Clare Turnbull, Fredrik Wiklund, Stephen J Chanock.
Abstract
Genome-wide association studies (GWAS) have led to the identification of hundreds of susceptibility loci across cancers, but the impact of further studies remains uncertain. Here we analyse summary-level data from GWAS of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) and underlying effect-size distribution. All cancers show a high degree of polygenicity, involving, at a minimum, thousands of loci. We project that sample sizes required to explain 80% of GWAS heritability vary from 60,000 cases for testicular to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores (PRS), compared to average risk, ranges from 12 for testicular to 2.5 for ovarian cancer. We show that PRS have potential for risk stratification for cancers of breast, colon and prostate, but less so for others because of modest heritability and lower incidence.
Year: 2020 PMID: 32620889 PMCID: PMC7335068 DOI: 10.1038/s41467-020-16483-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Estimated number of independent common susceptibility variants and heritability across 14 cancer sites.
| Number of cases in the analysis | Cancer site (a) | Total number of susceptibility SNPs (SE) | Total heritability, in log-OR scale (b) (SE) | Average heritability explained per susceptibility SNP (c) (SE), in $10^{-4}$ | Number of SNPs associated with larger variance component (SE) | % of heritability explained by SNPs with larger variance component | AUC associated with the best PRS (d) (SE) |
|---|---|---|---|---|---|---|---|
| <10,000 | CLL | 2025 (1501) | 1.62 (0.37) | 7.2 (4.4) | 52 (15) | 41 | 0.82 (0.03) |
| <10,000 | Esophageal | 3641 (2515) | 1.24 (0.36) | 3.4 (1.9) | NA (e) | NA | 0.78 (0.03) |
| <10,000 | Testicular | 2598 (2088) | 2.81 (0.40) | 9.2 (6.6) | 196 (75) | 54 | 0.88 (0.02) |
| <10,000 | Oropharyngeal | 3623 (2060) | 0.68 (0.27) | 1.9 (0.5) | NA | NA | 0.72 (0.04) |
| <10,000 | Pancreas | 1757 (1490) | 0.60 (0.16) | 3.2 (2.2) | 47 (27) | 31 | 0.71 (0.03) |
| 10,000–25,000 | Renal | 2220 (1555) | 0.57 (0.12) | 2.4 (1.4) | 46 (36) | 24 | 0.70 (0.02) |
| 10,000–25,000 | Glioma | 2364 (1593) | 0.87 (0.11) | 2.2 (1.2) | 61 (25) | 55 | 0.75 (0.01) |
| 10,000–25,000 | Melanoma | 1098 (533) | 0.65 (0.09) | 4.4 (1.6) | 106 (58) | 52 | 0.72 (0.01) |
| 10,000–25,000 | Colorectal | 1484 (696) | 0.43 (0.10) | 2.9 (0.8) | 14 (11) | 7 | 0.68 (0.02) |
| 10,000–25,000 | Endometrial | 1052 (772) | 0.27 (0.07) | 2.5 (1.3) | 46 (34) | 26 | 0.64 (0.02) |
| 10,000–25,000 | Ovarian | 1015 (715) | 0.24 (0.06) | 2.2 (1.1) | 49 (31) | 36 | 0.64 (0.02) |
| >25,000 | Lung | 6096 (2750) | 0.39 (0.06) | 0.6 (0.2) | 15 (7) | 15 | 0.67 (0.01) |
| >25,000 | Prostate | 4530 (1052) | 0.77 (0.04) | 1.1 (0.2) | 276 (99) | 51 | 0.73 (0.01) |
| >25,000 | Breast | 7599 (1615) | 0.60 (0.03) | 0.6 (0.1) | 587 (133) | 56 | 0.71 (0.00) |
SNP single-nucleotide polymorphism, SE standard errors, CLL chronic lymphocytic leukemia.
(a) All results are reported using the best-fitting (two- or three-component) normal mixture model for effect-size distributions, with respect to a reference panel of 1.07 million common SNPs included in the HapMap3 panel after removal of the MHC region.
(b) Total heritability is characterized by the population variance of the underlying true PRS as $\sum_{m} \beta_m^2$, where $\beta_m$ denotes the per-SNP effect size of the non-null SNPs on the log-odds-ratio scale.
(c) Average heritability explained per susceptibility SNP excludes SNPs with extremely large effects (see “Methods”).
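(d) Area under the curve (AUC) associated with the best PRS is calculated using the formula $\mathrm{AUC} = \Phi\left(\sqrt{h^2/2}\right)$, where $h^2$ is the total heritability on the log-odds-ratio scale and $\Phi$ is the cumulative distribution function of the standard normal distribution.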
(e) NA indicates that a two-component model is favored over a three-component model.
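The relationship between the heritability and AUC columns can be checked numerically. The sketch below assumes the normal-theory formula AUC = Φ(√(h²/2)), where h² is the total heritability on the log-odds-ratio scale; this form is reconstructed from the table values rather than quoted from the source, and the function name `best_prs_auc` is a hypothetical helper introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def best_prs_auc(h2_log_or: float) -> float:
    """AUC of the best PRS, assuming AUC = Phi(sqrt(h2 / 2)).

    h2_log_or is the total heritability on the log-odds-ratio scale
    (population variance of the true PRS). The formula is an assumption
    reconstructed from the table, not taken verbatim from the source.
    """
    if h2_log_or < 0:
        raise ValueError("heritability must be non-negative")
    return NormalDist().cdf(sqrt(h2_log_or / 2.0))


# (heritability, reported AUC) pairs copied from a few rows of the table.
TABLE_ROWS = {
    "CLL": (1.62, 0.82),
    "Testicular": (2.81, 0.88),
    "Lung": (0.39, 0.67),
    "Breast": (0.60, 0.71),
}

if __name__ == "__main__":
    for site, (h2, reported) in TABLE_ROWS.items():
        print(f"{site:<11} h2={h2:.2f}  computed={best_prs_auc(h2):.3f}  reported={reported:.2f}")
```

For the rows included here, the computed values agree with the reported AUC column to two decimal places, which supports reading the heritability column as the variance of the true PRS on the log-odds-ratio scale.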
Fig. 1 Estimated effect-size distributions for susceptibility SNPs across 14 cancer sites.
Effect-size distribution of susceptibility SNPs is modeled using a two-component normal mixture model for all sites, except esophageal and oropharyngeal cancers. For these sites, effect sizes are modeled using a single normal distribution that provided a fit similar to that of the two-component normal mixture model (see Supplementary Figs. 1 and 2). SNPs with extremely large effects are excluded from effect-size distribution estimation (see “Methods”). Plots are stratified by sample size of the GWAS for comparability. Distributions with fatter tails imply that the underlying traits have a relatively greater number of susceptibility SNPs with larger effects. Note that the effect-size distribution is plotted on the log scale of the odds ratio (x-axis). CLL chronic lymphocytic leukemia.
Fig. 2 Projections of percentage of GWAS heritability explained by SNPs as sample size for GWAS increases.
Results are shown for projections including SNPs at the optimized p value threshold (solid curve) and at the genome-wide significance level ($p < 5 \times 10^{-8}$; dashed curve). Colored dots correspond to the sample size for the largest published GWAS and those for doubled and quadrupled sizes. For oropharyngeal cancer, the projections at the “current sample size” are based on a sample size of 25K cases and 25K controls. For breast and esophageal cancer, the projections at the “current sample size” are based on the current largest GWAS sample sizes: 123K cases and 106K controls and 10K cases and 17K controls, respectively. For all other cancer sites, the projections at the “current sample size” are based on the GWAS sample sizes in Supplementary Table 1. CLL chronic lymphocytic leukemia.
Fig. 3 Projections of area under the curve (AUC) characterizing predictive performance of PRS as sample size for GWAS increases.
Results are shown for PRS including SNPs at the optimized p value threshold. The dotted horizontal red line indicates the maximum AUC achievable according to the estimate of GWAS heritability. Colored dots correspond to the sample size for the largest published GWAS and those for doubled and quadrupled sizes. For oropharyngeal cancer, the projections at the “current sample size” are based on a sample size of 25K cases and 25K controls. For breast and esophageal cancer, the projections at the “current sample size” are based on the current largest GWAS sample sizes: 123K cases and 106K controls and 10K cases and 17K controls, respectively. For all other cancer sites, the projections at the “current sample size” are based on the GWAS sample sizes in Supplementary Table 1. CLL chronic lymphocytic leukemia.
Fig. 4 Projections of relative risks for individuals at or higher than 99th percentile of PRS as sample size for GWAS increases.
Results are shown where PRS is built based on SNPs at the optimized p value threshold. The dotted horizontal red line indicates the maximum relative risk achievable according to the estimate of GWAS heritability. Colored dots correspond to the sample size for the largest published GWAS and those for doubled and quadrupled sizes. The y-axis is presented on a log10 scale. For oropharyngeal cancer, the projections at the “current sample size” are based on a sample size of 25K cases and 25K controls. For breast and esophageal cancer, the projections at the “current sample size” are based on the current largest GWAS sample sizes: 123K cases and 106K controls and 10K cases and 17K controls, respectively. For all other cancer sites, the projections at the “current sample size” are based on the GWAS sample sizes in Supplementary Table 1. CLL chronic lymphocytic leukemia.
Fig. 5 Projected distribution of average residual lifetime risk in the US population of non-Hispanic whites aged 30–75 years.
The risk is obtained according to variation of polygenic risk scores. The projections are shown for PRS built based on GWAS with current, doubled and quadrupled sample sizes and for the best PRS, which corresponds to the limits defined by heritability. The projections are obtained by combining information on the projected population variance of PRS, age-specific population incidence rates, the competing risk of mortality and the current age distribution according to the 2016 US census. For oropharyngeal cancer, the projections at the “current sample size” are based on a sample size of 25K cases and 25K controls. For breast and esophageal cancer, the projections at the “current sample size” are based on the current largest GWAS sample sizes: 123K cases and 106K controls and 10K cases and 17K controls, respectively. For all other cancer sites, the projections at the “current sample size” are based on the GWAS sample sizes in Supplementary Table 1. CLL chronic lymphocytic leukemia.