G Tiao (1), M R Improgo (1,2,3), S Kasar (1,2,3), W Poh (1,2,3), A Kamburov (1), D-A Landau (1,2,3), E Tausch (4), A Taylor-Weiner (1), C Cibulskis (1), S Bahl (1), S M Fernandes (2), K Hoang (2), E Rheinbay (1), H T Kim (5), J Bahlo (6), S Robrecht (6), K Fischer (6), M Hallek (6), S Gabriel (1), E S Lander (1), S Stilgenbauer (4), C J Wu (1,2,3), A Kiezun (1), G Getz (1,7,8), J R Brown (1,2,3).
1. Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
2. Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
3. Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
4. Department of Internal Medicine III, Ulm University, Ulm, Germany.
5. Department of Computational Biology and Biostatistics, Dana-Farber Cancer Institute, Boston, MA, USA.
6. Department I of Internal Medicine and Center of Integrated Oncology Cologne Bonn, University Hospital, Cologne, Germany.
7. Department of Pathology, Massachusetts General Hospital, Boston, MA, USA.
8. Department of Pathology, Harvard Medical School, Boston, MA, USA.
CLL is a highly heritable cancer, with a 7.5-fold increased risk in first-degree relatives[1]. However, inherited predisposition to CLL remains largely unexplained by traditional linkage or genome-wide association studies. Here, we hypothesized that CLL heritability might arise from rare coding variants not analyzed in previous studies.

We compared rare germline variants (minor allele frequency < 0.01) in coding regions of 516 samples from CLL patients of European descent to those found in 8,920 ethnically matched, normal population controls. This represents the largest and most comprehensive search for risk alleles in CLL exomes to date. To maximize our power to detect significant associations, we combined data from multiple sequencing studies (see Supplementary Methods and Tables S1-S2 for cohort descriptions).

An important consideration when aggregating samples across multiple sequencing studies is controlling for biological and technical heterogeneity. Differences in patient ethnicities, sequencing technologies, depth of coverage, and variant calling methods may give rise to spurious results. Here, we controlled for these factors by: (i) simultaneously processing original sequencing data from all cohorts; (ii) jointly calling variants across all cases and controls; and (iii) analyzing only ethnically matched, unrelated samples over DNA sites with sequencing coverage sufficient to achieve high-confidence genotype calls across the entire sample cohort. We then performed an unbiased, exome-wide rare variant burden test on cases and controls (Figures S1-S2).

We identified two genes significantly associated with CLL (q ≤ 0.05): CDK1 and ATM. CDK1, a gene that encodes a cyclin-dependent kinase critical for cell division, was significantly enriched for rare, non-synonymous germline variants in CLL cases versus controls (p = 5.75 × 10⁻⁷, Table 1). One recurrent missense variant, CDK1 p.R59C (rs8755), was observed in 5 cases and 10 controls (Table S3). This missense variant lies in the CDK1 kinase domain (Figure S3) and is predicted to be Possibly Damaging by the PolyPhen2 prediction tool[2], suggesting that the variant may affect protein function.
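The q-values quoted here come from a false discovery rate correction across all genes tested. The text does not name the procedure used, so the following is only a minimal sketch that assumes the Benjamini-Hochberg step-up method. The function name `bh_qvalues` and the toy p-values are hypothetical and are not data from this study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over this rank and all larger ranks (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Toy p-values for four hypothetical genes (illustrative only).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = bh_qvalues(toy_p)
# Genes passing a q <= 0.05 threshold, as used for significance in the text.
significant = [q <= 0.05 for q in toy_q]
```

In an exome-wide analysis the list would hold one p-value per gene tested, so the smallest p-value is multiplied by the total number of genes, which is why a gene-level p-value on the order of 10⁻⁷ can still correspond to a q-value near 0.01.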
Table 1
Significant hits in gene-based rare variant burden test for CLL

| Cohort | Gene | p-value | FDR (q-value) | # of Cases with Rare Variants (%) | # of Controls with Rare Variants (%) | OR (95% CI) | Fisher OR (95% CI) |
|---|---|---|---|---|---|---|---|
| Discovery Cohort | CDK1 | 5.75 × 10⁻⁷ | 0.0091 | 8 (1.6%) | 24 (0.3%) | 5.8 (2.6-13.1) | 5.83 (2.25-13.5) |
| Discovery Cohort | ATM | 1.43 × 10⁻⁶ | 0.011 | 112 (21.7%) | 1296 (14.5%) | 1.6 (1.3-2.0) | 1.66 (1.34-2.06) |
| Extension Cohort* | CDK1 | 2.17 × 10⁻⁴ | 0.107 | 8 (1.2%) | 26 (0.3%) | 4.28 (1.93-9.51) | 4.29 (1.67-9.8) |
| Extension Cohort* | ATM | 1.04 × 10⁻⁸ | 1.7 × 10⁻⁴ | 170 (26.3%) | 1483 (16.6%) | 1.79 (1.49-2.15) | 1.79 (1.49-2.15) |

*Due to the inclusion of additional case samples, some variant allele frequencies were altered, leading to an increase in the number of variants that met allele frequency and quality control thresholds. This led to an increase in the number of controls with rare variants in the extension call set.
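As a rough check on the odds ratios in Table 1, the sample odds ratio and a Wald (log-scale) 95% confidence interval can be computed directly from the carrier counts. This is a sketch under stated assumptions: it uses the discovery cohort sizes of 516 cases and 8,920 controls from the text as denominators, and the helper name `odds_ratio_wald` is hypothetical. The study's own burden test and Fisher estimates were not necessarily produced this way.

```python
import math


def odds_ratio_wald(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Sample odds ratio with a Wald 95% CI computed on the log scale."""
    a = case_carriers                    # cases with a rare variant
    b = n_cases - case_carriers          # cases without a rare variant
    c = control_carriers                 # controls with a rare variant
    d = n_controls - control_carriers    # controls without a rare variant
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Discovery cohort counts from Table 1; denominators assumed from the text.
cdk1 = odds_ratio_wald(8, 516, 24, 8920)
atm = odds_ratio_wald(112, 516, 1296, 8920)
print("CDK1: OR=%.2f (%.2f-%.2f)" % cdk1)
print("ATM:  OR=%.2f (%.2f-%.2f)" % atm)
```

Under these assumptions the CDK1 estimate comes out near 5.8 (2.6-13.1) and the ATM estimate near 1.6 (1.3-2.0), in line with the OR column of the table. The Fisher column would instead require a conditional maximum likelihood estimate, which this sketch does not attempt.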
The second significant gene we identified was ATM (p = 1.43 × 10⁻⁶, Table 1), a well-known tumor suppressor gene on chromosome 11q. One of the most enriched recurrent variants is L2307F (2.3% of cases; OR = 10.1, 95% CI 4.9-20.7). Interestingly, L2307F has been previously reported in two CLL cases and a breast cancer case, the latter segregating in a family also affected with hematologic malignancy[3, 4]. The L2307F variant lies in the FAT domain of the ATM protein (Figure 1A) and is predicted to be Probably Damaging by PolyPhen2. Subsequent targeted sequencing using Sequenom technology in an independent set of 149 CLL cases revealed a similar frequency of 2.01% (3 out of 149) for the L2307F variant. In 27 cases with available RNA-seq data, expression of the rare germline ATM variant was confirmed, and in all but one case, the alternate allele fractions in RNA transcripts were consistent with those in germline genomic DNA (Table S4).
Figure 1
CLL patients harbor multiple ATM lesions: germline variants, somatic mutations, and loss of heterozygosity
A. Rare germline variants in ATM are enriched in CLL cases (discovery cohort). Variants found in CLL cases are displayed with the total number of cases (above the protein track) and controls (below), along with the corresponding percentages in their respective cohorts. B. Additional rare germline variants in ATM were observed in the extension cohort. C. Co-occurrence of ATM lesions in a total of 646 patient samples: rare germline variants (top row), loss of heterozygosity (middle row), and non-synonymous somatic mutations (bottom row). Samples are arranged along the x-axis (unlabeled). Presence of the genetic lesion is indicated in red; absence of the lesion is indicated in grey. Missing data are displayed in black. Samples with copy-neutral LOH are indicated by purple boxes; the remaining LOH events are 11q deletions. Percentages of samples with somatic ATM events are labeled in parentheses for the patient cohorts with and without germline ATM events, respectively. Data for samples with no known germline or somatic ATM events (359 patients) are not shown to scale due to space limitations.
The majority of the recurrent variants observed in ATM were non-synonymous missense variants (Table S6), in contrast to the predominantly loss-of-function alleles seen in ataxia-telangiectasia, a hereditary disorder associated with increased risk of leukemias and lymphomas. Analysis of the frequency-weighted distribution of PolyPhen2 scores across these missense variants revealed a significant shift toward more damaging scores in the cases versus controls (p = 0.0038, one-sided Kolmogorov-Smirnov test; Figure S4). These observations are consistent with recent reports of the potential role of germline missense variants in cancer heritability[5, 6]. In fact, 22 of the rare missense ATM variants we identified in CLL cases were also associated with breast cancer risk in a meta-analysis of breast cancer studies[7].

As an extension to our initial findings, we studied two additional cohorts of CLL cases (n=106 exomes, n=24 genomes). We combined these additional cases with our 516 original cases and compared against our original control cohort. This expanded joint analysis approach has been shown to consistently improve the statistical power for detecting genetic associations[8]. We found ATM to be the top hit in this combined analysis (p = 1.04 × 10⁻⁸, Table 1, Table S5). ATM variants are summarized in Figure 1B and Table S6, while patient characteristics by ATM germline status are summarized in Table S7. For CDK1, no additional rare variants were found in the extension cohort and hence its significance dropped below our significance threshold (q = 0.107, Table 1, Table S8). Patient characteristics by CDK1 germline status are summarized in Table S9.

The classical model of tumor suppressor inactivation involves the loss of both wild-type alleles of the tumor suppressor gene. In CLL, ATM is frequently lost through somatic deletion of the chromosome 11q region that spans the ATM locus[9] and through inactivating somatic mutations[10, 11].
We observed an enrichment of these somatic “second hits” in patient samples harboring rare ATM germline variants. Among the 112 patients in the discovery cohort with ATM germline variants, we found 23 with somatic ATM mutations, 29 with 11q deletions, and 2 with copy-neutral loss of heterozygosity (LOH) in the ATM locus (Figure 1C). The presence of a germline ATM variant was significantly associated with the occurrence of an ATM somatic mutation (OR=1.79, 95% CI 0.98-3.16; p=0.047, two-tailed Fisher exact test), as well as with LOH, either copy-neutral or via 11q deletion (OR=1.68, 95% CI 0.98-2.82; p=0.042, two-tailed Fisher exact test). Overall, the presence of a rare germline ATM variant was significantly associated with the presence of at least one of these somatic events (OR=2.17, 95% CI 1.35-3.48; p=9.1 × 10⁻⁴, two-tailed Fisher exact test). The association remained significant in the extension cohort (OR=1.74, 95% CI 1.16-2.60; p=5.5 × 10⁻³, two-tailed Fisher exact test). The observation that patients with rare coding germline variants in ATM were significantly more likely to harbor a second inactivating somatic lesion in ATM suggests that rare germline variants in ATM behave as tumor suppressor alleles.

To test further whether the rare germline ATM variants are likely to be functional, we examined which ATM allele is lost when 11q is deleted. If the variants had no effect on the development of CLL, we would expect an equal likelihood of their loss or retention in 11q deleted cases. Strikingly, we found that patients with rare germline ATM variants who acquired an 11q deletion more often lost the wild-type ATM allele. Specifically, 80% carried only the rare variant germline allele in their tumor samples (16 out of 20 with clonal or near-clonal 11qdel). This rate of wild-type allele loss is significantly greater than expected by chance (p=0.012, two-tailed binomial test).
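The binomial p-value for wild-type allele loss can be reproduced with a short exact calculation. This is a minimal sketch that assumes a null probability of 0.5 for losing either allele and a two-tailed test that sums the probabilities of all outcomes no more likely than the observed one. The function name `binomial_two_tailed` is hypothetical.

```python
from math import comb


def binomial_two_tailed(successes, trials, p_null=0.5):
    """Exact two-tailed binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count under the null hypothesis.
    """
    def pmf(k):
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


# 16 of 20 cases with clonal or near-clonal 11q deletion lost the wild-type allele.
p_value = binomial_two_tailed(16, 20)
print(round(p_value, 3))
```

With 16 wild-type losses out of 20, the two tails cover counts 0 to 4 and 16 to 20, giving a p-value of about 0.012, in agreement with the value reported in the text.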
Furthermore, in the four cases that lost the variant allele, none had Probably Damaging PolyPhen2 scores, suggesting that their effect on protein function was less severe. These results suggest that many rare germline ATM alleles may confer selective advantage to malignant B-cells.

Given the association of somatic 11q deletion[9] or ATM mutation[12] with worse clinical outcomes in CLL, we investigated whether the presence of a rare germline variant in ATM would be associated with worse clinical outcomes. In a Cox regression analysis adjusting for treatment arm, somatic del(11q), but not germline ATM variant, was a significant predictor of outcome (Tables S10-S11, Figure S5). When we added two other CLL prognostic factors, del(17p) and IGHV status, to the Cox regression analysis (Table S12), del(11q) was no longer significantly associated with progression-free survival (PFS). We also investigated the effect of a rare germline ATM variant and/or 11q deletion on overall survival, and saw no effect, including in a Cox regression analysis adjusting for treatment arm (Table S13, Figure S6).

Taken together, our results show that rare, protein-coding germline variants in ATM are frequent events in CLL, with ATM behaving as a classic tumor suppressor gene, showing preferential somatic loss of the wild-type allele. Although previous research has hinted at a role for specific alleles of ATM in CLL risk, these studies either involved relatively small sample sizes, resulting in low statistical power, or were targeted approaches that did not evaluate the entire ATM gene or did not evaluate ATM against other potential risk genes in an unbiased, exome-wide manner[13-15]. In contrast, we have applied consistent technical processing to a large cohort of jointly-called, ethnically matched CLL and normal population controls, with a focus on rare coding variants.
This, along with careful quality control, variant filtering, and accounting for population substructure in an unbiased, exome-wide association analysis, allowed us to identify ATM as a CLL risk gene.

Because CLL predominantly affects individuals of European descent, we chose to focus our study on patients of European ethnicity. We expect, however, that further studies examining patients of different ethnic backgrounds may uncover additional germline risk genes not detectable in the European study cohort. Indeed, the presence of residual population substructure within the cohort of European subjects in this study suggests that there may be germline predisposition genes affecting different European sub-populations that we have not yet identified. Another area requiring further exploration is the search for germline risk factors in familial CLL cases. Our study included many patients for whom the familial or sporadic disease status was not available (n=387), and among those with known status, most were sporadic or without living affected or available relatives (n=195). Larger studies of whole exomes and whole genomes with a focus on familial cases and underrepresented ethnic populations will be needed to further increase power to detect additional risk genes and alleles for CLL. The approach we describe here, combining rare variant association with somatic sequencing analysis, can be applied to any type of heritable cancer and holds great promise for identifying new germline cancer predisposition alleles as progressively larger germline cancer cohorts are sequenced.