
Inherited causes of clonal haematopoiesis in 97,691 whole genomes.

Alexander G Bick1,2,3,4, Joshua S Weinstock5, Satish K Nandakumar2,6, Charles P Fulco2,7, Erik L Bao2,6,8, Seyedeh M Zekavat2,9, Mindy D Szeto10,11, Xiaotian Liao2,6, Matthew J Leventhal2, Joseph Nasser2, Kyle Chang12, Cecelia Laurie13, Bala Bharathi Burugula14, Christopher J Gibson15, Amy E Lin16, Margaret A Taub17, Francois Aguet2, Kristin Ardlie2, Braxton D Mitchell18,19, Kathleen C Barnes10,20, Arden Moscati21, Myriam Fornage22,23, Susan Redline3,24,25, Bruce M Psaty26,27,28,29, Edwin K Silverman3,30, Scott T Weiss3,30, Nicholette D Palmer31, Ramachandran S Vasan32, Esteban G Burchard33,34, Sharon L R Kardia35, Jiang He36,37, Robert C Kaplan38,39, Nicholas L Smith27,29,40, Donna K Arnett41, David A Schwartz42, Adolfo Correa43, Mariza de Andrade44, Xiuqing Guo45, Barbara A Konkle46,47, Brian Custer48,49, Juan M Peralta50, Hongsheng Gui51, Deborah A Meyers52, Stephen T McGarvey53, Ida Yii-Der Chen54, M Benjamin Shoemaker55, Patricia A Peyser35, Jai G Broome13, Stephanie M Gogarten13, Fei Fei Wang13, Quenna Wong13, May E Montasser18, Michelle Daya10, Eimear E Kenny56, Kari E North57, Lenore J Launer58, Brian E Cade24,59, Joshua C Bis26, Michael H Cho3,30, Jessica Lasky-Su3,30, Donald W Bowden31, L Adrienne Cupples60, Angel C Y Mak33, Lewis C Becker61, Jennifer A Smith35,62, Tanika N Kelly36,37, Stella Aslibekyan63, Susan R Heckbert27,29, Hemant K Tiwari64, Ivana V Yang42, John A Heit65, Steven A Lubitz2,3,66, Jill M Johnsen46,47, Joanne E Curran50, Sally E Wenzel67, Daniel E Weeks68, Dabeeru C Rao69, Dawood Darbar70, Jee-Young Moon38, Russell P Tracy71, Erin J Buth13, Nicholas Rafaels20, Ruth J F Loos21,72, Peter Durda71, Yongmei Liu73, Lifang Hou74, Jiwon Lee24, Priyadarshini Kachroo3,30, Barry I Freedman75, Daniel Levy76,77, Lawrence F Bielak35, James E Hixson78, James S Floyd26,27,47, Eric A Whitsel79,80, Patrick T Ellinor2,3,66, Marguerite R Irvin63, Tasha E Fingerlin81, Laura M Raffield82, Sebastian M Armasu44, Marsha M Wheeler83, Ester C Sabino84, John 
Blangero50, L Keoki Williams51, Bruce D Levy3,85, Wayne Huey-Herng Sheu86, Dan M Roden87,88,89, Eric Boerwinkle89,90, JoAnn E Manson3,91,92, Rasika A Mathias61, Pinkal Desai93, Kent D Taylor94,95, Andrew D Johnson76,77, Paul L Auer96, Charles Kooperberg97, Cathy C Laurie13, Thomas W Blackwell5, Albert V Smith5, Hongyu Zhao98,99, Ethan Lange10, Leslie Lange10, Stephen S Rich100, Jerome I Rotter94,95, James G Wilson101,102, Paul Scheet12, Jacob O Kitzman14,103, Eric S Lander2,7,104, Jesse M Engreitz2,105, Benjamin L Ebert2,3,15,106, Alexander P Reiner27,97, Siddhartha Jaiswal107, Gonçalo Abecasis5,108, Vijay G Sankaran2,3,6, Sekar Kathiresan109,110,111,112, Pradeep Natarajan113,114,115.   

Abstract

Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown[1]. The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer[2-4] and coronary heart disease[5]; this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP)[6]. Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.

Year:  2020        PMID: 33057201      PMCID: PMC7944936          DOI: 10.1038/s41586-020-2819-2

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   69.504


The U.S. National Heart, Lung, and Blood Institute (NHLBI) Trans-omics for Precision Medicine (TOPMed) project seeks to use high-coverage (>35x) whole genome sequencing (WGS) and molecular profiling to improve fundamental understanding of heart, lung, blood, and sleep disorders.[7] Within the TOPMed program, we designed a study to detect CHIP from blood DNA-derived WGS in 97,691 individuals across 52 largely observational epidemiologic studies to discover the inherited genetic causes and phenotypic consequences of CHIP (Supplementary Table 1). To confidently identify somatic mutations in blood-derived DNA, we applied a somatic variant caller[8] to TOPMed WGS data. We identified CHIP carriers on the basis of a pre-specified list of leukemogenic driver mutations (see Methods, Supplementary Table 2).[5] In total, we identified 4,938 CHIP mutations in 4,229 individuals (Supplementary Table 3). The median variant allele fraction (VAF) of the CHIP mutations observed was 16%. Consistent with prior reports, >75% of these CHIP mutations were in one of three genes: DNMT3A, TET2, and ASXL1. Approximately 15% of these CHIP mutations were in the five next most frequent genes (PPM1D, JAK2, SF3B1, SRSF2 and TP53; Figure 1). Amongst these 8 genes, there was marked heterogeneity in clonal fraction. For example, DNMT3A and TET2 CHIP clonal fraction of the peripheral blood was ~25% smaller (p=1.3 × 10^−15) and ~14% smaller (p=2.1 × 10^−4), respectively, than ASXL1 clonal fraction, implying driver mutation gene-specific differences in clonal selection (Extended Data Figure 1a). 90% of individuals with CHIP driver mutations had only one identified mutation (Extended Data Figure 1b).
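As a rough illustration of how a VAF relates to clone size, the sketch below converts a VAF into the implied fraction of mutated cells and computes the average number of CHIP mutations per carrier from the counts reported above. The conversion assumes a heterozygous mutation at a diploid, copy-neutral autosomal locus; that assumption and the function name `clone_cell_fraction` are illustrative additions, not part of the study's methods.

```python
def clone_cell_fraction(vaf: float, mutant_copies_per_cell: int = 1, ploidy: int = 2) -> float:
    """Fraction of cells carrying a mutation, given its variant allele fraction.

    Assumption (not from the study): every mutated cell carries
    `mutant_copies_per_cell` mutant copies out of `ploidy` total copies,
    and there is no copy-number change at the locus.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    return min(1.0, vaf * ploidy / mutant_copies_per_cell)


# Median VAF reported in the text.
median_vaf = 0.16
median_cell_fraction = clone_cell_fraction(median_vaf)

# Counts reported in the text: 4,938 CHIP mutations in 4,229 individuals.
mutations_per_carrier = 4938 / 4229

print(f"Implied mutated-cell fraction at median VAF: {median_cell_fraction:.2f}")
print(f"Mean CHIP mutations per carrier: {mutations_per_carrier:.2f}")
```

Under these assumptions a median VAF of 16% corresponds to roughly a third of sequenced blood cells belonging to the clone. The approximation breaks down for hemizygous loci or loci with loss of heterozygosity.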
Fig. 1|

Identifying CHIP in TOPMed Genomes.

CHIP was identified in 97,691 whole genome sequenced peripheral blood samples through the curation of somatic driver mutations. Counts for the 8 most common driver genes are plotted. Inset, CHIP prevalence increased with age. Center line represents the generalized additive model spline; the 95% confidence interval is shaded (n=82,807 individuals; two-sided t-test: p<10^−300).

Extended Data Fig. 1|

Characterizing TOPMed CHIP.

a, There was marked heterogeneity of CHIP clone size, as measured by variant allele fraction, by CHIP driver gene. Violin plot spanning minimum and maximum values calculated on full dataset (Supplementary Table 3). Sample size for each element in violin plot displayed in Fig. 1. b, 90% of individuals with CHIP had only one CHIP driver mutation identified. c, CHIP prevalence with age was highly concordant across sequenced cohorts. CHIP prevalence was estimated from a logistic mixed model with spline-transformed age, sex, and cohort included as predictors. The cohort was included as a random intercept. Sample size for each cohort listed in Supplementary Table 1. d, CHIP prevalence with age in this study (blue triangles, N=82,807) was highly consistent with previously observed CHIP prevalence (dots represent mean point prevalence and the shaded area represents the 95% confidence interval; N_Genovese=12,380; N_Jaiswal=17,182; N_Xie=2,728).

CHIP phenotypic associations

CHIP prevalence was strongly correlated with age at blood draw (p < 10^−300, Figure 1 inset). CHIP prevalence was highly consistent across studies and comparable to previous reports[2-4] using whole exome sequencing (Extended Data Figure 1c,d). Consistent with prior studies, history of smoking was associated with increased CHIP odds (OR = 1.18, p=5 × 10^−5) whereas Hispanic ancestry and East Asian ancestry were each associated with reduced CHIP odds (OR = 0.50, p=0.008 and OR = 0.56, p=0.001 respectively) after adjusting for age (Supplementary Table 4). Carriers of frameshift CHIP mutations were on average older than carriers of single nucleotide CHIP mutations (Wilcoxon rank-sum test: p=0.01). In the subset of individuals with ASXL1 CHIP mutations, which are exclusively loss-of-function single nucleotide stop-gain mutations or frameshift mutations, ASXL1 frameshift mutation carriers were similarly older (Wilcoxon rank-sum test: p=0.009, Extended Data Figure 3a).
Extended Data Fig. 3|

CHIP associates with Blood, Lipid, and Inflammatory traits.

a, CHIP consistently associated with increased Red Cell Distribution Width (RDW). JAK2, SF3B1 and SRSF2 showed driver gene specific effects on blood traits (see Supplementary Table S5) b, CHIP status was not consistently associated with lipid traits, other than JAK2 CHIP which was associated with decreased total cholesterol and a trend towards decreased LDL (see Supplementary Table S6) c, CHIP status is associated with inflammatory markers, however notable heterogeneity existed across CHIP mutations (see Supplementary Table S7). Associations utilized a two-sided t-test from a multivariate general linear model including age, smoking, race and gender and study center and were not adjusted for multiple comparisons. Sample sizes and exact p-values for each phenotype are listed in Supplementary Tables 5–7.

JAK2 CHIP carriers were the youngest among CHIP carriers. Relative to JAK2, ASXL1 and TET2 carriers were 3.3 (p=0.01) and 3.9 (p=9.1 × 10−4) years older, respectively, while PPM1D, SF3B1 and SRSF2 carriers were 5.0, 6.9 and 7.7 years older (p=5.7 × 10−4, 1.8 × 10−6, 1.3 × 10−4), respectively (Extended Data Figure 3b). To evaluate the overlap between CHIP and large-scale mosaic chromosomal rearrangements[9], we evaluated a subset of 855 samples with both WGS and array genotyping data. The two somatic events did not co-occur more than expected by chance (hypergeometric p=0.25, Extended Data Figure 3c). CHIP is distinguished from other clonal hematologic disorders based on the absence of cytopenia, dysplasia, and neoplasia.[6] We observed a modest increase in total white blood cell count (p=1.1 × 10−5) and a modest decrease in hemoglobin (p=0.04), among those with CHIP compared to those without (Extended Data Figure 3a, Supplementary Table 5). In aggregate, CHIP driver mutations were associated with increased red blood cell distribution width (RDW, p=3.0 × 10−5) consistent with prior observations.[10] Notably, RDW is a hematologic parameter that increases with age and predicts overall mortality and poor clinical outcomes in the setting of CVD and in older adults.[11] Given the prior association of CHIP with atherosclerotic cardiovascular disease[5,12], we asked whether CHIP carriers had altered lipid profiles. Consistent with prior reports[5], we observed negative correlations of JAK2 CHIP carrier status with total cholesterol (p=5.1 ×10−4) and LDL cholesterol (p=0.0014) but no other significant associations (Extended Data Figure 3b, Supplementary Table 6). We characterized the human inflammatory profile of CHIP carriers (Extended Data Figure 3c, Supplementary Table 7). In aggregate, CHIP was associated with increased IL-6 (p=0.0035). 
There was no association of CHIP with quantitative C-reactive protein (CRP) and elevated CRP did not reliably identify carriers of CHIP (AUC: 0.55; for cutoff of CRP>2 mg/L: PPV=6.3%, sensitivity=60%). Driver gene-specific analyses highlighted the association of TET2 CHIP with increased IL-1b (p=2.4 × 10−4), while JAK2 and SF3B1 were associated with increased circulating IL-18 (p=1.3×10−4 and 1.27 ×10−20 respectively). To identify underlying determinants of the somatic mutational spectrum, we performed COSMIC mutational signature analysis[13] on passenger somatic mutations identified in CHIP carriers and non-carriers (see Methods). Among CHIP carriers, we observe enrichment of signature 4, which has been associated with smoking, as well as signature 6, which has been associated with defective DNA mismatch repair. (Extended Data Figure 5).
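To make the screening figures for CRP concrete, the sketch below relates sensitivity, specificity, prevalence and positive predictive value (PPV) through Bayes' rule, then back-solves the specificity implied by the reported sensitivity (60%) and PPV (6.3%). The prevalence used is the overall CHIP prevalence in this cohort (4,229 of 97,691); applying it to the subset with CRP measurements is an assumption made here for illustration, and the function names are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from test characteristics via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_specificity(sensitivity: float, ppv_value: float, prevalence: float) -> float:
    """Solve the PPV formula for specificity."""
    false_pos_rate = (
        sensitivity * prevalence * (1.0 - ppv_value)
        / (ppv_value * (1.0 - prevalence))
    )
    return 1.0 - false_pos_rate


# Assumption: overall cohort prevalence also holds in the CRP subset.
prevalence = 4229 / 97691

# Values reported in the text for the CRP > 2 mg/L cutoff.
reported_sensitivity = 0.60
reported_ppv = 0.063

specificity = implied_specificity(reported_sensitivity, reported_ppv, prevalence)
print(f"Assumed prevalence: {prevalence:.3f}")
print(f"Implied specificity: {specificity:.2f}")
print(f"Round-trip PPV: {ppv(reported_sensitivity, specificity, prevalence):.3f}")
```

With these inputs the implied specificity is only slightly better than the sensitivity, which is in line with the near-chance AUC of 0.55 reported for CRP.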
Extended Data Fig. 5|

CHIP Single variant association regional association plots.

a, TERT locus b, TRIM59/KPNA4 locus c, TET2 locus. Two-sided association testing performed using SAIGE (N=65,405 individuals, see methods)

Germline genetic determinants of CHIP

Prior studies of germline genetic variants in individuals of European ancestry, which defined clonal hematopoiesis either by somatic mosaicism of SNVs and indels[14] or by large-scale chromosomal rearrangements[9], identified variants at a single locus, TERT, associated with clonal hematopoiesis. Given the distinct association of clonal hematopoiesis with known leukemogenic mutations (i.e., CHIP) with both cancer[2,15,16] and atherosclerotic cardiovascular disease[5,12], we sought to discover germline genetic variations conferring increased risk for CHIP acquisition. We performed a single variant genome-wide association analysis in a subset of 65,405 individuals (3,831 CHIP cases) where the likelihood of having a CHIP mutation was >1% (see Methods). The trait heritability explained by the analysis with LD score-regression was 3.6%. Our WGS-based association analysis of CHIP replicated the lead variant of the single locus previously associated at genome wide significance with clonal hematopoiesis (defined based on somatic mosaicism of SNVs and indels),[14] rs34002450 (OR 1.2, p=2.0 × 10^−13). rs34002450 is in strong LD (r^2=0.55) with our lead variant at this locus, rs7705526, a common variant (MAF 0.29) in the 5th intron of TERT, which encodes telomerase reverse transcriptase. In TOPMed, carriers of the rs34002450-A (minor) allele have a 1.3-fold risk of developing CHIP (p=8.4 × 10^−24). This variant was previously significantly associated with increased leukocyte telomere length[17], myeloproliferative neoplasms (MPN, Bao, co-submitted manuscript) and clonal chromosomal mosaicism[9]. In a phenome-wide association analysis (PheWAS) of rs34002450-A in UK Biobank, we identified significantly increased risk of MPN (p=2.6 × 10^−13), uterine leiomyoma (p=3.2 × 10^−9) and brain cancer (p=3.6 × 10^−8). 
We performed a conditional analysis at the TERT locus, and identified a second intronic TERT variant, rs13167280 (MAF 0.11, r^2=0.2 with rs7705526), that independently associates with CHIP status (OR 1.3, p=6.1 × 10^−10; conditional OR: 1.1, p=4.7 × 10^−4). In the TOPMed single-variant association analysis, we additionally identified 2 other novel genome-wide significant genetic loci, including one locus on chromosome 3 in an intergenic region spanning KPNA4/TRIM59 and one locus on chromosome 4 near TET2 (Figure 3, Extended Data Figure 6, Supplementary Table 8).
Fig. 3|

African ancestry specific TET2 locus risk variant disrupts hematopoietic stem cell TET2 enhancer decreasing TET2 expression and increasing self-renewal.

a, the TET2 locus with fine-mapped risk variants, Activity-by-Contact (ABC) hematopoietic stem and progenitor cell (HSPC) enhancers, DNase-Seq CD34+ HSPC and RefSeq genes. ABC model predicts that rs79901204 disrupts a TET2 enhancer resulting in decreased TET2 expression (see methods). b, expanded view of TET2 enhancer element. c, rs79901204 disrupts a GATA motif/E-Box motif. d, rs79901204 is associated with decreased TET2 expression in human peripheral blood RNA-seq (N_A/A=230, N_A/T=16, N_T/T=1, two-sided linear mixed model p=0.012). TPM, transcripts per million. Boxplot displays median, 25th and 75th percentiles, mean (diamond symbol) and outlier observations (black dots). e, luciferase assay in CD34+ primary cells demonstrates four-fold attenuation of enhancer activity by the rs79901204 T risk allele relative to the A reference allele (N=3, two-sided t-test p=0.007). f, deleting the TET2 enhancer (ENH) in CD34+ primary cells results in decreased TET2 expression relative to deletion of control locus AAVS1 (N=3, two-sided t-test, p=0.04). g, Human HSPCs were electroporated with Cas9 targeting a coding region of TET2 and AAVS1 (a control locus) and plated for primary and secondary colony-forming assays. h, two TET2 guides had differential editing efficiency. i, TET2 coding disruption leads to expanded secondary colony formation compared to AAVS1 controls (N=3, two-sided t-test p=0.01, p=0.002 for g1 and g2 respectively), with greater expansion identified in the TET2 guide with greater editing efficiency (two-sided t-test p=0.04). Mean and standard deviation of number of each colony type plotted. CFU-M, colony forming unit-macrophage; CFU-GM, granulocyte macrophage; CFU-GEMM, granulocyte erythrocyte macrophage megakaryocyte; CFU-G, granulocyte; BFU-E, burst forming unit-erythroid. In e, f, h, points represent independent replicates; mean values and error bars representing standard error are plotted.

Extended Data Fig. 6|

CHIP transcriptome-wide association study (TWAS) results across 48 tissues identified 7 significant loci.

UTMOST algorithm applied to CHIP genome wide association study results from n=65,405 individuals (see methods). Genomic coordinates listed on x-axis. P-value from generalized Berk-Jones test on y-axis. Multiple hypothesis corrected threshold, p<2.9 × 10^−6, displayed as dotted red line.

rs1210060191 is a common variant (MAF 0.54) in a locus with an association signal that spans a 300kb region that includes KPNA4, TRIM59, IFT80, and SMC4. The lead variant is a 1 bp intronic deletion in TRIM59. Carriers of the del(T) allele have a 1.16-fold increased risk of CHIP (p=5.3 × 10^−10). Variants in LD with this variant have been identified as associated with MPN (Bao et al, co-submitted manuscript). No other significant phenotypic associations were noted in UK Biobank PheWAS analyses. rs144418061 is an African ancestry specific variant (MAF 0.035 in African Ancestry samples, not present in non-African-ancestry samples) in an intergenic region near TET2. Carriers of the A allele have a 2.4-fold increased risk for CHIP (p=4.0 × 10^−9). We replicated this association in an additional set of 570 TOPMed CHIP cases and 8,819 TOPMed controls (OR: 2.1, p=0.026). The association is equally robust for DNMT3A CHIP, TET2 CHIP and ASXL1 CHIP, suggesting that the germline variant does not specifically predispose to TET2 CHIP. Although other variants in the vicinity of TET2 have been associated with MPN (Bao et al, co-submitted manuscript), this variant has not been previously identified as associated with any traits in the literature, likely due to the under-representation of African ancestry genomes in published association studies. We considered whether there might be germline variants that predispose to specific CHIP driver mutations by separately performing a GWAS on DNMT3A CHIP and TET2 CHIP. We identified a single novel locus for DNMT3A CHIP at rs2887399 in an intron of T-cell leukemia/lymphoma 1A (TCL1A). Carriers of the T allele (MAF 0.26) are at 1.23-fold risk of acquiring a DNMT3A CHIP mutation (p=3.9 × 10^−9). Intriguingly, carriers of the T allele are at decreased risk of acquiring a TET2 CHIP mutation (OR: 0.82, p=0.0012), and consequently it was not identified in the primary CHIP GWAS analysis. 
This variant is also associated with mosaic loss of chromosome Y.[18] We evaluated whether our associations between germline loci and CHIP clones were robust across CHIP clone size spectrum, using the association between the JAK2 46/1 haplotype (tagged by rs1327494) and JAK2 CHIP.[19] We find that rs1327494 associates with JAK2 CHIP presence across VAF thresholds. We evaluated whether this observation generalized beyond JAK2 CHIP to encompass all CHIP. Intriguingly, we find that the TERT locus (tagged by rs7705526) is associated with CHIP presence across all VAF thresholds (Supplementary Table 9). These observations imply that our genetic associations are not dependent on clone size detectable by deep-coverage whole genome sequencing. As single-variant analyses have limited power to detect rare-variant associations, we next performed several types of variant aggregation association tests. First, we performed a transcriptome-wide association analysis to quantify the relationship between changes in gene expression and genetic predisposition to CHIP[20] (see Methods). This approach identified the Chr3 KPNA4/TRIM59 locus and six additional loci including: AHRR, ASL, KREMN2, LEAP2, JSRP1, RASEF. (Extended Data Fig. 7–8) AHRR directs hematopoietic progenitor cell expansion and differentiation.[21]
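For readers who want a sense of the uncertainty behind a reported odds ratio, the sketch below back-calculates an approximate 95% confidence interval from an OR and its two-sided p-value, using the replication estimate reported above for rs144418061 (OR 2.1, p=0.026). It assumes a Wald-type test on the log-odds scale, which is a simplification introduced here; the study's association tests (SAIGE) are not Wald tests, so the result is indicative only. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def or_confidence_interval(odds_ratio: float, p_value: float, level: float = 0.95):
    """Approximate CI for an odds ratio from its two-sided p-value.

    Assumption (not from the study): the p-value comes from a Wald test,
    i.e. z = log(OR) / SE follows a standard normal under the null.
    """
    std_normal = NormalDist()
    # Use the lower tail to stay numerically stable for very small p-values.
    z = abs(std_normal.inv_cdf(p_value / 2.0))
    log_or = math.log(odds_ratio)
    standard_error = abs(log_or) / z
    critical = std_normal.inv_cdf(0.5 + level / 2.0)
    lower = math.exp(log_or - critical * standard_error)
    upper = math.exp(log_or + critical * standard_error)
    return lower, upper


# Replication estimate reported in the text.
lower, upper = or_confidence_interval(2.1, 0.026)
print(f"Approximate 95% CI for OR 2.1: {lower:.2f} to {upper:.2f}")
```

The interval is wide, as expected for a variant with a minor allele frequency of about 3.5% tested in 570 cases, but it excludes 1 and is compatible with the discovery estimate of 2.4.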
Extended Data Fig. 7|

Tissue-specific results from the top 9 overall UTMOST-significant genes.

UTMOST algorithm applied to CHIP genome wide association study results from n=65,405 individuals. P-value from generalized Berk-Jones test. eQTL z-scores for associations with P<0.05 are displayed in each bar. GTEX eQTL tissue listed on Y-axis.

Extended Data Fig. 8 |

CRISPR/Cas9 editing efficiency of TET2 Enhancer deletion in primary CD34+ HSPCs.

a, Schematic showing the position of the two sgRNAs used to delete the TET2 enhancer (512bp) containing rs79901204. b, Gel electrophoresis image of PCR products from genomic DNA of edited HSPCs indicating unedited (WT) and deletion bands at the sgRNA target site. Percentages of deletion alleles were determined by band intensity and are shown below each lane. The experiment contains 3 biological replicates and was performed once.

We also performed gene-based association tests for aggregations of rare (MAF<0.1%) putative loss-of-function (pLOF) germline variants within genes for CHIP presence. Although no genes reached exome-wide significance, the top associated gene was the DNA damage repair gene CHEK2 (OR 1.7, p=1.3 × 10^−5, Supplementary Table 10). Rare germline variants in CHEK2 are implicated in a diverse set of hematologic and solid tumor malignancies.[22,23] Common variants in CHEK2 are associated with MPN[19] and a low-frequency frameshift variant in CHEK2 is associated with somatic chromosomal mosaicism[9]. In recent experimental work, suppression of CHEK2 in human cord blood Lin−CD34+ cells increased cellular proliferation in long-term culture (Bao et al, co-submitted manuscript). These results suggest that while CHEK2 may ordinarily limit hematopoietic stem cell expansion, loss of CHEK2 function may promote self-renewal, increasing risk of CHIP. We next sought to determine whether rare variants in non-coding regions associate with CHIP acquisition (see Methods). One set of variants in HAPLN1 enhancers exceeded a p-value threshold of p<0.05 after Bonferroni-correction (OR: 6.8, p=1.96 × 10^−5, Supplementary Table 11). HAPLN1 is an extracellular matrix protein, produced in bone marrow stromal cells, that has previously been implicated in NF-κB signaling.[24] We asked whether germline genetic variants might be associated with CHIP clonal expansion. No single variants or aggregated rare variants exceeded Bonferroni significance (Supplementary Table 12–13).

TET2 CHIP risk locus characterization

Lastly, we bioinformatically and experimentally characterized the mechanism by which the non-coding African American-specific variant at the TET2 locus influenced risk for CHIP. First, iterative conditional analyses at the locus suggested that there was most likely only a single causal variant. Fine-mapping prioritized 25 variants in the credible set (>99% posterior probability), none of which overlaps the coding sequence or promoter of a protein-coding gene. We hypothesized that the causal variant affects an enhancer for TET2 in hematopoietic stem cells, because heterozygous Tet2 knockout in mice increases the self-renewal of hematopoietic stem cells in vivo[25] and recapitulates the clonal expansion observed in humans with somatic mutations in TET2.[5,10] Accordingly, we used the Activity-by-Contact (ABC) model to predict which noncoding elements act as enhancers in CD34+ hematopoietic stem and progenitor cells (HSPC, see Methods). Only a single variant (rs79901204) in this credible set overlapped an element predicted to regulate any gene, and that element was indeed predicted to regulate TET2 expression (Figure 3a, Supplementary Table 14). The T risk allele disrupts a consensus GATA/E-Box motif, likely resulting in reduced binding of the activating transcription factors GATA1 and GATA2 (Figure 3b,c). We then evaluated whether rs79901204 affected TET2 expression in vivo in human peripheral blood samples. We utilized whole blood RNAseq from 247 African American individuals, 16 of whom were heterozygotes for rs79901204 and one of whom was a homozygote. In these samples, the T risk allele was associated with a dose-dependent decrease in whole blood TET2 expression (Beta: −0.27, SE: 0.11, two-sided linear mixed model p=0.012, Figure 3d). Therefore, we sought to test our hypothesis that the rs79901204 risk allele acts to decrease the activity of this TET2 enhancer and that decreased enhancer activity reduces expression of TET2 in vitro. 
To test whether rs79901204 affects enhancer activity, we tested a 600 base pair region containing the regulatory element using a plasmid-based luciferase enhancer assay in hematopoietic cells. The reference sequence activated luciferase expression by 118-fold (versus control constructs with no enhancer sequence), while the T risk allele activated expression by only 27-fold (two-sided t-test p=0.007, Figure 3e). To test whether deletion of this enhancer would alter TET2 gene expression, we performed deletion of the enhancer element in CD34+ HSPCs using a pair of CRISPR/Cas9 guides introduced as ribonucleoproteins, which resulted in decreased TET2 expression after 48 hours (Figure 3f). We then sought to establish the effect of decreased TET2 expression on HSPC expansion using a colony forming unit cellular assay. Human HSPCs were electroporated with Cas9 targeting a coding region of TET2 and AAVS1 (a control locus) and plated for primary and secondary colony-forming assays (Figure 3g). To establish a dose response relationship, two TET2 guides were used with differential editing efficiency (Figure 3h, Extended Data Figure 9). TET2 coding disruption resulted in expanded secondary colony formation compared to AAVS1 controls, with greater expansion identified in the TET2 guide with greater editing efficiency (Figure 3i). Thus, we demonstrate that reduction of TET2 activity promotes self-renewal and proliferation of HSPCs, illustrating how both germline noncoding and somatic coding variation at this locus converge to affect TET2 and influence the development of CHIP.
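Two of the figures quoted above can be checked with simple arithmetic, sketched below: the risk-allele frequency implied by the RNA-seq genotype counts (230 A/A, 16 A/T, 1 T/T among 247 individuals) and the fold attenuation implied by the luciferase activations (118-fold versus 27-fold). The function names are hypothetical and the snippet uses only numbers stated in the text.

```python
def risk_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Frequency of the alternate (risk) allele from diploid genotype counts."""
    n_individuals = n_ref_hom + n_het + n_alt_hom
    alt_alleles = n_het + 2 * n_alt_hom
    return alt_alleles / (2 * n_individuals)


def fold_attenuation(reference_activation: float, risk_activation: float) -> float:
    """Ratio of reference-allele to risk-allele enhancer activation."""
    return reference_activation / risk_activation


# Genotype counts in the whole blood RNA-seq sample (A/A, A/T, T/T).
t_allele_freq = risk_allele_frequency(230, 16, 1)

# Luciferase activation relative to constructs without an enhancer.
attenuation = fold_attenuation(118, 27)

print(f"rs79901204 T allele frequency in RNA-seq sample: {t_allele_freq:.3f}")
print(f"Enhancer activity attenuation: {attenuation:.1f}-fold")
```

The allele frequency in the expression sample is close to the minor allele frequency of 0.035 reported for the GWAS lead variant rs144418061 in African ancestry samples, and the activation ratio matches the roughly four-fold attenuation described in the Figure 3 legend.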
Extended Data Fig. 9 |

rs79901204 associated with genome wide differential methylation signal.

Methylation Quantitative Trait association results of rs79901204 variant with CpG methylation probes identify an altered peripheral leukocyte methylation profile genome wide in N = 1747 individuals. The strongest signal is at the chr4 TET2 locus. P-values on y-axis derived from two-sided linear mixed effects model (see methods). To account for multiple hypothesis testing, a Bonferroni threshold of p < 5.8 × 10^−8 was used to establish statistical significance.

Given the established role of TET2 in DNA de-methylation and our finding that rs79901204 is associated with decreased TET2 expression (Figure 3d), we hypothesized that carriers of rs79901204 T allele may have altered peripheral blood methylation profiles. We performed a methylation-QTL analysis of 1747 African Americans and identified 597 genes across the genome with differentially methylated CpG loci associated with rs79901204 T carrier status. The most strongly differentially methylated sites were at the TET2 locus itself. (Extended Data Fig. 10, Supplementary Table 15)
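The Bonferroni thresholds quoted in this article can be related to the number of tests they correspond to. The sketch below shows the correction and its inverse, assuming a family-wise error rate of 0.05; that value is an assumption here, since the text does not state the alpha used for the methylation (p < 5.8 × 10^−8) or TWAS (p < 2.9 × 10^−6) thresholds. Function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def implied_number_of_tests(threshold: float, alpha: float = 0.05) -> float:
    """Number of tests that would yield the given Bonferroni threshold."""
    return alpha / threshold


# Thresholds quoted in the figure legends; alpha = 0.05 is an assumption.
methylation_tests = implied_number_of_tests(5.8e-8)
twas_tests = implied_number_of_tests(2.9e-6)

print(f"Methylation threshold implies about {methylation_tests:,.0f} tests")
print(f"TWAS threshold implies about {twas_tests:,.0f} tests")
```

Under that assumption the methylation threshold corresponds to several hundred thousand CpG probes and the TWAS threshold to a number of tests on the order of the protein-coding gene count, which is the scale one would expect for each analysis.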
Extended Data Fig. 10 |

Sensitivity of CHIP detection at various VAFs across sequencing depths.

A set of 30 samples from a previously published CHIP cohort (Gibbons et al, 2017) were computationally down sampled to 30x, 40x, 50x, 100x and 400x sequencing depth. TOPMed WGS data was typically in the 40x depth range across CHIP genes. WGS data has excellent sensitivity to detect CHIP clones with VAF >10%, and ~50% sensitivity to detect CHIP VAF 5–10%, with minimal ability to detect CHIP clones <5%.
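The dependence of sensitivity on VAF and depth can be illustrated with a simple binomial read-sampling model, sketched below. It assumes that reads are sampled independently, that depth is fixed, and that a variant is called once at least `min_alt_reads` supporting reads are seen; the threshold of 3 reads is a hypothetical choice, and real somatic callers apply additional filters. The numbers it produces are therefore not expected to reproduce the down-sampling results described above, only the qualitative trend.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf)."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


MIN_ALT_READS = 3  # hypothetical calling threshold, not taken from the study

for depth in (30, 40, 50, 100, 400):
    row = ", ".join(
        f"VAF {vaf:.0%}: {detection_probability(depth, vaf, MIN_ALT_READS):.2f}"
        for vaf in (0.02, 0.05, 0.10, 0.20)
    )
    print(f"{depth:>3}x  {row}")
```

Even this simplified model shows why clones below 5% VAF are largely invisible at around 40x: the expected number of supporting reads is two or fewer, so most such clones never reach a plausible calling threshold.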

Our observations permit several conclusions. First, our sample size which is nearly an order of magnitude larger than prior CHIP analyses[2,3,14] enables refinement of CHIP phenotype associations at the level of CHIP driver genes. We find that considerable heterogeneity exists across CHIP phenotypes by driver gene. For example, IL-1b and IL-18 both activate through the inflammasome and increase IL-6. However, while TET2 CHIP is associated with increased levels of IL-1b, JAK2 and SF3B1 CHIP are associated with IL-18. Second, our work highlights multiple mechanisms through which germline genetic variation can shape somatic variation in hematopoietic stem cells. A set of the germline loci are associated with increased propensity to acquire mutations due to failure of genes that maintain genome integrity (e.g. TERT and CHEK2) and which have been implicated in stem cell maintenance/self-renewal (Bao et al, co-submitted manuscript). These loci are associated with acquisition of somatic mutations resulting in neoplasm in multiple tissues. Other germline loci are associated with increased hematopoietic stem cell self-renewal (e.g. TET2). While the TET2 locus is associated with increased risk of acquiring any CHIP driver mutations, it is not associated with cancer outside of the hematopoietic stem cell compartment. A third set of germline loci are associated with the acquisition of CHIP mutations in specific driver genes. This previously was described in the JAK2 46/1 haplotype leading to JAK2 p.V617F via a cis haplotype effect.[26-28] We now identify a novel DNMT3A CHIP specific locus at the TCL1A promoter specifically associated with increased risk of DNMT3A CHIP, but not other CHIP subsets. We identify a convergence of common and rare germline genetic predisposition to leukocyte telomere length, MPN, large scale somatic chromosomal mosaicism and CHIP, suggesting shared causal mechanisms. 
Importantly, to date, only CHIP with leukemogenic driver mutations (as opposed to somatic chromosomal mosaicism[9] or CHIP with unknown driver mutations[14]) has been robustly associated with non-oncologic diseases independently of age. The partially overlapping genetic predisposition we observe across these three clonal phenomena suggest that although there may be similar genetic architecture that predispose individuals to acquiring a somatic mutation, the specific change may be particularly relevant to atherosclerotic disease as opposed to the general phenomenon of clonal hematopoiesis itself. Third, our work underscores the benefits of studying genomes from individuals of diverse ancestries. The inclusion of a significant number of African Ancestry samples in TOPMed permitted the discovery of the TET2 locus which was not present in other ancestries. Further inclusion of diverse individuals in genomic analyses is likely to highlight additional new biological pathways. Important limitations of our study include reduced sensitivity for detecting CHIP with low allele fractions (VAF 2–5%) even with high-coverage whole genome sequencing. Ultrasensitive targeted sequencing can facilitate detection of such leukemogenic mutations at exceedingly low VAFs but the clinical consequences of this much more pervasive phenomenon, as well as determinants of progression to CHIP is not well understood currently.[29] Furthermore, the cross-sectional analyses of CHIP with non-genetic risk factors and biomarkers limit conclusions regarding temporal relationships between CHIP and these features; however, these observations still permit risk prediction for CHIP presence. 
Notably, inflammatory biomarker analyses are concordant with prior model experiments indicating elevations of observed inflammatory biomarkers as a consequence of CHIP.[5,10] Lastly, given the age-dependence of CHIP, it is likely that many individuals not observed to have CHIP in this study will develop CHIP in the future. Overall, comprehensive simultaneous germline and somatic analyses of blood-derived whole genome sequence data demonstrate that germline variation influences the acquisition of somatic mutations in blood cells. Importantly, we anticipate that the TOPMed CHIP dataset defined here will be a valuable tool in establishing associations of CHIP with diverse heart, lung, blood and sleep traits.

Methods

Study Samples

Whole genome sequencing (WGS) was performed on 97,691 samples sequenced as part of 52 studies contributing to the NHLBI TOPMed research program Freeze 6 release as previously described for discovery analyses.[7] An additional distinct set of 9,389 WGS samples from the NHLBI TOPMed Freeze 8 release was used for replicating the TET2 genetic association. Study designs include prospective cohorts, families, population isolates, and case-only collections. A subset of the studies focus on heart (~40%) or lung (~30%) phenotypes, with the remainder representing prospective population cohorts or electronic health record linked cohorts which have been assessed for many diverse phenotypes. None of the studies which comprise TOPMed selected individuals for sequencing on the basis of hematologic malignancy. Approximately 82% of participants are U.S. residents with diverse ancestry and ethnicity (40% European, 32% African, 16% Hispanic/Latino, 10% Asian). Each of the constituent studies provided informed consent on the participating samples. Details on participating cohorts and samples are provided in Supplementary Table 1. The age of participants at time of blood draw was obtained for a subset of 82,807 of the samples. The median age was 55, the mean age 52.5, and the maximum age 98. The age distribution varied across the constituent cohorts. Written informed consent was obtained from all human participants by each of the studies that contributed to TOPMed with approval of study protocols by ethics committees at participating institutions as summarized in Supplementary Table 1. Each study received institutional certification prior to deposition in dbGaP which certified that all relevant institutional ethics committees approved the individual studies and that the genomic and phenotypic data submission was compliant with all relevant ethical regulations. This certification was deposited in dbGaP along with the data. 
Secondary analysis of the TOPMed dbGaP data as described in this manuscript was approved by the Partners Healthcare Institutional Review Board. All relevant ethics committees approved this study and this work is compliant with all relevant ethical regulations.

WGS Processing, Variant Calling and CHIP annotation

BAM files were remapped and harmonized through a previously described unified protocol.[30] SNPs and short indels were jointly discovered and genotyped across the TOPMed samples using the GotCloud pipeline.[31] An SVM filter was trained to discriminate between true variants and low-quality sites. Sample quality was assessed through pedigree errors, contamination estimates, and concordance between self-reported sex and genotype inferred sex. Variants were annotated using snpEff 4.3. Putative somatic SNPs and short indels were called with GATK Mutect2[8] (https://software.broadinstitute.org/gatk). Briefly, Mutect2 searches for sites where there is evidence for variation, and then performs local reassembly. It uses an external reference of recurrent sequencing artifacts termed a “panel of normal samples” to filter out these sites, and calls variants at sites where there is evidence for somatic variation. The panel of normal samples used for our study included 100 randomly selected individuals under the age of 40 years. Absence of a hotspot CHIP mutation was verified prior to inclusion in the panel of normal set. An external reference of germline variants[32] was provided to filter out likely germline calls. We deployed this variant calling process on Google Cloud using Cromwell (https://github.com/broadinstitute/cromwell). The caller was run individually for each sample with the same settings. The Cromwell WDL configuration file is available from the authors upon request. Samples were annotated as having CHIP if the Mutect2 output contained one or more of a pre-specified list of putative CHIP variants as previously described[2,5] (Supplementary Table 2) at a VAF >2%.
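The final annotation step described above (one or more listed driver variants at VAF >2%) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `SomaticCall` record, the `has_chip` helper, and the two-entry `DRIVERS` set are hypothetical stand-ins for parsed Mutect2 output and the pre-specified list in Supplementary Table 2, and matching on gene plus protein change is an assumption about how list membership is decided.

```python
from dataclasses import dataclass

VAF_THRESHOLD = 0.02  # the "VAF >2%" cutoff stated in the text


@dataclass(frozen=True)
class SomaticCall:
    """Hypothetical record for one putative somatic call."""
    gene: str
    protein_change: str
    alt_reads: int
    total_reads: int

    @property
    def vaf(self) -> float:
        # Variant allele fraction: variant-supporting reads over all reads
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def has_chip(calls, driver_variants, vaf_threshold=VAF_THRESHOLD):
    """True if any call is a listed driver variant with VAF above the cutoff."""
    return any(
        (c.gene, c.protein_change) in driver_variants and c.vaf > vaf_threshold
        for c in calls
    )


# Illustrative stand-in for the pre-specified variant list (assumption)
DRIVERS = {("JAK2", "p.V617F"), ("DNMT3A", "p.R882H")}

sample_calls = [
    SomaticCall("JAK2", "p.V617F", alt_reads=3, total_reads=40),
    SomaticCall("GENE_X", "p.A1B", alt_reads=10, total_reads=40),  # not listed
]
print(has_chip(sample_calls, DRIVERS))
```

Keeping the variant list and the threshold as arguments makes it easy to see how the case definition changes under a different cutoff.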

WGS sensitivity to detect CHIP

To empirically assess the sensitivity of CHIP detection as a function of VAF, we re-analyzed sequence data from 30 samples with CHIP from a previously published cohort.[33] These samples were sequenced to >400x depth. We bioinformatically down-sampled the reads to the range of sequencing depths compatible with whole exome and whole genome sequencing. The TOPMed WGS samples were sequenced to a median depth of ~40x, although sequencing of any particular region was typically 30x-50x. Across this range of sequencing depths we observe robust ability to call CHIP with VAF >10%, which is the most clinically actionable subset of CHIP. We also capture approximately half of the CHIP calls in the VAF 5–10% range. Reliably capturing CHIP in the 5–10% range requires the ~100x sequencing depth commonly used in whole exome sequencing, but even at this sequencing depth the majority of the VAF 2–5% CHIP calls are not reliably detected (Extended Data Figure 11).
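The dependence of sensitivity on depth and VAF can be illustrated with a simple binomial sampling model. This is not the down-sampling procedure used in the study: it assumes each read at a site carries the variant independently with probability equal to the VAF, and that a call needs a minimum number of supporting reads. The value `min_alt_reads=3` is a hypothetical choice, and the model ignores the additional filters a real caller applies.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# Compare the depths and VAF ranges discussed in the text
for depth in (30, 40, 50, 100, 400):
    row = ", ".join(
        f"VAF {vaf:.0%}: {detection_probability(depth, vaf):.2f}"
        for vaf in (0.03, 0.07, 0.15)
    )
    print(f"{depth}x -> {row}")
```

Under these assumptions the probability rises with both depth and VAF, which matches the qualitative pattern reported, although the absolute values depend on the assumed read threshold.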

Amplicon sequencing validation

To evaluate the fidelity of our TOPMed WGS CHIP dataset, we performed technical validation of 76 CHIP mutations in 72 samples using targeted deep sequencing. All 76 of 76 CHIP mutations identified with WGS were also identified with targeted deep sequencing. CHIP mutations were validated by single-molecule molecular inversion probe sequencing (smMIPS).[34] Capture probes were designed to tile all coding exons (+/− 5 bp) for 12 of the most highly prevalent CHIP genes plus four recurrent mutation hotspots, totaling 44.5 kb. Probes were synthesized as a pool by CustomArray, Inc., amplified using Q5 DNA polymerase (NEB) using outer flanking primers, and digested with BbsI-HF (NEB) to remove adaptors. For each sample, captures were performed with 500 ng gDNA and converted to dual-barcoded Illumina sequencing libraries as described.[35] Sequence capture libraries were pooled for paired-end 150 bp sequencing on a HiSeq 4000 lane. Resulting reads were aligned with bwa mem and processed using the mimips pipeline (source code at https://github.com/kitzmanlab/mimips) to trim capture probe sequences, and to remove reads with duplicated unique molecular identifiers. Somatic variants were called by MuTect2 as described above and confirmed by manual inspection with IGV.

Somatic Chromosomal Mosaic Detection

In order to assess the relationship between CHIP and clonal mosaicism reflecting chromosomal mutation, we sought to characterize large (megabase-scale) acquired chromosomal alterations leading to allelic imbalance using existing SNP array data on a subset of the samples in this analysis. To do so, we compared statistically reconstructed haplotypes (using MaCH[36]) with the patterns of “B allele” frequencies (BAFs), measured via SNP array. Regions of nonrandom similarities between the estimated haplotypes and BAFs were detected with hapLOH[37], and indicate acquired chromosomal alterations. We identified genomic allelic imbalance events using a threshold of a posterior probability for allelic imbalance > 0.8 and event size > 1Mb. We excluded allelic imbalance events with fewer than ten markers and removed potential germline duplications if a detected event exhibited the following: 1) 50% reciprocal overlap with database of genomic variants (DGV) and 2) was not determined to be a deletion or LRR deviations > 0.08, size < 5Mb and BAF deviations > 0.1. Phasing and event detection was performed in SyQADA.[38]
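The primary event thresholds stated above (posterior probability > 0.8, size > 1 Mb, at least ten markers) can be expressed as a small filter. The `ImbalanceEvent` record and its field names are hypothetical and do not reflect hapLOH's output format. The germline-duplication exclusion is deliberately left out, because its exact criteria are not fully recoverable from the description.

```python
from dataclasses import dataclass


@dataclass
class ImbalanceEvent:
    """Hypothetical summary of one candidate allelic imbalance event."""
    posterior: float   # posterior probability of allelic imbalance
    start_bp: int
    end_bp: int
    n_markers: int     # number of SNP array markers supporting the event

    @property
    def size_bp(self) -> int:
        return self.end_bp - self.start_bp


def passes_filters(ev: ImbalanceEvent,
                   min_posterior: float = 0.8,
                   min_size_bp: int = 1_000_000,
                   min_markers: int = 10) -> bool:
    """Apply the posterior, size, and marker-count thresholds from the text."""
    return (
        ev.posterior > min_posterior
        and ev.size_bp > min_size_bp
        and ev.n_markers >= min_markers  # events with fewer than ten are excluded
    )


events = [
    ImbalanceEvent(0.95, 10_000_000, 15_000_000, 120),  # kept
    ImbalanceEvent(0.95, 10_000_000, 10_500_000, 40),   # too small
    ImbalanceEvent(0.60, 10_000_000, 15_000_000, 120),  # low posterior
    ImbalanceEvent(0.95, 10_000_000, 15_000_000, 9),    # too few markers
]
kept = [ev for ev in events if passes_filters(ev)]
print(len(kept))
```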

Blood traits

Conventionally measured blood cell counts and indices were selected for analysis including: hemoglobin, hematocrit, red blood cell count, white blood cell count, basophil count, eosinophil count, neutrophil count, lymphocyte count, monocyte count, platelet count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, mean platelet volume and red cell distribution width. Phenotypes were collected by each cohort and centrally harmonized by the TOPMed Data Coordinating Center (DCC). Additional documentation about harmonization algorithms for each specific trait is available from the TOPMed DCC and accompanies the data on the dbGaP TOPMed Exchange area. Up to 37,653 individuals from 10 cohorts who had one or more blood traits measured concurrently with or following the blood draw used for CHIP ascertainment were utilized for this analysis. Traits were first log2 normalized and then analyzed using a general linear regression model with CHIP status, age, sex, study and the first 10 ancestry principal components as covariates.

Lipid phenotypes

Conventionally measured plasma lipids, including total cholesterol, LDL-C, HDL-C, and triglycerides, were included for analysis. LDL-C was either calculated by the Friedewald equation when triglycerides were <400 mg/dl or directly measured. Given the average effect of statins, when statins were present, total cholesterol was adjusted by dividing by 0.8 and LDL-C by dividing by 0.7. Triglycerides were natural log transformed for analysis. Phenotypes were harmonized by each cohort and deposited into dbGaP TOPMed Exchange area as previously described.[39] Up to 28,310 individuals from 19 cohorts who had one or more lipid traits measured concurrently with or following the blood draw used for CHIP ascertainment were utilized for this analysis. Lipid traits were first normalized for age, sex and ancestry principal components and then analyzed using a general linear regression model with CHIP status, age, sex, study and the first 10 ancestry principal components as covariates.
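The lipid pre-processing rules above can be written out as a short sketch. The Friedewald equation is the standard form, LDL-C = TC − HDL-C − TG/5 in mg/dl. The function names are hypothetical, and the order of operations (computing LDL-C from unadjusted values and then applying the statin divisors) is an assumption, since the text does not specify it.

```python
import math


def friedewald_ldl(tc: float, hdl: float, tg: float) -> float:
    """Friedewald LDL-C estimate (mg/dl); valid only when TG < 400 mg/dl."""
    if tg >= 400:
        raise ValueError("Friedewald equation not applicable when TG >= 400 mg/dl")
    return tc - hdl - tg / 5.0


def prepare_lipids(tc, hdl, tg, on_statin, measured_ldl=None):
    """Return analysis-ready lipid values following the rules in the text."""
    # Prefer a directly measured LDL-C when one is supplied
    ldl = measured_ldl if measured_ldl is not None else friedewald_ldl(tc, hdl, tg)
    if on_statin:
        tc = tc / 0.8    # statin adjustment for total cholesterol
        ldl = ldl / 0.7  # statin adjustment for LDL-C
    return {"tc": tc, "ldl": ldl, "hdl": hdl, "log_tg": math.log(tg)}


print(prepare_lipids(tc=200.0, hdl=50.0, tg=150.0, on_statin=True))
```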

Inflammatory Markers

A set of markers previously implicated in mediating cardiometabolic disease were analyzed including: CD-40, CRP, E-Selectin, ICAM-1, IL-1b, IL-6, IL-10, IL-18, 8-epi PGF2a, Lp-PLA2 mass and activity, MCP1, MMP9, MPO, OPG, P-selectin, TNF-Alpha, TNF-Alpha Receptor 1, TNF-receptor 2. Phenotypes were collected by each cohort, centrally harmonized by the TOPMed DCC and then deposited into dbGaP TOPMed Exchange area. Additional documentation about harmonization algorithms for each specific trait is available from the TOPMed DCC and accompanies the data on dbGaP. Up to 22,092 individuals from 10 cohorts who had one or more inflammatory markers measured concurrently with or following the blood draw used for CHIP ascertainment were utilized for this analysis. Inflammatory markers were first normalized using a log2(x+1) transformation and then analyzed using a general linear regression model with CHIP status, age, sex, study and the first 10 ancestry principal components as covariates.

Mutational Signatures

We identified all putatively somatic singleton mutations in a subset of the TOPMed samples that included 3,764 cases with a single CHIP driver mutation and a randomly sampled set of 5,000 controls. Variants were filtered to ensure a depth >=25 reads, a VAF < 35% and no overlap with the germline variant site list from TOPMed Freeze 5 (available: https://bravo.sph.umich.edu/freeze5/hg38/). Multiallelic variants and indels were excluded. We used https://cancer.sanger.ac.uk/cosmic/signatures_v2 as a reference for mutation signatures and the MutationalPatterns R package to estimate the contributions of the signatures.[13,40,41] We defined a signature as being “differentially observed” if at least 99% of its observations are in CHIP cases, or if at most 1% of its observations are in cases (i.e., one of cases or controls contains at least 99% of the signature observations).
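The "differentially observed" rule defined above reduces to a check on how a signature's observations split between cases and controls. In the sketch below the inputs are assumed to be counts of observations attributed to one signature in each group. How an "observation" is tallied from the MutationalPatterns output is not specified in the text, so the function name and inputs are hypothetical.

```python
def is_differentially_observed(case_count: int, control_count: int,
                               threshold: float = 0.99) -> bool:
    """True if cases or controls hold at least `threshold` of the observations."""
    total = case_count + control_count
    if total == 0:
        return False  # nothing observed, so nothing to compare
    case_fraction = case_count / total
    control_fraction = control_count / total
    return case_fraction >= threshold or control_fraction >= threshold


print(is_differentially_observed(990, 10))   # 99% of observations in cases
print(is_differentially_observed(500, 500))  # evenly split
```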

Single Variant Association

Single variant association for each variant in Freeze 8 with MAF > 0.1% and MAC > 20 was performed with SAIGE[42], and analysis was performed using the TOPMed Encore analysis server (https://encore.sph.umich.edu). CHIP driver status was dichotomized into a case-control phenotype based on the presence of at least one driver mutation. Prior to running single variant association tests, a logistic mixed model was fit using the lme4 R package[43] to estimate the probability of the CHIP case control status conditional on a spline transformation of the centered age, genotype inferred sex, and cohort. The cohort was included as a random intercept which represents study specific contributions to the log-odds of CHIP at the mean sample age. Age was modeled with a spline to capture the non-linearity of the relationship between age and CHIP. This model was chosen over comparable models based on its AIC. Combining the age, inferred sex, and study into a single quantity aided the convergence of SAIGE compared to the inclusion of these terms separately. The first 10 principal components were also included as covariates. Given that CHIP is unlikely to manifest in younger individuals, these individuals are effectively censored in our analysis set – that is, a young individual that does not presently have CHIP may still develop CHIP in the future. To avoid the power loss associated with misclassification of controls, we pruned these individuals from our analysis set. The single variant association analysis was run on a pruned set of samples that excluded those which had less than a 1% probability of CHIP as estimated by the aforementioned model. This excluded 21,712 samples, leading to a final analysis set of 65,405 which was used for downstream association analyses.
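The pruning step can be sketched as converting each sample's fitted log-odds to a probability and dropping samples below 1%. The fitted values here are invented for illustration. In the study they come from the lme4 logistic mixed model, which this sketch does not reproduce, and the function names are hypothetical.

```python
import math


def chip_probability(log_odds: float) -> float:
    """Inverse-logit: convert a fitted log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def prune_samples(log_odds_by_sample: dict, min_probability: float = 0.01) -> dict:
    """Keep samples whose predicted CHIP probability is at least min_probability."""
    probabilities = {
        sample_id: chip_probability(eta)
        for sample_id, eta in log_odds_by_sample.items()
    }
    return {s: p for s, p in probabilities.items() if p >= min_probability}


# Hypothetical fitted log-odds for two samples
fitted = {"young_sample": -6.0, "older_sample": -2.0}
retained = prune_samples(fitted)
print(sorted(retained))
```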

Fine mapping

We applied FINEMAP 1.3[44] to the summary statistics from SAIGE, using the z-score and LD matrices as input. We fine-mapped the TET2 locus using the summary statistics from the African ancestry single variant summary statistics and estimated LD on the same set of samples using Plink 1.9. We set the maximum number of causal SNPs in the region to 10 and used a shotgun stochastic search.

Transcriptome-wide association analysis

Multi-tissue gene expression and eQTL data were retrieved from the Genotype-Tissue Expression (GTEx) project (https://www.gtexportal.org). We applied the unified test for molecular signatures (UTMOST)[20] to perform cross-tissue transcriptome-wide association analysis for CHIP. We used cross-tissue gene expression imputation models trained from 44 tissues in GTEx. Gene-level association meta-analysis was performed using the generalized Berk-Jones test implemented in UTMOST (https://github.com/Joker-Jerome/UTMOST). Statistical significance was determined using a Bonferroni corrected p-value cutoff of 2.9 × 10−6.

Rare Variant Analyses

Collapsing burden tests were applied to specific variant grouping schemes using EPACTS (https://genome.sph.umich.edu/wiki/EPACTS). The same covariates as the single variant tests were used on the same set of samples. We used burden tests due to their limited compute requirements, which were considerable for the number of variants and samples tested. Two grouping schemes were specified: the first groups coding variation, and the second groups putative regulatory elements in a relevant cell line. The first used all putative LOF variants as identified by snpEff. Given that some variants were present in both the Mutect2 calls and the germline variant calls, we pruned the LOF variants to exclude variants that were present in both call sets. The second grouping scheme used all variants in regions that were predicted enhancers for CD34 cells that had CADD scores of at least 10. Predicted enhancers were identified by the activity-by-contact model.[45]

Predicting enhancer-gene regulation for TET2.

We used the Activity-by-Contact (ABC) model[46] to predict which enhancers regulate which genes in CD34+ hematopoietic progenitor cells, with minor modifications as follows. Briefly, this model predicts the effect of each putative regulatory element (defined as a DNase peak within 5Mb of a given promoter) by multiplying the Activity of each element (estimated from DNase-seq and H3K27ac ChIP-seq) by its Contact with a target promoter (estimated from Hi-C data). The ABC score of a single element on a gene’s expression is the predicted effect of that element divided by the sum of the predicted effects of all elements for a given gene. We identified putative regulatory elements by using MACS2 to call peaks in DNase-seq data from mobilized CD34+ hematopoietic progenitor cells from the Roadmap Epigenome Project (downloaded from http://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/E050-DNase.tagAlign.gz). Initially, we considered all peaks with p-value < 0.1. To further refine this list, we kept the 100,000 peaks with the highest number of DNase-seq reads. We then resized these peaks to be 500 bp in length centered on the peak summit, merging any overlapping peaks, and removed any peaks overlapping ENCODE “blacklisted regions”[47] (regions of the genome previously observed to accumulate anomalous numbers of reads in epigenetic sequencing experiments; downloaded from https://sites.google.com/site/anshulkundaje/projects/blacklists). To this peak list, we added 500 bp regions centered on the transcription start site of all genes. Any overlapping regions resulting from these additions or extensions were merged. Within each putative regulatory element, we estimated enhancer Activity as the geometric mean of read counts from DNase-seq and H3K27ac ChIP-seq data from the Roadmap Epigenome Project (http://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/E050-DNase.tagAlign.gz, and E050-H3K27ac.tagAlign.gz). 
We estimated enhancer-promoter Contact from the KR-normalized Hi-C contact maps in primary CD34+ cells. We then calculated the effect of each putative enhancer-gene connection by multiplying the Activity and Contact for that element and gene. Dividing the effect of each element by the sum of effects for all elements for a given gene yields the ABC score: ABC(E,G) = (A_E × C_E,G) / Σ_e (A_e × C_e,G), where A_E is the Activity of element E, C_E,G is its Contact with the promoter of gene G, and the sum runs over all putative elements e for gene G. To call predicted enhancer-gene connections, we used a threshold on the ABC score of 0.015. The rs79901204 variant overlapped an enhancer with ABC score of 0.0308 for TET2, which, based on comparison of ABC scores to large-scale enhancer perturbation datasets, corresponds to a positive predictive value of approximately 61%.
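The score calculation described above can be sketched for a single gene as follows. The read counts and contact values are invented for illustration, and the function names are hypothetical. Treating the 0.015 threshold as inclusive is an assumption.

```python
from math import sqrt

ABC_THRESHOLD = 0.015  # calling threshold stated in the text


def activity(dnase_reads: float, h3k27ac_reads: float) -> float:
    """Enhancer Activity: geometric mean of DNase-seq and H3K27ac read counts."""
    return sqrt(dnase_reads * h3k27ac_reads)


def abc_scores(elements: dict) -> dict:
    """Compute ABC scores for one gene.

    `elements` maps an element name to (dnase_reads, h3k27ac_reads, contact),
    where contact is the Hi-C contact with the gene's promoter.
    """
    effects = {
        name: activity(dnase, h3k27ac) * contact
        for name, (dnase, h3k27ac, contact) in elements.items()
    }
    total = sum(effects.values())
    # Each element's share of the summed effect over all elements for the gene
    return {name: (eff / total if total else 0.0) for name, eff in effects.items()}


# Invented example values for one gene's candidate elements
candidates = {
    "enhancer_1": (100.0, 400.0, 0.5),
    "enhancer_2": (25.0, 100.0, 0.1),
    "promoter": (400.0, 400.0, 1.0),
}
scores = abc_scores(candidates)
called = sorted(name for name, s in scores.items() if s >= ABC_THRESHOLD)
print(called)
```

Because the scores for a gene sum to one, a strong promoter or nearby element lowers the share assigned to every other element, which is why the threshold is applied to the normalized score and not to the raw Activity × Contact product.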

Functional Evaluation of TET2 locus

The genomic region containing the risk or non-risk allele of the variant rs79901204 (600bp) was synthesized as gBlocks (IDT Technologies) and cloned into the Firefly luciferase reporter constructs (pGL4.24) using NheI and EcoRV sites. The Firefly constructs (500ng) were co-transfected with pRL-SV40 Renilla luciferase constructs (50ng) into 100,000 K562 cells (ATCC) using Lipofectamine LTX (Invitrogen) according to manufacturer’s protocols. Cells were harvested after 48 hours and the luciferase activity was measured by Dual-Glo Luciferase Assay system (Promega). K562 cell identity was validated using STR analysis. Mycoplasma testing was routinely performed on all cells used in the study and confirmed to test negative.

CRISPR/Cas9 editing of CD34+ human HSPCs

Editing of TET2 enhancer and TET2 coding regions was performed by electroporation of Cas9 Ribonucleoprotein complex (RNP) into CD34+ human HSPCs. CD34+ HSPCs from adult donors obtained from the Fred Hutchinson Cancer Research Center, Seattle, USA were thawed 24 hours prior to electroporation and cultured in HSC expansion conditions throughout the experiment (Stemspan II media with CC100 cytokine cocktail from Stem Cell Technologies and TPO (50ng/ul) and small molecule UM171 (35nM)). The RNP complex was made by mixing Cas9 (50 pmol) and modified sgRNAs from Synthego (100 pmol in total). HSPCs (3.75 × 10^5) resuspended in 20 μl P3 solution were mixed with RNP and transferred to Nucleocuvette strips for electroporation with program DZ-100 (Lonza 4D Nucleofector). TET2 gene expression was measured at 6 days post-electroporation. For enhancer deletion experiments, two guides targeting the 5’ and 3’ ends of the enhancer element were used simultaneously (ENH_sgRNA_1: GGATTCTGTATTCGTCTGTG & ENH_sgRNA_2: TCTACTCACAGGGCCCAATG). For TET2 coding disruption experiments, single guides were used (TET2_CDS1: TGGAGAAAGACGTAACTTCG & TET2_CDS2: TCTGCCCTGAGGTATGCGAT). For the negative control, a guide targeting the AAVS1 site was used (GGGGCCACTAGGGACAGGAT). Editing efficiency of the TET2 CDS and AAVS1 guides was measured by Sanger sequencing followed by TIDE analysis. Editing efficiency of TET2 enhancer deletion was measured by PCR and agarose gel electrophoresis.

Colony-forming unit cell assays

Three days post RNP-electroporation, 500 CD34+ HSPCs were plated in 1ml methylcellulose media (# H4034, Stem Cell Technologies). Primary CFU-C colonies were counted after 14 days. For the colony replating experiments, 2 weeks after the primary plating, the colonies from three plates were pooled, washed with PBS, and the cells were plated in new methylcellulose media at 25,000 cells/ml for an additional 2 weeks.

RNA-Sequencing and eQTL Analysis:

RNA-Sequencing was performed on peripheral blood mononuclear cells from a subset of the MESA cohort. Alignment to the GRCh38 reference genome was done using STAR 2.5.3a.[48] Gene quantification and quality control were performed using RNA-SeQC 1.19.[49] For RNA-SeQC, isoforms were collapsed into a single transcript per gene using the procedure described at https://github.com/broadinstitute/gtex-pipeline/blob/master/gene_model/. Samples that failed the RNA-Seq QC, fingerprinting, or expression-based sex check were filtered out. Further details on the RNA-Seq pipeline are available at https://www.nhlbiwgs.org/sites/default/files/TOPMed_RNAseq_pipeline_COREyr2.pdf. Analysis was performed using samples from 247 African Americans from MESA cohort Exam 1. Transcript expression was converted to TPM units (transcripts per million) and log2-transformed for analysis consistent with the GTEx consortium[50] best practices. Analysis of rs79901204 with TET2 expression was performed using a linear mixed model adjusting for age at blood draw, sex, PC1–10 of population stratification from the WGS data, sequencing batch, and kinship relatedness matrix.

Genome-wide Methylation-QTL analysis of TET2 risk locus

Illumina Methylation EPIC 850K array data interrogating over 850,000 CpG DNA methylation sites was generated at the University of Washington’s Northwest Genomic Center from blood samples collected from African Americans at the Jackson Heart Study baseline exam. Fluorescent signal intensities were preprocessed with the R package minfi[51] using the normal-exponential out-of-band (noob) background correction method with dye-bias normalization. N = 1747 total samples (1097 women and 650 men) remained after severe outliers were identified and removed. 71 individuals were positive for CHIP and 100 were carriers of the rs79901204 variant. Methylation levels at each CpG site were then quantified as β values, defined as the ratio of intensities between methylated (M) and unmethylated (U) signals where β = M/(M+U+100). Values therefore ranged from β = 0 (completely unmethylated) to β = 1 (completely methylated). Batch correction for assay plate position was performed on the β values via ComBat.[52] Relative leukocyte cell counts (CD8+ T-lymphocytes, CD4+ T-lymphocytes, Natural Killer cells, B cells, Monocytes, and Granulocytes) were estimated as previously described by Houseman[53] and Horvath[52]. To investigate methylation in the TET2 locus, a linear mixed effects model was fitted using CpGassoc[53] in R 3.6.0 with rs79901204 as the predictor and the batch-corrected methylation β levels as the dependent variable, adjusting for age, sex, estimated cell counts, the top 10 principal components of genetic ancestry, and CHIP status. A Bonferroni corrected threshold of p = 5.8×10−8 was used to establish statistical significance.
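The β value definition above, β = M/(M+U+100), can be checked with a few lines. The intensities below are invented and the function name is hypothetical. With a positive offset in the denominator, β equals 0 when M is 0 and approaches, but does not reach, 1 as M grows.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation beta value: M / (M + U + offset)."""
    return methylated / (methylated + unmethylated + offset)


# Invented probe intensities: (M, U)
probes = {
    "unmethylated_probe": (0.0, 5000.0),
    "half_methylated_probe": (2450.0, 2450.0),
    "methylated_probe": (9900.0, 0.0),
}
for name, (m, u) in probes.items():
    print(f"{name}: beta = {beta_value(m, u):.3f}")
```

The offset stabilizes the ratio for probes with low total intensity, where M and U are both small and the unadjusted ratio would be noisy.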

Characterizing TOPMed CHIP.

a, There was marked heterogeneity of CHIP clone size as measured by variant allele fraction by CHIP driver gene. Violin plot spanning minimum and maximum values calculated on full dataset (Supplementary Table 3). Sample size for each element in the violin plot is displayed in Fig. 1. b, 90% of individuals with CHIP had only one CHIP driver mutation identified. c, CHIP prevalence with age was highly concordant across sequenced cohorts. CHIP prevalence was estimated from a logistic mixed model with spline-transformed age, sex, and cohort included as predictors. The cohort was included as a random intercept. Sample size for each cohort is listed in Supplementary Table 1. d, CHIP prevalence with age in this study (blue triangles, N=82,807) was highly consistent with previously observed CHIP prevalence (dots represent mean point prevalence and the shaded area represents the 95% confidence interval; NGenovese=12,380; NJaiswal = 17,182; NXie = 2,728).

CHIP age association by mutational mechanism, gene and overlap with somatic chromosomal mosaicism.

a, Cumulative density plot of CHIP incidence with age stratified by single nucleotide variant (SNV) vs frameshift mutations. SNVs were observed in younger individuals than frameshift mutations (N=4,939; two-sided Wilcoxon rank-sum test p=0.01). b, Cumulative density plot of CHIP incidence with age stratified by driver gene. c, 855 elderly WHI individuals (mean age: 70) with both whole genome and array genotyping data available were interrogated for large-scale mosaic chromosomal rearrangements. The two somatic events did not co-occur more than would be expected by chance (hypergeometric p=0.25).

CHIP associates with Blood, Lipid, and Inflammatory traits.

a, CHIP consistently associated with increased Red Cell Distribution Width (RDW). JAK2, SF3B1 and SRSF2 showed driver gene specific effects on blood traits (see Supplementary Table S5). b, CHIP status was not consistently associated with lipid traits, other than JAK2 CHIP which was associated with decreased total cholesterol and a trend towards decreased LDL (see Supplementary Table S6). c, CHIP status is associated with inflammatory markers; however, notable heterogeneity existed across CHIP mutations (see Supplementary Table S7). Associations utilized a two-sided t-test from a multivariate general linear model including age, smoking, race, gender and study center and were not adjusted for multiple comparisons. Sample sizes and exact p-values for each phenotype are listed in Supplementary Tables 5–7.

CHIP passenger somatic mutation spectrum.

a, Singleton mutation counts by nucleotide context in CHIP cases and controls. b, Signature contribution in CHIP cases and controls identified differential enrichment.

CHIP Single variant association regional association plots.

a, TERT locus b, TRIM59/KPNA4 locus c, TET2 locus. Two-sided association testing performed using SAIGE (N=65,405 individuals, see methods)

CHIP transcriptome-wide association study (TWAS) results across 48 tissues identified 7 significant loci.

UTMOST algorithm applied to CHIP genome wide association study results from n=65,405 individuals (see methods). Genomic coordinates listed on x-axis. P-value from generalized Berk-Jones test on Y axis. Multiple hypothesis corrected threshold, p<2.9 × 10−6 displayed as dotted red line.

Tissue-specific results from the top 9 overall UTMOST-significant genes.

UTMOST algorithm applied to CHIP genome wide association study results from n=65,405 individuals. P-value from generalized Berk-Jones test. eQTL z-scores for associations with P<0.05 are displayed in each bar. GTEX eQTL tissue listed on Y-axis.

CRISPR/Cas9 editing efficiency of TET2 Enhancer deletion in primary CD34+ HSPCs.

a, Schematic showing the position of the two sgRNAs used to delete the TET2 enhancer (512bp) containing rs79901204. b, Gel electrophoresis image of PCR products from genomic DNA of edited HSPCs indicating unedited (WT) and deletion bands at sgRNA target site. Percentages of deletion alleles were determined by band intensity and are shown below each lane. The experiment contains 3 biological replicates and was performed once.

rs79901204 associated with genome-wide differential methylation signal.

Methylation quantitative trait association results of the rs79901204 variant with CpG methylation probes identify an altered peripheral leukocyte methylation profile genome-wide in N = 1747 individuals. The strongest signal is at the chr4 TET2 locus. P-values on Y-axis derived from two-sided linear mixed effects model (see methods). To account for multiple hypothesis testing, a Bonferroni threshold of p < 5.8 × 10−8 was used to establish statistical significance.

Sensitivity of CHIP detection at various VAFs across sequencing depths.

A set of 30 samples from a previously published CHIP cohort (Gibbons et al, 2017) were computationally down sampled to 30x, 40x, 50x, 100x and 400x sequencing depth. TOPMed WGS data was typically in the 40x depth range across CHIP genes. WGS data has excellent sensitivity to detect CHIP clones with VAF >10%, and ~50% sensitivity to detect CHIP VAF 5–10%, with minimal ability to detect CHIP clones <5%.
  50 in total

1.  Adjusting batch effects in microarray expression data using empirical Bayes methods.

Authors:  W Evan Johnson; Cheng Li; Ariel Rabinovic
Journal:  Biostatistics       Date:  2006-04-21       Impact factor: 5.899

2.  A common JAK2 haplotype confers susceptibility to myeloproliferative neoplasms.

Authors:  Damla Olcaydu; Ashot Harutyunyan; Roland Jäger; Tiina Berg; Bettina Gisslinger; Ingrid Pabinger; Heinz Gisslinger; Robert Kralovics
Journal:  Nat Genet       Date:  2009-03-15       Impact factor: 38.330

3.  Tet2 loss leads to increased hematopoietic stem cell self-renewal and myeloid transformation.

Authors:  Kelly Moran-Crusio; Linsey Reavie; Alan Shih; Omar Abdel-Wahab; Delphine Ndiaye-Lobry; Camille Lobry; Maria E Figueroa; Aparna Vasanthakumar; Jay Patel; Xinyang Zhao; Fabiana Perna; Suveg Pandey; Jozef Madzo; Chunxiao Song; Qing Dai; Chuan He; Sherif Ibrahim; Miloslav Beran; Jiri Zavadil; Stephen D Nimer; Ari Melnick; Lucy A Godley; Iannis Aifantis; Ross L Levine
Journal:  Cancer Cell       Date:  2011-06-30       Impact factor: 31.743

4.  Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly.

Authors:  Florian Zink; Simon N Stacey; Gudmundur L Norddahl; Michael L Frigge; Olafur T Magnusson; Ingileif Jonsdottir; Thorgeir E Thorgeirsson; Asgeir Sigurdsson; Sigurjon A Gudjonsson; Julius Gudmundsson; Jon G Jonasson; Laufey Tryggvadottir; Thorvaldur Jonsson; Agnar Helgason; Arnaldur Gylfason; Patrick Sulem; Thorunn Rafnar; Unnur Thorsteinsdottir; Daniel F Gudbjartsson; Gisli Masson; Augustine Kong; Kari Stefansson
Journal:  Blood       Date:  2017-05-08       Impact factor: 22.113

5.  An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

Authors:  Goo Jun; Mary Kate Wing; Gonçalo R Abecasis; Hyun Min Kang
Journal:  Genome Res       Date:  2015-04-16       Impact factor: 9.043

6.  Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects.

Authors:  Allison A Regier; Yossi Farjoun; David E Larson; Olga Krasheninina; Hyun Min Kang; Daniel P Howrigan; Bo-Juen Chen; Manisha Kher; Eric Banks; Darren C Ames; Adam C English; Heng Li; Jinchuan Xing; Yeting Zhang; Tara Matise; Goncalo R Abecasis; Will Salerno; Michael C Zody; Benjamin M Neale; Ira M Hall
Journal:  Nat Commun       Date:  2018-10-02       Impact factor: 14.919

7.  Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples.

Authors:  Kristian Cibulskis; Michael S Lawrence; Scott L Carter; Andrey Sivachenko; David Jaffe; Carrie Sougnez; Stacey Gabriel; Matthew Meyerson; Eric S Lander; Gad Getz
Journal:  Nat Biotechnol       Date:  2013-02-10       Impact factor: 54.908

8.  Mosaic loss of chromosome Y is associated with common variation near TCL1A.

Authors:  Weiyin Zhou; Mitchell J Machiela; Neal D Freedman; Nathaniel Rothman; Nuria Malats; Casey Dagnall; Neil Caporaso; Lauren T Teras; Mia M Gaudet; Susan M Gapstur; Victoria L Stevens; Kevin B Jacobs; Joshua Sampson; Demetrius Albanes; Stephanie Weinstein; Jarmo Virtamo; Sonja Berndt; Robert N Hoover; Amanda Black; Debra Silverman; Jonine Figueroa; Montserrat Garcia-Closas; Francisco X Real; Julie Earl; Gaelle Marenne; Benjamin Rodriguez-Santiago; Margaret Karagas; Alison Johnson; Molly Schwenn; Xifeng Wu; Jian Gu; Yuanqing Ye; Amy Hutchinson; Margaret Tucker; Luis A Perez-Jurado; Michael Dean; Meredith Yeager; Stephen J Chanock
Journal:  Nat Genet       Date:  2016-04-11       Impact factor: 38.330

9.  Landscape of somatic mutations in 560 breast cancer whole-genome sequences.

Authors:  Serena Nik-Zainal; Helen Davies; Johan Staaf; Manasa Ramakrishna; Dominik Glodzik; Xueqing Zou; Inigo Martincorena; Ludmil B Alexandrov; Sancha Martin; David C Wedge; Peter Van Loo; Young Seok Ju; Marcel Smid; Arie B Brinkman; Sandro Morganella; Miriam R Aure; Ole Christian Lingjærde; Anita Langerød; Markus Ringnér; Sung-Min Ahn; Sandrine Boyault; Jane E Brock; Annegien Broeks; Adam Butler; Christine Desmedt; Luc Dirix; Serge Dronov; Aquila Fatima; John A Foekens; Moritz Gerstung; Gerrit K J Hooijer; Se Jin Jang; David R Jones; Hyung-Yong Kim; Tari A King; Savitri Krishnamurthy; Hee Jin Lee; Jeong-Yeon Lee; Yilong Li; Stuart McLaren; Andrew Menzies; Ville Mustonen; Sarah O'Meara; Iris Pauporté; Xavier Pivot; Colin A Purdie; Keiran Raine; Kamna Ramakrishnan; F Germán Rodríguez-González; Gilles Romieu; Anieta M Sieuwerts; Peter T Simpson; Rebecca Shepherd; Lucy Stebbings; Olafur A Stefansson; Jon Teague; Stefania Tommasi; Isabelle Treilleux; Gert G Van den Eynden; Peter Vermeulen; Anne Vincent-Salomon; Lucy Yates; Carlos Caldas; Laura van't Veer; Andrew Tutt; Stian Knappskog; Benita Kiat Tee Tan; Jos Jonkers; Åke Borg; Naoto T Ueno; Christos Sotiriou; Alain Viari; P Andrew Futreal; Peter J Campbell; Paul N Span; Steven Van Laere; Sunil R Lakhani; Jorunn E Eyfjord; Alastair M Thompson; Ewan Birney; Hendrik G Stunnenberg; Marc J van de Vijver; John W M Martens; Anne-Lise Børresen-Dale; Andrea L Richardson; Gu Kong; Gilles Thomas; Michael R Stratton
Journal:  Nature       Date:  2016-05-02       Impact factor: 49.962

10.  Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies.

Authors:  Wei Zhou; Jonas B Nielsen; Lars G Fritsche; Rounak Dey; Maiken E Gabrielsen; Brooke N Wolford; Jonathon LeFaive; Peter VandeHaar; Sarah A Gagliano; Aliya Gifford; Lisa A Bastarache; Wei-Qi Wei; Joshua C Denny; Maoxuan Lin; Kristian Hveem; Hyun Min Kang; Goncalo R Abecasis; Cristen J Willer; Seunggeun Lee
Journal:  Nat Genet       Date:  2018-08-13       Impact factor: 38.330

  102 in total

1.  Genome-wide association study identifies novel susceptibility loci for KIT D816V positive mastocytosis.

Authors:  Gabriella Galatà; Andrés C García-Montero; Thomas Kristensen; Ahmed A Z Dawoud; Javier I Muñoz-González; Manja Meggendorfer; Paola Guglielmelli; Yvette Hoade; Ivan Alvarez-Twose; Christian Gieger; Konstantin Strauch; Luigi Ferrucci; Toshiko Tanaka; Stefania Bandinelli; Theresia M Schnurr; Torsten Haferlach; Sigurd Broesby-Olsen; Hanne Vestergaard; Michael Boe Møller; Carsten Bindslev-Jensen; Alessandro M Vannucchi; Alberto Orfao; Deepti Radia; Andreas Reiter; Andrew J Chase; Nicholas C P Cross; William J Tapper
Journal:  Am J Hum Genet       Date:  2021-01-08       Impact factor: 11.025

Review 2.  Liquid biopsy enters the clinic - implementation issues and future challenges.

Authors:  Michail Ignatiadis; George W Sledge; Stefanie S Jeffrey
Journal:  Nat Rev Clin Oncol       Date:  2021-01-20       Impact factor: 66.675

3.  Blood's life history traced through genomic scars.

Authors:  Aswin Sekar; Benjamin L Ebert
Journal:  Nature       Date:  2022-06       Impact factor: 49.962

4.  Clonal hematopoiesis of indeterminate potential in patients with acute coronary syndrome undergoing percutaneous coronary intervention in the absence of traditional risk factors.

Authors:  Zaixin Jiang; Yi Li; Chenghui Yan; Xiaolin Zhang; Quanyu Zhang; Jing Li; Xiaoxiang Tian; Miaohan Qiu; Zhenyang Liang; Sichong Ma; Kun Na; Ziqi Li; Sanbao Chen; Yu Zhao; Zizhao Qi; Xiying Liu; Yaling Han
Journal:  Clin Res Cardiol       Date:  2022-06-15       Impact factor: 5.460

Review 5.  Cardio-onco-metabolism: metabolic remodelling in cardiovascular disease and cancer.

Authors:  Anja Karlstaedt; Javid Moslehi; Rudolf A de Boer
Journal:  Nat Rev Cardiol       Date:  2022-04-19       Impact factor: 32.419

6.  Premature Menopause, Clonal Hematopoiesis, and Coronary Artery Disease in Postmenopausal Women.

Authors:  Michael C Honigberg; Seyedeh M Zekavat; Abhishek Niroula; Gabriel K Griffin; Alexander G Bick; James P Pirruccello; Tetsushi Nakao; Eric A Whitsel; Leslie V Farland; Cecelia Laurie; Charles Kooperberg; JoAnn E Manson; Stacey Gabriel; Peter Libby; Alexander P Reiner; Benjamin L Ebert; Pradeep Natarajan
Journal:  Circulation       Date:  2020-11-09       Impact factor: 29.690

Review 7.  Clonal hematopoiesis of indeterminate potential (CHIP): Linking somatic mutations, hematopoiesis, chronic inflammation and cardiovascular disease.

Authors:  Christopher S Marnell; Alexander Bick; Pradeep Natarajan
Journal:  J Mol Cell Cardiol       Date:  2021-07-21       Impact factor: 5.000

Review 8.  Importance of clonal hematopoiesis in heart failure.

Authors:  Nicholas W Chavkin; Kyung-Duk Min; Kenneth Walsh
Journal:  Trends Cardiovasc Med       Date:  2021-04-20       Impact factor: 6.677

9.  A Single-Cell Analysis of DNMT3A-Mediated Clonal Hematopoiesis in Heart Failure.

Authors:  Megan A Evans; Kenneth Walsh
Journal:  Circ Res       Date:  2021-01-21       Impact factor: 17.367

10.  Germline ATG2B/GSKIP-containing 14q32 duplication predisposes to early clonal hematopoiesis leading to myeloid neoplasms.

Authors:  Jean Pegliasco; Pierre Hirsch; Christophe Marzac; Françoise Isnard; Jean-Côme Meniane; Caroline Deswarte; Philippe Pellet; Céline Lemaitre; Gwendoline Leroy; Graciela Rabadan Moraes; Hélène Guermouche; Barbara Schmaltz-Panneau; Florence Pasquier; Chrystelle Colas; Patrick R Benusiglio; Odile Bera; Jean-Henri Bourhis; Eolia Brissot; Olivier Caron; Samy Chraibi; Pascale Cony-Makhoul; Christine Delaunay-Darivon; Simona Lapusan; Flore Sicre de Fontbrune; Pascal Fuseau; Albert Najman; William Vainchenker; François Delhommeau; Jean-Baptiste Micol; Isabelle Plo; Christine Bellanné-Chantelot
Journal:  Leukemia       Date:  2021-06-25       Impact factor: 11.528

