Loss of KMT2C reprograms the epigenomic landscape in hPSCs resulting in NODAL overexpression and a failure of hemogenic endothelium specification.

Shailendra Maurya¹, Wei Yang², Minori Tamai¹, Qiang Zhang³, Petra Erdmann-Gilmore³, Amelia Bystry¹, Fernanda Martins Rodrigues³, Mark C Valentine¹, Wing H Wong¹, Reid Townsend³, Todd E Druley¹.

Abstract

Germline or somatic variation in the family of KMT2 lysine methyltransferases has been associated with a variety of congenital disorders and cancers. Notably, KMT2A-fusions are prevalent in 70% of infant leukaemias but fail to phenocopy short latency leukaemogenesis in mammalian models, suggesting additional factors are necessary for transformation. Given the lack of additional somatic mutation, the role of epigenetic regulation in cell specification, and our prior results of germline KMT2C variation in infant leukaemia patients, we hypothesized that germline dysfunction of KMT2C altered haematopoietic specification. In isogenic KMT2C KO hPSCs, we found genome-wide differences in histone modifications at active and poised enhancers, leading to gene expression profiles akin to mesendoderm rather than mesoderm highlighted by a significant increase in NODAL expression and WNT inhibition, ultimately resulting in a lack of in vitro hemogenic endothelium specification. These unbiased multi-omic results provide new evidence for germline mechanisms increasing risk of early leukaemogenesis.

Keywords: Histone methyltransferases; chromatin remodelling; development; gene expression; hemogenic endothelium; mesoderm; pluripotency

Year: 2021 PMID: 34304711 PMCID: PMC8865227 DOI: 10.1080/15592294.2021.1954780

Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528

Introduction:

Paediatric cancers typically harbour relatively few somatic mutations and frequently demonstrate developmentally immature phenotypes, suggesting a contribution from germline variation that might result in aberrant tissue development [1]. Our group previously found an enrichment of heterozygous germline missense mutations in KMT2C in infants with leukaemia, compared to healthy controls [2]. This enrichment was independent of the presence of KMT2A fusions, which are the hallmark somatic mutation in infant leukaemia (>75% of cases) and occur in utero [3]. In mammals, somatic mutations of KMT2C and KMT2D are associated with various malignancies [4], with clear evidence for tumour suppressor roles [5,6]. Given the enrichment of KMT2C germline mutations in infant leukaemia and the genome-wide epigenetic changes mediated by KMT2C, we hypothesized that germline KMT2C dysfunction may adversely impact early developmental stages of haematopoiesis, or perhaps mesoderm more broadly. The human COMPASS complexes are comprised of highly conserved proteins from yeast to humans that regulate gene expression through histone modifications [7,8]. Six different lysine methyltransferases (KMT) anchor COMPASS complexes in higher eukaryotes and are categorized into three subgroups based on homologies in amino acid sequence and subunit composition: 1] SET1A (NM_014712), SET1B (NM_015048); 2] MLL1/KMT2A (NM_05933), MLL2/KMT2B (NM_014727); 3] MLL3/KMT2C (NM_170606), MLL4/KMT2D (NM_003482) [9]. While incompletely understood, the literature suggests that paralogs exert non-overlapping and highly specialized functions by regulating the transcription of discrete subsets of genes [9-11]. However, KMT2C and KMT2D are partially redundant in function [7,12], as both proteins play an essential role in mediating monomethylation at histone 3, lysine 4 (H3K4me1), primarily at enhancers [13].
In contrast, recent studies have highlighted other non-redundant and broad-ranging functions [14,15], such as a role for KMT2C-specific transcriptional regulation that is independent of its H3K4me1 activity on enhancers [16]. Other reports describe KMT2C-mediated histone trimethylation (H3K4me3) at promoters [17,18]. With respect to early development, KMT2C knockout (KO) mice die around birth with no apparent morphological abnormalities, while KMT2DKO mice showed early embryonic lethality around E9.5 [19]. Loss of KMT2C in mice also leads to aberrant myelopoiesis, causing myeloid infiltration into lymphoid organs; however, the loss of KMT2C alone was insufficient to drive leukaemia [20]. The role of KMT2C has also been characterized in nuclear receptor functioning [17,21,22], metabolism [23] and circadian rhythms [13,21]. While all SET and KMT2 proteins are epigenetic modifiers, each histone modification is associated with particular regulatory elements and mediates specific functions, enabling complex control over gene transcription [24]. KMT2C and KMT2D are associated with H3K4me1, which is a highly dynamic histone modification and correlates with cell type-specific gene expression profiles, whereas H3K4me3 marks ‘active’ promoters and is more invariant across cell types [reviewed by 19, 24]. H3K4me1, along with H3K27ac, mark ‘active’ enhancers, while the combination of H3K4me1 with H3K27me3 (mediated via polycomb proteins) is a repressive mark associated with ‘poised’ enhancers [25,26]. Currently, no comparative study exists describing the role of KMT2C and its epigenetic regulation in human pluripotent stem cells (hPSC). Pluripotent/precursor cells have multilineage potential, at the precursor stage, cells have pre-marked genomic regions, which cooperate in terminal transcriptional programmes for fate determination [27-30] and may vary in a quantifiable manner upon KMT2C dysfunction. 
We found that KMT2CKO hPSCs have a highly variable epigenetic landscape compared to their isogenic controls and are unable to complete the endothelial to haematopoietic transition in vitro. To interrogate this mechanism, we have performed a multi-omics analysis revealing that KMT2CKO human pluripotent cells have a transcriptional profile closer to mesendoderm with a significant upregulation of NODAL/TGFβ signalling.

Results

KMT2CKO hiPSCs retain pluripotency

Given our observation that infant leukaemia is enriched in heterozygous germline missense mutations in KMT2C [2], we interrogated the role of KMT2C in blood development by creating hPSC models (hiPSC and hESC) with isogenic KMT2C knockouts (Supp Figure 1a) amenable for directed haematopoietic differentiation in vitro. Changes in RNA and protein expression were specific to KMT2C loss (Supp Figure 1b-d). We next asked if the loss of KMT2C altered pluripotency. We observed no morphological differences between wild type and KMT2CKO cells (Supp Figure 2a) as well as comparable immunostaining for pluripotency markers Oct4, Sox2, and Nanog (Supp Figure 2b,c). In addition, teratoma assays were performed and the KMT2CKO line generated a teratoma demonstrating all three germ layers (Supp Figure 3), suggesting that loss of KMT2C does not overtly alter the hiPSCs pluripotent state.

Figure 1.

KMT2CKO pluripotent cells fail to specify hemogenic endothelium. hiPSCs (A panels) and H1 hESCs (B panels) with and without KMT2C were directed through haematopoietic differentiation according to the protocol by Sturgeon et al. (Sturgeon CM, Nat Biotech 2014). In all four cell lines, comparable amounts of CD34+/CD43- cells were specified (A1, A2, B1, B2). To differentiate between arterial, venous and hemogenic endothelium, the CD34+ CD43- cells were further subsorted via CD73 and CD184. HE is CD73-CD184- (boxes). In both KMT2CKO pluripotent lines, there is a failure to specify hemogenic endothelium at levels equivalent to WT. Chi-square analyses found the decrease in hemogenic endothelium to be significant with p-values ≤0.01 for both human iPSCs and ESCs as listed in Experimental Procedures under ‘Directed hematopoietic differentiation.’

Figure 2.

RNA-seq reveals 319 differentially expressed genes in KMT2CKO hiPSCs. (a) Volcano plot for fold change in expression (Y-axis) against the log of the fold change (X-axis). (b) Triplicates of RNA-seq from each cell line show consistent differences in gene expression across 319 genes. (c) PCA analysis reveals that KMT2CKO cells more closely resemble mesendoderm than any of the three germ layers.

Figure 3.

Epigenome tracks showing histone modifications, ATAC-seq peaks and relative RNA expression for NODAL, its regulator CER1 and two of its ligands, BMP4 and WNT3 in WT and KMT2CKO hiPSCs. Each gene demonstrates higher expression in the KMT2CKO compared to WT (boxes in the RNA tracks) while NODAL, BMP4 and WNT3 show the expected decrease in H3K4me3 in the KMT2CKO (boxes in the H3K4me3 tracks).

KMT2CKO human pluripotent cells fail to specify hemogenic endothelium in vitro

To identify potential haematopoietic phenotypes due to KMT2CKO, we next differentiated our hiPSCs to mesoderm and haematopoietic progenitors using published protocols for haematopoietic specification by Keller [31], which activate the Wnt pathway via exogenous application of the GSK3 inhibitor, CHIR99021, to specify definitive haematopoietic progenitors, or use the Wnt inhibitor, IWP2, to enable NODAL/Activin signalling and the specification of primitive haematopoiesis. As shown in Figure 1.A1 and 1.A2, both WT and KMT2CKO hiPSCs generate comparable numbers of CD34+ CD43- progenitors (6.03% and 6.19%, respectively). However, progenitors of the arterial, venous, and haematopoietic system are all CD34+ CD43-. To differentiate these subpopulations, these cells are then subsorted with CD73 and CD184. Hemogenic endothelium (HE) is CD73-CD184-, while venous endothelium is CD73+ CD184- and arterial endothelium is CD73midCD184+ [31]. As shown in Figure 1.A3 and 1.A4, the KMT2CKO hiPSCs failed to specify CD73-CD184- HE compared to WT. To validate this observation, we established the same KMT2CKO in H1 hESCs and took these cells through the same directed differentiation (Figure 1.B1-B4). H1 WT and KMT2CKO hESCs exhibited the same morphology from pluripotency through mesoderm (consistent with our hiPSC teratoma assay results) and embryoid body formation (Supp Figure 4). We observed the same failure of hemogenic endothelium specification (Figure 1, B4, box). This failure was specific to KMT2C, as H1 hESCs transduced with a scrambled gDNA vector did give rise to HE (Supp Fig 5). Furthermore, KMT2CKO H1 ESCs showed a significant lack of colony-forming capacity for all primitive haematopoietic progenitors (Supp Fig 6). In contrast, the subpopulations of venous and arterial endothelium were equivalent between WT and KMT2CKO hiPSCs and hESCs. Given this clear blood-specific phenotype due to KMT2C loss, we sought to identify a mechanism.

Figure 4.

Schematic overview of how the lack of KMT2C-mediated histone modifications in hPSCs alters cell fate specification in vitro.

RNAseq analysis identifies gene expression as similar to mesendoderm

We performed transcriptome analysis to identify gene expression differences and compare against known cell types. Pairwise comparisons identified 319 differentially expressed genes upon KMT2CKO (133 downregulated and 186 upregulated; Supp Table 1A,B) with a false discovery rate (FDR) <0.05 and log of fold change >2 (Figure 2a,b). Genes up/downregulated >10-fold in KMT2CKO compared to WT are listed in Table 1. In KMT2CKO, we observed the highest upregulated expression of NODAL, its ligands (BMP4, WNT3) and its regulators (FST, CER1, MIXL1, LEFTY1). Prior studies on human pluripotent cells have demonstrated that NODAL/TGFb contributes to the maintenance of pluripotency [32,33] and is regulated via OCT4(POU5F1)/SOX2 TF binding and blocks differentiation [34].
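The thresholding step described above can be illustrated with a minimal sketch. It assumes the fold-change cutoff applies to the absolute value of the log fold change (the text reports both up- and downregulated genes), and the base of the logarithm is not stated in the text. The `GeneResult` type, the function name, and the FDR values in the toy input are hypothetical; only the two log fold-change values for NODAL and RPS4Y1 are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log_fc: float  # signed log fold change, KO relative to WT
    fdr: float     # multiple-testing adjusted p-value


def call_differential(results, fdr_cutoff=0.05, log_fc_cutoff=2.0):
    """Split genes into up- and downregulated lists.

    A gene passes when FDR < fdr_cutoff and |log FC| > log_fc_cutoff.
    Using the absolute value is an assumption, not stated in the text.
    """
    up, down = [], []
    for r in results:
        if r.fdr >= fdr_cutoff or abs(r.log_fc) <= log_fc_cutoff:
            continue
        (up if r.log_fc > 0 else down).append(r.gene)
    return up, down


# Toy input: log FC for NODAL and RPS4Y1 from Table 1; all FDR values and
# the GENE_X / GENE_Y rows are hypothetical placeholders.
toy = [
    GeneResult("NODAL", 4.010, 1e-6),
    GeneResult("RPS4Y1", -3.920, 1e-8),
    GeneResult("GENE_X", 0.5, 0.001),  # significant but small effect
    GeneResult("GENE_Y", 3.0, 0.2),    # large effect but not significant
]
up, down = call_differential(toy)
```

Requiring both criteria keeps genes with large but noisy changes, and genes with reproducible but small changes, out of the differentially expressed set.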

Table 1.

Genes whose expression is increased or decreased >10-fold in KMT2C KO

Down- or up-regulated	Rank	Gene	Fold change (FC)	Log(FC)
Down-regulated	1	RPS4Y1	−38.274	−3.920
	2	SPTSSB	−16.984	−3.355
	3	UCMA	−15.773	−2.970
	4	RHOH	−15.440	−3.552
	5	AC009078.1	−14.436	−4.149
	6	SMIM24	−13.782	−4.218
	7	CXCL5	−13.317	−3.577
	8	AP000688.2	−11.542	−2.907
	9	RAMP3	−10.443	−3.008
	10	MAGEH1	−10.297	−10.110
	11	ZNF208	−10.253	−2.930
	12	ZNF790-AS	−10.048
Up-regulated	1	NODAL	22.946	4.010
	2	FST	21.084	3.306
	3	CER1	15.480	5.027
	4	BMP4	15.207	2.998
	5	FOS	14.426	2.847
	6	GLIPR1L1	12.508	2.164
	7	HPGD	12.191	3.240
	8	TSPAN18	11.758	2.290
	9	GAD1	11.664	5.901
	10	MIXL1	11.172	3.566
	11	WNT3	10.734	4.397
	12	MT2A	10.447	2.590
	13	DUSP10	10.062	2.192
	14	USP3	10.038	2.050
	15	CYP26A1	10.012	2.461

Collectively, these differentially expressed genes are part of one or more tightly integrated gene regulatory networks (Supp Fig 7). For comparison with these transcriptome data, RNA-seq read counts from the human embryonic stem cell lines HUES64 or H1 were obtained from ENCODE (https://www.encodeproject.org/) for each of the cellular phenotypes surveyed in the PCA plot shown in Figure 2c. From our transcriptome data for the KMT2CKO and WT lines, the top 100 differentially expressed genes (Supp Table 2) according to adjusted p values (not fold change) were compared against the ENCODE data. From this analysis, we conclude that the gene expression profile of KMT2CKO cells is closest to that of mesendoderm, suggesting that the lack of KMT2C prevents the cells from achieving the gene expression necessary to fully commit to either mesoderm or endoderm.

Table 2.

Active enhancer transcription factor (TF) binding motifs that are significantly enriched in either wild type (WT) or KMT2C KO

Enriched in WT or KO	Rank	Motif for which TF binding site	HOMER P-value	TF family subtype
Enriched in WT	1	OCT4-SOX2-TCF-NANOG	1E-34
Enriched in WT	2	OCT4	1E-34	Homeobox
Enriched in KMT2C KO	1	TAL1/SCL	1E-44	bHLH
	2	AR-halfsite	1E-21	NR
	3	ZFX	1E-21	ZF
	4	REST-NRSF	1E-20	ZF
	5	ASCL1	1E-20	bHLH
	6	ATOH1	1E-18	bHLH
	7	TCF12	1E-18	bHLH
	8	TCF21	1E-15	bHLH
	9	EBF1	1E-14	EBF
	10	NF1-halfsite	1E-14	CTF
	11	FOX-Ebox	1E-14	Forkhead
	12	SMAD4	1E-14	MAD
	13	SMAD2	1E-14	MAD
	14	FOXA1 (GSE26831)	1E-14	Forkhead
	15	FOXA1 (GSE27824)	1E-14	Forkhead
	16	MYOG	1E-14	bHLH
	17	FOXA2	1E-13	Forkhead
	18	REPIN1/AP4	1E-13	bHLH

KMT2CKO alters chromatin accessibility

We next performed ATAC-seq and compared regions of differential chromatin landscape between KMT2CKO and WT hiPSC lines. The absence of KMT2C resulted in a substantial decrease in ATAC-seq peaks at promoter regions (a 19.83% decrease at promoters up to 3 kb of a TSS), consistent with closed chromatin and inaccessible TF binding sites. Further, this decrease was followed by a commensurate increase of 23.04% in ATAC-seq peaks at introns, downstream sequences, and distal intergenic regions (Supp Fig 8A,B) in the KMT2CKO line, suggesting that KMT2C’s known activity at enhancers and distal regulatory elements is essential to first open specific promoters for TF binding and subsequent gene expression. The lack of KMT2C kept enhancers open for TF binding rather than allowing promoter binding sites to open. To identify which genes may be regulated via this mechanism, we next performed TF motif enrichment within these differentially accessible ATAC regions. Open chromatin in WT was significantly enriched for 87 different motifs (Supp Table 3). Of these, binding sites for CTCF and CTCFL were most significantly enriched followed by binding sites for several homeobox (OCT4/POU5F1, OCT6/POU3F1) and high mobility group (SOX2, SOX3, SOX6, SOX10, SOX15) TFs, which includes two of the Yamanaka factors and presumably localized to promoter regions lost upon KMT2CKO (Supp Fig 8A). In contrast, the absence of KMT2C resulted in rearrangement of available TF binding motifs resulting in maintenance of OCT4/POU5F1 and SOX2 binding sites, but a significant increase in binding sites for the Zinc finger proteins of the cerebellum (ZIC) and their reverse complement (‘Unknown ESC element’) along with several ETS TF family (ERG, FLI1, ETV1, ETV2, ETS1) binding sites (Supp Table 4). Lim et al. previously established that Zic proteins maintain pluripotency in murine ESCs under the regulation of Oct4/Pou5f1, Nanog and Sox2 [35], consistent with KMT2CKO cells retaining a pluripotent phenotype.

Table 3.

Poised enhancer transcription factor (TF) binding motifs that are significantly enriched in either wild type (WT) or KMT2C KO

Enriched in WT or KO	Rank	Motif for which TF binding site	HOMER P-value	TF family subtype
Enriched in WT	1	OCT4-SOX2-TCF-NANOG	1E-25
Enriched in KMT2C KO	1	TAL1/SCL	1E-59	bHLH
	2	ATOH1	1E-37	bHLH
	3	FOXL2	1E-32	Forkhead
	4	FOXA2	1E-29	Forkhead
	5	REST-NRSF	1E-27	ZF
	6	TCF12	1E-24	bHLH
	7	FOXA1 (GSE26831)	1E-24	Forkhead
	8	ASCL1	1E-23	bHLH
	9	TCF21	1E-21	bHLH
	10	FOX-Ebox	1E-21	Forkhead
	11	REPIN1/AP4	1E-21	bHLH
	12	MYOG	1E-19	bHLH
	13	‘Unknown ESC element’ (ZIC complementary sequence)	1E-19	ZF
	14	FOXA1 (GSE27824)	1E-19	Forkhead
	15	FOXP1	1E-18	Forkhead
	16	SMAD2	1E-17	MAD
	17	ZIC	1E-17	ZF
	18	OLIG2	1E-16	bHLH
	19	EBF1	1E-16	EBF
	20	ZFX	1E-16	ZF
	21	AR-halfsite	1E-15	NR
	22	NFY	1E-15	NTF
	23	LHX1	1E-15	Homeobox
	24	NeuroD1	1E-14	bHLH
	25	SOX3	1E-13	HMG
	26	NF1-halfsite	1E-13	CTF

The histone modification landscape in hPSCs

ATAC-seq or DNA hypersensitivity mapping does not distinguish between different types of regulatory elements (active enhancers, poised enhancers, and bivalent promoters), is biased, and provides little information on domain level features [36]. Therefore, to characterize these regulatory regions, we performed ChIPmentation for four histone modifications in WT and KMT2CKO hPSCs. The histone modifications defining ‘primed,’ ‘active’ and ‘poised’ enhancers as well as ‘bivalent’ versus ‘active’ promoters are listed in Experimental Procedures and Supplementary Table 5. ‘Active’ enhancers (AE) correlate with tissue-specific gene expression, while ‘poised’ enhancers (PE) correlate with potential gene expression at subsequent developmental stages [25,37,38]. As shown in Supp Fig 9A, the proportion of each histone modification was similarly distributed across the genome of WT and KMT2CKO. However, the absence of KMT2C resulted in a variable distribution of histone modification between cell lines. Given that AE are the primary targets of KMT2C, differences were primarily observed for H3K4me1 marks (80% difference) and H3K27ac marks (87% difference) (Supp Fig 9B,C). In contrast, H3K27me3 marks showed a lesser difference of 56% while H3K4me3 marks at promoters showed the least difference with only 19% of peaks different between wild type and KMT2CKO (Supp Fig 9D,E, respectively). (i) Comparison of the active enhancer landscape: The active enhancer landscape is mainly shaped by the cooperative binding of ubiquitous and cell-type-specific TFs [39]. As KMT2C is known to mediate the H3K4me1 at enhancers, we first compared the active enhancer landscape between wild type and KMT2CKO cells. As shown in the Venn diagram of Supp Fig 10A, there were a total of 29,161 active enhancer peaks called. Of these, only 2,231 (7.7%) were independent of KMT2CKO and shared between both lines. 
In contrast, 19,311 active enhancer peaks were specific to the KMT2CKO line, while 7,619 were specific to wild type. Supp Fig 10B,C shows the results of GO term analyses for the AE specific to WT versus KMT2CKO, respectively. In general, these results suggest movement away from cellular differentiation (WT) towards more functions associated with GTPase, RAS activity along with cellular junction organization and function (KMT2CKO). To identify the putative functions of these KMT2CKO-specific active enhancer subgroups, we performed TF binding motif analysis (Table 2). In wild type, consistent with the pluripotent status of the cells and the available TF binding motif analysis from ATAC-seq, the only significantly enriched active enhancer motifs were for the cooperative binding site for OCT4/POU5F1-SOX2-TCF-NANOG and OCT4/POU5F1 alone. In contrast, the absence of KMT2C demonstrated a loss of open OCT4 TF binding sites and a significant enrichment of 18 different active enhancer TF binding motifs. (ii) Comparison of poised enhancers: Poised enhancers (PE), marked by H3K4me1 and H3K27me3, are thought to be incapable of driving gene expression when cells are in a pluripotent state [26]. However, the loss of H3K27me3, coupled with the acquisition of H3K27 acetylation (H3K27ac), endows these enhancers with gene regulatory functions and converts PE to AE. Given this background, we identified overlapping H3K4me1 and H3K27me3 peaks to identify and compare PE marks between wild type and KMT2CKO pluripotent cells. In wild type hiPSCs, we identified 4,134 unique overlapping H3K4me1 and H3K27me3 marks compared to a nearly 4.5-fold increase of 18,593 unique overlapping marks in KMT2CKO hiPSCs. The absence of KMT2C demonstrated the expected increase in poised enhancers (Supp Fig 11A) with considerable differences in genomic loci. 
With respect to putative functional differences, GO terms associated with the PE marks specific to each hiPSC line generally changed from developmental functions in wild type to ion transport functions in KMT2CKO (Supp Fig 11B,C), supporting the lack of ion transport function by KMT2C and its broader role in guiding development. On the global level, the substantial increase in PE following KMT2C knockout suggested alterations in gene regulatory networks involved in differentiation and cell specification. These genome signatures are binding sites for pioneer transcription factors and lineage determining transcription factors [40-43]. To identify TF binding sites within these differential regions, we applied motif analyses for wild type-specific and KMT2CKO-specific PE signature (Table 3). In wild type, consistent with our results from ATAC-seq and AE ChIPmentation, the only significantly enriched PE motif was the cooperative binding site for OCT4/POU5F1-SOX2-TCF-NANOG (Table 3). In contrast, the absence of KMT2C demonstrated a loss of PE binding sites for OCT4 and a significant enrichment of 26 different PE TF binding motifs. Specifically, we noted that these motifs consisted primarily of binding sites for regulators of classic WNT1/β-catenin signalling (TAL1/SCL, ATOH1, TCF12, ASCL1) along with multiple forkhead proteins (FOXL2, FOXA2, FOXA1, FOX:Ebox, FOXP1). As pioneer TFs at PE are known to unmask chromatin domains during development [44], this finding is consistent with those of Wang et al., who found FOX TFs bound to PE during specification of hESC-derived endodermal lineage intermediates [45], suggesting that knocking out KMT2C may result in human pluripotent cells being primed for endodermal fate specification.
(iii) Characterizing bivalent promoters: Pluripotent cells are enriched for promoters harbouring the activating H3K4me3 mark as well as the repressive H3K27me3 mark, a state called ‘bivalency.’ While bivalent promoters are not unique to pluripotent cells, they are enriched in these cell types, mainly marking developmental and lineage-specific genes, which are generally static but can be rapidly activated or repressed. While KMT2C is not known to have a direct role in establishing bivalency, a few studies have observed that KMT2C maintains H3K4me3 marks [17,18]. We identified a total of 5,568 bivalent promoters in WT and KMT2CKO cells (Supp Fig 12) without significant differences in TF binding motifs, which is consistent with KMT2C not having a clear role in establishing bivalency but more of an impact at distal regulatory elements.
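The peak-overlap logic used in this section to define enhancer classes can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes peaks are half-open `(chrom, start, end)` intervals, that any 1 bp overlap counts, that ‘primed’ means H3K4me1 without either H3K27 mark (the exact definitions are given in Experimental Procedures, which is not reproduced here), and that peaks carrying both H3K27 marks are set aside as ‘ambiguous’. All function names and the toy peaks are hypothetical. The final lines re-derive the proportions quoted in the text from the reported counts.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_enhancers(h3k4me1, h3k27ac, h3k27me3):
    """Label each H3K4me1 peak by the H3K27 marks that overlap it.

    active    = H3K4me1 + H3K27ac
    poised    = H3K4me1 + H3K27me3
    primed    = H3K4me1 alone (assumed definition)
    ambiguous = both H3K27 marks overlap (assumed handling)
    """
    labels = {}
    for peak in h3k4me1:
        has_ac = any(overlaps(peak, p) for p in h3k27ac)
        has_me3 = any(overlaps(peak, p) for p in h3k27me3)
        if has_ac and has_me3:
            labels[peak] = "ambiguous"
        elif has_ac:
            labels[peak] = "active"
        elif has_me3:
            labels[peak] = "poised"
        else:
            labels[peak] = "primed"
    return labels


# Hypothetical toy peaks, for illustration only.
k4me1 = [("chr1", 100, 200), ("chr1", 500, 600),
         ("chr1", 900, 1000), ("chr2", 100, 200)]
k27ac = [("chr1", 150, 250), ("chr2", 150, 160)]
k27me3 = [("chr1", 550, 650), ("chr2", 190, 300)]
labels = classify_enhancers(k4me1, k27ac, k27me3)

# Counts reported in the text.
ae_shared, ae_ko_only, ae_wt_only = 2231, 19311, 7619
ae_total = ae_shared + ae_ko_only + ae_wt_only
shared_pct = 100 * ae_shared / ae_total  # share of AE peaks common to both lines
pe_fold = 18593 / 4134                   # KO vs WT poised-enhancer count
```

The quadratic scan is adequate for a toy example; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice.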

Enhancers act synergistically to promote gene expression:

We next sought to investigate the relationship between enhancer status and gene expression. We assigned each identified enhancer to the nearest promoter, allowing a maximal distance of 500 kb between enhancer and target promoter. As expected, genes associated with AE show significantly higher average expression levels, followed by those associated with primed enhancers, then poised enhancers, and finally genes not associated with a marked enhancer (Supp Fig 13A,B; Supp Tables 6,7), regardless of WT or KMT2CKO. Additionally, the more AE associated with a given gene, the higher that gene's expression level (Supp Fig 13 C,D; Supp Tables 8,9). These results support our interpretation that differences in active and poised enhancers are correlated with gene expression differences and that a larger number of active enhancers associated with a given gene results in significantly higher overall expression. This is further visualized in Figure 3 showing aligned ChIPmentation, ATAC-seq and RNA-seq peaks for NODAL, its regulator CER1 and two ligands, BMP4 and WNT3 compared between WT and knockout. All four genes have significant mRNA overexpression in KMT2CKO compared to WT along with a decrease in trimethylation of H3K4 and, to a lesser extent, H3K27. Supp Fig 7 shows the interconnected gene regulatory network(s) that include these four genes along with 30 others. Similar epigenome browser tracks for the remaining genes are shown in Supp Fig 14.
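The enhancer-to-promoter assignment described in this section can be sketched as a nearest-neighbour search with a distance cap. The sketch below is illustrative only: it assumes distance is measured from the enhancer midpoint to the transcription start site (the text does not specify the reference points), and all names and coordinates are hypothetical.

```python
import bisect
from collections import Counter, defaultdict

MAX_DISTANCE = 500_000  # 500 kb cap stated in the text


def assign_enhancers(enhancers, promoters, max_distance=MAX_DISTANCE):
    """Map each enhancer to its nearest promoter on the same chromosome.

    enhancers: list of (chrom, midpoint)
    promoters: list of (chrom, tss, gene)
    Returns {(chrom, midpoint): gene or None}; None means no promoter
    lies within max_distance.
    """
    by_chrom = defaultdict(list)
    for chrom, tss, gene in promoters:
        by_chrom[chrom].append((tss, gene))
    for sites in by_chrom.values():
        sites.sort()

    assignment = {}
    for chrom, mid in enhancers:
        sites = by_chrom.get(chrom, [])
        i = bisect.bisect_left(sites, (mid,))
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = sites[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda s: abs(s[0] - mid), default=None)
        if best is not None and abs(best[0] - mid) <= max_distance:
            assignment[(chrom, mid)] = best[1]
        else:
            assignment[(chrom, mid)] = None
    return assignment


# Hypothetical coordinates, for illustration only.
promoters = [("chr1", 1_000_000, "GENE_A"), ("chr1", 2_000_000, "GENE_B")]
enhancers = [("chr1", 1_200_000), ("chr1", 1_700_000),
             ("chr1", 3_000_000), ("chr2", 500)]
result = assign_enhancers(enhancers, promoters)

# Number of enhancers assigned to each gene, the quantity compared
# against expression level in the text.
enhancers_per_gene = Counter(g for g in result.values() if g is not None)
```

Nearest-promoter assignment is a heuristic: it cannot capture enhancers that skip the closest gene, so the per-gene counts are best read as approximate.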

Proteome and phospho-proteome suggest heterogeneity between WT and KMT2CKO

Since proteins are the ultimate functional effectors of activity in biological systems, we sought to correlate our epigenetic and expression results with an unbiased survey of global proteomic and phospho-proteomic expression. KMT2CKO displayed consistent changes in the basal proteomic and phosphorylation status of proteins (Supp Fig 15A-D). Among 678 differentially expressed proteins, 331 proteins were upregulated, while 347 proteins were downregulated (Supp Fig 15B, Supp Table 10A,B). Only 299 phospho-proteins showed differential expression: 134 phosphoproteins were upregulated, and 165 were downregulated (Supp Fig 15D, Supp Table 11A,B). As expected, KMT2C was one of the most underexpressed proteins in both samples, and in accordance with RNA-seq downregulation, one of the most downregulated proteins was RPS4Y1 (logFC −2.02; P-value 7.2 × 10^−9) while, curiously, the most upregulated proteins were several metallothioneins (MT1A/B/E/F/G/H/M/X) along with LEFTY1 (logFC 0.93; P-value 5.0 × 10^−5), another NODAL regulator. To further compare our existing datasets (transcriptome and proteome), we correlated the normalized log10-transformed transcriptome and proteome expression values to each other. This showed a significant (p < 2.2e-16), but relatively weak, Pearson correlation coefficient of 0.18 (Supp Fig 15E). This weak correlation between transcript and protein levels is consistent with our observation that KMT2CKO does not impact the cells’ pluripotent status but impacts their ability to differentiate. Overall, the types of over- and underrepresented gene set terms were similar between transcriptome, proteome and phospho-proteome (Supp Figure 9A,B) and overlapped with RNAseq GO analysis, suggesting that the transcriptome/proteome co-processing of our samples did not induce any significant biases in terms of functional complexity.
To test for possible large-scale systematic compositional biases caused by KMT2C deletion, we performed gene set enrichment analysis via GAGE methodology from our differentially expressed proteome data with KEGG and gene ontology biological processes pathway data for WNT and NODAL pathways (Supp Table 12). Across three biological processes and the WNT pathway as a whole, the p-value for each analysis was ≤0.05, suggesting a pathway-specific enrichment for a change of function. In sum, these results suggest that deletion of KMT2C reprogrammes the cis-regulatory elements that may change the actual binding position of some master regulator (e.g., TAL1) at these cis-regulatory elements, thereby impacting terminal differentiation.
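The transcript-to-protein comparison reported in this section (Pearson correlation of log10-transformed expression values) can be sketched as below. This is a minimal illustration under stated assumptions: genes are matched by name, genes with non-positive values are dropped before the log transform, and normalization is taken as already done. The function names and the toy abundances are hypothetical and do not reproduce the study's data or its reported coefficient of 0.18.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log10_correlation(rna, protein):
    """Correlate log10 abundances over genes present in both datasets.

    rna, protein: dicts mapping gene name -> normalized abundance.
    Genes with non-positive values are skipped (assumed handling).
    """
    shared = sorted(
        g for g in rna.keys() & protein.keys()
        if rna[g] > 0 and protein[g] > 0
    )
    xs = [math.log10(rna[g]) for g in shared]
    ys = [math.log10(protein[g]) for g in shared]
    return pearson(xs, ys)


# Hypothetical abundances chosen so the log-scale relationship is exactly linear.
rna = {"A": 10, "B": 100, "C": 1000}
protein = {"A": 1, "B": 10, "C": 100, "D": 5}  # "D" has no transcript value
r = log10_correlation(rna, protein)
```

Working on the log scale keeps a handful of highly abundant transcripts or proteins from dominating the coefficient.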

Discussion

The canonical WNT/β-catenin pathway is essential for multiple developmental milestones including haematopoietic specification [31,46] and aberrant Wnt signalling has been associated with subtypes of leukaemogenesis [reviewed in 47]. Physiologic Wnt signalling can also be regulated by KMT2A [48], but in KMT2A-rearranged leukaemias, which comprise more than 70% of infant leukaemia cases [49], Wnt signalling is fully dependent upon KMT2A [50]. Despite decades of model organism research on KMT2A-rearranged leukaemias, these fusions alone, when expressed at physiologic levels without other mutations, very rarely (if ever) induce a neo/perinatal leukaemia in murine models that phenocopies human infant leukaemia [51-53], suggesting additional factors were required for infant leukaemogenesis. To that end, we previously examined germline exomes from infant leukaemia patients and found a significant enrichment of missense germline variants in multiple COMPASS complex members, particularly KMT2C [2]. Against this context, we hypothesized that the missense germline KMT2C mutations in infant leukaemia skews normal blood development from the very start of mesoderm differentiation such that the resulting haematopoietic progenitors are more easily transformed with the addition of a somatic driver, such as a KMT2A fusion. To explore the role of KMT2C in pluripotent cells, we focused on a multi-omic, proteomic and functional study in hPSCs and found that the absence of KMT2C does not impair the pluripotent phenotype, but does result in a heavily altered epigenetic landscape leading to altered gene and protein expression and ultimately, a failure of hemogenic endothelium or primitive haematopoietic specification in vitro, leaving the resulting cells with a transcriptional profile closer to mesendoderm than mesoderm (Figures 2,4). 
Our ATAC-seq results for KMT2CKO demonstrated a global reduction in open chromatin at promoters that bind chromatin topology regulators CTCF and CTCFL/BORIS. Enhancer-promoter interactions are mediated by architectural proteins, such as CTCF, MEDIATOR, and COHESIN, which regulate the organization of topologically associated domains (TADs) in a cell- and gene-specific manner during development [54-56]. More specifically, CTCF is required for proper expression of Hox gene clusters during differentiation [55]. CTCF deletion alters chromatin structure and subsequent transcription of myeloid-specific factors [57] ultimately driving aberrant HOX gene transcription in AML [58]. Without KMT2C, open chromatin shifts to accessibility for binding sites associated with the zinc finger of the cerebellum (ZIC) family of C2H2 zinc finger TFs. This is consistent with a prior report showing that KMT2C/D loss leads to a global reduction of chromatin interactions at enhancers in ES cells [59]. The fact that we do not find a commensurate increase of ZIC or decrease in CTCF/CTCFL mRNA/protein expression suggests that KMT2C (potentially as part of its COMPASS complex) mediates histone modifications that alter the chromatin landscape but does not directly regulate transcription or translation of either gene family. In contrast, murine models of Zic proteins have demonstrated that these transcription factors are essential for maintaining pluripotency of ES cells [60], but can also inhibit canonical Wnt/β-catenin signalling in vitro and in vivo [61]. Consistent with our results, Zic2 was previously found to be enriched at AE and PE in ES cells and is essential for chromatin accessibility and regulation of transcriptional programmes during development [60,62]. Furthermore, ZIC2 was shown to interact with SMAD2/SMAD3 and cause early developmental NODAL-dependent transcriptional alterations at FOXA2 targets [63].
Deregulation of ZIC proteins has been associated with at least 20 different cancer types [reviewed in 64]. In some cases, ZIC family members are overexpressed, while in others, DNA methylation results in a lack of ZIC protein expression. The end result is disruption of either the canonical Wnt/β-catenin, TGFβ, or sonic hedgehog pathways in different cell types at different developmental stages, thereby contributing to transformation. We next annotated cis-regulatory elements modified via KMT2C by using histone modification patterns for AE and PE, which endow cells with the ability to interpret environmental cues correctly [37,65]. Transcription factor binding at active enhancers is a key determinant of tissue-specific gene expression [66-68], an essential step in executing developmental decisions with the temporal and spatial control that embryonic development and correct fate decisions require. We reasoned that uncovering the functionally relevant TFs associated with developmentally dynamic enhancers would identify lineage-specific regulators controlling haematopoietic specification. Motif analysis of the KMT2C-dependent changes in AE and PE further complemented the shift towards open chromatin at ZIC binding sites, as we noted a shift away from OCT4/POU5F1 enhancers towards enhancers associated with NODAL and TGFβ signalling. Of the KMT2CKO-specific enriched AE and PE (Tables 3B, 4B), nearly all are associated with NODAL/TGFβ signalling, specifically ATOH1, ASCL1, TCF12, TAL1/SCL, SMAD2, and multiple FOX genes, along with the same ZIC and its complementary TF binding motif observed in our ATAC-seq results. This strongly implies that the lack of KMT2C has resulted in these pluripotent cells turning off WNT/β-catenin in favour of NODAL/TGFβ signalling.
This interpretation was further supported by transcriptome sequencing, where the loss of KMT2C resulted in the largest fold expression increase in NODAL itself, along with additional effectors: FST, BMP4, CER1, GAD1, MIXL1, and ligands WNT3 and BMP4. NODAL has been shown to be necessary for maintaining pluripotency in hESCs [69] as well as inhibiting mesoderm differentiation [70] and promoting endoderm differentiation [71]. Indeed, the KMT2CKO cells demonstrated a pluripotent phenotype, behaved identically to their isogenic WT counterparts in vitro from day 0 to day 3, and then failed to specify not only definitive hemogenic endothelium, all consistent with the increased NODAL expression, but also primitive haematopoietic progenitors, which require NODAL/Activin signalling, suggesting that additional effectors remain inactive in the KMT2CKO cells. In summary, somatic mutations in KMT2C have been implicated in various cancers, and germline missense mutations have been associated with infant leukaemia, in which transformation traces to in utero development [3]. Given the relationship between paediatric cancers and aberrant developmental mechanisms, we sought to interrogate the role of KMT2C starting at pluripotency rather than focusing on transformation of terminally differentiated haematopoietic cells. While this multi-omic and functional assessment of KMT2C in human pluripotent cells is unique and expansive, translation to human cancer phenotypes should be interpreted cautiously, as germline or somatic KMT2C mutations are heterozygous and almost always missense, suggesting a hypomorphic, rather than null, impact on protein function that likely alters cellular behaviours in more subtle ways.
With respect to germline variability, we postulate that such variation does not drive transformation, but merely creates a more easily transformed cell type such that when a stochastic driver mutation (e.g., KMT2A-fusion) is present at a critical developmental stage, transformation occurs. This model is consistent with other studies of aberrant development and early oncogenesis [1]. Future work will further explore this hypothesis with additional functional studies in vitro and in vivo.

Experimental procedures

Wild type and isogenic KMT2CKO hPSCs

Reprogrammed human induced pluripotent stem cells (hiPSC) were generated from white blood cells collected from a healthy human male by the Washington University Genome Engineering and iPSC Core (GEiC). From this control line, the GEiC generated an isogenic, bi-allelic KMT2CKO line via CRISPR-guided non-homologous end-joining using a guide RNA targeting exon 3, resulting in truncation of the remaining 56 exons (Supp Figure 1a). The same process was also used for human embryonic stem cells (H1) (Wisconsin Stem Cell Bank). Supp Figure 1b-d demonstrates that the knockout of KMT2C was specific, compared to its paralogs, in hPSCs.

Teratoma assays

Teratoma assays were performed by the Washington University Mouse Genetics Core in the Division of Comparative Medicine (DCM) using the protocol published by Nelakanti [72]. Briefly, 1 × 10⁶ cells diluted in 50 mL of Matrigel™ were injected bilaterally into the gastrocnemius of two NOD-SCID IL2Rgammanull (NSG) mice. After eight weeks, the mice were sacrificed, and the muscles harvested for tumours. Tumours grew in only one of the two mice and were evaluated independently by veterinary pathologists at the DCM.

Directed haematopoietic differentiation

Directed haematopoietic differentiation of human pluripotent stem cells was performed as published by Sturgeon and colleagues [31]. Statistical analyses of the flow-sorted cells shown in Figure 1 were performed by Chi-square analysis, as documented in the table below. For hiPSCs, the Chi-square statistic is 297.37 (p < 0.00001); for H1 ESCs, the Chi-square statistic is 29.86 (p < 0.00001).

RNA sequencing

Cells were cultured to 70% confluency and then washed once with PBS, trypsinized and pelleted by centrifugation at 500 g for 10 min at 4°C. Cell pellets were transferred to the Genome Technology Access Center (GTAC) at Washington University for mRNA selection, sequencing library preparation, and sequencing on the Illumina NextSeq500 platform.

RNA-seq analysis

RNA-seq reads were aligned to the Ensembl release 72 primary assemblies with STAR version 2.5.1a [73]. Gene counts were derived from the number of uniquely aligned unambiguous reads by Subread:featureCount version 1.4.6-p5 [74]. All gene counts were then imported into the R/Bioconductor package EdgeR [75], and TMM normalization size factors were calculated to adjust samples for differences in library size. Ribosomal genes and genes not expressed in the smallest group size minus one sample greater than one count-per-million were excluded from further analysis. The TMM size factors and the matrix of counts were then imported into the R/Bioconductor package Limma [76]. Weighted likelihoods based on the observed mean-variance relationship of every gene and sample were then calculated for all samples with the voomWithQualityWeights function [77]. The performance of all genes was assessed with plots of the residual standard deviation of every gene against their average log-count, with a robustly fitted trend line of the residuals. Differential expression analysis was then performed to analyse for differences between conditions, and the results were filtered for only those genes with Benjamini-Hochberg FDR adjusted p-values ≤0.05.
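The expression filter described above can be sketched in a few lines of Python. This is not the EdgeR code used in the study: the function names `counts_per_million` and `keep_gene`, the toy counts, the use of raw library sizes without TMM scaling, and the reading of the rule as "keep a gene if it exceeds one count-per-million in at least (smallest group size minus one) samples" are all assumptions made for illustration.

```python
def counts_per_million(counts, library_sizes):
    """Convert raw counts for one gene to CPM, one value per sample."""
    return [c / n * 1e6 for c, n in zip(counts, library_sizes)]


def keep_gene(counts, library_sizes, smallest_group_size, min_cpm=1.0):
    """Keep a gene if it exceeds min_cpm in at least (smallest group size - 1) samples."""
    cpm = counts_per_million(counts, library_sizes)
    n_expressed = sum(1 for value in cpm if value > min_cpm)
    return n_expressed >= smallest_group_size - 1


# Toy data (hypothetical): 4 samples, 2 per group, 10 million reads each.
library_sizes = [10_000_000] * 4
expressed = keep_gene([50, 0, 3, 0], library_sizes, smallest_group_size=2)
silent = keep_gene([5, 0, 3, 0], library_sizes, smallest_group_size=2)
```

In the toy data, the first gene passes because one sample reaches 5 CPM, while the second never exceeds 1 CPM and is dropped.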

ChIPmentation

ChIPmentation was carried out as previously described [78] with minor modifications. Briefly, cells were washed once with PBS followed by fixation using 1% formaldehyde in up to 1 ml PBS for 10 min at room temperature. Glycine was used to stop the reaction. Cells were collected at 500 g for 10 min at 4°C (subsequent work was performed in a 4°C cold room and used ice-cold buffers unless otherwise specified) and washed once with 150 µl ice-cold PBS supplemented with protease inhibitors (Thermo Scientific #A32955). After that, fixed cells were either stored at −80°C for future experiments or lysed in sonication buffer supplemented with a protease inhibitor, as described, and then sonicated in a Covaris microtube (AFA fibre crimp-cap) with a Covaris E220 sonicator using the following settings: Peak incident power: 200; Duty factor: 10%; Cycles per burst: 200; Treatment time: 150 seconds (or until the DNA fragments’ size is in the range of 250–700 bp). Following sonication, equilibration buffer was added into the lysate. Lysates were centrifuged at 14,000 RPM at 4°C for 10 minutes. Supernatant containing the sonicated chromatin was transferred into a 1.5 ml DNA LoBind Eppendorf tube for immunoprecipitation. For each immunoprecipitation, 20 µl magnetic Dynabeads™ Protein A (Life Technologies) were washed twice and re-suspended in 2X PBS supplemented with 0.1% BSA. For each immunoprecipitation, 1 µg of the appropriate antibody (described below) was added and bound to beads by rotating at least 6 hours at 4°C. Blocked antibody and conjugated beads were then placed on a DYNAL Invitrogen magnetic bead separator, supernatant was aspirated, and the sonicated lysate was added to the beads followed by overnight incubation at 4°C on a rotator. Beads were washed as described in the original protocol at 4°C (in a cold room) with various buffers as provided in the protocol.
Beads were then re-suspended in 25 µl tagmentation mix (19 µl tagmentation buffer + 1 µl Tagment DNA enzyme supplemented with 5 µl nuclease free water) from the Nextera DNA Sample Prep kit (Illumina) and incubated at 37°C for 10 minutes in a thermocycler. The beads were washed with appropriate buffer (150 µl) per the protocol and then transferred into a 1.5 mL microfuge tube. Supernatant was immediately aspirated, leaving beads attached to the wall of the tube while in place on the magnetic separator. Bead pellets were then resuspended with 45 µl elution buffer supplemented with proteinase K (NEB) and incubated for 1 hour at 55°C and then 8–10 hours at 65°C to revert formaldehyde cross-linking. After placing on the DYNAL Invitrogen magnetic bead separator, the supernatant was transferred to a clean microfuge tube, and the beads were discarded. Finally, DNA was purified via MinElute kit (Qiagen). From this purified DNA, qPCR was performed as described in the protocol to estimate the optimum number of enrichment cycles. The final enrichment of the libraries was then performed according to protocol and subsequently purified using AMPure XP beads followed by a size selection to recover libraries with a fragment length of 250–400 bp prior to sequencing.

Antibodies used in ChIPmentation

ChIP antibodies were purchased from Diagenode: H3K4me3 (#C15410003), H3K4me1 (#C15410037), H3K27ac (#C15410174), H3K27me3 (#C15410069), Rabbit IgG (#C15410206).

ChIPmentation analysis

Biological replicates were prepared for each histone modification – H3K4me1, H3K4me3, H3K27ac, and H3K27me3 – in both WT and KMT2CKO hiPSC along with two replicates of rabbit IgG as a negative control. Raw sequence reads were processed using the ENCODE Transcription factor and Histone ChIP-Seq processing pipeline (http://github.com/ENCODE-DCC/chip-seq-pipeline2, accessed 27 February 2019). The pipeline filtered and mapped the reads to hg19, validated the quality of the data, and generated fold change signal tracks over the control samples using MACS2. Peaks were further called using epic2 [79] with a false discovery rate (FDR) of 0.05, enabling both broad and narrow histone mark peaks to be efficiently identified. Motif searches around enhancer signal and bivalent promoter signal were conducted using homer v4.8.3 (http://homer.ucsd.edu/homer/index.html). Enrichment of differential peak-associated genes in Gene Ontology terms (http://geneontology.org) was identified using the Bioconductor R package clusterProfiler v3.12.0 [80].

ATAC-seq library preparation, sequencing, and analysis

To map chromatin accessibility, we used the Assay for Transposase Accessible Chromatin (ATAC-seq) protocol optimized by Semenkovich [81]. Sequence reads were demultiplexed and mapped using bowtie (http://bowtie-bio.sourceforge.net/index.shtml) to hg19. Peaks were identified, and signal tracks were generated with MACS2 using the ENCODE ATAC-seq pipeline (http://github.com/ENCODE-DCC/atac-seq-pipeline, accessed on 13 May 2019). Consistency among replicates was assessed based on Irreproducible Discovery Rates (IDR). Differential binding peaks between KMT2CKO and wild type were identified with the R package DiffBind using an FDR <0.05. Signal tracks of fold enrichment were visualized with the WashU Epigenome browser (https://epigenomegateway.wustl.edu).

Definition of enhancers and promoters

As listed in Supp Table 5, promoters were defined as non-overlapping −1kb and +1kb intervals around transcription start sites (TSS). Enhancers were defined by H3K4me1 peaks and were assigned to their closest promoter, allowing for a maximum distance of 500 kb. Active enhancers were those overlapped with H3K27ac peaks. Poised and primed enhancers were assigned to promoters after excluding those associated with any active enhancers. Poised enhancers overlapped with H3K27me3, whereas primed enhancers did not. Promoters were defined by H3K4me3 peaks within 1kb of TSS. Bivalent promoters were defined by the overlapping peak of H3K4me3 and H3K27me3.
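The enhancer classes defined above amount to a small decision rule over peak overlaps, sketched here in Python. The function names `overlaps` and `classify_enhancer`, the half-open interval convention, the toy coordinates, and the choice to give H3K27ac precedence when both marks overlap are assumptions for illustration. Promoter assignment and the 500 kb distance limit are omitted.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_enhancer(h3k4me1_peak, h3k27ac_peaks, h3k27me3_peaks):
    """Label an H3K4me1 peak as an active, poised, or primed enhancer."""
    if any(overlaps(h3k4me1_peak, p) for p in h3k27ac_peaks):
        return "active"   # H3K4me1 + H3K27ac
    if any(overlaps(h3k4me1_peak, p) for p in h3k27me3_peaks):
        return "poised"   # H3K4me1 + H3K27me3
    return "primed"       # H3K4me1 only


# Toy peaks (hypothetical coordinates).
h3k27ac = [("chr1", 1000, 2000)]
h3k27me3 = [("chr1", 5000, 6000)]
labels = [
    classify_enhancer(("chr1", 1500, 2500), h3k27ac, h3k27me3),
    classify_enhancer(("chr1", 5500, 5800), h3k27ac, h3k27me3),
    classify_enhancer(("chr2", 1500, 2500), h3k27ac, h3k27me3),
]
```

The third toy peak has the same coordinates as the first but sits on a different chromosome, so it overlaps neither mark and is labelled primed.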

Peptide preparation, isobaric labelling, and off-line fractionation for LC-MS

The frozen cell pellets (~10 million cells) were solubilized [82] in 0.5 mL of 8 M urea buffer (8 M urea, 75 mM NaCl, 50 mM Tris (pH 8.0), 1 mM EDTA, 2 µg/mL aprotinin, 10 µg/mL leupeptin, 1 mM PMSF, 1:100 vol/vol Phosphatase Inhibitor Cocktail 2, 1:100 vol/vol Phosphatase Inhibitor Cocktail 3, 10 mM NaF) with ultrasonication using a Covaris S220X sonicator (Peak Incident Power: 150 W, Duty Factor: 10%, cycles/burst: 500, time: 8 min, temp: 4°C). The protein content was determined by the bicinchoninic acid (BCA) method as shown in Supp Table 13. For the reference pool, 60 µg from each sample was combined and 2 × 250 µg was processed with the samples. A protein aliquot (250 µg) was digested with trypsin after reduction and alkylation of disulphide bonds. Peptides were prepared and labelled with tandem mass tag reagents prior to off-line fractionation using high-pH reversed phase chromatography [82]. Aliquots of the twenty-five fractions (~0.5 μg) were analysed using LC-MS. The 25 fractions were further combined to 13 fractions for phosphopeptide enrichment as previously described [82] and analysed by LC-MS.

Nano-LC-MS

The samples in 1% (vol/vol) aqueous FA were loaded (2.5 µL) onto a 75 µm i.d. × 50 cm Acclaim® PepMap 100 C18 RSLC column (Thermo-Fisher Scientific) on an EASY nano-LC (Thermo Fisher Scientific). The column was equilibrated using constant pressure (700 bar) with 11 μL of solvent A (1% (vol/vol) aqueous FA). The peptides were eluted using the following gradient programme with a flow rate of 300nL/min and using solvents A and B (1% (vol/vol) FA/MeCN): solvent A containing 5% B for 5 min, increased to 23% B over 105 min, to 35% B over 20 min, to 95% B over 1 min and constant 95% B for 19 min. The data were acquired in data-dependent acquisition (DDA) mode. The MS1 scans were acquired with the Orbitrap™ mass analyser over m/z = 350 to 1500 and resolution set to 70,000. Twelve data-dependent high-energy collisional dissociation spectra (MS2) were acquired from each MS1 scan with a mass resolving power set to 35,000, a range of m/z = 100–2000, an isolation width of 1.2 m/z, and a normalized collision energy setting of 32%. The maximum injection time was 60 ms for parent-ion analysis and 120 ms for product-ion analysis. The ions that were selected for MS2 were dynamically excluded for 40 sec. The automatic gain control (AGC) was set at a target value of 3e6 ions for MS1 scans and 1e5 ions for MS2. Peptide ions with charge states of one or ≥7 were excluded for HCD acquisitions.

Protein identification

The unprocessed MS data from the mass spectrometer were converted to peak lists using Proteome Discoverer (version 2.1.0.81, Thermo-Fisher Scientific) with the integration of reporter-ion intensities of TMT 10-plex at a mass tolerance of ±3.15 mDa. The MS2 spectra with charges +2, +3 and +4 were analysed using Mascot software [83] (Matrix Science, London, UK; version 2.5.1). Mascot was set up to search against a SwissProt database of human (version June 2016, 20,237 entries) and common contaminant proteins (cRAP, version 1.0 Jan. 1st, 2012, 116 entries), assuming the digestion enzyme was trypsin/P with a maximum of 4 missed cleavages allowed. The searches were performed with a fragment ion mass tolerance of 0.02 Da and a parent ion tolerance of 20 ppm. Carbamidomethylation of cysteine was specified in Mascot as a fixed modification. Deamidation of asparagine, formation of pyro-glutamic acid from N-terminal glutamine, acetylation of protein N-terminus, oxidation of methionine, and pyro-carbamidomethylation of N-terminal cysteine were specified as variable modifications. Peptide spectrum matches (PSM) were filtered at 1% false-discovery rate (FDR) by searching against a reversed database and the ascribed peptide identities were accepted. The uniqueness of peptide sequences among the database entries was determined using the principle of parsimony. Protein identities were inferred using a greedy set cover algorithm and the identities containing ≥2 Occam’s razor peptides were accepted [84].
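The idea behind filtering PSMs with a reversed (decoy) database can be illustrated with a simplified Python sketch. This is not how Mascot or Proteome Discoverer implement it. The function name `filter_psms_at_fdr`, the FDR estimate (decoy hits divided by target hits above a score cutoff), and the toy scores are assumptions for illustration.

```python
def filter_psms_at_fdr(psms, max_fdr=0.01):
    """Return target PSMs above the most permissive cutoff with estimated FDR <= max_fdr.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = 0  # number of top-ranked PSMs to accept
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything ranked at or above this position.
        if targets and decoys / targets <= max_fdr:
            best_cutoff = i
    return [p for p in ranked[:best_cutoff] if not p[1]]


# Toy data (hypothetical): 23 target PSMs scoring 8..30 and 4 decoy PSMs.
toy_psms = [(score, False) for score in range(8, 31)]
toy_psms += [(10.5, True), (8.5, True), (8.4, True), (8.3, True)]
accepted = filter_psms_at_fdr(toy_psms, max_fdr=0.05)
```

In the toy data, the cluster of decoys just above the lowest-scoring target pushes the estimated FDR past 5%, so that target is rejected while the 22 higher-scoring targets are kept.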

Protein relative quantification

The processing, quality assurance, and analysis of TMT data were performed with proteoQ (version 1.0.0.0, https://github.com/qzhang503/proteoQ), a tool developed with the tidyverse approach [85,86] under the free software environment for statistical computing and graphics, R (R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/) and RStudio (RStudio Team (2016). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/). Briefly, reporter-ion intensities under 10-plex TMT channels were first obtained from Mascot, followed by the removal of PSM entries from shared peptides or with intensity values lower than 1E3. Intensity of PSMs was converted to logarithmic ratios at base two, relative to the average intensity of reference samples within a 10-plex TMT. Under each TMT channel, Dixon’s outlier removals were carried out recursively for peptides with more than two identifying PSMs. The median of the ratios of PSMs that can be assigned to the same peptide was first taken to represent the ratios of the incumbent peptide. The median of the ratios of peptides was then taken to represent the ratios of the incumbent protein. To align protein ratios under different TMT channels, likelihood functions were first estimated for the log-ratios of proteins using finite mixture modelling, assuming two-component Gaussian mixtures (R package: mixtools: normalmixEM) [87]. The ratio distributions were then aligned such that the maximum likelihood of the log-ratios is centred at zero for each sample. Scaling normalization was performed to standardize the log-ratios of proteins across samples. To discount the influence of outliers from either log-ratios or reporter-ion intensities, the values between the 5th and 95th percentile of log-ratios and 5th and 95th percentile of intensity were used in the calculations of the standard deviations.
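The median roll-up from PSMs to peptides to proteins described above can be sketched in Python. This is not the proteoQ implementation: the function name `protein_log2_ratios`, the tuple layout, and the toy intensities are assumptions for illustration. Dixon's outlier removal, shared-peptide removal, and the mixture-model alignment are omitted.

```python
from collections import defaultdict
from math import log2
from statistics import mean, median


def protein_log2_ratios(psms, min_intensity=1e3):
    """Roll PSM intensities for one TMT channel up to protein-level log2 ratios.

    psms: list of (protein, peptide, intensity, reference_intensities) tuples.
    """
    by_peptide = defaultdict(list)
    for protein, peptide, intensity, refs in psms:
        if intensity < min_intensity:
            continue  # drop low-intensity PSMs
        # log2 ratio relative to the average of the reference channels
        by_peptide[(protein, peptide)].append(log2(intensity / mean(refs)))

    by_protein = defaultdict(list)
    for (protein, _), ratios in by_peptide.items():
        by_protein[protein].append(median(ratios))  # PSMs -> peptide

    return {p: median(r) for p, r in by_protein.items()}  # peptides -> protein


# Toy data (hypothetical): one protein, two peptides, one PSM below 1E3.
toy = [
    ("P1", "pepA", 4000, [1000, 1000]),
    ("P1", "pepA", 2000, [1000, 1000]),
    ("P1", "pepA", 500, [1000, 1000]),
    ("P1", "pepB", 1000, [1000, 1000]),
]
ratios = protein_log2_ratios(toy)
```

Taking medians at both levels keeps a single aberrant PSM or peptide from dominating the protein-level estimate, which is the usual motivation for this kind of roll-up.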

Informatic and statistical analysis

Metric multidimensional scaling (MDS) and principal component analysis (PCA) of protein log2-ratios were performed with the base R functions stats::cmdscale and stats::prcomp, respectively. Heat-map visualization of protein log2-ratios was performed with pheatmap (https://rdrr.io/cran/pheatmap/). Linear modelling was performed using the contrast fit approach in Limma [76] to assess the statistical significance of protein abundance differences between indicated groups of contrasts. Adjustments of p-values for multiple comparisons were performed with Benjamini-Hochberg (BH) correction.
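For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal Python sketch of the standard step-up procedure follows. The study performed this step in R; the function name `benjamini_hochberg` and the toy p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over that rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in adjusted]
```

A gene or protein is then called significant when its adjusted value falls at or below the chosen FDR threshold, such as the 0.05 cutoff used for the RNA-seq results.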

Chi-square analysis of flow-sorted cells. Each cell shows the observed count, the expected count in parentheses, and the chi-square contribution in square brackets.

hiPSCs
	Non-hemogenic endothelium	Hemogenic endothelium	Row totals
Wild type	5680 (5871.35) [6.24]	498 (306.65) [119.41]	6178
KMT2CKO	4487 (4295.65) [8.52]	33 (224.35) [163.21]	4520
Column totals	10,167	531	10,698
The Chi-square statistic is 297.37. The p-value is <0.00001.

H1 ESCs
	Non-hemogenic endothelium	Hemogenic endothelium	Row totals
Wild type	2170 (2200.7) [0.43]	139 (108.3) [8.7]	2309
KMT2CKO	1000 (969.3) [0.97]	17 (47.7) [19.76]	1017
Column totals	3170	156	3326
The Chi-square statistic is 29.86. The p-value is <0.00001.
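The statistics in the table above can be recomputed from the observed counts with a short Python sketch. It assumes a Pearson chi-square test without continuity correction; the function name `chi_square_2x2` is hypothetical, and the p-value calculation is left out.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of genotype and cell type.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Observed counts from the table: [non-hemogenic, hemogenic] for WT, then KMT2CKO.
hipsc = chi_square_2x2([[5680, 498], [4487, 33]])
h1_esc = chi_square_2x2([[2170, 139], [1000, 17]])
```

Under this assumption the two results agree with the reported values of 297.37 and 29.86 to within rounding.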
