Literature DB >> 35015755

The activity of human enhancers is modulated by the splicing of their associated lncRNAs.

Jennifer Yihong Tan¹, Ana Claudia Marques¹.

Abstract

Pervasive enhancer transcription is at the origin of more than half of all long noncoding RNAs in humans. Transcription of enhancer-associated long noncoding RNAs (elncRNA) contribute to their cognate enhancer activity and gene expression regulation in cis. Recently, splicing of elncRNAs was shown to be associated with elevated enhancer activity. However, whether splicing of elncRNA transcripts is a mere consequence of accessibility at highly active enhancers or if elncRNA splicing directly impacts enhancer function, remains unanswered. We analysed genetically driven changes in elncRNA splicing, in humans, to address this outstanding question. We showed that splicing related motifs within multi-exonic elncRNAs evolved under selective constraints during human evolution, suggesting the processing of these transcripts is unlikely to have resulted from transcription across spurious splice sites. Using a genome-wide and unbiased approach, we used nucleotide variants as independent genetic factors to directly assess the causal relationship that underpin elncRNA splicing and their cognate enhancer activity. We found that the splicing of most elncRNAs is associated with changes in chromatin signatures at cognate enhancers and target mRNA expression. We provide evidence that efficient and conserved processing of enhancer-associated elncRNAs contributes to enhancer activity.

Entities: Chemical

Mesh：

Substances：
RNA, Long Noncoding

Year: 2022 PMID： 35015755 PMCID： PMC8803168 DOI： 10.1371/journal.pcbi.1009722

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.475

Introduction

Classically, enhancers are defined as regulatory DNA elements that positively regulate temporal and spatial expression of their target genes, in cis. Transcription activation by enhancers requires transcription factor-dependent recruitment of coactivating complexes and three-dimensional genome rearrangements that bring enhancers into close proximity of their target promoters (reviewed in [1]). Whereas enhancer activity, as classically defined, is solely determined by its sequence and genomic location, most, if not all, active enhancers are transcribed [2-4]. Initial evidence of this phenomenon emerged from the characterisation of some of the first identified enhancers [5]. Pervasive transcription of active enhancers was later confirmed genome-wide, initially in neurons [3] and macrophages [2] and more recently across a number of human and mouse cell types [6,7]. These studies revealed that enhancer transcription often precedes target promoter activation [2] and is positively correlated with target gene expression [2,3,7]. The majority (>95% [8-10]) of enhancers is transcribed bidirectionally and gives rise to relatively short single exonic RNAs that are non-polyadenylated and unstable, generally referred to as eRNAs [3]. The remaining enhancers transcribe enhancer-associated long noncoding RNAs (elncRNAs [8,9,11]) predominantly in one direction [10]. In contrast with eRNAs, elncRNAs are relatively stable, polyadenylated and can be spliced [10]. Transcription has been found to strengthen cognate enhancer activity through various non-mutually exclusive mechanisms. For example, enhancer transcription facilitates binding of enhancer factors, such as CREBBP [12], or chromatin remodeling complex, including Cohesin and Mediator, that in turn induce RNA-dependent changes in local chromatin conformation [13-15]. Enhancer transcription has also been proposed to regulate the load, pause and release of RNA Polymerase II (RNAPlI) [16,17]. While the contribution of transcription to enhancer function is by now relatively well established, the importance of the transcribed RNAs has only been demonstrated for a handful of anecdotal transcripts (for example, [13,18-21]). In general, the absence of evolutionary constraint at eRNA/elncRNA exons [11] argues that the function of most eRNAs/elncRNAs is unlikely to be encoded within their sequences. Yet a large fraction of elncRNAs is multi-exonic and recently, genome-wide analysis in multiple human cell lines [9] and mouse Embryonic Stem Cells [8] revealed that their splicing positively impacts enhancer function. Enhancers that transcribe multi-exonic elncRNAs are enriched in enhancer-specific chromatin signatures, elevated binding of co-transcriptional regulators, increased local intra-chromosomal DNA contacts, and strengthened cis-regulation on target gene expression [8,9]. What remains currently unclear is the causal relationship between elncRNA splicing and enhancer activity. High enhancer activity may promote transcription across splice/donor acceptor site-like sequence, which are common in mammals [22], leading to unregulated/noisy splicing of elncRNAs. Arguing against this is evidence that mutations disrupting elncRNA splicing directly decreased their cognate enhancer activity. For example, removal of the intronic sequence of the locus encoding for the Haunt elncRNA significantly reduced its cis regulatory function [23], whereas mutations disrupting the splicing of Blustr, another enhancer-associated lncRNA, were sufficient to negatively impact the expression of its cis target [24]. Whether the splicing of most elncRNAs directly impacts enhancer activity, as exemplified by Blustr and Haunt, or whether it is an inconsequential result of their cognate enhancer’s high activity remains unknown. We used a combination of integrative and population genomics approaches to address this outstanding question. We showed that splicing related motifs within elncRNAs evolved under selective constraint, consistent with their processing being biologically relevant. Using a mediation-based approach, we performed causal inference testing using genetic variants as independent factors to test the underlying causal relationship between elncRNA splicing and their cognate enhancer function. We showed that splicing of most elncRNAs causally contribute to target gene regulation. Furthermore, nucleotide variants that disrupt elncRNA splicing is correlated with significant reduction in cognate enhancer activity, supporting the functional importance of elncRNA splicing.

Results

Splicing of elncRNAs is associated with higher enhancer activity

We integrated publicly available RNA sequencing data for human lymphoblastoid cell lines (LCL [25]) with enhancer annotations from the ENCODE consortium [26] to distinguish between predominantly bidirectionally (eRNAs, n = 4433) and unidirectionally transcribed (elncRNAs, n = 564) enhancers active in LCLs (S1 Table and Methods). The transcription profiles of elncRNAs resemble that of promoter-associated lncRNAs (plncRNAs, n = 600) and protein-coding genes (n = 12,070, S1A Fig). In LCLs, most (n = 352, 62.4%) elncRNAs are multi-exonic, consistent with previous analysis in other human cell lines [9]. As described for other human and mouse cells, enhancers that transcribe multi-exonic elncRNAs are more active [8,9]. In particular, compared to enhancers that give rise to either single-exonic elncRNAs or eRNAs, those transcribing multi-exonic elncRNAs have higher levels of chromatin modifications associated with enhancer function (S1B, S1C and S1D Fig), including mono-methylation of Histone 3 Lysine 4 (H3K4me1, S1B Fig), acetylation of Histone 3 lysine 27 (H3K27ac, S1C Fig) and DNase I accessibility (DHSI, S1D Fig). Multi-exonic elncRNA-transcribing enhancers are also enriched for binding of histone acetyl transferase (HAT) P300 (S1E Fig), whose recruitment also distinctly marks active enhancers [27]. Consistent with previous analysis [8,9], relative to all other transcribed enhancers, multi-exonic elncRNA enhancers are enriched in CTFC binding (S1F Fig), significantly enriched at LCL promoter-enhancer loop domains (fold enrichment = 1.3, FDR = 7x10-3, Methods) and their topologically associating domains (TAD) display a significantly higher frequency of intra-TAD DNA-DNA contacts (p<0.02, two-tailed Mann-Whitney U test, S1G Fig). In summary, our analysis in LCLs supports that enhancers transcribing multi-exonic elncRNAs are more active, as previously described in other cell types [8,9].

elncRNA splicing was preserved by purifying selection

The splicing of elncRNAs may be a regulated and conserved process. Alternatively, it can be a mere by-product of inconsequential transcription across splice-site donor/acceptors favoured by increased accessibility at their highly active cognate enhancers. To distinguish between these two possibilities, we first analysed splicing related motifs of elncRNAs. As previously described [8,28,29] the GC contents of exons and introns [30], a determinant of efficient splice-site recognition, of multi-exonic elncRNAs is similar to that of protein-coding genes and promoter-associated plncRNAs, and as observed in mESCs [8] (S2A Fig, two-tailed Mann-Whitney test). Compared to plncRNAs, regions flanking elncRNA splice-sites are enriched in splicing-associated elements, including U1 snRNP binding motifs (S2B Fig, p = 0.05) and a comprehensive (S2C Fig, p = 0.04) [31] and a stringent set (S2D Fig, p = 0.05, two-tailed Mann-Whitney test)[32] of exonic splicing enhancers (ESEs). This increase in density of splicing-related motifs in elncRNAs support their increased splicing efficiency, as estimated using completed splicing index (coSI), which represents the ratio of reads that span exon-exon splice junctions (excised intron) over those that overlap exon-intron junctions (incomplete excision) [33], relative to plncRNAs (S2E Fig, p = 5x10-11, two-tailed Mann-Whitney U test), consistent with what was previously reported in mESCs [8]. If splicing of elncRNAs is functionally relevant, one would expect selection to have prevented the accumulation of deleterious mutations in their splicing-associated motifs during evolution. We estimated the nucleotide substitution rate, between human and mouse, for elncRNA splicing-related sequence motifs and compared it to that of neutrally evolving neighbouring ancestral repeats (ARs, Fig 1A). We found that splice donor/acceptor sites, as well as U1 and ESE sites, evolved under significant selective constraint by accumulating fewer substitutions during mammalian evolution than neutrally evolving ARs (p<0.002, permutation test, Fig 1A). Consistent with the observed differences in splicing efficiency, we found that splicing-associated motifs within elncRNAs (including SS, U1 and ESEs, Fig 1B, 1C and 1D), evolved significantly slower (p<2.2x10-16, two-tailed Mann-Whitney U test) than those within plncRNAs.

Fig 1

Splicing of elncRNAs has evolved under purifying selection.

Splicing of elncRNAs has evolved under purifying selection.

(A) Distribution of pairwise nucleotide substitution rate between human and mouse for 1,000 randomly subsampled sets of local ancestral repeats (ARs) with matching GC-content and size as splicing related motifs (grey). The observed rate observed at splice sites (dSS, solid line arrow), ESE (dESE, dash line arrow), and U1 (dU1, dotted line arrow) sites within multi-exonic elncRNAs. Distribution of nucleotide substitution rate relative to that of randomly subsampled local ARs (dAR) for (B) splice sites, (C) ESEs, and (D) U1s within multi-exonic elncRNA (red), plncRNA (blue) and protein-coding gene (dark green). Differences between groups were tested using a two-tailed Mann-Whitney U test. *** p < 0.001. (E) Distribution of derived allele frequency (DAF) for single nucleotide variants at splicing motifs (splice sites, ESEs and U1s) within multi-exonic elncRNAs (red), plncRNAs (blue), protein-coding genes (dark green), and ARs (black). The insert illustrates variants with low derived allele frequency (DAF<0.1). Differences between groups were tested using a two-tailed Fisher’s exact test. * p < 0.05; *** p < 0.001; NS p > 0.05. We next turned our attention to the evolution of elncRNA splicing during recent human history. Relative to neutrally evolving ARs and plncRNAs, single nucleotide polymorphisms (SNPs) within elncRNA splice sites and splicing motifs were enriched in alleles with low derived allele frequency (DAF < 0.1) (p<0.05, two-tailed Fisher’s exact test, Fig 1E), consistent with their evolution under selective constraint. Whereas, the frequency of SNPs with DAF<0.1 that overlap splicing-associated sequences within elncRNAs is statistically indistinguishable from that of protein-coding genes (p = 0.93, two-tailed Fisher’s exact test, Fig 1E), we cannot exclude this is in part a consequence of the lack of power to identify differences. Evidence that selection has purged deleterious mutations at elncRNA splicing-associated motifs during mammalian and recent human history, supports the functional relevance of elncRNA processing and argues against it being simply the consequence of increased spurious transcription and splicing at highly active enhancers.

Disruption of elncRNA splicing decreases target expression

We used genotyping data [34] and gene expression data (Lappalainen et al., 2013) from LCLs derived from 373 healthy individuals to identify nucleotide variants that disrupt elncRNA splicing (Fig 2A). This unbiased approach, which is conceptually analogous to experimentally testing the impact on target expression of introducing point mutations on elncRNA splice sites, allowed us to directly test whether changes in elncRNA splicing impact enhancer activity.

Fig 2

Disrupted elncRNA splicing impacts cis-gene regulation.

Disrupted elncRNA splicing impacts cis-gene regulation.

(A) Examples of variants that can disrupt elncRNA splicing. In contrast to individuals that carry the reference genome allele at a canonical splice site, those with alternative alleles have at least one variant that disrupt the GU dinucleotide at splice donor site (denoted as HU or GV) or AG dinucleotide at splice acceptor site (denoted as BG or AH). (B) Representation of the differential splicing events between samples with different genotypes for one elncRNA, ENSG00000205786. Median differential splicing (log10 modulus fold difference in Percentage-Spliced-In (dPSI)) of each splicing event is noted next to the arrow. Decreases are represented in red and increases represented in dark blue. (C) Distribution of the log10 modulus fold difference in PSI, relative to the median of samples with reference genotype, between individuals that carry alternative or reference alleles at each corresponding elncRNA splicing event as shown in (B). (D-H) Distribution of the log10 modulus fold difference in splicing (PSI), relative to the median of samples with reference genotype, between individuals that carry alternative or reference alleles at elncRNA splice site of ENSG00000205786, of all directly affected splicing events of (D) the elncRNA and (E) all splicing events of its target protein coding genes (5 targets); as well as fold difference in expression levels (RPKM) of (F) targets, (G) non-targets and (H) the elncRNA. Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. We identified 4 variants (SS variant) that disrupt the splice-donor or acceptor sites of 4 elncRNAs with at least one predicted cis-target (S2 Table and Fig 2A and Methods). Since enhancer-promoter interactions rely on local chromosomal looping, enhancers and their associated transcripts often do not target the nearest gene in linear distance [35]. To unbiasedly predict putative elncRNA targets, we identified genes that are jointly associated to the same expression quantitative trait loci (eQTL) variants (Methods), supporting their potential co-regulation [36]. To estimate the impact of SS variant on splicing efficiency, we calculated the Percentage-Spliced-In (PSI) [37] per individual and for each elncRNA splicing event involving the splice donor or acceptor site disrupted by the SS variants (Figs 2B and 2C and S3). PSI measures intron excision by considering spliced reads spanning exon-exon junctions [37]. This is different from the original PSI definition [38,39], which uses the fraction of reads that include or exclude a given exon to measure the efficiency of exon inclusion. We compared the average difference in PSI, as a proxy for change in splicing efficiency, of all affected splicing events between individuals that carry the reference and alternative canonical splice donor/acceptor sites (GT-AG). We found that SS variants are associated with a significant decrease in splicing events involving polymorphic splice donor/acceptor sites (Figs 2B, 2C and 2D and S3). As expected, alongside decreased excision of the affected introns, SS variants are also associated with increased alternative splicing events that involve neighbouring intact splice donor or acceptor sites (Figs 2B and 2C and S4). However, compared to decreased intron excision events observed in individuals carrying the alternative SS variants, the increase in overall alternative splicing events, in the same individuals, is 2–8 times smaller and not statistically significant (Figs 2B and 2C and S4). This supports that the presence of SS variants are associated with an overall decrease in splicing. Consistent with a direct role of splicing in the modulation of enhancer function, this natural mutational study revealed that disruption of elncRNA splicing, which does not impact their protein-coding target splicing (p>0.3, two-tailed Mann-Whitney U test, Figs 2E and S4), is associated with significant decrease in target expression (p<0.05, two-tailed Mann-Whitney U test, Figs 2F and S4). The association between splice site mutations and gene expression was restricted to predicted elncRNA targets genes, as the expression level of nearby genes was unaffected (p>0.1, two-tailed Mann-Whitney U test, Figs 2G and S4). In addition to changes in target protein-coding gene levels, the relative abundance of elncRNAs was also significantly decreased in individuals that carried variants that altered elncRNA splice donor or splice acceptor sites (Figs 2H and S4). This is consistent with the well-established synergy between RNA processing and transcription and the impact of co-transcriptional splicing on gene expression [40,41]. We replicated this mutational study using 89 samples of Yoruba (YRI) population from the Geuvadis dataset and the analysis of these results consistently supports that elncRNA splicing variants impact cognate enhancer function (S5 Fig).

Causal inference support elncRNA splicing mediates target expression

The results of these mutational studies support the role of splicing in modulating enhancer activity and suggest that this is at least in part a consequence of increased transcription of multi-exonic elncRNAs. However, the interdependence between elncRNA splicing and expression together with the fact that splice site mutations may also affect enhancer factor binding make it difficult to directly use splice site mutations to disentangle the distinct roles of splicing and transcription on enhancer function. Furthermore, although unbiased, splice site mutations were only found for a subset of elncRNAs, restricting the extent of our analysis. To overcome these limitations, we first used multi-variate regression to investigate the relationship between splicing of elncRNAs and the expression of their putative targets genome-wide. First, we found that elncRNA splicing is correlated with target expression levels (S6A Fig) and this correlation is significantly higher than what we found for their proximal non-targets (p<2.2x10-16, two-tailed Mann-Whitney U test, S6B Fig). Next, we found by adding elncRNA splicing to its expression significantly, albeit moderately, improved the predicative power of its target gene expression (p = 0.02, two-tailed Mann-Whitney U test, S6C Fig). This is in contrast to elncRNA proximal non-targets, whose abundance was not explained by elncRNA splicing (p = 0.8, two-tailed Mann-Whitney U test, S6C Fig), supporting that the impact of elncRNA splicing on gene expression is target-specific. Next, we expanded the number of elncRNA loci considered in our mutational study by identifying SNPs associated with the amount of splicing at multi-exonic elncRNAs (sQTLs) (Fig 3A). To assess the impact of elncRNA splicing on enhancer activity independent of its transcription, we excluded sQTL variants that are also associated with elncRNA expression (eQTLs) from this analysis (Methods) (S7A Fig). The average distance between these variants and elncRNA transcriptional start sites and their associated-enhancer boundaries is 63.7 kb and 72.2 kb, respectively (S7B and S7C Fig), eliminating the potential confounding impact of sQTLs on enhancer factor binding. We identified sQTLs for 25% (88/352) of multi-exonic elncRNAs. Of these, more than half (49, 56%) had at least one sQTL that was also associated with their target expression (80 elncRNA-target pairs, 2,354 joint seQTLs).

Fig 3

Impact of elncRNA splicing on cis-gene regulation in the human population.

Impact of elncRNA splicing on cis-gene regulation in the human population.

(A) LINC00886 (ENSG00000240875) is a multi-exonic elncRNA whose splicing is associated with genotype of the SNP variant (rs187202716), which is also associated to the expression level of an elncRNA target, LEKR1 (ENSG00000197980). (Top panel) Distribution of the Percentage-Spliced-In (PSI) of elncRNA splicing (LINC00886, red) and target expression (RPKM) (LEKR1, green) in samples across the population that carry different alleles of the SNP variant (rs187202716). Spearman’s rho and p-values are shown. (Bottom panel) Genome browser illustrating the genomic positions of the elncRNA (LINC00886, red), its target (LEKR1, green), and their associated SNP (rs187202716, black). (B) Four models of causal inference testing that predict the relationship between joint seQTL variant (black box) associations with the splicing (sQTL) of multi-exonic elncRNAs (red boxes) and the expression level (eQTL) of their target protein-coding genes (green boxes). Schematic representation of the models of joint seQTL associations: (1) the variants are independently associated with elncRNA splicing and target expression (independent model); (2) direct association between the variant and elncRNA splicing mediates the indirect association between that and target expression (causal model); (3) direct association between the variant with target expression mediates the indirect association between that and elncRNA splicing (reactive model); and (4) the causative interaction between elncRNA splicing and target expression is more complex (undecided model). Direct associations are depicted as solid lines and indirect associations as dash lines. (C) Scatterplot depicting causal inference testing local FDR associated with each the four models (as illustrated in B). Number and proportion of joint seQTLs are provided in brackets for each model. Dotted red lines denote significance threshold at local FDR < 0.1. (D) Proportion of joint elncRNA seQTLs causally or non-casually predicted to mediate target gene expression associated with splicing junctions located at the 5´ end (red) or 3´ end (grey) ends of the transcripts. Differences between groups were tested using a two-tailed Fisher’s exact test. *** p < 0.001. Given that quantitative trait variants were identified using a relatively small cohort (373 LCL samples) and that cohort size is an important factor in QTL calling, we reasoned that the relatively small fraction of elncRNAs (~14%, 49/352) with joint seQTL associations is a consequence of our study being under-powered. Using a dataset with nearly 85-times more samples (31,684 blood samples, eQTLGen consortium) [42], we found nearly 80% of elncRNAs (70/88) splicing QTLs are jointly associated to target expression (171 elncRNA-target pairs). Compared to the smaller LCL data, target-seQTL associations were considerably higher (3.45x1061 times, p<2.2x10-16, two-tailed Mann-Whitney U test, S7D Fig). In addition, using the larger dataset, target eQTLs that were not identified to be associated with elncRNA splicing by the smaller LCL set (non-seQTLs) were associated to a similar level as those identified to be jointly associated to elncRNA splicing (seQTLs) (p = 0.3, two-tailed Mann-Whitney U test, S7E Fig). This corroborates that our analysis is limited by the sample size of the LCL data and we are likely to under-estimate the prevalence of elncRNAs whose splicing contribute to enhancer activity. However, 80% of the expression data used by the eQTLGen consortium was generated using microarray technology which cannot be used to predict splicing frequency. Given this limitation and the restricted access to its sensitive data, we could not carry out detailed analysis using the eQTLGen data and continued to use the LCL dataset. Different models can account for the joint association between a variant and elncRNA splicing or its target expression (joint seQTL, Fig 3B) [43]. This can occur if the variant is independently associated with elncRNA splicing and target expression (independent model) or as the result of the complex interaction between splicing and expression (undecided model) Alternatively, variant can be associated with the target expression that in turn mediates splicing of elncRNA (reactive model). Finally, cases where the seQTL is associated with elncRNA splicing that in turn mediate target expression (causal model) would support a functional role of elncRNA splicing. For each seQTL, we established the most likely model of association using causal inference analysis [44]. With a false discovery rate of 10%, we found that most associations between seQTLs (Fig 3C) and target expression is causally mediated by elncRNA splicing (n = 1328, 56.4%). These causal seQTLs support the role of splicing for 61% of elncRNAs in the regulation of target expression (1 to 2 targets on average). To assess whether this approach was biased towards the detection of causal seQTLs, we compared the slope and adjusted p-value of the associations between all causal seQTLs and either splicing or expression. As illustrated in S7F Fig, this analysis revealed there is no evidence that stronger sQTLs would favour causal model predictions. Of the remaining variants, most are independently associated with elncRNA splicing and target expression (n = 969, 41.2%), which is likely in part due to the relatively high false negative rates of this type of analysis [44]. To account for the effect of linkage disequilibrium and noise in genotyping data, we repeated the analysis using only seQTLs that are the best (i.e. most significant) variant associated with elncRNA splicing. Using this conservative set of seQTLs, we identified a similar proportion of elncRNAs (60%) whose splicing is predicted to causally regulate their target expression (S8 Fig), supporting the causal impact of elncRNA splicing on their cognate enhancer function. We used two independent datasets to assess the robustness of elncRNA target association with sQTL variants we predict to be associated with the splicing of elncRNAs in LCLs. Using a smaller cohort of LCLs (n = 147 [45]), we found a significant association in the same direction between sQTL and target expression for targets of 70% of elncRNAs (45% of variants). A larger fraction of associations (77% of elncRNAs and 52% of variants) could be replicated in a larger cohort of blood samples (n = 31,684 [42]). The difference in size between these two cohorts is likely to explain the difference in replication rate. The association between elncRNA splicing variants and target expression that were replicated have significantly higher effect size relative to non-replicated associations (S9 Fig). Furthermore, LCL-specific effect also likely explains why not all associations can be replicated in the large blood cohort. Importantly, 90% of seQTL associations that support elncRNA splicing as a mediator of target expression are associated with splicing junctions located at the 5´ end of the transcript, which is consistent with the known synergy between transcription and 5´ end splicing [40,41] (Fig 3D).

elncRNA splicing impacts local chromatin state

Evidence that variants associated with changes in elncRNA splicing also mediate target gene expression support the direct role of splicing on enhancer function. To assess if elncRNA splicing impacts cognate enhancer function by directly impinging on its activity, we investigated changes in chromatin signatures at elncRNA cognate enhancers. This was possible thanks to the recent release of genome-wide histone modification data, including H3K4me1 and H3K27ac, for a subset of 150 LCL samples [46]. Consistent with a synergistic contribution of splicing to enhancer activity, individuals that carried nucleotide variants that altered elncRNA splice site also had significantly lowered cognate enhancer activity, which we estimated using levels of H3K4me1 and H3K27ac [47] at cognate enhancers (Fig 4A and 4B), supporting that splicing at elncRNA loci impacts local enhancer-associated epigenetic state.

Fig 4

Impact of elncRNA splicing on cognate enhancer chromatin signatures in the human population.

Impact of elncRNA splicing on cognate enhancer chromatin signatures in the human population.

Distribution of the average log modulus fold difference in chromatin signature, relative to the median of samples with reference genotype, between individuals that carry alternative or reference alleles at elncRNA splice sites, in (A) H3K4me1 and (B) H3K27ac. (C-D) Scatterplot depicting joint elncRNA scQTLs in which elncRNA splicing is causally or non-casually predicted to mediate enhancer chromatin signatures, (C) H3K4me1 and (D) H3K27ac, respectively, using causal inference testing as illustrated using local FDR associated with the four models (as shown in Fig 4B). Dotted black lines denote significance threshold at local FDR < 0.1. Differences between groups were tested using a two-tailed Mann-Whitney U test. ** p < 0.01. To extend this analysis genome-wide, we next identified nucleotide variants associated with elncRNA splicing and chromatin modification (scQTL, 1,932 and 534 scQTLs involving 30 and 23 elncRNAs, for H3K4me1 and H3K27ac, respectively). We determined the most likely relationship between scQTLs, elncRNA splicing and chromatin signatures using causal inference analysis [44]. We found that a large proportion of scQTL variants (855 (44.3%) and 397 (74.3%) scQTLs for H3K4me1 and H3K27ac, respectively) associated independently to elncRNA splicing and chromatin mark levels. This is expected given the relatively small number of samples available, which has likely reduced the power of this analysis as discussed above. Despite this limitation, we predicted that splicing of a sizable fraction of elncRNAs modulates chromatin state (40% and 48% for H3K4me1 and H3K27ac, respectively, Fig 4C and 4D). These fractions were significantly higher compared to the fraction of elncRNAs whose splicing were predicted to be impacted by chromatin modification changes (3% and 4% for H3K4me1 and H3K27ac, respectively; p<0.05, two-tailed Fisher’s exact test, Fig 4C and 4D). We provide evidence that splicing a sizable fraction of enhancer-associate transcripts likely to directly impact the activity of their cognate enhancer activity.

Discussion

We and others have previously shown that splicing of multi-exonic elncRNAs is strongly associated with high enhancer activity in several human and mouse cell lines [8,9]. The direct contribution of splicing to enhancer activity has been shown for a few candidate elncRNA transcripts [23,24]. However, it remains uninvestigated whether elevated enhancer activity resulting from the splicing of these elncRNAs is a general phenomenon that would explain their previously described genome-wide associations [8,9]. We first aimed to address whether elncRNA splicing is functionally relevant or as a result of transcription through spurious splice sites driven by increased accessibility at highly active enhancers. To gain initial insights into this we first investigated the evolution of splicing-associated motifs, including splice sites, within elncRNAs. We found sequences that support elncRNA splicing evolved under purifying selection, an evolutionary signature of functionality. Interestingly, splicing related motifs were previously found to be amongst the most highly conserved sequences within lncRNAs [29]. With at least half of all lncRNA transcripts initiating at enhancers [11,48] and a sizable proportion of these undergoing splicing [8,9], it is likely the relative high sequence constraint observed at splicing-associated motifs within lncRNAs [29] can be attributed to enhancer transcribed lncRNAs. Consistent with this, we found splicing-related motifs within the less efficiently processed promoter-associated lncRNAs (plncRNAs) [8] to be less constrained. We investigated the direct contribution of elncRNA splicing to cognate enhancer activity using a statistical population genomics approach in human lymphoblastoid cell lines (LCLs). We found splicing of at least 60% of elncRNAs causally mediates changes in their target gene expression levels. Interestingly, a sizable proportion (40%) of elncRNA splicing was also predicted to mediate changes in chromatin signatures, including H3K4me1 and H3K27ac, at enhancers. The present analysis is limited by the size of the cohort analysed. Relatively small sample sizes reduce the ability to confidently identify quantitative trait loci [49]. This is illustrated by 1.4-fold increase in the number of seQTL detected when using data from an much larger cohort (eQTLGen [42]) than the one from Geuvadis [25]. We anticipate that the use of larger cohorts should result in an increase in the percentage of elncRNAs for which splicing can be more confidently shown to mediate target expression in cis. Our analysis of elncRNA splice-motif evolution and the impact of elncRNA processing on enhancer function and activity support that elncRNA splicing is unlikely a spurious by-product of enhancer activity. This is consistent with evidence supporting conserved elncRNA splicing and their lack of exonic constraint [8,50], suggesting the contribution of elncRNA splicing to the modulation of enhancer activity go beyond the processing of their mature transcripts. Whereas the precise molecular mechanisms underlying this association remain currently unknown, a number of mechanisms can support the role of elncRNA splicing on enhancer activity. Splicing has long been known to directly impact the rate of transcription [51]. For example, DNA elements embedded within introns have been shown to contribute to transcriptional regulation [52] and components of the spliceosome can directly enhance RNAPII initiation [53] and transcript elongation [54]. Furthermore, it was recently shown that novel exon splicing events can result in transcription initiation at novel exon proximal promoters, possibly through the recruitment of transcription machinery by splicing factors [55]. The synergy between splicing and transcription may also be influenced by changes in local chromatin environment. For example, splicing pattern changes have been shown to correlate with histone modification and factors that alter chromatin structure, including chromatin signatures at active enhancer regions [56]. On the other hand, tethering of enhancer-associated transcripts at their site of transcription has been shown to alter local chromatin environment [12-14,20]. Chromatin-bound lncRNAs have been recently shown to be enriched in U1 small nuclear ribonucleoprotein (snRNP) RNA-protein complex, a protein essential for the recognition of nascent RNA 5´ splice site and assembly of the spliceosome [57]. The depletion of U1 snRNP reduced the tethering of lncRNAs to chromatin, suggesting a mechanism by which elncRNA splicing can strengthen enhancer activity [57]. The present work provides clues to what might be the molecular mechanism(s) underlying the link between elncRNA splicing and enhancer function. Further experimental work is now needed to dissect the molecular details underlying the role of elncRNA splicing in modulating enhancer activity and investigate the differences between potentially distinct elncRNA classes, for example, by evaluating the consequences on enhancer factor recruitment, chromatin structure and target activity of experimentally perturbing splicing of candidate elncRNA.

Methods

Identification of enhancer-associated transcripts

We defined enhancer-associated transcripts by considering all ENCODE intergenic enhancers in human GM12878 lymphoblastoid cell line (LCL. 98,529 enhancers) [58] overlapping DNase I hypersensitive sites [59] and a CAGE cluster [6] in LCLs (n = 4997). We considered all LCL-expressed lncRNAs [60] and Ensembl annotated protein coding genes (version 75). LncRNAs whose 5´ end is within 500bp of a LCL enhancer were considered as being transcribed from the enhancer (n = 564 elncRNAs, S1 Table), and all remaining enhancers were presumed to transcribe eRNAs (n = 4433). Transcripts within 500bp of ENCODE LCL promoters [59] were classified as protein-coding genes (n = 12,070) or promoter-associated lncRNAs (plncRNAs, n = 600) depending on their biotype annotation. We found 62.4% of elncRNAs (n = 352), 60.5% of plncRNAs (n = 363), and 94.4% of protein-coding genes (n = 11,392) to be multi-exonic.

Read mapping and quantification

For all downloaded data sets, adaptor sequences were first removed from sequencing reads with Trimmomatic (version 0.33) [61] and then aligned to the human reference genome (hg19) using HISAT2 (version 2.0.2) [62]. Since available Geuvadis RNA sequencing and genotype data, LCL CAGE data and enhancer predictions, were all originally mapped to GRCh37 (hg19), our analyses were performed using the same human genome assembly.

Metagene analysis of elncRNAs

Direction of transcription was assessed with metagene profiles of CAGE reads centered at LCL enhancers and gene transcriptional start sites (TSS) plotted using NGSplot (version 2.4) [63] as in [8]. Enrichment of histone modifications, transcription factor binding, and gene expression levels were assessed using publicly available GM12878 DNase-seq, ChIP-seq and RNA-seq data sets (S3 Table). Metagene profiles of sequencing reads centered at gene TSSs were visualized using HOMER (version 4.7) [64].

Analysis of chromosomal architecture

Enrichment of enhancer-associated transcripts at LCL loop anchors [65], relative to expectation, was assessed using the Genome Association Tester (GAT, version 1.3.6) [66]. Specifically, loop positional enrichment was compared to a null distribution obtained by randomly sampling 10,000 times (with replacement) segments of the same length and matching GC content as the tested loci within mappable intergenic regions (as predicted by ENCODE [26]). To control for potential confounding variables that correlate with GC content, such as gene density, the genome was divided into segments of 10 kb and assigned to eight isochore bins in the enrichment analysis. The frequency of chromosomal interactions within topologically associating domains (TADs) was calculated using LCL Hi-C contact matrices [65], as previously described [60].

Identification of splicing-associated motifs

Density of human exonic splicing enhancers (ESEs), using a set of 238 computationally predicted motifs [31] and a more stringent set of 54 motifs identified across multiple studies [32], were predicted within LCL transcripts as previously described [29]. Specifically, exonic nucleotides (50 nt) flanking splice sites (SS) of internal transcript exons (> 100 nt) were considered in the analysis, after masking the 5 nt immediately adjacent to SS to avoid splice site-associated nucleotide composition bias [31,67]. Canonical U1 sites (GGUAAG, GGUGAG, GUGAGU) adjacent to 5´ splice sites (3 exonic nt and 6 intronic nt flanking the 5´ SS) were predicted as previously described [68]. FIMO (MEME version 4.12) [69] was used to search for perfect hexamer matches within these sequences.

GC content

For all LCL expressed genes with at least 2 exons, we computed the GC content for the first exon, remaining exons, and introns for each gene separately, as well as their flanking intergenic sequences of the same length (after excluding the 500 nucleotides immediately adjacent to annotations).

Splicing efficiency

Transcript splicing efficiency of long non polyA-selected RNA-seq data in GM12878 [59] was estimated by computing the proportion of fully excised introns using bam2ssj (IPSA version 3.3) [70] of transcripts for each gene. We calculated the splicing index, completed splicing index (coSI), which represents the ratio of reads that span exon-exon splice junctions (excised intron) over those that overlap exon-intron junctions (incomplete excision) [33].

elncRNA conservation across evolution and in humans

To assess selective constraints of transcripts, we estimated their pairwise nucleotide substitution rates. First, pairwise alignment between human and mouse of the different splicing-associated features (including canonical splice sites, exonic splicing enhancers and canonical U1 sites) were separately concatenated for multi-exonic elncRNAs, plncRNAs and protein-coding genes. We used BASEML from the PAML package (version 4.9, REV substitution model [71]) to estimate pairwise nucleotide substitution rates of each splicing motif. To determine whether the splicing motifs have evolved under significant purifying selection, we compared the observed nucleotide substitution rate to that of randomly selected pseudo-splicing motif sequences of the same length and GC content from neighbouring (within 1Mb) neutrally evolving sequences ancestral repeats (ARs, [72]). We repeated this process 1000 times to obtain a distribution of the expected substitution rates. To assess the conservation of splicing motifs during modern human evolution, we determined their derived allele frequency (DAF), as previously described [50]. We identified common single nucleotide polymorphisms (SNPs) in the human population (dbSNP build 150) mapped within splicing motifs of elncRNAs, plncRNAs and protein-coding genes. DAF was calculated using information on ancestral allele and the frequency of these SNPs in the European population obtained from the 1000 Genomes Project [34]. To determine the expected background distribution, we compared DAF spectrum of the observed splicing motifs to that of randomly selected sequences of the same length and GC content from local ARs.

Mapping of molecular quantitative trait loci (QTLs)

Expression values (RPKM) of multi-exonic elncRNAs and protein-coding genes in EBV-transformed LCLs derived from 373 individuals of European descent (CEU, GBR, FIN and TSI) were quantified (as described in [60]). The corresponding processed genotypes were downloaded from EBI ArrayExpress (accession E-GEUV-1) [25]. Quantification of alternative splicing events was estimated using LeafCutter (version 1.0) [37]. Single nucleotide polymorphisms (SNPs) located within the same TAD as the genes of interest were tested for association with splicing (sQTLs) and with expression levels (eQTLs) of elncRNAs and protein-coding genes. Only SNPs with minor allele frequency (MAF) greater than 5% were considered in the QTL analyses. sQTLs and eQTLs were estimated using FastQTL (version 2.184) [73]. To assess the significance of the correlation globally, we permuted the splicing or expression levels of each gene 1000 times and noted the maximum permuted absolute regression coefficient (rmax). We considered only sQTLs or eQTLs with an observed absolute regression coefficient (robs) greater than 95% of all permuted rmax values to be significant [25]. We further performed Benjamini-Hochberg multiple testing correction to estimate FDR (<5%) for all SNPs within the same TAD. Putative protein-coding gene targets of multi-exonic elncRNAs were predicted as those that reside within the same TADs and whose expression levels were associated to the same SNP variant as the expression of the elncRNAs (ie. associated to the same eQTL). We found 88 elncRNAs to have putative protein-coding gene targets and are associated with at least one sQTL (n = 26,741). Levels of histone modification marks (H3K4me1 and H3K27ac, CPM) that overlap enhancers associated with elncRNAs were downloaded for the 150 European individuals with available chromatin data [46]. To assess the relationship between elncRNA splicing and local histone modifications that mark enhancer elements, we calculated chromatin QTLs (cQTLs) associated with cognate enhancers of elncRNAs in the same way as sQTLs and eQTLs.

Replication of seQTL associations

Robustness of seQTL associations were assessed using two independent datasets: 1) LCLs (n = 147 [45]) and 2) blood samples (n = 31,684 [42]). After multiple testing correction (Benjamini-Hochberg adjusted p-value <5%) for the number of seQTL variants tested, we considered associations found in the same direction in GTEx LCLs or eQTLgen blood samples as that found in Geuvadis LCLs to be replicated. As only Z-score statistics of eQTL associations is available for download for blood samples [42], which can be transformed to slope of the association using the equation: slope = Z-score / sqrt(2 * alleleFreq * (1-alleleFreq) * (N + Z-score^2) (N = sample size, alleleFreq = allele frequency of variant). However, since we do not have information on the allele frequency of the genetic variants, we directly used Z-score for these samples in our analysis.

Impact of genetic variation at elncRNA splice sites on cis-gene expression

We considered all SNPs located at elncRNA splice sites (SS variants) and measured the fold difference for all elncRNA splicing events (using Percentage-Spliced-In (PSI) by LeafCutter [37]) affected by the SS variants between individuals that carry the reference or alternative alleles of these variants (S2 Table). Similarly, fold difference in elncRNA steady state abundance, as well as that of their putative target, and chromatin marks at their cognate enhancer elements were also assessed. In each case, we show the distribution of the fold difference in PSI or RPKM or CPM for each individual relative to the median PSI or RPKM or CPM in individuals with the reference genotype (Figs 2C–2H, 4A and 4B and S3, S4 and S5). Intron-centric splicing efficiency as measured by LeafCutter is visualized by LeafViz [37].

Causality inference between elncRNA splicing and nearby protein-coding gene expression

To infer the causal relationship between elncRNA splicing and putative target gene expression, we focused on QTLs that are associated with both splicing of elncRNAs (sQTL) and their putative target gene abundance (eQTL), and we refer to these variants as joint seQTLs. elncRNA sQTL variants that were also associated with elncRNA expression level or splicing of their putative cis-target genes were excluded from the analysis (n = 21,650 out of 26,741) (S7A Fig). In total, we found 49 elncRNAs with seQTLs shared with 80 target genes. For all triplets of seQTL–elncRNA splicing–target gene expression (n = 2,349), we performed causal inference testing using a Bayesian approach as implemented by Findr [44] by testing the models: (1) the independent model where seQTL variants are independently associated with elncRNA splicing and target gene expression; (2) the causal model with elncRNA splicing as the molecular mediator of gene expression; (3) the reactive model where gene expression mediates elncRNA splicing; and (4) the undecided model where causative interaction between elncRNA splicing and target gene expression is more complex [43]. The same causal inference testing was performing between elncRNA splicing (sQTL) and chromatin marks (cQTL) at their cognate enhancers by identifying scQTLs. We found 30 and 23 elncRNAs whose splicing-QTL is also associated with H3K4me1 (n = 1391 triplets) and H3K27ac (n = 532 triplets) marks, respectively, at their cognate enhancers.

Linear regression

We used multi-variable regression to assess whether elncRNA splicing contributes to their cognate enhancer activity by comparing the difference in the proportion of variance in target expression is explained by adding elncRNA splicing to the model (Target-expression ~ elncRNA-expression compared to Target-expression ~ elncRNA-expression + elncRNA-splicing).

Statistical tests

All statistical analyses were performed using the R software environment for statistical computing and graphics [74]. When multiple comparisons are made, we report the highest p-value of all comparisons.

Multi-exonic elncRNAs are transcribed from highly active enhancers.

(A) Metagene plots of CAGE reads centered at transcription initiation regions (TIRs) of eRNAs, and promoters (estimated as -500bp to annotated gene TSS) of elncRNAs, plncRNAs and protein-coding genes (PCGs). Sense (red) and antisense (blue) reads denote those that map to the same or opposite strand, respectively, as the direction of their cognate TIRs. Metagene plots and distribution (figure insets) of (B) H3K4me1, (C) H3K27ac, (D) DNase I hypersensitive sites (DHSI), (E) P300 and (F) CTCF ChIP-seq reads in LCLs at promoters of multi-exonic (red) and single-exonic (grey) elncRNAs, and eRNAs (yellow). (G) Distribution of the average amount of DNA:DNA contacts within LCL TADs containing multi-exonic (red) and single-exonic (grey) elncRNAs and eRNAs (yellow). Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; *** p < 0.001. (TIFF) Click here for additional data file. (A) Distribution of GC-content across exons and introns of elncRNAs (red), plncRNAs (blue), and protein-coding genes (green). Distribution of the density of predicted (B) U1 spliceosome RNAs (snRNPs) and exonic splicing enhancers (ESEs), using (C) a comprehensive (n = 238) and (D) a smaller but more stringent set (n = 54) of predicted ESE motifs, within multi-exonic elncRNAs (red), plncRNAs (blue) and protein-coding genes (green). (E) Distribution of the completed splicing index (coSI) for multi-exonic elncRNAs (red), plncRNAs (blue) and protein-coding genes (green). Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. (TIFF) Click here for additional data file. Distribution of the log10 modulus fold difference in splicing (PSI), relative to the median of samples with reference genotype, between individuals that carry alternative or reference alleles at elncRNA splice site of (A) ENSG00000250903, (B) ENSG00000227388 and (C) XLOC_013468 of all directly affected splicing events of the elncRNA and all splicing events of their target protein coding genes (1, 7, and 1 targets, respectively); as well as fold difference in expression levels (RPKM) of targets, non-targets and the elncRNA. Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. (TIFF) Click here for additional data file. (Top panels) Representation of the differential splicing events between samples with different genotypes for elncRNAs, (A) ENSG00000250903, (B) ENSG00000227388 and (C) XLOC_013468. Median differential splicing (log10 modulus fold difference in Percentage-Spliced-In (dPSI)) of each splicing event is noted next to the arrow. Decreases are represented in red and increases represented in dark blue. (Bottom panels) Distribution of the log10 modulus fold difference in PSI, relative to the median of samples with reference genotype, between individuals that carry alternative or reference alleles at each corresponding elncRNA splicing event as shown in the top panels. Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. (TIFF) Click here for additional data file.

Disrupted elncRNA splicing impacts cis-gene regulation across YRI population.

Distribution of the median log10 modulus fold difference, relative to the median of samples with reference genotype, between individuals of the Yorubin (YRI) population that carry alternative or reference alleles at elncRNA splice sites, in splicing (PSI) of multi-exonic elncRNAs and target protein coding genes; target and non-target gene expression levels (RPKM); and elncRNA expression. Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. (TIFF) Click here for additional data file.

elincRNA splicing contributes to target expression levels.

(A) elncRNA splicing is significantly correlated (Spearman’s test) with their cis-target expression levels, as illustrated using ENSG00000227036 (elncRNA) and ENSG00000125398 (target protein-coding gene). (B) Distribution of Spearman’s correlation between elncRNA splicing and either their targets or non-targets. (C) Distribution of adjusted R square comparing between two regression models that use elncRNA expression and its splicing to predict expression of their targets or non-targets: Models 1: PCG-expression ~ elncRNA-expression; Model 2: PCG-expression ~ elncRNA-expression + elncRNA-splicing. Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. (EPS) Click here for additional data file. (A) Distribution of the absolute slope (left panel) and the -log10 of the adjusted p-value (right panel) of elncRNA sQTLs that are excluded (grey) or included (green) in analysis due to their significant associations with elncRNA expression, respectively. Distribution of distance between genomic coordinate of seQTL and position of their associated elncRNA annotated transcriptional start sites (B) and enhancers (C). (D) Distribution of elncRNA sQTLs that are also associated with a cis-target expression (seQTL) as identified using a LCL population with 373 samples (Geuvadis) or blood samples (eQTLGen) with 31,684 samples. (E) Distribution of elncRNA seQTLs and non-seQTLs (elncRNA sQTLs not associated with target expression in LCL population) associations as identified in LCL Geuvadis population or in blood eQTLGen samples. (F) Distribution of the absolute slope (left panel) and the -log10 of the adjusted p-value (right panel) of elncRNA sQTL association (red) and target eQTL association (green) for all causal seQTLs. Differences between groups were tested using a two-tailed Mann-Whitney U test. * p < 0.05; ** p < 0.01; *** p < 0.001; NS p > 0.05. (TIFF) Click here for additional data file.

Impact of elncRNA splicing on chromatin signature associated with enhancer activity in the human population.

Scatterplot depicting the best joint elncRNA seQTLs (the most significant variant associated with elncRNA splicing) used to infer relationship between elncRNA splicing and cis-target expression levels using causal inference testing, as illustrated using local FDR associated with the four models (as illustrated in Fig 4B). Dotted black lines denote significance threshold at local FDR < 0.1. (TIFF) Click here for additional data file. Distribution of the absolute slope or z-score (left panels) and adjusted p-value (right panels) of elncRNA target eQTL associations that can be replicated (after multiple-testing correction <5% and in the same association direction, green) or not (grey) using LCL samples from GTEx (A) and blood samples from eQTLgen (B). (TIFF) Click here for additional data file.

Genomic position of single- and multi-exonic LCL elncRNAs (hg19) and their respective number of exons.

(XLSX) Click here for additional data file.

Single nucleotide polymorphisms (SNPs) at multi-exonic elncRNA splice sites and their respective targets, as well as SNPs jointly associated to both elncRNA and target expression levels.

SNPs are denoted in the format snp_chr_position. (XLSX) Click here for additional data file.

Publicly available datasets used in the analysis.

(XLSX) Click here for additional data file. 14 Jul 2020 Submitted filename: elncRNA_RC_response_to_reviewers.pdf Click here for additional data file. 16 Nov 2020 Dear Dr. Marques, Thank you very much for submitting your manuscript "The activity of human enhancers is modulated by the splicing of their associated lncRNAs." for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. It is absolutely mandatory that the concerns of the Reviewer 2 will be resolved to his/her satisfaction. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Ilya Ioshikhes Associate Editor PLOS Computational Biology William Noble Deputy Editor PLOS Computational Biology *********************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: I praise and thank the authors for addressing my concerns in a mostly satisfactory way. My remaining requests for revision aim to improve the clarity of the description of the splicing analyses and results, namely by the correction of some misleading statements. In the way the manuscript is currently written, there is an apparent mix-up of concepts related to splicing efficiency and alternative splicing quantification. Minor essential revisions: 1. The splicing community commonly uses PSI (percent spliced-in) as a metric for quantification of alternative splicing that is computed based on the profiling of alternative exon-exon (i.e. spliced) junctions. This is, to my knowledge, how LeafCutter operates, using split reads (i.e. reads covering exon-exon junctions) to uncover alternative intron-excision options and the PSI values provided therein and used in Figures 3, S3 and S4 are ratios involving only spliced transcripts. The authors chose to also call PSI to a splicing index (originally named coSI in the Tilgner 2012 paper) that involves a ratio between (split) reads spanning exon-exon junctions and (unsplit) reads covering exon-intron junctions, as a metric for splicing efficiency. Although those two quantifications are not totally independent (the more efficient the splicing, the more exons get included), their conceptual difference is very important for our understanding of splicing efficiency and mechanisms of alternative splicing regulation, which are also different concepts. The two PSI metrics need to be clearly disambiguated. 2. Exon “skipping” means exon “exclusion”, not “inclusion”. This would require correction of the newly written lines 25-28 of page 10, as they currently convey contradictory messages, and of whatever other sentence in the manuscript where “skipping” may be misused. However, even after such correction, those two sentences are misleading. The fact that some log-fold differences in PSI for the alternative allele are positive does not mean that the variants can promote more inclusion, as their PSIs are compared to a median reference genotype sample and not the reference genotype in the same individuals. In other words, those individuals with positive log-fold differences could have higher-than-average PSIs if they had a reference genotype and therefore still have a decrease from a much-higher-than-median to a slightly-higher-than-median PSI with the change in genotype. As the comparisons are not “paired” and therefore do not control for the trans-regulation of splicing, statements can only be made about the distributions as a whole (i.e. their average behaviours) and not about individual values. Optional/discretionary edits: 3. I disagree with the authors’ statements, not justified in the rebuttal, that violin plots have a “less intuitive” interpretation and boxplots are “simpler and easier to interpret”. It is hard to understand how a plot that conveys a smoothed histogram of the full distribution can be less intuitive, particularly when it can be plotted behind a boxplot to convey quartiles. The choice of less informative plots is even more intriguing in a Computational Biology journal. Having said this, it is an issue that does not affect this manuscript’s suitability for publication. My remark only intends to promote a good practice that would bring extra scrutinisability to the results of this work. 4. Plots involving p-values (i.e. significance) in Supplementary Figure S7 would be more easily interpretable by the readers if -log10 (instead of just log10) was used. Higher significance should show up higher in the plots. I apologize for having overlooked this in the first review. Reviewer #2: The authors have revised their manuscript, but the results lack novelty and the data do not compellingly support the conclusions drawn. Regarding novelty, Figure 1 of the manuscript is virtually identical in content to Figure 2 of a paper recently published by the authors (https://www.life-science-alliance.org/content/3/4/e202000663), showing essentially the same result in a slightly different system. So this result is purely confirmatory and should perhaps be a supplementary figure. Figure 2 shows an evolutionary result that supports that splicing of elncRNAs is under purifying selection, but provides no evidence of what the function of this splicing might be, so is novel but plays a minor role in advancing the authors' overall argument. Thues, the main direct evidence to support the authors' main conclusion that splicing of elncRNAs increases enhancer activity is still in Figure 3. And still the data are extremely weak. The analysis is still based on an n of 4, and the effects are extremely small. Alternate possibilities such as that the elncRNA expression rather than splicing modulates enhancer activity are still not excluded. Overall, I don't feel that the authors have improved the manuscript in a substantive way. The data supporting the main conclusions are still not compelling, and the novelty is compromised by their recent publication, cited above. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at . Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see 9 Jan 2021 Submitted filename: Jennifer_Tan_Response_To_Reviewers.docx Click here for additional data file. 27 Apr 2021 Dear Dr. Marques, Thank you very much for submitting your manuscript "The activity of human enhancers is modulated by the splicing of their associated lncRNAs." for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. In particular, we suggest that you consider the comment #1 of Reviewer #4, and test the hypothesis in an experimental setting. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Yi Xing Guest Editor PLOS Computational Biology William Noble Deputy Editor PLOS Computational Biology *********************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: I praise and thank the authors for their efforts in addressing my minor concerns. Unfortunately, they appear to keep struggling with the concepts and terminology associated with alternative splicing and its quantification. The Percentage-Spliced-In (PSI), as the name indicates, measures exon INCLUSION, and not “exclusion” (that would be 1-PSI or 100%-PSI) as stated in the paper. It is very important that the definitions are accurate and in line with the usual interpretation, so that the readers are not (unintentionally) misled. Similarly, the confusion between PSI and coSI appears to remain. In Figure 2B,C and Figure S3, the authors assign differences in PSI to S1 and S2, representing single exon-exon junctions. Based on the PSI definition, it is unclear which alternative exons S1 and S2 refer to or why distributions in actual PSI (instead of fold differences) are not compared. The meaning of ratios between exonic and intronic sequences (indeed putatively associated with splicing efficiency) appear to be confused with ratios between alternative exonic sequences (standard alternative splicing quantifications). Along the same lines, what “other alternative splicing events” (page 10, lines 20-21), other than exon skipping, can SS variants promote that a reader can see in Figures 2B,C and S3? What does an “overall decrease in splicing events” mean? Does it mean that that more exons are skipped or more introns are retained? I strongly recommend that the authors do not underestimate the importance of this issue and thoroughly revise these definitions and quantifications with help from an alternative splicing expert. Until this is done properly, the results from this work, even if sound, will remain hardly intelligible for most readers and, in my view, not fully ready for publication. Reviewer #4: I appreciate the authors' efforts to address the issues of novelty and evidence for their conclusions, as well as other points raised. This new version of the manuscript is significantly stronger. However, there are a few specific points that should be developed further before publication. 1. Direct evidence of causality. The main evidence supporting the authors’ conclusion that splicing of elncRNAs increases enhancer activity is still aggregated in a very small number of loci (n=4). I agree with the authors that this in silico experiment is “analogous” to experimentally disrupting splice sites of selected elncRNA candidates. However, I do not believe both approaches are equivalent. At this point, I suggest the authors actually test their hypothesis in an experimental setting. They could pick a few candidates to perturbate splice sites elncRNA with punctual mutations that create weaker or stronger splice sites and evaluate the consequences on enhancer’s activity and chromatin structure. 2. Link between splicing and increased transcription. The authors mentioned that they only considered variants associated with elncRNA splicing but not with its expression in Figs. 3 and 4, but do not show the data (sorry in case I missed it). I think that showing that the total elncRNA expression does not change is as crucial as showing that the splicing of the elncRNA does change. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #4: No ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #4: Yes Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at . Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols 4 Jun 2021 Submitted filename: Jennifer_Tan_Response_To_Reviewers.docx Click here for additional data file. 27 Oct 2021 Dear Dr. Marques, Thank you very much for submitting your manuscript "The activity of human enhancers is modulated by the splicing of their associated lncRNAs." for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. In particular, we would require you to follow the advice on toning down your conclusions as suggested by Reviewer 4 and treat this work as more hypothesis generating than conclusive. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Ilya Ioshikhes Deputy Editor PLOS Computational Biology William Noble Deputy Editor PLOS Computational Biology *********************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: I praise and thank the authors for carefully addressing my concerns. Reviewer #4: The authors have done a great job addressing my comment on the link between splicing and increased transcription. I appreciate the new Supplementary figure which is quite strong. In addition, I of course understand that disrupting splice sites of selected elncRNA candidates would be a hard experiment to set up and might not be possible in LCL cells. However, I still believe that the evidence in this manuscript to support the main conclusion of causality (which is the take home message and differentiates this paper from previous literature) is not conclusive. I believe that experimental evidence is necessary to support the conclusion that splicing of elncRNAs increases enhancer activity and several labs around the world would be happy to collaborate in such efforts. If the paper should be published without experimental data, I suggest to tone down the conclusions and acknowledge this issue. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #4: None ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #4: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at . Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols 22 Nov 2021 Submitted filename: Jennifer_Tan_Response_To_Reviewers.docx Click here for additional data file. 5 Dec 2021 Dear Dr. Marques, We are pleased to inform you that your manuscript 'The activity of human enhancers is modulated by the splicing of their associated lncRNAs.' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please correct the text as requested by reviewer on the same occasion. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Ilya Ioshikhes Deputy Editor PLOS Computational Biology William Noble Deputy Editor PLOS Computational Biology *********************************************************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #4: The authors have addressed my concerns and I support the publication of this manuscript. I would suggest the authors to test their hypothesis in an experimental setting when possible. Also, in page 3, I would suggest to use "evidence" instead of "strong evidence". ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #4: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #4: No 6 Jan 2022 PCOMPBIOL-D-20-01247R3 The activity of human enhancers is modulated by the splicing of their associated lncRNAs. Dear Dr Marques, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Katalin Szabo PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

71 in total

Review 1. How introns influence and enhance eukaryotic gene expression.

Authors: Hervé Le Hir; Ajit Nott; Melissa J Moore
Journal: Trends Biochem Sci Date: 2003-04 Impact factor: 13.807

2. Intergenic transcription and transinduction of the human beta-globin locus.

Authors: H L Ashe; J Monks; M Wijgerde; P Fraser; N J Proudfoot
Journal: Genes Dev Date: 1997-10-01 Impact factor: 11.361

3. eRNAs are required for p53-dependent enhancer activity and gene transcription.

Authors: Carlos A Melo; Jarno Drost; Patrick J Wijchers; Harmen van de Werken; Elzo de Wit; Joachim A F Oude Vrielink; Ran Elkon; Sónia A Melo; Nicolas Léveillé; Raghu Kalluri; Wouter de Laat; Reuven Agami
Journal: Mol Cell Date: 2012-12-27 Impact factor: 17.970

4. Local regulation of gene expression by lncRNA promoters, transcription and splicing.

Authors: Jesse M Engreitz; Jenna E Haines; Elizabeth M Perez; Glen Munson; Jenny Chen; Michael Kane; Patrick E McDonel; Mitchell Guttman; Eric S Lander
Journal: Nature Date: 2016-10-26 Impact factor: 49.962

5. FIMO: scanning for occurrences of a given motif.

Authors: Charles E Grant; Timothy L Bailey; William Stafford Noble
Journal: Bioinformatics Date: 2011-02-16 Impact factor: 6.937

6. An atlas of active enhancers across human cell types and tissues.

Authors: Robin Andersson; Claudia Gebhard; Michael Rehli; Albin Sandelin; Irene Miguel-Escalada; Ilka Hoof; Jette Bornholdt; Mette Boyd; Yun Chen; Xiaobei Zhao; Christian Schmidl; Takahiro Suzuki; Evgenia Ntini; Erik Arner; Eivind Valen; Kang Li; Lucia Schwarzfischer; Dagmar Glatz; Johanna Raithel; Berit Lilje; Nicolas Rapin; Frederik Otzen Bagger; Mette Jørgensen; Peter Refsing Andersen; Nicolas Bertin; Owen Rackham; A Maxwell Burroughs; J Kenneth Baillie; Yuri Ishizu; Yuri Shimizu; Erina Furuhata; Shiori Maeda; Yutaka Negishi; Christopher J Mungall; Terrence F Meehan; Timo Lassmann; Masayoshi Itoh; Hideya Kawaji; Naoto Kondo; Jun Kawai; Andreas Lennartsson; Carsten O Daub; Peter Heutink; David A Hume; Torben Heick Jensen; Harukazu Suzuki; Yoshihide Hayashizaki; Ferenc Müller; Alistair R R Forrest; Piero Carninci
Journal: Nature Date: 2014-03-27 Impact factor: 49.962