Seyed Yahya Anvar1,2,3, Guy Allard4, Elizabeth Tseng5, Gloria M Sheynkman6,7, Eleonora de Klerk4,8, Martijn Vermaat4,9, Raymund H Yin10, Hans E Johansson10, Yavuz Ariyurek4,9, Johan T den Dunnen4,9, Stephen W Turner5, Peter A C 't Hoen4,11.
Abstract
BACKGROUND: The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing, and polyadenylation events on single mRNA molecules by full-length mRNA sequencing.
Year: 2018 PMID: 29598823 PMCID: PMC5877393 DOI: 10.1186/s13059-018-1418-0
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Schematic overview of the approach used to characterize the interdependencies between mRNA transcription initiation and processing events. a Identified full-length reads (reads with RNA inserts between the 5′ and 3′ primers) are clustered into unique transcript structures using the ICE algorithm and further polished using the partial reads, in which one of the primer sequences is missing. b Based on the transcripts available per locus, the available sequence (the union of all exonic sequences observed at each locus) and the unique sets of features and splice sites are identified. Feature sets comprise unique transcriptional start sites (TSS), exons, and polyadenylation sites (PAS). The unique set of splice sites consists of unique donor and acceptor splice sites as well as all alternative TSSs and PASs. c Coupling events are surveyed by performing all possible pairwise tests between unique features in each gene. The summed coverage of all transcripts that support the inclusion or exclusion of each pair is entered into a contingency table, and Fisher's exact test is used to assess statistical significance. The odds ratio (OR) is used to distinguish mutually inclusive from mutually exclusive coupling. d Sets of interdependent coupling events are identified from networks of coupling between features in each gene. Nodes represent features, and edges depict the mutual inclusivity (black edges) or mutual exclusivity (red edges) of each feature pair. Unique network components can thereby be filtered by the type of interaction: mutually inclusive or mutually exclusive coupling events. e For all alternative exons that show significant coupling, a motif search is performed to assess the enrichment of specific RNA-binding protein motifs.
For all alternative exons, the 35-bp intronic sequence upstream of the acceptor site is defined as the R1 domain (depicted in orange), the 32-bp exonic sequences downstream of the acceptor site and upstream of the donor site are defined as the R2 domain (depicted in dark gray), and the 40-bp intronic sequence downstream of the donor site is defined as the R3 domain (depicted in purple). The 35-bp sequence upstream of each PAS (depicted in red) is searched for the presence of canonical and non-canonical poly(A) signals.
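The pairwise survey in panels c and d can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline. It assumes each transcript is given as a set of feature labels plus a coverage count, and it treats a transcript as supporting exclusion of a feature whenever that feature is absent. The two-sided Fisher's exact test is computed from exact hypergeometric weights (summing all tables no more probable than the observed one), no multiple-testing correction is applied, and the threshold `alpha` and all function names are hypothetical.

```python
import math
from itertools import combinations


def fisher_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Exact integer hypergeometric weights for every table with the same margins.
    weights = {x: math.comb(col1, x) * math.comb(n - col1, row1 - x)
               for x in range(lo, hi + 1)}
    observed = weights[a]
    # Sum all tables that are no more probable than the observed one.
    return sum(w for w in weights.values() if w <= observed) / math.comb(n, row1)


def odds_ratio(table):
    (a, b), (c, d) = table
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def coupling_table(transcripts, f1, f2):
    """Sum coverage by inclusion (index 0) or exclusion (index 1) of f1 and f2."""
    table = [[0, 0], [0, 0]]
    for features, coverage in transcripts:
        i = 0 if f1 in features else 1
        j = 0 if f2 in features else 1
        table[i][j] += coverage
    return table


def survey_feature_pairs(transcripts):
    """Test every feature pair at one locus; OR > 1 inclusive, OR < 1 exclusive."""
    features = sorted(set().union(*(feats for feats, _ in transcripts)))
    results = []
    for f1, f2 in combinations(features, 2):
        table = coupling_table(transcripts, f1, f2)
        ratio = odds_ratio(table)
        if ratio > 1:
            kind = "inclusive"
        elif ratio < 1:
            kind = "exclusive"
        else:
            kind = "undetermined"  # OR equal to 1 or undefined (nan)
        results.append((f1, f2, fisher_two_sided(table), ratio, kind))
    return results


def coupling_networks(results, kind, alpha=0.05):
    """Connected components of features linked by significant edges of one kind."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for f1, f2, p, _, k in results:
        if p < alpha and k == kind:
            parent[find(f1)] = find(f2)
    groups = {}
    for feature in parent:
        groups.setdefault(find(feature), set()).add(feature)
    return sorted(groups.values(), key=sorted)


# Hypothetical locus: two isoform families with a little cross-over coverage.
transcripts = [
    ({"TSS1", "E2", "PAS1"}, 40),
    ({"TSS2", "PAS2"}, 35),
    ({"TSS1", "E2", "PAS2"}, 2),
    ({"TSS2", "PAS1"}, 3),
]
results = survey_feature_pairs(transcripts)
print([sorted(g) for g in coupling_networks(results, "inclusive")])
# prints [['E2', 'PAS1', 'TSS1'], ['PAS2', 'TSS2']]
```

Using exact integer weights avoids the floating-point tolerance that a float-based comparison of table probabilities would otherwise need.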
Fig. 2 Alternative transcription, splicing, and polyadenylation are highly interdependent. a Bar charts illustrate the number and proportion of genes that show significant coupling in MCF-7 cells. Genes with TSS- or PAS-coupled features are also presented. b Venn diagram shows the number of genes with various types of coupling, representing interdependencies between different alternative processes. The total number of mutually inclusive and mutually exclusive networks is also listed. c Histogram of the relative positions of TSSs with (blue) and without (gray) significant coupling to mRNA processing events. Relative positions are calculated from the length of the total exonic sequence at each locus. The scatter plot shows, at each relative position, the fraction of TSSs (blue) that are significantly coupled to alternative exons (black) or PASs (red). d Histogram of the relative positions of alternative exons with (brown) and without (gray) significant coupling to other exons. The scatter plot shows, at each relative position, the fraction of exons that are significantly coupled to other exons. e Histogram of the relative positions of PASs with (red) and without (gray) significant coupling to alternative transcription and splicing events. The scatter plot shows, at each relative position, the fraction of PASs (red) that are significantly coupled to alternative TSSs (blue) or exons (black). For plots depicting the percentage of linked features per position, a bin size of 0.02 was used.
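The relative positions and 0.02-wide bins used in panels c to e could be computed roughly as follows. This Python sketch assumes 0-based, half-open genomic coordinates, takes the exonic sequence of a locus to be the merged union of all observed exons, and flips positions for minus-strand genes so that 0 is always the 5′ end. These conventions and the function names are illustrative assumptions, not the paper's code.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) exon intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def relative_position(pos, exonic_union, strand="+"):
    """Offset of pos within the concatenated exonic sequence, scaled to [0, 1)."""
    total = sum(end - start for start, end in exonic_union)
    offset = 0
    for start, end in exonic_union:
        if start <= pos < end:
            rel = (offset + pos - start) / total
            return rel if strand == "+" else 1 - rel
        offset += end - start
    raise ValueError(f"position {pos} is not exonic")


def coupled_fraction_by_bin(features, bin_size=0.02):
    """features: (relative_position, is_coupled) pairs; returns per-bin fractions."""
    n_bins = round(1 / bin_size)
    totals = [0] * n_bins
    coupled = [0] * n_bins
    for rel, is_coupled in features:
        b = min(int(rel / bin_size), n_bins - 1)  # clamp 1.0 into the last bin
        totals[b] += 1
        coupled[b] += int(is_coupled)
    # None marks bins that contain no features.
    return [c / t if t else None for c, t in zip(coupled, totals)]
```

Measuring position along the concatenated exons rather than along the genome keeps long introns from compressing all features toward the ends of the scale.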
Fig. 3 Alternative TSSs and exons are significantly associated with known and novel poly(A) signals. a Bar charts show the number and relative proportion of PASs associated with canonical or non-canonical poly(A) signals for all PASs, PASs with significant coupling, and alternative exon- and/or TSS-linked PASs. b Bar charts show the number and relative proportion of known and unknown poly(A) signals for TSS-linked, exon-linked, or TSS- and exon-linked PASs.
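A PAS could be assigned to the categories in this figure by scanning the 35-nt window upstream of it (as defined in the Fig. 1 caption) for poly(A) signal hexamers. In this Python sketch the canonical pair (AATAAA, ATTAAA) is standard, but the non-canonical list is an assumption drawn from commonly reported hexamer variants and may differ from the list used in the paper; a PAS with no listed hexamer is reported as "unknown". Function names are hypothetical.

```python
from collections import Counter

CANONICAL = ("AATAAA", "ATTAAA")
# Assumed non-canonical variants; the paper's exact list is not given here.
NON_CANONICAL = ("AGTAAA", "TATAAA", "CATAAA", "GATAAA", "AATATA",
                 "AATACA", "AATAGA", "AAAAAG", "ACTAAA")


def upstream_window(seq, pas_index, width=35):
    """Sequence immediately 5' of a PAS; seq is in transcript (sense) orientation."""
    return seq[max(0, pas_index - width):pas_index]


def classify_pas(window):
    """Canonical signals take priority over non-canonical ones."""
    window = window.upper().replace("U", "T")
    if any(h in window for h in CANONICAL):
        return "canonical"
    if any(h in window for h in NON_CANONICAL):
        return "non-canonical"
    return "unknown"


def summarize(seq, pas_indices):
    """Count PAS categories for one transcript-oriented sequence."""
    return Counter(classify_pas(upstream_window(seq, i)) for i in pas_indices)
```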
Enrichment of MBNL binding site motifs in sequences upstream of alternative PASs with an unknown poly(A) signal that are coupled with alternative TSSs or alternative exons
| Motif | Source | Total | Random set | P value (a) | Coupled PAS | Not coupled PAS | P value (b) |
|---|---|---|---|---|---|---|---|
| AKCCTGG | DREME | 1271 | 35 | 0 | 881 | 390 | 9.8E-44 |
| CTSCYB | Masuda, 2012 | 898 | 708 | 2.6E-07 | 442 | 456 | 9.7E-01 |
| YGCY | Purcell, 2012 | 2961 | 3139 | 1.0E-00 | 1578 | 1383 | 3.2E-02 |
| RSCWTGSK | Batra, 2014 | 145 | 93 | 4.1E-04 | 80 | 65 | 2.5E-01 |
| TGCYTSYY | Batra, 2014 | 95 | 55 | 6.5E-04 | 50 | 45 | 4.9E-01 |
| CWGCMWKS | Batra, 2014 | 1306 | 139 | 4.7E-262 | 870 | 436 | 1.5E-32 |
| Total PASs | | 6979 | 6979 | – | 3614 | 3338 | – |
(a) The enrichment of binding motifs in sequences upstream of PASs without a known poly(A) signal was calculated by one-sided Fisher's exact test, using a randomly generated set as the background for the enrichment analysis.
(b) PASs without significant coupling were used as the background set to identify binding sites enriched in coupled PASs without a known poly(A) signal.
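The degenerate motifs in this table use IUPAC nucleotide codes (for example, K = G/T, S = C/G, W = A/T, Y = C/T). A minimal Python sketch of the enrichment test described in the footnotes might count the PAS-upstream sequences that contain a motif in a foreground and a background set and then apply a one-sided (greater) Fisher's exact test. Counting sequences rather than motif occurrences, scanning only the sense strand, and the function names are assumptions; this sketch is not expected to reproduce the tabulated P values.

```python
import math
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def iupac_to_regex(motif):
    return re.compile("".join(IUPAC[ch] for ch in motif.upper()))


def count_with_motif(seqs, motif):
    """Number of sequences containing at least one match to the motif."""
    pattern = iupac_to_regex(motif)
    return sum(1 for s in seqs if pattern.search(s.upper().replace("U", "T")))


def fisher_one_sided_greater(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hi = min(row1, col1)
    upper = sum(math.comb(col1, x) * math.comb(n - col1, row1 - x)
                for x in range(a, hi + 1))
    return upper / math.comb(n, row1)


def motif_enrichment(motif, foreground, background):
    """One-sided test: is the motif more frequent in foreground than background?"""
    a = count_with_motif(foreground, motif)
    c = count_with_motif(background, motif)
    return fisher_one_sided_greater(a, len(foreground) - a, c, len(background) - c)
```

Under footnote (a) the background would be the random set; under footnote (b) it would be the PASs without significant coupling.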
RNA-binding protein motifs associated with alternative exons that are coupled to TSSs, other alternative exons, or PASs
R1 domain

| Motif | Length | Coupled (28,716) | Not coupled (70,336) | E-value | Pfam ID | RBP |
|---|---|---|---|---|---|---|
| SVGV | 4 nt | 12,121 | 23,872 | 6.0E-127 | PF00536 | SAMD4A |
| TGTCTGAA | 8 nt | 108 | 70 | 1.2E-014 | PF00076 | RBM24; ENOX1 |

R2 domain

| Motif | Length | Coupled (53,490) | Not coupled (131,953) | E-value | Pfam ID | RBP |
|---|---|---|---|---|---|---|
| GSSB | 4 nt | 29,261 | 65,661 | 2.4E-078 | PF00076; PF00098 | RBM4B |
| GGGAYTAC | 8 nt | 223 | 164 | 2.8E-027 | PF00013 | NOVA2 |
| AGTMGCT | 7 nt | 262 | 234 | 2.8E-024 | PF00076 | RBM28 |

R3 domain

| Motif | Length | Coupled (28,591) | Not coupled (70,138) | E-value | Pfam ID | RBP |
|---|---|---|---|---|---|---|
| SGTRAG | 6 nt | 1043 | 1330 | 7.6E-051 | PF00076; PF00641 | FUS; SRSF2 |
| GAAGGTGA | 8 nt | 98 | 49 | 1.5E-016 | PF00076; PF00641 | RBM5 |
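The R1, R2, and R3 domains used in this table (defined in the Fig. 1 caption) could be extracted for a single alternative exon roughly as follows. This Python sketch assumes 0-based, half-open exon coordinates on a plus-strand reference string and handles minus-strand exons by reverse-complementing. It also interprets the R2 domain as two 32-nt exonic windows, one after the acceptor and one before the donor, which overlap on exons shorter than 64 nt. These choices and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def exon_domains(genome, start, end, strand="+", r1=35, r2=32, r3=40):
    """R1/R2/R3 sequences for the exon genome[start:end], in transcript orientation."""
    if strand == "-":
        # Take the exon plus both flanks, flip it, and reuse the plus-strand logic.
        lo, hi = max(0, start - r3), min(len(genome), end + r1)
        region = revcomp(genome[lo:hi])
        return exon_domains(region, hi - end, hi - start, "+", r1, r2, r3)
    acceptor, donor = start, end  # exon boundaries at the acceptor and donor sites
    return {
        "R1": genome[max(0, acceptor - r1):acceptor],         # intronic, upstream of acceptor
        "R2_5p": genome[acceptor:min(acceptor + r2, donor)],  # exonic, after acceptor
        "R2_3p": genome[max(donor - r2, acceptor):donor],     # exonic, before donor
        "R3": genome[donor:donor + r3],                        # intronic, downstream of donor
    }
```

The extracted sequences could then be passed to a motif-discovery or enrichment step, with coupled and non-coupled exons as the two sets being compared.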
Fig. 4 Comprehensive map of protein peptides supports novel alternative splicing events in the full-length MCF-7 transcriptome. a Histogram shows the distribution of amino acid (aa) lengths of peptides that could be associated with either Gencode or PacBio transcript variants. b Scatter plot illustrates the number of unique peptide hits per gene based on the PacBio (x-axis) or Gencode (y-axis) annotation. Each dot represents a single gene locus, based on matching of PacBio and Gencode genes. c Empirical cumulative distribution of relative peptide counts per gene for each peptide-hit category. Genes with a single annotated transcript (single-transcript category) are shown in light blue. Multi-transcript genes with peptides matching a subset of transcripts (sub-transcripts category) are shown in yellow. Multi-transcript genes with peptides matching all annotated transcripts (all-transcripts category) are shown in brown. Multi-gene hits are shown in black. Dotted lines represent the cumulative distributions based on the Gencode annotation. d Bar charts compare the Gencode- and PacBio-based classifications of peptides. e Bar charts show the number of peptides derived from exon–exon junctions of transcripts, including the number of peptides that match exon–exon junctions of mutually inclusive (blue) or mutually exclusive (yellow) exons. f Peptides with different classifications matching multiple transcripts of ITGB4. Black peptides are all-transcripts hits, whereas yellow peptides, based on the full-length MCF-7 transcriptome data, are associated with only a subset of transcripts. Exons are colored by coupling network, shown in red and blue.
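The peptide-hit categories in panel c can be illustrated with a small Python sketch that assigns a peptide by exact substring search against translated transcript sequences grouped by gene. Exact matching, the dictionary layout, the "no-hit" label, and the toy sequences are all hypothetical; a real proteogenomic search would also need to handle issues such as the leucine/isoleucine mass ambiguity.

```python
def classify_peptide(peptide, proteins):
    """proteins: {gene: {transcript_id: protein_sequence}}."""
    hits = {}
    for gene, transcripts in proteins.items():
        matched = {tid for tid, seq in transcripts.items() if peptide in seq}
        if matched:
            hits[gene] = matched
    if not hits:
        return "no-hit"
    if len(hits) > 1:
        return "multi-gene"
    gene, matched = next(iter(hits.items()))
    n_transcripts = len(proteins[gene])
    if n_transcripts == 1:
        return "single-transcript"
    return "all-transcripts" if len(matched) == n_transcripts else "sub-transcripts"


# Toy, hypothetical annotation.
proteins = {
    "G1": {"t1": "MKTAYIAKQR", "t2": "MKTAYIAKGG"},
    "G2": {"t3": "MSEQNNTEMK"},
    "G3": {"t4": "MDLSALRVEE", "t5": "MDLSALRWWE"},
    "G4": {"t6": "PPSALRVPP"},
}
for pep in ("KTAYIAK", "SEQNN", "SALRVEE", "SALR"):
    print(pep, classify_peptide(pep, proteins))
```

Running the same function once against Gencode-derived and once against PacBio-derived protein sets would give paired classifications of the kind compared in panel d.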