Javier Tapial, Kevin C H Ha, Timothy Sterne-Weiler, André Gohr, Ulrich Braunschweig, Antonio Hermoso-Pulido, Mathieu Quesnel-Vallières, Jon Permanyer, Reza Sodaei, Yamile Marquez, Luca Cozzuto, Xinchen Wang, Melisa Gómez-Velázquez, Teresa Rayon, Miguel Manzanares, Julia Ponomarenko, Benjamin J Blencowe, Manuel Irimia.
Abstract
Alternative splicing (AS) generates remarkable regulatory and proteomic complexity in metazoans. However, the functions of most AS events are not known, and programs of regulated splicing remain to be identified. To address these challenges, we describe the Vertebrate Alternative Splicing and Transcription Database (VastDB), the largest resource of genome-wide, quantitative profiles of AS events assembled to date. VastDB provides readily accessible quantitative information on the inclusion levels and functional associations of AS events detected in RNA-seq data from diverse vertebrate cell and tissue types, as well as developmental stages. The VastDB profiles reveal extensive new intergenic and intragenic regulatory relationships among different classes of AS and previously unknown and conserved landscapes of tissue-regulated exons. Contrary to recent reports concluding that nearly all human genes express a single major isoform, VastDB provides evidence that at least 48% of multiexonic protein-coding genes express multiple splice variants that are highly regulated in a cell/tissue-specific manner, and that >18% of genes simultaneously express multiple major isoforms across diverse cell and tissue types. Isoforms encoded by the latter set of genes are generally coexpressed in the same cells and are often engaged by translating ribosomes. Moreover, they are encoded by genes that are significantly enriched in functions associated with transcriptional control, implying they may have an important and wide-ranging role in controlling cellular activities. VastDB thus provides an unprecedented resource for investigations of AS function and regulation.
Year: 2017 PMID: 28855263 PMCID: PMC5630039 DOI: 10.1101/gr.220962.117
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. VastDB: An atlas of alternative splicing profiles and functional associations in vertebrate cell and tissue types. (Left) vast-tools was used to profile 1478 independent RNA-seq data sets comprising 108 human, 139 mouse, and 61 chicken sets of tissues, cell types, and developmental stages, which produced thousands of AS events of all major types (center). Each AS event possesses a unique and constant event identifier (EventID) across vast-tools and VastDB, allowing profiling of new RNA-seq samples with vast-tools and direct contextualization of interesting events in VastDB. (Right) For each EventID, VastDB provides functional information, including a graphical display of splicing and gene expression levels across samples, sequence features, suggested primers for RT-PCR validation, genomic context through the UCSC Genome Browser, and multiple protein features (e.g., domains, disorder regions, 3D structure). Moreover, EventIDs have a homology annotation interconnecting the AS events across the different VastDB species.
Figure 2. A subset of exons in genes encoding DNA binding proteins is alternatively spliced across most cell and tissue types. (A) Distribution of PSIs across all samples with sufficient read coverage (violin plots, top) and total number of exons (histograms, bottom) for bins of alternative exons that are alternatively spliced (10 < PSI < 90) in an increasing fraction of samples. PanAS exons correspond to those exons that are alternatively spliced in >80% of the samples (i.e., red histograms). (Red dots) median PSI of the bin. (B) Enriched Gene Ontology categories using DAVID scores for genes harboring exons that are alternatively spliced in >80% of the samples (PanAS events; red, last four bins in A) or show switchlike inclusion patterns (SwitchAS; blue). (C) Percentage of human low-frequency AS (LFAS; yellow), SwitchAS (blue), or PanAS (red) AltEx events that have the same class of regulation in mouse orthologous exons. (D) Percentage of human low-frequency AS, SwitchAS, PanAS, and conserved PanAS exons that are predicted to generate alternative ORF-preserving isoforms (black), disrupt the ORF when included/excluded (light/dark gray), or overlap noncoding sequences (white). (E) For each AS group, PSI distributions obtained from ribosome-engaged RNA-seq data from multiple cell types. (F) For each AS group, PSI distributions in individual human eight-cell stage embryo blastomeres. The number of tested exons for each category is provided in parentheses. All P-values correspond to two-sided Fisher's exact tests; for E and F, the number of events with PSIs corresponding to alternative (10 < PSI < 90) versus nonalternative events is compared for each AS group.
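The PanAS criterion in panel A can be illustrated with a minimal Python sketch. It assumes that PSI values are given on a 0-100 scale and that samples without sufficient read coverage are encoded as `None`; the function names are hypothetical and are not part of vast-tools or VastDB.

```python
from typing import Optional, Sequence


def fraction_alternative(psis: Sequence[Optional[float]]) -> Optional[float]:
    """Fraction of covered samples in which the exon is alternative (10 < PSI < 90).

    Assumption: None marks a sample without sufficient read coverage.
    Returns None when no sample has coverage.
    """
    covered = [p for p in psis if p is not None]
    if not covered:
        return None
    n_alternative = sum(1 for p in covered if 10 < p < 90)
    return n_alternative / len(covered)


def is_pan_as(psis: Sequence[Optional[float]], min_fraction: float = 0.8) -> bool:
    """True when the exon is alternative in more than 80% of covered samples."""
    fraction = fraction_alternative(psis)
    return fraction is not None and fraction > min_fraction
```

Because the caption states a strict threshold (>80%), an exon that is alternative in exactly 80% of covered samples is not counted as PanAS in this sketch.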
Figure 3. Tissue regulation of AS is dominated by neural and muscle. (A) Heatmap and hierarchical clustering of highly regulated AS events (standard deviation of PSIs across samples higher than 20) in widely expressed human genes (events with sufficient read coverage in at least 40 samples). Samples with >80% missing values (i.e., events with read coverage below VLOW) were discarded. Immune-hem: immune-hematopoietic samples. (B) Bar plots showing the number of exons with increased PSI in a specific tissue compared to all other tissue types. (Early Dev) early embryonic stages in mouse (from oocyte to eight-cell stage) and chicken (from Stage X to HH6). (C) RT-PCR validations and corresponding VastDB RNA-seq PSI estimates for exons in Clasp2 (neural, increased PSI) and 1700106N22Rik (testis, decreased PSI).
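The event filter described for panel A (coverage in at least 40 samples, standard deviation of PSIs above 20) could be sketched as follows. This is an illustration under stated assumptions: missing values are encoded as `None`, and the sample standard deviation is used because the caption does not say whether the sample or population estimator was applied. The function name is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def is_highly_regulated(
    psis: Sequence[Optional[float]],
    min_samples: int = 40,
    min_sd: float = 20.0,
) -> bool:
    """Apply the two thresholds from the caption to one AS event.

    Assumption: None marks a sample with insufficient read coverage.
    """
    covered = [p for p in psis if p is not None]
    if len(covered) < min_samples:
        return False  # event is not covered widely enough
    # Assumption: sample standard deviation (statistics.stdev).
    return statistics.stdev(covered) > min_sd
```

An event that switches between full skipping and full inclusion across tissues passes the filter, whereas an event with a constant PSI or with too few covered samples does not.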
Figure 4. Coregulated splicing network reveals two layers of tissue AS regulation. (A,B) Left: Graphical representation of the human splicing networks, highlighting the different exon communities for the first (A) and second (B) layer networks. Right: Plot of the mean absolute Z-score across tissues for each community of the first (A) and second (B) layer human networks. Dominant tissues are highlighted. Communities were named based on dominant tissues, when possible. For simplicity, only communities with 10 or more AS events are displayed. (C) Heatmap showing the percentage of node (exon) overlap between human (y-axes) and mouse (x-axes) communities for the first (left) and second (right) layer networks. P-values correspond to Bonferroni-corrected one-sided Fisher's exact tests. (D) Percentage of edge conservation for each main human community in the mouse network, for the first (left) and second (right) layer networks, and expected percentage of conservation based on randomized networks of the same size of the tested community. Error bars in controls represent first and third quartiles of the distribution. P-values correspond to permutation tests with 1000 random networks: (***) P-value ≤ 10⁻³; (**) 10⁻³ < P-value < 0.01; (*) 0.01 ≤ P-value < 0.05.
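The significance notation in panel D can be made concrete with a short Python sketch. The star thresholds follow the caption; the empirical P-value formula (one-sided, with a +1 correction) is an assumption, since the caption does not state how the permutation P-value was computed. Both function names are hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided permutation P-value for an observed conservation percentage.

    Assumption: the +1 correction is used so the estimate is never zero.
    """
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)


def significance_stars(p_value: float) -> str:
    """Map a P-value to the star notation given in the caption."""
    if p_value <= 1e-3:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""
```

With 1000 random networks and this correction, the smallest attainable P-value is 1/1001, which falls in the three-star class.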