Stephanie M Linker, Lara Urban, Stephen J Clark, Mariya Chhatriwala, Shradha Amatya, Davis J McCarthy, Ingo Ebersberger, Ludovic Vallier, Wolf Reik, Oliver Stegle, Marc Jan Bonder.
Abstract
BACKGROUND: Alternative splicing is a key regulatory mechanism in eukaryotic cells and increases the effective number of functionally distinct gene products. Using bulk RNA sequencing, splicing variation has been studied across human tissues and in genetically diverse populations. This has identified disease-relevant splicing events, as well as associations between splicing and genomic features, including sequence composition and conservation. However, variability in splicing between single cells from the same tissue or cell type, and its determinants, remain poorly understood.
Keywords: Alternative splicing; Cell differentiation; DNA methylation; Multi-omics; Single-cell analysis; Splicing prediction
Year: 2019 PMID: 30744673 PMCID: PMC6371455 DOI: 10.1186/s13059-019-1644-0
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Single-cell splicing and considered features for modeling splicing rates. a Two canonical splicing models. The “cell model” assumes that splicing variation is due to differential splicing between cells, with each cell expressing one of two splice isoforms. The “gene model” corresponds to the assumption that both splice isoforms can be expressed in the same cells. b Mean-variance relationships of splicing rates in iPS cells. Shown is the standard deviation of splicing rates across cells for the same cassette exon (standard deviation of PSI) as a function of the average inclusion rate of the cassette exons across cells, considering 84 iPS cells. Solid lines correspond to the expected relationship when assuming either the “cell model” (black line) or the “gene model” (red line). c Illustration of the considered features and genomic contexts for predicting splicing variation. “A” denotes the alternative exon; “I1” and “I2” correspond to the upstream and downstream flanking introns, respectively; and “C1” and “C2” to the upstream and downstream flanking exons, respectively. The 5′ and 3′ ends (300 bp) of the flanking introns are considered separately.
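The two expected curves in panel b can be illustrated numerically. The sketch below is not the authors' code. It assumes that under the “cell model” each cell's PSI is either 0 or 1, and that under the “gene model” all cells share the same underlying PSI, with variation arising only from binomial sampling of reads. The function names and the `n_reads` parameter are hypothetical and chosen for illustration.

```python
import math


def cell_model_sd(mean_psi: float) -> float:
    """SD of PSI across cells when each cell expresses exactly one isoform.

    Per-cell PSI is then 0 or 1 (a Bernoulli variable with p = mean_psi).
    """
    return math.sqrt(mean_psi * (1.0 - mean_psi))


def gene_model_sd(mean_psi: float, n_reads: int) -> float:
    """SD of observed PSI when every cell has the same underlying PSI.

    Assumption: the only source of variation is binomial sampling of
    n_reads reads covering the cassette exon in each cell.
    """
    return math.sqrt(mean_psi * (1.0 - mean_psi) / n_reads)


# Compare the two expectations at a few mean inclusion rates.
for p in (0.1, 0.5, 0.9):
    print(p, round(cell_model_sd(p), 3), round(gene_model_sd(p, n_reads=10), 3))
```

Under these assumptions the “cell model” curve is an upper envelope: for any `n_reads` greater than 1, the “gene model” standard deviation lies below it at the same mean PSI.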
Fig. 2 Regression-based prediction of single-cell splicing variation. a Prediction accuracy of alternative regression models for predicting splicing rates in single cells. Shown are out-of-sample r² values (based on tenfold cross-validation) in iPS cells (left) and endoderm cells (right). The genomic model (genomic, dark blue) was trained using sequence k-mers, conservation scores, and the length of local contexts (size of the cassette exon, length of flanking introns) as input features. Other models consider additional features that capture average methylation features aggregated across cells (genomic and mean methylation, blue) or cell-specific methylation features (genomic and cell methylation, light blue). Error bars denote ± 1 standard deviation across four repeat experiments. b Relevance of individual features for predicting splicing rates, quantified using correlation coefficients between individual features and splicing rates. Shown are the average feature importance scores across all cells, with error bars denoting ± 1 standard deviation across cells. Features are ranked according to absolute correlation coefficient, with methylation features shown in gray. c Principal component analysis on the feature relevance profiles as in b across all cells. d Weights of the ten most important features that underpin the first principal component in c (shown are the five features with the largest positive and the five with the largest negative weights), which include k-mers with methylation information of the downstream intron I2. Methylation features are shown in gray.
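The out-of-sample r² in panel a can be illustrated with a small cross-validation loop. The sketch below is a simplification, not the authors' pipeline: it uses a single-feature least-squares fit in place of the multi-feature regression models, and it assumes r² means the squared Pearson correlation between held-out predictions and observed values. All function names and the synthetic data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return (sab * sab) / (saa * sbb) if saa and sbb else 0.0


def cross_validated_r2(xs, ys, k=10, seed=0):
    """k-fold CV: each point is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            preds[i] = intercept + slope * xs[i]
    return pearson_r2(preds, ys)


# Synthetic example: one feature with a noisy linear effect on PSI.
rng = random.Random(1)
demo_x = [rng.random() for _ in range(200)]
demo_y = [0.2 + 0.6 * x + rng.gauss(0.0, 0.1) for x in demo_x]
print(round(cross_validated_r2(demo_x, demo_y), 3))
```

Pooling the held-out predictions from all folds before computing r² is one common convention; averaging per-fold r² values is an equally plausible reading of the caption.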
Fig. 3 Classification of cassette exons based on single-cell splicing patterns in iPS cells. a Single-cell splicing rate (PSI) distributions of the five splicing categories (inspired by Song et al. [12]) in 84 iPS cells. Intermediate splicing categories that can only be defined based on single-cell information are framed by a gray box. b Variation of PSI (standard deviation) across cells as a function of the average inclusion rate of cassette exons across 84 iPS cells, colored according to their respective splicing category as defined in a. The solid black line denotes the LOESS fit across all cassette exons. c Performance of logistic regression models for predicting splicing categories based on genomic features. Shown are the receiver operating characteristic curves for each splicing category and the macro-average (area under the curve, AUC). d Prediction performance of alternative regression models for each splicing category, considering models trained on genomic features (“genomic,” left), on genomic and all DNA methylation features (“genomic and methylation,” center), or on DNA methylation features only (“methylation,” right). The genomic model includes k-mers, conservation scores, and region lengths (see Fig. 1c). The genomic and methylation model additionally includes DNA methylation features. The methylation model includes average DNA methylation features per sequence context. Splicing categories are coded in color as in a. Error bars denote ± 1 standard deviation across four repeat experiments. e Distribution of DNA methylation levels in the upstream exon (C1) per splicing category. Methylation is decreased in underdispersed exons.
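The per-category and macro-average AUC values in panel c can be computed in a one-vs-rest fashion. The sketch below is illustrative only: it assumes the macro-average is the unweighted mean of per-category AUCs, and the predicted probabilities are made-up toy values covering just three of the category names, not results from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(prob_rows, true_classes, classes):
    """One-vs-rest AUC per class plus their unweighted (macro) mean."""
    per_class = {}
    for j, c in enumerate(classes):
        per_class[c] = auc([row[j] for row in prob_rows],
                           [t == c for t in true_classes])
    return per_class, sum(per_class.values()) / len(classes)


# Toy predicted class probabilities (hypothetical), one row per exon.
classes = ["included", "excluded", "overdispersed"]
probs = [
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.4, 0.2, 0.4],
]
truth = ["included", "included", "excluded", "excluded",
         "overdispersed", "overdispersed"]

per_class, macro = macro_auc(probs, truth, classes)
print(per_class, macro)
```

The pairwise formulation is quadratic in the number of exons, which is acceptable for a sketch; a rank-based computation would scale better on real data.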
Fig. 4 Comparison of splicing category distributions between iPS and endoderm cells. a Pie chart showing the number of category switches between iPS and endoderm cells (left panel). The zoom-in (right panel) shows details of different category switches. The outer pie chart shows the splicing category of each cassette exon at the iPS state and the internal pie chart shows the respective category at endoderm state. Non-annotated slices in the pie chart reflect ~ 1% of the data. b DNA methylation changes associated with the observed category switches. The top panel shows the iPS and endoderm splicing categories colored according to a. The bottom panel shows DNA methylation levels within the seven sequence contexts of a cassette exon as compared to the DNA methylation levels of the cassette exons that do not switch in their splicing category. Significant changes (Q < 0.05) are marked with a star. DNA methylation of the alternative exon and its vicinity is increased in cassette exons that switch from the underdispersed category. Cassette exons that switch from either included or excluded to any other splicing category show increased DNA methylation of the upstream exon (C1). c Performance of logistic ridge regression models that predict the absence/presence of switching splicing categories between iPS and endoderm states. DNA methylation information improves the prediction of the under- and overdispersed cassette exons. The categories are colored according to a. Error bars denote ± 1 standard deviation across four repeat experiments.
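The Q < 0.05 threshold in panel b implies a multiple-testing adjustment across the seven sequence contexts. The caption does not state which procedure was used; the sketch below assumes the Benjamini-Hochberg adjustment purely for illustration. The p-values are made-up toy numbers, and the context keys are hypothetical labels for the regions named in Fig. 1c.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Toy p-values (hypothetical), one per sequence context of a cassette exon.
toy_p = {
    "C1": 0.004,
    "I1_5p": 0.20,
    "I1_3p": 0.03,
    "A": 0.01,
    "I2_5p": 0.04,
    "I2_3p": 0.50,
    "C2": 0.70,
}

q = dict(zip(toy_p, bh_qvalues(list(toy_p.values()))))
significant = [ctx for ctx, val in q.items() if val < 0.05]
print(significant)
```

In this toy input, contexts with a raw p-value below 0.05 can still fail the adjusted threshold, which is the behavior the star annotation in the figure depends on.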