Hugues Richard, Marcel H Schulz, Marc Sultan, Asja Nürnberger, Sabine Schrinner, Daniela Balzereit, Emilie Dagand, Axel Rasche, Hans Lehrach, Martin Vingron, Stefan A Haas, Marie-Laure Yaspo.
Abstract
Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to address the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R package called Solas at http://cmb.molgen.mpg.de/2ndGenerationSequencing/Solas/.
Year: 2010 PMID: 20150413 PMCID: PMC2879520 DOI: 10.1093/nar/gkq041
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 4. (A) Plot showing the 90% quantile of the average error for proportion estimation by POEM based on simulations for one gene with one exon-skipping event. The average error (y-axis) is calculated according to the number of total reads in the gene (x-axis) and for various skipped exon lengths: 120 bp (light grey), 240 bp (grey) or 360 bp (black). The average error is shown for a proportion of 20% (dashed lines) and 80% (solid lines). (B) Plot showing the 90% quantile of the mean (circles) and maximum (squares) error (y-axis) for POEM on all annotated ENSEMBL (v.46) transcripts. (C) Scatter plot showing the correlation (Pearson correlation coefficient, PCC = 0.65) of inclusion rates (constitutive forms) on 123 AEEs derived from exon–exon junction counts (x-axis) and POEM estimations (y-axis). Cross marks denote AEEs in genes with a quality score ≤ −14. Dashed lines represent the 20% error margin in (C) and (D). (D) Scatter plot showing the correlation (PCC = 0.81) of the inclusion rates on 47 AEEs measured by qPCR (x-axis) and estimated by POEM for a single exon-skipping event (y-axis). Plus marks denote AEEs not annotated in ENSEMBL v.46. (E) POEM estimation for annotated transcripts of MPI in HEK cells. Numbers reported on light blue arrows represent the expected counts on exon–exon junctions according to the proportions estimated with POEM for the three annotated isoforms (ENST000000379693, ENST000000352410 and ENST000000323744). The proportion estimate for each isoform is shown to the right (in percent). qPCR primers were designed to estimate the inclusion rate of exon 2 (‘Materials and Methods’ section). The skipping event of exon 3 was not annotated in ENSEMBL v.46, but was supported by an observed junction read. (F) Bar chart showing the inclusion rate of exon 2 computed by POEM (grey) and measured by qPCR (black) for HEK and B cells.
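The agreement statistics reported for panels (C) and (D) can be illustrated with a minimal Python sketch. It assumes that inclusion rates are fractions between 0 and 1 and reads the 20% error margin as an absolute difference of 0.20, which the caption does not state explicitly. The function names are hypothetical and are not part of the Solas package.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def fraction_within_margin(reference, estimated, margin=0.20):
    """Share of events whose estimate lies within `margin` of the reference.

    Assumption: the margin is an absolute difference in inclusion rate.
    """
    if len(reference) != len(estimated) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for r, e in zip(reference, estimated) if abs(r - e) <= margin)
    return hits / len(reference)
```

With reference rates taken from junction counts or qPCR and estimates taken from an isoform quantification method, `pearson` gives a value comparable to the PCC in the caption, and `fraction_within_margin` gives the share of points lying between the dashed lines.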
Figure 1. Alternative splicing (AS) analysis workflow. (A) RNA-Seq reads are mapped to the reference genome and intersected with exon positions; AEEs are predicted within a condition (CASI) or between two conditions (DASI). POEM estimates splice form proportions within a condition using known transcript structures. (B) Details of the analysis steps for DASI, CASI and POEM, performed on RNA-Seq data for HEK and B cells. The number of tested genes, transcripts or exons is reported for each method.
Figure 2. (A) Sensitivity and specificity (y-axis) for CASI AEE prediction for different minor isoform proportions (x-axis), based on simulations with 20% introduced noise (‘Materials and Methods’ section). (B) Robustness estimation for predictions on HEK data. The change in predicted number of AEEs is shown relative to the total number of predictions for the whole dataset (y-axis) for 500 bootstrap samples using a CASI of −2. The x-axis shows the reduction in length that was introduced to an exon at random (p = 0.25). (C) RT-PCR validation of a predicted AEE of NONO in HEK cells (CASI). The panel shows the observed exon–exon junctions (blue arrows) and the corresponding number of reads (above the arrows) for all exons of the three annotated isoforms (ENSEMBL v.46). S1 and S2 primers are placed on the splice junctions of the constitutive and the skipped forms, respectively (red dashed line), to uniquely amplify two different splice variants of NONO. R1 and S3 primers were designed inside surrounding exons. Exons not considered in CASI analysis are marked by an asterisk. (D) Agarose gels (1.5%) showing the RT-PCR amplification results of S1-R1, S2-R1 and S3-R1 fragments. The observed sizes of the bands correspond to the expected sizes.
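The two measures plotted in panel (A) follow the standard definitions: sensitivity is the fraction of true events that are predicted, and specificity is the fraction of non-events that are correctly left unpredicted. The Python sketch below computes both from paired truth and prediction labels. It is an illustration under the assumption that each simulated exon carries a boolean ground-truth label; it does not reproduce the simulation setup of the paper, and the function name is hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    truth[i]     -- True if exon i really is an alternative exon event
    predicted[i] -- True if the method called exon i as an event
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = tn = fp = 0
    for is_event, is_called in zip(truth, predicted):
        if is_event and is_called:
            tp += 1
        elif is_event:
            fn += 1
        elif is_called:
            fp += 1
        else:
            tn += 1
    # A rate is undefined (NaN) when its denominator is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```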
Figure 3. Distribution of the number of AEEs predicted by CASI and DASI. Bars show the number of 5′- (black), internal (grey) and 3′-exons (white) predicted as AEE within cell lines (CASI ≤ −4) and between cell lines (|DASI| ≥ 2). The whiskers for CASI are obtained by artificially shortening the 5′- and 3′-exons by 20% in order to estimate the error due to the annotation at the 5′- and 3′-ends of a gene.
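The counting behind such a bar chart can be sketched in a few lines of Python, using the two cutoffs given in the caption (CASI ≤ −4 and |DASI| ≥ 2). The record layout, with the keys `position`, `casi` and `dasi`, is a hypothetical input format chosen for illustration; it is not the output format of Solas, and the scores themselves are assumed to have been computed beforehand.

```python
from collections import Counter


def count_predicted_aees(exons, casi_cutoff=-4.0, dasi_cutoff=2.0):
    """Count predicted events per exon position class.

    Each exon is a dict with hypothetical keys:
      "position" -- one of "5prime", "internal", "3prime"
      "casi"     -- within-condition score, or None if not tested
      "dasi"     -- between-condition score, or None if not tested
    Returns two Counters: (within_condition, between_conditions).
    """
    within = Counter()
    between = Counter()
    for exon in exons:
        casi = exon.get("casi")
        dasi = exon.get("dasi")
        if casi is not None and casi <= casi_cutoff:
            within[exon["position"]] += 1
        if dasi is not None and abs(dasi) >= dasi_cutoff:
            between[exon["position"]] += 1
    return within, between
```

Rerunning the same count after shortening the terminal exons would give the spread that the whiskers represent, provided the scores are recomputed on the shortened exons.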
Figure 5. qPCR validation of a predicted AEE in MKI67 between HEK cells (blue) and B cells (red) (DASI). (A) Screenshot of the MKI67 gene. The primers were designed to compare the inclusion rate of exon 7 between HEK cells and B cells. (B) RT-PCR results validate the presence of the constitutive and the skipped form in both cell lines. For both S1-R1 (constitutive) and S2-R1 (skipped), a PCR product of length 163 bp is expected if the form is expressed; otherwise no band should be visible. (C) Bar charts representing the normalized expression values for the constitutive form (black) and the skipped form (grey) obtained by qPCR. The results show that the skipped form is more abundant in B cells relative to the constitutive form, as predicted by the DASI method (DASI = 5.2).