Nicolò Gualandi, Cristian Iperi, Mauro Esposito, Federico Ansaloni, Stefano Gustincich, Remo Sanges.
Abstract
Transposable elements (TEs), also known as "jumping genes", are repetitive sequences capable of changing their location within the genome. They are key players in many biological processes in health and disease. A reliable quantification of their expression as transcriptional units is therefore crucial to distinguish their independent expression from the transcription of their sequences as part of canonical transcripts. TE quantification faces several difficulties, the most important being low read mappability: because TEs are repetitive, reads originating from their sequences often cannot be mapped unambiguously. A large fraction of TE fragments localizes within introns, which led to the hypothesis that intron retention (IR) can be an additional source of bias, potentially affecting accurate TE quantification. IR occurs when introns, normally removed by the splicing machinery, are maintained in mature transcripts. IR is a widespread mechanism affecting many different genes with cell type-specific patterns. We hypothesized that, in an RNA-seq experiment, reads derived from retained introns can bias the detection of independent RNA expression from the TEs that overlap them. In this study we performed a meta-analysis using public RNA-seq data from lymphoblastoid cell lines and show that IR can impact TE quantification when established tools are run with default parameters. Reads mapped on intronic TEs were indeed associated with the expression of TEs and influenced their correct quantification as independent transcriptional units. We confirmed these results using additional independent datasets, demonstrating that this bias does not appear in samples where IR is not present and that differential TE expression does not affect IR quantification. We concluded that IR causes the over-quantification of intronic TEs and that differential IR might be confused with differential TE expression.
Our results should be taken into account for a correct quantification of TE expression from RNA-seq data, especially in samples in which IR is abundant.
Keywords: RNA-seq; bioinformatics; intron retention; technical bias; transcriptomic; transposable elements expression quantification
Year: 2022 PMID: 35741347 PMCID: PMC9220773 DOI: 10.3390/biology11060826
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. Exploration of IR and TEs in RNA-seq data from Geuvadis. (A) Distribution of the number of retained introns per sample in the Geuvadis dataset. (B) Genomic distribution of transcribed TEs (light grey) compared to the mean of 1000 randomizations (dark grey). Z-scores are reported as numbers at the top of the plot. Error bars represent the standard deviation. (C) Integrative Genomics Viewer (IGV) screenshot of an intron showing increased retention in three samples (dark grey) relative to the light grey samples. This intron contains two annotated repeat elements, which could appear differentially expressed if analyzed without checking the retention of their hosting intron.
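The comparison of an observed count with the mean of 1000 randomizations, summarized as a Z-score, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy intron table, and the use of the sample standard deviation are hypothetical choices, not the authors' pipeline.

```python
import random
import statistics


def z_score(observed, null_values):
    """Z-score of an observed value against a randomization (null) distribution."""
    mean = statistics.fmean(null_values)
    sd = statistics.stdev(null_values)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean) / sd


def randomized_counts(intron_has_te, n_draw, n_iter=1000, seed=0):
    """Draw n_draw introns at random n_iter times and count those hosting a TE.

    intron_has_te: hypothetical mapping of intron id -> bool.
    """
    rng = random.Random(seed)
    introns = list(intron_has_te)
    counts = []
    for _ in range(n_iter):
        drawn = rng.sample(introns, n_draw)
        counts.append(sum(intron_has_te[i] for i in drawn))
    return counts


# Toy data: 200 introns, every fourth one hosts a transcribed TE.
toy_introns = {f"intron_{i}": (i % 4 == 0) for i in range(200)}
null_counts = randomized_counts(toy_introns, n_draw=40)
toy_z = z_score(observed=25, null_values=null_counts)
```

A large positive Z-score would indicate that the observed set contains more TE-hosting introns than expected from randomly chosen introns of the same number.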
Figure 2. Intron retention might increase intronic TE measurements. (A) Distribution of the mean expression of TEs located in retained introns (light grey) compared to the mean expression of TEs located in randomly chosen introns (dark grey). The p-value is reported at the top of the plot. (B) Pie chart reporting the percentage of positive, negative, and non-significant correlations between TE expression and the hosting intron's IRratio. (C) Distribution of significant (adjusted p-value < 0.05) coefficients of correlation between TE expression and the hosting intron's IRratio. (D) Comparison between the number of commonly retained introns containing at least one transcribed TE (red dashed line) in the Geuvadis dataset and the mean of 1000 randomizations of randomly chosen introns (light grey distribution). The Z-score is reported at the top of the plot. (E) Comparison between the number of commonly retained introns containing at least one TE fragment (red dashed line) in the Geuvadis dataset and the mean of 1000 randomizations of randomly chosen introns (light grey distribution). The Z-score is reported at the top of the plot.
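The classification of TE/intron pairs into positive, negative, and non-significant correlations at an adjusted p-value threshold of 0.05 could look roughly like the sketch below. The record does not state which correlation coefficient or multiple-testing correction was used, so Pearson correlation and the Benjamini-Hochberg procedure are assumptions here, and all names and numbers are hypothetical.

```python
import math
from collections import Counter
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(r, adj_p, alpha=0.05):
    """Label one TE/intron pair by sign and significance of its correlation."""
    if adj_p >= alpha:
        return "non-significant"
    return "positive" if r > 0 else "negative"


# Hypothetical per-pair results: correlation coefficients and raw p-values.
coefficients = [0.62, -0.41, 0.05]
raw_pvalues = [0.001, 0.020, 0.700]
labels = [classify(r, p) for r, p in zip(coefficients, bh_adjust(raw_pvalues))]
summary = Counter(labels)  # counts per category, as a pie chart would show
```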
Figure 3. Increased intron retention correlates with an increased number of expressed intronic TEs. (A) Correlation between normalized retained intron counts and the number of transcribed intronic TEs per sample. The coefficient of correlation (R) and the p-value are reported at the top of the plot. (B) Correlation between the normalized number of reads shared by TEs in retained introns (RI) and intergenic TEs, and the number of transcribed intergenic TEs per sample. The coefficient of correlation (R) and the p-value are shown at the top of the plot. (C) Distribution of the number of multi-mapping reads shared by TEs located in retained introns and intergenic TEs per sample. The number is reported as a fraction of the total number of reads mapped to TEs located in retained introns.
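The fraction described in panel (C) can be illustrated with a small set-based sketch. It assumes that, for one sample, the identifiers of reads assigned to each TE class are already available; the function name and the read identifiers are hypothetical.

```python
def shared_read_fraction(reads_in_ri_tes, reads_in_intergenic_tes):
    """Fraction of reads mapped to TEs in retained introns that are also
    assigned (as multi-mappers) to intergenic TEs."""
    ri_reads = set(reads_in_ri_tes)
    if not ri_reads:
        return 0.0
    shared = ri_reads & set(reads_in_intergenic_tes)
    return len(shared) / len(ri_reads)


# Toy sample: two of the four reads on retained-intron TEs also hit intergenic TEs.
fraction = shared_read_fraction(
    ["r1", "r2", "r3", "r4"],
    ["r2", "r4", "r9"],
)
```

Normalizing by the reads on retained-intron TEs, rather than by library size, matches the denominator named in the caption.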
Figure 4. Differential intron retention between groups biases TE differential expression analysis. (A) Volcano plot reporting the number of deregulated introns in IR-High vs. IR-Low samples from the Geuvadis dataset. Significant results are reported in dark grey. (B) Volcano plot reporting the number of deregulated TEs in IR-High vs. IR-Low samples from the Geuvadis dataset. Significant results are reported in dark grey. (C) Genomic distribution of upregulated TEs (light grey) in IR-High with respect to IR-Low samples compared to the mean of 1000 randomizations (dark grey). Z-scores are reported as numbers at the top of the plot. Error bars represent the standard deviation. (D) Volcano plot reporting the number of deregulated introns in DNMT-/- samples vs. WT controls. Significant results are reported in dark grey. (E) Volcano plot reporting the number of deregulated TEs in DNMT-/- samples vs. WT controls. Significant results are reported in dark grey. (F) Genomic distribution of upregulated TEs (light grey) in DNMT-/- samples, with respect to WT controls, compared to the mean of 1000 randomizations (dark grey). Z-scores are reported as numbers at the top of the plot. Error bars represent the standard deviation. (G) Volcano plot reporting the number of deregulated introns in SDE2 knock-down cells with respect to control cells. Significant results are reported in dark grey. (H) Volcano plot reporting the number of deregulated TEs in SDE2 knock-down cells with respect to control samples. Significant results are reported in dark grey. (I) Genomic distribution of upregulated TEs (light grey) in SDE2 knock-down cells compared to the mean of 1000 randomizations (dark grey). Z-scores are reported as numbers at the top of the plot. Error bars represent the standard deviation.