Feng Jiang1,2, Jie Zhang1, Qing Liu3, Xiang Liu2, Huimin Wang1, Jing He2, Le Kang1,2,4.
Abstract
The large genome of the migratory locust (Locusta migratoria) has accumulated a massive number of transposable elements (TEs), which show intrinsic transcriptional activity. Exonized TEs hamper the precise determination of full-length RNA transcript sequences because they produce numerous highly similar fragments that are difficult to resolve with short-read sequencing technology. Here, we applied a 5ʹ-Cap capturing method with Nanopore long-read direct RNA sequencing to characterize full-length transcripts in their native RNA form and to analyze the pattern of TE exonization in the locust transcriptome. Our results revealed widespread TE exonization and a substantial contribution of TEs to RNA splicing in the locust transcriptome. Analysis of how Piwi expression shapes the transcriptomic spectrum indicated that TE-derived sequences were the main targets of Piwi-mediated repression. Furthermore, our study showed that Piwi expression regulates the length of RNA transcripts containing TE-derived sequences, resulting in alternative UTR usage. Overall, our results reveal the transcriptomic characteristics of TE exonization in species with large, repetitive genomes.
Keywords: Nanopore; direct RNA sequencing; insects; piwi; transposable element
Year: 2019 PMID: 30982421 PMCID: PMC6546357 DOI: 10.1080/15476286.2019.1602437
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Figure 1. Enrichment of full-length RNA transcripts by 5ʹ-Cap capturing. (a) Flowcharts illustrating the protocols used to enrich full-length RNA transcripts by 5ʹ-Cap capturing. (b) Enrichment of the long-adaptor sequences at the 5ʹ end of RNA transcripts. The tool uShuffle was used to shuffle RNA transcript sequences while preserving their k-let counts. (c) Distribution of adaptor sequence identity and of the percentage of adaptors in the adaptor-containing transcripts. Adaptor sequences were identified with the GLSEARCH program, which uses a global-local alignment algorithm. L30, truncated adaptor sequences spanning positions 1 to 30 at the 5ʹ end of the long adaptor; R40, truncated adaptor sequences spanning positions −40 to −1 at the 3ʹ end of the long adaptor. R20 yields the maximum number of adaptor-containing transcripts, which is therefore set to 100%. (d) Gene coverage evaluated by aligning RNA transcripts to the protein-coding genes. For each protein-coding transcript containing the long-adaptor sequences, the relative coverage was summed in each 1/100 portion of the transcript and plotted. (e) Transcript integrity assessment of individual RNA transcripts. For each protein-coding gene, the transcript coverage was calculated as the average coverage of each individual RNA transcript against its corresponding CAP transcript.
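Panel (d) of Figure 1 sums relative coverage over 100 equal portions of each protein-coding transcript. The sketch below shows one way such a gene-body coverage profile can be computed. It assumes each transcript is given as its length plus read alignments as 0-based, half-open (start, end) intervals in transcript coordinates, and it assumes "relative" means each transcript's profile is scaled so its peak bin equals 1. The function name `gene_body_coverage` and the input layout are hypothetical, not the study's pipeline.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical input: (transcript_length, [(start, end), ...]) with 0-based,
# half-open alignment intervals in transcript coordinates.
Transcript = Tuple[int, Sequence[Tuple[int, int]]]


def gene_body_coverage(transcripts: Iterable[Transcript], n_bins: int = 100) -> List[float]:
    """Sum per-transcript relative coverage over n_bins equal portions."""
    profile = [0.0] * n_bins
    for length, alignments in transcripts:
        if length <= 0:
            continue
        # Per-base read depth along this transcript.
        depth = [0] * length
        for start, end in alignments:
            for pos in range(max(start, 0), min(end, length)):
                depth[pos] += 1
        # Mean depth within each 1/n_bins portion of the transcript.
        bin_means = []
        for b in range(n_bins):
            lo = b * length // n_bins
            hi = max((b + 1) * length // n_bins, lo + 1)  # at least one base per bin
            window = depth[lo:hi]
            bin_means.append(sum(window) / len(window))
        # Assumption: "relative" = scaled so the transcript's peak bin equals 1.
        peak = max(bin_means)
        if peak == 0:
            continue  # transcripts with no aligned reads contribute nothing
        for b, value in enumerate(bin_means):
            profile[b] += value / peak
    return profile


# Toy example: a 200-nt transcript whose 5ʹ half is covered by one read.
profile = gene_body_coverage([(200, [(0, 100)])])
print(profile[0], profile[99])  # 1.0 0.0
```

A flat profile near the 3ʹ end with a drop toward the 5ʹ end would indicate truncated reads; summing peak-scaled profiles keeps highly expressed genes from dominating the aggregate.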
Figure 2. TE occurrence in the locust transcriptome. (a) Pie charts summarizing the annotation of TE families in the locust transcriptome. (b) Co-occurrence frequency of TE families in the locust transcriptome. The segment length in the outer circle indicates the occurrence frequency of each TE family, and the width of the line connecting two TE families is proportional to their co-occurrence frequency. (c) TE coverage in the UTR and CDS regions. The longest transcript in each transcription unit was selected for the coverage calculation. (d) Distribution of the percentage of TE-derived sequences in ncRNA transcripts. (e) Number of exons in ncRNA transcripts and protein-coding genes. (f) TE family enrichment ratio at TE-derived splicing sites, calculated as log2[(PS + 1)/(PT + 1)], where PS is the percentage of the TE family among TE-derived splicing sites and PT is the coverage percentage of the TE family in the locust transcriptome. (g) Examples of enriched motifs at TE-derived splice acceptor sites in different TE families. (h) Summary of TE-derived splicing sites in first/last exons and in internal exons.
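The enrichment ratio in panel (f) of Figure 2 is a pseudocounted log2 ratio, read here as log2[(PS + 1)/(PT + 1)]. A minimal Python sketch, assuming PS and PT are already expressed as percentages per TE family and that a family missing from either table counts as 0%; the function names and the family labels in the example are hypothetical, and the input percentages are made-up illustrative values, not data from the study.

```python
import math
from typing import Dict, Mapping


def te_enrichment_ratio(ps: float, pt: float) -> float:
    """Return log2((PS + 1) / (PT + 1)) for one TE family.

    ps: percentage of the TE family among TE-derived splicing sites.
    pt: coverage percentage of the TE family in the transcriptome.
    The +1 pseudocounts keep the ratio finite when either value is 0.
    """
    return math.log2((ps + 1.0) / (pt + 1.0))


def enrichment_table(ps_by_family: Mapping[str, float],
                     pt_by_family: Mapping[str, float]) -> Dict[str, float]:
    """Enrichment ratio for every family seen in either table (missing = 0%)."""
    families = sorted(set(ps_by_family) | set(pt_by_family))
    return {
        fam: te_enrichment_ratio(ps_by_family.get(fam, 0.0), pt_by_family.get(fam, 0.0))
        for fam in families
    }


# Illustrative (made-up) percentages for two hypothetical families.
table = enrichment_table({"family_A": 15.0, "family_B": 3.0},
                         {"family_A": 7.0, "family_B": 7.0})
print(table)  # {'family_A': 1.0, 'family_B': -1.0}
```

Positive values mean a family is over-represented at splicing sites relative to its share of the transcriptome; the pseudocount damps extreme ratios for rare families.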
Figure 3. Burst of TE expression activity upon RNAi silencing of Piwi. (a) Principal component analysis plot of individual libraries from dsPiwi and dsGFP samples. (b) Scatterplot comparing TE expression (log2(TPM + 1)) between dsPiwi and dsGFP samples. Only RNA transcripts with more than 80% TE-derived sequence are shown and, for visual clarity, only those with TPM > 20. The red dashed lines indicate a 1.5-fold change in TE expression. (c) Coverage percentage of TE-derived sequences in four categories: ncRNA, 5ʹUTR, CDS and 3ʹUTR. The coverage percentage was calculated as the ratio of the total length of TE-derived sequences in each category to the total length of that category. (d) Random validation of alternative splicing events by PCR and Sanger sequencing of PCR products purified from agarose gels. (e) Summary of alternative splicing events: skipped exon, SE; mutually exclusive exons, MX; alternative 5ʹ or 3ʹ splice site, A5 or A3; retained intron, RI. (f) Summary of alternative splicing events that overlap with TE exonization. (g) Length distribution of the representative RNA transcripts in dsPiwi and dsGFP samples. Only transcription units whose representative RNA transcripts were detected in both dsPiwi and dsGFP samples were compared. N.S., not significant. (h) Comparison of transcript length ratios between dsPiwi and dsGFP samples. The red portion represents the percentage of genes whose transcript is longer in dsPiwi samples than in dsGFP samples. (i) Distribution of sequence variants along the protein-coding genes that are longer in dsPiwi samples than in dsGFP samples. For each representative RNA transcript, all exons from the dsPiwi and dsGFP samples were merged into a concatenated set and normalized into 100 bins. Red and blue indicate bins covered only by dsPiwi samples or only by dsGFP samples, respectively; light grey indicates bins covered by both. (j) Example diagram showing the length difference of protein-coding transcripts of the clathrin heavy chain gene (homologous to CG9012 in Drosophila melanogaster) between dsPiwi and dsGFP samples.
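Panel (b) of Figure 3 plots log2(TPM + 1) for dsPiwi against dsGFP, keeps transcripts that are more than 80% TE-derived, and marks a 1.5-fold change with dashed lines. On log2(TPM + 1) axes such lines sit at y = x ± log2(1.5), so the sketch below classifies transcripts by that distance. This is an interpretation, not the authors' code: it assumes the TPM > 20 filter applies if either sample exceeds 20, and the `TETranscript` record and `classify_te_expression` function are hypothetical names; the example values are made up.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TETranscript:
    name: str
    te_fraction: float  # fraction of transcript length that is TE-derived (0-1)
    tpm_dspiwi: float
    tpm_dsgfp: float


def classify_te_expression(transcripts: Iterable[TETranscript],
                           min_te_fraction: float = 0.8,
                           min_tpm: float = 20.0,
                           fold: float = 1.5) -> Dict[str, str]:
    """Label TE-rich transcripts as 'up', 'down' or 'unchanged' in dsPiwi."""
    cutoff = math.log2(fold)  # distance of the dashed lines from y = x
    calls = {}
    for t in transcripts:
        if t.te_fraction <= min_te_fraction:
            continue  # keep only transcripts with > 80% TE-derived sequence
        # Assumption: the TPM > 20 display filter applies to either sample.
        if max(t.tpm_dspiwi, t.tpm_dsgfp) <= min_tpm:
            continue
        # Difference of log2(TPM + 1) values = log2 of the pseudocounted ratio.
        diff = math.log2(t.tpm_dspiwi + 1.0) - math.log2(t.tpm_dsgfp + 1.0)
        if diff > cutoff:
            calls[t.name] = "up"
        elif diff < -cutoff:
            calls[t.name] = "down"
        else:
            calls[t.name] = "unchanged"
    return calls


calls = classify_te_expression([
    TETranscript("t1", 0.90, 63.0, 31.0),   # (TPM + 1) is 2-fold higher in dsPiwi
    TETranscript("t2", 0.90, 31.0, 63.0),   # (TPM + 1) is 2-fold lower in dsPiwi
    TETranscript("t3", 0.95, 40.0, 40.0),
    TETranscript("t4", 0.50, 100.0, 10.0),  # excluded: not > 80% TE-derived
])
print(calls)  # {'t1': 'up', 't2': 'down', 't3': 'unchanged'}
```

Comparing differences of log2(TPM + 1) values, rather than raw TPM ratios, matches the scatterplot's axes and avoids division by zero for unexpressed transcripts.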