Literature DB >> 35088819

Mosquito long non-coding RNAs are enriched with Transposable Elements.

Elverson Soares de Melo1, Gabriel Luz Wallau1.   

Abstract

Long non-coding RNAs (lncRNAs) lack coding capacity and mounting evidence suggests that they have a regulatory role in diverse organisms. Most knowledge about lncRNAs comes from studies on vertebrates, including a structural association between lncRNAs and transposable elements (TEs). TE sequences are genomic parasites found in all branches of life and are particularly active and abundant in insect genomes. Here we investigate the contribution of TEs to lncRNA biogenesis in Aedes albopictus and Culex quinquefasciatus. We found that a large fraction of lncRNA loci co-occurs with TE loci in both species. Around 40% of A. albopictus and 52% of C. quinquefasciatus lncRNAs show some association with TEs. Most of the lncRNA/TE associations are represented by TE-derived sequences that are expressed as one or all exons of lncRNAs, including five lncRNAs that seem to influence immune-related genes involved in antiviral response. The contribution of TEs to lncRNAs also varies among the different types of TEs. The Gypsi superfamily is particularly enriched in lncRNAs sequences. In sum, this study demonstrates that transposable elements substantially contribute to lncRNAs biogenesis in A. albopictus and C. quinquefasciatus and may have an impact on regulatory modulation in these species.

Entities:  

Year:  2022        PMID: 35088819      PMCID: PMC8796034          DOI: 10.1590/1678-4685-GMB-2021-0215

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Mosquitoes of the genera Aedes, Anopheles, and Culex are vectors of several human pathogens. Many studies sought to investigate a range of biological phenomena in these species, such as insecticide resistance and pathogen infection, resulting in thousands of RNA sequencing datasets available. Some recently published articles used dozens of these datasets to characterize long non-coding RNAs (lncRNA) in mosquitoes (Azlan et al., 2021a,b). Many lncRNAs are derived from genes that resemble protein-coding genes (mRNA producers). A mature lncRNA may be processed like a true protein-coding gene mRNA having a similar half-life, but normally they undergo different mRNA processing steps (Ulitsky and Bartel, 2013). However, unlike protein-coding gene mRNAs, lncRNAs are not capable or have a limited capacity of producing fully functional proteins (Mercer et al., 2009). lncRNAs may contain spurious (not translated) or short Open Reading Frames (ORFs), that is, ORFs that might potentially give rise to short polypeptides (<100 amino acids), but these are commonly absent from the species proteome, or if detected, no function can be inferred or characterized (Ulitsky and Bartel, 2013). These small ORFs may have no functional consequences, but in some cases, they may evolve to a more complex polypeptide giving origin to new genes (Ruiz-Orera et al., 2020). Therefore, lncRNAs are a group of RNA molecules larger than 200 nucleotides, expressed from intergenic, intronic or exonic loci, that lack true coding capacity to produce functional proteins (Ulitsky and Bartel, 2013; Cech and Steitz, 2014; Dhanoa et al., 2018). Different studies have shown that lncRNAs impact the specimen phenotype in several ways, mainly influencing gene expression, but also regulating chromosome architecture, recruitment of chromatin modifiers, and modulating chromosomal interactions (Bonasio and Shiekhattar, 2014; Yao et al., 2019). In addition, lncRNA knockdowns or knockout experiments result in morphological and behavioral changes in different organisms (Mattick, 2018). Although our knowledge about lncRNA has advanced, there are still several unanswered questions. One example is: from which genomic structures/loci does lncRNA emerge? Strikingly, results from vertebrates showed an important role of transposable elements (TEs) copies in the emergence of lncRNAs (Kelley and Rinn, 2012; Kapusta et al., 2013; Hezroni et al., 2015). More recently, studies on plants also show that some intronic lncRNAs overlap with TEs (Pedro et al., 2018). Although there are few studies on invertebrates, the TE contribution to lncRNA biogenesis appears to vary widely among insects, from almost no TE sequences overlapping with lncRNA in hymenopterans, to considerable overlap in Drosophila melanogaster (Lopez-Ezquerra et al., 2017). A recent publication of our group characterized in detail the contribution of TEs to the genome of mosquitoes (Melo and Wallau, 2020). Here, we leverage such a rich TE dataset to evaluate the TE contribution to lncRNA loci in Aedes albopictus and Culex quinquefasciatus. We detected a broad contribution of some specific TE superfamilies in these two species' lncRNA profiles. In order to investigate the role of TEs in lncRNA biogenesis in A. albopictus and C. quinquefasciatus, we first updated a previous TE characterization conducted by Melo and Wallau (2020) using the most up-to-date genome assembly of both species (GCF_006496715.1 and GCF_015732765.1, respectively). Briefly, we used TEdenovo to generate putative TE consensus families; next filters were applied to remove ill assembled consensus (Figure S1 shows the entire analysis workflow). TE annotation files were generated using the TEannot pipeline (Flutre et al., 2011) with custom parameters. We only kept annotated TE copies larger than 200 nucleotides, as this is the minimum size of an lncRNA. In addition, decreasing the TE copy size increases the chance of spurious matches or superfamily misassignments. We then obtained the available annotation from A. albopictus RefSeq assembly GCF_006496715.1 and combined it with the lncRNA dataset annotated by Azlan et al. (2021b) to generate lncRNA coordinates of this species. The C. quinquefasciatus annotation dataset from RefSeq assembly VPISU_Cqui (GCF_015732765.1) was used as the lncRNA source, we also took the lncRNA dataset annotated by Azlan et al. (2021a) to compare between the current and previous assembly version (Cpip2). Lastly, we used GffCompare (Pertea and Pertea, 2020) to find the overlap degree between TEs and lncRNA coordinates in both species. We found large variability in lncRNA and TE co-occurrence: I - one or multiple TEs can be inserted in introns or exons of a single lncRNA gene; II - TE sequences can overlap intron/exons junctions of a lncRNA gene, or III - a single or multiexon lncRNA can be entirely encompassed by TE coordinates. We found that 37.88% of all A. albopictus lncRNAs genes have some degree of overlap with transposable elements (Figure 1A). While this proportion varies in both versions of C. quinquefasciatus assembly, from 32% (Figure S2) in CpipJ2 to 52.24% in VPISU_Cqui (Figure 1B). As the latter was the most recent and showed better contiguity, we followed the analysis focusing on this version. Moreover, it is interesting to note that 10.61% and 17.98% of lncRNAs loci are expressed in the TE neighborhood (maximum distance of 2 kb between a lncRNA locus and a TE copy) in A. albopictus and C. quinquefasciatus respectively. Although not overlapped with a TE region, the expression of these lncRNAs may be modulated by the surrounding TEs transcriptional activity (Lee et al., 2019).
Figure 1 -

Co-occurrence of lncRNAs and transposable elements in Aedes albopictus (A) and Culex quinquefasciatus (B). The overlaps were separated into five types: lncRNA overlap with TEs but in the opposite strand; lncRNA contains one or more TEs inserted in their coordinates; lncRNA presents some partial overlap with TEs, in this case, the intersection occurs in 5’ or 3’ of lncRNA; lncRNAs genes are fully derived from a TE copy insertion, where some represent the total length of the TE and other are formed by a specific region of a TE. The proportion of TE-derived lncRNA exons are shown by TE orders in C and D for A. albopictus and C. quinquefasciatus, respectively. This proportion indicates the prevalence of Gypsy TE in A. albopictus (E) and C. quinquefasciatus (F).

LncRNAs occurring in introns of coding genes (intronic) and between genes (intergenic) were assessed separately and the difference in TE participation in the biogenesis of these two types of lncRNA was very small. In A. albopictus, 39.34% of intronic lncRNAs and 37.05% of intergenic lncRNAs overlaps with TEs. This variation is smaller in C. quinquefasciatus where 52.86% and 52.36% of intronic and intergenic lncRNAs co-occurs with TEs, respectively. The similarity in the TE/lncRNAs co-occurrence between intronic and intergenic regions differs substantially from vertebrate lncRNAs, where lncRNAs from intergenic and gene-poor regions are richer in TEs than lncRNAs derived from the intronic region and gene-rich part of the genome (Kapusta et al., 2013). We also evaluate if the intersection between lncRNA genes and TEs occurs more frequently in introns or exons of lncRNAs. We observed that 77.56% and 85.17% of the lncRNA genes that show some degree of overlap with TEs have at least one exon involved with a TE insertion in A. albopictus and C. quinquefasciatus respectively. This represents 29.37% and 44.49% of all lncRNA expressed in these species. The high fraction of TEs found in lncRNA exons contrasts with some other insects (Lopez-Ezquerra et al., 2017). Thus, the depletion of TEs in lncRNAs exons, as suggested by previous authors, can not be extrapolated for insects in general. At least, in insect species with a large genome and high density of TEs, the contribution of TEs to the evolution and properties of lncRNA can be as important as for vertebrates. In lncRNA exons, TEs can be directly responsible for the lncRNA activity domains (Johnson and Guigó, 2014). Besides, they can also control the expression of lncRNAs via the PIWI-piRNA pathway since the TE regions within lncRNAs may be a target of piRNAs derived from other TE copies dispersed in the genome (Wang and Lin, 2021). Interestingly, a significant fraction of TE copies that overlaps with lncRNA genes is derived from LTR retrotransposons. They represent 82,82% and 31,89% of lncRNA exons in A. albopictus and C. quinquefasciatus species (Figure 1C, D). Among these retrotransposons, the Gypsy superfamily stands out. In A. albopictus, Gypsy superfamily alone is responsible for more than half of all TE-derived exons of lncRNA (Figure 1E, Table S1), and almost one-quarter in C. quinquefasciatus (Figure 1F, Table S1). Strikingly, the proportion of TE superfamily in lncRNA exons does not represent the proportion that TE superfamilies copies found in the mobilome (Chi-squared test, P-value < 2.2e-16) (Figure S3). Performing an over-representation analysis using the clusterProfiler package (Yu et al., 2012), version 3.10.1, we observed that some TE superfamilies are enriched in lncRNA exons of both species (Figure 2B, D). The elements from the Gypsy superfamily represent again one of the most interesting cases. This superfamily shows the highest geneRatio and lower p-values in over-representation analysis among all TE superfamilies of both species (MITE-like elements are not a superfamily, instead they are heterogeneous elements usually derived from DNA transposons that lost their coding region and are flanked by terminal inverted repeats - TIRs). Gypsy elements represent 53.95% of lncRNA exons that overlap with TEs, but only 20.37% of all TE copies found on A. albopictus genome (Figure 2A). This discrepancy increases substantially in C. quinquefasciatus where Gypsy represents 22.38% of lncRNA exons, but only 1.69% of TE copies on the genome, a 13-fold difference (Figure 2C). The over-representation of some LTR retrotransposons has been observed in human and mouse, but not in insects, where studies on some species showed that TEs on lncRNA reflect genomic TE landscapes (Lopez-Ezquerra et al., 2017). The predominance of a specific TE type in lncRNAs also varies between species, LTR is the main TE order in lncRNA of A. albopictus, as in humans, mice, and insects such as D. melanogaster (Kapusta et al., 2013; Lopez-Ezquerra, ). On the other hand, DNA transposons are prevalent in lncRNA of C. quinquefasciatus, as in other insects as Tribolium castaneum and Sogatella furcifera (Lopez-Ezquerra ; Chang et al., 2020). This suggests that the predominant TE type in lncRNAs is species or genus-specific and cannot be extrapolated to family or higher taxa levels.
Figure 2 -

Over-representation analysis of TE superfamilies in lncRNA exons. Distribution of superfamilies that are enriched in A. albopictus (A) and C. quinquefasciatus (C), a full list of superfamilies is shown in Figure S3. The results of the over-representation analysis show that Gypsy and BEL (LTR retroelements) are the most enriched superfamilies in both A. albopictus (B) and C. quinquefasciatus (D). MITE-like elements are not a TE superfamily as they are a heterogeneous group of defective elements that have TIRs.

Azlan et al. (2021b) identified 13 lncRNAs that regulate the expression of immune-related genes involved in arbovirus antiviral defense, here we found that five of these lncRNA genes are composed of TEs sequences. Two lncRNAs have one or more TEs inside their introns (XR_003898565.1, XR_003896231.1). The remaining three are fully comprised into TEs copies boundaries, two of them in the same strand (XR_003894180.1, lncRNA_5135.1) and the other on the opposite strand (XR_002127134.2). These three lncRNAs are derived from LTR retroelements (Gypsy, Bel-Pao, and Copia). Moreover, one out of 13 also presents an LTR in its flanking region. LTR retroelements are known to possess long terminal repeats (LTRs) at the boundaries of their coding region. In addition to complete and fragmented LTR retroelements copies, many solo-LTRs are also found dispersed into the genome. LTRs regions contain RNA polymerase II promoters as well as other regulatory sequence motifs, and several studies have been showing that LTRs are more likely to keep these sites than other TE types, increasing the likelihood of modulating flanking genes expression (Chuong et al., 2017). Furthermore, selective pressure on lncRNAs may facilitate the domestication of LTRs as lncRNA promoters (Thompson et al., 2016). In order to evaluate if there are any distinct TE age patterns of TE copies co-occurring or not with lncRNA exons, the divergence of each TE copy that overlaps with lncRNA and that do not overlap with lncRNAs was estimated against the TE consensus family using blastn (Figure S1). The divergence of each TE copy to the respective TE family consensus sequence represents the relative age of each copy since its split from the ancestral element-the TE family consensus sequence. That is, younger TE copies had less time to accumulate mutations since its split from the ancestral TE and show a lower divergence to the consensus while older TE copies had more time to accumulate mutations and are characterized by a higher divergence to the consensus. A Wilcoxon Rank Sum test (wilcox.test on R) was used to compare the copy divergence between the two datasets. On average, TE copies on lncRNA exons are most similar to TE consensus sequences (younger copies) than TEs copies that are not co-occurring with lncRNAs (Figure 3) for both species. Around 69% and 60% of TE copies involved with lncRNA exons are younger than TE copies dispersed in the genome of the same family, in A. albopictus (Figure S4) and C. quinquefasciatus (Figure S5) respectively. Such divergence patterns rise two non-mutually exclusive hypotheses: I - TE copies co-occurring with lncRNA are indeed younger insertions; II - such low divergence is a result of conserved old copies that are being maintained under purifying selection. Ulitsky and Bartel (2013) came to the conclusion that lncRNA exons are generally more conserved than intergenic regions (neutrally evolving), in vertebrates (Ulitsky and Bartel, 2013) giving support for the latter hypothesis. However, at the moment, our results are insufficient to differentiate and test these hypotheses due to the lack of orthologous lncRNA loci between Ae. albopictus and C. quinquefasciatus.
Figure 3 -

Relative age for TE copies inside (blues) and outside (yellow) lncRNA exons of Aedes albopictus (A) and Culex quinquefasciatus (B). Relative age was calculated using the blastn divergence of the copies for each TE family against the TE family consensus. ** means a p-value < 0.01, **** means a p-value < 0.0001.

In summary, our study demonstrates that a significant fraction (~37-52%) of long non-coding RNAs from A. albopictus and C. quinquefasciatus mosquitoes is involved with transposable elements copies. A portion of these TEs may actively contribute to the regulation of the mosquito’s defense against viral pathogens, and probably to other functions in mosquitoes. From this point of view, a fraction of TE copies that gave rise to functional lncRNAs would no longer be considered genomic parasites or selfish elements, but as a host gene that was co-opted for regulatory functions, as shown in previous studies (Chuong et al., 2017). Our study, also shows that there is an over-representation of LTR retrotransposons co-occurring with lncRNAs for both studied species, which differ from previous observations in Anopheles gambiae. Although several TE superfamilies are distributed across species (Petersen et al., 2019), co-option for regulatory functions as lncRNAs exons appears to be species or genus-specific, which is supported by the fact that the predominant types of TEs in lncRNAs are different among different insects’ species. Finally, the distribution of TEs in different types of lncRNA, as well as their depletion or enrichment in exons of these structures, does not seem to be characteristic of a group of organisms. Following this reasoning, the profile of TE superfamilies present in lncRNAs is substantially different between species and does not seem to depend on the proportion of superfamilies in the mobilome. However, definitive responses to the influence of TE on lncRNAs for the entire Culicidae family of insects will require additional RNA-Seq experiments from various species belonging to different tribes that compose this taxon.
  25 in total

Review 1.  The noncoding RNA revolution-trashing old rules to forge new ones.

Authors:  Thomas R Cech; Joan A Steitz
Journal:  Cell       Date:  2014-03-27       Impact factor: 41.582

Review 2.  Long Noncoding RNAs and Repetitive Elements: Junk or Intimate Evolutionary Partners?

Authors:  Hyunmin Lee; Zhaolei Zhang; Henry M Krause
Journal:  Trends Genet       Date:  2019-10-29       Impact factor: 11.639

3.  Identification and characterization of long noncoding RNAs and their association with acquisition of blood meal in Culex quinquefasciatus.

Authors:  Azali Azlan; Mardani Abdul Halim; Faisal Mohamad; Ghows Azzam
Journal:  Insect Sci       Date:  2020-07-22       Impact factor: 3.262

Review 4.  Long Terminal Repeats: From Parasitic Elements to Building Blocks of the Transcriptional Regulatory Repertoire.

Authors:  Peter J Thompson; Todd S Macfarlan; Matthew C Lorincz
Journal:  Mol Cell       Date:  2016-06-02       Impact factor: 17.970

Review 5.  Cellular functions of long noncoding RNAs.

Authors:  Run-Wen Yao; Yang Wang; Ling-Ling Chen
Journal:  Nat Cell Biol       Date:  2019-05-02       Impact factor: 28.824

6.  Considering transposable element diversification in de novo annotation approaches.

Authors:  Timothée Flutre; Elodie Duprat; Catherine Feuillet; Hadi Quesneville
Journal:  PLoS One       Date:  2011-01-31       Impact factor: 3.240

7.  Mosquito genomes are frequently invaded by transposable elements through horizontal transfer.

Authors:  Elverson Soares de Melo; Gabriel Luz Wallau
Journal:  PLoS Genet       Date:  2020-11-30       Impact factor: 5.917

8.  Genome-wide identification of Aedes albopictus long noncoding RNAs and their association with dengue and Zika virus infection.

Authors:  Azali Azlan; Sattam M Obeidat; Kumitaa Theva Das; Muhammad Amir Yunus; Ghows Azzam
Journal:  PLoS Negl Trop Dis       Date:  2021-01-22

9.  Transposable elements reveal a stem cell-specific class of long noncoding RNAs.

Authors:  David Kelley; John Rinn
Journal:  Genome Biol       Date:  2012-11-26       Impact factor: 13.583

10.  GFF Utilities: GffRead and GffCompare.

Authors:  Geo Pertea; Mihaela Pertea
Journal:  F1000Res       Date:  2020-04-28
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.