
LINE-1-like retrotransposons contribute to RNA-based gene duplication in dicots.

Zhenglin Zhu, Shengjun Tan, Yaqiong Zhang, Yong E Zhang.

Abstract

RNA-based duplicated genes or functional retrocopies (retrogenes) are known to drive phenotypic evolution. Retrogenes emerge via retroposition, which is mainly mediated by long interspersed nuclear element 1 (LINE-1 or L1) retrotransposons in mammals. By contrast, long terminal repeat (LTR) retrotransposons appear to be the major player in plants, although an L1-like mechanism has also been hypothesized to be involved in plant retroposition. We tested this hypothesis by searching for young retrocopies, as these still retain the sequence features associated with the underlying retroposition mechanism. Specifically, we identified polymorphic retrocopies (retroCNVs) by analyzing public Arabidopsis (Arabidopsis thaliana) resequencing data. Furthermore, we searched for recently originated retrocopies encoded by the reference genomes of Arabidopsis and Manihot esculenta. Across these two datasets, we found cases with L1-like hallmarks, namely, the expected target site sequence, a polyA tail and target site duplications. These data suggest that an L1-like mechanism could operate in plants, especially dicots.


Year: 2016; PMID: 27098918; PMCID: PMC4838847; DOI: 10.1038/srep24755
Journal: Sci Rep; ISSN: 2045-2322


Rapidly accumulating evidence demonstrates that new genes play diverse functional roles and serve as a major driver of phenotypic evolution[1]. One important mechanism for creating these lineage- or species-specific genes is RNA-based duplication, or retroposition[1], in which an mRNA template is reverse transcribed by retrotransposons and subsequently reinserted into the genome as a functional retrocopy, or retrogene[2]. Retrogenes are straightforward to identify because they lack the introns of their parental copies. Moreover, because they lose most of the preexisting regulatory sequences, retrogenes are predicted to be subject to neofunctionalization, i.e., to acquire functions different from those of their parental genes[3]. Thus, retrogenes have been an attractive research target for decades. For example, one of the first reported new genes, jingwei in Drosophila, is a retrogene[4]. In plants, genome-wide surveys performed by others and by us have identified numerous retrogenes in Arabidopsis (Arabidopsis thaliana), rice and other species[5,6,7]. Although the majority of these retrogenes are functionally uncharacterized, Sun is known to underlie morphological variation of the tomato fruit[8], while CYP98A8 and CYP98A9 are involved in pollen development in Arabidopsis[9]. Retrotransposons provide the enzymatic machinery for retroposition and can be divided into several orders, including long interspersed nuclear elements (LINEs) and long terminal repeat (LTR) retrotransposons, among others[10]. LINEs are more abundant in animals, whereas LTR retrotransposons are dominant in plants[10]. LINE-1, or L1, appears to be the exclusive driver of retroposition in mammals[2,11]. A typical L1-mediated insertion is accompanied by three sequence features: a target site (TTAAAA), a target site duplication (TSD) and a 3′ polyA tail[2,12,13]. Notably, during retroposition a nick is created between “TT” and “AAAA”, and the latter is then used to prime reverse transcription of the mRNA.
Consequently, “AAAA” is also duplicated. By convention, however, “AAAA” is generally treated as part of the polyA tail rather than of the TSD[2]. These three features are not always all present. For example, L1-mediated tailless retrocopies have been found in therian genomes, where the polyA tail is absent but the target site and TSD are still present[14]. In plants, by contrast, retrogenes are mostly flanked by LTR retrotransposons, including the aforementioned Sun locus in tomato[8], Bs1 in maize[15,16] and a dozen retrogenes in rice[5], suggesting that retroposition in these species is mainly mediated by LTR retrotransposons. Interestingly, although they are less abundant than in mammals, L1-clade LINEs are known to be encoded by plant genomes[17], and L1 retrotransposons have therefore been hypothesized to drive retroposition in plants owing to their relaxed recognition of template RNAs[18]. In the present study, we provide evidence that L1-like retrotransposons mediate the creation of retrocopies in plants, especially dicots.
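To make two of the three L1 hallmarks concrete, the following is a minimal Python sketch that scores a candidate 6-bp target site against the canonical TTAAAA and looks for an exact TSD flanking an insertion. The helper names (`target_site_distance`, `exact_tsd`) and the 5-bp minimum TSD length are hypothetical choices for illustration. The sketch assumes the insertion boundaries are already known exactly; real TSDs degrade through mutation, which is why the Results report identities and E-values rather than exact matches.

```python
CANONICAL_TARGET = "TTAAAA"  # canonical L1 target site described in the text


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def target_site_distance(site: str) -> int:
    """Mismatches between a candidate 6-bp target site and TTAAAA."""
    return hamming(site.upper(), CANONICAL_TARGET)


def exact_tsd(left_flank: str, right_flank: str, min_len: int = 5) -> str:
    """Longest exact direct repeat that ends the 5' flank and starts the 3' flank.

    Hypothetical helper: assumes left_flank ends at the 5' insertion boundary and
    right_flank starts right after the polyA tail. Returns "" when no repeat of
    at least min_len bases (an arbitrary threshold) is found.
    """
    left, right = left_flank.upper(), right_flank.upper()
    # Try the longest possible repeat first, shrinking one base at a time.
    for k in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# "AAAAAA" (the site reported for RC_AT5G58720.1) differs from TTAAAA at two positions.
print(target_site_distance("AAAAAA"))           # 2
print(exact_tsd("GGGGCAGTTCG", "CAGTTCGTTTT"))  # CAGTTCG (toy sequences)
```

Scanning from the longest candidate downward ensures that a short accidental match never masks a longer genuine duplication.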

Results

To investigate the mechanism of retroposition in plants, we focused on the model species Arabidopsis for two reasons: 1) the reference genome encodes up to 251 retrogenes[7]; 2) although the Arabidopsis genome contains more LTR retrotransposons than LINEs[19], no retrogenes have been reported to be flanked by LTR retrotransposons[20], suggesting that LINE-mediated retroposition could occur. Because the sequence features associated with retroposition (e.g., the polyA tail or flanking LTRs) rapidly degenerate through the accumulation of secondary mutations[2], we focused on young retrocopies. Specifically, we took advantage of next-generation sequencing data for 18 Arabidopsis accessions[21] and searched for polymorphic retrocopies, or retrocopy number variants (retroCNVs), followed by targeted local de novo assembly (see Methods). We were able to assemble the full-length sequences and corresponding flanking regions of four retroCNVs (Table 1), whose population frequencies ranged from 1/18 to 14/18 (Supplementary Tables S1 and S2 and Fig. S1A–D). On the basis of the assembled sequences, we validated all four retroCNVs and their flanking sequences by PCR (Supplementary Fig. S1E,F). These retroCNVs are located on different chromosomes from their parental genes (Table 1), which is consistent with the between-chromosome duplication preference of retroposition[22] but contrasts with the within-chromosome bias of DNA-level duplication[23].
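Because Arabidopsis AGI locus codes embed the chromosome number (a locus named AT3G… lies on chromosome 3), the between-chromosome pattern noted above can be checked mechanically. This is a minimal sketch: the helper names `agi_chromosome` and `is_interchromosomal` are hypothetical, and the insertion chromosomes are copied from the IS column of Table 1.

```python
import re

# Parental transcripts and insertion chromosomes, copied from Table 1.
RETROCNV_INSERTIONS = {
    "AT3G06040.1": "Chr2",
    "AT3G08580.2": "Chr2",
    "AT5G58720.1": "Chr4",
    "AT5G51410.2": "Chr1",
}

# AGI codes: "AT", chromosome (1-5, C or M), "G", five digits, optional ".N" model suffix.
AGI_PATTERN = re.compile(r"^AT([1-5CM])G\d{5}(?:\.\d+)?$", re.IGNORECASE)


def agi_chromosome(agi_id: str) -> str:
    """Return the chromosome encoded in an AGI code, e.g. AT3G06040.1 -> Chr3."""
    match = AGI_PATTERN.match(agi_id)
    if match is None:
        raise ValueError(f"not an AGI code: {agi_id}")
    return "Chr" + match.group(1).upper()


def is_interchromosomal(parent_id: str, insertion_chrom: str) -> bool:
    """True if the retrocopy landed on a different chromosome than its parent."""
    return agi_chromosome(parent_id) != insertion_chrom


for parent, chrom in RETROCNV_INSERTIONS.items():
    print(parent, agi_chromosome(parent), "->", chrom, is_interchromosomal(parent, chrom))
```

All four rows print `True`, matching the statement that every retroCNV sits on a chromosome other than that of its parent.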
Table 1

RetroCNVs in Arabidopsis.

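PG | AS | E-E | E-I | I-E | IS | N | Flank | PolyA | TSD | TTAAAA | Mechanism
AT3G06040.1 | Can-0 | 7 | 17 | 14 | Chr2(+): 7.9M | 3 | - | N | N | Y | Uncertain
AT3G08580.2 | No-0 | 25 | 22 | 25 | Chr2(−): 11.7M | 14 | LTR/LTR | N | N | N | LTR
AT5G58720.1 | Oy-0 | 108 | 73 | 72 | Chr4(+): 7.6M | 3 | - | Y | Y | Y | L1-like
AT5G51410.2 | No-0 | 26 | 18 | 26 | Chr1(+): 12.3M | 1 | - | N | N | N | Uncertain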

“PG” is the parental gene from which the retroCNV is derived. Because a parental gene can encode multiple isoforms, only the transcript model (“.1” or “.2”) with the highest sequence similarity to the retroCNV is listed. “AS” denotes the accession in which the retroCNV was assembled, and “N” the number of accessions in which the retroCNV is present. The columns “E-E” (exon-exon junctions), “E-I” (exon-intron junctions) and “I-E” (intron-exon junctions) give the total count of reads mapped to the corresponding junctions. “IS” is the coordinate of the insertion site, with “+/−” showing the orientation of the retroCNV relative to the sense strand of the chromosome (Chr) into which it is inserted. “Flank” shows the retrotransposons in the 5′/3′ flanking regions of the retroCNV. Only the retroCNV derived from AT3G08580.2 is flanked by LTR retrotransposons on both sides, whereas no recognizable retrotransposon is associated with the other three cases. The next three columns, “PolyA”, “TSD” and “TTAAAA”, indicate whether a polyA tail, a TSD or a TTAAAA-like sequence, respectively, is present in the flanking region (Y, yes; N, no). The last column gives the mechanism inferred from these sequence features.
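The Mechanism column can be read as the output of a simple decision rule over the other columns. The sketch below is a hypothetical reconstruction, not a rule stated by the authors: it calls a case “LTR” when LTR retrotransposons flank the copy, “L1-like” when a polyA tail is present, and “Uncertain” otherwise. The polyA-first criterion is an assumption chosen so that it matches both Table 1 and the reference-genome cases in the Results, where R_AT1G05890.1 and R_AT4G21660.1 are counted as L1-like on the strength of a polyA tail without a TSD. The names `Hallmarks` and `infer_mechanism` are illustrative.

```python
from typing import NamedTuple


class Hallmarks(NamedTuple):
    flank: str    # retrotransposons in the 5'/3' flanks, e.g. "LTR/LTR"; "" if none recognized
    polya: bool   # polyA tail present
    tsd: bool     # target site duplication present
    ttaaaa: bool  # TTAAAA-like target site present


def infer_mechanism(h: Hallmarks) -> str:
    """Hypothetical rule that reproduces the Mechanism column of Table 1."""
    if "LTR" in h.flank:
        return "LTR"
    if h.polya:
        return "L1-like"
    # A TSD or TTAAAA-like motif alone is treated as insufficient evidence here.
    return "Uncertain"


# Feature columns of Table 1 (Flank, PolyA, TSD, TTAAAA).
TABLE_1 = {
    "AT3G06040.1": Hallmarks("", False, False, True),
    "AT3G08580.2": Hallmarks("LTR/LTR", False, False, False),
    "AT5G58720.1": Hallmarks("", True, True, True),
    "AT5G51410.2": Hallmarks("", False, False, False),
}

for parent, hallmarks in TABLE_1.items():
    print(parent, infer_mechanism(hallmarks))
```

Other rules would fit the same four rows; this one was chosen only because it is the simplest that also agrees with the reference-genome calls.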

We then examined the sequence features associated with the four retroCNVs to identify the underlying retroposition mechanism. One retroCNV (RC_AT3G08580.2, the retroCNV derived from AT3G08580.2) is flanked by two ATRAN-type LTR retrotransposon segments, suggesting an LTR retrotransposon-mediated mechanism (Supplementary Fig. S2). Interestingly, this retroCNV retains one intron inherited from its parental gene (Supplementary Fig. S2C), which is consistent with the pervasive intron retention caused by alternative splicing in plants[24,25]. In contrast, no LTR retrotransposons were associated with the other three retroCNVs. Specifically, RC_AT5G58720.1 has features similar to those of retrogenes created by L1s (Fig. 1A and Supplementary Fig. S3): a candidate 10-bp polyA tail, a potential 17-bp TSD (61% identity, BLAST E = 0.45) and a possible target site “AAAAAA”, which resembles the canonical “TTAAAA” motif. The imperfection of these features is consistent with the swift degeneration of flanking sequences after retroposition[2,26]. Because the putative TSD is highly divergent, we considered the possibility that it is not a real TSD but a sequence already present in the reference genome or in accessions lacking the retroCNV. To test this possibility, we downloaded the assembled sequences of all 18 accessions and examined the insertion site in the accessions in which RC_AT5G58720.1 was genotyped as absent. Across the reference genome and these 15 accessions, the insertion site carries only a single copy of the TSD (the 3′ TSD), with neither the 5′ TSD nor the polyA tail (Supplementary Fig. S4A). This result suggests that RC_AT5G58720.1 was generated by the L1-like mechanism.
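The identity values quoted for the putative TSDs come from pairwise alignments. The minimal sketch below counts identical columns over the full alignment length, gap columns included, which is how BLAST reports its “Identities” line; whether the percentages in the text used exactly this convention is an assumption. The second helper expresses the empty-site test described above: at a genuine insertion, the allele lacking the retrocopy should contain the TSD sequence only once. Both function names are hypothetical, and the sequences are toy examples, not data from this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns / alignment length; '-' gap columns count in the length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        a == b and a != "-" for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


def tsd_copies_at_empty_site(empty_site: str, tsd: str) -> int:
    """Non-overlapping occurrences of a candidate TSD in the allele lacking the insert."""
    return empty_site.upper().count(tsd.upper())


print(percent_identity("ACGT-ACG", "ACGTTACG"))              # 87.5
print(tsd_copies_at_empty_site("GGGCAGTTCGTTT", "CAGTTCG"))  # 1
```

A count of one at the empty site, together with two copies flanking the insertion, is the pattern expected for a duplication created by the insertion itself.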
Figure 1

Schematic representation of three retrocopies.

Thick and thin boxes represent coding regions and untranslated regions, respectively. The “H”-shaped symbol denotes an intron. Exon sizes are drawn roughly to scale. Arrows indicate the direction of transcription. Dashed lines show the correspondence of sequences between the parental gene and the retrocopy. The retroposed segment is marked in purple and the remaining sequence in light blue. For the retrocopies, the candidate target site, target site duplication and polyA tail are marked in orange, green and red, respectively. Panel (A) shows the retroCNV derived from the parental gene AT5G58720.1 in Arabidopsis, in which a partial sequence derived from the last three exons of AT5G58720.1 was retroposed and inserted into Chromosome 4 (Chr 4). Panel (B) shows a retrocopy encoded by the Arabidopsis reference genome, and Panel (C) shows a retrocopy encoded by the M. esculenta reference genome. Interestingly, in Panel (B), an LTR retrotransposon (Copia) insertion, marked in dark yellow, occurs in the middle of the retrocopy.

RC_AT3G06040.1 has features analogous to those of RC_AT5G58720.1, but its polyA tail is short (4 bp) with an A-to-T substitution, and its 33-bp TSD shows only 46% identity (E = 0.15, Supplementary Fig. S5). Moreover, unlike for RC_AT5G58720.1, both TSDs and the polyA tail are found in the reference genome and in all accessions in which this retroCNV was genotyped as absent (Supplementary Fig. S4B). Thus, the underlying mechanism cannot be inferred. Interestingly, in all accessions lacking RC_AT3G06040.1, an insertion of approximately 6 kb lies between the two “TSDs”; in Can-0 this insertion is replaced by RC_AT3G06040.1. Because the orthologous region in the closely related species Arabidopsis lyrata encodes neither the insertion nor the retroCNV (Supplementary Fig. S6), it remains unknown whether the insertion predates the retroCNV or vice versa. Finally, the fourth retroCNV, RC_AT5G51410.2, is peculiar in that the retroposed sequence is inserted into the genome without any L1-like or LTR features (Supplementary Fig. S7). In other words, no polyA tail, TSD, target motif or LTR retrotransposon remnant exists, suggesting either an additional, previously unknown mechanism or the rapid degeneration of the aforementioned hallmark sequences.

If Arabidopsis retroCNVs show hallmarks typical of L1-mediated retrogenes, a second type of young retrocopy, i.e., recently evolved retrocopies encoded by the reference genome, may also harbor these features. By improving on previous work[6,7,20], we identified 10 retrocopies with an overall identity higher than 95% relative to their parental loci (Methods), including six reported previously[6,7,20] and four novel ones (Supplementary Table S3). Because all of these retrocopies are shared by at least 17 of the 18 Arabidopsis accessions (Supplementary Table S4), they are very likely older than the retroCNVs, and the hallmarks associated with L1s may therefore have already disappeared.
Of the 10 retrocopies, we were able to infer the retroposition mechanism for four: one is associated with LTR retrotransposons, and the other three exhibit L1-like hallmarks (Supplementary Table S3). Specifically, R_AT4G31900.1 harbors a 22-bp polyA tail with only two A-to-G substitutions and a 15-bp TSD with high identity (93%, E = 3 × 10⁻⁵, only one mismatch). Interestingly, a 1.8-kb insertion (LTR element Copia-82_ALY-I) is situated in the middle of the retrocopy (Fig. 1B and Supplementary Fig. S8), suggesting a secondary mutation after the L1-mediated retroposition. Similarly, R_AT1G05890.1 and R_AT4G21660.1 carry a 6-bp and a 15-bp polyA tail, respectively, although no TSD was observed (Supplementary Figs S9 and S10). The remaining six retrocopies resemble the aforementioned RC_AT5G51410.2 in that the absence of sequence hallmarks precludes inference of the underlying mutational mechanism.

Since the L1 family is widely shared across plants[17], we followed the same strategy used in Arabidopsis to identify recently derived retrocopies in another dicotyledonous plant, cassava (Manihot esculenta). We chose cassava because it represents another major branch of dicots that diverged from Arabidopsis more than 100 million years ago[27]. We identified 13 young retrocopies and were able to infer the retroposition mechanism for seven of them on the basis of sequence features: three retrocopies were created by LTR retrotransposons and four were associated with L1-like hallmarks (Supplementary Table S5). For example, R_cassava4.1_019865m harbors a 15-bp polyA tract with an A-to-C substitution, a 13-bp TSD with only one mismatch (E = 8 × 10⁻⁵) and an exact copy of the hexanucleotide target sequence “TTAAAA” (Fig. 1C and Supplementary Fig. S11).
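Several of the polyA tails above contain substitutions (for example, a 22-bp tail with two A-to-G changes), so an exact run of A would underestimate them. Below is a greedy heuristic sketch, not the authors' procedure: starting at the 3′ insertion boundary, it walks backwards, tolerating up to `max_mismatches` non-A bases, then trims the tract so that it starts and ends with A. The function name and the default mismatch budget are hypothetical.

```python
def polya_tract(seq: str, max_mismatches: int = 2) -> str:
    """A-rich tract at the 3' end of seq, tolerating up to max_mismatches non-A bases.

    Assumes seq ends exactly at the 3' boundary of the inserted sequence.
    """
    s = seq.upper()
    mismatches = 0
    start = len(s)
    # Walk from the 3' end towards the 5' end until the mismatch budget is exceeded.
    for i in range(len(s) - 1, -1, -1):
        if s[i] != "A":
            mismatches += 1
            if mismatches > max_mismatches:
                break
        start = i
    tract = s[start:]
    # Trim non-A bases that were absorbed at either end of the tract.
    first, last = tract.find("A"), tract.rfind("A")
    return "" if first == -1 else tract[first:last + 1]


tail = polya_tract("GCGC" + "AAAAGAAAAAA", max_mismatches=1)
print(tail, len(tail))  # AAAAGAAAAAA 11
```

With `max_mismatches=0` the same call returns only the uninterrupted terminal run (`AAAAAA`), which shows how strongly the reported tail length depends on the tolerance chosen.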

Discussion

By analyzing polymorphic retrocopies and recently evolved retrocopies encoded by reference genomes, we show that, in addition to LTR retrotransposons, L1-like machinery also contributes to the formation of retrocopies in dicots. Because L1 retrotransposons are widely shared across dicots and monocots[17,18], it would not be surprising if future work revealed polymorphic or recently originated retrocopies in monocots with sequence features of L1-like retroposition. We prefer the term “L1-like” to “L1” because of the complexity of the sequence features associated with the retroCNVs and retrocopies. Specifically, only R_cassava4.1_019865m (Fig. 1C) fits perfectly with the standard model of L1-mediated retroposition in mammals in terms of the polyA tail, TSD and “TTAAAA” target site[2]. The other cases deviate from this model, showing highly diverged TSDs, which may reflect spurious alignments as suggested by non-significant BLAST E-values, and/or a non-standard target site, as found for RC_AT5G58720.1 (Fig. 1A). In these cases, L1-like mechanisms may certainly still have operated, especially considering that these deviations can be explained by mutations subsequent to retroposition. However, at least for RC_AT5G51410.2 (Supplementary Fig. S7), the L1 or L1-like model can hardly apply. Even in the relaxed tailless model of L1-mediated retroposition, the TSD and “TTAAAA” remain[14], whereas the polyA tail, TSD and “TTAAAA” are all absent for RC_AT5G51410.2. Because this retroCNV is present in only one of 18 accessions and is thus likely very young (Supplementary Table S1), the absence of all three features is unlikely to be explained by secondary mutations. It is plausible that the retroposition of RC_AT5G51410.2 was mediated by an as yet unknown mechanism.
Considering all of these complexities, in-depth experimental work is needed to formally test the functional link between LINE elements and retroposition in plants. In addition, the young retrocopies compiled in this study not only shed light on the underlying retroposition mechanism but may also contribute to future studies on the genetic basis of accession- or species-specific phenotypic evolution. Of the 10 retrocopies encoded by the Arabidopsis reference genome, three appear to be under functional constraint, since their ratio of non-synonymous to synonymous substitution rates (Ka/Ks) relative to their parental genes is significantly smaller than 0.5[28] (Supplementary Table S3). These cases warrant further functional studies.
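For orientation, the final step of a textbook Nei–Gojobori-style Ka/Ks estimate is sketched below. Given counts of non-synonymous and synonymous differences and sites (assumed to be already computed from a codon alignment of retrocopy and parent), each proportion p is corrected with the Jukes–Cantor formula d = −(3/4) ln(1 − 4p/3), and the ratio is taken. This is a swapped-in standard method, not necessarily the one used in the study, and it yields only a point estimate: concluding that Ka/Ks is significantly below 0.5 additionally requires a statistical test (for example, a likelihood ratio test with the ratio fixed at 0.5), which is not shown. The function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Point estimate of Ka/Ks from per-class difference and site counts."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ZeroDivisionError("Ks is zero, so Ka/Ks is undefined")
    return ka / ks


# Toy counts: 3 non-synonymous differences over 300 sites, 10 synonymous over 100.
print(round(ka_ks(3, 300, 10, 100), 3))
```

A value well below 0.5 from such counts is suggestive of purifying selection, but with few substitutions the sampling variance is large, which is why a formal test matters.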

Methods

Full-length descriptions are provided in the Supplementary Information.

Identification and Assembly of RetroCNVs

By improving a previous retroCNV identification strategy[29,30], we identified retroCNVs in Arabidopsis resequencing data[21] by aligning reads against exon-exon junction sequences and inferring the signal of intron loss (Fig. 2). We then extracted reads that mapped to retroCNVs and performed targeted local de novo assembly to obtain information on flanking regions.
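The junction-based signal works because a sequence joining the end of one exon to the start of the next exists in the spliced mRNA, and hence in an intronless retrocopy, but is normally absent from the intron-containing parental locus. A minimal sketch of building such junction probes from a transcript's ordered exon sequences follows; the function name and the `flank` length (for example, read length minus a minimum overhang) are hypothetical parameters, not values taken from the study.

```python
from typing import List, Tuple


def junction_sequences(exons: List[str], flank: int) -> List[Tuple[str, str]]:
    """Build exon-exon junction probes from a transcript's ordered exon sequences.

    Each probe joins the last `flank` bases of exon i to the first `flank`
    bases of exon i+1. Exons shorter than `flank` contribute their full length.
    """
    probes = []
    for i in range(len(exons) - 1):
        left, right = exons[i][-flank:], exons[i + 1][:flank]
        probes.append((f"junction_{i + 1}_{i + 2}", left + right))
    return probes


for name, seq in junction_sequences(["AAACCC", "GGGTTT", "ACGT"], flank=3):
    print(name, seq)
# junction_1_2 CCCGGG
# junction_2_3 TTTACG
```

Reads that align across such a probe, together with the absence of matching intron-spanning reads, are the kind of evidence tabulated in the E-E, E-I and I-E columns of Table 1.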
Figure 2

The pipeline for the identification and assembly of retroCNVs.

RetroCNV Genotyping in Arabidopsis Accessions

For each of the four retroCNVs, we mapped the longest assembled contig back to the reference genome (TAIR10)[31,32] and determined the insertion site (Supplementary Fig. S12). We then searched for reads that aligned better to the retroCNV than to the parental gene and for reads that spanned the insertion breakpoints. If both types of reads were found, we conservatively classified the retroCNV as present in the corresponding accession; otherwise, it was classified as absent.
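The genotyping rule above, and the population frequencies derived from it, can be written as a few lines of Python. This is a sketch: the text requires only that both read types be found, so the minimum of one read per type (`MIN_READS`) is an assumption, and the accession names and read counts are hypothetical.

```python
from typing import Dict

# Hypothetical minimum read support; the text only requires both read types to be found.
MIN_READS = 1


def is_present(n_retrocopy_specific: int, n_breakpoint_spanning: int) -> bool:
    """Conservative call: present only if both kinds of supporting reads are seen."""
    return n_retrocopy_specific >= MIN_READS and n_breakpoint_spanning >= MIN_READS


def population_frequency(calls: Dict[str, bool]) -> str:
    """Frequency written as 'present/total', as for the retroCNVs (e.g. 3/18)."""
    return f"{sum(calls.values())}/{len(calls)}"


# Toy read counts per accession: (retrocopy-specific reads, breakpoint-spanning reads).
counts = {"acc1": (5, 2), "acc2": (4, 0), "acc3": (0, 0)}
calls = {acc: is_present(*pair) for acc, pair in counts.items()}
print(calls)                        # {'acc1': True, 'acc2': False, 'acc3': False}
print(population_frequency(calls))  # 1/3
```

Requiring both read types trades sensitivity for specificity: an accession with retrocopy-specific reads but no breakpoint support (acc2 above) is called absent.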

LTR/LINE Retrotransposon Inference

To infer the presence of LTR or LINE retrotransposons in the flanking regions of retrocopies, we applied RepeatMasker (http://www.repeatmasker.org) with a customized library that included annotated plant retrotransposons from Repbase[33,34] and TIGR[35], as well as retrotransposons predicted de novo with MGEScan-LTR[36] and MGEScan-nonLTR[37].
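After running RepeatMasker with a custom library (its `-lib` option), the LTR and LINE annotations can be pulled out of the tabular `.out` report. The sketch below assumes the standard whitespace-delimited `.out` layout, in which column 5 holds the query name, columns 6 and 7 the query begin and end, column 10 the repeat name and column 11 the class/family (for example `LTR/Copia` or `LINE/L1`). The function name and the toy report lines are hypothetical.

```python
from typing import Iterable, List, Tuple


def retrotransposon_hits(out_lines: Iterable[str]) -> List[Tuple[str, int, int, str, str]]:
    """Extract LTR and LINE annotations from RepeatMasker .out lines."""
    hits = []
    for line in out_lines:
        fields = line.split()
        # Header and blank lines do not start with a numeric Smith-Waterman score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        query, begin, end = fields[4], int(fields[5]), int(fields[6])
        name, repeat_class = fields[9], fields[10]
        if repeat_class.split("/")[0] in {"LTR", "LINE"}:
            hits.append((query, begin, end, name, repeat_class))
    return hits


toy_report = [
    "   SW   perc perc perc  query   position in query   matching   repeat",
    "score   div. del. ins.  sequence  begin  end  (left)  repeat  class/family",
    "",
    " 1320  15.6  6.2  0.0  flank5  101  420  (580)  +  TOY_COPIA  LTR/Copia  1  340  (5000)  1",
    "  250  20.1  0.0  0.0  flank3   10   60  (940)  +  (A)n  Simple_repeat  1  51  (0)  2",
]
print(retrotransposon_hits(toy_report))
```

Filtering on the part of the class before the slash keeps every LTR or LINE family while discarding simple repeats and low-complexity hits.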

Identification of Newly Evolved Retrocopies in Dicot Reference Genomes

We used BLAT[38] to align the mRNAs of genes with at least one intron against the reference genomes of Arabidopsis and cassava (version 4.1)[39]. We then parsed the alignments and retained as candidate retrocopies the hits in which consecutive exons aligned contiguously (BLAT identity higher than 95%), indicating an intron loss event.
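In BLAT's PSL output, each alignment is a set of ungapped blocks given by the `blockSizes`, `qStarts` and `tStarts` columns. An exon-exon junction of the mRNA that falls strictly inside one block means the two exons sit next to each other in that genomic copy, which is the intron-loss signature described above. The sketch below assumes a `+`-strand alignment, because PSL `qStarts` for `-`-strand hits refer to the reverse-complemented query. The identity formula is one simple convention among several, and the function names and toy values are hypothetical.

```python
from typing import List


def parse_psl_list(field: str) -> List[int]:
    """Parse a PSL list column (comma-separated, often with a trailing comma)."""
    return [int(x) for x in field.strip().split(",") if x]


def psl_identity(matches: int, mismatches: int, rep_matches: int = 0) -> float:
    """Identity over aligned bases only (gaps ignored); one of several conventions."""
    aligned = matches + mismatches + rep_matches
    return 100.0 * (matches + rep_matches) / aligned if aligned else 0.0


def lost_junctions(block_sizes: str, q_starts: str, junctions: List[int]) -> List[int]:
    """mRNA junctions (0-based exon-boundary coordinates) lying inside one aligned block."""
    sizes, starts = parse_psl_list(block_sizes), parse_psl_list(q_starts)
    return [j for j in junctions
            if any(q < j < q + size for q, size in zip(starts, sizes))]


# Toy hit: a 500-bp mRNA with exon boundaries at 150 and 300, aligned as two blocks.
identity = psl_identity(matches=485, mismatches=15)
lost = lost_junctions("300,200,", "0,300,", [150, 300])
print(identity, lost)  # 97.0 [150]
if identity > 95 and lost:
    print("candidate retrocopy: junction spanned without an intron")
```

Here the junction at 150 lies inside the first block (intron lost), while the one at 300 coincides with a block boundary and is therefore not counted.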

Additional Information

How to cite this article: Zhu, Z. et al. LINE-1-like retrotransposons contribute to RNA-based gene duplication in dicots. Sci. Rep. 6, 24755; doi: 10.1038/srep24755 (2016).
References

BLAT--the BLAST-like alignment tool. W James Kent. Genome Res, 2002-04.

Repbase update: a database and an electronic journal of repetitive elements. J Jurka. Trends Genet, 2000-09.

TimeTree: a public knowledge-base of divergence times among organisms. S Blair Hedges; Joel Dudley; Sudhir Kumar. Bioinformatics, 2006-10-04.

Acquisition of an Archaea-like ribonuclease H domain by plant L1 retrotransposons supports modular evolution. Georgy Smyshlyaev; Franka Voigt; Alexander Blinov; Orsolya Barabas; Olga Novikova. Proc Natl Acad Sci U S A, 2013-11-25.

Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. M Long; C H Langley. Science, 1993-04-02.

L1 retrotransposons and somatic mosaicism in the brain. Sandra R Richardson; Santiago Morell; Geoffrey J Faulkner. Annu Rev Genet, 2014-07-14.

Human LINE retrotransposons generate processed pseudogenes. C Esnault; J Maestre; T Heidmann. Nat Genet, 2000-04.

High rate of chimeric gene origination by retroposition in plant genomes. Wen Wang; Hongkun Zheng; Chuanzhu Fan; Jun Li; Junjie Shi; Zhengqiu Cai; Guojie Zhang; Dongyuan Liu; Jianguo Zhang; Søren Vang; Zhike Lu; Gane Ka-Shu Wong; Manyuan Long; Jun Wang. Plant Cell, 2006-07-07.

RNA-based gene duplication: mechanistic and evolutionary insights. Henrik Kaessmann; Nicolas Vinckenbosch; Manyuan Long. Nat Rev Genet, 2009-01.

Ancient traces of tailless retropseudogenes in therian genomes. Angela Noll; Carsten A Raabe; Gennady Churakov; Jürgen Brosius; Jürgen Schmitz. Genome Biol Evol, 2015-02-26.