
Cross-Kingdom Commonality of a Novel Insertion Signature of RTE-Related Short Retroposons.

Eri Nishiyama, Kazuhiko Ohshima.

Abstract

In multicellular organisms, such as vertebrates and flowering plants, horizontal transfer (HT) of genetic information is thought to be a rare event. However, recent findings unveiled unexpectedly frequent HT of RTE-clade LINEs. To elucidate the molecular footprints of the genomic integration machinery of RTE-related retroposons, the sequence patterns surrounding the insertion sites of plant Au-like SINE families were analyzed in the genomes of a wide variety of flowering plants. A novel and remarkable finding regarding target site duplications (TSDs) for SINEs was that they start with thymine approximately one helical pitch (ten nucleotides) downstream of a thymine stretch. This TSD pattern was also found in RTE-clade LINEs, which share the 3'-end sequence of these SINEs, in the genomes of leguminous plants. These results indicate that Au-like SINEs were mobilized by the enzymatic machinery of RTE-clade LINEs. Further, we discovered the same TSD pattern in animal SINEs from lizard and mammals, in which the RTE-clade LINEs sharing the 3'-end sequence with these animal SINEs showed a distinct TSD pattern. Moreover, a significant correlation was observed between the first nucleotide of TSDs and microsatellite-like sequences found at the 3'-ends of SINEs and LINEs. We propose that the RTE-encoded protein could preferentially bind to a DNA region that contains a thymine stretch and cleave a phosphodiester bond downstream of the stretch. Further, determination of cleavage sites and/or efficiency of primer sites for reverse transcription may depend on microsatellite-like repeats in the RNA template. Such a unique mechanism may have enabled retroposons to successfully expand in frontier genomes after HT.

Year:  2018        PMID: 29850801      PMCID: PMC6007223          DOI: 10.1093/gbe/evy098

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Eukaryotic genomes contain an extraordinary number of retroposons such as long terminal repeat (LTR) retrotransposons, long interspersed repetitive elements (LINEs) or non-LTR retrotransposons, and short interspersed repetitive elements (SINEs) (Weiner et al. 1986; Brosius 1991; Kazazian 2004; Jurka et al. 2005; Bennetzen and Wang 2014). Because LINEs insert by target DNA-primed reverse transcription (TPRT) (Luan et al. 1993; Cost et al. 2002; Eickbush and Eickbush 2015), the DNA cleavage specificity of the endonuclease (EN) domain primarily determines the site of LINE insertion (Luan et al. 1993; Feng et al. 1996; Maita et al. 2007). Apurinic/apyrimidinic EN (APE)-like ENs are encoded by over 20 clades of LINEs that insert at many different loci within their host genome, some of which have shown weak target site preferences (Szak et al. 2002; Zingler et al. 2005; Bringaud et al. 2006), although only two clades, Tx1 and R1, contain site-specific LINEs (Fujiwara 2015; Nichuguti et al. 2016). Integration at a specific site also depends on other factors, such as the structural parameters of the target DNA and interactions between the mRNA and the target DNA (Cost and Boeke 1998; Repanas et al. 2007; Monot et al. 2013; Fujiwara 2015). Human L1 preferentially inserts at 5′-TT|AAAA-3′, where “|” indicates the site of insertion (Szak et al. 2002; Morrish et al. 2002, 2007), and its EN cleaves the TpA bond in 5′-TTTTAA-3′ on the complementary strand (Feng et al. 1996; Cost and Boeke 1998). TPRT usually results in the duplication of a short stretch of nucleotides (mostly no longer than 20 bp) resulting from integration at staggered chromosomal breaks. Thus, each newly inserted element is typically flanked by short direct repeats, which are known as a target site duplication (TSD) (Beck et al. 2011). To date, the analysis of TSDs from LINEs is largely confined to mammalian L1s. Using target analysis of nested transposons for genomic copies, Ichiyanagi and Okada (2008) studied TSDs for a variety of vertebrate LINEs, including those of the L1, L2, CR1, and RTE clades in mammalian, chicken, and zebrafish genomes.

SINEs are nonautonomous retroposons, the 5′-end sequences of which are derived from tRNA, 5S rRNA, or 7SL RNA with promoter activity for RNA polymerase III (Okada 1991; Batzer and Deininger 2002; Kapitonov and Jurka 2003; Ohshima 2013; Vassetzky and Kramerov 2013; Ahl et al. 2015). Mammalian L1s mobilize nonautonomous sequences such as SINE RNA and cytosolic mRNA by recognizing the 3′-poly(A) tail of the template RNA (Doucet et al. 2015), resulting in enormous SINE amplification and processed pseudogene formation. However, the 3′-end sequences of various SINEs originated from corresponding LINEs other than L1 (Ohshima et al. 1996), and to date, ∼60 of these SINE/LINE pairs have been identified (Ohshima 2012; Vassetzky and Kramerov 2013). As the 3′-UTRs of several LINEs have been shown to be essential for retroposition, these LINEs presumably require stringent recognition of the 3′-end sequence of the RNA template (Okada et al. 1997; Kajikawa and Okada 2002; Eickbush and Eickbush 2012; Hayashi et al. 2014). The analyses of TSDs from SINEs have provided valuable clues to the enzymatic source for SINE retroposition (Jurka 1997; Lenoir et al. 2001; Wenke et al. 2011; Noll et al. 2015; Schwichtenberg et al. 2016). AfroSINEs (Nikaido et al. 2003) are a SINE family in the genomes of afrotherians, which are African endemic mammals, and are proposed to be derived from and to have been mobilized by an RTE-clade LINE (Bov-B) because these two elements share a highly similar sequence (Gogolevsky et al. 2008). Because AfroSINEs and the known elephant RTE-clade LINE are not terminated by the same tandem repeat motifs, Gilbert et al. (2008) proposed that these differences reflect constraints imposed by base pairing interactions between the mRNA 3′ terminal tandem repeats and the target DNA at the initiation of TPRT.

Plant genomes harbor a wide variety of SINE families (Mochizuki et al. 1992; Yoshioka et al. 1993; Deragon et al. 1994; Yasui et al. 2001; Xu et al. 2005; Deragon and Zhang 2006; Cognat et al. 2008; Tsuchimoto et al. 2008; Baucom et al. 2009; Gadzalski and Sakowicz 2011; Wenke et al. 2011; Schwichtenberg et al. 2016). Among plants, only three SINE/LINE pairs have been discovered: namely, maize ZmSINE2 and ZmSINE3 (LINE1-1_ZM: Baucom et al. 2009) and tobacco TS SINE (SolRTE-I_Nt: Wenke et al. 2011; RTE-1_STu: Ohshima 2012). High similarity of the Au SINE family between distantly related plant species has been reported (Fawcett et al. 2006). Although their phylogenetic distribution was patchy, Fawcett and Innan (2016) identified several copies present in the orthologous regions of various species, including species that diverged 90 Ma, thereby confirming the presence of Au SINE at multiple evolutionary time points. Therefore, the Au SINE appears to have been present in the common ancestor of all angiosperms, being retained in some lineages while lost from others.

In multicellular organisms, such as vertebrates and flowering plants, horizontal transfer (HT) of genetic information is thought to be a rare event (Kidwell 1993). However, the number of well-supported cases of transfer from eukaryotes is now expanding rapidly (Bock 2010; Schaack et al. 2010; Wallau et al. 2012; Ivancevic et al. 2013; Fuentes et al. 2014; Peccoud et al. 2017). Recently, unexpectedly frequent HT of RTE-clade LINEs was reported. Walsh et al. (2013) showed that HT of Bov-B LINEs (Kordiš and Gubenšek 1998; Malik and Eickbush 1998; Župunski et al. 2001) was significantly more widespread than believed, and they demonstrated the existence of two plausible arthropod vectors, specifically reptile ticks. Their analysis indicated that at least nine HT events are required to explain the observed topology. Suh et al. (2016) showed that the genomes of nematodes and seven tropical bird lineages exclusively share a novel LINE, AviRTE, which resulted from HT. The HTs between bird and nematode genomes were estimated to have taken place 25–22 and 20–17 Ma.

In the present study, to elucidate the molecular footprints of the genomic integration machinery of RTE-related retroposons, the sequence patterns surrounding insertion sites of plant Au-like SINE families were analyzed in the genomes of a wide variety of flowering plants. The TSDs of these SINEs showed a remarkable tendency, and moreover, the same TSD pattern was also found in plant RTE-clade LINEs and even in animal SINEs. Based on these observations, a model for the initial process of genomic integration of these retroposons is proposed, and the relationship between rampant HTs of RTE-clade LINEs and the mechanism is discussed.

Materials and Methods

Genomic Sequences

Plant genome sequences were obtained from Ensembl Plants (Bolser et al. 2017) and the Genome Database for Rosaceae (Jung et al. 2014). Animal genome sequences were obtained from Ensembl (Aken et al. 2017). The genomes used are listed in supplementary table S1, Supplementary Material online.

Construction of Consensus Sequences

The consensus sequences (CONS) for 1) the RTE from common wheat (Triticum aestivum; TAe) and SINEs from 2) barrel clover (Medicago truncatula; MT), 3) purple false brome (Brachypodium distachyon; BDi), and 4) sorghum (Sorghum bicolor; SBi) were constructed from BLAST searches (Altschul et al. 1990) using an E-value of 5E-10. 1) BLAST against the common wheat genome using RTE-1_TD from durum wheat (Triticum durum) as the query resulted in ca. 6,000 hits, of which 30 randomly chosen sequences over 3,000 bases in length were used to construct the CONS (supplementary fig. S9, Supplementary Material online). 2) BLAST against the barrel clover genome using SINE2-1_TAe from common wheat as the query resulted in six hits, and the CONS from these sequences detected 374 sequences. Thirty randomly chosen sequences and the initial six sequences were used to derive the final CONS (supplementary fig. S10, Supplementary Material online). 3) BLAST against the purple false brome genome using Au SINE from Aegilops umbellulata as the query resulted in 24 hits. CONS from these sequences detected 43 sequences from which the final CONS was generated (supplementary fig. S11, Supplementary Material online). 4) BLAST against sorghum genome using SINE2-1_ZM from maize as the query resulted in 25 hits. CONS from 16 sequences with high scores detected 26 higher-quality sequences that were used in the final CONS (supplementary fig. S12, Supplementary Material online). Regarding the soybean Au-like SINE, the sequence reported by Shu et al. (2011) (GmAu1) was used as the consensus sequence. The sequence of the Sauria SINE of green anole (clone ACA-1-15; GenBank: FJ158974) was obtained from Piskurek et al. (2009). The sequence of an Oryzias RTE of medaka fish (clone OlRTE-a03; GenBank: AB021490) was obtained from Župunski et al. (2001), and the sequence of a lizard RTE of green anole (clone AcRTE-a01; GenBank: AAWZ01014759) was obtained from Tay et al. (2010). 
All remaining sequences were obtained from Repbase (Jurka et al. 2005; Bao et al. 2015).

Search for TSDs

Using the CONS as queries, a series of BLAST searches was performed against the respective genomes, with an E-value of 5E-10 in all cases. Detected sequences plus 200 bases of their 5′ and 3′ flanking sequences were extracted from the genomic sequences. Within these sequences, we searched for TSDs with a Python script using the following criteria: 1) the TSD length is between 10 and 49 bases inclusive, 2) the 5′ and 3′ TSD sequences match perfectly, and 3) the 5′ and 3′ TSD sequences are separated by at least 100 bases. The copy numbers of LINEs and SINEs and the number of TSDs detected are shown in table 1 for the respective species. Because a stringent parameter was used for the BLAST search, the detected copies may represent only a subset of each family (young family members) (for potato Au-like SINEs, see Wenke et al. 2011 and Seibt et al. 2016).
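The three TSD criteria above can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `find_tsd`, its parameters, and the longest-match-first search order are assumptions made for illustration only.

```python
def find_tsd(region, min_len=10, max_len=49, min_sep=100):
    """Search an extracted region (element plus flanks) for a candidate TSD.

    Hypothetical helper, not the published script. It returns
    (start_5p, start_3p, length) for the longest perfectly matching direct
    repeat whose two copies are separated by at least `min_sep` bases,
    or None if no repeat satisfies the criteria.
    """
    region = region.upper()
    n = len(region)
    # Try the longest allowed repeat first so the longest TSD is reported.
    for k in range(max_len, min_len - 1, -1):
        for i in range(0, n - k + 1):
            candidate = region[i:i + k]
            # The 3' copy must start at least `min_sep` bases after the
            # end of the 5' copy (criterion 3).
            j = region.find(candidate, i + k + min_sep)
            if j != -1:
                return i, j, k
    return None


# Toy region: 5' flank, TSD, a 120-base "element", TSD again, 3' flank.
tsd = "TGACCATGGCTA"
toy_region = "C" * 30 + tsd + "A" * 120 + tsd + "G" * 30
print(find_tsd(toy_region))
```

In practice one would probably also restrict the 5′ copy to the neighborhood of the element's 5′ boundary; that refinement is omitted here because the source does not describe it.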
Table 1

Copy Numbers of LINEs and SINEs and the Number of Analyzed TSDs

Species | RTE-clade LINE family | # of copies | TSD | RTE-related SINE family | # of copies | TSD
Glycine max | RTE-1_GM | 1,120 | 813 | GmAu1 | 1,451 | 1,044
Medicago truncatula | RTE1_MT | 667 | 305 | MT_AUlikeSINE_cons | 374 | 224
Malus domestica | RTE-1_Mad | 856 (21,691)a | 423 | SINE-5_Mad | 147 (2,025)a | 97
 | RTE-1B_Mad | 714 (9,890)a | 304 | | |
Solanum tuberosum | RTE-1_STu | 743 | 315 | SINE2-2_STu | 62 | 24
 | RTE-2_STu | 70 | 24 | | |
Brachypodium distachyon | RTE-1_BDi | 60 | 23 | BDi_consensus_24 | 43 | 27
Triticum aestivum | TAe_RTE_cons | 6,222 | 2,486 | SINE2-1_TAe | 2,308 | 1,062
Sorghum bicolor | RTE-1_SBi | 95 | 30 | SBi_AU_cons | 26 | 12
Zea mays | RTE1_ZM | 996 | 518 | RST_ZmSINE1 | 268 | 180
 | RTE2_ZM | 596 | 416 | RST_AU | 16 | 6
 | | | | SINE2-1_ZM | 200 | 85
Equus caballus | RTE-1_EC | 606 | 340 | SINE2-1_EC | 4,712 | 1,613
Bos taurus | Bov-B | 359,044 | 218,458 | BOVTA | 362,502 | 201,054
Loxodonta africana | RTE1_LA | 193,947 | 124,680 | AFROSINE-1_LA | 6,877 (9,862)b | 2,075
 | | | | AFROSINE-2_LA | 10,315 | 2,983
 | | | | AFROSINE | 135,168 | 54,407
 | | | | AFROSINE1B | 14,921 | 6,166
 | | | | AFROSINE2 | 6,353 (34,868)c | 2,042
 | | | | AFROSINE3 | 19,686 | 5,185
Procavia capensis | RTE1_Pca | 297 (1160)a | 188 | PSINE1 | 164 | 66
 | | | | SINE2-1_Pca | 141 | 26
Echinops telfairi | RTE1_ET | 280 (950)a | 187 | | |
Ornithorhynchus anatinus | Plat_RTE1 | 369 | 78 | | |
Anolis carolinensis | RTE_BOV_B_AC_1 | 15,625 | 7,122 | Sauria SINE | 78,442 | 33,597
 | RTE-1_AC_1 | 10,450 | 5,671 | | |
 | AcRTE-a01 | 26 | 11 | | |
Oryzias latipes | RTE-1_OL | 3,650 | 1,229 | | |
 | RTE-2_OL | 2,839 | 811 | | |
 | RTE-3_OL | 449 | 187 | | |
 | OlRTE-a03 | 2,753 | 974 | | |
Takifugu rubripes | Expander | 345 | 93 | | |
 | EXPANDER2 | 209 | 63 | | |
Caenorhabditis elegans | RTE-1 | 53 | 30 | | |

a The number of copies analyzed with the total number of copies shown in parentheses.

b The number of copies following exclusion of those with hits to AFROSINE-1_LA and AFROSINE-2_LA. The total number of hits is shown in parentheses.

c The number of copies following exclusion of those with hits to AFROSINE2 and AFROSINE or AFROSINE1B. The total number of hits is shown in parentheses.


Analysis of Nucleotide Compositions and Motif Discovery

The 5′ TSD sequences with their flanking sequences from respective copies of SINE and LINE families were extracted from the genomic sequences of the corresponding species. The nucleotide composition of each family was plotted on a chart for every nucleotide position. To test whether there was a biased composition between two consecutive nucleotides, the χ2 test was performed according to Jurka (1997) (supplementary fig. S1, Supplementary Material online; 15 degrees of freedom, significance level of 0.005). The nucleotide composition was also represented graphically by WebLogo (Crooks et al. 2004) (supplementary fig. S2, Supplementary Material online). The MEME motif discovery algorithm (Bailey and Elkan 1994) was applied to the TSD data sets. The MEME suite 4.11.2 (Bailey et al. 2015) was run with the ‘Terminal client’ using the following parameters: minimum motif width, 15; maximum motif width, 30; minimum sites per motif, N (number of analyzed TSDs) × 0.25; maximum sites per motif, N. The most statistically significant (low E-value) motifs were used for further analyses (supplementary table S2, Supplementary Material online).
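As a rough illustration of the composition analysis, the following Python sketch computes per-position nucleotide fractions and a χ2 statistic over the 16 dinucleotides at two consecutive positions (hence 15 degrees of freedom). The function names are hypothetical, and the default of uniform expected frequencies is an assumption; the source does not state how expected dinucleotide frequencies were derived, so a background table should be supplied where one is available.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
# Chi-square critical value for 15 degrees of freedom at the 0.005 level.
CHI2_CRIT_DF15_P005 = 32.801


def position_composition(windows):
    """Per-position base fractions for equal-length, aligned windows."""
    n = len(windows)
    composition = []
    for column in zip(*windows):
        counts = Counter(column)
        composition.append({b: counts.get(b, 0) / n for b in BASES})
    return composition


def dinucleotide_chi2(windows, pos, expected_freq=None):
    """Goodness-of-fit statistic for dinucleotides at positions pos, pos+1.

    `expected_freq` maps each of the 16 dinucleotides to its expected
    frequency; the uniform default is an illustrative assumption.
    Dinucleotides containing characters other than A/C/G/T are ignored.
    """
    if expected_freq is None:
        expected_freq = {d: 1 / 16 for d in DINUCS}
    observed = Counter(w[pos:pos + 2] for w in windows)
    n = sum(observed[d] for d in DINUCS)
    return sum(
        (observed[d] - n * expected_freq[d]) ** 2 / (n * expected_freq[d])
        for d in DINUCS
    )


windows = ["ATT", "ATT", "CTA", "GTT"]
print(position_composition(windows)[1])  # every window has T at index 1
print(dinucleotide_chi2(windows, 1) > CHI2_CRIT_DF15_P005)
```

A statistic above the critical value would flag the position pair as compositionally biased under the chosen expectation.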

Estimating the Occurrences of a Specific Trinucleotide near the 3′-Ends of Each Copy

To estimate the association of each copy with microsatellite-like sequence at the 3′-ends, the occurrences of a specific trinucleotide near the 3′-ends of each copy were examined. Ten bases of 3′-ends of BLAST-detected sequences plus ten bases of their 3′ flanking sequences were extracted from genomic sequences. Within these sequences, a specific trinucleotide was searched for with a Python script. The results are summarized in supplementary table S3, Supplementary Material online.
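The trinucleotide search described above can be sketched in a few lines of Python. The function names and the input layout (a list of element/flank pairs) are assumptions for illustration, not the authors' script.

```python
def has_trinucleotide(element_seq, flank3_seq, trinuc, n_element=10, n_flank=10):
    """True if `trinuc` occurs in the last `n_element` bases of the copy
    joined to the first `n_flank` bases of its 3' flanking sequence."""
    window = (element_seq[-n_element:] + flank3_seq[:n_flank]).upper()
    return trinuc.upper() in window


def fraction_with_trinucleotide(copies, trinuc):
    """Fraction of (element, 3' flank) pairs whose junction window
    contains the trinucleotide."""
    hits = sum(has_trinucleotide(elem, flank, trinuc) for elem, flank in copies)
    return hits / len(copies)


# Toy input: one copy ending in a GTT repeat and one without it.
copies = [
    ("ACGTACGTGTTGTTGTT", "GTTACCCCCCCCC"),
    ("A" * 20, "C" * 20),
]
print(fraction_with_trinucleotide(copies, "GTT"))
```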

3D Model of RTE EN

The 3D structure of the EN domain from the LINEs with indiscriminate integration sites was previously determined for only human L1. Using human L1-EN (Protein Data Bank ID: 1vyb) as a template, 3D models of soybean RTE-EN were constructed with MODELLER (Fiser and Šali 2003) in Chimera (Pettersen et al. 2004). Of the five models generated, the model with the highest scores (GA341 = 1.00, zDOPE = −0.28) was selected for further analyses.

Results

Plant Au-like SINEs and RTE-Clade LINEs Share 3′-Terminal Sequences

We analyzed the characteristics of Au-like SINE sequences from various angiosperms identified based on sequence similarity to known Au SINEs. Figure 1 shows sequence comparisons of the full-length Au-like SINEs and the 3′-terminal sequence of a potato RTE (RTE-1_STu). Nucleotide sequences of the 3′-terminal region of the RTE (positions 3991–4069; supplementary figs. S6–S8, Supplementary Material online) and Au-like SINEs (positions 69–144) were very similar (pairwise distances: 0.135–0.362), a finding which suggests this region is essential for retroposition. Nucleotide positions 127–144 of the SINEs and the corresponding region of the RTE-clade LINEs were predicted to form a hairpin-like RNA secondary structure, which was conserved with several compensatory mutations (fig. 2). Since the RNA secondary structures of the 3′-terminal region from several LINEs are essential to initiate reverse transcription, it is highly plausible that Au-like SINEs have retrotransposed with the RTE-clade LINE machinery.
Fig. 1.

—Sequence comparisons of Au-like SINEs and the 3′-terminal sequence of an RTE. The entire sequence of Au-like SINEs and the 3′-terminal sequence (∼160 nucleotides) of a potato RTE-clade LINE (RTE-1_STu) (light blue) are aligned. Dots and hyphens represent identical nucleotides to the consensus sequence (shown at top) and gaps, respectively. Nucleotide positions of the SINEs and the LINE are shown on the top and bottom, respectively. The two internal promoters for RNA polymerase III (box A: positions 13–24; box B: 57–67) are shown in open boxes with the consensus sequences. Nucleotide positions (127–144) predicted to form a hairpin-like RNA secondary structure are shown in the grey box.

Fig. 2.

—Secondary structure models for the 3′-terminal sequences of Au-like SINEs and RTE-clade LINEs. Transcripts from this region may form putative hairpin structures. Compensatory mutations, (A: T) ↔ (G: C) or (C: G) ↔ (A: T), are shown by pink and blue rectangles, respectively.


A Novel Insertion Signature of Plant RTE-Related Retroposons

We conducted TSD analyses for Au-like SINEs and RTE-clade LINEs from different flowering plants and found a novel insertion signature that is specific to these retroposons. Figure 3A shows the nucleotide composition of the genomic sequences surrounding the first nucleotide (P1) of the 5′ TSD of Au-like SINEs (left) and RTE-clade LINEs (right) from soybean (upper) and Medicago (lower), respectively. The P1 was frequently thymine (T) for both Au-like SINEs and RTE-clade LINEs, and moreover, we observed a prominent excess of T, often a stretch of ∼5 Ts, near P−10 (refer to supplementary fig. S2, Supplementary Material online for sequence logos). Such a feature at a remote position has not been reported for L1-clade LINEs. Figure 3B shows the nucleotide motifs found by the MEME motif discovery algorithm in the same soybean data sets. Consistently, remarkable motifs consisting of a stretch of Ts and a single T were found in both data sets from Au-like SINE (upper) and RTE-clade LINE (lower) (for statistical information, see supplementary table S2, Supplementary Material online). The same profile was also found in Au-like SINEs from other flowering plants, such as wheat, corn, and apples (supplementary fig. S1, Supplementary Material online). These results indicate that Au-like SINEs were amplified via reverse transcription with a unique machinery of RTE-clade LINEs.
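To make the signature concrete, the following Python sketch tests a single insertion site for a T at P1 together with a run of Ts around P−10. The function name, the window of three positions on either side of P−10, and the minimum run length of three are assumptions chosen for illustration; the source describes the pattern statistically rather than as a per-copy rule.

```python
def has_insertion_signature(upstream, tsd, offset=10, slack=3, min_run=3):
    """Check one site for the T ... (about ten bases) ... T pattern.

    `upstream` is the genomic sequence immediately 5' of the TSD, so its
    last base is P-1; `tsd` is the 5' TSD, so its first base is P1.
    All thresholds are illustrative assumptions.
    """
    upstream = upstream.upper()
    tsd = tsd.upper()
    if not tsd or tsd[0] != "T":
        return False
    n = len(upstream)
    # Window spanning P-(offset+slack) through P-(offset-slack).
    start = max(0, n - (offset + slack))
    end = n - (offset - slack) + 1
    return "T" * min_run in upstream[start:end]


# Toy site: a run of five Ts centered on P-10, followed by a TSD starting with T.
upstream = "GACGACGA" + "TTTTT" + "GACGACG"
print(has_insertion_signature(upstream, "TGACCATGGC"))
```

Applied across all copies of a family, the fraction of sites passing such a check could be compared against random genomic positions, although that comparison is not part of the analysis described here.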
Fig. 3.

—Nucleotide composition and motifs surrounding the first nucleotide of 5′ TSDs from plant retroposons. (A) Nucleotide composition. Thirty nucleotide positions are shown with the first nucleotide of the 5′ TSD at the center (position 1: P1). Nucleotide compositions at respective positions are represented graphically: T (red), A (blue), G (green), and C (purple). Au-like SINEs (left) and RTE-clade LINEs (right) are shown from soybean (upper: n = 1,044; 813, respectively) and Medicago (lower: n = 224; 305). Note that P1 is frequently T and a prominent excess of T is found at approximately P−10. The same profile is also found in other plants (supplementary fig. S1, Supplementary Material online). (B) Discovered motifs for soybean SINE and LINE. The MEME motif discovery algorithm, which uses a finite mixture model, was applied to the same data set as (A) (supplementary table S2, Supplementary Material online). Au-like SINE (upper) and RTE-clade LINE (lower) from soybean are shown.


Characteristics of the EN Domain of Plant RTE-Clade LINEs

To understand the molecular basis of the unique TSD pattern of plant RTE-clade LINEs, we investigated characteristics of the EN domain of plant RTE-clade LINEs. Figure 4A shows comparisons of essential amino acid residues for EN activity (Weichenrieder et al. 2004) between RTE-clade LINEs and other LINEs. These amino acid residues are highly conserved among plant RTE-clade LINEs and other LINEs. Interestingly, residue 229 of plant RTEs is substituted with glutamine, whereas the residue at this position is aspartic acid in every other LINE including animal RTEs (fig. 4A). Since this amino acid residue does not participate in coordinating magnesium ions (Beernink et al. 2001; Weichenrieder et al. 2004), we posit that this D229Q substitution does not dramatically decrease endonucleolytic activity, although it is located adjacent to the active center of the EN. Figure 4B shows the amino acid sequences of the betaB6–betaB5 hairpin loop region of EN from animal and plant LINEs. Amino acid substitutions at positions shown in red either alter the cleavage pattern, as in R1Bm (Maita et al. 2007), or decrease nicking activity, as demonstrated in TRAS1 (Maita et al. 2004) and L1 (Repanas et al. 2007). For the L1-EN, it is suggested that the conformational flexibility of the beta-hairpin loop probing the DNA minor groove may be much more important than its sequence (Repanas et al. 2007). The beta-hairpin loop of plant RTEs is two amino acids (residues 196–197) shorter than that of other LINEs (fig. 4B). Figure 5 shows the predicted three-dimensional (3D) structure of EN from soybean RTE (RTE-1_GM). Consistently, the beta-hairpin loop of soybean RTE (fig. 5 right, shown in cyan) is smaller than that found in L1 (fig. 5 left, shown in light brown). This region is predicted to overhang the minor groove of the DNA when the EN is in contact. Therefore, it is plausible that a change in the length of the beta-hairpin loop in conjunction with the D229Q substitution could impact the specificity with which plant RTEs cleave DNA.
Fig. 4.

—Comparisons of critical amino acids for the APE-like EN of LINEs. (A) Comparisons of essential amino acids for LINE EN activity. Essential amino acid residues for EN activity (Weichenrieder et al. 2004) are compared between RTE-clade LINEs and other LINEs. Among highly conserved residues, residue 229 (highlighted in black) is substituted only in plant RTEs. (B) Amino acid sequences of the EN beta hairpin loop, which probes the DNA minor groove. Amino acid substitutions proposed to either alter cleavage pattern (R1Bm) or decrease nicking activity (TRAS1 and L1) are shown in red. Plant RTEs are two amino acids shorter compared with other LINEs.

Fig. 5.

—Comparison of the 3D structure of EN domains from soybean RTE and human L1. Space-filling representation of a 3D model of soybean RTE-EN constructed using human L1-EN as template. The beta-hairpin loop of soybean RTE (cyan; right) and L1 (light brown; left) is represented in purple. The catalytic core and D229Q substitution are denoted in red and yellow, respectively. The lower images show left side views of the upper images. For reference, the DNA cleavage strand would be positioned vertically with the 5′-end at the top and the 3′-end at the bottom. Ribbon representation is available in supplementary fig. S5, Supplementary Material online.


Identical Insertion Signature from Plant Retroposons Found in Several Animal RTE-Related SINEs

Different kinds of SINE families share 3′-terminal sequences with various RTE-clade LINEs in the genomes of vertebrates (supplementary fig. S3, Supplementary Material online). Our analyses of animal SINEs with RTE-related 3′-tails revealed that the identical TSD pattern found in plants, which starts with T approximately ten nucleotides downstream of a stretch of Ts, was also found in animal SINEs from lizard and mammals (fig. 6 and supplementary table S2, Supplementary Material online). Analysis of green anole and elephant clearly showed an excess of T at P1, with a stretch of ∼3 Ts at approximately P−10. Intriguingly, a horse SINE showed an excess of adenine (A) at P1 (T at P−1) with a stretch of ∼3 Ts at approximately P−10 (fig. 6). In contrast, RTE-clade LINEs sharing 3′-end sequences with animal SINEs start with A (P1) in many cases (fig. 6 and table 2). For example, an RTE-clade LINE of green anole had an excess of A at P1 with a slight excess of T at approximately P−10.
Fig. 6.

—Nucleotide composition surrounding the first nucleotide of 5′ TSDs from animal retroposons and comparisons of the discovered SINE motifs between animals and plants. (A) Thirty nucleotide positions are shown with the first nucleotide of the 5′ TSD at the center (position 1: P1). Animal SINEs with an RTE-related 3′-tail (left) and RTE-clade LINEs sharing a 3′-end sequence with animal SINEs (right) from green anole (top: n = 33,597; 7,122, respectively), elephant (middle: n = 13,097; 124,680), and horse (bottom: n = 1,613; 340). The identical TSD pattern in plants, where P1 is frequently T and a prominent excess of Ts is located at approximately P−10, is also found in lizard and elephant SINEs. Note that RTE-clade LINEs start with adenine. Nucleotide compositions at the respective positions are graphically represented: T (red), A (blue), G (green), and C (purple). (B) Comparisons of the discovered SINE motifs between animals and plants. MEME was applied to the animal and plant data sets (supplementary table S2, Supplementary Material online). Plant Au-like SINEs (soybean and Medicago) and animal RTE-related SINEs (green anole and horse) are shown.

Table 2

Correlation of 3′-Microsatellite-Like Sequences and the First Nucleotide of TSDs

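Type | Name | Species | 3′ Repeat | TSD
RTE | RTE-1_GM | Soybean | (GTT)n | T
RTE | RTE1_MT | Medicago | (GTT)n | T
RTE | RTE-1_Mad | Apple | (GTT)n | (A)
RTE | RTE-1B_Mad | Apple | (GTT)n | (A)
RTE | RTE-1_STu | Potato | (GTT)n | T
RTE | TAe_RTE_cons | Common wheat | (GTT)n | (T/G)
RTE | RTE-1_SBi | Sorghum | (GTT)n | T
RTE | RTE1_ZM | Maize | (GATGTT)n | (G)
RTE | RTE2_ZM | Maize | (GTT)n | (G)
RTE | RTE-1_EC | Horse | (CAA)n | A
RTE | BovB | Cow | (CTGAA)n | A
RTE | RTE1_LA | Elephant | (CAA)n | A
RTE | RTE1_Pca | Hyrax | (CAA)n | A
RTE | Plat_RTE1 | Platypus | (TA)n | A
RTE | RTE_BOV_B_AC_1 | Green anole | (CGA)n | A
RTE | RTE-1_AC_1 | Green anole | (GTAA)n | A
RTE | RTE-1_OL | Medaka | (ATGG)n | (G)
RTE | RTE-3_OL | Medaka | (TAG)n | (A/T)
SINE | GmAu1 | Soybean | TTTTT | T
SINE | MT_AUlikeSINE_cons | Medicago | TTT | T
SINE | SINE-5_Mad | Apple | TTT | T
SINE | SINE2-2_STu | Potato | TTTTT | T
SINE | BDi_consensus_24 | Purple false brome | T-rich | T
SINE | SINE2-1_TAe | Common wheat | TTT | T
SINE | RST_ZmSINE1 | Maize | TTT | T
SINE | SINE2-1_ZM | Maize | TTT | T
SINE | SINE2-1_EC | Horse | (CAA)n | A
SINE | BOVTA | Cow | (CA)n | (A)
SINE | AFROSINE-2_LA | Elephant | (CAA)n | A
SINE | AFROSINE2 | Elephant | (CAA)n | (T/A)
SINE | AFROSINE | Elephant | (GGTTT)n | T
SINE | AFROSINE3 | Elephant | (GGTTTT)n | T
SINE | AFROSINE-1_LA | Elephant | (GGTTTT)n | (T/A)
SINE | AFROSINE1B | Elephant | T-rich | T
SINE | Sauria SINE | Green anole | (ACCTTT)n | T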

Microsatellite-like sequences at the 3′-ends of SINEs and LINEs consist of a stretch of T or A plus other nucleotides. The first nucleotide of TSDs and the repeated nucleotide within the microsatellite-like sequence are consistent in many cases. In the cases where the first nucleotide of TSDs is not obvious, the nucleotides are in parentheses.

The TSD lengths of given LINEs fall within clade-specific ranges regardless of their hosts (Ichiyanagi and Okada 2008). The majority of the TSDs for mammals and zebrafish L1-clade LINEs were 7–18 bp in length with 13–15 bp being the most abundant, whereas the majority of RTE-clade LINEs were 7–15 bp with 10–12 bp being the most abundant (Ichiyanagi and Okada 2008). 
We found that most TSDs of the animal retroposons analyzed in this study were no longer than 13 bp for both RTEs and SINEs (supplementary fig. S4, Supplementary Material online). In combination with the common 3′-end sequences (supplementary fig. S3, Supplementary Material online), this finding further supports the possibility that these SINEs depend on the RTE-clade LINEs for their retroposition. The TSD pattern for animal retroposons (fig. 6) indicates that RTE-clade LINEs and the related SINEs show distinct TSD patterns in some cases.
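The per-position nucleotide compositions summarized in fig. 6 can be illustrated with a small tabulation. The following is a minimal sketch, not the pipeline used in this study: the function name `position_composition`, the `upstream` parameter, and the input format (equal-length sequence windows with the first TSD nucleotide at a fixed offset) are assumptions made for illustration. Positions are labeled as in the figure, with P1 as the first nucleotide of the TSD and no position 0.

```python
from collections import Counter


def position_composition(windows, upstream=15):
    """Tabulate base fractions at each position around the TSD start.

    Assumption (hypothetical input format): every window has the same
    length and holds `upstream` bases before the TSD, followed by the
    first TSD base (P1) and the bases after it.
    Returns {position: {base: fraction}}, with positions numbered
    -upstream..-1, 1..N (there is no position 0).
    """
    if not windows:
        raise ValueError("no windows given")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("windows must all have the same length")

    composition = {}
    for i in range(length):
        offset = i - upstream
        position = offset if offset < 0 else offset + 1  # skip zero
        # Count only unambiguous bases; N and other symbols are ignored.
        counts = Counter(w[i].upper() for w in windows if w[i].upper() in "ACGT")
        total = sum(counts.values()) or 1
        composition[position] = {b: counts.get(b, 0) / total for b in "ACGT"}
    return composition


# Toy windows (made up for illustration), two bases upstream of P1.
toy = ["AATC", "AGTC", "CATG"]
table = position_composition(toy, upstream=2)
print(sorted(table))   # position labels
print(table[1])        # composition at P1
```

With real data, each window would be extracted from the genome around an annotated insertion, and an excess of T at P1 and near P−10 would appear as elevated T fractions at those positions.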

Global Correlation of 3′-Microsatellite-like Sequences and TSD Profile in Plant and Animal Retroposons

The 3′-end sequences of LINEs and SINEs often terminate in microsatellite-like sequences, such as (GTT)n, (CAA)n, (AT)n, and (A)n. During the course of our TSD analysis, we observed that the tendencies differed between plants and animals as well as between RTEs and SINEs. Our analysis of the relationship between microsatellite-like sequences at the 3′-end and the first nucleotide of the TSD revealed several interesting correlations (table 2 and supplementary table S3, Supplementary Material online). Plant RTE-clade LINEs end in (GTT)n, and the first nucleotide of their TSD is often T. Au-like SINEs, which share a specific nucleotide sequence of the 3′-terminal region with plant RTE-clade LINEs, end in a stretch of Ts, and the first nucleotide of their TSD is consistently T. Animal RTE-clade LINEs often end in a microsatellite-like sequence with a repeated A such as (CAA)n, and the first nucleotide of their TSD is frequently A. Animal SINEs, which share a specific nucleotide sequence of the 3′-terminal region with animal RTE-clade LINEs, were of two types: one that ends in (CAA)n and has A as the first nucleotide of its TSD, and the other that ends in T-rich repeats and has T as the first nucleotide of its TSD. Interestingly, these two types of SINEs coexist in the elephant genome (table 2 and supplementary table S3, Supplementary Material online; Gilbert et al. 2008; Bao et al. 2015). These results suggest that microsatellite-like terminal sequences are involved in determining the insertion sites of RTE-related retroposons (see Discussion).
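The comparison described above, between the repeated nucleotide of the 3′ microsatellite-like tail and the first nucleotide of the TSD, can be sketched as a simple tally. This is an illustrative sketch only: the helper names `repeated_base` and `concordance`, the rule of calling the repeated nucleotide by whichever of T or A is more frequent in the tail, and the toy records are all assumptions, not the procedure or data behind table 2.

```python
def repeated_base(tail):
    """Return 'T' or 'A', whichever is more frequent in the 3' tail.

    Assumption: the tail is a stretch of T or A plus other nucleotides,
    so comparing the T and A counts is enough. Ties, as in (AT)n, are
    reported as None (uninformative). U is treated as T.
    """
    tail = tail.upper().replace("U", "T")
    t_count, a_count = tail.count("T"), tail.count("A")
    if t_count == a_count:
        return None
    return "T" if t_count > a_count else "A"


def concordance(records):
    """Count (tail, tsd) pairs whose first TSD base matches the tail.

    Returns (agreeing, informative), skipping pairs with an ambiguous
    tail or an empty TSD.
    """
    agreeing = informative = 0
    for tail, tsd in records:
        base = repeated_base(tail)
        if base is None or not tsd:
            continue
        informative += 1
        if tsd[0].upper() == base:
            agreeing += 1
    return agreeing, informative


# Toy records (made up): (3' tail, TSD sequence).
toy = [
    ("GTTGTTGTT", "TACGT"),     # T-rich tail, TSD starts with T
    ("CAACAACAA", "AGGCT"),     # A-rich tail, TSD starts with A
    ("ATATAT", "TTTT"),         # tie between T and A, skipped
    ("GGTTTTGGTTTT", "ACGT"),   # T-rich tail, TSD starts with A
]
print(concordance(toy))
```

On real families, the two counts could feed a contingency table of tail type against first TSD nucleotide before any statistical test is applied.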

Discussion

Genomic Integration Machinery of RTE-Related Retroposons

In this study, we found that the TSDs of plant Au-like SINEs consistently start with a T approximately ten nucleotides downstream of a stretch of Ts. The same TSD pattern was also found in RTE-clade LINEs, which share 3′-end sequences with Au-like SINEs, in the genomes of leguminous plants. Further, animal SINEs from lizard and mammals with the RTE-related 3′-tail have the same TSD pattern, which was originally discovered in plants. Such a split signature for insertion has not previously been reported for L1-clade LINEs. Moreover, a significant correlation was observed between the first nucleotide of TSDs and the microsatellite-like sequence at the 3′-ends of SINEs and LINEs. To explain these results comprehensively, we propose the following model (fig. 7). At the beginning of reverse transcription, the RTE protein binds to the DNA region containing a stretch of Ts upstream of the cleavage site, and cuts a phosphodiester bond at the site approximately one helical pitch downstream of the stretch of Ts. Microsatellite-like sequences such as (GGUUUU)n in the 3′-end of the template RNA for reverse transcription may influence selection of the cleavage site of the RTE EN on the first DNA cleavage strand (e.g., A on the complementary strand of T). Among animal SINEs, which are nonautonomous retroposons, green anole and elephant SINEs tend to be cleaved at T, whereas horse and some elephant SINEs tend to be cleaved at A (fig. 6, table 2, and supplementary table S3, Supplementary Material online). The observation that these elephant SINEs are largely identical with the exception of microsatellite-like sequences like (GGTTTT)n or (CAA)n suggests that the RTE-clade LINE in the elephant genome generated distinct TSD patterns depending on the different microsatellite-like sequences (Gilbert et al. 2008). Microsatellite-like sequences at the 3′-ends of animal SINEs and LINEs consist of a stretch of Ts or As plus other nucleotides.
The concordance of the first nucleotide of TSDs and the repeated nucleotide within the microsatellite-like sequence suggests that the repeated nucleotide at the 3′-ends of template RNA may increase the opportunity of the RTE protein to cleave the DNA strand complementary to the repeated nucleotide (Zingler et al. 2005; Jinek et al. 2012). Alternatively, the microsatellite-like sequences could facilitate the initiation of reverse transcription through base-pairing. The 3′-terminal sequence of mammalian L1s (several bp in length) and that of the CR1, L2, and RTE clades of LINEs (one to several bp) overlap with the 5′-end of the target sequence (Ostertag and Kazazian 2001; Ichiyanagi and Okada 2008). The overlaps between the LINE and target sequences at the 3′ junctions of retrotransposed copies are proposed to be generated by retrotransposition reactions in which the LINE RNA becomes base paired with the EN-cleaved strand of the target duplex DNA to facilitate the initiation of reverse transcription (Ostertag and Kazazian 2001; Ichiyanagi et al. 2007). Base pairing between the target DNA and the 3′-end of the mRNA may either be required for or at least facilitate the initiation of TPRT for I factor, R1Bm, and R2Ol (Chaboissier et al. 2000; Anzai et al. 2005; Fujiwara 2015). However, these interactions are not required for TPRT for some LINEs such as R2Bm (Luan and Eickbush 1995). The global correlation between the first nucleotide of TSDs and the microsatellite-like sequence at the 3′-ends of RTE-clade LINEs observed in this study is consistent with these previous observations, although animal 3′-microhomology was limited to one or two bases. Further, these two possible roles of microsatellite-like sequences may not be mutually exclusive.
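The split signature in the proposed model, a T at the cleavage site with a stretch of Ts roughly one helical pitch upstream, can be expressed as a simple sequence scan. The sketch below is a hedged illustration rather than the motif analysis of this study (which used MEME): the function name `candidate_sites`, the minimum run length of three Ts, and the P−12 to P−8 search window are hypothetical choices standing in for "approximately ten nucleotides".

```python
def candidate_sites(seq, min_run=3, near=8, far=12):
    """Return 0-based indices of Ts that match the split signature.

    A position i is reported when seq[i] is T (taken as P1) and a run of
    at least `min_run` Ts lies entirely within P-far..P-near, that is,
    within seq[i - far : i - near + 1].
    All three thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    run = "T" * min_run
    hits = []
    for i in range(far, len(seq)):
        if seq[i] != "T":
            continue
        region = seq[i - far : i - near + 1]  # P-far .. P-near
        if run in region:
            hits.append(i)
    return hits


# Toy sequences (made up): a T stretch about ten bases upstream of a T,
# and a control lacking the stretch.
print(candidate_sites("GCTTTTGCAGCAGCATGC"))
print(candidate_sites("GCAGCAGCAGCAGCATGC"))
```

Such a scan only flags sequence contexts compatible with the model; it says nothing about whether a given site is actually used for integration.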
Fig. 7. Model of the genomic integration machinery of RTE-related retroposons. The RTE protein binds to a DNA region containing a stretch of Ts upstream of the cleavage site, and cuts a phosphodiester bond approximately one helical pitch downstream of the stretch of Ts. Microsatellite-like sequences in the 3′-end of the template RNA for reverse transcription influence cleavage site selection by the RTE EN and/or facilitate the initiation of reverse transcription through base-pairing.

Molecular Adaptation after Horizontal Transfer

This study also provides the first evidence for cross-kingdom (i.e., plant-animal) commonality of a novel insertion signature of SINEs and LINEs. Since all LINE families are evolutionarily long-term hitchhikers in the eukaryotic genome, with ∼30 clades of LINEs having diverged in early eukaryotes (Malik et al. 1999), they may share the same machinery inherited from the common ancestor of plants and animals. An alternative possibility is that the observed plant-animal commonality resulted from HT events of RTE-clade LINEs between ancient plants and animals through plant-animal interactions such as those between flowering plants and pollinators (e.g., insects and birds). In support of this, a strong similarity of some fish LINEs to plant RTE-clade LINEs has been reported (Župunski et al. 2001; Tay et al. 2010). A recent study showed unexpectedly frequent HT of RTE-clade LINEs, in which HT of the Bov-B LINE was significantly more widespread than previously believed, and at least nine HT events were required to explain the observed topology (Walsh et al. 2013). Similarly, the genomes of nematodes and seven tropical bird lineages exclusively shared an AviRTE LINE resulting from HT (Suh et al. 2016). The cross-kingdom commonality of the novel insertion signature found in this study could be a footprint of such a complex trajectory of genetic materials between species. Why RTE-clade LINEs, among the various LINE clades, frequently undergo HT is not known. Our study revealed that animal RTE-clade LINEs may switch their integration site depending on their 3′ microsatellite-like sequences. Because the microsatellite contents of eukaryotic genomes are taxon-specific (Tay et al. 2010), such a simple and flexible integration mechanism of RTE-clade LINEs may have contributed to the successful expansion of RTEs and the associated SINEs in frontier genomes after HT.
If RTE-clade LINEs could capture a novel microsatellite-like sequence in their 3′-end, the novel repeats may have extended the opportunity of RTEs to integrate their copies into frontier genomes in a manner that corresponds to the microsatellite environment of each genome. Further investigation is required for a better understanding of the detailed mechanism that underlies molecular adaptation after HT and the precise history of cross-kingdom HT.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.
  90 in total

1.  The age and evolution of non-LTR retrotransposable elements.

Authors:  H S Malik; W D Burke; T H Eickbush
Journal:  Mol Biol Evol       Date:  1999-06       Impact factor: 16.240

2.  Ancient SINEs from African endemic mammals.

Authors:  Masato Nikaido; Hidenori Nishihara; Yukio Hukumoto; Norihiro Okada
Journal:  Mol Biol Evol       Date:  2003-03-05       Impact factor: 16.240

3.  Retroposons--seeds of evolution. (Review)

Authors:  J Brosius
Journal:  Science       Date:  1991-02-15       Impact factor: 47.728

4.  Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition.

Authors:  Q Feng; J V Moran; H H Kazazian; J D Boeke
Journal:  Cell       Date:  1996-11-29       Impact factor: 41.582

5.  Human L1 element target-primed reverse transcription in vitro.

Authors:  Gregory J Cost; Qinghua Feng; Alain Jacquier; Jef D Boeke
Journal:  EMBO J       Date:  2002-11-01       Impact factor: 11.598

6.  An analysis of retroposition in plants based on a family of SINEs from Brassica napus.

Authors:  J M Deragon; B S Landry; T Pélissier; S Tutois; S Tourmente; G Picard
Journal:  J Mol Evol       Date:  1994-10       Impact factor: 2.395

7.  Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

Authors:  Kathrin M Seibt; Torsten Wenke; Katja Muders; Bernd Truberg; Thomas Schmidt
Journal:  Plant J       Date:  2016-05       Impact factor: 6.417

8.  On the evolution and expression of Chlamydomonas reinhardtii nucleus-encoded transfer RNA genes.

Authors:  Valérie Cognat; Jean-Marc Deragon; Elizaveta Vinogradova; Thalia Salinas; Claire Remacle; Laurence Maréchal-Drouard
Journal:  Genetics       Date:  2008-05       Impact factor: 4.562

9.  Parallel relaxation of stringent RNA recognition in plant and mammalian L1 retrotransposons.

Authors:  Kazuhiko Ohshima
Journal:  Mol Biol Evol       Date:  2012-06-05       Impact factor: 16.240

10.  Ensembl 2017.

Authors:  Bronwen L Aken; Premanand Achuthan; Wasiu Akanni; M Ridwan Amode; Friederike Bernsdorff; Jyothish Bhai; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Thomas Juettemann; Stephen Keenan; Matthew R Laird; Ilias Lavidas; Thomas Maurel; William McLaren; Benjamin Moore; Daniel N Murphy; Rishi Nag; Victoria Newman; Michael Nuhn; Chuang Kee Ong; Anne Parker; Mateus Patricio; Harpreet Singh Riat; Daniel Sheppard; Helen Sparrow; Kieron Taylor; Anja Thormann; Alessandro Vullo; Brandon Walts; Steven P Wilder; Amonida Zadissa; Myrto Kostadima; Fergal J Martin; Matthieu Muffato; Emily Perry; Magali Ruffier; Daniel M Staines; Stephen J Trevanion; Fiona Cunningham; Andrew Yates; Daniel R Zerbino; Paul Flicek
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971
