Literature DB >> 29946483

Multiple massive domestication and recent amplification of Kolobok superfamily transposons in the clawed frog Xenopus.

Akira Hikosaka1, Seigo Konishi1.   

Abstract

BACKGROUND: DNA transposons are generally destroyed by mutations and have short lifespans in hosts, as they are neutral or harmful to the host and therefore not conserved by natural selection. The clawed frog Xenopus harbors many DNA transposons and certain families, such as T2-MITE, have extremely long lives. These have ancient origins, but have shown recent transposition activity. In addition, certain transposase genes may have been "domesticated" by Xenopus and conserved over long time periods by natural selection. The aim of this study was to elucidate the evolutionary interactions between the host and the long-lived DNA transposon family it contains. Here, we investigated the molecular evolution of the Kolobok DNA transposon superfamily. Kolobok is thought to contribute to T2-MITE transposition.
RESULTS: In the diploid western clawed frog Xenopus tropicalis and the allotetraploid African clawed frog Xenopus laevis, we searched for transposase genes homologous to those in the Kolobok superfamily. To determine the amplification and domestication of these genes, we used molecular phylogenetics and analyses of copy numbers, conserved motifs, orthologous gene synteny, and coding sequence divergence between the orthologs of X. laevis and X. tropicalis, or between those of two distant X. tropicalis lineages. Among 38 X. tropicalis and 24 X. laevis prospective transposase genes, 10 or more in X. tropicalis and 14 or more in X. laevis were apparently domesticated. These genes may have undergone multiple independent domestications from before the divergence of X. laevis and X. tropicalis. In contrast, certain other transposases may have retained catalytic activity required for transposition and could therefore have been recently amplified.
CONCLUSION: Multiple domestication of certain transposases and prolonged conservation of the catalytic activity in others suggest that Kolobok superfamily transposons were involved in complex, mutually beneficial relationships with their Xenopus hosts. Some transposases may serve to activate long-lived T2-MITE subfamilies.

Entities:  

Keywords:  DNA transposon; Domestication; Kolobok; Molecular evolution; T2-MITE; Transposase; Xenopus laevis; Xenopus tropicalis

Year:  2018        PMID: 29946483      PMCID: PMC6004289          DOI: 10.1186/s40851-018-0100-4

Source DB:  PubMed          Journal:  Zoological Lett        ISSN: 2056-306X            Impact factor:   2.836


Background

Transposable elements (TEs) are endosymbiotic or parasitic genetic elements in the host genome. These elements usually replicate at their loci in the same manner as host genes. Nevertheless, they occasionally transpose to other loci and, in some cases, amplify themselves in the host genome. This non-Mendelian inheritance creates a conflict between TEs and their hosts. Transposable elements promote their amplification rates to increase their “fitness” in the host. In effect, their transposition and amplification are generally neutral, but occasionally detrimental, to the host as they increase mutations, such as gene disruption and ectopic recombination. In general, then, TE copies are not conserved. They are removed by purifying selection, and they lose their structure and function as they accumulate mutations. TEs are classified into two major classes according to their transposition mechanisms: class I (RNA transposons or retrotransposons) and class II (DNA transposons) [1, 2]. An autonomous DNA transposon encodes its own transposase to catalyze its transposition between terminal inverted repeats (TIRs) including the transposase recognition site. In contrast, a nonautonomous DNA transposon has lost its own transposase gene owing to mutation(s). Its transposition depends on transposases provided by autonomous transposons coexisting in the genome. Autonomous transposons statistically lose their activity over time without purifying selection and become nonautonomous or nontransposable. Over time, the host genome will accumulate many nonautonomous copies and nontransposable “fossils” and retain relatively few autonomous copies. The TE family must keep producing autonomous copies to persist in the host. However, the transposase protein catalyzes the transposition of many nonautonomous copies and few autonomous copies, without distinguishing between them. Therefore, the probability that autonomous copies are transposed and amplified gradually decreases as the ratio of nonautonomous copies increases in the genome. Most DNA transposon families eventually become extinct in a host genome. This process is known as “vertical inactivation” [3, 4]. An alternative strategy for TE survival in a host is “molecular domestication.” If a TE-derived gene, such as a transposase, has mutated and become beneficial to the host, it can be conserved by natural selection [5, 6]. Domesticated genes typically lose their original nature (that is, the ability to catalyze transposition/amplification) and behave like host genes. Over time, domesticated genes diverge from their copies produced before domestication. As a result, they become and remain nonrepetitive (single-copy) genes. Domesticating a parasitic element is an important evolutionary innovation for hosts. Many TE-derived proteins have been domesticated [7, 8]. In the clawed frog Xenopus, we found that two subfamilies of the TxpB family belonging to the piggyBac superfamily have employed different survival strategies. The transposase-derived gene of the Kobuta subfamily was domesticated before the separation of X. tropicalis and X. laevis. In contrast, the transposase of the Uribo2 subfamily retained its catalytic activity and can still excise transposons from DNA [9]. Nonautonomous TEs can also be domesticated. Miniature inverted-repeat transposable elements (MITEs) are a subclass of short nonautonomous DNA transposons. These are characterized by a TIR structure, high copy numbers, and highly similar sequences among copies [10]. MITEs generate functional transcriptional regulatory elements [11] and matrix attachment regions [12]. In Xenopus, MITEs also form simple sequence repeats (SSRs). Xmix, a predominant MITE in X. laevis and X. tropicalis, has an amplified internal segment representing a large SSR family, Xstir [13, 14]. Simple sequence repeats are essential for higher-order chromosomal structure [15]. Xmix is a member of the T2 MITE family (T2-MITE), which is characterized by a TTAA target site and a terminal AGGRR (R: A or G) motif in its TIR [16, 17]. These features are common to members of the DNA transposon superfamily Kolobok, which targets the TTAA site and has an RR terminal sequence [18]. Therefore, T2-MITE is presumably a nonautonomous member of the Kolobok DNA transposon superfamily. We classified 16 major T2-MITE subfamilies based on an in silico screening of the X. tropicalis genome sequence [19]. Subfamilies A1 (T2-A1, Xmix) and C (T2-C) were the most prevalent and were present in both X. tropicalis and X. laevis. They probably originated before these two lineages diverged ~ 48 Mya [20]. Despite their age, both subfamilies include “young” (highly homogeneous) members. Therefore, they probably underwent relatively recent amplification. We also found evidence for intraspecific T2-A1 and T2-C insertion polymorphisms [21]. These results suggest that these subfamilies have been actively transposing for more than 48 million years. Their extraordinary longevity suggests that their continued transpositional activity has been conserved by natural selection. The transposition and/or amplification of these subfamilies presumably have been advantageous to the host. We found that the sequences derived from T2-C in X. tropicalis were significantly over-represented in the 5′ upstream regions of genes. Therefore, they may regulate the expression of neighboring genes [22]. If the transpositional activities of T2-MITEs are conserved by natural selection, then transposases such as Kolobok, which are presumably responsible for their transposition, should be domesticated by the host. In the present study, we surveyed Kolobok transposase genes in the diploid X. tropicalis genome and the recently decoded allotetraploid X. laevis genome [20]. To elucidate their evolution within the hosts, we analyzed their molecular phylogenies, copy numbers, syntenies, and sequence conservation.

Methods

Search for prospective Kolobok transposase-coding sequences

X. tropicnalis (Nigerian 9.1) and X. laevis (J-Strain 9.2) genome assemblies were downloaded from the Xenbase FTP site [23, 24] and used in the analyses described below. An automated pipeline for these analyses was developed using the Ruby language. The search for prospective Kolobok transposase-coding sequences (CDSs) was carried out as follows. Query Kolobok superfamily transposase protein sequences were collected from vertebrate, zebrafish, and invertebrate data sets using Repbase Update [25] v. 22.09 [26] (Additional file 5: Table S1) and applied to a tblastn search (e-value <1e-5) [27]. The hit regions were extracted with flanking sequences (1800 bp each upstream and downstream). The longest open reading frames (ORFs) in each extracted sequence were treated as candidate Kolobok transposase CDSs. ORFs that were too short (< 1800 bp) were excluded from the following analyses. To confirm the homology of proteins encoded in the candidate CDSs to the Kolobok transposases, the CDSs were translated to amino acid sequences and used as queries in backward homology searches (blastp, e-value <1e-5) to the transposase sequences used as the queries in the forward tblastn search. Candidate CDSs with prospective amino acid sequences that were homologous to at least five Kolobok transposases were selected. The protein sequences predicted from the candidate CDSs were aligned by MEGA7 [28] using MUSCLE [29] as an alignment engine. The positions of their start methionines were verified. Excess 5′ regions were manually trimmed from the candidate CDSs to align the translation initiation site. Trimmed CDSs and protein sequences were used in the analyses described below.

Copy number analysis

Prospective Kolobok transposase CDSs from X. laevis and X. tropicalis were used as queries for blastn searches (e-value <1e-100) to the corresponding genome. Adjacent high-scoring segment pairs were considered single hit regions if the distance between them was less than the query length. Hit regions in each prospective CDS were counted as closely related truncated copies of the CDS if they did not overlap with any prospective CDSs.

Molecular phylogenetics and synteny analyses

Prospective Kolobok transposase CDSs from both species were translated to protein sequences and used in molecular phylogenetic analyses. The amino acid sequences were realigned with MEGA7 and the alignment engine MUSCLE and used to construct phylogenetic trees. The neighbor-joining method [30] was used to generate the phylogenetic trees with MEGA7. Positions with less than 60% site coverage were eliminated. The evolutionary distances were computed using the JTT matrix-based method [31]. Rate variations among sites were modeled with a gamma distribution (shape parameter = 1). Dot plot analyses were performed with the polydot program in EMBOSS [32]. The word sizes for the nucleotide and protein sequences were 12 and 5, respectively. Putative orthologous transposase sets were collected with reference to the phylogenetic tree. NCBI gene models (XT9_1_GCA.gff3 and XL9_2_GCA.gff3) were downloaded from the Xenbase FTP site [23, 24]. Gene models around the putative orthologous CDSs were compared. The CDS orthologs were defined as those located on the homologous X. tropicalis chromosome and X. laevis L/S chromosomes and flanked by multiple orthologous neighbor genes. The ratios of synonymous and nonsynonymous substitutions between two coding DNA sequences (dN/dS ratio) were calculated by the method of Yang and Nielsen [33] using the yn00 program of PAML v. 4.8 [34].

Search for full-length Kolobok transposons

The upstream and downstream sequences flanking the CDSs of each repetitive XKol subfamily were compared by dot plot analysis and multiple alignment to find the left and right TIRs, respectively. The left and right terminal sequences were compared to confirm their similarity. The sequences between similar left and right TIRs were considered as full-length TEs. Four-base pair sequences flanking the full-length TEs were examined to check the conservation of their duplicated target sites (TTAA).

Cloning of XKol-Tpases from two X. tropicalis lineages

X. tropicalis was provided by the National Bioresource Project, Japan [35]. Genomic DNAs were extracted from the Nigerian and Asashima lines using a previously described method [13]. Polymerase chain reaction (PCR) was performed using KOD-Plus DNA polymerase (Toyobo Co. Ltd., Osaka, Japan) and 100 ng genomic DNA. The default PCR conditions were as follows: initial denaturation (94 °C, 120 s), followed by 35 cycles of denaturation (98 °C, 10 s), annealing (60 °C, 30 s), and extension (68 °C, 90 s). The annealing temperature, extension time, and/or PCR cycles were changed as needed to optimize amplification. The primers used in the PCR are shown in (Additional file 6: Table S2). PCR products were inserted into the pCRBluntII-TOPO or pCR4Blunt-TOPO vector (Invitrogen, Carlsbad, CA, USA) and cloned into OneShotTOP-10 competent cells (Invitrogen). The cloned sequences were analyzed using a BigDye3.1 Terminator Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) and an ABI PRISM 310 or 3100 Genetic Analyzer (Applied Biosystems).

Results

Prospective Kolobok transposase CDSs and proteins

To identify candidates of active (able to catalyze TE transposition) or domesticated Kolobok transposase genes, we performed tblastn (protein query versus nucleotide database) homology searches of X. tropicalis and X. laevis genomes. We searched the hit loci and their surrounding regions for the longest ORFs that encoded proteins homologous to Kolobok transposases. Multiple alignments of putative proteins encoded in the longest ORFs revealed that the N-terminal sequences encoded by certain ORFs were longer than those of other ORFs. We trimmed the excess 5′ regions from these ORFs to align their start codons with those of the majority genes (Additional file 1: Figure S1). Figure 1 shows the N-terminal region and some conserved regions of the aligned proteins. It is unclear whether these trimmed ORFs were the actual CDSs of Kolobok transposase genes. Nevertheless, we expected that most of these ORFs coincided or overlapped with the CDSs. Therefore, we ascribed them to be CDSs of prospective Kolobok transposase genes. In X. tropicalis and X. laevis, 38 and 24 prospective CDSs were found, respectively. We referred to the prospective Xenopus Kolobok transposons as the XKol family and its transposase as XKol-Tpase. We designated the XKol-Tpase CDSs and proteins serially as Tr1–Tr38 for X. tropicalis and Lv1–Lv24 for X. laevis (Table 1). The five shortest XKol-Tpase proteins (Tr34–Tr38) lacked a certain number of conserved amino acids (Fig. 1 and Table 2). Therefore, they may not be functional. The transposases encoded in the Kolobok-1_XT and Kolobok-2_XT transposons previously reported in the RepBase [18] most resembled, but were not identical to, Tr8 (723/783 identical) and Tr7 (782/783 identical), respectively.
Fig. 1

Conserved regions of prospective XKol transposase CDSs. Conserved regions in a multiple alignment of prospective transposases predicted from the putative CDSs of XKol-Tpase genes with three outgroup transposases (Danio rerio Kolobok-1_DR, Capitella teleta Kolobok-1_CTe, and Branchiostoma floridae Kolobok-2_BF). Numbers above the alignment indicate the position of the amino acids in it (Additional file 1: Figure S1). Numbers in square brackets are the abbreviated amino acids. Consensus residues of conserved domains (DDE, THAP, and H2CH) are shown below the alignment. Nonconserved amino acid residues in conserved motifs are marked by gray shading

Table 1

Prospective Xenopus XKol transposase CDSs

NameLocusLongest ORFTrimmed CDSNo. of truncated copies
A. Prospective X. tropicalis XKol transposase CDSs
 Tr1Chr05:4592573..4595167(−)2595 bp2202 bp1
 Tr2Chr02:100830975..100833458(+)2484 bp2421 bp3
 Tr3Chr01:90056839..90059247(+)2409 bp2409 bp0
 Tr4Chr09:49289527..49291908(+)2382 bp2319 bp0
 Tr5scaffold_769:44126..46486(−)2361 bp2361 bp10
 Tr6scaffold_3164:2182..4536(−)2355 bp2355 bp5
 Tr7Chr02:77614330..77616681(−)2352 bp2352 bp2
 Tr8Chr08:54548260..54550611(+)2352 bp2352 bp9
 Tr9Chr02:55486066..55488417(+)2352 bp2352 bp2
 Tr10Chr08:50277255..50279597(+)2343 bp2292 bp0
 Tr11scaffold_609:18293..20635(−)2343 bp2343 bp9
 Tr12Chr06:112308032..112310368(−)2337 bp2250 bp9
 Tr13Chr02:162067553..162069883(−)2331 bp2331 bp0
 Tr14scaffold_51:231432..233759(−)2328 bp2328 bp5
 Tr15Chr01:83498100..83500424(+)2325 bp2325 bp5
 Tr16Chr04:108160710..108163031(−)2322 bp2322 bp0
 Tr17Chr09:29639969..29642278(−)2310 bp2310 bp0
 Tr18Chr02:66952233..66954539(−)2307 bp2307 bp0
 Tr19Chr04:41679747..41682050(+)2304 bp2229 bp0
 Tr20scaffold_16:1256554..1258851(+)2298 bp2298 bp4
 Tr21Chr04:107657217..107659511(+)2295 bp2295 bp9
 Tr22Chr01:168455606..168457900(−)2295 bp2295 bp9
 Tr23Chr03:106168762..106171056(−)2295 bp2295 bp9
 Tr24scaffold_561:30795..33086(−)2292 bp2256 bp0
 Tr25Chr01:107512548..107514821(+)2274 bp2274 bp4
 Tr26Chr09:13318232..13320439(−)2208 bp2208 bp0
 Tr27scaffold_4830:88..2292(−)2205 bp2205 bp5
 Tr28Chr04:128881775..128883943(−)2169 bp2169 bp0
 Tr29Chr04:128872195..128874363(−)2169 bp2169 bp0
 Tr30Chr09:10540280..10542436(+)2157 bp2157 bp0
 Tr31Chr09:10547870..10549993(+)2124 bp2091 bp0
 Tr32Chr03:80573408..80575504(+)2097 bp2097 bp0
 Tr33Chr09:10557320..10559413(+)2094 bp2094 bp1*
 Tr34scaffold_731:1892..3937(−)2046 bp2046 bp0
 Tr35Chr05:37113502..37115535(−)2034 bp2034 bp3
 Tr36scaffold_161:120530..122407(+)1878 bp1878 bp8
 Tr37Chr01:130854739..130856610(−)1872 bp1872 bp0
 Tr38Chr09:29654976..29656787(−)1812 bp1812 bp0
B. Prospective X. laevis XKol transposase CDSs
 Lv1chr9_10L:117582522..117585011(−)2490 bp2433 bp4
 Lv2chr6S:1512718..1515198(−)2481 bp2481 bp1*
 Lv3chr1L:4995471..4997924(−)2454 bp2454 bp4
 Lv4chr8L:50116481..50118844(+)2364 bp2292 bp0
 Lv5Scaffold261:26626..28977(+)2352 bp2352 bp0
 Lv6chr8L:99349197..99351548(−)2352 bp2352 bp2
 Lv7chr2L:167641114..167643453(−)2340 bp2340 bp0
 Lv8chr2S:147754724..147757057(−)2334 bp2334 bp0
 Lv9chr9_10L:77656066..77658378(+)2313 bp2313 bp0
 Lv10chr4S:91247377..91249686(−)2310 bp2310 bp1*
 Lv11chr4S:1463674..1465929(−)2256 bp2217 bp1*
 Lv12chr3L:58143401..58145647(+)2247 bp2211 bp0
 Lv13chr9_10L:110876140..110878383(−)2244 bp2244 bp0
 Lv14chr4L:133759875..133762109(+)2235 bp2187 bp0
 Lv15chr4S:20852192..20854426(−)2235 bp2235 bp0
 Lv16chr4L:26594086..26596320(+)2235 bp2235 bp1*
 Lv17chr4S:1470489..1472717(−)2229 bp2229 bp2*
 Lv18chr6L:145404209..145406428(−)2220 bp2220 bp0
 Lv19chr9_10L:110897783..110899978(−)2196 bp2196 bp0
 Lv20chr4S:112529046..112531238(−)2193 bp2193 bp0
 Lv21chr4S:112539344..112541533(−)2190 bp2190 bp0
 Lv22chr4L:133769461..133771647(+)2187 bp2187 bp0
 Lv23Scaffold22:1796503..1798647(−)2145 bp2145 bp3
 Lv24chr4L:26633802..26635940(+)2139 bp2103 bp0

Prospective XKol-Tpases found in (A) X. tropicalis and (B) X. laevis. Asterisks in the “No. of truncated copies” column indicate semi-nonrepetitive CDSs

Table 2

Amino acid residues in conserved motifs of Kolobok transposase

NameDDEH2CHC2CHPWFAVPTIFConservation
Tr1DDEHHCHCCCHPWFSIPTQF
Tr2DDEHHCHYCCHPWFAIPSVF
Tr3DGDHHCH--CPPLLALQ-LM
Tr4DNKHHCQCCCHPWFATPTLF
Tr5DDEHHCHCCCHPWFAVPTIF++
Tr6DDEHHCHCCCHPWFAVPTIF++
Tr7DDEHHCHCCCHPWFAVPTIF++
Tr8DDEHHCHCCCHPWFAVPTIF++
Tr9DDEHHCHCCCHPWFAVPTIF++
Tr10DDEHHCHCCCHPWFAVPTIF++
Tr11DDEHHCHCCCHPWFAVPTIF++
Tr12DDEHHCHCCCHPWFATPTSF
Tr13DDEHHCHCCCHPWFAIPTLF+
Tr14DDEHHCHCCCHPWFAIPTLF+
Tr15DDEHHCHCCCHPWFAIPTLF+
Tr16DDKHHCHCCCHPWFAIPTIF
Tr17DEDHHCHCCCHPWFAVPTQF
Tr18DDEHHCHCCCHPWFAVPTVF+
Tr19DGEHHCHCCCHPWFAVPTIF
Tr20DDEHHCHCCCHPWFAVPTIF++
Tr21DDEHHCHCCCHPWFAVPTLF+
Tr22DDEHHCHCCCHPWFAVPTLF+
Tr23DDEHHCHCCCHPWFAVPTLF+
Tr24DDEHHCHCCCHPWFAVPTIF++
Tr25DDEHHCH-VCHPWFAVPTIF
Tr26DDVHHCHCCCHPWFAVPTIF
Tr27DDEHHCHCCCHPWFAMPTQF
Tr28GDEHHCHCCCHPWFAVPTIF
Tr29GNEHHCHCCCHPWFAVPTIF
Tr30DDLHHCHCCCHPWFAVPTIF
Tr31DDLHHCHCCCHPWFAVPTIF
Tr32DDEHHCHCCCHPWFAVPTIF++
Tr33NDTHDCNCCCHPWFALPTIF
Tr34DEEHHCH--T-P--TEPVVV
Tr35DDEHHCHS-C-P--
Tr36DD-HHCHCCCHPWFAVPTIY
Tr37DDEYHCH-SC-S-F---T--
Tr38DEDHHCHY-CK--LSLSTLL
Lv1DDEHHCHCCCHPWFAVPSIF+
Lv2DEDHHCHCCCHPWFAVPTLF
Lv3DDEHHCHCCCHPWFAIPSIF+
Lv4DDEHHCHCCCHPWFAIPTIF+
Lv5DDKHHCHCCCHPWFAFPTLF
Lv6DDEHHCHCCCHPWFAVPTIF++
Lv7DDEHHCHCCCHPWFAVPTIF++
Lv8DDEHHCHCCCHPWFAVPTIF++
Lv9DNKRHCHCCCHPWFATPTLF
Lv10HDNHHCHCCCHPWFAIPTIF
Lv11DDQHHCHCCCHPWFAVPTIF
Lv12YNEHHCHCCCHPWFAVPTIF
Lv13DDKHNCHCCCHPWFALPSIF
Lv14GDEHHCHCCCHPWFAVPTIF
Lv15DEEHHCHCCCHPWFAIPTIF
Lv16DSEHHCHCCCHPWFAVPTIF
Lv17DDQHHCHCCCHPWFAVPTIF
Lv18DDEHHCHCCCHPWFAVPTVF+
Lv19DDLHHCHCCCHPWFAVPTIF
Lv20GDEHHCHCCCHPWFAVPTIF
Lv21GDEHHCHCCCHPWFAVPTIF
Lv22GDEHHCHCCCHPWFAIPTIF
Lv23DDPHHCHCCCHPWFAVPTLF
Lv24DDKHHCHCCCHPWFAVPTIF

The “++” in the “Conservation” column indicates that all motifs are conserved. The “+” in the same column indicates that all motifs are conserved except for substitutions of similar amino acids in the AVPTIF motif

Conserved regions of prospective XKol transposase CDSs. Conserved regions in a multiple alignment of prospective transposases predicted from the putative CDSs of XKol-Tpase genes with three outgroup transposases (Danio rerio Kolobok-1_DR, Capitella teleta Kolobok-1_CTe, and Branchiostoma floridae Kolobok-2_BF). Numbers above the alignment indicate the position of the amino acids in it (Additional file 1: Figure S1). Numbers in square brackets are the abbreviated amino acids. Consensus residues of conserved domains (DDE, THAP, and H2CH) are shown below the alignment. Nonconserved amino acid residues in conserved motifs are marked by gray shading Prospective Xenopus XKol transposase CDSs Prospective XKol-Tpases found in (A) X. tropicalis and (B) X. laevis. Asterisks in the “No. of truncated copies” column indicate semi-nonrepetitive CDSs Amino acid residues in conserved motifs of Kolobok transposase The “++” in the “Conservation” column indicates that all motifs are conserved. The “+” in the same column indicates that all motifs are conserved except for substitutions of similar amino acids in the AVPTIF motif Kolobok superfamily transposases usually possess three motifs: a catalytic “DDE” domain, a THAP DNA-binding domain, and an H2CH putative zinc-finger domain  [18, 36]. The THAP DNA-binding domain includes a C2CH consensus, three key residues (P, W, and F), and a C-terminal AVPTIF box [37]. These features were identified in the prospective XKol-Tpases (Fig. 1). The H2CH domain is highly conserved in the prospective XKol-Tpases. Thirty-five of the 38 XKol-Tpases in X. tropicalis and 22 of the 24 XKol-Tpases in X. laevis retained this motif (Table 2). Three THAP domain motifs were also conserved. The C2CH motif was conserved in 31 of 38 and 24 of 24 XKol-Tpases in X. tropicalis and X. laevis, respectively. The PWF motif was conserved in 33 of 38 and 24 of 24 XKol-Tpases in X. tropicalis and X. laevis, respectively. The AVPTIF motif was conserved only in 17 of 38 and 12 of 24 XKol-Tpases in X. tropicalis and X. laevis, respectively. However, most of the remaining XKol-Tpases had chemically similar residues at this position. When considering such residues as conserved, this motif was conserved in 27 of 38 and 23 of 24 XKol-Tpases in X. tropicalis and X. laevis, respectively. In contrast, the catalytic DDE motif was conserved in only 24 of 38 XKol-Tpases in X. tropicalis and seven of 24 XKol-Tpases in X. laevis. Some XKol-Tpases lost one or two of these motifs by deletion (Tr3, Tr25, Tr34, Tr35, Tr36, Tr37, and Tr38). These proteins may lack DNA-binding and/or catalytic activity. In summary, 17 X. tropicalis and seven X. laevis XKol-Tpases retained all the motifs characterizing Kolobok transposase.

Truncated copies of the XKol-Tpase genes

Transposase genes may be retained in a host either by constant amplification of autonomous elements or by domestication in the host. Amplification of autonomous elements results in the accumulation of multiple intact and/or truncated copies in the genome. Therefore, they would have repetitive (multicopy) transposase genes. In contrast, if a copy of a transposase gene was domesticated by its host, it would be conserved by purifying selection. Its undomesticated relatives would not have been conserved by purifying selection and would have accumulated mutations and diverged from the domesticated copy. Over time, then, there would be no apparent homology between a domesticated transposase gene and its relatives, and the domesticated transposase genes would become nonrepetitive (single copies). Conversely, the nonrepetitiveness of a transposase-related gene may indicate that its host domesticated it long ago. X. laevis is an allotetraploid species retaining two of its progenitor’s subgenomes as L and S chromosomes, respectively [20]. Therefore, this species could have conserved both homeologous transposases if the genes had been domesticated before the segregation of the two progenitor species. In contrast, if either of the domesticated L or S genes had degraded because of functional redundancy, the remaining prospective gene would be accompanied by a closely related truncated sequence on its homeologous chromosome. If one of the tandemly duplicated paralogous transposase genes was degraded, the remaining gene would be accompanied by a closely related truncated sequence on its neighboring locus. In the analyses discussed below, we describe such genes as “semi-nonrepetitive.” To find nonrepetitive or semi-nonrepetitive genes derived from transposases, we surveyed truncated (nonprospective) copies closely related to each XKol-Tpase CDS using a blastn (nucleotide query versus nucleotide database) search. If a query did not hit genomic sequences other than itself or other XKol-Tpase CDSs, it was considered nonrepetitive. The number of truncated copies and their loci are shown in Table 1 and (Additional file 7: Table S3), respectively. Eighteen of 38 X. tropicalis XKol-Tpase genes (47%) and 15 of 24 X. laevis XKol-Tpase genes (63%) were found to be nonrepetitive. Two X. laevis XKol-Tpase genes (Lv2 and Lv10) could be considered semi-nonrepetitive because their single truncated copies were located on the homeologous chromosome (Additional file 7: Table S3). One X. laevis XKol-Tpase gene (Lv16) and one X. tropicalis XKol-Tpase gene (Tr33) may also be semi-nonrepetitive, as their hit sequences were located in the neighborhood of the XKol-Tpase CDSs and may therefore have been amplified by tandem duplication rather than transposition. Lv11 and Lv17 had one and two truncated hits, respectively. However, one of their hit regions covered the entire length of a very short Scaffold105792 (216 bp). Therefore, this hit could be an artifact caused by an imperfect genome sequence assembly. Another hit sequence to Lv17 was located on the neighbor of its homeolog Lv16 and might be a truncated paralog of Lv16 (Additional file 7: Table S3 and Fig. 4). Lv11 and Lv17, then, may also be semi-nonrepetitive genes. It is possible that the host domesticated these nonrepetitive or semi-nonrepetitive genes long ago.
Fig. 4

Synteny of XKol-Tpase genes. Conserved synteny around putative domesticated orthologs of subfamily D1–D7. Xenopus laevis homeologous chromosomes L and S and Xenopus tropicalis homologous chromosome (T) are indicated by arrows. The orientation of each arrow indicates the 5′ to 3′ direction. XKol-Tpase CDSs are represented by white triangles on the chromosomes, and their names are shown below the chromosomes. Gray triangles indicate truncated XKol-Tpase sequences. Gene models around the CDSs are represented by black bars

Phylogenetic analyses of XKol-Tpases

If a transposase gene was domesticated before the divergence of X. laevis and X. tropicalis and is conserved in both species, then the X. laevis and X. tropicalis genes are orthologous and the two X. laevis genes are homeologous. They should be closely related in the molecular phylogenetic tree and located on homologous chromosome loci. The L and S genes derived from ancestral L and S species are located on the L and S chromosomes, respectively. Therefore, if both were conserved in X. laevis, then the branching pattern should be a triplet consisting of the X. tropicalis (semi-)nonrepetitive transposase and the two (semi-)nonrepetitive transposases on the X. laevis L and S chromosomes (triplet-branching). Alternatively, if either the L or S gene was lost or degraded in X. laevis, the branching pattern should consist of a pair of (semi-)nonrepetitive transposases from X. tropicalis and X. laevis (doublet-branching). Figure 2 shows a molecular phylogenetic tree for the XKol-Tpases. A typical triplet-branching pattern can be seen for nonrepetitive Tr13, Lv7, and Lv8 (grouped as subfamily “D1” in Figs. 2 and 3), whose genes are located on the homologous chromosomes X. tropicalis Chr02, X. laevis chr2L, and chr2S, respectively. X. tropicalis Tr13 branched first, followed by the branching of X. laevis Lv7 and Lv8. This pattern corresponded to the order of divergence in X. tropicalis and two ancestral species of X. laevis. A dot plot analysis showed that their genes were significantly similar in amino acid and nucleotide sequences (Fig. 3). This relationship could also be seen for the other three (semi-)nonrepetitive XKol-Tpase triplets: Tr19/Lv16/Lv11-Lv17 (D2), Tr28/Lv22/Lv20 (D3–1), and Tr29/Lv14/Lv21 (D3–2). Lv11 and Lv17 genes were tandemly located within ~ 4.6 kb of each other on chr4S. Lv16 was found to be a semi-nonrepetitive gene accompanied by a closely related truncated sequence located in its neighborhood (Fig. 4). Therefore, these genes were thought to be tandemly duplicated before the segregation of the two ancestor species of X. laevis. The last two triplets Tr28/Lv22/Lv20 and Tr29/Lv14/Lv21 were closely related (Fig. 2). The tandemly duplicated gene pairs Tr28-Tr29, Lv14-Lv22, and Lv2-Lv20 were located within ~ 8 kb of each other on the homologous chromosomes X. tropicalis Chr04, X. laevis chr4L, and chr4S, respectively (Fig. 4). These paired genes were thought to be paralogs tandemly duplicated before the divergence of X. laevis and X. tropicalis. Six doublet pairs of closely related X. tropicalis and X. laevis (semi-)nonrepetitive genes were found to be located on the homologous chromosomes Tr16/Lv10 (chr4, D4), Tr10/Lv4 (chr8, D5), Tr30/Lv19 (chr9, D6–1), Tr33/Lv13 (chr9, D6–2), Tr4/Lv9 (chr9, D7), and Tr32/Lv12 (chr3, D8). One doublet pair, Tr34/Lv15 (D9), was also closely related; however, the chromosomal location of Tr34 was uncertain. These pairs may also be orthologs domesticated by the common ancestor of X. laevis and X. tropicalis. One of the X. laevis homeologs may have been lost or broken.
Fig. 2

Molecular phylogeny of prospective XKol transposases. Molecular phylogenetic trees of XKol-Tpase amino acid sequences inferred using the neighbor-joining method. Each operational taxonomic unit is represented by the names of the CDS and a chromosome or scaffold on which the CDS is located. The number of blastn hits to truncated transposase sequences are shown as numerals following the vertical bar. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches. The symbol “+” on the branches indicates domestication events. Asterisks indicate transpositional amplification events. Subfamilies of putative domesticated transposase orthologs are grouped by black bars and named D1–D9. CDS subfamilies that may have been duplicated by recent transposition and amplification are grouped by gray bars and designated A1–A6

Fig. 3

Dot plot analysis of XKol-Tpase gene CDSs. All-to-all comparisons of XKol-Tpase CDSs (upper right) and proteins (lower left) performed by dot plot analyses. D1–D9 and A1–A6 are grouped in the same way as in Fig. 2

Molecular phylogeny of prospective XKol transposases. Molecular phylogenetic trees of XKol-Tpase amino acid sequences inferred using the neighbor-joining method. Each operational taxonomic unit is represented by the names of the CDS and a chromosome or scaffold on which the CDS is located. The number of blastn hits to truncated transposase sequences are shown as numerals following the vertical bar. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches. The symbol “+” on the branches indicates domestication events. Asterisks indicate transpositional amplification events. Subfamilies of putative domesticated transposase orthologs are grouped by black bars and named D1–D9. CDS subfamilies that may have been duplicated by recent transposition and amplification are grouped by gray bars and designated A1–A6 Dot plot analysis of XKol-Tpase gene CDSs. All-to-all comparisons of XKol-Tpase CDSs (upper right) and proteins (lower left) performed by dot plot analyses. D1–D9 and A1–A6 are grouped in the same way as in Fig. 2 Synteny of XKol-Tpase genes. Conserved synteny around putative domesticated orthologs of subfamily D1–D7. Xenopus laevis homeologous chromosomes L and S and Xenopus tropicalis homologous chromosome (T) are indicated by arrows. The orientation of each arrow indicates the 5′ to 3′ direction. XKol-Tpase CDSs are represented by white triangles on the chromosomes, and their names are shown below the chromosomes. Gray triangles indicate truncated XKol-Tpase sequences. Gene models around the CDSs are represented by black bars Some X. tropicalis XKol-Tpase genes encode highly similar proteins and are located on different chromosomes. These genes were probably recently amplified. For example, Tr21, Tr22, and Tr23 (grouped as subfamily “A1” in Figs. 2 and 3) were found to be located on Chr04, Chr01, and Chr03, respectively (Tables 1 and 3, Figs. 2 and 3). They also contained multiple truncated copies of XKol-Tpase genes on Chr01, Chr02, Chr04, Chr05, Chr06, and Chr09 (Additional file 7: Table S3). Another five X. tropicalis XKol-Tpase gene groups, i.e., Tr5-Tr8-Tr11-Tr36 (A2), Tr14-Tr15 (A3), Tr20-Tr25 (A4), Tr2-Tr35 (A5), and Tr7-Tr9 (A6), were also closely related to each other and had multiple truncated copies on different chromosomes (Fig. 2, Additional file 7: Table S3). Although X. tropicalis Tr6, Tr12, and Tr27 and X. laevis Lv1, Lv3, Lv6, and Lv23 did not have similar XKol-Tpase genes, they had truncated copies on various chromosomes (Additional file 7: Table S3), representing traces of their amplification and transposition. Similarities in nucleotide sequences between these XKol-Tpase genes and their truncated hits are shown by dot plots (Additional file 2: Figure S2).
Table 3

Relationship of transposases between X. tropicalis and X. laevis

NameClosely related CDSSubfamilySynteny
A. Relationship between X. tropicalis and X. laevis transposases
Tr1
Tr2Lv3(L)A5
Tr3
Tr4Lv9(L)D7+
Tr5A2
Tr6
Tr7A6
Tr8A2
Tr9A6
Tr10Lv4(L)D5+
Tr11A2
Tr12
Tr13Lv7(L), Lv8(S)D1+
Tr14A3
Tr15A3
Tr16Lv10(S)D4+
Tr17Lv2(S)
Tr18
Tr19Lv16(L), Lv11(S), Lv17(S)D2+
Tr20A4
Tr21Lv23A1
Tr22Lv23A1
Tr23Lv23A1
Tr24
Tr25A4
Tr26
Tr27
Tr28Lv22(L), Lv20(S)D3–1+
Tr29Lv14(L), Lv21(S)D3–2+
Tr30Lv19(L)D6–1+
Tr31D6–1+
Tr32Lv12(L)D8
Tr33Lv13(L)D6–2+
Tr34Lv15(L)D9
Tr35Lv3(L)A5
Tr36A2
Tr37
Tr38Lv2(S)
B. Relationship between X. laevis and X. tropicalis transposases
Lv1
Lv2Tr17*, Tr38
Lv3Tr2*, Tr35
Lv4Tr10D5+
Lv5
Lv6
Lv7Tr13D1+
Lv8Tr13D1+
Lv9Tr4D7+
Lv10Tr16D4+
Lv11Tr19D2+
Lv12Tr32D8
Lv13Tr33D6–2+
Lv14Tr29D3–2+
Lv15Tr34D9
Lv16Tr19D2+
Lv17Tr19D2+
Lv18
Lv19Tr30D6–1+
Lv20Tr28D3–1+
Lv21Tr29D3–2+
Lv22Tr28D3–1+
Lv23Tr22*, Tr21, Tr23
Lv24

(A) Relationship between X. tropicalis and X. laevis transposases and (B) relationship between X. laevis and X. tropicalis transposases. Asterisks in the “closely related CDS” column indicate CDSs used in the analysis of differences in nucleotides and amino acids between X. laevis and X. tropicalis (Table 4B)

Relationship of transposases between X. tropicalis and X. laevis (A) Relationship between X. tropicalis and X. laevis transposases and (B) relationship between X. laevis and X. tropicalis transposases. Asterisks in the “closely related CDS” column indicate CDSs used in the analysis of differences in nucleotides and amino acids between X. laevis and X. tropicalis (Table 4B)
Table 4

Differences in transposase CDSs and amino acids, and presumptive status

NameNucleotide substitutionNucleotide gapAmino acid substitutiondN/dSPresumptive status
A. Differences between Nigerian and Asashima lines, and presumptive status
Tr1206100.396active?
Tr21124frame-shiftactive?
Tr35031.006domestication?
Tr4111270.501ancient domestication
Tr5currently active
Tr6101dS = 0currently active
Tr79070.696currently active
Tr8260211.057currently active
Tr9currently active
Tr105010.084ancient domestication
Tr11currently active
Tr12205991frame-shiftactive?
Tr1326080.173ancient domestication
Tr14currently active
Tr15030dS = 0currently active
Tr163020.527ancient domestication
Tr17240130.434domestication?
Tr1815090.504domestication?
Tr194304dS = 0ancient domestication
Tr20530290.403currently active
Tr21currently active
Tr22000dS = 0currently active
Tr23currently active
Tr24576230.28domestication?
Tr25active?
Tr2615030.058domestication?
Tr27550190.118active?
Tr289060.629ancient domestication
Tr29270120.312ancient domestication
Tr305020.328ancient domestication
Tr31213110.359ancient domestication
Tr327061.444domestication?
Tr3330080.14ancient domestication
Tr3423090.192domestication?
Tr35active?
Tr36active?
Tr37domestication?
Tr3812050.196domestication?
B. Differences between X. laevis and X. tropicalis, and presumptive status
Lv1currently active
Lv24662372020.285domestication?
Lv3472632100.194currently active
Lv426361100.257ancient domestication
Lv5domestication?
Lv6currently active
Lv7289511400.391ancient domestication
Lv8293571410.301ancient domestication
Lv9303421540.341ancient domestication
Lv10302361610.463ancient domestication
Lv11335421720.358ancient domestication
Lv125462522130.100domestication?
Lv133591981640.365ancient domestication
Lv14305361550.413ancient domestication
Lv154113931570.195domestication?
Lv16305361620.349ancient domestication
Lv17331541620.336ancient domestication
Lv18domestication?
Lv19335691560.328ancient domestication
Lv20323661650.497ancient domestication
Lv21312331660.472ancient domestication
Lv22356721930.521ancient domestication
Lv235912942400.153active?
Lv24domestication?

(A) Differences between Nigerian and Asashima lines and (B) differences between X. laevis and X. tropicalis. X. tropicalis CDSs used in the analysis of differences with those of X. laevis are indicated by asterisks in Table 3B. The last column indicates the presumptive status of each XKol transposase

Evidence of XKol-Tpase gene domestication

The triplet or doublet XKol-Tpase genes described above were thought to have been domesticated by the common ancestor of Xenopus. We analyzed the neighbors of these putative domesticated XKol-Tpases to assess whether the genes in each triplet or doublet were located on homologous loci. All four triplets (D1, D2, D3–1, and D3–2) and five doublets (D4, D5, D6–1, D6–2, and D7) in the X. tropicalis and X. laevis genes conserved synteny with the neighbors (Fig. 4). In contrast, we did not find synteny for the doublet pairs D8 and D9 (data not shown). We calculated the dN/dS ratio between putative orthologous or similar X. tropicalis and X. laevis XKol-Tpase genes to determine whether they were affected by natural selection (Table 4B). All dN/dS ratios were less than 1.0; therefore, these genes were probably conserved by purifying selection. Differences in transposase CDSs and amino acids, and presumptive status (A) Differences between Nigerian and Asashima lines and (B) differences between X. laevis and X. tropicalis. X. tropicalis CDSs used in the analysis of differences with those of X. laevis are indicated by asterisks in Table 3B. The last column indicates the presumptive status of each XKol transposase We investigated the evolutionary conservation of the CDSs compared to their flanking 2000 bp sequences for subfamilies D1–D9 (Additional file 3: Figure S3). The dot plots showed that the CDS regions were more conserved than their flanking regions. This result also supported the hypothesis that the XKol-Tpase proteins have a function conserved by natural selection. We amplified and cloned the CDSs of the putative domesticated XKol-Tpase genes from the distinct Nigerian and Asashima lineages of X. tropicalis. There were nucleotide substitutions and indels between the clones of the two lines. Nevertheless, the ORFs of the CDSs were not destroyed by nonsense or frameshift mutations (Table 4A and Additional file 4: Figure S4). For the X. tropicalis triplet and doublet genes, the dN/dS ratios between the Nigerian and Asashima lines were less than 1.0 in all cases except for Tr19, wherein all four substitutions were nonsynonymous. We also cloned and compared CDSs from nine other nonrepetitive genes (Tr3, Tr17, Tr18, Tr24, Tr26, Tr31, Tr32, Tr34, and Tr38) in the Nigerian and Asashima lines (Table 4A and Additional file 4: Figure S4). Once again, neither substitutions nor indel mutations destroyed the ORFs. The dN/dS ratios were < 1.0 for seven genes, approximately 1.0 for Tr3, and > 1.0 for Tr32. These results indicated that most nonrepetitive or semi-nonrepetitive XKol-Tpase genes were domesticated by the host and conserved by purifying selection.

Conservation of repetitive and recently active XKol-Tpase genes

In general, active transposons were not conserved by purifying selection because they were untamed and potentially harmful to the host. X. tropicalis Tr21-Tr22-Tr23 (A1), Tr5-Tr8-Tr11-Tr36 (A2), Tr14-Tr15 (A3), Tr20-Tr25 (A4), Tr2-Tr35 (A5), Tr7-Tr9 (A6), Tr6, Tr12, and Tr27 had multiple intact and/or truncated copies. Dot plot analyses revealed that all intact genes and some truncated copies were highly similar (Additional file 2: Figure S2), suggesting that they were recently amplified. Therefore, they were either currently active or were active until recently. We cloned and compared the CDSs of the repetitive XKol-Tpases in the Nigerian and Asashima lines (Table 4A). The ORFs of Tr22, Tr8, Tr15, Tr20, Tr6, and Tr27 were conserved and not destroyed by nonsense or frameshift mutations. The clones of Tr22 and Tr15 from the Asashima line were identical to those from the Nigerian line. For Tr2 and Tr12, frameshift mutations destroyed the ORFs of the clones from the Asashima line. Tr12 from the Asashima line was degraded by many mutations. We searched the upstream and downstream flanking sequences of the CDSs of the repetitive XKol-Tpases for left and right TIRs and found both left and right ones for ten CDSs (Table 5). All of them had AG terminal sequences, and all except for the right TIR of Tr11 were flanked by an intact TTAA target sequence. These are probably full-length autonomous XKol copies that still have transposition activity.
Table 5

Full-length X. tropicalis XKol sequences

SubfamilyCDSCDS locusFull-length Xkol sequenceLengthTIR leftTIR rightTSD leftTSD right
A1Tr21Chr04:107657217..107659511(+)Chr04:107650836..107663091(+)12,216AGCGATTCTGACATGGAGTGATACTGACAGTATTAATTAA
A1Tr22Chr01:168455606..168457900(−)Chr01:168451382..168460584(−)9163AGCGATTCTGACATGGAGTGATACTGACAGTATTAATTAA
A1Tr23Chr03:106168762..106171056(−)Chr03:106165634..106173458(−)7785AGCGATTCTGACATGGAGTGATACTGACAGTATTAATTAA
A2Tr8Chr08:54548260..54550611(+)Chr08:54546557..54553359(+)6763AGGAGAAGGAAAGGCTAGGAAATGGCAAGCCATTAATTAA
A2Tr11scaffold_609:18293..20635(−)scaffold_609..15284:22344(−)7021AGGAGAAGGAAAGGCTAGGAAATGGCAAGCCATTAAATAA
A3Tr14scaffold_51:231432..233759(−)scaffold_51..227047:240993(−)13,907AGGACATGTCAACCCCAGGACGTGTCAACCCTTTAATTAA
A3Tr15Chr01:83498100..83500424(+)Chr01:83495893..83504753(+)8821AGGACATGTCAACCCCAGGACGTGTCAACCATTTAATTAA
A4Tr25Chr01:107512548..107514821(+)Chr01:107510216..107521330(+)11,075AGGACAAGGAAAGCTTAGGAAAATGAAAGTCATTAATTAA
A5Tr2Chr02:100830975..100833458(+)Chr02:100828018..100838339(+)10,282AGGGGAACTATCATGAAGGGGATCTATCATGATTAATTAA
A6Tr9Chr02:55486066..55488417(+)Chr02:55483392..55492644(+)9213AGAGCAAGTAAAGTCGAGAGCAAGGCAAGCTTTTAATTAA

The position of full-length transposases, their left and right terminal sequences, and conservation of target sequences duplicated by the insertion of a transposon

Full-length X. tropicalis XKol sequences The position of full-length transposases, their left and right terminal sequences, and conservation of target sequences duplicated by the insertion of a transposon

Discussion

Multiple massive domestication of Kolobok transposases

In the present study, we surveyed the X. tropicalis and X. laevis genomes and found 38 and 24 prospective Kolobok transposase genes, respectively, whose CDSs encoded proteins of more than 600 amino acids in length. Subfamily A2 (Tr8/Tr11) and A6 (Tr7/Tr9) resembled previously reported X. tropicalis transposases of Kolobok-1_XT and Kolobok-2_XT, respectively [18], but others were novel genes. It is possible that some of these were not functional, because they were encoded by shorter CDSs that lacked certain lengths of conserved regions. For example, Tr36 lacked the C-terminal region, and Tr34, Tr35, Tr37, and Tr38 lacked the N-terminal regions conserved in the other XKol-Tpases. In contrast, most XKol-Tpases retained the DNA-binding motifs of Kolobok transposases, such as the C2CH domain [36] and the THAP domain [37]. Therefore, at the very least, they partially retained their molecular functions. Among the 62 XKol-Tpases, 24 (39%) were grouped into seven triplet or doublet clusters of X. tropicalis and X. laevis XKol-Tpases on the molecular phylogenetic tree (subfamilies D1–D7). Their genes were located on homologous chromosome loci. All the dN/dS ratios between the X. tropicalis and X. laevis CDSs within each subfamily were less than 1.0. These results indicated that they were derived from an ancestral transposase gene domesticated before the divergence of the Xenopus genus. Among them, only five (21%) of the XKol-Tpases (Tr10, Tr13, Lv4, Lv7, and Lv8 belonging to D1 or D5) retained all conserved Kolobok transposase motifs. The catalytic DDE motif was conserved only in these five proteins, whereas the other 19 lost this motif. In contrast, the C2CH and PWP DNA-binding motifs were conserved in all 24 proteins. The H2CH and AVPTIF domains, the latter of which was defined as conserved if each of its amino acids was substituted by similar ones, were highly conserved. Only four proteins in subfamily D6–2 or D7 (Tr4, Tr33, Lv9, and Lv13) lost the former domain, and two proteins in subfamily D7 (Tr4 and Lv9) lost the latter domain. These results suggest that most of the domesticated XKol-Tpases retained their DNA-binding activity, but lost the catalytic activity necessary for transposition. These “transposases” may suppress transposons by competitively binding the target sequences of active transposases. Fifteen other nonrepetitive XKol-Tpase genes were also found. Among them, only five proteins conserved the DDE motifs, whereas the H2CH-, C2CH-, PWF-, and AVPTIF motifs were conserved in 14, 11, 12, and 11 of the 15 proteins, respectively. Therefore, most of these XKol-Tpases may also have retained their DNA-binding activity but lost the ability to catalyze transposition. They may have been domesticated after the divergence of the two Xenopus species. Alternatively, they may constitute the remainder of ancestral domesticated transposase genes, and their orthologs may have been lost in other species. According to this study, 39 prospective XKol-Tpase genes (63%) were apparently domesticated. The best tblastn hits to XKol-Tpase in the GenBank database (excluding hits to the Xenopus sequence) were invertebrate sequences (data not shown). These included the cnidarian Acropora digitifera uncharacterized LOC107332277 mRNA (XM_015897006) and the Exaiptasia pallida uncharacterized LOC110237754 mRNA (XM_021043370). The lack of domesticated genes encoding Kolobok transposases orthologous to XKol-Tpases in other vertebrates suggests that domestication occurred after Xenopus diverged from other model animals, including mammals and birds. Molecular phylogeny suggested that independent domestication events occurred several times. If the XKol-Tpases located on nonhomologous loci had been domesticated independently, domestication may have occurred at least 16 times in Xenopus. This situation was the opposite of that reported for the Xenopus piggyBac superfamily TxpB [9]. The TxpB family includes only one domesticated subfamily, Kobuta, and it was domesticated only once in the Xenopus ancestor.

Currently active Kolobok transposases and the T2-MITE family

Among the 62 XKol-Tpase genes, 23 (37%) were repetitive, having interspersed multiple intact and/or truncated copies in the genome. Of these, 15 (65%) retained all conserved motifs of the Kolobok superfamily transposase. We regard them as currently active transposases. The eight other repetitive XKol-Tpases had lost at least one conserved motif. We could not determine whether they were currently active. Some “currently active” X. tropicalis XKol-Tpases were conserved between the Nigerian and Asashima lines. The CDSs of Tr6, Tr15, and Tr22 cloned from the Asashima line were nearly or exactly identical to those from the Nigerian line. Considering the distance between these two lines [35], this similarity may be explained by recent introgression or horizontal transfer of these genes from the original population of the Asashima line to that of the Nigerian line or vice versa. The CDSs of Tr7, Tr8, and Tr20 presented with 9, 26, and 53 nucleotide substitutions between the two lineages, respectively. However, their ORFs were conserved despite the large number of mutations. The “currently active” X. laevis XKol-Tpase Lv3 and the closely related X. tropicalis Tr2 had 472 nucleotide substitutions and 63 nucleotide gaps, but their ORFs were also conserved (Table 4). This phenomenon may be explained by purifying selection because the dN/dS ratio between them was 0.194. These results suggested that some XKol family transposons were currently active in the host and had also been domesticated by it. Domesticated XKol-Tpases belonging to the D1 and D5 subfamilies also conserved all Kolobok transposase motifs, including the catalytic DDE domain. Therefore, domestication and catalytic transposition activity may not be mutually exclusive in the XKol family. This apparent contradiction may be resolved by considering that the transposition catalysis of XKol-Tpases has been beneficial to the host. Kolobok transposases probably transpose the T2-MITE family [38], which is predominant in Xenopus [13, 14, 16]. Certain subfamilies (T2-A1 and T2-C) may be “long-lived”; that is, they have retained transposition activity in the X. tropicalis lineage since before the divergence of X. laevis and X. tropicalis [19, 21]. Our recent analysis of X. laevis revealed that some T2-MITE subfamilies have multiple identical copies, suggesting that they have recently been active (unpublished data). The long conservation activity of T2-MITE subfamilies in both species suggested that they may have contributed to host fitness through transposition. Recently, we reported that the T2-C subfamily tended to be located near the upstream regions of genes in X. tropicalis. The expression patterns of genes with upstream insertions were strongly correlated [22]. This distribution may indicate its function to be recruited by the host. For example, T2-C may include a cis-regulatory element. The insertion of a certain element into an upstream region may have been beneficial to the host; hence, it was conserved by natural selection. Recent studies have revealed that TE-derived sequences were recruited by hosts as cis-regulatory elements. TE transposition and amplification are important drivers of the evolution of the gene regulatory network [39]. The conservation of the Kolobok superfamily may be partially explained by pressure from the host to continue to activate the transposition of some T2-MITE subfamilies and to rewire their gene regulatory network.

Conclusions

Our results indicated that multiple massive XKol-Tpase domestication events occurred during Xenopus evolution and that XKol family transposons have complex, mutually beneficial relationships with their hosts. Long conservation of transposition activity and/or conserved catalytic and DNA-binding domains in certain XKol-Tpases suggest that they may benefit the host by catalyzing the transposition of long-lived T2-MITE subfamilies. Figure S1. Full-length multiple alignment of transposases predicted from prospective CDSs of XKol-Tpase genes with three outgroup transposases. Certain proteins predicted from the longest ORFs had excess N-terminal amino acids compared with other proteins. We trimmed the corresponding 5′ regions from these prospective CDSs to align their start methionine codons with those in the others. This alignment was used in molecular phylogeny (Fig. 1). (PDF 50 kb) Figure S2. Repetitive XKol-Tpase CDSs (highlighted in red) were compared with their truncated copies by dot plot analyses (word size = 12). Flanking upstream and downstream 1000 bp CDSs and truncated copy sequences are also included for comparison. (PDF 717 kb) Figure S3. XKol-Tpase CDSs and flanking upstream and downstream 2000 bp sequences of subfamily D1–D9 were compared by dot plot analyses (word size = 10). (PDF 319 kb) Figure S4. Multiple alignment of prospective transposase CDSs from the X. tropicalis and X. laevis genome databases and from CDSs cloned from X. tropicalis Nigerian and Asashima lineages. (PDF 167 kb) Table S1. Name for the Kolobok superfamily transposase query used in the tblastn search. Name for the source transposon in Repbase and protein length are shown. (XLSX 31 kb) Table S2. Primer sequences used in PCR to clone XKol-Tpase CDSs from the X. tropicalis Nigerian and Asashima lineages. (XLSX 33 kb) Table S3. Results of blastn searches to identify truncated copies of XKol-Tpase genes. Prospective XKol-Tpase CDS used as a query, locus of the CDS, hit locus of the blastn search (e-value <1e-100), and its length (bp) are shown. (XLSX 30 kb)
  34 in total

Review 1.  A unified classification system for eukaryotic transposable elements.

Authors:  Thomas Wicker; François Sabot; Aurélie Hua-Van; Jeffrey L Bennetzen; Pierre Capy; Boulos Chalhoub; Andrew Flavell; Philippe Leroy; Michele Morgante; Olivier Panaud; Etienne Paux; Phillip SanMiguel; Alan H Schulman
Journal:  Nat Rev Genet       Date:  2007-12       Impact factor: 53.242

Review 2.  Molecular domestication of transposable elements: from detrimental parasites to useful host genes.

Authors:  L Sinzelle; Z Izsvák; Z Ivics
Journal:  Cell Mol Life Sci       Date:  2009-03       Impact factor: 9.261

Review 3.  Eukaryotic transposable elements and genome evolution.

Authors:  D J Finnegan
Journal:  Trends Genet       Date:  1989-04       Impact factor: 11.639

Review 4.  Transposable elements domesticated and neofunctionalized by eukaryotic genomes.

Authors:  Ahmed M Alzohairy; Gábor Gyulai; Robert K Jansen; Ahmed Bahieldin
Journal:  Plasmid       Date:  2012-08-30       Impact factor: 3.466

Review 5.  Transposable Element Domestication As an Adaptation to Evolutionary Conflicts.

Authors:  Diwash Jangam; Cédric Feschotte; Esther Betrán
Journal:  Trends Genet       Date:  2017-08-24       Impact factor: 11.639

6.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

7.  Extensive amplification and transposition of a novel repetitive element, xstir, together with its terminal inverted repeat in the evolution of Xenopus.

Authors:  A Hikosaka; E Yokouchi; A Kawahara
Journal:  J Mol Evol       Date:  2000-12       Impact factor: 2.395

Review 8.  Regulatory activities of transposable elements: from conflicts to benefits.

Authors:  Edward B Chuong; Nels C Elde; Cédric Feschotte
Journal:  Nat Rev Genet       Date:  2016-11-21       Impact factor: 53.242

9.  A novel group of families of short interspersed repetitive elements (SINEs) in Xenopus: evidence of a specific target site for DNA-mediated transposition of inverted-repeat SINEs.

Authors:  K Unsal; G T Morgan
Journal:  J Mol Biol       Date:  1995-05-12       Impact factor: 5.469

10.  Evolution of the Xenopus piggyBac transposon family TxpB: domesticated and untamed strategies of transposon subfamilies.

Authors:  Akira Hikosaka; Toshihiro Kobayashi; Yumiko Saito; Akira Kawahara
Journal:  Mol Biol Evol       Date:  2007-10-13       Impact factor: 16.240

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.