Unearthing LTR Retrotransposon gag Genes Co-opted in the Deep Evolution of Eukaryotes.

Jianhua Wang1, Guan-Zhu Han1.   

Abstract

LTR retrotransposons comprise a major component of the genomes of eukaryotes. On occasion, retrotransposon genes can be recruited by their hosts for diverse functions, a process formally referred to as co-option. However, a comprehensive picture of LTR retrotransposon gag gene co-option in eukaryotes is still lacking, with several documented cases exclusively involving Ty3/Gypsy retrotransposons in animals. Here, we use a phylogenomic approach to systematically unearth co-option of retrotransposon gag genes above the family level of taxonomy in 2,011 eukaryotes, namely co-option occurring during the deep evolution of eukaryotes. We identify a total of 14 independent gag gene co-option events across more than 740 eukaryote families, eight of which have not been reported previously. Among these retrotransposon gag gene co-option events, nine, four, and one involve gag genes of Ty3/Gypsy, Ty1/Copia, and Bel-Pao retrotransposons, respectively. Seven, four, and three co-option events occurred in animals, plants, and fungi, respectively. Interestingly, two co-option events took place in the early evolution of angiosperms. Both selective pressure and gene expression analyses further support that these co-opted gag genes might perform diverse cellular functions in their hosts, and several co-opted gag genes might be subject to positive selection. Taken together, our results provide a comprehensive picture of LTR retrotransposon gag gene co-option events that occurred during the deep evolution of eukaryotes and suggest a paucity of LTR retrotransposon gag gene co-option during the deep evolution of eukaryotes.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  LTR retrotransposon; co-option; phylogenetics

Year:  2021        PMID: 33871607      PMCID: PMC8321522          DOI: 10.1093/molbev/msab101

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Transposable elements (TEs), generally thought to be genomic parasites, are major components of eukaryote genomes (Brandt et al. 2005; Wicker et al. 2007; Etchegaray et al. 2021); for instance, ∼45% of the human genome and ∼85% of the maize genome are composed of various TEs (Schnable et al. 2009; Lander et al. 2012). Based on their transposition mechanisms, TEs are typically classified into two major classes, class I (retrotransposons) and class II (DNA transposons) (Wicker et al. 2007). Among retrotransposons, long terminal repeat (LTR) retrotransposons, characterized by the presence of LTRs at their 5′- and 3′-termini, encode two common genes, gag and pol, required for retrotransposition (Llorens et al. 2009; Naville et al. 2016; Sanchez et al. 2017). LTR retrotransposons can be further divided into several superfamilies, including Ty3/Gypsy, Ty1/Copia, and Bel-Pao retrotransposons as well as retroviruses/endogenous retroviruses (ERVs) (Wicker et al. 2007). Most LTR retrotransposons are thought to be neutral or deleterious, and are removed by recombination between LTRs or inactivated and degraded by accumulating disruptive mutations (Stoye 2012; Jangam et al. 2017; Johnson 2019; Etchegaray et al. 2021). On occasion, coding or regulatory regions of LTR retrotransposons can be repurposed for diverse cellular functions in hosts, a process formally termed co-option, domestication, or exaptation (Feschotte 2008; Kokošar and Kordiš 2013; Hoen and Bureau 2015; Chuong et al. 2016; Naville et al. 2016; Chuong et al. 2017; Jangam et al. 2017; Wang et al. 2019; Wang and Han 2020; Etchegaray et al. 2021).

To date, more than 100 independent retroviral gag gene co-option events have been documented in the literature (Campillos et al. 2006; Pastuzyn et al. 2018; Skirmuntt and Katzourakis 2019; Wang et al. 2019; Wang and Han 2020), as exemplified by the Fv1 gene, which serves as a restriction factor that inhibits the replication of diverse retroviruses (Best et al. 1996; Yap et al. 2014; Boso et al. 2018). In contrast, only six cases of LTR retrotransposon gag gene co-option above the taxonomic family level have been identified (Campillos et al. 2006; Pastuzyn et al. 2018), all of which involve Ty3/Gypsy retrotransposons. 1) Activity-regulated cytoskeleton-associated proteins (Arc) in tetrapods and 2) dArc proteins in schizophoran flies were derived from gag genes of distinct Ty3/Gypsy retrotransposon lineages, and mediate intercellular RNA transfer in the nervous system (Ashley et al. 2018; Pastuzyn et al. 2018). 3) The Sushi-ichi retrotransposon homolog (SIRH/RTL) family arose from the gag gene of a sushi LTR retrotransposon, and two members of this family, PEG10/RTL2 and PEG11/RTL1, are essential for placenta development (Ono et al. 2006; Sekita et al. 2008). 4) The paraneoplastic Ma antigen (PNMA) gene family originated from the Gypsy12_DR gag gene and might perform diverse functions; for example, PNMA5 and PNMA10 are associated with brain development (Takaji et al. 2009; Cho et al. 2011; Pang et al. 2018). 5) The SCAN (SRE-ZBP, CTfin-51, AW-1, Number 18 cDNA) gene family was derived from the C-terminal region of the capsid (CA) of a Gmr1-like LTR retrotransposon. SCAN domains are usually associated with (C2H2)-type zinc fingers and Kruppel-associated box domains to form transcription factors, and function in many aspects of cell differentiation and development, for instance, regulating the transcription of growth factors (Collins et al. 2001; Sander et al. 2003; Edelstein and Collins 2005; Emerson and Thomas 2011). Sporadic LTR retrotransposon gag gene co-option events within the taxonomic family level have also been documented; for example, Gagr is a Ty3/Gypsy retrotransposon gag gene co-opted within Drosophila (Nefedova et al. 2014; Makhnovskii et al. 2020).

Besides Ty3/Gypsy retrotransposon gag genes, various coding and regulatory regions of TEs can be repurposed for cellular functions (Chuong et al. 2017; Jangam et al. 2017; Etchegaray et al. 2021). RAG1 and RAG2 proteins, which are essential for the rearrangement of antigen receptors in vertebrates, originated through co-opting a DNA transposon known as ProtoRAG (Huang et al. 2016; Morales Poole et al. 2017). In angiosperms, several cases of co-option of class II TE transposases have been documented: the FAR1-related sequence (FRS) and MUSTANG (MUG) gene families were derived from transposases of Mutator-like elements (MULEs), and the SLEEPER gene family arose from a hAT transposase (Oliver et al. 2013; Hoen and Bureau 2015; Joly-Lopez et al. 2016). FRS genes (e.g., FHY3 and FAR1) perform diverse functions in plants, including acting as light signal transducers, regulating flowering time, and participating in the division of chloroplasts (Lin et al. 2007; Wang and Wang 2015; Ma and Li 2018). The MUG gene family plays crucial roles in plant growth, flowering time, and floral organ development (Joly-Lopez et al. 2012). SLEEPER genes regulate global gene expression and are crucial for the growth of plants (Bundock and Hooykaas 2005; Knip et al. 2012). In fission yeast, CENP-Bs (Cbh1, Cbh2, and Abp1), which were derived from transposases of pogo DNA transposons, contribute to the silencing of Tf retrotransposons (Cam et al. 2008). Moreover, cis-regulatory sequences of TEs can also be co-opted, shaping the evolution of host gene regulatory networks (Chuong et al. 2017).

To date, a comprehensive picture of LTR retrotransposon gag gene co-option in eukaryotes is still lacking. Little is known about the extent and diversity of LTR retrotransposon gag gene co-option in eukaryotes. In this study, we performed a comprehensive phylogenomic analysis to unearth LTR retrotransposon gag gene co-option events (RtGCEs) above the taxonomic family level across eukaryotes. We identified a total of 14 RtGCEs, seven, four, and three of which occurred in animals, plants, and fungi, respectively. We also analyzed the evolutionary history, expression pattern, and selective pressure for each co-opted LTR retrotransposon gag (Crtg) gene. Our study provides a snapshot of LTR retrotransposon gag gene co-option that occurred during the deep evolutionary history of eukaryotes.

Results

Mining Deep LTR Retrotransposon gag Gene Co-option in Eukaryotes

We used a combined similarity search and phylogenetic analysis approach to systematically identify Crtg genes above the taxonomic family level in eukaryotes (see Materials and Methods for the details; Wang and Han 2020). Our analyses included a total of 2,011 annotated proteomes of eukaryotes, which cover at least 743 families in eukaryotes (supplementary table S1, Supplementary Material online). The Crtg genes identified in this study fulfill two criteria: 1) The Crtg genes share similar synteny among the genomes of species across at least two families, and/or the Crtg phylogeny largely agrees with the host phylogeny; and 2) the Crtg genes are subject to a certain level of natural selection, implying potential cellular functionality (Graur et al. 2013; Jangam et al. 2017; Wang and Han 2020). Following these two criteria, we identified a total of 14 Crtg genes, referred to as Crtg1 to Crtg14 (figs. 1–5), seven (Crtg1 to Crtg7), four (Crtg8 to Crtg11), and three (Crtg12 to Crtg14) of which were identified in the genomes of animals, plants, and fungi, respectively. Although six of these Crtg genes have been documented in animals, namely Arc (Crtg2), dArc (Crtg3), RTL (Crtg4 and Crtg5), PNMA (Crtg6), and SCAN (Crtg7), the remaining eight Crtg genes are reported here for the first time. It should be noted that the co-option events identified here are those that occurred above the family level of taxonomy and, in general, during the deep evolution of eukaryotes.
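The two criteria above can be read as a simple filter over candidate genes. The following is a minimal sketch and not the authors' pipeline: the `Candidate` fields are hypothetical, and using a gene-wide dN/dS below 1 as a stand-in for "a certain level of natural selection" is an assumption made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one gag-derived gene candidate."""
    name: str
    families_sharing_synteny: int  # host families with conserved flanking genes
    phylogeny_matches_host: bool   # gene tree largely agrees with the host tree
    dn_ds: float                   # gene-wide dN/dS (assumed proxy for selection)


def passes_criteria(c: Candidate) -> bool:
    # Criterion 1: synteny shared across at least two families and/or
    # a gene phylogeny that largely agrees with the host phylogeny.
    deep_origin = c.families_sharing_synteny >= 2 or c.phylogeny_matches_host
    # Criterion 2 (assumption): dN/dS < 1 is taken as evidence of constraint.
    under_selection = c.dn_ds < 1.0
    return deep_origin and under_selection
```

In practice the second criterion was assessed with several selection tests rather than a single threshold, so this filter only conveys the logical structure of the definition.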
Fig. 1.

The phylogenetic relationship between LTR retrotransposons that are closely related to Crtg proteins and representative LTR retrotransposons. This phylogenetic tree was reconstructed based on RT protein sequences. The LTR retrotransposons that are closely related to Crtg proteins in animals, plants and fungi were highlighted in purple, green, and blue, respectively.

To decipher the source of Crtg genes, we identified the LTR retrotransposons that were closely related to these Crtg genes. Phylogenetic analyses of reverse transcriptase (RT) suggest that nine (Crtg1 to Crtg7 in animals, Crtg8 and Crtg11 in plants), four (Crtg9 and Crtg10 in plants, Crtg12 and Crtg13 in fungi), and one (Crtg14 in fungi) Crtg-related retrotransposons belong to Ty3/Gypsy, Ty1/Copia, and Bel-Pao retrotransposons, respectively (fig. 1). Whereas all the documented cases of retrotransposon gag gene co-option were derived from Ty3/Gypsy retrotransposons, our findings indicate that gag genes of all the three major LTR retrotransposon superfamilies (Ty3/Gypsy, Ty1/Copia, and Bel-Pao) could be co-opted during the evolutionary course of eukaryotes.

LTR Retrotransposon gag Gene Co-option in Animals

In animals, we identified a total of seven Crtg genes, including six previously known cases, namely Arc (Crtg2), dArc (Crtg3), RTL (Crtg4 and Crtg5), PNMA (Crtg6), and SCAN (Crtg7). We further investigated or revisited the evolutionary history of these Crtg genes. We identified a novel Crtg gene, namely Crtg1, in invertebrates (fig. 2). Synteny analysis suggests that Crtg1 arose before the last common ancestor of Scarabaeidae and Lampyridae within Coleoptera (∼286 Ma) (fig. 2). Phylogenetic and similarity analyses of both RT and Gag proteins show that the Crtg1 gene arose through repurposing the gag gene of an Mdg3-like retrotransposon within Ty3/Gypsy retrotransposons (fig. 1 and supplementary fig. S1, Supplementary Material online). Consistent with previous studies, Arc and dArc originated independently: the co-option of Arc and dArc occurred before the last common ancestor of tetrapods (∼352 Ma) and before the last common ancestor of Tephritidae and Drosophilidae within Muscomorpha (∼126 Ma), respectively (Pastuzyn et al. 2018) (figs. 1 and 2, and supplementary fig. S1, S1C, Supplementary Material online). The known RTL genes appear to have originated from sushi-like retrotransposons through two independent co-option events, with one occurring in the last common ancestor of mammals (∼159 Ma) and the other occurring in the last common ancestor of placental mammals (∼105 Ma) (figs. 1 and 2, and supplementary fig. S1, Supplementary Material online). PNMA genes arose through co-opting a Ty3/Gypsy retrotransposon gag gene before the last common ancestor of mammals (∼159 Ma) (figs. 1 and 2, and supplementary fig. S1, Supplementary Material online). SCAN genes were derived from the gag gene of a Gmr1-like retrotransposon (Ty3/Gypsy) before the last common ancestor of tetrapods (∼352 Ma) (Goodwin and Poulter 2002; Emerson and Thomas 2011) (figs. 1 and 2, and supplementary fig. S2, Supplementary Material online). Crtg1, Arc (Crtg2), and RTL-1 (Crtg5) mostly remain single-copy within a species. Interestingly, dArc (Crtg3), RTL (Crtg4), PNMA (Crtg6), and SCAN (Crtg7) underwent extensive and complex gene duplication; for example, although the SCAN gene remains single-copy in birds, SCAN genes underwent extensive duplication independently in reptiles and mammals (supplementary fig. S2, Supplementary Material online).
Fig. 2.

The evolutionary history of Crtg genes in animals. The host phylogenetic relationship is based on TimeTree and literature (Wiegmann et al. 2011; Kumar et al. 2017; Zhang et al. 2018; Martin et al. 2019), and gene syntenies flanking each Crtg gene are shown near the corresponding species.

LTR Retrotransposon gag Gene Co-option in Plants

No retrotransposon gag gene co-option has been documented in plants before. In this study, we identified four novel retrotransposon gag gene co-option events in plants, generating the Crtg8 to Crtg11 genes. Although Crtg8 and Crtg11 genes were derived from Ty3/Gypsy retrotransposon gag genes, the origin of Crtg9 and Crtg10 genes involved Ty1/Copia retrotransposon gag genes. Interestingly, two of them, Crtg8 and Crtg9, originated during the early evolution of angiosperms. Crtg8 genes are closely related to Athila-like retrotransposon gag genes, and the co-option occurred after the divergence of Amborella trichopoda from other angiosperms (∼175 Ma) (figs. 1 and 3, and supplementary fig. S3, Supplementary Material online). Crtg9 genes appear to have arisen through co-opting a Tork-like retrotransposon gag gene, which occurred after the divergence of Nymphaea thermarum from other angiosperms (∼160 Ma) (figs. 1 and 3, and supplementary fig. S3, Supplementary Material online). The remaining co-option events took place during the evolutionary course of eudicots. Crtg10 genes arose through co-opting a Tork-like retrotransposon gag gene, which occurred after the divergence of Nelumbo nucifera from other eudicots (∼117 Ma) (figs. 1 and 3, and supplementary fig. S3, Supplementary Material online). Crtg11 genes are closely related to Del-like retrotransposon gag genes, and the co-option occurred before the common ancestor of Cannabaceae and Moraceae within eudicots (∼86 Ma) (figs. 1 and 3, and supplementary fig. S3, Supplementary Material online). These Crtg genes underwent sporadic gene duplication, and notably Crtg9 genes were tandemly duplicated in many species (fig. 3).
Fig. 3.

The evolutionary history of Crtg genes in plants. The host phylogenetic relationship is based on TimeTree and literature (Kumar et al. 2017; Janssens et al. 2020), and gene syntenies flanking each Crtg gene are shown near the corresponding species.

LTR Retrotransposon gag Gene Co-option in Fungi

No retrotransposon gag gene co-option has been documented in fungi before. In this study, we identified three retrotransposon gag gene co-option events in fungi, generating Crtg12 to Crtg14. Although Crtg12 and Crtg13 appear to be derived from Ty1/Copia retrotransposon gag genes, Crtg14 originated through repurposing a Bel-Pao retrotransposon gag gene (supplementary fig. S4, Supplementary Material online). The co-option of Crtg12, Crtg13, and Crtg14 genes occurred before the last common ancestor of Lentitheciaceae and Lindgomycetaceae, before the last common ancestor of Pleosporales and Mytilinidiales (∼242 Ma), and before the last common ancestor of Tremellales and Trichosporonales, respectively (fig. 4). All the Crtg genes in fungi are single copy (fig. 4).
Fig. 4.

The evolutionary history of Crtg genes in fungi. The host phylogenetic relationship is based on TimeTree and literature (Hirayama et al. 2010; Raja et al. 2011; Li et al. preprint; Tian et al. 2015), and gene syntenies flanking each Crtg gene are shown near the corresponding species.

Expression Pattern and Gene Structure of Crtg Genes

We used transcriptome sequencing (RNA-seq) raw data to explore whether the eight Crtg genes first identified in this study (Crtg1, Crtg8 to Crtg14) are expressed. Strong evidence for expression was found for almost all of these eight Crtg genes (the largest transcripts per million [TPM] value for each gene ranges from 1.61 to 72.23) (supplementary table S7, Supplementary Material online), except Crtg12 with a TPM value of 0.41, which may be due to the poor quality of the RNA-seq data (∼60% of reads are too short to map). The Crtg1 gene was found to be expressed during development from larva to adult stages in Agrilus planipennis (supplementary table S7, Supplementary Material online). In plants, Crtg8, Crtg9, Crtg10, and Crtg11 were found to be expressed in a wide range of tissues, including leaf, root, flower, and seed (supplementary table S7, Supplementary Material online). Because RNA-seq data are only available for a limited number of tissues and gene expression is temporally and spatially specific, our results do not necessarily indicate that the co-opted genes are not expressed in other tissues. Interestingly, we found that two Crtg genes, Crtg10 and Crtg14, were fused with host genes (supplementary table S3, Supplementary Material online). Crtg10 was fused with the KELP gene, whose product is a transcriptional co-activator that binds the movement protein (MP) of tomato mosaic virus to inhibit its cell-to-cell movement (Sasaki et al. 2009). Crtg14 was fused with the PEX14 gene, which produces a peroxisomal membrane protein, Pex14p, involved in the peroxisomal targeting signal-dependent protein import pathway (Albertini et al. 1997). Moreover, we found a putative intron with typical splicing sites GT-AG and the branch point (5′-YURAY-3′) (Thanaraj and Clark 2001) in the gag-derived region of Crtg14 genes. Taken together, the results of the gene expression and gene structure analyses further support that the eight Crtg genes are co-opted gag genes.
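For readers unfamiliar with the TPM unit reported above, the following minimal sketch shows how TPM is conventionally derived from read counts and transcript lengths. It is illustrative only: the source does not state which quantification tool produced its values, and the function name and the use of raw (rather than effective) transcript length are assumptions made here.

```python
def tpm(read_counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    read_counts and lengths_bp are parallel sequences; lengths are in base
    pairs. Raw transcript length is used here (an assumption for simplicity).
    """
    if len(read_counts) != len(lengths_bp):
        raise ValueError("read_counts and lengths_bp must have equal length")
    # Step 1: length-normalise each count (reads per kilobase).
    rates = [count / (length / 1000.0)
             for count, length in zip(read_counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: scale so that the values of one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]
```

Because TPM values within a sample always sum to one million, a value such as 0.41 indicates that the gene accounts for a very small share of the length-normalised reads in that library.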

Natural Selection Acting on Crtg Genes

Like host cellular genes, co-opted retrotransposon genes should be subject to a certain level of natural selection, implying potential cellular functionality (Graur et al. 2013; Jangam et al. 2017; Wang and Han 2020). To explore whether Crtg genes are subject to natural selection, we first calculated the dN/dS ratio for the eight Crtg genes (Crtg1, Crtg8 to Crtg14), where dN represents the number of nonsynonymous substitutions per nonsynonymous site and dS represents the number of synonymous substitutions per synonymous site. The dN/dS ratio has often been used to detect signals of natural selection acting on genes (Daugherty and Malik 2012; Duggal and Emerman 2012; Sironi et al. 2015; Han 2019; Wang and Han 2020). We found that the dN/dS ratios are less than one for all eight genes (all but Crtg12 are <0.44), indicating that they evolved under certain functional constraints (table 1).
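The definition above can be written compactly. The following restates it in LaTeX together with the conventional reading of the ratio; the symbol ω is the usual shorthand for dN/dS (written "omega" elsewhere in this article), and the three-way interpretation is the standard textbook one rather than something specific to this study.

```latex
\omega = \frac{d_N}{d_S},
\qquad
d_N = \frac{\text{nonsynonymous substitutions}}{\text{nonsynonymous sites}},
\qquad
d_S = \frac{\text{synonymous substitutions}}{\text{synonymous sites}}

\omega < 1 \;\Rightarrow\; \text{purifying selection},
\qquad
\omega = 1 \;\Rightarrow\; \text{neutral evolution},
\qquad
\omega > 1 \;\Rightarrow\; \text{positive selection}
```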
Table 1.

Selection Analyses of Crtg Genes.

| Gene | No. of sequences (used in selection analyses/total) | No. of sites | dN/dS (a) | M1a vs. M2a: 2Δlnl (b) | M1a vs. M2a: P-value (c) | P0 (d) | P1 (d) | P2 (d) | M8a vs. M8: 2Δlnl (b) | M8a vs. M8: P-value (c) | Codons with dN/dS > 1 (e) | BUSTED: P-value | FUBAR: no. of sites under positive selection | FUBAR: no. of sites under negative selection |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Crtg1 | 7/7 | 596 | 0.039 | 0 | 1 | 0.699 | 0.301 | 0 | 1.014 | 0.314 | | 0.005 (≤ .05) | 0 | 354 |
| Crtg8 | 82/108 | 190 | 0.209 | 0 | 1 | 0.808 | 0.192 | 0 | 0.845 | 0.358 | | 0.000 (≤ .05) | 0 | 156 |
| Crtg9 | 27/47 | 258 | 0.343 | 0 | 1 | 0.606 | 0.394 | 0 | 0.944 | 0.331 | | 0.000 (≤ .05) | 1 | 92 |
| Crtg10 | 120/152 | 285 | 0.352 | 69.112 | 9.990 × 10^-16 | 0.573 | 0.386 | 0.041 | 28.113 | 0 | 27Q*; 233F*; 235S**; 236A**; 237N**; 241T* | ≤ .05 | 1 | 161 |
| Crtg11 | 4/4 | 259 | 0.44 | 0 | 1 | 0.64 | 0.36 | 0 | 0.861 | 0.353 | | 0.423 (≥ .05) | 1 | 13 |
| Crtg12 | 2/2 | 217 | 0.996 | 0.001 | 0.999 | 0.005 | 0.995 | 0 | 0.001 | 0.972 | | | | |
| Crtg13 | 62/63 | 293 | 0.115 | 0 | 1 | 0.844 | 0.156 | 0 | 0 | 1 | | 0.000 (≤ .05) | 0 | 285 |
| Crtg14 | 14/22 | 236 | 0.113 | 0 | 1 | 0.71 | 0.29 | 0 | 0.032 | 0.859 | | 0.137 (≥ .05) | 0 | 145 |

(a) The dN/dS values of the Crtg genes were estimated using the one-ratio model (M0) in PAML.

(b) 2Δlnl represents twice the difference in the natural logs of the likelihoods of the pairs of models (M1a vs. M2a and M8a vs. M8) being compared.

(c) The P-value indicates the confidence with which the neutral models (M1a and M8a) can be rejected in favor of the positive selection models (M2a and M8), respectively.

(d) Proportion of sites with omega < 1 (P0), omega = 1 (P1), and omega > 1 (P2).

(e) Codons under positive selection with a posterior probability > 95% and 99% by Bayes empirical Bayes (BEB) analysis are labeled with one and two asterisks, respectively.

Due to the conservative nature of dN/dS in detecting natural selection (Yang and Bielawski 2000), we further used two pairs of site models (M1a vs. M2a and M8a vs. M8) to detect sites subject to natural selection in Crtg genes. The results of M1a versus M2a show that all these eight Crtg genes possess a proportion of sites under purifying selection (table 1). Interestingly, evidence for positive selection was found in the Crtg10 gene (table 1). We also used the branch-site unrestricted statistical test for episodic diversification (BUSTED) method, which detects gene-wide diversifying selection (Murrell et al. 2015), and the fast, unconstrained Bayesian approximation for inferring selection (FUBAR) method, which identifies sites under purifying or positive selection (Murrell et al. 2013), to analyze natural selection in seven Crtg genes (Crtg12 was excluded due to its limited number of sequences). The BUSTED method inferred at least one site or branch under positive selection in five Crtg genes (all except Crtg11 and Crtg14) (table 1). The FUBAR method identified many sites subject to purifying selection in all the seven genes and detected several positively selected sites in three plant Crtg genes, namely Crtg9, Crtg10, and Crtg11 (table 1). Moreover, we divided each Crtg gene into several subsets at the taxonomic family level and tested the signals of natural selection for each subset. We detected several sites under positive selection in the Cryptococcaceae subset of the Crtg14 gene and found that the dN/dS ratio of the Fagaceae subset of the Crtg10 gene is greater than one (supplementary table S5, Supplementary Material online). Nevertheless, for each Crtg gene, the results of the subset analyses are similar to those of the whole data set analysis (supplementary table S5, Supplementary Material online). Overall, these results suggest that all the eight Crtg genes evolve under a certain level of natural selection, indicating that they are likely to perform diverse cellular functions.
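The site-model comparisons described above are likelihood ratio tests (LRTs): the statistic 2Δlnl is compared against a chi-square distribution. The sketch below shows the arithmetic for the M1a versus M2a comparison, assuming two degrees of freedom, which is the usual convention for that pair but is not stated in the source. The function name is hypothetical, and the M8a versus M8 comparison, which is often evaluated against a different null distribution, is not covered.

```python
import math


def lrt_pvalue_df2(two_delta_lnl: float) -> float:
    """P-value of an LRT statistic under a chi-square distribution with 2 df.

    Assumes df = 2 (conventional for M1a vs. M2a; not stated in the source).
    """
    if two_delta_lnl < 0:
        raise ValueError("the LRT statistic must be non-negative")
    # With 2 degrees of freedom the chi-square survival function
    # reduces to the closed form exp(-x / 2).
    return math.exp(-two_delta_lnl / 2.0)
```

Under this assumption a statistic of 0 gives a P-value of 1, so the neutral model cannot be rejected, whereas a statistic near 69 gives a P-value far below conventional significance thresholds.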

Discussion

Both coding and regulatory regions of TEs can be repurposed for diverse cellular functions (Feschotte 2008; Kokošar and Kordiš 2013; Hoen and Bureau 2015; Chuong et al. 2016; Naville et al. 2016; Chuong et al. 2017; Jangam et al. 2017; Wang et al. 2019; Wang and Han 2020; Etchegaray et al. 2021). LTR retrotransposons are highly abundant in the genomes of eukaryotes. However, a comprehensive picture of LTR retrotransposon gag gene co-option is still lacking, with only six cases documented in animals (Campillos et al. 2006; Pastuzyn et al. 2018). In this study, we used a phylogenomic approach to systematically mine co-opted LTR retrotransposon gag genes above the taxonomic family level across eukaryotes. We unearthed a total of 14 Crtg genes, seven, four, and three of which were identified in animals, plants, and fungi, respectively. These LTR retrotransposon gag gene co-option events are those that occurred during the deep evolution of eukaryotes, because we only identified co-option events occurring before the last common ancestor of at least two eukaryote families, usually at the timescale of tens of millions of years. Among these cases of LTR retrotransposon gag gene co-option, eight have not been reported previously.

Across more than 740 eukaryote families, we only identified 14 LTR retrotransposon gag gene co-option events that occurred above the taxonomic family level (fig. 5), indicating a paucity of LTR retrotransposon gag gene co-option during the deep evolution of eukaryotes. Two scenarios could explain this pattern: 1) Co-option of LTR retrotransposon genes does occur at an extremely low frequency; and 2) co-option of LTR retrotransposon genes occurs frequently, but co-opted genes are frequently lost. In our previous study of retrovirus gene co-option, we also observed a similar pattern: retrovirus gene co-option is relatively rare in the deep branches of vertebrates, which is likely due to frequent co-option and frequent loss (Wang and Han 2020). We think the paucity of LTR retrotransposon gag gene co-option during the deep evolution of eukaryotes could be explained by frequent co-option and frequent loss, and mining co-option events within the level of taxonomic family would help test this hypothesis.

Our analyses came with several caveats: 1) We only mined annotated proteomes of eukaryotes, and many retrotransposon-related genes might not be well annotated. Thus, the number of co-opted LTR retrotransposon gag genes is underestimated in this study. 2) We only sampled ∼740 eukaryote families. It is possible that many co-option events occurred recently, and these relatively recent co-option events might not be unearthed in this study. 3) Our data set is biased toward animals, fungi, and plants, as most genome sequencing has been performed in these groups, which might result in underestimation of LTR retrotransposon gag gene co-option in protists. However, our study covers the deep diversity of eukaryotes well, and covers the major diversity of animals, plants, and fungi. If a co-option event occurred in the deep past and the co-opted gene was passed on to descendant lineages, and if the co-opted gene has been annotated in some of the descendants, our analysis could capture this event (Wang and Han 2020). It follows that we are unlikely to have missed many co-option events that occurred in the deep past (especially within animals, plants, and fungi), such as those around the emergence of tetrapods or the emergence of angiosperms. Together, our results reveal a paucity of LTR retrotransposon gag gene co-option during the deep evolution of eukaryotes and suggest that co-opted LTR retrotransposon gag genes might not have been maintained for extremely long periods of time.
Fig. 5.

The distribution of Crtg genes across eukaryotes. The phylogenetic tree of eukaryotes is based on the literature (Eichinger et al. 2005; Steenkamp et al. 2006; Burki et al. 2020).

Co-opted LTR retrotransposon gag genes and co-opted retrovirus genes have been known to perform diverse cellular functions in animals, ranging from mediating intercellular RNA transfer in the nervous system, to regulating developmental processes, to inhibiting viral infections (Ashley et al. 2018; Pastuzyn et al. 2018; Wang and Han 2020). In general, genes that regulate crucial physiological processes might be mainly subject to purifying selection, whereas genes that are involved in certain genetic conflicts mainly evolve under strong positive selection. The eight Crtg genes first identified in this study appear to evolve mainly under purifying selection. These genes are expressed in a wide range of tissues or developmental stages. Evidence of positive selection was detected for some Crtg genes, especially Crtg10. The Crtg10 gene was found to be fused with the KELP gene, and the KELP gene functions in the inhibition of tomato mosaic virus infection (table 1 and supplementary table S3, Supplementary Material online). It is possible that the fusion between a co-opted gag gene and KELP participates in the evolutionary arms race between hosts and viruses. Overall, our results indicate that Crtg genes might perform diverse cellular functions. Further experiments are still needed to explore the function of these Crtg genes.

Our study provides insights into the co-option of LTR retrotransposon gag genes. First, all the six co-opted LTR retrotransposon gag genes previously documented involve Ty3/Gypsy retrotransposons (Campillos et al. 2006; Pastuzyn et al. 2018). In this study, we identified four Crtg genes and one Crtg gene that were derived from gag genes of Ty1/Copia and Bel-Pao retrotransposons, respectively. Our study indicates that gag genes from diverse LTR retrotransposons can be repurposed for cellular functions (fig. 1). Second, all the previously reported cases of LTR retrotransposon gag gene co-option occurred in animals. In this study, we identified four and three LTR retrotransposon gag gene co-option events that occurred in plants and fungi, respectively (fig. 5). Therefore, our study has expanded the known host range of LTR retrotransposon gag gene co-option. It follows that LTR retrotransposon gag gene co-option occurred more widely than previously appreciated. Our study provides a snapshot of LTR retrotransposon gag gene co-option in eukaryotes.

Materials and Methods

Identification of Co-opted LTR Retrotransposon gag Genes

We employed a combined similarity search and phylogenomic analysis approach (Wang and Han 2020) to identify co-opted LTR retrotransposon gag genes above the taxonomic family level across 2,011 eukaryotes. In brief, we first mined the homologs of LTR retrotransposon Gag proteins in 2,011 eukaryote genomes using the hmmsearch program in HMMER 3.3.1 with seven families in the GAG-polyprotein clan (CL0523), the Arc_C family, the PNMA family, and the SCAN family as queries and an E-value cut-off of 0.1 (Eddy 2011) (supplementary tables S1 and S2, Supplementary Material online). The DUF4219 family in the GAG-polyprotein clan was excluded, because its seed alignment is too short. Next, phylogenetic analyses of significant hits and representative Gag proteins of retroviruses and retrotransposons were performed (Llorens et al. 2008). Gag protein hits whose phylogenetic relationship largely agrees with that of their hosts above the taxonomic family level were retrieved as co-opted Gag protein candidates. Finally, we verified the domain configuration for each co-opted Gag candidate using SMART and Conserved Domain (CD) search with default parameters (Lu et al. 2020; Letunic et al. 2021), and the candidates that encode these query domains were retrieved for further analyses. Protein sequences were aligned using MAFFT 7 (Katoh et al. 2002). Phylogenetic trees were reconstructed using an approximate maximum likelihood method implemented in FastTree 2.1.11 (Price et al. 2010). We used two criteria to define Crtg genes: 1) The Crtg genes share similar synteny among the genomes of species across at least two families, and/or the Crtg phylogeny largely agrees with the host phylogeny; and 2) the Crtg genes are subject to a certain level of selection. The syntenies flanking Crtg genes were based on genome annotation and/or domain annotation by CD search. The divergence time of hosts provides a minimum time estimate for the occurrence of co-option events. Host divergence time was based on TimeTree (Kumar et al. 2017).
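The first mining step, keeping hmmsearch hits at or below an E-value of 0.1, can be illustrated with a short parser for the tabular output that hmmsearch writes when run with `--tblout`. This is a sketch under assumptions rather than the authors' script: the function name is hypothetical, and it relies only on the documented column order of that format (target name first, query name third, full-sequence E-value fifth).

```python
def significant_hits(tblout_text, max_evalue=0.1):
    """Return (target, query, evalue) tuples from hmmsearch --tblout text.

    Only hits whose full-sequence E-value is <= max_evalue are kept.
    """
    hits = []
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines of the format.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query = fields[0], fields[2]
        evalue = float(fields[4])  # full-sequence E-value column
        if evalue <= max_evalue:
            hits.append((target, query, evalue))
    return hits
```

A permissive cut-off such as 0.1 admits many false positives by design; in the workflow described above, these are removed later by the phylogenetic and domain-configuration checks.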

Analysis of the Evolutionary History of Co-opted gag Genes

To explore the evolutionary history of each Crtg gene, we used the TBlastN algorithm to search the 2,011 eukaryote genomes with a representative protein sequence of each Crtg gene as the query and an E-value cut-off of 10^-5. Phylogenetic analysis of the significant hits and representative Gag proteins was performed to confirm the distribution of Crtg genes and to identify LTR retrotransposons that are closely related to Crtg genes. The significant hits were extended in both directions to identify the classic structure of LTR retrotransposons (supplementary table S4, Supplementary Material online). LTR_Finder and LTRharvest were used to identify LTRs, and CD search was used to annotate protein domains (Xu and Wang 2007; Ellinghaus et al. 2008; Lu et al. 2020). We failed to identify LTR retrotransposons that are closely related to Crtg13 genes, but phylogenetic analysis of Gag proteins shows that Crtg13 is closely related to Ty1/Copia retrotransposons (supplementary fig. S4, Supplementary Material online).
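As a rough illustration of the bidirectional extension of hits, the Python sketch below widens a hit interval by a fixed flank on each side and clamps it to the contig. The flank size of 10,000 bp and the function name `extend_hit` are assumptions made for the example; the text does not state the extension length that was used.

```python
def extend_hit(start, end, contig_length, flank=10000):
    """Extend a 1-based inclusive hit interval by `flank` bp on both sides.

    TBlastN reports minus-strand hits with start > end, so the coordinates
    are normalized first. The result is clamped to [1, contig_length].
    """
    lo, hi = min(start, end), max(start, end)
    return max(1, lo - flank), min(contig_length, hi + flank)


# A plus-strand hit near the contig start and a minus-strand hit mid-contig.
print(extend_hit(5000, 6200, 12000))
print(extend_hit(25000, 24000, 100000, flank=5000))
```

The extended window is what a tool such as LTR_Finder or LTRharvest would then scan for flanking LTRs.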

Phylogenetic Analyses

For each Crtg gene, we performed phylogenetic analysis of Crtg proteins, Gag proteins of LTR retrotransposons closely related to Crtg, and representative LTR retrotransposons. For Crtg7, only protein sequences longer than 84 amino acids were used for phylogenetic analyses (Edelstein and Collins 2005). To explore the phylogenetic relationship between LTR retrotransposons closely related to Crtg proteins and representative LTR retrotransposons, we performed phylogenetic analysis of the RT proteins of both groups (supplementary tables S4 and S6, Supplementary Material online). Protein sequences were aligned using MAFFT 7 with the L-INS-i strategy and then manually refined (Katoh et al. 2002). Phylogenetic analyses were performed using the maximum likelihood method implemented in IQ-TREE 2.0 (Nguyen et al. 2015). ModelFinder in IQ-TREE 2.0 was used to select the best-fit models (Kalyaanamoorthy et al. 2017). Branch support values were assessed using the ultrafast bootstrap method with 1,000 replicates (Hoang et al. 2018).
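The alignment and tree-building settings described here correspond to commonly used command lines, sketched below in Python as argument lists. The exact commands the authors ran are not given, so these are assumptions: `--localpair --maxiterate 1000` is MAFFT's L-INS-i strategy, `-m MFP` asks IQ-TREE 2 to run ModelFinder before tree inference, and `-B 1000` requests 1,000 ultrafast bootstrap replicates. The file names and helper function names are hypothetical.

```python
def filter_by_length(records, min_length=84):
    """Keep (name, sequence) pairs with more than min_length residues."""
    return [(name, seq) for name, seq in records if len(seq) > min_length]


def mafft_linsi_command(fasta_in):
    """MAFFT L-INS-i; the alignment is written to standard output."""
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta_in]


def iqtree_command(alignment, replicates=1000):
    """IQ-TREE 2 with ModelFinder model selection and ultrafast bootstrap."""
    return ["iqtree2", "-s", alignment, "-m", "MFP", "-B", str(replicates)]


print(" ".join(mafft_linsi_command("crtg_proteins.fasta")))
print(" ".join(iqtree_command("crtg_proteins.aln.fasta")))
```

The lists can be passed to `subprocess.run` where the programs are installed; the manual refinement of alignments mentioned above is not captured by this sketch.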

Expression Pattern Analyses

The Illumina paired-end RNA-seq raw data for three fungi (Lindgomyces ingoldianus, Alternaria alternata SRC1lrK2f, and Kockovaella imperatae NRRL Y-17943), four developmental stages (larva, prepupa, pupa, and adult) of one animal (A. planipennis), and seven tissues (leaf, root, bud, ovule, flower, petiole, and seed) of four plants (N. thermarum, Arabidopsis thaliana, N. nucifera, and Morus notabilis) were retrieved from NCBI to analyze the expression patterns of Crtg12 to Crtg14, Crtg1, and Crtg8 to Crtg11, respectively (supplementary table S7, Supplementary Material online). First, we used Trimmomatic v0.39 to trim and filter the RNA-seq raw data (Bolger et al. 2014). Next, we used STAR v2.5.4b to map reads to the reference genomes (Dobin et al. 2013). To obtain the uniquely mapped reads for each gene, we used the --quantMode GeneCounts option. Read alignment files generated by STAR v2.5.4b were sorted using Samtools v1.11 (Li et al. 2009). Finally, StringTie v2.1.5 was used to assemble transcripts and estimate gene abundance as TPM values (Kovaka et al. 2019). To further confirm that the gag-derived regions of Crtg10 and Crtg14 are expressed, we used the TBlastN algorithm to search the corresponding RNA-seq data with the gag-derived regions as queries and an E-value cut-off of 10^-5. In total, 252 and 1,712 raw reads mapped to the gag-derived regions of Crtg10 and Crtg14, respectively, each with 100% identity and 100% query coverage.
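For readers unfamiliar with TPM (transcripts per million), the Python sketch below applies the standard definition to per-gene read counts: each count is divided by the gene length, and the resulting rates are rescaled to sum to one million. This only illustrates the unit; StringTie derives its abundance estimates from assembled-transcript coverage, and the function name `tpm` and the toy numbers are assumptions for the example.

```python
def tpm(counts, lengths):
    """Convert per-gene read counts and gene lengths (bp) to TPM values."""
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # no reads anywhere: all abundances are zero
    return [rate / total * 1e6 for rate in rates]


# Toy example: three genes with equal read density get equal TPM values.
values = tpm([10, 20, 30], [1000, 2000, 3000])
print(values)
```

Because TPM values sum to one million within a sample, they are comparable as relative abundances across the tissues and developmental stages analyzed here.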

Selection Pressure Analyses

For each Crtg gene, we used sequences without premature stop codons or frameshift mutations to perform selection analyses. The one-ratio model (M0) in PAML 4.9j was used to estimate the overall dN/dS ratio (Yang 2007). Two pairs of site models in PAML 4.9j, M1a versus M2a and M8a versus M8, were used to detect sites under purifying selection and positive selection. For data sets with more than two nonidentical sequences, the BUSTED and FUBAR methods implemented in the HyPhy package were used to identify gene-wide selection signals and sites under natural selection, respectively (Murrell et al. 2013, 2015). All nucleotide sequences were aligned by codon using MUSCLE, and ambiguous regions were removed manually (Kumar et al. 2016). Phylogenetic trees used in selection analyses were reconstructed using IQ-TREE 2.0 (Nguyen et al. 2015).
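Nested site models such as M1a versus M2a are conventionally compared with a likelihood ratio test on the log-likelihoods reported by codeml. The Python sketch below shows that calculation under stated assumptions: 2 degrees of freedom for M1a versus M2a and 1 for M8a versus M8 (a common convention; the text does not give the values used), with closed-form chi-square tail probabilities for those two cases. The function name `lrt_pvalue` and the log-likelihoods are made up for illustration.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood ratio test for nested models.

    Returns (statistic, p-value), where the statistic is
    2 * (lnL_alt - lnL_null) and the p-value is the chi-square upper tail.
    Only df = 1 and df = 2 are supported, using exact closed forms.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("only df = 1 or df = 2 is supported in this sketch")
    return stat, p


# Made-up log-likelihoods for a null (M1a) and an alternative (M2a) fit.
stat, p = lrt_pvalue(-1000.0, -995.0, df=2)
print(stat, p)
```

A small p-value favors the model that allows a class of sites with dN/dS > 1, which is how positive selection is inferred from these model pairs.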

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

1.  Computational characterization of multiple Gag-like human proteins.

Authors:  Mónica Campillos; Tobias Doerks; Parantu K Shah; Peer Bork
Journal:  Trends Genet       Date:  2006-09-18       Impact factor: 11.639

2.  Gypsy and the birth of the SCAN domain.

Authors:  Ryan O Emerson; James H Thomas
Journal:  J Virol       Date:  2011-08-24       Impact factor: 5.103

3.  CDD/SPARCLE: the conserved domain database in 2020.

Authors:  Shennan Lu; Jiyao Wang; Farideh Chitsaz; Myra K Derbyshire; Renata C Geer; Noreen R Gonzales; Marc Gwadz; David I Hurwitz; Gabriele H Marchler; James S Song; Narmada Thanki; Roxanne A Yamashita; Mingzhang Yang; Dachuan Zhang; Chanjuan Zheng; Christopher J Lanczycki; Aron Marchler-Bauer
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

4.  Convergent Co-option of the Retroviral gag Gene during the Early Evolution of Mammals.

Authors:  Jianhua Wang; Zhen Gong; Guan-Zhu Han
Journal:  J Virol       Date:  2019-06-28       Impact factor: 5.103

5.  Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees.

Authors:  Carlos Llorens; Alfonso Muñoz-Pomer; Lucia Bernad; Hector Botella; Andrés Moya
Journal:  Biol Direct       Date:  2009-11-02       Impact factor: 4.540

6.  The genome of the social amoeba Dictyostelium discoideum.

Authors:  L Eichinger; J A Pachebat; G Glöckner; M-A Rajandream; R Sucgang; M Berriman; J Song; R Olsen; K Szafranski; Q Xu; B Tunggal; S Kummerfeld; M Madera; B A Konfortov; F Rivero; A T Bankier; R Lehmann; N Hamlin; R Davies; P Gaudet; P Fey; K Pilcher; G Chen; D Saunders; E Sodergren; P Davis; A Kerhornou; X Nie; N Hall; C Anjard; L Hemphill; N Bason; P Farbrother; B Desany; E Just; T Morio; R Rost; C Churcher; J Cooper; S Haydock; N van Driessche; A Cronin; I Goodhead; D Muzny; T Mourier; A Pain; M Lu; D Harper; R Lindsay; H Hauser; K James; M Quiles; M Madan Babu; T Saito; C Buchrieser; A Wardroper; M Felder; M Thangavelu; D Johnson; A Knights; H Loulseged; K Mungall; K Oliver; C Price; M A Quail; H Urushihara; J Hernandez; E Rabbinowitsch; D Steffen; M Sanders; J Ma; Y Kohara; S Sharp; M Simmonds; S Spiegler; A Tivey; S Sugano; B White; D Walker; J Woodward; T Winckler; Y Tanaka; G Shaulsky; M Schleicher; G Weinstock; A Rosenthal; E C Cox; R L Chisholm; R Gibbs; W F Loomis; M Platzer; R R Kay; J Williams; P H Dear; A A Noegel; B Barrell; A Kuspa
Journal:  Nature       Date:  2005-05-05       Impact factor: 49.962

7.  On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE.

Authors:  Dan Graur; Yichen Zheng; Nicholas Price; Ricardo B R Azevedo; Rebecca A Zufall; Eran Elhaik
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

8.  Statistical methods for detecting molecular adaptation.

Authors: 
Journal:  Trends Ecol Evol       Date:  2000-12-01       Impact factor: 17.712

9.  Genesis and regulatory wiring of retroelement-derived domesticated genes: a phylogenomic perspective.

Authors:  Janez Kokošar; Dušan Kordiš
Journal:  Mol Biol Evol       Date:  2013-01-24       Impact factor: 16.240

10.  SMART: recent updates, new developments and status in 2020.

Authors:  Ivica Letunic; Supriya Khedkar; Peer Bork
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

