Transposable elements (TEs) are mobile DNA sequences that colonize genomes and threaten genome integrity. As a result, several mechanisms appear to have emerged during eukaryotic evolution to suppress TE activity. However, TEs are ubiquitous and account for a prominent fraction of most eukaryotic genomes. We argue that the evolutionary success of TEs cannot be explained solely by evasion from host control mechanisms. Rather, some TEs have evolved commensal and even mutualistic strategies that mitigate the cost of their propagation. These coevolutionary processes promote the emergence of complex cellular activities, which in turn pave the way for cooption of TE sequences for organismal function.
Transposable elements (TEs) are mobile repetitive DNA sequences that comprise a substantial fraction of eukaryotic genomes (Bourque et al. 2018). For instance, TEs and their remnants account for more than half the nuclear DNA content of maize, zebrafish, and humans and approximately a third of the nuclear DNA content of Drosophila melanogaster and Caenorhabditis elegans (The ; Schnable et al. 2009; de Koning et al. 2011; Howe et al. 2013; Hoskins et al. 2015). The evolutionary success of TEs lies in their ability to replicate independently of and faster than the host genome (Doolittle and Sapienza 1980; Orgel and Crick 1980). TEs are classified into two broad groups based on their molecular transposition intermediate: Class I elements (retrotransposons) mobilize via an RNA intermediate, while class II elements (DNA transposons) mobilize via a DNA intermediate. Retrotransposons include endogenous retroviruses (ERVs) and related long terminal repeat (LTR) retrotransposons as well as non-LTR retrotransposons (Wicker et al. 2007; Bourque et al. 2018). DNA transposons include “cut and paste” DNA transposons as well as non-“cut and paste” elements (Wicker et al. 2007; Bourque et al. 2018). If transposition occurs in cells that contribute to the next generation (i.e., the germline), the newly integrated TE copy becomes inheritable and may spread vertically within the population. TEs also spread horizontally between species at an appreciable frequency, which is crucial to their evolutionary persistence by allowing the colonization of new genomes (Gilbert and Feschotte 2018).
As for any heritable mutational event, natural selection and genetic drift dictate the fate of a new TE insertion in the population. Because of their sheer length (∼100–10,000 bp) and propensity to carry regulatory elements (i.e., promoters, splice sites, and poly-A signals), TE insertions are particularly prone to disrupt gene function.
For instance, ∼10% and ∼50% of spontaneous mutant phenotypes isolated in laboratory strains of mice and flies, respectively, are caused by de novo TE insertions within the coding or noncoding portion of genes (Eickbush and Furano 2002; Gagnier et al. 2019). An equally sizeable fraction of maize mutants arose from transposition events (Neuffer et al. 1997). In humans, at least 120 Mendelian diseases have been attributed to de novo TE insertions (Hancks and Kazazian 2016). These observations, together with more direct measurement of fitness effects in Drosophila caged populations (e.g., Pasyukova 2004), indicate that TE mobilization is highly mutagenic and represents a significant source of genome instability in a wide range of organisms.
Even TEs that are no longer mobile still pose a threat to organismal fitness. The repetitive nature of TE families provides a substrate for ectopic recombination events that can lead to chromosomal rearrangements, often with deleterious consequences (Montgomery et al. 1991; Ade et al. 2013; Bennetzen and Wang 2014; Hancks and Kazazian 2016). TE-encoded products (RNA, cDNA, and proteins), even with compromised functionalities, can be toxic, and their accumulation to aberrant levels is associated with, and increasingly recognized as directly contributing to, various disease states, including cancer, senescence, and chronic inflammation (Lee et al. 2012a; Tubio et al. 2014; Burns 2017; Tang et al. 2017; Bourque et al. 2018; Dubnau 2018; Schauer et al. 2018; De Cecco et al. 2019).
The manifold capacity of TEs to compromise genome integrity and interfere with normal cellular function suggests that uncontrolled TE amplification can have catastrophic consequences for both organisms and TEs, as their fitness is inextricably linked. Thus, as in any parasitic system, natural selection will eliminate TEs with excessive activity and overly deleterious effects on their host.
It will also select for the emergence of host-encoded mechanisms that suppress or dampen TE activity. If a defense mechanism completely blocks a given TE family, it will in turn place selective pressure on these elements to evolve a counterdefense or escape mechanism to avoid extinction, pressuring the host to evolve further mechanisms to compensate. Such genetic conflict is often framed in terms of an arms race (Hurst and Werren 2001; Burt and Trivers 2006; Werren 2011; McLaughlin and Malik 2017; Ozata et al. 2019), which builds on the Red Queen model for host–pathogen interactions (Van Valen 1973). Alternatively, the arms race could be avoided if TEs mitigate the conflict with their host either through self-control or by providing a benefit offsetting their cost. Such strategies would reduce pressure on the host to evolve systems to counteract TEs, promoting equilibrium rather than precipitating an arms race.
In the first part of this review, we examine the evidence for and against the arms race model of host–TE interactions. We posit that the extent to which TEs outcompete their host is constrained by their dependence on organismal fitness, which favors TEs that evolve strategies that circumvent or attenuate, rather than block, host defenses. We then explore alternative models, including self-regulatory mechanisms and mutualistic interactions, that reduce the cost of TE activity. We speculate that cooperative strategies may be more widespread than currently appreciated and pave the way for cooption of TE sequences for host function.
Arms races
Consistent with a need to prevent the deleterious effects of rampant transposition, a variety of host-encoded mechanisms are known to repress eukaryotic TEs at both the transcriptional (chromatin modification and DNA methylation) and posttranscriptional (modifying and degrading TE transcripts) level (Yoder et al. 1997; Borges and Martienssen 2015; Goodier 2016; Czech et al. 2018; Ozata et al. 2019). This leads to suppression of TE expression, mobility, and/or ability to promote recombination. However, none of these mechanisms acts exclusively on TE sequences; they also regulate host gene expression or protect against exogenous pathogens (Klose and Bird 2006; Fedoroff 2012; Klemm et al. 2019). In fact, it has been proposed that epigenetic control systems, which suppress recombination, might facilitate the accumulation of TEs (Fedoroff 2012). These observations raise the following question: Are TEs the raison d’être and primary target for these mechanisms or are they caught in the cross fire? One way to address this question is to examine whether some of these mechanisms have evolved the ability to distinguish TEs from host sequences. Another predicted hallmark of a TE defense system is that it should be able to adapt to control new or variant TEs that evade repression, triggering an arms race between TEs and components of the pathway. Below, we discuss two pathways that appear to possess these attributes: piwi-interacting small RNAs (piRNAs) and Kruppel-associated box-containing zinc finger proteins (KRAB-ZFPs).
piRNA-mediated TE silencing
piRNAs are small RNAs found in most metazoans that are typically produced from long precursor transcripts derived from specialized loci called piRNA clusters (for review, see Ernst et al. 2017; Ozata et al. 2019). Once expressed, primarily in the gonads but also in the soma of some organisms, piRNA precursors are processed into mature piRNAs, which then complex with PIWI clade Argonaute proteins. These ribonucleoprotein (RNP) complexes recognize complementary target mRNAs in the cytoplasm or nascent RNAs in the nucleus, triggering a cascade of biochemical processes that ultimately reduces target RNA expression via posttranscriptional or transcriptional mechanisms, respectively (Fig. 1A; Ozata et al. 2019). piRNAs produced from TE loci act as potent trans-repressors of related TEs located throughout the genome (Ozata et al. 2019). In some organisms, piRNA-mediated TE repression appears necessary to preserve the integrity of the germline. This is best documented in D. melanogaster, where the loss of piRNAs normally repressing certain TEs (P element and I element) leads to rampant transposition and hybrid dysgenesis, which is characterized by extensive DNA damage, gonadal atrophy, and sterility (Kidwell et al. 1977; Bucheton et al. 1984; Brennecke et al. 2008; Wang et al. 2018). Increased piRNA production against P elements in aging flies rescues fertility, further underlining the importance of the piRNA pathway in maintaining germline integrity (Khurana et al. 2011). In mice, elimination of PIWI proteins also results in massive accumulation of retrotransposon transcripts in sperm and oocytes (Carmell et al. 2007; Kuramochi-Miyagawa et al. 2008; De Fazio et al. 2011; Kabayama et al. 2017). This activation precedes—and may cause—defects in spermatogenesis due to meiotic arrest and apoptosis (Deng and Lin 2002; Kuramochi-Miyagawa et al. 2004; Carmell et al. 2007; Frost et al. 2010), although whether defects are due to transposition remains unclear (Newkirk et al. 2017). 
Similar phenotypes are also seen in C. elegans upon loss of PIWI protein PRG-1, which leads to transposon activation and impaired fertility (Batista et al. 2008; Wang and Reinke 2008; Bagijn et al. 2012; Lee et al. 2012b). Thus, it is indisputable that one function of the piRNA pathway is to recognize and silence TEs.
Figure 1.
piRNA and KRAB-ZFPs: two host systems that recognize and silence TEs. (A) The piRNA pathway in D. melanogaster. Mature piRNAs re-enter the nucleus to perform transcriptional gene silencing (TGS) or participate in the ping-pong cycle to perform posttranscriptional gene silencing (PTGS) of TEs. (Rhi) Rhino; (Del) Deadlock; (Aub) Aubergine; (Ago3) Argonaute 3. (B) The KRAB-ZFP pathway in tetrapods (see the text). (NURD) Nucleosome remodeling deacetylase; (HDAC) histone deacetylase; (DNMTs) DNA methyltransferases.
The piRNA pathway evolves rapidly, but why?
Is the piRNA pathway engaged in an arms race with TEs? A common hallmark of an arms race between host and pathogen is the rapid diversification of host and pathogen proteins that are engaged in conflicting interactions. This may be a direct physical interaction (e.g., host protein blocks pathogen protein), but indirect or secondary interactions may also drive rapid evolution (McLaughlin and Malik 2017; Parhad and Theurkauf 2018). Diversification and adaptation of protein sequences often manifest as so-called positive selection at the gene level, which is characterized by an excess of nonsynonymous changes in codons relative to synonymous mutations over time. Signatures of positive selection are pervasive in invertebrate piRNA pathway proteins, operating at virtually every step of piRNA biogenesis both within and between species (Begun et al. 2007; Larracuente et al. 2008; Obbard et al. 2009; Mackay et al. 2012; Simkin et al. 2013; Blumenstiel et al. 2016). Signatures of positive selection are also apparent in the evolution of several piRNA genes across fish species, including Piwil1, but less evident in their mammalian homologs (Yi et al. 2014).
Another prediction of a rapidly evolving host defense system is that it can lead to functional incompatibilities, introducing a breach in the defense of the progeny of individuals carrying divergent components. Support for this scenario came from an elegant study of the rapidly evolving piRNA pathway components Rhino and Deadlock in hybrids of D. melanogaster and Drosophila simulans (Parhad et al. 2017). In drosophilids, Rhino and Deadlock interact directly as part of a complex that binds to and promotes the transcription of piRNA clusters (Fig. 1A; Ozata et al. 2019). Genetic loss of Rhino (Klattenhoff et al. 2009) or depletion of Deadlock (Mohn et al. 2014) in D. melanogaster leads to loss of cluster-derived piRNAs and results in massive transcriptional activation of TEs in the germline. While introduction of a D. melanogaster rhino transgene rescues this mutant phenotype, the D. simulans rhino transgene does not (Fig. 2A; Parhad et al. 2017). Domain-swapping experiments and coimmunoprecipitation and colocalization assays indicate that this is due to an inability of the D. simulans Rhino protein to interact with D. melanogaster Deadlock due to species-specific amino acid changes at the interaction interface of the two orthologous proteins (Parhad et al. 2017). However, D. simulans Rhino does interact with D. simulans Deadlock, indicating that coevolution has occurred between these proteins in the D. simulans lineage as well. The investigators speculate that the rapid coevolution of Rhino and Deadlock proteins was precipitated by a yet-unknown TE-encoded antisilencing factor present in one or both species’ lineages that may directly compete or interfere with their interaction (Parhad et al. 2017).
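The signature of positive selection described above, an excess of nonsynonymous over synonymous changes, can be illustrated with a minimal Python sketch. This is a simplified counting approach and not the method used in the cited studies. It assumes two aligned, gap-free, in-frame coding sequences in uppercase DNA, ignores codons that differ at more than one position, and applies no correction for multiple hits. The function names and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def site_counts(codon):
    """Return (synonymous, nonsynonymous) site counts for one codon.

    Each of the three positions contributes the fraction of its three
    possible single-base changes that leave the amino acid unchanged.
    """
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1.0 / 3.0
    return syn, 3.0 - syn


def compare_coding_sequences(seq_a, seq_b):
    """Tally synonymous/nonsynonymous differences and sites for two aligned ORFs."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    tally = {"syn_diffs": 0, "nonsyn_diffs": 0,
             "syn_sites": 0.0, "nonsyn_sites": 0.0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        syn_a, non_a = site_counts(codon_a)
        syn_b, non_b = site_counts(codon_b)
        # Average the site counts of the two sequences.
        tally["syn_sites"] += (syn_a + syn_b) / 2
        tally["nonsyn_sites"] += (non_a + non_b) / 2
        n_diff = sum(a != b for a, b in zip(codon_a, codon_b))
        if n_diff == 1:
            same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
            tally["syn_diffs" if same_aa else "nonsyn_diffs"] += 1
        elif n_diff > 1:
            tally["skipped"] += 1  # simplification: multi-hit codons are ignored
    return tally


def pn_ps_ratio(tally):
    """Per-site nonsynonymous over synonymous differences (None if undefined)."""
    if tally["syn_diffs"] == 0:
        return None
    p_n = tally["nonsyn_diffs"] / tally["nonsyn_sites"]
    p_s = tally["syn_diffs"] / tally["syn_sites"]
    return p_n / p_s


# Made-up 4-codon sequences: GCT->GCC is synonymous (Ala), AAA->AGA is not (Lys->Arg).
example = compare_coding_sequences("ATGGCTAAAGAT", "ATGGCCAGAGAT")
print(example, pn_ps_ratio(example))
```

A ratio well above 1 across a gene or a set of codons is the kind of pattern usually read as positive selection, while a ratio well below 1 points to purifying selection. Published analyses typically rely on likelihood-based codon models rather than raw counts like these.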
Figure 2.
Evidence of host–TE arms races. (A) D. melanogaster–D. simulans Rhino–Deadlock incompatibility. (Rhi) Rhino; (Del) Deadlock; (Pol II) RNA polymerase II. D. melanogaster–D. simulans Rhino/Deadlock proteins cannot complement to transcribe piRNA clusters (see the text). (B) ZNF93- and ZNF649-mediated silencing of primate L1 elements. Older L1 primate families are silenced by ZNF93 and ZNF649, but younger L1 families lack ZNF93- and ZNF649-binding sites and are not silenced.
Another indicator of adaptive evolution of the piRNA pathway lies in the recurrent duplication and turnover of genes involved in the pathway. Again, this is most striking in invertebrates. For example, a phylogenomic survey of Piwi clade Argonaute genes across 84 dipteran species revealed that these genes have duplicated 27 times independently (Lewis et al. 2016). Interestingly, this study revealed a disproportionate number of duplication events in mosquitoes, which carry a large and diverse TE load (Arensburger et al. 2010; Matthews et al. 2018). More recently, a broader survey of arthropods identified an additional 17 gene duplication events of Piwi, 14 of which occurred in the lineage of the pea aphid alone (Lewis et al. 2018), perhaps reflecting their abundant and diverse TE content (∼38% of the genome) (The International Aphid Genomics Consortium 2010).
The rapid diversification of the piRNA pathway machinery in insects via positive selection and recurrent gene duplication is consistent with the idea that the system is engaged in an arms race. What remains unclear, however, is whether adaptation in the pathway is driven primarily by a need to adjust to TE activity or evasion or by another conflict. If TE activity is the primary driver, one would predict that the timing and intensity of positive selection should track with TE load and diversity across species’ lineages. However, in Drosophila, where this idea has been modeled and examined, this appears to not be the case.
TE abundance across the Drosophila genus was found to be correlated with the level of purifying selection (constraint) on piRNA pathway components but not with the rate at which these proteins have diversified (Castillo et al. 2011). Another prediction of an arms race model is that variation in TE activity would be accompanied by commensurate changes in expression of piRNA pathway components or changes in abundance or composition of the piRNA pool. The former possibility was examined across wild-type strains of D. simulans with variable TE content (Fablet et al. 2014). While piRNA pathway genes exhibited wide variation in transcript levels across strains, there was no positive correlation with TE copy number. In fact, RNA sequencing of 16 inbred lines from the Drosophila Genetic Reference Panel identified only minor variation in piRNA expression, and piRNA cluster expression did not correlate with the presence of strain-specific TE insertions (Song et al. 2014). Another study investigated the genomic factors that contribute to the piRNA pool by integrating genomic, mRNA, and small RNA data for two laboratory strains of D. melanogaster (Kelleher and Barbash 2013). While variation of piRNA abundance between strains appears to be positively correlated with total TE content and expression, the most recently active TEs did not produce the most abundant piRNAs in ovaries (Kelleher and Barbash 2013). Collectively, the evidence so far in Drosophila does not support the notion that TE activity is a primary driver of rapid evolution in the piRNA pathway. It should be noted, however, that correlation between TE content and piRNA pathway expression variants may not be detectable due to the low levels of linkage disequilibrium in Drosophila (Mackay et al. 2012). Further characterization of the relationship between piRNA pathway evolution and TE composition across species with more drastic variation in TE content might reveal a different picture.
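One way to probe the prediction discussed above, that piRNA pathway expression should track TE load across strains, is a rank correlation. The sketch below is a minimal, dependency-free Spearman correlation in Python. It is not the analysis used by Fablet et al. (2014) or the other cited studies, and the per-strain values are invented purely for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / sqrt(var_x * var_y)


# Invented per-strain values, for illustration only (not data from any study).
te_copy_number = [820, 910, 1005, 1130, 1240, 1390]
piwi_expression = [14.2, 9.8, 15.1, 8.7, 13.9, 10.4]
rho = spearman(te_copy_number, piwi_expression)
print(f"Spearman rho = {rho:.2f}")
```

Under a simple arms race expectation, strains with more TE copies would show higher expression of pathway genes, giving a clearly positive coefficient. A value near zero or below, as reported for the real data discussed above, does not fit that expectation, although a small sample of strains gives little power to detect a weak relationship.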
piRNAs function beyond TE silencing and outside the germline
What other forces could drive adaptive evolution of the piRNA pathway? In addition to TEs, it is well established that piRNAs also target host genes and viruses (Ozata et al. 2019), and both targets could have considerable influence on piRNA pathway evolution. The fact that host genes are not immune to, but actually are frequent targets of, piRNAs in a wide range of organisms implies a need to minimize off-target effects. Indeed, if piRNA targeting is not sufficiently specific to TEs, it could interfere with host gene expression. The autoimmunity hypothesis (Blumenstiel et al. 2016) posits that positive selection in piRNA genes reflects alternating periods of high and low TE activity, which impose opposite constraints on the piRNA response: High TE activity requires high piRNA specificity, while low activity calls for greater sensitivity. In this model, TE activity indirectly influences piRNA evolution. Measurable fitness defects caused by off-targeting effects of TE-derived small RNAs (Hollister and Gaut 2009; Lee 2015) bring empirical support to the model. Further evidence comes from a recent study that compared off-target effects of three piRNA pathway components, Aubergine (aub), Armitage (armi), and Spindle E (spnE), in D. melanogaster mutant backgrounds trans-complemented with the D. simulans protein (Wang et al. 2019). When mutant flies were trans-complemented with the D. simulans protein, more D. melanogaster protein-coding genes were repressed than when they were complemented with the D. melanogaster version, suggesting that the D. melanogaster proteins have adapted to avoid genic off-targeting in the D. melanogaster background, whereas the D. simulans proteins have not (Wang et al. 2019). These results support the idea that the avoidance of off-target effects on “self” sequences is a plausible driver of adaptation in the piRNA pathway.
There is also mounting evidence that the piRNA pathway functions in antiviral defense, which may trigger another conflict driving rapid evolution.
The strongest evidence for antiviral activity of piRNAs thus far comes from studies in mosquitoes. These insects have a dramatically expanded repertoire of Piwi genes (Lewis et al. 2016) and deploy somatic piRNAs to combat arbovirus infection (Morazzani et al. 2012; Vodovar et al. 2012; Léger et al. 2013; Schnettler et al. 2013; Miesen et al. 2016). Small RNA profiling also revealed abundant viral-derived piRNAs in Drosophila cell culture (thought to reflect naturally occurring infections; Wu et al. 2010) but not in actual flies subjected to experimental viral infection (Petit et al. 2016). Chickens that harbor endogenized avian leukosis virus (ALV) sequences also produce copious amounts of piRNAs from these loci in their testes (Sun et al. 2017). Since one of these ALV loci, ALVE6, has been historically associated with ALV resistance (Robinson et al. 1981), it is tempting to speculate that piRNAs derived from this locus offer antiviral protection. Likewise, koalas infected by the KoRV retrovirus produce abundant piRNAs that map to endogenized KoRV insertions, although it remains unclear whether these piRNAs protect against KoRV infection (Yu et al. 2019). A similar mechanism has been proposed to operate in primates and rodents, where endogenous bornavirus-like elements generate piRNAs in the testes (Parrish et al. 2015). Together, these data suggest that endogenous viral sequences are a common source of piRNAs in diverse animals. Thus, both gene targeting and viral targeting by piRNAs add a layer of complexity that must be considered as potential drivers of the rapid evolution of the piRNA pathway.
KRAB-ZFPs as an adaptive TE silencing system
The expansion of KRAB-ZFPs in mammalian genomes is increasingly recognized as an adaptive response to TE invasion. KRAB-ZFPs minimally contain an N-terminal KRAB domain followed by a variable array of C2H2-type zinc fingers (Imbeault et al. 2017; Yang et al. 2017b). Most KRAB-ZFPs so far characterized act as transcriptional repressors (for review, see Ecco et al. 2016; Yang et al. 2017b). Typically, KRAB-ZFPs bind DNA through their zinc fingers and recruit the corepressor KAP1 (TRIM28) via their KRAB domain. In turn, KAP1 recruits a variety of epigenetic modifiers such as histone and DNA methyltransferases that nucleate the formation of repressive chromatin at the target locus (Fig. 1B). Several KRAB-ZFPs have been implicated in the silencing of particular TE families recognized through sequence-specific DNA-binding interactions (Najafabadi et al. 2015; Schmitges et al. 2016; Imbeault et al. 2017). Perhaps the best-characterized example is mouse Zfp809, which was initially identified as a transcriptional repressor of proviral insertions of murine leukemia virus (MLV) in embryonic stem cells (ESCs) (Wolf and Goff 2009) but is also able to establish stable repression of ERV-like VL30 retrotransposons in early mouse development (Wolf et al. 2015). Thus, there is growing evidence that KRAB-ZFPs frequently target and silence TEs.
Several observations suggest that the rapid evolution of KRAB-ZFP genes is driven in part by an arms race with TEs. First, KRAB-ZFPs are massively expanded in all tetrapod lineages examined (with the notable exception of birds), ranging from 200 to 400 genes in most mammals, but very few are deeply conserved, implying pervasive evolutionary turnover (Imbeault et al. 2017).
Second, the copy number of ZFPs in a given genome, including KRAB-ZFPs, correlates strikingly with the amount of LTR retroelements, and this correlation is observed across a wide range of vertebrate species and timescales (Thomas and Schneider 2011), which suggests a persistent coevolutionary relationship between TEs and ZFPs. Furthermore, the vast majority of mammalian KRAB-ZFPs profiled thus far bind only a single or a few TE families, and often the evolutionary emergence of a KRAB-ZFP closely follows the expansion of the TE family that it targets (Najafabadi et al. 2015; Schmitges et al. 2016; Imbeault et al. 2017). Finally, the DNA-contacting residues of many mammalian ZFPs exhibit telltale signatures of positive selection (Schmidt and Durrett 2004; Emerson and Thomas 2009; Nowick et al. 2010), suggesting that some KRAB-ZFPs adapt their targeting capacity in order to repress new DNA sequences, which may be introduced by newly expanded TE families.
Studies of two KRAB-ZFPs, ZNF93 and ZNF649, offer a compelling example of an arms race with the LINE1 (L1) family of non-LTR retrotransposons in the primate lineage (Fig. 2B; Jacobs et al. 2014; Fernandes et al. 2018). ZNF93 is a primate-specific gene whose zinc fingers underwent a series of adaptive changes in the hominoid lineage that enabled its binding to the L1PA6 and L1PA5 subfamilies (L1PA6-PA5) shortly after these elements started to expand in the hominoid ancestor (Jacobs et al. 2014). ZNF93 binding to L1PA6-PA5 elements had seemingly moderate repressive effects on these subfamilies, since they continued to amplify, but ZNF93 gained increased binding affinity for their descendants (L1PA4-PA3), which must have led to tighter repression. Indeed, it took a derivative of L1PA3 (L1PA3-6030) acquiring a 129-bp deletion, which removed the ZNF93-binding site within its 5′ untranslated region (UTR), to evade repression and enable further waves of L1 amplification, yielding the most recent L1PA2/L1HS subfamilies (Fig. 2B).
Accordingly, reinserting the 129-bp segment deleted during evolution back to its original position within the modern active L1HS element restores ZNF93 binding and transcriptional repression and significantly dampens transposition of L1HS in cell culture (Jacobs et al. 2014).
A recent study revealed yet another layer of coevolution between primate L1 and KRAB-ZFPs, implicating an older gene, ZNF649, not directly related to ZNF93, in the silencing of L1PA6 elements (Fernandes et al. 2018). Like ZNF93, the zinc fingers of ZNF649 seem to have adapted to bind a motif within the 5′ UTR of the ancestral L1PA6 element but upstream of the 129-bp region bound by ZNF93. However, descendants of L1PA6 progressively evaded ZNF649 binding through a series of point mutations within their 5′ UTR, in parallel to their evasion from ZNF93 binding (Fig. 2B; Fernandes et al. 2018). This intricate game of cat and mouse between KRAB-ZFPs and L1 elements provides a vivid illustration of an arms race spanning ∼30 million years of primate evolution. Similar interactions may explain the extensive 5′ UTR diversification of L1 subfamilies throughout mammalian evolution (Khan et al. 2006). The wide diversity of TEs and KRAB-ZFPs across tetrapods implies that many other arms races must have been at play to fuel their coevolutionary relationship (Thomas and Schneider 2011).
KRAB-ZFPs also regulate host genes and viral activity
Although evidence for a host–TE arms race is currently stronger for KRAB-ZFPs than for piRNAs, it can only partially explain KRAB-ZFP evolution. It is well established that several KRAB-ZFPs bind non-TE sequences and play important roles in host physiology and development that now appear independent of TE repression (Imbeault et al. 2017; Yang et al. 2017a; for review, see Yang et al. 2017b). Additionally, KRAB-ZFPs can persist in the genome long after their identified TE targets have lost transposition activity, and new KRAB-ZFPs can evolve to target TEs that have long ceased to be active (Imbeault et al. 2017). These and other observations have led to the hypothesis that the recurrent interaction of KRAB-ZFPs with TEs is not a defensive strategy but rather a “massive and sophisticated enterprise of TE domestication for the evolutionary benefit of the host” (Friedli and Trono 2015). Under this model, KRAB-ZFPs are selected to exploit a vast reservoir of previously dispersed TE families and their various cis-regulatory activities (Chuong et al. 2017) to modulate gene expression networks in a species- and cell type-specific fashion (Trono 2015; Ecco et al. 2016; Yang et al. 2017b). While this scenario does not preclude an occasional arms race with TEs, it offers a host-centric alternative worth considering as an additional driver of KRAB-ZFP evolution.
Some KRAB-ZFPs are also known to repress the activity of exogenous retroviruses. ZFP809, for example, protects mouse ESCs from MLV replication, and its expression in differentiated cell lines is sufficient to render cells resistant to MLV infection (Wolf and Goff 2009). ZFP809 restricts MLV by binding to the proline tRNA primer-binding site of proviral DNA, which represses its transcription. Interestingly, the same primer sequence is used by various retroviruses, which suggests that a single KRAB-ZFP could potentially restrict a wide range of retroviruses.
Recently, a pair of KRAB-ZFPs known as Suppressor of nonecotropic ERV1 (Snerv1) and Snerv2 was shown via genetic analysis to be required for silencing of nonecotropic ERV (NEERV) expression (Treger et al. 2019). In immunodeficient mice, NEERV loci can recombine to generate infectious retroviruses, and expression of NEERV glycoprotein gp70 contributes to lupus nephritis susceptibility (Ito et al. 2013; Ottina et al. 2018). Similar to ZFP809, SNERV1 recruits KAP1 to silence NEERV elements by binding sequences overlapping their LTR, including a glutamine tRNA primer-binding site (Treger et al. 2019). Together, these findings suggest that antiviral activity may be a recurrent theme promoting the selection of novel KRAB-ZFPs. Furthermore, the fact that multiple KRAB-ZFPs have repeatedly evolved the ability to target tRNA primer-binding sites, which are some of the most evolutionarily constrained sequences in retroviral genomes, attests to the ability of retroviruses to frequently evade KRAB-ZFP binding to other parts of their genome. These observations point to retroviruses as common targets and important drivers of KRAB-ZFP evolution.
Counterdefense mechanisms
Invoking an arms race between TEs and host control systems implies that TEs commonly evade silencing. However, there are very few explicit cases of TEs having evolved active escape mechanisms. To our knowledge, only three examples of TE-encoded antisilencing mechanisms have been reported so far—all from plants (Nosaka et al. 2012, 2013; Fu et al. 2013; McCue et al. 2013; Hosaka et al. 2017). In cultivated rice (Oryza sativa), a family of CACTA DNA transposons carries a microRNA (miRNA) gene, mir820, which down-regulates the expression of the de novo methyltransferase gene OsDRM2 (Fig. 3B; Nosaka et al. 2012). mir820 binding to the 3′ UTR of OsDRM2 mRNA modestly reduces OsDRM2 expression, and independent RNAi-mediated knockdown of OsDRM2 resulted in reduced DNA methylation of a variety of TEs and concomitantly elevated TE expression (Nosaka et al. 2012, 2013). These results indicate that inhibition of OsDRM2 by miR820 would enable several TEs, including CACTA elements, to evade silencing. Interestingly, compensatory mutations appear to have been selected during rice evolution to maintain interactions between the TE-encoded miRNA and its binding site within the OsDRM2 mRNA, which may be the signature of an ongoing arms race (Nosaka et al. 2012).
Figure 3.
Evidence of TE counterdefense (A) VANDAL21 elements in Arabidopsis thaliana encode VANC21, which inhibits host DNA methylation (gray circles) of VANDAL21 elements. (B) Some CACTA DNA transposons in O. sativa encode a miRNA, mir820, which basepairs with OsDRM2 mRNA and reduces translation of OsDRM2, a DNA methyltransferase.
In Arabidopsis, some TEs produce siRNAs that can affect host gene expression in trans (trans-acting siRNAs [tasiRNAs]). One of these tasiRNAs, derived from Athila6 retrotransposons, was shown to target the 3′ UTR of the UPB1b mRNA, which encodes a host protein involved in global translational repression under stress conditions (McCue et al. 2013). tasiRNA-mediated repression of UPB1b results in elevated transcript and protein levels of Athila6 elements, supporting a countersilencing role of this tasiRNA. It is ironic that small RNA-based regulation, which is usually perceived as a prominent mechanism to silence TEs, would be deployed by a TE to promote its propagation.
VANDAL DNA elements in Arabidopsis provide perhaps the most convincing case reported thus far of transposons encoding a suppressor of silencing. At least two distantly related families, VANDAL21 and VANDAL6, were shown to encode an accessory protein, VANC21 and VANC6, respectively, that, when transiently expressed from a plasmid, induces demethylation of cognate VANDAL family members without affecting methylation of any other TE families in the Arabidopsis genome, including each other or closely related VANDAL families (Fig. 3A; Fu et al. 2013; Hosaka et al. 2017). Mechanistically, it remains unclear how VANC promotes hypomethylation of VANDAL elements, but the process is dependent on a short tandem sequence motif that is found in high copy number within VANDAL21/6 elements but at low copy number elsewhere in the genome (Hosaka et al. 2017). By achieving sequence-specific antisilencing, VANDAL elements have evolved a powerful selfish strategy that promotes their own mobility without affecting that of other transposons, thereby limiting the deleterious impact of their antisilencing system on host fitness.
To our knowledge, no TE-encoded antisilencing factors have been described against either the piRNA or KRAB-ZFP pathways despite the proposed arms race long engaged between TEs and these control systems (Jacobs et al. 2014; Parhad et al. 2017; Wang et al. 2018). The dearth of antisilencing strategies described in eukaryotic TEs is all the more surprising given the plethora of strategies described for viruses and other pathogens to counteract host defense mechanisms. These include many examples of virus-encoded proteins that directly antagonize or degrade host defense systems, such as RNAi, CRISPR, and nucleic acid sensors to name just a few (for review, see Crow et al. 2016; Hynes et al. 2018; Landsberger et al. 2018).
Escape and self-control strategies
Why are there seemingly so few antisilencing mechanisms encoded by TEs? One fundamental difference between TEs and viruses is that TEs must replicate in the germline in order to propagate within a population, whereas viruses generally do not (Haig 2016). Thus, TE fitness is intimately intertwined with the reproductive fitness of their host organisms. This dependency places an important limitation on the ability of TEs to evolve broadly effective antisilencing mechanisms. For instance, a mechanism blocking the entire piRNA pathway would lead to massive mobilization of diverse TEs, simultaneously compromising host fertility and dooming TE propagation (Blumenstiel et al. 2016; Haig 2016), as documented in piRNA mutant backgrounds (e.g., see Wang et al. 2018). Consistent with this quandary, all TE-encoded antisilencing mechanisms described thus far have narrow effects or modes of action either resulting in modest decreases of host regulatory proteins (mir820 and Athila6 tasiRNA) or selectively targeting individual families (VANC). This is in stark contrast to viruses, which evolve mechanisms that achieve broad and/or highly effective blocks of the targeted pathways (Crow et al. 2016; Hynes et al. 2018; Landsberger et al. 2018).
Bypassing host surveillance
Alternatively, TEs may evade silencing, but only in contexts in which their activity does not impact host fitness. For example, the I elements of D. melanogaster hijack ovarian nurse cells, which are permissive to their transcription but apparently refractory to transposition, as factories to assemble RNP complexes that serve as transposition intermediates (Van De Bor et al. 2005; Hamilton et al. 2009; Wang et al. 2018). These products are then trafficked via microtubules and delivered to the oocyte, where transposition takes place (Wang et al. 2018). I-element mRNA trafficking to the oocyte further requires an RNA stem–loop secondary structure present in its ORF2p sequence (Van De Bor et al. 2005). Although these retrotransposons may be susceptible to piRNA silencing upon entry into the oocyte, assembling the transposon products in permissive cells reduces the number of transposition steps exposed to host silencing. Given that other transposons, including LTR elements HMS-Beagle and 3S18, also preferentially localize to the oocyte (Wang et al. 2018) and that the RNA stem–loop structure required for I-element trafficking is present in G2 and jockey retrotransposons (Hamilton et al. 2009), it seems likely that such a transport mechanism may be a recurrent evasion strategy. Similarly, the virus-like particles produced by EVADÉ retrotransposons in Arabidopsis partially protect their mRNAs against small RNA-mediated degradation (Marí-Ordóñez et al. 2013). These bypass strategies suggest that, in eukaryotes, selection favors TEs that circumvent rather than antagonize or block host silencing pathways.
Targeting preferences
TEs have also repeatedly evolved mechanisms to direct their insertion to “safe havens” or regions of the genome where insertion will cause minimal harm. Studies that map de novo transposition events have shown that TEs, especially those colonizing compact genomes, frequently target benign or highly redundant regions of the genome. For example, Ty1 and Ty3 LTR retrotransposons in yeast, Skipper retroelements in Dictyostelium discoideum, and Dada DNA transposons in fish independently evolved preferences for integration in the immediate vicinity of tRNA genes, with apparently little to no impact on tRNA expression (for review, see Sultana et al. 2017; Cheung et al. 2018). Another safe harbor has been adopted by R1 and R2 non-LTR retroelements in arthropods (Pérez-González and Eickbush 2002) as well as Pokey DNA transposons in Daphnia (Penton and Crease 2004), which independently evolved targeting to ribosomal DNA arrays. Ty5 in yeast (Zou et al. 1996), HeT-A/TAHRE/TART in Drosophila (Pardue and DeBaryshe 2011), and TRAS/SART (Fujiwara et al. 2005) in silkworms all independently evolved the ability to target telomeric regions. Several TE families also show preference for insertion upstream of protein-coding genes, including Tf retrotransposons in fission yeast (Levin and Boeke 1992), Drosophila P elements (Liao et al. 2000), maize Mutator elements (Dietrich et al. 2002), and rice mPing transposons (Naito et al. 2009). While insertion of these elements in this compartment must occasionally perturb host gene expression, it is still less likely to be detrimental than in coding regions. It also provides these elements with the added benefit of an “open” chromatin environment that will facilitate further mobilization and might shield them against host silencing. Remarkably, TEs can also evolve preference for insertion into other TEs, such as Tx1 non-LTR retrotransposons that target Tx1d DNA transposons in Xenopus laevis (Christensen et al. 2000) and Tourist elements that preferentially insert into other Tourist elements in rice and maize (Jiang and Wessler 2001), a strategy shared by various TE families in maize (Stitzer et al. 2019). Collectively, these data indicate that TEs have repeatedly adapted to occupy genomic niches that minimize the cost of transposition on host fitness.
Self-regulatory mechanisms
TEs can also minimize their deleterious impacts on their hosts by evolving self-regulatory mechanisms. A prominent theme is the spatiotemporal restriction of transposition activity. TEs are present in the genome of all cells and could, in principle, mobilize in both germline and somatic tissues. However, mobilization in somatic tissue has a greater potential of harming organismal fitness, whereas mobilization in the germline has lesser immediate effects on host function as long as it does not affect fertility (Charlesworth and Langley 1986; Haig 2016). Thus, suppression of TE activity in the soma is advantageous for both hosts and TEs, which predicts that TEs should evolve mechanisms to restrict expression to the germline. A classic example is the Drosophila P element, whose transposase open reading frame (ORF) is interrupted by an intron that is spliced only in the germline, preventing P-element mobility in the soma, where only prematurely truncated inhibitory transposase is produced (Laski et al. 1986). It appears that this regulatory switch evolved through the gain of sequence elements within the transposon that recruit somatically expressed splicing inhibiting factors (for review, see Majumdar and Rio 2015). A recent study adds another layer of intricacy by implicating the piRNA pathway in repressing the splicing of this intron in the germ cells through piRNA-mediated chromatin changes within the P element (Teixeira et al. 2017). Thus, spatiotemporal regulation of the P element in flies involves an interplay of mechanisms evolved by the transposon and by the host.
In mammals, several retroelements are known to have evolved exquisite stage-specific expression during early embryonic development (for review, see Rodriguez-Terrones and Torres-Padilla 2018). For instance, in mice, MaLR/MT elements are transcribed exclusively in oocytes (Peaston et al. 2004; Brind'Amour et al. 2018), while transcription of mouse ERV type L (MERVL) (Macfarlan et al. 2012) and young L1 subfamilies (Jachowicz et al. 2017; Percharde et al. 2018) peaks at the two-cell (2C) stage and coincides with zygotic genome activation. Human ERV type K (HERV-K) expression peaks at the eight-cell stage of human embryonic development (Grow et al. 2015), while HERVH/LTR7 are expressed in the pluripotent stem cells (PSCs) of the blastocyst (Wang et al. 2014; Göke et al. 2015). The mechanisms enabling such developmental precision of expression are becoming increasingly clear: Each TE family recruits a unique cocktail of host transcription factors that precisely specify these developmental stages. For example, HERVH/LTR7 recruits NANOG, OCT4, SOX2, and KLF4 (Kunarso et al. 2010; Ohnuki et al. 2014; Göke et al. 2015; Ito et al. 2017; Pontis et al. 2019), while MERVL recruits Dux (De Iaco et al. 2017; Hendrickson et al. 2017). Because these stages precede the differentiation of germ cells, they allow each of these different elements to generate inheritable insertions while occupying distinct expression niches, which might reduce their competition for cellular resources.
Another mitigating strategy is the evolution of suboptimal transposition and self-restraining copy number control mechanisms. In order to persist, TEs must be active enough to generate new insertions but not so active as to impair host fitness. To this effect, some TEs have evolved mechanisms to reduce their own activity. This is most studied in Tc1/mariner transposons, which self-regulate their mobility in at least three ways: evolution of suboptimal transposases (supported by the isolation of hyperactive mutants) (Lampe et al. 1999; Mátés et al. 2009; Liu and Chalmers 2014), inhibition of transposase function by aggregation when transposase expression reaches a certain threshold (also known as overproduction inhibition) (Lohe and Hartl 1996), and selection for imperfect terminal inverted repeats (TIRs) that reduce transposase affinity and excision efficiency (Augé-Gouillou et al. 2001). These strategies are likely to be widespread among DNA transposons. Indeed, hyperactive transposases have been readily obtained by experimental mutagenesis for various elements (Yusa et al. 2011; Lazarow et al. 2012; Voigt et al. 2016), diverse transposases are known to form aggregates when overexpressed (Heinlem et al. 1994; Liu and Chalmers 2014; Woodard et al. 2017), and many consensus transposon sequences have imperfect TIRs. A subset of Ty1 elements in yeast also encodes a truncated Gag protein with a dominant-negative effect on Ty1 copy number (Saha et al. 2015). These examples illustrate that TEs commonly evolve self-control mechanisms that mitigate their deleterious impact.
Host–transposon mutualism
TE self-regulation and targeting may reduce the detrimental impact that TEs can have on host fitness but do not directly provide a selective advantage to the host. Is it conceivable that TEs and their hosts could achieve such a mutualistic relationship? Mutualism can be attained if maintenance of TE activity immediately benefits the host. This form of host–TE cooperation is commonplace in bacteria, where transposons and conjugative plasmids frequently shuttle antibiotic resistance genes and other factors allowing bacterial hosts to adapt to environmental challenges (Wintersdorff et al. 2016). Thus far, very few examples akin to host–TE mutualism have been documented in eukaryotes. Here we highlight three possible cases: telomeric retroelements in Drosophila, telomere-bearing element (TBE)-mediated genome rearrangement in the ciliate Oxytricha trifallax, and an emerging role for mammalian retrotransposons in early embryonic development.
Eukaryotes have adopted several mechanisms to ensure replication of the termini of linear chromosomes. In most, chromosomes are protected by telomeric repeats maintained by a specialized enzyme called telomerase (Fig. 4A; Kordyukova et al. 2018). Drosophilid species, however, have lost the gene encoding telomerase, and, in all Drosophila species examined thus far (with one exception, which is discussed below), their telomeric repeats have been replaced by arrays of non-LTR retrotransposons, collectively referred to as telomeric retroelements (Fig. 4A; Pardue and DeBaryshe 2008; Casacuberta 2017). In D. melanogaster, telomeric elements belong to three related families of jockey-like non-LTR retrotransposons (HeT-A, TAHRE, and TART) that are continuously inserted head to tail at chromosome ends via target-primed reverse transcription supported by proteins produced by autonomous TAHRE and TART copies (Fig. 4A). mRNAs transcribed from all three retroelement families are imported into the nucleus by virtue of their association with their respective gag-like proteins, but targeting of TAHRE RNPs to telomeres requires association with the HeT-A gag (Rashkova et al. 2002; Pardue and DeBaryshe 2008). HeT-A in turn requires pol proteins from TAHRE/TART to be reverse-transcribed (Pardue and DeBaryshe 2008). Thus, telomeric retroelements cooperate with each other for their own amplification and for telomere maintenance (Capkova Frydrychova et al. 2008). As such, telomeric retroelements have long been regarded as the prototypical example of host–TE mutualism (Kidwell and Lisch 2001), although recent data have challenged this view (see below). Intriguingly, evidence is mounting that a distantly related family of non-LTR retrotransposons (G2/jockey3) contributes directly to the organization and function of centromeres of D. melanogaster and its sister species, D. simulans (Chang et al. 2019). Thus, several Drosophila retroelements appear directly involved in the faithful replication and segregation of chromosomes.
Figure 4.
Cooperation paves the way for cooption. (A) Drosophila telomeric transposons as a potential model for telomerase evolution. In D. melanogaster, three non-LTR transposon families maintain telomeres. Telomeric elements form a head-to-tail array in the telomere. Gag and reverse transcriptase (RT) proteins from HeT-A, TAHRE, and TART complex with their cognate mRNAs, forming an RNP complex capable of telomere elongation. In other organisms, telomerase reverse transcriptase (TERT) and telomerase RNA (TR) form the telomerase complex, which maintains telomeres. (B) Transposase proteins necessary for germline internal eliminated sequence (IES) elimination in Tetrahymena and Paramecium may have evolved from a mechanism analogous to TBE excision in Oxytricha (see the text). In the developing Oxytricha somatic macronucleus (MAC), TBE transposases excise IESs. Subsequent steps stitch together exons into single-gene chromosomes, which are amplified to thousands of copies in the mature MAC. In Paramecium and Tetrahymena, PiggyBac-derived proteins are domesticated for IES recognition and excision. (Pgm) PiggyMac and interactors; (TBP) Tetrahymena piggyBac-like. Scissors represent transposase proteins, and arrows represent genes.
Active transposons are required for Oxytricha development
The role of TBEs in O. trifallax genome rearrangement offers another tantalizing case of host–TE mutualism. Ciliates, such as O. trifallax, are single-celled eukaryotes that possess two types of nuclei: a transcriptionally silent diploid germline micronucleus (MIC) and a somatic macronucleus (MAC) that maintains gene expression in vegetative cells and is derived from the MIC through extensive genome rearrangements that occur shortly after sexual conjugation (Fig. 4B; Chen et al. 2014). Genes in the MIC genome are interrupted by noncoding sequences referred to as internal eliminated sequences (IESs) that must be excised for proper MAC function. Furthermore, in O. trifallax, exons are arranged out of order and often inverted in the MIC, so they must also be unscrambled to assemble a functional MAC genome (Chen et al. 2014). TEs comprise a large fraction of IESs and contribute substantially to the size of the MIC genome (Chen et al. 2014; Hamilton et al. 2016). The process of IES excision and unscrambling removes essentially all TEs from the MAC genome and reduces the 1-Gb MIC genome to an ∼50-Mb MAC genome consisting of thousands of single-gene chromosomes (∼2 kb long), each subsequently amplified to thousands of copies (Fig. 4B; Swart et al. 2013; Chen et al. 2014). In O. trifallax, this complex remodeling of the genome requires the cooperation of a family of DNA transposons, called TBEs, actively mobilized during meiosis (Nowacki et al. 2009). Experimental silencing of all three families of TBE transposases via RNAi impairs cell growth and causes cell death due to defects in germline elimination of both TBEs and IESs (Nowacki et al. 2009). However, silencing of a single TBE family is insufficient to cause this phenotype, indicating that all three TBE families cooperate to promote MAC development (Nowacki et al. 2009). Therefore, TBEs must remain active in the Oxytricha genome to fulfill an indispensable role in the development of their host organism, suggesting a mutualistic arrangement established during Oxytricha evolution (Vogt et al. 2013). Additional support for this model comes from evidence that TBE-encoded transposases have evolved under purifying selection, suggesting evolutionary constraint to serve both transposon and host functions (Chen and Landweber 2016).
Is mammalian embryogenesis addicted to TEs?
There is growing evidence that retrotransposons and ERVs are intimately intertwined with mammalian embryonic development. MERVL and HERVL appear to contribute to zygotic genome activation in mice and humans, respectively (Kigami et al. 2003; Svoboda et al. 2004; Macfarlan et al. 2012; De Iaco et al. 2017; Hendrickson et al. 2017; Whiddon et al. 2017). Furthermore, MERVL transcription in the 2C embryo triggers the formation of hundreds of chromatin loops that fold the genome in a 3D organization that is unique and possibly critical to totipotency (Kruse et al. 2019). In human ESCs, a distinct family, HERVH, is highly expressed and marks pluripotent cell populations (Santoni et al. 2012; Ohnuki et al. 2014; Wang et al. 2014; Göke et al. 2015). The high level of transcription of several HERVH loci also promotes the formation of chromatin loops and topological domains that are unique to ESCs (Zhang et al. 2019b) as well as the expression of chimeric long noncoding RNAs (lncRNAs) (Kelley and Rinn 2012; Kapusta et al. 2013; Lu et al. 2013, 2014; Wang et al. 2014). RNAi knockdown of some of these individual HERVH-derived lncRNAs in induced PSCs (iPSCs) results in loss of pluripotency (for review, see Izsvák et al. 2016), and CRISPR knockout of a single HERVH locus increases the capacity of iPSCs to differentiate into cardiomyocytes (Zhang et al. 2019b). A recent study also implicated HERVK and SVA elements as enhancers in naïve human ESCs, a model for preimplantation development (Pontis et al. 2019). The regulatory activities emanating from retrotransposons during mammalian development are intriguing, but it remains unclear whether they are merely a relic of selfish or cooperative strategies that these TEs deployed to occupy a niche, facilitating their transmission, or whether they have been coopted for lineage-specific developmental innovations (Haig 2016; Izsvák et al. 2016; Chuong et al. 2017).
Recently, evidence has surfaced for another potential partnership between TEs and mammalian embryonic development, implicating the L1 retrotransposon. There is a striking nuclear accumulation of the RNA produced by the youngest transpositionally active L1 subfamilies in the mouse 2C embryo. Experimental depletion of endogenous L1 RNAs at that stage elicited a wide range of chromatin and regulatory defects that block embryonic development (Jachowicz et al. 2017; Percharde et al. 2018). Interestingly, a similar phenotype was observed when L1 transcription was experimentally prolonged beyond the 2C stage by targeted chromatin manipulation (TALE-VP64) but not when full-length L1 mRNA was injected, and the phenotype could not be rescued by inhibiting L1 reverse transcriptase (Jachowicz et al. 2017). These results suggest that the precise transcriptional activation of L1 in the nuclei of preimplantation embryos is required for the establishment of a chromatin state that promotes developmental progression. Mechanistically, nuclear L1 RNAs appear to exert these activities by forming an RNP complex with at least two host proteins, Nucleolin and KAP1, that activates ribosomal RNA transcription and represses the Dux locus, which in turn stimulates exit from the 2C stage (Percharde et al. 2018). It is unknown whether these findings extend beyond the mouse embryo, but it is worth noting that Nucleolin was reported previously to bind human L1HS RNA (Peddigari et al. 2013). Also, these results should be interpreted with caution because it is difficult to rule out nonspecific effects with the experimental approaches used to manipulate L1 expression. Undoubtedly, these provocative observations will stimulate further investigation.
Conflicts in disguise?
Although the examples described above suggest that mutualistic interactions between TEs and their hosts may be more widespread than currently appreciated, they remain open to alternative interpretations. Rather than true mutualisms, they may be viewed as commensalisms benefiting the TEs but of no real benefit to the host or even as “addictions,” whereby the TEs have supplanted ancestral functions essential to the host without providing an adaptive innovation but nevertheless creating a dependency on active transposition (Jangam et al. 2017). Such addiction may precipitate into a conflict if transposition occurs in excess or otherwise incurs a net cost on host fitness, which may set off an arms race in which the host evolves mechanisms to control transposition (Fig. 4A). There is growing evidence that such instability may be at play at Drosophila telomeres. First, intraspecific and interspecific evolutionary analyses of 29 genes encoding proteins associated with telomere maintenance and function revealed that they have experienced repeated bouts of positive selection, indicating the existence of recurrent conflicts necessitating rapid adaptation of telomeric proteins (Lee et al. 2017). Furthermore, a recent study tracing the evolution of D. melanogaster telomeric retroelements in closely related species shows that this lineage of retrotransposons has experienced rapid turnover with drastic changes in their abundance across species (Saint-Leandre et al. 2019). These results uncover a paradoxical level of evolutionary instability for seemingly essential components of the genome. Surprisingly, it appears that one species, Drosophila biarmipes, has even lost the telomeric retroelements altogether; they are now replaced by a mixture of unrelated TEs at the tips of its chromosomes (Saint-Leandre et al. 2019). It remains to be seen how D. biarmipes copes with the loss of telomeric retroelements, as the TEs found at its telomeres appear to be no longer active (Saint-Leandre et al. 2019). These findings suggest that replacing vital host functions with TE activity may be an evolutionarily contentious and ultimately untenable situation.
En route to cooption
An advantage of adopting a cooperation-centric model of host–TE interaction is that cooperation provides a facile path for TE cooption. Cooption is the process by which natural selection taps into TE sequences to evolve new host function. Once considered rare, numerous examples of TE cooption have now been described, although the evolutionary forces and molecular path that lead to cooption remain poorly understood (Feschotte and Pritham 2007; Sinzelle et al. 2009; Chuong et al. 2017; Frank and Feschotte 2017; Jangam et al. 2017; Zhang et al. 2019a). We propose that the type of cooperative activity that TEs and hosts engage in directly influences the function for which TEs are coopted.
The case of Drosophila telomeric retroelements provides a good example to illustrate this model. There are striking similarities between this system and telomerase, the mechanism used by most eukaryotes to maintain telomere length (Fig. 4A; Pardue and DeBaryshe 2008; Kordyukova et al. 2018). Telomerase is a reverse transcriptase that appears most closely related to that currently encoded by Penelope-like retrotransposons (Gladyshev and Arkhipova 2007). Moreover, in organisms lacking telomerase (Drosophila) or with low telomerase expression (e.g., the silkworm Bombyx mori), telomeric retrotransposons can supplant telomerase function (Fujiwara 2015; Servant and Deininger 2016). These observations support a long-standing model that telomere maintenance via telomerase originated from an ancient retrotransposon-based mechanism (Eickbush 1997; Nakamura and Cech 1998; Pardue and DeBaryshe 2008). If so, then the cooperation between Drosophila telomeric retrotransposons and their hosts may be an evolutionary replay of how an ancient group of retroelements (perhaps Penelope-like) maintained their activity prior to being fully coopted for telomere maintenance in the last eukaryotic common ancestor (Fig. 4A).
O. trifallax and its TBEs appear to be engaged in a mutualistic or perhaps addicted relationship. However, other ciliates, such as Paramecium and Tetrahymena, appear to have advanced that relationship to full domestication: These species use immobilized transposase proteins to mediate their MIC-to-MAC transition (Fig. 4B). In Paramecium, PiggyMac (Pgm) is a piggyBac-derived transposase required for DNA elimination that catalyzes and interacts with five additional related Pgm-like proteins to ensure complete IES targeting (Bétermier and Duharcourt 2014; Bischerour et al. 2018). Tetrahymena also uses a suite of related transposase-derived genes (TBP, TBP2, TBP6, and LIA5) to ensure proper genome rearrangement and IES deletion (Vogt and Mochizuki 2013; Cheng et al. 2016; Feng et al. 2017). Although these proteins are all clearly related to piggyBac transposases, they are extremely diverged from each other and share no close similarity to any extant TEs in these species, indicating that they have been fully and possibly independently coopted (Cheng et al. 2010; Vogt et al. 2013; Bétermier and Duharcourt 2014). Whether these transposases were once derived from elements with an activity similar to TBEs in Oxytricha is unknown, but there are clear sequence similarities between ciliate IESs and the TIRs of transposons (Fass et al. 2011; Cheng et al. 2016; Hamilton et al. 2016). Furthermore, the extent of domestication varies among these piggyBac-derived transposases, with some (Pgm and TBP6) retaining features of piggyBac transposition, including precise excision and TIR dependence, while others (TBP2) are entirely dependent on host factors for IES targeting (Feng et al. 2017). These observations suggest that the Pgm and TBP transposases may have been coopted as a means to resolve an “addiction” to their cognate elements similar to that seen in Oxytricha, which would link their cooption directly to their initial cooperation with their hosts (Fig. 4B).
Outlook
In this review, we examined two host-encoded TE silencing systems, piRNAs and KRAB-ZFPs, that carry strong signatures of adaptive evolution in some lineages, suggesting that they are engaged in an arms race with TEs (McLaughlin and Malik 2017). However, there remains no evidence that TEs have evolved mechanisms directly neutralizing either of these two pathways, and a variety of other factors may explain their rapid evolution. Clearly, more work is needed to better understand the forces driving the diversification of these systems and gain a fuller picture of their interactions with TEs across a broad range of species.
Adaptive evolution of piRNAs has been studied extensively in insects but not in mammals, and the few studies that have addressed this gap suggest that mammalian piRNA proteins have not diversified extensively (Yi et al. 2014). One explanation for this contrast could be that mammals have replaced the antiviral arm of the piRNA pathway with other systems. Indeed, mammalian cells possess a variety of antiviral responses, including nucleic acid sensors and interferon responses, which might have supplanted or relieved the piRNA pathway to carry out this immune function (for review, see tenOever 2016). Such a scenario would support the hypothesis that piRNA pathway evolution outside of mammals is driven primarily by its antiviral function.
Detailed studies of KRAB-ZFPs so far have been carried out only in mice and humans despite dramatic expansion of these genes in diverse tetrapods (Imbeault et al. 2017). While evidence has built for intricate coevolution of KRAB-ZFP and L1 retrotransposons in primates (Fig. 2B; Jacobs et al. 2014; Fernandes et al. 2018), it is still unclear how common such a tug of war is. It may be illuminating to investigate these questions in tetrapods with more aggressive TE activity, such as frogs, axolotls, opossums, or bats (Mikkelsen et al. 2007; Pritham and Feschotte 2007; Ray et al. 2008; Hellsten et al. 2010; Sotero-Caio et al. 2017; Nowoshilow et al. 2018; Rogers et al. 2018).
Another take-home message is that the arms race is only one, and perhaps not the most prevalent, form of host–TE interaction in eukaryotes. TEs have also evolved subtle evasive strategies as well as self-control and targeting mechanisms that attenuate the cost of transposition on host fitness. Some TEs even appear to have engaged in cooperative strategies with their host organism in a way that resembles a mutualistic or symbiotic relationship. While few cases of mutualism have surfaced thus far in eukaryotes, this strategy is commonplace in prokaryotes (Wintersdorff et al. 2016). It is possible that symbiotic interactions are widespread in eukaryotes but are challenging to identify and test experimentally, in part because of the large amounts of TEs that need to be manipulated simultaneously (e.g., retrotransposons in mammalian embryogenesis). The advent of genome editing and other large-scale perturbations offers new powerful tools to overcome these challenges (Bourque et al. 2018; Fuentes et al. 2018; Smith et al. 2019; Todd et al. 2019). It is also possible that many mutualistic interactions are evolutionarily unstable and volatile because they are prone to tilt back and forth between disproportionately benefiting the TE (e.g., Drosophila telomeres) and disproportionately benefiting the host, the latter culminating in full cooption events (Fig. 5).
Figure 5.
Model for host–TE interactions. Conflict: TEs (purple) harm the host (orange), leading to host silencing of TEs. TEs occasionally evolve direct antisilencing mechanisms (dashed line). Most host–TE conflict leads to TE death. Cooperation and evasion: TEs evolve self-regulatory mechanisms to mitigate impacts on host fitness. Hosts and TEs can also evolve a mutualistic relationship. Cooperation can lead to both conflict and cooption. Cooption: Host repurposes all or part of a TE for novel host function at the expense of the TE.
We therefore envision a model in which the host and TE cooperate for a period of time, which resolves in one of three ways: (1) The TE no longer cooperates, leading to reactivation and possible loss of the family in the population if too active (arms race); (2) the TE fades into obscurity due to relaxed selection pressure on its sequence; or (3) maintenance of TE features for cellular function rather than the TE family as a whole leads to eventual loss of the TE family (cooption) (Fig. 5). Validating the model will require the study of transitional systems such as those described in this review and others that are bound to surface.