Ongoing endeavors to detect mobilization of transposable elements.

Abstract

Transposable elements (TEs) are DNA sequences capable of mobilizing from one location to another in the genome. Since the discovery of the 'Dissociation (Ds) locus' by Barbara McClintock in maize (1), mounting evidence in the era of genomics indicates that a significant fraction of most eukaryotic genomes is composed of TE sequences, which are involved in various biological processes such as development, physiology, disease and evolution. Although technical advances in genomics have revealed numerous functional impacts of TEs across species, our understanding of TEs is still incomplete because of challenges arising from the complexity and abundance of TEs in the genome. In this mini-review, we briefly summarize the biology of TEs and their impacts on the host genome, emphasizing the importance of understanding the TE landscape in the genome. Then, we introduce recent endeavors, especially in vivo retrotransposition assays and long-read sequencing technology, for identifying de novo insertions and TE polymorphisms, which will broaden our knowledge of the extraordinary relationship between these genomic cohabitants and their hosts. [BMB Reports 2022; 55(7): 305-315].

Year: 2022 PMID: 35725016 PMCID: PMC9340088

Source DB: PubMed Journal: BMB Rep ISSN: 1976-6696 Impact factor: 5.041

SIMPLE TE CATEGORIES WITH COMPLEX CHARACTERISTICS

Broadly, TEs are grouped into class I (retrotransposons) or class II (DNA transposons), depending on whether the transposition mechanism does or does not involve an RNA intermediate, respectively. Class I TEs fall further into two subclasses: long terminal repeat (LTR) and non-long terminal repeat (non-LTR) retrotransposons (2). Despite the structural differences between LTR and non-LTR retrotransposons, mobilization of all class I TEs requires an RNA intermediate that is reverse transcribed before integration. In this mechanism, referred to as “copy and paste”, the original sequence of a class I retrotransposon remains intact after a new copy integrates into the genome. In contrast, class II DNA transposons propagate via a “cut and paste” mechanism. Once transcribed and translated from the source DNA sequence, the transposase encoded by a class II element recognizes the terminal inverted repeats (TIRs) flanking its own DNA sequence, excises the element from its original location, and re-integrates it into a new genomic location. Over time, TEs from both classes accumulate genetic alterations in their internal sequences and often become incapable of transposing by themselves. TEs with such alterations, called non-autonomous TEs, can still propagate in the genome by hijacking the protein machinery of other TEs that are competent for autonomous transposition (3). Categorizing TEs by their mode of transposition looks rather simple. In detail, however, the biology of TEs is both remarkably complicated and intriguingly puzzling. It is well known that a significant portion of the human genome is occupied by repetitive sequences (4), and TEs are one type of such sequences, interspersed throughout the genome in different sizes and forms. For example, one heavily studied family of non-LTR retrotransposons, LINE1s (long interspersed nuclear elements, or L1s for short), takes up approximately 17% of the human genome with about half a million copies (5).
Although most L1 copies carry genetic changes such as rearrangements, point mutations, and 5’-truncations that are unfavorable for mobilization (6), roughly 100 copies of L1 are estimated to be active in the genome. TE polymorphism also varies: around 300 L1 insertion sites are known to differ between unrelated individuals (7), contributing to complexity and variation within the human population. TEs that are inactive in one species are not necessarily inactive in other species. Most, if not all, LTR-type retrotransposons in human (human endogenous retroviruses, or HERVs) are thought to be inactive (8), whereas an LTR retrotransposon in koala (the koala retrovirus, or KoRV), a vertically transmitted endogenous retrovirus, is actively invading the koala population (9). A class II transposon, the P-element, is active in wild populations of Drosophila melanogaster, whereas DNA transposons in human are unable to mobilize because of mutations accumulated in their internal sequences (10). Interestingly, P-elements completed a worldwide invasion of the genomes of Drosophila melanogaster populations within a century. Given that activation of Drosophila P-elements results in genomic instability and animal sterility (11), the survival of wild Drosophila populations after the invasion implies that there must be an arms race between the host and the invading P-elements (12-14). In fact, the invasion of P-elements in a closely related species, Drosophila simulans, is still ongoing (15, 16). Integration preferences also differ considerably between individual TE families. In a large collection of mutant fly lines each carrying a single transposon insertion (the Drosophila Gene Disruption Project), the three DNA transposons used (Minos, P-element and PiggyBac) show different integration preferences (17, 18).
Analysis of this collection of fly lines with new insertions indicates that the distribution of Minos insertions is close to random across the fly genome, whereas P-elements have 200-400 “hotspot loci” that account for 30-40% of new insertions and tend to insert near gene promoters. PiggyBac transposons also have hotspots, but these differ considerably from those of P-elements. In other cases, TEs mobilize into a specific locus. R1/R2 non-LTR retrotransposons in Drosophila encode a sequence-specific endonuclease responsible for mobilization exclusively into the 28S rDNA locus, which has been a successful strategy for transposition for at least 500-800 million years (19). In summary, the different features of TEs across the eukaryotic phylogeny strongly argue that TEs are not simply retained as miscellaneous genetic sequences (20, 21). Instead, TE families from different species deploy their own unique strategies for propagation, which shape different landscapes of the host genome.

ROLES OF THE ‘CONTROLLING ELEMENTS’ IN GENOME: AN ALLY OR AN ENEMY?

With unprecedented technical advances in genomics since the early 2000s, a vast amount of sequencing data has shed light on the composition of the genomes of a variety of species. With a few exceptions such as apicomplexan parasites (22), a significant fraction of a eukaryotic genome consists of transposable elements (TEs) interspersed throughout the genome (4). Although TEs were once considered genetic fossils of evolution, growing evidence indicates that they participate in many aspects of biology, which is at odds with the view of TEs as merely “selfish elements”. When TE mobilization takes place, an insertion can alter the local gene regulatory network, or the inserted TE segment can introduce its own intrinsic transcriptional modules such as enhancers, insulators and repressors into the new genomic locus. Industrial melanism (23), color and shape changes in plants (24), and maize domestication (25) are widely recognized instances, illustrating the extraordinary relationship between TEs and their hosts during evolution. Notably, the abundance of TEs in the genome and their capacity to interact with the host transcription machinery can change transcriptional regulation and thereby affect the expression of nearby host genes. Numerous studies have shown co-option or exaptation of TEs as substrates for gene regulatory innovation (26, 27). For example, analysis of the binding sites of orthologous transcription factors (TFs) in human and mouse cell lines provided evidence of species-specific TE-derived binding peaks (28). Other cases, such as the enrichment of the non-autonomous MER130 family as active enhancers adjacent to neocortical genes involved in neural development (29) and the association of MER20 elements with rewiring of the gene network for pregnancy in placental mammals (30), indicate that TEs act extensively as one of the driving forces for novel regulatory networks during evolution. In addition to modulating gene expression, hosts take advantage of TE-derived sequences that are repurposed for new functions.
One classic example is the recombination activating 1 (RAG1) and recombination activating 2 (RAG2) proteins, which catalyze the V(D)J rearrangement essential for the adaptive immune system. It was proposed that RAG1 and RAG2 were domesticated from an ancient Transib element because the terminal inverted repeats (TIRs) and their arrangement are similar (31). ProtoRAG, an active Transib element from the lancelet, was later found to encode RAG1- and RAG2-like genes, and its activity resembles RAG1/2-mediated DNA rearrangement (32), which further solidified the transposon origin of the RAG1/2 system in jawed vertebrates. Telomeres and centromeres, two vital features of eukaryotic chromosomes, appear to be occupied by transposon sequences in some organisms. Telomeres are the terminal regions of a chromosome that maintain genome stability and integrity; in Drosophila, they are preserved by three retrotransposons, Het-A, TART and TAHRE, collectively termed HTT (33-36). Although Het-A- and TART-derived sequences are also found in centromeric regions of the Y chromosome (37), most copies of HTT reside in telomeric regions, pointing to a strong target preference for transposition. This observation also raises the intriguing question of how a balance is achieved between the rate of retrotransposition in such a restricted region of the genome and telomere length regulation. In addition, centromeres, which are essential for proper segregation of sister chromatids during cell division, are composed of highly repetitive DNA sequences. Interestingly, recent reports on Drosophila (38) and other organisms such as some plants (39), amoeba (40) and kangaroo (41) reveal that centromeres contain TE sequences. Combined sequencing efforts on the centromeres of Drosophila melanogaster show that all centromeres are enriched for a specific retrotransposon, G2/Jockey-3, flanked by satellite repeats (38), raising new questions about the functional relationship between particular TEs and centromere biology.
Obviously, not all transposition events are advantageous to the host in terms of physiology, development, and evolution. For example, a large collection of mutant fly lines each carrying a single transposon insertion (the Drosophila Gene Disruption Project) shows largely deleterious effects on nearby genes (17, 18), and mutagenesis studies using PiggyBac (PB)/Sleeping Beauty (SB) transposons in mouse report transposition-induced embryonic lethality (42, 43). Initially identified in a patient with hemophilia, retrotransposition of LINE1s is known to cause other disorders as well, including many types of cancer (5, 44). Given the estimated frequency of new insertions by retrotransposons such as Alu elements (1 in 20 live births), LINE1 (1 in 20-200 births) and SVA (1 in 900 births), it is not surprising that 124 cases of disorders are associated with retrotransposition events (5) and that gene expression profiles altered by retrotransposition can make cells oncogenic (44-46). Besides their role as internal mutagens, the sequence similarity and abundance of TEs in the genome make them substrates for non-allelic homologous recombination, which can give rise to intra- or inter-chromosomal crossing over. Although unequal crossing over may result in expansion of the gene-rich segmental duplications found in human (47), recombination events mediated by TEs such as Alu elements have been reported in various human disorders (48, 49). To prevent deleterious effects of TE mobilization, hosts employ evolutionarily conserved molecular strategies to silence TEs transcriptionally and post-transcriptionally (50-52). When host surveillance systems such as DNA methylation (5mC) (50) or small noncoding RNAs such as piRNAs in the germline (51) are experimentally incapacitated, genome-wide derepression of transposons unsurprisingly leads to genomic catastrophe.
It has been shown that, when released from the germline-specific piRNA pathway in Drosophila, massively derepressed TEs in germ cells use host machinery to selectively target developing oocytes, the sole channel to the next generation (53). Similarly, genome-wide activation of P-elements in Drosophila leads to numerous de novo insertions in developing germ cells, resulting in genomic instability and female sterility (12, 13). In other species such as human, mouse, and zebrafish, derepression of TEs in the absence of host defense systems likewise results in sterility (54-57). Taken together, the examples in this section emphasize the extraordinary relationship between TEs and their hosts. Genome-wide TE activation unquestionably compromises genome integrity, leading to detrimental consequences such as sterility. Meanwhile, integration events by TEs can rewire transcriptional regulation of genes, which appears to be symbiotic in many cases. Given these insertional impacts on the host genome, precise monitoring of TE mobilization is essential for a better understanding of these double-edged swords coevolving with their hosts.
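
As a rough worked illustration of the per-birth insertion frequencies quoted in this section (Alu, 1 in 20; LINE1, 1 in 20-200; SVA, 1 in 900), the sketch below adds the three rates to estimate how often a newborn would carry any new retrotransposon insertion. It assumes that the three rates are independent and small enough to be summed, which is a simplification not made in the cited studies; all variable and function names are illustrative only.

```python
from fractions import Fraction

# Per-birth frequencies of new insertions as quoted in the text (5).
ALU = Fraction(1, 20)
SVA = Fraction(1, 900)
L1_LOW, L1_HIGH = Fraction(1, 200), Fraction(1, 20)  # LINE1 range: 1 in 200 to 1 in 20


def combined_rate(l1_rate):
    """Expected number of new Alu + LINE1 + SVA insertions per birth.

    Assumption (not from the source): the three rates are independent,
    so the expected count per birth is simply their sum.
    """
    return ALU + l1_rate + SVA


low = combined_rate(L1_LOW)
high = combined_rate(L1_HIGH)

for label, rate in (("lower bound", low), ("upper bound", high)):
    print(f"{label}: {rate} per birth, about 1 in {float(1 / rate):.1f} births")
```

Under these assumptions, roughly one birth in 10 to 18 would be expected to carry a new retrotransposon insertion.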

VISUALIZING MOBILIZATION OF TES WITH CELLULAR RESOLUTION

The life cycle of a retrotransposon is a series of steps: transcription, translation, reverse transcription and finally integration into a new locus according to its own preference. Given the sequence complexity and abundance of TEs residing in the genome, cataloguing de novo TE insertions has been a challenging task. For the last two decades, transcriptomic profiling by RNA-seq has been one of the methods of choice for estimating the degree of TE derepression. In line with this, many studies using Drosophila as a model system rely heavily on the transcript level of TEs as a proxy for TE activation, which is in fact an indirect readout of transposon mobilization. For example, studies of the Drosophila female germline and gut showed that actual mobilization events quantified at the genomic DNA level correlate poorly with transcriptome or small RNA (piRNA) sequencing data, suggesting that RNA signatures provide insufficient information on the degree of transposition (53, 58). Even quantification of transposition events at the genomic DNA level lacks cellular resolution, because most sequencing data represent the sum of TE insertions from all cells in a given sample or tissue. In this section, we introduce recent progress in visualizing mobilization events spatiotemporally with cellular resolution in mammalian and Drosophila model systems.

Endeavors to visualize mobilization of non-LTR TEs

An estimated 17% of the human genome consists of LINE1 retrotransposons (L1s), which are non-long terminal repeat (non-LTR) retroelements. Of these, an estimated 100 copies actively transpose (known as hot LINE1s) (59, 60) and trans-mobilize other non-autonomous transposons such as Alu elements and other SINEs (61, 62). As studies on the impacts of L1s on physiology and human disease are still ongoing, an engineered reporter system and its derivatives emulating the non-LTR life cycle have been widely used to address questions in L1 biology. Briefly, to capture L1 retrotransposition events, a retrotransposition cassette carries a reporter such as green fluorescent protein (GFP) in the 3’ UTR of LINE1 (Fig. 1A). This reporter, which is oriented opposite to normal L1 transcription, is interrupted by an artificial intron in the middle of the reporter (IVS, intervening sequence, in Fig. 1A) so that it cannot be expressed from the original cassette. When the engineered L1 cassette is transcribed, the artificial intron is spliced out of the transcript, which is then reverse transcribed. After the intron-less cassette has been retrotransposed elsewhere in the genome, a reporter signal (GFP) can be observed, which identifies cells with de novo insertion events. This conceptual design of the retrotransposition assay has been modified and applied in many studies (Table 1): a retrotransposition cassette with a CMV/CAG promoter (63-72); a cassette with the endogenous human/mouse L1 promoter (73, 74); and the same concept applied to the Drosophila I-element, a non-LTR retrotransposon that, like mammalian L1s, contains two ORFs (53, 75). The engineered L1 retrotransposition cassette has served as an excellent tool for studying mobilization of LINE1s in human genomes, heterogeneity in the mouse brain, and selective integration into developing oocytes in Drosophila melanogaster (53, 64, 72, 73, 75-77).
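
The orientation logic of the reporter cassette described above can be illustrated with a toy string model. This is only a sketch under stated assumptions: the sequences, the intron position and all function names are hypothetical and far shorter than any real construct, and splicing is modeled simply as removal of the intron from the L1-sense transcript.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Toy sequences (hypothetical, not taken from any real cassette).
REPORTER = "ATGGTGAGCAAGGGCGAGTAA"  # reporter ORF, read on the strand opposite to L1
INTRON = "GTAAGTCCCCTTTTAG"         # artificial intron, spliceable only in the L1 sense
SPLIT_AT = 10                       # where the intron interrupts the antisense reporter


def build_cassette():
    """L1-sense strand: antisense reporter interrupted by a sense-oriented intron."""
    antisense_reporter = revcomp(REPORTER)
    return antisense_reporter[:SPLIT_AT] + INTRON + antisense_reporter[SPLIT_AT:]


def splice(sense_transcript):
    """Remove the intron from the transcript made in the L1 sense orientation."""
    return sense_transcript[:SPLIT_AT] + sense_transcript[SPLIT_AT + len(INTRON):]


def reporter_is_intact(sense_dna):
    """The reporter is read from the strand opposite to L1 transcription."""
    return REPORTER in revcomp(sense_dna)


donor = build_cassette()       # original cassette, before retrotransposition
new_insertion = splice(donor)  # intron-less copy after splicing and reverse transcription

print(reporter_is_intact(donor))          # False: the intron still interrupts the reporter
print(reporter_is_intact(new_insertion))  # True: only the retrotransposed copy is intact
```

In this toy model the reporter can be read only from a copy that has passed through the spliced RNA intermediate, which is why a GFP signal marks cells with a de novo insertion rather than cells that merely carry the donor cassette.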

Fig. 1

Strategies for monitoring retrotransposition events with cellular resolution. (A) An overview of the LINE1 retrotransposition assay (adapted from (64)). The L1 cassette contains a reporter (GFP is shown) in the opposite orientation to normal L1 transcription. As transcription occurs, the intron that disrupts the reporter is removed by splicing. SD, splice donor; SA, splice acceptor; IVS, intervening sequence. (B) A schematic of the gypsy-TRAP reporter (adapted from (84)). i) No integration: GAL80 expressed under the α-tubulin promoter, with intact hot spots (ovo promoter), suppresses GAL4-mediated transcription, so no GFP signal is detectable. ii) Gypsy integration: if gypsy integrates into the ovo binding sites, depicted by triangles, GAL80 expression ceases and GAL4-driven GFP expression becomes detectable. UAS, Upstream Activating Sequence. (C) The gypsy-CLEVR reporter, which mimics the replication cycle of a retrovirus (adapted from (92)). During replication of the gypsy retrotransposon, the 5’ end of the 3’-LTR (U3’’ in black box) and the 3’ end of the 5’-LTR (U5 in white box) are used as templates for synthesis of the 5’ end of the 5’-LTR and the 3’ end of the 3’-LTR, respectively. Note that both the 5X UAS in the 5’-LTR (U5 in white box) and the reporter (GFP-P2A-mCherry) in the 3’-LTR (U3’’ in black box) are in the opposite orientation to gypsy transcription. Retrotransposition of gypsy-CLEVR brings the 5X UAS into the vicinity of the reporter (GFP-P2A-mCherry), allowing expression of the reporter by the GAL4 activator. PBS, primer binding site; LTR, long terminal repeat.

Table 1

Retrotransposition assays to monitor transposition events

Transposition assay	Cell / Tissue	Reporter	Ref
LINE-1 retrotransposition assay	Cell line (Human)	EGFP	59, 65, 73, 77
		NEO^R	60-62, 64, 69, 71, 76, 77
		Luciferase	66
		TEM1	67
	Cell line (Mouse)	NEO^R	64
	Cell line (CHO)	NEO^R	60, 70
	Tissues (Mouse/Rat)	EGFP	72, 73, 74
	Tissues (Drosophila)	EGFP	53
	Tissues (Drosophila)	NEO^R	75
Gypsy-TRAP	Tissues (Drosophila)	EGFP	84, 86, 87, 89, 90
CLEVR	Cell line (Drosophila)	EGFP-mCherry	92, 93
CLEVR	Tissues (Drosophila)	EGFP-mCherry	92, 95

EGFP: enhanced green fluorescent protein; NEO^R: neomycin resistance gene; CHO: Chinese hamster ovary cell; TEM1: beta-lactamase.

Endeavors to visualize mobilization of LTR TEs

The gypsy elements in Drosophila melanogaster are categorized as long terminal repeat (LTR) retrotransposons, similar to the Ty elements of Saccharomyces cerevisiae and the proviruses of vertebrate retroviruses (78-80). Interestingly, gypsy elements tend to integrate into seven sites within a 200-bp region of the promoter of the ovo gene, which is necessary for proper oogenesis in the female germline (81-83). This behavior of the gypsy transposon was exploited to generate a cassette for tracing mobilization events, the so-called gypsy-TRAP line (Fig. 1B) (83, 84). Without gypsy mobilization, the α-tubulin promoter in the cassette drives expression of GAL80 protein, which suppresses the activity of the GAL4 transcription factor and hence the downstream UAS-reporter (Fig. 1B, upper panel). When endogenous gypsy TEs retrotranspose into the known hotspots (the ovo promoter) in the gypsy-TRAP cassette, GAL80 expression ceases. In turn, GAL4 is released from suppression by GAL80 and drives expression of the reporter gene (Fig. 1B, lower panel) (85). This process makes it possible to map cells in vivo that carry new gypsy integration events in a given tissue and developmental stage (Table 1). For example, the gypsy-TRAP line has provided evidence that gypsy is activated in the aged fly brain (84); in the aged fat body (equivalent to the mammalian liver) (86, 87); in the aged fly intestine (88); in a fly model of tauopathy (89); in flies carrying the FTD–ALS-causing CHMP2BIntron5 mutation (90); and in developing mesodermal tissue in which histone 3 lysine 9 (H3K9) is substituted by arginine (H3R9) (91). The gypsy-TRAP line, however, has some technical downsides. First, the reporter signal might result from mobilization of other retrotransposons that share similar hot spots (81).
Second, the system needs three transgenes (a reporter under UAS regulatory elements, the GAL4 transcription factor, and the engineered cassette with the GAL80 suppressor) in one animal, which makes further applications of the system somewhat difficult. An interesting idea that takes advantage of gypsy replication has been proposed to improve on the gypsy-TRAP line. In the newer version, named the cellular labeling of endogenous retrovirus replication reporter, or CLEVR for short (Fig. 1C and Table 1) (92), the conserved features of retrovirus replication are applied to the reporter. Briefly, the 5’ end of the 3’-LTR (U3’’ region in Fig. 1C) in the gypsy RNA is used as a template for synthesis of the 5’ end of the 5’-LTR during replication. Similarly, the 3’ end of the 5’-LTR (U5 region in Fig. 1C) is used for synthesis of the 3’ end of the 3’-LTR. The system additionally includes a GFP-P2A-mCherry reporter at the 5’ end of the 3’-LTR (U3’’ region in Fig. 1C) and 5X UAS regulatory elements at the 3’ end of the 5’-LTR (U5 region in Fig. 1C), with both the reporter and the 5X UAS elements oriented opposite to gypsy transcription. Completion of a de novo integration generates two hybrid LTRs, one at each end, in which the 5X UAS regulatory elements lie in the vicinity of GFP-P2A-mCherry. When the reporter was combined with a tissue-specific driver such as the glia-specific GAL4 (repo-GAL4), fluorescent reporter signals in the fly brain were detectable in an age-dependent manner (92). The signal disappeared completely when gypsy-RNAi was introduced into the same animal, verifying its specificity (92).
This system was further used to show that 1) gypsy transposons, like a retrovirus, can be transmitted intercellularly between Drosophila cells grown in the same culture medium (93), consistent with the finding that gypsy mobilization involves production of gypsy viral particles in follicle cells that infect the oocyte in Drosophila (94), and that 2) flies with glia-specific expression of hTDP43, a protein found in aggregates in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), experience gypsy-ERV activation, DNA damage and apoptosis in both glia and nearby neurons, suggesting that the endogenous retrovirus might contribute to TDP-43-mediated neurodegeneration in a non-cell-autonomous manner (95). Since the CLEVR strategy is applicable to other LTR-type retrotransposons, it should further improve our understanding of TE mobilization in various biological contexts.
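
The GAL80/GAL4 logic of the gypsy-TRAP line described in this section can be summarized as a small truth table. The sketch below is an idealized on/off model, assuming no leaky expression and treating any disruption of the ovo hot spots as equivalent; the function name and arguments are illustrative and not part of the published system.

```python
def gypsy_trap_reporter_on(gal4_present, ovo_hotspot_disrupted):
    """Idealized gypsy-TRAP readout for a single cell.

    GAL80 is expressed only while the ovo hot spots in the cassette are intact,
    and GAL4 can drive the UAS-reporter only when GAL80 is absent.
    """
    gal80_expressed = not ovo_hotspot_disrupted
    return gal4_present and not gal80_expressed


# Print the full truth table for the two inputs.
for gal4 in (False, True):
    for disrupted in (False, True):
        state = "ON" if gypsy_trap_reporter_on(gal4, disrupted) else "off"
        print(f"GAL4={gal4!s:5}  hot spot disrupted={disrupted!s:5}  reporter={state}")
```

Because the model responds to any disruption of the hot spots, it also reflects the first caveat noted above: an insertion by another retrotransposon that shares the same hot spots would switch the reporter on just as a gypsy insertion would.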

DETECTING MOBILIZATION WITH NUCLEOTIDE RESOLUTION

In earlier days, detection of TE mobilization events relied on traditional methods such as Southern blotting and in situ hybridization on polytene chromosomes (96-98), which provide only rough information on integration sites in the genome. PCR-based strategies such as L1 display, ATLAS (amplification typing of L1 active subfamilies) and LIDSIP (L1 insertion dimorphisms identification by PCR) (99-101), and transposon display (TD) (102) have been used to locate TE insertion sites. Despite their efficiency and versatility, these methods require prior annotation of the TE of interest for successful identification of new insertions. Recent technical advances in DNA sequencing, together with computational tools, have revealed the TE landscape of given tissues or contexts at nucleotide resolution in unprecedented detail, and sequencing is rapidly becoming a standard tool. As numerous computational tools that detect polymorphic TE insertions/deletions on the basis of split reads, discordant read pairs, or both have been developed (103), extensive efforts have identified de novo and mosaic TE insertions, and such discoveries have enabled dissection of their functional impacts in a variety of organisms (13, 44, 53, 104). In terms of mappability, nonetheless, using short reads to study TE mobilization can be challenging because of the repetitive nature of TEs. Mapping short reads to repetitive regions of the genome can create ambiguities and incomplete contiguity, which might mislead data interpretation. For example, a measurement of de novo insertion events using short reads from an Illumina platform revealed high false-positive rates despite the high sensitivity of the mapping strategy (105), suggesting that somatic transposition events might be less prevalent than expected. That study attributes this fundamental flaw to probably unavoidable chimeric artifacts arising during library preparation and to the detection algorithm (105).
Another example is exon trapping by a 2-kb SVA transposon insertion in an intron of MFSD8, which was initially missed by standard clinical sequencing, suggesting that accurate tracking of transposons is critical, especially for genomic medicine (106).
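
The split-read and discordant-pair signals mentioned in this section can be sketched as a simple classifier over paired-end alignment records. This is a minimal illustration and not the algorithm of any published tool: it assumes that reads were aligned to a reference augmented with TE consensus sequences as separate contigs, and the record type, contig names and clip threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class AlignedRead:
    """Minimal stand-in for one paired-end alignment record (hypothetical type)."""
    chrom: str       # contig the read maps to
    pos: int         # leftmost mapped position of the read
    cigar: str       # CIGAR string of the read
    mate_chrom: str  # contig the mate maps to
    mate_pos: int    # leftmost mapped position of the mate


def soft_clips(cigar):
    """Return (left, right) soft-clipped lengths from a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    if not ops:
        return 0, 0
    left = int(ops[0][0]) if ops[0][1] == "S" else 0
    right = int(ops[-1][0]) if ops[-1][1] == "S" else 0
    return left, right


def te_evidence(read, te_contigs, min_clip=20):
    """List the kinds of TE-insertion evidence carried by one read."""
    evidence = []
    if max(soft_clips(read.cigar)) >= min_clip:
        evidence.append("split")       # part of the read does not match the reference here
    if read.chrom not in te_contigs and read.mate_chrom in te_contigs:
        evidence.append("discordant")  # read anchored in the genome, mate inside a TE
    return evidence


TE_CONTIGS = {"TE_consensus_A"}  # hypothetical name of a TE consensus contig
reads = [
    AlignedRead("chr2L", 1000, "100M", "chr2L", 1300),
    AlignedRead("chr2L", 5000, "60M40S", "chr2L", 5250),
    AlignedRead("chr2L", 5100, "100M", "TE_consensus_A", 40),
    AlignedRead("chr2L", 5030, "35S65M", "TE_consensus_A", 900),
]
for read in reads:
    print(read.pos, read.cigar, te_evidence(read, TE_CONTIGS))
```

In practice, such per-read calls would be clustered by genomic position and filtered, because, as noted above, chimeric library artifacts can produce the same signatures as genuine insertions.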

DETECTING MOBILIZATION BY OXFORD NANOPORE TECHNOLOGIES (ONT)

So-called third-generation sequencing (TGS) technologies that produce ultra-long reads have recently emerged (107), generating sequencing reads typically more than several kilobases long. Because ultra-long reads can solve the mappability issue of short reads (108), many platforms have been developed, including Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) (109). Although each platform undoubtedly offers unique potential for deciphering the complexity of the genome, here we briefly review the use of ONT in TE biology because of its feasibility for small laboratories. Following the finding that a biological pore enables detection of ionic current blockage as nucleic acid polymers pass through it (110, 111), the pocket-sized MinION nanopore device was developed to trace changes in ionic current corresponding to DNA sequences. Since its release to early-access users in 2014 (112), it has undergone a series of improvements in signal-to-noise ratio (113, 114) as well as in algorithms for long-read data analysis (115). Efforts to improve ONT sequencing are ongoing, aiming to achieve higher accuracy (116-119), extend read length (120, 121) and increase throughput (122-125). Shown to produce reads even up to the megabase scale (126), it is now being applied to open problems such as closing gaps in the reference genomes of human, nematode, plant, zebrafish and fruit fly (127-135) or building non-reference genomes by de novo assembly (108). ONT sequencing has attracted attention because it can compensate for some drawbacks of short-read sequencing, including poor mappability in repetitive regions and technical biases during library preparation. Although the read accuracy of ONT sequencing (87-98%) needs to be improved compared with short-read sequencing (>99.9%), the nanopore technique can detect new insertions and intact target site duplications of TEs from a single read (136).
ONT sequencing detected 46 new TE insertions or losses, relative to the reference genome, in a Drosophila reference stock maintained in a laboratory for more than 350 generations (128), and may also make it feasible to study TE dynamics at the population scale (137). In different types of cancer samples, ONT sequencing further discovered new TE insertions that had not been catalogued previously. For instance, ONT sequencing in combination with LDI-PCR uncovered new LINE1 insertions in colorectal tumors that were undetectable with 40x sequencing (138). Analysis of liver cancers that were previously sequenced for the International Cancer Genome Consortium (ICGC) identified germline and somatic structural variations (SVs) probably caused by non-allelic homologous recombination (NAHR) mediated by SINE transposons, providing evidence that long reads can be a more valuable platform for detecting structural variations than short-read-based approaches (139). Although ONT technology requires further improvement in read accuracy, there have been efforts to combine it with other genomic approaches to capture a more precise landscape of transposons. In combination with Hi-C scaffolding, ONT sequencing of two wild-type strains of Drosophila melanogaster identified hundreds of TE insertions missed by previous short-read-based studies (140). Another study, focusing on clonal neoplasia in fly gut tissue, combined short- and long-read sequencing to profile somatic TE insertions directly from genomic DNA samples instead of from the TE transcriptome (141). In addition, efforts to fill large gaps in centromeres used ONT sequencing in conjunction with ChIP-seq for CENP-A (the centromeric histone) and super-resolution chromatin fiber imaging, leading to the discovery that the centromeres of Drosophila melanogaster are occupied by a non-long terminal repeat (non-LTR) retroelement of the Jockey family, G2/Jockey-3 (38).
Although how such a retroelement became a major component of all centromeres of Drosophila melanogaster remains to be elucidated, it is remarkable that long-read sequencing advances our understanding of, and raises new questions about, centromeres in the genus Drosophila. It is well known that DNA modifications such as 5-methylcytosine transcriptionally suppress transposons in many organisms (50). 5-methylcytosine is conventionally detected by a chemical pre-treatment that distinguishes it from cytosine, known as bisulfite sequencing (142). Because the ONT platform uses native nucleic acids as a substrate, it does not require chemical pre-treatment during library preparation and can directly interrogate 5-methylcytosine status in repetitive regions of the genome. Studies showed that DNA methyltransferase 1 (Dnmt1)-dependent DNA methylation is enriched at IAP retrotransposons in mouse embryonic stem cells, and that locus-specific methylation of human TEs in many tissue types and in liver tumors reveals DNA methylation dynamics and their effect on TE repression (143, 144). In addition to reading DNA modifications by sequencing native genomic DNA molecules, the ONT system can also directly sequence long RNA molecules or complementary DNA (cDNA). Sequencing of 5’-cap-captured native full-length RNAs from locust identified widespread TE exonization, which had been computationally challenging to detect with short-read sequencing (145). Direct reading of long RNA sequences catalogued previously unannotated retrotransposon-related transcripts at the early stage of triticale seed development (146), raising new questions about their role in phenotypic variation. ONT sequencing of cDNA generated from virus-like particles (VLPs) of ddm1 mutants identified active transposons in Arabidopsis thaliana and Zea mays without mapping to genomic DNA (147, 148).
Despite requiring a large quantity of nucleic acid sample, direct sequencing of RNA molecules can expand our understanding of the TE transcriptome and bypass biases that might be introduced by PCR amplification or reverse transcription. Taken together, since its release the ONT platform has become a means of scrutinizing TE biology from a perspective unavailable to short-read sequencing, and it opens up promising opportunities to appreciate further impacts of TEs on physiology, development and evolution.
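
To illustrate why a single long read that spans an entire insertion is enough to recover an intact target site duplication (TSD), as mentioned in this section, the sketch below looks for the longest sequence immediately upstream of the TE that is repeated immediately downstream of it. It is a simplification under stated assumptions: the position of the TE within the read is taken as already known (for example, from an alignment), matching is exact although real ONT reads contain errors, and the toy sequences and function name are hypothetical.

```python
def target_site_duplication(read, te_start, te_end, max_len=30):
    """Return the longest exact duplication flanking read[te_start:te_end].

    The TSD is the sequence ending at te_start that is repeated starting
    at te_end; an empty string means that no duplication was found.
    """
    limit = min(max_len, te_start, len(read) - te_end)
    for k in range(limit, 0, -1):
        upstream = read[te_start - k:te_start]
        downstream = read[te_end:te_end + k]
        if upstream == downstream:
            return upstream
    return ""


# Toy read: left flank + TSD + TE + TSD + right flank (all sequences hypothetical).
LEFT, TSD, TE, RIGHT = "CCATGCGT", "GATTAC", "TGTAGTATGCCACCTACA", "AGCTTAGG"
read = LEFT + TSD + TE + TSD + RIGHT
te_start = len(LEFT) + len(TSD)
te_end = te_start + len(TE)

print(target_site_duplication(read, te_start, te_end))
```

With short reads, the two copies of the TSD usually lie on different reads on either side of the element, so the duplication has to be inferred from separate breakpoint calls rather than read directly.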

SUMMARY

Over the course of evolution, the genomes of most living organisms have experienced TE propagation, which takes place regardless of host fitness. Beneficial de novo insertions have established mutualistic relationships between TEs and their hosts and undergo positive selection over time, whereas mutagenic transposition events reduce host fitness. More than 70 years of study have expanded to cover fine mapping of polymorphic TEs between individuals and species, the functional impacts of TEs in diverse contexts, the molecular basis of TE silencing in a given tissue or species, and more. Despite exciting discoveries, challenges arising from the complexity of TEs in the genome mean that our understanding of TEs is still a work in progress. In addition to the technical approaches introduced in this article, there are other efforts with distinct perspectives to decipher the complexity of TEs. For example, single cell genomics combined with long read sequencing catalogued comprehensive information on TE expression (149). A proteomic approach (‘proteomics informed by transcriptomics’) characterized active TEs in poorly annotated organisms such as Aedes aegypti (150). Together with rapidly developing techniques, multifaceted approaches such as an in vivo retrotransposition reporter used in conjunction with droplet digital PCR (76), the PacBio platform (77) or short reads (53) show synergistic effects that broaden our knowledge of TEs. In summary, continuous efforts toward technical breakthroughs and the combination of techniques with different angles will provide better spatiotemporal resolution of TE biology, which will help us further assess the relationship between transposable elements and their hosts in diverse contexts.

148 in total

1. ATLAS: a system to selectively identify human-specific L1 insertions.

Authors: Richard M Badge; Reid S Alisch; John V Moran
Journal: Am J Hum Genet Date: 2003-03-11 Impact factor: 11.025

2. Determination of TE Insertion Positions Using Transposon Display.

Authors: Eun Yu Kim; Wenwen Fan; Jungnam Cho
Journal: Methods Mol Biol Date: 2021

3. Recent amplification of the kangaroo endogenous retrovirus, KERV, limited to the centromere.

Authors: Gianni C Ferreri; Judith D Brown; Craig Obergfell; Nathaniel Jue; Caitlin E Finn; Michael J O'Neill; Rachel J O'Neill
Journal: J Virol Date: 2011-03-09 Impact factor: 5.103

4. Plant centromeric retrotransposons: a structural and cytogenetic perspective.

Authors: Pavel Neumann; Alice Navrátilová; Andrea Koblížková; Eduard Kejnovský; Eva Hřibová; Roman Hobza; Alex Widmer; Jaroslav Doležel; Jiří Macas
Journal: Mob DNA Date: 2011-03-03

5. Chromatin-modifying genetic interventions suppress age-associated transposable element activation and extend life span in Drosophila.

Authors: Jason G Wood; Brian C Jones; Nan Jiang; Chengyi Chang; Suzanne Hosier; Priyan Wickremesinghe; Meyrolin Garcia; Davis A Hartnett; Lucas Burhenn; Nicola Neretti; Stephen L Helfand
Journal: Proc Natl Acad Sci U S A Date: 2016-09-12 Impact factor: 11.205

6. MIWI2 is essential for spermatogenesis and repression of transposons in the mouse male germline.

Authors: Michelle A Carmell; Angélique Girard; Henk J G van de Kant; Deborah Bourc'his; Timothy H Bestor; Dirk G de Rooij; Gregory J Hannon
Journal: Dev Cell Date: 2007-03-29 Impact factor: 12.270

7. RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons.

Authors: Vladimir V Kapitonov; Jerzy Jurka
Journal: PLoS Biol Date: 2005-05-24 Impact factor: 8.029

8. A somatic piRNA pathway in the Drosophila fat body ensures metabolic homeostasis and normal lifespan.

Authors: Brian C Jones; Jason G Wood; Chengyi Chang; Austin D Tam; Michael J Franklin; Emily R Siegel; Stephen L Helfand
Journal: Nat Commun Date: 2016-12-21 Impact factor: 14.919

9. Nanopore sequencing and assembly of a human genome with ultra-long reads.

Authors: Miten Jain; Sergey Koren; Karen H Miga; Josh Quick; Arthur C Rand; Thomas A Sasani; John R Tyson; Andrew D Beggs; Alexander T Dilthey; Ian T Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E Olsen; Brent S Pedersen; Arang Rhie; Hollian Richardson; Aaron R Quinlan; Terrance P Snutch; Louise Tee; Benedict Paten; Adam M Phillippy; Jared T Simpson; Nicholas J Loman; Matthew Loose
Journal: Nat Biotechnol Date: 2018-01-29 Impact factor: 54.908

10. Transposons Hidden in Arabidopsis thaliana Genome Assembly Gaps and Mobilization of Non-Autonomous LTR Retrotransposons Unravelled by Nanotei Pipeline.

Authors: Ilya Kirov; Pavel Merkulov; Maxim Dudnikov; Ekaterina Polkhovskaya; Roman A Komakhin; Zakhar Konstantinov; Sofya Gvaramiya; Aleksey Ermolaev; Natalya Kudryavtseva; Marina Gilyok; Mikhail G Divashuk; Gennady I Karlov; Alexander Soloviev
Journal: Plants (Basel) Date: 2021-12-06