
Survey sequencing reveals elevated DNA transposon activity, novel elements, and variation in repetitive landscapes among vesper bats.

Heidi J T Pagán, Jiří Macas, Petr Novák, Eve S McCulloch, Richard D Stevens, David A Ray.

Abstract

The repetitive landscapes of mammalian genomes typically display high Class I (retrotransposon) transposable element (TE) content, which usually comprises around half of the genome. In contrast, the Class II (DNA transposon) contribution is typically small (<3% in model mammals). Most mammalian genomes exhibit a precipitous decline in Class II activity beginning roughly 40 Ma. The first signs of more recently active mammalian Class II TEs were obtained from the little brown bat, Myotis lucifugus, and are reflected by higher genome content (~5%). To aid in determining taxonomic limits and potential impacts of this elevated Class II activity, we performed 454 survey sequencing of a second Myotis species as well as four additional taxa within the family Vespertilionidae and an outgroup species from Phyllostomidae. Graph-based clustering methods were used to reconstruct the major repeat families present in each species and novel elements were identified in several taxa. Retrotransposons remained the dominant group with regard to overall genome mass. Elevated Class II TE composition (3-4%) was observed in all five vesper bats, while less than 0.5% of the phyllostomid reads were identified as Class II derived. Differences in satellite DNA and Class I TE content are also described among vespertilionid taxa. These analyses present the first cohesive description of TE evolution across closely related mammalian species, revealing genome-scale differences in TE content within a single family.

Year: 2012 PMID: 22491057 PMCID: PMC3342881 DOI: 10.1093/gbe/evs038

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Eukaryotes typically display high proportions of genomic content derived from transposable elements (TEs). These repetitive sequences are capable of movement within the genome and are classified according to their mode of transposition. Most mammalian insertions can be attributed to Class I TEs, also known as retrotransposons. Their copy-and-paste method of mobilization can lead to substantial accumulations in a genome. For example, Class I TEs comprise at least 45% of the human genome (Lander et al. 2001) and some estimates place that number above 60% (de Koning et al. 2011). The cut-and-paste mobilization mechanism of Class II TEs (DNA transposons) has likely contributed to their low representation in the human genome, ∼3%. Similarly, low proportions were identified in other mammals: <2% of dog and opossum and <1% of mouse and rat genomes (Waterston et al. 2002; Gibbs et al. 2004; Lindblad-Toh et al. 2005; Mikkelsen et al. 2007). However, an apparent overall lack of Class II activity in mammals in the recent past is another factor limiting the contribution of DNA transposons to mammalian genomes. Observations of various mammalian models (human, mouse, rat, and dog) have suggested a generalized shutdown of Class II TEs during roughly the same time period, ∼40 Ma (Lander et al. 2001; Waterston et al. 2002; Gibbs et al. 2004; Lindblad-Toh et al. 2005). The first identified exception to this rule is the vespertilionid bats, in particular Myotis lucifugus. While Class I TEs still dominate the overall TE landscape in M. lucifugus, Class II TEs have played a larger role when compared with other mammals (Ray et al. 2008). For example, members of the Helitron family, with their unique rolling circle amplification mechanism, have made significant contributions to genome content (Pritham and Feschotte 2007; Thomas et al. 2011). 
Many Helitron insertions as well as insertions from at least eight other Class II TE families occurred much more recently than 50 Ma and some may still be mobilizing. Interestingly, these recent DNA transposon invasions coincide with rapid diversification of Myotis, a genus with nearly worldwide distribution and more than 100 species (Wilson and Reeder 2005; Stadelmann et al. 2007). TE presence and activity can generate a diverse array of effects on coding sequence and expression of host genes (Kidwell and Lisch 1997; Deininger et al. 2003). In context of recent publications highlighting the capacity of TEs to drive speciation (Oliver and Greene 2009; Zeh et al. 2009; Rebollo et al. 2010), these findings provide a potential mechanism for the adaptive radiation of Myotis. On a larger scale, Myotis is a member of Vespertilionidae, the most species rich of all chiropteran families and the second most species rich family of mammals (Simmons 2005). Investigating the degree to which the elevated Class II activity found in Myotis extends to other bats is essential to future examinations into the potential role TEs have played in the evolution of Chiroptera as a whole and Vespertilionidae in particular. Here, we present analyses of the TE landscapes for five additional vesper bats: Myotis austroriparius, Perimyotis subflavus, Nycticeius humeralis, Lasiurus borealis, and Corynorhinus rafinesquii (fig. 1). The genus Myotis diverged early from a monophyletic clade encompassing the other taxa, which represent a diverse sampling within Vespertilionidae and thus allow us to determine if elevated Class II levels evolved singularly within Myotis. To serve as an outgroup, we also analyzed the phyllostomid bat, Artibeus lituratus. In our analyses, we applied 454-based sequencing to survey TE content. 
We demonstrate the utility of this method to investigating TE dynamics in nonmodel taxa that are unlikely to be the target(s) of full genome sequencing efforts, which will aid in determining the impact of Class II TEs on mammalian genome evolution.

Fig. 1. Most recent of several possible phylogenies for the surveyed taxa. Topology and vespertilionid divergence dates are taken from Lack and Van Den Bussche (2010). The date of the Artibeus lituratus/vespertilionid divergence is taken from Datzmann et al. (2010), and the M. lucifugus/M. austroriparius divergence is from Stadelmann et al. (2007).

Materials and Methods

454 Sequencing and Sequence Processing

DNA extractions were carried out on M. austroriparius, P. subflavus, N. humeralis, L. borealis, and C. rafinesquii using 5 PRIME ArchivePure DNA Tissue Kits. Sequencing was performed on genomic DNA at the Georgia Genomics Facility. Roche standard chemistry was used initially (L. borealis), but for the remaining vespertilionid samples, Titanium chemistry was utilized to accommodate lower DNA concentrations. Sample preparation and processing followed Roche protocols (October 2008). The A. lituratus data was acquired separately (McCulloch and Stevens 2011); phenol–chloroform DNA extraction was used, and 454 Titanium sequencing was performed at Duke University Genome Sequencing and Analysis Core Facility using standard protocols. Emulsion polymerase chain reaction (emPCR) drops containing only one unique template but multiple beads can produce sequencing artifacts consisting of duplicate sequences with nearly identical starting positions (Dong et al. 2011). Thus, all raw data were parsed locally using 454 Replicate Filter (http://microbiomes.msu.edu/replicates/) to remove these artifacts. Parameters were set at 0.95 sequence identity cutoff, 0 length difference requirement, and 3 beginning base pairs to check. Reads derived from mitochondrial sequences were identified using BlastN. In some cases, we were able to reconstruct nearly complete mitochondrial genome sequences, and these have been discussed in a separate manuscript (Meganathan et al. 2012).

Repeat Discovery

To identify repeat content for each genome, we modified the pipeline developed by Macas et al. (2007) and updated by Novak et al. (2010). The methods were developed for plant genomic data but are applicable to mammalian genomes with minor modifications as described below. Briefly, the analysis consists of an all-to-all comparison of 454 reads using mgblast (Pertea et al. 2003). Pair-wise sequence similarities exceeding the specified threshold (overlaps spanning 55% or more of the longer read with 90% similarity) are represented as edges in a virtual graph in which the reads are the nodes. The reads representing different families of repetitive elements can then be distinguished as clusters (communities) of frequently connected nodes within the graph. These clusters are separated and the reads are further investigated, including their assembly into contigs using cap3 with "-o 100 -p 85" settings. For each taxon in our analysis, a set of clusters consisting of contigs derived from overlapping reads was obtained. A cutoff was imposed to reduce the number of clusters analyzed to only include repeat families composing at least 0.01% of the respective genome of each species. Caution should be applied when extrapolating these data to the whole genome. Although our methods are apparently very good at identifying high copy number elements and moderate-to-low copy number families with high similarity, they will necessarily be inadequate for identifying very low copy number families and older highly diverged elements in a genome. In the former case, the reason is the lack of whole genome coverage. In the latter case, the inadequacy is due to the combination of our assembly method and our restriction to contigs covering >0.01% of the genome. 
Highly divergent families or families with a large number of divergent subfamilies that each have low copy numbers would not assemble well in our analysis or would produce multiple contigs that all fall below the 0.01% cutoff. Such scenarios would lead to underestimations in genome TE content. Clusters may be representative of a particular TE family and every contig a possible consensus for a TE subfamily (Macas et al. 2007; Novak et al. 2010). However, the initial assembly resulted in individual clusters with large numbers of distinct contigs within them. For instance, M. austroriparius Cluster 1 contained 748 reads, 677 of which were assembled into 20 distinct contigs. Visual examination suggested that the contigs in a majority of these clusters could be assembled further to reduce the final data set without losing information. We reassembled these primary contigs in SeqMan (match size = 12, minimum percentage = 70, minimum length = 100) (DNASTAR, Madison, WI). Reassembly yielded a single contig for M. austroriparius Cluster 1 that was identified as the LINE element L1MAB_ML in RepBase. Similar results were obtained for other complex clusters in all examined taxa. Consensus sequences from reassembled contigs were submitted to CENSOR to assist in classifying them into one of five repeat categories (DNA, ERV/LTR, Non-LTR/LINE, Non-LTR/SINE, or satellite) or as unknown. In some cases, CENSOR returned hits to multiple TE families within a single contig. Such results could be caused by nested insertions or misassemblies and were addressed by splitting the contig into separate entries for the final library. Contigs were also queried with a custom library of bat-specific repeats derived from ongoing and previous analyses (Pritham and Feschotte 2007; Ray et al. 2007, 2008) using RepeatMasker. The library is available upon request. Contigs from M. austroriparius were submitted to NCBI BlastN to query against the current whole genome shotgun (WGS) draft of M. lucifugus (AAPE00000000). 
Most contigs were found in their entirety multiple times, confirming their repetitive nature. For all taxa, contigs that could not be identified were queried against the NCBI nr database using BlastN and the protein database using BlastX. To identify potential satellite repeats and tandem arrays, all unidentified contigs were also submitted to a local installation of Tandem Repeats Finder (Benson 1999), using the following parameters: match = 2, mismatch = 3, indels = 5, PM = 0.75, PI = 0.20, minimum period = 30, maximum period = 500. Output was then submitted to TRAP (Sobreira et al. 2006). Dotter (Sonnhammer and Durbin 1995) allowed graphical confirmation of potential tandem repeats. The remaining unidentified contigs were submitted to TEclass (Abrusán et al. 2009), a tool that determines the likely mode of transposition and thus aids in identification of repeat type. Potentially novel elements (contigs not identified via CENSOR, RepeatMasker, or Tandem Repeats Finder) were queried against the appropriate taxon sequence data using BlastN, which allowed us to generate a more accurate full-length consensus. If possible, the M. lucifugus 2× WGS was used to infer consensus sequences for TEs with low coverage in the 454 data. The top 40 hits were extracted with 200-bp flanking sequence (if available) using process_hits.pl (Smith and Ray 2011), a computational tool for TE mining which, in this case, was configured to combine hits with 50-bp overlaps and align them using MUSCLE. If the boundaries of the repeat element were not recovered, as evidenced by dissimilar sequence data at the 3′ and 5′ ends, then the outermost 150 bp of the consensus was used to query the data again and extend the alignment until the full-length TE could be assembled. Large contigs (>1000 bp) were submitted to open reading frame (ORF) Finder (http://www.ncbi.nlm.nih.gov/projects/gorf/) to identify potential reverse transcriptase, endonuclease, or transposase ORFs. 
Element names end with a two-letter taxon identifier to indicate the source of the consensus (e.g., Mariner2_Ml was inferred from M. lucifugus).
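The graph-based grouping of reads described in this section can be illustrated with a minimal Python sketch. It assumes that pairwise hits are already available as tuples of (read_a, read_b, overlap_len, identity_pct), which is a hypothetical input format rather than actual mgblast output. It also substitutes simple connected components for the community detection used in the published pipeline, so it shows only the thresholding and grouping idea, not the authors' implementation.

```python
from collections import defaultdict


def cluster_reads(read_lengths, hits, min_overlap_frac=0.55, min_identity=90.0):
    """Group reads into clusters via connected components of a similarity graph.

    read_lengths: dict mapping read id -> read length in bp
    hits: iterable of (read_a, read_b, overlap_len, identity_pct) tuples
          (hypothetical format, not real mgblast output)
    """
    # Union-find structure: every read starts in its own cluster.
    parent = {read: read for read in read_lengths}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, overlap_len, identity in hits:
        longer = max(read_lengths[a], read_lengths[b])
        # An edge is kept only if the overlap spans >= 55% of the longer read
        # and the aligned region is >= 90% similar.
        if overlap_len >= min_overlap_frac * longer and identity >= min_identity:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for read in read_lengths:
        clusters[find(read)].append(read)

    # Return clusters largest first, with read ids sorted inside each cluster.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    lengths = {"r1": 300, "r2": 280, "r3": 310, "r4": 250}
    pair_hits = [
        ("r1", "r2", 200, 95.0),  # passes both thresholds
        ("r2", "r3", 180, 92.0),  # passes both thresholds
        ("r3", "r4", 100, 99.0),  # overlap too short
        ("r1", "r4", 240, 85.0),  # identity too low
    ]
    print(cluster_reads(lengths, pair_hits))
```

In the real analysis each resulting cluster would then be assembled into contigs and filtered by its share of the data set; those steps are outside this sketch.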

Age Analysis

Novel TEs were further analyzed to determine their approximate period of activity as described in Pagan et al. (2010). Consensus sequences were repeatmasked against the respective taxon from which each was inferred, either the appropriate filtered 454 data set or a quarter of the M. lucifugus 2× WGS. To ensure full-length hits could be acquired, the query sequences were trimmed to 300 bp; if possible, the fragment was selected from coding regions in autonomous TEs. RepeatMasker .align output files were processed by a perl script designed to calculate the Kimura 2-Parameter distances while excluding hypermutable CpG sites (Pagan et al. 2010). Output was parsed to only include hits that spanned at least 90% of the query sequence. Ages were estimated from the distances using the mammalian neutral mutation rate, 2.2 × 10⁻⁹ substitutions per site per year (Kumar and Subramanian 2002). A complete library of the full consensus sequences was also used to query all six 454 data sets using BlastN to test for lineage specificity.
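The age calculation can be sketched as follows. The Kimura 2-Parameter formula is standard. Dividing the distance by the mutation rate is inferred from the values in table 4 (for example, 0.0188 corresponds to about 8.5 Myr), not from the authors' perl script. The function names are illustrative, and CpG exclusion is not handled here; the caller is assumed to supply counts from which CpG sites have already been removed.

```python
import math

# Mammalian neutral mutation rate in substitutions per site per year
# (Kumar and Subramanian 2002), as quoted in the text.
NEUTRAL_RATE = 2.2e-9


def k2p_distance(transitions, transversions, sites):
    """Kimura 2-Parameter distance from substitution counts.

    transitions, transversions: observed differences between a hit and the
        consensus (assumed to exclude CpG sites already)
    sites: number of aligned sites compared
    """
    p = transitions / sites      # proportion of transitions
    q = transversions / sites    # proportion of transversions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def age_myr(k2p, rate=NEUTRAL_RATE):
    """Approximate age in millions of years, assuming age = distance / rate."""
    return k2p / rate / 1e6


if __name__ == "__main__":
    d = k2p_distance(transitions=6, transversions=3, sites=300)
    print(f"K2P distance: {d:.4f}")
    print(f"Age for K2P = 0.0188: {age_myr(0.0188):.1f} Myr")
```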

Genome Representation

TE contribution to genome content is often quantified by the number of bases they occupy in sequenced genomes. This value could be estimated using contig length and read depth, as per Macas et al. (2007) for the pea (Pisum sativum) genome. However, unlike Pisum, in which the dominant TE is an LTR element, the primary TE components of mammalian genomes are LINEs which are often 5′ truncated. This makes them difficult to reconstruct in their entirety from the limited coverage and short read lengths we obtained. This is also true for other large autonomous TEs, especially those with low copy numbers. SINE subfamilies are another major component of mammalian genomes. In these bats, the dominant SINE is Ves, with a consensus of just over 200 bp. Additionally, there are several short (<400 bp) nonautonomous DNA transposon families. Each of these observations suggests that using contig length and read depth might lead to inaccurate estimates of genome coverage. For example, our average read length was ∼300 bp, longer than a typical full-length Ves. Thus, the assembled contig lengths would be longer than the actual elements and artificially inflate genome coverage calculations. We therefore chose to focus on the proportion of total hits for each TE in the filtered data. We used a custom library of TE consensus sequences as identified above from each taxon to mask the respective filtered data set with RepeatMasker. Process_hits.pl was used to combine hits with 50 bp overlap, and then tally the number of and length of hits with a minimum length of 30 bp (the shortest 454 read lengths) in each taxon for each of five repeat categories (DNA, ERV/LTR, Non-LTR/LINE, Non-LTR/SINE, and satellite). Each read should represent random data from the genome. Thus, the proportion of the genome occupied by each TE category and/or family was then extrapolated from the data.
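A minimal sketch of the proportion-based tally is given below. It is a loose stand-in for the RepeatMasker plus process_hits.pl step, not a reproduction of it: the hit tuples (read_id, start, end, category) are a hypothetical format, and any overlapping or abutting hits of the same category on a read are merged, which does not replicate the 50 bp overlap rule. Only the 30 bp minimum hit length is taken from the text.

```python
from collections import defaultdict


def category_proportions(hits, total_bases, min_hit_len=30):
    """Percentage of sequenced bases masked by each repeat category.

    hits: iterable of (read_id, start, end, category) with 0-based,
          half-open coordinates (hypothetical format)
    total_bases: total number of bases in the filtered data set
    """
    # Collect intervals per read and category so that overlapping hits
    # on the same read are not counted twice.
    by_key = defaultdict(list)
    for read_id, start, end, category in hits:
        by_key[(read_id, category)].append((start, end))

    masked = defaultdict(int)
    for (_read_id, category), intervals in by_key.items():
        intervals.sort()
        merged = []
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or abutting hit: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                merged.append((cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((cur_start, cur_end))
        # Merged hits shorter than the minimum length are ignored.
        masked[category] += sum(e - s for s, e in merged if e - s >= min_hit_len)

    return {cat: 100.0 * n / total_bases for cat, n in masked.items()}


if __name__ == "__main__":
    example_hits = [
        ("r1", 0, 100, "DNA"),
        ("r1", 60, 150, "DNA"),    # overlaps the previous hit
        ("r2", 10, 30, "DNA"),     # shorter than 30 bp, dropped
        ("r2", 100, 300, "LINE"),
    ]
    print(category_proportions(example_hits, total_bases=1000))
```

Because each read is treated as a random sample of the genome, the resulting percentages are read directly as estimates of genome representation.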

Results

454 Sequencing

Approximately 3.97 × 10⁸ bp of data were obtained. Genome sizes for all taxa were obtained from www.genomesize.com. C-values for P. subflavus and M. lucifugus were not available, but rather estimated from congeners. Genome coverage was calculated from the number of sequenced base pairs divided by the estimated genome size. Genome coverage ranged from ∼0.76% for M. austroriparius to ∼4.75% for C. rafinesquii. Read lengths ranged from 29 to 755 bp and averaged ∼300 bp. The 454 replicate filter reduced the data by around 20%. For example, coverage was decreased to 0.59% for M. austroriparius. However, this level of coverage still allowed for identification of repeats present in >1,000 copies in the genome (Macas et al. 2007). For example, a 1,000 copy repeat in M. austroriparius will be found ∼5.9 times in the data set, calculated as follows for 1.94 × 10⁷ bp filtered data and 3.26 × 10⁹ bp genome size: 1,000/[1/((1.94 × 10⁷)/(3.26 × 10⁹))]. Information on the filtered and unfiltered reads is summarized in table 1. The raw data are available from the Dryad Repository: http://dx.doi.org/10.5061/dryad.83164r7v.
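The coverage arithmetic can be checked with a short Python sketch. The function names are illustrative; the input values are those quoted in the text and in table 1.

```python
def genome_coverage_pct(sequenced_bp, genome_size_bp):
    """Genome coverage as a percentage: sequenced bases / genome size."""
    return 100.0 * sequenced_bp / genome_size_bp


def expected_copies_sampled(copy_number, sequenced_bp, genome_size_bp):
    """Expected number of times a repeat with the given genomic copy number
    appears in the survey data, assuming reads sample the genome at random."""
    return copy_number * sequenced_bp / genome_size_bp


if __name__ == "__main__":
    # M. austroriparius: unfiltered coverage from 2.47e7 bp and a 3.26e9 bp genome
    print(f"{genome_coverage_pct(2.47e7, 3.26e9):.2f}% coverage")
    # A 1,000 copy repeat in 1.94e7 bp of filtered data
    print(f"{expected_copies_sampled(1000, 1.94e7, 3.26e9):.2f} expected copies")
```

The second function is the simplified form of the expression in the text, since 1,000/[1/x] equals 1,000 × x.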

Table 1

454 Sequencing Summary

	Total Reads	Mean Read Length (bp)	Total Base Pairs	Estimated Genome Size (bp)	Percentage of Genome Coverage	Unique Reads (After Artifact Filter)	Percentage of Replicates	Percentage of Unique Genome Coverage
Artibeus lituratus	295660	397	1.01 × 10⁸	2.70 × 10⁹	3.75	255065	13.73	3.23
Corynorhinus rafinesquii	403317	285	1.15 × 10⁸	2.42 × 10⁹	4.75	317269	21.34	3.74
Lasiurus borealis	233826	368	8.60 × 10⁷	2.56 × 10⁹	3.36	169361	27.57	2.43
Myotis austroriparius	86583	285	2.47 × 10⁷	3.26 × 10⁹	0.76	67924	21.55	0.59
Nycticeius humeralis	135978	280	3.81 × 10⁷	2.42 × 10⁹	1.57	108535	20.18	1.26
Perimyotis subflavus	122395	265	3.24 × 10⁷	2.26 × 10⁹	1.44	99801	18.46	1.17

NOTE.—Percentage of Genome Coverage was approximated using mean read length and estimated genome size. A sequencing artifact filter was applied to data (Percentage of Unique Genome Coverage) before graph-based repeat discovery and RepeatMasker analyses to determine genome representation.


Repeat Discovery and Distribution

Myotis lucifugus is the best characterized bat with regard to TE content (Pritham and Feschotte 2007; Ray et al. 2007, 2008). Although we were unable to obtain an M. lucifugus sample for this sequencing survey, the inclusion of the congener, M. austroriparius, allows validation of our methods. The estimated 9.9 Myr divergence time (Stadelmann et al. 2007) between the two species suggests we should find similar TE landscapes. Indeed, we identified all major M. lucifugus TE families in M. austroriparius. Most contigs that were not initially classified using CENSOR or RepeatMasker were identified as either tandem repeats or mitochondrial DNA. Less than 0.5% of the M. austroriparius repeat content was labeled "unknown." Most contigs could be classified as satellite, DNA, LINE, SINE, or LTR elements and were found either to be previously identified or, if not already characterized, were shown to be repetitive in M. lucifugus. Indeed, our estimates of genome coverage for multiple element classes using the WGS of M. lucifugus and the collected 454 reads for M. austroriparius are a close match (table 2). The only appreciable deviation between the two is for the non-LTR/LINEs. Harismendy et al. performed a comparison of next generation sequencing platforms and found overall that Roche 454 data had fairly even treatment of unique versus repetitive sequences, but did note a 1.25-fold overrepresentation of LINEs. It is possible that we are observing this bias here, but it would be expected to occur equally across all taxa, and no apparent bias is observed for the Class II families. In combination with similar analyses on pea (Macas et al. 2007) and snake genomes (Castoe et al. 2011), these data suggest that our approach is appropriate for estimating the TE landscape despite limited genome coverage.

Table 2

Comparison of RepeatMasker Output from Myotis austroriparius 454 Data and the WGS for M. lucifugus

Element Class/Family	M. austroriparius: Percentage of RM Hits	M. austroriparius: Percentage of 454 Sequence Data	M. lucifugus: Percentage of RM Hits	M. lucifugus: Percentage of WGS
DNA/hAT	10.75	2.07	12.95	2.29
DNA/Helitron	15.13	2.78	16.23	2.57
DNA/Mariner	3.19	0.68	3.32	0.67
DNA/piggyBac	1.14	0.27	0.65	0.16
DNA/TcMar-Tigger	0.09	0.02	0.27	0.05
ERV/LTR	10.49	2.35	9.17	2.22
Non-LTR/LINE	29.10	9.21	17.49	6.02
Non-LTR/SINE	30.02	5.31	39.86	6.27
Non-LTR/unknown	0.10	0.03	0.04	0.02

NOTE.—Percentage of RM hits = proportion of total RepeatMasker hits to any given TE type. Percentage of 454 sequence data indicates proportion of bases masked from M. austroriparius survey sequence data. Percentage of WGS indicates proportion of bases masked in the M. lucifugus WGS.

As described in Novak et al. (2010), graph conformation of a specific cluster revealed features of the respective repeat family. Reads, presented as vertices, are connected by edges to other reads, which they overlap. A summary of the five largest clusters for each taxon can be found in table 3.

Table 3

Top Clusters for Each Taxon

	Cluster Number	Original Number of Reads	Number of Reads Used in Contigs	Number of Cluster-Based Contigs	Number of SeqMan Contigs	Number of RepeatMasker Reads	Element Name	Element Family
Corynorhinus rafinesquii	CL1	9595	8625	283	7	24730	L1MAB_ML	Non-LTR/LINE
						2347	ERV2X1A_I_ML	ERV/LTR
	CL2	3820	3526	61	1	4842	HAL1-1A_ML	Non-LTR/LINE
	CL3	2582	2538	3	1	2814		mtDNA
	CL4	2469	2249	37	1	3775	HAL1-1A_ML	Non-LTR/LINE
	CL5	1755	1601	77	4	3343	HAL1-1A_ML	Non-LTR/LINE
Lasiurus borealis	CL1	3324	2919	102	1	12262	L1MAB_ML	Non-LTR/LINE
	CL2	2174	1956	80	1	2973	HAL1-1A_ML	Non-LTR/LINE
	CL3	1076	847	92	4	4182	HAL1-1A_ML	Non-LTR/LINE
	CL4	625	531	44	3	1467	L1MAB_ML	Non-LTR/LINE
	CL5	510	380	15	1	1663	HAL1-1A_ML	Non-LTR/LINE
Myotis austroriparius	CL1	748	677	20	1	2197	L1MAB_ML	Non-LTR/LINE
	CL2	644	599	14	1	960	HAL1-1B_ML	Non-LTR/LINE
	CL3	563	330	16	3	2882	VES	Non-LTR/SINE
	CL4	303	226	6	1	423	Tandem Repeat	Satellite
	CL5	262	248	4	1	510	L1MAB2_ML	Non-LTR/LINE
Nycticeius humeralis	CL1	1818	1093	34	4	10436	VES	Non-LTR/SINE
	CL2	614	521	28	2	1397	HAL1-1A_ML	Non-LTR/LINE
	CL3	470	399	26	2	2101	L1MAB_ML	Non-LTR/LINE
						226	ERV2X1A_I_ML	ERV/LTR
	CL4	432	357	10	1	229	L1MAB_ML	Non-LTR/LINE
						512	ERV2X1A_I_ML	ERV/LTR
	CL5	345	260	37	2	3218	nHelitron1_Nh	DNA/Helitron
Perimyotis subflavus	CL1	2092	1634	65	3	2934	Tandem Repeat	Satellite
	CL2	1596	1430	54	1	5329	L1MAB_ML	Non-LTR/LINE
	CL3	1408	1151	88	6	4994	nHelitron1_Ps	DNA/Helitron
	CL4	1282	1157	37	1	2002	HAL1-1A_ML	Non-LTR/LINE
	CL5	830	790	7	1	926	Tandem Repeat	Satellite
Artibeus lituratus	CL1	5933	5225	154	4	24398	L1-4_PVa	Non-LTR/LINE
	CL2	5299	4563	169	5	11493	HAL1-3_ML	Non-LTR/LINE
	CL3	2688	2498	20	1	3131	Tandem Repeat	Satellite
	CL4	2454	2385	7	1	2609	Tandem Repeat	Satellite
	CL5	1482	1269	41	3	2321	Tandem Repeat	Satellite

NOTE.—Information regarding the content of the graph-based clusters is provided, including the original number of contigs, which were submitted to SeqMan. The SeqMan contigs were then submitted to CENSOR for identification and used to RepeatMask the respective taxonomic 454 data set to determine genome representation.

We were able to confirm some previous PCR analyses that probed for Class II TEs in other vespertilionid taxa and identified several (piggyBac1, hAT2, hAT3) that initially appear to be limited to Myotis (Ray et al. 2008). However, two TE families previously thought to be confined to Myotis were identifiable in other taxa: hAT1_Ml was identified in N. humeralis and piggyBac2_Ml was observed in data from C. rafinesquii and L. borealis. The earlier failure to detect them was likely due to mispriming of the internal primers used in that analysis and highlights the advantage of survey sequencing for a more accurate inspection of repetitive DNA. If we assume that the vespertilionid phylogeny described by Lack and Van Den Bussche (2010) (see fig. 1) is accurate, the presence of hAT1_Ml in N. humeralis but not in C. rafinesquii, L. borealis, or P. subflavus may result from two independent invasions of hAT1_Ml into the lineages leading to N. humeralis and Myotis spp. However, alternative phylogenetic hypotheses exist (Hoofer and Van Den Bussche 2003) and correct inference of independent invasions will depend on a reliable phylogeny of the group. Several novel elements were also identified and their key features are summarized in table 4. These novel elements have been submitted to RepBase.

Table 4

Characteristics and Ages of Novel TEs

Element	Length (bp)	TIR (bp)	ORF (aa)	Nᵃ	Average K2P	Standard Error	Average Age (Myr)ᵇ
Mariner2_Ml	803	28	235	349	0.0188	0.0005	8.5
nhAT1_Nh	192	16		404	0.0194	0.0006	8.8
Mariner1_Lb	2294	25	347	23	0.0197	0.0024	9.0
nhAT4_Nh	203	16		127	0.0223	0.0018	10.1
nhAT2_Nh	246	16		61	0.0228	0.0012	10.4
nMariner2_Lb	231	25		518	0.0268	0.0006	12.2
nHeliBat1_Ps	1207			33	0.0416	0.0041	18.9
nhAT3_Nh	213	16		47	0.0509	0.0066	23.2
nMariner1_Lb	184	29		54	0.0639	0.0032	29.1
nHeliBat1_Lb	993			209	0.0905	0.0019	41.1
nHeliBat1_Nh	1183			34	0.0916	0.0055	41.7
nHeliBat2_Ps	220			39	0.1119	0.0113	50.8
nHeliBat1_Cr	364			74	0.1208	0.0041	54.9
Mariner1_Ps	1293	32	345
nMariner1_Ps	279	67
Mariner1_Ml	1211	198	235
nhAT5_Nh	337	16

Note.—Elements shown in bold are lineage-specific. Names preceded by an “n” are nonautonomous. Age estimations are only shown if >20 hits of appropriate length were obtained for analysis. Final two letters denote data set from which consensus was inferred (e.g., Lb–L. borealis).

ᵃ Number of RepeatMasker hits, which are at least 90% of the query length; see Materials and Methods.

ᵇ Calculated using the average mammalian neutral mutation rate (2.2 × 10⁻⁹).

Most Class II TEs were categorized according to terminal inverted repeat (TIR) length and target site duplications (TSDs) after extending and assembling the full repeat consensus (see Repeat Discovery in the Materials and Methods section). Blast hits to potential ORFs were also used for identification. Tc1/mariners have 25- to 29-bp TIRs and TA TSDs, while hATs typically have 16-bp TIRs and 8-bp TSDs with central TA dinucleotides. Helitrons are characterized by a 5′ TC, 3′ CTRR, an AT target site, and a 3′ 18-bp palindrome; elements are identified according to >80% similarity at 3′ (family) and 5′ (subfamily) 30 bp (Yang and Bennetzen 2009). All Helitrons identified in this study were from the HeliBat family (Pritham and Feschotte 2007), and several fell within a single unique subfamily according to the 5′ 30 bp (nHeliBat1_Lb/Nh/Ps/Cr). The observation that we could not identify the probe sequences used by Thomas et al. (2011) in the consensus sequences of these elements suggests that they fall within a separate but similar lineage. Also of note is Mariner1_Ml, which included the full Mariner2_Ml within TIRs of extended length. Although initially identified in P. subflavus, the consensus sequences were inferred from M. lucifugus to obtain adequate coverage. Both elements contained an ORF and a nonautonomous variant was also recovered with 67-bp TIRs from the P. subflavus data set (nMariner1_Ps). Unidentified clusters from most taxa were generally composed of low numbers of reads. 
The exception to this pattern was P. subflavus, in which 19 of 66 clusters (∼6% of the repetitive content) could not be identified by CENSOR or through BlastX and BlastN searches against NCBI databases. Dotter and TRF analyses did not identify the contigs as tandem repeats, and visual inspection showed no indication of sequencing artifacts. Many of the unrecognized contigs were >500 bp, and the ends were not recovered. Cluster 50, for instance, contained an 864-bp contig, and attempts to identify the ends using Blast were unsuccessful. With no similarity to known TEs and lack of 5′ and 3′ ends, which often contain the defining features of the various repeat families, we were unable to discern if these might be novel TEs. Identifying these contigs is the subject of ongoing investigations. A potentially confounding artifact in these types of analyses was also observed in P. subflavus. Many of the graph-based clusters contained only a few reads, yet RepeatMasker output indicated a large number of hits to the contig. Cluster 50 contained 18 reads of which only two were used to generate the cluster-based contig, yet RepeatMasker identified 794 hits. In this case, the RepeatMasker output was inflated with hits primarily to a 6-bp tandem repeat embedded within the contig. This suggests that future analyses will require passing even identifiable TEs through Tandem Repeats Finder prior to genome coverage analyses. Finally, repeat analysis of outgroup A. lituratus suggests that the elevated Class II TE content does not extend to Phyllostomidae. Less than 0.5% of the data set was identified as Class II and no novel or potentially recently active families were observed. Like most other mammals observed to date, Class I TEs comprise more than 25% of the genome. The major TE clades present in A. lituratus were Ves SINEs and L1 (3% and 15% of the filtered data set, respectively).

Age Analysis of Selected Elements

The three newly described Mariner elements from L. borealis as well as the five novel nhATs from N. humeralis appear to be lineage specific and have been active in the relatively recent past (table 4). The age estimate (average 8.5 Myr) for Mariner2_Ml suggests it would be specific to Myotis, yet it was identifiable via BlastN analysis in all vesper 454 data sets. These contrasting results were further investigated by determining the activity periods of Mariner2_Ml in each taxon. Due to limited copy numbers in the 454 data, all RepeatMasker hits were used (instead of only hits within 90% length of the query, as for table 4). Average age estimates were as follows: M. austroriparius 11 Myr (N = 40), P. subflavus 16 Myr (N = 21), C. rafinesquii 17 Myr (N = 68), N. humeralis 19 Myr (N = 17), and L. borealis 23 Myr (N = 44). These estimates suggest activity of Mariner2_Ml in each taxon following the split from Myotis 32 Ma (fig. 1). Several Helitrons appear to predate the divergence of the five vespertilionid taxa, with the oldest having been active roughly 51 (nHeliBat2_Ps) and 55 (nHeliBat1_Cr) Ma. BlastN analysis supported the presence of similar fragments (E value ≤ 10⁻⁶⁵) of both TEs in all but the outgroup, A. lituratus, providing further evidence that at least the Helitron phase of the Class II invasion began in the common ancestor of vesper bats (Thomas et al. 2011).

As would be expected for mammals, Class I elements dominated the TE landscape for all six taxa (fig. 2, table 5). The highest LINE content (nearly 15% of the genome) was observed in the phyllostomid, A. lituratus. This was accompanied by the lowest SINE complement (3%). Nycticeius humeralis exhibited the reverse situation with decreased LINE content (7%) alongside elevated SINE levels (6%), revealing an inverse relationship between the full-length LINEs and the nonautonomous SINEs (r = −0.90809, all six taxa). The contribution of LTRs across all taxa was low, roughly 1.0% or lower. Finally, as with M. lucifugus, elevated Class II levels were observed for the five vespertilionids (ranging from 3% in L. borealis to 5% in P. subflavus), but not for the phyllostomid bat (<1% in A. lituratus). The broader genome-wide relationship between Class I and Class II content is depicted in figure 3 (r = −0.84632, P = 0.03361).

Fig. 2. Genome representation of the TE classes. The inclusion of outgroup Artibeus suggests elevated DNA transposon activity is limited to the vesper taxa, while other aspects of their repetitive landscapes differ within the family.

Table 5

Genome Representation Determined Using RepeatMasker and a Custom Repeat Library Compiled for Each Taxon

	Non-LTR/LINE (%)	Non-LTR/SINE (%)	ERV/LTR (%)	Total Class I (%)	Total Class II (%)
Artibeus lituratus	14.83	2.90	0.93	18.66	0.38
Lasiurus borealis	11.74	4.02	0.42	16.18	2.56
Corynorhinus rafinesquii	11.93	3.91	0.97	16.81	3.12
Nycticeius humeralis	7.16	6.04	1.02	14.22	3.11
Myotis austroriparius	8.46	4.48	0.53	13.48	3.52
Perimyotis subflavus	9.33	4.18	0.69	14.20	4.45

NOTE.—Primary Class I repeat types are shown, and final two columns depict Class I versus Class II content.

Fig. 3. Correlation of Class I and Class II TE activity. Initial data suggest that TE activity may be inversely related between the two classes such that higher Class II genome representation is accompanied by a decrease in Class I content (r = −0.85, P < 0.05).
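The two correlation coefficients quoted in the text can be approximately recomputed from the rounded values in table 5. The following is a minimal sketch, assuming a plain Pearson correlation over the six taxa; the helper name `pearson_r` and the variable names are illustrative and not taken from the study, and the P value is not recomputed here. Because the table values are rounded to two decimals, the results agree with the reported figures only to about two or three decimal places.

```python
from math import sqrt

# Table 5 values (percent of genome), in table row order:
# A. lituratus, L. borealis, C. rafinesquii, N. humeralis,
# M. austroriparius, P. subflavus.
LINE = [14.83, 11.74, 11.93, 7.16, 8.46, 9.33]
SINE = [2.90, 4.02, 3.91, 6.04, 4.48, 4.18]
CLASS_I = [18.66, 16.18, 16.81, 14.22, 13.48, 14.20]
CLASS_II = [0.38, 2.56, 3.12, 3.11, 3.52, 4.45]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_line_sine = pearson_r(LINE, SINE)        # text reports r = -0.90809
r_class = pearson_r(CLASS_I, CLASS_II)     # text reports r = -0.84632

print(f"LINE vs SINE:        r = {r_line_sine:.3f}")
print(f"Class I vs Class II: r = {r_class:.3f}")
```

With only six data points, either coefficient is sensitive to a single taxon, which is consistent with the caution expressed about these trends.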


Discussion

We have modified a methodology originally applied to plant genomes to identify distinct TE landscapes within five vespertilionids and a single phyllostomid bat. Comparison of M. austroriparius with its well-characterized congener, M. lucifugus, suggests that the method provides an accurate estimate of the TE landscape. Of course, this assumes that no major changes in TE dynamics have occurred in either lineage since their divergence ∼10 Ma (Stadelmann et al. 2007). Non-LTR retrotransposons were the most abundant TEs in all species, as is typical of mammals. This is generally attributable to L1 elements. A large contribution of satellite DNA was noted in the P. subflavus genome (6%), as well as a considerable number of unidentifiable contigs across several repeat clusters.

Unlike most mammals, Class II content was consistently elevated throughout Vespertilionidae, with ∼3% or greater contribution to genome content in all five taxa. Class II elevation was not observed in the phyllostomid outgroup taxon, providing additional support to the hypothesis that vesper bats are unique within Chiroptera in their ability to tolerate and/or host DNA transposons (Thomas et al. 2011). At the very least, these data provide evidence that the surge of DNA transposon activity observed in Vespertilionidae arose following the divergence of Vespertilionidae and Phyllostomidae ∼56 Ma (Datzmann et al. 2010).

As noted by Pritham and Feschotte (2007) and Thomas et al. (2011), the Helitron superfamily is a prevalent component of the vesper bat TE landscape. Our data demonstrate that Helitrons were active during the early diversification of Vespertilionidae. Analyses suggest that two Helitrons had peak activity over 50 Ma, which would indicate activity in the common ancestor of Vespertilionidae and Phyllostomidae. However, these elements could not be recovered from the A. lituratus data. It should be noted that any elements with very low copy numbers (<1000) could be missed by our analyses. However, Thomas et al. (2011) failed to identify Helitron elements in Miniopteridae, suggesting that Helitron activity is indeed limited to Vespertilionidae. Assuming their hybridization and PCR-based results are accurate, this raises some issues regarding some of our activity period estimations. nHeliBat1_Cr and nHeliBat2_Ps were both estimated to have been active >50 Ma. Yet, the miniopterid divergence from Vespertilionidae is estimated to have occurred ∼43 Ma (49–38 Ma) (Miller-Butterworth et al. 2007), suggesting that these two families should be present in miniopterid genomes. The problem likely arises from attempting to apply an average mammalian mutation rate (2.2 × 10⁻⁹) to a wide range of taxa. Lack and Van Den Bussche (2010) noted that substitution rates in vesper bats are highly variable and that non-Myotis vespertilionids have consistently higher substitution rates. Thus, we might reasonably expect Perimyotis and Corynorhinus to exhibit inflated substitution rates, in which case applying the average rate would overestimate the ages of their elements. Calculating and applying lineage-specific rates to each taxon was beyond the scope of this study. However, future studies should incorporate such analyses. Future studies will also include samples from family Miniopteridae, which was recently elevated to the status of family and is more closely related to Vespertilionidae (Miller-Butterworth et al. 2007) and would therefore be appropriate for defining the limits of DNA transposon activity in these groups.

While Helitrons were active during the early stages of vesper bat diversification, other DNA transposon families have since invaded and been active in these genomes. For example, multiple hAT, piggyBac, and Tc1/Mariner elements, many of them novel to this study, exhibit activity profiles ranging from ∼8 to 30 Ma (table 3) (Ray et al. 2008). One striking observation is from the Mariner family. Age analysis suggests that Mariner2_Ml has been active the most recently, within the past 10 Myr in M. lucifugus.
However, BlastN analyses of the available data indicate that this element is present in all five vesper taxa, which might suggest instead that Mariner2_Ml was an older element with activity prior to the divergence ∼32 Ma. Class II TEs generally have a short period of activity in a genome before accumulating inactivating mutations (Brookfield 2005). Likewise, although Class I TEs persist over longer timespans, they accumulate mutations and diverge into different subfamilies (Cordaux et al. 2004). A possible explanation for Mariner2_Ml might be repeated reinvasion of vespertilionid genomes. However, at this time, we can only speculate.

Thus far, no evidence for elevated or recent Class II TE activity in bats has been found outside of the vesper lineage (Ray et al. 2008; Thomas et al. 2011). RNAi has been shown to specifically target TIRs to prevent transposon integration (Sijen and Plasterk 2003), but these defenses can be evaded when distinct subfamilies are present in low copy numbers (Plasterk 2002). The Class II TE expansion in M. lucifugus has been diverse, from Helitron and Tc1/mariner superfamilies to various subfamilies of hATs and piggyBacs (Ray et al. 2008). Similar findings of TE diversity for the taxa described here suggest that vesper bats in general are predisposed to accommodate invasion by novel TEs.

While the following suggestion is open to further study, the capacity of vespertilionid bats to harbor active DNA transposons may be linked to another feature of M. lucifugus. A BlastN query of the newly released 7× M. lucifugus WGS using multiple mammalian Piwi homologs (list available upon request) and a search of the Myotis Ensembl database suggest that only two Piwi homologs are present, PIWIL2 (ENSMLUG00000002115) and PIWIL4 (ENSMLUG00000002018). This lies in stark contrast to the presence of all four homologs in the WGS of the megabat, Pteropus vampyrus (ENSPVAG00000010030, ENSPVAG00000009878, ENSPVAG00000016875, and ENSPVAG00000007245).
Mammalian genomes are protected from TE integration in the germline by piRNA-mediated methylation (O'Donnell and Boeke 2007; Aravin et al. 2008; Obbard et al. 2009), and loss of a single Piwi homolog has been linked to upregulated transposition (Carmell et al. 2007). Additional work to determine if the PIWI homologs missing in M. lucifugus are also missing in other affected bats would be an avenue worth pursuing. Loss of Piwi homologs may help explain how TEs have managed to thrive in vesper bats. However, it raises an interesting question: are vesper bats more susceptible to invasion, or are they exposed to more potential invaders? It may be that Vespertilionidae is particularly susceptible to invasion by DNA transposons via their role as a host for a diverse array of parasites (Marinkelle and Grose 1972; Calisher et al. 2006; Wibbelt et al. 2010). Further research to identify patterns among bats may help answer these questions.

Several lineage-specific activity patterns were observed, suggesting differential activity in each lineage for particular transposon families and potential horizontal transfer events. As described above, at least two cases of potential horizontal transfer can be identified from these data. However, identifying horizontal transfer is dependent on overlaying the taxonomic distributions of TEs onto a well-established phylogeny. Vespertilionid phylogeny, unfortunately, has been rather intractable to both morphological and molecular data and is a well-known problem within the phylogenetic community (Stadelmann et al. 2007; Lack and Van Den Bussche 2010). Thus, while we suspect based on the most recent phylogenetic hypothesis presented by Lack and Van Den Bussche (2010) that both hAT1_Ml and piggyBac2_Ml have been transferred laterally into multiple vespertilionid genomes, we must be vigilant and work to generate a more robust phylogeny before making strong statements.
That being said, both hAT elements in general and piggyBac2_Ml in particular have been implicated in multiple horizontal transfers (Pace et al. 2008; Gilbert et al. 2010; Pagan et al. 2010).

Our initial interest in the vesper lineage was spurred by the elevated Class II activity in genus Myotis. However, the methods we describe allow for characterization of all TEs with relatively high copy numbers in a genome. Therefore, we could also note differences in Class I content. For example, the A. lituratus genome exhibited the lowest SINE content (3%) and the highest LINE contribution (15%). A much larger SINE-to-LINE ratio was observed in N. humeralis, which may suggest an adaptation in recent Ves subfamilies to more efficiently utilize the LINE enzymatic machinery in this taxon. Such a scenario is the opposite of that seen in the recent analysis of the orangutan genome, in which the primate SINE, Alu, has apparently lost its ability to efficiently mobilize (Locke et al. 2011). The autonomous/nonautonomous relationship suggests a possible interaction between LINEs and SINEs as they compete with one another for use of needed enzymatic machinery (Brookfield 2005; Le Rouzic and Capy 2006).

Our data indicate that the rise in Class II TE activity may have been accompanied by a decreased Class I TE genome contribution. Vesper bat genera Perimyotis and Myotis displayed the highest Class II content (5% and 4%) and the lowest Class I content (14% each). This trend is amplified when the phyllostomid bat is included (fig. 3, r = −0.85): its Class II content is at the low end of the spectrum, while its Class I content is the highest of the six taxa. However, while these results are suggestive of a trend, they still represent only six data points and should be taken with caution. Our investigation is the first step in isolating any potential links between elevated Class II TE activity and the evolution of vesper bats.
Variation in TE landscapes may be partially derived from population subdivision and genetic drift (Jurka et al. 2011). While the primate lineage has been examined extensively to elucidate the potential role of TEs in diversification, the focus was largely constrained to ancestrally derived Class I elements and remnants of extinct Class II TEs (Kim et al. 2004; Oliver and Greene 2009, 2011), although there are a few cases of recent Class II invasion (Gilbert et al. 2010). However, continued activity of both TE classes, combined with horizontal transfer and novel TE invasions, has furnished the vespertilionid family with a variety of elements with potential for facilitating species-specific adaptations.

Finally, we note that the methods described here are conceptually similar to those described in recent analyses of two snakes (Castoe et al. 2011) and multiple amphibian genomes (Sun et al. 2012). The major differences lie in the precise computational methods used and not in the type of data analyzed. This suggests a strong interest in the evolutionary biology community in investigating the dynamics of TEs in large samples of relatively closely related organisms.

Comparisons of mammalian TE landscapes have, until now, typically encompassed relatively diverse taxa. Inferences drawn from a limited sampling of genomes consisting mostly of model organisms are often broadly applied across taxa. This strategy is imposed primarily by the substantial costs of whole genome sequencing. However, the advent of next generation sequencing techniques has provided a leap forward in terms of gaining genome-level data (if not entire genome assemblies) for nonmodel organisms. Here, we have demonstrated the utility of survey sequencing for generating sufficient data for comparative analyses and descriptions of novel TEs and have gathered data suggesting an extensive history of Class II TE activity throughout a broader sample of Vespertilionidae.
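The Discussion's point about substitution rates and the >50 Ma Helitron estimates can be illustrated numerically. The sketch below assumes the common convention that an element's age is its divergence from the consensus divided by the substitution rate; the exact calculation used in the study is not shown in this section, so the function names and the 1.3-fold rate multiplier are hypothetical and chosen purely for illustration. The rate 2.2 × 10⁻⁹ is the average mammalian figure quoted in the text, assumed here to be per site per year.

```python
# Average mammalian rate quoted in the Discussion
# (assumed here to be substitutions per site per year).
MAMMAL_RATE = 2.2e-9


def age_myr(divergence, rate=MAMMAL_RATE):
    """Age in Myr implied by a divergence (substitutions per site)."""
    return divergence / rate / 1e6


def divergence_for_age(age_in_myr, rate=MAMMAL_RATE):
    """Divergence (substitutions per site) implied by an age in Myr."""
    return age_in_myr * 1e6 * rate


# Divergence that yields the 55 Ma estimate reported for nHeliBat1_Cr
# when the average mammalian rate is applied.
d = divergence_for_age(55.0)

# HYPOTHETICAL lineage-specific rate, 30% faster than the mammalian
# average; this multiplier is an illustration, not a measured value.
faster_rate = 1.3 * MAMMAL_RATE
rescaled_age = age_myr(d, faster_rate)

print(f"divergence = {d:.3f} substitutions/site")
print(f"age at average rate = {age_myr(d):.1f} Myr")
print(f"age at 1.3x rate    = {rescaled_age:.1f} Myr")
```

Under this assumed relationship, the same divergence maps to a younger age as the rate increases, so a modestly elevated lineage-specific rate would be enough to move an estimate of about 55 Ma to after the ∼43 Ma miniopterid divergence.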

52 in total

Review 1. Jumping genes and epigenetics: Towards new species.

Authors: Rita Rebollo; Béatrice Horard; Benjamin Hubert; Cristina Vieira
Journal: Gene Date: 2010-01-25 Impact factor: 3.688

2. Emerging diseases in Chiroptera: why bats?

Authors: Gudrun Wibbelt; Marianne S Moore; Tony Schountz; Christian C Voigt
Journal: Biol Lett Date: 2010-04-28 Impact factor: 3.703

3. The limited distribution of Helitrons to vesper bats supports horizontal transfer.

Authors: Jainy Thomas; Mehran Sorourian; David Ray; Robert J Baker; Ellen J Pritham
Journal: Gene Date: 2010-12-28 Impact factor: 3.688

4. Complete mitochondrial genome sequences of three bats species and whole genome mitochondrial analyses reveal patterns of codon bias and lend support to a basal split in Chiroptera.

Authors: P R Meganathan; Heidi J T Pagan; Eve S McCulloch; Richard D Stevens; David A Ray
Journal: Gene Date: 2011-10-24 Impact factor: 3.688

5. PiggyBac-ing on a primate genome: novel elements, recent activity and horizontal transfer.

Authors: Heidi J T Pagan; Jeremy D Smith; Robert M Hubley; David A Ray
Journal: Genome Biol Evol Date: 2010-07-12 Impact factor: 3.416

6. Genome sequence of the Brown Norway rat yields insights into mammalian evolution.

Authors: Richard A Gibbs; George M Weinstock; Michael L Metzker; Donna M Muzny; Erica J Sodergren; Steven Scherer; Graham Scott; David Steffen; Kim C Worley; Paula E Burch; Geoffrey Okwuonu; Sandra Hines; Lora Lewis; Christine DeRamo; Oliver Delgado; Shannon Dugan-Rocha; George Miner; Margaret Morgan; Alicia Hawes; Rachel Gill; Robert A Holt; Mark D Adams; Peter G Amanatides; Holly Baden-Tillson; Mary Barnstead; Soo Chin; Cheryl A Evans; Steve Ferriera; Carl Fosler; Anna Glodek; Zhiping Gu; Don Jennings; Cheryl L Kraft; Trixie Nguyen; Cynthia M Pfannkoch; Cynthia Sitter; Granger G Sutton; J Craig Venter; Trevor Woodage; Douglas Smith; Hong-Mei Lee; Erik Gustafson; Patrick Cahill; Arnold Kana; Lynn Doucette-Stamm; Keith Weinstock; Kim Fechtel; Robert B Weiss; Diane M Dunn; Eric D Green; Robert W Blakesley; Gerard G Bouffard; Pieter J De Jong; Kazutoyo Osoegawa; Baoli Zhu; Marco Marra; Jacqueline Schein; Ian Bosdet; Chris Fjell; Steven Jones; Martin Krzywinski; Carrie Mathewson; Asim Siddiqui; Natasja Wye; John McPherson; Shaying Zhao; Claire M Fraser; Jyoti Shetty; Sofiya Shatsman; Keita Geer; Yixin Chen; Sofyia Abramzon; William C Nierman; Paul H Havlak; Rui Chen; K James Durbin; Amy Egan; Yanru Ren; Xing-Zhi Song; Bingshan Li; Yue Liu; Xiang Qin; Simon Cawley; Kim C Worley; A J Cooney; Lisa M D'Souza; Kirt Martin; Jia Qian Wu; Manuel L Gonzalez-Garay; Andrew R Jackson; Kenneth J Kalafus; Michael P McLeod; Aleksandar Milosavljevic; Davinder Virk; Andrei Volkov; David A Wheeler; Zhengdong Zhang; Jeffrey A Bailey; Evan E Eichler; Eray Tuzun; Ewan Birney; Emmanuel Mongin; Abel Ureta-Vidal; Cara Woodwark; Evgeny Zdobnov; Peer Bork; Mikita Suyama; David Torrents; Marina Alexandersson; Barbara J Trask; Janet M Young; Hui Huang; Huajun Wang; Heming Xing; Sue Daniels; Darryl Gietzen; Jeanette Schmidt; Kristian Stevens; Ursula Vitt; Jim Wingrove; Francisco Camara; M Mar Albà; Josep F Abril; Roderic Guigo; Arian Smit; Inna Dubchak; Edward M Rubin; Olivier Couronne; Alexander 
Poliakov; Norbert Hübner; Detlev Ganten; Claudia Goesele; Oliver Hummel; Thomas Kreitler; Young-Ae Lee; Jan Monti; Herbert Schulz; Heike Zimdahl; Heinz Himmelbauer; Hans Lehrach; Howard J Jacob; Susan Bromberg; Jo Gullings-Handley; Michael I Jensen-Seaman; Anne E Kwitek; Jozef Lazar; Dean Pasko; Peter J Tonellato; Simon Twigger; Chris P Ponting; Jose M Duarte; Stephen Rice; Leo Goodstadt; Scott A Beatson; Richard D Emes; Eitan E Winter; Caleb Webber; Petra Brandt; Gerald Nyakatura; Margaret Adetobi; Francesca Chiaromonte; Laura Elnitski; Pallavi Eswara; Ross C Hardison; Minmei Hou; Diana Kolbe; Kateryna Makova; Webb Miller; Anton Nekrutenko; Cathy Riemer; Scott Schwartz; James Taylor; Shan Yang; Yi Zhang; Klaus Lindpaintner; T Dan Andrews; Mario Caccamo; Michele Clamp; Laura Clarke; Valerie Curwen; Richard Durbin; Eduardo Eyras; Stephen M Searle; Gregory M Cooper; Serafim Batzoglou; Michael Brudno; Arend Sidow; Eric A Stone; J Craig Venter; Bret A Payseur; Guillaume Bourque; Carlos López-Otín; Xose S Puente; Kushal Chakrabarti; Sourav Chatterji; Colin Dewey; Lior Pachter; Nicolas Bray; Von Bing Yap; Anat Caspi; Glenn Tesler; Pavel A Pevzner; David Haussler; Krishna M Roskin; Robert Baertsch; Hiram Clawson; Terrence S Furey; Angie S Hinrichs; Donna Karolchik; William J Kent; Kate R Rosenbloom; Heather Trumbower; Matt Weirauch; David N Cooper; Peter D Stenson; Bin Ma; Michael Brent; Manimozhiyan Arumugam; David Shteynberg; Richard R Copley; Martin S Taylor; Harold Riethman; Uma Mudunuri; Jane Peterson; Mark Guyer; Adam Felsenfeld; Susan Old; Stephen Mockrin; Francis Collins
Journal: Nature Date: 2004-04-01 Impact factor: 49.962

7. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice.

Authors: Alexei A Aravin; Ravi Sachidanandam; Deborah Bourc'his; Christopher Schaefer; Dubravka Pezic; Katalin Fejes Toth; Timothy Bestor; Gregory J Hannon
Journal: Mol Cell Date: 2008-09-26 Impact factor: 17.970

8. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.

Authors: Tarjei S Mikkelsen; Matthew J Wakefield; Bronwen Aken; Chris T Amemiya; Jean L Chang; Shannon Duke; Manuel Garber; Andrew J Gentles; Leo Goodstadt; Andreas Heger; Jerzy Jurka; Michael Kamal; Evan Mauceli; Stephen M J Searle; Ted Sharpe; Michelle L Baker; Mark A Batzer; Panayiotis V Benos; Katherine Belov; Michele Clamp; April Cook; James Cuff; Radhika Das; Lance Davidow; Janine E Deakin; Melissa J Fazzari; Jacob L Glass; Manfred Grabherr; John M Greally; Wanjun Gu; Timothy A Hore; Gavin A Huttley; Michael Kleber; Randy L Jirtle; Edda Koina; Jeannie T Lee; Shaun Mahony; Marco A Marra; Robert D Miller; Robert D Nicholls; Mayumi Oda; Anthony T Papenfuss; Zuly E Parra; David D Pollock; David A Ray; Jacqueline E Schein; Terence P Speed; Katherine Thompson; John L VandeBerg; Claire M Wade; Jerilyn A Walker; Paul D Waters; Caleb Webber; Jennifer R Weidman; Xiaohui Xie; Michael C Zody; Jennifer A Marshall Graves; Chris P Ponting; Matthew Breen; Paul B Samollow; Eric S Lander; Kerstin Lindblad-Toh
Journal: Nature Date: 2007-05-10 Impact factor: 49.962

9. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula.

Authors: Jirí Macas; Pavel Neumann; Alice Navrátilová
Journal: BMC Genomics Date: 2007-11-21 Impact factor: 3.969

10. Evaluation of next generation sequencing platforms for population targeted sequencing studies.

Authors: Olivier Harismendy; Pauline C Ng; Robert L Strausberg; Xiaoyun Wang; Timothy B Stockwell; Karen Y Beeson; Nicholas J Schork; Sarah S Murray; Eric J Topol; Samuel Levy; Kelly A Frazer
Journal: Genome Biol Date: 2009-03-27 Impact factor: 13.583
