Transposable Elements Contribute to the Adaptation of Arabidopsis thaliana.

Zi-Wen Li^1, Xing-Hui Hou^1,2, Jia-Fu Chen^1,2, Yong-Chao Xu^1,2, Qiong Wu^1, Josefa González^3, Ya-Long Guo^1,2.

Abstract

Transposable elements (TEs) are mobile genetic elements with very high mutation rates that play important roles in shaping genome architecture and regulating phenotypic variation. However, the extent to which TEs influence the adaptation of organisms in their natural habitats is largely unknown. Here, we scanned 201 representative resequenced genomes from the model plant Arabidopsis thaliana and identified 2,311 polymorphic TEs from noncentromeric regions. We found expansion and contraction of different types of TEs in different A. thaliana populations. More importantly, we identified two TE insertions that are likely candidates to play a role in adaptive evolution. Our results highlight the importance of variations in TEs for the adaptation of plants in general in the context of rapid global climate change.

Year: 2018 PMID: 30102348 PMCID: PMC6117151 DOI: 10.1093/gbe/evy171

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Transposable elements (TEs) represent an important source of genetic variation (McClintock 1984) and are highly dynamic in diverse species, such as Drosophila (Petrov et al. 2011; Kofler et al. 2015) and Arabidopsis thaliana-related species (Hu et al. 2011; Agren 2014; Quadrana et al. 2016; Stuart et al. 2016). TEs play crucial roles in shaping genomic architecture and phenotypic variation in diverse organisms (Finnegan 1989; Feschotte et al. 2002; Kazazian 2004; Lisch 2013). Besides their well-known effect on genome size, the presence or absence of TEs affects various biological processes (Chuong et al. 2017). For example, inserted TEs can contribute to the coding sequences of existing genes or even form new coding genes in the genome (Lin et al. 2007; Hoen and Bureau 2015; Joly-Lopez et al. 2016). In addition, inserted TEs can regulate the expression levels of genes through either cis- or trans-regulatory elements located within TE sequences or through epigenetic modifications induced by the insertion or deletion of TEs (Naito et al. 2009; Hollister et al. 2011; Lisch 2013; Seymour et al. 2014; Stuart et al. 2016; Wei and Cao 2016). In maize, a transposon insertion located between 58.7 and 69.5 kb upstream of the well-known domestication gene teosinte branched1 (tb1) acts as an enhancer of gene expression, which partially explains the increased apical dominance in maize compared with its progenitor (Studer et al. 2011). In melon, a transposon insertion located at the 3′ downstream of CmWIP1 induced epigenetic changes, thereby regulating sex determination (Martin et al. 2009). In oil palm, loss of methylation on the Karma transposon in the intron of DEFICIENS contributed to the origin of a mantled somaclonal variant (Ong-Abdullah et al. 2015). In peppered moth, the industrial melanism mutation was induced by a transposon insertion at the first intron of cortex, which increased its expression level and induced melanization (Van't Hof et al. 2016). 
In Drosophila melanogaster, the activation of TEs is a contributing factor to ageing (Wood et al. 2016). More importantly, TEs might function as agents of rapid adaptation, because they can rapidly create genetic diversity (Schrader et al. 2014; Stapley et al. 2015). Overall, TEs have emerged as important “functional elements” in the genomes of diverse organisms.

Despite the importance of TEs in shaping genome architecture and phenotypic variation, the TEs that are under positive selection in natural plant and animal populations are largely unknown. Moreover, it is important to determine which genes are regulated by TE insertions in ways that facilitate the rapid adaptation of organisms to global climate change (Rey et al. 2016). The answer to this question is largely unknown, except in D. melanogaster, in which some TEs were found to be candidate adaptive TEs based on frequency variation among populations (González et al. 2008, 2010) and/or on functional validation (Daborn 2002; Aminetzach et al. 2005; Schmidt et al. 2010; Magwire et al. 2011; Guio et al. 2014; Mateo et al. 2014; Ullastres et al. 2015; Merenciano et al. 2016).

In this study, to explore the effect of TEs on adaptation at the whole-genome level, we investigated natural populations of A. thaliana. TEs in this model plant have been annotated in detail, many accessions have been sequenced, and the species originated in Europe and northern Africa and adapted to new climates as it expanded eastward to Eastern Asia (Cao et al. 2011; Long et al. 2013; Schmitz et al. 2013; The 1001 Genomes Consortium 2016; Durvasula et al. 2017; Zou et al. 2017). To investigate the effect of TEs on adaptation in natural populations, we identified both reference and nonreference TE insertions using the read pair method. Note that nonreference TE insertions have been taken into account in recent studies (Quadrana et al. 2016; Stuart et al. 2016). We identified 2,311 polymorphic TEs from 201 representative A. thaliana accessions collected worldwide, and found differential expansion and contraction of diverse types of TEs in different populations. We identified two TE insertions that are likely candidates to play a role in adaptive evolution. Overall, this study highlights the potential effects of TEs on the adaptive evolution of A. thaliana in nature.

Materials and Methods

Cluster Analysis of Accessions

Raw paired-end reads of 201 accessions were used in this study, including 118 representative accessions from Europe, Central Asia, North America, and Japan, which were extracted from the 1001 Genomes Project (http://1001genomes.org/) (Cao et al. 2011; Long et al. 2013; Schmitz et al. 2013; The 1001 Genomes Consortium 2016), as well as 83 accessions from our own resequencing project (Zou et al. 2017), including 24 accessions from Northwestern China, and 59 accessions from the Yangtze River basin (supplementary table S1, Supplementary Material online). The raw reads were processed and then mapped to the A. thaliana reference genome (TAIR10) using BWA (Li and Durbin 2009) with default parameters. Single nucleotide polymorphisms (SNPs) were called using the Genome Analysis Toolkit (GATK) flowchart (DePristo et al. 2011) with a quality value of 25 as the threshold. Only biallelic SNP sites with minor allele frequencies greater than 0.05 were used in the principal component analysis (PCA) with EIGENSOFT (version 4.2) (Price et al. 2006). Accessions located between major clusters were filtered out.
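The SNP filter described above (biallelic sites with a minor allele frequency greater than 0.05) can be illustrated with a minimal sketch. The input layout, one string of homozygous allele calls per site with `N` for missing data, and the function names are assumptions made for this example; they are not the file formats or tools used in the study.

```python
from collections import Counter

def minor_allele_frequency(calls):
    """Return the MAF of a biallelic site, or None if the site is not biallelic.

    `calls` holds one allele per accession (hypothetical encoding);
    'N' marks missing data and is ignored.
    """
    counts = Counter(c for c in calls if c != "N")
    if len(counts) != 2:
        return None  # monomorphic or multiallelic site
    total = sum(counts.values())
    return min(counts.values()) / total

def filter_sites(sites, maf_threshold=0.05):
    """Keep (position, calls) pairs that are biallelic with MAF > threshold."""
    kept = []
    for position, calls in sites:
        maf = minor_allele_frequency(calls)
        if maf is not None and maf > maf_threshold:
            kept.append((position, calls))
    return kept
```

Note that a site with a MAF of exactly 0.05 is dropped here, because the text specifies frequencies greater than 0.05.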

Identifying Polymorphic TE Sites in Populations

Genome resequencing data and the A. thaliana reference genome (Col-0 accession, TAIR10) were used to identify polymorphic TE sites in the A. thaliana populations. Polymorphic TE sites are TE loci at which some accessions harbor a TE insertion and others do not. A method based on paired-end reads was used to identify nonreference and reference TE insertions; paired-end reads are often used to identify polymorphic TEs (Platzer et al. 2012; Kofler et al. 2015). To increase the accuracy of identification, mapping direction information was integrated into the identification process as previously described (Platzer et al. 2012): if the mapped read is reverse complemented, its direction is backward; if not, its direction is forward (fig. 1). Mapped reads at the left and right sides of each TE site should have the same orientation (a forward-reverse arrangement in the mapping result) to ensure that the detected presence/absence of TEs did not result from a chromosome inversion event.
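The direction rule can be sketched with the SAM flag field, in which bit 0x10 marks a read mapped as its reverse complement. The reading of the "forward-reverse arrangement" used below (supporting reads on the left flank are forward and those on the right flank are backward) is an assumption made for this example, as are the function names; the authors' implementation may differ.

```python
REVERSE_FLAG = 0x10  # SAM flag bit: read is mapped as its reverse complement

def direction(sam_flag):
    """Return 'backward' for a reverse-complemented read, else 'forward'."""
    return "backward" if sam_flag & REVERSE_FLAG else "forward"

def supports_te_site(left_flags, right_flags):
    """Check the assumed forward-reverse arrangement around a candidate site.

    `left_flags` and `right_flags` are the SAM flags of the anchoring reads
    on the left and right side of the candidate TE site (hypothetical inputs).
    """
    if not left_flags or not right_flags:
        return False
    left_ok = all(direction(f) == "forward" for f in left_flags)
    right_ok = all(direction(f) == "backward" for f in right_flags)
    return left_ok and right_ok
```

A candidate whose flanking reads show the opposite arrangement would be rejected by this check, which is the kind of signal expected from an inversion rather than a TE presence/absence polymorphism.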

Fig. 1. Identification of polymorphic reference and nonreference TE sites. (A) Diagram of polymorphic TE sites. (B) Diagram of the polymorphic TE identification method. (C) Flowchart of the polymorphic TE detection method.

Three steps were used to detect nonreference TE insertions. First, read pairs with a mapping distance >1 kb, in which one read mapped uniquely to a non-TE region and the other mapped to an annotated TE sequence similar to the predicted TE insertion sequence in the reference genome, were extracted from the mapping results for each accession. Information about the families of annotated TEs was extracted from the TAIR10 annotation and used to predict the family types of the inserted TEs. Second, abnormally mapped reads located far from each other but within a certain range (the length of one read plus twice the designated insertion size between paired reads) were used to set the insertion range of a polymorphic TE candidate (supplementary fig. S1A, Supplementary Material online). The designated insertion size between paired reads was 300 bp for accessions from our own project and 50 bp for accessions from the 1001 Genomes Project. The read length in both the 1001 Genomes Project and our own project was 100 bp. Third, candidates from accessions in a population were merged when they overlapped (>1 bp). During this step, candidate polymorphic TEs from different accessions were integrated at the population level. During the merging process, the number of reads supporting a candidate polymorphic TE in the population was taken as the average number of reads supporting this insertion across all accessions of the population. Candidate polymorphic TEs supported by more than one read pair per accession were identified as the raw data of polymorphic TEs. Furthermore, polymorphic TEs supported by at least three reads per accession were used in the analysis.

Similar to the detection of nonreference TE insertions, a read pair with the following features was considered to represent a candidate polymorphic reference TE: 1) the read pair has a mapping distance greater than 1 kb and less than 10 kb (90% of annotated TEs in the reference genome are shorter than 10 kb); 2) both reads of the pair map uniquely to non-TE regions of the reference genome; 3) annotated TEs are present between the mapped positions of the two reads in the reference genome. Read pairs separated by less than a certain length (the designated insertion size between paired reads) were merged as candidate polymorphic TEs identified by the reference TE insertion method (supplementary fig. S1A, Supplementary Material online). Finally, candidate polymorphic nonreference and reference TEs were integrated into the polymorphic TE data set.
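The read pair criteria and the insertion range can be summarized in a short sketch. The `ReadPair` fields, the function names, and the choice to measure the range from the first read of each cluster are assumptions made for illustration; only the thresholds (1 kb, 10 kb, and read length plus twice the insertion size) come from the text.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    distance: int               # mapping distance between the two reads (bp)
    unique_non_te_reads: int    # reads of the pair mapped uniquely to non-TE sequence (0-2)
    mate_in_annotated_te: bool  # the other read maps to an annotated TE
    te_between_reads: bool      # an annotated TE lies between the two reads

def classify(pair):
    """Label a read pair as nonreference or reference evidence, else None."""
    if pair.distance > 1_000 and pair.unique_non_te_reads == 1 and pair.mate_in_annotated_te:
        return "nonreference"
    if 1_000 < pair.distance < 10_000 and pair.unique_non_te_reads == 2 and pair.te_between_reads:
        return "reference"
    return None

def insertion_window(read_length, insertion_size):
    """Range used to group abnormally mapped reads into one candidate."""
    return read_length + 2 * insertion_size

def cluster_positions(positions, window):
    """Group anchor positions lying within `window` bp of the cluster start."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][0] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters
```

With 100 bp reads, the window is 700 bp for the 300 bp libraries and 200 bp for the 50 bp libraries.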

Evaluating the Sensitivity and False Discovery Rate of Our Method

To estimate the detection power of our method, the reference accession Col-0 was resequenced using Illumina HiSeq 2000 with 100 bp paired-end reads, and the reads were remapped to modified reference genomes. In detail, to evaluate the nonreference TE insertion method, annotated, nonoverlapping TE sequences (if overlapping, the longest one was used) in the A. thaliana reference genome were deleted from their original regions and moved to the 3′ end of the chromosome on which they were located (supplementary fig. S1B, Supplementary Material online). This modified TE-deleted genome sequence (Col-0-noTEs) was the same as Col-0, except for the locations of TEs. Because the resequenced Col-0 reads derive from the real reference genome, with the annotated TEs in their original positions, remapping these reads to the Col-0-noTEs genome could uncover “nonreference TE insertions” that were deleted from the real reference genome, thereby allowing the detection efficiency of nonreference TE insertions to be calculated. For reference TE insertions, the Col-0-moreTEs genome was created by randomly inserting all TEs annotated in TAIR10 into the Col-0 genome and subjecting it to the same remapping strategy. Centromeric regions of A. thaliana chromosomes were defined according to a previous study (Ziolkowski et al. 2009). The TE sites identified in the populations were validated by PCR using 20 randomly chosen TE sites (see supplementary table S2, Supplementary Material online for information about the 20 TEs and 12 accessions used for validation). Primers were designed to span the predicted TE sites (supplementary table S6, Supplementary Material online). In addition, for TE deletion sites, the PCR product of one randomly chosen accession per TE site was sequenced to confirm the TE deletion.
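The construction of a TE-deleted chromosome can be sketched as follows. This is a toy illustration on a plain string with half-open `(start, end)` coordinates; the function names and the greedy rule for resolving overlaps are assumptions for this example, not the authors' code.

```python
def select_nonoverlapping(tes):
    """Keep the longest interval of each overlapping group.

    `tes` is a list of half-open (start, end) intervals. Intervals are
    visited from longest to shortest, and one is kept only if it does not
    overlap an interval that was already kept.
    """
    kept = []
    for start, end in sorted(tes, key=lambda t: t[1] - t[0], reverse=True):
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))
    return sorted(kept)

def relocate_tes(chromosome, tes):
    """Delete TE sequences in place and append them to the 3' end."""
    backbone, moved, cursor = [], [], 0
    for start, end in select_nonoverlapping(tes):
        backbone.append(chromosome[cursor:start])  # sequence before this TE
        moved.append(chromosome[start:end])        # the TE itself
        cursor = end
    backbone.append(chromosome[cursor:])           # sequence after the last TE
    return "".join(backbone) + "".join(moved)
```

The output has the same length and TE content as the input, so reads from the unmodified genome that span a deleted position behave like reads from an accession carrying a nonreference insertion there.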

Detection of Adaptive TE Insertions

For the iHS (integrated haplotype score) analysis, biallelic homozygous SNP sites and polymorphic TE loci with minor allele frequencies greater than 0.05 were used to calculate iHS using the selscan program (Szpiech and Hernandez 2014). The absolute value of iHS (|iHS|) was used to detect selective sweeps. A significantly high |iHS| value indicates that a TE locus is located in a selective sweep region. To estimate the significance of observed iHS values, we computed iHS values in permuted data (100 permutations) for TEs with the top 5% highest |iHS| values. At each TE locus and in each permutation, a number of accessions equal to the original number of accessions carrying the TE insertion was randomly sampled without replacement from the population; these accessions were treated as carrying the TE insertion allele, and all other accessions were treated as carrying the non-TE allele. These permutations were performed 100 times for each TE locus in popE, popN, and popY, respectively. After permutation, TEs with observed |iHS| values higher than the permuted values were further analyzed. The fTE statistic was estimated according to the method reported in a previous study (González et al. 2008). Permutation analyses on fTE were also performed for each polymorphic TE locus in each population (100 permutations). A significantly low fTE value is an indicator of positive selection on TE insertion alleles. TEs with significantly low fTE values were considered putative candidates. Finally, iHS was also estimated for all SNPs in the 20 kb regions flanking the candidate adaptive TEs. The extended haplotype homozygosity (EHH) statistic was computed with the rehh package based on the polymorphism data set used in the iHS calculation.
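The permutation scheme can be illustrated for fTE, which the Results define as πTE / (πTE + πnon-TE) in the 2 kb region around each TE site. The sketch below is a minimal version under stated assumptions: haplotypes are equal-length strings, nucleotide diversity is the mean pairwise proportion of differing sites, and all function names are hypothetical.

```python
import random
from itertools import combinations

def nucleotide_diversity(haplotypes):
    """Mean pairwise proportion of differing sites among aligned sequences."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    length = len(haplotypes[0])
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)

def f_te(haplotypes, carriers):
    """pi_TE / (pi_TE + pi_nonTE); `carriers` is a set of accession indices."""
    te = [h for i, h in enumerate(haplotypes) if i in carriers]
    non_te = [h for i, h in enumerate(haplotypes) if i not in carriers]
    pi_te, pi_non = nucleotide_diversity(te), nucleotide_diversity(non_te)
    total = pi_te + pi_non
    return pi_te / total if total > 0 else float("nan")

def permuted_f_te(haplotypes, n_carriers, n_permutations=100, seed=0):
    """f_TE for random carrier sets drawn without replacement."""
    rng = random.Random(seed)
    indices = range(len(haplotypes))
    return [
        f_te(haplotypes, set(rng.sample(indices, n_carriers)))
        for _ in range(n_permutations)
    ]

def is_significantly_low(observed, permuted):
    """True if the observed value is lower than every permuted value."""
    return all(observed < value for value in permuted)
```

The same sampling scheme applies to |iHS|, with the comparison reversed so that the observed value must exceed the permuted values.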

Statistical Analysis

All statistical analyses were performed using R v3.1.3 (R Core Team 2014). All P-values in multiple testing were adjusted using the “fdr” option of the “p.adjust” function in R (Benjamini and Hochberg 1995).
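For readers working outside R, the Benjamini and Hochberg adjustment applied by the “fdr” option can be written in a few lines. This Python sketch is an illustrative reimplementation of the published procedure, not the code used in the study; the function name is hypothetical.

```python
def adjust_fdr(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    # Visit P-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Each P-value is scaled by n divided by its ascending rank, then capped at 1 and at the adjusted value of the next larger P-value.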

Results

Identification of Polymorphic TEs in Various Populations

We used paired-end reads to identify polymorphic TE sites, including both reference and nonreference polymorphic TEs, based on the inconsistency between the mapping distances and the insertion sizes of read pairs (fig. 1; supplementary fig. S1A, Supplementary Material online). Nonreference TE insertions are TEs present in at least one accession that do not exist in the reference genome at the syntenic region. Reference TE insertions are TE insertions that exist in the reference genome but are absent in at least one other accession (fig. 1). The modified reference genome sequence (Col-0), with TE sequences removed from their original positions in the reference genome (supplementary fig. S1B, Supplementary Material online), was used to evaluate the detection power of our method. The use of mapping direction information efficiently reduced the false discovery rate (FDR; from 11.26% to 1.39%) while roughly maintaining the sensitivity of detection (from 90.59% to 89.05%; table 1). In addition, given that TEs are enriched in centromeric regions, TEs identified in these regions likely have a much higher FDR due to the difficulty of mapping short reads to such regions. Consistently, TEs detected across the whole genome, including centromeric regions, had a higher FDR (1.39% vs 1.04%) and a lower sensitivity (89.05% vs 90.45%) than TEs detected in the modified genome without centromeric regions (table 1). Furthermore, TEs located outside of the centromeres that are supported by at least three reads (on average) across all accessions had an even lower FDR (0.93%; table 1). Therefore, in subsequent analyses, we focused on the polymorphic TEs outside of centromeres (fig. 1).

Table 1

Sensitivity and FDR of Various Identification Methods

Identification Method	Number of Identified TEs	Sensitivity (%)	FDR (%)
Raw data	10,910	90.59	11.26
Filtering with direction information	9,608	89.05	1.39
Filtering with direction information in noncentromeric region	8,883	90.45	1.04
Filtering with direction information in noncentromeric region and at least three reads	8,853	90.32	0.93

Given that the estimated sensitivity of our method is approximately 90%, approximately 10% of the TEs were not detected. These “missing” TEs appear to share common characteristics: most are 100 bp or shorter, and their distances from nearby TEs are frequently less than 1 kb (supplementary fig. S2A and B, Supplementary Material online). Consistently, among different TE families, the proportions of the TEs identified ranged from 42% to 98% (supplementary fig. S2C, Supplementary Material online). For example, for TE families RathE1, RathE2, and RathE3, only half of the total TEs were detected, largely due to their shorter lengths or their location too close to nearby TEs (supplementary fig. S2D and E, Supplementary Material online). Therefore, TEs longer than 100 bp and more than 1 kb away from other TEs could be identified by our method.

Polymorphic TEs in Three A. thaliana Populations

We used resequenced genomes of 201 representative accessions, mainly from Eurasia (supplementary fig. S3 and table S1, Supplementary Material online). The resequencing depths of these accessions were all greater than 15×, and those of 178 accessions were greater than 20×. PCA based on SNPs revealed that accessions from the Yangtze River basin clustered into an independent group; in contrast, accessions from Northwestern China formed a cluster with several Central Asian accessions that joined the cluster of European accessions (fig. 2). Eleven accessions that were roughly isolated from the three main clusters were excluded from subsequent analysis (fig. 2, accession names shown in red). We ultimately selected 191 accessions for polymorphic TE analysis: 59 from the Yangtze River basin, 24 from Northwestern China, and 108 (mainly) from Europe, including the Col-0 reference genome. These were considered to form three large populations (hereafter referred to as popY, popN, and popE, respectively).

Fig. 2. PCA based on SNP sites in 201 accessions. Accessions filtered out of the three populations are marked with accession names. popE, popN, and popY represent accessions mainly from Europe, accessions from Northwestern China, and accessions from the Yangtze River basin, respectively.

Using the paired-end method and excluding TEs present in only one accession or supported by only one paired-end read per accession, we identified a total of 4,305 polymorphic TE loci in the 3 populations (table 2), including 3,856 in noncentromeric regions, 2,311 of which were supported by at least 3 read pairs per accession. Permutation analyses revealed that the number of detected polymorphic TEs does not increase linearly with the number of accessions: 95 randomly selected accessions contain nearly 90% of all the polymorphic TEs (supplementary fig. S4A, Supplementary Material online). Therefore, the number of TEs tends to reach saturation rather than grow continually as the number of accessions increases. Validation of 20 identified TE loci by PCR using 12 representative accessions suggested that the FDR of the predicted TE presence/absence events for the 2,311 polymorphic TE loci was 5.42% (supplementary table S2, Supplementary Material online).

Table 2

The Numbers of TE Loci in the Three Populations

	popE	popN	popY	Three Populations
Raw TE loci	3,930 (2,046/1,884)	2,891 (1,258/1,633)	2,932 (1,324/1,608)	4,305 (2,421/1,884)
TE loci in noncentromeric region	3,504 (1,756/1,748)	2,556 (1,052/1,504)	2,556 (1,077/1,479)	3,856 (2,108/1,748)
TE loci in noncentromeric region with at least three reads per accession on average	2,064 (1,092/972)	1,434 (704/730)	1,403 (700/703)	2,311 (1,339/972)
Inserted TEs	101,085 (16,443/84,642)	19,684 (7,910/11,774)	47,869 (20,282/27,587)	168,638 (44,635/124,003)
Inserted TEs per accession (95% confidence intervals)	944.7±7.7 (153.7±6.9/791.0±12.4)	820.2±5.5 (329.6±5.0/490.6±4.9)	811.3±2.9 (343.8±2.3/467.6±3.1)	887.6±10.3 (234.9±13.8/652.6±23.5)
Population-specific TE loci	618 (450/168)	81 (81/0)	69 (69/0)	768 (600/168)

The first number in parentheses is the number of polymorphic TEs identified by nonreference TE insertion, and the second is the number of polymorphic TEs identified by reference TE insertion.

One measure of the success of our identification method is that the allele frequency distribution was U-shaped (fig. 3). The allele frequency distribution of polymorphic TE sites at the genome level is usually U-shaped (Kofler et al. 2015), whereas the allele frequency distribution obtained in the present study for popE when taking into account only the nonreference TE insertions or only the reference TE insertions is L-shaped (supplementary fig. S5A and B, Supplementary Material online). The abnormal frequency distribution of TEs in popE may largely result from the different genetic distances between the studied accessions and the reference accession. This bias became stronger with increasing genetic distance between the studied accessions and the reference accession, especially for accessions within popE, which are closely related to the reference genome (Spearman’s rank correlation coefficient [τ] = 0.66, P < 1.26 × 10^−14 for nonreference TE insertions; supplementary fig. S6A, Supplementary Material online; τ = −0.43, P < 4.54 × 10^−6 for reference TE insertions; supplementary fig. S6B, Supplementary Material online). However, after merging nonreference and reference polymorphic TEs, there was no significant correlation between genetic distances and the number of polymorphic TE insertions (τ = −0.11, P = 0.26) (supplementary fig. S6C, Supplementary Material online), and the frequency distribution of polymorphic TEs in popE was U-shaped (supplementary fig. S5C, Supplementary Material online). Overall, combining results from nonreference and reference TE insertions is a much more robust way to reveal the evolutionary pattern of TEs than using polymorphic TEs identified only by nonreference or only by reference TE insertions.

Fig. 3. Frequency and distribution of polymorphic TEs in A. thaliana populations. (A) Frequency of polymorphic TEs in A. thaliana. (B) Distribution of polymorphic TEs in the genome. (C) Frequency of polymorphic TEs in different genomic regions. (D) Composition of polymorphic TEs in different populations.

Among the 2,311 loci, 1,339 loci were nonreference TE insertions and the other 972 loci were reference TE insertions. We identified 2,064, 1,434, and 1,403 polymorphic TE loci in popE, popN, and popY, respectively. In addition, 1,047 TE loci were polymorphic in all 3 populations. In contrast, we identified 618, 81, and 69 population-specific TE loci in popE, popN, and popY, respectively. Thus, popE had the largest number of polymorphic TEs and population-specific TEs. After the effect of sample size was ruled out via a permutation test, popE still had the largest number of population-specific TEs (supplementary fig. S4B, Supplementary Material online), and also the largest number of inserted TEs (table 2; supplementary fig. S4C and D, Supplementary Material online).

Most TEs are located in intergenic regions at both the species level and the population level, but nearly 20% of TEs are located in genic regions, in either coding sequences or introns (fig. 3). Coding regions (CDS) of 242 genes contain 245 polymorphic TE insertions (supplementary table S3, Supplementary Material online). The functions of these genes are enriched in defense response (GO enrichment analysis, FDR = 0.000075) and immune response (FDR = 0.015). Polymorphic TEs inserted in CDS regions and untranslated regions (UTRs) were significantly biased toward low frequencies (frequency ≤0.1) compared with TEs in intergenic regions (fig. 3; Fisher’s exact test, multiple testing corrected P = 0.0013 for TEs in CDS regions and 0.0063 for TEs in UTRs). These results indicate that the spread of TE insertions in CDS regions and UTRs is constrained by purifying selection.

Of the 2,311 polymorphic TEs, 1,445 could be classified into specific TE types. The proportion of DNA-type TEs (35.8%) is significantly higher than that of Helitron-type TEs (26.0%; χ² test, P = 1.36 × 10^−8) and LTR-type (long terminal repeat) TEs (30.7%; P = 0.0039). Furthermore, popE, popN, and popY are roughly consistent in the composition of polymorphic TE types, as are the polymorphic TEs shared among the three populations (fig. 3; supplementary table S4, Supplementary Material online). However, the compositions of population-specific TEs differed among populations. DNA-type and LTR-type TEs (37.9% and 39.4%, respectively) are enriched in popE-specific polymorphic TEs, whereas Helitron-type TEs are the major components of popN- and popY-specific polymorphic TEs (42.2% and 48.3%, respectively; fig. 3). Overall, these results suggest that as A. thaliana spread across the world, the expansion and contraction of different types of TEs differed among A. thaliana populations.

Detecting Adaptive TE Insertions

Given the functional importance of TEs, TE loci with a selective advantage in specific environments could spread in a population and speed up the adaptation of an organism. We aimed to identify TEs that might have contributed to the adaptation of A. thaliana as it expanded out of Europe (The 1001 Genomes Consortium 2016; Zou et al. 2017). We screened for adaptive TEs in each population (popE, popN, popY) in two steps. First, we identified adaptive TE candidates located in selective sweep regions using the integrated haplotype score (iHS statistic) and the nucleotide diversity of TE insertion alleles compared with that of the background genome (fTE statistic; Voight et al. 2006; González et al. 2008). Second, we assessed whether the adaptive TE candidates were the actual targets of positive selection by comparing the iHS values of the adaptive TE candidates with those of the surrounding SNP sites in 20 kb regions (fig. 4). iHS is commonly used to detect genetic loci under positive selection (Voight et al. 2006; Colonna et al. 2014; Nedelec et al. 2016), as well as to identify adaptive TEs (González et al. 2008). Here, we performed iHS analysis in each population (popE, popN, and popY) on polymorphic TE sites and genome-wide SNP sites with minor allele frequency (MAF) larger than 0.05. We identified 49, 46, and 38 TE sites with the top 5% highest |iHS| values among polymorphic TEs in popE (|iHS| threshold of 2.33), popN (2.47), and popY (1.94), respectively (fig. 4). To quantify the significance of the top 5% highest |iHS| values, permutations were performed 100 times at each of the above TE sites. Consequently, 49, 17, and 31 TE sites with observed |iHS| values larger than the permuted values remained significant in popE, popN, and popY, respectively (fig. 4).

Fig. 4. Identification of adaptive TEs. (A) Flowchart of the adaptive TE detection method. (B) The absolute values of integrated haplotype scores (|iHS|) at polymorphic TE sites in popY, popN, and popE, respectively. Each point indicates a polymorphic TE site (MAF>0.05). The green line represents the threshold of the top 5% highest |iHS| values in each population. Red points are polymorphic TE sites with significantly high |iHS| values in the permutation analysis, whereas polymorphic TE sites with nonsignificant |iHS| values are marked as black points. The arrows indicate the two adaptive TEs. (C, D) |iHS| values of TE_1 and TE_2 (the red points) and SNP sites (the blue points) in 20 kb flanking regions, respectively. Gene models and the TE locus correspond to the genome positions used in the |iHS| plot. (E, F) Extended haplotype homozygosity (EHH) in adaptive TEs and their flanking regions.

Genome regions under positive selection usually have lower nucleotide polymorphism than the background genome. To confirm the adaptive TE candidates identified by the iHS method, we computed the ratio of the nucleotide diversity of TE insertion accessions (πTE, 2 kb around the predicted insertion site, namely, 1 kb upstream and 1 kb downstream) to the total nucleotide diversity of all the accessions (πTE + πnon-TE, in the same 2 kb region) for each polymorphic TE locus in each population (fTE statistic). Permutation analyses on fTE values were performed according to the same method used for |iHS| permutation, at each TE site in each population (see Materials and Methods for details). TE sites with fTE values lower than their 100 permuted values were considered significant, as they are more likely to be located in genome regions affected by positive selection. Among TE sites with significantly high |iHS| values, 33, 7, and 13 TE sites have significantly low fTE values in popE, popN, and popY, respectively. Adaptive TE candidates with low fTE values have low nucleotide diversity in TE insertion alleles, which suggests that the targets of positive selection may be the TE insertion alleles. Thus, overall we identified 33, 7, and 13 adaptive TE candidates in the 3 populations, respectively (supplementary table S5, Supplementary Material online).

To further confirm that TEs identified as adaptive candidates are the targets of positive selection, we screened |iHS| values of SNP sites in the 20 kb region surrounding each TE (10 kb upstream and 10 kb downstream of each TE). Finally, 2 adaptive TE candidates have the highest |iHS| values in their flanking 20 kb regions (fig. 4; table 3). The two adaptive TEs show higher haplotype homozygosity in TE insertion alleles than in alleles without the TEs (fig. 4). These two adaptive TE candidates are more likely to be the actual targets of positive selection in the tested populations.
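The final screening step, in which a TE candidate is retained only if its |iHS| exceeds that of every SNP within 10 kb on either side, amounts to a simple comparison. The sketch below uses the position and |iHS| of TE_1 from table 3; the SNP positions and scores are invented for illustration, and the function name is hypothetical.

```python
def te_has_highest_ihs(te_position, te_abs_ihs, snps, flank=10_000):
    """True if the TE |iHS| exceeds that of all SNPs within `flank` bp.

    `snps` is an iterable of (position, iHS) pairs on the same chromosome.
    """
    flanking = [abs(ihs) for pos, ihs in snps if abs(pos - te_position) <= flank]
    return all(te_abs_ihs > value for value in flanking)

# TE_1 (Chr5: 14270878, |iHS| = 3.29) with made-up flanking SNPs.
example_snps = [(14265000, -2.1), (14275000, 1.8), (14300000, 4.0)]
print(te_has_highest_ihs(14270878, 3.29, example_snps))
```

The SNP at position 14300000 lies outside the 20 kb region and is ignored, so the call returns True for this invented data.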

Table 3

List of Two Adaptive TEs

Adaptive TE	Predicted Central Position of TE	Upstream Gene ID	Upstream Gene Annotation	Related TE ID	Related TE Annotation	Downstream Gene ID	Downstream Gene Annotation	TE Insertion Frequency in popE/N/Y	|iHS|	fTE
TE_1	Chr5: 14270878	AT5G36220	CYP81D1	AT5TE51000	LINE element	AT5G36228	Nucleic acid-binding protein	76/1/5	3.29	0.036
TE_2	Chr5: 21851755	AT5G53800	Nucleic acid-binding protein	AT5TE78710 in the intron of AT5G53810	Copia element; AT5G53810 is an O-methyltransferase family protein.	AT5G53820	Late embryogenesis abundant protein (LEA) family protein	101/7/0	3.60	0

Discussion

TEs are a major source of genomic mutations, which, like any environmental mutagen, occasionally lead to beneficial changes (Lynch 2007). After its discovery in maize (McClintock 1950), TEs have been investigated comprehensively, including their identification and classification (Lisch 2013), evolutionary dynamics (Petrov et al. 2011; González and Petrov 2012; Agren 2014; Barrón et al. 2014; Bousios et al. 2016; Pietzenuk et al. 2016), and structural and functional effects (Kazazian 2004; Rey et al. 2016; Wei and Cao 2016). From an evolutionary perspective, it is important to study two aspects of TEs, that is, their evolutionary dynamics in the genome and the contribution of TEs to adaptation in the context of global climate change. To address these questions, we need to identify TEs based on resequencing data sets from natural populations. Most previous studies of TEs in populations have only focused on polymorphic TEs absent from the reference genome and have ignored reference TE insertions, thereby failing to address a significant portion of TE polymorphisms. In this study, we merged data from reference and nonreference TE insertions, and demonstrated that combining nonreference and reference TE insertions is a more robust way to reveal the evolutionary pattern of TEs. As TEs represent an important source of genetic variation, they can contribute to the evolution of an organism in diverse ways, such as acquiring coding ability and altering the coding sequence (Cowan et al. 2005; Joly-Lopez et al. 2012; Sun et al. 2014), and the expression level of a gene (Kobayashi et al. 2004; González et al. 2009; Butelli et al. 2012). More importantly, TE mutations could affect adaptation to the environment (Casacuberta and González 2013; Quadrana et al. 2016; Van't Hof et al. 2016), and TE mutations with beneficial effects on adaptation in natural populations could become fixed. 
In this study, we found that at least two of 2,311 TE loci are likely to be targets of positive selection, and thus have contributed to the adaptation of A. thaliana. Overall, our findings highlight that TEs could play important roles in the adaptation of organisms to global climate change.

64 in total

Review 1. The effect of transposable elements on phenotypic variation: insights from plants to humans.

Authors: Liya Wei; Xiaofeng Cao
Journal: Sci China Life Sci Date: 2016-01-11 Impact factor: 6.038

2. The significance of responses of the genome to challenge.

Authors: B McClintock
Journal: Science Date: 1984-11-16 Impact factor: 47.728

3. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

4. Identification of a functional transposon insertion in the maize domestication gene tb1.

Authors: Anthony Studer; Qiong Zhao; Jeffrey Ross-Ibarra; John Doebley
Journal: Nat Genet Date: 2011-09-25 Impact factor: 38.330

5. Evolution of DNA methylation patterns in the Brassicaceae is driven by differences in genome organization.

Authors: Danelle K Seymour; Daniel Koenig; Jörg Hagmann; Claude Becker; Detlef Weigel
Journal: PLoS Genet Date: 2014-11-13 Impact factor: 5.917

6. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation.

Authors: Tim Stuart; Steven R Eichten; Jonathan Cahn; Yuliya V Karpievitch; Justin O Borevitz; Ryan Lister
Journal: Elife Date: 2016-12-02 Impact factor: 8.140

7. Adaptation of Arabidopsis thaliana to the Yangtze River basin.

Authors: Yu-Pan Zou; Xing-Hui Hou; Qiong Wu; Jia-Fu Chen; Zi-Wen Li; Ting-Shen Han; Xiao-Min Niu; Li Yang; Yong-Chao Xu; Jie Zhang; Fu-Min Zhang; Dunyan Tan; Zhixi Tian; Hongya Gu; Ya-Long Guo
Journal: Genome Biol Date: 2017-12-28 Impact factor: 13.583

8. Genome sequence comparison of Col and Ler lines reveals the dynamic nature of Arabidopsis chromosomes.

Authors: Piotr A Ziolkowski; Grzegorz Koczyk; Lukasz Galganski; Jan Sadowski
Journal: Nucleic Acids Res Date: 2009-03-21 Impact factor: 16.971

9. High rate of recent transposable element-induced adaptation in Drosophila melanogaster.

Authors: Josefa González; Kapa Lenkov; Mikhail Lipatov; J Michael Macpherson; Dmitri A Petrov
Journal: PLoS Biol Date: 2008-10-21 Impact factor: 8.029

10. Mating system shifts and transposable element evolution in the plant genus Capsella.

Authors: J Arvid Ågren; Wei Wang; Daniel Koenig; Barbara Neuffer; Detlef Weigel; Stephen I Wright
Journal: BMC Genomics Date: 2014-07-16 Impact factor: 3.969
