
Genome-wide analyses of miniature inverted-repeat transposable elements reveals new insights into the evolution of the Triticum-Aegilops group.

Danielle Keidar-Friedman1, Inbar Bariah1, Khalil Kashkush1.   

Abstract

The sequence drafts of wild emmer and bread wheat facilitated high-resolution, genome-wide analysis of transposable elements (TEs), which account for up to 90% of the wheat genome. Despite extensive studies, the role of TEs in reshaping nascent polyploid genomes remains to be fully understood. In this study, we retrieved miniature inverted-repeat transposable elements (MITEs) from the recently published genome drafts of Triticum aestivum, Triticum turgidum ssp. dicoccoides and Aegilops tauschii, and from the available genome draft of Triticum urartu. Overall, 239,126 MITE insertions were retrieved, including 3,874 insertions of a newly identified, wheat-unique MITE family that we named "Inbar". The Stowaway superfamily accounts for ~80% of the retrieved MITE insertions, and Thalos is the most abundant family. MITE insertions are distributed across the seven homoeologous chromosome groups of the wild emmer and bread wheat genomes. The remarkably high number of insertions in the B sub-genome (~59% of all MITE insertions retrieved from the wild emmer genome draft, and ~41% of those retrieved from the bread wheat genome draft) emphasizes its highly repetitive nature. Nearly 52% of all MITE insertions were found within or close to (less than 100 bp from) coding genes, and ~400 MITE sequences were found in the bread wheat transcriptome, indicating that MITEs might have a strong impact on the expression of the wheat genome. In addition, ~40% of MITE insertions were found within TE sequences, and, remarkably, ~90% of Inbar insertions were located in retrotransposon sequences. Our data thus shed new light on the role of MITEs in the diversification of allopolyploid wheat species.

Year:  2018        PMID: 30356268      PMCID: PMC6200218          DOI: 10.1371/journal.pone.0204972

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The origin of wheat (Triticum-Aegilops group) dates back some 4 million years, to the divergence of three ancestral species from a common progenitor, namely Triticum urartu (donor of the A genome), an unknown Aegilops species from the Sitopsis section (donor of the B genome) and Aegilops tauschii (donor of the D genome) [1]. The first allopolyploidization event involved hybridization of T. urartu with an Aegilops species, leading to the formation of the tetraploid Triticum turgidum (wild emmer, genome AB) around 500,000 years ago [2]. The second event involved hybridization between T. turgidum and Ae. tauschii, resulting in the formation of the hexaploid T. aestivum (bread wheat, genome ABD) around 10,000 years ago [1, 3]. Newly formed allopolyploid species undergo massive genome reorganization and epigenetic modifications that affect the regulation of gene expression, causing the activation of some transposon families and the silencing of others [2, 4, 5].
Transposable elements (TEs) are DNA sequences that can change their location and proliferate within the host genome; they have been found in all organisms investigated to date. Over 80% of the wheat genome comprises TEs [6-9]. Plant TEs can be divided into two main classes. Class I TEs, termed RNA elements or retrotransposons, move via a “copy and paste” mechanism involving an RNA intermediate. Class II TEs, termed DNA transposons, move via a “cut and paste” mechanism that does not involve an RNA intermediate [4]. These classes can be subdivided into superfamilies and families, and a given genome may contain hundreds or thousands of different families. Superfamilies within the same order share a replication strategy but differ in their DNA sequence and target site duplication (TSD) size, whereas families are defined by DNA sequence conservation [4, 10]. Finally, active TEs can affect genome structure and function [11-13].
Recent studies showed that the non-autonomous elements termed miniature inverted-repeat transposable elements (MITEs) are among the most active TEs in eukaryotes [14-17]. MITEs belong to the TIR (Terminal Inverted Repeat) order of DNA transposons. They are short DNA elements, ranging from tens to several hundred base pairs, and are found only in eukaryotic genomes [4, 18]. MITEs occur in high copy numbers in some plant species, such as rice [19] and maize [18, 20], and have been shown to be strongly associated with genes [10, 21–24]. MITEs have also been shown to be active in plants [14, 15, 19, 25], where they affect gene expression by inserting into introns, promoters or other gene regulatory sequences [26-29]. In a previous study, ~18,000 insertions of 18 Stowaway-like MITE families in wheat were analyzed [30] using the 454 pyrosequencing shotgun draft of T. aestivum [31]. The availability of genome drafts for four Triticum and Aegilops species, specifically the updated sequence draft of T. aestivum [7] and the recently published genome drafts of T. turgidum ssp. dicoccoides [8] and Aegilops tauschii [32], together with the T. urartu draft, allowed for genome-wide, high-resolution analysis of TEs. In this study, we performed a genome-wide analysis of all known wheat MITE families in four genome drafts, namely T. aestivum (genome ABD), T. turgidum ssp. dicoccoides (genome AB), Ae. tauschii (genome D) and T. urartu (genome A). The MITE superfamilies known in wheat genomes are Stowaway, characterized by very short elements (70–350 bp) and a TSD corresponding to the TA dinucleotide [33]; Tourist, with elements of 100–400 bp and a TWA TSD (W = A/T); and Mutator, with elements of 100–700 bp and a long, variable TSD (7–10 bp) [18]. We retrieved 235,252 insertions of known MITE families from these four wheat genome drafts, mostly belonging to the Stowaway superfamily.
In addition, we discovered a new wheat-unique MITE family, termed Inbar, and retrieved 3,874 such insertions from the 4 genome drafts. The impact of MITEs on genome structure and function is discussed.
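The superfamily-specific TSD signatures described above can be written as simple pattern rules. The Python sketch below is purely illustrative: it assumes only the canonical signatures stated in the text (TA for Stowaway, TWA for Tourist, 7–10 bp for Mutator), and the function name `canonical_superfamily` is hypothetical. Several families in Table 1 do not fit these textbook patterns (for example, the Tourist family Kerberos has a TA consensus TSD and the Stowaway family Eos has TC), so this is not the classification used in the study.

```python
import re
from typing import Optional

# Canonical TSD signatures as stated in the text (illustrative assumption,
# not the procedure used in the study):
#   Stowaway: TA dinucleotide
#   Tourist:  TWA trinucleotide, where W = A or T
#   Mutator:  long, variable TSD of 7-10 bp
TOURIST_TSD = re.compile(r"T[AT]A")
DNA = re.compile(r"[ACGT]+")


def canonical_superfamily(tsd: str) -> Optional[str]:
    """Return the superfamily whose canonical TSD signature matches `tsd`, or None."""
    tsd = tsd.strip().upper()
    if tsd == "TA":
        return "Stowaway"
    if TOURIST_TSD.fullmatch(tsd):
        return "Tourist"
    if 7 <= len(tsd) <= 10 and DNA.fullmatch(tsd):
        return "Mutator"
    return None  # no canonical signature matched


if __name__ == "__main__":
    for tsd in ["TA", "TAA", "TTA", "AAAAATTAA", "GC"]:
        print(tsd, canonical_superfamily(tsd))
```

A TSD such as GC (the Gorgon consensus) returns None, which is a reminder that real families often deviate from the canonical rules.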

Materials and methods

Wheat genomic and transcriptomic sources

In this study, we used genome drafts of four Triticum and Aegilops species. T. urartu, the donor of the A genome, was sequenced on the Illumina platform using a whole-genome paired-end shotgun approach. The reads were assembled using SOAPdenovo; the sequencing depth of most (96%) reads was over 20x, with a peak at 85x (http://plants.ensembl.org/Triticum_urartu/Info/Index) [34]. Ae. tauschii ssp. strangulata accession AL8/78, a close relative of the D genome donor, was sequenced using BAC and whole-genome shotgun approaches, and the sequences were assembled to create a genome draft of 4.3 Gbp covering 95.2% of the genome sequence [32]. T. turgidum ssp. dicoccoides is wild emmer wheat (WEWseq: http://wewseq.wix.com/consortium). The full genome draft of wild emmer wheat (Triticum turgidum ssp. dicoccoides), containing sorted chromosomes, was sequenced using paired-end and mate-pair shotgun sequencing to a depth of 190x and was recently published [8]. The reads were assembled using the DeNovoMAGIC tool (developed by NRGene) to cover ~95% of the emmer wheat genome. The genome of T. aestivum, the hexaploid bread wheat, was sequenced in June 2016 and is available at EnsemblPlants [7] (pre.plants.ensembl.org/Triticum_aestivum/Info/Index). This updated assembly (TGACv1), generated by The Genome Analysis Center in Norwich, covers 13.4 Gbp of the genome with an N50 of 88.8 kbp; scaffolding was carried out using SOAPdenovo and CSS reads [9].
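The N50 value quoted above summarizes assembly contiguity: it is the length L such that contigs or scaffolds of length L or longer together contain at least half of the assembled sequence. A minimal Python sketch of this standard definition follows; the function name and the toy lengths are illustrative only and are not taken from the TGACv1 assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig/scaffold lengths (0 for an empty set).

    Lengths are sorted from longest to shortest and accumulated; N50 is the
    length at which the running total first reaches half of the assembly size.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return 0


# Toy example: total = 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```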

RNA-seq database

We used the updated, publicly available RNA-seq database of T. aestivum at EnsemblPlants [7]; the library includes cDNA, CDS and ncRNA sequences (plants.ensembl.org/info/website/ftp/index.html). We also used the RNA-seq database of T. turgidum ssp. dicoccoides [8] (https://wheat.pw.usda.gov/GG3/wildemmer, http://wewseq.wixsite.com/consortium).

Computer-assisted analysis

Retrieval of MITE insertions

Insertions of 35 previously characterized MITE families belonging to the Stowaway, Tourist, Mutator and unknown superfamilies were retrieved from the 4 genome drafts using the MITE analysis kit (MAK) software (http://labs.csb.utoronto.ca/yang/MAK/) [35, 36]. MAK is homology-based: it uses a consensus MITE sequence as the query and the BLASTN algorithm with global alignment. The publicly available consensus sequence of each MITE family (TREP database at http://wheat.pw.usda.gov/ggpages/Repeats/ and GIRI database at http://www.girinst.org/repbase/update/browse.php) was used as the input (query sequence) for MAK. We used an e-value of 1e-3 and an end mismatch tolerance of 20 nucleotides. In addition, flanking sequences (500 bp from each end) were retrieved together with each MITE insertion to characterize the insertion sites at the molecular level. A rice-specific MITE called mPing served as a negative control in the MAK analysis; no mPing-related sequences were retrieved from any of the 4 genome drafts. Redundant sequences were detected with the standalone version of NCBI BLAST+ [37] (https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews#1): BLASTN was used to compare the sequences within each family, and whenever two sequences shared 100% identity (duplicate insertions arising from MAK errors), one member of the pair was excluded. Note that in this analysis, elements truncated at one terminus were considered nearly intact elements.
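The duplicate-removal step can be approximated as follows. This Python sketch is a simplification and an assumption on our part: instead of an all-against-all BLASTN comparison, it treats two records as duplicates only if their full sequences (element plus flanks) are identical on either strand, and it keeps the first record of each duplicate group. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_identical_duplicates(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep one record per group of 100%-identical sequences, in input order.

    Records are (identifier, sequence) pairs. A sequence and its reverse
    complement share one strand-independent key, so the same insertion
    reported on opposite strands is also treated as a duplicate.
    """
    seen = set()
    kept = []
    for rec_id, seq in records:
        s = seq.upper()
        key = min(s, reverse_complement(s))
        if key in seen:
            continue  # exclude the second member of an identical pair
        seen.add(key)
        kept.append((rec_id, seq))
    return kept


records = [
    ("ins1", "ACGTTA"),
    ("ins2", "ACGTTA"),  # exact duplicate of ins1
    ("ins3", "TAACGT"),  # reverse complement of ins1
    ("ins4", "GGGCCC"),
]
print([rec_id for rec_id, _ in remove_identical_duplicates(records)])  # ['ins1', 'ins4']
```

Exact hashing is much faster than BLASTN but, unlike a BLAST comparison, it cannot report near-identical pairs; that trade-off is why it is only a sketch here.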

Statistical analysis, sequence conservation, and target site preference

The Galaxy online platform was used for sequence characterization and statistical analyses [38, 39], with the “Fasta manipulation” and “Statistics” tools. Average element lengths for each MITE family were calculated in Galaxy using the ‘Compute Sequence Length’ function, which calculates the length of each nucleotide sequence in a FASTA file; the ‘Count’ function, which counts the copy number of a TE family; and the ‘Summary Statistics’ function, which calculates the sum, mean and standard deviation of sequence lengths. Before further analysis, output files were edited with the “regular expression” function of TextPad 7.4 to separate each element sequence from its flanking sequences. Levels of sequence conservation in each MITE family were analyzed using the MAFFT 7.245 package for multiple sequence alignment, with default parameters and FASTA output [40], and DnaSP 5.10.1 software [41]. MAFFT generated a multiple sequence alignment of each transposable element family in a FASTA file, which was then analyzed with DnaSP to identify sequence similarities and conserved regions, using the ‘haplotype diversity’ function and by viewing the sequence alignments. Target site preference for each TE family was analyzed using the publicly available online WebLogo 3.0 package [42]. Target site duplications were retrieved with MAK during TE analysis, adjusted to the same length within each family (by adding Ns) and then analyzed on the WebLogo website (http://weblogo.threeplusone.com/create.cgi). WebLogo 3.0 generates graphical representations of the sequences and target site preferences of each transposable element family. Each logo comprises a stack of letters (nucleotides) at each position in a sequence. The height of each letter in a stack represents its relative frequency at that position, while the stack width represents the fraction of valid nucleotides at that position.
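The TSD preparation and the letter-height and stack-width definitions above can be reproduced numerically. In the Python sketch below, we assume that the Ns are appended to the 3' end of shorter TSDs (the text does not say where they were added). For each position, the sketch computes the fraction of valid (A/C/G/T) bases, which corresponds to stack width, and the relative frequency of each base among the valid ones, which corresponds to letter height as described above (comparable to a logo drawn in probability units rather than WebLogo's default bits). The TSD strings are made-up examples and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

VALID_BASES = "ACGT"


def pad_tsds(tsds: Sequence[str], pad: str = "N") -> List[str]:
    """Right-pad TSDs with N so that all of them have the same length (assumption)."""
    width = max(len(t) for t in tsds)
    return [t.upper().ljust(width, pad) for t in tsds]


def position_profile(aligned: Sequence[str]) -> List[Tuple[float, Dict[str, float]]]:
    """For each position return (fraction of valid bases, base -> relative frequency)."""
    profile = []
    for column in zip(*aligned):
        counts = Counter(b for b in column if b in VALID_BASES)
        n_valid = sum(counts.values())
        freqs = {b: (counts[b] / n_valid if n_valid else 0.0) for b in VALID_BASES}
        profile.append((n_valid / len(aligned), freqs))
    return profile


tsds = ["TAA", "TTA", "TA", "TAA"]  # hypothetical TSDs of one family
aligned = pad_tsds(tsds)            # ['TAA', 'TTA', 'TAN', 'TAA']
for pos, (width, freqs) in enumerate(position_profile(aligned), start=1):
    print(pos, round(width, 2), {b: round(f, 2) for b, f in freqs.items() if f})
```

Position 3 has one padded N, so its stack is narrower (3 of 4 sequences valid) while its letters are computed only from the valid bases.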

Annotation of MITE-flanking sequences

MITE-flanking sequences were annotated using the complementary DNA (cDNA), coding sequence (CDS) and non-coding RNA (ncRNA) databases of wheat species from EnsemblPlants (http://plants.ensembl.org/index.html) and the plant transposable element database TREP (botserv2.uzh.ch/kelldata/trep-db/index.html). Annotation was performed using the standalone BLAST+ version 2.2.3 with an e-value of 1e-10. The merged 5’ and 3’ flanking sequences, as well as the transposable elements themselves, were used as queries against these databases, and the best hit for each flanking sequence was chosen to determine the specific protein product, ncRNA or TE family. Furthermore, genes containing TE sequences were analyzed for the association of TEs with wheat genes using the EnsemblPlants database. A MITE was considered associated with a gene when it was inserted into the gene (in an intron or exon) or adjacent to it (up to 100 bp upstream or downstream).
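The association criterion above (insertion into a gene, or within 100 bp upstream or downstream of it) can be written as a small interval test. This Python sketch assumes 1-based inclusive coordinates on the same chromosome and uses the gene strand to decide between upstream and downstream; the `Interval` type, the function name and the coordinates in the example are hypothetical and not part of the EnsemblPlants data or the original pipeline.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_BP = 100  # association window from the text: up to 100 bp up- or downstream


@dataclass(frozen=True)
class Interval:
    """Genomic interval with 1-based, inclusive coordinates (hypothetical type)."""
    chrom: str
    start: int
    end: int
    strand: str = "+"


def classify_association(mite: Interval, gene: Interval, window: int = WINDOW_BP) -> Optional[str]:
    """Return 'within', 'upstream', 'downstream', or None if the MITE is not associated."""
    if mite.chrom != gene.chrom:
        return None
    if mite.start <= gene.end and mite.end >= gene.start:
        return "within"  # inserted into the gene body (exon or intron)
    if mite.end < gene.start:
        gap = gene.start - mite.end - 1  # bases between the MITE and the gene's lower coordinate
        side = "upstream" if gene.strand == "+" else "downstream"
    else:
        gap = mite.start - gene.end - 1  # bases between the gene's upper coordinate and the MITE
        side = "downstream" if gene.strand == "+" else "upstream"
    return side if gap <= window else None


gene = Interval("3B", 10_000, 12_000, "+")
print(classify_association(Interval("3B", 10_500, 10_650), gene))  # within
print(classify_association(Interval("3B", 9_800, 9_899), gene))    # upstream (gap = 100 bp)
print(classify_association(Interval("3B", 9_700, 9_850), gene))    # None (gap = 149 bp)
```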

Plant material

We used various Triticum and Aegilops species and synthetic allohexaploids, namely the A genome (T. urartu, 2 accessions; T. monococcum, 1 accession), the B genome (Ae. speltoides, Ae. searsii, Ae. sharonensis and Ae. longissima), the D genome (Ae. tauschii, 3 accessions), and the allopolyploid species T. turgidum (3 accessions of dicoccoides and 3 of durum) and T. aestivum (3 accessions of bread wheat, and 4 accessions of synthetic ABD allohexaploids from the S1–S4 generations) [43]. Accession details are listed in S1 Table.

DNA isolation and site-specific PCR analysis

DNA was extracted from young leaves (4 weeks post-germination) using a DNeasy plant kit (Qiagen). The presence of the newly discovered MITE family (Inbar) was validated by PCR, using site-specific primers designed with PRIMER3 version 4.0.0 (bioinfo.ut.ee/primer3/) in the sequences flanking each insertion. Each PCR reaction contained 12.5 μl ultrapure water (Biological Industries), 2 μl of 10xC Taq DNA polymerase buffer (EURX), 1.5 μl of 25 mM MgCl2 (EURX), 0.8 μl of 2.5 mM dNTPs, 0.2 μl Taq DNA polymerase (5 U/μl, EURX), 1 μl of each site-specific primer (50 ng/μl) and 1 μl of template genomic DNA (approximately 50 ng/μl). PCR conditions were 94°C for 3 min; 30 cycles of 94°C for 1 min, 60°C for 1 min and 72°C for 1 min; and a final step at 72°C for 3 min. For sequence validation, PCR products were extracted from agarose gels using a QIAquick PCR Purification Kit (Qiagen) and sequenced. Primer sequences are listed in S6 Table.
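The reaction recipe above can be checked with the standard dilution relation C_final = C_stock × V_component / V_total. The Python sketch below assumes that the total reaction volume is simply the sum of the listed components (20 μl; the text does not state it) and that the 2.5 mM dNTP stock refers to each dNTP. The component labels are our own.

```python
import math

# (volume in ul, stock concentration, unit) for each component listed in the text
components = {
    "water": (12.5, None, None),
    "10x buffer": (2.0, 10.0, "x"),
    "MgCl2": (1.5, 25.0, "mM"),
    "dNTPs (each, assumed)": (0.8, 2.5, "mM"),
    "Taq polymerase": (0.2, 5.0, "U/ul"),
    "forward primer": (1.0, 50.0, "ng/ul"),
    "reverse primer": (1.0, 50.0, "ng/ul"),
    "template DNA (approx.)": (1.0, 50.0, "ng/ul"),
}

# Assumption: the reaction volume is the sum of all listed components.
total_volume = math.fsum(vol for vol, _, _ in components.values())


def final_concentration(name: str) -> float:
    """C_final = C_stock * V_component / V_total for one component."""
    vol, stock, _ = components[name]
    return stock * vol / total_volume


def amount_added(name: str) -> float:
    """Absolute amount of a component in the reaction (stock * volume)."""
    vol, stock, _ = components[name]
    return stock * vol


print(f"total volume: {total_volume:.1f} ul")
print(f"buffer: {final_concentration('10x buffer'):.2f}x")
print(f"MgCl2: {final_concentration('MgCl2'):.3f} mM")
print(f"dNTPs: {final_concentration('dNTPs (each, assumed)'):.2f} mM each")
print(f"Taq: {amount_added('Taq polymerase'):.1f} U per reaction")
print(f"each primer: {amount_added('forward primer'):.0f} ng per reaction")
```

Under these assumptions, the mix corresponds to 1x buffer, 1.875 mM MgCl2, 0.1 mM of each dNTP, 1 U of Taq and 50 ng of each primer in a 20 μl reaction.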

Results

Assessing MITE composition and chromosomal distribution in diploid and polyploid genomes

In addition to the genome drafts of two diploid genome donors (A and D genomes), the updated genome drafts of the polyploid wild emmer wheat and bread wheat facilitated detailed analyses of the content, chromosome location and distribution of MITE families and allowed comparative analysis of MITE composition among Triticum and Aegilops species from different ploidy levels in the present study. The consensus sequences of all characterized MITE families in Triticum and Aegilops species were used as queries in MAK software to retrieve MITE insertions, together with flanking sequences (500 bp from each side), from the genome drafts of T. urartu (donor of genome A), Ae. tauschii (donor of genome D), T. turgidum ssp. dicoccoides (genome AB, wild emmer, the “mother” of wheat) and T. aestivum (genome ABD). We included mPing, a rice-unique MITE family [14], in this analysis as a negative control; no mPing sequences were retrieved from any of the Triticum or Aegilops genome drafts. In addition, all of the retrieved sequences corresponded to nuclear DNA, with no sequences being retrieved from the mitochondrial and/or chloroplast genomes.

MITE composition in the T. urartu genome

We retrieved 15,513 insertions belonging to 35 MITE families (Table 1) from the T. urartu genome using MAK, which account for ~2.62 Mbp (0.053%) of the total ~4,940 Mbp [34]. Most of the retrieved MITE families (19 of 35) belong to the Stowaway superfamily (12,567 insertions, or 81% of all MITE insertions; Fig 1). In addition, 5 families belong to the Tourist superfamily (1,075 insertions, or 6.93% of all MITE insertions), 7 families belong to the Mutator superfamily (696 insertions, or 4.5%) and 4 families belong to unknown superfamilies (1,175 insertions, or 7.57%). The Thalos family (Stowaway superfamily) had the highest copy number (5,249 insertions), while the Polyphemus family (Stowaway superfamily) had the lowest (only 1 insertion). Other families with relatively high copy numbers were Athos (2,314, Stowaway superfamily), Pan (1,407, Stowaway superfamily), Belus (929, unknown superfamily), Icarus (694, Stowaway superfamily), Hades (643, Stowaway superfamily) and Minos (636, Stowaway superfamily).
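The superfamily totals and percentages above follow from the per-family copy numbers in Table 1. The Python sketch below re-derives them for T. urartu by grouping families by superfamily; the copy numbers are transcribed by hand from the T. urartu column of Table 1, and the dictionary layout and helper name are our own.

```python
from collections import defaultdict

# T. urartu copy numbers per family, transcribed from Table 1: family -> (superfamily, copies)
URARTU = {
    "Thalos": ("Stowaway", 5249), "Athos": ("Stowaway", 2314), "Pan": ("Stowaway", 1407),
    "Icarus": ("Stowaway", 694), "Hades": ("Stowaway", 643), "Eos": ("Stowaway", 465),
    "Xados": ("Stowaway", 344), "Minos": ("Stowaway", 636), "Aison": ("Stowaway", 166),
    "Stolos": ("Stowaway", 232), "Fortuna": ("Stowaway", 112), "Oleus": ("Stowaway", 106),
    "Antonio": ("Stowaway", 84), "Minimus": ("Stowaway", 76), "Tantalos": ("Stowaway", 28),
    "Phoebus": ("Stowaway", 3), "Polyphemus": ("Stowaway", 1), "Jason": ("Stowaway", 4),
    "Marius": ("Stowaway", 3),
    "Orpheus": ("Tourist", 546), "Kerberos": ("Tourist", 100), "Coeus": ("Tourist", 163),
    "Xenon": ("Tourist", 221), "Victor": ("Tourist", 45),
    "Gerald": ("Mutator", 401), "Rhea": ("Mutator", 194), "Spring": ("Mutator", 17),
    "Argus": ("Mutator", 10), "Vacuna": ("Mutator", 40), "Gabriel": ("Mutator", 7),
    "Remus": ("Mutator", 27), "Murray": ("Mutator", 0),
    "Belus": ("unknown", 929), "Keres": ("unknown", 100), "Gorgon": ("unknown", 53),
    "Inbar": ("unknown", 93),
}


def superfamily_summary(table):
    """Return {superfamily: (total copies, families present, % of all insertions)}."""
    copies = defaultdict(int)
    families = defaultdict(int)
    for superfamily, n in table.values():
        copies[superfamily] += n
        if n > 0:
            families[superfamily] += 1  # count only families actually retrieved
    grand_total = sum(copies.values())
    return {sf: (copies[sf], families[sf], 100 * copies[sf] / grand_total) for sf in copies}


for sf, (n, fams, pct) in superfamily_summary(URARTU).items():
    print(f"{sf:9s} {fams:2d} families {n:6d} insertions {pct:5.2f}%")
```

Because Murray has no copies in T. urartu, the Mutator superfamily is counted with 7 families here, which matches the 35 families reported for this genome.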
Table 1

Characterization of MITE families in four wheat species: Consensus element size, target site preference (TSD), copy number and % of total MITE content.

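Family name | Superfamily name¹ | Consensus element size (bp)² | Target site preference (TSD)³ | Copy number: T. aestivum | T. turgidum | Ae. tauschii | T. urartu | % of total MITEs: T. aestivum | T. turgidum | Ae. tauschii | T. urartu
Thalos | Stowaway | 158 | TA | 42321 | 27946 | 12557 | 5249 | 36.26 | 39.11 | 41.38 | 34.04
Athos | Stowaway | 81 | TA | 19359 | 10637 | 5297 | 2314 | 19.47 | 14.89 | 17.46 | 15.01
Pan | Stowaway | 123 | TA | 14296 | 11108 | 3838 | 1407 | 11.68 | 15.54 | 12.65 | 9.13
Icarus | Stowaway | 108 | TA | 6795 | 4512 | 1649 | 694 | 5.74 | 6.31 | 5.43 | 4.50
Hades | Stowaway | 92 | TA | 3656 | 2635 | 898 | 643 | 4.54 | 3.69 | 2.96 | 4.17
Eos | Stowaway | 344 | TC | 3264 | 2150 | 974 | 465 | 1.42 | 3.01 | 3.21 | 3.02
Xados | Stowaway | 112 | AG | 1937 | 1425 | 445 | 344 | 2.40 | 1.99 | 1.47 | 2.23
Minos | Stowaway | 236 | TA | 1253 | 1022 | 164 | 636 | 1.26 | 1.43 | 0.54 | 4.13
Aison | Stowaway | 215 | TA | 852 | 692 | 115 | 166 | 1.04 | 0.97 | 0.38 | 1.08
Stolos | Stowaway | 255 | TA | 720 | 509 | 197 | 232 | 0.93 | 0.71 | 0.65 | 1.51
Fortuna | Stowaway | 349 | TA | 509 | 440 | 35 | 112 | 0.28 | 0.62 | 0.12 | 0.73
Oleus | Stowaway | 146 | TA | 407 | 258 | 130 | 106 | 0.56 | 0.36 | 0.43 | 0.69
Antonio | Stowaway | 104 | TA | 320 | 226 | 84 | 84 | 0.40 | 0.32 | 0.28 | 0.55
Minimus | Stowaway | 51 | TA | 320 | 206 | 86 | 76 | 0.38 | 0.29 | 0.28 | 0.49
Tantalos | Stowaway | 253 | CC | 105 | 73 | 31 | 28 | 0.14 | 0.10 | 0.10 | 0.18
Phoebus | Stowaway | 319 | CG | 34 | 26 | 7 | 3 | 0.05 | 0.04 | 0.02 | 0.02
Polyphemus | Stowaway | 237 | TC | 22 | 11 | 8 | 1 | 0.02 | 0.02 | 0.03 | 0.01
Jason | Stowaway | 256 | TA | 18 | 10 | 7 | 4 | 0.02 | 0.01 | 0.02 | 0.03
Orpheus | Tourist | 272 | TAA | 1912 | 1474 | 299 | 546 | 2.36 | 2.06 | 0.99 | 3.54
Kerberos | Tourist | 285 | TA | 1594 | 808 | 689 | 100 | 1.04 | 1.13 | 2.27 | 0.65
Coeus | Tourist | 273 | TTA | 777 | 612 | 53 | 163 | 0.93 | 0.86 | 0.17 | 1.06
Xenon | Tourist | 305 | AG | 556 | 422 | 122 | 221 | 0.74 | 0.59 | 0.40 | 1.43
Victor | Tourist | 276 | GCA | 194 | 106 | 57 | 45 | 0.20 | 0.15 | 0.19 | 0.29
Gerald | Mutator | 345 | AAAAATTAA | 1119 | 925 | 191 | 401 | 1.14 | 1.29 | 0.63 | 2.60
Rhea | Mutator | 561 | TACAAAAAA | 473 | 346 | 124 | 194 | 0.51 | 0.48 | 0.41 | 1.26
Spring | Mutator | 223 | GGGGAACC | 319 | 270 | 40 | 17 | 0.39 | 0.38 | 0.13 | 0.11
Argus | Mutator | 327 | TTTAATTAA | 306 | 280 | 6 | 10 | 0.31 | 0.39 | 0.02 | 0.07
Vacuna | Mutator | 464 | TTT | 197 | 141 | 29 | 40 | 0.20 | 0.20 | 0.10 | 0.26
Gabriel | Mutator | 407 | CCTC | 16 | 14 | 5 | 7 | 0.01 | 0.02 | 0.02 | 0.05
Belus | unknown | 173 | CATG | 8522 | 6114 | 2014 | 929 | 4.56 | 8.56 | 6.64 | 6.03
Keres | unknown | 71 | CGGTCCG | 640 | 510 | 93 | 100 | 0.47 | 0.71 | 0.31 | 0.65
Gorgon | unknown | 56 | GC | 217 | 123 | 64 | 53 | 0.35 | 0.17 | 0.21 | 0.34
Inbar | unknown | 58 | TA | 1911 | 1846 | 24 | 93 | 1.66 | 2.36 | 0.08 | 0.60
Remus | Mutator | 829 | CG | 211 | 213 | 7 | 27 | 0.17 | 0.30 | 0.02 | 0.18
Marius | Stowaway | 691 | TA | 15 | 11 | 3 | 3 | 0.02 | 0.02 | 0.01 | 0.02
Murray | Mutator | 937 | TACTGCTCC | 2 | 1 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00
Total | | | | 115169 | 78102 | 30342 | 15513 | | | |

¹ Based on: http://wheat.pw.usda.gov/ITMI/Repeats

² Based on the TREP database (http://wheat.pw.usda.gov/ggpages/Repeats/) and the GIRI database (http://www.girinst.org/repbase/update/browse.php)

³ See S1 Text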

Fig 1

Proportions of MITE insertions by superfamilies in Triticum and Aegilops genomes.

From outer to inner circles: T. aestivum (AABBDD), T. turgidum ssp. dicoccoides (AABB), Ae. tauschii (DD) and T. urartu (AA). Percentages denote the fraction of each superfamily out of the total number of MITE insertions.

Stowaway and Tourist MITEs in plants are known to be short elements (100–500 bp), while Mutator MITEs tend to be longer (100–700 bp) [18]. Our analysis showed that Stowaway MITEs had an average length of 214 bp, with element lengths ranging from around 40 to 691 bp. Tourist MITE lengths ranged from around 269 to 296 bp, with an average length of 278 bp. Mutator MITEs were longer than Stowaway and Tourist MITEs, ranging from around 221 to 826 bp, with an average length of 446 bp. The 4 MITE families belonging to unknown superfamilies were very short, ranging from around 51 to 171 bp. We also analyzed the relative fraction of each MITE family out of the total number of insertions found in the genome (Table 1, Fig 2). The Thalos family (Stowaway superfamily) had the highest fraction, corresponding to 34% of all MITEs, followed by Athos (Stowaway superfamily; 15%), Pan (Stowaway superfamily; 9%), Belus (unknown superfamily; 6%), Icarus (Stowaway superfamily; 4.5%) and Hades (Stowaway superfamily; 4.1%). The other families presented fractions ranging from 0.006% to 3%. Thus, most MITE insertions in the T. urartu genome belong to 3 Stowaway families, Thalos, Athos and Pan, which together account for approximately 58% of all MITE insertions.
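Length statistics of this kind (count, sum, mean, standard deviation, minimum and maximum of element lengths, which the Methods describe computing with Galaxy tools) can be sketched in Python. This is an illustrative stand-in for the Galaxy workflow, not the tools themselves: it parses a FASTA string without external libraries, uses the sample standard deviation (an assumption), and the three-sequence input is a toy example.

```python
import statistics
from typing import Dict, List


def fasta_lengths(fasta_text: str) -> List[int]:
    """Return the length of every record in a FASTA-formatted string."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Count, sum, mean, sample standard deviation, minimum and maximum of lengths."""
    return {
        "count": len(lengths),
        "sum": sum(lengths),
        "mean": statistics.mean(lengths),
        "sd": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "min": min(lengths),
        "max": max(lengths),
    }


toy_fasta = ">mite_1\nACGT\nAC\n>mite_2\nACGTACGT\n>mite_3\nAC\n"
print(summarize(fasta_lengths(toy_fasta)))
```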
Fig 2

Proportion of MITE insertions by families in Triticum and Aegilops genomes.

From outer to inner circles: T. aestivum (AABBDD), T. turgidum ssp. dicoccoides (AABB), Ae. tauschii (DD) and T. urartu (AA). MITE families are indicated by different colours (left). Percentages denote the fraction of each family out of the total number of MITE insertions. Note that only values for the 12 most abundant families (according to abundance in the polyploids) are shown due to space limitations.

MITE composition in the Ae. tauschii genome

Overall, 30,342 insertions belonging to 35 MITE families (Table 1) were retrieved from the Ae. tauschii genome draft, which account for ~4.57 Mbp (0.1%) of the total ~4,360 Mbp [44]. The retrieved MITE families included 19 families of the Stowaway superfamily (26,525 insertions, or 87.4% of all MITE insertions), 5 families of the Tourist superfamily (1,220 insertions, or 4%), 7 families of the Mutator superfamily (402 insertions, or 1.32%) and 4 families of unknown superfamilies (2,195 insertions, or 7.23%) (Fig 1). Thalos had the highest copy number (12,557 insertions), while Marius had the lowest (only 3 insertions). Other families with high copy numbers were Athos (5,297, Stowaway superfamily), Pan (3,838, Stowaway superfamily), Belus (2,014, unknown superfamily), Icarus (1,649, Stowaway superfamily) and Hades (898, Stowaway superfamily). Analysis of the relative fraction of each family out of the total number of MITE insertions revealed that the Thalos family (Stowaway superfamily) had the highest fraction (41.4% of all MITEs), followed by Athos (Stowaway superfamily; 17.5%), Pan (Stowaway superfamily; 12.6%), Belus (unknown superfamily; 6.6%), Icarus (Stowaway superfamily; 5.4%) and Hades (Stowaway superfamily; 3%). Fractions of the other families ranged from 0.01% to 3.21% (Fig 2). As in the T. urartu genome, the most abundant families were Thalos, Athos and Pan, which together account for approximately 71.5% of all MITE insertions.

MITEs composition in the T. turgidum ssp. dicoccoides genome

Overall, 78,102 insertions belonging to 36 MITE families were retrieved from the tetraploid T. turgidum ssp. dicoccoides genome draft (Table 1), which account for ~12 Mbp (0.1%) of the total ~12,000 Mbp [8]. The retrieved MITE families included 19 families of the Stowaway superfamily (63,897 insertions, or ~81.8% of all MITE insertions), 5 families of the Tourist superfamily (3,422 insertions, or 4.38%), 8 families of the Mutator superfamily (2,190 insertions, or 2.8%) and 4 families of unknown superfamilies (8,593 insertions, or 11%) (Fig 1). Thalos had the highest copy number (27,946 insertions), while Murray had only a single insertion (Table 1). Other families with high copy numbers were Pan (11,108, Stowaway superfamily), Athos (10,637, Stowaway superfamily), Belus (6,114, unknown superfamily), Icarus (4,512, Stowaway superfamily), Hades (2,635, Stowaway superfamily), Eos (2,150, Stowaway superfamily), Orpheus (1,474, Tourist superfamily) and Xados (1,425, Stowaway superfamily). Analysis of the relative fraction of each family out of the total number of MITEs showed that the Thalos family (Stowaway superfamily) had the highest fraction, with 35.8% of all MITE insertions, followed by Pan (Stowaway superfamily; 14.2%), Athos (Stowaway superfamily; 13.6%), Belus (unknown superfamily; 7.8%), Icarus (Stowaway superfamily; 5.8%) and Hades (Stowaway superfamily; 3.4%); the other families had fractions of 0.001% to 3% (Table 1, Fig 2). The three Stowaway families Thalos, Pan and Athos are the most abundant in the wild emmer wheat genome and together account for ~64% of all MITE insertions.

Chromosomal distribution of MITE insertions in T. turgidum ssp. dicoccoides

The newly available genome draft of wild emmer wheat (T. turgidum ssp. dicoccoides) facilitated analysis of MITE distribution across the two sub-genomes and 7 homoeologous chromosome groups (Fig 3A). We found that 59% of the insertions (46,074 of 78,102) were located in the B sub-genome, compared to 38.2% (29,844) in the A sub-genome (S2 Table). Note that 2.8% (2,154) of the insertions could not be mapped to either sub-genome and were thus listed as “unknown” (Fig 3). These data could indicate different proliferation levels of MITEs in the A and B sub-genomes of T. turgidum ssp. dicoccoides (see S1 Fig for the distribution of each family). For example, 62.7% of Thalos insertions (17,522) were found in the B sub-genome, 34.67% (9,689) in the A sub-genome and only 2.63% in unidentified genome regions (S1A Fig, S2 Table). However, some families showed a different trend. For example, 80.92% of Minos insertions (827) were found in the A sub-genome, while only 16.34% (167) were found in the B sub-genome and 2.74% (28) in unidentified genome regions (S1H Fig). At the chromosome-group level, the largest number of MITE insertions (12,024) was found in the group-2 chromosomes. At the level of individual chromosomes, the highest number of MITE insertions was found in chromosome 7B (7,459 insertions), accounting for 9.5% of all MITE insertions.
Fig 3

Distribution of MITE insertions within the seven homoeologous chromosome groups.

Each chromosome is defined by its genome (AA, BB, and DD, indicated by different colours; top) and number (1–7). a. Distribution of MITE insertions within chromosomes of the tetraploid T. turgidum ssp. dicoccoides (genome AABB). b. Distribution of MITE insertions within chromosomes of the hexaploid T. aestivum (genome AABBDD).

MITEs composition in the T. aestivum genome

Overall, 115,169 insertions belonging to 36 MITE families were retrieved from the hexaploid T. aestivum genome draft (Table 1), which account for ~17.6 Mbp (0.1%) of the total ~17,000 Mbp [7, 9]. The retrieved MITE families included 19 families of the Stowaway superfamily (96,203 insertions, or 83.53% of all MITE insertions), 5 families of the Tourist superfamily (5,033 insertions, or 4.37%), 8 families of the Mutator superfamily (2,643 insertions, or 2.29%) and 4 families of unknown superfamilies (11,290 insertions, or 9.8%) (Fig 1). Thalos had the highest copy number (42,321 insertions), while only 2 Murray insertions were found. Other families with high copy numbers were Athos (19,359, Stowaway superfamily), Pan (14,296, Stowaway superfamily), Belus (8,522, unknown superfamily), Icarus (6,795, Stowaway superfamily), Hades (3,656, Stowaway superfamily) and Eos (3,264, Stowaway superfamily). As in the T. turgidum ssp. dicoccoides genome, copy numbers varied remarkably between MITE families, with most high-copy-number families belonging to the Stowaway superfamily. Analysis of the relative fraction of each family out of the total MITE population showed that the Thalos family (Stowaway superfamily) had the highest fraction at ~37%, followed by Athos (Stowaway superfamily; ~17%), Pan (Stowaway superfamily; ~12%), Belus (unknown superfamily; 7.5%), Icarus (Stowaway superfamily; 6%) and the other families, with fractions spanning 0.01% to ~3% (Table 1, Fig 2). As in the diploid and tetraploid species, the families with the highest copy numbers in the hexaploid genome were Thalos, Athos and Pan, which together account for approximately 66% of all MITE insertions, suggesting that these 3 families contained the most active MITEs throughout wheat evolution.
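The cumulative share of the three dominant Stowaway families can be compared across all four genome drafts. The Python sketch below recomputes it from the copy numbers in Table 1 (transcribed by hand; the dictionary layout and function name are ours), giving values consistent with the ~58%, ~71.5%, ~64% and ~66% reported for T. urartu, Ae. tauschii, wild emmer and bread wheat, respectively.

```python
# Copy numbers of the three most abundant families and total MITE insertions
# per genome draft, transcribed from Table 1.
GENOMES = {
    "T. aestivum (ABD)": {"Thalos": 42321, "Athos": 19359, "Pan": 14296, "total": 115169},
    "T. turgidum ssp. dicoccoides (AB)": {"Thalos": 27946, "Athos": 10637, "Pan": 11108, "total": 78102},
    "Ae. tauschii (D)": {"Thalos": 12557, "Athos": 5297, "Pan": 3838, "total": 30342},
    "T. urartu (A)": {"Thalos": 5249, "Athos": 2314, "Pan": 1407, "total": 15513},
}


def top_family_share(counts, families=("Thalos", "Athos", "Pan")):
    """Percentage of all MITE insertions contributed by the given families."""
    return 100 * sum(counts[f] for f in families) / counts["total"]


for genome, counts in GENOMES.items():
    print(f"{genome:36s} {top_family_share(counts):5.1f}%")
```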

Chromosomal distribution of MITE insertions in T. aestivum

The updated genome draft of hexaploid wheat (publicly available on EnsemblPlants since June 2016) allowed us to analyze the distribution of MITE insertions across the three sub-genomes and 7 homoeologous chromosome groups of wheat (Fig 3B). We found that 41% of MITE insertions were located in the B sub-genome (47,324 of 115,169), compared to 29.3% (33,787) in the D sub-genome and 27.3% (31,424) in the A sub-genome, with 2.28% of insertions unmapped (S2 Table). Most MITE families presented a sub-genome-specific proliferation profile (see S2 Fig for the distribution of each family), meaning that they were not equally distributed across the three sub-genomes. For example, 41% (17,202) of Thalos insertions were found in the B sub-genome, compared to 34.5% (14,600) in the D sub-genome and 23% (9,699) in the A sub-genome (S2A Fig, S2 Table). Another example was the Minos family, where ~70% (871) of the insertions were found in the A sub-genome, compared to ~15% (194) in the D sub-genome and only 12% (151) in the B sub-genome (S2H Fig). At the chromosome-group level, the largest number of MITE insertions (17,356, or 15% of all elements) was found in the group-3 chromosomes. At the level of individual chromosomes, the highest number of MITE insertions was found in chromosome 3B (7,570 insertions, or 6.5% of all MITE insertions), which is also the largest wheat chromosome (995 Mbp) [45] and was the first wheat chromosome to be fully assembled [9].

Comparative analysis of MITE composition in Triticum and Aegilops genomes

All MITE families found in the hexaploid T. aestivum and the tetraploid T. turgidum ssp. dicoccoides were also found in the diploid Ae. tauschii and T. urartu genomes, except for the Murray family (Mutator), which presented only 2 and 1 copies in the hexaploid and tetraploid genomes, respectively (Table 1). The Stowaway superfamily was the most abundant MITE superfamily (~80% of insertions) in all four genomes: the relative fraction of Stowaway MITE insertions is 81.01% in T. urartu, 81.81% in T. turgidum ssp. dicoccoides, 87.42% in Ae. tauschii and 83.53% in T. aestivum. The other analyzed families belong to the Tourist (5 families), Mutator (8 families) or unknown (4 families) superfamilies. The relative fraction of Tourist MITE insertions ranged from 4.02% in Ae. tauschii, 4.37% in T. aestivum and 4.38% in T. turgidum ssp. dicoccoides to 6.93% in T. urartu. Mutator fractions ranged from 1.32% in Ae. tauschii, 2.29% in T. aestivum and 2.8% in T. turgidum ssp. dicoccoides to 4.49% in T. urartu. Fractions of the unknown superfamilies ranged from 7.23% in Ae. tauschii, 7.57% in T. urartu and 9.8% in T. aestivum to 11% in T. turgidum ssp. dicoccoides (Fig 1). In all four species, the Thalos family presented the highest copy number of all MITE families examined. We observed variations in MITE copy numbers between polyploid and diploid species (Table 1, S3 Fig). Almost all MITE families show patterns of variation that can be explained either by differences in genome size and composition or by different activity levels [30].
We performed a common insertion analysis (comparison of TE insertions based on their flanking sequences) for four MITE families (Thalos, Athos, Pan and Belus) in the hexaploid and tetraploid genomes. Only ~30–47% of each family's insertions were common to both polyploids (34–41%, 30–36%, 35–41% and 37–47% per chromosome for Thalos, Athos, Pan and Belus, respectively); these are insertions that hexaploid wheat inherited from tetraploid wheat. The remaining 53–70% of insertions are unique to either tetraploid or hexaploid wheat. They might result from transposition or from rearrangements such as recombination or deletion (e.g., deletion of a MITE-containing sequence in the hexaploid would leave a “unique” insertion in the tetraploid). The Thalos family, for example, had 42,321 insertions in the hexaploid (9,699 in the A sub-genome, 17,202 in the B sub-genome and 14,600 in the D sub-genome). It had 27,946 insertions in the tetraploid (9,689 in the A sub-genome and 17,522 in the B sub-genome) and 12,557 insertions in the D-genome donor (3:2:1 ratio; Fig 4A, S2 Table). TE inheritance alone could account for this variation, yet the Thalos common insertion analysis showed that ~59–66% of these insertions are unique to either tetraploid or hexaploid wheat. In a previous study [46], we found that Class II transposons display different patterns of cytosine methylation in a synthetic allohexaploid than in a synthetic allotetraploid. Thalos elements underwent massive hypermethylation in the S1–S4 generations of the allohexaploid, whereas hypomethylation predominated in the S1–S5 generations of the allotetraploid. Hypomethylation suggests a potential for rearrangement and possibly activation of Thalos elements following the allotetraploidization event.
The relatively similar copy number in each sub-genome of the tetraploid and hexaploid suggests that a major part of Thalos activity in the A and B sub-genomes occurred following the tetraploidization event, as the copy number in the diploid A donor is much lower (5,249). Nevertheless, since MITEs are class II elements, transposition of elements might have occurred with no proliferation, as was shown for specific cases in a previous study from our lab [1, 47].
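The common insertion analysis described above can be sketched as a set comparison of insertion loci keyed by their flanking sequences. This is a simplified illustration under our own assumptions, not the MAK-based procedure used in the study. Each insertion is reduced to a chromosome label plus its left and right flanks. Two insertions count as the same locus only when both flanks match exactly (ignoring case), whereas real comparisons must tolerate sequencing errors and small indels. The `Insertion` class, the `locus_key` and `common_insertion_pct` functions and the example flanks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record of one MITE insertion: chromosome plus flanking sequences."""
    chromosome: str
    left_flank: str
    right_flank: str


def locus_key(ins: Insertion) -> tuple[str, str, str]:
    # Exact, case-insensitive flank matching; a simplification of real homology searches.
    return (ins.chromosome, ins.left_flank.upper(), ins.right_flank.upper())


def common_insertion_pct(tetraploid: list[Insertion], hexaploid: list[Insertion]) -> dict[str, float]:
    """Percentage of each genome's distinct insertion loci that are shared with the other genome."""
    tetra = {locus_key(i) for i in tetraploid}
    hexa = {locus_key(i) for i in hexaploid}
    shared = tetra & hexa
    return {
        "tetraploid": 100 * len(shared) / len(tetra),
        "hexaploid": 100 * len(shared) / len(hexa),
    }


if __name__ == "__main__":
    tetra = [
        Insertion("3B", "ACGTAC", "TTGACA"),
        Insertion("3B", "GGCCTA", "CATGCA"),
        Insertion("1A", "TTTAAA", "CCCGGG"),
    ]
    hexa = [
        Insertion("3B", "acgtac", "ttgaca"),  # same locus as the first tetraploid insertion
        Insertion("1A", "AAATTT", "GGGCCC"),  # hexaploid-specific locus
    ]
    print(common_insertion_pct(tetra, hexa))  # tetraploid ~33.3%, hexaploid 50.0%
```

The shared fraction is reported relative to each genome separately. As the Thalos counts show, the two polyploids carry different numbers of insertions, so the same set of shared loci represents a different percentage of each.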
Fig 4

Copy number of four MITE families in Triticum and Aegilops genomes.

Copy numbers are indicated on top of each bar. a. Thalos (Stowaway), b. Athos (Stowaway), c. Pan (Stowaway), d. Oleus.

Athos insertions showed a similar pattern: we found 19,359 insertions in T. aestivum, 10,637 in T. turgidum ssp. dicoccoides, 5,297 in Ae. tauschii and 2,512 in T. urartu (~8:4:2:1 ratio; Fig 4B). Pan insertions showed another distinct pattern, with 14,296 insertions in T. aestivum, 11,108 in T. turgidum ssp. dicoccoides, 3,838 in Ae. tauschii and 1,230 in T. urartu (~12:9:3:1 ratio; Fig 4C). Remus insertions presented a different pattern: we noted 213 insertions in T. turgidum ssp. dicoccoides, 211 in T. aestivum, 7 in Ae. tauschii and 4 in T. urartu (~53:53:2:1 ratio; Fig 4D). In this case, the copy number is similar in both polyploid genomes but very low in the diploid genomes. This finding, combined with the relatively low number of insertions, suggests that the Remus pattern probably reflects the different sizes of the genomes, as this family does not appear to have been very active. Overall, MITE sequences were abundant in the wheat genome, with large copy number variation among MITE families and among the four Triticum and Aegilops genome drafts (Table 1). We found 30,366 insertions in the Ae. tauschii genome (D), 15,420 in the T. urartu genome (A), 78,102 in the T. turgidum ssp. dicoccoides genome (AB) and 115,169 in the T. aestivum genome (ABD) (Fig 5).
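The approximate ratios quoted above can be reproduced as follows. Each species' copy number is divided by the smallest count for that family (T. urartu in each case) and rounded to the nearest integer. The Python sketch below uses only the counts given in the text. Rounding to whole numbers is our assumption about how the ratios were derived.

```python
# Copy numbers reported in the text, in the order:
# T. aestivum, T. turgidum ssp. dicoccoides, Ae. tauschii, T. urartu.
COPY_NUMBERS = {
    "Athos": (19_359, 10_637, 5_297, 2_512),
    "Pan": (14_296, 11_108, 3_838, 1_230),
    "Remus": (211, 213, 7, 4),
}


def rounded_ratio(counts: tuple[int, ...]) -> tuple[int, ...]:
    """Express each count as a rounded multiple of the smallest count in the tuple."""
    smallest = min(counts)
    return tuple(round(c / smallest) for c in counts)


if __name__ == "__main__":
    for family, counts in COPY_NUMBERS.items():
        print(family, ":".join(str(r) for r in rounded_ratio(counts)))
    # Athos 8:4:2:1, Pan 12:9:3:1, Remus 53:53:2:1
```

For Remus, the Ae. tauschii value rounds to 2 because 7/4 = 1.75. With counts this small, the integer ratio is sensitive to a difference of a few copies.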
Fig 5

Total copy number of MITE insertions in Triticum and Aegilops genomes.

Copy numbers are indicated on top of each bar.

MITE associations with genes and TEs

Analysis of MITE insertion sites revealed that nearly 52% of the insertions were located within or close to (within 100 bp of) annotated protein-coding genes (S3 Table), while ~40% were located within or close to TE sequences. Specifically, 33.54% of all insertions were found in Class II DNA transposons, mostly in CACTA elements of the Jorge family (31.49%), and 5.34% in Class I retrotransposons. Other MITE insertions (~4%) were located in or near non-coding RNA sequences. The remainder (5%) were located in unidentified regions, most probably corresponding to non-coding sequences. Many insertions of the Belus (99.5%), Icarus (88.7%), Fortuna (67.27%), Pan (57.27%) and Thalos (42.45%) families were found within sequences of the Class II element “Jorge”, in different regions of this element. Jorge is a large derivative of the CACTA superfamily found in wheat species (T. monococcum, Ae. tauschii and T. aestivum). It is considered non-autonomous and inactive owing to the lack of transposase-coding CACTA elements in the wheat genome [48]. Belus is the most striking case. This MITE family comprises ~173-bp sequences assigned to an unknown superfamily, and almost all of its insertions corresponded to Jorge elements, mostly at the same position (around positions 4,426–4,588 of the 15,800-bp Jorge sequence). This suggests that the Belus family proliferated following a past insertion into a Jorge element, while the Jorge family was still active. In one case, Belus/Jorge became part of the coding sequence of a gene encoding a nitrate/chlorate transporter (mapped to exon 4 of an Ae. tauschii gene, acc. F775_11526, EnsemblPlants). The orthologous genes on chromosomes 5A and 5B of the emmer genome lack the Belus/Jorge sequence (GrainGenes, acc. TRIDC1_5B|TRIDC5BG063580.2, Protein NRT1/PTR FAMILY, and acc. TRIDC1_5A|TRIDC5AG059350.2, Protein NRT1/PTR FAMILY). This suggests that domestication of the Belus/Jorge element led to this motif becoming a vital part of such genes. However, since the Jorge family is now inactive, the large number of MITE insertions found within Jorge elements may reflect a high copy number of Jorge family members in the wheat genome. In previous work in rice, MITEs were found mostly in genic regions or within other MITE sequences [11, 49]. To determine whether MITE sequences occur in transcribed sequences in vivo, we retrieved MITE insertions from the T. aestivum RNA-seq database using MAK software. Overall, we retrieved 484 MITE-containing transcripts belonging to 364 different genes. The most abundant MITE families in this transcriptome were Thalos, Athos and Pan. Detailed analysis showed that ~70% of the insertions were located within 3’ UTRs, ~17% within 5’ UTRs and ~13% within the coding region (CDS). In a previous study, we reported intron retention involving Au SINE, a non-LTR retrotransposon family that, like MITEs, is highly associated with wheat genes [50]. Au SINE was found to cause allelic variation in wheat protein-coding genes. In some cases, Au SINE insertions in introns led to intron retention and to alternative splice variants encoding shorter proteins. In silico examination of 100 MITE-containing transcripts (S4 Table) revealed 24 genes with alternative splice variants, some of which contain MITEs while others do not. In all cases, the MITE-containing transcript of a given gene was longer than the other transcripts, although the encoded protein was not necessarily larger. In some cases, all splice variants encoded a protein of the same sequence and size, while in others the MITE-containing variant encoded a shorter or a longer protein.
For example, gene accession TRIAE_CS42_1AL_TGACv1_002619_AA0043720 (uncharacterized protein, EnsemblPlants) has two splice variants. One transcript lacks the MITE insertion, is 1,184 bp long and encodes a 253-residue protein. The other is 1,470 bp long, contains an Orpheus insertion in its 3’ UTR and encodes a 305-residue protein. In addition to the bread wheat transcriptome, we analyzed the wild emmer wheat transcriptome and retrieved 164 MITE-containing transcripts derived from 72 genes. Of these insertions, 31 (~19%) were located at least partially in the CDS.
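This section uses two positional criteria: a MITE lying within a gene or within 100 bp of it, and a MITE lying in the 5’ UTR, CDS or 3’ UTR of a transcript. Both can be expressed as simple interval tests. The sketch below is illustrative only and rests on assumptions not stated in the text:

- coordinates are 1-based and inclusive (as in GFF3);
- genes and MITEs lie on the same chromosome;
- transcript coordinates are in sense orientation with known CDS boundaries;
- any overlap with the CDS counts as CDS, mirroring the "at least partially in the CDS" criterion.

Annotation parsing is not shown. All function names and coordinates are hypothetical.

```python
from collections import Counter

PROXIMITY_BP = 100  # "close to a gene" threshold used in the text


def gap_between(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases strictly between two non-overlapping 1-based, inclusive intervals."""
    if a_end < b_start:
        return b_start - a_end - 1
    return a_start - b_end - 1


def genomic_context(mite: tuple[int, int], genes: list[tuple[int, int]], window: int = PROXIMITY_BP) -> str:
    """'within gene', 'near gene' (at most `window` bp away) or 'distal', for genes on the same chromosome."""
    start, end = mite
    nearest = None
    for g_start, g_end in genes:
        if start <= g_end and g_start <= end:  # the intervals overlap
            return "within gene"
        gap = gap_between(start, end, g_start, g_end)
        nearest = gap if nearest is None else min(nearest, gap)
    return "near gene" if nearest is not None and nearest <= window else "distal"


def transcript_region(mite_start: int, mite_end: int, cds_start: int, cds_end: int) -> str:
    """5'UTR, CDS or 3'UTR, using sense-strand transcript coordinates (1-based, inclusive)."""
    if mite_start <= cds_end and cds_start <= mite_end:
        return "CDS"  # any overlap counts, as in "at least partially in the CDS"
    return "5'UTR" if mite_end < cds_start else "3'UTR"


def region_percentages(hits: list[tuple[int, int, int, int]]) -> dict[str, float]:
    """hits: (mite_start, mite_end, cds_start, cds_end) for each MITE-containing transcript."""
    counts = Counter(transcript_region(*hit) for hit in hits)
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    genes = [(1_000, 5_000), (12_000, 15_000)]  # invented gene coordinates
    for mite in [(4_500, 4_650), (5_050, 5_200), (8_000, 8_150)]:
        print(mite, genomic_context(mite, genes))  # within gene, near gene, distal
    hits = [(10, 80, 150, 1_100), (1_200, 1_300, 150, 1_100), (1_050, 1_150, 150, 1_100), (1_300, 1_380, 200, 900)]
    print(region_percentages(hits))  # 5'UTR 25%, 3'UTR 50%, CDS 25%
```

The proximity test returns early on the first overlapping gene. "Within" therefore takes precedence over "near" whenever a MITE overlaps one gene and lies close to another.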

Inbar, a new and unique Stowaway-like MITE family in wheat

Computer-assisted analysis revealed a previously undescribed 68-bp sequence on chromosome 5B containing 11-bp TIRs and flanked by a “TA” target site duplication (TSD). This structure is possibly indicative of an unidentified Stowaway-like MITE (Figs 6 and 7). BLAST analysis suggested that this MITE sequence is unique to wheat, as we were unable to detect it in other plant genomes. To assess the copy number of Inbar in Triticum and Aegilops species, we used its sequence as a query in MAK software and retrieved similar copies from the four genome drafts. Overall, 93 insertions were found in the T. urartu genome, 24 in Ae. tauschii, 1,846 in T. turgidum ssp. dicoccoides and 1,911 in T. aestivum (Fig 8). Distribution analysis of the polyploid genomes revealed that Inbar elements were predominantly located in the B sub-genome (81% of the insertions in T. turgidum ssp. dicoccoides and 79% in T. aestivum; Fig 9). The massive copy number difference between the diploid and polyploid species indicates that Inbar might have been transpositionally activated following allotetraploidization.
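The two structural features used to flag Inbar as Stowaway-like can be checked with a few lines of Python. These are terminal inverted repeats (TIRs) and a flanking TA target site duplication. This is a hedged sketch rather than the detection method used in the study, and it carries three caveats:

- the 11-bp TIR in the example is invented and is not the Inbar sequence shown in Fig 6;
- the element boundaries are assumed to be known;
- real MITE searches tolerate some TIR mismatches, which the `max_mismatches` parameter only crudely approximates.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tir_mismatches(element: str, tir_len: int = 11) -> int:
    """Mismatches between the 5' terminus and the reverse complement of the 3' terminus."""
    left = element[:tir_len]
    right_rc = reverse_complement(element[-tir_len:])
    return sum(a != b for a, b in zip(left, right_rc))


def is_stowaway_like(locus: str, start: int, end: int, tir_len: int = 11, max_mismatches: int = 0) -> bool:
    """Check TIRs and a TA target site duplication for locus[start:end] (0-based, half-open)."""
    element = locus[start:end]
    has_tirs = len(element) >= 2 * tir_len and tir_mismatches(element, tir_len) <= max_mismatches
    has_ta_tsd = locus[start - 2:start].upper() == "TA" and locus[end:end + 2].upper() == "TA"
    return has_tirs and has_ta_tsd


if __name__ == "__main__":
    left_tir = "CTCCCTCCGTT"  # hypothetical 11-bp TIR, NOT the real Inbar sequence
    element = left_tir + "ACGT" * 11 + "AC" + reverse_complement(left_tir)  # 68 bp in total
    locus = "GGCCTA" + element + "TAGGCC"  # element flanked by TA on both sides
    print(len(element), is_stowaway_like(locus, 6, 6 + len(element)))  # 68 True
```

A candidate that passes both checks is only a structural match. The study's further steps (a BLAST search against other plant genomes and copy retrieval with MAK) are still needed to define a family.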
Fig 6

Schematic representation of Inbar consensus sequence.

The sequences of the TIRs (blue) and the TSDs (red) are indicated.

Fig 7

Target site preference (TA) of Inbar.

Created by the WebLogo 3.0 package, based on MAK data output of target site duplications from all four Triticum and Aegilops genomes.

Fig 8

Copy number of the Inbar family in Triticum and Aegilops genomes.

Copy numbers are indicated on top of each bar.

Fig 9

Distribution of Inbar insertions within the seven homologous chromosomes.

Each chromosome is identified by its genome (AA, BB and DD, indicated by different colors at the top) and number (1–7). a. Distribution of Inbar insertions within tetraploid T. turgidum ssp. dicoccoides (genome AABB) chromosomes. b. Distribution of Inbar insertions within hexaploid T. aestivum (genome AABBDD) chromosomes.

Annotation analysis of Inbar flanking sequences showed that ~90% of the insertions were in retrotransposon sequences and 7.42% were associated with protein-coding genes. A further 2.22% were inserted in Class II TEs, and the remainder were associated with non-coding RNA sequences (0.49%) or non-coding DNA (0.21%) (S5 Table). These data indicate that Inbar inserted preferentially within retrotransposon sequences, which may explain why it had not been identified previously. Interestingly, 57.65% of Inbar insertions into retrotransposons were associated with the LTR-Copia superfamily, specifically with the Inga and Eugene families, although these insertions were found at different locations within these elements. We validated our computer-assisted analysis by site-specific PCR using primers designed against Inbar flanking sequences, and the PCR products were also sequenced for additional validation. We analyzed three cases of Inbar insertions associated with genes. In the first case, we considered an Inbar insertion within an intron of a T. aestivum gene located on chromosome 2B (identified in silico as accession TRIAE_CS42_2BL_TGACv1_129533_AA0387420, EnsemblPlants, S5 Table). This insertion was also found in B-genome diploids (Ae. searsii, Ae. speltoides, Ae. sharonensis and Ae. longissima) and in polyploids (T. turgidum ssp. durum, T. turgidum ssp. dicoccoides and T. aestivum). It is therefore probably an insertion unique to the B genome that was inherited by the polyploids (Fig 10A). In addition, we examined an Inbar insertion located downstream of a T. aestivum gene of unknown function on chromosome 6A (identified in silico as accession TRIAE_CS42_6AL_TGACv1_471379_AA1507990, S5 Table). PCR analysis showed that this Inbar insertion is present in T. urartu and in the polyploid genomes (T. turgidum ssp. durum, T. turgidum ssp. dicoccoides and T. aestivum; Fig 10B), indicating that it is unique to the A genome. Finally, we examined an Inbar insertion in an intron of a gene encoding the ribulose bisphosphate carboxylase small chain. This gene was identified in silico on chromosome 2A in T. urartu (acc. TRIUR3_10383, EnsemblPlants, S5 Table), T. turgidum ssp. dicoccoides (acc. TRIDC2BG008560.1, GrainGenes) and T. aestivum (acc. TRIAE_CS42_2AS_TGACv1_113196_AA0352860, EnsemblPlants). PCR analysis showed that this Inbar insertion is present in T. urartu and the polyploid genomes (T. turgidum ssp. durum, T. turgidum ssp. dicoccoides and T. aestivum; Fig 10C), indicating that it is specific to the A genome. The PCR results suggest that these Inbar insertions into genes are ancient, dating back to the divergence of the A, B and D diploid species, an event estimated to have occurred ~4 MYA.
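The reasoning used above to assign each Inbar insertion to a genome can be written as a set comparison: the insertion is present in a diploid carrying that genome and in every polyploid sharing it, and absent elsewhere. The sketch below is a simplification under our own assumptions. It covers only a subset of the Fig 10 accessions and labels the B-genome diploids as in Fig 10. It also ignores PCR failure, sequence divergence at primer sites and post-polyploidization deletions. The `infer_source_genome` function is hypothetical.

```python
# Genome constitution of a subset of the Fig 10 accessions (B-genome diploids labelled as in Fig 10).
GENOMES = {
    "Ae. searsii": {"B"},
    "Ae. speltoides": {"B"},
    "Ae. sharonensis": {"B"},
    "Ae. longissima": {"B"},
    "T. urartu": {"A"},
    "Ae. tauschii": {"D"},
    "T. turgidum ssp. dicoccoides": {"A", "B"},
    "T. turgidum ssp. durum": {"A", "B"},
    "T. aestivum": {"A", "B", "D"},
}


def infer_source_genome(pcr_positive: set[str]) -> list[str]:
    """Return the genomes whose carrier accessions exactly match the PCR-positive accessions."""
    matches = []
    for genome in ("A", "B", "D"):
        carriers = {acc for acc, genomes in GENOMES.items() if genome in genomes}
        if carriers == pcr_positive:
            matches.append(genome)
    return matches


if __name__ == "__main__":
    # Pattern reported for Fig 10B: T. urartu plus the three polyploids.
    fig10b = {"T. urartu", "T. turgidum ssp. dicoccoides", "T. turgidum ssp. durum", "T. aestivum"}
    print(infer_source_genome(fig10b))  # ['A']
```

An empty result flags a presence/absence pattern that this simple inheritance rule cannot explain. Such a pattern could point to a deletion, a polyploid-specific transposition or a failed PCR.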
Fig 10

Site-specific PCR analysis using primers raised against Inbar flanking sequences.

The arrows denote the expected PCR product, “M” denotes size markers, and “NC” denotes a negative control (ddH2O as PCR template). PCR analysis was performed with DNA templates of the following accessions: BB1 = Ae. searsii, BB2 = Ae. speltoides, BB3 = Ae. sharonensis / Ae. longissima, AA = T. urartu / T. monococcum, DD = Ae. tauschii, AABB1 = T. turgidum ssp. dicoccoides, AABB2 = T. turgidum ssp. durum, AABBDD1 = T. aestivum, AABBDD2 = synthetic T. aestivum generations S1, S2, S3 and S4 (hybridization of Ae. tauschii and T. turgidum ssp. durum). a. Inbar insertion in a gene located in the BB genome of Ae. searsii, Ae. speltoides, Ae. sharonensis, Ae. longissima, T. turgidum ssp. dicoccoides, T. turgidum ssp. durum and T. aestivum (gene acc. TRIAE_CS42_2BL_TGACv1_129533_AA0387420). PCR product size: 101 bp. b. Inbar insertion in a gene located in the AA genome of T. urartu, T. turgidum ssp. dicoccoides, T. turgidum ssp. durum and T. aestivum (gene acc. TRIAE_CS42_6AL_TGACv1_471379_AA1507990). PCR product size: 258 bp. c. Inbar insertion in a gene located in the AA genome of T. urartu (gene acc. TRIUR3_10383), T. turgidum ssp. dicoccoides, T. turgidum ssp. durum and T. aestivum (gene acc. TRIAE_CS42_2AS_TGACv1_113196_AA0352860, ribulose bisphosphate carboxylase small chain). PCR product size: 426 bp.

Discussion

Allopolyploidization is considered a stress on the plant genome, since it is followed by massive genetic and epigenetic rearrangements. These rearrangements cause the new genome to behave as a diploid, both cytologically (as demonstrated, for example, by chromosome pairing during meiosis) and genetically (as reflected in the orchestration of gene expression). They include the activation of some TEs and the deactivation of others [1, 2, 5, 51]. However, the mechanisms of genomic reorganization involving TEs remain poorly understood. MITEs are considered one of the most abundant and successful groups of plant TEs [18, 52–54]. In this study, we retrieved and analyzed 239,126 MITE insertions belonging to 36 different families from four Triticum and Aegilops species, including 3,874 members of a newly identified MITE family termed Inbar. Our study represents the most up-to-date and detailed analysis of MITE composition in wheat genomes, including the distribution of MITEs in the seven homologous chromosomes of the tetraploid and hexaploid wheat species. For comparison, in a previous work [30], we analyzed ~18,000 Stowaway-like elements retrieved from a 454 pyrosequencing shotgun draft of T. aestivum [31]. In the present study, 15,420 insertions were detected in the T. urartu genome, 30,366 in the Ae. tauschii genome, 78,102 in the T. turgidum ssp. dicoccoides genome and 115,169 in the T. aestivum genome. For some MITE families, the hexaploid copy number is similar to the sum of the parental copy numbers (T. turgidum ssp. dicoccoides + Ae. tauschii). However, because MITEs are class II elements that transpose through a “cut and paste” mechanism, transposition does not always result in an increased copy number.
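Whether a family's hexaploid copy number is close to the additive expectation can be summarized as a ratio. The numerator is the T. aestivum count; the denominator is the sum of the T. turgidum ssp. dicoccoides and Ae. tauschii counts. The sketch below computes this ratio from counts given in the Results. Treating the sum of the two parental drafts as the expectation is a simplifying assumption. A ratio near 1 is consistent with inheritance but, as noted above, does not exclude cut-and-paste transposition.

```python
# (T. aestivum, T. turgidum ssp. dicoccoides, Ae. tauschii) copy numbers from the Results.
COUNTS = {
    "Thalos": (42_321, 27_946, 12_557),
    "Athos": (19_359, 10_637, 5_297),
    "Pan": (14_296, 11_108, 3_838),
    "Remus": (211, 213, 7),
    "Inbar": (1_911, 1_846, 24),
    "all MITEs": (115_169, 78_102, 30_366),
}


def additive_ratio(hexaploid: int, tetraploid: int, d_donor: int) -> float:
    """Observed hexaploid copy number divided by the additive (tetraploid + D donor) expectation."""
    return hexaploid / (tetraploid + d_donor)


if __name__ == "__main__":
    for family, (hexa, tetra, d) in COUNTS.items():
        print(f"{family}: {additive_ratio(hexa, tetra, d):.2f}")
```

With these counts, Thalos, Pan, Remus and Inbar fall within about 5% of the additive value (1.04, 0.96, 0.96 and 1.02). Athos exceeds it (1.21), and all MITEs together give 1.06.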
Moreover, as we reported before [43, 47], MITE activation following polyploidization is not necessarily accompanied by an increase in copy number. Analysis of insertions common to the A or B sub-genomes of hexaploid and tetraploid wheat showed that ~30–47% of the insertions are common (i.e., inherited from the tetraploid by the hexaploid), while the rest are unique to either tetraploid or hexaploid wheat. These unique insertions might have three sources: species-specific activity in the tetraploid, transposition of MITEs following the emergence of hexaploid wheat (hexaploidization), or genomic rearrangements such as deletion of MITE-containing sequences. Stowaway is the most abundant MITE superfamily in the wheat genome, representing ~80% of insertions, as previously reported for other plant species [18, 55]. The largest number of MITEs is found in T. aestivum (ABD), with 113,258 insertions of known MITE families and 1,911 Inbar insertions, yielding a total of 115,169 MITE elements. This exceeds the number of MITEs counted during sequencing of the T. aestivum genome (102,275 insertions [7]). All MITE families found in the polyploid genomes are also found in the diploid genomes. The one exception is the Murray family, which has a single copy in wild emmer wheat and two copies in bread wheat, yet none in the diploid species (Table 1). The Thalos, Athos and Pan families (Stowaway superfamily) have the highest copy numbers in all four species, together representing ~60–70% of all MITEs in each genome. It is therefore likely that these were the most active MITEs during wheat evolution. Indeed, our findings confirm a previous report of Thalos being the most abundant MITE family in wheat [9]. MITE insertions are distributed across the seven homologous chromosomes of the tetraploid and hexaploid species. The B sub-genome contains 58.4% of MITE insertions in wild emmer wheat and 40.4% in T. aestivum (Fig 3), reflecting the previously described highly repetitive nature of the B sub-genome of polyploid wheat [2]. The B sub-genome (6,274 Mbp) is the largest sub-genome of T. aestivum, consistent with its larger number of MITEs. The D sub-genome (4,937 Mbp), however, is the smallest yet contains more MITE elements than the A sub-genome (5,727 Mbp [9]). This indicates that MITEs proliferated to different extents in the different sub-genomes. Stowaway MITEs showed a preference for dinucleotide TA target sites. Tourist MITEs insert into 2–3-nucleotide target sites without a specific sequence preference, and Mutator MITEs show no sequence preference in the 9–10-nucleotide target sites in which they are found (see S1 Text). These observations agree with previous reports [10, 18, 55–57]. In addition, we showed that MITEs preferentially insert into genic or repetitive regions: ~50% of insertions were found within or close to protein-coding genes, and ~40% within class I and class II TEs. The strong association of MITEs with plant genes has been reported in several studies [10, 21–24], although, to our knowledge, the presence of MITEs within other TEs has not been reported previously. We found many insertions of different MITE families in class II elements, mostly in the currently non-autonomous Jorge family, a large derivative of the CACTA superfamily. In addition, we characterized a new MITE family, Inbar, that is found in most cases within retrotransposon sequences of the Inga and Eugene families (Copia superfamily). One possible explanation for the insertion of Inbar into retrotransposons involves the genomic rearrangements that follow allopolyploidization. During these rearrangements, TEs that are being transcribed may become targets, through an unknown mechanism, for MITEs that usually insert into genes. This may represent an alternative mechanism for MITE proliferation.
Upon inserting into an active TE, the MITE would be carried to a different location in the genome. It would be copied together with the active element (in the case of class I elements) or move with it (in the case of class II elements). Alternatively, this could be a host defence mechanism whereby MITEs insert into the coding regions of active TEs, causing mutations that deactivate once-active TEs. To further examine the association of MITEs with genes, we retrieved MITE elements from the T. aestivum transcriptome and found ~480 MITE-containing transcripts. Most insertions were located in the 3’ UTR of genes, and only a few were found in coding sequences. In addition, in almost all cases of a gene with alternative splice variants, the MITE-containing transcript was longer than the other transcripts, even though the encoded protein was not necessarily longer. This shows that MITEs are found not only within introns but also in the untranslated regions of transcripts, where they may play a role in gene regulation, as previously reported [26–29]. In one case, insertion of a Tourist MITE into the 3’ UTR of a bread wheat heat shock protein gene (TaHSP16.9-3A) led to increased transcript levels [58]. In another case, a MITE insertion in a regulatory element of ZmRap2.7, a flowering repressor, was found to affect the expression of this gene by altering the methylation status at the insertion site [59]. MITE insertions can also lead to allelic variation in plant genes, as we recently showed in emmer wheat [60]. More than 3,500 MITEs were previously reported to be transcribed together with rice genes [53]. Our numbers may therefore be underestimates, especially since other, as yet unrecognized, MITE families may exist.
In summary, our high-resolution analysis of MITEs in diploid and polyploid genome drafts sheds light on MITE proliferation and insertion mechanisms during genomic rearrangements. It also clarifies the role of TEs in shaping the wheat genome by creating allelic diversity. The highly dynamic behavior of TEs in polyploid species might facilitate the rapid adaptation of newly emerged allopolyploid species.

Supporting information

S1 Text. Analysis of sequence conservation and target site preference.
(DOCX)

S1 Table. Plant accessions used in site-specific PCR analyses.
(DOCX)

S2 Table. Copy number of MITE insertions in Triticum and Aegilops species by sub-genome.
(DOCX)

S3 Table. Characterization of MITE flanking sequences in wild emmer and bread wheats.
(XLSX)

S4 Table. Characterization of MITE-containing transcripts in the T. aestivum transcriptome.
(XLSX)

S5 Table. Characterization of Inbar flanking sequences.
(XLSX)

S6 Table. Primer sequences for Inbar insertions in wheat genes.
(DOCX)

S1 Fig. Distribution of MITE families in the seven homologous chromosomes of T. turgidum ssp. dicoccoides (AB).
(PDF)

S2 Fig. Distribution of MITE families in the seven homologous chromosomes of T. aestivum (ABD).
(PDF)

S3 Fig. Copy number of all MITE families in Triticum and Aegilops genomes.
(PDF)