Literature DB >> 27233667

Long Terminal Repeat Retrotransposon Content in Eight Diploid Sunflower Species Inferred from Next-Generation Sequence Data.

Hannah M Tetreault1, Mark C Ungerer2.   

Abstract

The most abundant transposable elements (TEs) in plant genomes are Class I long terminal repeat (LTR) retrotransposons represented by superfamilies gypsy and copia Amplification of these superfamilies directly impacts genome structure and contributes to differential patterns of genome size evolution among plant lineages. Utilizing short-read Illumina data and sequence information from a panel of Helianthus annuus (sunflower) full-length gypsy and copia elements, we explore the contribution of these sequences to genome size variation among eight diploid Helianthus species and an outgroup taxon, Phoebanthus tenuifolius We also explore transcriptional dynamics of these elements in both leaf and bud tissue via RT-PCR. We demonstrate that most LTR retrotransposon sublineages (i.e., families) display patterns of similar genomic abundance across species. A small number of LTR retrotransposon sublineages exhibit lineage-specific amplification, particularly in the genomes of species with larger estimated nuclear DNA content. RT-PCR assays reveal that some LTR retrotransposon sublineages are transcriptionally active across all species and tissue types, whereas others display species-specific and tissue-specific expression. The species with the largest estimated genome size, H. agrestis, has experienced amplification of LTR retrotransposon sublineages, some of which have proliferated independently in other lineages in the Helianthus phylogeny.
Copyright © 2016 Tetreault and Ungerer.

Entities:  

Keywords:  Helianthus; Helianthus agrestis; LTR retrotransposons; genome size evolution; transposable elements

Mesh:

Substances:

Year:  2016        PMID: 27233667      PMCID: PMC4978885          DOI: 10.1534/g3.116.029082

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Transposable elements (TEs) are DNA sequences capable of mobilizing within a host genome. Mobilization typically occurs either by physical excision-reinsertion events or through a process of replicative transposition whereby individual elements transcriptionally give rise to new copies that are reverse transcribed prior to insertion at new locations in the genome (Kumar and Bennetzen 1999; Feschotte ). TEs that mobilize via replicative transposition (Class I TEs) are a major genomic component of many plant species because their mobilization involves sequence propagation and large-scale copy number increases. Because these events occur independently and at different rates and scales in the genomes of different plant lineages, even closely related species may diverge rapidly in TE content and genome size (Hawkins ; Tenaillon ). The most abundant Class I TEs in plants are long terminal repeat (LTR) retrotransposons and are subdivided into superfamilies gypsy and copia (Kumar and Bennetzen 1999; Wicker ). Differential proliferation and abundance of families (or sublineages) within these superfamilies are known to directly impact genome size evolution (Vitte and Panaud 2005; Hawkins ; Piegu ; Charles ; Tenaillon ; Piednoel ; Kelly ). Characterizing particular sublineages within superfamilies that undergo proliferation and determining patterns of proliferation events among related species can be a difficult task given sequence variation among sublineages and the difficulty of accurately estimating copy number abundance of elements within sublineages across taxa. Advances in next-generation sequencing (NGS) approaches have greatly facilitated efforts to generate and characterize whole-genome-level sequence data (Lam ) for model and nonmodel organisms alike (Kelly and Leitch 2011). Major impediments of de novo genome assembly of NGS data exist, however, on account of short-read lengths generated by many NGS platforms and the difficulty of assembling reads derived from genomes with a high repetitive fraction (e.g., with a large TE component). NGS data nonetheless have proved extremely informative for characterization of the genomic TE content both within species and across related taxa (Tenaillon ; Bonchev and Parisod 2013), and several analysis methods have been developed for such characterizations, even under scenarios of low sequence coverage (Macas ; Kurtz ; Novak ). Wild sunflowers in the genus Helianthus provide an opportune system for studies of TE proliferation dynamics and associated genome evolution (Giordani ). Helianthus includes ∼49 species native to North America that are collectively widespread throughout the United States, southern Canada, and northern Mexico (Heiser ) and phylogenetic relationships are well resolved (Rieseberg 1991; Schilling 1997; Schilling ; Timme ; Stephens ). Genomic resources and tools are available for several Helianthus species (Kane ) and a genome sequencing effort is underway for the cultivated sunflower H. annuus (Kane ). Multiple ploidy levels are found within the genus (Kane ) with genome size varying considerably even among species of the same ploidy (Sims and Price 1985). Genome structure and organization have been best characterized for the diploid species H. annuus. The genome of this species is highly repetitive, with LTR retrotransposons and their derivatives comprising >70% of nuclear DNA (Staton ; Gill ). Recent insertional activity of these sequences has been documented in H. annuus (Buti ; Staton ) as have patterns of tissue-specific expression (Gill ). Recent and even larger-scale proliferation of LTR retrotransposons has been documented for three diploid annual Helianthus species derived via ancient hybridization events (Ungerer ; Staton ; Kawakami ), with sublineages that proliferated in these species remaining active transcriptionally and expressed at higher levels when compared to the parental species from which the hybrid taxa are derived (Kawakami ; Ungerer and Kawakami 2013). In the current study we explore the contribution of LTR retrotransposons to genome size variation among eight diploid Helianthus species representing all four taxonomic sections based on current classification (Schilling and Heiser 1981) and an outgroup species, Phoebanthus tenuifolius. These eight Helianthus species represent much of the existing variation in diploid genome size, ranging nearly fourfold in estimated nuclear DNA content (Sims and Price 1985). We combine short-read NGS data with sequence information from a panel of H. annuus (common sunflower) full-length LTR retrotransposons in a de novo graph-based clustering approach that enables meaningful comparisons of LTR retrotransposon sublineage identity and abundances across species. We demonstrate that nuclear genome size is significantly correlated with repetitive DNA content in these species and that the species under investigation generally exhibit similar abundances of different LTR retrotransposon sublineages, suggestive of shared ancestry. We also note signatures of amplification for a small number of LTR retrotransposon sublineages in species with the largest genomes, thus identifying a contributing mechanism of genome size expansion in these species. Lastly, we highlight how graph-based clustering approaches are preferable to read-mapping-based approaches in interspecific comparative analyses of TE abundance.

Materials and Methods

Plant materials and DNA sequencing

Seeds of species utilized in this study were obtained from the United States Department of Agriculture (USDA) National Plant Germplasm System (http://www.ars-grin.gov/npgs/) or collected from natural populations (Table 1). Seeds were germinated in the dark on moist filter paper in Petri dishes and 2- to 3-d-old seedlings transferred to 8-inch pots with a 2:1 mixture of Metro-mix 350: all-purpose sand. All plants were grown under a 16 hr:8 hr, light:dark cycle in the Kansas State University glasshouse facility. Watering was conducted daily or as needed and fertilization with a weak nutrient solution (N:P:K = 15:30:15) was applied weekly.
Table 1

Study species, genome size estimates, and associated genomic data

SpeciesAbbreviationLife CycleAccessionPaired-End Readsa2C (pg) (SE)Genome CoverageRepetitive Fraction (%) (SE)
H. praecoxPRAAnnualPI 43584710,314,1266.94 (0.10)0.5968.17 (0.18)
H. annuusANNAnnualPI 46860712,060,7437.36 (0.12)0.6768.97 (0.21)
H. cusickiiCUSPerennialPI 64995911,981,5779.32 (0.24)0.5174.58 (0.18)
H. divaricatusDIVPerennialPI 5032126,752,8409.41 (0.08)0.2969.55 (0.29)
H. anomalusANOAnnualPI 46864212,228,84911.82 (0.37)0.4175.26 (0.19)
H. heterophyllusHETPerennialPI 66473211,753,27811.82 (0.29)0.4071.42 (0.20)
H. angustifoliusANGPerennialANG-MCUb6,837,15112.91 (0.32)0.2173.38 (0.33)
H. agrestisAGRAnnualPI 46841616,909,58924.23 (0.84)0.2882.12 (0.15)
P. tenuifoliusPHOPerennialPHO-LAc10,971,46513.94 (0.71)0.3174.08 (0.16)

Post processing.

Collected in Anson County, NC (M. C. Ungerer).

Collected in Apalachicola National Forest (Loran Anderson).

Post processing. Collected in Anson County, NC (M. C. Ungerer). Collected in Apalachicola National Forest (Loran Anderson). Young, fully-expanded leaves from one individual per species were collected for DNA extraction and subsequent whole-genome shotgun (WGS) sequencing. All harvested tissue was flash-frozen in liquid nitrogen and stored at −80° until needed. Genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA). Three micrograms of total DNA per species were utilized for library preparation and WGS sequencing on an Illumina HiSeq2000 platform, generating 2 × 100 bp paired-end reads. Library preparation was performed following the Tru-Seq standard protocol (Illumina Inc., San Diego, CA) with a library insert size of 350 bp. Libraries were multiplex sequenced on a single lane. Library construction and sequencing were performed at the University of Missouri DNA Core Facility, Columbia, MO (http://biotech.missouri.edu/dnacore/). Sequence data were trimmed and filtered using Trimmomatic V0.30 (Bolger ) according to the following criteria: (1) adapters and barcodes removed, (2) reads <80 bases removed, (3) bases trimmed from read ends if quality <30, and (4) read ends trimmed while mean quality <25 in a 4 bp sliding window. Chloroplast reads were removed by mapping the filtered dataset to the H. annuus chloroplast genome (NC_007977.1) using BWA v0.7.6 (Li and Durbin 2009) with default parameters. Genomic coverage for each species was estimated using the equation Coverage = LN/G (Lander and Waterman 1988), where L is average read length, N is number of reads per species and G is genome length. Genome length for each species was calculated utilizing the haploid 1C value, derived from 2C data estimated by flow cytometry (Table 1), and the equation 1 pg = 978 Mb (Dolezel ).

Genome size determination

Nuclear DNA content (2C genome size) was estimated using a Guava PCA-96 microcapillary flow cytometry system (Guava Technologies, Hayward, CA). Five biological replicates were evaluated per species with a minimum of 5000 events per sample. Sample preparation for flow cytometry followed that of Kawakami . An external standard (H. petiolaris) was used along with the internal standard chicken erythrocyte nuclei (CEN; BioSure). Data were analyzed using CytoSoft V 2.5.4 (Guava Technologies, Haywood, CA).

Estimation of genomic repetitive fraction based on short-read sequence data

The genomic repetitive fraction of each species was determined separately using a graph-based clustering approach developed by Novak and implemented in RepeatExplorer (Novak ) on the Galaxy Server (http://www.repeatexplorer.org/). Briefly, ∼3 M single end (R1) 100 bp reads were randomly sampled from each species (Table 1, Supplemental Material, Table S1) and clustered based on an all-by-all comparison of sequence similarity (≥90%) and overlap (≥55%). Individual clusters were identified and counted toward the genomic repetitive fraction if they contained ≥0.01% of the starting number of sampled sequences (e.g., for 3 M sequences, minimum cluster size = 300 sequences). These parameter values represent default settings of RepeatExplorer. Datasets where fewer than 3M reads were sampled (Table S1) were automatically reduced by RepeatExplorer based on an initial analysis of a randomly sampled subset of reads and assessment of genome repeat structure as described in the RepeatExplorer manual. To assess potential variation in repetitive fraction estimates for a given dataset, five separate graph-based clustering analysis runs (each analysis run ≈3 M randomly sampled reads) were conducted, with means ± SE presented in the Results. To assess the strength of association between genome size and repetitive fraction, Pearson product–moment correlation coefficients and phylogenetically adjusted correlation coefficients were performed in Program R (v3.2.2, R Foundation for Statistical Computing, Vienna, Austria). The phylogenetically adjusted correlations were performed using phylogenetic independent contrasts with the ‘APE’ package in R (Paradis ), based on evolutionary relationships presented in Stephens . The phylogeny was truncated using the drop.tip function in APE to consist only of the species under investigation, with the exception of H. anomalus, which is of hybrid origin (Rieseberg 2006) and thus not included in the phylogenetically adjusted analysis.

Clustering with full-length LTR retrotransposons from the H. annuus genome

To aid interpretation of repetitive sequence cluster identity and size across species as they pertain to LTR retrotransposons, graph-based clustering analysis runs were performed with a diverse reference panel of full-length gypsy and copia LTR retrotransposons derived from the H. annuus genome (Buti ; Staton ) (File S1). Individual elements were extracted and characterized from published BAC sequences utilizing the LTRharvest (Ellinghaus ) LTRdigest (Steinbiss ) pipeline in genometools V 1.4.2. Of 110 full-length elements identified by these methods, 52 (40 gypsy + 12 copia) were identified as possessing an intact reverse transcriptase (RT) domain and thus retained for phylogenetic analysis based on their RT amino acid sequences (File S2). The majority of these full-length elements represent relatively ‘young’ copies, with insertion estimates within the last 2 million yr (Buti ; Staton ). Multiple sequence alignment was performed with ClustalW and phylogenetic analysis was conducted using neighbor-joining (NJ) and maximum parsimony (MP) methods in Molecular Evolutionary Genetics Analysis 4.0.2 (Tamura ). The reliability of tree topologies was estimated with bootstrap replication (1000 pseudoreplicates). Full-length elements subjected to phylogenetic analysis (n = 52) were subsequently converted to 100 bp kmers of sliding 85 bp overlap using a custom perl script to standardize sequence length with the Illumina-generated short-read dataset. By this method, 281 to 1073 kmers were generated per full-length element (35,488 kmers total). The ∼3 million Illumina reads per species were analyzed jointly with this collection of 100 bp kmers, which served as genomic ‘tracers’ enabling meaningful comparisons of the LTR retrotransposon content and abundance of different species’ genomes. The decision to use 85 bp overlap for adjacent 100 bp kmers for each full-length element was based on the fact that the resulting similarity (100% shared bases across overlap of 85%) exceeded considerably the RepeatExplorer parameters for sequence clustering (i.e., ≥90% shared bases across overlap of ≥55%).

RT-PCR assays

LTR retrotransposon transcriptional activity was evaluated via RT-PCR in both vegetative (leaf) and reproductive (bud) tissues from a single individual per species. For each sampled plant, leaf tissue representing the eight-leaf stage and the first bud were harvested and immediately flash-frozen in liquid nitrogen. Total RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA) and purified with an RNeasy Mini Kit (Qiagen, Valencia, CA). RNA was treated with RNase-Free DNase (Qiagen, Valencia, CA) to eliminate DNA contamination. Two sublineages of gypsy and a single sublineage of copia were assayed for transcriptional activity in both tissue types for all species utilizing sublineage-specific primers targeting the Integrase and RNASEH domains of gypsy and copia elements, respectively (Kawakami ; Ungerer and Kawakami 2013). RT-PCR assays were conducted using the ImProm-II Reverse Transcriptase system (Promega, Madison, WI; Table S2). RT-PCR reactions of the actin gene were used as positive controls for all samples. Negative control reactions were performed by withholding the reverse transcriptase enzyme. RT-PCR amplifications were conducted with an initial denaturing step of 94° for 2 min, followed by 5 cycles of 94° for 15 sec, 55° (+1.0°/cycle) for 15 sec, and 72° for 15 sec, followed by 30 cycles of 94° for 15 sec, 60° for 15 sec, and 72° for 15 sec, with a final incubation step of 72° for 5 min. Amplification products were size-separated via electrophoresis in 2% agarose gels and stained with ethidium bromide for visualization.

Data availability

Raw sequence reads have been submitted to the NCBI Short Read Archive [SRP074507].

Results

Genome size and repetitive sequence content

Genome size estimates based on flow cytometry (Table 1) are largely consistent with earlier reports for overlapping Helianthus species (n = 7) obtained by Feulgen-staining (Sims and Price 1985), with the exception of H. divaricatus, which was estimated at 2C = 9.41 pg (±0.08) in the current study vs. 16.9 pg reported previously (Sims and Price 1985). Intraspecific ploidy variation in H. divaricatus may underlie this observation (E. Baack, personal communication), though it is generally thought to be rare in Helianthus (Kane ). Greater variability in 2C values exists among the sampled Helianthus annual species (range = 6.94–24.23 pg) vs. perennial species (range = 9.32–12.91 pg; Table 1). With the exception of H. agrestis, all Helianthus species display 2C values lower than observed for closely related P. tenuifolius (2C = 13.94pg ± 0.71), a diploid species and outgroup taxon for Helianthus (Schilling ; Timme ; Stephens ). The Illumina Hi-Seq platform generated ∼6.8–16.9 M paired-end reads (2 × 100 bp), post processing, for each of the eight Helianthus species and P. tenuifolius, yielding genome coverage estimates ranging from 0.21× to 0.67× (Table 1). Based on subsampling of ∼2.4–3 M single end reads per species, graph-based clustering yielded genomic repetitive fraction estimates between 68.17% and 82.12% (Table 1 and Figure 1) and these estimates are strongly correlated with estimates of genome size (phylogenetic independent contrast analysis: r = 0.9041, P = 0.0052; Figure 1).
Figure 1

Phylogenetic relationships (A) and correlation between genome size and genomic repetitive fraction (B) for species under investigation. Phylogenetic tree is based on relationships presented in Stephens and does not include H. anomalus, which is of hybrid origin (Rieseberg 2006). Genome size and genome repetitive fraction are significantly correlated: phylogenetic independent contrast analysis: r = 0.9041, P = 0.0052; unmodified analysis: r = 0.9121, P = 0.0006. Species abbreviations in (B) are as in Table 1. Red, annual; blue, perennial; teal, perennial outgroup. Values (± SE) are provided in Table 1.

Phylogenetic relationships (A) and correlation between genome size and genomic repetitive fraction (B) for species under investigation. Phylogenetic tree is based on relationships presented in Stephens and does not include H. anomalus, which is of hybrid origin (Rieseberg 2006). Genome size and genome repetitive fraction are significantly correlated: phylogenetic independent contrast analysis: r = 0.9041, P = 0.0052; unmodified analysis: r = 0.9121, P = 0.0006. Species abbreviations in (B) are as in Table 1. Red, annual; blue, perennial; teal, perennial outgroup. Values (± SE) are provided in Table 1.

LTR retrotransposon contribution to genomic repetitive fraction

To evaluate the contribution of LTR retrotransposons to the repetitive fraction of these genomes, the short-read sequence data for each species were analyzed jointly with a library of 100 bp overlapping kmers derived from 40 full-length gypsy and 12 full-length copia elements identified previously from the H. annuus genome (see Materials and Methods). Phylogenetic analyses based on the reverse transcriptase (RT) domains of these 40 + 12 full-length elements indicate multiple well-supported gypsy and copia sublineages (Figure 2, A and B, respectively). Comparisons of these full-length element derived RT amino acid sequences across sublineages for both superfamilies revealed high sequence variability, with average genetic distances ranging from 0.108 to 0.667, and from 0.318 to 0.644 in pairwise comparisons of amino acid sequences from different sublineages within gypsy and copia, respectively (Table S3). These elements are highly diverse, and represent a majority of the gypsy and copia diversity reported previously in sunflower based on sequence survey approaches (Ungerer ; Kawakami ) and analyses of multiple sequenced H. annuus BACs (Buti ; Staton ). Nomenclature for sublineage designations follows that reported in Ungerer and Kawakami . Identified sublineages based on phylogenetic analyses presented herein also are largely congruent with family classification described in Staton (Figure 2).
Figure 2

Neighbor-joining trees depicting sublineages of gypsy (A) and copia (B) elements based on 129 and 239 amino acid residues of the reverse transcriptase (RT) domain, respectively. Numbers along branches indicate bootstrap support for Maximum Parsimony/Neighbor-joining analyses. Branch colors depict different LTR retrotransposon sublineages and correspond to designations used in Ungerer and Kawakami . Symbols at branch tips correspond to sunflower LTR retrotransposon families identified as highly abundant in Helianthus in Staton and Burke (2015).

Neighbor-joining trees depicting sublineages of gypsy (A) and copia (B) elements based on 129 and 239 amino acid residues of the reverse transcriptase (RT) domain, respectively. Numbers along branches indicate bootstrap support for Maximum Parsimony/Neighbor-joining analyses. Branch colors depict different LTR retrotransposon sublineages and correspond to designations used in Ungerer and Kawakami . Symbols at branch tips correspond to sunflower LTR retrotransposon families identified as highly abundant in Helianthus in Staton and Burke (2015). Clustering with these panels of modified full-length LTR retrotransposons allowed, for each species under investigation, assignment of short-read Illumina sequences to distinct gypsy and copia superfamilies and sublineages within these superfamilies (Figure 3, A and B). Across species, sequences derived from gypsy elements were 3.8- to 5.3-fold more abundant than sequences derived from copia elements and together sequences derived from these two superfamilies combine for between 38.3% and 49.2% of all sequences for the species assayed (Table S1). Sequences from specific gypsy sublineages consistently are more abundant within species’ genomes than others (e.g., sublineages A, B, C, X1, and X2 vs. sublineages E’, Y1, Y2, Z1, and Z2; Figure 3A); these more abundant sublineages form a monophyletic group in phylogenetic analysis of gypsy sequences (Figure 2A).
Figure 3

Genomic abundance of different sublineages of gypsy (A) and copia (B) elements. Shown are means (± SE) based on five graph-based clustering analysis runs for each dataset. Error bars for some histograms are too small to be seen at the resolution of this figure.

Genomic abundance of different sublineages of gypsy (A) and copia (B) elements. Shown are means (± SE) based on five graph-based clustering analysis runs for each dataset. Error bars for some histograms are too small to be seen at the resolution of this figure. For certain sublineages, elevated read densities were observed for some species, suggesting species-specific amplification events. For example, H. agrestis, the species with the largest estimated genome size and highest genomic repetitive fraction, displayed elevated read densities for two gypsy sublineages (A and C), indicating that proliferation of these sublineages may underlie genome expansion in this species. Similar elevated density of reads was observed for sublineage A in H. anomalus and sublineage X1 in P. tenuifolius. Analogous patterns were observed for sublineages of copia elements with respect to relative abundance, with sublineages 1, 3, and 7 contributing disproportionately more, and sublineages 2, 4, 5, and 6 disproportionately less, to the genome repetitive fraction across species. Unlike observations for gypsy sublineages, however, the more abundant copia lineages are not monophyletic but rather consist of three separate, well-supported lineages in the copia phylogeny (Figure 2B). Elevated density was observed in copia sublineage 1 for H. agrestis, again suggestive of a role of this sublineage in genome expansion. Elevated density, though to a lesser degree, also was observed in copia sublineage 3 for H. angustifolius.

Transcriptional activity of LTR retrotransposons in leaf and bud tissue

Transcriptional activity of gypsy sublineages A and C and copia sublineage 1 (see Figure 2, A and B) was assayed via RT-PCR in both leaf and bud tissues for all species under investigation. Detection of transcriptional activity was variable across species and tissue types for gypsy sublineage A (Figure 4A), with transcripts clearly detected in both leaf and bud tissue for all annual species but less detectable in perennial species; and with more detectable expression signal in bud vs. leaf tissue for perennials. In contrast, transcriptional activity of copia sublineage 1 was equally detectable across all species and in both tissue types (Figure 4B). Transcriptional activity of gypsy sublineage C was not detected in any tissue type in any species (data not shown). Positive control reactions targeting actin expression yielded no detectable expression differences across tissue types or species (Figure S1).
Figure 4

RT-PCR assays of gypsy sublineage A (A) and copia sublineage 1 (B) in leaf (L) and bud (B) tissue. Minus signs in parentheses indicate lanes with negative control reactions. Species abbreviations are as in Table 1. Red, annual; blue, perennial; teal, perennial outgroup.

RT-PCR assays of gypsy sublineage A (A) and copia sublineage 1 (B) in leaf (L) and bud (B) tissue. Minus signs in parentheses indicate lanes with negative control reactions. Species abbreviations are as in Table 1. Red, annual; blue, perennial; teal, perennial outgroup.

Discussion

Nuclear genome size across angiosperms varies dramatically, stretching nearly 2,400-fold between the smallest and largest documented flowering plant genomes (Leitch and Leitch 2013). Differential abundance and proliferation of TEs is now recognized as a significant contributor to genome size variation in plants, with LTR retrotransposons recognized as the most abundant and transpositionally dynamic (Hawkins ; Piegu ; El Baidouri and Panaud 2013). Evaluating TE diversity in organismal genomes has been revolutionized by NGS technologies that enable rapid and detailed analysis of TE composition both within and among species (Macas ; Swaminathan ; Wicker ; Piednoel ; Sveinsson ; Diez ; Agren ; Kelly ). Here we utilized Illumina short-read sequence data coupled with sequence information from a panel of full-length gypsy and copia LTR retrotransposons obtained from the H. annuus genome to explore the contribution of LTR retrotransposons to genome size variation among eight diploid Helianthus species representing all four taxonomic sections based on current classification schemes (Schilling and Heiser 1981) and an outgroup species, P. tenuifolius. The species under investigation consist of both annuals and perennials, vary in genome size by ∼fourfold, yet all are diploid with a haploid chromosome complement of n = 17. Given that other major classes of TEs such as DNA transposons and non-LTR retrotransposons (e.g., LINEs) represent a very small fraction of the sunflower genome (∼2% and 0.6%, respectively) (Staton ), these other TE categories were not included in the current analyses. In addition, cluster annotation in RepeatExplorer based on the RepeatMasker Viridplantae database indicates that other repeat types (i.e., low complexity repeats, simple repeats, and satellite DNA) generally are rare (<2% combined). This latter category of repeat types was thus also excluded from analysis.

Variation in genome size

With the exception of H. agrestis, all Helianthus species investigated in the current study possess genome size estimates lower than that for the outgroup species P. tenuifolius. It is currently unknown whether this pattern is attributable to a general history of genome size reduction of Helianthus lineages, genomic expansion in P. tenuifolius, a combination of the two forces, or an artifact of the species sampled. Genome size reduction (DNA loss) can result from processes such as illegitimate recombination and/or unequal intrastrand homologous recombination events at the site of LTRs or interior coding regions of LTR retrotransposons (Devos ; Vitte and Panaud 2003; Ma ; Hawkins ). Hallmarks of these events include the presence in the genome of truncated elements and solo LTRs. These hallmarks have not been thoroughly investigated in Helianthus species or comparatively in P. tenuifolius due to a lack of sufficient sequence data. Based on analyses of 21 BAC clone sequences of the common sunflower H. annuus, however, truncated elements and solo LTRs do not appear a common feature of the sunflower genome (Staton ). Evidence for independent genome expansion in P. tenuifolius following divergence from Helianthus lineages is suggested by elevated read density for at least one gypsy sublineage described in the current study (Figure 3A, see also Staton and Burke 2015). Despite these observations, broader trends across the Asteraceae suggest a directional increase in abundance of the more common gypsy LTR retrotransposons (and accordingly in genome size) from basal to more derived lineages, the latter of which include Helianthus and Phoebanthus species (Staton and Burke 2015). As such, Helianthus and Phoebanthus species’ genomes should be considered larger and with higher copy numbers of LTR retrotransposons compared with other members of Asteraceae, at least based on the limited sampling to date.

Clustering with panels of full-length LTR retrotransposons

A strong positive correlation was found between genome size and genome repetitive fraction, indicating an important role for repetitive DNA in underlying genome size variation in this group. Combining short-read data with sequence information from a panel of full-length LTR retrotransposons in a de novo graph-based clustering approach enabled meaningful comparisons of LTR retrotransposon presence and relative abundance across species. The majority of elements within this panel have estimated insertion times in the H. annuus genome within the last 2 million yr (Buti ; Staton ). As such, our analyses focus on LTR retrotransposons in Helianthus likely to have been active recently; more ancient elements potentially involved in older amplification events may be less well represented. Sequences derived from gypsy elements were observed to be 3.8- to 5.3-fold more common than sequences from copia elements for these species. These results are consistent with previous analyses of the H. annuus genome (Buti ; Staton ), and consistent with genomic composition analyses in other plant species where similar abundance biases have been observed (International Rice Genome Sequencing 2005; Ming ; Paterson ). Our results indicate variation in abundance for different sublineages of gypsy and copia elements within genomes, but general stability in read density within a sublineage across species. Stability in read density across species is expected if most LTR retrotransposon proliferation activity occurred in the common ancestor of these species, with sequence abundances remaining relatively unchanged following subsequent cladogenesis. Elements from the most abundant sublineages of gypsy (i.e., sublineages A, B, C, X1, and X2; Figure 3A) represent part of a larger, well-supported, monophyletic group (Figure 2A), and thus share a common evolutionary history. In contrast, copia sublineages with the highest read densities (i.e., sublineages 1, 3, and 7) represent more distantly related and nonmonophyletic elements. While general stability in read density within most sublineages was observed across species, exceptions to this pattern were found, most notably for three gypsy sublineages (sublineages A, C, and X1) and a single copia sublineage (sublineage 1). These exceptions were marked by higher read densities for species with larger genome size estimates, and were most apparent for H. agrestis and P. tenuifolius. These patterns likely reflect recent and lineage-specific amplifications that have contributed to genome size expansion in these species. Similar patterns have been observed in other plant genera whereby differential abundance of a small number of LTR retrotransposon lineages underlies large genome size differences among species (Hawkins ; Piegu ; El Baidouri and Panaud 2013). Interestingly, representative elements for two of the abundant gypsy sublineages (i.e., RLG-iketas and RLG-wimu) and copia sublineage 1 (RLC-amov, RLC-jiliwu) (see Figure 2, A and B, respectively) also display signatures of recent insertional activity in the common sunflower (H. annuus) genome, indicating potential widespread activity throughout the genus. Developing appropriate methods for meaningful comparisons of TE content and abundance across species genomes has become increasingly necessary as NGS technologies continue to improve and costs continue to decline. The graph-based clustering approach of short-read Illumina data combined with sequence information from a TE reference panel proved effective for interspecific analyses of sublineage identification and sequence densities in Helianthus, and provides a useful method when TE reference panels are available. A potential complicating factor of this method is that, due to sequence divergence among genomes, fewer sequence reads and/or sublineages might be identified in interspecific comparisons as genetic distance increases from the TE reference panel. To explore this possibility, we tested whether the density of species-specific Illumina reads clustering with gypsy and copia tracer sequences decreased with increasing genetic distance from the H. annuus-derived TE reference panel. We failed to find such a negative correlation (Figure S2A). Interestingly, however, a negative correlation was observed when the LTR retrotransposon panel was used as a reference in a read-mapping based approach (Figure S2B). This negative correlation persisted when mapping stringency was relaxed and greater numbers of mismatches allowed. Interspecific read-mapping to quantify TE abundances has been problematic in other species groups as well (Sveinsson ), and generally should be avoided.

Transcriptional activity of gypsy and copia

Transcriptional activity of both gypsy and copia elements has been documented previously in both cultivated (Vukich ; Gill ) and wild (Kawakami ; Kawakami ; Ungerer and Kawakami 2013; Renaut ) sunflowers. In the current study we confirmed expression of these elements in two species (H. annuus and H. anomalus) and demonstrate that transcriptional activity occurs broadly across the genus. Transcriptionally active elements documented in the current study represent the same variants associated with genome expansion events documented in three sunflower homoploid hybrid species (Ungerer ; Kawakami ). Transcriptional activity of gypsy sequences was readily detectable in both leaf and bud tissue for all annual Helianthus species, less detectable in bud tissue of perennial Helianthus species, and undetectable in leaf tissue of perennial Helianthus species. Although the primers used to assay for transcriptional activity were developed from H. annuus (an annual species), differential detection for annual vs. perennial species is unlikely attributable to sequence divergence with increasing phylogenetic distance from H. annuus given that H. agrestis is an independently evolved annual species and more distant genetically from H. annuus than the remaining Helianthus species under investigation (Figure S2). It is interesting to note that more detectable transcriptional activity in annual species is consistent with a higher density of reads derived from this same sublineage based on clustering analyses of genomic short-read data (Figure 3A), demonstrating a potential link between transcriptional activity and genomic abundance level of element copy number in this group of plants. Quantitative PCR experiments have confirmed such a relationship comparing annual sunflower taxa H. annuus and H. petiolaris with their hybrid derivative species H. anomalus, H. deserticola, and H. paradoxus, where higher expression was observed in species with higher copy number abundances (Ungerer and Kawakami 2013; Renaut et al. 2014, but see Gill ). Transcriptional activity of this gypsy sublineage was not detected in either leaf or bud tissue of P. tenuifolius, indicating that expression may be restricted to within Helianthus. In contrast to results for gypsy transcriptional activity, expression of copia was equally detectable among Helianthus annual and perennial species, across tissue types, and in P. tenuifolius. Read density of genomic short-read data for this same sublineage generally are comparable across annual and perennial Helianthus species with the exception of H. agrestis, for which read density is higher. More quantitative assays of transcriptional activity of both gypsy and copia elements may yield additional insights into expression dynamics of these elements across the sunflower genus. Transcriptional activity of additional sublineages of gypsy and copia have been documented previously in H. annuus (Gill ) via RNA-seq and shown to exhibit tissue-specific expression.

Genome expansion in H. agrestis

A notable finding of the current study is genomic amplification of LTR retrotransposon sublineages in the genome of H. agrestis. H. agrestis has a restricted geographical distribution in the southeastern United States, with populations found in central and southern Florida and in a single county in southern Georgia (Heiser ). As noted above, this species is an annual, but distantly related from most other Helianthus annuals that form a monophyletic group and thus has independently evolved this life history form (Stephens ). H. agrestis is atypical in being one of only two Helianthus species that lack a self-incompatibility system (Heiser ). Genome size estimates of H. agrestis indicate a nuclear genome ∼1.9–3.5× larger than any other Helianthus species under investigation in the current study and ∼1.7× larger than that for the outgroup species P. tenuifolius. Genome expansion in H. agrestis is associated with amplification of a small number of LTR retrotransposon sublineages, represented by two different gypsy sublineages and a single copia sublineage. Sequences from these three sublineages represent ∼28% of the H. agrestis genome based on our estimation procedures (Table S1). This observation is consistent with previous findings demonstrating that large interspecific variation in genome size can be attributable to a small number of LTR retrotransposon sublineages (Hawkins ; Piegu ; Vitte and Bennetzen 2006; El Baidouri and Panaud 2013) but contrasts with results observed for species of plants harboring some of the largest genomes (e.g., Fritillaria) where genome composition appears to consist of highly diverse, but relatively low abundance repeat types (Kelly ). As noted above, two of the three most abundant sublineages in the H. agrestis genome (gypsy sublineage A and copia sublineage 1) have contributed to major genome expansion events in one or more diploid hybrid Helianthus species (Ungerer ; Kawakami ), and these sublineages remain active transcriptionally across the genus. It is thus noteworthy that the same LTR retrotransposon sublineages have experienced large-scale amplification events and promoted genome expansion independently in different regions of the Helianthus phylogeny. The forces governing activation (and repression) of these sublineages in different Helianthus species’ genomes is the focus of ongoing work.
  55 in total

1.  High-resolution phylogeny for Helianthus (Asteraceae) using the 18S-26S ribosomal DNA external transcribed spacer.

Authors:  Ruth E Timme; Beryl B Simpson; C Randal Linder
Journal:  Am J Bot       Date:  2007-11       Impact factor: 3.844

2.  Nuclear DNA content and genome size of trout and human.

Authors:  J Dolezel; J Bartos; H Voglmayr; J Greilhuber
Journal:  Cytometry A       Date:  2003-02       Impact factor: 4.355

3.  Genome expansion in three hybrid sunflower species is associated with retrotransposon proliferation.

Authors:  Mark C Ungerer; Suzanne C Strakosh; Ying Zhen
Journal:  Curr Biol       Date:  2006-10-24       Impact factor: 10.834

Review 4.  LTR retrotransposons and flowering plant genome size: emergence of the increase/decrease model.

Authors:  C Vitte; O Panaud
Journal:  Cytogenet Genome Res       Date:  2005       Impact factor: 1.636

Review 5.  Transposable elements and microevolutionary changes in natural populations.

Authors:  Georgi Bonchev; Christian Parisod
Journal:  Mol Ecol Resour       Date:  2013-06-25       Impact factor: 7.090

6.  RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.

Authors:  Petr Novák; Pavel Neumann; Jiří Pech; Jaroslav Steinhaisl; Jiří Macas
Journal:  Bioinformatics       Date:  2013-02-01       Impact factor: 6.937

7.  The genomic organization of Ty3/gypsy-like retrotransposons in Helianthus (Asteraceae) homoploid hybrid species.

Authors:  S Evan Staton; Mark C Ungerer; Richard C Moore
Journal:  Am J Bot       Date:  2009-08-19       Impact factor: 3.844

8.  Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data.

Authors:  Petr Novák; Pavel Neumann; Jirí Macas
Journal:  BMC Bioinformatics       Date:  2010-07-15       Impact factor: 3.169

9.  Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis.

Authors:  Katrien M Devos; James K M Brown; Jeffrey L Bennetzen
Journal:  Genome Res       Date:  2002-07       Impact factor: 9.043

10.  Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula.

Authors:  Jirí Macas; Pavel Neumann; Alice Navrátilová
Journal:  BMC Genomics       Date:  2007-11-21       Impact factor: 3.969

View more
  8 in total

1.  A D-genome-originated Ty1/Copia-type retrotransposon family expanded significantly in tetraploid cottons.

Authors:  Qian Li; Yue Zhang; Zhengsheng Zhang; Xianbi Li; Dan Yao; Yi Wang; Xufen Ouyang; Yaohua Li; Wu Song; Yuehua Xiao
Journal:  Mol Genet Genomics       Date:  2017-08-28       Impact factor: 3.291

2.  Intraspecific and intraorganismal copy number dynamics of retrotransposons and tandem repeat in Aegilops speltoides Tausch (Poaceae, Triticeae).

Authors:  Imad Shams; Olga Raskina
Journal:  Protoplasma       Date:  2018-01-27       Impact factor: 3.356

3.  Comparative Analysis of Transposable Elements and the Identification of Candidate Centromeric Elements in the Prunus Subgenus Cerasus and Its Relatives.

Authors:  Lei Wang; Yan Wang; Jing Zhang; Yan Feng; Qing Chen; Zhen-Shan Liu; Cong-Li Liu; Wen He; Hao Wang; Shao-Feng Yang; Yong Zhang; Ya Luo; Hao-Ru Tang; Xiao-Rong Wang
Journal:  Genes (Basel)       Date:  2022-04-02       Impact factor: 4.141

4.  RepeatProfiler: A pipeline for visualization and comparative analysis of repetitive DNA profiles.

Authors:  Sherif Negm; Anya Greenberg; Amanda M Larracuente; John S Sproul
Journal:  Mol Ecol Resour       Date:  2021-01-04       Impact factor: 7.090

5.  Genomic abundance and transcriptional activity of diverse gypsy and copia long terminal repeat retrotransposons in three wild sunflower species.

Authors:  Fan Qiu; Mark C Ungerer
Journal:  BMC Plant Biol       Date:  2018-01-05       Impact factor: 4.215

6.  Copy Number Variation of Transposable Elements in Thinopyrum intermedium and Its Diploid Relative Species.

Authors:  Mikhail G Divashuk; Gennady I Karlov; Pavel Yu Kroupin
Journal:  Plants (Basel)       Date:  2019-12-21

7.  The relationship between transposable elements and ecological niches in the Greater Cape Floristic Region: A study on the genus Pteronia (Asteraceae).

Authors:  Zuzana Chumová; Alexander Belyayev; Terezie Mandáková; Vojtěch Zeisek; Eva Hodková; Kristýna Šemberová; Douglas Euston-Brown; Pavel Trávníček
Journal:  Front Plant Sci       Date:  2022-09-29       Impact factor: 6.627

8.  Comparative Genomics of an Unusual Biogeographic Disjunction in the Cotton Tribe (Gossypieae) Yields Insights into Genome Downsizing.

Authors:  Corrinne E Grover; Mark A Arick; Justin L Conover; Adam Thrash; Guanjing Hu; William S Sanders; Chuan-Yu Hsu; Rubab Zahra Naqvi; Muhammad Farooq; Xiaochong Li; Lei Gong; Joann Mudge; Thiruvarangan Ramaraj; Joshua A Udall; Daniel G Peterson; Jonathan F Wendel
Journal:  Genome Biol Evol       Date:  2017-12-01       Impact factor: 3.416

  8 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.