Jérémy Berthelier, Nathalie Casse, Nicolas Daccord, Véronique Jamilloux, Bruno Saint-Jean, Grégory Carrier.
Abstract
BACKGROUND: Transposable elements (TEs) are mobile DNA sequences known as drivers of genome evolution. Their impacts have been widely studied in animals, plants and insects, but little is known about them in microalgae. In a previous study, we compared the genetic polymorphisms between strains of the haptophyte microalga Tisochrysis lutea and suggested the involvement of active autonomous TEs in their genome evolution.
Keywords: Algae; Annotation; Genome assembly; Haptophyte; Pipeline; Tisochrysis lutea; Tool; Transposable elements
Year: 2018 PMID: 29783941 PMCID: PMC5963040 DOI: 10.1186/s12864-018-4763-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Overview of the PiRATE pipeline. Step 0: a genome assembly and raw Illumina data are used as input. Step 1: putative TEs and repeated sequences are detected with 12 tools that combine four detection approaches. Sequences detected by approaches 1 and 4 are filtered by length (minimum 500 bp). Sequences detected by MITE-Hunter and SINE-Finder are saved directly as non-autonomous TEs. The other detected sequences are clustered with CD-HIT-est to reduce redundancy. Step 2: putative TE sequences are automatically classified with PASTEC as potentially autonomous TEs, non-autonomous TEs or uncategorized sequences. The potentially autonomous TEs are manually checked and grouped into TE families. Step 3: three libraries are manually constructed with a “Russian doll” strategy: 1) a “potentially autonomous TEs library”, 2) a “total TEs library” and 3) a “repeated elements library”. A double run of TEannot is carried out for each library to select sequences that align over their full length (full-length copy, FLC) on the genome assembly, yielding three independent annotations.
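The length filter mentioned in Step 1 can be illustrated with a minimal Python sketch. This is not PiRATE's own code: the function names `parse_fasta` and `filter_by_length` and the example records are hypothetical, and only the 500 bp threshold comes from the caption.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH_BP = 500  # minimum length stated in the Fig. 1 caption


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH_BP
) -> Dict[str, str]:
    """Keep only sequences that are at least min_length bp long."""
    return {name: seq for name, seq in records if len(seq) >= min_length}


# Hypothetical input: one 400 bp hit and one 600 bp hit.
example = [">short_hit", "ACGT" * 100, ">long_hit", "ACGT" * 150]
kept = filter_by_length(parse_fasta(example))
print(sorted(kept))
```

Only the 600 bp record passes the filter in this toy input. A real run would read the detection tools' FASTA output files instead of an in-memory list.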
Fig. 2. Evaluation of the detection step of PiRATE with genomic data of Arabidopsis thaliana. a) Percentage of TE families detected in A. thaliana. For each TE order (x-axis), the bars show the percentage of TE families detected at complete length (coverage score ≥ 70%, white bars) or at partial or complete length (coverage score ≥ 40%, black bars). The x-axis also indicates the number of TE families in each order; “n-a” means non-autonomous. b) Comparison of the percentage of TE families of A. thaliana detected by PiRATE (Step 1), RepARK, RepeatExplorer, dnaPipeTE, RepeatScout, RepeatMasker, LTRharvest and TEdenovo. For each tool, the bars show the percentage of TE families of A. thaliana detected at complete length (coverage score ≥ 70%, white bars) or at partial or complete length (coverage score ≥ 40%, black bars). The x-axis indicates the tools and the nature of the input data.
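The two thresholds in this caption can be applied as in the following sketch. The caption does not define the coverage score, so the sketch assumes it is the aligned length divided by the reference family length, expressed as a percentage. The function names and the example scores are hypothetical.

```python
from typing import Sequence, Tuple

COMPLETE_THRESHOLD = 70.0  # white bars: detected at complete length
PARTIAL_THRESHOLD = 40.0   # black bars: detected at partial or complete length


def coverage_score(aligned_bp: int, reference_bp: int) -> float:
    """Assumed definition: percentage of the reference family that is covered."""
    if reference_bp <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 * aligned_bp / reference_bp


def detection_rates(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (% of families >= 70%, % of families >= 40%)."""
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    complete = sum(s >= COMPLETE_THRESHOLD for s in scores)
    partial_or_complete = sum(s >= PARTIAL_THRESHOLD for s in scores)
    return 100.0 * complete / n, 100.0 * partial_or_complete / n


# Hypothetical best coverage score for each of four reference TE families.
scores = [95.0, 72.0, 55.0, 10.0]
white_bar, black_bar = detection_rates(scores)
print(white_bar, black_bar)
```

Because every family that reaches 70% also reaches 40%, the black bar is always at least as high as the white bar.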
Fig. 3. Comparison of the contribution of the four TE detection approaches of PiRATE to the detection of the TE families of Tisochrysis lutea, depending on the input data. For each TE detection approach, we calculated the number of TE families detected with the largest length and divided this number by the total number of detected TE families of T. lutea. The input dataset was either the draft genome assembly of T. lutea and raw Illumina data of T. lutea (white bars) or the new genome assembly of T. lutea and raw Illumina data of T. lutea (black bars). The similarity-based, structural-based and repetitiveness-based detection approaches use a genome assembly as input data. The last approach builds repeated elements from raw Illumina data.
Diversity and proportion of transposable element orders and classes in the genome assembly of Tisochrysis lutea. The abbreviations “a” and “n-a” indicate autonomous and non-autonomous transposable elements respectively
| Class | Autonomy | Order/Superfamily | Number of families (f) or detected sequences (s) | Number of potentially autonomous TEs | Proportion of the potentially autonomous TEs (%) | Proportion of total genome (%) |
|---|---|---|---|---|---|---|
| Class I | a | LTR/Copia | 6 f | 45 | 0.37 | 1.09 |
| | | LTR/Gypsy | 4 f | 242 | 2.56 | 4.65 |
| | | LINE/L1 | 14 f | 59 | 0.25 | 3.87 |
| | n-a | SINE | 14 s | | | 0.04 |
| | | LTR/LARD | 17 s | | | 0.76 |
| | | LTR/TRIM | 240 s | | | 5.48 |
| Total Class I | | | | | | 15.89 |
| Class II | a | TIR/hAT | 129 f | 145 | 0.41 | 2.12 |
| | | TIR/Mariner | 8 f | 41 | 0.11 | 0.19 |
| | | TIR/Harbinger | 7 f | 26 | 0.05 | 0.34 |
| | | TIR/PiggyBac | 7 f | 14 | 0.04 | 0.26 |
| | n-a | MITE | 188 s | | | 2.04 |
| Total Class II | | | | | | 4.95 |
| Total TEs | | | | 572 | 3.79 | 20.84 |
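
The totals in the table follow from its rows, as the short Python check below shows. The figures are copied from the table, and the variable names are illustrative only. `Decimal` is used so that the two-decimal percentages add up exactly.

```python
from decimal import Decimal

# Proportion of total genome (%) per order/superfamily, copied from the table.
class_i = {
    "LTR/Copia": "1.09", "LTR/Gypsy": "4.65", "LINE/L1": "3.87",
    "SINE": "0.04", "LTR/LARD": "0.76", "LTR/TRIM": "5.48",
}
class_ii = {
    "TIR/hAT": "2.12", "TIR/Mariner": "0.19", "TIR/Harbinger": "0.34",
    "TIR/PiggyBac": "0.26", "MITE": "2.04",
}
# Number of potentially autonomous TEs per autonomous superfamily.
autonomous_counts = {
    "LTR/Copia": 45, "LTR/Gypsy": 242, "LINE/L1": 59, "TIR/hAT": 145,
    "TIR/Mariner": 41, "TIR/Harbinger": 26, "TIR/PiggyBac": 14,
}


def total(percentages):
    """Sum percentage strings exactly using Decimal arithmetic."""
    return sum((Decimal(p) for p in percentages.values()), Decimal("0"))


total_class_i = total(class_i)
total_class_ii = total(class_ii)
total_tes = total_class_i + total_class_ii
total_autonomous = sum(autonomous_counts.values())

print(total_class_i, total_class_ii, total_tes, total_autonomous)
```

The computed sums match the "Total Class I", "Total Class II" and "Total TEs" rows.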
Fig. 4. Cartography of the 572 potentially autonomous TEs in the genome assembly of Tisochrysis lutea. The contig position is random. TEs belonging to the same superfamily are represented with the same colour. The 73 potentially autonomous TEs belonging to the 17 expressed TE families are highlighted with a green “T”. Elements belonging to the TIR/Mariner Luffy family and the TIR/hAT Ace family are marked by a grey circle. Transposase proteins were synthesized for both of these TE families.