Moaine El Baidouri, Kyung Do Kim, Brian Abernathy, Siwaret Arikit, Florian Maumus, Olivier Panaud, Blake C Meyers, Scott A Jackson.
Abstract
Transposable elements (TEs) are mobile genomic DNA sequences found in most organisms. They populate the genomes of many eukaryotic species so densely that they are often the major constituents. With the rapid growth in the number of plant genome sequencing projects over the past few decades, there is an urgent need for improved TE annotation as a prerequisite for genome-wide studies. Analogous to the use of RNA-seq for gene annotation, we propose a new method for de novo TE annotation that uses as a guide the 24 nt-siRNAs that are part of TE silencing pathways. We use this new approach, called TASR (for Transposon Annotation using Small RNAs), for de novo annotation of TEs in Arabidopsis, rice and soybean and demonstrate that this strategy can be successfully applied for de novo TE annotation in plants. Executable PERL is available for download from: http://tasr-pipeline.sourceforge.net/.
Year: 2015 PMID: 25813049 PMCID: PMC4513842 DOI: 10.1093/nar/gkv257
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the de novo TE annotation pipeline using 24 nt-siRNA mapping (TASR). (I) Examples of 24 nt-siRNA 'mapping profiles'. (A) Expected 24 nt-siRNA TE mapping profile when the siRNAs cover the full-length TE and correspond to the TE boundaries. (B) 24 nt-siRNAs spread beyond the TE boundaries (+). (C) Missing 24 nt-siRNAs. Cases B and C can concern the 5' part, the 3' part or both sides of a TE. (II) The TASR pipeline. (A) Mapping 24 nt-siRNAs to a genome. (B) Extraction of genomic intervals corresponding to the different instances shown in (I). (C) Clustering of TE paralogs based on sequence similarity. (D) Extending flanking regions. (E) Re-defining the boundaries for each TE paralog using an all-against-all comparison, including flanking regions. (F) Separate multi-FASTA files containing TE paralogs for each family.
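Steps (II-A) and (II-B) of the workflow can be illustrated with a minimal Python sketch. This is not the TASR implementation (which is distributed as PERL); it only shows one plausible way to turn mapped 24 nt-siRNA positions into candidate genomic intervals. The function name `merge_sirna_hits`, the parameters `max_gap` and `min_reads`, and their default values are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def merge_sirna_hits(hits, max_gap=150, min_reads=3):
    """Merge mapped siRNA positions into candidate intervals.

    hits: iterable of (chrom, start, end), 0-based half-open coordinates.
    max_gap: hypothetical maximum distance (bp) between neighbouring reads
             that are still placed in the same interval.
    min_reads: hypothetical minimum number of reads needed to keep an interval.
    Returns a list of (chrom, start, end, read_count).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))

    intervals = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        count = 0
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                # First read on this chromosome opens an interval.
                cur_start, cur_end, count = start, end, 1
            elif start - cur_end <= max_gap:
                # Read is close enough: extend the open interval.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap too large: close the interval and open a new one.
                if count >= min_reads:
                    intervals.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if cur_start is not None and count >= min_reads:
            intervals.append((chrom, cur_start, cur_end, count))
    return intervals


example_hits = [
    ("chr1", 100, 124), ("chr1", 110, 134), ("chr1", 200, 224),
    ("chr1", 1000, 1024),
    ("chr2", 50, 74), ("chr2", 60, 84), ("chr2", 70, 94),
]
candidates = merge_sirna_hits(example_hits, max_gap=100, min_reads=3)
```

In this toy input the isolated read at `chr1:1000-1024` is dropped because it does not reach `min_reads`. Such intervals would then be passed to the clustering and boundary-refinement steps (C to E), which this sketch does not attempt.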
Figure 2. Venn diagrams comparing the public TE annotations with the TASR TE annotations, and bar plots of TASR-specific TE classifications. (A) Green corresponds to TEs common to the two annotations, in Mbp. Blue corresponds to TASR-specific TEs that are absent from the public annotations. Yellow corresponds to TEs from the public annotations that are absent from the TASR TE annotations. (B) Classification of TASR-specific TEs according to Wicker's classification. Green colors correspond to class I transposons. Purple colors correspond to class II elements. Rust colors correspond to CDS sequences and yellow to unclassified TASR-specific TEs.
Figure 3. Comparison of the percentage of methylation and 24 nt-siRNA densities between TEs annotated using TASR and the public TE annotation in Arabidopsis, rice and soybean.
Figure 4. Comparison of TE annotations obtained by TASR with those of RepeatScout and REPET. (A) Venn diagrams showing the common and specific TE fractions in Mbp obtained using TASR and RepeatScout/REPET. Blue is TASR-specific TEs, green is common TEs and yellow is RepeatScout/REPET-specific TEs. (B) Relative frequency distribution of library length in kbp generated by RepeatScout, REPET and TASR. Note that the relative frequency on the Y-axis was limited to facilitate comparisons. (C) Comparison of the percentage of TEs targeted by 24 nt-siRNAs between TASR, REPET and RepeatScout. (D) Genome Browser screenshot showing a representative example of TE annotation obtained by TASR, REPET and RepeatScout.
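The Venn comparisons in Figures 2 and 4 report shared and annotation-specific TE content in Mbp. The sketch below shows one simple way such numbers could be computed from two sets of annotated intervals; it is an illustration, not the procedure used by the authors. The function names and the restriction to a single chromosome with 0-based half-open coordinates are assumptions.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def venn_bp(a, b):
    """Base pairs common to both annotations and specific to each.

    Uses inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    """
    total_a = covered(a)
    total_b = covered(b)
    union = covered(list(a) + list(b))
    common = total_a + total_b - union
    return {
        "common": common,
        "a_only": total_a - common,
        "b_only": total_b - common,
    }


# Toy intervals on one chromosome (hypothetical values).
annotation_a = [(0, 100), (50, 150)]
annotation_b = [(100, 200), (300, 400)]
result = venn_bp(annotation_a, annotation_b)
```

For a whole genome the same calculation would be repeated per chromosome and the totals summed, then divided by 1,000,000 to express them in Mbp.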