Shujun Ou, Ning Jiang.
Abstract
Annotation of plant genomes is still a challenging task due to the abundance of repetitive sequences, especially long terminal repeat (LTR) retrotransposons. LTR_FINDER is a widely used program for the identification of LTR retrotransposons, but its application to large genomes is hindered by its single-threaded processes. Here we report an accessory program that allows parallel operation of LTR_FINDER, resulting in up to 8500X faster identification of LTR elements. It takes only 72 min to process the 14.5 Gb bread wheat (Triticum aestivum) genome, in comparison to the 1.16 years required by the original sequential version. LTR_FINDER_parallel is freely available at https://github.com/oushujun/LTR_FINDER_parallel.
Keywords: Genome annotation; LTR retrotransposon; LTR_FINDER; Transposable element
Year: 2019 PMID: 31857828 PMCID: PMC6909508 DOI: 10.1186/s13100-019-0193-0
Source DB: PubMed Journal: Mob DNA
Benchmarking the performance of LTR_FINDER_parallel
| Genome | Arabidopsis | Rice | Maize | Wheat |
|---|---|---|---|---|
| Version | TAIR10 | MSU7 | AGPv4 | CS1.0 |
| Size | 119.7 Mb | 374.5 Mb | 2134.4 Mb | 14,547.3 Mb |
| Original memory (1 thread [a]) | 0.37 Gbyte | 0.55 Gbyte | 5.00 Gbyte | 11.88 Gbyte [b] |
| Parallel memory (36 threads [a]) | 0.10 Gbyte | 0.12 Gbyte | 0.82 Gbyte | 17.67 Gbyte |
| Original time (1 thread) | 0.58 h | 2.1 h | 448.5 h | 10,169.3 h [b] |
| Parallel time (36 threads) | 6.4 min | 2.6 min | 10.3 min | 71.8 min |
| Speed up | 5.4 X | 48.5 X | 2613 X | 8498 X |
| # of LTR candidates (1 thread) | 226 | 2851 | 60,165 | 231,043 |
| # of LTR candidates (36 threads) | 226 | 2834 | 59,658 | 237,352 |
| % difference in candidate # | 0.00% | 0.60% | 0.84% | −2.73% |
[a] Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
[b] LTR_FINDER was run on each chromosome; the maximum memory and the total time are shown
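
The derived rows of the table (speed-up and % difference in candidate number) can be rechecked from the raw rows. The sketch below is only an illustration: the function names are hypothetical, and the sign convention for the % difference, (1 thread minus 36 threads) divided by 1 thread, is inferred from the table values rather than stated in the record.

```python
# Raw values copied from the benchmark table above.
# serial_h: original run time in hours (1 thread)
# parallel_min: parallel run time in minutes (36 threads)
# serial_n / parallel_n: number of LTR candidates reported by each run
benchmarks = {
    "Arabidopsis": {"serial_h": 0.58, "parallel_min": 6.4,
                    "serial_n": 226, "parallel_n": 226},
    "Rice": {"serial_h": 2.1, "parallel_min": 2.6,
             "serial_n": 2851, "parallel_n": 2834},
    "Maize": {"serial_h": 448.5, "parallel_min": 10.3,
              "serial_n": 60165, "parallel_n": 59658},
    "Wheat": {"serial_h": 10169.3, "parallel_min": 71.8,
              "serial_n": 231043, "parallel_n": 237352},
}


def speed_up(serial_h, parallel_min):
    """Ratio of serial to parallel run time, with both in minutes."""
    return serial_h * 60 / parallel_min


def pct_difference(serial_n, parallel_n):
    """Percent difference in candidate count relative to the serial run.

    Assumed convention: positive when the parallel run finds fewer
    candidates, negative when it finds more.
    """
    return (serial_n - parallel_n) / serial_n * 100


if __name__ == "__main__":
    for genome, row in benchmarks.items():
        s = speed_up(row["serial_h"], row["parallel_min"])
        d = pct_difference(row["serial_n"], row["parallel_n"])
        print(f"{genome}: speed-up {s:.1f} X, candidate difference {d:.2f}%")
```

After rounding, the computed values agree with the "Speed up" and "% difference in candidate #" rows. The same conversion also reproduces the abstract's figure for wheat: 10,169.3 h divided by 8760 h per year is about 1.16 years.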