Ilya Kirov1,2, Pavel Merkulov1, Maxim Dudnikov1,2, Ekaterina Polkhovskaya1, Roman A Komakhin1, Zakhar Konstantinov1, Sofya Gvaramiya1, Aleksey Ermolaev3, Natalya Kudryavtseva3, Marina Gilyok1, Mikhail G Divashuk1,2, Gennady I Karlov1, Alexander Soloviev1.
Abstract
Long-read data are a powerful resource for discovering new active transposable elements (TEs). However, no ready-to-use tools were available to extract this information from low-coverage Oxford Nanopore Technologies (ONT) datasets. Here, we developed a novel pipeline, nanotei, that allows detection of TE-containing structural variants, including individual TE transpositions. We used this pipeline to identify TE insertions in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including ones from the well-characterized ONSEN retrotransposon family, that were hidden in genome assembly gaps. The results demonstrate that some TEs are inaccessible for analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant, which has elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transposition of non-autonomous TEs. Among them, one non-autonomous TE derived from (AT5TE33540) belongs to TR-GAG retrotransposons with a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome, albeit in the absence of most of the encoded proteins. In summary, nanotei is a useful tool to detect active TEs and their insertions in plant genomes using low-coverage data from Nanopore genome sequencing.
Keywords: GAG; ddm1; long read sequencing; structural variants; transposon insertions
Year: 2021 PMID: 34961152 PMCID: PMC8704663 DOI: 10.3390/plants10122681
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Figure 1. Schematic view of the nanotei pipeline. The main steps are enumerated. The red parts of the reads correspond to TE-containing sequences. Created with BioRender.com (accessed on 3 November 2021).
Figure 2. (A) Venn diagram showing the number of TEIs common between two Col-0 plants. (B) TEI on chromosome 4 and the schematic representation of the TE candidate AT1TE12295 confirmed by local assembly. (C) Tandemly organized TEIs with two ATCOPIA78 TEs, one of which was missed in TAIR10.
Figure 3. (A) Venn diagram showing the number of TEIs common between two ddm1 plants. (B) Number of TEIs in ddm1 generated by different TE subfamilies. (C) Read alignment and TEI site of AT5TE33540 and dot plot showing the sequence similarity with full-length TE AT3TE48480. The blue box in the dot plot shows the ORF encoding GAG and POL polyproteins.
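
The Venn diagrams in Figures 2A and 3A compare TE insertions (TEIs) between two plants. As a minimal sketch of how such a comparison might be computed, the Python example below counts insertions that are shared or unique to each plant. It is not taken from nanotei: the `TEI` record, the `has_match` and `venn_counts` helpers, the 50 bp position tolerance, and all coordinates and TE names are assumptions made for illustration only.

```python
from typing import NamedTuple, Sequence, Tuple


class TEI(NamedTuple):
    """Hypothetical record for one TE insertion call (not nanotei's output format)."""
    chrom: str
    pos: int
    te_id: str


def has_match(site: TEI, others: Sequence[TEI], tolerance: int) -> bool:
    """True if any call in `others` has the same chromosome and TE and lies within `tolerance` bp."""
    return any(
        site.chrom == o.chrom
        and site.te_id == o.te_id
        and abs(site.pos - o.pos) <= tolerance
        for o in others
    )


def venn_counts(a: Sequence[TEI], b: Sequence[TEI], tolerance: int = 50) -> Tuple[int, int, int]:
    """Return (only in a, shared, only in b); `shared` is counted from the side of `a`."""
    shared = sum(has_match(x, b, tolerance) for x in a)
    only_a = len(a) - shared
    only_b = sum(not has_match(y, a, tolerance) for y in b)
    return only_a, shared, only_b


# Made-up insertion calls for two plants (illustrative values, not data from the paper).
plant1 = [TEI("Chr1", 1000, "TE_A"), TEI("Chr2", 5000, "TE_B"), TEI("Chr4", 700, "TE_C")]
plant2 = [TEI("Chr1", 1020, "TE_A"), TEI("Chr3", 900, "TE_B")]

print(venn_counts(plant1, plant2))  # (2, 1, 1)
```

The tolerance is used because insertion breakpoints inferred from noisy long reads can differ by a few base pairs between samples. With tolerance-based matching, the shared count can differ slightly depending on which plant it is counted from, so a real analysis would need an explicit rule for merging nearby calls.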