Ilya Kirov, Elizaveta Kolganova, Maxim Dudnikov, Olga Yu Yurkevich, Alexandra V Amosova, Olga V Muravenko.
Abstract
High-copy tandemly organized repeats (TRs), or satellite DNA, are an important but still enigmatic component of eukaryotic genomes. TRs comprise arrays of multi-copy, highly similar repeat units, which makes their elucidation very challenging. Oxford Nanopore sequencing data provide a valuable source of information on TR organization at the single-molecule level. However, bioinformatics tools for de novo identification of TRs in raw Nanopore data have not been reported so far. We developed NanoTRF, a new Python pipeline for TR identification, characterization and consensus monomer sequence assembly. The pipeline requires only a raw Nanopore read file from low-depth (<1×) genome sequencing. It generates an informative HTML report and figures on TR genome abundance, monomer sequence and monomer length. In addition, NanoTRF annotates transposable element (TE) sequences within or near satDNA arrays, and this information can be used to elucidate how TRs and TEs co-evolve in the genome. Moreover, we validated by FISH that the NanoTRF report is useful for evaluating whether a TR has a clustered or dispersed chromosome organization. Our findings show that NanoTRF is a robust method for the de novo identification of satellite repeats in raw Nanopore data without prior read assembly. The obtained sequences can be used in many downstream analyses, including genome assembly assistance and gap estimation, chromosome mapping and cytogenetic marker development.
Keywords: Nanopore sequencing; genome; pipeline; satellite DNA; tandem repeats
Year: 2022 PMID: 36015406 PMCID: PMC9413040 DOI: 10.3390/plants11162103
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Figure 1. Schematic representation of the NanoTRF pipeline and the report. (A) Scheme showing the seven main steps in the NanoTRF pipeline: (1) TR identification in individual ONP reads by TideHunter; (2) all-to-all similarity search between TRs and clustering of highly similar TRs; (3) consensus monomer assembly by CAP3; (4) detection of subrepeats in the contigs; (5) ONP read annotation by the TE protein database; (6) calculation of ONP read coverage by TRs; and (7) final report generation. Monomers of distinct TRs are colored in orange and blue. (B) A screenshot of the output table from the HTML file generated by NanoTRF.
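Step 6 of the scheme, the coverage of a read by TRs, can be illustrated with a minimal sketch. This is not code from NanoTRF: the function name `tr_read_coverage` is hypothetical, and the input format (0-based, half-open `(start, end)` intervals for TR arrays on a single read) is an assumption made for illustration. The sketch merges overlapping intervals so that no base is counted twice, then returns the covered fraction of the read.

```python
from typing import Iterable, Tuple


def tr_read_coverage(read_length: int,
                     tr_intervals: Iterable[Tuple[int, int]]) -> float:
    """Return the fraction (0.0 to 1.0) of a read covered by TR arrays.

    Intervals are 0-based and half-open (start, end). This format is an
    assumption for illustration, not NanoTRF's internal representation.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")

    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(tr_intervals):
        # Clip each interval to the read boundaries.
        start, end = max(0, start), min(read_length, end)
        if start >= end:
            continue  # Nothing left after clipping.
        if current_end is None or start > current_end:
            # Disjoint from the running interval: flush it, start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / read_length
```

For example, a 1000-bp read with annotations `(0, 200)`, `(150, 400)` and `(900, 1100)` has 500 covered bases after merging and clipping, so the function returns 0.5.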
Figure 2. Comparison of TR identification by NanoTRF and TAREAN in the D. antarctica genome based on (A) genome abundance and (B) monomer length. Individual TRs are represented as black dots. The X-axis and Y-axis show the results from NanoTRF and TAREAN, respectively.
Figure 3. Comparison of the cluster layouts and read coverage data of three TRs with different FISH patterns on chromosomes of Deschampsia antarctica: Da322 (dispersed), Da97 (dispersed and clustered) and Da238 (clustered) probes. The bottom picture shows the results of FISH experiments with labeled individual TRs (Da322, Da97 and Da238; red fluorescence signals) and 45S rDNA (green fluorescence signals). Chromosomes are stained by DAPI (blue fluorescence signals).
Figure 4. NanoTRF results for the analysis of the FabTR2 and FabTR58 repeats. (A) Graph layouts and pie charts showing the percentage of reads with different TR occupancies for clusters clust1 (FabTR2) and clust59 (FabTR58). (B) Bar plot of the percentage of reads in the two clusters (clust1 and clust59) possessing similarity to Ogre TE domains.
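The grouping behind a pie chart of reads by TR occupancy, as in panel A, can be sketched as follows. This is an illustrative sketch only: the function name `occupancy_percentages` is hypothetical, and the bin edges (25%, 50%, 75%) are arbitrary assumptions, since the thresholds used by NanoTRF are not stated here. The input is one occupancy fraction per read, that is, the share of the read covered by the TR.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Sequence


def occupancy_percentages(fractions: Sequence[float],
                          edges: Sequence[float] = (0.25, 0.5, 0.75)
                          ) -> Dict[str, float]:
    """Return the percentage of reads falling into each occupancy bin.

    `fractions` holds one TR occupancy value (0.0 to 1.0) per read.
    `edges` are assumed bin boundaries, not values taken from NanoTRF.
    """
    if not fractions:
        raise ValueError("at least one read is required")

    bounds = [0.0, *edges, 1.0]
    labels = [f"{lo:.0%}-{hi:.0%}" for lo, hi in zip(bounds, bounds[1:])]

    counts = Counter()
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("occupancy must lie between 0 and 1")
        # bisect_right places a value equal to an edge into the upper bin.
        index = min(bisect_right(edges, fraction), len(labels) - 1)
        counts[labels[index]] += 1

    return {label: 100.0 * counts[label] / len(fractions) for label in labels}
```

Each returned value is the share of reads in that bin, and the values sum to 100, so the dictionary can be passed directly to a plotting routine for a pie chart.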
Sequences of the TRs of Deschampsia antarctica used for generation of the oligonucleotide FISH probes.
| Tandem Repeat/Genome Proportion, % | Length, bp | Sequence |
|---|---|---|
| Da 97/0.21 | 342 | CCCACGGGCTAGGGTTTCGCTGGAAAAGTACCGCCGGAGCGCCGGAATCCCACGAAAACTTGCGTGTGGCCCTAGCATGCATGCACAAGTGTGGTGGAAGGTTCCTAGATGCAATACGTAGCTCCCGGGTGCGATCCTGTTCGCGCGCATGCGATAACACTTAGAAAACTGCTGGACCTCTGGGAGGAATCTCCCGCTACGGGTCAAACGGAGGGCAACCGGTGGAATCATGGGCCCAACCTTGGTTTTCCATGTAGATATGCCCTAAACAAACCCAAACCAACAAAAAAGTACATTGGTCAACCCTCGTACGGAGAATGCTAGGGGGCTAGACCTGCGGGT |
| Da 238/0.042 | 379 | GCCTAACACCCTATCGTAGACACCCATGGGTTGGGGCGCAGTGCACGTAATACTATACGGATCCAGCGTTCCATCGAATTTTGAGTTTTTACTGCAGAAACTTCCATTTTTCCTAGACTTGTGAGCACTTTTTGAGGCCCTAAAAAGGCTTTTTTGGGGTCGAGATGGTCCGCACGCGTGCTGGGGTGTGTGCACGTATGTAAAATCATCCGGATTGCAAAAATTAGAAGTCCTTTTATC |
| Da 322/0.013 | 342 | GGTCTAGGGTTTCCCCGGATACAGACCACCGGAGCGTCGGAATCGCTGGAAAACTTGCATGTGTCCCTAACATATGTGTACAAGTGTGATGTAAGGTTGGTAGATGGCATATCTAGGTCCCAGGCGTGACGCTGTTCGCAGACATGGGCTAACACTTGGTAAAATCCTGGATCTGTATGTGGAAACTCCCGCTACGGGTCAACCGGAGCCTATTTTATGGTAAAGTAGGCCCAACCTCTGCTTTCCATGTACATATGTCCTAAACAAACCAAAACAAGGAAAAAACTCCATTGGTAAACCCTCGTACGGAGAAAGCTATAGGGGTAGATCTGCGGGGTCCCC |