Abstract
As a major driving force of genome evolution, transposons have shed their original reputation as "junk" DNA since their important roles were revealed. The recently discovered Helitron transposons have been investigated in diverse eukaryotic genomes because of their remarkable gene-capture ability and other features that are crucial to our current understanding of genome dynamics. Helitrons are not canonical transposons: they do not end in inverted repeats or create target site duplications, which makes them difficult to identify. Previous methods rely mainly on sequence alignment of conserved Helitron termini or on manual curation, so the abundance of Helitrons in genomes is still underestimated. We developed an automated and generalized tool, HelitronScanner, that identified a plethora of divergent Helitrons in many plant genomes. The local combinational variable (LCV) approach, the key component of HelitronScanner, offers a more granular representation of conserved nucleotide combinations and is therefore more sensitive in finding divergent Helitrons. This commentary provides an in-depth view of the local combinational variable approach and its association with Helitron sequence patterns. Analysis of Helitron terminal sequences shows that the local combinational variable approach effectively represents nucleotide patterns that are imperceptible at the full-sequence level.
Keywords: Helitron; algorithm; bioinformatic analysis; local combinational variable; sequence pattern
Year: 2014 PMID: 26442169 PMCID: PMC4588551 DOI: 10.4161/21592543.2014.971635
Source DB: PubMed Journal: Mob Genet Elements ISSN: 2159-2543
Figure 1. Divergent Helitrons. HelitronScanner identified 107,367 putative Helitrons from 39 plant genomes. Their top 50 clusters of 30-bp 3’-end sequences include 39,554 Helitrons. Similarities of the clusters are shown by the inner dendrogram. Sequence logos of the clusters are shown in the outer ring.
Figure 2. Connections of less frequent LCVs to Helitrons. Helitrons in the training set are clustered based on their 3’-end sequences. The top 5 clusters, each including 20 selected Helitrons (blue circles), are connected with the 46 less frequent LCVs (red circles) they contain. These LCVs are shared by fewer than 30% of Helitrons in the training set. More frequent LCVs are omitted for clearer visualization.
Figure 3. LCV variation and their accumulated weight. LCV distribution in Helitron 5’ (A) and 3’ (B) ends is depicted by nucleotides colored in red. Color saturation is proportional to the number of LCVs that each nucleotide matches. The invariant 5’-TC and 3’-CTAG Helitron hallmarks are colored in blue. Histograms of the accumulated numbers of matched LCVs in Helitron 5’ (C) and 3’ (D) ends show variation in the conserved terminal regions.
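The per-nucleotide counts described in the Figure 3 caption can be illustrated with a minimal sketch. It is not HelitronScanner's implementation: the representation of an LCV as a mapping from position offsets to required nucleotides, and the names `lcv_matches`, `matched_lcv_counts`, and `has_hallmarks`, are assumptions made only for this illustration. The 5’-TC and 3’-CTAG hallmarks are taken from the caption.

```python
from typing import Dict, List

# Assumption for illustration: an LCV is modeled as {offset: nucleotide},
# i.e. a small combination of nucleotides required at local positions.
LCV = Dict[int, str]


def lcv_matches(seq: str, lcv: LCV) -> bool:
    """Return True if every (offset, nucleotide) pair of the LCV is found in seq."""
    return all(0 <= pos < len(seq) and seq[pos] == nt for pos, nt in lcv.items())


def matched_lcv_counts(seq: str, lcvs: List[LCV]) -> List[int]:
    """For each position in seq, count the matched LCVs that involve that position."""
    counts = [0] * len(seq)
    for lcv in lcvs:
        if lcv_matches(seq, lcv):
            for pos in lcv:
                counts[pos] += 1
    return counts


def has_hallmarks(five_prime: str, three_prime: str) -> bool:
    """Check the invariant 5'-TC start and 3'-CTAG end named in the caption."""
    return five_prime.upper().startswith("TC") and three_prime.upper().endswith("CTAG")


# Toy data, not taken from any real Helitron.
toy_seq = "TCTACTA"
toy_lcvs = [{0: "T", 1: "C"}, {0: "T", 3: "A"}, {2: "G", 4: "C"}]
toy_counts = matched_lcv_counts(toy_seq, toy_lcvs)
```

In this toy case the third LCV does not match, so only positions 0, 1, and 3 accumulate counts. Plotting such counts along a terminal sequence would give a histogram of the kind described for panels C and D.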