Abstract
Understanding the causes and consequences of transposable element (TE) activity in the genomic era requires sophisticated bioinformatics approaches to accurately identify individual insertion sites. Next-generation sequencing technology now makes it possible to rapidly identify new TE insertions using resequencing data, opening up new possibilities to study the nature of TE-induced mutation and the target site preferences of different TE families. While the identification of new TE insertion sites is seemingly a simple task, the mechanisms of transposition present unique challenges for the annotation of de novo transposable element insertions mapped to a reference genome. Here I discuss these challenges and propose a framework for the annotation of de novo TE insertions that accommodates known mechanisms of TE insertion and established coordinate systems for genome annotation.
Year: 2012 PMID: 22754753 PMCID: PMC3383450 DOI: 10.4161/mge.19479
Source DB: PubMed Journal: Mob Genet Elements ISSN: 2159-2543

Figure 1. Genome coordinate systems and the annotation of TE insertions. The location of an arbitrary genomic feature encoded by the sequence GGGCCC is represented differently in base and interbase coordinate systems (A). Since de novo TE insertions occur between bases in the reference genome, they are more naturally represented by interbase coordinate systems. On the widely-used base coordinate system, mapping a de novo TE insertion requires the invocation of arbitrary rules (either before or after the insertion site) (B). These arbitrary rules can lead to ambiguity in the mapping and interpretation of de novo TE insertions.
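
The difference between the two coordinate systems in Figure 1 can be illustrated with a minimal Python sketch. The reference string, the function names, and the "before"/"after" rule labels are assumptions made for illustration and are not taken from the paper. The sketch assumes 1-based, inclusive base coordinates and 0-based interbase coordinates, where interbase position p lies between base p and base p + 1.

```python
def base_to_interbase(start, end):
    """Convert a 1-based, inclusive base interval to an interbase interval."""
    return start - 1, end


def interbase_to_base(start, end):
    """Convert an interbase interval to a 1-based, inclusive base interval.

    A zero-length interbase interval (start == end) has no base equivalent.
    """
    if start == end:
        raise ValueError("zero-length interval has no base-coordinate form")
    return start + 1, end


def insertion_to_base(position, rule):
    """Map an interbase insertion point to a single base using an arbitrary rule.

    rule="before" reports the base preceding the insertion point,
    rule="after" reports the base following it.
    """
    if rule == "before":
        return position
    if rule == "after":
        return position + 1
    raise ValueError("rule must be 'before' or 'after'")


# Hypothetical reference containing the feature GGGCCC from the caption.
reference = "AAGGGCCCTT"
feature = "GGGCCC"

# Python slicing indices behave like interbase coordinates.
ib_start = reference.find(feature)
ib_end = ib_start + len(feature)
base_start, base_end = interbase_to_base(ib_start, ib_end)

# An insertion between the third G and the first C is a single interbase
# point, but maps to two different bases depending on the rule chosen.
insertion_point = 5
base_if_before = insertion_to_base(insertion_point, "before")
base_if_after = insertion_to_base(insertion_point, "after")
```

Here the feature spans interbase 2 to 8 and bases 3 to 8, while the single insertion point at interbase 5 is reported as base 5 or base 6 depending on the rule, which is the ambiguity the caption describes.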

Figure 2. TSDs create ambiguity in the annotation of de novo TE insertion sites. Unique DNA in the reference genome (e.g., positions 3–7 for a 5 bp TSD) is duplicated on insertion of a TE for both insertions on the positive strand (> > >) and negative strand (< < <). When NGS reads (solid gray arrows) that span the TE-flanking region junction are used to map de novo TE insertions on the positive strand, the placement of the insertion relative to the TSD differs for reads from the 5′ (after TSD) and 3′ (before TSD) ends of the TE. Differential annotation of TE insertion sites is also observed for negative strand insertions, but placement relative to the TSD is reversed relative to positive strand insertions. These TSD-induced effects can lead to ambiguity in the mapping and interpretation of de novo TE insertions.
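
The TSD effect in Figure 2 can be sketched in Python under stated assumptions. The reference and TE sequences, the function names, and the choice to report the TSD span as one interbase interval are hypothetical illustrations and are not presented as the paper's method. TSD coordinates are 1-based and inclusive, and the returned junction positions are interbase coordinates.

```python
def simulate_insertion(reference, te, tsd_start, tsd_end):
    """Build the post-insertion sequence for a TSD at bases tsd_start..tsd_end.

    The left flank runs through the last TSD base and the right flank
    restarts at the first TSD base, so the TSD appears on both sides of the TE.
    """
    return reference[:tsd_end] + te + reference[tsd_start - 1:]


def junction_positions(tsd_start, tsd_end, strand):
    """Return the interbase position implied by reads from each end of the TE."""
    after_tsd = tsd_end          # junction placed after the last TSD base
    before_tsd = tsd_start - 1   # junction placed before the first TSD base
    if strand == "+":
        return {"5prime": after_tsd, "3prime": before_tsd}
    if strand == "-":
        # Placement relative to the TSD is reversed on the negative strand.
        return {"5prime": before_tsd, "3prime": after_tsd}
    raise ValueError("strand must be '+' or '-'")


def tsd_interval(positions):
    """Combine both junction positions into one interbase interval."""
    start, end = sorted(positions.values())
    return start, end


# Hypothetical 10 bp reference with a 5 bp TSD at bases 3-7, as in the caption.
reference = "ACGTACGTAC"
te = "ttt"  # placeholder TE sequence, lowercase to stand out
derived = simulate_insertion(reference, te, 3, 7)

plus = junction_positions(3, 7, "+")
minus = junction_positions(3, 7, "-")
span = tsd_interval(plus)
```

In this sketch the two read sets for a positive-strand insertion point to interbase positions 7 and 2, five bases apart, even though they describe one insertion event. Reporting the interval between them is one possible way to keep both pieces of evidence without choosing either junction arbitrarily.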