Kaitlin M Carey (1), Gilia Patterson (1, 2), Travis J Wheeler (3).
(1) Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, MT, USA.
(2) Institute of Ecology and Evolution, University of Oregon, 272 Onyx Bridge, Eugene, OR, USA.
(3) Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, MT, USA. travis.wheeler@umontana.edu.
Abstract
BACKGROUND: Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee.

RESULTS: We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite the fact that they are expected to be annotated as belonging to the same subfamily. Point mutations and homologous recombination appear to be responsible for some of this discordant annotation (particularly in the young Alu family), but are unlikely to fully explain the annotation unreliability.

CONCLUSIONS: The surprisingly high level of disagreement in subfamily annotation of homologous sequences highlights a need for further research into definition of TE subfamilies, methods for representing subfamily annotation confidence of TE instances, and approaches to better utilizing such nuanced annotation data in downstream analysis.
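As a rough illustration of the reproducibility measure described in the abstract, the fraction of replicate pairs whose two copies receive different subfamily labels can be sketched as below. This is a minimal sketch and not the authors' pipeline: the function names `discordance_rate` and `discordance_by_family`, the tuple layout, and the toy labels are all assumptions made up for the example, and the numbers it produces are not results from the study.

```python
from collections import Counter


def discordance_rate(pairs):
    """Fraction of replicate pairs whose two copies carry different subfamily labels.

    `pairs` is a sequence of (label_copy_a, label_copy_b) tuples (hypothetical layout).
    """
    if not pairs:
        raise ValueError("no replicate pairs supplied")
    discordant = sum(1 for a, b in pairs if a != b)
    return discordant / len(pairs)


def discordance_by_family(records):
    """Per-family discordance rate.

    `records` is a sequence of (family, label_copy_a, label_copy_b) tuples
    (hypothetical layout).
    """
    totals = Counter()
    mismatches = Counter()
    for family, a, b in records:
        totals[family] += 1
        if a != b:
            mismatches[family] += 1
    return {family: mismatches[family] / totals[family] for family in totals}


# Toy, made-up annotations for four replicate pairs (illustrative only).
toy_records = [
    ("Alu", "AluSx", "AluSx"),
    ("Alu", "AluSx", "AluSz"),
    ("L1", "L1PA3", "L1PA3"),
    ("L1", "L1PA4", "L1PA4"),
]

overall = discordance_rate([(a, b) for _, a, b in toy_records])
per_family = discordance_by_family(toy_records)
print(overall, per_family)
```

Breaking the rate down by family is one way to see whether disagreement concentrates in a particular family, such as the young Alu family mentioned in the results.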
Keywords:
Interspersed repeats; Segmental duplications; Subfamilies; Transposable elements