Kamil S Jaron 1, Jiří C Moravec 1, Natália Martínková 2.
1. Institute of Biostatistics and Analyses, Masaryk University and Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic.
2. Institute of Biostatistics and Analyses, Masaryk University and Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, Brno, Czech Republic.
Abstract
MOTIVATION: Genomic islands (GIs) are DNA fragments incorporated into a genome through horizontal gene transfer (also called lateral gene transfer), often with functions novel for a given organism. While methods for their detection are well researched in prokaryotes, the complexity of eukaryotic genomes makes direct use of these methods unreliable, and so labour-intensive phylogenetic searches are used instead.
RESULTS: We present a surrogate method that investigates the nucleotide base composition of the DNA sequence in a eukaryotic genome and identifies putative GIs. We calculate a genomic signature as a vector of tetranucleotide (4-mer) frequencies using a sliding window approach. Extending the neighbourhood of the sliding window, we establish a local kernel density estimate of each 4-mer frequency. Using a newly developed discrete interval accumulative score (DIAS), we score the number of 4-mer frequencies in the sliding window that fall outside the credibility interval of their local genomic density. To further improve the effectiveness of DIAS, we select informative 4-mers in a range of organisms using the tetranucleotide quality score developed herein. We show that the SigHunt method is computationally efficient and able to detect GIs in eukaryotic genomes that represent non-ameliorated integration. Thus, it is suited to scanning for change in organisms with different DNA composition.
AVAILABILITY AND IMPLEMENTATION: Source code and scripts are implemented in C and R, are platform-independent, and are freely available for download at http://www.iba.muni.cz/index-en.php?pg=research-data-analysis-tools-sighunt.
CONTACT: 376090@mail.muni.cz or martinkova@ivb.cz.
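The sliding-window idea described in the abstract can be illustrated with a minimal Python sketch. This is not the SigHunt implementation (which is in C and R) and it does not reproduce the published DIAS formula. As stated assumptions, the sketch replaces the local kernel density estimate with empirical quantiles of the neighbouring windows, and it simply counts the 4-mers whose frequency falls outside that interval. All function names, the `flank` parameter and the default quantiles are hypothetical choices made for illustration.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, with a lookup table for their positions.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetra_freqs(seq):
    """Return the 256-element vector of overlapping 4-mer frequencies in seq."""
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:  # skip 4-mers containing N or other symbols
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


def window_signatures(seq, window, step):
    """Compute one 4-mer frequency vector per sliding window."""
    starts = range(0, len(seq) - window + 1, step)
    return [tetra_freqs(seq[s:s + window]) for s in starts]


def quantile(sorted_vals, q):
    """Empirical quantile of a sorted, non-empty list (linear interpolation)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def deviation_counts(sigs, flank, lower_q=0.025, upper_q=0.975):
    """For each window, count the 4-mers whose frequency lies outside the
    [lower_q, upper_q] empirical interval of up to `flank` windows on each side.

    Assumption: the empirical interval is a simplified stand-in for the
    credibility interval of a local kernel density estimate.
    """
    scores = []
    for w, sig in enumerate(sigs):
        first = max(0, w - flank)
        last = min(len(sigs), w + flank + 1)
        neighbours = [sigs[j] for j in range(first, last) if j != w]
        if not neighbours:  # a single window has no local context to compare with
            scores.append(0)
            continue
        score = 0
        for k in range(len(KMERS)):
            vals = sorted(n[k] for n in neighbours)
            if sig[k] < quantile(vals, lower_q) or sig[k] > quantile(vals, upper_q):
                score += 1
        scores.append(score)
    return scores
```

In this sketch, calling `deviation_counts(window_signatures(seq, window, step), flank)` yields one integer per window; windows whose composition differs strongly from their neighbourhood receive high counts and would be the candidates to inspect as putative GIs. Restricting the inner loop to a subset of informative 4-mers would correspond loosely to the selection step mentioned in the abstract.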