Georgios S Vernikos (1), Julian Parkhill. (1) The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. gsv@sanger.ac.uk
Abstract
MOTIVATION: There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and use different low and high order indices to capture compositional deviation from the genome backbone; the superiority of high order over low order indices has been shown elsewhere. However, even high order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Most current HGT prediction methods require pre-existing annotation, which may restrict their application to newly sequenced genomes.

RESULTS: We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second-order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a well-studied prokaryote in terms of HGT events, and we show that the IVOMs outperform state-of-the-art low and high order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events.

AVAILABILITY: The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter

CONTACT: gsv@sanger.ac.uk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
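The idea of interpolating motif frequencies across several orders can be illustrated with a minimal sketch. It is not the published IVOM implementation: the weighting rule (`trust`), the function names, the window sizes and the use of relative entropy as the score are all assumptions chosen for illustration, and the HMM boundary refinement is not included. A high order motif frequency is trusted in proportion to how often the motif was observed; otherwise the estimate falls back on the next lower order.

```python
from collections import Counter
from itertools import product
from math import log

BASES = "ACGT"

def kmer_counts(seq, k):
    """Count overlapping k-mers that contain only A, C, G or T."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(BASES):
            counts[kmer] += 1
    return counts

def interpolated_freqs(seq, max_k=3, trust=10.0):
    """Return a normalised distribution over all motifs of length max_k.

    Assumption: a motif seen c times gets weight c / (c + trust); the
    remaining weight goes to the interpolated estimate of its suffix.
    `trust` is a hypothetical constant, not a value from the paper.
    """
    seq = seq.upper()
    c1 = kmer_counts(seq, 1)
    total1 = sum(c1.values())
    if total1 == 0:
        raise ValueError("sequence contains no A, C, G or T")
    prev = {b: c1[b] / total1 for b in BASES}
    for k in range(2, max_k + 1):
        counts = kmer_counts(seq, k)
        total = sum(counts.values()) or 1
        cur = {}
        for tup in product(BASES, repeat=k):
            motif = "".join(tup)
            c = counts[motif]
            lam = c / (c + trust)
            # Divide the suffix estimate by 4 to put it on the scale of a k-mer.
            cur[motif] = lam * (c / total) + (1 - lam) * prev[motif[1:]] / 4
        prev = cur
    norm = sum(prev.values())
    return {m: v / norm for m, v in prev.items()}

def relative_entropy(p, q, eps=1e-9):
    """Approximate relative entropy of p with respect to q (eps avoids log 0)."""
    return sum(pv * log((pv + eps) / (q[m] + eps)) for m, pv in p.items() if pv > 0)

def scan(genome, window=5000, step=2500, max_k=3):
    """Score each sliding window against the whole-genome background."""
    background = interpolated_freqs(genome, max_k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        local = interpolated_freqs(genome[start:start + window], max_k)
        scores.append((start, relative_entropy(local, background)))
    return scores
```

Windows with the highest scores deviate most from the genome backbone under this toy model. In a real analysis the threshold for calling a window atypical, and the refinement of region boundaries, would need a principled treatment such as the change-point HMM described in the abstract.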