Capturing whole-genome characteristics in short sequences using a naïve Bayesian classifier.

R Sandberg, G Winberg, C I Bränden, A Kaske, I Ernberg, J Cöster.

Abstract

Bacterial genomes have diverged during evolution, resulting in clear-cut differences in their nucleotide composition, such as their GC content. Analysis of complete bacterial genome sequences also reveals nonrandom sequence variation, manifest in the frequency profile of specific short oligonucleotides. These frequency profiles constitute highly specific genomic signatures. Based on these differences in oligonucleotide frequency between bacterial genomes, we investigated the possibility of predicting the genome of origin for a specific genomic sequence. To this end, we developed a naïve Bayesian classifier and systematically analyzed 28 eubacterial and archaeal genomes. We found that sequences as short as 400 bases could be correctly classified with an accuracy of 85%. We then applied the classifier to the identification of horizontal gene transfer events in whole-genome sequences and demonstrated the validity of our approach by correctly predicting the transfer of both the superoxide dismutase (sodC) and the bioC gene from Haemophilus influenzae to Neisseria meningitidis, correctly identifying both the donor and recipient species. We believe that this classification methodology could be a valuable tool in biodiversity studies.
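
The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the oligonucleotide length, the smoothing scheme, or the prior, so the parameter `k`, the add-one smoothing, the uniform prior, and the function names `kmer_counts`, `train`, and `classify` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with symbols other than A/C/G/T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def train(genomes, k):
    """Build {genome name: {k-mer: log probability}} from full genome sequences.

    Add-one smoothing is an assumption here; it keeps unseen k-mers
    from producing log(0).
    """
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    model = {}
    for name, seq in genomes.items():
        counts = kmer_counts(seq, k)
        total = sum(counts.values()) + len(all_kmers)
        model[name] = {m: math.log((counts[m] + 1) / total) for m in all_kmers}
    return model


def classify(model, fragment, k):
    """Return the best-scoring genome and all scores for a short fragment.

    The "naive" step: k-mers are treated as independent, so the fragment's
    log-likelihood is the sum of its k-mers' log probabilities. A uniform
    prior over genomes is assumed, so it drops out of the comparison.
    """
    counts = kmer_counts(fragment, k)
    scores = {
        name: sum(n * logp[m] for m, n in counts.items())
        for name, logp in model.items()
    }
    return max(scores, key=scores.get), scores
```

Under these assumptions, `train` would be called once on the reference genomes and `classify` on each query fragment. Scoring consecutive windows of one genome against all the others is one way such a classifier could flag candidate horizontally transferred regions, namely windows whose best-scoring genome is not the host.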

Year:  2001        PMID: 11483581      PMCID: PMC311094          DOI: 10.1101/gr.186401

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


