BACKGROUND: Whereas genome sequencing has given us high-resolution pictures of many different species of bacteria, microarrays provide a means of obtaining information on genome composition for many strains of a given species. Genome-composition analysis using microarrays, or 'genomotyping', can be used to categorize genes into 'present' and 'divergent' categories based on the level of hybridization signal. This typically involves selecting a signal value that is used as a cutoff to discriminate present (high signal) and divergent (low signal) genes. Current methodology uses empirical determination of cutoffs for classification into these categories, but this methodology is subject to several problems that can result in the misclassification of many genes. RESULTS: We describe a method that depends on the shape of the signal-ratio distribution and does not require empirical determination of a cutoff. Moreover, the cutoff is determined on an array-to-array basis, accounting for variation in strain composition and hybridization quality. The algorithm also provides an estimate of the probability that any given gene is present, which provides a measure of confidence in the categorical assignments. CONCLUSIONS: Many genes previously classified as present using static methods are in fact divergent on the basis of microarray signal; this is corrected by our algorithm. We have reassigned hundreds of genes from previous genomotyping studies of Helicobacter pylori and Campylobacter jejuni strains, and expect that the algorithm should be widely applicable to genomotyping data.
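To make the idea of a shape-based, per-array cutoff concrete, here is a minimal Python sketch. The abstract does not spell out the algorithm, so every choice here is an illustrative assumption rather than the authors' published procedure: the input is taken to be log signal ratios, the main peak is located with a simple histogram, a normal curve is fitted using only the right-hand half of that peak (assumed to be free of divergent genes), and the ratio of expected to observed counts in each bin is used as a rough probability of presence. The function name `presence_probabilities` and the `bin_width` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def presence_probabilities(log_ratios, bin_width=0.1):
    """Return (peak, spread, probs) for the log signal ratios of one array.

    Assumption: genes at or above the main peak are all 'present', so the
    right half of the peak describes the present-gene distribution.
    """
    lo = min(log_ratios)

    def bin_of(x):
        return math.floor((x - lo) / bin_width)

    # Histogram of the observed signal ratios.
    counts = {}
    for x in log_ratios:
        counts[bin_of(x)] = counts.get(bin_of(x), 0) + 1

    # Centre of the most populated bin is taken as the present-gene peak.
    peak_bin = max(counts, key=counts.get)
    peak = lo + (peak_bin + 0.5) * bin_width

    # Estimate the spread from the right half only, then mirror it.
    right = [x - peak for x in log_ratios if x >= peak]
    if not right:
        raise ValueError("no values at or above the peak")
    spread = math.sqrt(sum(d * d for d in right) / len(right))
    if spread == 0:
        raise ValueError("cannot fit a curve with zero spread")
    n_present = 2 * len(right)
    fitted = NormalDist(peak, spread)

    probs = []
    for x in log_ratios:
        if x >= peak:
            probs.append(1.0)
            continue
        b = bin_of(x)
        centre = lo + (b + 0.5) * bin_width
        # Count of present genes the fitted curve predicts for this bin.
        expected = n_present * fitted.pdf(centre) * bin_width
        probs.append(min(1.0, expected / counts[b]))
    return peak, spread, probs


if __name__ == "__main__":
    # Synthetic array: 900 'present' and 100 'divergent' genes (made-up values).
    present = [NormalDist(0.0, 0.3).inv_cdf((i + 0.5) / 900) for i in range(900)]
    divergent = [NormalDist(-2.5, 0.5).inv_cdf((i + 0.5) / 100) for i in range(100)]
    peak, spread, probs = presence_probabilities(present + divergent)
    calls = ["present" if p >= 0.5 else "divergent" for p in probs]
    print(f"peak={peak:.2f} spread={spread:.2f} divergent={calls.count('divergent')}")
```

Because the peak and spread are re-estimated from each array's own distribution, the effective cutoff moves with strain composition and hybridization quality instead of being fixed in advance. The 0.5 threshold used for the categorical calls in the demo is likewise an arbitrary choice for illustration.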