BACKGROUND: In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically.
RESULTS: We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis.
CONCLUSIONS: SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external information has been added may prove useful in other settings.
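The retention rule described in the RESULTS can be illustrated with a minimal sketch. This is not the SearchDOGS implementation: the `Hit` and `Segment` types, the `filter_hits` function, and both E-value cutoffs are hypothetical and chosen only for illustration. The sketch assumes that each BLAST hit has already been reduced to a location and an E-value, and that the target-genome interval orthologous to the query's neighbourhood is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit, reduced to the fields this sketch needs (hypothetical type)."""
    chrom: str     # chromosome or scaffold of the hit in the target genome
    pos: int       # midpoint coordinate of the hit, in bp
    evalue: float  # BLAST E-value of the hit


@dataclass(frozen=True)
class Segment:
    """Target-genome interval assumed to be orthologous to the query's neighbourhood."""
    chrom: str
    start: int
    end: int


def in_segment(hit, seg):
    """True if the hit lies inside the orthologous segment."""
    return hit.chrom == seg.chrom and seg.start <= hit.pos <= seg.end


def filter_hits(hits, syntenic_segment, strict_evalue=1e-5, relaxed_evalue=10.0):
    """Keep strong hits anywhere; keep weak hits only where synteny supports them.

    Both cutoffs are illustrative assumptions, not values from the paper.
    Returns a list of (hit, reason) pairs in input order.
    """
    kept = []
    for hit in hits:
        if hit.evalue <= strict_evalue:
            # Strong enough to stand on sequence similarity alone.
            kept.append((hit, "sequence"))
        elif hit.evalue <= relaxed_evalue and in_segment(hit, syntenic_segment):
            # Borderline hit, retained because its location matches the query's.
            kept.append((hit, "synteny"))
    return kept


if __name__ == "__main__":
    segment = Segment("chrIV", 100_000, 120_000)
    hits = [
        Hit("chrIV", 110_000, 0.5),   # weak, but inside the syntenic segment
        Hit("chrII", 50_000, 0.5),    # equally weak, wrong location
        Hit("chrVII", 900, 1e-30),    # strong, kept regardless of location
    ]
    for hit, reason in filter_hits(hits, segment):
        print(hit.chrom, hit.pos, hit.evalue, reason)
```

In this toy input the two weak hits have identical E-values, so only the genomic context separates the one that is retained from the one that is discarded.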