
Conrad: gene prediction using conditional random fields.

David DeCaprio, Jade P Vinson, Matthew D Pearson, Philip Montgomery, Matthew Doherty, James E Galagan.

Abstract

We present Conrad, the first comparative gene predictor based on semi-Markov conditional random fields (SMCRFs). Unlike the best standalone gene predictors, which are based on generalized hidden Markov models (GHMMs) and trained by maximum likelihood, Conrad is discriminatively trained to maximize annotation accuracy. In addition, unlike the best annotation pipelines, which rely on heuristic and ad hoc decision rules to combine standalone gene predictors with additional information such as ESTs and protein homology, Conrad encodes all sources of information as features and treats all features equally in the training and inference algorithms. Conrad outperforms the best standalone gene predictors in cross-validation and whole chromosome testing on two fungi with vastly different gene structures. The performance improvement arises from the SMCRF's discriminative training methods and their ability to easily incorporate diverse types of information by encoding them as feature functions. On Cryptococcus neoformans, configuring Conrad to reproduce the predictions of a two-species phylo-GHMM closely matches the performance of Twinscan. Enabling discriminative training increases performance, and adding new feature functions further increases performance, achieving a level of accuracy that is unprecedented for this organism. Similar results are obtained on Aspergillus nidulans comparing Conrad versus Fgenesh. SMCRFs are a promising framework for gene prediction because of their highly modular nature, simplifying the process of designing and testing potential indicators of gene structure. Conrad's implementation of SMCRFs advances the state of the art in gene prediction in fungi and provides a robust platform for both current application and future research.
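The abstract's central idea, that every source of evidence is encoded as a feature function and that a semi-Markov model scores whole labeled segments rather than single positions, can be illustrated with a minimal sketch. This is not Conrad's implementation or API: the two labels, the feature functions `composition` and `segment_count`, and the weights are hypothetical and hand-picked for illustration. Only decoding (semi-Markov Viterbi) is shown; the discriminative training that would learn the weights is omitted.

```python
from typing import Callable, Dict, List, Tuple

LABELS = ("intergenic", "coding")   # hypothetical two-state label set
START = "START"                     # sentinel label before the first segment

# A feature function scores one labeled segment seq[start:end].
Feature = Callable[[str, int, int, str, str], float]


def composition(seq: str, start: int, end: int, label: str, prev: str) -> float:
    """Toy evidence: GC-rich segments favor 'coding', AT-rich favor 'intergenic'."""
    gc = sum(base in "GC" for base in seq[start:end])
    at = (end - start) - gc
    return float(gc - at) if label == "coding" else float(at - gc)


def segment_count(seq: str, start: int, end: int, label: str, prev: str) -> float:
    """Fires once per segment; a negative weight discourages fragmentation."""
    return 1.0


def segment_score(seq: str, start: int, end: int, label: str, prev: str,
                  features: Dict[str, Feature], weights: Dict[str, float]) -> float:
    """Weighted sum of all feature functions on one segment."""
    return sum(weights[name] * f(seq, start, end, label, prev)
               for name, f in features.items())


def decode(seq: str, features: Dict[str, Feature],
           weights: Dict[str, float]) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Semi-Markov Viterbi: best-scoring segmentation with alternating labels."""
    n = len(seq)
    if n == 0:
        return 0.0, []
    # best[i][y] = (score, (j, prev)): best segmentation of seq[:i] whose
    # final segment is seq[j:i] with label y, preceded by label prev.
    best: List[dict] = [dict() for _ in range(n + 1)]
    best[0][START] = (0.0, None)
    for i in range(1, n + 1):
        for y in LABELS:
            candidates = []
            for j in range(i):
                for prev, (prev_score, _) in best[j].items():
                    if prev == y:
                        continue  # adjacent segments must differ in label
                    s = prev_score + segment_score(seq, j, i, y, prev,
                                                   features, weights)
                    candidates.append((s, (j, prev)))
            best[i][y] = max(candidates, key=lambda c: c[0])
    # Trace back from the best final label.
    label = max(LABELS, key=lambda y: best[n][y][0])
    total = best[n][label][0]
    segments = []
    i = n
    while i > 0:
        j, prev = best[i][label][1]
        segments.append((j, i, label))
        i, label = j, prev
    segments.reverse()
    return total, segments


# Hand-picked weights (assumption); a trained model would learn these.
FEATURES = {"composition": composition, "segment_count": segment_count}
WEIGHTS = {"composition": 1.0, "segment_count": -1.5}

if __name__ == "__main__":
    print(decode("AAAAGGGGGGAAAA", FEATURES, WEIGHTS))
```

Adding a new evidence source in this sketch means adding one more entry to `FEATURES` and `WEIGHTS`; the decoder itself does not change, which is the modularity the abstract attributes to the SMCRF framework.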


Year: 2007; PMID: 17690204; PMCID: PMC1950907; DOI: 10.1101/gr.6558107

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043


