
mGene: accurate SVM-based gene finding with an application to nematode genomes.

Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann, Anja Bohlen, Nina Krüger, Sören Sonnenburg, Gunnar Rätsch.

Abstract

We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance at the nucleotide, exon, and transcript levels for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are the most accurate.
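The ranking above relies on the average of sensitivity and specificity computed separately at each evaluation level. The Python sketch below shows how such a score can be computed. It is not the nGASP or mGene evaluation code. It assumes the usual gene-finding convention, in which "specificity" means TP / (TP + FP), the quantity machine learning calls precision. The helper names `sn_sp` and `coding_positions` and the toy exon coordinates are hypothetical and chosen only for illustration. Exon-level matching here requires exact boundaries and ignores strand and reading frame.

```python
from typing import AbstractSet, Hashable, Iterable, Set, Tuple


def sn_sp(predicted: AbstractSet[Hashable],
          annotated: AbstractSet[Hashable]) -> Tuple[float, float, float]:
    """Sensitivity, specificity and their average for one evaluation level.

    Features can be nucleotide positions (nucleotide level) or exon
    boundaries such as (start, end) tuples (exon level). Specificity uses
    the gene-finding convention TP / (TP + FP), i.e. precision.
    """
    tp = len(predicted & annotated)                 # correctly predicted features
    sn = tp / len(annotated) if annotated else 0.0  # TP / (TP + FN)
    sp = tp / len(predicted) if predicted else 0.0  # TP / (TP + FP)
    return sn, sp, (sn + sp) / 2


def coding_positions(exons: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand 1-based, inclusive (start, end) exons into covered positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


# Hypothetical toy data, not taken from the paper.
annotated = {(100, 199), (300, 399)}  # two annotated exons, 200 nt in total
predicted = {(100, 199), (350, 399)}  # second exon starts too late, 150 nt

print(sn_sp(coding_positions(predicted), coding_positions(annotated)))  # (0.75, 1.0, 0.875)
print(sn_sp(predicted, annotated))                                      # (0.5, 0.5, 0.5)
```

The same function covers the transcript level if each feature is a transcript's complete exon chain (for example, a tuple of exon tuples), so a transcript only counts as correct when every exon boundary matches.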

Year: 2009; PMID: 19564452; PMCID: PMC2775605; DOI: 10.1101/gr.090597.108

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043
