Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training.

Vardges Ter-Hovhannisyan, Alexandre Lomsadze, Yury O Chernoff, Mark Borodovsky.

Abstract

We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for species with the kinds of variations in splicing mechanisms seen in the fungal phyla Ascomycota, Basidiomycota, and Zygomycota. During self-training, the intron submodel is switched on in several steps until it reaches its full complexity. We demonstrate that the accuracy of the algorithm, at both the exon level and the whole-gene level, compares favorably with the accuracy of gene finders that employ supervised training. Application of the new method to known fungal genomes indicates substantial improvement over existing annotations. By eliminating the effort necessary to build comprehensive training sets, the new algorithm can streamline and accelerate the process of annotation in a large number of fungal genome sequencing projects.
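
The idea of iterative unsupervised training can be illustrated with a deliberately small sketch. The code below is not GeneMark-ES and makes no claim about its actual model or parameters: it assumes a toy two-state HMM (state 0 "background", state 1 "gene-like") with single-nucleotide emissions, a fixed probability `stay` of remaining in the same state, and Viterbi training as the re-estimation scheme. The function names `viterbi`, `reestimate`, and `self_train`, the heuristic starting parameters, and the synthetic test sequence are all hypothetical and chosen only for illustration.

```python
import math
import random

ALPHABET = "ACGT"


def viterbi(seq, emit, stay=0.95):
    """Return the most likely state path (0 or 1 per base) for a two-state HMM."""
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    log_emit = [{b: math.log(p) for b, p in e.items()} for e in emit]
    # score[s] = best log-probability of any path that ends in state s
    score = [math.log(0.5) + log_emit[s][seq[0]] for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [
                score[p] + (log_stay if p == s else log_switch) for p in (0, 1)
            ]
            best = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best)
            new_score.append(candidates[best] + log_emit[s][base])
        score = new_score
        back.append(pointers)
    # Trace back from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def reestimate(seq, path, pseudocount=1.0):
    """Re-estimate per-state nucleotide frequencies from the current labeling."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in (0, 1)]
    for base, state in zip(seq, path):
        counts[state][base] += 1.0
    return [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]


def self_train(seq, initial_emit, max_iter=20):
    """Alternate decoding and re-estimation until the labeling stops changing."""
    emit, path = initial_emit, None
    for _ in range(max_iter):
        new_path = viterbi(seq, emit)
        if new_path == path:
            break
        path = new_path
        emit = reestimate(seq, path)
    return emit, path


def sample(n, weights):
    """Draw a random DNA string with the given A, C, G, T weights."""
    return "".join(random.choices(ALPHABET, weights=weights, k=n))


# Synthetic "anonymous" sequence: AT-rich flanks around a GC-rich block.
random.seed(0)
seq = sample(300, [4, 1, 1, 4]) + sample(300, [1, 4, 4, 1]) + sample(300, [4, 1, 1, 4])

# Heuristic starting parameters (an assumption): state 1 is slightly GC-biased.
initial = [
    {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
]
emit, path = self_train(seq, initial)
```

In this sketch no labeled training set is supplied: the only inputs are the sequence itself and rough initial parameters, and each round of decoding sharpens the emission estimates used by the next round. A realistic gene finder would use a far richer state structure (exons, introns, splice sites, intergenic regions) than this two-state toy.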

Year:  2008        PMID: 18757608      PMCID: PMC2593577          DOI: 10.1101/gr.081612.108

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043

