GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.

Tomáš Brůna, Alexandre Lomsadze, Mark Borodovsky.

Abstract

We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced GeneMark-ES, an automated method for efficient ab initio gene finding whose parameters are trained in iterative unsupervised mode. Next, in GeneMark-ET, we proposed a method for integrating unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information: a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive protein mapping to the genome and extracts hints to splice sites and to translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve estimation of model parameters and, in the -EP+ mode, to adjust coordinates of predicted genes if they disagree with the most reliable hints. Tests of GeneMark-EP and -EP+ demonstrated improvements in gene prediction accuracy in comparison with GeneMark-ES, while GeneMark-EP+ showed higher accuracy than GeneMark-ET. The most pronounced improvements in gene prediction accuracy were observed in large eukaryotic genomes.
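
The idea of checking predictions against the most reliable hints can be illustrated with a minimal Python sketch. This is not the GeneMark-EP+ or ProtHint implementation: the `IntronHint` record, the `score` field, the `min_score` cutoff, and the rule "overlaps a trusted hint without matching it exactly" are all assumptions made for illustration only.

```python
from dataclasses import dataclass


# Hypothetical data model; not the actual ProtHint output format.
@dataclass(frozen=True)
class IntronHint:
    seqid: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str
    score: float  # assumed: higher means more reliable


def reliable_hints(hints, min_score):
    """Keep only hints whose score reaches the (assumed) reliability cutoff."""
    return [h for h in hints if h.score >= min_score]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def conflicting_predictions(predicted_introns, hints, min_score):
    """Return predicted introns that overlap a reliable hint
    on the same sequence and strand without matching it exactly.

    predicted_introns: iterable of (seqid, start, end, strand) tuples.
    """
    trusted = reliable_hints(hints, min_score)
    exact = {(h.seqid, h.start, h.end, h.strand) for h in trusted}
    conflicts = []
    for p in predicted_introns:
        if p in exact:
            continue  # prediction agrees with a trusted hint
        seqid, start, end, strand = p
        if any(
            h.seqid == seqid
            and h.strand == strand
            and overlaps(start, end, h.start, h.end)
            for h in trusted
        ):
            conflicts.append(p)
    return conflicts


if __name__ == "__main__":
    hints = [
        IntronHint("chr1", 100, 200, "+", 0.9),
        IntronHint("chr1", 500, 600, "+", 0.2),  # below the cutoff, ignored
    ]
    predicted = [
        ("chr1", 100, 200, "+"),  # matches a trusted hint
        ("chr1", 490, 610, "+"),  # overlaps only a low-score hint
        ("chr1", 150, 260, "+"),  # disagrees with a trusted hint
    ]
    print(conflicting_predictions(predicted, hints, min_score=0.5))
```

In this sketch, only predictions flagged as conflicting would be candidates for coordinate adjustment; low-scoring hints are ignored, reflecting the abstract's restriction to the most reliable hints.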

Year: 2020 PMID: 32440658 PMCID: PMC7222226 DOI: 10.1093/nargab/lqaa026

Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
