GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.

Tomáš Brůna, Alexandre Lomsadze, Mark Borodovsky.

Abstract

We have taken several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient ab initio gene finding, GeneMark-ES, with parameters trained in iterative unsupervised mode. Next, in GeneMark-ET, we proposed a method for integrating unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive mapping of proteins to the genome and extracts hints to splice sites and to translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve estimation of model parameters as well as to adjust coordinates of predicted genes if they disagree with the most reliable hints (the -EP+ mode). Tests of GeneMark-EP and -EP+ demonstrated improvements in gene prediction accuracy in comparison with GeneMark-ES, while GeneMark-EP+ showed higher accuracy than GeneMark-ET. We observed that the most pronounced improvements in gene prediction accuracy occurred in large eukaryotic genomes.
© The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
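
The abstract's idea of adjusting predictions that "disagree with the most reliable hints" can be illustrated with a minimal sketch. This is not the ProtHint or GeneMark-EP+ implementation: the `IntronHint` record, its `score` field, the threshold, and both function names are hypothetical, chosen only to show how hints might be split by reliability and how predicted introns that conflict with the reliable set might be flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntronHint:
    """Hypothetical intron hint derived from a protein-to-genome alignment."""
    seqid: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str
    score: float    # assumed reliability score; higher means more reliable


def split_hints(hints, threshold):
    """Partition hints into (reliable, other) using an assumed score cutoff."""
    reliable = [h for h in hints if h.score >= threshold]
    other = [h for h in hints if h.score < threshold]
    return reliable, other


def conflicting_introns(predicted, reliable):
    """Return predicted introns that overlap a reliable hint on the same
    sequence and strand but match no reliable hint exactly.

    `predicted` is a list of (seqid, start, end, strand) tuples.
    """
    exact = {(h.seqid, h.start, h.end, h.strand) for h in reliable}
    conflicts = []
    for intron in predicted:
        if intron in exact:
            continue  # prediction agrees with a reliable hint
        seqid, start, end, strand = intron
        overlaps = any(
            h.seqid == seqid and h.strand == strand
            and start <= h.end and h.start <= end
            for h in reliable
        )
        if overlaps:
            conflicts.append(intron)
    return conflicts


hints = [
    IntronHint("chr1", 100, 200, "+", 0.9),
    IntronHint("chr1", 500, 600, "+", 0.2),
]
reliable, other = split_hints(hints, threshold=0.5)
predicted = [
    ("chr1", 100, 200, "+"),   # matches the reliable hint
    ("chr1", 150, 250, "+"),   # overlaps the reliable hint, different borders
    ("chr1", 500, 590, "+"),   # overlaps only a low-score hint
]
print(conflicting_introns(predicted, reliable))
```

In this sketch only the second predicted intron is flagged, because the third overlaps a hint that falls below the assumed reliability cutoff. How the actual tool scores hints and corrects gene coordinates is described in the paper itself, not here.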

Year:  2020        PMID: 32440658      PMCID: PMC7222226          DOI: 10.1093/nargab/lqaa026

Source DB:  PubMed          Journal:  NAR Genom Bioinform        ISSN: 2631-9268


