Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data.

Kuang-Lim Chan, Rozana Rosli, Tatiana V Tatarinova, Michael Hogan, Mohd Firdaus-Raih, Eng-Ti Leslie Low.

Abstract

BACKGROUND: Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines, developed using various computing techniques, are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion.
RESULTS: We present an automated gene prediction pipeline, Seqping, that uses self-training HMMs and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program, which combines the predictions from the three tools with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping generated better gene predictions than three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).
CONCLUSIONS: Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions from the other three approaches using their default or available HMMs.
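
The abstract reports accuracy values at the CDS, exon, and nucleotide levels but does not define how they were computed. As an illustration only, the sketch below shows one common way of scoring predicted exons against a reference annotation: exact-coordinate matching summarized as sensitivity, precision (often called "specificity" in the gene-finding literature), and their harmonic mean. The function name `exon_level_scores`, the tuple layout, and the choice of metric are assumptions made for this example and are not taken from the Seqping paper.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical exon representation: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


class ExonScores(NamedTuple):
    sensitivity: float  # fraction of reference exons that were predicted exactly
    precision: float    # fraction of predicted exons that match a reference exon
    f1: float           # harmonic mean of sensitivity and precision


def exon_level_scores(predicted: Set[Exon], reference: Set[Exon]) -> ExonScores:
    """Score predicted exons against reference exons by exact coordinate match.

    This is an assumed, generic metric, not necessarily the one behind the
    accuracy values quoted in the abstract.
    """
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    total = sensitivity + precision
    f1 = 2 * sensitivity * precision / total if total else 0.0
    return ExonScores(sensitivity, precision, f1)


if __name__ == "__main__":
    # Toy data: four reference exons, three predictions, two exact matches.
    reference = {
        ("chr1", 100, 200, "+"),
        ("chr1", 300, 400, "+"),
        ("chr1", 500, 600, "+"),
        ("chr1", 700, 800, "+"),
    }
    predicted = {
        ("chr1", 100, 200, "+"),
        ("chr1", 300, 400, "+"),
        ("chr1", 510, 600, "+"),  # start is off by 10 bp, so it does not count
    }
    print(exon_level_scores(predicted, reference))
```

Exact matching is strict: a prediction that misses a boundary by a single base counts as both a false positive and a false negative, which is one reason exon-level scores tend to be lower than nucleotide-level ones.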

Keywords: Gene model; Gene prediction; Species-specific HMM

Year: 2017    PMID: 28466793    PMCID: PMC5333190    DOI: 10.1186/s12859-016-1426-6

Source DB: PubMed    Journal: BMC Bioinformatics    ISSN: 1471-2105    Impact factor: 3.169


  8 in total

1.  Long-Read Annotation: Automated Eukaryotic Genome Annotation Based on Long-Read cDNA Sequencing.

Authors:  David E Cook; Jose Espejo Valle-Inclan; Alice Pajoro; Hanna Rovenich; Bart P H J Thomma; Luigi Faino
Journal:  Plant Physiol       Date:  2018-11-06       Impact factor: 8.340

2.  Differential expression of heat shock and floral regulatory genes in pseudocarpel initials of mantled female inflorescences from Elaeis guineensis Jacq.

Authors:  Siew-Eng Ooi; Norashikin Sarpan; Norazlin Abdul Aziz; Azimi Nuraziyan; Meilina Ong-Abdullah
Journal:  Plant Reprod       Date:  2018-11-22       Impact factor: 3.767

3.  Whole genome analysis of Aspergillus sojae SMF 134 supports its merits as a starter for soybean fermentation.

Authors:  Kang Uk Kim; Kyung Min Kim; Yong-Ho Choi; Byung-Serk Hurh; Inhyung Lee
Journal:  J Microbiol       Date:  2019-06-27       Impact factor: 3.422

4.  Genomic Analysis of the Insect-Killing Fungus Beauveria bassiana JEF-007 as a Biopesticide.

Authors:  Se Jin Lee; Mi Rong Lee; Sihyeon Kim; Jong Cheol Kim; So Eun Park; Dongwei Li; Tae Young Shin; Yu-Shin Nai; Jae Su Kim
Journal:  Sci Rep       Date:  2018-08-17       Impact factor: 4.379

5.  PalmXplore: oil palm gene database.

Authors:  Nik Shazana Nik Mohd Sanusi; Rozana Rosli; Mohd Amin Ab Halim; Kuang-Lim Chan; Jayanthi Nagappan; Norazah Azizi; Nadzirah Amiruddin; Tatiana V Tatarinova; Eng-Ti Leslie Low
Journal:  Database (Oxford)       Date:  2018-01-01       Impact factor: 3.451

6.  TransPrise: a novel machine learning approach for eukaryotic promoter prediction.

Authors:  Stepan Pachganov; Khalimat Murtazalieva; Aleksei Zarubin; Dmitry Sokolov; Duane R Chartier; Tatiana V Tatarinova
Journal:  PeerJ       Date:  2019-11-01       Impact factor: 2.984

Review 7.  Multi-Omics Approaches and Resources for Systems-Level Gene Function Prediction in the Plant Kingdom.

Authors:  Muhammad-Redha Abdullah-Zawawi; Nisha Govender; Sarahani Harun; Nor Azlan Nor Muhammad; Zamri Zainal; Zeti-Azura Mohamed-Hussein
Journal:  Plants (Basel)       Date:  2022-10-05

8.  Evidence-based gene models for structural and functional annotations of the oil palm genome.

Authors:  Kuang-Lim Chan; Tatiana V Tatarinova; Rozana Rosli; Nadzirah Amiruddin; Norazah Azizi; Mohd Amin Ab Halim; Nik Shazana Nik Mohd Sanusi; Nagappan Jayanthi; Petr Ponomarenko; Martin Triska; Victor Solovyev; Mohd Firdaus-Raih; Ravigadevi Sambanthamurthi; Denis Murphy; Eng-Ti Leslie Low
Journal:  Biol Direct       Date:  2017-09-08       Impact factor: 4.540
