Virag Sharma (1,2), Peter Schwede (1,2), Michael Hiller (1,2). Affiliations: (1) Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany; (2) Max Planck Institute for the Physics of Complex Systems, Dresden 01187, Germany.
Abstract
MOTIVATION: Homology-based gene prediction is a powerful concept for annotating newly sequenced genomes. We have previously demonstrated that whole-genome alignments can be used for accurate comparative annotation of coding genes.
RESULTS: Here we present CESAR 2.0, which uses genome alignments to transfer coding gene annotations from one reference genome to many other aligned genomes. We show that CESAR 2.0 is 77 times faster and requires 31 times less memory than its predecessor. CESAR 2.0 substantially improves the ability to align splice sites that have shifted over larger distances, allowing precise identification of the exon boundaries in the aligned genome. Finally, CESAR 2.0 supports entire genes, which enables the annotation of joined exons that arose by complete intron deletions. CESAR 2.0 can readily be applied to new genome alignments to annotate coding genes in many other genomes with improved accuracy and without requiring large computational resources.
AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/hillerlab/CESAR2.0.
CONTACT: hiller@mpi-cbg.de.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Virag Sharma; Nikolai Hecker; Juliana G Roscito; Leo Foerster; Bjoern E Langer; Michael Hiller Journal: Nat Commun Date: 2018-03-23 Impact factor: 14.919
Authors: Joana Damas; Graham M Hughes; Kathleen C Keough; Corrie A Painter; Nicole S Persky; Marco Corbo; Michael Hiller; Klaus-Peter Koepfli; Andreas R Pfenning; Huabin Zhao; Diane P Genereux; Ross Swofford; Katherine S Pollard; Oliver A Ryder; Martin T Nweeia; Kerstin Lindblad-Toh; Emma C Teeling; Elinor K Karlsson; Harris A Lewin Journal: Proc Natl Acad Sci U S A Date: 2020-08-21 Impact factor: 11.205
Authors: Juliana G Roscito; Katrin Sameith; Genis Parra; Bjoern E Langer; Andreas Petzold; Claudia Moebius; Marc Bickle; Miguel Trefaut Rodrigues; Michael Hiller Journal: Nat Commun Date: 2018-11-09 Impact factor: 14.919