MOTIVATION: As the tree of life is populated ever more densely with sequenced genomes, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that agree with each other or, where they do not, whose exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach.

RESULTS: The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well suited for the annotation of (a large number of) genomes of closely related species within a clade, in particular when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances.

AVAILABILITY AND IMPLEMENTATION: The method is implemented in C++ as part of Augustus and is available as open source at http://bioinf.uni-greifswald.de/augustus/

CONTACT: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
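To illustrate the optimization idea named in the abstract, the following is a minimal toy sketch of subgradient-based dual decomposition for a binary labeling problem. It is not the Augustus implementation. Everything in it is an assumption made for illustration: two species with aligned exon candidates, a rule that neighbouring candidates within a species cannot both be selected, a fixed bonus `agree` when aligned candidates receive the same label, and all function names. The problem is split into a per-species part solved by dynamic programming and a per-alignment-column part solved by enumeration; Lagrange multipliers `lam` are adjusted until both parts agree or the iteration budget runs out.

```python
from itertools import product

def best_path_labeling(w):
    """Max-weight 0/1 labeling of a chain in which neighbours may not both be 1."""
    n = len(w)
    take, skip = [0.0] * n, [0.0] * n  # best prefix score with candidate i selected / not selected
    for i in range(n):
        prev_take = take[i - 1] if i else float("-inf")
        prev_skip = skip[i - 1] if i else 0.0
        take[i] = prev_skip + w[i]
        skip[i] = max(prev_take, prev_skip)
    x, i, selected = [0] * n, n - 1, take[n - 1] > skip[n - 1]
    while i >= 0:  # backtrack through the DP table
        if selected:
            x[i], selected = 1, False  # neighbour i-1 must stay unselected
        else:
            selected = i > 0 and take[i - 1] > skip[i - 1]
        i -= 1
    return x, max(take[-1], skip[-1])

def column_score(y, lam0, lam1, agree):
    """Score of labels y=(y0, y1) for one aligned pair, minus the multiplier terms."""
    return agree * (y[0] == y[1]) - lam0 * y[0] - lam1 * y[1]

def objective(x, a, agree):
    """Full toy objective: candidate scores plus agreement bonus per aligned pair."""
    n = len(a[0])
    return (sum(a[s][i] * x[s][i] for s in range(2) for i in range(n))
            + agree * sum(x[0][i] == x[1][i] for i in range(n)))

def dual_decomposition(a, agree, iterations=100):
    n = len(a[0])
    lam = [[0.0] * n for _ in range(2)]  # one multiplier per species and candidate
    best_x, best_primal, best_dual = None, float("-inf"), float("inf")
    for t in range(1, iterations + 1):
        # Subproblem A: one chain per species, with multiplier-adjusted scores.
        solved = [best_path_labeling([a[s][i] + lam[s][i] for i in range(n)])
                  for s in range(2)]
        x = [labels for labels, _ in solved]
        # Subproblem B: each aligned column is solved independently by enumeration.
        y = [max(product((0, 1), repeat=2),
                 key=lambda lab, i=i: column_score(lab, lam[0][i], lam[1][i], agree))
             for i in range(n)]
        dual = (sum(score for _, score in solved)
                + sum(column_score(y[i], lam[0][i], lam[1][i], agree) for i in range(n)))
        best_dual = min(best_dual, dual)  # every dual value is an upper bound
        primal = objective(x, a, agree)   # x is feasible, so this is a lower bound
        if primal > best_primal:
            best_x, best_primal = x, primal
        if all(x[s][i] == y[i][s] for s in range(2) for i in range(n)):
            break  # both copies agree: the current labeling is optimal
        for s in range(2):  # subgradient step with a diminishing step size
            for i in range(n):
                lam[s][i] -= (1.0 / t) * (x[s][i] - y[i][s])
    return best_x, best_primal, best_dual

# Hypothetical candidate scores for two species with five aligned candidates each.
scores = [[3.0, 2.0, -1.0, 4.0, 1.0], [2.5, -2.0, 3.0, 1.0, 2.0]]
labels, lower_bound, upper_bound = dual_decomposition(scores, agree=1.5)
print(labels, lower_bound, upper_bound)
```

The gap between `upper_bound` and `lower_bound` indicates how far the returned labeling can be from the optimum of this toy problem; a gap of zero certifies optimality. The diminishing step size `1/t` is one common choice among several.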