Alaina Shumate (1,2), Steven L Salzberg (1,2,3,4)
1. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD.
2. Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD.
3. Department of Computer Science, Johns Hopkins University, Baltimore, MD.
4. Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD.
Abstract
MOTIVATION: Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well annotated.
RESULTS: One strategy to annotate new or improved genome assemblies is to map or 'lift over' the genes from a previously annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity.
AVAILABILITY AND IMPLEMENTATION: Liftoff can be installed via bioconda and PyPI. Additionally, the source code for Liftoff is available at https://github.com/agshumate/Liftoff.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
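The idea of choosing the mapping that maximizes sequence identity while preserving gene structure can be illustrated with a small toy sketch. This is not Liftoff's implementation: the names `sequence_identity`, `Candidate`, `preserves_structure`, and `best_mapping` are hypothetical, the candidate placements are assumed to be given rather than computed by an aligner, and only forward-strand genes with exons listed in transcript order are considered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def sequence_identity(ref_aln: str, tgt_aln: str) -> float:
    """Fraction of alignment columns where both rows carry the same base.

    Inputs are two rows of a pairwise alignment ('-' marks a gap).
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned sequences must have equal length")
    if not ref_aln:
        return 0.0
    matches = sum(1 for r, t in zip(ref_aln, tgt_aln) if r == t and r != "-")
    return matches / len(ref_aln)


@dataclass
class Candidate:
    """One hypothetical placement of a gene's exons on the target assembly."""
    exons: List[Tuple[int, int]]  # (start, end) on the target, in transcript order
    identity: float               # overall sequence identity of this placement


def preserves_structure(c: Candidate) -> bool:
    """True when exons are well-formed, non-overlapping, and keep their order."""
    prev_end = -1
    for start, end in c.exons:
        if start > end or start <= prev_end:
            return False
        prev_end = end
    return True


def best_mapping(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the structure-preserving candidate with the highest identity."""
    valid = [c for c in candidates if preserves_structure(c)]
    return max(valid, key=lambda c: c.identity, default=None)


if __name__ == "__main__":
    cands = [
        Candidate(exons=[(100, 200), (300, 400)], identity=0.97),
        Candidate(exons=[(300, 400), (100, 200)], identity=0.99),  # exon order scrambled
        Candidate(exons=[(500, 600), (700, 800)], identity=0.95),
    ]
    print(best_mapping(cands))
```

In this toy example the second candidate has the highest identity but is rejected because its exons are out of order, so the first candidate is selected. A real tool would also have to handle strand, multiple transcripts per gene, and overlaps between genes, none of which are modeled here.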