BACKGROUND: Whilst much sequencing effort has focused on key mammalian model organisms such as mouse and human, little is known about the relationship between the sequencing techniques used for non-model mammals and the quality of the resulting genome assemblies. This matters because samples from non-model mammals are often degraded and of low quality. A key decision when planning a genome project is which sequencing data to generate. This choice is driven by several factors, including the biological questions being asked, the quality of the available DNA, and the funds available. Cutting-edge sequencing technologies now make it possible to produce highly contiguous, chromosome-level genome assemblies, but they rely on high-quality, high-molecular-weight DNA. However, many independent research groups lack the funding to use these techniques. Here we use data from a range of genomic technologies, generated from a roadkill European polecat (Mustela putorius), to assess various assembly techniques on this low-quality sample. We evaluate different approaches to de novo assembly and discuss their value for biological analyses.

RESULTS: Generally, assemblies incorporating more data types achieved better scores in our ranking system. However, once misassemblies were taken into account, this was not always the case for Bionano and low-coverage 10x Genomics data (used for scaffolding only). We also find that the extra cost of combining multiple data types does not necessarily yield better genome assemblies.

CONCLUSIONS: The high degree of variability among de novo assembly methods (assessed across seven key metrics) highlights the importance of carefully devising the sequencing strategy so that the desired analyses can be carried out.
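The abstract does not say how its ranking system combines metrics, so the following is only a minimal sketch of one common approach, rank-sum aggregation, assuming each assembly is ranked separately on every metric (1 = best) and the ranks are summed (lower total = better). The metric names, their directions, and the toy values are hypothetical and are not the paper's seven metrics or results. The sketch illustrates how a metric where lower is better, such as a misassembly count, can offset contiguity gains from adding scaffolding data.

```python
from typing import Dict

# Hypothetical metrics and directions (True = larger is better).
# Illustrative only; these are NOT the seven metrics used in the study.
HIGHER_IS_BETTER = {
    "scaffold_n50": True,
    "busco_complete_pct": True,
    "misassemblies": False,
}


def rank_metric(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Rank assemblies on one metric (1 = best), averaging ranks over ties."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for name in ordered[i : j + 1]:
            ranks[name] = avg
        i = j + 1
    return ranks


def rank_sum(assemblies: Dict[str, Dict[str, float]],
             directions: Dict[str, bool]) -> Dict[str, float]:
    """Sum per-metric ranks for each assembly; a lower total is better."""
    totals = {name: 0.0 for name in assemblies}
    for metric, higher in directions.items():
        per_metric = {name: m[metric] for name, m in assemblies.items()}
        for name, r in rank_metric(per_metric, higher).items():
            totals[name] += r
    return totals


if __name__ == "__main__":
    # Made-up values for three hypothetical assemblies.
    toy = {
        "illumina_only": {"scaffold_n50": 50_000, "busco_complete_pct": 90.0, "misassemblies": 10},
        "illumina_10x": {"scaffold_n50": 2_000_000, "busco_complete_pct": 92.0, "misassemblies": 40},
        "illumina_bionano": {"scaffold_n50": 1_000_000, "busco_complete_pct": 92.0, "misassemblies": 25},
    }
    print(rank_sum(toy, HIGHER_IS_BETTER))
```

With these toy values, the two hybrid assemblies tie at 5.5 and the short-read-only assembly scores 7.0. Its low misassembly count narrows, but does not close, the gap created by its poorer contiguity. A real scheme might weight metrics differently, which can change the ordering.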
Adding more data to a genome assembly does not always result in a better assembly, so understanding the nuances of genomic data integration described here is important for getting value for money when sequencing genomes.