Mark Howison (1), Felipe Zapata, Casey W Dunn. (1) Center for Computation and Visualization and Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA.
Abstract
MOTIVATION: Draft de novo genome assemblies are now available for many organisms. These assemblies are point estimates of the true genome sequences. Each is a specific hypothesis, drawn from among many alternative hypotheses, of the sequence of a genome. Assembly uncertainty, the inability to distinguish between multiple alternative assembly hypotheses, can be due to real variation between copies of the genome in the sample, errors and ambiguities in the sequenced data, and the assumptions and heuristics of the assemblers. Most assemblers select a single assembly according to ad hoc criteria, and do not yet report or quantify the uncertainty of their outputs. Those assemblers that do report uncertainty take different approaches to describing multiple assembly hypotheses and the support for each.
RESULTS: Here we review and examine the problem of representing and measuring uncertainty in assemblies. A promising recent development is the implementation of assemblers that are built according to explicit statistical models. Some new assembly methods, for example, estimate and maximize assembly likelihood. These advances, combined with technical advances in the representation of alternative assembly hypotheses, will lead to a more complete and biologically relevant understanding of assembly uncertainty. This will in turn facilitate the interpretation of downstream analyses and tests of specific biological hypotheses.