Maximum likelihood genome assembly.

Paul Medvedev, Michael Brudno.

Abstract

Whole-genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial-time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach (which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with mate-pair data to assemble the reads into contigs. On simulated read data from Escherichia coli, our algorithms predict copy counts with extremely high accuracy while assembling long contigs.
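The abstract's central data structure is a bidirected de Bruijn graph, in which each node stands for a (k-1)-mer together with its reverse complement, so both DNA strands are modeled in one graph. The following Python sketch shows one way such a graph could be built from reads. The function names (`reverse_complement`, `canonical`, `bidirected_de_bruijn`, `naive_copy_count`), the lexicographic choice of canonical strand, and the '+'/'-' sign convention are illustrative assumptions, not the authors' implementation. It also includes a deliberately naive per-edge copy-count estimate (observed count divided by per-copy coverage) to show why high coverage carries copy-count information; the paper's estimator is flow-based and maximum likelihood, which this sketch does not reproduce.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Complement table for DNA; this sketch assumes reads contain only A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# A bidirected edge (u, sign_u, v, sign_v): '+' means the (k-1)-mer appears in
# its canonical orientation at that end of the edge, '-' means it appears as the
# reverse complement. This is one of several equivalent sign conventions.
BiEdge = Tuple[str, str, str, str]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Choose one representative per strand pair: the lexicographically
    smaller of the string and its reverse complement."""
    rc = reverse_complement(seq)
    return seq if seq <= rc else rc


def bidirected_de_bruijn(reads: Iterable[str], k: int) -> Dict[BiEdge, int]:
    """Nodes are canonical (k-1)-mers; each canonical k-mer contributes one
    bidirected edge, weighted by how often it occurs on either strand."""
    kmer_counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            # Canonicalizing merges a k-mer with its reverse complement,
            # so both strands of the molecule land on the same edge.
            kmer_counts[canonical(read[i:i + k])] += 1

    edges: Dict[BiEdge, int] = {}
    for kmer, count in kmer_counts.items():
        prefix, suffix = kmer[:-1], kmer[1:]
        u, v = canonical(prefix), canonical(suffix)
        # Record which strand of each node the k-mer actually passes through.
        sign_u = "+" if prefix == u else "-"
        sign_v = "+" if suffix == v else "-"
        edges[(u, sign_u, v, sign_v)] = count
    return edges


def naive_copy_count(edge_count: int, per_copy_coverage: float) -> int:
    """Naive estimate (not the paper's flow-based method): a k-mer present in
    c copies is expected to be seen about c * per_copy_coverage times. Every
    observed edge is assumed to occur at least once, ignoring read errors."""
    return max(1, round(edge_count / per_copy_coverage))


if __name__ == "__main__":
    # The two reads are reverse complements, so each edge is counted twice.
    for edge, count in bidirected_de_bruijn(["ACGGA", "TCCGT"], k=3).items():
        print(edge, count, naive_copy_count(count, per_copy_coverage=2.0))
```

For example, `bidirected_de_bruijn(["ACGGA", "TCCGT"], k=3)` yields three edges, each with count 2: the two reads are reverse complements of each other, so canonicalization maps them onto the same k-mers. Storing one edge per canonical k-mer is what lets a single graph represent both strands without duplicating every repeat.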

Year:  2009        PMID: 19645596      PMCID: PMC3154397          DOI: 10.1089/cmb.2009.0047

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


References: 14 in total

1.  De novo repeat classification and fragment assembly.

Authors:  Pavel A Pevzner; Haixu Tang; Glenn Tesler
Journal:  Genome Res       Date:  2004-09       Impact factor: 9.043

2.  l-Tuple DNA sequencing: computer analysis. [Review]

Authors:  P A Pevzner
Journal:  J Biomol Struct Dyn       Date:  1989-08

3.  The fragment assembly string graph.

Authors:  Eugene W Myers
Journal:  Bioinformatics       Date:  2005-09-01       Impact factor: 6.937

4.  SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing.

Authors:  Juliane C Dohm; Claudio Lottaz; Tatiana Borodina; Heinz Himmelbauer
Journal:  Genome Res       Date:  2007-10-01       Impact factor: 9.043

5.  Extending assembly of short DNA sequences to handle error.

Authors:  William R Jeck; Josephine A Reinhardt; David A Baltrus; Matthew T Hickenbotham; Vincent Magrini; Elaine R Mardis; Jeffery L Dangl; Corbin D Jones
Journal:  Bioinformatics       Date:  2007-09-24       Impact factor: 6.937

6.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors:  Daniel R Zerbino; Ewan Birney
Journal:  Genome Res       Date:  2008-03-18       Impact factor: 9.043

7.  Toward simplifying and accurately formulating fragment assembly.

Authors:  E W Myers
Journal:  J Comput Biol       Date:  1995       Impact factor: 1.479

8.  ALLPATHS: de novo assembly of whole-genome shotgun microreads.

Authors:  Jonathan Butler; Iain MacCallum; Michael Kleber; Ilya A Shlyakhter; Matthew K Belmonte; Eric S Lander; Chad Nusbaum; David B Jaffe
Journal:  Genome Res       Date:  2008-03-13       Impact factor: 9.043

9.  De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.

Authors:  David Hernandez; Patrice François; Laurent Farinelli; Magne Osterås; Jacques Schrenzel
Journal:  Genome Res       Date:  2008-03-10       Impact factor: 9.043

10.  Assembling millions of short DNA sequences using SSAKE.

Authors:  René L Warren; Granger G Sutton; Steven J M Jones; Robert A Holt
Journal:  Bioinformatics       Date:  2006-12-08       Impact factor: 6.937

Cited by: 41 in total

1.  Reconstructing cancer genomes from paired-end sequencing data.

Authors:  Layla Oesper; Anna Ritz; Sarah J Aerni; Ryan Drebin; Benjamin J Raphael
Journal:  BMC Bioinformatics       Date:  2012-04-19       Impact factor: 3.169

2.  Cactus: Algorithms for genome multiple sequence alignment.

Authors:  Benedict Paten; Dent Earl; Ngan Nguyen; Mark Diekhans; Daniel Zerbino; David Haussler
Journal:  Genome Res       Date:  2011-06-10       Impact factor: 9.043

3.  Assemblathon 1: a competitive assessment of de novo short read assembly methods.

Authors:  Dent Earl; Keith Bradnam; John St John; Aaron Darling; Dawei Lin; Joseph Fass; Hung On Ken Yu; Vince Buffalo; Daniel R Zerbino; Mark Diekhans; Ngan Nguyen; Pramila Nuwantha Ariyaratne; Wing-Kin Sung; Zemin Ning; Matthias Haimel; Jared T Simpson; Nuno A Fonseca; İnanç Birol; T Roderick Docking; Isaac Y Ho; Daniel S Rokhsar; Rayan Chikhi; Dominique Lavenier; Guillaume Chapuis; Delphine Naquin; Nicolas Maillet; Michael C Schatz; David R Kelley; Adam M Phillippy; Sergey Koren; Shiaw-Pyng Yang; Wei Wu; Wen-Chi Chou; Anuj Srivastava; Timothy I Shaw; J Graham Ruby; Peter Skewes-Cox; Miguel Betegon; Michelle T Dimon; Victor Solovyev; Igor Seledtsov; Petr Kosarev; Denis Vorobyev; Ricardo Ramirez-Gonzalez; Richard Leggett; Dan MacLean; Fangfang Xia; Ruibang Luo; Zhenyu Li; Yinlong Xie; Binghang Liu; Sante Gnerre; Iain MacCallum; Dariusz Przybylski; Filipe J Ribeiro; Shuangye Yin; Ted Sharpe; Giles Hall; Paul J Kersey; Richard Durbin; Shaun D Jackman; Jarrod A Chapman; Xiaoqiu Huang; Joseph L DeRisi; Mario Caccamo; Yingrui Li; David B Jaffe; Richard E Green; David Haussler; Ian Korf; Benedict Paten
Journal:  Genome Res       Date:  2011-09-16       Impact factor: 9.043

4.  Building a pan-genome reference for a population.

Authors:  Ngan Nguyen; Glenn Hickey; Daniel R Zerbino; Brian Raney; Dent Earl; Joel Armstrong; W James Kent; David Haussler; Benedict Paten
Journal:  J Comput Biol       Date:  2015-01-07       Impact factor: 1.479

5.  A Flow Procedure for Linearization of Genome Sequence Graphs.

Authors:  David Haussler; Maciej Smuga-Otto; Jordan M Eizenga; Benedict Paten; Adam M Novak; Sergei Nikitin; Maria Zueva; Dmitrii Miagkov
Journal:  J Comput Biol       Date:  2018-05-24       Impact factor: 1.479

6.  Superbubbles, Ultrabubbles, and Cacti.

Authors:  Benedict Paten; Jordan M Eizenga; Yohei M Rosen; Adam M Novak; Erik Garrison; Glenn Hickey
Journal:  J Comput Biol       Date:  2018-02-20       Impact factor: 1.479

7.  Detecting copy number variation with mated short reads.

Authors:  Paul Medvedev; Marc Fiume; Misko Dzamba; Tim Smith; Michael Brudno
Journal:  Genome Res       Date:  2010-08-30       Impact factor: 9.043

8.  A hybrid approach for the automated finishing of bacterial genomes.

Authors:  Ali Bashir; Aaron Klammer; William P Robins; Chen-Shan Chin; Dale Webster; Ellen Paxinos; David Hsu; Meredith Ashby; Susana Wang; Paul Peluso; Robert Sebra; Jon Sorenson; James Bullard; Jackie Yen; Marie Valdovino; Emilia Mollova; Khai Luong; Steven Lin; Brianna LaMay; Amruta Joshi; Lori Rowe; Michael Frace; Cheryl L Tarr; Maryann Turnsek; Brigid M Davis; Andrew Kasarskis; John J Mekalanos; Matthew K Waldor; Eric E Schadt
Journal:  Nat Biotechnol       Date:  2012-07-01       Impact factor: 54.908

9.  Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data.

Authors:  Petr Novák; Pavel Neumann; Jiří Macas
Journal:  BMC Bioinformatics       Date:  2010-07-15       Impact factor: 3.169

10.  Assembly complexity of prokaryotic genomes using short reads.

Authors:  Carl Kingsford; Michael C Schatz; Mihai Pop
Journal:  BMC Bioinformatics       Date:  2010-01-12       Impact factor: 3.169
