Literature DB >> 25364472

AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.

Vamsi Kundeti1, Sanguthevar Rajasekaran1, Heiu Dinh1.   

Abstract

Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph which runs in O(|E|2 log2(|V|)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ(p(|V| + |E|) log(|V|) + (dmaxp)3) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{υ∣din(υ) - dout(υ) > 0}|, |{υ∣din(υ) - dout(υ) < 0}|} and dmax = max{∣din(υ) - dout(υ)}. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much less than the nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of p/|V| lies between 0.08% and 0.13% with 95% probability. Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. A Θ(p(|V| + |E|)) time heuristic algorithm based on these ideas has been implemented for the SDDNA problem. This algorithm was tested on short reads from a plant genome and achieves an approximation ratio of at most 1.0134. We also present a Θ((|V| + |E|) log(V)) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.

Entities:  

Keywords:  Chinese Postman problem; Sequence assembly algorithms; bi-directed graphs; bioinformatics

Year:  2010        PMID: 25364472      PMCID: PMC4215799          DOI: 10.1007/978-3-642-17458-2_16

Source DB:  PubMed          Journal:  Discrete Math Algorithms Appl


  5 in total

1.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

2.  An Eulerian path approach to DNA fragment assembly.

Authors:  P A Pevzner; H Tang; M S Waterman
Journal:  Proc Natl Acad Sci U S A       Date:  2001-08-14       Impact factor: 11.205

3.  The fragment assembly string graph.

Authors:  Eugene W Myers
Journal:  Bioinformatics       Date:  2005-09-01       Impact factor: 6.937

4.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors:  Daniel R Zerbino; Ewan Birney
Journal:  Genome Res       Date:  2008-03-18       Impact factor: 9.043

5.  The sequence of the human genome.

Authors:  J C Venter; M D Adams; E W Myers; P W Li; R J Mural; G G Sutton; H O Smith; M Yandell; C A Evans; R A Holt; J D Gocayne; P Amanatides; R M Ballew; D H Huson; J R Wortman; Q Zhang; C D Kodira; X H Zheng; L Chen; M Skupski; G Subramanian; P D Thomas; J Zhang; G L Gabor Miklos; C Nelson; S Broder; A G Clark; J Nadeau; V A McKusick; N Zinder; A J Levine; R J Roberts; M Simon; C Slayman; M Hunkapiller; R Bolanos; A Delcher; I Dew; D Fasulo; M Flanigan; L Florea; A Halpern; S Hannenhalli; S Kravitz; S Levy; C Mobarry; K Reinert; K Remington; J Abu-Threideh; E Beasley; K Biddick; V Bonazzi; R Brandon; M Cargill; I Chandramouliswaran; R Charlab; K Chaturvedi; Z Deng; V Di Francesco; P Dunn; K Eilbeck; C Evangelista; A E Gabrielian; W Gan; W Ge; F Gong; Z Gu; P Guan; T J Heiman; M E Higgins; R R Ji; Z Ke; K A Ketchum; Z Lai; Y Lei; Z Li; J Li; Y Liang; X Lin; F Lu; G V Merkulov; N Milshina; H M Moore; A K Naik; V A Narayan; B Neelam; D Nusskern; D B Rusch; S Salzberg; W Shao; B Shue; J Sun; Z Wang; A Wang; X Wang; J Wang; M Wei; R Wides; C Xiao; C Yan; A Yao; J Ye; M Zhan; W Zhang; H Zhang; Q Zhao; L Zheng; F Zhong; W Zhong; S Zhu; S Zhao; D Gilbert; S Baumhueter; G Spier; C Carter; A Cravchik; T Woodage; F Ali; H An; A Awe; D Baldwin; H Baden; M Barnstead; I Barrow; K Beeson; D Busam; A Carver; A Center; M L Cheng; L Curry; S Danaher; L Davenport; R Desilets; S Dietz; K Dodson; L Doup; S Ferriera; N Garg; A Gluecksmann; B Hart; J Haynes; C Haynes; C Heiner; S Hladun; D Hostin; J Houck; T Howland; C Ibegwam; J Johnson; F Kalush; L Kline; S Koduru; A Love; F Mann; D May; S McCawley; T McIntosh; I McMullen; M Moy; L Moy; B Murphy; K Nelson; C Pfannkoch; E Pratts; V Puri; H Qureshi; M Reardon; R Rodriguez; Y H Rogers; D Romblad; B Ruhfel; R Scott; C Sitter; M Smallwood; E Stewart; R Strong; E Suh; R Thomas; N N Tint; S Tse; C Vech; G Wang; J Wetter; S Williams; M Williams; S Windsor; E Winn-Deen; K Wolfe; J Zaveri; K Zaveri; J F Abril; R Guigó; M J Campbell; K V Sjolander; B Karlak; A Kejariwal; H Mi; B Lazareva; T Hatton; A Narechania; K Diemer; A Muruganujan; N Guo; S Sato; V Bafna; S Istrail; R Lippert; R Schwartz; B Walenz; S Yooseph; D Allen; A Basu; J Baxendale; L Blick; M Caminha; J Carnes-Stine; P Caulk; Y H Chiang; M Coyne; C Dahlke; A Deslattes Mays; M Dombroski; M Donnelly; D Ely; S Esparham; C Fosler; H Gire; S Glanowski; K Glasser; A Glodek; M Gorokhov; K Graham; B Gropman; M Harris; J Heil; S Henderson; J Hoover; D Jennings; C Jordan; J Jordan; J Kasha; L Kagan; C Kraft; A Levitsky; M Lewis; X Liu; J Lopez; D Ma; W Majoros; J McDaniel; S Murphy; M Newman; T Nguyen; N Nguyen; M Nodell; S Pan; J Peck; M Peterson; W Rowe; R Sanders; J Scott; M Simpson; T Smith; A Sprague; T Stockwell; R Turner; E Venter; M Wang; M Wen; D Wu; M Wu; A Xia; A Zandieh; X Zhu
Journal:  Science       Date:  2001-02-16       Impact factor: 47.728

  5 in total
  1 in total

1.  Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies.

Authors:  Joshua Wetzel; Carl Kingsford; Mihai Pop
Journal:  BMC Bioinformatics       Date:  2011-04-13       Impact factor: 3.169

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.