
SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips.

Shoshana Marcus, Hayan Lee, Michael C Schatz.

Abstract

MOTIVATION: Genomics is expanding from a single-reference-per-species paradigm into a more comprehensive pan-genome approach that analyzes multiple individuals together. A compressed de Bruijn graph is a sophisticated data structure for representing the genomes of entire populations. It robustly encodes shared segments, simple single-nucleotide polymorphisms and complex structural variations far beyond what can be represented in a collection of linear sequences alone.
RESULTS: We explore deep topological relationships between suffix trees and compressed de Bruijn graphs and introduce an algorithm, splitMEM, that directly constructs the compressed de Bruijn graph in time and space linear in the total number of genomes for a given maximum genome size. We introduce suffix skips to traverse several suffix links simultaneously and use them to efficiently decompose maximal exact matches into graph nodes. We demonstrate the utility of splitMEM by analyzing the nine-strain pan-genome of Bacillus anthracis and up to 62 strains of Escherichia coli, revealing their core-genome properties.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
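
To make the data structure in the abstract concrete, the following is a minimal sketch of what a compressed de Bruijn graph contains. It is not splitMEM: it uses a naive k-mer hashing approach rather than suffix trees or suffix skips, and it is intended for toy inputs only. The function name `compressed_de_bruijn_nodes`, its parameters, and the example sequences are assumptions made for illustration.

```python
from collections import defaultdict


def compressed_de_bruijn_nodes(genomes, k):
    """Return sorted node labels of a compressed de Bruijn graph (naive sketch)."""
    succ = defaultdict(set)  # k-mer -> k-mers that follow it in some genome
    pred = defaultdict(set)  # k-mer -> k-mers that precede it in some genome
    kmers = set()
    for g in genomes:
        for i in range(len(g) - k + 1):
            cur = g[i:i + k]
            kmers.add(cur)
            if i + k < len(g):
                nxt = g[i + 1:i + k + 1]
                succ[cur].add(nxt)
                pred[nxt].add(cur)

    def unique_next(a):
        """Return b if a -> b is the only edge out of a and the only edge into b."""
        outs = succ.get(a, ())
        if len(outs) != 1:
            return None
        (b,) = outs
        if b == a or len(pred.get(b, ())) != 1:
            return None
        return b

    def is_path_start(a):
        """True if a cannot be merged into a predecessor's path."""
        ins = pred.get(a, ())
        if len(ins) != 1:
            return True
        (p,) = ins
        return unique_next(p) != a

    visited = set()

    def walk(start):
        """Merge the non-branching path beginning at start into one label."""
        label, cur = start, start
        visited.add(start)
        while True:
            nxt = unique_next(cur)
            if nxt is None or nxt in visited:
                return label
            label += nxt[-1]
            visited.add(nxt)
            cur = nxt

    nodes = [walk(a) for a in sorted(kmers) if is_path_start(a)]
    # Any k-mers still unvisited lie on isolated cycles with no branching.
    for a in sorted(kmers):
        if a not in visited:
            nodes.append(walk(a))
    return sorted(nodes)


# Two toy "genomes" that differ by a single nucleotide (A vs C at position 5).
print(compressed_de_bruijn_nodes(["CATGATTAG", "CATGCTTAG"], 3))
# prints ['CATG', 'TGATT', 'TGCTT', 'TTAG']
```

In this toy case the shared flanks collapse into the nodes `CATG` and `TTAG`, while the single-nucleotide difference appears as two parallel nodes, `TGATT` and `TGCTT`. Adjacent nodes overlap by k-1 characters. The naive method stores every k-mer explicitly, which is the cost that a direct construction such as the one described in the abstract aims to avoid.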

Year:  2014        PMID: 25398610      PMCID: PMC4253837          DOI: 10.1093/bioinformatics/btu756

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


