To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics.

R A Leo Elworth, Qi Wang, Pavan K Kota, C J Barberan, Benjamin Coleman, Advait Balaji, Gaurav Gupta, Richard G Baraniuk, Anshumali Shrivastava, Todd J Treangen.

Abstract

As computational biologists continue to be inundated by ever-increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to metagenomics. For instance, sketching algorithms such as MinHash have seen rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
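
As an illustration of the kind of sketching the abstract refers to, the following is a minimal bottom-s MinHash sketch for estimating the Jaccard similarity of two sequences' k-mer sets. It is a toy example under stated assumptions, not the method of any particular tool: the function names, the k-mer length, the sketch size and the choice of BLAKE2b as the hash function are all illustrative.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, size=64):
    """Bottom-s sketch: the `size` smallest distinct k-mer hash values."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the merged sketches and count
    the fraction of them that appear in both input sketches.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def jaccard_exact(seq_a, seq_b, k=5):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b) if (a or b) else 0.0
```

The sketch size bounds the memory used per sequence regardless of sequence length, which is what makes the comparison cheap on large datasets; the price is that the similarity is an estimate whose variance shrinks as the sketch size grows. When a sequence has fewer distinct k-mers than the sketch size, the sketch holds every hash and the estimate coincides with the exact value (barring hash collisions).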

Year:  2020        PMID: 32338745      PMCID: PMC7261164          DOI: 10.1093/nar/gkaa265

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


