A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Guillaume Marçais, Carl Kingsford.

Abstract

MOTIVATION: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
RESULTS: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
AVAILABILITY: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.
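
To make the counting task concrete, the following is a minimal, single-threaded Python sketch of k-mer counting. It is not the Jellyfish implementation: it uses an ordinary dictionary-based counter rather than a lock-free hash table. The 2-bit-per-base encoding, and the function names count_kmers and decode, are assumptions introduced for illustration. Under that encoding, a k-mer of up to 31 bases occupies at most 62 bits and so fits in a single 64-bit word, which is one plausible reading of the 31-base limit mentioned above.

```python
from collections import Counter

# Hypothetical 2-bit encoding of the four DNA bases (an assumption of this sketch).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(sequence, k):
    """Count every k-mer in `sequence`, storing each k-mer as a 2k-bit integer."""
    if not 1 <= k <= 31:
        raise ValueError("this sketch assumes 1 <= k <= 31 so a k-mer fits in 62 bits")
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2k bits
    counts = Counter()
    word = 0   # rolling integer encoding of the most recent bases
    valid = 0  # number of consecutive valid bases seen since the last reset
    for base in sequence.upper():
        code = ENCODE.get(base)
        if code is None:
            # Any other symbol (for example N) breaks the window; restart after it.
            word, valid = 0, 0
            continue
        word = ((word << 2) | code) & mask
        valid += 1
        if valid >= k:
            counts[word] += 1
    return counts


def decode(word, k):
    """Turn a 2k-bit integer back into its k-mer string."""
    bases = "ACGT"
    return "".join(bases[(word >> (2 * (k - 1 - i))) & 3] for i in range(k))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    for word, n in sorted(counts.items()):
        print(decode(word, 3), n)
```

A multithreaded counter would additionally need atomic updates to shared table entries. That part is omitted here because the abstract does not describe the mechanism in detail.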

Year:  2011        PMID: 21217122      PMCID: PMC3051319          DOI: 10.1093/bioinformatics/btr011

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


