SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data.

Yuxin Chen, Yongsheng Chen, Chunmei Shi, Zhibo Huang, Yong Zhang, Shengkang Li, Yan Li, Jia Ye, Chang Yu, Zhuo Li, Xiuqing Zhang, Jian Wang, Huanming Yang, Lin Fang, Qiang Chen.

Abstract

Quality control (QC) and preprocessing are essential steps in sequencing data analysis to ensure the accuracy of results. However, existing tools do not offer a satisfactory solution that combines comprehensive integrated functions, a suitable architecture, and highly scalable acceleration. In this article, we present SOAPnuke, a tool with extensive functions for a "QC-Preprocess-QC" workflow and a MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into one executable and predefines their order, which avoids the need to reformat files when switching between tools. Furthermore, the MapReduce framework provides high scalability by distributing all the processing work across an entire compute cluster. We conducted a benchmark in which SOAPnuke and other tools were used to preprocess a ∼30× NA12878 dataset published by GIAB. In standalone operation, SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke ran ∼5.7 times as fast as the fastest of the other tools.
© The Author(s) 2017. Published by Oxford University Press.
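
The "QC-Preprocess-QC" workflow and its MapReduce distribution can be pictured with a small sketch. This is not SOAPnuke's code or interface: the function names, the quality threshold, and the in-memory read tuples are all assumptions made for illustration, and the built-in `map` stands in for the cluster-wide distribution that a real MapReduce framework would perform. Each chunk of reads is summarized, filtered, and summarized again in the map step, and the per-chunk results are merged in the reduce step.

```python
from functools import reduce

# Hypothetical settings, chosen only for illustration.
MIN_MEAN_QUALITY = 20
PHRED_OFFSET = 33  # Phred+33 quality encoding


def mean_quality(quality_string):
    """Mean Phred score of one read's quality string (0.0 if empty)."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - PHRED_OFFSET for ch in quality_string]
    return sum(scores) / len(scores)


def qc_stats(reads):
    """QC step: count reads, bases, and low-quality reads in a chunk."""
    return {
        "reads": len(reads),
        "bases": sum(len(seq) for _, seq, _ in reads),
        "low_quality_reads": sum(
            1 for _, _, qual in reads if mean_quality(qual) < MIN_MEAN_QUALITY
        ),
    }


def map_chunk(reads):
    """Map step: QC, then preprocess (filter), then QC again on one chunk."""
    before = qc_stats(reads)
    kept = [r for r in reads if mean_quality(r[2]) >= MIN_MEAN_QUALITY]
    after = qc_stats(kept)
    return {"before": before, "after": after, "kept": kept}


def reduce_results(a, b):
    """Reduce step: merge per-chunk statistics and surviving reads."""
    merged = {}
    for stage in ("before", "after"):
        merged[stage] = {key: a[stage][key] + b[stage][key] for key in a[stage]}
    merged["kept"] = a["kept"] + b["kept"]
    return merged


# Toy input: each read is (name, sequence, quality string).
chunks = [
    [("r1", "ACGT", "IIII"), ("r2", "ACGT", "!!!!")],
    [("r3", "GGCC", "IIII")],
]

# On a cluster, each chunk would be handled by a different worker node.
result = reduce(reduce_results, map(map_chunk, chunks))
print(result["before"], result["after"])
```

Because every chunk is processed independently and the merge is a simple sum, the same two functions could in principle be handed to any MapReduce-style runner without changing the filtering logic.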

Keywords:  MapReduce; high-throughput sequencing; preprocessing; quality control

Year: 2018; PMID: 29220494; PMCID: PMC5788068; DOI: 10.1093/gigascience/gix120

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524
