Prevention, diagnosis and treatment of high-throughput sequencing data pathologies.

Xiaofan Zhou, Antonis Rokas.

Abstract

High-throughput sequencing (HTS) technologies generate millions of sequence reads from DNA/RNA molecules rapidly and cost-effectively, enabling single investigator laboratories to address a variety of 'omics' questions in nonmodel organisms, fundamentally changing the way genomic approaches are used to advance biological research. One major challenge posed by HTS is the complexity and difficulty of data quality control (QC). While QC issues associated with sample isolation, library preparation and sequencing are well known and protocols for their handling are widely available, the QC of the actual sequence reads generated by HTS is often overlooked. HTS-generated sequence reads can contain various errors, biases and artefacts whose identification and amelioration can greatly impact subsequent data analysis. However, a systematic survey on QC procedures for HTS data is still lacking. In this review, we begin by presenting standard 'health check-up' QC procedures recommended for HTS data sets and establishing what 'healthy' HTS data look like. We next proceed by classifying errors, biases and artefacts present in HTS data into three major types of 'pathologies', discussing their causes and symptoms and illustrating with examples their diagnosis and impact on downstream analyses. We conclude this review by offering examples of successful 'treatment' protocols and recommendations on standard practices and treatment options. Notwithstanding the speed with which HTS technologies - and consequently their pathologies - change, we argue that careful QC of HTS data is an important - yet often neglected - aspect of their application in molecular ecology, and lay the groundwork for developing a HTS data QC 'best practices' guide.
© 2014 John Wiley & Sons Ltd.

Keywords:  bioinformatics; high-throughput sequencing; next-generation sequencing; preprocessing; quality control; sequence read

Year: 2014
PMID: 24471475
DOI: 10.1111/mec.12680
Source DB: PubMed
Journal: Mol Ecol
ISSN: 0962-1083
Impact factor: 6.185


  12 in total

1.  Transcriptome resources for the white-footed mouse (Peromyscus leucopus): new genomic tools for investigating ecologically divergent urban and rural populations.

Authors:  Stephen E Harris; Rachel J O'Neill; Jason Munshi-South
Journal:  Mol Ecol Resour       Date:  2014-07-16       Impact factor: 7.090

2.  Insights into study design and statistical analyses in translational microbiome studies. (Review)

Authors:  Jyoti Shankar
Journal:  Ann Transl Med       Date:  2017-06

3.  SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data.

Authors:  Yuxin Chen; Yongsheng Chen; Chunmei Shi; Zhibo Huang; Yong Zhang; Shengkang Li; Yan Li; Jia Ye; Chang Yu; Zhuo Li; Xiuqing Zhang; Jian Wang; Huanming Yang; Lin Fang; Qiang Chen
Journal:  Gigascience       Date:  2018-01-01       Impact factor: 6.524

4.  Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data.

Authors:  Nikola Palevich; Paul Haydon Maclean
Journal:  Methods Mol Biol       Date:  2021

5.  Detection of Low-Level Mixed-Population Drug Resistance in Mycobacterium tuberculosis Using High Fidelity Amplicon Sequencing.

Authors:  Rebecca E Colman; James M Schupp; Nathan D Hicks; David E Smith; Jordan L Buchhagen; Faramarz Valafar; Valeriu Crudu; Elena Romancenco; Ecaterina Noroc; Lynn Jackson; Donald G Catanzaro; Timothy C Rodwell; Antonino Catanzaro; Paul Keim; David M Engelthaler
Journal:  PLoS One       Date:  2015-05-13       Impact factor: 3.240

6.  A field guide to whole-genome sequencing, assembly and annotation. (Review)

Authors:  Robert Ekblom; Jochen B W Wolf
Journal:  Evol Appl       Date:  2014-06-24       Impact factor: 5.183

7.  PEAT: an intelligent and efficient paired-end sequencing adapter trimming algorithm.

Authors:  Yun-Lung Li; Jui-Cheng Weng; Chiung-Chih Hsiao; Min-Te Chou; Chin-Wen Tseng; Jui-Hung Hung
Journal:  BMC Bioinformatics       Date:  2015-01-21       Impact factor: 3.169

8.  AdapterRemoval v2: rapid adapter trimming, identification, and read merging.

Authors:  Mikkel Schubert; Stinus Lindgreen; Ludovic Orlando
Journal:  BMC Res Notes       Date:  2016-02-12

9.  Removing duplicate reads using graphics processing units.

Authors:  Andrea Manconi; Marco Moscatelli; Giuliano Armano; Matteo Gnocchi; Alessandro Orro; Luciano Milanesi
Journal:  BMC Bioinformatics       Date:  2016-11-08       Impact factor: 3.169

10.  A biologist, a statistician, and a bioinformatician walk into a conference room… and walk out with a great metagenomics project plan.

Authors:  Ann E Stapleton
Journal:  Front Plant Sci       Date:  2014-06-03       Impact factor: 5.753
