Batch effects in a multiyear sequencing study: False biological trends due to changes in read lengths.

D M Leigh, H E L Lischer, C Grossen, L F Keller.

Abstract

High-throughput sequencing is a powerful tool, but it suffers from biases and errors that must be accounted for to prevent false biological conclusions. Such errors include batch effects: technical errors present only in subsets of the data because of procedural changes within a study. If batch effects are overlooked and multiple batches of data are combined, spurious biological signals can arise, particularly if batches are correlated with biological variables. Batch effects can be minimized by randomizing sample groups across batches. However, in long-term or multiyear studies where data are added incrementally, full randomization is impossible, and batch effects may be a common feature. Here, we present a case study in which false signals of selection were detected because of a batch effect in a multiyear study of Alpine ibex (Capra ibex). The batch effect arose because sequencing read length changed over the course of the project and populations were added incrementally to the study, resulting in a nonrandom distribution of populations across read lengths. The differences in read length caused small misalignments in a subset of the data, leading to false variant alleles and thus false SNPs. Pronounced allele frequency differences between populations arose at these SNPs because of the correlation between read length and population. This created highly statistically significant, but biologically spurious, signals of selection and false associations between allele frequencies and the environment. We highlight the risk of batch effects and discuss strategies to reduce their impact in multiyear high-throughput sequencing studies.
© 2018 John Wiley & Sons Ltd.
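
The confounding described in the abstract, where populations end up distributed nonrandomly across read lengths, can be screened for from sample metadata alone. The following is a minimal Python sketch and not the authors' method; the function names, the population labels, and the read lengths (100 and 125 bp) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def batch_table(samples):
    """Cross-tabulate populations against read lengths.

    samples: iterable of (population, read_length) pairs.
    Returns {population: Counter({read_length: n_samples})}.
    """
    table = defaultdict(Counter)
    for population, read_length in samples:
        table[population][read_length] += 1
    return dict(table)


def fully_confounded(table):
    """Populations whose samples were all sequenced at a single read length.

    For these populations, a read-length artifact cannot be separated
    from a genuine population difference using the data alone.
    """
    return sorted(pop for pop, counts in table.items() if len(counts) == 1)


# Hypothetical metadata: (population, read length in bp).
samples = [
    ("pop_A", 100), ("pop_A", 100), ("pop_A", 100),
    ("pop_B", 125), ("pop_B", 125),
    ("pop_C", 100), ("pop_C", 125),
]

table = batch_table(samples)
print(fully_confounded(table))  # ['pop_A', 'pop_B']
```

In this toy input, pop_A and pop_B are each tied to one read length, so any SNP that separates them could reflect the batch rather than biology; pop_C spans both read lengths and could serve as an internal check.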

Keywords:  GWAS; RADseq; genotyping error; long-term data; outlier; sequencing error

Year:  2018        PMID: 29573184     DOI: 10.1111/1755-0998.12779

Source DB:  PubMed          Journal:  Mol Ecol Resour        ISSN: 1755-098X            Impact factor:   7.090


  8 in total

1.  Opportunities and challenges of macrogenetic studies. (Review)

Authors:  Deborah M Leigh; Charles B van Rees; Katie L Millette; Martin F Breed; Chloé Schmidt; Laura D Bertola; Brian K Hand; Margaret E Hunter; Evelyn L Jensen; Francine Kershaw; Libby Liggins; Gordon Luikart; Stéphanie Manel; Joachim Mergeay; Joshua M Miller; Gernot Segelbacher; Sean Hoban; Ivan Paz-Vinas
Journal:  Nat Rev Genet       Date:  2021-08-18       Impact factor: 53.242

2.  Weaving place-based knowledge for culturally significant species in the age of genomics: Looking to the past to navigate the future.

Authors:  Aisling Rayne; Stephanie Blair; Matthew Dale; Brendan Flack; John Hollows; Roger Moraga; Riki N Parata; Makarini Rupene; Paulette Tamati-Elliffe; Priscilla M Wehi; Matthew J Wylie; Tammy E Steeves
Journal:  Evol Appl       Date:  2022-04-08       Impact factor: 4.929

3.  Uneven Missing Data Skew Phylogenomic Relationships within the Lories and Lorikeets.

Authors:  Brian Tilston Smith; William M Mauck; Brett W Benz; Michael J Andersen
Journal:  Genome Biol Evol       Date:  2020-07-01       Impact factor: 3.416

4.  The presence and impact of reference bias on population genomic studies of prehistoric human populations.

Authors:  Torsten Günther; Carl Nettelblad
Journal:  PLoS Genet       Date:  2019-07-26       Impact factor: 5.917

5.  Long-read sequencing reveals the evolutionary drivers of intra-host diversity across natural RNA mycovirus infections.

Authors:  Deborah M Leigh; Karla Peranić; Simone Prospero; Carolina Cornejo; Mirna Ćurković-Perica; Quirin Kupper; Lucija Nuskern; Daniel Rigling; Marin Ježić
Journal:  Virus Evol       Date:  2021-12-01

6.  Genomic reaction norms inform predictions of plastic and adaptive responses to climate change. (Review)

Authors:  Rebekah A Oomen; Jeffrey A Hutchings
Journal:  J Anim Ecol       Date:  2022-05-18       Impact factor: 5.606

7.  A robust sequencing assay of a thousand amplicons for the high-throughput population monitoring of Alpine ibex immunogenetics.

Authors:  Camille Kessler; Alice Brambilla; Dominique Waldvogel; Glauco Camenisch; Iris Biebach; Deborah M Leigh; Christine Grossen; Daniel Croll
Journal:  Mol Ecol Resour       Date:  2021-07-07       Impact factor: 8.678

8.  Range reduction of Oblong Rocksnail, Leptoxis compacta, shapes riverscape genetic patterns.

Authors:  Aaliyah D Wright; Nicole L Garrison; Ashantye' S Williams; Paul D Johnson; Nathan V Whelan
Journal:  PeerJ       Date:  2020-09-01       Impact factor: 2.984
