Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines.

Stephan Weißbach, Stanislav Sys, Charlotte Hewel, Hristo Todorov, Susann Schweiger, Jennifer Winter, Markus Pfenninger, Ali Torkamani, Doug Evans, Joachim Burger, Karin Everschor-Sitte, Helen Louise May-Simera, Susanne Gerber.

Abstract

BACKGROUND: Next-generation sequencing (NGS) is the foundation of many studies, providing insights into questions in biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. To methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects, each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform's impact.
RESULTS: The number of detected variants and variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in absolute indel numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups.
CONCLUSION: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.
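
ILLUSTRATION: The abstract refers to concordance between setups and to variants uniquely called by a single setup, but does not state how either was computed. The following is a minimal sketch under stated assumptions, not the authors' method: it keys each variant by (chromosome, position, reference allele, alternate allele), uses the Jaccard index as one possible concordance measure, and runs on invented toy call sets. The function names and setup labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Assumption: a variant is identified by (chromosome, position, ref, alt).
Variant = Tuple[str, int, str, str]


def jaccard_concordance(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared calls divided by all calls made by either setup."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def unique_calls(callsets: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """For each setup, the variants that no other setup called."""
    result = {}
    for name, calls in callsets.items():
        others = set().union(*(c for n, c in callsets.items() if n != name))
        result[name] = calls - others
    return result


# Toy data for one hypothetical individual (not taken from the study).
callsets = {
    "hiseq_x": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")},
    "hiseq": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "complete_genomics": {("chr1", 100, "A", "G"), ("chr3", 75, "G", "GA")},
}

# Pairwise concordance between all setups.
pairwise = {
    (x, y): jaccard_concordance(callsets[x], callsets[y])
    for x, y in combinations(callsets, 2)
}
unique = unique_calls(callsets)

for pair, score in pairwise.items():
    print(pair, round(score, 3))
for name, calls in unique.items():
    print(name, "unique calls:", len(calls))
```

In practice, indels would need left-alignment and normalization before such a comparison, since the same indel can be represented at different positions by different pipelines.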

Keywords: Aging; Complete Genomics; GATK; Healthy aging; Illumina; Longevity; Next-generation sequencing (NGS) technologies; Platform-biases; Wellderly

Year:  2021        PMID: 33468057      PMCID: PMC7814447          DOI: 10.1186/s12864-020-07362-8

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


