A Distributed Whole Genome Sequencing Benchmark Study.



Richard D Corbett¹, Robert Eveleigh², Joe Whitney³, Namrata Barai³, Mathieu Bourgey², Eric Chuah¹, Joanne Johnson¹, Richard A Moore¹, Neda Moradin³, Karen L Mungall¹, Sergio Pereira³, Miriam S Reuter⁴, Bhooma Thiruvahindrapuram³, Richard F Wintle³, Jiannis Ragoussis², Lisa J Strug³, Jo-Anne Herbrick³, Naveed Aziz⁴, Steven J M Jones¹, Mark Lathrop², Stephen W Scherer³, Alfredo Staffa², Andrew J Mungall¹.

Abstract

Population sequencing often requires collaboration across a distributed network of sequencing centers for the timely processing of thousands of samples. In such massive efforts, it is important that participating scientists can be confident that the accuracy of the sequence data produced is not affected by which center generates the data. A study was conducted across three established sequencing centers, located in Montreal, Toronto, and Vancouver, constituting Canada's Genomics Enterprise (www.cgen.ca). Whole genome sequencing was performed at each center on three genomic DNA replicates from each of three well-characterized cell lines. The secondary analysis pipeline employed by each site was applied to sequence data from every site, giving three conditions for each of four variables (cell line, replicate, sequencing center, and analysis pipeline), for a total of 81 datasets (3 × 3 × 3 × 3). Each dataset was assessed according to multiple quality metrics, including concordance with benchmark variant truth sets, to check for consistent quality across all three conditions of each variable. Three-way concordance analysis of variants across conditions for each variable was also performed. Our results showed that, when the same analysis pipeline was used, the variant concordance between datasets differing only by sequencing center was similar to the concordance between datasets differing only by replicate. We also showed that the statistically significant differences between datasets result from the analysis pipeline used, which can be unified and updated as new approaches become available. We conclude that genome sequencing projects can rely on the quality and reproducibility of aggregate data generated across a network of distributed sites.
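The study design in the abstract can be illustrated with a minimal sketch that enumerates the 81 datasets and computes one possible three-way concordance score. The abstract does not name the cell lines or define the concordance metric, so the cell line and pipeline labels, the function names, the toy variant calls, and the intersection-over-union definition below are all assumptions made for illustration, not the authors' method.

```python
from itertools import product

# Three conditions for each of the four variables described in the abstract.
# Cell line and pipeline labels are hypothetical placeholders.
CELL_LINES = ["cell_line_1", "cell_line_2", "cell_line_3"]
REPLICATES = [1, 2, 3]
CENTERS = ["Montreal", "Toronto", "Vancouver"]
PIPELINES = ["pipeline_Montreal", "pipeline_Toronto", "pipeline_Vancouver"]


def enumerate_datasets():
    """Return every (cell line, replicate, center, pipeline) combination."""
    return list(product(CELL_LINES, REPLICATES, CENTERS, PIPELINES))


def three_way_concordance(a, b, c):
    """Fraction of all observed variants that are shared by all three call sets.

    Assumed definition (intersection over union); the paper may use another.
    """
    union = a | b | c
    if not union:
        return 1.0  # three empty call sets agree trivially
    return len(a & b & c) / len(union)


datasets = enumerate_datasets()
print(len(datasets))  # 81

# Toy variant calls as (chromosome, position, ref, alt); values are made up.
calls_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
calls_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
calls_c = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 10, "T", "C")}
print(three_way_concordance(calls_a, calls_b, calls_c))  # 0.5
```

To compare sequencing centers in this sketch, the three call sets would come from datasets that share cell line, replicate, and pipeline and differ only by center; the same function applies unchanged to the other three variables.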

Copyright © 2020 Corbett, Eveleigh, Whitney, Barai, Bourgey, Chuah, Johnson, Moore, Moradin, Mungall, Pereira, Reuter, Thiruvahindrapuram, Wintle, Ragoussis, Strug, Herbrick, Aziz, Jones, Lathrop, Scherer, Staffa and Mungall.


Keywords: benchmark; comparison; genome; informatics; variant; whole genome sequencing

Year: 2020 PMID: 33335541 PMCID: PMC7736078 DOI: 10.3389/fgene.2020.612515

Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599

2 in total

1. Analysis of recent shared ancestry in a familial cohort identifies coding and noncoding autism spectrum disorder variants.

Authors: Islam Oguz Tuncay; Nancy L Parmalee; Raida Khalil; Kiran Kaur; Ashwani Kumar; Mohamed Jimale; Jennifer L Howe; Kimberly Goodspeed; Patricia Evans; Loai Alzghoul; Chao Xing; Stephen W Scherer; Maria H Chahrour
Journal: NPJ Genom Med Date: 2022-02-21 Impact factor: 8.617

2. TMBur: a distributable tumor mutation burden approach for whole genome sequencing.

Authors: Emma Titmuss; Richard D Corbett; Scott Davidson; Sanna Abbasi; Laura M Williamson; Erin D Pleasance; Adam Shlien; Daniel J Renouf; Steven J M Jones; Janessa Laskin; Marco A Marra
Journal: BMC Med Genomics Date: 2022-09-07 Impact factor: 3.622
