xGAP: A python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery.

Aditya Gorla¹, Brandon Jew², Luke Zhang³, Jae Hoon Sul⁴.

Abstract

MOTIVATION: Since the first human genome was sequenced in 2001, there has been rapid growth in the number of bioinformatic methods to process and analyze next generation sequencing (NGS) data for research and clinical studies that aim to identify genetic variants influencing diseases and traits. To achieve this goal, one first needs to call genetic variants from NGS data, which requires multiple computationally intensive analysis steps. Unfortunately, there is no open source pipeline that can perform all these steps on NGS data in a manner that is fully automated, efficient, rapid, scalable, modular, user-friendly and fault tolerant. To address this, we introduce xGAP, an extensible Genome Analysis Pipeline, which implements a modified GATK best practice to analyze DNA-seq data with the aforementioned functionalities.
RESULTS: xGAP implements massive parallelization of the modified GATK best practice pipeline by splitting a genome into many smaller regions with efficient load-balancing to achieve high scalability. It can process 30x coverage whole-genome sequencing (WGS) data in approximately 90 minutes. In terms of accuracy of discovered variants, xGAP achieves average F1 scores of 99.37% for SNVs and 99.20% for Indels across seven benchmark WGS datasets. We achieve highly consistent results across multiple on-premises (SGE & SLURM) high performance clusters. Compared to the Churchill pipeline, with similar parallelization, xGAP is 20% faster when analyzing 50x coverage WGS on AWS. Finally, xGAP is user-friendly and fault tolerant: it automatically re-initiates failed processes to minimize required user intervention.
AVAILABILITY: xGAP is available at https://github.com/Adigorla/xgap.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
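
To make the region-splitting idea in RESULTS concrete, here is a minimal sketch of cutting a genome into fixed-size regions and spreading them across workers with a greedy longest-first heuristic. It is an illustration only, not xGAP's actual implementation: the function names, the toy contig lengths, the region size and the balancing heuristic are all assumptions introduced for this example.

```python
import heapq


def split_into_regions(contig_lengths, region_size):
    """Cut each contig into consecutive half-open regions of at most region_size bases."""
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, region_size):
            regions.append((contig, start, min(start + region_size, length)))
    return regions


def balance_regions(regions, n_workers):
    """Greedy longest-first assignment: each region goes to the least-loaded worker."""
    heap = [(0, worker) for worker in range(n_workers)]  # (bases assigned, worker id)
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(n_workers)}
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(region)
        heapq.heappush(heap, (load + region[2] - region[1], worker))
    return assignment


# Hypothetical toy contigs; real chromosomes are many orders of magnitude longer.
contigs = {"chrA": 1000, "chrB": 450, "chrC": 300}
regions = split_into_regions(contigs, region_size=250)
plan = balance_regions(regions, n_workers=3)

for worker, assigned in plan.items():
    print(worker, sum(end - start for _, start, end in assigned), assigned)
```

Balancing by base count is the simplest proxy for work. A real pipeline would more plausibly weight regions by read depth or expected runtime, since equal-length regions can differ widely in cost.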

Year: 2021 PMID: 33416856 PMCID: PMC8034531 DOI: 10.1093/bioinformatics/btaa1097

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
