Genotyping in the cloud with Crossbow.

James Gurtowski, Michael C Schatz, Ben Langmead.

Abstract

Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
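
The division of labor described in the abstract (align reads in a map step, group alignments by genomic region, call SNPs per region in a reduce step) can be illustrated with a minimal single-process sketch. This is not Crossbow's code and uses neither Hadoop, Bowtie, nor SOAPsnp: `toy_align` is a hypothetical brute-force stand-in for the aligner, `reduce_bin` is a hypothetical majority-vote stand-in for the SNP caller, and `REFERENCE`, `BIN_SIZE`, and the reads are made-up illustration data.

```python
from collections import Counter, defaultdict

# Toy data: all values below are assumptions made for illustration only.
REFERENCE = "ACGTTGCAAGGCTTACCGATAGTCCATGGACT"
BIN_SIZE = 8  # width of one genomic partition, in bases


def toy_align(read, reference, max_mismatches=1):
    """Stand-in for an aligner: return the first offset with the fewest
    mismatches, or None if every offset exceeds max_mismatches."""
    best_offset, best_dist = None, max_mismatches + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        dist = sum(a != b for a, b in zip(read, window))
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset


def map_read(read, reference=REFERENCE):
    """Map step: align one read, emit (partition, (position, base)) pairs."""
    offset = toy_align(read, reference)
    if offset is None:
        return  # unaligned reads emit nothing
    for i, base in enumerate(read):
        pos = offset + i
        yield pos // BIN_SIZE, (pos, base)


def shuffle(pairs):
    """Shuffle step: group values by partition key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bin(values, reference=REFERENCE, min_depth=2):
    """Reduce step: stand-in for a SNP caller. Report positions where the
    most frequent base differs from the reference at sufficient depth."""
    pileup = defaultdict(Counter)
    for pos, base in values:
        pileup[pos][base] += 1
    calls = []
    for pos in sorted(pileup):
        base, _ = pileup[pos].most_common(1)[0]
        depth = sum(pileup[pos].values())
        if depth >= min_depth and base != reference[pos]:
            calls.append((pos, reference[pos], base, depth))
    return calls


def call_snps(reads):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for read in reads for pair in map_read(read))
    groups = shuffle(pairs)
    return [call for key in sorted(groups) for call in reduce_bin(groups[key])]


reads = [
    "CAAGGCGTAC",  # overlaps position 12 with G instead of reference T
    "AGGCGTACCG",  # same variant, different offset
    "GCGTACCGAT",  # same variant, different offset
    "AGTCCATGGA",  # matches the reference exactly
    "TTTTTTTTTT",  # does not align
]
snps = call_snps(reads)
print(snps)  # [(12, 'T', 'G', 3)]
```

Because each partition is reduced independently, the per-region calls could in principle run on separate machines, which is the property the abstract attributes to running on Hadoop.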

Year:  2012        PMID: 22948728      PMCID: PMC3465669          DOI: 10.1002/0471250953.bi1503s39

Source DB:  PubMed          Journal:  Curr Protoc Bioinformatics        ISSN: 1934-3396


  8 in total

1.  SNP detection for massively parallel whole-genome resequencing.

Authors:  Ruiqiang Li; Yingrui Li; Xiaodong Fang; Huanming Yang; Jian Wang; Karsten Kristiansen; Jun Wang
Journal:  Genome Res       Date:  2009-05-06       Impact factor: 9.043

2.  Use of high throughput sequencing to observe genome dynamics at a single cell level.

Authors:  D Parkhomchuk; V Amstislavskiy; A Soldatov; V Ogryzko
Journal:  Proc Natl Acad Sci U S A       Date:  2009-11-23       Impact factor: 11.205

3.  Human genome 10th anniversary. Will computers crash genomics?

Authors:  Elizabeth Pennisi
Journal:  Science       Date:  2011-02-11       Impact factor: 47.728

4.  Cloud computing and the DNA data race.

Authors:  Michael C Schatz; Ben Langmead; Steven L Salzberg
Journal:  Nat Biotechnol       Date:  2010-07       Impact factor: 54.908

5.  Cloud-scale RNA-sequencing differential expression analysis with Myrna.

Authors:  Ben Langmead; Kasper D Hansen; Jeffrey T Leek
Journal:  Genome Biol       Date:  2010-08-11       Impact factor: 13.583

6.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors:  Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-03-04       Impact factor: 13.583

7.  Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels.

Authors:  Ian Sudbery; Jim Stalker; Jared T Simpson; Thomas Keane; Alistair G Rust; Matthew E Hurles; Klaudia Walter; Dee Lynch; Lydia Teboul; Steve D Brown; Heng Li; Zemin Ning; Joseph H Nadeau; Colleen M Croniger; Richard Durbin; David J Adams
Journal:  Genome Biol       Date:  2009-10-13       Impact factor: 13.583

8.  Searching for SNPs with cloud computing.

Authors:  Ben Langmead; Michael C Schatz; Jimmy Lin; Mihai Pop; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-11-20       Impact factor: 13.583

  9 in total

1.  Survey of gene splicing algorithms based on reads.

Authors:  Xiuhua Si; Qian Wang; Lei Zhang; Ruo Wu; Jiquan Ma
Journal:  Bioengineered       Date:  2017-09-21       Impact factor: 3.269

2.  Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends. (Review)

Authors:  Emad A Mohammed; Behrouz H Far; Christopher Naugler
Journal:  BioData Min       Date:  2014-10-29       Impact factor: 2.522

3.  Next generation distributed computing for cancer research. (Review)

Authors:  Pankaj Agarwal; Kouros Owzar
Journal:  Cancer Inform       Date:  2015-04-27

4.  A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

Authors:  Alexey Siretskiy; Tore Sundqvist; Mikhail Voznesenskiy; Ola Spjuth
Journal:  Gigascience       Date:  2015-06-04       Impact factor: 6.524

5.  Closha: bioinformatics workflow system for the analysis of massive sequencing data.

Authors:  GunHwan Ko; Pan-Gyu Kim; Jongcheol Yoon; Gukhee Han; Seong-Jin Park; Wangho Song; Byungwook Lee
Journal:  BMC Bioinformatics       Date:  2018-02-19       Impact factor: 3.169

6.  BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data.

Authors:  Seokjun Soe; Yoonjae Park; Heejoon Chae
Journal:  BMC Bioinformatics       Date:  2018-12-10       Impact factor: 3.169

7.  Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

Authors:  Anjani Ragothaman; Sairam Chowdary Boddu; Nayong Kim; Wei Feinstein; Michal Brylinski; Shantenu Jha; Joohyun Kim
Journal:  Biomed Res Int       Date:  2014-06-09       Impact factor: 3.411

8.  Big Data Application in Biomedical Research and Health Care: A Literature Review. (Review)

Authors:  Jake Luo; Min Wu; Deepika Gopukumar; Yiqing Zhao
Journal:  Biomed Inform Insights       Date:  2016-01-19

9.  SeqVItA: Sequence Variant Identification and Annotation Platform for Next Generation Sequencing Data.

Authors:  Prashanthi Dharanipragada; Sampreeth Reddy Seelam; Nita Parekh
Journal:  Front Genet       Date:  2018-11-14       Impact factor: 4.599
