Federation in genomics pipelines: techniques and challenges.

Somali Chaterji, Jinkyu Koo, Ninghui Li, Folker Meyer, Ananth Grama, Saurabh Bagchi.

Abstract

Federation is a popular concept in building distributed cyberinfrastructures: multiple organizations provide computational resources through a unified portal, which reduces the complexity of moving data back and forth among those organizations. In bioinformatics, federation has been used only to a limited extent, namely for datastores, e.g. the SBGrid Consortium for structural biology and the Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, which face fast-increasing data volumes and processing requirements. A prime example, and the one we discuss here, is genomics and metagenomics, where it is critical that data be processed without having to be transported across large network distances. We illustrate our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. MG-RAST is currently hosted entirely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Because MG-RAST is a widely used resource, we have to move toward federation without disrupting its 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in doing so. We hope that, by showing the steps involved, our manuscript will spur greater federation of bioinformatics infrastructures and thus allow them to scale to support larger user bases.

Year:  2019        PMID: 28968781      PMCID: PMC6357554          DOI: 10.1093/bib/bbx102

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


