
The application of Hadoop in structural bioinformatics.

Jamie J Alnasir, Hugh P Shanahan.

Abstract

This paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework for analysing large fractions of the Protein Data Bank, which is key for high-throughput studies such as protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, together with their scalability. We find that these deployments for the most part call known executables from MapReduce rather than rewriting the algorithms. Reported scalability varies in comparison with batch schedulers and is hard to assess, because direct comparisons of Hadoop with batch schedulers on the same platform are generally not available in the literature. We note, however, that there is some evidence that Message Passing Interface implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of its interface and of configuring a resource to run Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases, and standardised approaches such as workflow languages (i.e. Workflow Definition Language, Common Workflow Language and Nextflow) are taken up.
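The pattern of calling a known executable from MapReduce, rather than rewriting the algorithm, can be illustrated with a minimal sketch of a streaming-style mapper. This is an assumption-laden illustration and not the method of any reviewed study: the function names `run_tool` and `map_records` are hypothetical, the wrapped command is a placeholder, and the only convention relied on is that a Hadoop Streaming mapper reads records line by line and writes tab-separated key/value lines.

```python
import subprocess
import sys


def run_tool(command, record):
    """Run an existing executable on one input record and return its output.

    `command` is a list such as ["some_tool", "--flag"] (hypothetical);
    the record (e.g. a structure identifier) is appended as the last argument.
    """
    result = subprocess.run(
        command + [record],
        capture_output=True,
        text=True,
        check=True,  # raise if the wrapped tool exits with an error
    )
    # Collapse whitespace so the value fits on a single output line.
    return " ".join(result.stdout.split())


def map_records(lines, command):
    """Yield "key<TAB>value" lines, one per non-empty input record."""
    for line in lines:
        record = line.strip()
        if not record:
            continue  # skip blank input lines
        yield f"{record}\t{run_tool(command, record)}"
```

In a streaming job, a script built on this sketch would iterate over `map_records(sys.stdin, command)` and print each line; the algorithm itself stays inside the unmodified external executable, and the framework only distributes the records.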

Year:  2018        PMID: 30462158     DOI: 10.1093/bib/bby106

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  3 in total

1.  Shared data science infrastructure for genomics data.

Authors:  Hamid Bagheri; Usha Muppirala; Rick E Masonbrink; Andrew J Severin; Hridesh Rajan
Journal:  BMC Bioinformatics       Date:  2019-08-22       Impact factor: 3.169

2.  iProX in 2021: connecting proteomics data sharing with big data.

Authors:  Tao Chen; Jie Ma; Yi Liu; Zhiguang Chen; Nong Xiao; Yutong Lu; Yinjin Fu; Chunyuan Yang; Mansheng Li; Songfeng Wu; Xue Wang; Dongsheng Li; Fuchu He; Henning Hermjakob; Yunping Zhu
Journal:  Nucleic Acids Res       Date:  2022-01-07       Impact factor: 16.971

3.  BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors:  Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal:  Front Big Data       Date:  2022-01-18
