Xiao Dong, Neil Bahroos, Eugene Sadhu, Tommie Jackson, Morris Chukhman, Robert Johnson, Andrew Boyd, Denise Hynes.
Abstract
In this manuscript, we present our experiences using the Apache Hadoop framework for high-data-volume and computationally intensive applications, and discuss some best-practice guidelines in a clinical informatics setting. Our approach has three main aspects: (a) process and integrate diverse, heterogeneous data sources using standard Hadoop programming tools and customized MapReduce programs; (b) after fine-grained aggregate results are obtained, perform data analysis using the Mahout data mining library; (c) leverage the column-oriented features of HBase for patient-centric modeling and complex temporal reasoning. This framework provides a scalable solution to meet the rapidly growing and pressing "Big Data" needs of clinical and translational research. The intrinsic fault tolerance, high availability, and scalability of the Hadoop platform make these applications readily deployable in enterprise-level cluster environments.
Year: 2013 PMID: 24303235
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
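The first aspect of the approach, integrating heterogeneous sources with customized MapReduce programs into patient-centric records, can be illustrated with a small in-memory simulation of the map, shuffle, and reduce phases. This is a sketch under stated assumptions, not the authors' code: it runs in plain Python without Hadoop, the two input layouts and all record values are hypothetical toy data, and `followed_within` is a hypothetical stand-in for the kind of temporal reasoning the abstract mentions.

```python
from collections import defaultdict
from datetime import date
from itertools import chain

# Hypothetical toy inputs in two different layouts, standing in for
# heterogeneous clinical sources (illustrative only, not data from the paper).
LAB_LINES = [
    "P1|2012-03-05|HbA1c|8.1",
    "P2|2012-01-10|HbA1c|5.4",
]
DX_LINES = [
    "2012-02-20,P1,E11.9",
    "2012-04-01,P2,I10",
]


def map_lab(line):
    """Map step for the pipe-delimited lab source: emit (patient_id, event)."""
    pid, day, test, value = line.split("|")
    yield pid, (date.fromisoformat(day), "lab", f"{test}={value}")


def map_dx(line):
    """Map step for the comma-delimited diagnosis source: emit (patient_id, event)."""
    day, pid, code = line.split(",")
    yield pid, (date.fromisoformat(day), "dx", code)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_timeline(pid, events):
    """Reduce step: merge all events for one patient into a time-ordered record."""
    return pid, sorted(events)


def run_job():
    """Run both mappers, shuffle by patient ID, and reduce to one timeline per patient."""
    mapped = chain(
        chain.from_iterable(map_lab(line) for line in LAB_LINES),
        chain.from_iterable(map_dx(line) for line in DX_LINES),
    )
    return dict(reduce_timeline(pid, ev) for pid, ev in shuffle(mapped).items())


def followed_within(timeline, first_kind, then_kind, days):
    """Hypothetical temporal query: is some `first_kind` event followed by a
    `then_kind` event no more than `days` days later?"""
    for d1, k1, _ in timeline:
        if k1 != first_kind:
            continue
        for d2, k2, _ in timeline:
            if k2 == then_kind and 0 <= (d2 - d1).days <= days:
                return True
    return False


if __name__ == "__main__":
    timelines = run_job()
    for pid, timeline in sorted(timelines.items()):
        print(pid, followed_within(timeline, "dx", "lab", 30))
```

On a real cluster the map and reduce functions would run as distributed Hadoop tasks (for example through Hadoop Streaming) rather than in a single process. Storing each reduced timeline as one HBase row keyed by patient ID, with individual events as columns, is one plausible way to exploit HBase's column-oriented layout for patient-centric queries; that mapping is an assumption here, not a detail given in the abstract.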