A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset.

Sarwar Kamal¹, Shamim Hasnat Ripon¹, Nilanjan Dey², Amira S Ashour³, V Santhi⁴.

Abstract

BACKGROUND: In the age of the information superhighway, big data play a significant role in information processing, extraction, retrieval, and management. In computational biology, the continuing challenge is to manage biological data. Data mining techniques sometimes cannot meet the new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools for handling big data sets are not sufficient. As a result, a scalable mining technique that exploits the large storage and processing capability of distributed or parallel processing platforms is essential.
METHOD: In this analysis, a contemporary distributed clustering methodology for imbalanced data reduction using a k-nearest neighbor (K-NN) classification approach is introduced. The main objective of this work is to represent real training data sets with a reduced number of elements or instances. These reduced data sets ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches.
RESULTS: To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset consisting of 90 million pairs was used. The proposed model reduces the imbalanced data sets derived from large-scale data sets without loss of accuracy.
CONCLUSIONS: The results show that the MapReduce-based K-NN classifier provided accurate results for big DNA data.
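
The abstract does not spell out the reduction algorithm, so the following is only a minimal sketch of the general idea: a map step shrinks each partition of the training set with a nearest-neighbor rule, and a reduce step merges what each partition kept. Hart's condensed nearest neighbor rule is swapped in here as the per-partition reducer; it is an assumption, not the authors' method. The function names, the Hamming distance on equal-length sequences, the toy data, and the use of Python's built-in `map` in place of a real MapReduce cluster are all hypothetical choices made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, store, k=1):
    """Majority label among the k stored (sequence, label) pairs nearest to query."""
    nearest = sorted(store, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def condense(partition, k=1):
    """Map step (assumed): keep only the instances that the instances
    kept so far fail to classify correctly (condensed nearest neighbor)."""
    if not partition:
        return []
    store = [partition[0]]
    changed = True
    while changed:  # repeat passes until no instance has to be added
        changed = False
        for item in partition:
            if item in store:
                continue
            if knn_predict(item[0], store, k) != item[1]:
                store.append(item)
                changed = True
    return store


def split(data, n_parts):
    """Round-robin split of the data into n_parts partitions."""
    return [data[i::n_parts] for i in range(n_parts)]


def map_reduce_condense(data, n_parts=2, k=1):
    """Condense every partition independently, then merge the survivors."""
    mapped = map(lambda part: condense(part, k), split(data, n_parts))
    reduced = []
    for kept in mapped:  # reduce step: simple concatenation
        reduced.extend(kept)
    return reduced


# Toy imbalanced set: six "neg" sequences, two "pos" sequences.
data = [
    ("AAAA", "neg"), ("AAAT", "neg"), ("AATA", "neg"), ("ATAA", "neg"),
    ("TAAA", "neg"), ("AATT", "neg"), ("GGGG", "pos"), ("GGGC", "pos"),
]
reduced = map_reduce_condense(data, n_parts=2)
print(len(data), "->", len(reduced))  # 8 -> 4
```

On this toy set the reduced data keep both minority instances and still classify every original sequence correctly with 1-NN. On a real cluster the partitions would be processed by separate workers, and the merged output may need a further condensing pass.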

Keywords: Big data; Computational biology; DNA (deoxyribonucleic acid); Imbalance data; K-nearest neighbor; MapReduce

Substances: DNA

Year: 2016; PMID: 27265059; DOI: 10.1016/j.cmpb.2016.04.005

Source DB: PubMed; Journal: Comput Methods Programs Biomed; ISSN: 0169-2607; Impact factor: 5.428

Cited

3 in total

1. A Survey of Data Mining and Deep Learning in Bioinformatics. (Review)

Authors: Kun Lan; Dan-Tong Wang; Simon Fong; Lian-Sheng Liu; Kelvin K L Wong; Nilanjan Dey
Journal: J Med Syst Date: 2018-06-28 Impact factor: 4.460

2. Top-k dominating queries on incomplete large dataset.

Authors: Jimmy Ming-Tai Wu; Min Wei; Mu-En Wu; Shahab Tayeb
Journal: J Supercomput Date: 2021-08-17 Impact factor: 2.474

3. In-silico designing of epitope-based vaccine against the seven banded grouper nervous necrosis virus affecting fish species.

Authors: Amit Joshi; Dinesh Chandra Pathak; M Amin-Ul Mannan; Vikas Kaushik
Journal: Netw Model Anal Health Inform Bioinform Date: 2021-05-31
