Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network.

Santos Kumar Baliarsingh, Swati Vipsita, Amir H Gandomi, Abhijeet Panda, Sambit Bakshi, Somula Ramasubbareddy.

Abstract

BACKGROUND: The size of genomic datasets has grown rapidly over the last decade, and conventional data analysis techniques cannot process such large volumes of data. New parallel methods are therefore needed to process high-dimensional datasets efficiently.
METHODS: In this work, a novel distributed method based on the MapReduce (MR) programming model is presented. The proposed algorithm consists of an MR-based Fisher score (mrFScore), an MR-based ReliefF (mrReliefF), and an MR-based probabilistic neural network (mrPNN) tuned with a weighted chaotic grey wolf optimization technique (WCGWO). The mrFScore and mrReliefF methods perform feature selection (FS), and mrPNN serves as the classifier for microarray data. The prediction ability of a PNN depends strongly on the choice of its smoothing parameter (σ), so WCGWO is used to select the optimal value of σ.
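The following Python sketch illustrates the last two points and is not the authors' code. It implements a basic PNN decision rule, where each class score is the average Gaussian kernel exp(-||x - x_i||^2 / (2σ^2)) over that class's training patterns, and it searches for σ with a plain grey wolf optimizer. The sketch uses standard GWO in place of WCGWO because the weighting scheme and chaotic map are not described here. Several elements are hypothetical:

- the leave-one-out error used as the fitness function,
- the search bounds, pack size, and iteration count,
- the helper names (pnn_predict, loo_error, gwo_sigma),
- the toy data.

The MapReduce distribution is also omitted.

```python
import math
import random

# Sketch only: a Gaussian-kernel PNN plus a PLAIN grey wolf optimizer (GWO).
# This is not the paper's WCGWO; its weighting and chaotic map are not modelled.

def pnn_predict(train, x, sigma):
    """Return the class whose average Gaussian kernel response to x is largest."""
    totals, counts = {}, {}
    for xi, yi in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        totals[yi] = totals.get(yi, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(totals, key=lambda c: totals[c] / counts[c])


def loo_error(train, sigma):
    """Leave-one-out error rate of the PNN (assumed fitness function)."""
    wrong = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        if pnn_predict(rest, x, sigma) != y:
            wrong += 1
    return wrong / len(train)


def gwo_sigma(train, lo=0.01, hi=5.0, wolves=8, iters=30, seed=0):
    """Search sigma in [lo, hi] with a basic 1-D grey wolf optimizer."""
    rng = random.Random(seed)
    pack = [rng.uniform(lo, hi) for _ in range(wolves)]
    best = None  # (error, sigma) of the best wolf evaluated so far
    for t in range(iters):
        scored = sorted((loo_error(train, s), s) for s in pack)
        if best is None or scored[0] < best:
            best = scored[0]
        leaders = [s for _, s in scored[:3]]  # alpha, beta, delta
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        new_pack = []
        for x in pack:
            moves = []
            for leader in leaders:
                A = 2.0 * a * rng.random() - a
                C = 2.0 * rng.random()
                moves.append(leader - A * abs(C * leader - x))
            # Average the three leader-guided moves and clip to the bounds.
            new_pack.append(min(hi, max(lo, sum(moves) / 3.0)))
        pack = new_pack
    return best[1]


# Toy two-class data (hypothetical, not from the paper's microarray datasets).
train = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0),
         ((2.0, 2.0), 1), ((2.1, 1.8), 1), ((1.9, 2.2), 1), ((2.2, 2.1), 1)]
sigma = gwo_sigma(train)
print(sigma, pnn_predict(train, (0.1, 0.1), sigma), pnn_predict(train, (2.0, 2.1), sigma))
```

In this sketch, each leave-one-out fitness evaluation costs on the order of n^2 · d kernel terms for n samples and d features. Evaluating it for every wolf in every iteration therefore grows quickly with dataset size. This is one reason the features would first be reduced by the FS step.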
RESULTS: These algorithms have been successfully implemented using the Hadoop framework. The proposed model is tested on three large microarray datasets and one small one, and its performance is compared with that of existing FS and classification techniques. The results suggest that WCGWO-mrPNN can outperform other methods for high-dimensional microarray classification.
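The Hadoop code itself is not given in this abstract. The Python sketch below suggests why a filter score such as the Fisher score fits MapReduce. It simulates the two phases with plain functions and does not use any Hadoop API:

- Each mapper emits, for its data split, a count, sum, and sum of squares per (feature, class) pair.
- The reducer merges these and computes F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj.

Here n_k is the size of class k, mu_kj and var_kj are the mean and variance of feature j in class k, and mu_j is the overall mean of feature j. This common Fisher score definition, the function names map_split and reduce_fisher, and the toy splits are assumptions, not the authors' mrFScore.

```python
from collections import defaultdict

# Hadoop MapReduce is simulated with plain functions; no Hadoop API is used.

def map_split(rows):
    """Map phase: per-(feature, class) count, sum and sum of squares for one split."""
    partial = defaultdict(lambda: [0, 0.0, 0.0])
    for features, label in rows:
        for j, v in enumerate(features):
            acc = partial[(j, label)]
            acc[0] += 1
            acc[1] += v
            acc[2] += v * v
    return list(partial.items())


def reduce_fisher(mapped):
    """Reduce phase: merge partial sums, then compute one Fisher score per feature."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for pairs in mapped:
        for key, (n, s, ss) in pairs:
            acc = totals[key]
            acc[0] += n
            acc[1] += s
            acc[2] += ss
    per_feature = defaultdict(list)
    for (j, _label), stats in totals.items():
        per_feature[j].append(stats)
    scores = {}
    for j, classes in per_feature.items():
        n_all = sum(n for n, _, _ in classes)
        mean_all = sum(s for _, s, _ in classes) / n_all
        between = within = 0.0
        for n, s, ss in classes:
            mean_k = s / n
            var_k = max(0.0, ss / n - mean_k ** 2)  # clamp tiny negative rounding error
            between += n * (mean_k - mean_all) ** 2
            within += n * var_k
        if within > 0:
            scores[j] = between / within
        else:
            scores[j] = float("inf") if between > 0 else 0.0
    return scores


# Two hypothetical data splits, as two mappers would see them.
splits = [
    [([1.0, 5.0], 0), ([2.0, 6.0], 0)],
    [([8.0, 5.5], 1), ([9.0, 5.0], 1)],
]
scores = reduce_fisher(map_split(s) for s in splits)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)  # {0: 49.0, 1: 0.1} [0, 1]
```

The partial sums are additive, so they can be merged in any order. This is what allows map outputs from different nodes to be combined. Features would then be ranked by score, and the top-ranked ones passed on to the classifier.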
CONCLUSION: The effectiveness of the proposed methods is compared with that of other existing schemes. Experimental results show that the proposed scheme is accurate and robust, so it can be considered a reliable framework for microarray data analysis.
SIGNIFICANCE: Such a method promotes the use of parallel programming on a Hadoop cluster for analyzing large-scale genomic data, particularly when the dataset is high-dimensional.
Copyright © 2020 Elsevier B.V. All rights reserved.

Keywords:  Fisher score; Grey wolf optimization; Hadoop; MapReduce; Probabilistic neural network; ReliefF

Year:  2020        PMID: 32650089     DOI: 10.1016/j.cmpb.2020.105625

Source DB:  PubMed          Journal:  Comput Methods Programs Biomed        ISSN: 0169-2607            Impact factor:   5.428


  4 in total

1.  Entropy analysis and grey cluster analysis of multiple indexes of 5 kinds of genuine medicinal materials.

Authors:  Libing Zhou; Caiyun Jiang; Qingxia Lin
Journal:  Sci Rep       Date:  2022-04-22       Impact factor: 4.996

2.  A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis.

Authors:  Tarneem Elemam; Mohamed Elshrkawey
Journal:  ScientificWorldJournal       Date:  2022-08-09

3.  EGFAFS: A Novel Feature Selection Algorithm Based on Explosion Gravitation Field Algorithm.

Authors:  Lan Huang; Xuemei Hu; Yan Wang; Yuan Fu
Journal:  Entropy (Basel)       Date:  2022-06-25       Impact factor: 2.738

4.  Demand forecasting model for time-series pharmaceutical data using shallow and deep neural network model.

Authors:  R Rathipriya; Abdul Aziz Abdul Rahman; S Dhamodharavadhani; Abdelrhman Meero; G Yoganandan
Journal:  Neural Comput Appl       Date:  2022-10-06       Impact factor: 5.102
