SecDedoop: Secure Deduplication with Access Control of Big Data in the HDFS/Hadoop Environment.

P Ramya, C Sundar.

Abstract

With the rapid growth of storage providers, data deduplication has become an essential storage optimization technique that greatly reduces storage costs by keeping only a single copy of duplicate data. However, deduplication introduces new challenges, such as security and insufficient storage space. Hence, in this article, we propose SecDedoop, a secure data deduplication scheme with access control for big data in the HDFS (Hadoop Distributed File System)/Hadoop environment. First, the system ensures data confidentiality with respect to the third-party vendor by using elliptic curve cryptography (ECC). Two types of keys (a public key and a private key) are generated for data retrieval. Second, we address data deduplication. The user's original file is divided into a number of equal-sized chunks. Each chunk (e.g., 1.txt) is then tokenized into words, and the weight of each word is computed using TF-IDF (term frequency-inverse document frequency). A SHA-3 hash is computed over the user's original file; if the hash value is not a duplicate, the data are stored in HDFS. For selecting the best data node, we propose a PSO (particle swarm optimization)-based MapReduce model: the MapReduce process is first run on the user's original file and yields the best set of data nodes, and PSO is then applied to compute the fitness value used to select the best data node. Further, we use MongoDB for fast indexing of the user's original files and apply FCM (fuzzy C-means) clustering to group the user's files. We use modified versions of PSO and FCM to address open issues in conventional PSO and FCM. The performance of SecDedoop was evaluated using various performance metrics, and the results show that it outperforms previous approaches.
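The chunking, TF-IDF weighting and hash-based duplicate check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypothetical `DedupStore` class is an in-memory dictionary standing in for HDFS, SHA3-256 from Python's `hashlib` is assumed as the SHA-3 variant, the chunk count is arbitrary, and the TF-IDF formula (term frequency times log(N/df), with each chunk treated as one document) is one common variant, since the abstract does not say which one is used. ECC encryption, PSO-based node selection, MongoDB indexing and FCM clustering are omitted.

```python
import hashlib
import math
import re
from collections import Counter


def split_into_chunks(text: str, num_chunks: int) -> list[str]:
    """Split text into at most num_chunks pieces of (nearly) equal length."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    size = max(1, math.ceil(len(text) / num_chunks))
    return [text[i:i + size] for i in range(0, len(text), size)]


def tfidf_weights(chunks: list[str]) -> list[dict[str, float]]:
    """Per-chunk TF-IDF word weights, treating each chunk as one document.

    Assumed variant (not specified in the abstract):
    tf = count / words in chunk, idf = log(N / df).
    """
    tokenized = [re.findall(r"[a-z0-9]+", chunk.lower()) for chunk in chunks]
    n_docs = len(tokenized)
    # Document frequency: number of chunks that contain each word.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    weights = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words) or 1
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


class DedupStore:
    """Hypothetical in-memory stand-in for HDFS that keeps one copy per digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> tuple[str, bool]:
        """Store data unless an identical copy exists; return (digest, stored)."""
        digest = hashlib.sha3_256(data).hexdigest()  # SHA3-256 is an assumption
        if digest in self._blobs:
            return digest, False  # duplicate: nothing new is written
        self._blobs[digest] = data  # unique: keep a single copy
        return digest, True

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = DedupStore()
    original = b"big data deduplication keeps one copy of duplicate data"
    print(store.put(original)[1])  # True: first upload is stored
    print(store.put(original)[1])  # False: duplicate is detected
    print(len(store))              # 1
    for weights in tfidf_weights(split_into_chunks(original.decode(), 3)):
        print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

Uploading the same bytes twice yields the same digest, and the second call reports that nothing new was stored. Because the hash is taken over the whole file, as the abstract describes, this check detects only exact file-level duplicates; chunk-level deduplication would hash each chunk instead.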

Keywords:  Hadoop Distributed File System; SHA-3; access control; data deduplication; elliptic curve cryptography; particle swarm optimization-based MapReduce

Year:  2020        PMID: 32319799     DOI: 10.1089/big.2019.0120

Source DB:  PubMed          Journal:  Big Data        ISSN: 2167-6461            Impact factor:   2.128


  1 in total

1.  Visual Analysis of Sports Actions Based on Machine Learning and Distributed Expectation Maximization Algorithm.

Authors:  Yan Luo
Journal:  Comput Intell Neurosci       Date:  2022-06-25