
Seminar-I Report On

"Hybrid Cloud Approach for Secure Authorized De-duplication"

Submitted by
Ms. Tejaswini Bhamare
Master of Engineering (Computer Engineering)

Under the Guidance of
Prof. B. R. Nandwalkar

DEPARTMENT OF COMPUTER ENGINEERING
Kalyani Charitable Trust's
Late G. N. Sapkal College of Engineering, Anjaneri, Nashik - 422212
Academic Year 2014-2015

Kalyani Charitable Trust's
Late G. N. Sapkal College of Engineering

Certificate

This is to certify that the Seminar-I entitled

"Hybrid Cloud Approach for Secure Authorized Deduplication"

submitted by Ms. Tejaswini Bhamare, M.E. (Computer Engineering), has successfully completed her Seminar-I towards the partial fulfillment of the Master's Degree in Computer Engineering, Savitribai Phule Pune University, during the year 2014-2015.

Prof. B. R. Nandwalkar (Guide)
Prof. N. R. Wankhede (HOD)
Dr. V. J. Gond (Principal)

Contents

1 Introduction
  1.1 Problem Definition
    1.1.1 Justification of Problem
    1.1.2 Need of System
    1.1.3 Applications
    1.1.4 Contribution
2 Literature Survey
  2.1 Study of Existing Systems / Technologies
    2.1.1 DupLESS: Server-Aided Encryption for Deduplicated Storage
    2.1.2 Proofs of Ownership in Remote Storage Systems
    2.1.3 Twin Clouds: An Architecture for Secure Cloud Computing
    2.1.4 Private Data Deduplication Protocols in Cloud Storage
  2.2 Analysis of Existing Systems / Technologies
  2.3 Proposed System
    2.3.1 Encryption of Files
    2.3.2 Confidential Encryption
    2.3.3 Proof of Data
3 Technical Details
  3.1 Concept
    3.1.1 Identification Protocol
    3.1.2 Hash-based Deduplication
    3.1.3 Roles of Entities
    3.1.4 Operations Performed on Hybrid Cloud
  3.2 Design Goals
  3.3 Performance Analysis
4 Conclusion

List of Figures

1.1 Architecture of Cloud Computing
1.2 Architecture of Hybrid Cloud
1.3 Notation used in paper
2.1 Confidential Encryption
2.2 Architecture of Authorized Deduplication
3.1 Implementation of hash algorithm
3.2 Hash Work Flow Chart

Chapter 1
Introduction

1.1 Problem Definition

In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage. The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent [?]. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and, whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.

A hybrid cloud is a combined form of private and public clouds, in which some critical data resides in the enterprise's private cloud while other data is stored in, and accessible from, a public cloud. Hybrid clouds seek to deliver the scalability, reliability, rapid deployment, and potential cost savings of public clouds together with the security and increased control and management of private clouds. As cloud computing becomes popular, an increasing amount of data is being stored in the cloud and shared by users with specified privileges, which define the access rights of the stored data.


Figure 1.1: Architecture of Cloud Computing

The critical challenge of cloud storage and cloud computing is the management of a continuously increasing volume of data. Data deduplication, or single instancing, essentially refers to the elimination of redundant data: duplicate data is deleted, leaving only one copy (single instance) of the data to be stored, while an index of all data is still retained should that data ever be required. In general, data deduplication eliminates duplicate copies of repeating data. Data is encrypted before being outsourced to the cloud or network, and this encryption adds time and space requirements for encoding the data; for large data stores, encryption becomes even more complex and critical. By using data deduplication inside a hybrid cloud, the encryption becomes simpler.

A network consists of an abundant amount of data, shared by the users and nodes in it. Many large-scale networks use a data cloud to store and share their data. A node or user present in the network has full rights to upload or download data over the network, but different users often upload the same data, which creates duplication inside the cloud. When a user wants to retrieve or download such data from the cloud, he must each time work with two encrypted files of the same data, and the cloud performs the same operation on both copies of the data file. This violates the data confidentiality and the security of the cloud, and it places an extra burden on the cloud's operation.

To avoid this duplication of data and to maintain confidentiality in the cloud, we use the concept of a hybrid cloud, a combination of a public and a private cloud. Hybrid cloud storage combines the scalability, reliability, rapid deployment, and potential cost savings of public cloud storage with the security and full control of private cloud storage.

Figure 1.2: Architecture of Hybrid Cloud

1.1.1 Justification of Problem

Data deduplication is the process of eliminating redundant data by comparing new segments with segments already stored and only keeping one copy. The technology can lead to a significant reduction in required storage space, especially in situations where redundancy is high. As a result, data deduplication has firmly established itself in the backup market.

1.1.2 Need of System

Data deduplication is a technique for reducing the amount of storage space an organization needs to save its data. In most organizations, the storage systems contain duplicate copies of many pieces of data. For example, the same file may be saved in several different places by different users, or two or more files that aren’t identical may still include much of the same data. Deduplication eliminates these extra copies by saving just one copy of the data and replacing the other copies with pointers that lead back to the original copy. Companies frequently use deduplication in backup and disaster recovery applications, but it can be used to free up space in primary storage as well.


1.1.3 Applications

Hybrid clouds can be built to suit almost any IT environment or architecture, whether an enterprise-wide IT network or a single department. Publicly stored data, such as statistical analyses published by social media platforms or government entities, can be combined with an organization's internal corporate data to gain the most from a hybrid cloud. However, big data analysis and high-performance computing that span both clouds remain challenging.

1.1.4 Contribution

To solve the problems of deduplication, we consider a hybrid cloud architecture consisting of a public cloud and a private cloud. Different privilege levels are allocated so that the duplicate check can be performed securely in the private cloud. A new deduplication system supporting a differential duplicate check is proposed under this hybrid cloud architecture, where the S-CSP (storage cloud service provider) resides in the public cloud.

Figure 1.3: Notation used in paper


Chapter 2
Literature Survey

2.1 Study of Existing Systems / Technologies

2.1.1 DupLESS: Server-Aided Encryption for Deduplicated Storage

Cloud storage service providers such as Mozy and Dropbox perform deduplication to save space by storing only one copy of each uploaded file. If clients conventionally encrypt their files, however, these savings are lost; message-locked encryption resolves this tension, but it remains exposed to brute-force attacks on predictable files. DupLESS provides secure deduplicated storage that resists brute-force attacks: clients encrypt under message-based keys obtained from a key server via an oblivious PRF protocol. This allows clients to store encrypted data with an existing service, have the service perform deduplication on their behalf, and still achieve strong confidentiality guarantees. The authors show that encryption for deduplicated storage can achieve performance and space savings close to those of using the storage service with plaintext data.


Characteristics:

1. More security.
2. An easily deployed solution for encryption that supports deduplication.
3. User friendly: a command-line client that supports both Dropbox and Google Drive.
4. Resolves the problem of message-locked encryption.
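As a rough sketch of the server-aided key derivation described above (in Python): a real DupLESS deployment uses an oblivious PRF, so the key server never sees the file hash; here the OPRF is replaced by a plain HMAC at the key server purely for illustration, and all names (KeyServer, derive_key) are hypothetical.

    import hashlib
    import hmac

    class KeyServer:
        """Hypothetical key server holding a secret PRF key.

        In real DupLESS the client runs an *oblivious* PRF protocol, so the
        server never learns the file hash; the plain HMAC below is only a
        stand-in for that protocol, not a faithful implementation of it.
        """
        def __init__(self, secret: bytes):
            self._secret = secret

        def derive_key(self, file_hash: bytes) -> bytes:
            # PRF(secret, H(F)): same file -> same key, enabling deduplication,
            # while an attacker without the server secret cannot brute-force keys.
            return hmac.new(self._secret, file_hash, hashlib.sha256).digest()

    def client_key(ks: KeyServer, file_data: bytes) -> bytes:
        return ks.derive_key(hashlib.sha256(file_data).digest())

    ks = KeyServer(secret=b"server-side secret, never leaves the key server")
    k1 = client_key(ks, b"same file contents")
    k2 = client_key(ks, b"same file contents")
    assert k1 == k2  # identical files yield identical keys -> deduplicable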

2.1.2 Proofs of Ownership in Remote Storage Systems

Client-side deduplication stores only a single copy of duplicated data: it identifies deduplication opportunities already at the client and saves the bandwidth of uploading copies of existing files to the server. To counter attacks that exploit this mechanism, Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg proposed proofs of ownership, which let a client efficiently prove to a server that the client actually holds a file, rather than just some short information about it. They present solutions based on Merkle trees and specific encodings, and analyze their security; a simplified Merkle-tree sketch follows the list below.

Characteristics:

1. Identifies the attacks that exploit client-side deduplication.
2. Proofs of ownership provide rigorous security.
3. Meets the rigorous efficiency requirements of petabyte-scale storage systems.
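Below is a minimal sketch (Python, hypothetical names) of the Merkle-tree idea behind proofs of ownership: the server keeps only the root, challenges the client for a random leaf, and verifies the returned authentication path. Real PoW schemes add encodings and stricter security arguments; this only illustrates the tree mechanics.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_tree(chunks):
        """Return the list of tree levels, leaves first, root last."""
        level = [h(c) for c in chunks]
        levels = [level]
        while len(level) > 1:
            if len(level) % 2:                 # duplicate last node on odd levels
                level = level + [level[-1]]
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    def prove(levels, index):
        """Authentication path for leaf `index`: list of (sibling, is_right)."""
        path = []
        for level in levels[:-1]:
            if len(level) % 2:
                level = level + [level[-1]]
            sib = index ^ 1
            path.append((level[sib], sib > index))
            index //= 2
        return path

    def verify(root, leaf_chunk, path):
        node = h(leaf_chunk)
        for sibling, sibling_is_right in path:
            node = h(node + sibling) if sibling_is_right else h(sibling + node)
        return node == root

    chunks = [b"A", b"B", b"C", b"D", b"E"]   # file split into chunks
    levels = build_tree(chunks)
    root = levels[-1][0]                      # server stores only this root
    proof = prove(levels, 2)                  # client proves it holds chunk C
    assert verify(root, b"C", proof)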

2.1.3 Twin Clouds: An Architecture for Secure Cloud Computing

S. Bugiel, S. Nurnberger, A. Sadeghi, and T. Schneider proposed an architecture for secure outsourcing of data and arbitrary computations to an untrusted commodity cloud. In their approach, the user communicates with a trusted cloud, which encrypts and verifies the data stored in, and the operations performed by, the untrusted cloud. The computations are divided such that the trusted cloud handles the security-critical operations during the less time-critical setup phase, whereas queries to the outsourced data are processed in parallel by the fast commodity cloud on encrypted data.

2.1.4 Private Data Deduplication Protocols in Cloud Storage

The most important issue in cloud storage is utilization of the storage capacity. This paper describes two categories of data deduplication strategy and extends the fault-tolerant digital signature scheme proposed by Zhang, examining the redundancy of blocks to achieve data deduplication. The proposed scheme not only reduces the required cloud storage capacity but also improves the speed of data deduplication. Furthermore, a signature is computed for every uploaded file to verify the integrity of the files.

2.2 Analysis of Existing Systems / Technologies

Our system is designed to solve the differential privilege problem in secure deduplication. Security is analyzed in terms of two aspects: the authorization of the duplicate check and the confidentiality of the data. Some basic tools are used to construct the secure deduplication scheme and are assumed to be secure; these include the convergent encryption scheme, the symmetric encryption scheme, and the PoW scheme. Based on this assumption, we show that the systems are secure with respect to the following security analysis.

2.3 Proposed System

In the proposed system, data deduplication is achieved by requiring a proof of data from the data owner. This proof is used at the time of uploading a file. Each file uploaded to the cloud is also bounded by a set of privileges that specify which kind of users are allowed to perform the duplicate check and access the file. Before submitting a duplicate-check request for some file, the user must take this file and his own privileges as inputs. The user is able to find a duplicate for this file if and only if there is a copy of this file and a matched privilege stored in the cloud. A small sketch of this check follows.
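As a rough illustration (Python, hypothetical names), the duplicate check can be viewed as a lookup keyed by both the file tag and a privilege:

    import hashlib

    # (file_tag, privilege) -> stored ciphertext id; a stand-in for the cloud index
    dedup_index: dict[tuple[bytes, str], str] = {}

    def file_tag(data: bytes) -> bytes:
        return hashlib.sha1(data).digest()

    def duplicate_check(data: bytes, privilege: str) -> bool:
        """True iff a copy of this file with a matching privilege is stored."""
        return (file_tag(data), privilege) in dedup_index

    # First upload with privilege "staff": no duplicate yet, so store it.
    tag = file_tag(b"report.pdf contents")
    if not duplicate_check(b"report.pdf contents", "staff"):
        dedup_index[(tag, "staff")] = "ciphertext-001"

    assert duplicate_check(b"report.pdf contents", "staff")      # duplicate found
    assert not duplicate_check(b"report.pdf contents", "admin")  # privilege mismatch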

2.3.1 Encryption of Files

A common secret key k is used both to encrypt and to decrypt data: it converts the plaintext into ciphertext and the ciphertext back into plaintext. Three basic functions are used:

• KeyGenSE(1^λ) → k: the key generation algorithm, which generates the key k using the security parameter 1^λ;
• EncSE(k, M) → C: the symmetric encryption algorithm, which takes the secret key k and the message M and outputs the ciphertext C;
• DecSE(k, C) → M: the symmetric decryption algorithm, which takes the secret key k and the ciphertext C and outputs the original message M.
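A minimal sketch of these three functions in Python, assuming the third-party cryptography package (pip install cryptography) and using AES-256 in CBC mode, which matches the FileEncrypt description in Section 3.3; the function names merely mirror KeyGenSE/EncSE/DecSE and are not the authors' code.

    import os

    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def keygen_se() -> bytes:
        # KeyGenSE: sample a fresh 256-bit secret key.
        return os.urandom(32)

    def enc_se(k: bytes, m: bytes) -> bytes:
        # EncSE(k, M) -> C: AES-256-CBC with PKCS#7 padding; IV prepended to C.
        iv = os.urandom(16)
        padder = padding.PKCS7(128).padder()
        encryptor = Cipher(algorithms.AES(k), modes.CBC(iv)).encryptor()
        return iv + encryptor.update(padder.update(m) + padder.finalize()) + encryptor.finalize()

    def dec_se(k: bytes, c: bytes) -> bytes:
        # DecSE(k, C) -> M: split off the IV, decrypt, strip the padding.
        iv, body = c[:16], c[16:]
        decryptor = Cipher(algorithms.AES(k), modes.CBC(iv)).decryptor()
        unpadder = padding.PKCS7(128).unpadder()
        return unpadder.update(decryptor.update(body) + decryptor.finalize()) + unpadder.finalize()

    k = keygen_se()
    assert dec_se(k, enc_se(k, b"plain text")) == b"plain text"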

2.3.2 Confidential Encryption

Convergent encryption provides data confidentiality in deduplication. A user derives a convergent key from each original data copy and encrypts the data copy with that convergent key. In addition, the user derives a tag for the data copy; this tag is used to detect duplicates. A minimal sketch of the key and tag derivation follows the figure below.

Figure 2.1: Confidential Encryption
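A minimal sketch of the key and tag derivation (Python, hashlib only); encrypting the data itself would reuse a symmetric scheme such as enc_se above. Deriving the tag from the key rather than the plaintext is one common choice, not necessarily the authors' exact construction.

    import hashlib

    def convergent_key(data: bytes) -> bytes:
        # K = H(F): identical files always yield the same key, so identical
        # files encrypt to identical ciphertexts (hence deduplicable).
        return hashlib.sha256(data).digest()

    def tag(data: bytes) -> bytes:
        # T(F) = H(K): lets the server detect duplicates without seeing F or K.
        return hashlib.sha256(convergent_key(data)).digest()

    assert tag(b"same bytes") == tag(b"same bytes")   # duplicate detected
    assert tag(b"same bytes") != tag(b"other bytes")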


2.3.3 Proof of Data

The proof of data complements convergent encryption: at the time of uploading, the data owner proves to the server that he actually holds the file itself, rather than only a short tag or hash of it, along the lines of the proofs of ownership described in Section 2.1.2. This prevents a user who has obtained only the tag of a file from passing the duplicate check and gaining access to the file.

Figure 2.2: Architecture of Authorized Deduplication


Chapter 3
Technical Details

3.1 Concept

3.1.1 Identification Protocol

An identification protocol can be described with two phases: Proof and Verify. In the Proof stage, a prover/user U demonstrates his identity to a verifier by performing an identification proof related to his identity. The input of the prover/user is his private key skU, sensitive information such as the private key corresponding to a public key in his certificate or a credit card number, which he would not like to share with other users. The verifier performs the verification with the public information pkU related to skU as input. At the conclusion of the protocol, the verifier outputs either accept or reject to denote whether the proof has passed. There are many efficient identification protocols in the literature, including certificate-based and identity-based identification.
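As a minimal illustration (Python, assuming the cryptography package), here is a signature-based identification protocol: the verifier sends a random challenge, the prover signs it with skU, and the verifier checks the signature against pkU. This is one possible instantiation of the Proof/Verify pattern, not the specific protocol used in the paper.

    import os

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    sk_u = Ed25519PrivateKey.generate()   # prover's private key skU
    pk_u = sk_u.public_key()              # public information pkU

    # Proof: the verifier issues a fresh challenge; the prover signs it.
    challenge = os.urandom(32)
    proof = sk_u.sign(challenge)

    # Verify: the verifier outputs accept or reject.
    try:
        pk_u.verify(proof, challenge)
        print("accept")
    except InvalidSignature:
        print("reject")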

3.1.2 Hash-based Deduplication

Hash-based data deduplication methods use a hashing algorithm to identify chunks of data. Commonly used algorithms are the Secure Hash Algorithm (SHA-1) and the Message-Digest Algorithm (MD5). When data is processed by a hashing algorithm, a hash is created that represents the data. A hash is a bit string (128 bits for MD5, 160 bits for SHA-1) that represents the processed data; if the same data is processed through the hashing algorithm multiple times, the same hash is created each time. Hash-based deduplication breaks data into chunks, either fixed or variable length, and processes each chunk with the hashing algorithm to create a hash. If the hash already exists, the data is deemed to be a duplicate and is not stored. If the hash does not exist, the data is stored and the hash index is updated with the new hash. In Figure 3.1, data chunks A, B, C, D, and E are processed by the hash algorithm, creating hashes Ah, Bh, Ch, Dh, and Eh; for the purposes of this example, we assume this is all new data. Later, chunks A, B, C, D, and F are processed. F generates a new hash, Fh. Since A, B, C, and D generate the same hashes as before, the data is presumed to be the same and is not stored again; since F generates a new hash, the new hash and the new data are stored. A sketch of this hash-index logic follows the figure below.

Figure 3.1: Implementation of hash algorithm
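A minimal sketch of the hash-index logic just described (Python, hashlib only; the chunking and names are simplified assumptions):

    import hashlib

    hash_index: dict[bytes, bytes] = {}   # hash -> stored chunk (the single copy)
    bytes_saved = 0

    def store_chunk(chunk: bytes) -> None:
        """Store a chunk only if its hash is not already in the index."""
        global bytes_saved
        h = hashlib.sha1(chunk).digest()
        if h in hash_index:
            bytes_saved += len(chunk)     # duplicate: keep only a reference
        else:
            hash_index[h] = chunk         # new data: store it, update the index

    for chunk in [b"A", b"B", b"C", b"D", b"E"]:   # first pass: all new
        store_chunk(chunk)
    for chunk in [b"A", b"B", b"C", b"D", b"F"]:   # second pass: only F is new
        store_chunk(chunk)

    print(len(hash_index), "unique chunks stored,", bytes_saved, "bytes saved")
    # -> 6 unique chunks stored, 4 bytes saved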

The work flow of hash-based deduplication is shown in Figure 3.2.

3.1.3 Roles of Entities

• S-CSP: This entity provides a data storage service in the public cloud. The S-CSP stores data on behalf of the users, eliminates duplicate data through deduplication, and keeps only the unique data; it is used to reduce the storage cost. The S-CSP has abundant storage capacity and computational power. When a user sends the respective token to access his file in the public cloud, the S-CSP matches this token internally; only if it matches does the S-CSP send the file, i.e., the ciphertext CF, otherwise it sends an abort signal to the user. After receiving the file, the user uses the convergent key KF to decrypt it.

Figure 3.2: Hash Work Flow Chart

• Data User: A user is an entity that wants to access data or files from the S-CSP. The user generates the key and stores it in the private cloud. In a storage system supporting deduplication, a user uploads only unique data and does not upload any duplicate data, to save upload bandwidth; the duplicate data may be owned by the same user or by different users. Each file is protected by a convergent encryption key and can be accessed only by an authorized person. In our system, the user must register with the private cloud to store the token for each of his files kept in the public cloud. When he wants to access a file, he fetches the respective token from the private cloud and then accesses the file in the public cloud. The token is derived from the file content F and the convergent key KF.

• Private Cloud: In general, for more security, the user can rely on the private cloud in addition to the public cloud. The user stores the generated key in the private cloud, and at download time the system asks for the key before releasing the file. The user cannot safely store the secret key himself; to protect the key properly, we use the private cloud, which stores only the convergent key for each respective file. When a user wants to access a key, the private cloud first checks the authority of the user and only then provides the key.


• Public Cloud: The public cloud entity is used for storage; the user uploads his files to it, and it behaves like the S-CSP. When a user wants to download a file from the public cloud, he is asked for the key that was generated and stored in the private cloud. Only when the user's key matches the file's key can the user download the file; without the key, the file cannot be accessed, so only authorized users can reach it. In the public cloud, all files are stored in encrypted form: even if an unauthorized person manages to obtain a file, he cannot recover the original without the secret (convergent) key. Many files are stored in the public cloud, and each user can access his respective file only if his token matches the token held by the S-CSP server.

3.1.4 Operations Performed on Hybrid Cloud

• File Uploading: When a user wants to upload a file to the public cloud, he first encrypts the file with the symmetric key and sends the ciphertext to the public cloud. At the same time, he generates the key for that file and sends it to the private cloud. In this way the user uploads a file to the public cloud.

• File Downloading: When a user wants to download a file that he or another user has uploaded to the public cloud, he makes a request to the public cloud, which returns the list of files uploaded by its users. The user selects one file from the list and chooses the download option. The private cloud then asks for the key for that file, and the user enters the key he generated. The private cloud checks the key; only if the key is correct, meaning the user is valid, can the user download the file from the public cloud, otherwise the download is refused. The downloaded file is in encrypted form, and the user decrypts it with the same symmetric key. A sketch of both operations follows.
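A minimal end-to-end sketch of these two operations (Python; the two clouds are modeled as plain dictionaries and all names are hypothetical). Encryption is reduced to a toy XOR keystream here so the sketch stays self-contained; a real system would use AES as in Section 3.3.

    import hashlib
    import os

    public_cloud: dict[str, bytes] = {}    # file_id -> ciphertext (S-CSP role)
    private_cloud: dict[str, bytes] = {}   # file_id -> key (key-storage role)

    def xor_stream(key: bytes, data: bytes) -> bytes:
        # Toy XOR keystream cipher, only to keep the sketch self-contained.
        out = bytearray()
        counter = 0
        while len(out) < len(data):
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, out))

    def upload(file_id: str, data: bytes) -> None:
        key = os.urandom(32)                           # user generates the file key
        public_cloud[file_id] = xor_stream(key, data)  # ciphertext -> public cloud
        private_cloud[file_id] = key                   # key -> private cloud

    def download(file_id: str, supplied_key: bytes) -> bytes:
        if private_cloud.get(file_id) != supplied_key:  # private cloud checks the key
            raise PermissionError("invalid key: download refused")
        return xor_stream(supplied_key, public_cloud[file_id])

    upload("report.txt", b"seminar report contents")
    key = private_cloud["report.txt"]                   # authorized user holds the key
    assert download("report.txt", key) == b"seminar report contents"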


3.2 Design Goals

1. Differential Authorization: Each authorized user is able to obtain his/her individual token for a file to perform the duplicate check based on his/her privileges. Under this assumption, no user can generate a token for a duplicate check outside his/her privileges or without the aid of the private cloud server.

2. Authorized Duplicate Check: An authorized user is able to use his/her individual private keys to generate a query for a certain file and the privileges he/she owns with the help of the private cloud, while the public cloud performs the duplicate check directly and tells the user whether there is a duplicate.

3. Unforgeability of File Token / Duplicate-Check Token: Unauthorized users, without the appropriate privileges or the file, should be prevented from obtaining or generating the file tokens used for the duplicate check of any file stored at the S-CSP. In our scheme, the duplicate-check tokens of users are issued by the private cloud server.

3.3 Performance Analysis

We implement a prototype of the proposed authorized deduplication system, in which we model the three entities as separate C++ programs. A Client program models the data users and carries out the file upload process. A Private Server program models the private cloud, which manages the private keys and handles the file token computation. A Storage Server program models the S-CSP, which stores and deduplicates files. The cryptographic operations of hashing and encryption are implemented with the OpenSSL library, and the communication between the entities is implemented over HTTP using GNU libmicrohttpd and libcurl; thus, users can issue HTTP POST requests to the servers.


Our implementation of the Client provides the following function calls to support token generation and deduplication during the file upload process.

• FileTag(File) - It computes the SHA-1 hash of the file as the file tag;
• TokenReq(Tag, UserID) - It requests the file token generation from the Private Server, with the file tag and user ID;
• DupCheckReq(Token) - It requests a duplicate check of the file from the Storage Server by sending the file token received from the Private Server;
• ShareTokenReq(Tag, Priv.) - It requests the Private Server to generate the share file token, with the file tag and the target sharing privilege set;
• FileEncrypt(File) - It encrypts the file with convergent encryption using the 256-bit AES algorithm in cipher block chaining (CBC) mode, where the convergent key is the SHA-256 hash of the file;
• FileUploadReq(FileID, File, Token) - It uploads the file data to the Storage Server if the file is unique, and updates the stored file token.

Our implementation of the Private Server includes the corresponding request handlers for token generation and maintains the key storage with a hash map. It provides the following call:

• TokenGen(Tag, UserID) - It loads the associated privilege keys of the user and generates the token with the HMAC-SHA-1 algorithm. A sketch of the tag and token computation follows.
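A minimal sketch of FileTag and TokenGen as described above (Python; the privilege-key storage is a hypothetical stand-in for the Private Server's hash map):

    import hashlib
    import hmac

    # Private Server side: hypothetical key storage mapping user -> privilege key.
    priv_keys: dict[str, bytes] = {"alice": b"privilege-key-for-alice"}

    def file_tag(file_data: bytes) -> bytes:
        # FileTag(File): the SHA-1 hash of the file contents serves as the tag.
        return hashlib.sha1(file_data).digest()

    def token_gen(tag: bytes, user_id: str) -> bytes:
        # TokenGen(Tag, UserID): HMAC-SHA-1 over the tag under the user's
        # privilege key, so tokens cannot be forged without the private cloud.
        return hmac.new(priv_keys[user_id], tag, hashlib.sha1).digest()

    def convergent_key(file_data: bytes) -> bytes:
        # FileEncrypt's key: the SHA-256 hash of the file (used with AES-256-CBC).
        return hashlib.sha256(file_data).digest()

    tag = file_tag(b"file contents")
    token = token_gen(tag, "alice")
    print(token.hex())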


ADVANTAGES OF AUTHORIZED DEDUPLICATION SYSTEM

1. The client is permitted to perform the duplicate-copy check for records bound to particular privileges.

2. Stronger security is supported by encrypting the record with distinct privilege keys.

3. The storage space needed for the tags used in the reliability check is reduced, which strengthens the security of deduplication and ensures data privacy.


Chapter 4
Conclusion

The idea of authorized data deduplication was proposed to protect data security by including the differential privileges of clients in the duplicate check. Yan Kit Li et al. also presented several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture, in which the duplicate-check tokens of files are generated by the private cloud server using private keys. Security analysis shows that the schemes are secure in terms of the insider and outsider attacks specified in the proposed security model. As a proof of concept, they implemented a prototype of the proposed authorized duplicate-check scheme and conducted testbed experiments on it. They showed that their authorized duplicate-check scheme incurs minimal overhead compared with convergent encryption and network transfer.

We design and implement a new system that can protect the security of predictable messages. The main idea of our technique is a novel encryption-key generation algorithm. For simplicity, hash functions are used to define the tag-generation functions and convergent keys in this section. In traditional convergent encryption, to support the duplicate check, the key is derived from the file F by using a cryptographic hash function, kF = H(F). To avoid this deterministic key generation, the encryption key kF,p for file F in our system is generated with the aid of the private cloud server holding the privilege key kp. The encryption key can be viewed as kF,p = H0(H(F), kp) ⊕ H2(F), where H0, H, and H2 are all cryptographic hash functions. The file F is encrypted with another key k, while k is encrypted with kF,p. In this way, neither the private cloud server nor the S-CSP can decrypt the ciphertext.
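A minimal sketch of this key generation (Python, hashlib only): the XOR between the two hash outputs is a reconstruction of the formula above, and the function names and domain-separation prefixes are assumptions for illustration.

    import hashlib

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def H0(x: bytes, kp: bytes) -> bytes:
        # Keyed hash of the file hash under the privilege key, computed with
        # the aid of the private cloud server (which holds kp).
        return hashlib.sha256(b"H0" + kp + x).digest()

    def H2(data: bytes) -> bytes:
        return hashlib.sha256(b"H2" + data).digest()

    def key_gen(file_data: bytes, kp: bytes) -> bytes:
        # kF,p = H0(H(F), kp) XOR H2(F): no longer a deterministic function
        # of the file alone, yet still consistent per file and privilege.
        a = H0(H(file_data), kp)
        b = H2(file_data)
        return bytes(x ^ y for x, y in zip(a, b))

    kp = b"privilege key held by the private cloud"
    k1 = key_gen(b"file F", kp)
    k2 = key_gen(b"file F", kp)
    assert k1 == k2   # same file and privilege -> same key (deduplicable)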


Bibliography

