M.Sc. in Cloud Computing Dissertation
Short Description
An adaptive algorithm for dynamic resource allocation in large heterogeneous Cloud environments. M.Sc. in Cloud Computin...
Description
An adaptive algorithm for dynamic resource allocation in large heterogeneous Cloud environments Ryan Lacerna
Submitted as part of the requirements for the degree of MSc in Cloud Computing at the School of Computing, National College of Ireland Dublin, Ireland.
September 2015
Supervisor Dr. Adriana Chis
Abstract Today, nearly everybody is connected to the Internet and consumes cloud services whether to store, process and deliver data. Cloud consists of large networks of virtualized solutions via data centres. In this new era where cloud is at the forefront, a multitude of domains such as healthcare, education, finance, science etc. have established the need for new content-driven applications. These content-driven applications require massive data gathering, generation, processing and then have them all in a large heterogeneous system that consist of a variety of private/public cloud systems that are geographically dispersed. In this context, resource provisioning and allocation becomes a big challenge in modern distributed systems due to the unpredictable fluctuation of service requests and heterogeneity of system types within the cloud environment. In consideration, an intelligent load balancer becomes an indispensable part of cloud computing. In this research paper we propose a novel algorithm to tackle such heterogeneity. This new algorithm takes advantage of the social communication and selforganisation of the intelligent foraging behaviour of Honeybees. Creating a distributed, self-organising, multi-agent system that takes advantage of the Self-aggregation technique. Self-Aggregation attempts to group services together to structure the clouds heterogeneity. In this dissertation, we empirically evaluate this new algorithms UserBase response time and data center processing performance via CloudSim against various state-of-the-industry algorithms that are currently being used in large and heterogeneous distributed systems.
ii
Acknowledgements I would like to give my special thanks to Dr. Adriana Chis of the School of Computing, National College of Ireland for her invaluable support, technical advice and staying up 4am in the morning to proof-read and provide me feedback before submission. I have never met a lecturer with such work ethic and care for students to do well. I would also like to give my sincere thanks to Clara Aggasid for proof reading this dissertation. Her kindness and help in analysing the research paper has given me confidence that the grammar and spelling are up to standard. I could not have completed this research paper without the unwaivering love and support from my family and friends who supported and encouraged me throughout the process. The literature review and quality of critial thinking would not have been up to standard without the careful analysis and feedback from Keith Brittle. Lastly, I want to thank my team and colleagues at H&R Block GTC for their kindness and encouragement at work during the process of my dissertation.
iii
Submission of Thesis and Dissertation National College of Ireland Research Students Declaration Form (Thesis/Author Declaration Form) Name: __________________________________________________________ Student Number: _________________________________________________ Degree for which thesis is submitted: ________________________________ Material submitted for award (a) I declare that the work has been composed by myself. (b) I declare that all verbatim extracts contained in the thesis have been distinguished by quotation marks and the sources of information specifically acknowledged. (c) My thesis will be included in electronic format in the College Institutional Repository TRAP (thesis reports and projects) (d) Either *I declare that no material contained in the thesis has been used in any other submission for an academic award. Or *I declare that the following material contained in the thesis formed part of a submission for the award of ________________________________________________________________ (State the award and the awarding body and list the material below) Signature of research student: _____________________________________
Date: _____________________
Submission of Thesis to Norma Smurfit Library, National College of Ireland
Student name: ______________________________ Student number: __________________
School: ___________________________________ Course: __________________________
Degree to be awarded: _______________________________________________________________
Title of Thesis: ______________________________________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________
One hard bound copy of your thesis will be lodged in the Norma Smurfit Library and will be available for consultation. The electronic copy will be accessible in TRAP (http://trap.ncirl.ie/), the National College of Ireland’s Institutional Repository. In accordance with normal academic library practice all theses lodged in the National College of Ireland Institutional Repository (TRAP) are made available on open access. I agree to a hard bound copy of my thesis being available for consultation in the library. I also agree to an electronic copy of my thesis being made publicly available on the National College of Ireland’s Institutional Repository TRAP. Signature of Candidate: ____________________________________________________________
For completion by the School: The aforementioned thesis was received by__________________________ Date:_______________
This signed form must be appended to all hard bound and electronic copies of your thesis submitted to your school
Contents Abstract
ii
Acknowledgements
iii
1 Introduction 1.1
1
Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 Background
3 4
2.1
Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
2.2
Honey Bee Foraging Behaviour algorithm . . . . . . . . . . . . . . . . .
6
2.3
Self-Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
3 Design
11
3.1
CloudSim: Simulation Framework
. . . . . . . . . . . . . . . . . . . . .
11
3.2
Design Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
3.3
Self-aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
3.4
Honey Bee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
3.4.1
15
Load balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 Implementation
17
4.1
CloudSim components and extension . . . . . . . . . . . . . . . . . . . .
17
4.2
Self-Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
19
4.3
Honey Bee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
4.3.1
VM Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
4.3.2
Task transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
5 Evaluation
24
5.1
Setup and analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
5.2
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
6 Conclusion
30
vi
List of Figures 3.1
Layered CloudSim Architecture [Calheiros et al., 2011] . . . . . . . . . .
12
3.2
Flow Diagram of Self-Aggregation . . . . . . . . . . . . . . . . . . . . .
14
3.3
Illustrates the flow diagram of the honey bee algorithm for VM load balancing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
4.1
Flow Diagram of the system architecture . . . . . . . . . . . . . . . . . .
20
5.1
UserBase data based on social network behaviour [Wickremasinghe et al., 2010] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
5.2
Graph illustrating DC Processing time . . . . . . . . . . . . . . . . . . .
27
5.3
Overall response times . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
5.4
Graph illustrating the number of task allocated to a VM for each algorithm. The x-Axis represent virtual machines while the y-Axis represent the number of allocated task. . . . . . . . . . . . . . . . . . . . . . . . .
vii
29
List of Tables 5.1
Application Deployment Configurations . . . . . . . . . . . . . . . . . .
25
5.2
Physical Hardware details of a Data Center . . . . . . . . . . . . . . . .
25
5.3
Honey Bee Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
5.4
Active Load Balancer Results . . . . . . . . . . . . . . . . . . . . . . . .
26
5.5
Round Robin Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
5.6
Throlled Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
viii
Listings 4.1
Adding our policy to the GUI . . . . . . . . . . . . . . . . . . . . . . . .
18
4.2
Integrating the algorithm to the simulation . . . . . . . . . . . . . . . .
18
4.3
Get the next available VM . . . . . . . . . . . . . . . . . . . . . . . . . .
18
4.4
Extending the ServiceProximityServiceBroker . . . . . . . . . . . . . . .
19
4.5
List of Data centers and List of Best Response time recorded . . . . . .
20
4.6
Get destination for the user request . . . . . . . . . . . . . . . . . . . . .
20
4.7
Querying for the next destination the user request . . . . . . . . . . . .
21
4.8
Updating response time record of data centers . . . . . . . . . . . . . . .
21
4.9
Grouping UVM and OVM . . . . . . . . . . . . . . . . . . . . . . . . . .
22
4.10 Allocating tasks to a VM . . . . . . . . . . . . . . . . . . . . . . . . . .
23
ix
Chapter 1
Introduction Cloud computing is a completely internet-based computing model where all the services delivered are hosted within the cloud. Cloud is a collection of thousands of computers interlinked together in a complex manner L.D. and Krishna [2013]. This computing model integrates the summation of parallel and distributed computing to deliver on-demand access to shared resources. These shared resources consist of software, hardware or information of devices and computers that use the resource. This model is one of the emerging topics in the IT industry and provides computing as a utility service where consumers can pay as you use. The shared use of these cloud resources by the consumers does lead to a range of issues in the system. Challenges such as; scalability, fault tolerance, system reliability, high availability and energy efficiency. These challenges occur when multiple concurrent requests to a single server lead to the server malfunctioning due to overload, while other servers are underutilised (idle) Yao and Ju-Hou [2012]. This type of failure is led by an imbalance of load in the system. Large Data centres are where cloud hosts its services. Large-scale data centres are mainly heterogeneous systems, which means that cloud users are geographically dispersed and use a diverse range of services. This is a big challenge for data centres to deliver and handle these services efficiently where millions of requests can fluctuate frequently. This issue can be tackled by the implementation of a load balancer. A load balancers responsibility is to balance the load effectively across the machines. A survey by Valentini et al. [2013] on energy efficient techniques describes that the main aim for a load balancer is to achieve optimal resource utilization, avoid overloading the system and minimize the response time. However, effective resource allocation in such a large, dynamic and diverse system such as the cloud is a big challenge. An evaluation of
1
efficient resource allocation techniques by Hameed et al. [2014] highlight that various applications may not be related to each other via workload. Therefore, one of the main research challenges is to discover which applications could be effectively consolidated into one server. The Honey Bee behavioural algorithm is categorised as a meta-heuristic type of artificial intelligence and has been proven to be a very effective algorithm for load balancing. Depictions of meta-heuristic algorithms by A Sesum-Cavic and Kuhn [2011] demonstrate that it’s exploration techniques explore unchartered areas in the search space, while its knowledge accumulation is used and exploited. This is a good balance between contradictory requirements, which eventually leads to finding the optimum solution (global maxima). An overview of the Honey bee algorithm and its applications by Abu-Mouti and El-Hawary [2012] shows that Honey Bee algorithm have been used in various applications such as comparative analysis, electric power systems, parallel and grid computing, data clustering and image analysis, computer science applications and many more. The overview of this aforementioned research on the Honey Bee algorithm highlights that has matched or outperformed some of the other meta-heuristic algorithms. In resource allocation, Honeybee takes into account task prioritizations that have been migrated from an over-utilized virtual machine (VM). It also enhance overall performance of processing and load balancing that aims to reduce the amount of time a task needs to wait on queue of a VM. Thus, decreased latency. However, it has been proven that this improvement in performance works well as the variety of services increased, but its performance seize to increase as the size of the system grows. It was discovered that the implementation of the Honey Bee on the application layer had led to a certain change in topology at the resource layer. Resulting in the minority of services having disproportionate amount of connectivity, while the majority of services only had a small number of links. Self-Aggregation (Active clustering) technique groups vital, similar or vital services to deal with load balancing. In the normal environment of Self-aggregation as described by Di Nitto et al. [2007] is a network with interconnected nodes characterized by their types and a list of its neighbour nodes. In this situation the Self-aggregation algorithm attempts to evolve the connectivity with its neighbouring nodes in order to reach an optimum configuration. These nodes are characterized by this algorithms technique that is used to rewire the network. The research question to be answered: will the performance of Honey bee foraging algorithm used in load balancing be improved with the adoption of Self-aggregation in large heterogeneous cloud systems? This question is asked with a view of improving load balancing to tackle such a challenge. To reveal the need for further research on this area, this study reviews the research conducted previously and critically analysed
2
their discovery and findings. These reviews are included in the main body of the next chapter.
1.1
Contribution
Our research aims to improve resource allocation via a combination of algorithms for load balancing that is targeted specifically for large and heterogeneous distributed systems. The contributions of this dissertation are the following: • A novel hybrid-scheduling algorithm, and a prototype implementation, that aims to improve resource allocation. The aforementioned algorithm is an intelligent self-organising solution that utilizes resources effectively in the heterogeneous nature of Cloud. • An empirical evaluation of this novel solution is carried-out/performed against existing state-of-the-industry algorithms, considering various performance metrics such as reponse time, VM utilization and processing times of data centers. • The empirical performance evaluations of our proposed algorithm will be conducted through a simulation using and extending the CloudSim [Calheiros et al., 2011] simulation framework.
3
Chapter 2
Background The main focus of discussion within this main body is the critical analysis of load balancing and the techniques used which answer some of the challenges faced to deliver a smooth and efficient resource allocation in heterogeneous cloud systems. Here, the algorithms that will be used to answer the research question are also introduced. This chapter presents relevant related work organized as follows: • Section 2.1 Load Balancing: discusses why load balancers play a critical role in cloud computing and the challenges that confronts load balancers in heterogeneous cloud systems. • Section 2.2 Honey Bee Behaviour algorithm: analyses the Honey Bee algorithm and its effectiveness and weaknesses in balancing loads. • Section 2.3 Self-Aggregation: this section justifies the Self-aggregations effectiveness used in the Active clustering algorithm in balancing loads and points out its weaknesses. In this section it is also detailed why a combination of algorithms is required to enable an effective load balancing technique.
2.1
Load Balancing
This section reviews the previous research on load balancing and the challenges faced by the algorithms in the different system environments. This is a critical section in which we reveal the motivation why the research question chooses to explore dynamic algorithms. Load balancing is one of the indispensable feature of cloud computing. According to 4
the definition by Soni and Kalra [2014], load balancers are essential for cloud systems to achieve high throughput and shorten the response time. High throughput is one of the fundamental goals for load balancing. Throughput is the measure of the amount of work that the system can process in a given amount of time. A similar view by L.D. and Krishna [2013] describe the main objectives of load balancing to improve the speed of execution time of applications that uses a cloud resource. The workload of these applications behave in an unpredictable manner during run time. An investigation by Devine et al. [2005] follows the view that applications behave unpredictably or change during computation. These studies make a strong argument for measuring the performance load balancing on heterogeneous cloud systems. Heterogeneous cloud systems as described by Topcuoglu et al. [2002], are a diverse and geographically dispersed collection of computing resources interconnected through a high speed network. A study conducted by Yao and Ju-Hou [2012] express that in a cloud computing environment, homogenous servers in the cloud system are less. Homogenous servers (nodes) refer to systems that have a uniform set of hardware and software working in parallel to achieve a specific goal. A great deal of research has been established in load balancing on homogeneous nodes. However, the major drawback, Randles et al. [2009b] argue from this approach is that, it is unrealistic for most cloud computing instances. For most cloud computing instances are defined as diverse, and it is paramount that a heterogeneous and dynamic system is needed to provide optimal delivery of service. The study by Devine et al. [2005] shares this concern and highlights that it is important that algorithms need to be adaptive and state-of-the-art, heterogeneous, tailoring work assignments proportionate to communication, memory and processing resources. This vision is strengthened by another investigation on load balancing by Yao and Ju-Hou [2012]. If the system collapses due to the maintenance of the servers, it could bring great losses to the cloud users; the research emphasises that dealing with such requests in a heterogeneous system environment is an important challenge. To accommodate the needs for a heterogeneous system Devine et al. [2005] suggest that it requires a dynamic load balancer which can adapt to such unpredictable workloads. Load balancing techniques are extensively examined for both homogenous and heterogeneous environments. L.D. and Krishna [2013] illustrate two types of load balancing techniques. They are called static and dynamic. • Static load balancing algorithms as described by L.D. and Krishna [2013], perform well when servers have low variations in the applications workload. This type of load balancing thrives on homogenous systems. However, approaches of this kind carry with them various well known limitations. From existing reviews from the
5
previous paragraph, we find that these type of algorithms will not work well in heterogeneous cloud systems where the workloads vary frequently. This concern is shared by Yao and Ju-Hou [2012] where a description of the three conditions in which load balancing algorithms need to follow; (1) When the workloads are not too demanding, the algorithms should be able to self-organise to scale down the exchange of information, (2) Load balancing mechanisms should be able to thrive on heterogeneous cloud environments, and (3) The system throughput should not be affected, therefore the load balancing mechanisms should enhance the system throughput high as possible. Yao and Ju-Hou [2012] emphasise that high throughput is the most essential requirement in load balancing. • Dynamic load balancing algorithms are techniques which can autonomously adapt to varying workloads at run time. The study by L.D. and Krishna [2013] states that dynamic load balancing algorithms have dominance over static load balancing algorithms. But in order to achieve such advantage, it requires the additional cost of collecting and maintaining load information.L.D. and Krishna [2013] highlights that there type of load balancing techniques are exceedingly good in load balancing for heterogeneous environments. Likewise,Sesum-Cavic and Kuhn [2010] share this view that self-organisation solutions are proven to be a useful approach when handling complexity. And that dynamic load balancing allows the highest level of productivity and leads to a dramatic increase in the overall distributed system performance. Our proposed load balancing algorithm called the Honey Bee is also a dynamic technique. Based on the three essential conditions followed by effective load balancing techniques, described from the previous paragraph. We can interpret that the heterogeneous cloud computing system and load balancing has a certain resemblance with a colony of Honey bees and harvesting for food Yao and Ju-Hou [2012]. This section has reviewed load balancing and its system environments. We have discovered that there are two types of load balancing algorithms for different system environments in cloud computing and found dynamic algorithms to be the most effective in heterogeneous systems. Building from this section, the next section shall further elaborate a dynamic load balancing algorithm called the Honey bee. Detailing its technique, and examining its strengths and weaknesses in terms of load balancing.
2.2
Honey Bee Foraging Behaviour algorithm
This section elaborates on the review of the Honey Bee algorithm from past studies in the area. This section also highlights the strengths and weaknesses of the algorithm 6
which points to the need of improving the algorithm based on the research question for this review. The honey bee colony and how it forages for food is described in research by Randles et al. [2010]; the forager bees are sent for sufficient food sources; the forager returns to the hive and advertises the find to the hive using a method called a waggle dance. The suitability of the food source is measured by its quantity, the quality of the nectar that has been harvested or its distance away from the hive. Once the food source has been communicated through the waggle dance, the other honey bees follow the forager back to the discovered food source and begin harvesting. When the bees return to the hive, the remaining quantity of the food source is advertised by each honey bee through the waggle dance. This enables the bees to be sent out to a more plentiful food source and abandon less plentiful food sources. This method is well perceived in other research into honey bee inspired load balancing techniques including Sahu et al. [2013], L.D. and Krishna [2013],Yao and Ju-Hou [2012] Ghafari et al. [2013]. In the previous section it is mentioned that we can interpret that heterogeneous cloud systems and load balancing have a striking resemblance to the behaviour of honey bees foraging for food. The Honey Bee Foraging Behaviour algorithm is an optimization algorithm that imitates the intelligent behaviour of a swarm of honey bees foraging for food L.D. and Krishna [2013]. The principles in which the use of this intelligent algorithm for Load balancing are proposed. Sesum-Cavic and Kuhn [2010]represents the honey bees as software agents at each individual servers. A server contains exactly a single hive and one flower which has several units of nectar. A task is represented as unit of a single nectar. Individual hives consists of a limited number of forager (outgoing) and follower (receiver) bees. In the initial stages, all outgoing bees are foragers. These foragers scout for the location policy partner server of their server to decide whether to take/give nectar to/from the server, and call followers. The goal is to locate the optimum location policy partner server by following the optimum path that is advertised to be the shortest path. Randles et al. [2010] express that given this robust profit calculation method, the pattern of this intelligent behaviour offers a distributed and global mechanism of communication; making sure that profitable virtual machines seem to be an attractive candidate and are allocated to underutilized servers. On a performance evaluation by Karaboga and Basturk [2008] compared Honeybee algorithm against other evolutionary algorithms. Empirical results show that under various control parameters, the algorithm outperforms the algorithms and emphasizes that this technique can be implemented to tackle multimodal engineering problems with dimensionality. The promise of this robust and dynamic algorithm strengthens the reason for further research on its load balancing capabilities.
7
The throughput performance of the Honey Bee algorithm is examined by a series of empirical tests by Randles et al. [2009b]. The algorithm is evaluated against 2 other effective dynamic algorithms called Biased random sampling (BRS) and Self-aggregation (Active Clustering). The empirical results show that, for the Honey bee algorithm, increasing the server resources does not improve its equivalent throughput, its performance remains consistent. Conversely, the honey bee algorithm performs progressively well showing an increase in performance when the diversity of the workloads increase inline with the servers workload increase. A more recent study by Randles et al. [2010], demonstrates empirical results which shows similar results to that of the previously mentioned experiment. The result demonstrates that the honey bee algorithm performs in a consistent manner. Although it does not increase throughput performance in-line with the increase in system size. A third experiment conducted by Yao and JuHou [2012] aimed to improve the current honey bee algorithm and the empirical results that emerged from the experiments strengthens the argument presented in Randles et al. [2009b] and Randles et al. [2010]. Yao and Ju-Hou [2012] agree that increasing the system size has deteriorated the performance of honey bee load balancing. An observation by Randles et al. [2009a] show that the self-arrangement at one layer causes an effect on other layers which was not predictable from the implementation. The findings present some weaknesses of the honey bee algorithm. Where the algorithm arranges only an insufficient link between requests in the same server node queue, therefore, the improvement of the throughput in the system is sub-optimal. These results are critical and builds upon the purpose of the research question mentioned in this review. To improve this sub-optimal throughput performance in the honey-bee algorithm. In this section we have given a brief overview of how the colony of foraging honey bees behave and have described how this intelligent behaviour is adopted for load balancing in heterogeneous cloud systems. We have also discovered that although the throughput for honey bee performs very well when the service types are diverse, its throughput performance is sub-optimal when system size increases. In the next section we shall further elaborate another algorithm mentioned in the last paragraph called Self-aggregation (Active-clustering). Defining its strengths and weaknesses with the view of combining honey bee and self-aggregation to improve system throughput.
2.3
Self-Aggregation
This section describes the technique of Self-aggregation, revealing its strengths and weaknesses based from the experiments in which the algorithm was tested. This chapter also reveals the need for a combination of algorithms to improve load balancing
8
Self-aggregation is a dynamic load balancing technique that rewires the network Randles et al. [2009b]. This algorithm aims to group similar services together. Whereas many load balancers only work well in situations where the nodes are aware of similar (like) nodes and delegate workloads to them. This algorithm is particularly important to this review as Randles et al. [2010] demonstrate the concern where an implementation of the honey bee algorithm in the application layer induced a particular topology to arise in the resource layer. This led to a number of services having an unequal amount of network connectivity from its collaborating services, at the same time most services have only received a low number of links. Saffre et al. [2009] demonstrated a series of experiments that proves regular rewiring process in this situation could implicitly fix the problem, resulting in an established optimum cooperation in the long term for similar processes. An experiment by di Nitto et al. [2008] demonstrate that a number of jobs processed using self-aggregation have shown an improvement as a result of the Load balancing algorithms being able to work on a large homogeneous environment. Empirical results from the experiment by Randles et al. [2009b], detail self-aggregation as having a contrasting result from the Honey bee algorithm. It shows that the performance of self-aggregation increases as the number of processing nodes increase. Whereas self-aggregations performance decreases when the types (diversity) of nodes increase. This is where the Honey bee algorithm performs well. A more recent study by Randles et al. [2010] illustrates similar results where the self-aggregation algorithms performance is decreased as the diversity of the nodes increase. One question that needs to be asked, however, is why Randles et al. [2009b] and Randles et al. [2010] did not try to combine some of these algorithms in order to root out some of their weaknesses. Another drawback with the research by Yao and Ju-Hou [2012] is that it relies too heavily on changing the parameters of the different Honey bee algorithms. All these studies reviewed so far, however, did not combine the algorithms but did suggest combining them and improving the algorithm based on their weaknesses. A combination of a variety of load balancing algorithms is proposed by Randles et al. [2009b], when using a selection of the types of algorithms; either combined to provide an optimal solution or utilised independently when different scenarios arise. Yao and Ju-Hou [2012] share this view stating, within a certain number of server nodes, the increased size of the system does not lead to an increase in throughput performance, therefore, an improved Honey bee algorithm is suggested to solve this issue. This review follows the research question where we adopt self-aggregation to improve the honey bee algorithm. Vasile et al. [2014] have developed a similar idea to this research where self-aggregation and a combination of algorithms have been implemented. The conclusion coming from the novel algorithm implemented from their research shows that
9
clustering had an overhead effect, however, using a specialized algorithm combined with the self-aggregation have resulted in a good improvement in results. In this section we have demonstrated how the self-aggregation algorithm works, pointing out its strengths and revealing its weakness. Here we have discovered that a combination or a variety of algorithms working together is needed to solve the underlying issues of some load balancing algorithms.
10
Chapter 3
Design This chapter explores in detail the design of the proposed algorithm for this research. First we will explain the architecture of CloudSim and explain how we integrate our algorithm to this well used simulator. Thereafter, we explain the self-aggregation phase of our solution, identifying its parameters and explaining the algorithm. Finally we examine the scheduling algorithm to be utilized in order to allocate task to resources and also responsible for maintaining resource utilization of VMs.
3.1
CloudSim: Simulation Framework
Although there are a variety of commercial cloud computing infrastructures, such as, Azure, Aneka, Google App Engine and EC2, building a cloud testbed on a real infrastructure is time consuming and expensive. It is almost impossible to test performances of different scenarios of the application in a controllable and repeatable manner. For the purpose of this research, we therefore implement simulation methods to evaluate performance of our algorithms. CloudSim is a framework designed by Calheiros et al. [2011], it is used to emulate application services and cloud-based infrastructures and can also be used on economy-driven resource management policies on large cloud computing systems. With the use of this toolkit we can focus on resource allocation problems rather than the implementation of the low level details of the cloud-based infrastructures and services Calheiros et al. [2011]. CloudSim supports both system and behaviour modeling of Cloud system components such as data centers, VMs, and resource provisioning policies. Several researchers and
11
organisations, such as HP Labs are using CloudSim for their investigation of resource provisioning and energy-efficient management of data center resources. CloudSim also supports the modeling and simulation of Cloud computing environments consisting of both single and federated Clouds [Calheiros et al., 2011]. There are a number of simulators(Grenchmark, GreenCloud, NS3, REPAST, iCanCloud) that can be used for simulating a Cloud enviroment, but these simulators do not fully support a realistic modeling of systems and behaviour of Cloud environments as well as CloudSim. We use and extend CloudSim to provide a prototype implementation of our novel algorithm. The framework is written in the Java programming language, therefore, our system will be implemented in Java.
Figure 3.1: Layered CloudSim Architecture [Calheiros et al., 2011] Figure 3.1 shows the layered architecture of the CloudSim toolkit and its architectural components. By extending the basic functionalities already available in CloudSim, we will be able to perform test beds on specific configurations and scenarios, therefore leading to the development of best practices in all of the critical aspects in cloud computing. For this research, we implement the Self-aggregation phase on the Service Broker which dynamically allocates the load to a data centre based on best response time and user proximity to the Data Centers. The Honey Bee algorithm is then implemented in 12
the data center where the Data Centre Controller consults the load balancer for the next VM to allocate the cloudlet(task). The following sections will further detail these architectures.
3.2
Design Challenges
The original design for the self-aggregation phase was to implement a clustering algorithm called K-means by MacQueen [1967]. This algorithm is one of the simplest and most popular unsupervised learning algorithms that solve well-known clustering problems. For this research, the purpose of using K-means was to cluster characteristics of resources and feed this information to the VM load balancer. The algorithm works by: • Placing k points into the space represented by the objects being clustered. These points represent initial group centroids. • Assign each object to the group which has the nearest centroid. • Once all the objects have been assigned, re-calculate the positions of k centroids. • Repeat step 2 and step 3 until the centroids no longer move. This creates a separation of the objects into groups. Although the K-means algorithm is considered to be one of the simplest and popular unsupervised learning algorithms, one of its disadvantages is that it can be computationally expensive. First, the number of clusters(k) needs to be pre-defined. This pre-definition of k could mean that it can affect the dynamic nature of what we are trying to achieve. To extend K-means to accommodate the adaptiveness of finding k will lead to increase in complexity. This increase complexity can affect the execution time of our proposed solution that can then lead to reduced response time and performance. The next section explains the design of the implemented alternative to K-means.
3.3
Self-aggregation
Self-aggregation technique is a load-balancing technique that groups services together depending on certain characteristics. In our proposed solution, we will re-configure this technique to accommodate our novel algorithm.
13
Our Self-aggregation algorithm phase is implemented on the Service Broker component of CloudAnalyst. At this level we can dynamically re-configure the load based on the best response time achieved for the user, the proximity of that user base and current load of the data center. The main purpose of implementing this algorithm in the service broker is to have a level of control over user request routing to a data center. This is to address the problem raised in the background section where the Honey Bee’s performance degrades as the number of user requests and system increase.
Figure 3.2: Flow Diagram of Self-Aggregation
3.4
Honey Bee
The system performs load balancing by pre-emptive scheduling. Load balancing is used to distribute the load evenly across machines to avoid overloaded, under-loaded or idle machines. For our proposed system we use the Honey bee algorithm for VM load balancing. The task removed from the overloaded VMs becomes the Honey bees. Consequence of a submission to an under-loaded VM, a number of various tasks and load of tasks 14
assigned to that VM will be updated. This will be advantageous for other tasks since whenever a high task need to be assigned to a VM, it should address the VM which has a minimum number tasks so that the particular task will be executed quicker. Considering that all the VMs are sorted in an ascending order, the removed task will be assigned to under-loaded VMs.
3.4.1
Load balancing
We follow the Honey Bee formula used by L.D. and Krishna [2013] Let V M = V M1 , V M2 , V M3 , ..., V Mm be the set of n virtual machines which should process the tasks outlined in the set T = T1 , T2 , T3 , ..., Tn . Overall capacity of all VMs is given in (1) (1) C =
m X
Ci
i=1
The total length of the task that are assigned to a VM is to be considered as the load on a single VM. Formula given in (2) (2) LV Mi ,t =
N (T, t) S(V Mi , t)
Where the N (T, t) is the number of tasks at the time t on the service queue of V M i, while S(V M i, t) is the Service rate of the V M i at time t. Load of all the VMs is given in (3) (3) L =
m X
LV Mi
i=1
Processing time of a VM is calculated by (4) (4) P Ti =
LV Mi ci
Processing time of all VM are calculated by (5) (5) P T =
L C
All of the VMs load are divided by the capacity of all VMs Following on the values presented from the above parameters, the scout bee checks weather the host is in a balanced or unbalanced state. This is calculated used standard deviation of the load given in (6)
15
v u m u1 X (6) σ = t (P Ti − P T )2 m i=1
If the σ value falls within the condition of the threshold set (T s) ∈ [0, 1], then the system is balanced, otherwise it is in an unbalanced state.
Figure 3.3: Illustrates the flow diagram of the honey bee algorithm for VM load balancing. In this chapter we examined the design of the proposed solution for this research, and reviewed the Self-Aggregation phase and VM load balancing phase of our proposed novel algorithm. In the next chapter we delve into the actual implementation.
16
Chapter 4
Implementation The implementation chapter details all the components of our proposed novel algorithm. Firstly, the CloudSim simulator was extended to work with the implementation of our solutions. In the beginning of this chapter, the CloudSim architecture and its components are explained, this is important as the proposed algorithms are built to work with these components. Thereafter, the implementation of the Self-Aggregation technique on the Service Broker is explained. Lastly, The Honey Bee VM load balancer implementation and integration are demonstrated. To achieve our proposed implementation we have used the IntelliJ IDEA as our Integrated Development Environment(IDE) using Java version 1.8.0 for creating and executing our code. We imported the CloudAnalyst package components that are implemented in Java into our IDE.
4.1
CloudSim components and extension
This section briefly describes the main components of the CloudSim simulator and the flow of these components to fully understand the integration of our algorithm. CloudAnalyst is an extension of CloudSim that provides a graphical user interface(GUI) package to assist the ease of experimental modification and configurations. The main components and its responsibilities are: • GUI: As discussed in the previous paragraph, this package is responsible for the UI, and manages the controls of front end for the application. It manages the screen transitions and all other user interface activities. For us to be able to
17
use our implemented algorithm we have included BROKER POLICY SELF AGGREGATION to the Conf igureSimulationP anel class so that we can select the algorithm in the GUI. 1 2 3 4 5 6 7
cmbServiceBroker = new JComboBox(new ←String[]{Constants.BROKER_POLICY_PROXIMITY, Constants.BROKER_POLICY_OPTIMAL_RESPONSE, Constants.BROKER_POLICY_SELF_AGGREGATION}); cmbServiceBroker.setSelectedItem(simulation.getServiceBrokerPolicy()); cmbServiceBroker.setBounds(x, y, compW, compH); mainTab.add(cmbServiceBroker);
Listing 4.1: Adding our policy to the GUI • UserBase: This component is responsible for modelling groups of users and generates the traffic that represents the users. • Simulation: This component is in charge of the simulation parameters, generating and executing the simulations. In the Simulation component we have extended the code to integrate the algorithm within the simulation. 1 2 3 4 5 6 7 8 9
CloudAppServiceBroker serviceBroker; if (serviceBrokerPolicy.equals(Constants.BROKER_POLICY_PROXIMITY)){ serviceBroker = new ServiceProximityServiceBroker(); } else if ←(serviceBrokerPolicy.equals(Constants.BROKER_POLICY_SELF_AGGREGATION)){ serviceBroker = new SelfAggregationBroker(dcbs); }else { serviceBroker = new BestResponseTimeServiceBroker(); } internet.addServiceBroker(DEFAULT_APP_ID, serviceBroker);
Listing 4.2: Integrating the algorithm to the simulation • Internet: This component is responsible for modelling the internet and the implementation of routing behaviours. • DataCenterController: This component is responsible for the data center activities. The Data Center controller queries the Load balancer for the next available VM to allocated a cloudlet(task) to, if the VM is busy(−1) it queues the cloudlet for that VM. 1 2 3
int nextAvailVM = loadBalancer.getNextAvailableVm(); if (nextAvailVM == -1){
18
4 5 6 7 8 9 10 11
//All VM’s are busy. Put it in queue //System.out.println("VM’s busy, queueing " + cl); waitingQueue.add(cl); queuedCount++; } else { submitCloudlet(cl, nextAvailVM); }
Listing 4.3: Get the next available VM • VmLoadBalancer: This component is responsible for modelling the load balancing policy used by the data centers when allocating requests. We entend this component in our BeeV mLoadBalancer class. • InternetCharacteristics: This component is reponsible for defining the internet characteristics applied during the simulation, such as available bandwidths between regions, latency, the current traffic levels and the current level of performance information for each of the data centers. • CloudAppServiceBroker: This component is responsible for modelling the service brokers which manages traffic routing between the user bases. We implement this interface in our Self AggregationBroker class that uses getDestination method and takes in a GeoLocatable as input which then returns a name of the datacenter, based on our Self-Aggregation policy. 1 2
public class SelfAggregationBroker extends ServiceProximityServiceBroker implements CloudAppServiceBroker
Listing 4.4: Extending the ServiceProximityServiceBroker A high level illustration of the system flow for these components are shown in Figure 4.1:
4.2
Self-Aggregation
This section provides the description of the final state of implementation of the Self Aggregation policy. As mentioned from section 3.1 we have implemented this phase in the Service Broker to give us the level of control over user requests routing to a data center. At this level, we can also monitor all the data centers, the proximity of the UserBases to the data center and latency.
19
Figure 4.1: Flow Diagram of the system architecture To
achieve
the
implementation
of
the
Self
-Aggregation
component,
the
Self AggregationBroker is implemented within the cloudsim.ext.servicebroker package of CloudAnalyst. The algorithm works by: • Maintaining a list of all the data centers and also a list of all the best response time currently recently recorded for each of the data centers. 1 2
private Map bestResponseTimesRecord; private Map DataCenterList;
Listing 4.5: List of Data centers and List of Best Response time recorded • Once the internet receives a message from the user base it consults the SelfAggregation algorithm to determine a data center for the request. 1 2 3
int appId = cloudlet.getAppId(); CloudAppServiceBroker serviceBroker = serviceBrokers.get(appId); destName = serviceBroker.getDestination(originator);
20
Listing 4.6: Get destination for the user request • Our Self-Aggregation algorithm then queries the ServiceProximity broker for the destination 1 2 3 4 5 6 7 8 9 10 11 12
public String getDestination(GeoLocatable inquirer) { List proximityList = ←InternetCharacteristics.getInstance().getProximityList(inquirer.getRegion()); int region; String dcName; for (int i = 0; i < proximityList.size(); i++) { region = proximityList.get(i); dcName = getAnyDataCenter(region); if (dcName != null) { return dcName; } }
Listing 4.7: Querying for the next destination the user request • The Self-aggregation algorithm then updates its best response time records if the new response time has performed better than the previous. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
for (String dc : serviceLatencies.keySet()){ currLatency = serviceLatencies.get(dc)[0]; bestSoFar = bestResponseTimesRecord.get(dc); if (currLatency != null){ if (bestSoFar != null){ if (currLatency
View more...
Comments