Cost minimization for computational applications on hybrid cloud infrastructures
Article code | Publication year | Length of English article |
---|---|---|
6607 | 2013 | 9 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Future Generation Computer Systems, Volume 29, Issue 7, September 2013, Pages 1786–1794
Abstract
We address the problem of task planning on multiple clouds, formulated as a mixed integer nonlinear programming (MINLP) problem. Specifying it in the AMPL modeling language allows us to apply solvers such as Bonmin and Cbc. Our model assumes multiple heterogeneous compute and storage cloud providers, such as Amazon, Rackspace, GoGrid, ElasticHosts and a private cloud, parameterized by costs and performance, including constraints on the maximum number of resources at each cloud. The optimization objective is the total cost, under a deadline constraint. We compute the relation between deadline and cost for a sample set of data- and compute-intensive tasks, representing bioinformatics experiments. Our results illustrate typical problems that arise when making decisions on deployment planning on clouds and how they can be addressed using optimization techniques.
Introduction
In contrast to the computing and storage resources already well established in the research community (clusters, grids), clouds in the form of infrastructure-as-a-service (IaaS) platforms (pioneered by Amazon EC2) provide on-demand resource provisioning with a pay-per-use model. These capabilities, together with the benefits introduced by virtualization, make clouds attractive to the scientific community [1]. In addition to public clouds such as Amazon EC2 or Rackspace, private and community cloud installations have been deployed for scientific projects, e.g. FutureGrid or the campus-based private cloud at Notre Dame. As a result, multiple deployment scenarios differing in cost and performance, coupled with the new provisioning models offered by clouds, make resource allocation and capacity planning for scientific applications a challenge.
The motivation for this research comes from our previous work [2] and [3], in which we ran experiments with a compute-intensive bioinformatics application on a hybrid cloud consisting of Amazon EC2 and a private cloud. The application is composed of a set of components (deployed as virtual machines) that communicate using a queue (Amazon SQS) and process data stored in cloud storage (Amazon S3). The results of these experiments indicate that clouds do not introduce significant delays in terms of virtualization overhead and deployment times. However, multiple options for the placement of application components and input/output data, which differ in performance and cost, lead to non-trivial resource allocation decisions. For example, when data is stored on the public cloud, the data transfer costs between storage and a private cloud may become large enough to make it more economical to pay for compute resources from the public cloud than to transfer the data to a private cloud where computing is cheaper.
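The placement trade-off in the example above can be made concrete with a small calculation. The following is a minimal sketch, not the model from the paper: the function name and all prices are hypothetical, prices are expressed in integer cents, and the private cloud is assumed to be free by default.

```python
def cheaper_placement(data_gb, cpu_hours, transfer_cents_per_gb,
                      public_cents_per_hour, private_cents_per_hour=0):
    """Compare computing next to the data (public cloud) with
    moving the data out to a cheaper private cloud.

    All prices are hypothetical and given in integer cents.
    Returns a (placement, cost_in_cents) tuple.
    """
    # Data already sits in public storage, so only compute is paid for.
    public_cost = cpu_hours * public_cents_per_hour
    # Private compute is cheap, but the data must be transferred out first.
    private_cost = (cpu_hours * private_cents_per_hour
                    + data_gb * transfer_cents_per_gb)
    if public_cost <= private_cost:
        return ("public", public_cost)
    return ("private", private_cost)


# Data-intensive task: 100 GB of input, 10 CPU hours.
data_heavy = cheaper_placement(100, 10, transfer_cents_per_gb=10,
                               public_cents_per_hour=50)
# Compute-intensive task: 1 GB of input, 100 CPU hours.
compute_heavy = cheaper_placement(1, 100, transfer_cents_per_gb=10,
                                  public_cents_per_hour=50)
```

With these assumed prices the data-intensive task is cheaper to run on the public cloud, while the compute-intensive task is cheaper to move to the private cloud.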
In this paper, we address the resource allocation problem by applying optimization techniques using the AMPL modeling language [4], which provides access to a wide range of ready-to-use solvers. Our model assumes multiple heterogeneous compute and storage cloud providers, such as Amazon, Rackspace, ElasticHosts and a private cloud, parameterized by costs and performance. We also assume that the number of resources of a given type in each cloud may be limited, which is often the case not only for private clouds, but also for larger commercial ones. The optimization objective is the total cost, under a deadline constraint. To illustrate how these optimization tools can be useful for planning decisions, we analyze the relations between deadline and cost for different task and data sizes, which are close to our experiments with bioinformatics applications.
The main contributions of the paper are the following:
– We formulate the problem of minimizing the cost of running a computational application on a hybrid cloud infrastructure as a mixed integer nonlinear programming problem and specify it in the AMPL modeling language.
– We evaluate the model on scenarios involving limited and unlimited public and private cloud resources, for compute-intensive and data-intensive tasks, and for a wide range of deadline parameters.
– We discuss the results and lessons learned from the model and its evaluation.
The paper is organized as follows: after discussing the related work in Section 2, we introduce the details and assumptions of our application and infrastructure model in Section 3. Then, in Section 4 we formulate the problem using AMPL by specifying the variables, parameters, constraints and optimization goals. Section 5 presents the results we obtained by applying the model to scenarios involving multiple public and private clouds and overlapping computation and data transfers, and by identifying special cases. In Section 6 we provide a sensitivity analysis of our model and show how such analysis can be useful for potential users or computing service resellers. In Section 7 we estimate how our model behaves if the task sizes are not uniform and change dynamically. The conclusions and future work are given in Section 8.
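To give a feel for the kind of problem being solved, the sketch below minimizes cost under a deadline by exhaustive search over instance counts. It is a deliberately simplified stand-in, not the paper's AMPL/MINLP formulation: it ignores data transfers and storage, assumes uniform tasks, bills every instance for the whole deadline rounded up to full hours, and uses hypothetical instance types with prices in integer cents.

```python
from itertools import product
from math import ceil, floor


def cheapest_plan(instance_types, num_tasks, deadline_hours):
    """Brute-force search for the cheapest feasible allocation.

    instance_types: list of (name, cents_per_hour, tasks_per_hour,
    max_instances) tuples; all values are illustrative assumptions.
    Returns (cost_in_cents, {name: count}) or (None, None) when the
    deadline cannot be met with the available instances.
    """
    billed_hours = ceil(deadline_hours)  # hourly billing, rounded up
    best_cost, best_plan = None, None
    ranges = [range(limit + 1) for (_, _, _, limit) in instance_types]
    for counts in product(*ranges):
        # Whole tasks each instance can finish before the deadline.
        capacity = sum(n * floor(deadline_hours * rate)
                       for n, (_, _, rate, _) in zip(counts, instance_types))
        if capacity < num_tasks:
            continue  # deadline constraint violated
        cost = sum(n * price * billed_hours
                   for n, (_, price, _, _) in zip(counts, instance_types))
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_plan = {name: n for n, (name, _, _, _)
                         in zip(counts, instance_types) if n > 0}
    return best_cost, best_plan


# Hypothetical providers: a free but small private cloud and two public types.
TYPES = [("private", 0, 2, 2), ("public-small", 10, 2, 5),
         ("public-large", 40, 10, 5)]
relaxed = cheapest_plan(TYPES, num_tasks=40, deadline_hours=10)
tight = cheapest_plan(TYPES, num_tasks=40, deadline_hours=2)
```

With a relaxed deadline the free private instances suffice; with a tight one the search has to add public instances. Exhaustive search only works for toy sizes, which is one reason a modeling language with dedicated solvers is attractive for realistic instances.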
Conclusion
The results presented in this paper illustrate typical problems that arise when making decisions on deployment planning on clouds and how they can be addressed using optimization techniques. We have shown how mixed integer nonlinear programming can be applied to model and solve the problem of resource allocation on multiple heterogeneous clouds, including private and public ones, taking into account the cost of compute instances and data transfers. Our results show that the total cost grows slowly for long deadlines, since it is possible to use free resources from a private cloud. However, for short deadlines it is necessary to use instances from public clouds, starting with the ones with the best price/performance ratio. The shorter the deadline, the more costly the instance types that have to be added, so the cost grows more rapidly.
Moreover, our results can also be useful for multi-objective optimization. In such a case, it would be possible to run the optimization algorithm in a certain neighborhood of the desired deadline and select the best solution using a specified cost/time trade-off. Alternatively, multiple solutions as in Fig. 6 and Fig. 10 or 11 may be presented to the users, allowing them to select the most acceptable solution. Our model can also be used as an approximate method to solve problems where task sizes are not ideally uniform, but differ within a limited range.
Optimal task allocation in a hybrid cloud environment is not a trivial problem, as one needs to know estimates of the computational cost of tasks in advance. If such data are available, it is possible to use tools such as AMPL. This approach may be successful as long as one is able to formulate the optimization model and select a suitable solver. These tasks are not straightforward, though, since a small change in the model may move the problem from one class to another (e.g. from mixed integer to MINLP), making it necessary to find another solver.
Careful specification of the model is also important, as the same problem may be formulated in various ways that differ considerably in solver performance. In future work we plan to experiment with variations of the model to represent other classes of applications, such as scientific workflows [1], which often consist of multiple stages, each characterized by different data and compute requirements.
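The multi-objective use mentioned in the conclusion, running the optimization in a neighborhood of the desired deadline and then choosing by a cost/time trade-off, could be sketched as follows. This is an illustration under assumptions: the weighted-sum scoring, the function name and the sample (deadline, cost) points are hypothetical and do not come from the paper's results.

```python
def pick_tradeoff(points, desired_deadline, window, time_weight):
    """Pick one solution near the desired deadline.

    points: iterable of (deadline_hours, cost) pairs, e.g. one
    optimization run per candidate deadline (values are hypothetical).
    time_weight: how much one extra hour is worth in cost units; the
    weighted sum is an assumed way to express the cost/time trade-off.
    Returns the chosen (deadline_hours, cost) pair, or None if no
    point lies within the window.
    """
    nearby = [(d, c) for d, c in points
              if abs(d - desired_deadline) <= window]
    if not nearby:
        return None
    # Lower score is better: cost plus the price put on waiting.
    return min(nearby, key=lambda p: p[1] + time_weight * p[0])


# Hypothetical deadline/cost curve: cost falls as the deadline relaxes.
CURVE = [(5, 300), (10, 140), (15, 90), (20, 80), (30, 75)]
balanced = pick_tradeoff(CURVE, desired_deadline=15, window=5, time_weight=5)
cost_only = pick_tradeoff(CURVE, desired_deadline=15, window=5, time_weight=0)
```

A larger `time_weight` pushes the choice toward shorter deadlines, while a weight of zero simply returns the cheapest solution inside the window.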