Libra: An Economy-Driven Cluster Scheduling and Service Level Agreement (SLA)-based Resource Allocation System
Problem Statement
Clustering involves connecting two or more computers together to take advantage
of combined computational power and resources. Hence, a cluster works as an
integrated collection of resources that can provide a single system image
spanning all its nodes. Clustering is a popular strategy for processing
applications because it transparently spreads the processing of different jobs
throughout the cluster. Clusters are used for high-performance applications
such as AI expert systems, flight simulations, and scientific calculations.
Computational economy refers to the inclusion of user-specified quality of
service (QoS) parameters with jobs, so that resource management follows a
user-centric approach rather than a system-centric one. In practice, this means
that user constraints such as deadline and budget carry more weight when the
scheduler determines the priority of a job than system policies such as
ordering jobs by submission time. Currently, there is no holistic scheduling
mechanism in cluster computing that enables differing QoS levels for different
clients.
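To make the contrast concrete, the following minimal sketch orders the same three jobs under a system-centric policy (submission time) and under a user-centric one. The Job fields and the particular user-centric key (earliest deadline first, ties broken by the larger budget) are assumptions chosen for illustration; the text above does not prescribe a specific priority function.

```python
from typing import NamedTuple


class Job(NamedTuple):
    # All field names are hypothetical, chosen for this sketch only.
    name: str
    submitted_at: int  # submission time, in seconds
    deadline: int      # seconds by which the user wants the job finished
    budget: float      # amount the user is willing to pay


jobs = [
    Job("a", submitted_at=0, deadline=900, budget=10.0),
    Job("b", submitted_at=5, deadline=300, budget=40.0),
    Job("c", submitted_at=9, deadline=300, budget=55.0),
]


def system_centric(queue):
    """Order jobs by submission time only (first come, first served)."""
    return sorted(queue, key=lambda j: j.submitted_at)


def user_centric(queue):
    """Order jobs by tightest deadline, then by highest budget (assumed rule)."""
    return sorted(queue, key=lambda j: (j.deadline, -j.budget))


fifo_order = [j.name for j in system_centric(jobs)]
qos_order = [j.name for j in user_centric(jobs)]
```

Under the assumed key, job "c" moves to the front even though it was submitted last, because it shares the tightest deadline and carries the largest budget.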
Objectives
The main purpose of our project is to:
-
Develop a QoS-based scheduler for resource management on a homogeneous cluster
-
Optimize the scheduler according to time or cost considerations of the user,
for sequential and embarrassingly parallel jobs
-
Test the scheduler through simulations of various types of job queues and user
criteria
Scope
The Libra Scheduler will only manage sequential and embarrassingly parallel
jobs to be run on a homogeneous Linux cluster. Linux is an open source
operating system with extensive documentation and user support, and many open
source cluster management systems (CMS) are well suited to a Linux-based
cluster.
Although the scheduler provides QoS to users, there will be no mechanism for
users to interact with each other and bargain over the use of resources
according to their own considerations, as is provided in a grid-computing
environment by projects like Nimrod/G. Once a job is submitted, the user may
not modify the job details. However, if possible, we may allow for interactive
jobs that accept user commands required during the execution of a job.
Project Description
The focus of our project is to implement a scheduler that aims to maximize user
satisfaction. Thus the job details submitted by the user will include job
prioritization criteria: the allocated budget and the deadline required by the
user, enabling the scheduler to maximize CPU utilization while remaining within
the constraints imposed by the need to optimize user Quality of Service (QoS).
The scheduler will allocate jobs based on the job parameters, which are job
specifications submitted by the user with the job, including:
-
Location of the executable and input data sets
-
Where standard output is to be placed
-
System type
-
Maximum length of run
-
Whether the job needs sequential or parallel resources
However, our scheduler will be QoS driven: it will aim to optimize resource
utilization within user-imposed constraints, so user satisfaction is the
primary concern, as opposed to maximizing CPU utilization. The two job
parameters most relevant to scheduling decisions will therefore be:
-
Budget allocated by the user to the process
-
Deadline
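The job parameters listed above can be pictured as a single record submitted with each job. The sketch below is one possible shape for such a record, not the project's actual format: the class name, field names, units and example paths are all hypothetical. The record is made immutable to mirror the rule that job details cannot be modified after submission.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class JobKind(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "embarrassingly_parallel"


@dataclass(frozen=True)  # frozen: details cannot change after submission
class JobSpec:
    # Field names and units are assumptions made for this sketch.
    executable: str              # location of the executable
    input_data: Tuple[str, ...]  # locations of the input data sets
    stdout_path: str             # where standard output is to be placed
    system_type: str             # e.g. the target platform
    max_runtime_s: int           # maximum length of run, in seconds
    kind: JobKind                # sequential or parallel resources
    budget: float                # budget allocated by the user
    deadline_s: int              # deadline, in seconds from submission

    def __post_init__(self):
        # Reject values that cannot describe a schedulable job.
        if self.max_runtime_s <= 0 or self.deadline_s <= 0:
            raise ValueError("runtime and deadline must be positive")
        if self.budget < 0:
            raise ValueError("budget must not be negative")


spec = JobSpec(
    executable="/home/user/sim",             # hypothetical path
    input_data=("/home/user/data/in.dat",),  # hypothetical path
    stdout_path="/home/user/out.log",        # hypothetical path
    system_type="linux",
    max_runtime_s=600,
    kind=JobKind.SEQUENTIAL,
    budget=25.0,
    deadline_s=1800,
)
```

Keeping budget and deadline in the same record as the conventional fields lets a scheduler read the QoS constraints without a separate lookup.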
As mentioned earlier, the type of jobs that will be supported are sequential
and embarrassingly parallel jobs.
With support from the CMS, the Libra Scheduler should embody the following
features:
-
Should be able to enforce resource allocations according to user-centric
priorities
-
Should be dynamic rather than static, which is a necessary implication of the
user-centric approach: users who need their jobs completed urgently and are
willing to pay a high price for it should be able to get them done through
dynamic reallocation of resources, even if the job is submitted later than
other jobs or the system is heavily loaded. Hence, the scheduler should be
able to change the resource limits, priorities, privileges and execution order
of submitted jobs.
-
Should be scalable, which means that its performance should not degrade with
the addition of nodes and jobs to our cluster
-
Should be configurable, and allow for various scheduling policies that can be
modified to incorporate QoS parameters
-
Should be separable from the CMS
-
Should provide administrative security
-
Should provide job accounting, to aid in scheduling policies
-
Should ideally provide a GUI for all components, such as for users to submit
jobs and for administrators to oversee scheduling
-
Should ideally provide for checkpointing, load balancing, process migration
and job runtime limits, which provide for better resource management, fault
tolerance and reliability
A market-based economic model for computational economy needs to be developed
for our cluster, which would be responsible for the pricing and allocation of
resources according to user constraints. The model that we are going to
implement is the bid-based proportional resource-sharing model, possibly
incorporating features of other models such as the commodity market model.
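In a bid-based proportional resource-sharing model, each user receives a share of the resource proportional to their bid relative to the sum of all bids. The sketch below illustrates only that core rule; the function name, the capacity parameter and the handling of an all-zero bid pool are assumptions, and it does not attempt to reproduce Libra's actual pricing or allocation logic.

```python
from typing import Dict


def proportional_shares(bids: Dict[str, float],
                        capacity: float = 1.0) -> Dict[str, float]:
    """Split `capacity` among users in proportion to their bids.

    Hypothetical helper: share_i = capacity * bid_i / sum(bids).
    """
    if any(bid < 0 for bid in bids.values()):
        raise ValueError("bids must not be negative")
    total = sum(bids.values())
    if total == 0:
        # Assumed behavior: nobody bid anything, so nobody gets a share.
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


# Example: 100 units of CPU share, two competing users.
shares = proportional_shares({"alice": 30.0, "bob": 10.0}, capacity=100.0)
# alice bids three times as much as bob and receives three times the share.
```

Because shares depend on the whole pool of bids, a newly submitted high bid shrinks everyone else's allocation, which is one way the dynamic reallocation described earlier could arise.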
The Team Members
Active Members
Alumni
Publications
-
Jahanzeb Sherwani, Nosheen Ali, Nausheen Lotia, Zahra Hayat, and Rajkumar
Buyya, Libra: An Economy driven Job Scheduling System for Clusters,
Proceedings of the 6th International Conference on High Performance Computing in Asia-Pacific Region
(HPC Asia 2002), December 16-19, 2002, Bangalore, India.
(Talk - PPT/PDF)
-
Jahanzeb Sherwani, Nosheen Ali, Nausheen Lotia, Zahra Hayat, and Rajkumar Buyya,
Libra: A Computational Economy-based Job Scheduling System for Clusters,
Software: Practice and Experience, Volume 34, Issue 6, Pages 573-590, May 2004.
-
Chee Shin Yeo and Rajkumar Buyya,
Pricing for Utility-driven Resource Management and Allocation in Clusters,
Proceedings of the 12th International Conference on Advanced Computing and Communication (ADCOM 2004),
December 2004, Ahmedabad, India.
(Talk - PPT/PDF)
-
Chee Shin Yeo and Rajkumar Buyya,
Pricing for Utility-driven Resource Management and Allocation in Clusters,
International Journal of High Performance Computing Applications, Volume 21, Issue 4, Pages 405-418, November 2007. (extended version of ADCOM 2004 paper)
-
Chee Shin Yeo and Rajkumar Buyya,
Service Level Agreement based Allocation of Cluster Resources: Handling Penalty to Enhance Utility,
Proceedings of the 7th IEEE International Conference on Cluster Computing (Cluster 2005),
September 2005, Boston, MA.
(Talk - PPT/PDF)
-
Chee Shin Yeo and Rajkumar Buyya,
Managing Risk of Inaccurate Runtime Estimates for Deadline Constrained Job Admission Control in Clusters,
Proceedings of the 35th International Conference on Parallel Processing (ICPP 2006),
August 2006, Columbus, OH.
(Talk - PPT/PDF)
-
Chee Shin Yeo and Rajkumar Buyya,
A taxonomy of market-based resource management systems for utility-driven cluster computing,
Software: Practice and Experience, Volume 36, Issue 13, Pages 1381-1419, 10 November 2006.
-
Chee Shin Yeo and Rajkumar Buyya,
Integrated Risk Analysis for a Commercial Computing Service,
Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007),
March 2007, Long Beach, CA.
(Talk - PPT/PDF)
-
Chee Shin Yeo and Rajkumar Buyya,
Integrated Risk Analysis for a Commercial Computing Service in Utility Computing,
Journal of Grid Computing, Volume 7, Issue 1, Pages 1-24, March 2009. (extended version of IPDPS 2007 paper)
Software and Documentation
Software
Documentation