
Cloud Scheduler Use Cases

The following use cases describe the basic functionality of the cloud scheduler, that is, the tasks it will perform regularly. They are intended as abstract guidelines for cloud scheduler development to follow. More detailed technical information on implementing these steps can be found in the Cloud Scheduler Design document.

Job Arrival / VM Scheduling Case

When a job is submitted to the job scheduler (currently a Condor job pool), the cloud scheduler detects its arrival and performs the following basic sequence:

  • The job scheduler receives a job, job_X, that requires an X_VM type of virtual machine.
  • The cloud scheduler then:
    • polls the job scheduler queue and detects the newly submitted job_X.
    • finds a copy of X_VM (in an image repository, or at the location specified in job_X's job description file).
    • queries a registry service (the cloud MDS) for the current status of cloud resources and builds an internal representation of those resources.
    • searches these resources for a cluster that supports job_X's VM requirements (also specified in the job description file).
    • sends provisioning instructions (currently, create commands via Nimbus and the workspace-control program) to a selected resource.
      • this instruction creates a VM of type X_VM on the selected resource.
      • if the create call fails, the cloud scheduler detects the failure and either re-submits the request or chooses new resources to submit it to.
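The arrival sequence above can be sketched as a small matching-and-retry loop. This is a minimal illustration under stated assumptions, not the scheduler's actual code: the `Job` and `Cluster` records, the `memory_mb` requirement, `find_cluster`, `schedule_vm`, and the `create_vm` callback (a stand-in for the Nimbus / workspace-control create command) are all hypothetical names. On a failed create, the sketch takes the "choose new resources" branch by excluding the failed cluster on the next attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

@dataclass
class Job:
    # Hypothetical job record; in practice it would be read from the job
    # scheduler queue (currently a Condor job pool).
    job_id: str
    vm_type: str        # e.g. "X_VM"
    vm_image: str       # image location (repository, or from the job description file)
    memory_mb: int      # example VM requirement from the job description file

@dataclass
class Cluster:
    # Hypothetical internal representation built from the registry (cloud MDS).
    name: str
    free_memory_mb: int
    supported_vm_types: Set[str] = field(default_factory=set)

# Hypothetical stand-in for the provisioning call (Nimbus / workspace-control
# create); returns True if the VM was created.
CreateFn = Callable[[Cluster, str, str], bool]

def find_cluster(job: Job, clusters: List[Cluster], exclude: Set[str]) -> Optional[Cluster]:
    """Return the first cluster, not in `exclude`, that meets the job's VM requirements."""
    for cluster in clusters:
        if cluster.name in exclude:
            continue
        if job.vm_type in cluster.supported_vm_types and cluster.free_memory_mb >= job.memory_mb:
            return cluster
    return None

def schedule_vm(job: Job, clusters: List[Cluster], create_vm: CreateFn,
                max_attempts: int = 3) -> Optional[str]:
    """Boot a VM for `job`; after a failed create, try a different cluster."""
    tried: Set[str] = set()
    for _ in range(max_attempts):
        cluster = find_cluster(job, clusters, tried)
        if cluster is None:
            return None  # no suitable resource left
        if create_vm(cluster, job.vm_type, job.vm_image):
            cluster.free_memory_mb -= job.memory_mb  # update the internal view
            return cluster.name
        tried.add(cluster.name)  # failure detected: choose new resources next time
    return None

# Usage: a fake create call that always fails on cluster "alpha".
def fake_create(cluster: Cluster, vm_type: str, image: str) -> bool:
    return cluster.name != "alpha"

clusters = [Cluster("alpha", 4096, {"X_VM"}), Cluster("beta", 2048, {"X_VM"})]
job_x = Job("job_X", "X_VM", "repo/X_VM.img", 1024)
print(schedule_vm(job_x, clusters, fake_create))  # beta
```

Re-submitting to the same cluster instead would simply mean not adding it to `tried`; which policy is better is left to the design document.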

Multiple Job Arrival (Alternative)

If a job set is submitted to the job scheduler, multiple VMs of the required type (or types) may be created on the cloud. The process is the same as above, except for the following points:

  • The job scheduler receives a job set, for example, 100 executions of job_Y, where job_Y requires a Y_VM.
  • ...
  • The cloud scheduler sends provisioning instructions to the cloud to start some number of Y_VMs (the number will depend on scheduling optimization heuristics).
  • ...
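The source leaves the number of Y_VMs to "scheduling optimization heuristics". As one possible heuristic, offered purely as an assumption, the scheduler could target roughly one VM per fixed number of queued jobs, cap the total, subtract VMs of that type already running, and never ask for more than the cloud's free slots. The function `vms_to_start` and its parameters `jobs_per_vm`, `max_vms`, and `free_slots` are hypothetical.

```python
import math
from typing import Optional

def vms_to_start(queued_jobs: int, running_vms: int, jobs_per_vm: int = 10,
                 max_vms: int = 20, free_slots: Optional[int] = None) -> int:
    """Hypothetical heuristic for how many VMs of one type to boot for a job set.

    Target roughly one VM per `jobs_per_vm` queued jobs, capped at `max_vms`
    VMs of this type in total, then subtract the VMs already running and
    never request more than the cloud's free slots (if known).
    """
    target = min(math.ceil(queued_jobs / jobs_per_vm), max_vms)
    needed = max(target - running_vms, 0)
    if free_slots is not None:
        needed = min(needed, free_slots)
    return needed

# 100 executions of job_Y:
print(vms_to_start(100, 0))                # 10
print(vms_to_start(100, 3))                # 7
print(vms_to_start(100, 0, free_slots=6))  # 6
```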

VM Expires / VM Termination Case

The cloud scheduler will follow this sequence when a VM running on the cloud reaches its expiry time*:

  • An X_VM reaches its expiry time.
  • The cloud scheduler:
    • checks job queues for jobs requiring an X_VM type virtual machine.
      • if jobs requiring an X_VM are present in the job queues, the scheduler leaves the X_VM running (and renews its lease with a new expiry time)**.
      • if there are no jobs requiring an X_VM in the job queues, the X_VM is destroyed.
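The basic expiry case can be sketched as a single decision function. This sketch assumes a hypothetical `VM` record with an `expires_at` timestamp and a hypothetical `lease_seconds` renewal period; it ignores the surplus-VM heuristic from the ** note and only decides "renew" or "destroy" for one VM.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class VM:
    vm_id: str
    vm_type: str
    expires_at: float  # hypothetical: expiry as seconds since the epoch

def on_vm_expired(vm: VM, queued_vm_types: Iterable[str], now: float,
                  lease_seconds: float = 3600.0) -> str:
    """Basic expiry case: renew the lease if any queued job needs this VM type,
    otherwise report that the VM should be destroyed."""
    if vm.vm_type in set(queued_vm_types):
        vm.expires_at = now + lease_seconds  # lease a new expiry time
        return "renew"
    return "destroy"

vm = VM("vm-1", "X_VM", expires_at=1000.0)
print(on_vm_expired(vm, ["X_VM", "Y_VM"], now=1000.0), vm.expires_at)  # renew 4600.0
print(on_vm_expired(VM("vm-2", "Z_VM", 1000.0), ["X_VM"], now=1000.0))  # destroy
```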

*: Note that this termination case is very basic. One open question is where the VM expiry time is set: by the user, in the job description, or by the scheduler, as a standard interval for re-checking the job queues. Another case to consider is a VM that finishes executing its job but still has a long time left before it expires. If no other jobs in the job pool require this VM, it could sit idle until it expires. There are a few potential solutions:

  1. The cloud scheduler could set a relatively short expiry time for each VM it boots. When the VM expires, the cloud scheduler re-evaluates the job pool to determine whether the VM is still required. If jobs still require the VM, the scheduler resets the VM's expiry time; if not, the cloud scheduler destroys the VM.
  2. The cloud scheduler could rely on the user to set a reasonable expiry time for his/her required VMs, with VMTime becoming a field of the requirements parameter in the job description file. The cloud scheduler would set a reasonable default if this field is not set in the job description file.
  3. The cloud scheduler could detect when a job finishes through interaction with the job scheduler. When a job_X (requiring an X_VM) finishes, the cloud scheduler would check the job pool for other jobs requiring X_VMs. If no such jobs exist, the cloud scheduler would then destroy its X_VMs.
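Options 2 and 3 can be sketched briefly, under assumptions. The `VMTime` key comes from option 2; everything else is hypothetical: `DEFAULT_VM_TIME`, the functions `vm_lifetime` and `on_job_finished`, and the idea that the scheduler tracks running VMs as a map from VM ID to VM type. For option 3, the sketch treats any job still in the job pool (idle or running) as needing its VM type, so X_VMs are destroyed only when no remaining job requires one.

```python
from typing import Collection, Dict, List, Mapping

DEFAULT_VM_TIME = 3600  # hypothetical default lifetime, in seconds

def vm_lifetime(requirements: Mapping[str, str]) -> int:
    """Option 2: use the job's VMTime requirement if set, else a default."""
    return int(requirements.get("VMTime", str(DEFAULT_VM_TIME)))

def on_job_finished(vm_type: str, pending_vm_types: Collection[str],
                    running_vms: Dict[str, str]) -> List[str]:
    """Option 3: when a job finishes, return the IDs of VMs of its type to destroy.

    `pending_vm_types` holds the VM types required by jobs still in the job
    pool (idle or running); `running_vms` maps VM ID -> VM type.
    """
    if vm_type in pending_vm_types:
        return []  # other jobs still need this VM type
    return [vm_id for vm_id, t in running_vms.items() if t == vm_type]

print(vm_lifetime({"VMTime": "7200"}))  # 7200
print(vm_lifetime({}))                  # 3600
print(on_job_finished("X_VM", ["Y_VM"], {"a": "X_VM", "b": "Y_VM", "c": "X_VM"}))  # ['a', 'c']
```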

**: For scheduling optimization, some heuristic will be applied here. Depending on the number of jobs in the queue requiring X_VMs and the number of X_VMs running on the cloud, some of the X_VMs might be destroyed. For example, if the job pool holds only two jobs requiring X_VMs and five X_VMs are running on the cloud, the cloud scheduler would destroy three of them (or some other number, depending on the heuristic developed; it might be better to leave only one X_VM running, for example).
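The arithmetic in the example above (five running X_VMs minus two queued X_VM jobs leaves three to destroy) can be written as a one-line rule. The function `surplus_vms` and its optional `max_keep` cap are hypothetical; the cap models the alternative of leaving only one X_VM running. With no queued jobs, every running VM counts as surplus, which matches the basic termination case.

```python
from typing import Optional

def surplus_vms(queued_jobs: int, running_vms: int,
                max_keep: Optional[int] = None) -> int:
    """Hypothetical heuristic: keep at most one X_VM per queued X_VM job
    (optionally capped at `max_keep`) and destroy the rest."""
    keep = queued_jobs if max_keep is None else min(queued_jobs, max_keep)
    return max(running_vms - keep, 0)

print(surplus_vms(2, 5))              # 3, the example above
print(surplus_vms(2, 5, max_keep=1))  # 4, leave only one X_VM
print(surplus_vms(7, 5))              # 0, every VM is still needed
```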

-- DuncanPenfoldBrown - 2009-06-12

Topic revision: r2 - 2009-06-12 - dpb