
Opportunistic computing

RandallSobie

Overview

  • cloud computing resources are limited by the quota of the group/project
    • the quota is often committed to the project and cannot be used by others during quiet periods
  • the goal is to use (a fraction of) the idle computing resources on a cloud (several remote clouds have expressed interest in this topic)
  • site admins provide, manually or automatically, information on the available resources (cores, RAM, disk and swap)
    • for example, a site could allow opportunistic usage of 20% of the idle resources, and if demand increases, the opportunistic usage could be scaled back immediately or gradually
  • system tested on our internal (beaver) openstack cloud (using a gradual scale-back)
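A minimal sketch of the two scale-back options, assuming the site offers a fixed percentage of its idle cores (20% as in the example above). The function names and the per-update step size `max_step` used for the gradual case are hypothetical and do not come from CSV2 or from the site.

```python
from typing import Optional


def opportunistic_target(idle_cores: int, percent: int = 20) -> int:
    """Cores we may use opportunistically: a fixed percentage of the idle cores."""
    return idle_cores * percent // 100


def next_limit(current_limit: int, target: int, max_step: Optional[int] = None) -> int:
    """Move our opportunistic core limit toward the site's new target.

    max_step=None means an immediate scale-back; otherwise the limit drops by
    at most max_step cores per update (gradual scale-back). Increases are
    always applied at once.
    """
    if target >= current_limit or max_step is None:
        return target
    return max(target, current_limit - max_step)


# Example: idle cores drop from 1000 to 200, so the target falls from 200 to 40.
limit = opportunistic_target(1000)
target = opportunistic_target(200)
print(next_limit(limit, target))  # immediate: 40
gradual = [limit]
while gradual[-1] > target:
    gradual.append(next_limit(gradual[-1], target, max_step=50))
print(gradual)  # [200, 150, 100, 50, 40]
```

With a gradual scale-back the site gets its cores back over several updates, which leaves time for running jobs to finish instead of being killed.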

Open questions

  • how to provide the opportunistic resources
    • e.g. a manual exchange of information or an automated system based on resource availability
  • when the resources are needed back, do we terminate the VMs (e.g. like EC2 spot VMs) or allow a graceful retirement (letting the jobs finish, 12-24 hours)?
  • manual information exchange will limit changes to working hours, with best effort during off-hours
    • though the cloud admins could terminate our VMs and modify our quota at any moment
  • how to track the opportunistic usage

Possible deployment

  • construct an "opportunistic-cloud" to be managed by CSV2 (keep separate from production resources)
  • first test system: a separate OpenStack project (tenant) for the HEP group with a small set of resources; the site sends us the quota for the day (file, email)
  • resources should be of a scale that would show some changes in the system
    • though we could also create such changes artificially, e.g. 10 VMs on day 1, 5 VMs on day 2

Possible implementation

  • cloud (hard) quota set to the total number of cores on the cloud (by the cloud admin)
  • we adjust the softmax depending on what we are allowed to use and what is currently in use by all cloud projects

Example:

  • currently used on the cloud:
    • by all projects: 2000 cores
    • by us: 1000 cores

  • provided by the cloud admin:
    • CPUs on cloud: 5000 (slowly variable, depending on the number of worker nodes currently online)
    • used CPUs: 2000 (highly variable, used by all groups together)
    • max use allowed by us: 75% (mostly fixed)
    • hard quota in OpenStack: >=5000 (mainly fixed)

  • what we do with it:
    • allCPUs available: 5000*75/100 = 3750
    • usedCPUs (used by others): 2000-ownCPUs = 2000-1000 = 1000
    • set softmax: allCPUs-usedCPUs = 2750 ---> that's the max we will use until the next update
      • total CPUs used in the end (once we fill our softmax): 2750 + 1000 = 3750
      • 5000-3750 = 1250 CPUs stay unused and can be used by others to start additional VMs
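The softmax update above can be written as a small function. This is only a sketch of the arithmetic in the example: the names are hypothetical, and clamping the softmax at 0 (when others already use more than allCPUs) is an added assumption not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class SoftmaxUpdate:
    all_cpus: int        # cores the cloud may be filled to: total * max_percent / 100
    used_by_others: int  # cores used by all other projects
    softmax: int         # our core limit until the next update
    kept_free: int       # cores left free for others once all_cpus is reached


def compute_softmax(cloud_cpus: int, used_total: int, used_own: int, max_percent: int) -> SoftmaxUpdate:
    all_cpus = cloud_cpus * max_percent // 100
    used_by_others = used_total - used_own
    # Assumption: clamp at 0 in case others already use more than all_cpus.
    softmax = max(0, all_cpus - used_by_others)
    return SoftmaxUpdate(all_cpus, used_by_others, softmax, cloud_cpus - all_cpus)


update = compute_softmax(cloud_cpus=5000, used_total=2000, used_own=1000, max_percent=75)
print(update)
# SoftmaxUpdate(all_cpus=3750, used_by_others=1000, softmax=2750, kept_free=1250)
```

Integer division mirrors the 5000*75/100 calculation and keeps the softmax a whole number of cores.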

  • high usage case:
    • currently used by us: 2750 cores
    • currently used in total: 4750 cores
    • allCPUs available: 3750 (same as above)
    • usedCPUs by others: 4750-2750 = 2000
    • set softmax to: 3750-2000 = 1750 ---> we scale down by at least 1000 cores
      • but never lower than our own real quota (e.g. 2800 cores on arbutus)
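A sketch of the high usage case, including the floor at our own real quota. The function names are hypothetical; `own_quota=0` stands for having no guaranteed quota on that cloud.

```python
def softmax_with_floor(cloud_cpus: int, used_total: int, used_own: int,
                       max_percent: int, own_quota: int) -> int:
    """Opportunistic softmax that never drops below our own guaranteed quota."""
    all_cpus = cloud_cpus * max_percent // 100
    used_by_others = used_total - used_own
    return max(all_cpus - used_by_others, own_quota)


def cores_to_release(used_own: int, softmax: int) -> int:
    """Cores we must free (by killing or retiring VMs) to get back under the softmax."""
    return max(0, used_own - softmax)


# High usage case: 2750 cores are ours, 4750 are in use in total.
no_floor = softmax_with_floor(5000, 4750, 2750, 75, own_quota=0)
release_no_floor = cores_to_release(2750, no_floor)
print(no_floor, release_no_floor)      # 1750 1000
with_floor = softmax_with_floor(5000, 4750, 2750, 75, own_quota=2800)
release_with_floor = cores_to_release(2750, with_floor)
print(with_floor, release_with_floor)  # 2800 0
```

With the 2800-core arbutus quota as the floor, the softmax stays at 2800 and nothing needs to be released in this example.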

  • choice of the percentage (by the cloud admin):
    • low = fast startup of VMs for others, but possibly low overall usage of the cloud
    • high = others may not be able to start VMs for a while, but overall high usage of all cloud resources
  • Open questions:
    • how fast should we scale down
      • kill VMs (instantly) vs retire (up to 24h)
    • how to get the information about the cloud usage
      • the cloud admin provides it vs. we read the OpenStack DB remotely

Opportunistic computing - Old material

DESCRIBES ONLY THE USE OF SOFTMAX FOR OPPORTUNISTIC USE BETWEEN CSV2 PROJECTS, NOT THE CLOUD-BASED OPPORTUNISTIC USAGE!

Define configurable parameters (set via the GUI):

  • Q = cloud quota (set on the cloud) of a tenant [not always the "true" quota on many clouds; we cannot boot more than the "cloud quota"]
  • S = (cores softmax) quota within the tenant resources for all VMs that run in this tenant (e.g. A plus B plus any service VMs, e.g. squids) [S has a value for A and another for B: S(A) and S(B)]

Define dynamic parameters:

  • F = (foreign cores) the number of cores within the tenant that are not used by this experiment (e.g. not used by A or by B) [F has a value for A and another for B: F(A) and F(B)]
  • #cores = number of cores used by VMs on this cloud

Current settings on otter

  • S(B)= 397 cores
  • S(A)= 357 cores

B can start a VM if #cores is less than S(B)=397, and A can start a VM if #cores is less than S(A)=357.

Example: B uses 0 cores and A uses 357 cores; then B submits lots of jobs

  • B can start VMs until #cores = S(B) or 397
  • A sees that #cores is 397, which is larger than S(A)=357, i.e. above its softmax
    • A then retires 40 cores (VMs) [#cores - S(A) = 40]
    • A's jobs eventually finish and the retired A VMs are destroyed
    • once the A VMs are destroyed, #cores is again less than S(B) and B can boot VMs
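One round of the example above can be simulated in a few lines. This is a sketch, not CSV2 code: the function names are hypothetical, VMs are assumed to have 1 core each (so B stops exactly at S(B)), and the retired A VMs are treated as destroyed immediately instead of after their jobs finish.

```python
from typing import Dict, Tuple


def can_start_vm(total_cores: int, softmax: int) -> bool:
    """A group may boot a VM while the cores used on the cloud are below its softmax."""
    return total_cores < softmax


def cores_to_retire(total_cores: int, softmax: int) -> int:
    """Cores a group must retire when the cloud-wide usage exceeds its softmax."""
    return max(0, total_cores - softmax)


def one_round(usage: Dict[str, int], softmax: Dict[str, int]) -> Tuple[int, int]:
    """B boots 1-core VMs up to S(B); A then retires what is above S(A)."""
    while can_start_vm(sum(usage.values()), softmax["B"]):
        usage["B"] += 1  # assumption: 1-core VMs
    total = sum(usage.values())
    retire = min(usage["A"], cores_to_retire(total, softmax["A"]))
    usage["A"] -= retire  # simplification: retired A VMs disappear at once
    return total, retire


usage = {"A": 357, "B": 0}
softmax = {"A": 357, "B": 397}
result = one_round(usage, softmax)
print(result)  # (397, 40)
print(usage)   # {'A': 317, 'B': 40}
print(can_start_vm(sum(usage.values()), softmax["B"]))  # True: B can boot again
```

Repeating the round shows that B keeps growing up to S(B) while A shrinks, since both compare the same cloud-wide #cores against their own softmax.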
Topic revision: r6 - 2020-09-22 - mebert
Copyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.