Difference: CloudSchedulerAdminGuide (1 vs. 9)

Revision 9 - 2015-05-15 - mhp

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"
Line: 30 to 30
 
    • cloud_admin -d cloudname
  • Either:
    • force retire all the VMs: cloud_admin -o -c cloudname -a
Added:
>
>
    • or force retire some number of VMs: cloud_admin -o -c cloudname -b [number]
 
    • or force retire as many VMs as needed by giving the VMID of each one: cloud_admin -o -c cloudname -n VMID
  • Optionally:
    • edit /etc/cloudscheduler/cloud_resources.conf to set a reduced resource usage limit
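The drain steps above can be collected into a small wrapper. This is a hedged sketch: it assumes cloud_admin is on the PATH and that the -d/-o/-c/-a/-n flags behave exactly as described in this guide; the CLOUD_ADMIN override is purely an illustration device so the commands can be dry-run with echo. (The count-based form, -o -c cloudname -b [number], is not covered by the sketch.)

```shell
# Sketch of the drain procedure described above.
# CLOUD_ADMIN can be overridden, e.g. CLOUD_ADMIN=echo for a dry run.
CLOUD_ADMIN=${CLOUD_ADMIN:-cloud_admin}

drain_cloud() {
    cloud=$1; shift
    "$CLOUD_ADMIN" -d "$cloud"              # stop new VMs from booting
    if [ $# -eq 0 ]; then
        "$CLOUD_ADMIN" -o -c "$cloud" -a    # force retire all VMs
    else
        for vmid in "$@"; do                # or just the listed VMIDs
            "$CLOUD_ADMIN" -o -c "$cloud" -n "$vmid"
        done
    fi
}
```

For example, `CLOUD_ADMIN=echo drain_cloud mycloud 101 102` prints the commands that would run, without touching any cloud.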

Revision 8 - 2015-03-27 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"
Line: 9 to 9
 

The cloud_admin tool

The cloud_admin tool is not installed into the path by default; it will be in the directory cloud_scheduler was installed from.
  • on condor.heprc.uvic.ca it's in /root/cs_update/cloud-scheduler-dev/cloud_admin
Changed:
<
<
  • ./cloud_admin --help
    to see the options available
>
>
  • cloud_admin --help
    to see the options available
 

Common Tasks

Removing or Retiring VMs

Changed:
<
<
  • To remove a VM from management by CS: ./cloud_admin -m -c cloudname -n vmid
>
>
  • To remove a VM from management by CS: cloud_admin -m -c cloudname -n vmid
 
    • This causes CS to forget about a VM and leave it alone. The VM will stay up and Condor will keep running jobs
Changed:
<
<
  • To gracefully retire a VM: ./cloud_admin -o -c cloudname -n VMID
>
>
  • To gracefully retire a VM: cloud_admin -o -c cloudname -n VMID
 

Enabling and Disabling Clouds

Changed:
<
<
  • To enable a cloud: ./cloud_admin -e cloudname
  • To disable the cloud: ./cloud_admin -d cloudname
>
>
  • To enable a cloud: cloud_admin -e cloudname
  • To disable the cloud: cloud_admin -d cloudname
 
  • To make these changes persist after a restart, modify the enabled property in /etc/cloudscheduler/cloud_resources.conf
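For the persistence step, the enabled setting lives with the rest of that cloud's settings in /etc/cloudscheduler/cloud_resources.conf. The fragment below is a hypothetical illustration only (the cloud name is a placeholder and the surrounding keys are elided; check the real file for the exact option names and syntax):

```
[mycloud]
# ...other settings for this cloud...
enabled: false   # set to true so the cloud comes back enabled after a restart
```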

Adding Resources

Line: 27 to 27
 

Draining a cloud (completely or partially)

  • Disable the cloud:
Changed:
<
<
    • ./cloud_admin -d cloudname
>
>
    • cloud_admin -d cloudname
 
  • Either:
Changed:
<
<
    • force retire all the VMs: ./cloud_admin -o -c cloudname -a
    • or force retire as many VMs as needed by giving the VMID of each one: ./cloud_admin -o -c cloudname -n VMID
>
>
    • force retire all the VMs: cloud_admin -o -c cloudname -a
    • or force retire as many VMs as needed by giving the VMID of each one: cloud_admin -o -c cloudname -n VMID
 
  • Optionally:
    • edit /etc/cloudscheduler/cloud_resources.conf to set a reduced resource usage limit
    • when the VMs have finished retiring, do service cloud_scheduler quickrestart
Line: 44 to 44
 "GridPPClouds": ["gridpp-imperial","gridpp-oxford"] }
Changed:
<
<
Use ./cloud_admin -y to show the currently loaded aliases, and ./cloud_admin -t to reload them from the file.
>
>
Use cloud_admin -y to show the currently loaded aliases, and cloud_admin -t to reload them from the file.
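Since cloud_admin -t reloads whatever is in the file, it can help to syntax-check the JSON before reloading. A minimal sketch, assuming a python interpreter is available on the host; the default path is the one given above:

```shell
# Syntax-check a cloud_alias.json file before reloading it with cloud_admin -t.
check_alias_file() {
    file=${1:-/etc/cloudscheduler/cloud_alias.json}
    py=$(command -v python3 || command -v python) || return 2
    if "$py" -c "import json,sys; json.load(open(sys.argv[1]))" "$file" 2>/dev/null; then
        echo "alias file OK: $file"
    else
        echo "alias file invalid or missing: $file" >&2
        return 1
    fi
}
```

Typical use: `check_alias_file && cloud_admin -t`, so a broken file is caught before the reload.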
 

In the event of a crash

Line: 52 to 52
 

Find out why a job is not booting a VM

  • Turn on verbose logging
Changed:
<
<
    • ./cloud_admin -l VERBOSE
>
>
    • cloud_admin -l VERBOSE
 
  • Get full cycle of the Scheduler thread logging
    • tail -f /var/log/cloudscheduler.log | grep Scheduler
  • look for messages from get_fitting_resources that indicate a resource mismatch or shortage
  • see what error responses are coming back from clouds that try to boot the VM but fail
  • Set the logging back to DEBUG
Changed:
<
<
    • ./cloud_admin -l DEBUG
>
>
    • cloud_admin -l DEBUG
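The log-inspection step can be wrapped in a small helper. A sketch, assuming the log path above; the get_fitting_resources name comes from this guide, but the exact message format around it is not specified here, so the grep patterns may need adjusting:

```shell
# Pull recent Scheduler-thread lines mentioning get_fitting_resources from the
# cloud_scheduler log. The log path can be overridden (useful for testing).
scheduler_fit_msgs() {
    log=${1:-/var/log/cloudscheduler.log}
    tail -n 1000 "$log" | grep 'Scheduler' | grep 'get_fitting_resources'
}
```

Run cloud_admin -l VERBOSE first, inspect the output of scheduler_fit_msgs for resource mismatches or shortages, then set logging back with cloud_admin -l DEBUG.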
 

Upgrading Cloud Scheduler

  • It's safer to shut down all the VMs before doing an upgrade in case the class definitions have changed and break the persistence file
Line: 67 to 67
 
  • Drain all the VMs from all the clouds
    • See above task and repeat for each cloud
  • Stubborn VMs can be killed with cloud_admin
Changed:
<
<
    • ./cloud_admin -k -c cloudname -n vmid
>
>
    • cloud_admin -k -c cloudname -n vmid
 
  • When all VMs have shut down, stop cloud_scheduler
    • service cloud_scheduler stop
  • Get the new release from github (most likely the dev branch)
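The upgrade steps above (together with the install and restart steps listed later in the full guide) can be kept as a printed checklist, so each command is reviewed before being run by hand. A sketch; the steps are taken from this guide, not verified against any particular release:

```shell
# Print the upgrade checklist; run each step manually after review.
upgrade_checklist() {
    cat <<'EOF'
# 1. drain all VMs from all clouds (see "Draining a cloud" above)
# 2. kill stubborn VMs:        cloud_admin -k -c cloudname -n vmid
# 3. stop the service:         service cloud_scheduler stop
# 4. fetch the new release from github (most likely the dev branch)
# 5. install it:               python2.7 setup.py install
# 6. start the service:        service cloud_scheduler start
# 7. run short test jobs and check VMs boot and shut down normally
EOF
}
```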

Revision 7 - 2014-02-18 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"
Added:
>
>
 

Cloud Scheduler Admin Guide

Note on the cloud_scheduler init script usage

Line: 34 to 36
 
    • when the VMs have finished retiring, do service cloud_scheduler quickrestart
  • Illustrated example Here
Added:
>
>

Cloud Aliases

The /etc/cloudscheduler/cloud_alias.json file can be used to define aliases for clouds, like this:
{
"CERNClouds": ["atlas_test","atlas_wigner","victoria_test"],
"GridPPClouds": ["gridpp-imperial","gridpp-oxford"]
}
Use ./cloud_admin -y to show the currently loaded aliases, and ./cloud_admin -t to reload them from the file.
 

In the event of a crash

  • Check /tmp/cloudscheduler.crash.log and/or post a new issue on github
  • get patch or quickfix the error based on the message in crash log and start cloud_scheduler back up: service cloud_scheduler start

Revision 6 - 2013-12-12 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"

Cloud Scheduler Admin Guide

Line: 10 to 10
 
  • ./cloud_admin --help
    to see the options available

Common Tasks

Added:
>
>

Removing or Retiring VMs

  • To remove a VM from management by CS: ./cloud_admin -m -c cloudname -n vmid
    • This causes CS to forget about a VM and leave it alone. The VM will stay up and Condor will keep running jobs
  • To gracefully retire a VM: ./cloud_admin -o -c cloudname -n VMID
 

Enabling and Disabling Clouds

  • To enable a cloud: ./cloud_admin -e cloudname
  • To disable the cloud: ./cloud_admin -d cloudname
Line: 59 to 64
 
  • Run some short test jobs to make sure VMs are booting up and shutting down normally
  • Set the Panda queues back online
Deleted:
<
<

Miscellaneous

  • To remove a VM from management by CS: ./cloud_admin -m -c cloudname -n vmid
 -- MichealPaterson - 2013-05-28

Revision 5 - 2013-12-11 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"

Cloud Scheduler Admin Guide

Note on the cloud_scheduler init script usage

Changed:
<
<
  • To avoid having VMs shutdown always use quickrestart or quickstop and not the normal restart or stop
>
>
  • To avoid having VMs shutdown, always use quickrestart or quickstop instead of the normal restart or stop
 

The cloud_admin tool

Changed:
<
<
  • cloud_admin is not installed into the path by default, it will be in the directory cloud_scheduler was installed from
    • on condor.heprc.uvic.ca it's in /root/cs_update/cloud-scheduler-dev/cloud_admin
>
>
The cloud_admin tool is not installed into the path by default; it will be in the directory cloud_scheduler was installed from.
  • on condor.heprc.uvic.ca it's in /root/cs_update/cloud-scheduler-dev/cloud_admin
 
  • ./cloud_admin --help
    to see the options available
Added:
>
>
 

Common Tasks

Added:
>
>

Enabling and Disabling Clouds

  • To enable a cloud: ./cloud_admin -e cloudname
  • To disable the cloud: ./cloud_admin -d cloudname
  • To make these changes persist after a restart, modify the enabled property in /etc/cloudscheduler/cloud_resources.conf
 

Adding Resources

Changed:
<
<
  • To add or update resources, edit /etc/cloudscheduler/cloud_resources.conf , then service cloud_scheduler quickrestart
  • To re-enable a cloud that was disabled: ./cloud_admin -e cloudname
>
>
  • To add or update resources, edit /etc/cloudscheduler/cloud_resources.conf , then service cloud_scheduler quickrestart
 
Changed:
<
<

Removing Resources (without job interupts)

  • Disable the cloud
>
>

Draining a cloud (completely or partially)

  • Disable the cloud:
 
    • ./cloud_admin -d cloudname
Changed:
<
<
  • Force Retire the number of VMs you need to reduce resources by
    • ./cloud_admin -o -c cloudname -n vmid
  • edit /etc/cloudscheduler/cloud_resources.conf to decrease the resources
  • when the VMs have finished retiring quickrestart cloud_scheduler
    • service cloud_scheduler quickrestart
>
>
  • Either:
    • force retire all the VMs: ./cloud_admin -o -c cloudname -a
    • or force retire as many VMs as needed by giving the VMID of each one: ./cloud_admin -o -c cloudname -n VMID
  • Optionally:
    • edit /etc/cloudscheduler/cloud_resources.conf to set a reduced resource usage limit
    • when the VMs have finished retiring, do service cloud_scheduler quickrestart
  • Illustrated example Here
 

In the event of a crash

Changed:
<
<
  • Check /tmp/cloudscheduler.crash.log and email admin with contents or post a new issue on github
  • get patch or quickfix the error based on the message in crash log and start cloud_scheduler back up
  • service cloud_scheduler start

Finding out why a job is not booting a VM

>
>
  • Check /tmp/cloudscheduler.crash.log and/or post a new issue on github
  • get patch or quickfix the error based on the message in crash log and start cloud_scheduler back up: service cloud_scheduler start

Find out why a job is not booting a VM

 
  • Turn on verbose logging
    • ./cloud_admin -l VERBOSE
  • Get full cycle of the Scheduler thread logging
    • tail -f /var/log/cloudscheduler.log | grep Scheduler
  • look for messages from get_fitting_resources that indicate a resource mismatch or shortage
Changed:
<
<
  • see what error responses are coming back clouds that try to boot the vm but fail
>
>
  • see what error responses are coming back from clouds that try to boot the VM but fail
 
  • Set the logging back to DEBUG
    • ./cloud_admin -l DEBUG
Changed:
<
<

Draining a cloud of VMs without killing jobs

  • Disable the cloud
    • ./cloud_admin -d cloudname
  • Force retire all the VMs
    • ./cloud_admin -o -c cloudname -a
  • Illustrated example Here
>
>
 

Upgrading Cloud Scheduler

  • It's safer to shut down all the VMs before doing an upgrade in case the class definitions have changed and break the persistence file
Changed:
<
<
  • Contact Ryan and Asoka to get the queues disabled so jobs stop coming in
>
>
  • Set the Panda queues brokeroff so jobs stop coming in
    • This needs to be done manually since the switcher doesn't know about Cloud Schedulers
 
  • Drain all the VMs from all the clouds
    • See above task and repeat for each cloud
  • Stubborn VMs can be killed with cloud_admin
Line: 53 to 57
 
  • python2.7 setup.py install
  • service cloud_scheduler start
  • Run some short test jobs to make sure VMs are booting up and shutting down normally
Changed:
<
<
  • Contact Ryan and Asoka to turn the queues back online
>
>
  • Set the Panda queues back online
 

Miscellaneous

  • To remove a VM from management by CS: ./cloud_admin -m -c cloudname -n vmid

Revision 4 - 2013-07-19 - mhp

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"

Cloud Scheduler Admin Guide

Line: 39 to 39
 
    • ./cloud_admin -d cloudname
  • Force retire all the VMs
    • ./cloud_admin -o -c cloudname -a
Added:
>
>
  • Illustrated example Here
 

Upgrading Cloud Scheduler

  • It's safer to shut down all the VMs before doing an upgrade in case the class definitions have changed and break the persistence file
  • Contact Ryan and Asoka to get the queues disabled so jobs stop coming in

Revision 3 - 2013-07-15 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"

Cloud Scheduler Admin Guide

Line: 10 to 10
 
  • ./cloud_admin --help
    to see the options available

Common Tasks

Adding Resources

Changed:
<
<
  • edit /etc/cloudscheduler/cloud_resources.conf and add / update allocations
  • service cloud_scheduler quickrestart
>
>
  • To add or update resources, edit /etc/cloudscheduler/cloud_resources.conf , then service cloud_scheduler quickrestart
  • To re-enable a cloud that was disabled: ./cloud_admin -e cloudname
 

Removing Resources (without job interupts)

  • Disable the cloud
    • ./cloud_admin -d cloudname
Line: 52 to 53
 
  • service cloud_scheduler start
  • Run some short test jobs to make sure VMs are booting up and shutting down normally
  • Contact Ryan and Asoka to turn the queues back online
Added:
>
>

Miscellaneous

  • To remove a VM from management by CS: ./cloud_admin -m -c cloudname -n vmid
  -- MichealPaterson - 2013-05-28 \ No newline at end of file

Revision 2 - 2013-06-27 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="VirtualizationProjectHome"

Cloud Scheduler Admin Guide

Line: 38 to 38
 
    • ./cloud_admin -d cloudname
  • Force retire all the VMs
    • ./cloud_admin -o -c cloudname -a
Changed:
<
<

Updgrading Cloud Scheduler

>
>

Upgrading Cloud Scheduler

 
  • It's safer to shut down all the VMs before doing an upgrade in case the class definitions have changed and break the persistence file
  • Contact Ryan and Asoka to get the queues disabled so jobs stop coming in
  • Drain all the VMs from all the clouds

Revision 1 - 2013-05-28 - mhp

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="VirtualizationProjectHome"

Cloud Scheduler Admin Guide

Note on the cloud_scheduler init script usage

  • To avoid having VMs shutdown always use quickrestart or quickstop and not the normal restart or stop

The cloud_admin tool

  • cloud_admin is not installed into the path by default, it will be in the directory cloud_scheduler was installed from
    • on condor.heprc.uvic.ca it's in /root/cs_update/cloud-scheduler-dev/cloud_admin
  • ./cloud_admin --help
    to see the options available

Common Tasks

Adding Resources

  • edit /etc/cloudscheduler/cloud_resources.conf and add / update allocations
  • service cloud_scheduler quickrestart

Removing Resources (without job interupts)

  • Disable the cloud
    • ./cloud_admin -d cloudname
  • Force Retire the number of VMs you need to reduce resources by
    • ./cloud_admin -o -c cloudname -n vmid
  • edit /etc/cloudscheduler/cloud_resources.conf to decrease the resources
  • when the VMs have finished retiring quickrestart cloud_scheduler
    • service cloud_scheduler quickrestart

In the event of a crash

  • Check /tmp/cloudscheduler.crash.log and email admin with contents or post a new issue on github
  • get patch or quickfix the error based on the message in crash log and start cloud_scheduler back up
  • service cloud_scheduler start

Finding out why a job is not booting a VM

  • Turn on verbose logging
    • ./cloud_admin -l VERBOSE
  • Get full cycle of the Scheduler thread logging
    • tail -f /var/log/cloudscheduler.log | grep Scheduler
  • look for messages from get_fitting_resources that indicate a resource mismatch or shortage
  • see what error responses are coming back clouds that try to boot the vm but fail
  • Set the logging back to DEBUG
    • ./cloud_admin -l DEBUG

Draining a cloud of VMs without killing jobs

  • Disable the cloud
    • ./cloud_admin -d cloudname
  • Force retire all the VMs
    • ./cloud_admin -o -c cloudname -a

Updgrading Cloud Scheduler

  • It's safer to shut down all the VMs before doing an upgrade in case the class definitions have changed and break the persistence file
  • Contact Ryan and Asoka to get the queues disabled so jobs stop coming in
  • Drain all the VMs from all the clouds
    • See above task and repeat for each cloud
  • Stubborn VMs can be killed with cloud_admin
    • ./cloud_admin -k -c cloudname -n vmid
  • When all VMs have shut down, stop cloud_scheduler
    • service cloud_scheduler stop
  • Get the new release from github (most likely the dev branch)
  • python2.7 setup.py install
  • service cloud_scheduler start
  • Run some short test jobs to make sure VMs are booting up and shutting down normally
  • Contact Ryan and Asoka to turn the queues back online

-- MichealPaterson - 2013-05-28

 