See also
CloudSchedulerAdminGuide.
List the status of the Belle2 Cloud Scheduler (bellecs.heprc.uvic.ca and condor.heprc.uvic.ca)
cloud_status -a
Clusters in resource pool:
Cluster: alto
ADDRESS CLOUD TYPE VM SLOTS MEMORY STORAGE HYPERVISOR ENABLED
alto.cloud.nrc.ca OpenStack 10 [224000] 2000 False
(and more clouds)
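To pick out the disabled clouds from a long listing, a small filter can help. This is a minimal sketch, assuming the output format shown above, where ENABLED is the last column and the address is the first; disabled_clouds is a hypothetical helper name, not part of Cloud Scheduler.

```shell
# disabled_clouds: read `cloud_status -a` style output on stdin and print
# the address of every cloud whose last column (assumed ENABLED) is False.
disabled_clouds() {
  awk '$NF == "False" { print $1 }'
}
```

Usage would be along the lines of: cloud_status -a | disabled_clouds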
To enable (-e) or disable (-d) clouds
cloud_admin -d alto
cloud_admin -e alto
Check the status of a single job
condor_q -analyze [job.id]
To retire VM i-0001ab95, which is running on cernopenstack and is already registered in condor, use:
cloud_admin -o -c cernopenstack -n i-0001ab95
To kill a VM which does not show up in condor yet use -k instead of -o, for example:
cloud_admin -k -c cernopenstack -n i-0001ab95
Lowering the VM quota of a cloud with the following command causes any VMs above the new quota to be moved into a separate list and retired.
cloud_admin -c mouse -v 1
List the cloud aliases and the active clouds within those aliases (for condor.heprc.uvic.ca useful for the IAAS queue)
cloud_admin -y
cat /etc/cloudscheduler/cloud_alias.json
To change the number of VM slots, edit the CLOUD entry in
/etc/cloudscheduler/cloud_resources.conf and then run a quickrestart
(this will change the cloud statuses back to their default settings)
vi /etc/cloudscheduler/cloud_resources.conf
service cloud_scheduler quickrestart
To modify the VM-type or VM-flavour, edit the following file and restart Cloud Scheduler
vi /etc/cloudscheduler/cloud_scheduler.conf
When adding or removing a cloud, one needs to edit the following file:
vi /etc/cloudscheduler/cloud_aliases.json
To retire all the VMs in a cloud (they must be registered with HTCondor)
cloud_admin -d <cloudname>
cloud_admin -o -c <cloudname> -a
To kill all the VMs in a cloud (get rid of VMs not registered with HTCondor)
cloud_admin -k -c <cloudname> -a
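Because these commands act on every VM in a cloud, it can be useful to print them for review before running anything. The following is a minimal sketch under that assumption; empty_cloud_cmds is a hypothetical helper that only echoes the cloud_admin invocations listed above and never executes them.

```shell
# empty_cloud_cmds: print (do not run) the cloud_admin commands for one cloud.
#   retire: disable the cloud, then retire all VMs registered with HTCondor
#   kill:   kill all VMs in the cloud
empty_cloud_cmds() {
  mode="$1"
  cloud="$2"
  if [ -z "$cloud" ]; then
    echo "usage: empty_cloud_cmds retire|kill <cloudname>" >&2
    return 1
  fi
  case "$mode" in
    retire)
      printf 'cloud_admin -d %s\n' "$cloud"
      printf 'cloud_admin -o -c %s -a\n' "$cloud"
      ;;
    kill)
      printf 'cloud_admin -k -c %s -a\n' "$cloud"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

For example, empty_cloud_cmds retire alto prints the disable and retire commands for the alto cloud so they can be checked and then pasted into the shell.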
List the status of the CS threads
cloud_status -x
Thread Heart beat times:
Scheduler Thread(45): 23
Cleanup Thread(90): 74
VMPoller Thread(155): 49
JobPoller Thread(45): 21
MachinePoller Thread(45): 35
If the numbers are large, kill the Cloud Scheduler processes and start the service again:
ps aux | grep cloud_scheduler
kill -9 <CS process #>
service cloud_scheduler start
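What counts as "large" is not defined above. One possible rule of thumb, stated here purely as an assumption, is to treat the number in parentheses as the expected interval and the number after the colon as the time since the last heartbeat, and to flag any thread where the second exceeds the first. The hypothetical helper below sketches that check on the output format shown above.

```shell
# stale_threads: read `cloud_status -x` style output on stdin and print the
# name of each thread whose heartbeat value exceeds the value in parentheses.
# Assumption: "Name Thread(limit): age", with both numbers in the same unit.
stale_threads() {
  awk -F'[():]' '/Thread\(/ {
    name = $1; limit = $2 + 0; age = $4 + 0
    sub(/ +$/, "", name)          # trim trailing spaces from the name
    if (age > limit) print name
  }'
}
```

Usage would be along the lines of: cloud_status -x | stale_threads. Empty output means no thread exceeded its value in parentheses.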
To see which VMs have attached to condor
condor_status -m
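To hold all idle jobs in the condor queue and then release them again (idle jobs have JobStatus==1; once held they have JobStatus==5, so the release matches on 5 and releases every held job in the queue):
condor_hold -const 'JobStatus==1'
condor_release -const 'JobStatus==5'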
To edit the RAM size or disk space requirement for Belle2 jobs:
vi /etc/condor/config.d/partition
JOB_DEFAULT_REQUESTMEMORY = 2000
JOB_DEFAULT_REQUESTDISK = 4000000
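The two values use different units: in HTCondor, JOB_DEFAULT_REQUESTMEMORY is given in MiB and JOB_DEFAULT_REQUESTDISK in KiB. As a quick sanity check after editing, a sketch like the following prints both settings with their units; show_job_defaults is a hypothetical helper and assumes the "NAME = value" layout shown above.

```shell
# show_job_defaults: print the default memory and disk requests found in a
# condor config file (path given as $1), with units spelled out.
show_job_defaults() {
  LC_ALL=C awk -F'[ =]+' '
    /^JOB_DEFAULT_REQUESTMEMORY/ { printf "memory: %d MiB\n", $2 }
    /^JOB_DEFAULT_REQUESTDISK/   { printf "disk: %d KiB (%.1f GiB)\n", $2, $2 / 1048576 }
  ' "$1"
}
```

For example: show_job_defaults /etc/condor/config.d/partition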
Also check the logs in
/var/log/cloudscheduler.log
/var/log/condor/MatchLog
/var/log/condor/ShadowLog
/opt/dirac/runit/WorkloadManagement/SiteDirectorUVic/log/current
To find out where the job is running
grep "match (" /var/log/condor/SchedLog
To get the plot of the EC2 spot price (c3.4xlarge)
https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#SpotInstances:
To list all the image names and their ami values on CERN (on Belle-CS)
/root/cern_ec2_ami.py
--
RandallSobie - 2014-03-18