
See also CloudSchedulerAdminGuide.

CloudScheduler commands

List the status of the Belle2 Cloud Scheduler (bellecs.heprc.uvic.ca and condor.heprc.uvic.ca)

cloud_status -a

Clusters in resource pool:
Cluster: alto
ADDRESS                    CLOUD TYPE       VM SLOTS    MEMORY     STORAGE    HYPERVISOR ENABLED
alto.cloud.nrc.ca          OpenStack        10          [224000]   2000                  False
(and more clouds)

To enable (-e) or disable (-d) clouds

cloud_admin -e alto
cloud_admin -d alto

Check the status of a single job

condor_q -analyze <job_id>

To retire VM i-0001ab95, running on cernopenstack and already registered with condor, use:

cloud_admin -o -c cernopenstack -n i-0001ab95

To kill a VM which does not yet show up in condor, use -k instead of -o, for example:

cloud_admin -k -c cernopenstack -n i-0001ab95

Adjusting a cloud's quota with the following command will cause any extra VMs to be moved into a separate list and retired.

cloud_admin -c mouse -v 1

List the cloud aliases and the active clouds within those aliases (useful on condor.heprc.uvic.ca for the IAAS queue)

cloud_admin -y
cat /etc/cloudscheduler/cloud_alias.json
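
The alias file maps an alias name to the clouds behind it. As an illustrative sketch (the JSON layout shown here is an assumption, not the exact schema used on condor.heprc.uvic.ca), one can pretty-print each alias and its member clouds with python3:

```shell
# Sketch only: the layout of cloud_alias.json is assumed here to be
# alias-name -> list of member clouds; check the real file first.
cat > /tmp/cloud_alias_example.json <<'EOF'
{"IAAS": ["alto", "cernopenstack"]}
EOF

# Print each alias and its member clouds on one line.
python3 -c '
import json
with open("/tmp/cloud_alias_example.json") as f:
    aliases = json.load(f)
for name, clouds in aliases.items():
    print(name + ": " + ", ".join(clouds))
'
```

On the real host, point the script at /etc/cloudscheduler/cloud_alias.json instead of the sample file.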

To change the number of VM slots, edit the CLOUD entry in /etc/cloudscheduler/cloud_resources.conf and then execute a quick restart (this will change the cloud statuses back to their default settings)

vi /etc/cloudscheduler/cloud_resources.conf
service cloud_scheduler quickrestart
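
For reference, a CLOUD entry in cloud_resources.conf looks roughly like the following; the section name is the cloud name and the keys mirror the columns of cloud_status -a. The exact key names below are illustrative and should be checked against the existing entries in the file:

```
[alto]
host: alto.cloud.nrc.ca
cloud_type: OpenStack
vm_slots: 10
memory: 224000
storage: 2000
enabled: false
```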

To modify the VM type or VM flavour, edit the following file and restart CloudScheduler

vi /etc/cloudscheduler/cloud_scheduler.conf

When adding or removing a cloud, one needs to edit the following file:

vi /etc/cloudscheduler/cloud_aliases.json

To retire all the VMs in a cloud (they must be registered with HTCondor)

cloud_admin -d <cloudname>
cloud_admin -o -c <cloudname> -a

To kill all the VMs in a cloud (get rid of VMs not registered with HTCondor)

cloud_admin -k -c <cloudname> -a

List the status of the CS threads

cloud_status -x

Thread Heart beat times:
   Scheduler Thread(45): 23
   Cleanup Thread(90): 74
   VMPoller Thread(155): 49
   JobPoller Thread(45): 21
   MachinePoller Thread(45): 35
If the heartbeat times are large, the threads have stalled; kill the cloud_scheduler processes and restart the service:
ps aux | grep cloud_scheduler
kill -9 <CS process #>
service cloud_scheduler start
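
A small sketch of the PID-finding step above: the [c] bracket in the grep pattern keeps grep from matching its own command line, so only real cloud_scheduler processes are listed.

```shell
# Find cloud_scheduler PIDs; the [c] trick stops grep from matching itself.
pids=$(ps aux | grep '[c]loud_scheduler' | awk '{print $2}')
echo "cloud_scheduler PIDs: $pids"
# In real use, follow with: kill -9 $pids && service cloud_scheduler start
```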

To see which VMs are attached to condor

condor_status -m

To hold all idle jobs and then release the idle jobs in the condor queue:

condor_hold -const 'JobStatus==1'
condor_release -const 'JobStatus==1'

To edit the RAM size or disk space requirement for Belle2 jobs:

vi /etc/condor/config.d/partition
JOB_DEFAULT_REQUESTMEMORY = 2000
JOB_DEFAULT_REQUESTDISK = 4000000

Also check the logs in

/var/log/cloudscheduler.log 
/var/log/condor/MatchLog
/var/log/condor/ShadowLog
/opt/dirac/runit/WorkloadManagement/SiteDirectorUVic/log/current

To find out where the job is running

grep "match (" /var/log/condor/SchedLog

To get the plot of the EC2 spot price (c3.4xlarge)

https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#SpotInstances:

To list all the image names and their ami values on CERN (on Belle-CS)

/root/cern_ec2_ami.py

-- RandallSobie - 2014-03-18

Topic revision: r20 - 2015-12-16 - rsobie
 