-- ColinLeavettBrown - 2013-04-25

NEP52 Batch Services for RPI

Introduction

NEP52 provides a powerful, scalable batch processing capability to the Research Platform Initiative (RPI). This capability is delivered through the following Virtual Machine (VM) images:

  • NEP52-cloud-scheduler - This VM hosts Cloud Scheduler, a service to auto-provision VMs, together with its HTCondor batch job scheduling environment. In addition, this node provides user login capabilities, allowing users to submit their workloads to the scheduler for execution and to monitor jobs and retrieve their results.
  • NEP52-cvmfs-server - Provides a software distribution appliance VM that can host and distribute software for multiple VMs and VM types. The VM provided contains only one simple demonstration application and should be considered a template for building efficient, project specific software repositories. Using this server in a project can greatly reduce image sizes and improve image and software propagation efficiency.
  • NEP52-batch-cvmfs-client - This VM provides a minimal Scientific Linux 6.3 kernel installation for both interactive and batch processing. It has been configured to access software from a CVMFS server and, if instantiated by Cloud Scheduler, to register with an HTCondor batch scheduler and run batch jobs.

Procedures:

The following procedures are designed to demonstrate the capabilities of the NEP52 batch processing services. By executing these procedures, you should learn how to utilize these services to process your own batch workload. The procedures will accomplish the following:

  1. Interactively run the demonstration application, allowing you to become familiar with OpenStack services, OpenStack networking, and the CVMFS software distribution appliance (procedure #1).
  2. Customize the NEP52-cvmfs-server image to create a CVMFS server of your own (procedure #2).
  3. Customize the NEP52-batch-cvmfs-client image to create a VM to run a demonstration job using your CVMFS server (procedure #3).
  4. Customize the NEP52-cloud-scheduler image to create a user login ("head") node and scheduling environment to run your batch workloads on the DAIR clouds (procedure #4).
  5. Run a demonstration batch job, monitor its progress, and check its output (procedure #5).

1. Running the "Hello" demonstration application from the software repository

  1. Launch an instance of NEP52-cvmfs-server. If you are able (i.e. it is not already being used by someone else), set the instance name to "cvmfs-server". The NEP52-batch-cvmfs-client is already configured to communicate with a server named "cvmfs-server", so you will not have to perform step 4 if you choose this name. Also, though it is not necessary for this procedure, assigning a "keypair" and "floating-ip" to the server will allow you to re-use it in procedure #2.
  2. Launch an instance of NEP52-batch-cvmfs-client, assigning a "keypair" and a floating IP.
  3. When your instance of the NEP52-batch-cvmfs-client becomes "active", log in as root (i.e. "ssh -i <private-key> root@<floating-ip>").
  4. If you were unable to choose the server name of "cvmfs-server", reconfigure the client to use the name you chose for the server (example commands follow this procedure):
    • In the domain configuration file "/etc/cvmfs/domain.d/cvmfs.server", change the server name from "cvmfs-server" in the URL on line #18 to the name you chose in step 1.
    • In the configuration file "/etc/cvmfs/config.d/dair.cvmfs.server.conf", change the server name from "cvmfs-server" in the URL on line #1 to the name you chose in step 1.
    • Activate the new configuration with the command "service cvmfs restartautofs".
  5. Issue the command "/cvmfs/dair.cvmfs.server/Hello".
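
The reconfiguration in step 4 can also be scripted. The sketch below is illustrative only: it assumes you named your server instance "my-cvmfs" (a placeholder) and that "cvmfs-server" is the only text needing replacement in each file; verify the resulting URLs after editing.

    # Point the client at a server named "my-cvmfs" (placeholder name):
    sed -i 's/cvmfs-server/my-cvmfs/' /etc/cvmfs/domain.d/cvmfs.server
    sed -i 's/cvmfs-server/my-cvmfs/' /etc/cvmfs/config.d/dair.cvmfs.server.conf
    service cvmfs restartautofs

    # Run the demonstration application from the repository (step 5):
    /cvmfs/dair.cvmfs.server/Hello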

2. Creating a customized software repository

  1. If you are able to re-use the NEP52-cvmfs-server from procedure #1, log in as root (i.e. "ssh -i <private-key> root@<floating-ip>"). Otherwise, launch an instance of NEP52-cvmfs-server, assigning a "keypair" and a floating IP and, when "active", log in as root.
  2. Modify the content of the "/cvmfs/dair.cvmfs.server/" directory. As distributed, this directory contains the "empty" placeholder and the "Hello" script. The content of this directory, including any directory subtree, will be distributed to requesting clients once the repository has been signed and published. As a demonstration, copy "Hello" to "Goodbye" and modify the echoed text in "Goodbye" as desired (see the sketch following this procedure).
  3. Sign and publish the repository by issuing the following commands:
    • chown -R cvmfs.cvmfs /cvmfs/dair.cvmfs.server
    • cvmfs-sync
    • cvmfs_server publish
  4. Use the OpenStack dashboard to take a snapshot of your CVMFS server.
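
As a concrete illustration of steps 2 and 3, the sketch below assumes "Hello" is a simple shell script that echoes a greeting; the "Goodbye" name and the replacement text are arbitrary choices.

    # Create a second demonstration application from the existing one:
    cp /cvmfs/dair.cvmfs.server/Hello /cvmfs/dair.cvmfs.server/Goodbye
    sed -i 's/Hello/Goodbye/g' /cvmfs/dair.cvmfs.server/Goodbye

    # Sign and publish so that clients can see the new content (step 3):
    chown -R cvmfs.cvmfs /cvmfs/dair.cvmfs.server
    cvmfs-sync
    cvmfs_server publish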

3. Creating a customized client to use a customized software repository

  1. Launch an instance of your CVMFS server snapshot created in procedure #2. The name you choose for this instance will be used in subsequent steps of this procedure.
  2. If you are able to re-use the NEP52-batch-cvmfs-client from procedure #1, log in as root (i.e. "ssh -i <private-key> root@<floating-ip>"). Otherwise, launch an instance of NEP52-batch-cvmfs-client, assigning a "keypair" and a floating IP and, when "active", log in as root.
  3. Modify the CVMFS client configuration as follows:
    • In the domain configuration file "/etc/cvmfs/domain.d/cvmfs.server", change the server name (originally "cvmfs-server") in the URL on line #18 to the name you chose in step 1.
    • In the configuration file "/etc/cvmfs/config.d/dair.cvmfs.server.conf", change the server name (originally "cvmfs-server") in the URL on line #1 to the name you chose in step 1.
    • Activate the new configuration with the command "service cvmfs restartautofs".
  4. Check the functionality of your client/server (see the sketch following this procedure):
    • Use standard Linux commands to view the content of your software directory, e.g. "ls -l /cvmfs/dair.cvmfs.server/*".
    • Execute applications from your software directory.
  5. Save a copy of your client:
    • Change the image name within the "/.image.metadata" file to something unique.
    • Delete the network-related UDEV rules created by this kernel for each unique MAC address, i.e. "sed -i '/ATTR{address}/ D' /etc/udev/rules.d/70-persistent-net.rules".
    • Use the OpenStack dashboard to take a snapshot of your batch client.
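
A minimal sketch of steps 4 and 5 follows; "Goodbye" is the application published in procedure #2, "my-batch-client" is an illustrative image name, and any editor can be used on "/.image.metadata".

    # Verify that the client can see and run software from your repository:
    ls -l /cvmfs/dair.cvmfs.server/*
    /cvmfs/dair.cvmfs.server/Goodbye

    # Prepare the instance for a snapshot: give it a unique image name and
    # remove the persistent network rules.
    vi /.image.metadata
    sed -i '/ATTR{address}/ D' /etc/udev/rules.d/70-persistent-net.rules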

4. Configuring and starting the Cloud Scheduler service

  1. Launch an instance of NEP52-cloud-scheduler, assigning a "keypair" and a floating IP.
  2. When your instance of the NEP52-cloud-scheduler becomes "active", log in as root (i.e. "ssh -i <private-key> root@<floating-ip>").
  3. Configure the cloud resources that you want to use. The configuration file, located at "/etc/cloudscheduler/cloud_resources.conf", documents all cloud resource parameters and contains template definitions for the Alberta and Quebec DAIR clouds at the bottom of the file. The following items within each template should be modified (a filled-in example follows this procedure):
    • Copy your EC2 credentials, "key_name", "access_key_id", and "secret_key_id", into the places indicated.
    • Review/adjust the resources, "vm_slots", "cpu_cores", "storage", and "memory" to be used on the cloud.
    • Be sure to set "enabled: true" for the appropriate clouds.
  4. Start the Cloud Scheduler service, i.e. "service cloud_scheduler start".
  5. Add Cloud Scheduler to the list of services to be started automatically at boot, i.e. "chkconfig cloud_scheduler on".
  6. Save a copy of your Cloud Scheduler:
    • Prior to taking your snapshot, you may wish to review procedure #5, which also makes customizations (specifically steps #5.3 and #5.4) to the Cloud Scheduler image.
    • Delete the network-related UDEV rules created by this kernel for each unique MAC address, i.e. "sed -i '/ATTR{address}/ D' /etc/udev/rules.d/70-persistent-net.rules".
    • Use the OpenStack dashboard to take a snapshot of your Cloud Scheduler.
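
To illustrate step 3, a filled-in template entry might look like the fragment below. The section name and resource values are placeholders; keep the other options already present in the distributed file and substitute your own EC2 credentials for the angle-bracketed values.

    # Placeholder cloud definition; only the parameters named in step 3 are shown.
    [dair-alberta]
    key_name: <key-name>
    access_key_id: <access-key-id>
    secret_key_id: <secret-key-id>
    vm_slots: 4
    cpu_cores: 1
    storage: 10
    memory: 2048
    enabled: true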

5. Running a batch job

For this procedure, we will assume you have a running Cloud Scheduler (procedure #4), a running custom software repository (procedure #2), and that you have a customized client (procedure #3).
  1. Log in to the Cloud Scheduler, i.e. "ssh -i <private-key> root@<floating-ip>".
  2. Switch to the sysadmin account, i.e. "su - sysadmin". This unprivileged account has password-less sudo access and contains a template job description file and a simple job script.
  3. Determine the AMI ID of your customized client:
    • Copy your EC2 credentials (access key ID and secret key ID) into the places indicated within the /home/sysadmin/.ec2/ec2_credentials personal configuration file.
    • Issue "list_ami" to list all the images/AMI IDs that you may access; AMI IDs have the format "ami-xxxxxxxx" where "xxxxxxxx" are hexadecimal digits.
  4. Finalize the demonstration "/home/sysadmin/demo.job" job description file (see the sketch following this procedure):
    • Change "<image-name>" to match the unique name you set within your batch client's /.image.metadata file (see procedure #3.5).
    • Change "<ami-id>" to the AMI ID of your client image.
  5. Submit the job for execution, i.e. "condor_submit demo.job". The results of the job will be returned in the files "demo.log", "demo.out", and "demo.error" upon job completion.
  6. The progress of the job can be monitored with the "condor_q" and "cloud_status" commands, e.g. try "watch 'cloud_status -t; cloud_status -m; condor_q'" - use "Control-C" to exit the "watch" command. Cloud Scheduler has several polling cycles, so it can take several minutes before the VM starts and one or two more before the job runs. If your job won't run, the Cloud Scheduler log at /var/log/cloudscheduler.log probably contains the reason why.
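
The sketch below ties steps 4 to 6 together. It assumes the illustrative image name "my-batch-client" from procedure #3 and a placeholder AMI ID of "ami-00000001"; substitute the values reported by "list_ami" and your own image metadata, or simply edit demo.job by hand.

    # Fill in the two placeholders in the template job description file:
    sed -i 's/<image-name>/my-batch-client/; s/<ami-id>/ami-00000001/' /home/sysadmin/demo.job

    # Submit the job and watch the VM and job progress (Control-C exits watch):
    condor_submit demo.job
    watch 'cloud_status -t; cloud_status -m; condor_q'

    # When the job completes, inspect the returned output files:
    cat demo.out demo.error demo.log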