
NEP52 Cloud Computing Batch Service

Overview

If you are already familiar with batch computing and just want to start running your jobs on VMs, skip ahead to the CloudScheduler Test Drive section below.

The Cloud Computing Batch Service provides the CloudScheduler tool, which manages Virtual Machines (VMs) on Infrastructure-as-a-Service (IaaS) clouds in order to run batch computing jobs. With CloudScheduler, you prepare batch jobs by selecting the VM images, applications, data, and parameters needed to process your workload, and then submit these selections to an HTCondor batch queue. CloudScheduler automatically starts the VMs required to process the jobs in the HTCondor batch queue on any of the available clouds. For people already familiar with classic batch computing, the process will feel very familiar.

Here's how it works from the user Jane's perspective:

  1. Jane prepares a VM image loaded with the software she needs for processing, then uploads it to an image repository. This could also have been done previously by one of her colleagues, or she can pick a pre-built image (as is the case in the following tutorial).
  2. Jane submits a bunch of processing jobs to a Condor pool. In the Condor jobs, she specifies regular Condor parameters, but also specifies the VM image that she would like her job to run on (a sketch of such a job description follows this list).
  3. Jane then waits for her jobs to complete.
  4. Jane gets her results.
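
A CloudScheduler job description is an ordinary HTCondor submit file with a few extra attributes naming the VM image to run on. The sketch below is hypothetical: the file names and attribute values are placeholders, and the exact CloudScheduler attribute names should be taken from the demo-1.job file shipped with the CloudScheduler image used later in this tutorial.

%STARTCONSOLE%
# Hypothetical submit file; see demo-1.job on the CloudScheduler image for
# the exact attribute names CloudScheduler expects.
Universe     = vanilla
Executable   = process_data.sh
Output       = process_data.out
Error        = process_data.err
Log          = process_data.log
# Illustrative CloudScheduler attributes selecting the VM image for this job:
Requirements = VMType =?= "jane-analysis-vm"
+VMType      = "jane-analysis-vm"
+VMAMI       = "ami-00000001"
Queue
%ENDCONSOLE%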

rpi_cloud_architecture.png

The following tutorial will walk you through the steps necessary to run your first job on the cloud using the DAIR cloud. Access to additional clouds is all that is required to run your jobs distributed across multiple cloud sites. We start by assuming that you have access to the DAIR cloud.

CloudScheduler Test Drive

You can use the software provided in this RPI project to easily run your batch jobs on the DAIR cloud. The following examples will allow you to test drive this functionality quickly. In summary:

  • "Running your first batch job" will have you launch an instance of the CloudScheduler image (NEP52-cloud-scheduler), configure and start CloudScheduler, and submit a batch job. The batch job will trigger CloudScheduler to boot a VM on the DAIR OpenStack Cloud automatically, the job will run, and you can monitor its progress and check the job output. At the end of the job, when there are no more jobs in the queue, CloudScheduler will automatically remove idle batch VMs.
  • " Running a batch job which uses the SharedSoftware Repository service" will have you launch an instance of the CVMFS server image (NEP52-cvmfs-server), submit a batch job, and check the output of the distributed application.

In order to try the CloudScheduler Test Drive, you will need the following:

  • A DAIR login ID with a large enough quota to run the three concurrent demonstration instances. If you do not have a DAIR login ID, you may request an account by sending an email to dair.admin@canarie.ca.
  • To create your own keypair and save the .pem file locally (see the OpenStack dashboard documentation).
  • To retrieve your EC2_ACCESS_KEY and EC2_SECRET_KEY from the OpenStack dashboard.
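
One way to retrieve these keys is the dashboard's EC2 credentials download; a minimal sketch, assuming the dashboard offers a "Download EC2 Credentials" bundle named ec2rc.zip (the menu location varies between OpenStack releases):

%STARTCONSOLE%
# Unpack the downloaded bundle, load it, and confirm the keys are set.
unzip ec2rc.zip
source ec2rc.sh
echo $EC2_ACCESS_KEY
echo $EC2_SECRET_KEY
%ENDCONSOLE%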

Preparation

The NEP52 Batch Service and the related NEP52 Shared Software Repository service make use of the network and require specific ports to be open. These ports must be added to your OpenStack default security group: log into the OpenStack dashboard, select the "Access & Security" tab, and click "Edit Rules" beside the default security group. Use the "Add Rule" dialog at the bottom of the form and ensure that all the ports shown in the figure below are included before proceeding with the test drive:

DefaultSG.png
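
If you prefer the command line, the same rules can be added with the legacy nova client from a workstation with your RC file sourced. A sketch, using port 22 as an example; repeat the command for each port shown in the figure:

%STARTCONSOLE%
# Open TCP port 22 to all addresses in the default security group.
nova secgroup-add-rule default tcp 22 22 0.0.0.0/0
%ENDCONSOLE%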

Running your first batch job

Step 1: Log into DAIR and boot a CloudScheduler instance

Log in to the DAIR OpenStack Dashboard: https://nova-ab.dair-atir.canarie.ca . Select the Alberta region. Refer to the OpenStack documentation for the details of booting and managing VMs via the dashboard.

Go to the 'Images & Snapshots' tab on the left of the page, then click the 'Launch' button next to the NEP52-cloud-scheduler image.

Fill in the form to match the screen shot below, substituting your username where you see the string "hepnet".

launch.png

Now select the SSH key to associate with the instance so that you can log into it. Click the 'Access & Security' tab, pick your key, click 'Launch' (see the screen shot below), and wait for the instance to become active.

select_key.png
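
For reference, the same launch can be done with the nova command-line client from a workstation with your RC file sourced. A sketch only: the m1.small flavor and the MyKey keypair name are assumptions, so substitute whatever matches your quota and keys:

%STARTCONSOLE%
nova boot --image NEP52-cloud-scheduler --flavor m1.small \
          --key-name MyKey {username}-cloud-scheduler
%ENDCONSOLE%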

Step 2: Log into the CloudScheduler instance and configure it

Now associate a floating IP with the machine. Click the 'Instances' tab on the left. From the "Actions" menu beside your newly started CloudScheduler instance, choose "Associate Floating IP", complete the dialog, and click "Associate".
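
The floating IP steps can also be done with the nova client; a minimal sketch, assuming the client is installed, your RC file is sourced, and the instance name is the one you chose at launch:

%STARTCONSOLE%
# Allocate a floating IP from the pool, then attach it to the instance.
nova floating-ip-create
nova add-floating-ip {username}-cloud-scheduler 208.75.74.80
%ENDCONSOLE%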

Now ssh into the instance as root (you can find its IP address on the dashboard):

%STARTCONSOLE% ssh -i ~/.ssh/MyKey.pem root@208.75.74.80 %ENDCONSOLE%

Use your favourite editor (e.g. nano, vi, or vim) to edit the CloudScheduler configuration file, replacing the placeholders "{keypair_name}", "{EC2_ACCESS_KEY}", and "{EC2_SECRET_KEY}" with your DAIR EC2 credentials for both the Alberta and Quebec DAIR clouds. Then start the CloudScheduler service:

%STARTCONSOLE%
vi /etc/cloudscheduler/cloud_resources.conf
service cloud_scheduler start
%ENDCONSOLE%

If you don't have your credentials, follow this video to see how to retrieve them. Your credentials will be used by CloudScheduler to boot VMs on your behalf.
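
If you prefer to script the substitution rather than edit by hand, sed can replace the three placeholders in one pass. A sketch; the values shown stand in for your own keypair name and keys:

%STARTCONSOLE%
sed -i -e 's/{keypair_name}/MyKey/g' \
       -e 's/{EC2_ACCESS_KEY}/YOUR_ACCESS_KEY/g' \
       -e 's/{EC2_SECRET_KEY}/YOUR_SECRET_KEY/g' \
       /etc/cloudscheduler/cloud_resources.conf
%ENDCONSOLE%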

Step 3: Run a job and be amazed

Switch to the guest user on the VM and submit the first demonstration job, which calculates pi to 1000 decimal places. You can then see what's happening with cloud_status and condor_q, or you can issue these commands periodically through "watch" to monitor the job's progress:

%STARTCONSOLE%
su - guest
condor_submit demo-1.job
watch 'cloud_status -m; condor_status; condor_q'
%ENDCONSOLE%

When the job completes, it disappears from the queue. The primary output of the job will be contained in the file 'demo-1.out', errors will be reported in 'demo-1.err', and the HTCondor job log is saved in 'demo-1.log'. All of these file names are user-defined in the job description file 'demo-1.job'.
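
If you would rather block until the job finishes than poll with watch, HTCondor's condor_wait utility can wait on the job log; a minimal sketch, assuming condor_wait is present on the image:

%STARTCONSOLE%
# Blocks until the events in demo-1.log show the job has completed.
condor_wait demo-1.log
%ENDCONSOLE%

Either way, once the job has finished you can inspect the output: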

%STARTCONSOLE% cat demo-1.out %ENDCONSOLE%

You have just run a demonstration job on a dynamically created Virtual Machine.

Running a batch job which uses the Shared Software Repository service

We provide a VM appliance preconfigured with CVMFS, which allows you to share your software with multiple running VMs.

CVMFS is a read-only network file system designed for distributing software to VMs. It is a secure and very fast way to mount a POSIX network file system that can be shared among hundreds of running VMs.
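
On a client VM whose CVMFS configuration points at the server, you can verify the mount from the shell. A minimal sketch; the repository name dair.cvmfs.server is the one used by the demonstration jobs below:

%STARTCONSOLE%
# Probe the configured repositories, then list the repository through autofs.
cvmfs_config probe
ls -l /cvmfs/dair.cvmfs.server/
%ENDCONSOLE%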

Step 1:

Using the OpenStack dashboard and the same launch procedure as for the CloudScheduler image, launch an instance of NEP52-cvmfs-server. You must set the instance name to "{username}-cvmfs" (replacing "{username}" with your own username). It is always a good idea to assign your keypair to the instance so that you can log into it if the need arises.

Step 2:

If you are not already logged into the CloudScheduler VM, login and switch to the guest account:

%STARTCONSOLE%
ssh -i ~/.ssh/MyKey.pem root@208.75.74.80
su - guest
%ENDCONSOLE%

Edit the second demonstration job description file, and replace the string "{username}" with your username.

%STARTCONSOLE% vi demo-2.job %ENDCONSOLE%

The line you must change looks like this:

%STARTCONSOLE% Arguments = {username} %ENDCONSOLE%
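
If you prefer a one-liner to the editor, sed can make the substitution; a sketch, where "jane" stands in for your own username:

%STARTCONSOLE%
sed -i 's/{username}/jane/' demo-2.job
%ENDCONSOLE%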

Now submit the job and watch it like we did before:

%STARTCONSOLE%
condor_submit demo-2.job
watch 'cloud_status -m; condor_q'
%ENDCONSOLE%

Once the job finishes you should see something like this in the file "demo-2.out":

%STARTCONSOLE%
cat demo-2.out
Job started at Tue May 28 15:40:22 PDT 2013
=> demo-2.sh <=
Simple script for testing the default CVMFS appliance.

Shutting down CernVM-FS:                                   [  OK  ]
Stopping automount:                                        [  OK  ]
Starting automount:                                        [  OK  ]
Starting CernVM-FS:                                        [  OK  ]

-rwxr-xr-x 1 cvmfs cvmfs 110 Mar 28 16:00 /cvmfs/dair.cvmfs.server/Hello
-rw-r--r-- 1 cvmfs cvmfs  47 Mar 28 16:00 /cvmfs/dair.cvmfs.server/empty

Hello! You have successfully connected to the skeleton CVMFS server and run its software.

Job finished at Tue May 28 15:40:27 PDT 2013
%ENDCONSOLE%

Customizing the Shared Software Repository server to host your applications

In order to make the CVMFS server really useful to you, you will need to install your application software within its repository. Modifying the content of the software repository is outside the scope of this document, but it is covered by the documentation for the "NEP52 Shared Software Repository" service.

Running a batch job to exercise the modifications to your Shared Software Repository

If you have followed the documentation for "NEP52 Shared Software Repository" and have created the sample "Goodbye" application, then you may want to run the third demonstration job to exercise your CVMFS modifications in the batch environment. The procedure is identical to the one given for the second demonstration job above, substituting "demo-3" for "demo-2" wherever it appears; a condensed version follows.
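
In condensed form, assuming you are logged in as the guest user on the CloudScheduler VM:

%STARTCONSOLE%
vi demo-3.job                          # replace {username} with your username
condor_submit demo-3.job
watch 'cloud_status -m; condor_q'
cat demo-3.out
%ENDCONSOLE%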

Take snapshots of your customized images

If you followed all the steps above, you now have a customized version of the CloudScheduler appliance running. You can use the OpenStack dashboard to snapshot this server to save yourself the work of customizing it again.
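
The snapshot can also be taken with the nova client; a sketch, where the instance and snapshot names are assumptions to be replaced with your own:

%STARTCONSOLE%
nova image-create {username}-cloud-scheduler {username}-cloud-scheduler-configured
%ENDCONSOLE%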

Using the Batch Service VMs on private clouds

If you would like to use the Batch Service virtual machines on your own private cloud, you may retrieve the VM images from the DAIR repository for installation in your cloud's repository. DAIR is an OpenStack cloud with a Glance image repository. Assuming your private cloud is also an OpenStack cloud with a Glance repository, the process for copying an image from DAIR to your own repository is:

  • On a convenient workstation:
    • Install the glance client (e.g. yum install python-glanceclient, apt-get install python-glanceclient, etc.).
    • Log into the DAIR cloud and download the RC file (i.e. Settings -> OpenStack API -> Download RC File).
    • Source the DAIR RC file (e.g. source openrc-alberta.sh).
    • Download the image to your workstation, e.g.:
      • glance image-list
      • glance image-download --file {image-name-on-workstation} {image-ID-from-list-on-DAIR}
    • Log into your private cloud and download the RC file (i.e. Settings -> OpenStack API -> Download RC File).
    • Source your cloud's RC file (e.g. source openrc-yourcloud.sh).
    • Upload the image to your private cloud, e.g.:
      • glance image-create --file {image-name-on-workstation} --name {image-name-on-your-cloud} --disk-format qcow2 --container-format bare