-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Original document written by PatrickArmstrong. Modified for the initial implementation of the split configuration on the elephant cluster. In this configuration there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.

Overview

  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
  4. Switch to the temporary head node and install prerequisites.

Step 1: Create privileged "nimbus" user on all cluster nodes.

Create the nimbus account on elephant head node (elephant01) with the required sudo privileges.

[crlb@elephant01 ~]$ sudo adduser nimbus
[crlb@elephant01 ~]$ sudo visudo

Comment out the requiretty directive:

#Defaults    requiretty

Allow any command with a password:

nimbus  ALL=(ALL)       ALL

And the following commands with no password:

nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/mount-alter.sh
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/dhcp-config.sh
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/xen-ebtables-config.sh

Save changes and propagate to every node in the cluster:

[crlb@elephant01 ~]$ sudo /usr/local/sbin/usync
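
Once the nimbus account and sudo rules are on every node, you can sanity-check them on any node after switching to the nimbus account (see the next step); the three NOPASSWD entries above should appear in the output of:

[nimbus@elephant01 ~]$ sudo -l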

Step 2: Switch to nimbus user for the remainder of the installation and create public/private keys.

[crlb@elephant01 ~]$ sudo su - nimbus
Password: ********
[nimbus@elephant01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Created directory '/home/nimbus/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
e4:43:60:84:0c:ea:dc:02:dd:4b:93:fd:f4:e4:38:e8 nimbus@elephant01.heprc.uvic.ca
[nimbus@elephant01 ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys
[nimbus@elephant01 ~]$ 
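
To confirm that key-based logins work, tighten the permissions on authorized_keys and try a quick SSH round trip. This is shown for localhost; the same should work between nodes once the nimbus home directory is shared over NFS or the .ssh directory is copied to them:

[nimbus@elephant01 ~]$ chmod 600 .ssh/authorized_keys
[nimbus@elephant01 ~]$ ssh localhost hostname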

Step 3: Download all required packages.

[nimbus@elephant01 ~]$ mkdir -p Downloads/nimbus-2.3
[nimbus@elephant01 ~]$ cd Downloads/nimbus-2.3
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-controls-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
[nimbus@elephant01 nimbus-2.3]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0

Switch to the interim cloud cluster head node, elephant11, and install java-1.6.0-sun-compat:

[crlb@elephant01 ~]$ ssh e11
[crlb@elephant11 ~]$ sudo yum -y install java-1.6.0-sun-compat

Install Apache Ant

[crlb@elephant11 ~]$ cd /usr/local
[crlb@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2

Create home for nimbus/globus ws-core

[crlb@elephant11 local]$ sudo mkdir nimbus-2.3
[crlb@elephant11 local]$ sudo chown nimbus.nimbus nimbus-2.3
[crlb@elephant11 local]$ sudo ln -s nimbus-2.3 nimbus

and for Nimbus worker node control software

[crlb@elephant11 local]$ sudo mkdir -p /opt/nimbus-2.3
[crlb@elephant11 local]$ sudo chown nimbus.nimbus /opt/nimbus-2.3

Switch to the nimbus account

[crlb@elephant11 local]$ sudo su - nimbus
[nimbus@elephant11 ~]$ 

The remainder of this procedure is done entirely as the nimbus user. The frontend tools will be installed to /usr/local/nimbus, and the backend tools to /opt/nimbus. Both of these directories are owned by nimbus, and need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. This install uses a custom build of Nimbus that has features and fixes that are not in a release at this time. In the future, this install should be done from the latest release on the Nimbus website.

This procedure installs the minimum set of utilities needed to run Nimbus. Nimbus needs to run in a Globus container, so we will install the bare essentials of the Globus container and use a certificate from Grid Canada. If you need the full set of Globus utilities, refer to the instructions on the GridX1 wiki and skip "Installing the Webservice Core" on this page.

Installing the Webservice Core

First we set up the basic Globus webservice core by unpacking the core tools downloaded in Step 3:

[nimbus@elephant11 ~]$ cd /usr/local/nimbus
[nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant11 nimbus]$ mv ws-core-4.0.8/* .
[nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8

Create an empty grid-mapfile. This file will contain the certificate subjects of the users of your cloud-enabled cluster.

[nimbus@elephant11 nimbus]$  touch /usr/local/nimbus/share/grid-mapfile
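
Each line of the grid-mapfile maps a certificate distinguished name to a local account. An entry added later in this walkthrough looks like this (the quotes around the DN are required because it contains spaces):

"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" nimbus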

Now set our environment variables. I'm assuming bash is your nimbus user's shell. If you're using csh or ksh, you might want to try substituting .profile for .bashrc:

[nimbus@elephant11 nimbus]$ cd
[nimbus@elephant11 ~]$  echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> .bashrc
[nimbus@elephant11 ~]$  echo "export X509_CERT_DIR=/usr/local/nimbus/share/certificates" >> .bashrc
[nimbus@elephant11 ~]$  echo "export PATH=$PATH:/usr/local/apache-ant-1.8.0/bin" >> .bashrc
[nimbus@elephant11 ~]$  . .bashrc
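
As a quick check that the new environment is in effect, you should see something like the following (the paths come straight from the exports above and the Ant unpack earlier):

[nimbus@elephant11 ~]$ echo $GLOBUS_LOCATION $X509_CERT_DIR
/usr/local/nimbus /usr/local/nimbus/share/certificates
[nimbus@elephant11 ~]$ which ant
/usr/local/apache-ant-1.8.0/bin/ant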

Certificates

Now we can set up the certificates. We're going to put them in our $X509_CERT_DIR. First, we make the certificates directory and put the Grid Canada root certificate in it.

[nimbus@elephant11 ~]$ mkdir -p $X509_CERT_DIR
[nimbus@elephant11 ~]$ cd $X509_CERT_DIR
[nimbus@elephant11 ~]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0

Then create a host certificate request to send to our CA.

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/grid-cert-request -int -host `hostname -f` -dir $X509_CERT_DIR -caEmail ca@gridcanada.ca -force
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
Enter organization DN by entering individual component names and their values.
The component name can be one of: [givenname, surname, ou, uid, cn, initials, unstructuredname, t, unstructuredaddress, emailaddress, o, st, l, generation, sn, e, c, dc]
-----
Enter name component: C
Enter 'C' value: CA
Enter name component: O
Enter 'O' value: Grid
Enter name component: 
Generating a 1024 bit RSA private key
A private key and a certificate request has been generated with the subject:

/C=CA/O=Grid/CN=host/canfardev.dao.nrc.ca

The private key is stored in /usr/local/nimbus/share/certificates/hostkey.pem
The request is stored in /usr/local/nimbus/share/certificates/hostcert_request.pem

Now mail this request file (/usr/local/nimbus/share/certificates/hostcert_request.pem) to ca@gridcanada.ca. It may take a day or so before you get your certificate back.

Once you have your certificate, paste its contents into /usr/local/nimbus/share/certificates/hostcert.pem
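
Before wiring the certificate into the container, it is worth checking that the certificate you received actually matches the host key. A minimal check with openssl (both digests should be identical):

[nimbus@elephant11 ~]$ openssl x509 -noout -modulus -in $X509_CERT_DIR/hostcert.pem | openssl md5
[nimbus@elephant11 ~]$ openssl rsa -noout -modulus -in $X509_CERT_DIR/hostkey.pem | openssl md5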

Now that we have our certificate, we have to point our container to our key and certificate and to our empty grid-mapfile. To do so, edit $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml to point to your new certificates and modify the gridmap value:

[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml 
<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
    <credential>
        <key-file value="/usr/local/nimbus/share/certificates/hostkey.pem"/>
        <cert-file value="/usr/local/nimbus/share/certificates/hostcert.pem"/>
    </credential>
    <gridmap value="/usr/local/nimbus/share/grid-mapfile"/>
</securityConfig>

Now we'll activate our security configuration by adding a containerSecDesc parameter element under the @CONTAINER_SECURITY_DESCRIPTOR@ comment:

[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
<!-- @CONTAINER_SECURITY_DESCRIPTOR@ -->
<parameter name="containerSecDesc"
              value="etc/globus_wsrf_core/global_security_descriptor.xml"/>

Testing our Container

Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-container
Starting SOAP server at: https://204.174.103.121:8443/wsrf/services/ 
With the following services:

[1]: https://204.174.103.121:8443/wsrf/services/AdminService
[2]: https://204.174.103.121:8443/wsrf/services/AuthzCalloutTestService
[3]: https://204.174.103.121:8443/wsrf/services/ContainerRegistryEntryService
...
[25]: https://204.174.103.121:8443/wsrf/services/gsi/AuthenticationService

If you do, hit control-c. Congratulations! Your container is working.

If you get the following error:

org.globus.common.ChainedIOException: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
You are probably missing the Grid Canada .0 file (bffbd7d0.0 in this case). Either copy the file from another Globus machine's X509_CERT_DIR or download the GC CA bundle from the GC Certificate Authority website, put the bffbd7d0.0 file into your X509_CERT_DIR, and try starting the container again.

Automate Startup of Container

Now that we know our container works, we can create a script to start and stop it. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:

#!/bin/sh
set -e
export GLOBUS_OPTIONS="-Xms256M -Xmx1024M -Dorg.globus.tcp.port.range=50000,51999"
export GLOBUS_TCP_PORT_RANGE=50000,51999

cd $GLOBUS_LOCATION
case "$1" in
    start)
        nohup $GLOBUS_LOCATION/bin/globus-start-container -p 8443 \
       >>$GLOBUS_LOCATION/var/container.log &
        ;;
    stop)
        $GLOBUS_LOCATION/bin/grid-proxy-init \
            -key $GLOBUS_LOCATION/share/certificates/hostkey.pem\
            -cert $GLOBUS_LOCATION/share/certificates/hostcert.pem\
            -out /tmp/shutdownproxy.pem\
            >/dev/null
        export X509_USER_PROXY=/tmp/shutdownproxy.pem
        $GLOBUS_LOCATION/bin/globus-stop-container hard
        unset X509_USER_PROXY
        rm /tmp/shutdownproxy.pem
        ;;
    restart)
        $GLOBUS_LOCATION/bin/globus-start-stop stop
        $GLOBUS_LOCATION/bin/globus-start-stop start
        ;;
    *)
        echo "Usage: globus {start|stop}" >&2
        exit 1
       ;;
esac
exit 0

Then mark it as executable:

[nimbus@elephant11 ~]$ chmod 744 $GLOBUS_LOCATION/bin/globus-start-stop

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop start
$ netstat -an | grep 8443
tcp        0      0 0.0.0.0:8443                0.0.0.0:*                   LISTEN     

Great! Now we have a running container. Let's stop it before we carry on with our installation.

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop stop
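
The script above only runs when you invoke it by hand. If you also want the container to come back after a reboot, one option (a sketch, not part of the original procedure, assuming the nimbus login environment sets GLOBUS_LOCATION as configured above) is to append a line like this to /etc/rc.d/rc.local as root:

su - nimbus -c "/usr/local/nimbus/bin/globus-start-stop start"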

Install Nimbus

Unpack the nimbus package and run the install script.

[nimbus@elephant11 ~]$ cd /tmp
[nimbus@elephant11 tmp]$ tar -xvf ~/Downloads/nimbus-2.3.tar.gz
[nimbus@elephant11 tmp]$ /tmp/nimbus-2.3/bin/all-build-and-install.sh
[nimbus@elephant11 tmp]$ rm -rf /tmp/nimbus-2.3/
[nimbus@elephant11 tmp]$ rmdir hsperfdata_nimbus

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa, in order to send commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't, you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys
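
If the nimbus home directory is not NFS-shared, a minimal sketch for pushing the keys out from the head node (the worker hostnames are examples, and this assumes ~/.ssh does not already exist on the workers; you will be prompted for the nimbus password on each host):

$ for host in gildor guilin; do scp -rp ~/.ssh ${host}:~/ ; done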

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
nimbus@canfardev $

Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

# which xm
/usr/sbin/xm
# uname -r
2.6.18-128.1.1.el5xen
$ ps aux | grep xen
root        21  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenbus]
root      2549  0.0  0.0   2188   956 ?        S    16:35   0:00 xenstored --pid-file /var/run/xenstore.pid
root      2554  0.0  0.1  12176  3924 ?        S    16:35   0:00 python /usr/sbin/xend start
root      2555  0.0  0.1  63484  4836 ?        Sl   16:35   0:00 python /usr/sbin/xend start
root      2557  0.0  0.0  12212   364 ?        Sl   16:35   0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console

If it's not installed, you can do so with:

# yum install xen kernel-xen
# chkconfig xend on

Then reboot.
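
Before rebooting, it is worth confirming that the Xen kernel is the default boot entry. A hedged check, assuming the stock CentOS 5 location of the GRUB config:

# grep -E '^default|^title' /boot/grub/grub.conf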

You'll also need to install ebtables (not currently used) and dhcp. Do this by first enabling the DAG repository, then installing with yum:

# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum install ebtables dhcp

Now, edit the dhcpd config file. Make sure it looks something like this:

# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces


#################
## GLOBAL OPTS ##
#################

# Option definitions common or default to all supported networks

# Keep this:
ddns-update-style none;

# Can be overriden in host entry:
default-lease-time 120;
max-lease-time 240;


#############
## SUBNETS ##
#############

# Make an entry like this for each supported subnet.  Otherwise, the DHCP
# daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 {
}

### DO NOT EDIT BELOW, the following entries are added and 
### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###
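
With the config in place, you can syntax-check it and make sure dhcpd starts now and at boot. This is a sketch using the stock CentOS 5 service tools; workspace-control will append its own host entries below the marker later:

# dhcpd -t -cf /etc/dhcpd.conf
# chkconfig dhcpd on
# service dhcpd restart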


Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

# ls -d /opt/nimbus
/opt/nimbus

Now do the install:

# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
# tar xzf nimbus-controls-TP2.2.tar.gz
# cd nimbus-controls-TP2.2/workspace-control
# cp worksp.conf.example /opt/nimbus/worksp.conf
# python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus

The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of them. Chances are you will just answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
gildor 3072
guilin 3072

Your worker node should now be ready!

Now run the auto-configuration program. Following is a transcript of running this program on canfardev:

$ $GLOBUS_LOCATION/share/nimbus-autoconfig/autoconfig.sh

# ------------------------- #
# Nimbus auto-configuration #
# ------------------------- #

Using GLOBUS_LOCATION: /usr/local/nimbus

Is the current account (nimbus) the one the container will run under? y/n:
y

Pick a VMM to test with, enter a hostname: 
gildor

----------

How much RAM (MB) should be allocated for a test VM on the 'gildor' VMM?
256

Will allocate 256 MB RAM for test VM on the 'gildor' VMM.

----------

Is the current account (nimbus) also the account the privileged scripts will run under on the VMM (gildor)? y/n:
y

Does the container account (nimbus) need a special (non-default) SSH key to access the 'nimbus' account on the VMM nodes? y/n:
n

----------

Testing basic SSH access to nimbus@gildor

Test command (1): ssh -T -n -o BatchMode=yes nimbus@gildor /bin/true

Basic SSH test (1) working to nimbus@gildor

----------

Now we'll set up the *hostname* that VMMs will use to contact the container over SSHd

Even if you plan on ever setting up just one VMM and it is localhost to the container, you should still pick a hostname here ('localhost' if you must)

*** It looks like you have a hostname set up: canfardev.dao.nrc.ca

Would you like to manually enter a different hostname? y/n:
n

Using hostname: canfardev.dao.nrc.ca

----------

Is your local SSHd server on a port different than 22?  Enter 'n' or a port number: 
n

Attempting to connect to: canfardev.dao.nrc.ca:22

Contacted a server @ canfardev.dao.nrc.ca:22

----------

Now we will test the basic SSH notification conduit from the VMM to the container

Test command (2): ssh -T -n -o BatchMode=yes nimbus@gildor ssh -T -n -o BatchMode=yes -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Notification test (2) working (ssh from nimbus@gildor to nimbus@canfardev.dao.nrc.ca at port 22)

----------

OK, looking good.

---------------------------------------------------------------------
---------------------------------------------------------------------

If you have not followed the instructions for setting up workspace control yet, please do the basic installation steps now.

Look for the documentation at:
  - http://workspace.globus.org/vm/TP2.2/admin/quickstart.html#part-III

----------

A sample installation command set can be provided for you here if you supply a group name.  Group privileges are used for some configurations and programs.

What is a privileged unix group of nimbus on gildor?  Or type 'n' to skip this step.
nimbus

----------

*** Sample workspace-control installation commands:

    ssh root@gildor
        ^^^^ YOU NEED TO BE ROOT

    wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
    tar xzf nimbus-controls-TP2.2.tar.gz
    cd nimbus-controls-TP2.2

    mkdir -p /opt/nimbus
    cp worksp.conf.example /opt/nimbus/worksp.conf
    python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus


*** (see 'python install.py -h' for other options, including non-interactive installation)

----------

Waiting for you to install workspace control for the account 'nimbus' on the test VMM 'gildor'

After this is accomplished, press return to continue.

----------

Going to test container access to workspace control installation.

On 'gildor', did you install workspace-control somewhere else besides '/opt/nimbus/bin/workspace-control'? y/n:
n

Test command (3): ssh -T -n -o BatchMode=yes nimbus@gildor /opt/nimbus/bin/workspace-control -h 1>/dev/null

Workspace control test (3) working

----------

Testing ability to push files to workspace control installation.

We are looking for the directory on the VMM to push customization files from the container node. This defaults to '/opt/workspace/tmp'

Did you install workspace-control under some other base directory besides /opt/workspace? y/n: 
n
Test command (4): scp -o BatchMode=yes /usr/local/nimbus/share/nimbus-autoconfig/lib/transfer-test-file.txt nimbus@gildor:/opt/workspace/tmp/

transfer-test-file.txt                                         100%   73     0.1KB/s   00:00    

SCP test (4) working

----------

Great.

---------------------------------------------------------------------
---------------------------------------------------------------------

Now you will choose a test network address to give to an incoming VM.

Does the test VMM (gildor) have an IP address on the same subnet that VMs will be assigned network addresses from? y/n:
y
----------

What is a free IP address on the same subnet as 'gildor' (whose IP address is '172.21.1.197')
172.21.1.200 

----------

Even if it does not resolve, a hostname is required for '172.21.1.200' to include in the DHCP lease the VM will get:
canfardevtest

----------

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1)
You can type 'none' if you are sure you don't want the VM to have a gateway

Please enter a gateway IP address or type 'none'.

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1)
You can type 'none' if you are sure you don't want the VM to have a gateway
172.21.1.1

----------

What is the IP address of the DNS server that should be used by the VM? (guessing it is 172.21.1.34)
You can type 'none' if you are sure you don't want the VM to have DNS
172.21.1.34
----------

OK, in the 'make adjustments' step that follows, the service will be configured to provide this ONE network address to ONE guest VM.

You should add more VMMs and more available network addresses to assign guest VMs only after you successfully test with one VMM and one network address.

----------

*** Changes to your configuration are about to be executed.

So far, no configurations have been changed.  The following adjustments will be made based on the questions and tests we just went through:

 - The GLOBUS_LOCATION in use: /usr/local/nimbus
 - The account running the container/service: nimbus
 - The hostname running the container/service: canfardev.dao.nrc.ca
 - The contact address of the container/service for notifications: nimbus@canfardev.dao.nrc.ca (port 22)

 - The test VMM: gildor
 - The available RAM on that VMM: 256
 - The privileged account on the VMM: nimbus

 - The workspace-control path on VMM: /opt/workspace/bin/workspace-control
 - The workspace-control tmpdir on VMM: /opt/workspace/tmp

 - Test network address IP: 172.21.1.200
 - Test network address hostname: canfardevtest
 - Test network address gateway: 172.21.1.1
 - Test network address DNS: 172.21.1.34

----------


These settings are now stored in '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

If you type 'y', that script will be run for you with the settings.

Or you can answer 'n' to the next question and adjust this file.
And then manually run '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh' at your leisure.


OK, point of no return.  Proceed? y/n
y

*** Running /usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh . . .

# ------------------------------------------- #
# Nimbus auto-configuration: make adjustments #
# ------------------------------------------- #

Read settings from '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

----------

[*] The 'service.sshd.contact.string' configuration was:
    ... set to 'nimbus@canfardev.dao.nrc.ca:22'
    ... (it used to be set to 'REPLACE_WITH_SERVICE_NODE_HOSTNAME:22')
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'control.ssh.user' configuration was:
    ... set to 'nimbus'
    ... (it used to be set blank)
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'use.identity' configuration does not need to be changed.
    ... already set to be blank
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'control.path' configuration does not need to be changed.
    ... already set to '/opt/workspace/bin/workspace-control'
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'

----------

[*] The 'control.tmp.dir' configuration does not need to be changed.
    ... already set to '/opt/workspace/tmp'
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'

----------

[*] Backing up old resource pool settings
    ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'
    ... moved 'pool1' to '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'

----------

[*] Creating new resource pool
    ... created '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool'

----------

[*] Backing up old network settings
    ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'
    ... moved 'private' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'
    ... moved 'public' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'

----------

[*] Creating new network called 'public'
    ... created '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/public'

----------

NOTE: you need to MATCH this network in the workspace-control configuration file.
This configuration file is at '/opt/workspace/worksp.conf' by default

For example, you might have this line:

association_0: public; xenbr0; vif0.1 ; none; 172.21.1.200/24

    ... "public" is the name of the network we chose.
    ... "xenbr0" is the name of the bridge to put VMs in this network on.
    ... "vif0.1" is the interface where the DHCP server is listening in dom0 on the VMM
    ... and the network address range serves as a sanity check (you can disable that check in the conf file)

----------

Making sure 'fake mode' is off:

[*] The 'fake.mode' configuration was:
    ... set to 'false'
    ... (it used to be set to 'true')
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/other/common.conf'

----------

Finished.

See 'NOTE' above.

Great. That seemed to work okay.

Let's carry on with the configuration.

Nimbus Configuration

First we need to tell Nimbus which machines we can boot virtual machines on. To do this, we need to edit the Nimbus frontend configuration files, which are in $GLOBUS_LOCATION/etc/nimbus. Let's define the machines we can boot on:

$ vim /usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool
# NOTE: a node may not be in more than one pool at the same time, this will
#       result in an initialization error

# Supported form:
# node_name  memory_to_manage networks_supported
#
# If third field is blank (or marked with '*'), it is assumed that pool
# node supports all networks available to remote clients.  Otherwise use a comma
# separated list (no spaces between).
#
# Note that if you list a network here that is not valid at runtime,
# it will silently be ignored (check your spelling).


# File contents injected @ Mon Jul 20 11:57:41 PDT 2009
gildor 1024

For now, we only have one machine in our pool, and it has 1024 MB of free RAM with which to boot VMs.

Now we need to set up networking. To do this, we need to create a pool of network addresses we can assign to machines we boot on the cluster. Since we're going to start with only private networking, we will create a private pool. We define this as a text file in $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools

This file contains the DNS server for these machines, as well as a list of IP addresses, hostnames, and other networking details. We will set this file up with two addresses now.

$ cat $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools/private
# DNS server IP or 'none'
172.21.1.34

# hostname ipaddress gateway broadcast subnetmask
canfardev00 172.21.1.200 172.21.1.1 none none
canfardev01 172.21.1.201 172.21.1.1 none none

Now we need to set up an equivalent networking association in the worksp.conf file on the worker nodes. You need to associate each network pool with a virtual interface on each worker node.

From worksp.conf on gildor:

association_0: private; xenbr0; vif0.0 ; none; 172.21.1.0/24
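
To confirm the match on a worker, you can list the association lines in its worksp.conf (installed to /opt/nimbus/worksp.conf earlier in this procedure; /opt/workspace/worksp.conf is the upstream default) and check that the named bridge actually exists. A hedged sanity check, run as root on the worker:

gildor # grep '^association_' /opt/nimbus/worksp.conf
gildor # brctl show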

Now finally, point Nimbus to the grid-mapfile we created earlier:

$ vim $GLOBUS_LOCATION/etc/nimbus/factory-security-config.xml
<securityConfig xmlns="http://www.globus.org">
    <auth-method>
        <GSITransport/>
        <GSISecureMessage/>
        <GSISecureConversation/>
    </auth-method>
    <authz value="gridmap"/>
    <gridmap value="share/grid-mapfile"/>
</securityConfig>

Initial Test

Start up your container with globus-start-container. You should see the following new services:

https://10.20.0.1:8443/wsrf/services/ElasticNimbusService
https://10.20.0.1:8443/wsrf/services/WorkspaceContextBroker
https://10.20.0.1:8443/wsrf/services/WorkspaceEnsembleService
https://10.20.0.1:8443/wsrf/services/WorkspaceFactoryService
https://10.20.0.1:8443/wsrf/services/WorkspaceGroupService
https://10.20.0.1:8443/wsrf/services/WorkspaceService
https://10.20.0.1:8443/wsrf/services/WorkspaceStatusService

Now we'll run a test script. Let's try.

$ wget http://workspace.globus.org/vm/TP2.2/admin/test-create.sh
$ grid-proxy-init
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: General error: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: "/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" is not authorized to use operation: {http://www.globus.org/2008/06/workspace}create on this service

Whoops! I need to add myself to the grid-mapfile:

$ echo '"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" nimbus' >> $GLOBUS_LOCATION/share/grid-mapfile
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: Resource request denied: Error creating workspace(s): 'public' is not a valid network name

Oh, whoops again. It looks like our test script wants to use public networking, and we don't have that set up.

$ cp $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml sample-workspace.xml
$ vim sample-workspace.xml (change public to private)
$ vim test-create.sh (change $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml to sample-workspace.xml)
$ sh test-create.sh
...
Invalid:
--------
  - fatal, image '/opt/workspace/images/ttylinux-xen' does not exist on the filesystem
  - IMG #1 is invalid
  - no valid partitions/HD images
  - fatal, number of mountpoints (1) does not match number of valid partitions/HD images (0)
  - fatal, image '/opt/workspace/images/vmlinuz-2.6-xen' does not exist on the filesystem
  - fatal, no images configured
  - failure is triggered, backing out any networking reservations

for help use --help
"http://example1/localhost/image": Corrupted, calling destroy for you.
"http://example1/localhost/image" was terminated.

Whoops! Looks like we need to put the ttylinux files into the images directory on the worker node:

gildor # cd /opt/workspace/images
gildor # wget http://workspace.globus.org/downloads/ttylinux-xen.tgz
gildor # tar xzvf ttylinux-xen.tgz 
ttylinux-xen
ttylinux-xen.conf
gildor # rm -Rf ttylinux-xen.tgz
gildor # cp /boot/vmlinuz-2.6.18-128.1.1.el5xen vmlinuz-2.6-xen
Try again:

$ sh test-create.sh
Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"... done.



Workspace created: id 6
eth0
      Association: private
       IP address: 172.21.1.200
         Hostname: canfardev00
          Gateway: 172.21.1.1

       Start time: Mon Jul 20 13:53:39 PDT 2009
         Duration: 30 minutes.
    Shutdown time: Mon Jul 20 14:23:39 PDT 2009
 Termination time: Mon Jul 20 14:33:39 PDT 2009

Wrote EPR to "test.epr"


Waiting for updates.

"http://example1/localhost/image" state change: Unstaged --> Propagated
"http://example1/localhost/image" state change: Propagated --> Running

Oh it worked! Neat. Now let's kill the VM.

$ workspace --destroy -e test.epr 

Destroying workspace 6 @ "https://204.174.103.121:8443/wsrf/services/WorkspaceService"... destroyed.

Great! Now we're done! Other things to do now are adding more machines to the vmm-pools and network-pools files.
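
For example, to add a second VMM and a second private address, the two files could look like this (the second node and its memory value are taken from the pool example earlier on this page; adjust hostnames, memory and addresses to your own cluster, and restart the container afterwards):

$ cat $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/testpool
gildor 1024
guilin 3072

$ cat $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools/private
# DNS server IP or 'none'
172.21.1.34

# hostname ipaddress gateway broadcast subnetmask
canfardev00 172.21.1.200 172.21.1.1 none none
canfardev01 172.21.1.201 172.21.1.1 none none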

Troubleshooting

If you encounter DHCP problems, check /etc/dhcpd.conf on the worker nodes and make sure dhcpd is listening on the correct subnet(s).
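
A couple of quick checks on a worker (hedged; dhcpd on CentOS 5 logs via syslog to /var/log/messages):

# service dhcpd status
# grep -i dhcpd /var/log/messages | tail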

If you encounter an ebtables problem, you can try a patched version of ebtables; see this page for details.

-- PatrickArmstrong - 16 Jul 2009

-- PatrickArmstrong - 2009-09-04

