-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Original document written by PatrickArmstrong. Modified for the initial implementation of the split configuration on the elephant cluster. In this configuration there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.

Overview

  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
  4. Switch to Nimbus head node and install prerequisites
  5. Install Globus ws-core Web Services container.
  6. Install required X509 grid certificates.
  7. Test the Web Services container.
  8. Automate the Web Services start/stop.
  9. Install Nimbus.
  10. Setting Up Worker Nodes.

Step 1: Create privileged "nimbus" user on all cluster nodes.

Create the nimbus account on elephant head node (elephant01) with the required sudo privileges.

[crlb@elephant01 ~]$ sudo adduser nimbus
[crlb@elephant01 ~]$ sudo visudo

Comment out the requiretty directive:

#Defaults    requiretty

Allow any command with a password:

nimbus  ALL=(ALL)       ALL

And the following commands with no password:

nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/mount-alter.sh
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/dhcp-config.sh
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/xen-ebtables-config.sh

Save changes and propagate to every node in the cluster:

[crlb@elephant01 ~]$ sudo /usr/local/sbin/usync
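usync is a site-local utility; assuming it propagates /etc/sudoers (among other files) to every node, a quick spot check on one worker confirms the change arrived (the e02 host alias is an assumption based on the aliases used later in this page):

[crlb@elephant01 ~]$ ssh -t e02 sudo grep nimbus /etc/sudoers

The four nimbus lines added above should be echoed back.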

Step 2: Switch to nimbus user for the remainder of the installation and create public/private keys.

[crlb@elephant01 ~]$ sudo su - nimbus
Password: ********
[nimbus@elephant01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Created directory '/home/nimbus/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
e4:43:60:84:0c:ea:dc:02:dd:4b:93:fd:f4:e4:38:e8 nimbus@elephant01.heprc.uvic.ca
[nimbus@elephant01 ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys
[nimbus@elephant01 ~]$ chmod 600 .ssh/authorized_keys
[nimbus@elephant01 ~]$ 
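The nimbus home directory appears to be shared across the cluster (Step 10 relies on password-less logins between nodes), so copying the public key into authorized_keys above should be all that is needed. A quick check, assuming the e02 host alias used elsewhere on this page:

[nimbus@elephant01 ~]$ ssh e02 hostname

After accepting the host key, the command should return the worker's hostname without prompting for a password.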

Step 3: Download all required packages.

[nimbus@elephant01 ~]$ mkdir -p Downloads/nimbus-2.3
[nimbus@elephant01 ~]$ cd Downloads/nimbus-2.3
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-controls-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
[nimbus@elephant01 nimbus-2.3]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
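Optionally, confirm the archives downloaded completely; listing their contents will fail on a truncated file:

[nimbus@elephant01 nimbus-2.3]$ for f in *.tar.gz; do tar -tzf $f > /dev/null && echo "$f OK"; done
[nimbus@elephant01 nimbus-2.3]$ tar -tjf apache-ant-1.8.0-bin.tar.bz2 > /dev/null && echo "apache-ant OK"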

Step 4: Switch to Nimbus head node and install prerequisites.

[nimbus@elephant01 ~]$ ssh e11
The authenticity of host 'e11 (10.200.200.11)' can't be established.
RSA key fingerprint is 7c:92:13:5d:35:59:dd:ca:2e:bd:95:b4:97:ed:f0:97.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'e11,10.200.200.11' (RSA) to the list of known hosts.
Last login: Tue Feb 16 09:52:21 2010 from elephant01.admin
[nimbus@elephant11 ~]$ 

Install java-1.6.0-sun-compat:

[nimbus@elephant11 ~]$ sudo yum -y install java-1.6.0-sun-compat

Install Apache Ant

[nimbus@elephant11 ~]$ cd /usr/local
[nimbus@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2

Create a home for nimbus/globus ws-core

[nimbus@elephant11 local]$ sudo mkdir nimbus-2.3
[nimbus@elephant11 local]$ sudo chown nimbus.nimbus nimbus-2.3
[nimbus@elephant11 local]$ sudo ln -s nimbus-2.3 nimbus

and a home for nimbus worker node control software

[nimbus@elephant11 local]$ cd /opt
[nimbus@elephant11 opt]$ sudo mkdir nimbus-2.3
[nimbus@elephant11 opt]$ sudo chown nimbus.nimbus nimbus-2.3
[nimbus@elephant11 opt]$ sudo ln -s nimbus-2.3 nimbus
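Before moving on, it is worth confirming that both prerequisites are usable (the exact version strings will vary from site to site):

[nimbus@elephant11 opt]$ java -version
[nimbus@elephant11 opt]$ /usr/local/apache-ant-1.8.0/bin/ant -version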

Step 5: Install Globus ws-core Web Services container.

The frontend tools (Globus ws-core and Nimbus) will be installed in /usr/local/nimbus:

[nimbus@elephant11 ~]$ cd /usr/local/nimbus
[nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant11 nimbus]$ mv ws-core-4.0.8/* .
[nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8

Set environment variables. Example assumes bash as the nimbus user's shell:

[nimbus@elephant11 nimbus]$ cd
[nimbus@elephant11 ~]$  echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> .bashrc
[nimbus@elephant11 ~]$  echo "export X509_CERT_DIR=/usr/local/nimbus/share/certificates" >> .bashrc
[nimbus@elephant11 ~]$  echo "export PATH=$PATH:/usr/local/apache-ant-1.8.0/bin" >> .bashrc
[nimbus@elephant11 ~]$  . .bashrc

Create an empty grid-mapfile to contain the certificate subjects of the users of the cloud-enabled cluster.

[nimbus@elephant11 ~]$  touch $GLOBUS_LOCATION/share/grid-mapfile
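A quick sanity check that the environment variables and the unpacked container line up:

[nimbus@elephant11 ~]$ echo $GLOBUS_LOCATION
/usr/local/nimbus
[nimbus@elephant11 ~]$ ls $GLOBUS_LOCATION/bin/globus-start-container
/usr/local/nimbus/bin/globus-start-container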

Step 6: Install required X509 grid certificates.

Make our certificates directory and put the Grid Canada root certificate in it.

[nimbus@elephant11 ~]$ mkdir -p $X509_CERT_DIR
[nimbus@elephant11 ~]$ mv Downloads/nimbus-2.3/bffbd7d0.0 $X509_CERT_DIR/

Then create a host certificate request to send to our CA.

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/grid-cert-request -int -host `hostname -f` -dir $X509_CERT_DIR -caEmail ca@gridcanada.ca -force
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
Enter organization DN by entering individual component names and their values.
The component name can be one of: [givenname, surname, ou, uid, cn, initials, unstructuredname, t, unstructuredaddress, emailaddress, o, st, l, generation, sn, e, c, dc]
-----
Enter name component: C
Enter 'C' value: CA
Enter name component: O
Enter 'O' value: Grid
Enter name component: 
Generating a 1024 bit RSA private key
A private key and a certificate request has been generated with the subject:

/C=CA/O=Grid/CN=host/canfardev.dao.nrc.ca

The private key is stored in /usr/local/nimbus/share/certificates/hostkey.pem
The request is stored in /usr/local/nimbus/share/certificates/hostcert_request.pem

Now mail this request file ($X509_CERT_DIR/hostcert_request.pem) to ca@gridcanada.ca. It might take a day or so before you get your certificate back.

Once you have your certificate, paste its contents into $X509_CERT_DIR/hostcert.pem
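Before going further, a quick sanity check (standard openssl usage, not specific to Nimbus) that the returned certificate matches the private key generated above, and that the key is readable only by the nimbus account. The two digests printed must be identical:

[nimbus@elephant11 ~]$ openssl x509 -noout -modulus -in $X509_CERT_DIR/hostcert.pem | openssl md5
[nimbus@elephant11 ~]$ openssl rsa -noout -modulus -in $X509_CERT_DIR/hostkey.pem | openssl md5
[nimbus@elephant11 ~]$ chmod 400 $X509_CERT_DIR/hostkey.pem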

Now that we have our certificate, we have to point our container to our key and certificate and to our empty grid-mapfile. To do so, edit $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml to point to your new certificates and modify the gridmap value:

[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml 
<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
    <credential>
        <key-file value="/usr/local/nimbus/share/certificates/hostkey.pem"/>
        <cert-file value="/usr/local/nimbus/share/certificates/hostcert.pem"/>
    </credential>
    <gridmap value="/usr/local/nimbus/share/grid-mapfile"/>
</securityConfig>

Activate the security configuration by adding a parameter element under the @CONTAINER_SECURITY_DESCRIPTOR@ comment:

[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
<!-- @CONTAINER_SECURITY_DESCRIPTOR@ -->
<parameter name="containerSecDesc"
              value="etc/globus_wsrf_core/global_security_descriptor.xml"/>

Step 7: Test the Web Services container.

Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-container
Starting SOAP server at: https://204.174.103.121:8443/wsrf/services/ 
With the following services:

[1]: https://204.174.103.121:8443/wsrf/services/AdminService
[2]: https://204.174.103.121:8443/wsrf/services/AuthzCalloutTestService
[3]: https://204.174.103.121:8443/wsrf/services/ContainerRegistryEntryService
...
[25]: https://204.174.103.121:8443/wsrf/services/gsi/AuthenticationService

If you do, hit control-c. Congratulations! Your container is working.

If you get the following error:

org.globus.common.ChainedIOException: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
you are probably missing the Grid Canada CA file (bffbd7d0.0 in this case). Either copy the file from another Globus machine's X509_CERT_DIR or download the GC CA bundle from the GC Certificate Authority website, put the bffbd7d0.0 file into $X509_CERT_DIR, and try starting the container again.
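The CA file was already downloaded in Step 3; if it has gone missing it can simply be fetched again from the same URL:

[nimbus@elephant11 ~]$ wget -P $X509_CERT_DIR http://www.gridcanada.ca/ca/bffbd7d0.0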

Step 8: Automate the Web Services start/stop.

Now that we know our container works, we can create a script to start and stop it. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:

#!/bin/sh
set -e
export GLOBUS_OPTIONS="-Xms256M -Xmx1024M -Dorg.globus.tcp.port.range=50000,51999"
export GLOBUS_TCP_PORT_RANGE=50000,51999

cd $GLOBUS_LOCATION
case "$1" in
    start)
        nohup $GLOBUS_LOCATION/bin/globus-start-container -p 8443 \
       >>$GLOBUS_LOCATION/var/container.log &
        ;;
    stop)
        $GLOBUS_LOCATION/bin/grid-proxy-init \
            -key $GLOBUS_LOCATION/share/certificates/hostkey.pem\
            -cert $GLOBUS_LOCATION/share/certificates/hostcert.pem\
            -out /tmp/shutdownproxy.pem\
            >/dev/null
        export X509_USER_PROXY=/tmp/shutdownproxy.pem
        $GLOBUS_LOCATION/bin/globus-stop-container hard
        unset X509_USER_PROXY
        rm /tmp/shutdownproxy.pem
        ;;
    restart)
        $GLOBUS_LOCATION/bin/globus-start-stop stop
        $GLOBUS_LOCATION/bin/globus-start-stop start
        ;;
    *)
        echo "Usage: globus {start|stop}" >&2
        exit 1
       ;;
esac
exit 0

Mark it as executable:

[nimbus@elephant11 ~]$ chmod 744 $GLOBUS_LOCATION/bin/globus-start-stop

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop start
[nimbus@elephant11 ~]$ netstat -an | grep 8443
tcp        0      0 0.0.0.0:8443                0.0.0.0:*                   LISTEN     

Great! Now we have a running container. Let's stop it before we carry on with our installation.

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop stop
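If the container should also come up automatically at boot, one minimal approach (a sketch only; it assumes the stock /etc/rc.local and that the nimbus login environment sets GLOBUS_LOCATION as above) is to append a line such as:

[nimbus@elephant11 ~]$ sudo sh -c 'echo "su - nimbus -c \"/usr/local/nimbus/bin/globus-start-stop start\"" >> /etc/rc.local'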

Step 9: Install Nimbus

Unpack the nimbus package into a temporary directory and run the install script. When the install script completes, Nimbus will have been installed in $GLOBUS_LOCATION, and the temporary files can be removed.

[nimbus@elephant11 ~]$ cd /tmp
[nimbus@elephant11 tmp]$ tar -xvf ~/Downloads/nimbus-2.3.tar.gz
[nimbus@elephant11 tmp]$ /tmp/nimbus-2.3/bin/all-build-and-install.sh
[nimbus@elephant11 tmp]$ rm -rf /tmp/nimbus-2.3/
[nimbus@elephant11 tmp]$ rmdir hsperfdata_nimbus
[nimbus@elephant11 tmp]$ cd
[nimbus@elephant11 ~]$ 
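After the install script finishes, the Nimbus configuration tree referenced in the following steps should exist under $GLOBUS_LOCATION/etc/nimbus:

[nimbus@elephant11 ~]$ ls $GLOBUS_LOCATION/etc/nimbus/workspace-service

Among other things, the vmm-pools and network-pools directories used below should be listed.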

Step 10: Setting Up Worker Nodes

Verify password-less access to worker nodes:

[nimbus@elephant11 ~]$ ssh e10
The authenticity of host 'e10 (10.200.200.10)' can't be established.
RSA key fingerprint is 81:ba:c1:49:1f:a5:22:30:60:a6:b8:ba:19:0b:38:2c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'e10,10.200.200.10' (RSA) to the list of known hosts.
Last login: Tue Feb 16 13:55:22 2010 from elephant11.admin
[nimbus@elephant10 ~]$ ssh e11
Last login: Tue Feb 16 13:54:41 2010 from elephant01.admin
[nimbus@elephant11 ~]$ 

Ensure Xen is installed and running:

[nimbus@elephant11 ~]$ sudo xm list
Name                                        ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                     0    23690    16 r-----   1410.6
[nimbus@elephant11 ~]$

If Xen is not installed:

[nimbus@elephant11 ~]$ sudo yum install xen kernel-xen
[nimbus@elephant11 ~]$ sudo chkconfig xend on
[nimbus@elephant11 ~]$ sudo shutdown -r now

Install ebtables and dhcp. Do this by first enabling the RPMforge (DAG) repository and then installing with yum:

[nimbus@elephant11 ~]$ sudo rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
[nimbus@elephant11 ~]$ sudo yum install ebtables dhcp

Install the back-end tools (Nimbus control agents) in /opt/nimbus:

[nimbus@elephant11 ~]$ cd /opt/nimbus
[nimbus@elephant11 nimbus]$ tar -xzvf ~/Downloads/nimbus-2.3/nimbus-controls-2.3.tar.gz
[nimbus@elephant11 nimbus]$ mv ./nimbus-controls-2.3/workspace-control/* .

Configure DHCP:

[nimbus@elephant11 nimbus]$ sudo cp ./share/workspace-control/dhcpd.conf.example /etc/dhcpd.conf
[nimbus@elephant11 nimbus]$ sudo vi /etc/dhcpd.conf

Define one subnet as:

subnet 10.200.200.0 netmask 255.255.255.0 {
}
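The per-VM host entries should be handled at run time by workspace-control's dhcp-config.sh (the script granted sudo access in Step 1), so the empty subnet stanza should be sufficient; the daemon itself still needs to be enabled and started (standard CentOS 5 commands):

[nimbus@elephant11 nimbus]$ sudo /sbin/chkconfig dhcpd on
[nimbus@elephant11 nimbus]$ sudo /sbin/service dhcpd start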

The installer will ask you a number of questions. Answer them to the best of your knowledge, and don't worry too much if you're unsure of some of the answers; chances are you will simply answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
gildor 3072
guilin 3072
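Each line is a node name followed by the memory (MB) available for VMs and, optionally, a comma-separated list of networks it supports (the full format is shown in the pool file comments later in this page). A hypothetical entry for one of the elephant workers might look like:

elephant10 8192 private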

Your worker node should now be ready!

Now run the auto-configuration program. The following is a transcript of running this program on canfardev (the cluster for which this document was originally written):

$ $GLOBUS_LOCATION/share/nimbus-autoconfig/autoconfig.sh

# ------------------------- #
# Nimbus auto-configuration #
# ------------------------- #

Using GLOBUS_LOCATION: /usr/local/nimbus

Is the current account (nimbus) the one the container will run under? y/n:
y

Pick a VMM to test with, enter a hostname: 
gildor

----------

How much RAM (MB) should be allocated for a test VM on the 'gildor' VMM?
256

Will allocate 256 MB RAM for test VM on the 'gildor' VMM.

----------

Is the current account (nimbus) also the account the privileged scripts will run under on the VMM (gildor)? y/n:
y

Does the container account (nimbus) need a special (non-default) SSH key to access the 'nimbus' account on the VMM nodes? y/n:
n

----------

Testing basic SSH access to nimbus@gildor

Test command (1): ssh -T -n -o BatchMode=yes nimbus@gildor /bin/true

Basic SSH test (1) working to nimbus@gildor

----------

Now we'll set up the *hostname* that VMMs will use to contact the container over SSHd

Even if you plan on ever setting up just one VMM and it is localhost to the container, you should still pick a hostname here ('localhost' if you must)

*** It looks like you have a hostname set up: canfardev.dao.nrc.ca

Would you like to manually enter a different hostname? y/n:
n

Using hostname: canfardev.dao.nrc.ca

----------

Is your local SSHd server on a port different than 22?  Enter 'n' or a port number: 
n

Attempting to connect to: canfardev.dao.nrc.ca:22

Contacted a server @ canfardev.dao.nrc.ca:22

----------

Now we will test the basic SSH notification conduit from the VMM to the container

Test command (2): ssh -T -n -o BatchMode=yes nimbus@gildor ssh -T -n -o BatchMode=yes -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Notification test (2) working (ssh from nimbus@gildor to nimbus@canfardev.dao.nrc.ca at port 22)

----------

OK, looking good.

---------------------------------------------------------------------
---------------------------------------------------------------------

If you have not followed the instructions for setting up workspace control yet, please do the basic installation steps now.

Look for the documentation at:
  - http://workspace.globus.org/vm/TP2.2/admin/quickstart.html#part-III

----------

A sample installation command set can be provided for you here if you supply a group name.  Group privileges are used for some configurations and programs.

What is a privileged unix group of nimbus on gildor?  Or type 'n' to skip this step.
nimbus

----------

*** Sample workspace-control installation commands:

    ssh root@gildor
        ^^^^ YOU NEED TO BE ROOT

    wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
    tar xzf nimbus-controls-TP2.2.tar.gz
    cd nimbus-controls-TP2.2

    mkdir -p /opt/nimbus
    cp worksp.conf.example /opt/nimbus/worksp.conf
    python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus


*** (see 'python install.py -h' for other options, including non-interactive installation)

----------

Waiting for you to install workspace control for the account 'nimbus' on the test VMM 'gildor'

After this is accomplished, press return to continue.

----------

Going to test container access to workspace control installation.

On 'gildor', did you install workspace-control somewhere else besides '/opt/nimbus/bin/workspace-control'? y/n:
n

Test command (3): ssh -T -n -o BatchMode=yes nimbus@gildor /opt/nimbus/bin/workspace-control -h 1>/dev/null

Workspace control test (3) working

----------

Testing ability to push files to workspace control installation.

We are looking for the directory on the VMM to push customization files from the container node. This defaults to '/opt/workspace/tmp'

Did you install workspace-control under some other base directory besides /opt/workspace? y/n: 
n
Test command (4): scp -o BatchMode=yes /usr/local/nimbus/share/nimbus-autoconfig/lib/transfer-test-file.txt nimbus@gildor:/opt/workspace/tmp/

transfer-test-file.txt                                         100%   73     0.1KB/s   00:00    

SCP test (4) working

----------

Great.

---------------------------------------------------------------------
---------------------------------------------------------------------

Now you will choose a test network address to give to an incoming VM.

Does the test VMM (gildor) have an IP address on the same subnet that VMs will be assigned network addresses from? y/n:
y
----------

What is a free IP address on the same subnet as 'gildor' (whose IP address is '172.21.1.197')
172.21.1.200 

----------

Even if it does not resolve, a hostname is required for '172.21.1.200' to include in the DHCP lease the VM will get:
canfardevtest

----------

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1)
You can type 'none' if you are sure you don't want the VM to have a gateway

Please enter a gateway IP address or type 'none'.

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1)
You can type 'none' if you are sure you don't want the VM to have a gateway
172.21.1.1

----------

What is the IP address of the DNS server that should be used by the VM? (guessing it is 172.21.1.34)
You can type 'none' if you are sure you don't want the VM to have DNS
172.21.1.34
----------

OK, in the 'make adjustments' step that follows, the service will be configured to provide this ONE network address to ONE guest VM.

You should add more VMMs and more available network addresses to assign guest VMs only after you successfully test with one VMM and one network address.

----------

*** Changes to your configuration are about to be executed.

So far, no configurations have been changed.  The following adjustments will be made based on the questions and tests we just went through:

 - The GLOBUS_LOCATION in use: /usr/local/nimbus
 - The account running the container/service: nimbus
 - The hostname running the container/service: canfardev.dao.nrc.ca
 - The contact address of the container/service for notifications: nimbus@canfardev.dao.nrc.ca (port 22)

 - The test VMM: gildor
 - The available RAM on that VMM: 256
 - The privileged account on the VMM: nimbus

 - The workspace-control path on VMM: /opt/workspace/bin/workspace-control
 - The workspace-control tmpdir on VMM: /opt/workspace/tmp

 - Test network address IP: 172.21.1.200
 - Test network address hostname: canfardevtest
 - Test network address gateway: 172.21.1.1
 - Test network address DNS: 172.21.1.34

----------


These settings are now stored in '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

If you type 'y', that script will be run for you with the settings.

Or you can answer 'n' to the next question and adjust this file.
And then manually run '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh' at your leisure.


OK, point of no return.  Proceed? y/n
y

*** Running /usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh . . .

# ------------------------------------------- #
# Nimbus auto-configuration: make adjustments #
# ------------------------------------------- #

Read settings from '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

----------

[*] The 'service.sshd.contact.string' configuration was:
    ... set to 'nimbus@canfardev.dao.nrc.ca:22'
    ... (it used to be set to 'REPLACE_WITH_SERVICE_NODE_HOSTNAME:22')
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'control.ssh.user' configuration was:
    ... set to 'nimbus'
    ... (it used to be set blank)
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'use.identity' configuration does not need to be changed.
    ... already set to be blank
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'control.path' configuration does not need to be changed.
    ... already set to '/opt/workspace/bin/workspace-control'
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'

----------

[*] The 'control.tmp.dir' configuration does not need to be changed.
    ... already set to '/opt/workspace/tmp'
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'

----------

[*] Backing up old resource pool settings
    ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'
    ... moved 'pool1' to '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'

----------

[*] Creating new resource pool
    ... created '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool'

----------

[*] Backing up old network settings
    ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'
    ... moved 'private' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'
    ... moved 'public' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'

----------

[*] Creating new network called 'public'
    ... created '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/public'

----------

NOTE: you need to MATCH this network in the workspace-control configuration file.
This configuration file is at '/opt/workspace/worksp.conf' by default

For example, you might have this line:

association_0: public; xenbr0; vif0.1 ; none; 172.21.1.200/24

    ... "public" is the name of the network we chose.
    ... "xenbr0" is the name of the bridge to put VMs in this network on.
    ... "vif0.1" is the interface where the DHCP server is listening in dom0 on the VMM
    ... and the network address range serves as a sanity check (you can disable that check in the conf file)

----------

Making sure 'fake mode' is off:

[*] The 'fake.mode' configuration was:
    ... set to 'false'
    ... (it used to be set to 'true')
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/other/common.conf'

----------

Finished.

See 'NOTE' above.

Great. That seemed to work okay.

Let's carry on with the configuration.

Nimbus Configuration

First we need to tell Nimbus which machines we can boot virtual machines on. To do this, we need to edit the Nimbus frontend configuration files. These are in $GLOBUS_LOCATION/etc/nimbus. Let's define the machines we can boot on:

$ vim /usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool
# NOTE: a node may not be in more than one pool at the same time, this will
#       result in an initialization error

# Supported form:
# node_name  memory_to_manage networks_supported
#
# If third field is blank (or marked with '*'), it is assumed that pool
# node supports all networks available to remote clients.  Otherwise use a comma
# separated list (no spaces between).
#
# Note that if you list a network here that is not valid at runtime,
# it will silently be ignored (check your spelling).


# File contents injected @ Mon Jul 20 11:57:41 PDT 2009
gildor 1024

For now, we only have one machine in our pool, and it has 1024 MB of free RAM with which to boot VMs.

Now we need to set up networking. To do this, we create a pool of network addresses that can be assigned to machines we boot on the cluster. Since we're going to start with only private networking, we will create a private pool. We define this as a text file in $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools.

This file contains a DNS server for these machines, as well as a list of IP addresses, hostnames, and other networking details. We will set this file up for two addresses now.

$ cat $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools/private
# DNS server IP or 'none'
172.21.1.34

# hostname ipaddress gateway broadcast subnetmask
canfardev00 172.21.1.200 172.21.1.1 none none
canfardev01 172.21.1.201 172.21.1.1 none none

Now we need to set up an equivalent networking association in the worksp.conf file on the worker nodes. You need to associate each network pool with a virtual interface on each worker node.

From worksp.conf on gildor:

association_0: private; xenbr0; vif0.0 ; none; 172.21.1.0/24
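For the elephant workers, the equivalent association for the 10.200.200.0/24 subnet configured in /etc/dhcpd.conf above would look something like the following; the xenbr0 bridge and vif0.0 interface names are assumptions and must match the local Xen networking setup:

association_0: private; xenbr0; vif0.0 ; none; 10.200.200.0/24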

Now finally, point Nimbus to the grid-mapfile we created earlier:

$ vim $GLOBUS_LOCATION/etc/nimbus/factory-security-config.xml
<securityConfig xmlns="http://www.globus.org">
    <auth-method>
        <GSITransport/>
        <GSISecureMessage/>
        <GSISecureConversation/>
    </auth-method>
    <authz value="gridmap"/>
    <gridmap value="share/grid-mapfile"/>
</securityConfig>

Initial Test

Start up your container with globus-start-container. You should see the following new services:

https://10.20.0.1:8443/wsrf/services/ElasticNimbusService
https://10.20.0.1:8443/wsrf/services/WorkspaceContextBroker
https://10.20.0.1:8443/wsrf/services/WorkspaceEnsembleService
https://10.20.0.1:8443/wsrf/services/WorkspaceFactoryService
https://10.20.0.1:8443/wsrf/services/WorkspaceGroupService
https://10.20.0.1:8443/wsrf/services/WorkspaceService
https://10.20.0.1:8443/wsrf/services/WorkspaceStatusService

Now we'll run a test script. Let's try.

$ wget http://workspace.globus.org/vm/TP2.2/admin/test-create.sh
$ grid-proxy-init
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: General error: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: "/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" is not authorized to use operation: {http://www.globus.org/2008/06/workspace}create on this service

Whoops! I need to add myself to the grid-mapfile:

$ echo '"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" nimbus' >> $GLOBUS_LOCATION/share/grid-mapfile
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: Resource request denied: Error creating workspace(s): 'public' is not a valid network name

Oh, whoops again. It looks like our test script wants to use public networking, and we don't have that set up.

$ cp $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml sample-workspace.xml
$ vim sample-workspace.xml (change public to private)
$ vim test-create.sh (change $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml to sample-workspace.xml)
$ sh ./test-create.sh
...
Invalid:
--------
  - fatal, image '/opt/workspace/images/ttylinux-xen' does not exist on the filesystem
  - IMG #1 is invalid
  - no valid partitions/HD images
  - fatal, number of mountpoints (1) does not match number of valid partitions/HD images (0)
  - fatal, image '/opt/workspace/images/vmlinuz-2.6-xen' does not exist on the filesystem
  - fatal, no images configured
  - failure is triggered, backing out any networking reservations

for help use --help
"http://example1/localhost/image": Corrupted, calling destroy for you.
"http://example1/localhost/image" was terminated.

Whoops! Looks like we need to put the ttylinux files into the images directory on the worker node:

gildor # cd /opt/workspace/images
gildor # wget http://workspace.globus.org/downloads/ttylinux-xen.tgz
gildor # tar xzvf ttylinux-xen.tgz 
ttylinux-xen
ttylinux-xen.conf
gildor # rm -Rf ttylinux-xen.tgz
gildor # cp /boot/vmlinuz-2.6.18-128.1.1.el5xen vmlinuz-2.6-xen

Try again:

$ sh test-create.sh
Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"... done.



Workspace created: id 6
eth0
      Association: private
       IP address: 172.21.1.200
         Hostname: canfardev00
          Gateway: 172.21.1.1

       Start time: Mon Jul 20 13:53:39 PDT 2009
         Duration: 30 minutes.
    Shutdown time: Mon Jul 20 14:23:39 PDT 2009
 Termination time: Mon Jul 20 14:33:39 PDT 2009

Wrote EPR to "test.epr"


Waiting for updates.

"http://example1/localhost/image" state change: Unstaged --> Propagated
"http://example1/localhost/image" state change: Propagated --> Running

Oh it worked! Neat. Now let's kill the VM.

$ workspace --destroy -e test.epr 

Destroying workspace 6 @ "https://204.174.103.121:8443/wsrf/services/WorkspaceService"... destroyed.

Great! Now we're done! Other things to do now include adding more machines to the vmm-pools and network-pools files.

Troubleshooting

If you encounter DHCP problems, check /etc/dhcpd.conf on the worker nodes and make sure dhcpd is listening on the correct subnet(s).
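On CentOS 5 dhcpd logs through syslog, so the quickest way to see what the server is (or is not) handing out on a worker node is:

[nimbus@elephant11 ~]$ sudo grep -i dhcp /var/log/messages | tail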

If you encounter an ebtables problem, you can try a patched version of ebtables; see the linked page for details.

-- PatrickArmstrong - 16 Jul 2009

-- PatrickArmstrong - 2009-09-04

Note: the frontend tools are installed in /usr/local/nimbus and the backend tools in /opt/nimbus. Both of these directories are owned by nimbus, but need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. The original install used a custom build of Nimbus with features and fixes that were not in a release at the time; in the future, this install should be done from the latest release on the Nimbus website.
