Difference: WiP1 (1 vs. 25)

Revision 25 (2011-07-18) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-07-20

NEP-52 Lustre file system notes:

Line: 95 to 95
 
    • Lustre: 6455:0:(socklnd_cb.c:421:ksocknal_txlist_done()) Deleting packet type 1 len 368 206.12.154.1@tcp->10.200.201.4@tcp
    • Lustre: 6460:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to 12345-10.200.201.3@tcp
Changed:
<
<
To change the network (in this example we will change from bond0 (10.200.201.0/24) to eth3 (206.12.154.0/26)), the following procedure should be adopted:
>
>
To change the network (in this example we will change from eth3 (206.12.154.0/26) to bond0 (10.200.201.0/24)), the following procedure should be adopted:
 
  • Stop the file system (see Step 3).
Changed:
<
<
  • On e1 through e6: Update /etc/modprobe.d/lustre-lnet.conf, e.g. options lnet networks=tcp(eth3)
  • On e1 through e6: Request log file regeneration via the command "tunefs.lustre --writeconf /dev/vg00/lv00".
  • On e1 through e6: Update MDT/OST filesystem parameters via the command "tunefs.lustre --erase-params --mgsnode=206.12.154.1@tcp /dev/vg00/lv00"
  • On e1 through e6: Update iptables to reflect the new network configuration.
  • On e1 through e3: Unload the lnet module thereby purging the active module parameters. Issuing the "rmmod lnet" command generally results in "module in use by ....". The "rmmod" command can be issued repetitively for each of the identified modules until the "lnet" module is successfully unloaded. Alternatively, each node can be rebooted.
>
>
  • On e1 through e7: Update /etc/modprobe.d/lustre-lnet.conf, e.g. options lnet networks=tcp(bond0)
  • On e1:
    • Request log file regeneration for the MDT with the command "tunefs.lustre --writeconf /dev/vg00/MDT0000".
    • Update filesystem parameters for the MDT with the command "tunefs.lustre --erase-params --mgsnode=10.200.201.1@tcp /dev/vg00/MDT0000"
  • On e2 through e7:
    • Request log file regeneration for each OST with the command "/usr/local/sbin/OSTsWriteconf".
    • Update the filesystem parameters for each OST with the command "/usr/local/sbin/OSTsMsgnode 10.200.201.1". (These are local helper scripts; a sketch of what they do follows this procedure.)
  • On e1 through e7: Update iptables to reflect the new network configuration.
  • On e1 through e7: Unload the lnet module thereby purging the active module parameters. Issuing the "rmmod lnet" command generally results in "module in use by ....". The "rmmod" command can be issued repetitively for each of the identified modules until the "lnet" module is successfully unloaded. Alternatively, each node can be rebooted.
 
  • Restart/remount the file system (see Step 3).
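The OSTsWriteconf and OSTsMsgnode commands referenced above are local helper scripts on the OSS nodes and are not reproduced in these notes. A minimal sketch of what they are assumed to do, looping tunefs.lustre over the OST logical volumes named according to the convention in Note 2, would be:

# OSTsWriteconf (sketch only): request log regeneration for every OST LV on this OSS
for lv in /dev/vg00/OST????; do
    tunefs.lustre --writeconf "$lv"
done

# OSTsMsgnode (sketch only): re-point every OST LV at the MGS address given as $1
mgs="$1"    # e.g. 10.200.201.1
for lv in /dev/vg00/OST????; do
    tunefs.lustre --erase-params --mgsnode="${mgs}@tcp" "$lv"
done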

Revision 24 (2011-01-25) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-07-20
Changed:
<
<

Lustre Procedures

>
>

NEP-52 Lustre file system notes:

 
Changed:
<
<

Content

  1. Determine which OSS is serving an OST.
  2. Temporarily deactivate an OST.
  3. Re-activating an OST.
  4. Determining which files have objects on a particular OST.
>
>

Overview

The elephant cluster was used in developing these notes. Much of the information comes from the Lustre Operations Manual.
  1. Installing Lustre on the cluster.
  2. Define the Lustre file system.
  3. Starting and stopping the Lustre file system.
  4. Changing the Lustre network.
  5. Installing a patchless client on a workstation.
  6. Determine which OSS is serving an OST.
  7. Temporarily deactivate an OST.
  8. Re-activating an OST.
  9. Determining which files have objects on a particular OST.
  10. Restoring a corrupted PV after a motherboard replacement.
 
Changed:
<
<

Proc 1: Determine which OSS is serving an OST.

Index Up Down
>
>

Index Up Down Note 1: Installing Lustre on the cluster.

The elephant cluster needs to run a Xen kernel and the Lustre file system. It was therefore necessary to obtain a kernel with both the Xen and Lustre patches applied. The source was available on the Oracle website. The instructions for building the required RPMs can be found here. When the build is complete, install the following RPMs on each of the nodes in the cluster:

  • e2fsprogs

  • kernel-lustre
  • kernel-lustre-xen
  • kernel-lustre-xen-devel

  • lustre
  • lustre-ldiskfs
  • lustre-modules

Reboot each node with one of the kernels (kernel-lustre-xen (required for Xen) or kernel-lustre) installed above. Note: we found that the single user mode kernel boot parameter ("S", including all tested synonyms) is ineffective with any Xen kernel. If you need to boot into single user mode, use a non-Xen kernel.
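For reference, a minimal sketch of installing the resulting RPM set on one node (the file names below are placeholders for whatever versions the build produced):

# Sketch only: install the locally built Lustre/Xen RPMs on a node.
cd /path/to/built/rpms                # placeholder location of the build output
sudo rpm -ivh e2fsprogs-*.rpm
sudo rpm -ivh kernel-lustre-*.rpm     # also matches kernel-lustre-xen and kernel-lustre-xen-devel
sudo rpm -ivh lustre-*.rpm            # also matches lustre-ldiskfs and lustre-modules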

Index Up Down Note 2: Define the Lustre file system.

The Lustre file system consists of an MGS, an MDT, and one or more OSTs. These components can exist on one or more nodes. In the case of elephant, e1 hosts a combined MGS/MDT, and e2 through e6 each host multiple OSTs. They were created as follows:

  • As root, create the MGS/MDT on e1:
    • pvcreate /dev/sdb
    • vgcreate vg00 /dev/sdb
    • lvcreate -L 100M -n MDT0000 vg00
    • mkfs.lustre --mgs --mdt --fsname=lustre /dev/vg00/MDT0000 - NB: The mkfs.lustre must be performed as root; sudo mkfs.lustre does not produce the correct results.
    • Create mount points: mkdir /lustreMDT /lustreFS

  • As root, create OSTs on e2 through e6:
    • pvcreate /dev/sdb
    • vgcreate vg00 /dev/sdb
    • lvcreate -L 600G -n OSTnnnn vg00
    • mkfs.lustre --ost --mgsnode=10.200.201.1@tcp --fsname=lustre /dev/vg00/OSTnnnn - NB: The mkfs.lustre must be performed as root; sudo mkfs.lustre does not produce the correct results.
    • Create mount points: mkdir -p /lustreOST/{OSTnnnn,OSTnnnn} /lustreFS

The "mkfs.lustre" command will assign a filesystem name to each OST created. The name takes the form "OSTnnnn", where "nnnn" is a sequentially assigned hexadecimal number starting at "0000". On elephant, the filesystem name, the logical volume name, and mount point are made consistent, ie. logical volume /dev/vg00/OST000a contains OST filesystem OST000a which is mounted on /lustreOSTs/OST000a. This is not difficult to ensure (more difficult to fix after the fact) and the shell script /usr/local/sbin/mountOSTs depend on this arrangement (at least it depends on the LV name matching the mount point; the filesystem name is not so important). To display the filesystem name, issue the command "tunefs.lustre /dev/vg00/OSTnnnn". Currently, there are 20 OSTs allocated, 0000 through 0013. The next OST filesystem name that will be assigned by mkfs.lustre will be OST0014.

  • Define the Lustre network (LNET):
    • LNET is defined by options within the modprobe configuration. In elephant's case, the definition is effected by the /etc/modprobe.d/lustre-lnet.conf file containing the following definition:
      • options lnet networks=tcp(bond0)
    • This configuration instructs lustre to use only the bond0 (10.200.201.0/24) interface.

  • Update iptables to allow access to port 988 on all server nodes (e1 through e6). Remember, all clients mounting the file system will have read/write access. The following iptables entries were created for the elephant cluster:
    • # lustre filesystem access - elephant cluster:
    • -A RH-Firewall-1-INPUT -s 10.200.201.0/24 -m tcp -p tcp --dport 988 -j ACCEPT

  • The following iptables rules were used during testing and are now obsolete:
    • -A RH-Firewall-1-INPUT -s 206.12.154.0/26 -m tcp -p tcp --dport 988 -j ACCEPT
    • # lustre filesystem access - CLB's workstation:
    • -A RH-Firewall-1-INPUT -s 142.104.60.69 -m tcp -p tcp --dport 988 -j ACCEPT
    • # lustre filesystem access - Kyle's testnode, NRC Ottawa:
    • -A RH-Firewall-1-INPUT -s 132.246.148.31 -m tcp -p tcp --dport 988 -j ACCEPT

Index Up Down Note 3: Starting and stopping the Lustre file system

Starting and stopping the Lustre file system is performed by mounting and unmounting the file system components. To start the file system, the mounts should be issued as root in the following order:

  • Mount the MGS/MDT on e1: mount -t lustre /dev/vg00/MDT0000 /lustreMDT
  • Mount the OSTs on e2 through e6. /usr/local/sbin/mountOSTs
  • Mount the lustre filesystem on all nodes: mount -t lustre 10.200.201.1@tcp:/lustre /lustreFS

To stop the file system, the unmounts should be issued as root in the following order:

  • On all nodes, unmount the lustre filesystem: umount /lustreFS
  • On e1: umount /lustreMDT
  • On e2 through e6: umount /lustreOST/*
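A hedged sketch of driving that ordering from e1 over ssh (assuming root ssh access to the short host names e1 through e6 used elsewhere in these notes):

# Sketch only: stop the file system in the documented order.
for n in e1 e2 e3 e4 e5 e6; do ssh "$n" umount /lustreFS; done    # unmount all clients first
ssh e1 umount /lustreMDT                                          # then the MGS/MDT on e1
for n in e2 e3 e4 e5 e6; do ssh "$n" 'umount /lustreOST/*'; done  # finally the OSTs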

Index Up Down Note 4: Changing the Lustre network.

In order to change the network, all network references must be updated or the file system may be rendered unusable. Though the Lustre network configuration is defined in only two places ("mkfs.lustre --ost --mgsnode=10.200.201.1@tcp --fsname=lustre /dev/vg00/lv00" and "options lnet networks=tcp(bond0)"), references are held in the following three places on each server node:

  • The active LNET module parameters viewable with the command "lctl list_nids".
  • The active MDT/OST filesystem parameters viewable with the command "tunefs.lustre /dev/vg00/OSTnnnn".
  • The MDT/OST log files, which are not viewable. However, the following messages indicate that one or more log files have invalid references:
    • Lustre: 6455:0:(socklnd_cb.c:421:ksocknal_txlist_done()) Deleting packet type 1 len 368 206.12.154.1@tcp->10.200.201.4@tcp
    • Lustre: 6460:0:(socklnd_cb.c:915:ksocknal_launch_packet()) No usable routes to 12345-10.200.201.3@tcp

To change the network (in this example we will change from bond0 (10.200.201.0/24) to eth3 (206.12.154.0/26)), the following procedure should be adopted:

  • Stop the file system (see Step 3).
  • On e1 through e6: Update /etc/modprobe.d/lustre-lnet.conf, e.g. options lnet networks=tcp(eth3)
  • On e1 through e6: Request log file regeneration via the command "tunefs.lustre --writeconf /dev/vg00/lv00".
  • On e1 through e6: Update MDT/OST filesystem parameters via the command "tunefs.lustre --erase-params --mgsnode=206.12.154.1@tcp /dev/vg00/lv00"
  • On e1 through e6: Update iptables to reflect the new network configuration.
  • On e1 through e3: Unload the lnet module thereby purging the active module parameters. Issuing the "rmmod lnet" command generally results in "module in use by ....". The "rmmod" command can be issued repetitively for each of the identified modules until the "lnet" module is successfully unloaded. Alternatively, each node can be rebooted.
  • Restart/remount the file system (see Step 3).
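After remounting, the new addressing can be verified with the same commands used above to view the live references; for example, on a server node:

# Sketch only: confirm the new network references after the change.
lctl list_nids                      # active LNET module parameters
tunefs.lustre /dev/vg00/OSTnnnn     # on an OSS: check the stored mgsnode= parameter
                                    # (substitute a real OST logical volume name)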

Index Up Down Note 5: Installing a patchless client on a workstation.

  • Download the Lustre source.
  • tar -xzvf lustre-1.8.2.tar.gz
  • cd lustre-1.8.2
  • ./configure --disable-server --enable-client --with-linux=/usr/src/kernels/2.6.18-164.15.1.el5-xen-i686
  • make
  • sudo make install
  • sudo depmod -a
  • sudo modprobe lustre
  • sudo mount -t lustre 206.12.154.1@tcp:/lustre /lustreFS/
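To make the client mount persistent across reboots, one consistent approach (a sketch; the fstab options shown are the usual ones for a Lustre client rather than values taken from the elephant configuration) is:

# /etc/modprobe.d/lustre-lnet.conf on the workstation: select the client interface (eth0 assumed)
options lnet networks=tcp(eth0)

# /etc/fstab entry: _netdev delays the mount until networking is up
206.12.154.1@tcp:/lustre  /lustreFS  lustre  defaults,_netdev  0 0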

Index Up Down Note 6: Determine which OSS is serving an OST.

  On the MGS/MDT server:
Line: 30 to 136
  The IP address identifies which node is serving which OST.
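A hedged one-liner for summarising that mapping (it only reshapes the "lctl get_param osc.*.ost_conn_uuid" output used in this procedure):

# Sketch only: print one "OST -> serving NID" line per OST.
lctl get_param osc.*.ost_conn_uuid | sed -n 's/^osc\.\(lustre-OST[0-9a-f]*\)-osc\.ost_conn_uuid=/\1 -> /p'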
Changed:
<
<

Proc 2: Temporarily deactivate an OST.

Index Up Down
>
>

Index Up Down Note 7: Temporarily deactivate an OST.

  On the MGS/MDT server:
  • Determine the device number for the MDT's OSC corresponding to the OST to be deactivated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
Line: 60 to 165
 
  • The "lctl dl | grep osc" command can be used to check the change in status.
Changed:
<
<

Proc 3: Re-activating an OST.

Index Up Down
>
>

Index Up Down Note 8: Re-activating an OST.

  On the MGS/MDT server:
  • Determine the device number for the MDT's OSC corresponding to the OST to be re-activated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
Line: 91 to 195
 
  • The "lctl dl | grep osc" command can be used to check the change in status.
Changed:
<
<

Proc 4: Determining which files have objects on a particular OST.

Index Up Down
>
>

Index Up Down Note 9: Determining which files have objects on a particular OST.

  This procedure can be performed on any lustre node:
Line: 131 to 234
 
      0       0       0
[root@elephant bin]# lctl conf_param lustre-OST0004.osc.active=0
[root@elephant bin]#
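A hedged sketch of the check being illustrated there: confirm that no files still have objects on the OST before marking it permanently inactive on the MGS (OST0004 is the example from the transcript):

# Sketch only: mark lustre-OST0004 inactive only if no files have objects on it.
count=$(lfs find --obd lustre-OST0004_UUID /lustreFS/ | wc -l)
if [ "${count}" -eq 0 ]; then
    lctl conf_param lustre-OST0004.osc.active=0
else
    echo "OST0004 still holds objects for ${count} files; not deactivating"
fi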
Deleted:
<
<

Proc 5: Restoring a corrupted PV after a motherboard replacement.

Index Up Down
 
Added:
>
>
 
Added:
>
>

Index Up Down Note 10: Restoring a corrupted PV after a motherboard replacement.

 
On e4:

Revision 23 (2011-01-07) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-07-20

Lustre Procedures

Line: 132 to 132
[root@elephant bin]# lctl conf_param lustre-OST0004.osc.active=0
[root@elephant bin]#
Added:
>
>

Proc 5: Restoring a corrupted PV after a motherboard replacement.

Index Up Down

On e4:
dd if=/dev/sdb of=/tmp/boot.txt bs=512 count=1

On e3:
scp crlb@e4:/tmp/boot.txt boot-e4.txt
dd if=boot-e4.txt of=/dev/sdb bs=512 count=1
fdisk -l /dev/sdb

pvcreate --restorefile /etc/lvm/backup/vg00 --uuid f26AZq-ycTI-7QKf-3yn9-3VCe-w1V3-dOaKlk /dev/sdb
* uuid was taken from /etc/lvm/backup/vg00
vgcfgrestore vg00
pvscan
vgchange -ay vg00
mount -a
mountOSTs 
df
 \ No newline at end of file
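A hedged sketch of verifying the recovery before returning the node to service (standard LVM and mount status commands; nothing here is specific to the elephant helper scripts):

# Sketch only: confirm the restored PV, VG, LVs and OST mounts on the repaired node.
pvs | grep sdb       # the PV should reappear with the restored UUID
vgs vg00             # the volume group should be present and active
lvs vg00             # all OSTnnnn logical volumes should be visible
df -t lustre         # the OSTs should be remounted under /lustreOST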

Revision 22 (2010-11-04) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-07-20

Lustre Procedures

Line: 6 to 6
 

Content

  1. Determine which OSS is serving an OST.
Changed:
<
<
  1. Deactivate an OST.
>
>
  1. Temporarily deactivate an OST.
 
  1. Re-activating an OST.
  2. Determining which files have objects on a particular OST.
Line: 31 to 31
 The IP address identifies which node is serving which OST.

Changed:
<
<

Proc 2: Deactivate an OST.

>
>

Proc 2: Temporarily deactivate an OST.

 Index Up Down

On the MGS/MDT server:

Revision 21 (2010-08-06) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-07-20

Lustre Procedures

Line: 125 to 125
  . [crlb@elephant01 ~]$
Added:
>
>
[root@elephant bin]# lfs find --obd lustre-OST0004_UUID /lustreFS/  | wc
      0       0       0
[root@elephant bin]# lctl conf_param lustre-OST0004.osc.active=0
[root@elephant bin]#
 \ No newline at end of file

Revision 20 (2010-07-20) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-07-20

Lustre Procedures

Line: 8 to 8
 
  1. Determine which OSS is serving an OST.
  2. Deactivate an OST.
  3. Re-activating an OST.
Added:
>
>
  1. Determining which files have objects on a particular OST.
 

Proc 1: Determine which OSS is serving an OST.

Line: 34 to 35
 Index Up Down

On the MGS/MDT server:

Changed:
<
<
  1. Determine the device number for the MDT's OSC corresponding to the OST to be deactivated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
>
>
  • Determine the device number for the MDT's OSC corresponding to the OST to be deactivated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
 
[crlb@elephant01 ~]$ lctl dl | grep osc
Line: 49 to 50
 [crlb@elephant01 ~]$
Changed:
<
<
  1. To deactivate OST0003 from the above list issue:
>
>
  • To deactivate OST0003 from the above list issue:
 
[crlb@elephant01 ~]$ sudo lctl --device 8 deactivate
Line: 57 to 58
 [crlb@elephant01 ~]$

Changed:
<
<
The "lctl dl | grep osc" command can be used to check the change in status.
>
>
  • The "lctl dl | grep osc" command can be used to check the change in status.
 

Proc 3: Re-activating an OST.

Index Up Down

On the MGS/MDT server:

Changed:
<
<
  1. Determine the device number for the MDT's OSC corresponding to the OST to be re-activated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
>
>
  • Determine the device number for the MDT's OSC corresponding to the OST to be re-activated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
 
[crlb@elephant01 ~]$ lctl dl | grep osc
Line: 79 to 80
 [crlb@elephant01 ~]$
Changed:
<
<
  1. To Re-activate OST0003 from the above list issue:
>
>
  • To Re-activate OST0003 from the above list issue:
 
[crlb@elephant01 ~]$ sudo lctl --device 8 activate
Line: 87 to 88
 [crlb@elephant01 ~]$
Changed:
<
<
The "lctl dl | grep osc" command can be used to check the change in status.
>
>
  • The "lctl dl | grep osc" command can be used to check the change in status.

Proc 4: Determining which files have objects on a particular OST.

Index Up Down

This procedure can be performed on any lustre node:

  • Determine the UUID for the OST of interest:
[crlb@elephant01 ~]$ lfs df
UUID                 1K-blocks      Used Available  Use% Mounted on
lustre-MDT0000_UUID   91743520    496624  86004016    0% /lustreFS[MDT:0]
lustre-OST0000_UUID  928910792 717422828 164301980   77% /lustreFS[OST:0]
lustre-OST0001_UUID  928910792 720414360 161310444   77% /lustreFS[OST:1]
lustre-OST0002_UUID  928910792 730323340 151401464   78% /lustreFS[OST:2]
lustre-OST0003_UUID  928910792 348690392 533034416   37% /lustreFS[OST:3]

filesystem summary:  3715643168 2516850920 1010048304   67% /lustreFS

[crlb@elephant01 ~]$ 

  • To list the files with objects on OST0003:
[crlb@elephant01 ~]$ lfs find --obd lustre-OST0003_UUID /lustreFS/
   .
   .
/lustreFS//BaBar/work/allruns_backup/17320691/A26.0.0V01x57F/config.tcl
/lustreFS//BaBar/work/allruns_backup/17320691/A26.0.0V01x57F/17320691.moose.01.root
/lustreFS//BaBar/work/allruns_backup/17320691/status.txt
/lustreFS//BaBar/work/allruns_backup/17320697/A26.0.0V01x57F/B+B-_generic.dec
   .
   .
[crlb@elephant01 ~]$ 

Revision 19 (2010-07-20) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
Changed:
<
<
-- ColinLeavettBrown - 2010-02-11

Nimbus 2.3 install on Elephant

Original document written by PatrickArmstrong. Modified for initial implementation of the split configuration on the elephant cluster. In this configuration, there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.

Overview

  1. Create privileged user "nimbus" on all cluster nodes.
  2. Switch to user "nimbus" and create public/private keys.
  3. Download all required packages.
  4. Install head node prerequisites.
  5. Install Globus ws-core Web Services container.
  6. Install X509 grid certificates.
  7. Test Web Services container.
  8. Automate Web Services start/stop.
  9. Install Nimbus.
  10. Install Nimbus Workspace Control Agents
  11. Worker Node Setup.
  12. Auto-configure and Test Nimbus.
>
>
-- ColinLeavettBrown - 2010-07-20

Lustre Procedures

 
Changed:
<
<

Step 1: Create privileged user "nimbus" on all cluster nodes.

Create the nimbus account on elephant head node (elephant01) with the required sudo privileges.

[crlb@elephant01 ~]$ sudo adduser nimbus
[crlb@elephant01 ~]$ sudo visudo
>
>

Content

  1. Determine which OSS is serving an OST.
  2. Deactivate an OST.
  3. Re-activating an OST.
 
Changed:
<
<
Comment out the requiretty directive:
#Defaults    requiretty
>
>

Proc 1: Determine which OSS is serving an OST.

Index Up Down
 
Changed:
<
<
Allow any command with a password:
>
>
On the MGS/MDT server:
 
Changed:
<
<
nimbus ALL=(ALL) ALL
>
>
[crlb@elephant01 ~]$ lctl get_param osc.*.ost_conn_uuid
osc.lustre-OST0000-osc-ffff8803bd5ab400.ost_conn_uuid=206.12.154.2@tcp
osc.lustre-OST0000-osc.ost_conn_uuid=206.12.154.2@tcp
osc.lustre-OST0001-osc-ffff8803bd5ab400.ost_conn_uuid=206.12.154.3@tcp
osc.lustre-OST0001-osc.ost_conn_uuid=206.12.154.3@tcp
osc.lustre-OST0002-osc-ffff8803bd5ab400.ost_conn_uuid=206.12.154.4@tcp
osc.lustre-OST0002-osc.ost_conn_uuid=206.12.154.4@tcp
osc.lustre-OST0003-osc-ffff8803bd5ab400.ost_conn_uuid=206.12.154.5@tcp
osc.lustre-OST0003-osc.ost_conn_uuid=206.12.154.5@tcp
[crlb@elephant01 ~]$
 
Changed:
<
<
And the following commands with no password:
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/mount-alter.sh
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/dhcp-config.sh
nimbus  ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/xen-ebtables-config.sh

Save changes and propagate to every node in the cluster:

[crlb@elephant01 ~]$ sudo /usr/local/sbin/usync
>
>
The IP address identifies which node is serving which OST.
 
Changed:
<
<

Step 2: Switch to user "nimbus" and create public/private keys.

[crlb@elephant01 ~]$ sudo su - nimbus
Password: ********
[nimbus@elephant01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Created directory '/home/nimbus/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
e4:43:60:84:0c:ea:dc:02:dd:4b:93:fd:f4:e4:38:e8 nimbus@elephant01.heprc.uvic.ca
[nimbus@elephant01 ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys
[nimbus@elephant01 ~]$ chmod 600 .ssh/authorized_keys
[nimbus@elephant01 ~]$ 

Step 3: Download all required packages.

[nimbus@elephant01 ~]$ mkdir -p Downloads/nimbus-2.3
[nimbus@elephant01 ~]$ cd Downloads/nimbus-2.3
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-controls-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
[nimbus@elephant01 nimbus-2.3]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0

Step 4: Install head node prerequisites.

[nimbus@elephant01 ~]$ ssh e11
The authenticity of host 'e11 (10.200.200.11)' can't be established.
RSA key fingerprint is 7c:92:13:5d:35:59:dd:ca:2e:bd:95:b4:97:ed:f0:97.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'e11,10.200.200.11' (RSA) to the list of known hosts.
Last login: Tue Feb 16 09:52:21 2010 from elephant01.admin
[nimbus@elephant11 ~]$ 

Install java-1.6.0-sun-compat:

[nimbus@elephant11 ~]$ sudo yum -y install java-1.6.0-sun-compat

Install Apache Ant

[nimbus@elephant11 ~]$ cd /usr/local
[nimbus@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2

Step 5: Install Globus ws-core Web Services container.

The frontend tools (globus ws-core and nimbus) will be installed to /usr/local/nimbus:

[nimbus@elephant11 local]$ sudo mkdir -p versions/nimbus-2.3
[nimbus@elephant11 local]$ sudo chown nimbus.nimbus versions/nimbus-2.3
[nimbus@elephant11 local]$ sudo ln -s versions/nimbus-2.3 nimbus
[nimbus@elephant11 local]$ cd nimbus
[nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant11 nimbus]$ mv ws-core-4.0.8/* .
[nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8
[nimbus@elephant11 nimbus]$ mkdir var

Set environment variables. Example assumes bash as the nimbus user's shell:

[nimbus@elephant11 nimbus]$ cd
[nimbus@elephant11 ~]$  echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> .bashrc
[nimbus@elephant11 ~]$  echo "export X509_CERT_DIR=/usr/local/nimbus/share/certificates" >> .bashrc
[nimbus@elephant11 ~]$  echo "export PATH=$PATH:/usr/local/apache-ant-1.8.0/bin" >> .bashrc
[nimbus@elephant11 ~]$  . .bashrc

Create an empty grid-mapfile to contain the certificate subjects of the users of the cloud-enabled cluster.

[nimbus@elephant11 nimbus]$  touch $GLOBUS_LOCATION/share/grid-mapfile

Step 6: Install X509 grid certificates.

Make our certificates directory and put the grid canada root certificates in there.

[nimbus@elephant11 ~]$ mkdir -p $X509_CERT_DIR
[nimbus@elephant11 ~]$ mv Downloads/nimbus-2.3/bffbd7d0.0 $X509_CERT_DIR/

Then create a host certificate request to send to our CA.

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/grid-cert-request -int -host `hostname -f` -dir $X509_CERT_DIR -caEmail ca@gridcanada.ca -force
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
Enter organization DN by entering individual component names and their values.
The component name can be one of: [givenname, surname, ou, uid, cn, initials, unstructuredname, t, unstructuredaddress, emailaddress, o, st, l, generation, sn, e, c, dc]
-----
Enter name component: C
Enter 'C' value: CA
Enter name component: O
Enter 'O' value: Grid
Enter name component: 
Generating a 1024 bit RSA private key
A private key and a certificate request has been generated with the subject:

/C=CA/O=Grid/CN=host/canfardev.dao.nrc.ca

The private key is stored in /usr/local/nimbus/share/certificates/hostkey.pem
The request is stored in /usr/local/nimbus/share/certificates/hostcert_request.pem

Now mail this request file ($X509_CERT_DIR/hostcert_request.pem) to ca@gridcanada.ca . It might take a day or so before you get your certificate back.

Once you have your key, paste the contents into $X509_CERT_DIR/hostcert.pem

Now that we have our certificate, we have to point our container to our key and certificate and to our empty grid-mapfile. To do so, edit $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml to point to your new certificates and modify the gridmap value:

[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml 
<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
    <credential>
        <key-file value="/usr/local/nimbus/share/certificates/hostkey.pem"/>
        <cert-file value="/usr/local/nimbus/share/certificates/hostcert.pem"/>
    </credential>
    <gridmap value="/usr/local/nimbus/share/grid-mapfile"/>
</securityConfig>

Activate the security configuration by adding a element under the CONTAINER_SECURITY_DESCRIPTOR:

[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
<!-- @CONTAINER_SECURITY_DESCRIPTOR@ -->
<parameter name="containerSecDesc"
              value="etc/globus_wsrf_core/global_security_descriptor.xml"/>

Step 7: Test Web Services container.

Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-container
Starting SOAP server at: https://204.174.103.121:8443/wsrf/services/ 
With the following services:

[1]: https://204.174.103.121:8443/wsrf/services/AdminService
[2]: https://204.174.103.121:8443/wsrf/services/AuthzCalloutTestService
[3]: https://204.174.103.121:8443/wsrf/services/ContainerRegistryEntryService
...
[25]: https://204.174.103.121:8443/wsrf/services/gsi/AuthenticationService

If you do, hit control-c. Congratulations! Your container is working.

If you get the following error

org.globus.common.ChainedIOException: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
You are probably missing the Grid Canada .0 file (bffbd7d0.0 in this case). Either copy the file from another globus machine's X509_CERT_DIR or download the GC CA Bundle from the GC Certificate Authority website, then put the bffbd7d0.0 file into the X509_CERT_DIR and try starting the container again.

Step 8: Automate Web Services start/stop.

Now that we know our container works, we can create a script to run our container at login. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:

#!/bin/sh
set -e
export GLOBUS_OPTIONS="-Xms256M -Xmx1024M -Dorg.globus.tcp.port.range=50000,51999"
export GLOBUS_TCP_PORT_RANGE=50000,51999

cd $GLOBUS_LOCATION
case "$1" in
    start)
        nohup $GLOBUS_LOCATION/bin/globus-start-container -p 8443 \
       >>$GLOBUS_LOCATION/var/container.log &
        ;;
    stop)
        $GLOBUS_LOCATION/bin/grid-proxy-init \
            -key $GLOBUS_LOCATION/share/certificates/hostkey.pem\
            -cert $GLOBUS_LOCATION/share/certificates/hostcert.pem\
            -out /tmp/shutdownproxy.pem\
            >/dev/null
        export X509_USER_PROXY=/tmp/shutdownproxy.pem
        $GLOBUS_LOCATION/bin/globus-stop-container hard
        unset X509_USER_PROXY
        rm /tmp/shutdownproxy.pem
        ;;
    restart)
        $GLOBUS_LOCATION/start-stop stop
        $GLOBUS_LOCATION/start-stop start
        ;;
    *)
        echo "Usage: globus {start|stop}" >&2
        exit 1
       ;;
esac
exit 0

Mark it as executable:

[nimbus@elephant11 ~]
$ chmod 744 $GLOBUS_LOCATION/bin/globus-start-stop

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop start
[nimbus@elephant11 ~]$ netstat -an | grep 8443
tcp        0      0 0.0.0.0:8443                0.0.0.0:*                   LISTEN     

Great! Now we have a running container. Let's stop it before we carry on with our installation.

[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop stop

Step 9: Install Nimbus

Unpack the nimbus package into a temporary directory and run the install script. When the install script completes, Nimbus will have been installed in $GLOBUS_LOCATION, and the temporary files can be removed.

[nimbus@elephant11 ~]$ cd /tmp
[nimbus@elephant11 tmp]$ tar -xvf ~/Downloads/nimbus-2.3.tar.gz
[nimbus@elephant11 tmp]$ /tmp/nimbus-2.3/bin/all-build-and-install.sh
[nimbus@elephant11 tmp]$ rm -rf /tmp/nimbus-2.3/
[nimbus@elephant11 tmp]$ rmdir hsperfdata_nimbus

Step 10: Install Nimbus Workspace Control Agents

The back-end tools (Nimbus control agents) will be installed in /opt/nimbus:

[nimbus@elephant11 tmp]$ cd /opt
[nimbus@elephant11 opt]$ sudo mkdir -p versions/nimbus-2.3
[nimbus@elephant11 opt]$ sudo chown nimbus.nimbus versions/nimbus-2.3
[nimbus@elephant11 opt]$ sudo ln -s versions/nimbus-2.3 nimbus
[nimbus@elephant11 opt]$ cd nimbus
[nimbus@elephant11 nimbus]$ tar -xzvf ~/Downloads/nimbus-2.3/nimbus-controls-2.3.tar.gz
[nimbus@elephant11 nimbus]$ mv ./nimbus-controls-2.3/workspace-control/* .

Export the Nimbus control agent directory to the worker nodes by adding the following to /etc/exports:

/opt/versions                   elephant*.admin(rw,no_root_squash)

Then issue:

[nimbus@elephant11 nimbus]$ sudo exportfs -a

Step 11: Worker Node Setup

Verify password-less access to worker nodes:

[nimbus@elephant11 ~]$ ssh e10
The authenticity of host 'e10 (10.200.200.10)' can't be established.
RSA key fingerprint is 81:ba:c1:49:1f:a5:22:30:60:a6:b8:ba:19:0b:38:2c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'e10,10.200.200.10' (RSA) to the list of known hosts.
Last login: Tue Feb 16 13:55:22 2010 from elephant11.admin
[nimbus@elephant10 ~]$ ssh e11
Last login: Tue Feb 16 13:54:41 2010 from elephant01.admin
[nimbus@elephant11 ~]$ exit
[nimbus@elephant10 ~]$1

Ensure Xen is installed and running:

[nimbus@elephant10 ~]$ sudo xm list
Name                                        ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      23690         16 r-----   1410.6
[nimbus@elephant10 ~]$

If Xen is not installed:

[nimbus@elephant10 ~]$ sudo yum install xen kernel-xen
[nimbus@elephant10 ~]$ sudo chkconfig xend on
[nimbus@elephant10 ~]$ sudo shutdown -r now

Apply Nimbus networking bug workaround
[nimbus@elephant10 ~]$ sudo vi /etc/xen/xend-config.sxp
Find the line:
#(xend-tcp-xmlrpc-server no)
and change it to:
(xend-http-server yes)
Save the change and restart the Xen daemon:
[nimbus@elephant10 ~]$ sudo /etc/init.d/xend restart

Install ebtables. For the x86_64 kernels on elephant:

[nimbus@elephant10 ~]$ cd /tmp
[nimbus@elephant10 tmp]$ wget --no-check-certificate https://wiki.gridx1.ca/twiki/pub/HEPrc/GVWandEBTables/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
[nimbus@elephant10 tmp]$ sudo rpm -ivhf ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
[nimbus@elephant10 tmp]$ rm ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm

Mount Nimbus Control Agents directory.

[nimbus@elephant10 tmp]$ sudo mkdir -p /opt/versions
[nimbus@elephant10 tmp]$ sudo echo 'elephant11.admin:/opt/versions  /opt/versions nfs defaults 0 0' >>/etc/fstab
[nimbus@elephant10 tmp]$ sudo mount -a
[nimbus@elephant10 tmp]$ sudo ln -s /opt/versions/nimbus-2.3-a4b265d /opt/nimbus

Install and configure DHCP.

[nimbus@elephant11 tmp]$ sudo yum install dhcp
[nimbus@elephant11 tmp]$ sudo cp /opt/nimbus/share/workspace-control/dhcpd.conf.example /etc/dhcpd.conf
[nimbus@elephant11 tmp]$ sudo vi /etc/dhcpd.conf
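The edit to /etc/dhcpd.conf amounts to declaring the subnet the VMs will receive leases on; an earlier revision of this page gave an example of the form:

# declare the subnet dhcpd serves leases on, e.g. the admin network:
subnet 10.200.200.0 netmask 255.255.255.0 { }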

Modify the /etc/libvirt/libvirtd.conf by setting the following values:

unix_sock_group = "nimbus"
unix_sock_rw_perms = "0777"

Restart libvirt:

[nimbus@elephant11 tmp]$ sudo /etc/init.d/libvirtd restart

Step 12: Auto-configure and Test Nimbus.

Add a Node to the Nimbus configuration

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
e10 3072

The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are, though, you will just answer yes to all of them.

Your worker node should now be ready!

Now run the auto-configuration program. Following is a transcript of running this program on canfardev:

$ $GLOBUS_LOCATION/share/nimbus-autoconfig/autoconfig.sh

# ------------------------- #
# Nimbus auto-configuration #
# ------------------------- #

Using GLOBUS_LOCATION: /usr/local/nimbus

Is the current account (nimbus) the one the container will run under? y/n:
y

Pick a VMM to test with, enter a hostname: 
gildor

----------

How much RAM (MB) should be allocated for a test VM on the 'gildor' VMM?
256

Will allocate 256 MB RAM for test VM on the 'gildor' VMM.

----------

Is the current account (nimbus) also the account the privileged scripts will run under on the VMM (gildor)? y/n:
y

Does the container account (nimbus) need a special (non-default) SSH key to access the 'nimbus' account on the VMM nodes? y/n:
n

----------

Testing basic SSH access to nimbus@gildor

Test command (1): ssh -T -n -o BatchMode=yes nimbus@gildor /bin/true

Basic SSH test (1) working to nimbus@gildor

----------

Now we'll set up the *hostname* that VMMs will use to contact the container over SSHd

Even if you plan on ever setting up just one VMM and it is localhost to the container, you should still pick a hostname here ('localhost' if you must)

*** It looks like you have a hostname set up: canfardev.dao.nrc.ca

Would you like to manually enter a different hostname? y/n:
n

Using hostname: canfardev.dao.nrc.ca

----------

Is your local SSHd server on a port different than 22?  Enter 'n' or a port number: 
n

Attempting to connect to: canfardev.dao.nrc.ca:22

Contacted a server @ canfardev.dao.nrc.ca:22

----------

Now we will test the basic SSH notification conduit from the VMM to the container

Test command (2): ssh -T -n -o BatchMode=yes nimbus@gildor ssh -T -n -o BatchMode=yes -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:
>
>

Proc 2: Deactivate an OST.

Index Up Down
 
Changed:
<
<
Host key verification failed.
>
>
On the MGS/MDT server:
  1. Determine the device number for the MDT's OSC corresponding to the OST to be deactivated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
 
Deleted:
<
<
* That failed.

Try it manually in another terminal? There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key. For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

* That failed.

Try it manually in another terminal? There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key. For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Notification test (2) working (ssh from nimbus@gildor to nimbus@canfardev.dao.nrc.ca at port 22)


OK, looking good.



If you have not followed the instructions for setting up workspace control yet, please do the basic installation steps now.

Look for the documentation at: - http://workspace.globus.org/vm/TP2.2/admin/quickstart.html#part-III


A sample installation command set can be provided for you here if you supply a group name. Group privileges are used for some configurations and programs.

What is a privileged unix group of nimbus on gildor? Or type 'n' to skip this step. nimbus


* Sample workspace-control installation commands:

ssh root@gildor ^^^^ YOU NEED TO BE ROOT

wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz tar xzf nimbus-controls-TP2.2.tar.gz cd nimbus-controls-TP2.2

mkdir -p /opt/nimbus cp worksp.conf.example /opt/nimbus/worksp.conf python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus

* (see 'python install.py -h' for other options, including non-interactive installation)


Waiting for you to install workspace control for the account 'nimbus' on the test VMM 'gildor'

After this is accomplished, press return to continue.


Going to test container access to workspace control installation.

On 'gildor', did you install workspace-control somewhere else besides '/opt/nimbus/bin/workspace-control'? y/n: n

Test command (3): ssh -T -n -o BatchMode=yes nimbus@gildor /opt/nimbus/bin/workspace-control -h 1>/dev/null

Workspace control test (3) working


Testing ability to push files to workspace control installation.

We are looking for the directory on the VMM to push customization files from the container node. This defaults to '/opt/workspace/tmp'

Did you install workspace-control under some other base directory besides /opt/workspace? y/n: n Test command (4): scp -o BatchMode=yes /usr/local/nimbus/share/nimbus-autoconfig/lib/transfer-test-file.txt nimbus@gildor:/opt/workspace/tmp/

transfer-test-file.txt 100% 73 0.1KB/s 00:00

SCP test (4) working


Great.



Now you will choose a test network address to give to an incoming VM.

Does the test VMM (gildor) have an IP address on the same subnet that VMs will be assigned network addresses from? y/n: y


What is a free IP address on the same subnet as 'gildor' (whose IP address is '172.21.1.197') 172.21.1.200


Even if it does not resolve, a hostname is required for '172.21.1.200' to include in the DHCP lease the VM will get: canfardevtest


What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1) You can type 'none' if you are sure you don't want the VM to have a gateway

Please enter a gateway IP address or type 'none'.

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1) You can type 'none' if you are sure you don't want the VM to have a gateway 172.21.1.1


What is the IP address of the DNS server that should be used by the VM? (guessing it is 172.21.1.34) You can type 'none' if you are sure you don't want the VM to have DNS 172.21.1.34


OK, in the 'make adjustments' step that follows, the service will be configured to provide this ONE network address to ONE guest VM.

You should add more VMMs and more available network addresses to assign guest VMs only after you successfully test with one VMM and one network address.


* Changes to your configuration are about to be executed.

So far, no configurations have been changed. The following adjustments will be made based on the questions and tests we just went through:

- The GLOBUS_LOCATION in use: /usr/local/nimbus - The account running the container/service: nimbus - The hostname running the container/service: canfardev.dao.nrc.ca - The contact address of the container/service for notifications: nimbus@canfardev.dao.nrc.ca (port 22)

- The test VMM: gildor - The available RAM on that VMM: 256 - The privileged account on the VMM: nimbus

- The workspace-control path on VMM: /opt/workspace/bin/workspace-control - The workspace-control tmpdir on VMM: /opt/workspace/tmp

- Test network address IP: 172.21.1.200 - Test network address hostname: canfardevtest - Test network address gateway: 172.21.1.1 - Test network address DNS: 172.21.1.34


These settings are now stored in '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

If you type 'y', that script will be run for you with the settings.

Or you can answer 'n' to the next question and adjust this file. And then manually run '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh' at your leisure.

OK, point of no return. Proceed? y/n y

* Running /usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh . . .

# ------------------------------------------- # # Nimbus auto-configuration: make adjustments # # ------------------------------------------- #

Read settings from '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'


[*] The 'service.sshd.contact.string' configuration was: ... set to 'nimbus@canfardev.dao.nrc.ca:22' ... (it used to be set to 'REPLACE_WITH_SERVICE_NODE_HOSTNAME:22') ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'


[*] The 'control.ssh.user' configuration was: ... set to 'nimbus' ... (it used to be set blank) ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'


[*] The 'use.identity' configuration does not need to be changed. ... already set to be blank ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'


[*] The 'control.path' configuration does not need to be changed. ... already set to '/opt/workspace/bin/workspace-control' ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'


[*] The 'control.tmp.dir' configuration does not need to be changed. ... already set to '/opt/workspace/tmp' ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'


[*] Backing up old resource pool settings ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01' ... moved 'pool1' to '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'


[*] Creating new resource pool ... created '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool'


[*] Backing up old network settings ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01' ... moved 'private' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01' ... moved 'public' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'


[*] Creating new network called 'public' ... created '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/public'


NOTE: you need to MATCH this network in the workspace-control configuration file. This configuration file is at '/opt/workspace/worksp.conf' by default

For example, you might have this line:

association_0: public; xenbr0; vif0.1 ; none; 172.21.1.200/24

... "public" is the name of the network we chose. ... "xenbr0" is the name of the bridge to put VMs in this network on. ... "vif0.1" is the interface where the DHCP server is listening in dom0 on the VMM ... and the network address range serves as a sanity check (you can disable that check in the conf file)


Making sure 'fake mode' is off:

[*] The 'fake.mode' configuration was: ... set to 'false' ... (it used to be set to 'true') ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/other/common.conf'


Finished.

See 'NOTE' above.

Great. That seemed to work okay.

Let's carry on with the configuration.

Nimbus Configuration

First we need to tell Nimbus which machines we can boot virtual machines on. To do this, we need to edit the Nimbus frontend configuration files. These are in $GLOBUS_LOCATION/etc/nimbus. Let's define the machines we can boot on:

$ vim /usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool
# NOTE: a node may not be in more than one pool at the same time, this will
#       result in an initialization error

# Supported form:
# node_name  memory_to_manage networks_supported
#
# If third field is blank (or marked with '*'), it is assumed that pool
# node supports all networks available to remote clients.  Otherwise use a comma
# separated list (no spaces between).
#
# Note that if you list a network here that is not valid at runtime,
# it will silently be ignored (check your spelling).


# File contents injected @ Mon Jul 20 11:57:41 PDT 2009
gildor 1024

For now, we only have one machine in our pool, and it has 1024MB free ram with which to boot VMs.

Now we need to set up networking. To do this, we need to create a pool of network addresses we can assign to machines we boot on the cluster. Since we're going to start with only private networking, we will create a private pool. We define this as a text file in $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools.

This file contains a DNS server for these machines, as well as a list of ip addresses, hostnames, and other networking details. We will set this file up for two addresses now.

$ cat $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools/private
# DNS server IP or 'none'
172.21.1.34

# hostname ipaddress gateway broadcast subnetmask
canfardev00 172.21.1.200 172.21.1.1 none none
canfardev01 172.21.1.201 172.21.1.1 none none

Now we need to set up an equivalent networking association in the worksp.conf file on the worker nodes. You need to associate each network pool with a virtual interface on each worker node.

From worksp.conf on gildor:

 
Changed:
<
<
association_0: private; xenbr0; vif0.0 ; none; 172.21.1.0/24
>
>
[crlb@elephant01 ~]$ lctl dl | grep osc
  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
 11 UP osc lustre-OST0000-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
 12 UP osc lustre-OST0001-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
 13 UP osc lustre-OST0002-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
 14 UP osc lustre-OST0003-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
[crlb@elephant01 ~]$
 
Changed:
<
<
Now finally, point Nimbus to the grid-mapfile we created earlier:
>
>
  1. To deactivate OST0003 from the above list issue:
 
Changed:
<
<
$ vim $GLOBUS_LOCATION/etc/nimbus/factory-security-config.xml <auth-method> </auth-method>
>
>
[crlb@elephant01 ~]$ sudo lctl --device 8 deactivate
[sudo] password for crlb:
[crlb@elephant01 ~]$
 
Changed:
<
<

Initial Test

Start up your container with globus-start-container. You should see the following new services:

https://10.20.0.1:8443/wsrf/services/ElasticNimbusService
https://10.20.0.1:8443/wsrf/services/WorkspaceContextBroker
https://10.20.0.1:8443/wsrf/services/WorkspaceEnsembleService
https://10.20.0.1:8443/wsrf/services/WorkspaceFactoryService
https://10.20.0.1:8443/wsrf/services/WorkspaceGroupService
https://10.20.0.1:8443/wsrf/services/WorkspaceService
https://10.20.0.1:8443/wsrf/services/WorkspaceStatusService
>
>
The "lctl dl | grep osc" command can be used to check the change in status.
 
Changed:
<
<
Now we'll run a test script. Let's try.

$ wget http://workspace.globus.org/vm/TP2.2/admin/test-create.sh
$ grid-proxy-init
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: General error: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: "/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" is not authorized to use operation: {http://www.globus.org/2008/06/workspace}create on this service

Whoops! I need to add myself to the grid-mapfile:

$ echo '"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" nimbus' >> $GLOBUS_LOCATION/share/grid-mapfile
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: Resource request denied: Error creating workspace(s): 'public' is not a valid network name

Oh, whoops again. It looks like our test script wants to use public networking, and we don't have that set up.

$ cp $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml sample-workspace.xml
$ vim sample-workspace.xml (change public to private)
$ vim test-create.sh (change $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml to sample-workspace.xml)
$ sh test-create
...
Invalid:
--------
  - fatal, image '/opt/workspace/images/ttylinux-xen' does not exist on the filesystem
  - IMG #1 is invalid
  - no valid partitions/HD images
  - fatal, number of mountpoints (1) does not match number of valid partitions/HD images (0)
  - fatal, image '/opt/workspace/images/vmlinuz-2.6-xen' does not exist on the filesystem
  - fatal, no images configured
  - failure is triggered, backing out any networking reservations

for help use --help
"http://example1/localhost/image": Corrupted, calling destroy for you.
"http://example1/localhost/image" was terminated.
>
>

Proc 3: Re-activating an OST.

Index Up Down
 
Changed:
<
<
Whoops! Looks like we need to put the ttylinux files into the images directory on the worker node:
gildor # cd /opt/workspace/images
gildor # wget wget http://workspace.globus.org/downloads/ttylinux-xen.tgz
gildor # tar xzvf ttylinux-xen.tgz 
ttylinux-xen
ttylinux-xen.conf
gildor # rm -Rf ttylinux-xen.tgz
gildor # cp /boot/vmlinuz-2.6.18-128.1.1.el5xen vmlinuz-2.6-xen
Try again:
>
>
On the MGS/MDT server:
  1. Determine the device number for the MDT's OSC corresponding to the OST to be re-activated (a device is identified by its endpoints, e.g. lustre-OSTnnnn-osc and lustre-mdtlov_UUID):
 
Changed:
<
<
$ sh test-create.sh Workspace Factory Service: https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "sample-workspace.xml" Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"... done.

Workspace created: id 6 eth0

Association
private IP address: 172.21.1.200
Hostname
canfardev00 Gateway: 172.21.1.1

Start time: Mon Jul 20 13:53:39 PDT 2009

Duration
30 minutes. Shutdown time: Mon Jul 20 14:23:39 PDT 2009 Termination time: Mon Jul 20 14:33:39 PDT 2009

Wrote EPR to "test.epr"

Waiting for updates.

"http://example1/localhost/image" state change: Unstaged --> Propagated "http://example1/localhost/image" state change: Propagated --> Running

>
>
[crlb@elephant01 ~]$ lctl dl | grep osc
  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
  8 IN osc lustre-OST0003-osc lustre-mdtlov_UUID 5
 11 UP osc lustre-OST0000-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
 12 UP osc lustre-OST0001-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
 13 UP osc lustre-OST0002-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
 14 UP osc lustre-OST0003-osc-ffff8803bd5ab400 a91b4601-8f1d-5061-2175-7ac02693cc0f 5
[crlb@elephant01 ~]$
 
Changed:
<
<
Oh it worked! Neat. Now let's kill the VM.
>
>
  1. To Re-activate OST0003 from the above list issue:
 
Changed:
<
<
$ workspace --destroy -e test.epr

Destroying workspace 6 @ "https://204.174.103.121:8443/wsrf/services/WorkspaceService"... destroyed.

>
>
[crlb@elephant01 ~]$ sudo lctl --device 8 activate
[sudo] password for crlb:
[crlb@elephant01 ~]$
 
Changed:
<
<
Great! Now we're done! Other things to do now are add machines to the list of vmm-pools and network-pools.

Troubleshooting

If you encounter dhcp problems. Check /etc/dhcpd.conf on the worker nodes and make sure you are listening on the correct subnet(s).

If you encounter an ebtables problem. You can try a patched version of ebtables. See This page for details.

-- PatrickArmstrong - 16 Jul 2009

-- PatrickArmstrong - 2009-09-04

and the backend tools to /opt/nimbus. Both of these directories are owned by nimbus, and need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. This install uses a custom build of Nimbus that has features and fixes that are not in a release at this time. In the future, this install should be done from the latest release on the Nimbus website.

>
>
The "lctl dl | grep osc" command can be used to check the change in status.

Revision 18 (2010-04-15) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11
Changed:
<
<

Nimbus install on Elephant

>
>

Nimbus 2.3 install on Elephant

Original document written by PatrickArmstrong. Modified for initial implementation of the split configuration on the elephant cluster. In this configuration, there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.

Line: 15 to 15
 
  1. Test Web Services container.
  2. Automate Web Services start/stop.
  3. Install Nimbus.
Changed:
<
<
  1. Worker Node Setup
>
>
  1. Install Nimbus Workspace Control Agents
 
  1. Worker Node Setup.
  2. Auto-configure and Test Nimbus.

Revision 17 (2010-02-25) - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11
Changed:
<
<

Nimbus Install on Elephant Cluster

>
>

Nimbus install on Elephant

Original document written by PatrickArmstrong. Modified for initial implementation of the split configuration on the elephant cluster. In this configuration, there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.

Changed:
<
<

Overview

>
>

Overview

 
  1. Create privileged user "nimbus" on all cluster nodes.
  2. Switch to user "nimbus" and create public/private keys.
  3. Download all required packages.
Line: 16 to 16
 
  1. Automate Web Services start/stop.
  2. Install Nimbus.
  3. Worker Node Setup
Changed:
<
<
    1. Verify password-less access to worker nodes:
    2. Ensure Xen is installed and running:
    3. If Xen is not installed: Setting
    4. _Xen configuration change for temporary Nimbus networking bug workaround
    5. Install ebtables.
    6. Install the back-end tools (Nimbus control agents) in /opt/nimbus:
    7. Install and configure DHCP.
>
>
  1. Worker Node Setup.
  2. Auto-configure and Test Nimbus.
 

Step 1: Create privileged user "nimbus" on all cluster nodes.

Line: 330 to 325
[nimbus@elephant10 ~]$ ssh e11
Last login: Tue Feb 16 13:54:41 2010 from elephant01.admin
[nimbus@elephant11 ~]$ exit
Changed:
<
<
[nimbus@elephant10 ~]$
>
>
[nimbus@elephant10 ~]$1
 
Line: 350 to 345
 
Changed:
<
<

Xen configuration change for temporary Nimbus networking bug workaround

[nimbus@elephant10 ~]$ sudo vi /etc/xen/xend-config.sxp

Find the line:

#(xend-tcp-xmlrpc-server no)

and change it to::

(xend-http-server yes)

Save the change and restart the Xen daemon:

[nimbus@elephant10 ~]$ sudo /etc/init.d/xend restart

End of workaround

>
>
Apply Nimbus networking bug workaround
[nimbus@elephant10 ~]$ sudo vi /etc/xen/xend-config.sxp
Find the line:
#(xend-tcp-xmlrpc-server no)
and change it to:
(xend-http-server yes)
Save the change and restart the Xen daemon:
[nimbus@elephant10 ~]$ sudo /etc/init.d/xend restart
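If the same workaround has to be applied to several workers, the edit can also be scripted; a possible one-liner (back up the file first; the exact commented line may differ between Xen releases):

[nimbus@elephant10 ~]$ sudo cp /etc/xen/xend-config.sxp /etc/xen/xend-config.sxp.bak
[nimbus@elephant10 ~]$ sudo sed -i 's/^#(xend-tcp-xmlrpc-server no)/(xend-http-server yes)/' /etc/xen/xend-config.sxp
[nimbus@elephant10 ~]$ sudo /etc/init.d/xend restart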
  Install ebtables. For the x86_64 kernels on elephant:
Line: 381 to 366
 

Mount Nimbus Control Agents directory.

Added:
>
>
[nimbus@elephant10 tmp]$ sudo mkdir -p /opt/versions
[nimbus@elephant10 tmp]$ sudo echo 'elephant11.admin:/opt/versions  /opt/versions nfs defaults 0 0' >>/etc/fstab
[nimbus@elephant10 tmp]$ sudo mount -a
[nimbus@elephant10 tmp]$ sudo ln -s /opt/versions/nimbus-2.3-a4b265d /opt/nimbus
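Note that the "sudo echo ... >>/etc/fstab" line above only works from a root shell, because the redirection is performed by the calling (non-root) shell rather than by sudo; an equivalent that works directly from the nimbus account is:

[nimbus@elephant10 tmp]$ echo 'elephant11.admin:/opt/versions  /opt/versions nfs defaults 0 0' | sudo tee -a /etc/fstab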
  Install and configure DHCP.
Changed:
<
<
[nimbus@elephant11 nimbus]$ sudo yum install dhcp [nimbus@elephant11 nimbus]$ sudo cp ./share/workspace-control/dhcpd.conf.example /etc/dhcpd.conf [nimbus@elephant11 nimbus]$ sudo vi /etc/dhcpd.conf
>
>
[nimbus@elephant11 tmp]$ sudo yum install dhcp
[nimbus@elephant11 tmp]$ sudo cp /opt/nimbus/share/workspace-control/dhcpd.conf.example /etc/dhcpd.conf
[nimbus@elephant11 tmp]$ sudo vi /etc/dhcpd.conf

Modify the /etc/libvirt/libvirtd.conf by setting the following values:

unix_sock_group = "nimbus"
unix_sock_rw_perms = "0777"
 
Changed:
<
<
defining appropriate subnet, e.g.:
>
>
Restart libvirt:
 
Changed:
<
<
subnet 10.200.200.0 netmask 255.255.255.0 { }
>
>
[nimbus@elephant11 tmp]$ sudo /etc/init.d/libvirtd restart
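As a quick check that the new socket permissions are in effect, the nimbus user should be able to query libvirt directly (assuming the Xen driver); this should list Domain-0 without a permission error:

[nimbus@elephant11 tmp]$ virsh -c xen:/// list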
 
Changed:
<
<
Configure libvert:
>
>

Step 12: Auto-configure and Test Nimbus.

 
Changed:
<
<

Adding Node to Nimbus Config

>
>
Add a Node to the Nimbus configuration
  This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
Changed:
<
<
gildor 3072 guilin 3072
>
>
e10 3072
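Each line in the pool file is a VMM host name followed by the memory, in MB, that Nimbus may hand out to VMs on that node; a pool spanning several workers might look like this (hypothetical hosts and sizes):

#Some comments up here
e04 3072
e05 3072
e10 3072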
 
Changed:
<
<

Step 11: Auto-configure and Test Nimbus.Worker Node Setup

>
>
The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are, though, you will just answer yes to all of them.

Revision 162010-02-25 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 380 to 380
 [nimbus@elephant10 tmp]$ rm ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
Added:
>
>
Mount Nimbus Control Agents directory.
  Install and configure DHCP.

Revision 152010-02-24 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 6 to 6
 

Overview

Changed:
<
<
  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
  4. Switch to Nimbus head node and install prerequisites
  5. Install Globus ws-core Web Services container.
  6. Install required X509 grid certificates.
  7. Test the Web Services container.
  8. Automate the Web Services start/stop.
  9. Install Nimbus.
  10. Setting Up Worker Nodes.
    • Xen configuration change for temporary Nimbus networking bug workaround
>
>
  1. Create privileged user "nimbus" on all cluster nodes.
  2. Switch to user "nimbus" and create public/private keys.
  3. Download all required packages.
  4. Install head node prerequisites.
  5. Install Globus ws-core Web Services container.
  6. Install X509 grid certificates.
  7. Test Web Services container.
  8. Automate Web Services start/stop.
  9. Install Nimbus.
  10. Worker Node Setup
    1. Verify password-less access to worker nodes:
    2. Ensure Xen is installed and running:
    3. If Xen is not installed: Setting
    4. _Xen configuration change for temporary Nimbus networking bug workaround
    5. Install ebtables.
    6. Install the back-end tools (Nimbus control agents) in /opt/nimbus:
    7. Install and configure DHCP.
 
Changed:
<
<

Step 1: Create privileged "nimbus" user on all cluster nodes.

>
>

Step 1: Create privileged user "nimbus" on all cluster nodes.

  Create the nimbus account on elephant head node (elephant01) with the required sudo privileges.
Line: 50 to 56
 

Changed:
<
<

Step 2: Switch to nimbus user for the remainder of the installation and create public/private keys.

>
>

Step 2: Switch to user "nimbus" and create public/private keys.

 
[crlb@elephant01 ~]$ sudo su - nimbus
Password: ********
Line: 82 to 88
 

Changed:
<
<

Step 4: Switch to Nimbus head node and install prerequisites.

>
>

Step 4: Install head node prerequisites.

 
[nimbus@elephant01 ~]$ ssh e11
The authenticity of host 'e11 (10.200.200.11)' can't be established.
Line: 104 to 110
 [nimbus@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2
Deleted:
<
<
Create a home for nimbus/globus ws-core
[nimbus@elephant11 local]$ sudo mkdir nimbus-2.3
[nimbus@elephant11 local]$ sudo chown nimbus.nimbus nimbus-2.3
[nimbus@elephant11 local]$ sudo ln -s nimbus-2.3 nimbus

and a home for nimbus worker node control software

[nimbus@elephant11 local]$ cd /opt
[nimbus@elephant11 opt]$ sudo mkdir nimbus-2.3
[nimbus@elephant11 opt]$ sudo chown nimbus.nimbus nimbus-2.3
[nimbus@elephant11 opt]$ sudo ln -s nimbus-2.3 nimbus
 

Step 5: Install Globus ws-core Web Services container.

Line: 123 to 113
 

Step 5: Install Globus ws-core Web Services container.

Deleted:
<
<
 The frontend tools (globus ws-core and nimbus) will be Installed to /usr/local/nimbus:
Changed:
<
<
[nimbus@elephant11 ~]$ cd /usr/local/nimbus
>
>
[nimbus@elephant11 local]$ sudo mkdir -p versions/nimbus-2.3 [nimbus@elephant11 local]$ sudo chown nimbus.nimbus versions/nimbus-2.3 [nimbus@elephant11 local]$ sudo ln -s versions/nimbus-2.3 nimbus [nimbus@elephant11 local]$ cd nimbus
 [nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz [nimbus@elephant11 nimbus]$ mv ws-core-4.0.8/* . [nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8
Line: 149 to 140
 

Changed:
<
<

Step 6: Install required X509 grid certificates.

>
>

Step 6: Install X509 grid certificates.

  Make our certificates directory and put the grid canada root certificates in there.
Line: 208 to 199
 

Changed:
<
<

Step 7: Test the Web Services container.

>
>

Step 7: Test Web Services container.

  Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:
Line: 234 to 225
 

Changed:
<
<

Step 8: Automate the Web Services start/stop.

>
>

Step 8: Automate Web Services start/stop.

  Now that we know our container works, we can create a script to run our container at login. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:
#!/bin/sh
Line: 299 to 290
 [nimbus@elephant11 tmp]$ /tmp/nimbus-2.3/bin/all-build-and-install.sh [nimbus@elephant11 tmp]$ rm -rf /tmp/nimbus-2.3/ [nimbus@elephant11 tmp]$ rmdir hsperfdata_nimbus
Deleted:
<
<
[nimbus@elephant11 tmp]$ cd [nimbus@elephant11 ~]$
 

Changed:
<
<

Step 10: Setting Up Worker Nodes

>
>

Step 10: Install Nimbus Workspace Control Agents

The back-end tools (Nimbus control agents) will be installed in /opt/nimbus:

[nimbus@elephant11 tmp]$ cd /opt
[nimbus@elephant11 opt]$ sudo mkdir -p versions/nimbus-2.3
[nimbus@elephant11 opt]$ sudo chown nimbus.nimbus versions/nimbus-2.3
[nimbus@elephant11 opt]$ sudo ln -s versions/nimbus-2.3 nimbus
[nimbus@elephant11 opt]$ cd nimbus
[nimbus@elephant11 nimbus]$ tar -xzvf ~/Downloads/nimbus-2.3/nimbus-controls-2.3.tar.gz
[nimbus@elephant11 nimbus]$ mv ./nimbus-controls-2.3/workspace-control/* .

Export the Nimbus control agent directory to the worker nodes by adding the following to /etc/exports:

/opt/versions                   elephant*.admin(rw,no_root_squash)

Then issue:

[nimbus@elephant11 nimbus]$ sudo exportfs -a
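To confirm the export is visible before mounting it on the workers:

[nimbus@elephant11 nimbus]$ sudo exportfs -v
[nimbus@elephant11 nimbus]$ showmount -e elephant11.admin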

Step 11: Worker Node Setup

  Verify password-less access to worker nodes:
Line: 316 to 329
Last login: Tue Feb 16 13:55:22 2010 from elephant11.admin
[nimbus@elephant10 ~]$ ssh e11
Last login: Tue Feb 16 13:54:41 2010 from elephant01.admin
Changed:
<
<
[nimbus@elephant11 ~]$ nimbus@canfardev $
>
>
[nimbus@elephant11 ~]$ exit
[nimbus@elephant10 ~]$
 

Ensure Xen is installed and running:

Changed:
<
<
[nimbus@elephant11 ~]$ sudo xm list
>
>
[nimbus@elephant10 ~]$ sudo xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    23690    16 r-----   1410.6
Changed:
<
<
[nimbus@elephant11 ~]$
>
>
[nimbus@elephant10 ~]$
 

If Xen is not installed:

Changed:
<
<
[nimbus@elephant11 ~]$ sudo yum install xen kernel-xen [nimbus@elephant11 ~]$ sudo chkconfig xend on [nimbus@elephant11 ~]$ sudo shutdown -r now
>
>
[nimbus@elephant10 ~]$ sudo yum install xen kernel-xen
[nimbus@elephant10 ~]$ sudo chkconfig xend on
[nimbus@elephant10 ~]$ sudo shutdown -r now
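After the reboot, confirm the node came back up on a Xen kernel (the exact version string will differ) and that the Xen daemon is running:

[nimbus@elephant10 ~]$ uname -r
2.6.18-128.1.1.el5xen
[nimbus@elephant10 ~]$ sudo xm list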
 

Xen configuration change for temporary Nimbus networking bug workaround

Changed:
<
<
[nimbus@elephant11 ~]$ sudo vi /etc/xen/xend-config.sxp
>
>
[nimbus@elephant10 ~]$ sudo vi /etc/xen/xend-config.sxp
 

Find the line:

Line: 354 to 367
  Save the change and restart the Xen daemon:
Changed:
<
<
[nimbus@elephant11 ~]$ sudo /etc/init.d/xend restart
>
>
[nimbus@elephant10 ~]$ sudo /etc/init.d/xend restart
 

End of workaround

Install ebtables. For the x86_64 kernels on elephant:

Changed:
<
<
[nimbus@elephant11 ~]$ cd /tmp [nimbus@elephant11 tmp]$ wget --no-check-certificate https://wiki.gridx1.ca/twiki/pub/HEPrc/GVWandEBTables/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm [nimbus@elephant11 tmp]$ sudo rpm -ivhf ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm [nimbus@elephant11 tmp]$ rm ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm

Install dhcp.

[nimbus@elephant11 tmp]$ sudo yum install dhcp
>
>
[nimbus@elephant10 ~]$ cd /tmp
[nimbus@elephant10 tmp]$ wget --no-check-certificate https://wiki.gridx1.ca/twiki/pub/HEPrc/GVWandEBTables/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
[nimbus@elephant10 tmp]$ sudo rpm -ivhf ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
[nimbus@elephant10 tmp]$ rm ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
 
Deleted:
<
<
Install the back-end tools (Nimbus control agents) in /opt/nimbus:
[nimbus@elephant11 tmp]$ cd /opt/nimbus
[nimbus@elephant11 nimbus]$ tar -xzvf ~/Downloads/nimbus-2.3/nimbus-controls-2.3.tar.gz
[nimbus@elephant11 nimbus]$ mv ./nimbus-controls-2.3/workspace-control/* .
 
Changed:
<
<
Configure DHCP:
>
>
Install and configure DHCP.
 
Added:
>
>
[nimbus@elephant11 nimbus]$ sudo yum install dhcp
 [nimbus@elephant11 nimbus]$ sudo cp ./share/workspace-control/dhcpd.conf.example /etc/dhcpd.conf [nimbus@elephant11 nimbus]$ sudo vi /etc/dhcpd.conf
Changed:
<
<
defining one subnet as:
>
>
defining appropriate subnet, e.g.:
 
subnet 10.200.200.0 netmask 255.255.255.0 {
}
Changed:
<
<
The installer will ask you a bunch of questions. Answer them out to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are though, you will just answer yes to all of them.
>
>
Configure libvert:
 

Adding Node to Nimbus Config

Line: 405 to 407
 guilin 3072
Added:
>
>

Step 11: Auto-configure and Test Nimbus.Worker Node Setup

The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are, though, you will just answer yes to all of them.
 Your worker node should now be ready!

Now run the auto-configuration program. Following is a transcript of running this program on canfardev:

Revision 142010-02-24 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 362 to 362
 Install ebtables. For the x86_64 kernels on elephant:
[nimbus@elephant11 ~]$ cd /tmp
Changed:
<
<
[nimbus@elephant11 tmp]$ wget https://wiki.gridx1.ca/twiki/pub/HEPrc/GVWandEBTables/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm [nimbus@elephant11 tmp]$ rpm -ivhf ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
>
>
[nimbus@elephant11 tmp]$ wget --no-check-certificate https://wiki.gridx1.ca/twiki/pub/HEPrc/GVWandEBTables/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm [nimbus@elephant11 tmp]$ sudo rpm -ivhf ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
 [nimbus@elephant11 tmp]$ rm ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm

Revision 132010-02-22 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Original document written by PatrickArmstrong. Modified for the initial implementation of the split configuration on the elephant cluster. In this configuration, there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.
Added:
>
>
 

Overview

Changed:
<
<
  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
  4. Switch to Nimbus head node and install prerequisites
  5. Install Globus ws-core Web Services container.
  6. Install required X509 grid certificates.
  7. Test the Web Services container.
  8. Automate the Web Services start/stop.
  9. Install Nimbus.
  10. Setting Up Worker Nodes.
>
>
  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
  4. Switch to Nimbus head node and install prerequisites
  5. Install Globus ws-core Web Services container.
  6. Install required X509 grid certificates.
  7. Test the Web Services container.
  8. Automate the Web Services start/stop.
  9. Install Nimbus.
  10. Setting Up Worker Nodes.
 
    • Xen configuration change for temporary Nimbus networking bug workaround
Changed:
<
<

Step 1: Create privileged "nimbus" user on all cluster nodes.

>
>

Step 1: Create privileged "nimbus" user on all cluster nodes.

  Create the nimbus account on elephant head node (elephant01) with the required sudo privileges.
Line: 47 to 49
 [crlb@elephant01 ~]$ sudo /usr/local/sbin/usync
Changed:
<
<

Step 2: Switch to nimbus user for the remainder of the installation and create public/private keys.

>
>

Step 2: Switch to nimbus user for the remainder of the installation and create public/private keys.

 
[crlb@elephant01 ~]$ sudo su - nimbus
Password: ********
Line: 66 to 69
 [nimbus@elephant01 ~]$
Changed:
<
<

Step 3: Download all required packages.

>
>

Step 3: Download all required packages.

 
[nimbus@elephant01 ~]$ mkdir -p Downloads/nimbus-2.3
[nimbus@elephant01 ~]$ cd Downloads/nimbus-2.3
Line: 77 to 81
 [nimbus@elephant01 nimbus-2.3]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
Changed:
<
<

Step 4: Switch to Nimbus head node and install prerequisites.

>
>

Step 4: Switch to Nimbus head node and install prerequisites.

 
[nimbus@elephant01 ~]$ ssh e11
The authenticity of host 'e11 (10.200.200.11)' can't be established.
Line: 114 to 119
 [nimbus@elephant11 opt]$ sudo ln -s nimbus-2.3 nimbus
Changed:
<
<

Step 5: Install Globus ws-core Web Services container.

>
>

Step 5: Install Globus ws-core Web Services container.

 
Line: 141 to 148
 [nimbus@elephant11 nimbus]$ touch $GLOBUS_LOCATION/share/grid-mapfile
Changed:
<
<

Step 6: Install required X509 grid certificates.

>
>

Step 6: Install required X509 grid certificates.

  Make our certificates directory and put the grid canada root certificates in there.
Line: 199 to 207
 
Changed:
<
<

Step 7: Test the Web Services container.

>
>

Step 7: Test the Web Services container.

  Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:
Line: 224 to 233
 and put the bffbd7d0.0 file into the X509_CERT_DIR and try starting the container again.
Changed:
<
<

Step 8: Automate the Web Services start/stop.

>
>

Step 8: Automate the Web Services start/stop.

  Now that we know our container works, we can create a script to run our container at login. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:
#!/bin/sh
Line: 279 to 289
 [nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop stop
Changed:
<
<

Step 9: Install Nimbus

>
>

Step 9: Install Nimbus

  Unpack the nimbus package into a temporary directory and run the install script. When the install script completes, Nimbus will have been installed in $GLOBUS_LOCATION, and the temporary files can be removed.
Line: 292 to 303
 [nimbus@elephant11 ~]$
Changed:
<
<

Step 10: Setting Up Worker Nodes

>
>

Step 10: Setting Up Worker Nodes

  Verify password-less access to worker nodes:

Revision 122010-02-19 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 14 to 14
 
  1. Test the Web Services container.
  2. Automate the Web Services start/stop.
  3. Install Nimbus.
Deleted:
<
<
  1. Xen configuration change for temporary Nimbus networking bug workaround
 
  1. Setting Up Worker Nodes.
Added:
>
>
    • Xen configuration change for temporary Nimbus networking bug workaround
 

Step 1: Create privileged "nimbus" user on all cluster nodes.

Line: 292 to 292
 [nimbus@elephant11 ~]$
Changed:
<
<

Step 11: Setting Up Worker Nodes

>
>

Step 10: Setting Up Worker Nodes

  Verify password-less access to worker nodes:
Line: 347 to 347
 

End of workaround

Changed:
<
<
Install ebtables and dhcp. Do this by first enabling the DAG repository and installing with yum:
>
>
Install ebtables. For the x86_64 kernels on elephant:
 
Changed:
<
<
[nimbus@elephant11 ~]$ sudo rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-.3.6-1.el5.rf.i386.rpm [nimbus@elephant11 ~]$ sudo yum install ebtables dhcp
>
>
[nimbus@elephant11 ~]$ cd /tmp [nimbus@elephant11 tmp]$ wget https://wiki.gridx1.ca/twiki/pub/HEPrc/GVWandEBTables/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm [nimbus@elephant11 tmp]$ rpm -ivhf ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm [nimbus@elephant11 tmp]$ rm ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm

Install dhcp.

[nimbus@elephant11 tmp]$ sudo yum install dhcp
 

Install the back-end tools (Nimbus control agents) in /opt/nimbus:

Changed:
<
<
[nimbus@elephant11 ~]$ cd /opt/nimbus
>
>
[nimbus@elephant11 tmp]$ cd /opt/nimbus
 [nimbus@elephant11 nimbus]$ tar -xzvf ~/Downloads/nimbus-2.3/nimbus-controls-2.3.tar.gz [nimbus@elephant11 nimbus]$ mv ./nimbus-controls-2.3/workspace-control/* .

Revision 112010-02-17 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 292 to 292
 [nimbus@elephant11 ~]$
Deleted:
<
<

Step 10 Xen configuration change for temporary Nimbus networking bug workaround

[nimbus@elephant11 ~]$ sudo vi /etc/xen/xend-config.sxp

Find the line:

#(xend-tcp-xmlrpc-server no)

and change it to::

(xend-http-server yes)

Save the change and restart the Xen daemon:

[nimbus@elephant11 ~]$ sudo /etc/init.d/xend restart
 

Step 11: Setting Up Worker Nodes

Verify password-less access to worker nodes:

Line: 345 to 324
 [nimbus@elephant11 ~]$ sudo shutdown -r now
Added:
>
>

Xen configuration change for temporary Nimbus networking bug workaround

[nimbus@elephant11 ~]$ sudo vi /etc/xen/xend-config.sxp

Find the line:

#(xend-tcp-xmlrpc-server no)

and change it to::

(xend-http-server yes)

Save the change and restart the Xen daemon:

[nimbus@elephant11 ~]$ sudo /etc/init.d/xend restart

End of workaround

 Install ebtables and dhcp. Do this by first enabling the DAG repository and installing with yum:
[nimbus@elephant11 ~]$ sudo rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-.3.6-1.el5.rf.i386.rpm

Revision 102010-02-17 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 14 to 14
 
  1. Test the Web Services container.
  2. Automate the Web Services start/stop.
  3. Install Nimbus.
Added:
>
>
  1. Xen configuration change for temporary Nimbus networking bug workaround
 
  1. Setting Up Worker Nodes.

Step 1: Create privileged "nimbus" user on all cluster nodes.

Line: 291 to 292
 [nimbus@elephant11 ~]$
Changed:
<
<

Step 10: Setting Up Worker Nodes

>
>

Step 10 Xen configuration change for temporary Nimbus networking bug workaround

[nimbus@elephant11 ~]$ sudo vi /etc/xen/xend-config.sxp

Find the line:

#(xend-tcp-xmlrpc-server no)

and change it to::

(xend-http-server yes)

Save the change and restart the Xen daemon:

[nimbus@elephant11 ~]$ sudo /etc/init.d/xend restart

Step 11: Setting Up Worker Nodes

  Verify password-less access to worker nodes:

Revision 92010-02-17 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 123 to 123
 [nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz [nimbus@elephant11 nimbus]$ mv ws-core-4.0.8/* . [nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8
Added:
>
>
[nimbus@elephant11 nimbus]$ mkdir var
 

Set environment variables. Example assumes bash as the nimbus user's shell:

Revision 82010-02-17 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 14 to 14
 
  1. Test the Web Services container.
  2. Automate the Web Services start/stop.
  3. Install Nimbus.
Added:
>
>
  1. Setting Up Worker Nodes.
 

Step 1: Create privileged "nimbus" user on all cluster nodes.

Revision 72010-02-16 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 9 to 9
 
  1. Switch to nimbus user for the remainder of the installation and create public/private keys.
  2. Download all required packages.
  3. Switch to Nimbus head node and install prerequisites
Changed:
<
<
  1. Install Globus ws-core Web Services container and required X509 certificates.
>
>
  1. Install Globus ws-core Web Services container.
  2. Install required X509 grid certificates.
 
  1. Test the Web Services container.
Added:
>
>
  1. Automate the Web Services start/stop.
  2. Install Nimbus.
 

Step 1: Create privileged "nimbus" user on all cluster nodes.

Line: 57 to 60
 The key fingerprint is: e4:43:60:84:0c:ea:dc:02:dd:4b:93:fd:f4:e4:38:e8 nimbus@elephant01.heprc.uvic.ca [nimbus@elephant01 ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys
Added:
>
>
[nimbus@elephant01 ~]$ chmod 600 .ssh/authorized_keys
 [nimbus@elephant01 ~]$
Line: 108 to 112
 [nimbus@elephant11 opt]$ sudo ln -s nimbus-2.3 nimbus
Changed:
<
<

Step 5: Install Globus ws-core Web Services container and required X509 certificates.

>
>

Step 5: Install Globus ws-core Web Services container.

 
Changed:
<
<
The remainder of this procedure is done entirely as the nimbus user. The frontend tools will be installed to /usr/local/nimbus, and the backend tools to /opt/nimbus. Both of these directories are owned by nimbus, and need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. This install uses a custom build of Nimbus that has features and fixes that are not in a release at this time. In the future, this install should be done from the latest release on the Nimbus website.

This install will install the minimum set of utilities to run Nimbus. Nimbus needs to run in a Globus Container to work, and we will install the bare essentials of the Globus container, and will use a cert from Grid Canada. If you need the full set of globus utilities, please refer to the instructions on the GridX1 Wiki, and skip "Installing the Webservice Core" on this page.

>
>
The frontend tools (globus ws-core and nimbus) will be Installed to /usr/local/nimbus:
 
[nimbus@elephant11 ~]$ cd /usr/local/nimbus
[nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz
Line: 124 to 124
 [nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8
Changed:
<
<
Create an empty grid-mapfile. This file will contain the certificate subjects of the users of your cloud-enabled cluster.
[nimbus@elephant11 nimbus]$  touch /usr/local/nimbus/share/grid-mapfile

Now set our environment variables. Example assumes bash as the nimbus user's shell:

>
>
Set environment variables. Example assumes bash as the nimbus user's shell:
 
[nimbus@elephant11 nimbus]$ cd
[nimbus@elephant11 ~]$  echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> .bashrc
Line: 138 to 133
 [nimbus@elephant11 ~]$ . .bashrc
Changed:
<
<

Certificates

>
>
Create an empty grid-mapfile to contain the certificate subjects of the users of the cloud-enabled cluster.
[nimbus@elephant11 nimbus]$  touch $GLOBUS_LOCATION/share/grid-mapfile
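Entries are added to this file as users are authorized; each line maps a certificate DN to a local account, for example (hypothetical DN):

"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Jane Doe" nimbus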

Step 6: Install required X509 grid certificates.

 
Changed:
<
<
Now we can set up the certificates. We're going to put them in our $X509_CERT_DIR . First, we make our certificates directory and put the grid canada root certificates in there.
>
>
Make our certificates directory and put the grid canada root certificates in there.
 
[nimbus@elephant11 ~]$ mkdir -p $X509_CERT_DIR
Changed:
<
<
[nimbus@elephant11 ~]$ cd $X509_CERT_DIR [nimbus@elephant11 ~]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
>
>
[nimbus@elephant11 ~]$ mv Downloads/nimbus-2.3/bffbd7d0.0 $X509_CERT_DIR/
 

Then create a host certificate request to send to our CA.

Line: 169 to 168
  The private key is stored in /usr/local/nimbus/share/certificates/hostkey.pem The request is stored in /usr/local/nimbus/share/certificates/hostcert_request.pem
Deleted:
<
<
 
Changed:
<
<
Now mail this request file (/usr/local/nimbus/share/certificates/hostcert_request.pem) to ca@gridcanada.ca . It might take a day or so before you get your certificate back.

Once you have your key, paste the contents into /usr/local/nimbus/share/certificates/hostcert.pem

>
>
Now mail this request file ($X509_CERT_DIR/hostcert_request.pem) to ca@gridcanada.ca . It might take a day or so before you get your certificate back.
 
Changed:
<
<
Now that we have our certificate, we have to point our container to our key and certificate and to our empty grid-mapfile. To do so, edit $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml to point to your new certificates and modify the gridmap value:
>
>
Once you have your key, paste the contents into $X509_CERT_DIR/hostcert.pem
 
Added:
>
>
Now that we have our certificate, we have to point our container to our key and certificate and to our empty grid-mapfile. To do so, edit $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml to point to your new certificates and modify the gridmap value:
 
[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml 
<?xml version="1.0" encoding="UTF-8"?>
Line: 191 to 187
 
Changed:
<
<
Now we'll activate our security configuration by adding a element under the CONTAINER_SECURITY_DESCRIPTOR:
>
>
Activate the security configuration by adding an element under the CONTAINER_SECURITY_DESCRIPTOR:
 
[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
<!-- @CONTAINER_SECURITY_DESCRIPTOR@ -->
Line: 200 to 196
 
Changed:
<
<

Step 6: Test the Web Services container.

Testing our Container

>
>

Step 7: Test the Web Services container.

  Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:
Deleted:
<
<
 
[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-container
Starting SOAP server at: https://204.174.103.121:8443/wsrf/services/ 
Line: 226 to 220
 You are probably missing the Grid Canada .0 file(bffbd7d0.0 in this case). Either copy the file from another globus machine's X509_CERT_DIR or download the GC CA Bundle from the GC Certificate Authority website. and put the bffbd7d0.0 file into the X509_CERT_DIR and try starting the container again.
Deleted:
<
<

Automate Startup of Container

 
Changed:
<
<
Now that we know our container works, we can create a script to run our container at login. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:
>
>

Step 8: Automate the Web Services start/stop.

 
Added:
>
>
Now that we know our container works, we can create a script to run our container at login. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:
 
#!/bin/sh
set -e
export GLOBUS_OPTIONS="-Xms256M -Xmx1024M -Dorg.globus.tcp.port.range=50000,51999"
Line: 265 to 259
 
Changed:
<
<
Then mark it as executable:

[nimbus@elephant11 ~]$ chmod 744 $GLOBUS_LOCATION/bin/globus-start-stop
>
>
Mark it as executable:
[nimbus@elephant11 ~]
$ chmod 744 $GLOBUS_LOCATION/bin/globus-start-stop
 

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

Line: 271 to 265
 

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

Deleted:
<
<
 
[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop start
Changed:
<
<
$ netstat -an | grep 8443
>
>
[nimbus@elephant11 ~]$ netstat -an | grep 8443
 tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN
Line: 284 to 276
 [nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop stop
Added:
>
>

Step 9: Install Nimbus

 
Changed:
<
<

Install Nimbus

Unpack the nimbus package and run the install script.
>
>
Unpack the nimbus package into a temporary directory and run the install script. When the install script completes, Nimbus will have been installed in $GLOBUS_LOCATION, and the temporary files can be removed.
 
[nimbus@elephant11 ~]$ cd /tmp
[nimbus@elephant11 tmp]$ tar -xvf ~/Downloads/nimbus-2.3.tar.gz
[nimbus@elephant11 tmp]$ /tmp/nimbus-2.3/bin/all-build-and-install.sh
[nimbus@elephant11 tmp]$ rm -rf /tmp/nimbus-2.3/
[nimbus@elephant11 tmp]$ rmdir hsperfdata_nimbus
Added:
>
>
[nimbus@elephant11 tmp]$ cd [nimbus@elephant11 ~]$
 
Added:
>
>

Step 10: Setting Up Worker Nodes

 
Changed:
<
<

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa. This is for sending commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
>
>
Verify password-less access to worker nodes:
[nimbus@elephant11 ~]$ ssh e10
The authenticity of host 'e10 (10.200.200.10)' can't be established.
RSA key fingerprint is 81:ba:c1:49:1f:a5:22:30:60:a6:b8:ba:19:0b:38:2c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'e10,10.200.200.10' (RSA) to the list of known hosts.
Last login: Tue Feb 16 13:55:22 2010 from elephant11.admin
[nimbus@elephant10 ~]$ ssh e11
Last login: Tue Feb 16 13:54:41 2010 from elephant01.admin
[nimbus@elephant11 ~]$ 
 nimbus@canfardev $
Deleted:
<
<
Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

 
Added:
>
>
Ensure Xen is installed and running:
 
Changed:
<
<
# which xm /usr/sbin/xm # uname -r 2.6.18-128.1.1.el5xen $ ps aux | grep xen root 21 0.0 0.0 0 0 ? S< 16:34 0:00 [xenwatch] root 22 0.0 0.0 0 0 ? S< 16:34 0:00 [xenbus] root 2549 0.0 0.0 2188 956 ? S 16:35 0:00 xenstored --pid-file /var/run/xenstore.pid root 2554 0.0 0.1 12176 3924 ? S 16:35 0:00 python /usr/sbin/xend start root 2555 0.0 0.1 63484 4836 ? Sl 16:35 0:00 python /usr/sbin/xend start root 2557 0.0 0.0 12212 364 ? Sl 16:35 0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console
>
>
[nimbus@elephant11 ~]$ sudo xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0    23690    16 r-----   1410.6
[nimbus@elephant11 ~]$
 
Changed:
<
<
If it's not installed, you can do so with:
>
>
If Xen is not installed:
 
Changed:
<
<
# yum install xen kernel-xen # chkconfig xend on
>
>
[nimbus@elephant11 ~]$ sudo yum install xen kernel-xen [nimbus@elephant11 ~]$ sudo chkconfig xend on [nimbus@elephant11 ~]$ sudo shutdown -r now
 
Changed:
<
<
Then reboot.

You'll also need to install ebtables (not currently used) and dhcp. Do this by first enabling the DAG repository, then installing with yum:

>
>
Install ebtables and dhcp. Do this by first enabling the DAG repository and installing with yum:
 
Changed:
<
<
# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm # yum install ebtables dhcp
>
>
[nimbus@elephant11 ~]$ sudo rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-.3.6-1.el5.rf.i386.rpm [nimbus@elephant11 ~]$ sudo yum install ebtables dhcp
 
Changed:
<
<
Now, edit the dhcpd config file. Make sure it looks something like this:

# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces


#################
## GLOBAL OPTS ##
#################

# Option definitions common or default to all supported networks

# Keep this:
ddns-update-style none;

# Can be overriden in host entry:
default-lease-time 120;
max-lease-time 240;


#############
## SUBNETS ##
#############

# Make an entry like this for each supported subnet.  Otherwise, the DHCP
# daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 {
}

### DO NOT EDIT BELOW, the following entries are added and 
### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###

>
>
Install the back-end tools (Nimbus control agents) in /opt/nimbus:
[nimbus@elephant11 ~]$ cd /opt/nimbus
[nimbus@elephant11 nimbus]$ tar -xzvf ~/Downloads/nimbus-2.3/nimbus-controls-2.3.tar.gz
[nimbus@elephant11 nimbus]$ mv ./nimbus-controls-2.3/workspace-control/* .
 
Changed:
<
<

Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

>
>
Configure DHCP:
 
Changed:
<
<
# ls /opt/nimbus /opt/nimbus
>
>
[nimbus@elephant11 nimbus]$ sudo cp ./share/workspace-control/dhcpd.conf.example /etc/dhcpd.conf [nimbus@elephant11 nimbus]$ sudo vi /etc/dhcpd.conf
 
Changed:
<
<
Now do the install:
>
>
defining one subnet as:
 
Changed:
<
<
# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz # tar xzf nimbus-controls-TP2.2.tar.gz # cd nimbus-controls-TP2.2/workspace-control # cp worksp.conf.example /opt/nimbus/worksp.conf # python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus
>
>
subnet 10.200.200.0 netmask 255.255.255.0 { }
 
Added:
>
>
The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are, though, you will just answer yes to all of them.

Adding Node to Nimbus Config

Line: 985 to 908
 

-- PatrickArmstrong - 2009-09-04

Added:
>
>
and the backend tools to /opt/nimbus. Both of these directories are owned by nimbus, and need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. This install uses a custom build of Nimbus that has features and fixes that are not in a release at this time. In the future, this install should be done from the latest release on the Nimbus website.

Revision 62010-02-16 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install on Elephant Cluster

Line: 8 to 8
 
  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
Changed:
<
<
  1. Switch to temporary head node and install prerequisites
>
>
  1. Switch to Nimbus head node and install prerequisites
  2. Install Globus ws-core Web Services container and required X509 certificates.
  3. Test the Web Services container.
 

Step 1: Create privileged "nimbus" user on all cluster nodes.

Line: 69 to 71
 [nimbus@elephant01 nimbus-2.3]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
Changed:
<
<
Switch to the interim cloud cluster head node, elephant11 and install java-1.6.0-sun-compat:
>
>

Step 4: Switch to Nimbus head node and install prerequisites.

 
Changed:
<
<
[crlb@elephant01 ~]$ ssh e11 [crlb@elephant11 ~]$ sudo yum -y install java-1.6.0-sun-compat
>
>
[nimbus@elephant01 ~]$ ssh e11 The authenticity of host 'e11 (10.200.200.11)' can't be established. RSA key fingerprint is 7c:92:13:5d:35:59:dd:ca:2e:bd:95:b4:97:ed:f0:97. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'e11,10.200.200.11' (RSA) to the list of known hosts. Last login: Tue Feb 16 09:52:21 2010 from elephant01.admin [nimbus@elephant11 ~]$
 
Changed:
<
<
Install Apache Ant
>
>
Install java-1.6.0-sun-compat:
 
Changed:
<
<
[crlb@elephant11 ~]$ cd /usr/local [crlb@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2
>
>
[nimbus@elephant11 ~]$ sudo yum -y install java-1.6.0-sun-compat
 
Changed:
<
<
Create home for nimbus/globus ws-core
>
>
Install Apache Ant
 
Changed:
<
<
[crlb@elephant11 local]$ sudo mkdir nimbus-2.3 [crlb@elephant11 local]$ sudo chown nimbus.nimbus nimbus-2.3 [crlb@elephant11 local]$ sudo ln -s nimbus-2.3 nimbus
>
>
[nimbus@elephant11 ~]$ cd /usr/local [nimbus@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2
 
Changed:
<
<
and for Nimbus worker node control software
>
>
Create a home for nimbus/globus ws-core
 
Changed:
<
<
[crlb@elephant11 local]$ sudo mkdir -p /opt/nimbus-2.3 [crlb@elephant11 local]$ sudo chown nimbus.nimbus /opt/nimbus-2.3
>
>
[nimbus@elephant11 local]$ sudo mkdir nimbus-2.3 [nimbus@elephant11 local]$ sudo chown nimbus.nimbus nimbus-2.3 [nimbus@elephant11 local]$ sudo ln -s nimbus-2.3 nimbus
 
Changed:
<
<
Switch to the nimbus account
>
>
and a home for nimbus worker node control software
 
Changed:
<
<
[crlb@elephant11 local]$ sudo su - nimbus [nimbus@elephant11 ~]$
>
>
[nimbus@elephant11 local]$ cd /opt
[nimbus@elephant11 opt]$ sudo mkdir nimbus-2.3
[nimbus@elephant11 opt]$ sudo chown nimbus.nimbus nimbus-2.3
[nimbus@elephant11 opt]$ sudo ln -s nimbus-2.3 nimbus
 
Added:
>
>

Step 5: Install Globus ws-core Web Services container and required X509 certificates.

 The remainder of this procedure is done entirely as the nimbus user. The frontend tools will be installed to /usr/local/nimbus, and the backend tools to /opt/nimbus. Both of these directories are owned by nimbus, and need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. This install uses a custom build of Nimbus that has features and fixes that are not in a release at this time. In the future, this install should be done from the latest release on the Nimbus website.

This install will install the minimum set of utilities to run Nimbus. Nimbus needs to run in a Globus Container to work, and we will install the bare essentials of the Globus container, and will use a cert from Grid Canada. If you need the full set of globus utilities, please refer to the instructions on the GridX1 Wiki, and skip "Installing the Webservice Core" on this page.

Deleted:
<
<

Installing the Webservice Core

First we set up the basic Globus webservice core: download and install the basic core tools.

 
[nimbus@elephant11 ~]$ cd /usr/local/nimbus
Line: 117 to 125
 

Create an empty grid-mapfile. This file will contain the certificate subjects of the users of your cloud-enabled cluster.

Added:
>
>
[nimbus@elephant11 nimbus]$  touch /usr/local/nimbus/share/grid-mapfile
 
Changed:
<
<
[nimbus@elephant11 nimbus]$  touch /usr/local/nimbus/share/grid-mapfile

Now set our environment variables. I'm assuming bash is your nimbus user's shell. If you're using csh or ksh, you might want to try substituting .profile for .bashrc:

>
>
Now set our environment variables. Example assumes bash as the nimbus user's shell:
 
[nimbus@elephant11 nimbus]$ cd
[nimbus@elephant11 ~]$  echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> .bashrc
Line: 192 to 200
 
Added:
>
>

Step 6: Test the Web Services container.

 

Testing our Container

Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:

Revision 52010-02-16 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11
Changed:
<
<

Nimbus Install with just ws-core

>
>

Nimbus Install on Elephant Cluster

Original document written by PatrickArmstrong. Modified for the initial implementation of the split configuration on the elephant cluster. In this configuration, there are eleven nodes, with the head node residing on elephant01. During initial deployment the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.
Changed:
<
<

Install Prerequisites

>
>

Overview

  1. Create privileged "nimbus" user on all cluster nodes.
  2. Switch to nimbus user for the remainder of the installation and create public/private keys.
  3. Download all required packages.
  4. Switch to temporary head node and install prerequisites
 
Changed:
<
<
Create the nimbus account on elephant head node (elephant01) with required sudo privileges.
>
>

Step 1: Create privileged "nimbus" user on all cluster nodes.

Create the nimbus account on elephant head node (elephant01) with the required sudo privileges.

 
[crlb@elephant01 ~]$ sudo adduser nimbus
[crlb@elephant01 ~]$ sudo visudo
Changed:
<
<
Comment out the requiretty directive
>
>
Comment out the requiretty directive:
 
#Defaults    requiretty
Changed:
<
<
Add the following privileges
>
>
Allow any command with a password:
nimbus  ALL=(ALL)       ALL

And the following commands with no password:

 
nimbus ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/mount-alter.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/dhcp-config.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/xen-ebtables-config.sh
Changed:
<
<
Save changes and propagate to every node in the cluster.
>
>
Save changes and propagate to every node in the cluster:
 
[crlb@elephant01 ~]$ sudo /usr/local/sbin/usync
Changed:
<
<
Download everything that the nimbus user will need.
>
>

Step 2: Switch to nimbus user for the remainder of the installation and create public/private keys.

 
[crlb@elephant01 ~]$ sudo su - nimbus
Changed:
<
<
[nimbus@elephant01 ~]$ mkdir Downloads [nimbus@elephant01 ~]$ cd Downloads [nimbus@elephant01 Downloads]$ wget http://www.nimbusproject.org/downloads/nimbus-2.3.tar.gz [nimbus@elephant01 Downloads]$ wget http://www.nimbusproject.org/downloads/nimbus-controls-2.3.tar.gz [nimbus@elephant01 Downloads]$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz [nimbus@elephant01 Downloads]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2 [nimbus@elephant01 Downloads]$ exit [crlb@elephant01 ~]$
>
>
Password: ****** [nimbus@elephant01 ~]$ ssh-keygen Generating public/private rsa key pair. Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): Created directory '/home/nimbus/.ssh'. Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /home/nimbus/.ssh/id_rsa. Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub. The key fingerprint is: e4:43:60:84:0c:ea:dc:02:dd:4b:93:fd:f4:e4:38:e8 nimbus@elephant01.heprc.uvic.ca [nimbus@elephant01 ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys [nimbus@elephant01 ~]$

Step 3: Download all required packages.

[nimbus@elephant01 ~]$ mkdir -p Downloads/nimbus-2.3
[nimbus@elephant01 ~]$ cd Downloads/nimbus-2.3
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www.nimbusproject.org/downloads/nimbus-controls-2.3.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant01 nimbus-2.3]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
[nimbus@elephant01 nimbus-2.3]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
 

Switch to the interim cloud cluster head node, elephant11 and install java-1.6.0-sun-compat:

Revision 42010-02-15 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install with just ws-core

Line: 6 to 6
 

Install Prerequisites

Changed:
<
<
Create the nimbus account on elephant head node (elephant01) and propagate to all nodes in the cluster.
>
>
Create the nimbus account on elephant head node (elephant01) with required sudo privileges.
 
[crlb@elephant01 ~]$ sudo adduser nimbus
Added:
>
>
[crlb@elephant01 ~]$ sudo visudo

Comment out the requiretty directive

#Defaults    requiretty

Add the following privileges

nimbus ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/mount-alter.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/dhcp-config.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/libexec/workspace-control/xen-ebtables-config.sh

Save changes and propagate to every node in the cluster.

 [crlb@elephant01 ~]$ sudo /usr/local/sbin/usync
Line: 32 to 49
 

Install Apache Ant

Changed:
<
<
[nimbus@elephant11 nimbus]$ 
[crlb@elephant11 ant]$ cd /usr/local
>
>
[crlb@elephant11 ~]$ cd /usr/local
 [crlb@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2
Line: 234 to 251
 
Changed:
<
<

Installing Nimbus

>
>

Install Nimbus

 Unpack the nimbus package and run the install script.
Changed:
<
<
[nimbus@elephant11 ~]$
>
>
[nimbus@elephant11 ~]$ cd /tmp [nimbus@elephant11 tmp]$ tar -xvf ~/Downloads/nimbus-2.3.tar.gz [nimbus@elephant11 tmp]$ /tmp/nimbus-2.3/bin/all-build-and-install.sh [nimbus@elephant11 tmp]$ rm -rf /tmp/nimbus-2.3/ [nimbus@elephant11 tmp]$ rmdir hsperfdata_nimbus

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa. This is for sending commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
nimbus@canfardev $

Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

# which xm
/usr/sbin/xm
# uname -r
2.6.18-128.1.1.el5xen
$ ps aux | grep xen
root        21  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenbus]
root      2549  0.0  0.0   2188   956 ?        S    16:35   0:00 xenstored --pid-file /var/run/xenstore.pid
root      2554  0.0  0.1  12176  3924 ?        S    16:35   0:00 python /usr/sbin/xend start
root      2555  0.0  0.1  63484  4836 ?        Sl   16:35   0:00 python /usr/sbin/xend start
root      2557  0.0  0.0  12212   364 ?        Sl   16:35   0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console

If it's not installed, you can do so with:

# yum install xen kernel-xen
# chkconfig xend on

Then reboot.

You'll also need to install ebtables (not currently used) and dhcp. Do this by first enabling the DAG repository, then installing with yum:

# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum install ebtables dhcp

Now, edit the dhcpd config file. Make sure it looks something like this:

 
Changed:
<
<
Get Nimbus from the Nimbus website. You'll need the Nimbus package.
>
>
# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces
 
Changed:
<
<
$ wget http://workspace.globus.org/downloads/nimbus-TP2.2.tar.gz
$ tar xzf nimbus-TP2.2.tar.gz
>
>
################# ## GLOBAL OPTS ## #################

# Option definitions common or default to all supported networks

# Keep this: ddns-update-style none;

# Can be overriden in host entry: default-lease-time 120; max-lease-time 240;

############# ## SUBNETS ## #############

# Make an entry like this for each supported subnet. Otherwise, the DHCP # daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 { }

### DO NOT EDIT BELOW, the following entries are added and ### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###

Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

# ls /opt/nimbus
/opt/nimbus
 
Changed:
<
<
Installing Nimbus depends on Ant and Ant depends on the xml-commons-api.
>
>
Now do the install:
 
Changed:
<
<
# yum install ant ant-trax ant-apache-regexp ant-nodeps xml-commons-api perl-XML-Parser
>
>
# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz # tar xzf nimbus-controls-TP2.2.tar.gz # cd nimbus-controls-TP2.2/workspace-control # cp worksp.conf.example /opt/nimbus/worksp.conf # python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus
 
Changed:
<
<
Now we're ready to install Nimbus. There is an auto configuration that we can run that will help our installation. See the http://workspace.globus.org/vm/TP2.2/index.html for more details than are listed here.
>
>
The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are, though, you will just answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

 
Added:
>
>
Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:
 
Changed:
<
<
$ cd nimbus-TP2.2 $ sh ./bin/all-build-and-install.sh
>
>
#Some comments up here gildor 3072 guilin 3072
 
Added:
>
>
Your worker node should now be ready!
 Now run the auto-configuration program. Following is a transcript of running this program on canfardev:
$ $GLOBUS_LOCATION/share/nimbus-autoconfig/autoconfig.sh
Line: 796 to 942
 If you encounter an ebtables problem. You can try a patched version of ebtables. See This page for details.
Deleted:
<
<

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa. This is for sending commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
nimbus@canfardev $

Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

# which xm
/usr/sbin/xm
# uname -r
2.6.18-128.1.1.el5xen
$ ps aux | grep xen
root        21  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenbus]
root      2549  0.0  0.0   2188   956 ?        S    16:35   0:00 xenstored --pid-file /var/run/xenstore.pid
root      2554  0.0  0.1  12176  3924 ?        S    16:35   0:00 python /usr/sbin/xend start
root      2555  0.0  0.1  63484  4836 ?        Sl   16:35   0:00 python /usr/sbin/xend start
root      2557  0.0  0.0  12212   364 ?        Sl   16:35   0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console

If it's not installed, you can do so with:

# yum install xen kernel-xen
# chkconfig xend on

Then reboot.

You'll also need to install ebtables (not currently used) and dhcp. Do this by first enabling the DAG repository, then installing with yum:

# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum install ebtables dhcp

Now, edit the dhcpd config file. Make sure it looks something like this:

# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces


#################
## GLOBAL OPTS ##
#################

# Option definitions common or default to all supported networks

# Keep this:
ddns-update-style none;

# Can be overriden in host entry:
default-lease-time 120;
max-lease-time 240;


#############
## SUBNETS ##
#############

# Make an entry like this for each supported subnet.  Otherwise, the DHCP
# daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 {
}

### DO NOT EDIT BELOW, the following entries are added and 
### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###


Setting up Sudo

You need a few rules for the nimbus user to be able to run the xm scripts it needs:

Add the following rules to sudoers:

nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/mount-alter.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/dhcp-config.sh
nimbus ALL=(root) NOPASSWD: /usr/sbin/xm
nimbus ALL=(root) NOPASSWD: /usr/sbin/xend

And set requiretty to false in sudoers.

Now that we've set up our pre-requisites, we can install the worker node tools.

Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

# ls /opt/nimbus
/opt/nimbus
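
If /opt/nimbus is exported from the head node over NFS (as this document assumes elsewhere), a sketch of the corresponding /etc/fstab entry on each worker — the export host and options here are illustrative:

headnode:/opt/nimbus  /opt/nimbus  nfs  defaults  0 0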

Now do the install:

# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
# tar xzf nimbus-controls-TP2.2.tar.gz
# cd nimbus-controls-TP2.2/workspace-control
# cp worksp.conf.example /opt/nimbus/worksp.conf
# python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus

The installer will ask you a bunch of questions. Answer them to the best of your knowledge, and don't worry too much if you're not sure about some of them. Chances are you will just answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
gildor 3072
guilin 3072

Your worker node should now be ready!

 

-- PatrickArmstrong - 16 Jul 2009

Revision 32010-02-15 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install with just ws-core

Line: 8 to 8
  Create the nimbus account on elephant head node (elephant01) and propagate to all nodes in the cluster.
Changed:
<
<
[crlb@elephant01 ~] sudo adduser nimbus
[crlb@elephant01 ~] sudo /usr/local/sbin/usync
>
>
[crlb@elephant01 ~]$ sudo adduser nimbus
[crlb@elephant01 ~]$ sudo /usr/local/sbin/usync

Download everything that the nimbus user will need.

[crlb@elephant01 ~]$ sudo su - nimbus
[nimbus@elephant01 ~]$ mkdir Downloads
[nimbus@elephant01 ~]$ cd Downloads
[nimbus@elephant01 Downloads]$ wget http://www.nimbusproject.org/downloads/nimbus-2.3.tar.gz
[nimbus@elephant01 Downloads]$ wget http://www.nimbusproject.org/downloads/nimbus-controls-2.3.tar.gz
[nimbus@elephant01 Downloads]$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant01 Downloads]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
[nimbus@elephant01 Downloads]$ exit
[crlb@elephant01 ~]$ 
 

Switch to the interim cloud cluster head node, elephant11 and install java-1.6.0-sun-compat:

Line: 19 to 32
 

Install Apache Ant

Changed:
<
<
[crlb@elephant11 ~]$ mkdir -p Downloads/ant
[crlb@elephant11 ~]$ cd ~/Downloads/ant
[crlb@elephant11 ant]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
>
>
[nimbus@elephant11 nimbus]$ 
 [crlb@elephant11 ant]$ cd /usr/local
Changed:
<
<
[crlb@elephant11 local]$ sudo tar -xjvf ~/Downloads/ant/apache-ant-1.8.0-bin.tar.bz2
>
>
[crlb@elephant11 local]$ sudo tar -xjvf ~nimbus/Downloads/apache-ant-1.8.0-bin.tar.bz2
 
Changed:
<
<
Create home for globus ws-core
>
>
Create home for nimbus/globus ws-core
 
Changed:
<
<
[crlb@elephant11 local]$ sudo mkdir nimbus
[crlb@elephant11 local]$ sudo chown nimbus.nimbus nimbus
>
>
[crlb@elephant11 local]$ sudo mkdir nimbus-2.3
[crlb@elephant11 local]$ sudo chown nimbus.nimbus nimbus-2.3
[crlb@elephant11 local]$ sudo ln -s nimbus-2.3 nimbus
 

and for Nimbus worker node control software

Changed:
<
<
[crlb@elephant11 local]$ sudo mkdir -p /opt/nimbus
[crlb@elephant11 local]$ sudo chown nimbus.nimbus /opt/nimbus
>
>
[crlb@elephant11 local]$ sudo mkdir -p /opt/nimbus-2.3
[crlb@elephant11 local]$ sudo chown nimbus.nimbus /opt/nimbus-2.3
 
Line: 52 to 63
 

Installing the Webservice Core

Changed:
<
<
First we set up the basic globus webservice core. First, download and install the basic core tools.
>
>
First, install the basic globus webservice core. Unpack and install the basic core tools downloaded earlier.
 
Changed:
<
<
nimbus$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
nimbus$ tar xzf ws-core-4.0.8-bin.tar.gz
nimbus$ cp -R ws-core-4.0.8/* /usr/local/nimbus
nimbus$ rm -Rf ws-core-4.0.8*
>
>
[nimbus@elephant11 ~]$ cd /usr/local/nimbus
[nimbus@elephant11 nimbus]$ tar -xzf ~/Downloads/ws-core-4.0.8-bin.tar.gz
[nimbus@elephant11 nimbus]$ mv ws-core-4.0.8/* .
[nimbus@elephant11 nimbus]$ rmdir ws-core-4.0.8
 

Create an empty grid-mapfile. This file will contain the certificate subjects of the users of your cloud-enabled cluster.

Changed:
<
<
nimbus$ touch /usr/local/nimbus/share/grid-mapfile
>
>
[nimbus@elephant11 nimbus]$  touch /usr/local/nimbus/share/grid-mapfile
  Now set our environment variables. I'm assuming bash is your nimbus user's shell. If you're using csh or ksh, you might want to try substituting .profile for .bashrc:
Changed:
<
<
nimbus$ echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> ~/.bashrc nimbus$ echo "export X509_CERT_DIR=/usr/local/nimbus/share/certificates" >> ~/.bashrc nimbus$ . ~/.bashrc
>
>
[nimbus@elephant11 nimbus]$ cd
[nimbus@elephant11 ~]$ echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> .bashrc
[nimbus@elephant11 ~]$ echo "export X509_CERT_DIR=/usr/local/nimbus/share/certificates" >> .bashrc
[nimbus@elephant11 ~]$ echo "export PATH=$PATH:/usr/local/apache-ant-1.8.0/bin" >> .bashrc
[nimbus@elephant11 ~]$ . .bashrc
 

Certificates

Now we can set up the certificates. We're going to put them in our $X509_CERT_DIR . First, we make our certificates directory and put the grid canada root certificates in there.

Changed:
<
<
nimbus$ mkdir -p $X509_CERT_DIR
nimbus$ cd $X509_CERT_DIR
nimbus$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
>
>
[nimbus@elephant11 ~]$ mkdir -p $X509_CERT_DIR
[nimbus@elephant11 ~]$ cd $X509_CERT_DIR
[nimbus@elephant11 ~]$ wget http://www.gridcanada.ca/ca/bffbd7d0.0
 

Then create a host certificate request to send to our CA.

Changed:
<
<
nimbus$ $GLOBUS_LOCATION/bin/grid-cert-request -int -host `hostname -f` -dir $X509_CERT_DIR -caEmail ca@gridcanada.ca -force
>
>
[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/grid-cert-request -int -host `hostname -f` -dir $X509_CERT_DIR -caEmail ca@gridcanada.ca -force
 
You are about to be asked to enter information that will be incorporated into your certificate request.
Line: 115 to 128
  to point to your new certificates and modify the gridmap value:
Changed:
<
<
$ cat /usr/local/nimbus/etc/globus_wsrf_core/global_security_descriptor.xml
>
>
[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml
 
Line: 127 to 140
 

Now we'll activate our security configuration by adding a element under the CONTAINER_SECURITY_DESCRIPTOR:

Changed:
<
<
$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
>
>
[nimbus@elephant11 ~]$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
 
<!-- @CONTAINER_SECURITY_DESCRIPTOR@ -->
<parameter name="containerSecDesc" value="etc/globus_wsrf_core/global_security_descriptor.xml"/>
Line: 140 to 153
 Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:
Changed:
<
<
$ $GLOBUS_LOCATION/bin/globus-start-container
>
>
[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-container
Starting SOAP server at: https://204.174.103.121:8443/wsrf/services/
With the following services:
Line: 201 to 214
  Then mark it as executable:
Changed:
<
<
$ chmod 744 $GLOBUS_LOCATION/bin/start-stop
>
>
[nimbus@elephant11 ~]$ chmod 744 $GLOBUS_LOCATION/bin/globus-start-stop
 

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

Changed:
<
<
$ $GLOBUS_LOCATION/bin/globus-start-stop start
>
>
[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop start
$ netstat -an | grep 8443
tcp        0      0 0.0.0.0:8443                0.0.0.0:*                   LISTEN
Line: 215 to 228
 Great! Now we have a running container. Let's stop it before we carry on with our installation.
Changed:
<
<
$ $GLOBUS_LOCATION/bin/globus-start-stop stop

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa. This is for sending commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
nimbus@canfardev $

Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

# which xm
/usr/sbin/xm
# uname -r
2.6.18-128.1.1.el5xen
$ ps aux | grep xen
root        21  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenbus]
root      2549  0.0  0.0   2188   956 ?        S    16:35   0:00 xenstored --pid-file /var/run/xenstore.pid
root      2554  0.0  0.1  12176  3924 ?        S    16:35   0:00 python /usr/sbin/xend start
root      2555  0.0  0.1  63484  4836 ?        Sl   16:35   0:00 python /usr/sbin/xend start
root      2557  0.0  0.0  12212   364 ?        Sl   16:35   0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console

If it's not installed, you can do so with:

# yum install xen kernel-xen
# chkconfig xend on

Then reboot.

You'll also need to install ebtables and dhcp. Do this by first enabling the DAG repository, then installing with yum:

# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum install ebtables dhcp

Now, edit the dhcpd config file. Make sure it looks something like this:

# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces


#################
## GLOBAL OPTS ##
#################

# Option definitions common or default to all supported networks

# Keep this:
ddns-update-style none;

# Can be overriden in host entry:
default-lease-time 120;
max-lease-time 240;


#############
## SUBNETS ##
#############

# Make an entry like this for each supported subnet.  Otherwise, the DHCP
# daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 {
}

### DO NOT EDIT BELOW, the following entries are added and 
### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###


Setting up Sudo

You need a few rules for the nimbus user to be able to run the xm scripts it needs:

Add the following rules to sudoers:

nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/mount-alter.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/dhcp-config.sh
nimbus ALL=(root) NOPASSWD: /usr/sbin/xm
nimbus ALL=(root) NOPASSWD: /usr/sbin/xend
>
>
[nimbus@elephant11 ~]$ $GLOBUS_LOCATION/bin/globus-start-stop stop
 
Deleted:
<
<
And set requiretty to false in sudoers.

Now that we've set up our pre-requisites, we can install the worker node tools.

Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

# ls /opt/nimbus
/opt/nimbus
 
Deleted:
<
<
Now do the install:
# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
# tar xzf nimbus-controls-TP2.2.tar.gz
# cd nimbus-controls-TP2.2/workspace-control
# cp worksp.conf.example /opt/nimbus/worksp.conf
# python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus

The installer will ask you a bunch of questions. Answer them out to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are though, you will just answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
gildor 3072
guilin 3072
 
Deleted:
<
<
Your worker node should now be ready!
 

Installing Nimbus

Added:
>
>
Unpack the nimbus package and run the install script.
[nimbus@elephant11 ~]$
  Get Nimbus from the Nimbus website. You'll need the Nimbus package.
Line: 935 to 796
 If you encounter an ebtables problem. You can try a patched version of ebtables. See This page for details.
Added:
>
>

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa. This is for sending commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
nimbus@canfardev $

Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

# which xm
/usr/sbin/xm
# uname -r
2.6.18-128.1.1.el5xen
$ ps aux | grep xen
root        21  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenbus]
root      2549  0.0  0.0   2188   956 ?        S    16:35   0:00 xenstored --pid-file /var/run/xenstore.pid
root      2554  0.0  0.1  12176  3924 ?        S    16:35   0:00 python /usr/sbin/xend start
root      2555  0.0  0.1  63484  4836 ?        Sl   16:35   0:00 python /usr/sbin/xend start
root      2557  0.0  0.0  12212   364 ?        Sl   16:35   0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console

If it's not installed, you can do so with:

# yum install xen kernel-xen
# chkconfig xend on

Then reboot.

You'll also need to install ebtables (not currently used) and dhcp. Do this by first enabling the DAG repository, then installing with yum:

# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum install ebtables dhcp

Now, edit the dhcpd config file. Make sure it looks something like this:

# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces


#################
## GLOBAL OPTS ##
#################

# Option definitions common or default to all supported networks

# Keep this:
ddns-update-style none;

# Can be overriden in host entry:
default-lease-time 120;
max-lease-time 240;


#############
## SUBNETS ##
#############

# Make an entry like this for each supported subnet.  Otherwise, the DHCP
# daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 {
}

### DO NOT EDIT BELOW, the following entries are added and 
### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###


Setting up Sudo

You need a few rules for the nimbus user to be able to run the xm scripts it needs:

Add the following rules to sudoers:

nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/mount-alter.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/dhcp-config.sh
nimbus ALL=(root) NOPASSWD: /usr/sbin/xm
nimbus ALL=(root) NOPASSWD: /usr/sbin/xend
 
Added:
>
>
And set requiretty to false in sudoers.

Now that we've set up our pre-requisites, we can install the worker node tools.

Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

# ls /opt/nimbus
/opt/nimbus

Now do the install:

# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
# tar xzf nimbus-controls-TP2.2.tar.gz
# cd nimbus-controls-TP2.2/workspace-control
# cp worksp.conf.example /opt/nimbus/worksp.conf
# python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus

The installer will ask you a bunch of questions. Answer them out to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are though, you will just answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
gildor 3072
guilin 3072

Your worker node should now be ready!

 

-- PatrickArmstrong - 16 Jul 2009

Revision 22010-02-12 - crlb

Line: 1 to 1
 
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install with just ws-core

Line: 33 to 33
 [crlb@elephant11 local]$ sudo chown nimbus.nimbus nimbus
Added:
>
>
and for Nimbus worker node control software
[crlb@elephant11 local]$ sudo mkdir -p /opt/nimbus
[crlb@elephant11 local]$ sudo chown nimbus.nimbus /opt/nimbus
 

Switch to the nimbus account

Revision 12010-02-11 - crlb

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="ColinLeavettBrown"
-- ColinLeavettBrown - 2010-02-11

Nimbus Install with just ws-core

Original document written by PatrickArmstrong. Modified for the initial implementation of the split configuration on the elephant cluster. In this configuration there are eleven nodes, with the head node residing on elephant01. During initial deployment, the first three nodes will be used to develop cluster management, while the remaining nodes, elephant04 to elephant11, will be used for an interim cloud cluster, with elephant11 acting as its head node.

Install Prerequisites

Create the nimbus account on elephant head node (elephant01) and propagate to all nodes in the cluster.

[crlb@elephant01 ~] sudo adduser nimbus
[crlb@elephant01 ~] sudo /usr/local/sbin/usync

Switch to the interim cloud cluster head node, elephant11 and install java-1.6.0-sun-compat:

[crlb@elephant01 ~]$ ssh e11
[crlb@elephant11 ~]$ sudo yum -y install java-1.6.0-sun-compat

Install Apache Ant

[crlb@elephant11 ~]$ mkdir -p Downloads/ant
[crlb@elephant11 ~]$ cd ~/Downloads/ant
[crlb@elephant11 ant]$ wget http://mirror.csclub.uwaterloo.ca/apache/ant/binaries/apache-ant-1.8.0-bin.tar.bz2
[crlb@elephant11 ant]$ cd /usr/local
[crlb@elephant11 local]$ sudo tar -xjvf ~/Downloads/ant/apache-ant-1.8.0-bin.tar.bz2

Create home for globus ws-core

[crlb@elephant11 local]$ sudo mkdir nimbus
[crlb@elephant11 local]$ sudo chown nimbus.nimbus nimbus

Switch to the nimbus account

[crlb@elephant11 local]$ sudo su - nimbus
[nimbus@elephant11 ~]$ 

The remainder of this procedure is done entirely as the nimbus user. The frontend tools will be installed to /usr/local/nimbus, and the backend tools to /opt/nimbus. Both of these directories are owned by nimbus, and need to be created by root. Additionally, /opt/nimbus is NFS mounted on the worker nodes. This install uses a custom build of Nimbus that has features and fixes that are not in a release at this time. In the future, this install should be done from the latest release on the Nimbus website.

This procedure installs the minimum set of utilities needed to run Nimbus. Nimbus needs to run in a Globus container, so we will install the bare essentials of the Globus container and use a certificate from Grid Canada. If you need the full set of globus utilities, please refer to the instructions on the GridX1 Wiki, and skip "Installing the Webservice Core" on this page.

Installing the Webservice Core

First we set up the basic globus webservice core. First, download and install the basic core tools.

nimbus$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.8/ws-core/bin/ws-core-4.0.8-bin.tar.gz
nimbus$ tar xzf ws-core-4.0.8-bin.tar.gz
nimbus$ cp -R ws-core-4.0.8/* /usr/local/nimbus
nimbus$ rm -Rf ws-core-4.0.8*

Create an empty grid-mapfile. This file will contain the certificate subjects of the users of your cloud-enabled cluster.

nimbus$ touch /usr/local/nimbus/share/grid-mapfile

Now set our environment variables. I'm assuming bash is your nimbus user's shell. If you're using csh or ksh, you might want to try substituting .profile for .bashrc:

nimbus$ echo "export GLOBUS_LOCATION=/usr/local/nimbus" >> ~/.bashrc
nimbus$ echo "export X509_CERT_DIR=/usr/local/nimbus/share/certificates" >> ~/.bashrc
nimbus$ . ~/.bashrc

Certificates

Now we can set up the certificates. We're going to put them in our $X509_CERT_DIR . First, we make our certificates directory and put the grid canada root certificates in there.

nimbus$ mkdir -p $X509_CERT_DIR
nimbus$ cd $X509_CERT_DIR
nimbus$ wget http://www.gridcanada.ca/ca/bffbd7d0.0

Then create a host certificate request to send to our CA.

nimbus$ $GLOBUS_LOCATION/bin/grid-cert-request -int -host `hostname -f` -dir $X509_CERT_DIR -caEmail ca@gridcanada.ca -force
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
Enter organization DN by entering individual component names and their values.
The component name can be one of: [givenname, surname, ou, uid, cn, initials, unstructuredname, t, unstructuredaddress, emailaddress, o, st, l, generation, sn, e, c, dc]
-----
Enter name component: C
Enter 'C' value: CA
Enter name component: O
Enter 'O' value: Grid
Enter name component: 
Generating a 1024 bit RSA private key
A private key and a certificate request has been generated with the subject:

/C=CA/O=Grid/CN=host/canfardev.dao.nrc.ca

The private key is stored in /usr/local/nimbus/share/certificates/hostkey.pem
The request is stored in /usr/local/nimbus/share/certificates/hostcert_request.pem

Now mail this request file (/usr/local/nimbus/share/certificates/hostcert_request.pem) to ca@gridcanada.ca . It might take a day or so before you get your certificate back.

Once you have your key, paste the contents into /usr/local/nimbus/share/certificates/hostcert.pem
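
Once the signed certificate is in place, it is worth checking that it actually matches the host key before going further; a sketch using standard openssl commands (the two digests should be identical):

$ openssl x509 -noout -modulus -in $X509_CERT_DIR/hostcert.pem | openssl md5
$ openssl rsa -noout -modulus -in $X509_CERT_DIR/hostkey.pem | openssl md5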

Now that we have our certificate, we have to point our container to our key and certificate and to our empty grid-mapfile. To do so, edit $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml to point to your new certificates and modify the gridmap value:

$ cat /usr/local/nimbus/etc/globus_wsrf_core/global_security_descriptor.xml 
<?xml version="1.0" encoding="UTF-8"?>
<securityConfig xmlns="http://www.globus.org">
    <credential>
        <key-file value="/usr/local/nimbus/share/certificates/hostkey.pem"/>
        <cert-file value="/usr/local/nimbus/share/certificates/hostcert.pem"/>
    </credential>
    <gridmap value="/usr/local/nimbus/share/grid-mapfile"/>
</securityConfig>

Now we'll activate our security configuration by adding a element under the CONTAINER_SECURITY_DESCRIPTOR:

$ vim $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd

<!-- @CONTAINER_SECURITY_DESCRIPTOR@ -->
<parameter name="containerSecDesc"
              value="etc/globus_wsrf_core/global_security_descriptor.xml"/>

Testing our Container

Now that we've set up security, we can try starting our container for the first time. To do so, run globus-start-container. You should see something like the following:

$ $GLOBUS_LOCATION/bin/globus-start-container
Starting SOAP server at: https://204.174.103.121:8443/wsrf/services/ 
With the following services:

[1]: https://204.174.103.121:8443/wsrf/services/AdminService
[2]: https://204.174.103.121:8443/wsrf/services/AuthzCalloutTestService
[3]: https://204.174.103.121:8443/wsrf/services/ContainerRegistryEntryService
...
[25]: https://204.174.103.121:8443/wsrf/services/gsi/AuthenticationService

If you do, hit control-c. Congratulations! Your container is working.

If you get the following error

org.globus.common.ChainedIOException: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
You are probably missing the Grid Canada .0 file (bffbd7d0.0 in this case). Either copy the file from another globus machine's X509_CERT_DIR, or download the GC CA Bundle from the GC Certificate Authority website, put the bffbd7d0.0 file into the X509_CERT_DIR, and try starting the container again (see the example below).
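
For example, to fetch the Grid Canada CA file directly into the certificates directory (the same URL used earlier in this document):

$ cd $X509_CERT_DIR
$ wget http://www.gridcanada.ca/ca/bffbd7d0.0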

Automate Startup of Container

Now that we know our container works, we can create a script to run our container at login. Paste the following script into $GLOBUS_LOCATION/bin/globus-start-stop:

#!/bin/sh
set -e
export GLOBUS_OPTIONS="-Xms256M -Xmx1024M -Dorg.globus.tcp.port.range=50000,51999"
export GLOBUS_TCP_PORT_RANGE=50000,51999

cd $GLOBUS_LOCATION
case "$1" in
    start)
        nohup $GLOBUS_LOCATION/bin/globus-start-container -p 8443 \
       >>$GLOBUS_LOCATION/var/container.log &
        ;;
    stop)
        $GLOBUS_LOCATION/bin/grid-proxy-init \
            -key $GLOBUS_LOCATION/share/certificates/hostkey.pem\
            -cert $GLOBUS_LOCATION/share/certificates/hostcert.pem\
            -out /tmp/shutdownproxy.pem\
            >/dev/null
        export X509_USER_PROXY=/tmp/shutdownproxy.pem
        $GLOBUS_LOCATION/bin/globus-stop-container hard
        unset X509_USER_PROXY
        rm /tmp/shutdownproxy.pem
        ;;
    restart)
        $GLOBUS_LOCATION/bin/globus-start-stop stop
        $GLOBUS_LOCATION/bin/globus-start-stop start
        ;;
    *)
        echo "Usage: globus-start-stop {start|stop|restart}" >&2
        exit 1
       ;;
esac
exit 0

Then mark it as executable:

$ chmod 744 $GLOBUS_LOCATION/bin/start-stop

We can now try starting and stopping the container with this script, and see if we're listening on 8443:

$ $GLOBUS_LOCATION/bin/globus-start-stop start
$ netstat -an | grep 8443
tcp        0      0 0.0.0.0:8443                0.0.0.0:*                   LISTEN     

Great! Now we have a running container. Let's stop it before we carry on with our installation.

$ $GLOBUS_LOCATION/bin/globus-start-stop stop
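
If you also want the container to come back automatically after a reboot, one possible approach (not part of the original procedure, so treat it as a sketch) is an @reboot cron entry for the nimbus user, added with crontab -e; the variable assignments are needed because cron does not read .bashrc:

GLOBUS_LOCATION=/usr/local/nimbus
X509_CERT_DIR=/usr/local/nimbus/share/certificates
@reboot $GLOBUS_LOCATION/bin/globus-start-stop start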

Setting Up Worker Nodes

Setting up passwordless access to worker nodes

Nimbus needs to be able to ssh without a password from the head node to the worker nodes and vice versa. This is for sending commands back and forth. The following setup assumes you have the nimbus home directory mounted over NFS between the head node and the worker nodes. If you don't you'll just need to copy the .ssh directory on the head node to the nimbus home directory on each worker.

$ ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nimbus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nimbus/.ssh/id_rsa.
Your public key has been saved in /home/nimbus/.ssh/id_rsa.pub.
The key fingerprint is:
9c:75:52:2f:d9:bd:5a:05:43:ee:3f:b2:83:cc:f2:0b nimbus@canfardev.dao.nrc.ca
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys

Now test it:

nimbus@canfardev $ ssh gildor
nimbus@ gildor $ ssh canfardev.dao.nrc.ca
nimbus@canfardev $

Great. It works. You may be asked to authorize a new host key. If so, just answer "yes".

Setting up Xen, ebtables and dhcpd

First, make sure Xen is installed. If it is, you should see something like the following when you run these commands:

# which xm
/usr/sbin/xm
# uname -r
2.6.18-128.1.1.el5xen
$ ps aux | grep xen
root        21  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenwatch]
root        22  0.0  0.0      0     0 ?        S<   16:34   0:00 [xenbus]
root      2549  0.0  0.0   2188   956 ?        S    16:35   0:00 xenstored --pid-file /var/run/xenstore.pid
root      2554  0.0  0.1  12176  3924 ?        S    16:35   0:00 python /usr/sbin/xend start
root      2555  0.0  0.1  63484  4836 ?        Sl   16:35   0:00 python /usr/sbin/xend start
root      2557  0.0  0.0  12212   364 ?        Sl   16:35   0:00 xenconsoled --log none --timestamp none --log-dir /var/log/xen/console

If it's not installed, you can do so with:

# yum install xen kernel-xen
# chkconfig xend on

Then reboot.

You'll also need to install ebtables and dhcp. Do this by first enabling the DAG repository, then installing with yum:

# rpm -Uhv http://apt.sw.be/redhat/el5/en/i386/rpmforge/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum install ebtables dhcp

Now, edit the dhcpd config file. Make sure it looks something like this:

# vim /etc/dhcpd.conf
# dhcpd.conf
#
# Configuration file for ISC dhcpd for workspaces


#################
## GLOBAL OPTS ##
#################

# Option definitions common or default to all supported networks

# Keep this:
ddns-update-style none;

# Can be overriden in host entry:
default-lease-time 120;
max-lease-time 240;


#############
## SUBNETS ##
#############

# Make an entry like this for each supported subnet.  Otherwise, the DHCP
# daemon will not listen for requests on the interface of that subnet.

subnet 172.21.0.0 netmask 255.255.0.0 {
}

### DO NOT EDIT BELOW, the following entries are added and 
### removed programmatically.

### DHCP-CONFIG-AUTOMATIC-BEGINS ###


Setting up Sudo

You need a few rules for the nimbus user to be able to run the xm scripts it needs:

Add the following rules to sudoers:

nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/mount-alter.sh
nimbus ALL=(root) NOPASSWD: /opt/nimbus/bin/dhcp-config.sh
nimbus ALL=(root) NOPASSWD: /usr/sbin/xm
nimbus ALL=(root) NOPASSWD: /usr/sbin/xend

And set requiretty to false in sudoers.

Now that we've set up our pre-requisites, we can install the worker node tools.

Setting Up Control Agents

The Nimbus Control Agents are the binaries on the worker node that act on behalf of the head node. They need to be installed on each worker node.

If you've already set up the control agents on one node, you shouldn't need to do the following steps on the other nodes. Just make sure the install directory is NFS mounted.

First, make sure we have the install directory:

# ls /opt/nimbus
/opt/nimbus

Now do the install:

# wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
# tar xzf nimbus-controls-TP2.2.tar.gz
# cd nimbus-controls-TP2.2/workspace-control
# cp worksp.conf.example /opt/nimbus/worksp.conf
# python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus

The installer will ask you a bunch of questions. Answer them out to the best of your knowledge, and don't worry too much if you're not sure of the answers to some of the questions. Chances are though, you will just answer yes to all of them.

Adding Node to Nimbus Config

This should be done after you've already installed Nimbus on the head node. If you haven't done that yet, come back to this section.

Edit $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools/canfardevpool to add the new node. Your file should look something like this:

#Some comments up here
gildor 3072
guilin 3072

Your worker node should now be ready!

Installing Nimbus

Get Nimbus from the Nimbus website. You'll need the Nimbus package.

$ wget http://workspace.globus.org/downloads/nimbus-TP2.2.tar.gz
$ tar xzf nimbus-TP2.2.tar.gz

Installing Nimbus depends on Ant and Ant depends on the xml-commons-api.

# yum install ant ant-trax ant-apache-regexp ant-nodeps xml-commons-api perl-XML-Parser

Now we're ready to install Nimbus. There is an auto configuration that we can run that will help our installation. See the http://workspace.globus.org/vm/TP2.2/index.html for more details than are listed here.

$ cd nimbus-TP2.2
$ sh ./bin/all-build-and-install.sh

Now run the auto-configuration program. Following is a transcript of running this program on canfardev:

$ $GLOBUS_LOCATION/share/nimbus-autoconfig/autoconfig.sh

# ------------------------- #
# Nimbus auto-configuration #
# ------------------------- #

Using GLOBUS_LOCATION: /usr/local/nimbus

Is the current account (nimbus) the one the container will run under? y/n:
y

Pick a VMM to test with, enter a hostname: 
gildor

----------

How much RAM (MB) should be allocated for a test VM on the 'gildor' VMM?
256

Will allocate 256 MB RAM for test VM on the 'gildor' VMM.

----------

Is the current account (nimbus) also the account the privileged scripts will run under on the VMM (gildor)? y/n:
y

Does the container account (nimbus) need a special (non-default) SSH key to access the 'nimbus' account on the VMM nodes? y/n:
n

----------

Testing basic SSH access to nimbus@gildor

Test command (1): ssh -T -n -o BatchMode=yes nimbus@gildor /bin/true

Basic SSH test (1) working to nimbus@gildor

----------

Now we'll set up the *hostname* that VMMs will use to contact the container over SSHd

Even if you plan on ever setting up just one VMM and it is localhost to the container, you should still pick a hostname here ('localhost' if you must)

*** It looks like you have a hostname set up: canfardev.dao.nrc.ca

Would you like to manually enter a different hostname? y/n:
n

Using hostname: canfardev.dao.nrc.ca

----------

Is your local SSHd server on a port different than 22?  Enter 'n' or a port number: 
n

Attempting to connect to: canfardev.dao.nrc.ca:22

Contacted a server @ canfardev.dao.nrc.ca:22

----------

Now we will test the basic SSH notification conduit from the VMM to the container

Test command (2): ssh -T -n -o BatchMode=yes nimbus@gildor ssh -T -n -o BatchMode=yes -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Host key verification failed.

*** That failed.

Try it manually in another terminal?  There should be no keyboard interaction necessary for this test to pass.

You may need to run it first without extra options, and perhaps accept the host key.  For example, try this in another terminal (make sure you are using the VMM account 'nimbus' account on the test VMM node 'gildor'):

ssh -p 22 nimbus@canfardev.dao.nrc.ca /bin/true

Hit return when you are ready to try the test again:

Notification test (2) working (ssh from nimbus@gildor to nimbus@canfardev.dao.nrc.ca at port 22)

----------

OK, looking good.

---------------------------------------------------------------------
---------------------------------------------------------------------

If you have not followed the instructions for setting up workspace control yet, please do the basic installation steps now.

Look for the documentation at:
  - http://workspace.globus.org/vm/TP2.2/admin/quickstart.html#part-III

----------

A sample installation command set can be provided for you here if you supply a group name.  Group privileges are used for some configurations and programs.

What is a privileged unix group of nimbus on gildor?  Or type 'n' to skip this step.
nimbus

----------

*** Sample workspace-control installation commands:

    ssh root@gildor
        ^^^^ YOU NEED TO BE ROOT

    wget http://workspace.globus.org/downloads/nimbus-controls-TP2.2.tar.gz
    tar xzf nimbus-controls-TP2.2.tar.gz
    cd nimbus-controls-TP2.2

    mkdir -p /opt/nimbus
    cp worksp.conf.example /opt/nimbus/worksp.conf
    python install.py -i -c /opt/nimbus/worksp.conf -a nimbus -g nimbus


*** (see 'python install.py -h' for other options, including non-interactive installation)

----------

Waiting for you to install workspace control for the account 'nimbus' on the test VMM 'gildor'

After this is accomplished, press return to continue.

----------

Going to test container access to workspace control installation.

On 'gildor', did you install workspace-control somewhere else besides '/opt/nimbus/bin/workspace-control'? y/n:
n

Test command (3): ssh -T -n -o BatchMode=yes nimbus@gildor /opt/nimbus/bin/workspace-control -h 1>/dev/null

Workspace control test (3) working

----------

Testing ability to push files to workspace control installation.

We are looking for the directory on the VMM to push customization files from the container node. This defaults to '/opt/workspace/tmp'

Did you install workspace-control under some other base directory besides /opt/workspace? y/n: 
n
Test command (4): scp -o BatchMode=yes /usr/local/nimbus/share/nimbus-autoconfig/lib/transfer-test-file.txt nimbus@gildor:/opt/workspace/tmp/

transfer-test-file.txt                                         100%   73     0.1KB/s   00:00    

SCP test (4) working

----------

Great.

---------------------------------------------------------------------
---------------------------------------------------------------------

Now you will choose a test network address to give to an incoming VM.

Does the test VMM (gildor) have an IP address on the same subnet that VMs will be assigned network addresses from? y/n:
y
----------

What is a free IP address on the same subnet as 'gildor' (whose IP address is '172.21.1.197')
172.21.1.200 

----------

Even if it does not resolve, a hostname is required for '172.21.1.200' to include in the DHCP lease the VM will get:
canfardevtest

----------

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1)
You can type 'none' if you are sure you don't want the VM to have a gateway

Please enter a gateway IP address or type 'none'.

What is the default gateway for 172.21.1.200? (guessing it is 172.21.1.1)
You can type 'none' if you are sure you don't want the VM to have a gateway
172.21.1.1

----------

What is the IP address of the DNS server that should be used by the VM? (guessing it is 172.21.1.34)
You can type 'none' if you are sure you don't want the VM to have DNS
172.21.1.34
----------

OK, in the 'make adjustments' step that follows, the service will be configured to provide this ONE network address to ONE guest VM.

You should add more VMMs and more available network addresses to assign guest VMs only after you successfully test with one VMM and one network address.

----------

*** Changes to your configuration are about to be executed.

So far, no configurations have been changed.  The following adjustments will be made based on the questions and tests we just went through:

 - The GLOBUS_LOCATION in use: /usr/local/nimbus
 - The account running the container/service: nimbus
 - The hostname running the container/service: canfardev.dao.nrc.ca
 - The contact address of the container/service for notifications: nimbus@canfardev.dao.nrc.ca (port 22)

 - The test VMM: gildor
 - The available RAM on that VMM: 256
 - The privileged account on the VMM: nimbus

 - The workspace-control path on VMM: /opt/workspace/bin/workspace-control
 - The workspace-control tmpdir on VMM: /opt/workspace/tmp

 - Test network address IP: 172.21.1.200
 - Test network address hostname: canfardevtest
 - Test network address gateway: 172.21.1.1
 - Test network address DNS: 172.21.1.34

----------


These settings are now stored in '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

If you type 'y', that script will be run for you with the settings.

Or you can answer 'n' to the next question and adjust this file.
And then manually run '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh' at your leisure.


OK, point of no return.  Proceed? y/n
y

*** Running /usr/local/nimbus/share/nimbus-autoconfig/autoconfig-adjustments.sh . . .

# ------------------------------------------- #
# Nimbus auto-configuration: make adjustments #
# ------------------------------------------- #

Read settings from '/usr/local/nimbus/share/nimbus-autoconfig/autoconfig-decisions.sh'

----------

[*] The 'service.sshd.contact.string' configuration was:
    ... set to 'nimbus@canfardev.dao.nrc.ca:22'
    ... (it used to be set to 'REPLACE_WITH_SERVICE_NODE_HOSTNAME:22')
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'control.ssh.user' configuration was:
    ... set to 'nimbus'
    ... (it used to be set blank)
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'use.identity' configuration does not need to be changed.
    ... already set to be blank
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/ssh.conf'

----------

[*] The 'control.path' configuration does not need to be changed.
    ... already set to '/opt/workspace/bin/workspace-control'
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'

----------

[*] The 'control.tmp.dir' configuration does not need to be changed.
    ... already set to '/opt/workspace/tmp'
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/vmm.conf'

----------

[*] Backing up old resource pool settings
    ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'
    ... moved 'pool1' to '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/.backups/old-pools-01'

----------

[*] Creating new resource pool
    ... created '/usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool'

----------

[*] Backing up old network settings
    ... created new directory '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'
    ... moved 'private' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'
    ... moved 'public' to '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/.backups/old-networks-01'

----------

[*] Creating new network called 'public'
    ... created '/usr/local/nimbus/etc/nimbus/workspace-service/network-pools/public'

----------

NOTE: you need to MATCH this network in the workspace-control configuration file.
This configuration file is at '/opt/workspace/worksp.conf' by default

For example, you might have this line:

association_0: public; xenbr0; vif0.1 ; none; 172.21.1.200/24

    ... "public" is the name of the network we chose.
    ... "xenbr0" is the name of the bridge to put VMs in this network on.
    ... "vif0.1" is the interface where the DHCP server is listening in dom0 on the VMM
    ... and the network address range serves as a sanity check (you can disable that check in the conf file)

----------

Making sure 'fake mode' is off:

[*] The 'fake.mode' configuration was:
    ... set to 'false'
    ... (it used to be set to 'true')
    ... in the file '/usr/local/nimbus/etc/nimbus/workspace-service/other/common.conf'

----------

Finished.

See 'NOTE' above.

Great. That seemed to work okay.

Let's carry on with the configuration.

Nimbus Configuration

First we need to tell Nimbus which machines we can boot virtual machines on. To do this, we need to edit this Nimbus frontend configuration files. These are in $GLOBUS_LOCATION/etc/nimbus . Let's define the machines we can boot on:

$ vim /usr/local/nimbus/etc/nimbus/workspace-service/vmm-pools/testpool
# NOTE: a node may not be in more than one pool at the same time, this will
#       result in an initialization error

# Supported form:
# node_name  memory_to_manage networks_supported
#
# If third field is blank (or marked with '*'), it is assumed that pool
# node supports all networks available to remote clients.  Otherwise use a comma
# separated list (no spaces between).
#
# Note that if you list a network here that is not valid at runtime,
# it will silently be ignored (check your spelling).


# File contents injected @ Mon Jul 20 11:57:41 PDT 2009
gildor 1024

For now, we only have one machine in our pool, and it has 1024MB free ram with which to boot VMs.

Now we need to set up networking. To do this, we create a pool of network addresses that can be assigned to the machines we boot on the cluster. Since we're going to start with only private networking, we will create a private pool. We define this as a text file in $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools

This file contains a DNS server for these machines, as well as a list of ip addresses, hostnames, and other networking details. We will set this file up for two addresses now.

$ cat $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools/private
# DNS server IP or 'none'
172.21.1.34

# hostname ipaddress gateway broadcast subnetmask
canfardev00 172.21.1.200 172.21.1.1 none none
canfardev01 172.21.1.201 172.21.1.1 none none

Now we need to set up an equivalent networking association in the worksp.conf file on the worker nodes. You need to associate each network pool with a virtual interface on each worker node.

From worksp.conf on gildor:

association_0: private; xenbr0; vif0.0 ; none; 172.21.1.0/24

Now finally, point Nimbus to the grid-mapfile we created earlier:

$ vim $GLOBUS_LOCATION/etc/nimbus/factory-security-config.xml
<securityConfig xmlns="http://www.globus.org">
    <auth-method>
        <GSITransport/>
        <GSISecureMessage/>
        <GSISecureConversation/>
    </auth-method>
    <authz value="gridmap"/>
    <gridmap value="share/grid-mapfile"/>
</securityConfig>

Initial Test

Start up your container with globus-start-container. You should see the following new services:

https://10.20.0.1:8443/wsrf/services/ElasticNimbusService
https://10.20.0.1:8443/wsrf/services/WorkspaceContextBroker
https://10.20.0.1:8443/wsrf/services/WorkspaceEnsembleService
https://10.20.0.1:8443/wsrf/services/WorkspaceFactoryService
https://10.20.0.1:8443/wsrf/services/WorkspaceGroupService
https://10.20.0.1:8443/wsrf/services/WorkspaceService
https://10.20.0.1:8443/wsrf/services/WorkspaceStatusService

Now we'll run a test script. Let's try.

$ wget http://workspace.globus.org/vm/TP2.2/admin/test-create.sh
$ grid-proxy-init
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: General error: org.globus.wsrf.impl.security.authorization.exceptions.AuthorizationException: "/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" is not authorized to use operation: {http://www.globus.org/2008/06/workspace}create on this service

Whoops! I need to add myself to the grid-mapfile:

$ echo '"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" nimbus' >> $GLOBUS_LOCATION/share/grid-mapfile
$ sh ./test-create.sh

Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "/usr/local/nimbus/share/nimbus-clients/sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"...
Problem: Resource request denied: Error creating workspace(s): 'public' is not a valid network name

Oh, whoops again. It looks like our test script wants to use public networking, and we don't have that set up.

$ cp $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml sample-workspace.xml
$ vim sample-workspace.xml (change public to private)
$ vim test-create.sh (change $GLOBUS_LOCATION/share/nimbus-clients/sample-workspace.xml to sample-workspace.xml)
$ sh test-create.sh
...
Invalid:
--------
  - fatal, image '/opt/workspace/images/ttylinux-xen' does not exist on the filesystem
  - IMG #1 is invalid
  - no valid partitions/HD images
  - fatal, number of mountpoints (1) does not match number of valid partitions/HD images (0)
  - fatal, image '/opt/workspace/images/vmlinuz-2.6-xen' does not exist on the filesystem
  - fatal, no images configured
  - failure is triggered, backing out any networking reservations

for help use --help
"http://example1/localhost/image": Corrupted, calling destroy for you.
"http://example1/localhost/image" was terminated.

Whoops! Looks like we need to put the ttylinux files into the images directory on the worker node:

gildor # cd /opt/workspace/images
gildor # wget http://workspace.globus.org/downloads/ttylinux-xen.tgz
gildor # tar xzvf ttylinux-xen.tgz 
ttylinux-xen
ttylinux-xen.conf
gildor # rm -Rf ttylinux-xen.tgz
gildor # cp /boot/vmlinuz-2.6.18-128.1.1.el5xen vmlinuz-2.6-xen
Try again:

$ sh test-create.sh
Workspace Factory Service:
    https://127.0.0.1:8443/wsrf/services/WorkspaceFactoryService

Read metadata file: "sample-workspace.xml"
Created deployment request soley from arguments.

Creating workspace "http://example1/localhost/image"... done.



Workspace created: id 6
eth0
      Association: private
       IP address: 172.21.1.200
         Hostname: canfardev00
          Gateway: 172.21.1.1

       Start time: Mon Jul 20 13:53:39 PDT 2009
         Duration: 30 minutes.
    Shutdown time: Mon Jul 20 14:23:39 PDT 2009
 Termination time: Mon Jul 20 14:33:39 PDT 2009

Wrote EPR to "test.epr"


Waiting for updates.

"http://example1/localhost/image" state change: Unstaged --> Propagated
"http://example1/localhost/image" state change: Propagated --> Running

Oh it worked! Neat. Now let's kill the VM.

$ workspace --destroy -e test.epr 

Destroying workspace 6 @ "https://204.174.103.121:8443/wsrf/services/WorkspaceService"... destroyed.

Great! Now we're done! Other things to do now are add machines to the list of vmm-pools and network-pools.

Troubleshooting

If you encounter DHCP problems, check /etc/dhcpd.conf on the worker nodes and make sure dhcpd is listening on the correct subnet(s).
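
A quick way to confirm which subnets dhcpd knows about, and to restart it after a change (a sketch for this RHEL 5-era setup; dhcpd logs to /var/log/messages by default):

# grep -A2 '^subnet' /etc/dhcpd.conf
# service dhcpd restart
# tail /var/log/messages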

If you encounter an ebtables problem, you can try a patched version of ebtables; see this page for details.

-- PatrickArmstrong - 16 Jul 2009

-- PatrickArmstrong - 2009-09-04

 