Difference: ConfigureCernVM (1 vs. 20)

Revision 20 2013-02-22 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMOld"
Line: 15 to 15
 The old CernVM configuration recipe is at ConfigureCernVMOld.

CernVM v2.6 Batch Node Configuration

Added:
>
>
Note: this recipe is now superseded by BuildingDHAtlasVM
 

Xen Image Configuration

Download the Xen image and mount it locally to modify it:

Revision 17 2013-01-31 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMOld"
Line: 249 to 249
  Disable puppet in chkconfig ?
Added:
>
>
Set /etc/sysconfig/clock to use UTC?
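If so, a minimal sketch of /mnt/etc/sysconfig/clock (assuming the stock SL5 file format; the zone value here is just an example):
ZONE="Etc/UTC"
UTC=true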
  Something to keep in mind about a federation: the DQ2_LOCAL_SITE_ID is set via AGIS based on ATLAS_SITE_NAME. This variable can influence the choice of DQ2 endpoint to be used for analysis output, and is also used to choose the nearest replicas in the cloud for the job's input files.

Revision 16 2013-01-24 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMOld"
Line: 43 to 43
 /sbin/ifconfig eth0 txqueuelen 10000
Changed:
<
<

User accounts

>
>

SSH keys

 Optionally, add ssh keys for debugging:
mkdir /mnt/root/.ssh
Line: 52 to 52
 chmod 600 /mnt/root/.ssh/authorized_keys
Added:
>
>

User accounts

 Add the condor user:
echo "condor:x:102:102:Owner of Condor Daemons:/var/lib/condor:/sbin/nologin" >> /mnt/etc/passwd
Line: 107 to 109
 CVMFS_HTTP_PROXY="http://chrysaor.westgrid.ca:3128;http://cernvm-webfs.atlas-canada.ca:3128;DIRECT"
Added:
>
>
NOTE: there seems to be a CVMFS bug that prevents this from working. It works if the ${CERNVM_SERVER_URL:= part (and the closing brace) are left out.
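For example, with that workaround the North America entry below reduces to a plain assignment:
CVMFS_SERVER_URL="http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cvmfs.fnal.gov:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs02.grid.sinica.edu.tw:8000/opt/@org@"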
 Create /mnt/etc/cvmfs/domain.d/cern.ch.local containing:
# For Europe:

Revision 15 2013-01-23 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMOld"
Line: 161 to 161
 Modify /mnt/etc/condor/condor_config as follows:
#NUM_SLOTS = 1
Deleted:
<
<
RELEASE_DIR = /opt/condor
ALLOW_WRITE = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
ALLOW_DAEMON = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Added:
>
>
INCLUDE = $(RELEASE_DIR)/include
LIBEXEC = $(RELEASE_DIR)/libexec
JAVA = /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
 

TODO: may need to adjust ALLOW_DAEMON ... actually does it need to be set at all? Currently the VMs have ALLOW_DAEMON = $(FULL_HOSTNAME), $(CONDOR_HOST)
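One way to settle that on a running VM is to query the effective values with the standard condor_config_val tool (path as set in the init.d file in this recipe):
/opt/condor/bin/condor_config_val ALLOW_DAEMON ALLOW_WRITE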

Revision 14 2013-01-16 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMOld"
Line: 166 to 166
 ALLOW_DAEMON = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Changed:
<
<
TODO: may need to adjust ALLOW_DAEMON ...
>
>
TODO: may need to adjust ALLOW_DAEMON ... actually does it need to be set at all? Currently the VMs have ALLOW_DAEMON = $(FULL_HOSTNAME), $(CONDOR_HOST)
  Modify /mnt/etc/condor/condor_config.local and add the following lines: (Note: MaxJobRetirementTime and SHUTDOWN_GRACEFUL_TIMEOUT will soon be set by default in the script from github.)

Revision 13 2012-12-18 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMOld"
Line: 243 to 243
  Disable puppet in chkconfig ?
Added:
>
>
Something to keep in mind about a federation: the DQ2_LOCAL_SITE_ID is set via AGIS based on ATLAS_SITE_NAME. This variable can influence the choice of DQ2 endpoint to be used for analysis output, and is also used to choose the nearest replicas in the cloud for the job's input files.
 

Repoman

Check that /mnt/.image.metadata exists and contains the name (VMType) of the image.
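A quick check, for example:
cat /mnt/.image.metadata
and confirm that the VMType value matches the image name (the exact key spelling inside the file isn't documented here).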

Revision 12 2012-12-06 - rptaylor

Line: 1 to 1
Changed:
<
<
META TOPICPARENT name="ConfigureCernVMCS"
>
>
META TOPICPARENT name="ConfigureCernVMOld"
 

Useful Documentation

Line: 12 to 12
 Colin's recipe for dual-hypervisor configuration:
Added:
>
>
The old CernVM configuration recipe is at ConfigureCernVMOld.
 

CernVM v2.6 Batch Node Configuration

Xen Image Configuration

Download the Xen image and mount it locally to modify it:
Line: 135 to 137
 LABEL=blankpartition0 /scratch ext2 noatime 0 0
Changed:
<
<
I experimented with different journaling options. noatime and data=writeback should give better performance; the drawbacks don't matter since the VMs are not persistent. However, using 'data=writeback' seemed to cause the filesystem to inexplicably become read-only. Doing mount -o remount /dev/root / , without changing anything else, fixed the issue. Maybe a workaround could be to run tune2fs -o journal_data_writeback /dev/sda at bootup instead of putting data=writeback in fstab. In that case, let's just wait until we can try ext4.

ext4 support exists in the e4fsprogs package in CernVM v2.6. We should try using ext4 with no journaling; the performance should be better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb . Maybe try the nobarrier option too, although this might not matter as much.

However, in the case of Nimbus, some new development will be needed to use ext4 partitions.

>
>
ext4 support exists in the e4fsprogs package in CernVM v2.6. We should try using ext4 with no journaling; the performance should be better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb . Maybe try the nobarrier and noatime options too, although they might not be as significant. However, in the case of Nimbus, some new development will be needed to use ext4 partitions.
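Putting those commands together, a sketch for preparing a journal-less ext4 scratch partition (device name /dev/sdb as in the text; untested here):
# create ext4 without a journal, then confirm has_journal is absent
mkfs.ext4 -O ^has_journal /dev/sdb
/sbin/dumpe4fs /dev/sdb | grep -i features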
 

Set up Condor

Line: 168 to 166
 ALLOW_DAEMON = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Changed:
<
<
TODO: figure out ALLOW_DAEMON ...
>
>
TODO: may need to adjust ALLOW_DAEMON ...
  Modify /mnt/etc/condor/condor_config.local and add the following lines: (Note: MaxJobRetirementTime and SHUTDOWN_GRACEFUL_TIMEOUT will soon be set by default in the script from github.)
Line: 266 to 264
 We set up automated proxy renewal using MyProxy as described at CsGsiSupport#Credential_renewal.

-- RyanTaylor - 2012-11-04

Added:
>
>
META TOPICMOVED by="rptaylor" date="1354832173" from="Main.ConfigureCernVMNew" to="Main.ConfigureCernVM"

Revision 11 2012-11-30 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 165 to 165
#NUM_SLOTS = 1
RELEASE_DIR = /opt/condor
ALLOW_WRITE = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Changed:
<
<
ALLOW_DAEMON = $(FULL_HOSTNAME), $(CONDOR_HOST)
>
>
ALLOW_DAEMON = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
 

TODO: figure out ALLOW_DAEMON ...

Revision 10 2012-11-30 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 165 to 165
#NUM_SLOTS = 1
RELEASE_DIR = /opt/condor
ALLOW_WRITE = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Changed:
<
<
ALLOW_DAEMON = $(FULL_HOSTNAME)
>
>
ALLOW_DAEMON = $(FULL_HOSTNAME), $(CONDOR_HOST)
 
Added:
>
>
TODO: figure out ALLOW_DAEMON ...
 Modify /mnt/etc/condor/condor_config.local and add the following lines: (Note: MaxJobRetirementTime and SHUTDOWN_GRACEFUL_TIMEOUT will soon be set by default in the script from github.)

Revision 9 2012-11-29 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 97 to 97
 . /cvmfs/atlas.cern.ch/repo/sw/local/bin/auto-setup
Deleted:
<
<
and create a .bash_profile (TODO: this might not be needed.)
#if [ -f ~/.bashrc ]; then
#        . ~/.bashrc
#fi

TODO: this might not be needed. Then copy the .bashrc and .bash_profile files into each atlas home directory:

for i in `seq -w 1 16`
do 
  uid=$((499+${i##0}))
  cp /tmp/bashrc /mnt/home/atlas$i/.bashrc
  chown $uid.$uid /mnt/home/atlas$i/.bashrc
  cp /tmp/bash_profile /mnt/home/atlas$i/.bash_profile
  chown $uid.$uid /mnt/home/atlas$i/.bash_profile
done
 

CVMFS

In /mnt/etc/cvmfs/default.local add the following lines:
Line: 185 to 165
#NUM_SLOTS = 1
RELEASE_DIR = /opt/condor
ALLOW_WRITE = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Added:
>
>
ALLOW_DAEMON = $(FULL_HOSTNAME)
 

Modify /mnt/etc/condor/condor_config.local and add the following lines:

Line: 264 to 245
 

Repoman

Changed:
<
<
Check that /mnt/.image.metadata and contains the name (VMType) of the image.
>
>
Check that /mnt/.image.metadata exists and contains the name (VMType) of the image.
  TODO ... stuff about Repoman and the dual-hypervisor image.
Added:
>
>
 TODO ... check that the ssh keys don't get removed

Submitting jobs

Revision 7 2012-11-23 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 65 to 65
 

Grid Environment

Deleted:
<
<
TODO: Update this with whatever works.

TODO: what about a .csh file?

 Create a file /mnt/etc/profile.d/grid-setup.sh containing:
# Keep grid setup out of root's environment; it causes a problem when starting condor.
Line: 86 to 82
## Set up grid environment:
## Option 1: gLite 3.2 in /cvmfs/grid.cern.ch
Changed:
<
<
## TODO: if this works, just put a symlink in profile.d/ instead
. /cvmfs/grid.cern.ch/3.2.11-1/etc/profile.d/grid-env.sh
>
>
## Currently this doesn't work because 32-bit lfc libraries are configured instead of 64-bit
#. /cvmfs/grid.cern.ch/3.2.11-1/etc/profile.d/grid-env.sh
 ## Option 2: gLite 3.2 in AtlasLocalRootBase
Changed:
<
<
#shopt -s expand_aliases
#export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
#alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
#setupATLAS --quiet
#localSetupGLite
>
>
shopt -s expand_aliases
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS --quiet
localSetupGLite
 ## Fix for using AtlasLocalRootBase with a kit
Changed:
<
<
#unset AtlasSetupSite
#rm ~/.asetup
>
>
unset AtlasSetupSite
rm ~/.asetup
 
Changed:
<
<
# Site-specific variables (e.g. Frontier and Squid servers)
# are set based on ATLAS_SITE_NAME, which is set in the JDL.
>
>
# Site-specific variables (e.g. Frontier and Squid servers) are set based on ATLAS_SITE_NAME (from JDL).
# This auto-setup is only temporarily needed, and will soon become automatic
. /cvmfs/atlas.cern.ch/repo/sw/local/bin/auto-setup
Line: 134 to 129
  Create /mnt/etc/cvmfs/domain.d/cern.ch.local containing:
Changed:
<
<
# For North America
CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@"}
# For Europe
#CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs.racf.bnl.gov:8000/opt/@org@"}
# For Australia
#CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@"}
>
>
# For Europe:
#CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cvmfs.fnal.gov:8000/opt/@org@;http://cvmfs02.grid.sinica.edu.tw:8000/opt/@org@"}
# For North America:
CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cvmfs.fnal.gov:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs02.grid.sinica.edu.tw:8000/opt/@org@"}
# For Australia:
#CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs.fnal.gov:8000/opt/@org@;http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cvmfs02.grid.sinica.edu.tw:8000/opt/@org@"}
 

CernVM settings

Line: 162 to 157
 LABEL=blankpartition0 /scratch ext2 noatime 0 0
Changed:
<
<
TODO I experimented with different journaling options. noatime and data=writeback should give better performance; the drawbacks don't matter since the VMs are not persistent. However, using 'data=writeback' seemed to cause the filesystem to inexplicably become read-only. Doing mount -o remount /dev/root / , without changing anything else, fixed the issue. Maybe a workaround could be to run tune2fs -o journal_data_writeback /dev/sda at bootup instead of putting data=writeback in fstab ? In which case, just wait until we can try ext4.
>
>
I experimented with different journaling options. noatime and data=writeback should give better performance; the drawbacks don't matter since the VMs are not persistent. However, using 'data=writeback' seemed to cause the filesystem to inexplicably become read-only. Doing mount -o remount /dev/root / , without changing anything else, fixed the issue. Maybe a workaround could be to run tune2fs -o journal_data_writeback /dev/sda at bootup instead of putting data=writeback in fstab. In that case, let's just wait until we can try ext4.

ext4 support exists in the e4fsprogs package in CernVM v2.6. We should try using ext4 with no journaling; the performance should be better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb . Maybe try the nobarrier option too, although this might not matter as much.

 
Changed:
<
<
TODO ext4 support exists in the e4fsprogs package in SL 5.6 and later. Investigate using ext4 with no journaling; the performance should be much better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb . Maybe try the nobarrier option too, although this might not matter as much ...
>
>
However, in the case of Nimbus, some new development will be needed to use ext4 partitions.
 

Set up Condor

Revision 5 2012-11-17 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 70 to 70
  Create a file /mnt/etc/profile.d/grid-setup.sh containing:
Changed:
<
<
# Avoid grid setup for condor.
>
>
# Keep grid setup out of root's environment; it causes a problem when starting condor.
if [[ $UID -eq 0 ]]; then
  return 0
fi
Line: 98 to 98
 #rm ~/.asetup

# Site-specific variables (e.g. Frontier and Squid servers)

Changed:
<
<
# are set based on ATLAS_SITE_NAME.
# TODO: This may not be needed if it can be set in the JDL instead
#export ATLAS_SITE_NAME=FutureGrid
>
>
# are set based on ATLAS_SITE_NAME, which is set in the JDL.
# This auto-setup is only temporarily needed, and will soon become automatic
. /cvmfs/atlas.cern.ch/repo/sw/local/bin/auto-setup
Changed:
<
<
TODO: this might not be needed. #and a .bash_profile:
>
>
and create a .bash_profile (TODO: this might not be needed.)
 
#if [ -f ~/.bashrc ]; then
#        . ~/.bashrc
Line: 188 to 185
 CONDOR_CONFIG_VAL=/opt/condor/bin/condor_config_val
Changed:
<
<
Modify /mnt/etc/condor/condor_config . Comment out the NUM_SLOTS parameter, so that the VMs will have as many job slots as cores.
>
>
Modify /mnt/etc/condor/condor_config as follows:
 
#NUM_SLOTS = 1
RELEASE_DIR             = /opt/condor
Deleted:
<
<

TODO: check that this line doesn't need to be changed:

 ALLOW_WRITE = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)
Line: 283 to 276
 

Submitting jobs

Changed:
<
<
Get a recent pilot wrapper script: =wget http://walkerr.web.cern.ch/walkerr/runpilot3-wrapper-sep19.sh=
>
>
Get a recent pilot wrapper script:
wget http://walkerr.web.cern.ch/walkerr/runpilot3-wrapper-sep19.sh
 Launch the condor job like this:
Executable = runpilot3-wrapper.sh

Revision 4 2012-11-13 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 200 to 200
 

Modify /mnt/etc/condor/condor_config.local and add the following lines:

Added:
>
>
(Note: MaxJobRetirementTime and SHUTDOWN_GRACEFUL_TIMEOUT will soon be set by default in the script from github.)
 
# How long to wait for jobs to retire before killing them
MaxJobRetirementTime = 3600 * 24 * 2

Revision 3 2012-11-11 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 201 to 201
  Modify /mnt/etc/condor/condor_config.local and add the following lines:
Added:
>
>
# How long to wait for jobs to retire before killing them
MaxJobRetirementTime = 3600 * 24 * 2
# How long to wait for daemons to retire before killing them
SHUTDOWN_GRACEFUL_TIMEOUT = 3600 * 25 * 2
SLOT1_USER = atlas01
SLOT2_USER = atlas02
SLOT3_USER = atlas03

Revision 2 2012-11-05 - rptaylor

Line: 1 to 1
 
META TOPICPARENT name="ConfigureCernVMCS"
Line: 70 to 70
  Create a file /mnt/etc/profile.d/grid-setup.sh containing:
Changed:
<
<
# Workaround for condor not setting $HOME.
>
>
# Avoid grid setup for condor.
if [[ $UID -eq 0 ]]; then
  return 0
fi

export GLOBUS_FTP_CLIENT_GRIDFTP2=true

# Workaround for condor not setting $HOME for atlas users.

# voms-proxy-info requires this.
if [[ -z "$HOME" ]] ; then
  export HOME=`eval echo ~$USER`
Line: 160 to 167
 TODO I experimented with different journaling options. noatime and data=writeback should give better performance; the drawbacks don't matter since the VMs are not persistent. However, using 'data=writeback' seemed to cause the filesystem to inexplicably become read-only. Doing mount -o remount /dev/root / , without changing anything else, fixed the issue. Maybe a workaround could be to run tune2fs -o journal_data_writeback /dev/sda at bootup instead of putting data=writeback in fstab ? In which case, just wait until we can try ext4.
Changed:
<
<
TODO ext4 support exists in the e4fsprogs package in SL 5.6 and later. Investigate using ext4 with no journaling; the performance should be much better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb
>
>
TODO ext4 support exists in the e4fsprogs package in SL 5.6 and later. Investigate using ext4 with no journaling; the performance should be much better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb . Maybe try the nobarrier option too, although this might not matter as much ...
 

Set up Condor

Line: 257 to 264
 chmod 600 /mnt/etc/grid-security/hostkey.pem
Added:
>
>

TODO

Disable puppet in chkconfig ?

 

Repoman

Check that /mnt/.image.metadata and contains the name (VMType) of the image.

Revision 1 2012-11-04 - rptaylor

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="ConfigureCernVMCS"

Useful Documentation

Dan's recipe for CloudSigma may be a useful reference:

Colin's recipe for dual-hypervisor configuration:

CernVM v2.6 Batch Node Configuration

Xen Image Configuration

Download the Xen image and mount it locally to modify it:

wget http://cernvm.cern.ch/releases/17/cernvm-batch-node-2.6.0-4-1-x86_64.ext3.gz
gunzip cernvm-batch-node-2.6.0-4-1-x86_64.ext3.gz
mount -o loop cernvm-batch-node-2.6.0-4-1-x86_64.ext3 /mnt
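When all of the modifications below are finished, the image is closed out by reversing these steps (a sketch; re-compress only if you need to redistribute the .gz):
umount /mnt
gzip cernvm-batch-node-2.6.0-4-1-x86_64.ext3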

Network tuning

Add the following to /mnt/etc/sysctl.conf
# Network tuning: http://fasterdata.es.net/fasterdata/host-tuning/linux/
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216 
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1

Add the following to /mnt/etc/rc.local

# increase txqueuelen for 10G NICS
/sbin/ifconfig eth0 txqueuelen 10000

User accounts

Optionally, add ssh keys for debugging:
mkdir /mnt/root/.ssh
chmod 700 /mnt/root/.ssh
vi /mnt/root/.ssh/authorized_keys #add ssh public keys
chmod 600 /mnt/root/.ssh/authorized_keys

Add the condor user:

echo "condor:x:102:102:Owner of Condor Daemons:/var/lib/condor:/sbin/nologin" >> /mnt/etc/passwd
echo "condor:x:102:" >> /mnt/etc/group

Add the atlas users to /etc/passwd and /etc/group and create their home directories. (There should be at least as many atlas accounts as cores in the VM.)

for i in `seq -w 1 32`
do
  uid=$((499+${i##0}))
  echo "atlas$i:x:$uid:$uid::/home/atlas$i:/bin/bash" >> /mnt/etc/passwd
  echo "atlas$i:x:$uid:" >> /mnt/etc/group
  mkdir /mnt/home/atlas$i
  chown $uid.$uid /mnt/home/atlas$i
done

Grid Environment

TODO: Update this with whatever works.

TODO: what about a .csh file?

Create a file /mnt/etc/profile.d/grid-setup.sh containing:

# Workaround for condor not setting $HOME.
# voms-proxy-info requires this.
if [[ -z "$HOME" ]] ; then
  export HOME=`eval echo ~$USER`
fi

## Set up grid environment:
## Option 1: gLite 3.2 in /cvmfs/grid.cern.ch
## TODO: if this works, just put a symlink in profile.d/ instead
. /cvmfs/grid.cern.ch/3.2.11-1/etc/profile.d/grid-env.sh
## Option 2: gLite 3.2 in AtlasLocalRootBase
#shopt -s expand_aliases
#export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
#alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
#setupATLAS --quiet
#localSetupGLite
## Fix for using AtlasLocalRootBase with a kit
#unset  AtlasSetupSite
#rm ~/.asetup

# Site-specific variables (e.g. Frontier and Squid servers) 
# are set based on ATLAS_SITE_NAME.
# TODO: This may not be needed if it can be set in the JDL instead
#export ATLAS_SITE_NAME=FutureGrid
# This auto-setup is only temporarily needed, and will soon become automatic 
. /cvmfs/atlas.cern.ch/repo/sw/local/bin/auto-setup

TODO: this might not be needed. #and a .bash_profile:

#if [ -f ~/.bashrc ]; then
#        . ~/.bashrc
#fi

TODO: this might not be needed. Then copy the .bashrc and .bash_profile files into each atlas home directory:

for i in `seq -w 1 16`
do 
  uid=$((499+${i##0}))
  cp /tmp/bashrc /mnt/home/atlas$i/.bashrc
  chown $uid.$uid /mnt/home/atlas$i/.bashrc
  cp /tmp/bash_profile /mnt/home/atlas$i/.bash_profile
  chown $uid.$uid /mnt/home/atlas$i/.bash_profile
done

CVMFS

In /mnt/etc/cvmfs/default.local add the following lines:
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_QUOTA_LIMIT=3500
CVMFS_HTTP_PROXY="http://chrysaor.westgrid.ca:3128;http://cernvm-webfs.atlas-canada.ca:3128;DIRECT"
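On a booted VM, the client setup can be sanity-checked afterwards (assuming the cvmfs_config utility shipped with this CVMFS release provides these subcommands):
cvmfs_config chksetup
cvmfs_config probe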

Create /mnt/etc/cvmfs/domain.d/cern.ch.local containing:

# For North America
CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@"}
# For Europe
#CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs.racf.bnl.gov:8000/opt/@org@"}
# For Australia
#CVMFS_SERVER_URL=${CERNVM_SERVER_URL:="http://cvmfs.racf.bnl.gov:8000/opt/@org@;http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@"}

CernVM settings

Add the following to /mnt/etc/cernvm/site.conf
CERNVM_CVMFS2=on
CERNVM_EDITION=Basic
CERNVM_ORGANISATION=atlas
CERNVM_USER_SHELL=/bin/bash
CVMFS_REPOSITORIES=atlas,atlas-condb,grid

Filesystem

Set up the mount point for the blankspace partition:
mkdir /mnt/scratch
In /mnt/etc/fstab add the /scratch filesystem:
LABEL=blankpartition0   /scratch                    ext2    noatime         0 0

TODO I experimented with different journaling options. noatime and data=writeback should give better performance; the drawbacks don't matter since the VMs are not persistent. However, using 'data=writeback' seemed to cause the filesystem to inexplicably become read-only. Doing mount -o remount /dev/root / , without changing anything else, fixed the issue. Maybe a workaround could be to run tune2fs -o journal_data_writeback /dev/sda at bootup instead of putting data=writeback in fstab ? In which case, just wait until we can try ext4.
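A sketch of that mooted workaround, as a line in /mnt/etc/rc.local instead of an fstab option (untested, per the above):
# set writeback journaling as a default mount option on the root device
tune2fs -o journal_data_writeback /dev/sda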

TODO ext4 support exists in the e4fsprogs package in SL 5.6 and later. Investigate using ext4 with no journaling; the performance should be much better. It is disabled like this: tune4fs -O ^has_journal /dev/sdb and you can verify whether the has_journal property is there using /sbin/dumpe4fs /dev/sdb . Or, just create it without journaling in the first place: mkfs.ext4 -O ^has_journal /dev/sdb

Set up Condor

Put in the condor configuration and init.d files from the Cloud Scheduler github repo:

mv /mnt/etc/init.d/condor /mnt/root/
cd /mnt/etc/init.d/
wget https://raw.github.com/hep-gc/cloud-scheduler/master/scripts/condor/worker/condor --no-check-certificate
chmod 755 condor
mv /mnt/etc/condor/condor_config /mnt/root/
cd /mnt/etc/condor
wget https://raw.github.com/hep-gc/cloud-scheduler/master/scripts/condor/worker/condor_config --no-check-certificate
wget https://raw.github.com/hep-gc/cloud-scheduler/master/scripts/condor/worker/condor_config.local --no-check-certificate

Modify /mnt/etc/init.d/condor

CONDOR_CONFIG_VAL=/opt/condor/bin/condor_config_val

Modify /mnt/etc/condor/condor_config . Comment out the NUM_SLOTS parameter, so that the VMs will have as many job slots as cores.

#NUM_SLOTS = 1
RELEASE_DIR             = /opt/condor

TODO: check that this line doesn't need to be changed:

ALLOW_WRITE = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR), $(CONDOR_HOST)

Modify /mnt/etc/condor/condor_config.local and add the following lines:

SLOT1_USER = atlas01
SLOT2_USER = atlas02
SLOT3_USER = atlas03
SLOT4_USER = atlas04
SLOT5_USER = atlas05
SLOT6_USER = atlas06
SLOT7_USER = atlas07
SLOT8_USER = atlas08
SLOT9_USER = atlas09
SLOT10_USER = atlas10
SLOT11_USER = atlas11
SLOT12_USER = atlas12
SLOT13_USER = atlas13
SLOT14_USER = atlas14
SLOT15_USER = atlas15
SLOT16_USER = atlas16
SLOT17_USER = atlas17
SLOT18_USER = atlas18
SLOT19_USER = atlas19
SLOT20_USER = atlas20
SLOT21_USER = atlas21
SLOT22_USER = atlas22
SLOT23_USER = atlas23
SLOT24_USER = atlas24
SLOT25_USER = atlas25
SLOT26_USER = atlas26
SLOT27_USER = atlas27
SLOT28_USER = atlas28
SLOT29_USER = atlas29
SLOT30_USER = atlas30
SLOT31_USER = atlas31
SLOT32_USER = atlas32
DEDICATED_EXECUTE_ACCOUNT_REGEXP = atlas[0-9]+
STARTER_ALLOW_RUNAS_OWNER = False
USER_JOB_WRAPPER=/usr/local/bin/condor-job-wrapper
EXECUTE=/scratch/condor

Create the wrapper script /mnt/usr/local/bin/condor-job-wrapper

#!/bin/bash -l
exec "$@"
Then chmod 755 /mnt/usr/local/bin/condor-job-wrapper
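The -l flag is the point of this wrapper: it starts each job under a login shell, so everything in /etc/profile.d/ (including grid-setup.sh above) is sourced before the job runs.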

By default, Condor is configured to use directories in /var but they are missing in the CernVM image, so create the missing directories. Also, some directories in /etc/grid-security must exist when Condor uses GSI authentication.

mkdir /mnt/var/log/condor
mkdir /mnt/var/run/condor
mkdir /mnt/var/lib/condor
mkdir /mnt/var/lib/condor/spool

chown 102:102 /mnt/var/log/condor
chown 102:102 /mnt/var/run/condor
chown 102:102 /mnt/var/lib/condor
chown 102:102 /mnt/var/lib/condor/spool

mkdir /mnt/etc/grid-security
mkdir /mnt/etc/grid-security/certificates
touch /mnt/etc/grid-security/hostkey.pem
chmod 600 /mnt/etc/grid-security/hostkey.pem

Repoman

Check that /mnt/.image.metadata and contains the name (VMType) of the image.

TODO ... stuff about Repoman and the dual-hypervisor image.
TODO ... check that the ssh keys don't get removed

Submitting jobs

Get a recent pilot wrapper script: =wget http://walkerr.web.cern.ch/walkerr/runpilot3-wrapper-sep19.sh=
Launch the condor job like this:

Executable = runpilot3-wrapper.sh
Arguments  = -s ANALY_IAAS -h ANALY_IAAS -p 25443 -w https://pandaserver.cern.ch -j false -k 0 -u user
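For reference, a minimal complete submit description around those two lines might look as follows; the Universe, log-file, and VMType entries here are assumed boilerplate, not part of this recipe:

Universe   = vanilla
Executable = runpilot3-wrapper.sh
Arguments  = -s ANALY_IAAS -h ANALY_IAAS -p 25443 -w https://pandaserver.cern.ch -j false -k 0 -u user
Output     = pilot.out
Error      = pilot.err
Log        = pilot.log
+VMType    = "vm-image-name"
Queue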

We set up automated proxy renewal using MyProxy as described at CsGsiSupport#Credential_renewal.
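Roughly, that amounts to storing a renewable credential once with myproxy-init; the server name and renewer DN below are placeholders, and CsGsiSupport#Credential_renewal remains the authoritative reference:
myproxy-init -s myproxy.example.org -l pilotuser -R '/C=CA/O=Grid/CN=renewer-host.example.org'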

-- RyanTaylor - 2012-11-04

 