Example ACES queue Charm++ jobs

The Charm++ system is installed on the ACES clusters. It offers both a native parallel programming environment and a way to convert MPI codes to Charm++ codes (Adaptive MPI, or AMPI), which makes features such as dynamic load balancing available. For more details, refer to the Adaptive MPI manual.

Parallel jobs

By default, PBS parallel jobs use TCP-based MPI communications (over Gigabit or Fast Ethernet, depending on the node).

script file example-charm.csh

#!/bin/csh
# invoking Charm++/AMPI on ACES Linux clusters
#
# All PBS options start as "#PBS " and can be specified on the command line
# after qsub instead of being embedded in the script file.
 
#----------------------------------------------
# o Queue name
# -q queue
# Parallel queues available on itrda are:
# four (2 hours, 16 nodes), four-twelve (12 hours, 26 nodes), long (168 hours, 64 nodes)
 
#PBS -q four
 
#----------------------------------------------
# o Job name instead of the PBS script filename
# -N Job name (use a distinguishing name)
 
#PBS -N MyNameCharm
 
#----------------------------------------------
# o Resource lists
# -l resource lists, separated by a ","
# To ask for N nodes use "nodes=N"
# To ask for 2 processors per node use ":ppn=2", otherwise ":ppn=1"
# after the nodes=N. Preferably use ppn=2 and ask for fewer nodes.
# To ask for Myrinet use ":myrinet", for Gigabit Ethernet use ":gigabit"
# after the nodes=N:ppn=M
# To specify total wallclock time use "walltime=hh:mm:ss"
 
#PBS -l nodes=16:ppn=1,walltime=00:10:00
 
#----------------------------------------------
# o stderr/out combination
# -j {eo|oe}
# Causes the standard error and standard output to be combined in one file.
# For standard output to be added to standard error use "eo"
# For standard error to be added to standard output  use "oe"
#
# o stderr/out (specify them instead of getting script.[oe]$PBS_JOBID)
# -e standard error file
# -o standard output file
# You can append ${PBS_JOBID} to ensure distinct filenames
 
#PBS -e myrunCharm.stderr
#PBS -o myrunCharm.stdout
 
#----------------------------------------------
# o Starting time
# -a time
# Declares the time after which the job is eligible for execution.
 
#----------------------------------------------
# o User notification
# -m {a|b|e}
# Send mail to the user when:
# job aborts: "a", job begins running: "b", job ends: "e"
 
#PBS -m ae
 
#----------------------------------------------
# o Exporting of environment
# -V export all my environment variables
 
#PBS -V
 
#----------------------------------------------
                                                                                
# Begin execution
 
#
# Check the environment variables
#
#printenv
 
#
# Get the right Charm variant module 
# net-linux    : UDP comms
# net-linux-smp: SMP on a node, UDP outside, higher latency for off-node comms
# net-linux-tcp: TCP comms, more reliable, less performance
# net-linux-gm : Myrinet

#module add charm/5.9/net-linux
module add charm/5.9/net-linux-smp
#module add charm/5.9/net-linux-tcp
#module add charm/5.9/net-linux-gm

#
# get PBS node info
#
echo $PBS_NODEFILE
cat  $PBS_NODEFILE
 
#----------------------------------------------
# cd to the working directory from which the job was submitted
#
cd $PBS_O_WORKDIR

# How many procs do I have?
setenv NP `wc -l $PBS_NODEFILE | awk '{print $1}'`
 
# Get a file with unique hostnames
uniq $PBS_NODEFILE > machinefile.uniq.$PBS_JOBID
# How many nodes do I have?
setenv NPU `wc -l machinefile.uniq.$PBS_JOBID | awk '{print $1}'`

# Name of the Charm++ nodelist file; charmrun looks it up through the
# NODELIST environment variable
setenv NODELIST nodelist.$PBS_JOBID

# need this for the foreach loop
setenv NODES `cat machinefile.uniq.$PBS_JOBID`

# Create the Charm++ nodefile
echo group main > $NODELIST
# Add one host line per unique node
foreach node ($NODES)
  echo host $node ++cpus 2 >> $NODELIST
end

#
# Run the AMPI code called "executable", provided it is in PBS_O_WORKDIR
# and your PATH includes "."
# 
ampirun -np $NP ./executable
# Alternatively, run directly with charmrun (uncomment to use instead of ampirun):
#charmrun +p$NP ./executable
# To run with twice as many virtual processors as physical processors,
# uncomment the next two lines and use them instead:
#@ NP2 = 2 * $NP
#charmrun +p$NP ./executable +vp$NP2

# Cleanup
# Remove the unique machinefiles
rm $PBS_O_WORKDIR/machinefile.uniq.$PBS_JOBID
rm $PBS_O_WORKDIR/$NODELIST

#
# Exit (not strictly necessary)
#
exit
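
As an illustration of the nodefile handling in the script above, the following sketch builds a small sample nodefile and derives the same counts and nodelist contents from it. It is a minimal stand-alone example, not part of the ACES setup: the hostnames node01 and node02 and the file names nodefile.sample and nodelist.sample are made up, and awk stands in for the script's foreach loop. It uses only commands that behave the same under csh and sh.

```shell
# Sample PBS-style nodefile: 2 nodes with 2 processor slots each
# (hypothetical hostnames, one line per slot as in $PBS_NODEFILE).
printf 'node01\nnode01\nnode02\nnode02\n' > nodefile.sample

# Total processor slots (what the script stores in NP): prints 4
awk 'END {print NR}' nodefile.sample

# Unique nodes (what the script stores in NPU): prints 2
uniq nodefile.sample | awk 'END {print NR}'

# Nodelist in the layout the script writes: a "group main" header,
# then one "host <name> ++cpus 2" line per unique node.
uniq nodefile.sample | awk 'BEGIN {print "group main"} {print "host " $1 " ++cpus 2"}' > nodelist.sample
cat nodelist.sample
```

For this sample, nodelist.sample holds three lines: "group main", "host node01 ++cpus 2" and "host node02 ++cpus 2". Note that uniq only merges adjacent duplicates, so this relies on the nodefile listing each node's slots consecutively.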