SBATCH scripts

Hi all,

Could anyone give me an example SBATCH script, or show me how to run ROMS/COAWST on a cluster using SBATCH? I see there is an example in the source code, but that is for PBS. I was using the script below to test my installation on our cluster, but it does not work.

Thanks for your help.


#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=0:30:00

cd /data/COAWST/Projects/Estuary_test2
mpirun -np 8 ./coawstM > cwstv3.out

Re: SBATCH scripts

Really, this depends on how your system is set up. It's going to vary from one system to the next, so you have to ask your system people. Here's what I'm using:

#!/bin/bash
#SBATCH -t 144:00:00
#SBATCH --ntasks=192
#SBATCH --job-name=ARCTIC4
#SBATCH --tasks-per-node=24
#SBATCH -p t2standard
#SBATCH --account=akwaters
#SBATCH --output=ARCTIC4.%j
#SBATCH --no-requeue

. /usr/share/Modules/init/bash
module purge
module load slurm
module load toolchain/pic-iompi/2016b
module load numlib/imkl/
module load toolchain/pic-intel/2016b
module load compiler/icc/2016.3.210-GCC-5.4.0-2.26
module load compiler/ifort/2016.3.210-GCC-5.4.0-2.26
module load openmpi/intel/1.10.4
module load data/netCDF-Fortran/4.4.4-pic-intel-2016b
module list

#  Prolog
echo " "
echo "++++ Chinook ++++ $PGM_NAME began:    `date`"
echo "++++ Chinook ++++ $PGM_NAME hostname: `hostname`"
echo "++++ Chinook ++++ $PGM_NAME uname -a: `uname -a`"
echo " "
TBEGIN=`echo "print time();" | perl`

srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM

#  Epilog
TEND=`echo "print time();" | perl`
echo " "
echo "++++ Chinook ++++ $PGM_NAME pwd:      `pwd`"
echo "++++ Chinook ++++ $PGM_NAME ended:    `date`"
echo "++++ Chinook ++++ $PGM_NAME walltime: `expr $TEND - $TBEGIN` seconds"

That strange command to move the restart file is there because I'm having trouble modifying an existing file on restart, so it is better to start fresh. This didn't use to be necessary - I think it's because I'm using HDF5 compression now.
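
The move command itself does not appear in the script as posted. A minimal sketch of that kind of restart shuffle might look like the following. The function name `rotate_restart` and both filenames in the example call are hypothetical placeholders, not the ones used on Chinook.

```shell
#!/bin/bash
# Rotate a restart file before relaunching the model: the model then reads
# the renamed copy and writes a fresh file under the original name.
# All names here are assumptions made for illustration.
rotate_restart() {
    local written="$1"   # name the model writes its restart output to
    local input="$2"     # name the model reads its initial state from
    # Only move when a previous run actually left a restart file behind.
    if [ -f "$written" ]; then
        mv "$written" "$input"
    fi
}

# Example call, placed before the mpirun line (hypothetical filenames):
# rotate_restart restart_out.nc restart_in.nc
```

On the very first run there is nothing to move, so the function does nothing and leaves any existing input file untouched.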

The Prolog and Epilog are things I stole from a colleague.
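
Stripped of the site-specific pieces (partition, account, module loads), a job script for a small test could be as short as the sketch below. Everything named here is an assumption: the job name, the `run_model` helper, the `MPI_LAUNCHER` override, and the file names `ocean.in` and `log.txt`. MPI builds of ROMS are normally given their input file as a command-line argument, so the sketch passes one explicitly.

```shell
#!/bin/bash
#SBATCH --job-name=roms_test      # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=0:30:00
#SBATCH --output=roms_test.%j     # %j expands to the Slurm job ID

# Launch the model on every task Slurm allocated.
# $1 = executable, $2 = model input file, $3 = log file.
# MPI_LAUNCHER is a hypothetical override; it defaults to mpirun.
run_model() {
    ${MPI_LAUNCHER:-mpirun} -np "${SLURM_NTASKS:-1}" "$1" "$2" > "$3" 2>&1
}

# Only launch when this is actually running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    cd "$SLURM_SUBMIT_DIR" || exit 1
    run_model ./oceanM ocean.in log.txt
fi
```

Using `$SLURM_NTASKS` instead of a hard-coded count keeps the mpirun line in step with the `#SBATCH` request. Your site may still need its own partition, account, and module lines on top of this.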

Re: SBATCH scripts

Thanks Kate for sharing!

The SBATCH file I showed previously was given to me by our cluster admin. It works for me when I run SWAN alone. For example, when I run SWAN on 1 node with 8 cores, I just prepare a Slurm file like this:
#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=10:00:00
mpirun --np 8 /data/models/swan/swan.exe

So maybe with ROMS/COAWST, it is more complicated.

Looking at your SBATCH script, I am puzzled. However, I really want to make it work for me, so would you mind explaining further? For example:

Why do you need the Prolog and Epilog? What do they mean?
Why do you move the restart file, and to where?

and regarding two lines below:
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM

Why do you need -machinefile ./nodes --mca mpi_paffinity_alone 1? Is it compulsory?

Sorry for asking for so much detail.

Thanks & Regards

Re: SBATCH scripts

barack99 wrote: So maybe with ROMS/COAWST, it is more complicated.
It shouldn't be.

barack99 wrote: Why do you need the Prolog and Epilog? What do they mean?
They are both informational only, not required.

barack99 wrote: Why do you move the restart file, and to where?
I'm moving the restart file that ROMS just wrote to a second filename. I tell ROMS to read the latter and write to the former. With NetCDF-3, it could read and write to the same file.

barack99 wrote:
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM
The -machinefile comes from our system guys - I just do what they say. The mpi_paffinity_alone comes from a tip for making it run more efficiently.

barack99 wrote: Sorry for asking for so much detail.
No worries, you have to learn somehow.

When you say it doesn't work for you, what happens?
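
For what the srun line is doing, here is a small sketch you can run without Slurm. `srun -l` prefixes each output line with the task rank, so the pipeline sorts by rank and keeps only the hostname, giving one line per MPI task for `-machinefile`. The helper name `make_machinefile` and the node names `n001` and `n002` are made up for the demonstration.

```shell
#!/bin/bash
# Turn "srun -l /bin/hostname" output (lines like "3: n002") into a
# machinefile: sort numerically by task rank, then keep the hostname column.
make_machinefile() {
    sort -n | awk '{print $2}'
}

# Inside a job this would be:
#   srun -l /bin/hostname | make_machinefile > ./nodes
# Here, made-up sample output for 4 tasks on two hypothetical nodes:
printf '2: n002\n0: n001\n3: n002\n1: n001\n' | make_machinefile
```

As far as I know, `mpi_paffinity_alone 1` asks Open MPI to bind each process to a processor on the assumption that the job has the node to itself. Neither option is required for the model to run, and whether you need a machinefile at all depends on how MPI was built on your cluster.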

Re: SBATCH scripts

Oh yeah! It is not more complicated, just as you said!

Previously, I got an error message: "SIGSEGV: Segmentation fault". After pointing the grid/mesh files to the right directories, it works for me now.
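
In case it helps others who hit the same thing, a wrong path does not always give a clear error, so a small check before the mpirun line can save a failed job. This is only a sketch: the function name `check_inputs` and the paths in the example call are hypothetical.

```shell
#!/bin/bash
# Report every input file that is missing or unreadable, and return
# non-zero if any was found, so the job can stop before launching the model.
check_inputs() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call before mpirun (hypothetical paths):
# check_inputs ./grid.nc ./ocean.in || exit 1
```

The list of files has to be kept in step by hand with whatever the model input file points to.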

Thanks a lot, Kate, for your help and for sharing, and have a great Sunday!


