Hi all,
Could anyone give me an example of an SBATCH script, or explain how to run ROMS/COAWST on a cluster using SBATCH? I see there is an example in the source code, but that one is for PBS. I was using the script below to test my installation on our cluster, but it does not work.
Thanks for the help.
Regards
Barack
#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=0:30:00
cd /data/COAWST/Projects/Estuary_test2
mpirun -np 8 ./coawstM ocean_estuary_test2.in > cwstv3.out
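For completeness, I submit and check the job with the standard SLURM commands (the script filename here is just an example of what I call it):
Code: Select all
sbatch estuary_test2.sbatch   # submit the batch script above
squeue -u $USER               # check whether the job is queued or running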
Re: SBATCH scripts
Really, this depends on how your system is set up. It's going to vary from one system to the next, so you'll have to ask your system people. Here's what I'm using:
That strange command to move the restart file is there because I'm having trouble modifying an existing file on restart; it's better to start fresh. This didn't used to be necessary - I think it's because I'm using HDF5 compression now.
The Prolog and Epilog are things I stole from a colleague.
Code: Select all
#!/bin/bash
#SBATCH -t 144:00:00
#SBATCH --ntasks=192
#SBATCH --job-name=ARCTIC4
#SBATCH --tasks-per-node=24
#SBATCH -p t2standard
#SBATCH --account=akwaters
#SBATCH --output=ARCTIC4.%j
#SBATCH --no-requeue
cd $SLURM_SUBMIT_DIR
. /usr/share/Modules/init/bash
module purge
module load slurm
module load toolchain/pic-iompi/2016b
module load numlib/imkl/11.3.3.210-pic-iompi-2016b
module load toolchain/pic-intel/2016b
module load compiler/icc/2016.3.210-GCC-5.4.0-2.26
module load compiler/ifort/2016.3.210-GCC-5.4.0-2.26
module load openmpi/intel/1.10.4
module load data/netCDF-Fortran/4.4.4-pic-intel-2016b
module list
#
# Prolog
#
echo " "
echo "++++ Chinook ++++ $PGM_NAME began: `date`"
echo "++++ Chinook ++++ $PGM_NAME hostname: `hostname`"
echo "++++ Chinook ++++ $PGM_NAME uname -a: `uname -a`"
echo " "
TBEGIN=`echo "print time();" | perl`
mv arctic4_rst.nc arctic4_foo.nc    # rename the last restart; ROMS reads arctic4_foo.nc and writes a fresh arctic4_rst.nc
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes    # build a machinefile from the allocated hosts
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
#
# Epilog
#
TEND=`echo "print time();" | perl`
echo " "
echo "++++ Chinook ++++ $PGM_NAME pwd: `pwd`"
echo "++++ Chinook ++++ $PGM_NAME ended: `date`"
echo "++++ Chinook ++++ $PGM_NAME walltime: `expr $TEND - $TBEGIN` seconds"
Re: SBATCH scripts
Thanks Kate for sharing!
The SBATCH file I showed earlier was given to me by our cluster admin. It works for me when I run the SWAN code alone. For example, when I run SWAN on 1 node with 8 cores, I just prepare a SLURM file like this:
#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=10:00:00
mpirun --np 8 /data/models/swan/swan.exe
So maybe with ROMS/COAWST, it is more complicated.
Looking at your SBATCH script, I am puzzled; however, I really want to make it work for me, so would you mind explaining further? For example:
Why do you need the Prolog and Epilog? What do they mean?
Why do you move arctic4_rst.nc arctic4_foo.nc, and to where?
And regarding the two lines below:
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
Why do you need -machinefile ./nodes --mca mpi_paffinity_alone 1? Is it compulsory?
Sorry for asking for so much detail.
Thanks & Regards
Barack
Re: SBATCH scripts
barack99 wrote: So maybe with ROMS/COAWST, it is more complicated.
It shouldn't be.
barack99 wrote: Why do you need the Prolog and Epilog? What do they mean?
They are both informational only, not required.
barack99 wrote: Why do you move arctic4_rst.nc arctic4_foo.nc, and to where?
I'm moving arctic4_rst.nc to arctic4_foo.nc. I tell ROMS to read the latter and write to the former. With NetCDF-3, it could read and write to the same file.
barack99 wrote: srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
The -machinefile comes from our system guys - I just do what they say. The mpi_paffinity_alone comes from a tip for making it run more efficiently.
barack99 wrote: Sorry for asking for so much detail.
No worries, you have to learn somehow.
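To make the restart shuffle above concrete, the corresponding entries in the ROMS input file end up looking something like this (ININAME, RSTNAME and NRREC are the standard ROMS keywords; the exact lines below are a sketch, not a copy of my file):
Code: Select all
     ININAME == arctic4_foo.nc   ! read initial conditions from the renamed restart
     RSTNAME == arctic4_rst.nc   ! write the new restart file fresh each run
       NRREC == -1               ! start from the most recent record in ININAME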
When you say it doesn't work for you, what happens?
Re: SBATCH scripts
Oh yes! It should not be more complicated, as you said!
Previously, I got an error message: "SIGSEGV: Segmentation fault". After pointing the grid/mesh files to the right directories, it works for me now.
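In case it helps anyone else hitting the same segfault, the fix was just making the file paths in the input file point to where the files actually live; for example, the grid entry (the filename here is only an illustration):
Code: Select all
     GRDNAME == /data/COAWST/Projects/Estuary_test2/estuary_test2_grd.nc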
Thanks a lot, Kate, for your help and sharing, and have a great Sunday!
Regards
Barack