Hi all,
Could anyone give me an example of an SBATCH script, or explain how to run ROMS/COAWST on a cluster using SBATCH? I see there is an example in the source code, but that one is for PBS. I was using the script below to test my installation on our cluster, but it does not work.
Thanks for the help.
Regards
Barack
#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=0:30:00
cd /data/COAWST/Projects/Estuary_test2
mpirun -np 8 ./coawstM ocean_estuary_test2.in > cwstv3.out
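For completeness, I submit and check the job with the standard SLURM commands (the script filename here is just an example of what I call it):
Code: Select all
sbatch estuary_test2.sbatch   # submit the batch script above
squeue -u $USER               # check whether the job is queued or running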
Re: SBATCH scripts
Really, this depends on how your system is set up. It's going to vary from one system to the next, so you'll have to ask your system people. Here's what I'm using:
That strange command to move the restart file is there because I'm having trouble modifying an existing file on restart; it's better to start fresh. This didn't used to be necessary - I think it's because I'm using HDF5 compression now.
The Prolog and Epilog are things I stole from a colleague.
Code: Select all
#!/bin/bash
#SBATCH -t 144:00:00
#SBATCH --ntasks=192
#SBATCH --job-name=ARCTIC4
#SBATCH --tasks-per-node=24
#SBATCH -p t2standard
#SBATCH --account=akwaters
#SBATCH --output=ARCTIC4.%j
#SBATCH --no-requeue
cd $SLURM_SUBMIT_DIR
. /usr/share/Modules/init/bash
module purge
module load slurm
module load toolchain/pic-iompi/2016b
module load numlib/imkl/11.3.3.210-pic-iompi-2016b
module load toolchain/pic-intel/2016b
module load compiler/icc/2016.3.210-GCC-5.4.0-2.26
module load compiler/ifort/2016.3.210-GCC-5.4.0-2.26
module load openmpi/intel/1.10.4
module load data/netCDF-Fortran/4.4.4-pic-intel-2016b
module list
#
# Prolog
#
echo " "
echo "++++ Chinook ++++ $PGM_NAME began: `date`"
echo "++++ Chinook ++++ $PGM_NAME hostname: `hostname`"
echo "++++ Chinook ++++ $PGM_NAME uname -a: `uname -a`"
echo " "
TBEGIN=`echo "print time();" | perl`
mv arctic4_rst.nc arctic4_foo.nc    # rename the last restart; ROMS reads arctic4_foo.nc and writes a fresh arctic4_rst.nc
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes    # build a machinefile from the allocated hosts
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
#
# Epilog
#
TEND=`echo "print time();" | perl`
echo " "
echo "++++ Chinook ++++ $PGM_NAME pwd: `pwd`"
echo "++++ Chinook ++++ $PGM_NAME ended: `date`"
echo "++++ Chinook ++++ $PGM_NAME walltime: `expr $TEND - $TBEGIN` seconds"
Re: SBATCH scripts
Thanks Kate for sharing!
The SBATCH file I showed earlier was given to me by our cluster admin. It works for me when I run the SWAN code alone. For example, when I run SWAN on 1 node with 8 cores, I just prepare a SLURM file like this:
#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=10:00:00
mpirun --np 8 /data/models/swan/swan.exe
So maybe with ROMS/COAWST, it is more complicated.
Looking at your SBATCH script, I am puzzled; however, I really want to make it work for me, so would you mind explaining further? For example:
Why do you need the Prolog and Epilog? What do they mean?
Why do you move arctic4_rst.nc arctic4_foo.nc, and to where?
And regarding the two lines below:
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
Why do you need -machinefile ./nodes --mca mpi_paffinity_alone 1? Is it compulsory?
Sorry for asking for so much detail.
Thanks & Regards
Barack
Re: SBATCH scripts
barack99 wrote: So maybe with ROMS/COAWST, it is more complicated.
It shouldn't be.
barack99 wrote: Why do you need the Prolog and Epilog? What do they mean?
They are both informational only, not required.
barack99 wrote: Why do you move arctic4_rst.nc arctic4_foo.nc, and to where?
I'm moving arctic4_rst.nc to arctic4_foo.nc. I tell ROMS to read the latter and write to the former. With NetCDF-3, it could read and write to the same file.
barack99 wrote: srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
The -machinefile comes from our system guys - I just do what they say. The mpi_paffinity_alone comes from a tip for making it run more efficiently.
barack99 wrote: Sorry for asking for so much detail.
No worries, you have to learn somehow.
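To make the restart shuffle above concrete, the corresponding entries in the ROMS input file end up looking something like this (ININAME, RSTNAME and NRREC are the standard ROMS keywords; the exact lines below are a sketch, not a copy of my file):
Code: Select all
     ININAME == arctic4_foo.nc   ! read initial conditions from the renamed restart
     RSTNAME == arctic4_rst.nc   ! write the new restart file fresh each run
       NRREC == -1               ! start from the most recent record in ININAME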
When you say it doesn't work for you, what happens?
Re: SBATCH scripts
Oh yes! It should not be more complicated, as you said!
Previously, I got an error message: "SIGSEGV: Segmentation fault". After pointing the grid/mesh files to the right directories, it works for me now.
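In case it helps anyone else hitting the same segfault, the fix was just making the file paths in the input file point to where the files actually live; for example, the grid entry (the filename here is only an illustration):
Code: Select all
     GRDNAME == /data/COAWST/Projects/Estuary_test2/estuary_test2_grd.nc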
Thanks a lot, Kate, for your help and sharing, and have a great Sunday!
Regards
Barack