Hi all,
The product of the two domain decomposition parameters (NtileI, NtileJ) must equal the number of CPUs allocated to the executable oceanM.
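For example (just a sketch --- the input script name ocean.in and the mpirun command line are only the usual conventions, not anything specific to my setup), a 4x8 tiling has to be matched with 32 MPI processes:

NtileI == 4          ! I-direction partition
NtileJ == 8          ! J-direction partition

mpirun -np 32 ./oceanM ocean.in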
Obviously, that is the basic rule for running in MPI mode. My question is:
"Is that a sufficient condition to make sure the model keeps running correctly?"
The reason I am raising this question and asking for your explanation is as follows.
We have a cluster with one master node and 16 slave nodes, each of which has two CPUs, so 32 CPUs are available in all. I had always run ROMS with 16 CPUs (one on each slave node) because of some weird MPI software issues. Recently we updated the software from mpich2 to openmpi, and all 32 CPUs are now available to me.
However, the same model now keeps blowing up with every combination of the decomposition parameters that I have tried (2x16, 4x8, 8x4, 16x2, as well as 1x16 and 1x15).
Before I switched to 32 CPUs, the domain decomposition parameters were
NtileI == 16 ! I-direction partition
NtileJ == 1 ! J-direction partition
and the model ran quite well, apart from its low efficiency.
Now the model runs much faster, but it always blows up at around 16 hours of model time.
Can anyone shed some light on the reason for this? Thank you very much!
Domain decomposition in ROMS
Re: Domain decomposition in ROMS
So it has never run with openmpi, is that what you're saying? Or does it still run with only 16 processes? Are other people happily running mpi jobs on it now or do you have any mpi test codes to try?
What are your Lm and Mm values? I'm only asking because you said you have NtileI=16 and NtileJ=1. Did you ever try other values for these? You might get better performance from NtileI=1 and NtileJ=16 if your processor does pipelining (which would be in the i direction). That's just about performance and not correctness, though.
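That swap is just the two tiling lines in your standard input file, something like (a sketch only, keeping your 16 processes):

NtileI == 1          ! I-direction partition
NtileJ == 16         ! J-direction partition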
Re: Domain decomposition in ROMS
Hi Kate, Thank you.
kate wrote: So it has never run with openmpi, is that what you're saying?
Correct.
kate wrote: Or does it still run with only 16 processes?
No, it does not under openmpi.
kate wrote: Are other people happily running mpi jobs on it now or do you have any mpi test codes to try?
There are no other MPI jobs on the cluster right now. MPI passed the test case after the software was set up.
kate wrote: What are your Lm and Mm values?
Lm = 360; Mm = 300
kate wrote: I'm only asking because you said you have NtileI=16 and NtileJ=1. Did you ever try other values for these?
Yes. Under mpich2, all the other combinations for 16 CPUs worked fine, with slightly different efficiency.
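For what it's worth, a rough back-of-the-envelope on those numbers: with Lm = 360 and Mm = 300, a 16x1 tiling gives tiles of roughly 22-23 x 300 interior points, while a 1x16 tiling gives tiles of roughly 360 x 18-19 points, so the 1x16 layout keeps long unbroken runs of points in the I direction, which I take to be the point about pipelining.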
Re: Domain decomposition in ROMS
It is weird to me that the same model setup works fine on 16 CPUs (mpich2), while it falls over on 32 CPUs (openmpi).
Could it possibly be due to the MPI domain decomposition? The information exchange between the different tiles may not be perfect --- I don't know, actually.
What's your comment on that?
Thank you
Re: Domain decomposition in ROMS
If ROMS is working correctly, it will work on any decomposition and it will give the same answer for any tiling (at least to machine precision). I don't think the change from 16 to 32 processes is the culprit. We are not using openmpi here, but I thought others were using it successfully with ROMS.
Do you get the same answer with any tiling? As the serial code? I have experienced parallel bugs in ROMS, especially the parts not in Hernan's code - they weren't *all* my fault.
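One quick way to check (a sketch only --- the log file names here are made up) is to compare the per-step diagnostic lines that ROMS writes to standard output for two runs, for example:

diff <(grep -E '^ *[0-9]+ +[0-9]+ [0-9:]+' run_16x1.log) \
     <(grep -E '^ *[0-9]+ +[0-9]+ [0-9:]+' run_1x16.log)

If the tilings really agree, the energy diagnostics should differ only in the last digits, if at all, and the same comparison against the serial log is also informative.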
Re: Domain decomposition in ROMS
kate wrote: Do you get the same answer with any tiling? As the serial code?
Yes, I do. I tested the MPI code and the same error came out with every tiling.
I have just given the serial code a try. The results are as follows:
STEP Day HH:MM:SS KINETIC_ENRG POTEN_ENRG TOTAL_ENRG NET_VOLUME
0 0 00:00:00 1.775125E-03 2.094761E+04 2.094761E+04 1.162121E+16
DEF_HIS - creating history file: year_bmk_his_0001.nc
WRT_HIS - wrote history fields (Index=1,1) into time record = 0000001
DEF_AVG - creating average file: year_bmk_avg_0001.nc
1 0 00:04:00 NaN NaN NaN NaN
Blowing-up: Saving latest model state into RESTART file
With the parallel code, the model runs until day 0, 16:40:00 and then blows up.
Re: Domain decomposition in ROMS
Just want to report what I found after fighting with the issue.
I regenerated the initial file with a smoother distribution pattern, and the model no longer blows up on 32 CPUs. --- I do not know why.