Domain decomposition in ROMS

General scientific issues regarding ROMS

Moderators: arango, robertson

feroda

Domain decomposition in ROMS

#1 Post by feroda »

Hi all,
The product of the two domain decomposition parameters (NtileI, NtileJ) must equal the number of CPUs allocated to the executable oceanM.

Obviously, that is the basic rule for running in MPI mode. My question is:
"Is that condition sufficient to guarantee that the model keeps running correctly?"

The reason I raise this question and ask for your explanation is as follows.

We have a cluster with one master node and 16 slave nodes, each of which has two CPUs, so 32 CPUs are available in total. I had always run ROMS on 16 CPUs (one per slave node) because of some weird MPI software issues. Recently we updated the software from MPICH2 to OpenMPI, and all 32 CPUs are now available to me.

However, the same model keeps blowing up with every combination of the decomposition parameters I have tried (2x16, 4x8, 8x4, 16x2, as well as 1x16 and 1x15).

Before I switched to 32 CPUs, the domain decomposition parameters were
NtileI == 16 ! I-direction partition
NtileJ == 1 ! J-direction partition

With that setup the model ran pretty well, apart from its low efficiency.
Now the model runs much faster, but it always blows up at around 16 hrs (model time).

Can anyone shed some light on the reason for that? Thank you very much!

kate

Re: Domain decomposition in ROMS

#2 Post by kate »

So it has never run with openmpi, is that what you're saying? Or does it still run with only 16 processes? Are other people happily running mpi jobs on it now or do you have any mpi test codes to try?

What are your Lm and Mm values? I'm only asking because you said you have NtileI=16 and NtileJ=1. Did you ever try other values for these? You might get better performance from NtileI=1 and NtileJ=16 if your processor does pipelining (which would be in the i direction). That's just about performance and not correctness, though.

feroda

Re: Domain decomposition in ROMS

#3 Post by feroda »

Hi Kate, Thank you.
kate wrote:So it has never run with openmpi, is that what you're saying?
Correct
kate wrote:Or does it still run with only 16 processes?
No. It doesn't, even with only 16 processes under OpenMPI.
kate wrote:Are other people happily running mpi jobs on it now or do you have any mpi test codes to try?
No other MPI jobs are running on the cluster right now.
MPI passed its test case after the software was set up.

kate wrote:What are your Lm and Mm values?
Lm = 360;
Mm = 300

kate wrote: I'm only asking because you said you have NtileI=16 and NtileJ=1. Did you ever try other values for these?
Yes. Under MPICH2, all the other combinations for 16 CPUs worked fine, with slightly different efficiency.
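For reference, here is a rough sketch of the interior tile sizes those partitions imply (plain Python; it assumes each tile is simply Lm/NtileI by Mm/NtileJ points and ignores ROMS's ghost points and remainder handling):

Lm, Mm = 360, 300   # interior grid points, as given above

for ntile_i, ntile_j in [(16, 1), (1, 16), (4, 8), (8, 4)]:
    tile_i = Lm / ntile_i   # points per tile in the i (xi) direction
    tile_j = Mm / ntile_j   # points per tile in the j (eta) direction
    print(f"NtileI={ntile_i:2d} NtileJ={ntile_j:2d} -> "
          f"about {tile_i:.1f} x {tile_j:.1f} points per tile")

# NtileI=1, NtileJ=16 keeps the full 360-point rows intact, which is the
# layout kate suggests above for processors that pipeline along i.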

feroda

Re: Domain decomposition in ROMS

#4 Post by feroda »

It seems weird to me that the same model setup works fine with 16 processes (MPICH2), while it blows up with 32 processes (OpenMPI).
Could it be due to the MPI domain decomposition? The information exchange between the different tiles may not be perfect --- I don't actually know.

What's your comment on that?

Thank you

kate

Re: Domain decomposition in ROMS

#5 Post by kate »

If ROMS is working correctly, it will work on any decomposition and it will give the same answer for any tiling (at least to machine precision). I don't think the change from 16 to 32 processes is the culprit. We are not using openmpi here, but I thought others were using it successfully with ROMS.

Do you get the same answer with any tiling? As the serial code? I have experienced parallel bugs in ROMS, especially the parts not in Hernan's code - they weren't *all* my fault. :roll:
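A minimal sketch of one way to check that (Python with netCDF4 and numpy; the two file names and the choice of zeta are only placeholders for whichever history files and field you compare):

import numpy as np
from netCDF4 import Dataset

def max_tiling_difference(file_a, file_b, varname="zeta"):
    """Largest absolute difference in one field between two history files."""
    with Dataset(file_a) as a, Dataset(file_b) as b:
        return float(np.max(np.abs(a[varname][:] - b[varname][:])))

# Runs that differ only in tiling should agree to machine precision;
# anything much larger than that hints at a parallel bug.
print(max_tiling_difference("his_1x16.nc", "his_4x4.nc"))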

feroda

Re: Domain decomposition in ROMS

#6 Post by feroda »

kate wrote:Do you get the same answer with any tiling? As the serial code?
Yes, I do. I tested the MPI code, and the same error came out with every tiling.


I just gave the serial code a try. The results are as follows:
STEP Day HH:MM:SS KINETIC_ENRG POTEN_ENRG TOTAL_ENRG NET_VOLUME

0 0 00:00:00 1.775125E-03 2.094761E+04 2.094761E+04 1.162121E+16
DEF_HIS - creating history file: year_bmk_his_0001.nc
WRT_HIS - wrote history fields (Index=1,1) into time record = 0000001
DEF_AVG - creating average file: year_bmk_avg_0001.nc

1 0 00:04:00 NaN NaN NaN NaN

Blowing-up: Saving latest model state into RESTART file


In the parallel code, the model gets to 0 16:40:00 and then blows up.

feroda

Re: Domain decomposition in ROMS

#7 Post by feroda »

I just want to report what I found after fighting with this issue.

I regenerated the initial file with a smoother distribution pattern, and the model no longer blows up on 32 CPUs --- I do not know why.
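For anyone who hits the same problem: one simple way to smooth a 2-D initial field is a few passes of a 9-point box average. A minimal numpy sketch (illustrative only, not necessarily how the initial file was regenerated here):

import numpy as np

def smooth(field, passes=3):
    """Apply a 9-point box average `passes` times; boundary points are left untouched."""
    f = np.asarray(field, dtype=float).copy()
    for _ in range(passes):
        f[1:-1, 1:-1] = (
            f[:-2, :-2] + f[:-2, 1:-1] + f[:-2, 2:] +
            f[1:-1, :-2] + f[1:-1, 1:-1] + f[1:-1, 2:] +
            f[2:, :-2] + f[2:, 1:-1] + f[2:, 2:]
        ) / 9.0
    return f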
