ROMS on Itanium with MPI and ifort

I have been having horrific problems trying to run ROMS 2.1 with MPI on an HP Itanium cluster with ifort 8.1. No matter what I try, I cannot run on more than one 4-processor node. Does anyone else in the ROMS community have experience with ROMS and MPI+Itanium+ifort?
The same model configuration works extremely well on an IBM Opteron cluster with both ifort 8.1 and PGI Fortran 90, although there were a few hoops to jump through with ifort.
- Posts: 19
- Joined: Wed Apr 23, 2003 1:34 pm
- Location: IMR, Bergen, Norway
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
I am not aware of such problems with the Itanium. However, we did discover an odd behavior in the distributed-memory layer of ROMS when the tile partition is nonuniform across the communicator group. This generates access violations in MPI because the internal communication buffers have different sizes. It has never happened to me because all my applications have a grid size that is a power of 2 in both directions. I am very picky about this: I want a balanced load in all my parallel applications.
I don't know if this is your problem, but many users do not pay much attention to the grid size. The problem is also present in the released version of ROMS 2.2; however, there is now a correction for it. See the release notes:
viewtopic.php?t=198
The internal buffers are now dimensioned to the maximum size over all tiles; see the parameter TileSize in mod_param.F.
I think that you need to update mod_param.F, inp_par.F, and distribute.F to fix this potential problem for uneven tile partitions.
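The effect is easy to see with a toy partition. The following is only a sketch (the chunk rule and variable names are illustrative, not the actual decomposition code in mod_param.F or inp_par.F): when Lm is not divisible by NtileI, the last tile in that direction is smaller, so dimensioning every communication buffer to the largest tile removes the mismatch.

```fortran
! Sketch only: the splitting rule and names below are illustrative,
! not the actual ROMS decomposition in mod_param.F or inp_par.F.
PROGRAM tile_check
  IMPLICIT NONE
  INTEGER :: Lm = 41, Mm = 80          ! interior grid points
  INTEGER :: NtileI = 2, NtileJ = 4    ! tile partition
  INTEGER :: i, j, ChunkI, ChunkJ, Iend, Jend, npts, MaxTile

  ChunkI = (Lm + NtileI - 1) / NtileI  ! ceiling division
  ChunkJ = (Mm + NtileJ - 1) / NtileJ
  MaxTile = 0
  DO j = 0, NtileJ - 1
    DO i = 0, NtileI - 1
      Iend = MIN(Lm, (i + 1) * ChunkI)             ! last tile may be smaller
      Jend = MIN(Mm, (j + 1) * ChunkJ)
      npts = (Iend - i * ChunkI) * (Jend - j * ChunkJ)
      MaxTile = MAX(MaxTile, npts)
      WRITE (*, '(a,2i3,a,i6)') ' tile', i, j, '  interior points =', npts
    END DO
  END DO
  WRITE (*, '(a,i6)') ' dimension communication buffers for', MaxTile
END PROGRAM tile_check
```

With Lm=41 and NtileI=2 this gives tiles of 21x20 and 20x20 points, so a buffer sized for each rank's own tile would not match its neighbor's; that is the situation the TileSize parameter now covers.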
Good luck
- Posts: 19
- Joined: Wed Apr 23, 2003 1:34 pm
- Location: IMR, Bergen, Norway
Thanks Hernan. I have now tried the corrected (Aug. 11) version of ROMS 2.2 with the UPWELLING case and get the same problem. If I use Lm=41, Mm=80, N=16 (in mod_param.F) with NtileI=2, NtileJ=4 (in External/ocean_upw.in), everything works fine. But if I use Lm=252, Mm=296, N=30 (the same as lanerolle's of April 28), with NtileI and NtileJ as before, ROMS crashes with a segmentation fault after:
INITIAL: Configurating and initializing forward nonlinear model ...
I'm using mpiexec with MPICH under a PBS queueing system. The same configuration works fine under OpenMP with 4 processors on a single node, and in serial mode, both also run through the PBS batch queue.
The Linux kernel is 2.4.21-20 and the compiler is ifort 8.1. Previous experience with an HP Itanium suggests that such an old kernel combined with a relatively new version of ifort may lead to problems, but I'm just guessing.
I've done all the obvious things like setting the stack size to unlimited and increasing the number of tiles (and processors, of course), but nothing seems to help.
The same MPI configuration works fine on an IBM Opteron cluster, an IBM Regatta, and an SGI Origin 3800.
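For what it's worth, here is a quick arithmetic comparison of the two partitions; the chunk rule (roughly Lm/NtileI by Mm/NtileJ interior points per tile) is an assumption on my part, not the exact ROMS decomposition:

```fortran
! Illustrative check only: assumes tiles of roughly Lm/NtileI by Mm/NtileJ
! interior points; the real split is done inside ROMS (inp_par.F).
PROGRAM partition_check
  IMPLICIT NONE
  PRINT '(a,i4,a,i4,a)', ' UPWELLING  41 x  80 on 2x4 tiles:', &
        (41+1)/2, ' x', 80/4,  '   (uneven in XI: 21 and 20 points)'
  PRINT '(a,i4,a,i4,a)', ' Large grid 252 x 296 on 2x4 tiles:', &
        252/2,   ' x', 296/4, '   (perfectly even)'
END PROGRAM partition_check
```

So the larger grid that crashes is, if anything, the more evenly partitioned of the two, while the small UPWELLING grid that works is the uneven one.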
Any suggestions?
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
If the problem is with ifort and the kernel, there is not much that we can do. Perhaps you can try other compilers to see if this is the case; you might try an older ifc compiler. An alternative is to install a working version of g95. ROMS works well with g95: I have a version that works very well on my laptop. We also had a version of g95 that worked on our cluster, but we made the mistake of updating it and now it is broken; we have not been able to find one that works yet. The GNU software changes daily.
Good luck