Attached are output files from two consecutive runs of the FLT_TEST case. In both cases ROMS (latest source) is run under OpenMP with 1x2 tiles, using Gfortran under Cygwin. The first run (rom001.log) proceeds as it should: the model is initialised analytically (NRREC=0) and runs for 1152 time steps, or 0.8 days. The second (rom002.log) starts from the restart fields saved by the first run (NRREC=-1) and initially looks OK. However, after time step 1214 we have time step 63, and the model time jumps back accordingly. Then after time step 73 we jump back to 1226, and from there the time step counter jumps back and forth from time to time. It finally hangs, for no obvious reason, at time step 2304.
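For anyone wanting to locate such jumps in a long log, here is a minimal Python sketch. It assumes the time step numbers have already been extracted from the log into a list (the extraction itself depends on the log layout and is not shown); the function name `find_discontinuities` and the sample sequence are illustrative only, built from the step numbers quoted above.

```python
def find_discontinuities(steps):
    """Return (previous, current) pairs where the step counter does not advance by 1."""
    return [(prev, cur) for prev, cur in zip(steps, steps[1:]) if cur != prev + 1]


# Illustrative sequence only: 1210..1214, then 63..73, then 1226..1229.
steps = list(range(1210, 1215)) + list(range(63, 74)) + list(range(1226, 1230))

print(find_discontinuities(steps))  # [(1214, 63), (73, 1226)]
```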
My interpretation of this is that one of the threads knows about the data read from the restart file and the other doesn't.
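This interpretation can be pictured with a small Python sketch. It is an analogy only, not ROMS code, and it does not claim to show where the problem actually lies in the source. It assumes a hypothetical per-thread step counter (`state.step`) that only thread 0 sets from the restart record. The numbers are consistent with this picture: a counter restarted at 1152 reaches 1214 after 62 steps, while a counter started from zero is then at 62, so its next step is 63. Likewise 1152 + 73 = 1225, so the step after 73 on the other counter is 1226.

```python
import threading

# Hypothetical stand-in for thread-private model state (not a ROMS variable).
state = threading.local()


def worker(thread_id, restart_step, nsteps, results):
    state.step = 0                      # default: analytical start
    if thread_id == 0:
        state.step = restart_step       # assumption: only thread 0 sees the restart data
    for _ in range(nsteps):
        state.step += 1                 # each thread advances its own private counter
    results[thread_id] = state.step


def run(restart_step, nsteps, nthreads=2):
    """Run nthreads workers and return each thread's final private step counter."""
    results = {}
    threads = [
        threading.Thread(target=worker, args=(i, restart_step, nsteps, results))
        for i in range(nthreads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


final = run(restart_step=1152, nsteps=62)
print(final[0], final[1])  # 1214 62
```

If whichever thread happens to report the step counter changes from time to time, the printed step would alternate between the two sequences, much as in rom002.log.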
This misbehaviour occurs only in OpenMP runs, not in serial or MPI runs. It occurs with Cygwin/Gfortran and AIX/xlf, but not with Linux/Gfortran. Overall it looks like a matter of misplaced OpenMP directives, subject to different interpretation by different compilers, and it is somewhat reminiscent (perhaps) of the OpenMP FLOAT_VWALK problem I encountered in 2012:
viewtopic.php?f=19&t=2584
I will look into it further when I can.
OpenMP problem with restart from file
- m.hadfield
- Posts: 521
- Joined: Tue Jul 01, 2003 4:12 am
- Location: NIWA
- Attachments
- rom002.log
- Output file from second run (initialised from restart)
- (190.85 KiB) Downloaded 377 times
- rom001.log
- Output file from first run (initialised analytically)
- (191.96 KiB) Downloaded 359 times
- m.hadfield
- Posts: 521
- Joined: Tue Jul 01, 2003 4:12 am
- Location: NIWA
Re: OpenMP problem with restart from file
A couple of extra data points:
- The problem I reported also occurs in the UPWELLING test case. It's not specific to float simulations (though float simulations are where I tend to use OpenMP).
- Contrary to what I said earlier, it does occur with Linux/Gfortran, as well as Cygwin/Gfortran and AIX/xlf.
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
Re: OpenMP problem with restart from file
Yes, I was able to reproduce this bug. I finally had time to examine it in the debugger, and I have corrected it. See the ticket for details. Thank you for bringing this to my attention. Please update.
- m.hadfield
- Posts: 521
- Joined: Tue Jul 01, 2003 4:12 am
- Location: NIWA
Re: OpenMP problem with restart from file
Thanks, Hernan!