got different results running on different number of CPUs
-
- Posts: 4
- Joined: Thu Dec 10, 2009 4:16 pm
- Location: Ocean University of China
got different results running on different number of CPUs
I ran the same case on different numbers of CPUs. The initial conditions and forcing fields were the same in both runs. When I checked the results, fields such as temperature differed between the two runs. The first run used 24 CPUs and the second used 48. The simulated values are not exactly the same. Why does this happen, and how can I solve it?
Re: got different results running on different number of CPU
what are the differences?
if you pcolor zeta_tileXY - zeta_tileYX do you see stripes, or are the differences at the boundaries?
you could try different advection schemes, etc.
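As a rough numerical counterpart to eyeballing the pcolor plot, the sketch below counts how many differing points sit close to interior tile edges and how many sit elsewhere. It is only an illustration: the function names, the `halo` width, and the assumption that the grid is split into equal-sized tiles are all hypothetical and are not taken from ROMS itself. The difference field is assumed to be a plain 2-D list indexed `[j][i]`.

```python
def tile_edges(n, ntiles):
    """Indices where a new tile starts, ASSUMING n points are split evenly.

    This is an illustrative partition, not necessarily the one ROMS uses.
    """
    return [(k * n) // ntiles for k in range(1, ntiles)]


def locate_differences(diff, ntile_i, ntile_j, halo=2, tol=0.0):
    """Count differing points near interior tile edges vs. elsewhere.

    diff     : 2-D list, diff[j][i] = field_run1 - field_run2
    ntile_i  : number of tiles in the i-direction (hypothetical layout)
    ntile_j  : number of tiles in the j-direction (hypothetical layout)
    halo     : how many points either side of an edge count as "near"
    tol      : differences with |value| <= tol are ignored
    Returns (near_edge_count, elsewhere_count).
    """
    nj, ni = len(diff), len(diff[0])
    edges_i = tile_edges(ni, ntile_i)
    edges_j = tile_edges(nj, ntile_j)
    near = far = 0
    for j, row in enumerate(diff):
        for i, value in enumerate(row):
            if abs(value) <= tol:
                continue
            near_i = any(e - halo <= i < e + halo for e in edges_i)
            near_j = any(e - halo <= j < e + halo for e in edges_j)
            if near_i or near_j:
                near += 1
            else:
                far += 1
    return near, far
```

If nearly all differing points fall in the first count, the differences line up with the tile layout (the "stripes" case); a large second count means they are spread through the tile interiors. When the two runs use different tilings, the check can be repeated once per layout.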
Re: got different results running on different number of CPU
Thanks!
jcwarner wrote:
what are the differences?
if you pcolor zeta_tileXY - zeta_tileYX do you see stripes, or are the differences at the boundaries?
you could try different advection schemes, etc.
I computed temp_case1 - temp_case2. The values are nonzero, and not only at the boundaries. temp_case1 and temp_case2 are from the same day and the same layer in the two runs with different numbers of CPUs.
Re: got different results running on different number of CPU
The way to debug these things is to save history snapshots for the first few timesteps of a run then run ncdiff on the two cases. Which is the first field to show anything? Is it the first timestep or the second? What do the differences look like? I've included an example a few timesteps into a (very old) run. The field shown is ice thickness, but the ocean temperature went bad first.
Also, are you running MPI or OpenMP? Is it the latest code?
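The search described above (find the first timestep and the first field at which the two runs part ways) can be sketched as follows. This is a minimal illustration, not a replacement for ncdiff: it assumes the snapshots have already been read from the history files into plain Python structures, one dict of 2-D lists per timestep, and the function names are hypothetical.

```python
def max_abs_diff(a, b):
    """Largest absolute pointwise difference between two 2-D lists."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def first_difference(run_a, run_b, tol=0.0):
    """Return the earliest timestep at which any shared field differs.

    run_a, run_b : lists of snapshots, one per saved timestep; each
                   snapshot is a dict mapping a field name to a 2-D list
                   (an assumed in-memory layout, not a ROMS format).
    tol          : differences with max |a - b| <= tol are ignored.
    Returns (step_index, [(field_name, max_abs_difference), ...]) for the
    first step with a difference, or None if the runs agree throughout.
    """
    for step, (snap_a, snap_b) in enumerate(zip(run_a, run_b)):
        hits = []
        for name in sorted(snap_a.keys() & snap_b.keys()):
            d = max_abs_diff(snap_a[name], snap_b[name])
            if d > tol:
                hits.append((name, d))
        if hits:
            return step, hits
    return None
```

With `tol=0.0` the check is for exact agreement, which is the relevant test when asking whether two tilings give identical answers. Fields listed at the earliest step are the ones to plot first.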
- Attachments
- tiles.png: Difference field.
Re: got different results running on different number of CPU
I am not sure which field shows the difference first. The model was run for several days, and the image shows the surface temperature field for the second day. The model was run with MPI, and it was not the latest code. Could the difference be due to the MPI version?
Re: got different results running on different number of CPU
Honestly, we can tell nothing from your plot. If the differences have the same pattern as in my plot after ten steps, by the time two days go by they will look like your plot. To get to the bottom of it, you will have to do some more research, saving every timestep for, say, five timesteps, right at the beginning of both runs.
I would start by updating to Hernan's latest code - you don't want to report a bug only to have Hernan reply that he fixed it four months ago. He doesn't have a lot of patience for that sort of nonsense.
Also, as John suggests, you might try other advection schemes, other diffusion options, etc. There are at least some configurations where ROMS is known to give the same answer on 4x1 vs. 1x4 decompositions.
Re: got different results running on different number of CPU
Many thanks! Really appreciate it.