got different results running on different number of CPUs

Report or discuss software problems and other woes

Moderators: arango, robertson

dingyang2009
Posts: 4
Joined: Thu Dec 10, 2009 4:16 pm
Location: Ocean University of China

#1 Unread post by dingyang2009 »

I have run the same case using different numbers of CPUs. The initial conditions and forcing fields were the same in both runs. When I checked the results, fields such as temperature differed between the two runs. The first run used 24 CPUs and the second used 48. The simulated values are not exactly the same. Why does this happen, and how can I fix it?

jcwarner
Posts: 1200
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: got different results running on different number of CPU

#2 Unread post by jcwarner »

What are the differences?
If you pcolor zeta_tileXY - zeta_tileYX, do you see stripes, or are the differences at the boundaries?
You could try different advection schemes, etc.
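
One way to put a number on "stripes versus boundaries" is to count where the nonzero differences fall. The sketch below is only an illustration: `tile_seams` and `classify_differences` are hypothetical helpers, not ROMS or plotting-library functions, and the even split of the grid into tiles is an assumption that may not match how ROMS actually partitions a given grid. The difference field is a plain nested list standing in for something like zeta_tileXY - zeta_tileYX.

```python
def tile_seams(n, ntiles):
    """Indices where one tile ends and the next begins.

    Assumes n points are split into ntiles roughly equal chunks
    (an assumption; the real partition may differ).
    """
    return [(k * n) // ntiles for k in range(1, ntiles)]


def classify_differences(diff, ntile_i, ntile_j, width=1, tol=0.0):
    """Count differing points near tile seams, near the domain edge, or elsewhere.

    diff is a 2D nested list indexed as diff[j][i].
    A point within `width` cells of a seam counts as "tile_seam" even if it
    also touches the domain edge.
    """
    nj, ni = len(diff), len(diff[0])
    seams_i = tile_seams(ni, ntile_i)
    seams_j = tile_seams(nj, ntile_j)
    counts = {"tile_seam": 0, "domain_edge": 0, "elsewhere": 0}
    for j in range(nj):
        for i in range(ni):
            if abs(diff[j][i]) <= tol:
                continue  # this point agrees between the two runs
            on_seam = (any(abs(i - s) <= width for s in seams_i)
                       or any(abs(j - s) <= width for s in seams_j))
            on_edge = (i < width or j < width
                       or i >= ni - width or j >= nj - width)
            if on_seam:
                counts["tile_seam"] += 1
            elif on_edge:
                counts["domain_edge"] += 1
            else:
                counts["elsewhere"] += 1
    return counts


# Made-up 8x8 difference field with a single vertical stripe at i = 4,
# which is where a 2x1 tiling would place its seam under the assumption above.
demo = [[0.0] * 8 for _ in range(8)]
for j in range(8):
    demo[j][4] = 1e-6
print(classify_differences(demo, ntile_i=2, ntile_j=1))
```

If most of the count lands in "tile_seam", the stripes follow the tiling; a large "elsewhere" count early in the run would point to something other than the seams.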

dingyang2009
Posts: 4
Joined: Thu Dec 10, 2009 4:16 pm
Location: Ocean University of China

Re: got different results running on different number of CPU

#3 Unread post by dingyang2009 »

jcwarner wrote:
What are the differences?
If you pcolor zeta_tileXY - zeta_tileYX, do you see stripes, or are the differences at the boundaries?
You could try different advection schemes, etc.

Thanks!
I computed temp_case1 - temp_case2. The values are nonzero, and not only at the boundaries. temp_case1 and temp_case2 are from the same day and the same layer in the two runs with different CPU counts.

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: got different results running on different number of CPU

#4 Unread post by kate »

The way to debug these things is to save history snapshots for the first few timesteps of a run then run ncdiff on the two cases. Which is the first field to show anything? Is it the first timestep or the second? What do the differences look like? I've included an example a few timesteps into a (very old) run. The field shown is ice thickness, but the ocean temperature went bad first.

Also, are you running MPI or OpenMP? Is it the latest code?
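
The questions above (which field differs first, and at which timestep) can be answered mechanically once the first few records of both runs are in memory. The following is a minimal sketch under stated assumptions: `max_abs_diff` and `first_divergence` are hypothetical helper names, and each run is represented as a list of records, one per saved timestep, mapping a field name to a nested list of values. In practice those values would come from the history files via whatever NetCDF reader you use; the hand-made numbers here are purely illustrative.

```python
def max_abs_diff(a, b):
    """Largest absolute difference between two equally shaped nested lists."""
    if isinstance(a, (list, tuple)):
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def first_divergence(run_a, run_b, tol=0.0):
    """Find the first saved record in which any shared field differs.

    run_a and run_b are lists of dicts (one dict per saved timestep) mapping
    a field name to a nested list of values.
    Returns (record_index, {field: max_abs_diff}) for the first record with a
    difference above tol, or None if all compared records agree.
    """
    for step, (rec_a, rec_b) in enumerate(zip(run_a, run_b)):
        bad = {}
        for name in rec_a:
            if name not in rec_b:
                continue  # only compare fields present in both runs
            d = max_abs_diff(rec_a[name], rec_b[name])
            if d > tol:
                bad[name] = d
        if bad:
            return step, bad
    return None


# Two made-up runs that agree at record 0 and differ in temp at record 1.
run_24 = [
    {"zeta": [[0.0, 0.0]], "temp": [[10.0, 10.0]]},
    {"zeta": [[0.0, 0.0]], "temp": [[10.0, 10.5]]},
]
run_48 = [
    {"zeta": [[0.0, 0.0]], "temp": [[10.0, 10.0]]},
    {"zeta": [[0.0, 0.0]], "temp": [[10.0, 10.0]]},
]
print(first_divergence(run_24, run_48))
```

A result pointing at the very first record suggests looking at initialization or forcing input, while a later record points at the time stepping itself.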
Attachments
Difference field.
tiles.png (32.19 KiB)

dingyang2009
Posts: 4
Joined: Thu Dec 10, 2009 4:16 pm
Location: Ocean University of China

Re: got different results running on different number of CPU

#5 Unread post by dingyang2009 »

I am not quite sure which field is the first to show a difference. The model was run for several days, and the image shows the surface temperature difference for the second day. The model was run with MPI, and it was not the latest code. Could the difference be due to the MPI version?
t_diff.jpg

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: got different results running on different number of CPU

#6 Unread post by kate »

Honestly, we can tell nothing from your plot. If the differences have the pattern shown in my plot after ten steps, by the time two days go by they will look like your plot. To get to the bottom of it, you will have to do some more research, saving every timestep for, say, five timesteps, right at the beginning of both runs.

I would start by updating to Hernan's latest code - you don't want to report a bug only to have Hernan reply that he fixed it four months ago. He doesn't have a lot of patience for that sort of nonsense.

Also, as John suggests, you might try other advection schemes, other diffusion options, etc. There are at least some configurations where ROMS is known to give the same answer on 4x1 vs. 1x4 decompositions.

dingyang2009
Posts: 4
Joined: Thu Dec 10, 2009 4:16 pm
Location: Ocean University of China

Re: got different results running on different number of CPU

#7 Unread post by dingyang2009 »

Many thanks! I really appreciate it.
