Greetings,
I know that asking about this matter once more is a bit
annoying, and I apologize in advance, but to be completely
honest I am confused after reading several posts about ROMS tiling.
I recently gained access to an HPC system (https://hpc.grnet.gr/en/)
and can use up to 500 cores per run. My domain is
409*421*30 with a resolution of about 1.2 km. I am using
DT=60 sec and ndtfast=28. A one-year run takes 11.9 hours
on all available cores (500). I am using gfortran
4.9.2 and OpenMPI 1.8.7 (newer versions of both the compiler and OpenMPI
are available, as are Intel Fortran and Intel MPI). I suspect that
my tiling (NtileI=20 and NtileJ=25), combined with the
memory size (36GB), causes MPI overhead. I understand that my domain's
I and J dimensions are not powers of 2, as Hernan suggests, and that
this might cause additional problems. I am attaching the log file.
Any suggestions or help would be highly appreciated.
Giannis
Optimal tiling once again
Attachment: log.txt (42.64 KiB)
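A quick back-of-the-envelope check of the numbers in the post above shows how little work each core gets per step. This is a minimal sketch, assuming 409 x 421 are the horizontal points being tiled, a 365-day year, and that the 11.9 h is spent time stepping; `run_numbers` is a hypothetical helper, not part of ROMS.

```python
# Back-of-the-envelope numbers for the run described above.
# Assumptions: 409 x 421 are the horizontal points being tiled,
# a year is 365 days, and the 11.9 h covers the time stepping.

def run_numbers(lm, mm, ntile_i, ntile_j, dt, ndtfast, wall_hours, days=365):
    """Average tile size, step counts, and wall time per baroclinic step."""
    steps = round(days * 86400 / dt)          # baroclinic (3D) steps
    return {
        "tile_i": lm / ntile_i,               # mean interior points per tile in I
        "tile_j": mm / ntile_j,               # mean interior points per tile in J
        "steps": steps,
        "fast_steps": steps * ndtfast,        # barotropic (2D) steps
        "sec_per_step": wall_hours * 3600 / steps,
    }

r = run_numbers(409, 421, ntile_i=20, ntile_j=25, dt=60.0,
                ndtfast=28, wall_hours=11.9)
print(f"tile ~ {r['tile_i']:.1f} x {r['tile_j']:.1f} points")
# prints "525600 baroclinic and 14716800 barotropic steps per year"
print(f"{r['steps']} baroclinic and {r['fast_steps']} barotropic steps per year")
print(f"{1000 * r['sec_per_step']:.1f} ms of wall time per baroclinic step")
```

With tiles of only about 20 x 17 interior points and roughly 80 ms of wall time per baroclinic step (which itself contains 28 barotropic substeps), each tile does very little computation between exchanges, which would be consistent with the suspicion that MPI overhead dominates.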
Re: Optimal tiling once again
You didn't include the most interesting part of the log file. You might have to beat on the code to get this report, which I've posted about before. It looks something like this:
Code:
Nonlinear ocean model elapsed time profile, Grid: 01
Allocation and array initialization .............. 90.454 ( 0.0007 %)
Ocean state initialization ....................... 31.542 ( 0.0002 %)
Reading of input data ............................ 40250.144 ( 0.2971 %)
Processing of input data ......................... 414569.094 ( 3.0604 %)
Processing of output time averaged data .......... 394142.308 ( 2.9096 %)
Computation of vertical boundary conditions ...... 110130.695 ( 0.8130 %)
Computation of global information integrals ...... 114130.480 ( 0.8425 %)
Writing of output data ........................... 940830.542 ( 6.9453 %)
Model 2D kernel .................................. 2321895.180 (17.1405 %)
Lagrangian floats trajectories ................... 402274.804 ( 2.9696 %)
Tidal forcing .................................... 501715.363 ( 3.7037 %)
2D/3D coupling, vertical metrics ................. 274590.295 ( 2.0271 %)
Omega vertical velocity .......................... 205581.862 ( 1.5176 %)
Equation of state for seawater ................... 259865.225 ( 1.9184 %)
Atmosphere-Ocean bulk flux parameterization ...... 79697.338 ( 0.5883 %)
KPP vertical mixing parameterization ............. 1377123.183 (10.1661 %)
3D equations right-side terms .................... 718296.863 ( 5.3026 %)
3D equations predictor step ...................... 1097942.595 ( 8.1052 %)
Pressure gradient ................................ 301603.839 ( 2.2265 %)
Harmonic mixing of tracers, geopotentials ........ 661710.353 ( 4.8848 %)
Harmonic stress tensor, S-surfaces ............... 265737.636 ( 1.9617 %)
Corrector time-step for 3D momentum .............. 450813.130 ( 3.3280 %)
Corrector time-step for tracers .................. 739768.657 ( 5.4611 %)
Total: 11672791.582 86.1700
Nonlinear sea-ice model elapsed time profile, Grid: 01
Ice thermodynamics................................ 155231.894 ( 1.1459 %)
Ice rheology coefficients......................... 334788.137 ( 2.4714 %)
Iterative solver of ice dynamics.................. 1270084.208 ( 9.3759 %)
Advection of ice tracers.......................... 12315.957 ( 0.0909 %)
Total: 1772420.197 13.0842
Nonlinear model message Passage profile, Grid: 01
Message Passage: 2D halo exchanges ............... 3118310.029 (23.0198 %)
Message Passage: 3D halo exchanges ............... 545759.649 ( 4.0289 %)
Message Passage: 4D halo exchanges ............... 197523.818 ( 1.4581 %)
Message Passage: data broadcast .................. 677661.168 ( 5.0026 %)
Message Passage: data reduction .................. 104697.043 ( 0.7729 %)
Message Passage: data gathering .................. 199152.622 ( 1.4702 %)
Message Passage: data scattering.................. 9544.377 ( 0.0705 %)
Message Passage: boundary data gathering ......... 44721.520 ( 0.3301 %)
Message Passage: point data gathering ............ 266922.198 ( 1.9705 %)
Total: 5164292.424 38.1235
All percentages are with respect to total time = 13546232.643
Anyway, the key number here is the 38% spent on communication. This is with a much larger grid on fewer processors than you have. Yes, we have a problem. If you can't get the profile report, you can still get some sense of the problem by timing a run on only 250 cores.
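Comparing a 250-core run with the 500-core run boils down to a speed-up and a parallel efficiency. A minimal sketch of that arithmetic; `scaling` is a hypothetical helper, and the 250-core time below is made up purely to show the calculation (only the 11.9 h figure comes from this thread).

```python
def scaling(t_small, p_small, t_large, p_large):
    """Speed-up and parallel efficiency going from p_small to p_large cores.

    t_small, t_large: wall-clock times of the two runs (same unit).
    Efficiency 1.0 means perfect strong scaling; values well below 1.0
    mean most of the extra cores' time goes into overhead such as
    communication rather than computation.
    """
    speedup = t_small / t_large
    ideal = p_large / p_small
    return speedup, speedup / ideal

# Hypothetical 250-core time (NOT a measurement); 11.9 h on 500 cores
# is the figure reported at the start of the thread.
speedup, eff = scaling(t_small=14.0, p_small=250, t_large=11.9, p_large=500)
print(f"speed-up {speedup:.2f} (ideal 2.00), efficiency {eff:.0%}")
```

If doubling the cores buys much less than a factor of two, the extra tiles are mostly adding communication, and fewer cores (or a different tiling) may give nearly the same wall time.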
Re: Optimal tiling once again
Kate, thank you very much for your immediate
reply. Indeed, I forgot to attach the interesting part.
I am attaching it now; it is huge.
Thanks again.
Code:
Nonlinear model elapsed time profile, Grid: 01
Allocation and array initialization .............. 19.252 ( 0.0001 %)
Ocean state initialization ....................... 63.371 ( 0.0002 %)
Reading of input data ............................ 93076.292 ( 0.2657 %)
Processing of input data ......................... 571945.560 ( 1.6327 %)
Processing of output time averaged data .......... 33733.628 ( 0.0963 %)
Computation of vertical boundary conditions ...... 40680.552 ( 0.1161 %)
Computation of global information integrals ...... 383520.952 ( 1.0948 %)
Writing of output data ........................... 10406000.055 (29.7061 %)
Model 2D kernel .................................. 12428259.379 (35.4790 %)
2D/3D coupling, vertical metrics ................. 928478.851 ( 2.6505 %)
Omega vertical velocity .......................... 321127.202 ( 0.9167 %)
Equation of state for seawater ................... 699199.743 ( 1.9960 %)
Atmosphere-Ocean bulk flux parameterization ...... 647643.459 ( 1.8488 %)
GLS vertical mixing parameterization ............. 5110085.944 (14.5878 %)
3D equations right-side terms .................... 252692.246 ( 0.7214 %)
3D equations predictor step ...................... 1072043.243 ( 3.0604 %)
Pressure gradient ................................ 59713.574 ( 0.1705 %)
Harmonic mixing of tracers, isopycnals ........... 208303.559 ( 0.5946 %)
Biharmonic mixing of tracers, geopotentials ...... 197069.262 ( 0.5626 %)
Harmonic stress tensor, S-surfaces ............... 117924.569 ( 0.3366 %)
Corrector time-step for 3D momentum .............. 671867.787 ( 1.9180 %)
Corrector time-step for tracers .................. 582515.630 ( 1.6629 %)
Total: 34825964.112 99.4179
Nonlinear model message Passage profile, Grid: 01
Message Passage: 2D halo exchanges ............... 7413745.121 (21.1641 %)
Message Passage: 3D halo exchanges ............... 3494146.151 ( 9.9748 %)
Message Passage: 4D halo exchanges ............... 1162333.564 ( 3.3181 %)
Message Passage: data broadcast .................. 9659988.786 (27.5764 %)
Message Passage: data reduction .................. 6610110.739 (18.8699 %)
Message Passage: data gathering .................. 392563.011 ( 1.1207 %)
Message Passage: data scattering.................. 401.971 ( 0.0011 %)
Message Passage: point data gathering ............ 298777.777 ( 0.8529 %)
Total: 29032067.120 82.8781
All percentages are with respect to total time = 35029855.776
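To compare reports like the two above category by category, the timing lines can be pulled out with a small parser. This is a sketch that assumes every timing line has the "label ..... time ( pct %)" layout shown in these reports; `parse_profile` and `top` are hypothetical names, not part of ROMS. Note that in both reports the computation and message-passing sections together exceed 100%, so they overlap and should not be added to each other.

```python
import re

# One timing line of the ROMS profile report, e.g.
#   "Writing of output data ........ 10406000.055 (29.7061 %)"
# -> label, dotted leader, elapsed time, percentage of total time.
LINE = re.compile(
    r"^(?P<label>.+?)\s*\.{2,}\s*(?P<time>[\d.]+)\s*\(\s*(?P<pct>[\d.]+)\s*%\)"
)

def parse_profile(text):
    """Return (label, time, percent) tuples for every timing line in text."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # headers and "Total:" lines have no dotted leader and are skipped
            rows.append((m["label"], float(m["time"]), float(m["pct"])))
    return rows

def top(rows, n=5):
    """The n entries with the largest share of total time, largest first."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]

report = """\
Writing of output data ........................... 10406000.055 (29.7061 %)
Model 2D kernel .................................. 12428259.379 (35.4790 %)
Message Passage: 2D halo exchanges ............... 7413745.121 (21.1641 %)
Message Passage: data broadcast .................. 9659988.786 (27.5764 %)
Message Passage: data scattering.................. 401.971 ( 0.0011 %)
Total: 29032067.120 82.8781
"""

rows = parse_profile(report)
comm = sum(pct for label, _, pct in rows if label.startswith("Message Passage"))
for label, _, pct in top(rows, 3):
    print(f"{pct:8.4f} %  {label}")
print(f"message passing in these lines: {comm:.4f} %")
```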
Re: Optimal tiling once again
Hi,
I ran a test with ROMS on a Mediterranean domain, as part of a PRACE preparatory phase project, to find the optimal tiling combination, so the figure might help you choose your tiling parameters.
The ocean domain in this case has 1/12° horizontal resolution. The grey and red areas in the plot show the performance of the ocean model for different 2D decomposition configurations (number of tiles in the x and y directions). The black solid lines indicate wall-clock time and the red solid lines show speed-up ratios. The simulated period is 5 days. The numbers indicate the best tiling options.
I hope it helps.
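A tiling sweep like the one in the figure starts from every way of splitting a core count into NtileI x NtileJ. A minimal sketch for listing those candidates, using the 409 x 421 grid and 500 cores from the first post as the example; `decompositions` is a hypothetical helper, and `min_tile` is an arbitrary cut-off added here to drop very thin tiles, not a ROMS rule.

```python
def decompositions(ncores, lm, mm, min_tile=1):
    """All (NtileI, NtileJ) with NtileI * NtileJ == ncores, plus tile sizes.

    lm, mm: grid points in I and J. Candidates whose average tile is
    smaller than min_tile points in either direction are skipped.
    """
    out = []
    for ni in range(1, ncores + 1):
        if ncores % ni:
            continue  # ni does not divide the core count evenly
        nj = ncores // ni
        ti, tj = lm / ni, mm / nj  # average points per tile
        if ti >= min_tile and tj >= min_tile:
            out.append((ni, nj, ti, tj))
    return out

for ni, nj, ti, tj in decompositions(500, 409, 421, min_tile=8):
    print(f"NtileI={ni:3d} NtileJ={nj:3d}  tile ~ {ti:5.1f} x {tj:5.1f}")
```

Each candidate still has to be timed on the actual machine; the listing only defines the search space.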
Re: Optimal tiling once again
Note that your profile shows about 30% of the time spent writing output. I got a speed-up by saving my restart file less often.
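In ocean.in the output intervals are given in time steps rather than days (NRST for restarts; NHIS and NAVG for history and averages), so writing restarts less often means raising NRST. A minimal sketch of the conversion, assuming DT = 60 s as in the first post; `steps_between` is a hypothetical helper.

```python
DT = 60.0  # baroclinic time step in seconds (from the first post)

def steps_between(interval_days, dt=DT):
    """Number of time steps corresponding to an output interval in days.

    Intended for step-count parameters in ocean.in such as NRST;
    the interval must correspond to a whole number of time steps.
    """
    steps = interval_days * 86400 / dt
    if steps != int(steps):
        raise ValueError("interval is not a whole number of time steps")
    return int(steps)

print("daily restart:   NRST =", steps_between(1))    # NRST = 1440
print("monthly restart: NRST =", steps_between(30))   # NRST = 43200
```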
Perhaps you were asking about the optimal shape of tiles? The code should "vectorize" better with tiles that are long in the i-direction, while communication is reduced for tiles that are closer to square. The optimal shape is likely longer in i than in j, but it may depend on your computer. You also want the tiles small enough to fit into cache.
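The communication side of this trade-off can be made concrete by counting halo (ghost) points per interior point. A minimal sketch, assuming a halo of `nghost` points on all four sides (2 here; the actual width depends on the ROMS configuration); `halo_ratio` is a hypothetical helper, and it says nothing about vectorization or cache behaviour, which still have to be timed.

```python
def halo_ratio(ti, tj, nghost=2):
    """Ghost (halo) points per interior point for a ti x tj tile.

    Assumes an nghost-point halo on all four sides, corners included.
    A lower ratio means less data exchanged per point computed.
    """
    interior = ti * tj
    with_halo = (ti + 2 * nghost) * (tj + 2 * nghost)
    return (with_halo - interior) / interior

# Same number of interior points (400), different shapes:
for ti, tj in [(20, 20), (40, 10), (80, 5)]:
    print(f"{ti:3d} x {tj:3d}: halo/interior = {halo_ratio(ti, tj):.2f}")
# 20 x 20 -> 0.44, 40 x 10 -> 0.54, 80 x 5 -> 0.89
```

The square tile exchanges the least per computed point, while the long-in-i tiles pay more communication in exchange for longer inner loops.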