Optimal tiling once again

General scientific issues regarding ROMS

ymamoutos
Posts: 71
Joined: Fri Nov 19, 2010 2:33 pm
Location: University of Aegean

Optimal tiling once again

#1 Unread post by ymamoutos »

Greetings,

I understand that asking about this matter once more
is somewhat annoying, and I apologize in advance,
but to be honest I am completely confused and
lost after reading several posts concerning ROMS tiling.
I recently gained access to an HPC system (https://hpc.grnet.gr/en/)
and I can use up to 500 cores for each run. My domain is
409*421*30 with a resolution of almost 1.2 km. I am using
DT=60 sec and ndtfast=28. A one-year run takes 11.9 hours
when using all available cores (500). I am using gfortran
4.9.2 and openmpi 1.8.7 (newer versions of both the compiler and openmpi
are available, as are Intel Fortran and Intel MPI). I suspect that
my tiling (NtilesI=20 and NtilesJ=25) combined with the
memory size (36GB) causes MPI overheads. I understand that my domain's
I and J points are not powers of 2, as Hernan suggests, and that
might cause additional problems. I am attaching the log file.
Any suggestion or help is highly appreciated.

Giannis
Attachment: log.txt (42.64 KiB)

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Optimal tiling once again

#2 Unread post by kate »

You didn't include the most interesting part of the log file, which looks something like:

 Nonlinear ocean model elapsed time profile, Grid: 01

  Allocation and array initialization ..............        90.454  ( 0.0007 %)
  Ocean state initialization .......................        31.542  ( 0.0002 %)
  Reading of input data ............................     40250.144  ( 0.2971 %)
  Processing of input data .........................    414569.094  ( 3.0604 %)
  Processing of output time averaged data ..........    394142.308  ( 2.9096 %)
  Computation of vertical boundary conditions ......    110130.695  ( 0.8130 %)
  Computation of global information integrals ......    114130.480  ( 0.8425 %)
  Writing of output data ...........................    940830.542  ( 6.9453 %)
  Model 2D kernel ..................................   2321895.180  (17.1405 %)
  Lagrangian floats trajectories ...................    402274.804  ( 2.9696 %)
  Tidal forcing ....................................    501715.363  ( 3.7037 %)
  2D/3D coupling, vertical metrics .................    274590.295  ( 2.0271 %)
  Omega vertical velocity ..........................    205581.862  ( 1.5176 %)
  Equation of state for seawater ...................    259865.225  ( 1.9184 %)
  Atmosphere-Ocean bulk flux parameterization ......     79697.338  ( 0.5883 %)
  KPP vertical mixing parameterization .............   1377123.183  (10.1661 %)
  3D equations right-side terms ....................    718296.863  ( 5.3026 %)
  3D equations predictor step ......................   1097942.595  ( 8.1052 %)
  Pressure gradient ................................    301603.839  ( 2.2265 %)
  Harmonic mixing of tracers, geopotentials ........    661710.353  ( 4.8848 %)
  Harmonic stress tensor, S-surfaces ...............    265737.636  ( 1.9617 %)
  Corrector time-step for 3D momentum ..............    450813.130  ( 3.3280 %)
  Corrector time-step for tracers ..................    739768.657  ( 5.4611 %)
                                              Total:  11672791.582   86.1700

 Nonlinear sea-ice model elapsed time profile, Grid: 01

  Ice thermodynamics................................    155231.894  ( 1.1459 %)
  Ice rheology coefficients.........................    334788.137  ( 2.4714 %)
  Iterative solver of ice dynamics..................   1270084.208  ( 9.3759 %)
  Advection of ice tracers..........................     12315.957  ( 0.0909 %)
                                              Total:   1772420.197   13.0842

 Nonlinear model message Passage profile, Grid: 01

  Message Passage: 2D halo exchanges ...............   3118310.029  (23.0198 %)
  Message Passage: 3D halo exchanges ...............    545759.649  ( 4.0289 %)
  Message Passage: 4D halo exchanges ...............    197523.818  ( 1.4581 %)
  Message Passage: data broadcast ..................    677661.168  ( 5.0026 %)
  Message Passage: data reduction ..................    104697.043  ( 0.7729 %)
  Message Passage: data gathering ..................    199152.622  ( 1.4702 %)
  Message Passage: data scattering..................      9544.377  ( 0.0705 %)
  Message Passage: boundary data gathering .........     44721.520  ( 0.3301 %)
  Message Passage: point data gathering ............    266922.198  ( 1.9705 %)
                                              Total:   5164292.424   38.1235

 All percentages are with respect to total time =     13546232.643

You might have to beat on the code to get this report, which I've posted about before.

Anyway, the key number here is the 38% spent on communications. This is with a much larger grid on fewer processors than you have. Yes, we have a problem. If you can't get the profile report, you can get some sense of the problem by timing a run with only 250 cores.
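
For anyone who wants to pull that communications number out of their own log, here is a minimal Python sketch. It assumes the report keeps the line layout shown above (a "Message Passage:" label and a percentage in parentheses). The names PROFILE, PCT, and communication_percent are made up for this example and are not part of ROMS.

```python
import re

# Message-passing section copied from the profile report in this post.
PROFILE = """\
  Message Passage: 2D halo exchanges ...............   3118310.029  (23.0198 %)
  Message Passage: 3D halo exchanges ...............    545759.649  ( 4.0289 %)
  Message Passage: 4D halo exchanges ...............    197523.818  ( 1.4581 %)
  Message Passage: data broadcast ..................    677661.168  ( 5.0026 %)
  Message Passage: data reduction ..................    104697.043  ( 0.7729 %)
  Message Passage: data gathering ..................    199152.622  ( 1.4702 %)
  Message Passage: data scattering..................      9544.377  ( 0.0705 %)
  Message Passage: boundary data gathering .........     44721.520  ( 0.3301 %)
  Message Passage: point data gathering ............    266922.198  ( 1.9705 %)
"""

# Capture the number inside "( xx.xxxx %)" on every "Message Passage:" line.
PCT = re.compile(r"^\s*Message Passage:.*\(\s*([\d.]+)\s*%\)\s*$", re.MULTILINE)


def communication_percent(report):
    """Sum the percentage column of all 'Message Passage:' lines."""
    return sum(float(p) for p in PCT.findall(report))


print(f"communications: {communication_percent(PROFILE):.2f} % of total time")
```

The sum of the individual lines should agree with the "Total:" line of the message-passing section up to rounding.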

ymamoutos
Posts: 71
Joined: Fri Nov 19, 2010 2:33 pm
Location: University of Aegean

Re: Optimal tiling once again

#3 Unread post by ymamoutos »

Kate, thank you very much for your immediate
reply. Indeed, I forgot to attach the interesting part :oops: .
I am attaching it now, and it is huge :shock:

Thanks again.


Nonlinear model elapsed time profile, Grid: 01

  Allocation and array initialization ..............        19.252  ( 0.0001 %)
  Ocean state initialization .......................        63.371  ( 0.0002 %)
  Reading of input data ............................     93076.292  ( 0.2657 %)
  Processing of input data .........................    571945.560  ( 1.6327 %)
  Processing of output time averaged data ..........     33733.628  ( 0.0963 %)
  Computation of vertical boundary conditions ......     40680.552  ( 0.1161 %)
  Computation of global information integrals ......    383520.952  ( 1.0948 %)
  Writing of output data ...........................  10406000.055  (29.7061 %)
  Model 2D kernel ..................................  12428259.379  (35.4790 %)
  2D/3D coupling, vertical metrics .................    928478.851  ( 2.6505 %)
  Omega vertical velocity ..........................    321127.202  ( 0.9167 %)
  Equation of state for seawater ...................    699199.743  ( 1.9960 %)
  Atmosphere-Ocean bulk flux parameterization ......    647643.459  ( 1.8488 %)
  GLS vertical mixing parameterization .............   5110085.944  (14.5878 %)
  3D equations right-side terms ....................    252692.246  ( 0.7214 %)
  3D equations predictor step ......................   1072043.243  ( 3.0604 %)
  Pressure gradient ................................     59713.574  ( 0.1705 %)
  Harmonic mixing of tracers, isopycnals ...........    208303.559  ( 0.5946 %)
  Biharmonic mixing of tracers, geopotentials ......    197069.262  ( 0.5626 %)
  Harmonic stress tensor, S-surfaces ...............    117924.569  ( 0.3366 %)
  Corrector time-step for 3D momentum ..............    671867.787  ( 1.9180 %)
  Corrector time-step for tracers ..................    582515.630  ( 1.6629 %)
                                              Total:  34825964.112   99.4179

 Nonlinear model message Passage profile, Grid: 01

  Message Passage: 2D halo exchanges ...............   7413745.121  (21.1641 %)
  Message Passage: 3D halo exchanges ...............   3494146.151  ( 9.9748 %)
  Message Passage: 4D halo exchanges ...............   1162333.564  ( 3.3181 %)
  Message Passage: data broadcast ..................   9659988.786  (27.5764 %)
  Message Passage: data reduction ..................   6610110.739  (18.8699 %)
  Message Passage: data gathering ..................    392563.011  ( 1.1207 %)
  Message Passage: data scattering..................       401.971  ( 0.0011 %)
  Message Passage: point data gathering ............    298777.777  ( 0.8529 %)
                                              Total:  29032067.120   82.8781

 All percentages are with respect to total time =     35029855.776


turuncu
Posts: 128
Joined: Tue Feb 01, 2005 8:21 pm
Location: Istanbul Technical University (ITU)

Re: Optimal tiling once again

#4 Unread post by turuncu »

Hi,

I ran a test with ROMS for a Mediterranean domain as part of a PRACE preparatory-phase project to find the optimal tiling combination. The figure in the attached roms_benchmark.pdf (199.94 KiB) might help to define the tiling parameters.

A 1/12° horizontal resolution ocean domain is used in this case. The grey and red areas in the plot show the performance of the ocean model with different 2D decomposition configurations (tiles in the x and y directions). The black solid lines indicate wall-clock time and the red solid lines show speed-up ratios. The simulation length is 5 days in this case. The numbers indicate the best tiling options.
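
In case it is useful for reading such a plot, here is a small Python sketch of how speed-up and parallel efficiency are usually computed from two wall-clock timings. The function names are made up for illustration, and the timings in the usage lines are placeholders, not measurements from the benchmark or from this thread.

```python
def speedup(t_ref, t_new):
    """Speed-up of the new run relative to the reference run."""
    return t_ref / t_new


def parallel_efficiency(cores_ref, t_ref, cores_new, t_new):
    """Speed-up divided by the increase in core count (1.0 is ideal scaling)."""
    return speedup(t_ref, t_new) / (cores_new / cores_ref)


# Placeholder timings in hours (hypothetical, for illustration only).
print(parallel_efficiency(250, 20.0, 500, 10.0))  # doubling cores halves the time
print(parallel_efficiency(250, 20.0, 500, 16.0))  # doubling cores gains much less
```

An efficiency well below 1 when going from fewer to more cores is a sign that the extra cores are mostly spent on communication.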

I hope it helps,

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Optimal tiling once again

#5 Unread post by kate »

Note that your profile shows about 30% of the time going to writing output. I got a speed-up by saving my restart file less often.

Perhaps you were asking about the optimal shape of tiles? The code should "vectorize" better with tiles that are long in the i-direction. The communications are reduced for tiles that are closer to square. The optimal shape is likely longer in i than in j, but might depend on your computer. You want the tiles small enough to fit into cache.
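
To get a feel for the trade-off, here is a rough Python sketch that lists the possible NtilesI x NtilesJ splits for 500 cores on the 409 x 421 grid from the first post, and estimates how many halo points each tile carries relative to its interior. It is only geometry, not a performance model. It assumes the posted grid dimensions can be treated as the points being split and that the halo is 2 points wide. The function names are invented for this example.

```python
from math import ceil


def factor_pairs(n):
    """All ordered (NtilesI, NtilesJ) pairs whose product is n."""
    return [(i, n // i) for i in range(1, n + 1) if n % i == 0]


def tile_shape(ni_total, nj_total, ntiles_i, ntiles_j):
    """Largest tile size in i and j when the grid is split evenly."""
    return ceil(ni_total / ntiles_i), ceil(nj_total / ntiles_j)


def halo_ratio(ni, nj, halo=2):
    """Halo points divided by interior points for one tile (halo width assumed)."""
    interior = ni * nj
    with_halo = (ni + 2 * halo) * (nj + 2 * halo)
    return (with_halo - interior) / interior


# Grid size and core count taken from the first post in this thread.
for ntiles_i, ntiles_j in factor_pairs(500):
    ni, nj = tile_shape(409, 421, ntiles_i, ntiles_j)
    print(f"{ntiles_i:3d} x {ntiles_j:3d}: tile {ni:3d} x {nj:3d}, "
          f"halo/interior = {halo_ratio(ni, nj):.2f}")
```

For the 20 x 25 split this gives tiles of about 21 x 17 points, where the halo is nearly half the size of the interior, so fewer and larger tiles may be worth timing.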
