Dear all,
To find the optimal tiling and processor allocation for a COAWST two-way coupled ROMS & SWAN run, we have been running some tests and looking at the profiling output in the ROMS log. Most of it is clear, but there is an item called "Unused 03" that takes the largest fraction of time at ~26% (see below). Does anyone know what this means? My hypothesis is that it represents the time ROMS spends waiting to get the wave parameters from SWAN. In this test case, SWAN is generally still catching up to ROMS before they do their data exchange.
Thank you.
Falk
*** snippet of ROMS log profiling section ********
Nonlinear model elapsed CPU time profile, Grid: 01
Allocation and array initialization .............. 282.799 ( 0.1540 %)
Ocean state initialization ....................... 318.921 ( 0.1736 %)
Reading of input data ............................ 2070.066 ( 1.1270 %)
Processing of input data ......................... 15139.615 ( 8.2426 %)
Computation of vertical boundary conditions ...... 260.675 ( 0.1419 %)
Computation of global information integrals ...... 1560.029 ( 0.8493 %)
Writing of output data ........................... 3945.330 ( 2.1480 %)
Model 2D kernel .................................. 24406.620 (13.2879 %)
2D/3D coupling, vertical metrics ................. 1630.168 ( 0.8875 %)
Omega vertical velocity .......................... 1149.860 ( 0.6260 %)
Equation of state for seawater ................... 1938.799 ( 1.0556 %)
Atmosphere-Ocean bulk flux parameterization ...... 5269.806 ( 2.8691 %)
GLS vertical mixing parameterization ............. 19169.954 (10.4369 %)
3D equations right-side terms .................... 11441.360 ( 6.2291 %)
3D equations predictor step ...................... 5949.225 ( 3.2390 %)
Pressure gradient ................................ 569.562 ( 0.3101 %)
Harmonic mixing of tracers, S-surfaces ........... 914.906 ( 0.4981 %)
Harmonic stress tensor, S-surfaces ............... 735.144 ( 0.4002 %)
Corrector time-step for 3D momentum .............. 2699.383 ( 1.4697 %)
Corrector time-step for tracers .................. 31736.207 (17.2785 %)
Reading model state vector ....................... 681.525 ( 0.3710 %)
Unused 03 ........................................ 47929.711 (26.0948 %)
Total: 179799.663 97.8901
Nonlinear model message Passage profile, Grid: 01
Message Passage: 2D halo exchanges ............... 25143.580 (13.6892 %)
Message Passage: 3D halo exchanges ............... 6113.792 ( 3.3286 %)
Message Passage: 4D halo exchanges ............... 9222.768 ( 5.0212 %)
Message Passage: data broadcast .................. 7697.290 ( 4.1907 %)
Message Passage: data reduction .................. 5241.044 ( 2.8534 %)
Message Passage: data gathering .................. 215.118 ( 0.1171 %)
Message Passage: data scattering.................. 534.478 ( 0.2910 %)
Message Passage: boundary data gathering ......... 4418.573 ( 2.4056 %)
Message Passage: synchronization barrier ......... 219.320 ( 0.1194 %)
Total: 58805.962 32.0163
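To compare profiles across several test runs, the profile lines above can be pulled out of the log and ranked with a short script. This is only a sketch: the regular expression is inferred from the snippet above rather than from the ROMS source, and the names `parse_profile` and `top_regions` are made up for illustration.

```python
import re

# Matches profile lines such as
#   "Unused 03 ........................................ 47929.711 (26.0948 %)"
# The layout is inferred from the log snippet in this thread (an assumption).
PROFILE_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*\.{2,}\s*(?P<seconds>[\d.]+)\s*\(\s*(?P<percent>[\d.]+)\s*%\)"
)


def parse_profile(log_text):
    """Return (region name, CPU seconds, percent) for every profile line found."""
    entries = []
    for line in log_text.splitlines():
        m = PROFILE_LINE.match(line)
        if m:  # "Total:" lines and anything else are skipped
            entries.append((m["name"], float(m["seconds"]), float(m["percent"])))
    return entries


def top_regions(log_text, n=5):
    """Return the n profile regions with the largest share of time."""
    return sorted(parse_profile(log_text), key=lambda e: e[2], reverse=True)[:n]


# A few lines copied from the profile above.
SAMPLE = """\
Model 2D kernel .................................. 24406.620 (13.2879 %)
Corrector time-step for tracers .................. 31736.207 (17.2785 %)
Unused 03 ........................................ 47929.711 (26.0948 %)
Total: 179799.663 97.8901
Message Passage: 2D halo exchanges ............... 25143.580 (13.6892 %)
"""

for name, seconds, percent in top_regions(SAMPLE, n=3):
    print(f"{percent:8.4f} %  {seconds:12.3f}  {name}")
```

Running the same function on the logs from different processor splits makes it easy to see which regions grow or shrink.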
Profiling COAWST with 2-way coupled ROMS/SWAN
Re: Profiling COAWST with 2-way coupled ROMS/SWAN
I'm not sure what "Unused 03" is.
But for the coupling, if you see something like this:
97 2020-07-21 04:51:00.00 3.755311E-02 2.266493E+04 2.266496E+04 2.088627E+16
(762,292,16) 3.718707E-02 5.124825E-02 4.006800E-01 3.243301E+00
98 2020-07-21 04:54:00.00 3.755672E-02 2.266491E+04 2.266495E+04 2.088623E+16
(765,292,16) 1.518682E-02 1.697811E-02 4.845900E-01 3.268283E+00
+time 20200721.044000 , step 56; iteration 1; sweep 3 grid 1
99 2020-07-21 04:57:00.00 3.756018E-02 2.266489E+04 2.266493E+04 2.088620E+16
(765,292,16) 1.511296E-02 1.668745E-02 4.611711E-01 3.298804E+00
+time 20200721.044000 , step 56; iteration 1; sweep 4 grid 1
100 2020-07-21 05:00:00.00 3.756349E-02 2.266487E+04 2.266491E+04 2.088616E+16
(763,292,16) 3.729829E-02 4.003904E-02 4.723960E-01 3.327010E+00
+time 20200721.044500 , step 57; iteration 1; sweep 1 grid 1
+time 20200721.044500 , step 57; iteration 1; sweep 2 grid 1
+time 20200721.044500 , step 57; iteration 1; sweep 3 grid 1
+time 20200721.044500 , step 57; iteration 1; sweep 4 grid 1
...
+time 20200721.045500 , step 59; iteration 1; sweep 1 grid 1
+time 20200721.045500 , step 59; iteration 1; sweep 2 grid 1
+time 20200721.045500 , step 59; iteration 1; sweep 3 grid 1
+time 20200721.045500 , step 59; iteration 1; sweep 4 grid 1
+time 20200721.050000 , step 60; iteration 1; sweep 1 grid 1
+time 20200721.050000 , step 60; iteration 1; sweep 2 grid 1
+time 20200721.050000 , step 60; iteration 1; sweep 3 grid 1
+time 20200721.050000 , step 60; iteration 1; sweep 4 grid 1
== SWAN grid 1 sent wave data to ROMS grid 1
** ROMS grid 1 recv data from SWAN grid 1
SWANtoROMS Min/Max DISBOT (Wm-2): 0.000000E+00 3.483969E-04
SWANtoROMS Min/Max DISSURF (Wm-2): 0.000000E+00 0.000000E+00
SWANtoROMS Min/Max DISWCAP (Wm-2): 0.000000E+00 1.135154E-03
.....
This means that ROMS got to the exchange interval first, and is waiting for SWAN. You could re-allocate the number of processors and provide more to SWAN for the next run. This has to occur at startup, and is not changeable during run time. Maybe in the future it could change during the run, but that would involve memory re-distribution. The run times for each model depend on grid sizes, physics selected, model time steps, etc.
-john
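The pattern described above (a long run of SWAN "+time" lines with no ROMS step lines in between, right before the exchange message) can be looked for automatically. The sketch below is a rough heuristic, not a COAWST tool: the line patterns are inferred from the excerpt in this post, the function name `trailing_swan_lines` is invented, and the ordering of lines in standard output is only assumed to reflect the order in which the models did their work.

```python
import re

# Line patterns inferred from the log excerpt in this thread (assumptions).
ROMS_STEP = re.compile(r"^\s*\d+\s+\d{4}-\d{2}-\d{2}\s")  # "100 2020-07-21 05:00:00.00 ..."
SWAN_SWEEP = re.compile(r"^\s*\+time\s")                  # "+time 20200721.044500 , step 57; ..."
EXCHANGE = re.compile(r"SWAN grid \d+ sent wave data to ROMS grid \d+")


def trailing_swan_lines(log_text):
    """For each SWAN-to-ROMS exchange, count the SWAN sweep lines printed
    since the last ROMS step line. A large count suggests that ROMS had
    already reached the exchange and was waiting for SWAN."""
    counts = []
    run = 0
    for line in log_text.splitlines():
        if SWAN_SWEEP.match(line):
            run += 1
        elif ROMS_STEP.match(line):
            run = 0
        elif EXCHANGE.search(line):
            counts.append(run)
            run = 0
    return counts


# Abridged from the excerpt above.
SAMPLE = """\
 99 2020-07-21 04:57:00.00 3.756018E-02 2.266489E+04 2.266493E+04 2.088620E+16
+time 20200721.044000 , step 56; iteration 1; sweep 4 grid 1
100 2020-07-21 05:00:00.00 3.756349E-02 2.266487E+04 2.266491E+04 2.088616E+16
+time 20200721.044500 , step 57; iteration 1; sweep 1 grid 1
+time 20200721.044500 , step 57; iteration 1; sweep 2 grid 1
+time 20200721.044500 , step 57; iteration 1; sweep 3 grid 1
== SWAN grid 1 sent wave data to ROMS grid 1
"""

print(trailing_swan_lines(SAMPLE))
```

Counts near zero at every exchange would instead suggest that SWAN finished first and ROMS was the slower model.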
Re: Profiling COAWST with 2-way coupled ROMS/SWAN
Hi John,
Thanks for this. We've already been tracking the coupling and whether ROMS or SWAN is waiting. It seems that "Unused 03" is related to ROMS waiting. We tried a simple case with a total of 180 processors, split so that SWAN had 20, 24, or 30. At 30, SWAN and ROMS were pretty even. The share of time in "Unused 03" went from about 26% at SWAN np=20 to ~2% at SWAN np=30, which suggests that is what it is.
Coordinating the processor split so that both models run at similar speeds definitely helps: we increased the model speed by 30%.
thanks
falk
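A first guess at a balanced split can be computed from one test run before trying several. The sketch below assumes ideal (linear) scaling for both models, which real runs will only approximate, so it gives a starting point rather than an answer. The function name `balanced_split` and the timings in the example are hypothetical; only the total of 180 processors comes from the test above.

```python
def balanced_split(total, n_roms, t_roms, n_swan, t_swan):
    """Estimate a processor split that lets ROMS and SWAN finish a coupling
    interval at about the same time.

    t_roms and t_swan are the measured wall-clock times per coupling interval
    when the models ran on n_roms and n_swan processors.
    Assumes ideal scaling: time = work / processors (an assumption).
    """
    work_roms = t_roms * n_roms  # processor-seconds per coupling interval
    work_swan = t_swan * n_swan
    new_swan = round(total * work_swan / (work_roms + work_swan))
    new_swan = min(max(new_swan, 1), total - 1)  # keep at least one rank per model
    return total - new_swan, new_swan


# Hypothetical timings, chosen only to illustrate the arithmetic:
# ROMS needs 6.25 s per interval on 160 ranks, SWAN needs 10 s on 20 ranks.
print(balanced_split(180, n_roms=160, t_roms=6.25, n_swan=20, t_swan=10.0))
```

The suggested ROMS count still has to be one that the ROMS tiling can accommodate, so the result may need rounding to a nearby workable value.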