one problem in run time
-
- Posts: 79
- Joined: Sun Dec 30, 2012 2:58 pm
- Location: inio:Iranian National Institute for Oceanography
one problem in run time
Dear all,
It seems strange to me that increasing the number of processors from 16 to 24 (NnodesWAV = 12 and NnodesOCN = 12) improved the run time by only 2 hours (the runs take 12 and 10 hours, respectively). Is this reasonable? I do not know whether something is wrong in my model setup (the model is ROMS and SWAN coupled, with refined grids, 4 grids in total) or in the source code.
How can I diagnose what the problem is (if there is one)?
I would greatly appreciate any help in finding out whether the model is working in the best and correct way.
Your experience will help me. Thank you for your kindness.
cheers
fereshte
- Attachments
-
- log.txt
- (211.9 KiB) Downloaded 333 times
Re: one problem in run time
You might try turning on the profile output as described here. Your grid 2 at least doesn't have enough grid points in it to support much parallelism. Also, I thought more than 2 grids wasn't working these days, unless you still have the older COAWST. Which version are you running?
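To make the point about small grids concrete, here is a minimal Python sketch of how the share of ghost points grows as tiles shrink. The grid sizes and the two-point halo width are assumptions chosen for illustration, not values read from the attached log; only the 3 x 4 tiling (NtileI = 3, NtileJ = 4) is the one reported in this thread.

```python
def halo_fraction(nx, ny, ntile_i, ntile_j, halo=2):
    """Ghost points per interior point for one tile with neighbours on all sides.

    nx, ny   : interior grid points in each direction (hypothetical values)
    ntile_*  : number of tiles in each direction
    halo     : assumed halo width in grid points
    """
    tx = nx / ntile_i                      # tile width in points
    ty = ny / ntile_j                      # tile height in points
    interior = tx * ty
    ghost = (tx + 2 * halo) * (ty + 2 * halo) - interior
    return ghost / interior


if __name__ == "__main__":
    # Hypothetical grid sizes: one large grid and one small nested grid.
    for nx, ny in [(300, 400), (60, 40)]:
        ratio = halo_fraction(nx, ny, ntile_i=3, ntile_j=4)
        print(f"{nx} x {ny} grid on 3 x 4 tiles: {ratio:.2%} ghost points per interior point")
```

The smaller the tile, the more of each time step goes into exchanging halos rather than computing, which is why a small grid gains little from extra processes.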
-
- Posts: 79
- Joined: Sun Dec 30, 2012 2:58 pm
- Location: inio:Iranian National Institute for Oceanography
Re: one problem in run time
Code: Select all
You might try turning on the profile output
Also, you are right: I am using an older version of COAWST (3.0).
(NtileI= 3, NtileJ=4) mpirun -np 24 -f ./machinefile ./coawstM coupling.in
Code: Select all
Model Coupling:
Ocean Model MPI nodes: 000 - 011
Waves Model MPI nodes: 012 - 023
Process Information:
Node # 8 (pid= 26691) is active.
Node # 9 (pid= 26692) is active.
Node # 10 (pid= 26693) is active.
Node # 11 (pid= 26694) is active.
Node # 0 (pid= 31850) is active.
Node # 1 (pid= 31851) is active.
Node # 2 (pid= 31852) is active.
Node # 3 (pid= 31853) is active.
Node # 4 (pid= 31854) is active.
Node # 5 (pid= 31855) is active.
Node # 6 (pid= 31856) is active.
Node # 7 (pid= 31857) is active.
Node # 0 (pid= 31850) is active.
Node # 1 (pid= 31851) is active.
Node # 2 (pid= 31852) is active.
Node # 3 (pid= 31853) is active.
Node # 4 (pid= 31854) is active.
Node # 5 (pid= 31855) is active.
Node # 6 (pid= 31856) is active.
Node # 7 (pid= 31857) is active.
Node # 8 (pid= 26691) is active.
Node # 1 (pid= 31851) is active.
Node # 9 (pid= 26692) is active.
Node # 2 (pid= 31852) is active.
Node # 10 (pid= 26693) is active.
Node # 10 (pid= 26693) is active.
Node # 3 (pid= 31853) is active.
Node # 11 (pid= 26694) is active.
Node # 11 (pid= 26694) is active.
Node # 4 (pid= 31854) is active.
Node # 8 (pid= 26691) is active.
Node # 5 (pid= 31855) is active.
Node # 9 (pid= 26692) is active.
Node # 6 (pid= 31856) is active.
Node # 7 (pid= 31857) is active.
Node # 0 (pid= 31850) is active.
Node # 1 (pid= 31851) is active.
Node # 2 (pid= 31852) is active.
Node # 3 (pid= 31853) is active.
Node # 4 (pid= 31854) is active.
Node # 5 (pid= 31855) is active.
Node # 6 (pid= 31856) is active.
Node # 7 (pid= 31857) is active.
Node # 8 (pid= 26691) is active.
Node # 0 (pid= 31850) is active.
Node # 9 (pid= 26692) is active.
Node # 10 (pid= 26693) is active.
Node # 11 (pid= 26694) is active.
Model Input Parameters: ROMS/TOMS version 3.4
Thursday - November 27, 2014 - 6:14:10 PM
.....
....
Elapsed CPU time (seconds):
Node # 8 CPU: 3790.685
Node # 9 CPU: 3788.949
Node # 10 CPU: 3805.866
Node # 11 CPU: 3769.084
Node # 0 CPU: 3764.847
Node # 1 CPU: 3803.442
Node # 2 CPU: 3809.394
Node # 3 CPU: 3802.290
Node # 4 CPU: 3770.040
Node # 5 CPU: 3784.465
Node # 6 CPU: 3808.146
Node # 7 CPU: 3799.149
Total: 45496.355
Nonlinear model elapsed time profile:
Initialization ................................... 5000.781 (10.9916 %)
OI data assimilation ............................. 0.028 ( 0.0001 %)
Reading of input data ............................ 30.118 ( 0.0662 %)
Processing of input data ......................... 28.910 ( 0.0635 %)
Processing of output time averaged data .......... 6.072 ( 0.0133 %)
Computation of vertical boundary conditions ...... 0.440 ( 0.0010 %)
Computation of global information integrals ...... 0.312 ( 0.0007 %)
Writing of output data ........................... 164.778 ( 0.3622 %)
Model 2D kernel .................................. 142.437 ( 0.3131 %)
2D/3D coupling, vertical metrics ................. 1.316 ( 0.0029 %)
Omega vertical velocity .......................... 1.136 ( 0.0025 %)
Equation of state for seawater ................... 1.740 ( 0.0038 %)
GLS vertical mixing parameterization ............. 4.032 ( 0.0089 %)
3D equations right-side terms .................... 13.645 ( 0.0300 %)
3D equations predictor step ...................... 2.652 ( 0.0058 %)
Pressure gradient ................................ 1.260 ( 0.0028 %)
Harmonic mixing of tracers, S-surfaces ........... 0.148 ( 0.0003 %)
Harmonic stress tensor, S-surfaces ............... 0.316 ( 0.0007 %)
Corrector time-step for 3D momentum .............. 4.508 ( 0.0099 %)
Corrector time-step for tracers .................. 1.856 ( 0.0041 %)
Total: 5406.486 11.8833
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 76.425 ( 0.1680 %)
Message Passage: 3D halo exchanges ............... 11.585 ( 0.0255 %)
Message Passage: 4D halo exchanges ............... 5.100 ( 0.0112 %)
Message Passage: data broadcast .................. 155.858 ( 0.3426 %)
Message Passage: data reduction .................. 1.220 ( 0.0027 %)
Message Passage: data gathering .................. 69.524 ( 0.1528 %)
Message Passage: data scattering.................. 41.379 ( 0.0909 %)
Message Passage: multi-model coupling ............ 4927.572 (10.8307 %)
Total: 5288.663 11.6244
All percentages are with respect to total time = 45496.355
Node # 0 CPU: 3764.771
Node # 1 CPU: 3803.334
Node # 2 CPU: 3809.294
Node # 3 CPU: 3802.198
Node # 4 CPU: 3769.940
Node # 5 CPU: 3784.369
Node # 6 CPU: 3808.046
Node # 7 CPU: 3799.053
Node # 8 CPU: 3790.689
Node # 9 CPU: 3788.949
Node # 10 CPU: 3805.866
Node # 11 CPU: 3769.100
Total: 45495.607
Nonlinear model elapsed time profile:
Initialization ................................... 503.879 ( 1.1075 %)
OI data assimilation ............................. 0.036 ( 0.0001 %)
Reading of input data ............................ 675.958 ( 1.4858 %)
Processing of input data ......................... 732.962 ( 1.6111 %)
Processing of output time averaged data .......... 9.121 ( 0.0200 %)
Computation of vertical boundary conditions ...... 0.616 ( 0.0014 %)
Computation of global information integrals ...... 0.872 ( 0.0019 %)
Writing of output data ........................... 21.289 ( 0.0468 %)
Model 2D kernel .................................. 76.397 ( 0.1679 %)
2D/3D coupling, vertical metrics ................. 2.364 ( 0.0052 %)
Omega vertical velocity .......................... 1.428 ( 0.0031 %)
Equation of state for seawater ................... 2.944 ( 0.0065 %)
GLS vertical mixing parameterization ............. 3.068 ( 0.0067 %)
3D equations right-side terms .................... 11.097 ( 0.0244 %)
3D equations predictor step ...................... 1.296 ( 0.0028 %)
Pressure gradient ................................ 0.172 ( 0.0004 %)
Harmonic mixing of tracers, S-surfaces ........... 0.020 ( 0.0000 %)
Harmonic stress tensor, S-surfaces ............... 0.048 ( 0.0001 %)
Corrector time-step for 3D momentum .............. 3.260 ( 0.0072 %)
Corrector time-step for tracers .................. 8.925 ( 0.0196 %)
Total: 2055.752 4.5186
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 93.814 ( 0.2062 %)
Message Passage: 3D halo exchanges ............... 21.513 ( 0.0473 %)
Message Passage: 4D halo exchanges ............... 15.533 ( 0.0341 %)
Message Passage: data broadcast .................. 43.819 ( 0.0963 %)
Message Passage: data reduction .................. 1.896 ( 0.0042 %)
Message Passage: data gathering .................. 3.676 ( 0.0081 %)
Message Passage: data scattering.................. 3.956 ( 0.0087 %)
Message Passage: multi-model coupling ............ 1038.989 ( 2.2837 %)
Total: 1223.196 2.6886
All percentages are with respect to total time = 45495.607
Node # 0 CPU: 3764.783
Node # 2 CPU: 3809.302
Node # 1 CPU: 3803.346
Node # 3 CPU: 3802.206
Node # 4 CPU: 3769.948
Node # 5 CPU: 3784.381
Node # 6 CPU: 3808.054
Node # 7 CPU: 3799.061
Node # 8 CPU: 3790.673
Node # 9 CPU: 3788.933
Node # 10 CPU: 3805.854
Node # 11 CPU: 3769.084
Total: 45495.623
Nonlinear model elapsed time profile:
Initialization ................................... 1341.936 ( 2.9496 %)
OI data assimilation ............................. 0.036 ( 0.0001 %)
Reading of input data ............................ 8396.373 (18.4553 %)
Processing of input data ......................... 8537.314 (18.7651 %)
Processing of output time averaged data .......... 31.646 ( 0.0696 %)
Computation of vertical boundary conditions ...... 3.656 ( 0.0080 %)
Computation of global information integrals ...... 5.324 ( 0.0117 %)
Writing of output data ........................... 61.180 ( 0.1345 %)
Model 2D kernel .................................. 761.656 ( 1.6741 %)
2D/3D coupling, vertical metrics ................. 12.093 ( 0.0266 %)
Omega vertical velocity .......................... 9.481 ( 0.0208 %)
Equation of state for seawater ................... 20.325 ( 0.0447 %)
GLS vertical mixing parameterization ............. 30.874 ( 0.0679 %)
3D equations right-side terms .................... 81.801 ( 0.1798 %)
3D equations predictor step ...................... 14.309 ( 0.0315 %)
Pressure gradient ................................ 4.240 ( 0.0093 %)
Harmonic mixing of tracers, S-surfaces ........... 0.720 ( 0.0016 %)
Harmonic stress tensor, S-surfaces ............... 0.900 ( 0.0020 %)
Corrector time-step for 3D momentum .............. 32.710 ( 0.0719 %)
Corrector time-step for tracers .................. 49.007 ( 0.1077 %)
Total: 19395.580 42.6317
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 675.386 ( 1.4845 %)
Message Passage: 3D halo exchanges ............... 136.072 ( 0.2991 %)
Message Passage: 4D halo exchanges ............... 178.419 ( 0.3922 %)
Message Passage: data broadcast .................. 84.773 ( 0.1863 %)
Message Passage: data reduction .................. 6.952 ( 0.0153 %)
Message Passage: data gathering .................. 35.338 ( 0.0777 %)
Message Passage: data scattering.................. 18.937 ( 0.0416 %)
Message Passage: multi-model coupling ............ 1553.393 ( 3.4144 %)
Total: 2689.272 5.9111
All percentages are with respect to total time = 45495.623
Node # 0 CPU: 3764.771
Node # 2 CPU: 3809.294
Node # 3 CPU: 3802.198
Node # 1 CPU: 3803.330
Node # 4 CPU: 3769.940
Node # 5 CPU: 3784.369
Node # 6 CPU: 3808.042
Node # 7 CPU: 3799.049
Node # 8 CPU: 3790.681
Node # 9 CPU: 3788.945
Node # 10 CPU: 3805.862
Node # 11 CPU: 3769.092
Total: 45495.571
Nonlinear model elapsed time profile:
Initialization ................................... 10363.228 (22.7785 %)
OI data assimilation ............................. 0.116 ( 0.0003 %)
Reading of input data ............................ 2087.002 ( 4.5873 %)
Processing of input data ......................... 2220.687 ( 4.8811 %)
Processing of output time averaged data .......... 186.704 ( 0.4104 %)
Computation of vertical boundary conditions ...... 16.649 ( 0.0366 %)
Computation of global information integrals ...... 26.202 ( 0.0576 %)
Writing of output data ........................... 296.855 ( 0.6525 %)
Model 2D kernel .................................. 11631.991 (25.5673 %)
2D/3D coupling, vertical metrics ................. 54.816 ( 0.1205 %)
Omega vertical velocity .......................... 48.727 ( 0.1071 %)
Equation of state for seawater ................... 82.713 ( 0.1818 %)
GLS vertical mixing parameterization ............. 284.102 ( 0.6245 %)
3D equations right-side terms .................... 641.244 ( 1.4095 %)
3D equations predictor step ...................... 186.348 ( 0.4096 %)
Pressure gradient ................................ 63.140 ( 0.1388 %)
Harmonic mixing of tracers, S-surfaces ........... 15.585 ( 0.0343 %)
Harmonic stress tensor, S-surfaces ............... 32.722 ( 0.0719 %)
Corrector time-step for 3D momentum .............. 264.573 ( 0.5815 %)
Corrector time-step for tracers .................. 77.873 ( 0.1712 %)
Total: 28581.274 62.8221
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 4357.292 ( 9.5774 %)
Message Passage: 3D halo exchanges ............... 564.511 ( 1.2408 %)
Message Passage: 4D halo exchanges ............... 194.072 ( 0.4266 %)
Message Passage: data broadcast .................. 249.944 ( 0.5494 %)
Message Passage: data reduction .................. 24.198 ( 0.0532 %)
Message Passage: data gathering .................. 967.784 ( 2.1272 %)
Message Passage: data scattering.................. 128.240 ( 0.2819 %)
Message Passage: multi-model coupling ............ 10452.069 (22.9738 %)
Total: 16938.110 37.2302
All percentages are with respect to total time = 45495.571
ROMS/TOMS - Output NetCDF summary for Grid 01:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
ROMS/TOMS - Output NetCDF summary for Grid 02:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
ROMS/TOMS - Output NetCDF summary for Grid 03:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
ROMS/TOMS - Output NetCDF summary for Grid 04:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
Analytical header files used:
ROMS/Functionals/ana_btflux.h
ROMS/TOMS: DONE... Thursday - November 27, 2014 - 7:18:47 PM
Code: Select all
Model Coupling:
Ocean Model MPI nodes: 000 - 007
Waves Model MPI nodes: 008 - 015
Process Information:
SWAN grid 1 is preparing computation
Node # 0 (pid= 6495) is active.
Node # 1 (pid= 6496) is active.
Node # 2 (pid= 6497) is active.
Node # 4 (pid= 6499) is active.
Node # 5 (pid= 6500) is active.
Node # 6 (pid= 6501) is active.
Node # 0 (pid= 6495) is active.
Node # 0 (pid= 6495) is active.
Node # 0 (pid= 6495) is active.
Model Input Parameters: ROMS/TOMS version 3.4
Saturday - November 29, 2014 - 10:02:19 AM
........
.........
Elapsed CPU time (seconds):
Node # 0 CPU: 2866.715
Total: 23291.540
Nonlinear model elapsed time profile:
Initialization ................................... 2092.783 ( 8.9852 %)
OI data assimilation ............................. 0.004 ( 0.0000 %)
Reading of input data ............................ 1.248 ( 0.0054 %)
Processing of input data ......................... 0.660 ( 0.0028 %)
Processing of output time averaged data .......... 3.048 ( 0.0131 %)
Computation of vertical boundary conditions ...... 0.012 ( 0.0001 %)
Computation of global information integrals ...... 0.084 ( 0.0004 %)
Writing of output data ........................... 48.815 ( 0.2096 %)
Model 2D kernel .................................. 138.245 ( 0.5935 %)
2D/3D coupling, vertical metrics ................. 0.168 ( 0.0007 %)
Omega vertical velocity .......................... 0.240 ( 0.0010 %)
Equation of state for seawater ................... 0.268 ( 0.0012 %)
GLS vertical mixing parameterization ............. 2.552 ( 0.0110 %)
3D equations right-side terms .................... 6.296 ( 0.0270 %)
3D equations predictor step ...................... 2.500 ( 0.0107 %)
Pressure gradient ................................ 1.220 ( 0.0052 %)
Harmonic mixing of tracers, S-surfaces ........... 0.356 ( 0.0015 %)
Harmonic stress tensor, S-surfaces ............... 0.688 ( 0.0030 %)
Corrector time-step for 3D momentum .............. 2.468 ( 0.0106 %)
Corrector time-step for tracers .................. 1.236 ( 0.0053 %)
Total: 2302.892 9.8872
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 13.945 ( 0.0599 %)
Message Passage: 3D halo exchanges ............... 0.716 ( 0.0031 %)
Message Passage: 4D halo exchanges ............... 0.580 ( 0.0025 %)
Message Passage: data broadcast .................. 43.507 ( 0.1868 %)
Message Passage: data reduction .................. 0.108 ( 0.0005 %)
Message Passage: data gathering .................. 5.600 ( 0.0240 %)
Message Passage: data scattering.................. 0.564 ( 0.0024 %)
Message Passage: multi-model coupling ............ 2094.919 ( 8.9943 %)
Total: 2159.939 9.2735
All percentages are with respect to total time = 23291.540
Node # 0 CPU: 2866.707
Total: 23291.464
Nonlinear model elapsed time profile:
Initialization ................................... 155.950 ( 0.6696 %)
Reading of input data ............................ 13.237 ( 0.0568 %)
Processing of input data ......................... 17.889 ( 0.0768 %)
Processing of output time averaged data .......... 0.788 ( 0.0034 %)
Computation of vertical boundary conditions ...... 0.008 ( 0.0000 %)
Computation of global information integrals ...... 0.016 ( 0.0001 %)
Writing of output data ........................... 6.984 ( 0.0300 %)
Model 2D kernel .................................. 14.013 ( 0.0602 %)
2D/3D coupling, vertical metrics ................. 0.068 ( 0.0003 %)
Omega vertical velocity .......................... 0.052 ( 0.0002 %)
Equation of state for seawater ................... 0.084 ( 0.0004 %)
GLS vertical mixing parameterization ............. 0.684 ( 0.0029 %)
3D equations right-side terms .................... 1.692 ( 0.0073 %)
3D equations predictor step ...................... 0.812 ( 0.0035 %)
Pressure gradient ................................ 0.180 ( 0.0008 %)
Harmonic mixing of tracers, S-surfaces ........... 0.064 ( 0.0003 %)
Harmonic stress tensor, S-surfaces ............... 0.104 ( 0.0004 %)
Corrector time-step for 3D momentum .............. 0.940 ( 0.0040 %)
Corrector time-step for tracers .................. 0.396 ( 0.0017 %)
Total: 213.961 0.9186
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 4.052 ( 0.0174 %)
Message Passage: 3D halo exchanges ............... 0.448 ( 0.0019 %)
Message Passage: 4D halo exchanges ............... 0.448 ( 0.0019 %)
Message Passage: data broadcast .................. 7.252 ( 0.0311 %)
Message Passage: data reduction .................. 0.260 ( 0.0011 %)
Message Passage: data gathering .................. 0.448 ( 0.0019 %)
Message Passage: data scattering.................. 0.012 ( 0.0001 %)
Message Passage: multi-model coupling ............ 834.144 ( 3.5813 %)
Total: 847.065 3.6368
All percentages are with respect to total time = 23291.464
Node # 0 CPU: 2866.707
Total: 23291.460
Nonlinear model elapsed time profile:
Initialization ................................... 628.551 ( 2.6986 %)
OI data assimilation ............................. 0.004 ( 0.0000 %)
Reading of input data ............................ 167.775 ( 0.7203 %)
Processing of input data ......................... 176.711 ( 0.7587 %)
Processing of output time averaged data .......... 15.045 ( 0.0646 %)
Computation of vertical boundary conditions ...... 0.172 ( 0.0007 %)
Computation of global information integrals ...... 0.380 ( 0.0016 %)
Writing of output data ........................... 14.873 ( 0.0639 %)
Model 2D kernel .................................. 644.328 ( 2.7664 %)
2D/3D coupling, vertical metrics ................. 1.096 ( 0.0047 %)
Omega vertical velocity .......................... 1.108 ( 0.0048 %)
Equation of state for seawater ................... 1.832 ( 0.0079 %)
GLS vertical mixing parameterization ............. 13.861 ( 0.0595 %)
3D equations right-side terms .................... 30.438 ( 0.1307 %)
3D equations predictor step ...................... 15.265 ( 0.0655 %)
Pressure gradient ................................ 4.272 ( 0.0183 %)
Harmonic mixing of tracers, S-surfaces ........... 1.508 ( 0.0065 %)
Harmonic stress tensor, S-surfaces ............... 1.596 ( 0.0069 %)
Corrector time-step for 3D momentum .............. 16.321 ( 0.0701 %)
Corrector time-step for tracers .................. 6.800 ( 0.0292 %)
Total: 1741.937 7.4789
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 122.740 ( 0.5270 %)
Message Passage: 3D halo exchanges ............... 9.305 ( 0.0399 %)
Message Passage: 4D halo exchanges ............... 5.128 ( 0.0220 %)
Message Passage: data broadcast .................. 14.653 ( 0.0629 %)
Message Passage: data reduction .................. 0.032 ( 0.0001 %)
Message Passage: data gathering .................. 18.177 ( 0.0780 %)
Message Passage: data scattering.................. 0.076 ( 0.0003 %)
Message Passage: multi-model coupling ............ 1073.223 ( 4.6078 %)
Total: 1243.334 5.3382
All percentages are with respect to total time = 23291.460
Node # 0 CPU: 2866.707
Total: 23291.464
Nonlinear model elapsed time profile:
Initialization ................................... 4457.643 (19.1385 %)
OI data assimilation ............................. 0.008 ( 0.0000 %)
Reading of input data ............................ 85.221 ( 0.3659 %)
Processing of input data ......................... 149.581 ( 0.6422 %)
Processing of output time averaged data .......... 228.878 ( 0.9827 %)
Computation of vertical boundary conditions ...... 2.900 ( 0.0125 %)
Computation of global information integrals ...... 8.529 ( 0.0366 %)
Writing of output data ........................... 87.185 ( 0.3743 %)
Model 2D kernel .................................. 11264.152 (48.3617 %)
2D/3D coupling, vertical metrics ................. 11.349 ( 0.0487 %)
Omega vertical velocity .......................... 15.169 ( 0.0651 %)
Equation of state for seawater ................... 21.441 ( 0.0921 %)
GLS vertical mixing parameterization ............. 224.462 ( 0.9637 %)
3D equations right-side terms .................... 557.083 ( 2.3918 %)
3D equations predictor step ...................... 218.934 ( 0.9400 %)
Pressure gradient ................................ 65.376 ( 0.2807 %)
Harmonic mixing of tracers, S-surfaces ........... 34.790 ( 0.1494 %)
Harmonic stress tensor, S-surfaces ............... 53.955 ( 0.2317 %)
Corrector time-step for 3D momentum .............. 195.736 ( 0.8404 %)
Corrector time-step for tracers .................. 92.322 ( 0.3964 %)
Total: 17774.715 76.3143
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 1175.954 ( 5.0489 %)
Message Passage: 3D halo exchanges ............... 78.409 ( 0.3366 %)
Message Passage: 4D halo exchanges ............... 37.562 ( 0.1613 %)
Message Passage: data broadcast .................. 78.609 ( 0.3375 %)
Message Passage: data reduction .................. 1.708 ( 0.0073 %)
Message Passage: data gathering .................. 43.431 ( 0.1865 %)
Message Passage: data scattering.................. 1.264 ( 0.0054 %)
Message Passage: multi-model coupling ............ 4798.112 (20.6003 %)
Total: 6215.049 26.6838
All percentages are with respect to total time = 23291.464
ROMS/TOMS - Output NetCDF summary for Grid 01:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
ROMS/TOMS - Output NetCDF summary for Grid 02:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
ROMS/TOMS - Output NetCDF summary for Grid 03:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
Node # 1 CPU: 2910.318
Node # 1 CPU: 2910.306
Node # 1 CPU: 2910.306
Node # 1 CPU: 2910.306
Node # 2 CPU: 2902.869
Node # 2 CPU: 2902.861
Node # 2 CPU: 2902.861
Node # 2 CPU: 2902.861
Node # 3 CPU: 2904.386
Node # 3 CPU: 2904.374
Node # 3 CPU: 2904.374
Node # 3 CPU: 2904.374
Node # 4 CPU: 2917.582
Node # 4 CPU: 2917.574
Node # 4 CPU: 2917.570
Node # 4 CPU: 2917.574
Node # 5 CPU: 2935.547
Node # 5 CPU: 2935.535
Node # 5 CPU: 2935.535
Node # 5 CPU: 2935.535
Node # 6 CPU: 2935.131
Node # 6 CPU: 2935.119
Node # 6 CPU: 2935.119
Node # 6 CPU: 2935.119
Node # 7 CPU: 2918.990
Node # 7 CPU: 2918.986
Node # 7 CPU: 2918.986
Node # 7 CPU: 2918.986
ROMS/TOMS - Output NetCDF summary for Grid 04:
number of time records written in HISTORY file = 00000002
number of time records written in RESTART file = 00000001
number of time records written in AVERAGE file = 00000001
Analytical header files used:
ROMS/Functionals/ana_btflux.h
ROMS/TOMS: DONE... Saturday - November 29, 2014 - 10:52:24 AM
cheers
fereshte
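As a quick sanity check on the two logs above, the wall-clock span of each test can be computed from the start and "DONE" timestamps that ROMS prints. This is only a sketch: it assumes the first timestamp is close to the real start of the run, and the two logs may not cover identical configurations or run lengths, so the comparison is indicative at best.

```python
from datetime import datetime

# Timestamp layout as printed in the ROMS log headers above.
FMT = "%A - %B %d, %Y - %I:%M:%S %p"


def elapsed_seconds(start, end):
    """Wall-clock seconds between two ROMS log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Start and end stamps copied from the two logs in this post.
runs = {
    "24 cores": ("Thursday - November 27, 2014 - 6:14:10 PM",
                 "Thursday - November 27, 2014 - 7:18:47 PM"),
    "16 cores": ("Saturday - November 29, 2014 - 10:02:19 AM",
                 "Saturday - November 29, 2014 - 10:52:24 AM"),
}

if __name__ == "__main__":
    for label, (start, end) in runs.items():
        secs = elapsed_seconds(start, end)
        print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

With these stamps the 24-core log spans about 65 minutes and the 16-core log about 50 minutes, which is in line with the per-node CPU times reported in each log.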
- Attachments
-
- 16core.txt
- coupled and nested model's output by using 16 cores
- (160.63 KiB) Downloaded 315 times
-
- 24core.txt
- coupled and nested model's output by using 24 cores
- (165.33 KiB) Downloaded 319 times
Re: one problem in run time
You are spending a significant fraction of the time in communications, so no, I would not expect great speedups when adding more processors.
One person reported much better performance from the ROMS_Agrif code over both the Rutgers ROMS and the COAWST ROMS when using the nesting. You might try that.
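As a rough cross-check on the reported timings (12 hours on 16 processes, 10 hours on 24), the sketch below fits a very simple model, time = fixed + work / processes, to the two measurements. The model itself is an assumption made for illustration: it ignores the fact that communication cost usually grows with the number of processes, so real scaling is likely to be worse than it suggests.

```python
def fit_fixed_plus_scaling(p1, t1, p2, t2):
    """Solve t = fixed + work / p from two (processes, hours) measurements."""
    work = (t1 - t2) / (1.0 / p1 - 1.0 / p2)   # perfectly parallel part, in process-hours
    fixed = t1 - work / p1                      # part that does not shrink with p, in hours
    return fixed, work


# Timings reported in the first post of this thread.
fixed, work = fit_fixed_plus_scaling(16, 12.0, 24, 10.0)
speedup = 12.0 / 10.0                # measured speedup going from 16 to 24 processes
efficiency = speedup / (24 / 16)     # relative to the ideal speedup of 1.5

if __name__ == "__main__":
    print(f"fixed cost    : {fixed:.1f} h")
    print(f"scalable work : {work:.1f} process-hours")
    print(f"speedup       : {speedup:.2f} (ideal 1.50), efficiency {efficiency:.0%}")
    for p in (32, 48):
        print(f"predicted time on {p} processes: {fixed + work / p:.1f} h")
```

Under this model the two timings imply about 6 hours that do not shrink with more processes, so even an unlimited number of processes would not bring the run much below that.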
- jivica
- Posts: 172
- Joined: Mon May 05, 2003 2:41 pm
- Location: The University of Western Australia, Perth, Australia
- Contact:
Re: one problem in run time
Well, you can get a better interconnect between nodes to improve efficiency.
For example, I set up a cluster last week (4 x Dell R620 E5 with 40 cores) using Mellanox FDR InfiniBand at 56 Gb/s (!!), and I get really nice low-latency, high-speed communication.
So, in short, you have hit the wall; you have to change the structure.
As Kate said, your threads can be so small that all you are exchanging is ghost points.
Good luck
Ivica
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
- Contact:
Re: one problem in run time
kate wrote: One person reported much better performance from the ROMS_Agrif code over both the Rutgers ROMS and the COAWST ROMS when using the nesting. You might try that.
Actually, this has a good explanation that most people are not aware of. The barotropic engine in either the UCLA ROMS or ROMS_Agrif is (I believe) much simpler than in our version of ROMS. It has fewer right-hand-side terms (such as the stress tensor, horizontal advection, etc.). These terms are not resolved at the barotropic time scales and enter via the vertically integrated residual terms rufrc and rvfrc. The time-stepping scheme is also different (forward/backward), and I believe that a larger baroclinic/barotropic time-step is possible. Well, this is what Sasha told me in the past. I actually have not seen the latest versions of their code in a very long time.
Since the barotropic engine (predictor and corrector steps) is the most expensive part of ROMS, you will get better performance if step2d has fewer terms, and the code becomes much faster. This strategy makes sense to me, and Sasha has carefully designed and tested it. It is on my to-do list. For me it is rather complex, because I will have to rewrite the adjoint and tangent linear (perturbation and finite-amplitude transformations) versions of several routines. In addition, I will have to rework the adjoint of the time indices, which is not that trivial...
-
- Posts: 79
- Joined: Sun Dec 30, 2012 2:58 pm
- Location: inio:Iranian National Institute for Oceanography
Re: one problem in run time
Dear friends
Many thanks for your replies.
kate wrote: You are spending a significant fraction of the time in communications
From which part of the output did you find this?
Re: one problem in run time
For instance, here:
This is your finest, most expensive grid. You are spending 22% in initialization, so perhaps this is a short run. This fraction will only go down for longer runs, meaning that 37% of time communicating for 62% of the time on this grid is really very significant. Well, the grid fractions add up to more than 100% so I'm not positive how this works, but it still seems that 22% in multi-model coupling and 9.5% in 2D halo exchanges has got to hurt.
The rule of thumb is that once you are spending 50% of your time in communications, there is absolutely no point in adding more processes to your job - you would instead slow it down.
Code: Select all
Nonlinear model elapsed time profile:
Initialization ................................... 10363.228 (22.7785 %)
OI data assimilation ............................. 0.116 ( 0.0003 %)
Reading of input data ............................ 2087.002 ( 4.5873 %)
Processing of input data ......................... 2220.687 ( 4.8811 %)
Processing of output time averaged data .......... 186.704 ( 0.4104 %)
Computation of vertical boundary conditions ...... 16.649 ( 0.0366 %)
Computation of global information integrals ...... 26.202 ( 0.0576 %)
Writing of output data ........................... 296.855 ( 0.6525 %)
Model 2D kernel .................................. 11631.991 (25.5673 %)
2D/3D coupling, vertical metrics ................. 54.816 ( 0.1205 %)
Omega vertical velocity .......................... 48.727 ( 0.1071 %)
Equation of state for seawater ................... 82.713 ( 0.1818 %)
GLS vertical mixing parameterization ............. 284.102 ( 0.6245 %)
3D equations right-side terms .................... 641.244 ( 1.4095 %)
3D equations predictor step ...................... 186.348 ( 0.4096 %)
Pressure gradient ................................ 63.140 ( 0.1388 %)
Harmonic mixing of tracers, S-surfaces ........... 15.585 ( 0.0343 %)
Harmonic stress tensor, S-surfaces ............... 32.722 ( 0.0719 %)
Corrector time-step for 3D momentum .............. 264.573 ( 0.5815 %)
Corrector time-step for tracers .................. 77.873 ( 0.1712 %)
Total: 28581.274 62.8221
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 4357.292 ( 9.5774 %)
Message Passage: 3D halo exchanges ............... 564.511 ( 1.2408 %)
Message Passage: 4D halo exchanges ............... 194.072 ( 0.4266 %)
Message Passage: data broadcast .................. 249.944 ( 0.5494 %)
Message Passage: data reduction .................. 24.198 ( 0.0532 %)
Message Passage: data gathering .................. 967.784 ( 2.1272 %)
Message Passage: data scattering.................. 128.240 ( 0.2819 %)
Message Passage: multi-model coupling ............ 10452.069 (22.9738 %)
Total: 16938.110 37.2302
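The percentages discussed for this grid can be reproduced directly from the profile above. The following Python sketch parses the "Message Passage" lines (copied from the 24-core profile of the finest grid) and reports the communication share of total time; the regular expression and variable names are illustrative choices, not part of ROMS.

```python
import re

# Message-passing section of the 24-core profile for the finest grid, as posted.
PROFILE = """\
Message Passage: 2D halo exchanges ............... 4357.292 ( 9.5774 %)
Message Passage: 3D halo exchanges ............... 564.511 ( 1.2408 %)
Message Passage: 4D halo exchanges ............... 194.072 ( 0.4266 %)
Message Passage: data broadcast .................. 249.944 ( 0.5494 %)
Message Passage: data reduction .................. 24.198 ( 0.0532 %)
Message Passage: data gathering .................. 967.784 ( 2.1272 %)
Message Passage: data scattering.................. 128.240 ( 0.2819 %)
Message Passage: multi-model coupling ............ 10452.069 (22.9738 %)
"""
TOTAL_TIME = 45495.571  # "All percentages are with respect to total time"

# Capture the entry name and its time in seconds from each profile line.
LINE = re.compile(r"Message Passage:\s*(.*?)\s*\.{3,}\s*(\d+\.\d+)\s*\(")

comm = {name: float(secs) for name, secs in LINE.findall(PROFILE)}
total_comm = sum(comm.values())
comm_fraction = total_comm / TOTAL_TIME
coupling_share = comm["multi-model coupling"] / total_comm

if __name__ == "__main__":
    print(f"communication: {total_comm:.3f} s = {comm_fraction:.1%} of total time")
    print(f"multi-model coupling is {coupling_share:.1%} of all communication")
```

The sum matches the 16938.110 s total printed in the profile, and most of that communication time is multi-model coupling rather than halo exchanges.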