one problem in run time

General scientific issues regarding ROMS

Moderators: arango, robertson

fereshteh
Posts: 79
Joined: Sun Dec 30, 2012 2:58 pm
Location: inio:Iranian National Institute for Oceanography

one problem in run time

#1 Unread post by fereshteh »

Dear all,
It is strange to me that increasing the number of processors from 16 to 24 (NnodesWAV = 12 and NnodesOCN = 12) improved the run time by only 2 hours (the runs take 12 and 10 hours, respectively). Is this reasonable? I do not know whether something is wrong in my model setup (the model is ROMS and SWAN coupled, with refinement across 4 grids) or in the source code.
How can I diagnose what the problem is (if there is one)?
I would appreciate any help in finding out whether the model is running in the best and correct way.
Your experience will help me. Thank you for your kindness.
Cheers,
fereshte
Attachments
log.txt
(211.9 KiB) Downloaded 333 times
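
The figures quoted in this post can be turned into a speedup and a parallel efficiency. The sketch below is only illustrative arithmetic on the numbers given above (16 cores at 12 hours, 24 cores at 10 hours); the function name `scaling_summary` is made up for this example.

```python
def scaling_summary(cores_before, hours_before, cores_after, hours_after):
    """Return (speedup, ideal_speedup, efficiency) for a fixed-size problem."""
    speedup = hours_before / hours_after   # how much faster the run actually got
    ideal = cores_after / cores_before     # speedup if scaling were perfect
    efficiency = speedup / ideal           # fraction of the ideal that was achieved
    return speedup, ideal, efficiency

# Numbers from the post: 16 cores -> 12 h, 24 cores -> 10 h
speedup, ideal, efficiency = scaling_summary(16, 12.0, 24, 10.0)
print(f"speedup {speedup:.2f}x, ideal {ideal:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency well below 100% for the added cores suggests that some cost (communication, coupling, I/O) is not shrinking as processes are added, which is what the profile output later in the thread is used to check.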

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: one problem in run time

#2 Unread post by kate »

You might try turning on the profile output as described here. Your grid 2 at least doesn't have enough grid points in it to support much parallelism. Also, I thought more than 2 grids wasn't working these days, unless you still have the older COAWST. Which version are you running?
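
To illustrate why a small grid cannot support much parallelism, here is a rough sketch of how the ghost-point overhead grows as tiles shrink. The grid sizes are hypothetical (the thread does not give them), the halo width of 2 is an assumed value, and `tile_stats` is a name invented for this example; edge tiles are treated like interior tiles, so the ratio is only approximate.

```python
def tile_stats(Lm, Mm, ntile_i, ntile_j, halo=2):
    """Approximate tile size and halo overhead for one tile.

    Lm, Mm    : interior grid points in the I and J directions
    ntile_i/j : number of tiles in each direction (NtileI, NtileJ)
    halo      : ghost-point width on each side (assumed value)
    """
    ni = Lm / ntile_i                      # average tile width in I
    nj = Mm / ntile_j                      # average tile width in J
    interior = ni * nj
    with_halo = (ni + 2 * halo) * (nj + 2 * halo)
    return ni, nj, (with_halo - interior) / interior

# Hypothetical grid sizes, both split 3 x 4 across 12 ocean processes
for name, (Lm, Mm) in {"large grid": (300, 400), "small grid": (30, 40)}.items():
    ni, nj, ratio = tile_stats(Lm, Mm, ntile_i=3, ntile_j=4)
    print(f"{name}: tile ~{ni:.0f} x {nj:.0f}, halo/interior = {ratio:.2f}")
```

With these made-up sizes, the small grid has almost as many ghost points as interior points per tile, so each process spends comparatively little time computing between exchanges.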

fereshteh
Posts: 79
Joined: Sun Dec 30, 2012 2:58 pm
Location: inio:Iranian National Institute for Oceanography

Re: one problem in run time

#3 Unread post by fereshteh »

kate wrote: You might try turning on the profile output
If you mean activating the time profile, I have activated it and then ran the same test (coupled and nested, 4 grids) on 24 cores and then on 16 cores spread over different PCs (each with an Intel® Core™ i7-4770K CPU @ 3.50GHz × 8). As you can see in the attached files, the run takes longer with 24 cores than with 16 cores: about 66 min versus 52 min. Also, for the same test with only 2 grids, the run on 16 cores takes about twice as long as the run on 24 cores. This is very strange. Please help me find the problem; I cannot find the reason.
Also, you are right: I am using an older version of COAWST (3.0).

(NtileI= 3, NtileJ=4) mpirun -np 24 -f ./machinefile ./coawstM coupling.in

Model Coupling: 
       Ocean Model MPI nodes: 000 - 011
       Waves Model MPI nodes: 012 - 023
 Process Information:
 Node #  8 (pid=   26691) is active.
 Node #  9 (pid=   26692) is active.
 Node # 10 (pid=   26693) is active.
 Node # 11 (pid=   26694) is active.
 Node #  0 (pid=   31850) is active.
 Node #  1 (pid=   31851) is active.
 Node #  2 (pid=   31852) is active.
 Node #  3 (pid=   31853) is active.
 Node #  4 (pid=   31854) is active.
 Node #  5 (pid=   31855) is active.
 Node #  6 (pid=   31856) is active.
 Node #  7 (pid=   31857) is active.
 Node #  0 (pid=   31850) is active.
 Node #  1 (pid=   31851) is active.
 Node #  2 (pid=   31852) is active.
 Node #  3 (pid=   31853) is active.
 Node #  4 (pid=   31854) is active.
 Node #  5 (pid=   31855) is active.
 Node #  6 (pid=   31856) is active.
 Node #  7 (pid=   31857) is active.
 Node #  8 (pid=   26691) is active.
 Node #  1 (pid=   31851) is active.
 Node #  9 (pid=   26692) is active.
 Node #  2 (pid=   31852) is active.
 Node # 10 (pid=   26693) is active.
 Node # 10 (pid=   26693) is active.
 Node #  3 (pid=   31853) is active.
 Node # 11 (pid=   26694) is active.
 Node # 11 (pid=   26694) is active.
 Node #  4 (pid=   31854) is active.
 Node #  8 (pid=   26691) is active.
 Node #  5 (pid=   31855) is active.
 Node #  9 (pid=   26692) is active.
 Node #  6 (pid=   31856) is active.
 Node #  7 (pid=   31857) is active.
 Node #  0 (pid=   31850) is active.
 Node #  1 (pid=   31851) is active.
 Node #  2 (pid=   31852) is active.
 Node #  3 (pid=   31853) is active.
 Node #  4 (pid=   31854) is active.
 Node #  5 (pid=   31855) is active.
 Node #  6 (pid=   31856) is active.
 Node #  7 (pid=   31857) is active.
 Node #  8 (pid=   26691) is active.
 Node #  0 (pid=   31850) is active.
 Node #  9 (pid=   26692) is active.
 Node # 10 (pid=   26693) is active.
 Node # 11 (pid=   26694) is active.
 Model Input Parameters:  ROMS/TOMS version 3.4  
                          Thursday - November 27, 2014 -  6:14:10 PM
.....
....
Elapsed CPU time (seconds):
 Node   #  8 CPU:    3790.685
 Node   #  9 CPU:    3788.949
 Node   # 10 CPU:    3805.866
 Node   # 11 CPU:    3769.084
 Node   #  0 CPU:    3764.847
 Node   #  1 CPU:    3803.442
 Node   #  2 CPU:    3809.394
 Node   #  3 CPU:    3802.290
 Node   #  4 CPU:    3770.040
 Node   #  5 CPU:    3784.465
 Node   #  6 CPU:    3808.146
 Node   #  7 CPU:    3799.149
 Total:             45496.355
 Nonlinear model elapsed time profile:
  Initialization ...................................      5000.781  (10.9916 %)
  OI data assimilation .............................         0.028  ( 0.0001 %)
  Reading of input data ............................        30.118  ( 0.0662 %)
  Processing of input data .........................        28.910  ( 0.0635 %)
  Processing of output time averaged data ..........         6.072  ( 0.0133 %)
  Computation of vertical boundary conditions ......         0.440  ( 0.0010 %)
  Computation of global information integrals ......         0.312  ( 0.0007 %)
  Writing of output data ...........................       164.778  ( 0.3622 %)
  Model 2D kernel ..................................       142.437  ( 0.3131 %)
  2D/3D coupling, vertical metrics .................         1.316  ( 0.0029 %)
  Omega vertical velocity ..........................         1.136  ( 0.0025 %)
  Equation of state for seawater ...................         1.740  ( 0.0038 %)
  GLS vertical mixing parameterization .............         4.032  ( 0.0089 %)
  3D equations right-side terms ....................        13.645  ( 0.0300 %)
  3D equations predictor step ......................         2.652  ( 0.0058 %)
  Pressure gradient ................................         1.260  ( 0.0028 %)
  Harmonic mixing of tracers, S-surfaces ...........         0.148  ( 0.0003 %)
  Harmonic stress tensor, S-surfaces ...............         0.316  ( 0.0007 %)
  Corrector time-step for 3D momentum ..............         4.508  ( 0.0099 %)
  Corrector time-step for tracers ..................         1.856  ( 0.0041 %)
                                              Total:      5406.486   11.8833
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............        76.425  ( 0.1680 %)
  Message Passage: 3D halo exchanges ...............        11.585  ( 0.0255 %)
  Message Passage: 4D halo exchanges ...............         5.100  ( 0.0112 %)
  Message Passage: data broadcast ..................       155.858  ( 0.3426 %)
  Message Passage: data reduction ..................         1.220  ( 0.0027 %)
  Message Passage: data gathering ..................        69.524  ( 0.1528 %)
  Message Passage: data scattering..................        41.379  ( 0.0909 %)
  Message Passage: multi-model coupling ............      4927.572  (10.8307 %)
                                              Total:      5288.663   11.6244
 All percentages are with respect to total time =        45496.355
 Node   #  0 CPU:    3764.771
 Node   #  1 CPU:    3803.334
 Node   #  2 CPU:    3809.294
 Node   #  3 CPU:    3802.198
 Node   #  4 CPU:    3769.940
 Node   #  5 CPU:    3784.369
 Node   #  6 CPU:    3808.046
 Node   #  7 CPU:    3799.053
 Node   #  8 CPU:    3790.689
 Node   #  9 CPU:    3788.949
 Node   # 10 CPU:    3805.866
 Node   # 11 CPU:    3769.100
 Total:             45495.607
 Nonlinear model elapsed time profile:
  Initialization ...................................       503.879  ( 1.1075 %)
  OI data assimilation .............................         0.036  ( 0.0001 %)
  Reading of input data ............................       675.958  ( 1.4858 %)
  Processing of input data .........................       732.962  ( 1.6111 %)
  Processing of output time averaged data ..........         9.121  ( 0.0200 %)
  Computation of vertical boundary conditions ......         0.616  ( 0.0014 %)
  Computation of global information integrals ......         0.872  ( 0.0019 %)
  Writing of output data ...........................        21.289  ( 0.0468 %)
  Model 2D kernel ..................................        76.397  ( 0.1679 %)
  2D/3D coupling, vertical metrics .................         2.364  ( 0.0052 %)
  Omega vertical velocity ..........................         1.428  ( 0.0031 %)
  Equation of state for seawater ...................         2.944  ( 0.0065 %)
  GLS vertical mixing parameterization .............         3.068  ( 0.0067 %)
  3D equations right-side terms ....................        11.097  ( 0.0244 %)
  3D equations predictor step ......................         1.296  ( 0.0028 %)
  Pressure gradient ................................         0.172  ( 0.0004 %)
  Harmonic mixing of tracers, S-surfaces ...........         0.020  ( 0.0000 %)
  Harmonic stress tensor, S-surfaces ...............         0.048  ( 0.0001 %)
  Corrector time-step for 3D momentum ..............         3.260  ( 0.0072 %)
  Corrector time-step for tracers ..................         8.925  ( 0.0196 %)
                                              Total:      2055.752    4.5186
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............        93.814  ( 0.2062 %)
  Message Passage: 3D halo exchanges ...............        21.513  ( 0.0473 %)
  Message Passage: 4D halo exchanges ...............        15.533  ( 0.0341 %)
  Message Passage: data broadcast ..................        43.819  ( 0.0963 %)
  Message Passage: data reduction ..................         1.896  ( 0.0042 %)
  Message Passage: data gathering ..................         3.676  ( 0.0081 %)
  Message Passage: data scattering..................         3.956  ( 0.0087 %)
  Message Passage: multi-model coupling ............      1038.989  ( 2.2837 %)
                                              Total:      1223.196    2.6886
 All percentages are with respect to total time =        45495.607
 Node   #  0 CPU:    3764.783
 Node   #  2 CPU:    3809.302
 Node   #  1 CPU:    3803.346
 Node   #  3 CPU:    3802.206
 Node   #  4 CPU:    3769.948
 Node   #  5 CPU:    3784.381
 Node   #  6 CPU:    3808.054
 Node   #  7 CPU:    3799.061
 Node   #  8 CPU:    3790.673
 Node   #  9 CPU:    3788.933
 Node   # 10 CPU:    3805.854
 Node   # 11 CPU:    3769.084
 Total:             45495.623
 Nonlinear model elapsed time profile:
  Initialization ...................................      1341.936  ( 2.9496 %)
  OI data assimilation .............................         0.036  ( 0.0001 %)
  Reading of input data ............................      8396.373  (18.4553 %)
  Processing of input data .........................      8537.314  (18.7651 %)
  Processing of output time averaged data ..........        31.646  ( 0.0696 %)
  Computation of vertical boundary conditions ......         3.656  ( 0.0080 %)
  Computation of global information integrals ......         5.324  ( 0.0117 %)
  Writing of output data ...........................        61.180  ( 0.1345 %)
  Model 2D kernel ..................................       761.656  ( 1.6741 %)
  2D/3D coupling, vertical metrics .................        12.093  ( 0.0266 %)
  Omega vertical velocity ..........................         9.481  ( 0.0208 %)
  Equation of state for seawater ...................        20.325  ( 0.0447 %)
  GLS vertical mixing parameterization .............        30.874  ( 0.0679 %)
  3D equations right-side terms ....................        81.801  ( 0.1798 %)
  3D equations predictor step ......................        14.309  ( 0.0315 %)
  Pressure gradient ................................         4.240  ( 0.0093 %)
  Harmonic mixing of tracers, S-surfaces ...........         0.720  ( 0.0016 %)
  Harmonic stress tensor, S-surfaces ...............         0.900  ( 0.0020 %)
  Corrector time-step for 3D momentum ..............        32.710  ( 0.0719 %)
  Corrector time-step for tracers ..................        49.007  ( 0.1077 %)
                                              Total:     19395.580   42.6317
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............       675.386  ( 1.4845 %)
  Message Passage: 3D halo exchanges ...............       136.072  ( 0.2991 %)
  Message Passage: 4D halo exchanges ...............       178.419  ( 0.3922 %)
  Message Passage: data broadcast ..................        84.773  ( 0.1863 %)
  Message Passage: data reduction ..................         6.952  ( 0.0153 %)
  Message Passage: data gathering ..................        35.338  ( 0.0777 %)
  Message Passage: data scattering..................        18.937  ( 0.0416 %)
  Message Passage: multi-model coupling ............      1553.393  ( 3.4144 %)
                                              Total:      2689.272    5.9111
 All percentages are with respect to total time =        45495.623
 Node   #  0 CPU:    3764.771
 Node   #  2 CPU:    3809.294
 Node   #  3 CPU:    3802.198
 Node   #  1 CPU:    3803.330
 Node   #  4 CPU:    3769.940
 Node   #  5 CPU:    3784.369
 Node   #  6 CPU:    3808.042
 Node   #  7 CPU:    3799.049
 Node   #  8 CPU:    3790.681
 Node   #  9 CPU:    3788.945
 Node   # 10 CPU:    3805.862
 Node   # 11 CPU:    3769.092
 Total:             45495.571
 Nonlinear model elapsed time profile:
  Initialization ...................................     10363.228  (22.7785 %)
  OI data assimilation .............................         0.116  ( 0.0003 %)
  Reading of input data ............................      2087.002  ( 4.5873 %)
  Processing of input data .........................      2220.687  ( 4.8811 %)
  Processing of output time averaged data ..........       186.704  ( 0.4104 %)
  Computation of vertical boundary conditions ......        16.649  ( 0.0366 %)
  Computation of global information integrals ......        26.202  ( 0.0576 %)
  Writing of output data ...........................       296.855  ( 0.6525 %)
  Model 2D kernel ..................................     11631.991  (25.5673 %)
  2D/3D coupling, vertical metrics .................        54.816  ( 0.1205 %)
  Omega vertical velocity ..........................        48.727  ( 0.1071 %)
  Equation of state for seawater ...................        82.713  ( 0.1818 %)
  GLS vertical mixing parameterization .............       284.102  ( 0.6245 %)
  3D equations right-side terms ....................       641.244  ( 1.4095 %)
  3D equations predictor step ......................       186.348  ( 0.4096 %)
  Pressure gradient ................................        63.140  ( 0.1388 %)
  Harmonic mixing of tracers, S-surfaces ...........        15.585  ( 0.0343 %)
  Harmonic stress tensor, S-surfaces ...............        32.722  ( 0.0719 %)
  Corrector time-step for 3D momentum ..............       264.573  ( 0.5815 %)
  Corrector time-step for tracers ..................        77.873  ( 0.1712 %)
                                              Total:     28581.274   62.8221
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............      4357.292  ( 9.5774 %)
  Message Passage: 3D halo exchanges ...............       564.511  ( 1.2408 %)
  Message Passage: 4D halo exchanges ...............       194.072  ( 0.4266 %)
  Message Passage: data broadcast ..................       249.944  ( 0.5494 %)
  Message Passage: data reduction ..................        24.198  ( 0.0532 %)
  Message Passage: data gathering ..................       967.784  ( 2.1272 %)
  Message Passage: data scattering..................       128.240  ( 0.2819 %)
  Message Passage: multi-model coupling ............     10452.069  (22.9738 %)
                                              Total:     16938.110   37.2302
 All percentages are with respect to total time =        45495.571
 ROMS/TOMS - Output NetCDF summary for Grid 01:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 ROMS/TOMS - Output NetCDF summary for Grid 02:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 ROMS/TOMS - Output NetCDF summary for Grid 03:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 ROMS/TOMS - Output NetCDF summary for Grid 04:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 Analytical header files used:
     ROMS/Functionals/ana_btflux.h
 ROMS/TOMS: DONE... Thursday - November 27, 2014 -  7:18:47 PM 
(NtileI= 2, NtileJ=4) mpirun -np 16 -f ./machinefile ./coawstM coupling.in

Model Coupling: 
       Ocean Model MPI nodes: 000 - 007
       Waves Model MPI nodes: 008 - 015
 Process Information:
SWAN grid   1 is preparing computation
 Node #  0 (pid=    6495) is active.
 Node #  1 (pid=    6496) is active.
 Node #  2 (pid=    6497) is active.
 Node #  4 (pid=    6499) is active.
 Node #  5 (pid=    6500) is active.
 Node #  6 (pid=    6501) is active.
 Node #  0 (pid=    6495) is active.
 Node #  0 (pid=    6495) is active.
 Node #  0 (pid=    6495) is active.
 Model Input Parameters:  ROMS/TOMS version 3.4  
                          Saturday - November 29, 2014 - 10:02:19 AM
........
.........
 Elapsed CPU time (seconds):
 Node   #  0 CPU:    2866.715
 Total:             23291.540
 Nonlinear model elapsed time profile:
  Initialization ...................................      2092.783  ( 8.9852 %)
  OI data assimilation .............................         0.004  ( 0.0000 %)
  Reading of input data ............................         1.248  ( 0.0054 %)
  Processing of input data .........................         0.660  ( 0.0028 %)
  Processing of output time averaged data ..........         3.048  ( 0.0131 %)
  Computation of vertical boundary conditions ......         0.012  ( 0.0001 %)
  Computation of global information integrals ......         0.084  ( 0.0004 %)
  Writing of output data ...........................        48.815  ( 0.2096 %)
  Model 2D kernel ..................................       138.245  ( 0.5935 %)
  2D/3D coupling, vertical metrics .................         0.168  ( 0.0007 %)
  Omega vertical velocity ..........................         0.240  ( 0.0010 %)
  Equation of state for seawater ...................         0.268  ( 0.0012 %)
  GLS vertical mixing parameterization .............         2.552  ( 0.0110 %)
  3D equations right-side terms ....................         6.296  ( 0.0270 %)
  3D equations predictor step ......................         2.500  ( 0.0107 %)
  Pressure gradient ................................         1.220  ( 0.0052 %)
  Harmonic mixing of tracers, S-surfaces ...........         0.356  ( 0.0015 %)
  Harmonic stress tensor, S-surfaces ...............         0.688  ( 0.0030 %)
  Corrector time-step for 3D momentum ..............         2.468  ( 0.0106 %)
  Corrector time-step for tracers ..................         1.236  ( 0.0053 %)
                                              Total:      2302.892    9.8872
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............        13.945  ( 0.0599 %)
  Message Passage: 3D halo exchanges ...............         0.716  ( 0.0031 %)
  Message Passage: 4D halo exchanges ...............         0.580  ( 0.0025 %)
  Message Passage: data broadcast ..................        43.507  ( 0.1868 %)
  Message Passage: data reduction ..................         0.108  ( 0.0005 %)
  Message Passage: data gathering ..................         5.600  ( 0.0240 %)
  Message Passage: data scattering..................         0.564  ( 0.0024 %)
  Message Passage: multi-model coupling ............      2094.919  ( 8.9943 %)
                                              Total:      2159.939    9.2735
 All percentages are with respect to total time =        23291.540
 Node   #  0 CPU:    2866.707
 Total:             23291.464
 Nonlinear model elapsed time profile:
  Initialization ...................................       155.950  ( 0.6696 %)
  Reading of input data ............................        13.237  ( 0.0568 %)
  Processing of input data .........................        17.889  ( 0.0768 %)
  Processing of output time averaged data ..........         0.788  ( 0.0034 %)
  Computation of vertical boundary conditions ......         0.008  ( 0.0000 %)
  Computation of global information integrals ......         0.016  ( 0.0001 %)
  Writing of output data ...........................         6.984  ( 0.0300 %)
  Model 2D kernel ..................................        14.013  ( 0.0602 %)
  2D/3D coupling, vertical metrics .................         0.068  ( 0.0003 %)
  Omega vertical velocity ..........................         0.052  ( 0.0002 %)
  Equation of state for seawater ...................         0.084  ( 0.0004 %)
  GLS vertical mixing parameterization .............         0.684  ( 0.0029 %)
  3D equations right-side terms ....................         1.692  ( 0.0073 %)
  3D equations predictor step ......................         0.812  ( 0.0035 %)
  Pressure gradient ................................         0.180  ( 0.0008 %)
  Harmonic mixing of tracers, S-surfaces ...........         0.064  ( 0.0003 %)
  Harmonic stress tensor, S-surfaces ...............         0.104  ( 0.0004 %)
  Corrector time-step for 3D momentum ..............         0.940  ( 0.0040 %)
  Corrector time-step for tracers ..................         0.396  ( 0.0017 %)
                                              Total:       213.961    0.9186
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............         4.052  ( 0.0174 %)
  Message Passage: 3D halo exchanges ...............         0.448  ( 0.0019 %)
  Message Passage: 4D halo exchanges ...............         0.448  ( 0.0019 %)
  Message Passage: data broadcast ..................         7.252  ( 0.0311 %)
  Message Passage: data reduction ..................         0.260  ( 0.0011 %)
  Message Passage: data gathering ..................         0.448  ( 0.0019 %)
  Message Passage: data scattering..................         0.012  ( 0.0001 %)
  Message Passage: multi-model coupling ............       834.144  ( 3.5813 %)
                                              Total:       847.065    3.6368
 All percentages are with respect to total time =        23291.464
 Node   #  0 CPU:    2866.707
 Total:             23291.460
 Nonlinear model elapsed time profile:
  Initialization ...................................       628.551  ( 2.6986 %)
  OI data assimilation .............................         0.004  ( 0.0000 %)
  Reading of input data ............................       167.775  ( 0.7203 %)
  Processing of input data .........................       176.711  ( 0.7587 %)
  Processing of output time averaged data ..........        15.045  ( 0.0646 %)
  Computation of vertical boundary conditions ......         0.172  ( 0.0007 %)
  Computation of global information integrals ......         0.380  ( 0.0016 %)
  Writing of output data ...........................        14.873  ( 0.0639 %)
  Model 2D kernel ..................................       644.328  ( 2.7664 %)
  2D/3D coupling, vertical metrics .................         1.096  ( 0.0047 %)
  Omega vertical velocity ..........................         1.108  ( 0.0048 %)
  Equation of state for seawater ...................         1.832  ( 0.0079 %)
  GLS vertical mixing parameterization .............        13.861  ( 0.0595 %)
  3D equations right-side terms ....................        30.438  ( 0.1307 %)
  3D equations predictor step ......................        15.265  ( 0.0655 %)
  Pressure gradient ................................         4.272  ( 0.0183 %)
  Harmonic mixing of tracers, S-surfaces ...........         1.508  ( 0.0065 %)
  Harmonic stress tensor, S-surfaces ...............         1.596  ( 0.0069 %)
  Corrector time-step for 3D momentum ..............        16.321  ( 0.0701 %)
  Corrector time-step for tracers ..................         6.800  ( 0.0292 %)
                                              Total:      1741.937    7.4789
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............       122.740  ( 0.5270 %)
  Message Passage: 3D halo exchanges ...............         9.305  ( 0.0399 %)
  Message Passage: 4D halo exchanges ...............         5.128  ( 0.0220 %)
  Message Passage: data broadcast ..................        14.653  ( 0.0629 %)
  Message Passage: data reduction ..................         0.032  ( 0.0001 %)
  Message Passage: data gathering ..................        18.177  ( 0.0780 %)
  Message Passage: data scattering..................         0.076  ( 0.0003 %)
  Message Passage: multi-model coupling ............      1073.223  ( 4.6078 %)
                                              Total:      1243.334    5.3382
 All percentages are with respect to total time =        23291.460
 Node   #  0 CPU:    2866.707
 Total:             23291.464
 Nonlinear model elapsed time profile:
  Initialization ...................................      4457.643  (19.1385 %)
  OI data assimilation .............................         0.008  ( 0.0000 %)
  Reading of input data ............................        85.221  ( 0.3659 %)
  Processing of input data .........................       149.581  ( 0.6422 %)
  Processing of output time averaged data ..........       228.878  ( 0.9827 %)
  Computation of vertical boundary conditions ......         2.900  ( 0.0125 %)
  Computation of global information integrals ......         8.529  ( 0.0366 %)
  Writing of output data ...........................        87.185  ( 0.3743 %)
  Model 2D kernel ..................................     11264.152  (48.3617 %)
  2D/3D coupling, vertical metrics .................        11.349  ( 0.0487 %)
  Omega vertical velocity ..........................        15.169  ( 0.0651 %)
  Equation of state for seawater ...................        21.441  ( 0.0921 %)
  GLS vertical mixing parameterization .............       224.462  ( 0.9637 %)
  3D equations right-side terms ....................       557.083  ( 2.3918 %)
  3D equations predictor step ......................       218.934  ( 0.9400 %)
  Pressure gradient ................................        65.376  ( 0.2807 %)
  Harmonic mixing of tracers, S-surfaces ...........        34.790  ( 0.1494 %)
  Harmonic stress tensor, S-surfaces ...............        53.955  ( 0.2317 %)
  Corrector time-step for 3D momentum ..............       195.736  ( 0.8404 %)
  Corrector time-step for tracers ..................        92.322  ( 0.3964 %)
                                              Total:     17774.715   76.3143
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............      1175.954  ( 5.0489 %)
  Message Passage: 3D halo exchanges ...............        78.409  ( 0.3366 %)
  Message Passage: 4D halo exchanges ...............        37.562  ( 0.1613 %)
  Message Passage: data broadcast ..................        78.609  ( 0.3375 %)
  Message Passage: data reduction ..................         1.708  ( 0.0073 %)
  Message Passage: data gathering ..................        43.431  ( 0.1865 %)
  Message Passage: data scattering..................         1.264  ( 0.0054 %)
  Message Passage: multi-model coupling ............      4798.112  (20.6003 %)
                                              Total:      6215.049   26.6838
 All percentages are with respect to total time =        23291.464
 ROMS/TOMS - Output NetCDF summary for Grid 01:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 ROMS/TOMS - Output NetCDF summary for Grid 02:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 ROMS/TOMS - Output NetCDF summary for Grid 03:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 Node   #  1 CPU:    2910.318
 Node   #  1 CPU:    2910.306
 Node   #  1 CPU:    2910.306
 Node   #  1 CPU:    2910.306
 Node   #  2 CPU:    2902.869
 Node   #  2 CPU:    2902.861
 Node   #  2 CPU:    2902.861
 Node   #  2 CPU:    2902.861
 Node   #  3 CPU:    2904.386
 Node   #  3 CPU:    2904.374
 Node   #  3 CPU:    2904.374
 Node   #  3 CPU:    2904.374
 Node   #  4 CPU:    2917.582
 Node   #  4 CPU:    2917.574
 Node   #  4 CPU:    2917.570
 Node   #  4 CPU:    2917.574
 Node   #  5 CPU:    2935.547
 Node   #  5 CPU:    2935.535
 Node   #  5 CPU:    2935.535
 Node   #  5 CPU:    2935.535
 Node   #  6 CPU:    2935.131
 Node   #  6 CPU:    2935.119
 Node   #  6 CPU:    2935.119
 Node   #  6 CPU:    2935.119
 Node   #  7 CPU:    2918.990
 Node   #  7 CPU:    2918.986
 Node   #  7 CPU:    2918.986
 Node   #  7 CPU:    2918.986
 ROMS/TOMS - Output NetCDF summary for Grid 04:
             number of time records written in HISTORY file = 00000002
             number of time records written in RESTART file = 00000001
             number of time records written in AVERAGE file = 00000001
 Analytical header files used:
     ROMS/Functionals/ana_btflux.h
 ROMS/TOMS: DONE... Saturday - November 29, 2014 - 10:52:24 AM
many thanks for your attention
cheers
fereshte
Attachments
16core.txt
coupled and nested model's output by using 16 cores
(160.63 KiB) Downloaded 315 times
24core.txt
coupled and nested model's output by using 24 cores
(165.33 KiB) Downloaded 319 times

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: one problem in run time

#4 Unread post by kate »

You are spending a significant fraction of the time in communications, so no, I would not expect great speedups when adding more processors.

One person reported much better performance from the ROMS_Agrif code over both the Rutgers ROMS and the COAWST ROMS when using the nesting. You might try that.
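
The communication share can be read off the profile totals posted in #3. The sketch below uses the totals from the last profile block of each log (the "Total" lines of the elapsed time profile and of the message Passage profile, plus the total time); the dictionary layout and the name `comm_fraction` are just for illustration.

```python
# Totals copied from the last profile block of each run in post #3
# (CPU seconds summed over the ocean processes).
runs = {
    "24 cores": {"compute": 28581.274, "message_passing": 16938.110, "total": 45495.571},
    "16 cores": {"compute": 17774.715, "message_passing": 6215.049, "total": 23291.464},
}

def comm_fraction(run):
    """Fraction of the reported total time spent in message passing."""
    return run["message_passing"] / run["total"]

for name, run in runs.items():
    print(f"{name}: {comm_fraction(run):.1%} of CPU time in message passing")
```

In both logs the largest single message-passing entry is "multi-model coupling", so the wait on the ROMS-SWAN exchange is worth examining alongside the halo exchanges.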

jivica
Posts: 172
Joined: Mon May 05, 2003 2:41 pm
Location: The University of Western Australia, Perth, Australia
Contact:

Re: one problem in run time

#5 Unread post by jivica »

Well, you can get a better interconnect between nodes to improve efficiency.
For example, last week I set up a cluster (4 x Dell R620 E5, 40 cores) using Mellanox FDR InfiniBand at 56 Gb/s (!!), and I get really nice low-latency, high-speed communication.
So, in short, you have hit the wall and you have to change the structure.
As Kate said, you can end up with threads so small that all you are exchanging are ghost points.
Good luck,
Ivica

arango
Site Admin
Posts: 1367
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University
Contact:

Re: one problem in run time

#6 Unread post by arango »

kate wrote: One person reported much better performance from the ROMS_Agrif code over both the Rutgers ROMS and the COAWST ROMS when using the nesting. You might try that.
Actually, this has a good explanation that most people are not aware of. The barotropic engine in either the UCLA ROMS or ROMS_Agrif is (I believe) much simpler than in our version of ROMS. It has fewer right-hand-side terms (like the stress tensor, horizontal advection, etc.). These terms are not resolved on the barotropic time scales and enter via the vertically integrated residual terms rufrc and rvfrc. The time-stepping is also different (forward/backward), and I believe that a larger baroclinic/barotropic time-step is possible. Well, this is what Sasha told me in the past. I actually have not seen the latest versions of their code in a very long time.

Since the barotropic engine (predictor and corrector steps) is the most expensive part of ROMS, you will get better performance if step2d has fewer terms, and the code becomes much faster. This strategy makes sense to me, and Sasha has carefully designed and tested it. This is on my to-do list. For me it is kind of complex because I will have to rewrite the adjoint and tangent linear (perturbation and finite amplitude transformations) versions of several routines. In addition, I will have to rework the adjoint of the time indices, which is not that trivial...

fereshteh
Posts: 79
Joined: Sun Dec 30, 2012 2:58 pm
Location: inio:Iranian National Institute for Oceanography

Re: one problem in run time

#7 Unread post by fereshteh »

Dear friends,
Many thanks for your replies.
"You are spending a significant fraction of the time in communications"
From which part of the output did you find this?

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: one problem in run time

#8 Unread post by kate »

For instance, here:

Code: Select all

Nonlinear model elapsed time profile:
  Initialization ...................................     10363.228  (22.7785 %)
  OI data assimilation .............................         0.116  ( 0.0003 %)
  Reading of input data ............................      2087.002  ( 4.5873 %)
  Processing of input data .........................      2220.687  ( 4.8811 %)
  Processing of output time averaged data ..........       186.704  ( 0.4104 %)
  Computation of vertical boundary conditions ......        16.649  ( 0.0366 %)
  Computation of global information integrals ......        26.202  ( 0.0576 %)
  Writing of output data ...........................       296.855  ( 0.6525 %)
  Model 2D kernel ..................................     11631.991  (25.5673 %)
  2D/3D coupling, vertical metrics .................        54.816  ( 0.1205 %)
  Omega vertical velocity ..........................        48.727  ( 0.1071 %)
  Equation of state for seawater ...................        82.713  ( 0.1818 %)
  GLS vertical mixing parameterization .............       284.102  ( 0.6245 %)
  3D equations right-side terms ....................       641.244  ( 1.4095 %)
  3D equations predictor step ......................       186.348  ( 0.4096 %)
  Pressure gradient ................................        63.140  ( 0.1388 %)
  Harmonic mixing of tracers, S-surfaces ...........        15.585  ( 0.0343 %)
  Harmonic stress tensor, S-surfaces ...............        32.722  ( 0.0719 %)
  Corrector time-step for 3D momentum ..............       264.573  ( 0.5815 %)
  Corrector time-step for tracers ..................        77.873  ( 0.1712 %)
                                              Total:     28581.274   62.8221
 Nonlinear model message Passage profile:
  Message Passage: 2D halo exchanges ...............      4357.292  ( 9.5774 %)
  Message Passage: 3D halo exchanges ...............       564.511  ( 1.2408 %)
  Message Passage: 4D halo exchanges ...............       194.072  ( 0.4266 %)
  Message Passage: data broadcast ..................       249.944  ( 0.5494 %)
  Message Passage: data reduction ..................        24.198  ( 0.0532 %)
  Message Passage: data gathering ..................       967.784  ( 2.1272 %)
  Message Passage: data scattering..................       128.240  ( 0.2819 %)
  Message Passage: multi-model coupling ............     10452.069  (22.9738 %)
                                              Total:     16938.110   37.2302
This is your finest, most expensive grid. You are spending nearly 23% of the time in initialization, so perhaps this is a short run. That fraction will only go down for longer runs, which means that 37% of the time communicating against 62% computing on this grid is really very significant. Well, the grid fractions add up to more than 100%, so I'm not positive how this works, but it still seems that about 23% in multi-model coupling and 9.6% in 2D halo exchanges has got to hurt.

The rule of thumb is that once you are spending 50% of your time in communications, there is absolutely no point in adding more processes to your job - you would instead slow it down.
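The arithmetic behind this can be sketched from the totals in the profile above. The snippet computes the communication share of the whole run, and then the share once initialization is set aside, as a stand-in for a much longer run. Treating initialization as pure non-communication time that vanishes in a long run is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Totals copied from the profile above, in the units printed there.
compute_total = 28581.274    # "Nonlinear model elapsed time profile" total
comm_total = 16938.110       # "message Passage profile" total
initialization = 10363.228   # "Initialization" line of the elapsed time profile


def comm_share(comm, compute):
    """Fraction of accounted time spent in message passing."""
    return comm / (comm + compute)


# Share over the whole run, as logged.
share_whole_run = comm_share(comm_total, compute_total)

# Share if the one-off initialization cost is excluded (assumption: a long
# run makes initialization negligible and it contains no communication).
share_without_init = comm_share(comm_total, compute_total - initialization)

print(f"communication share, whole run:      {share_whole_run:.1%}")
print(f"communication share, excluding init: {share_without_init:.1%}")
```

With these numbers the share is about 37% for the run as logged and about 48% once initialization is excluded, which puts this grid close to the 50% rule of thumb.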
