What do you mean by the Lanczos algorithm? Are you talking about the minimization routine,
rpcg_lanzos (RPCG)? The Lanczos vectors are stored in a NetCDF file and use the adjoint model iterations over all outer and inner loops.
I recently profiled WC13 with W4DPSAS (Nouter=2 and Ninner=50) on 4 CPUs on my Mac:
Elapsed CPU time (seconds):
Node # 0 CPU: 2734.718
Node # 1 CPU: 2735.069
Node # 2 CPU: 2735.295
Node # 3 CPU: 2735.149
Total: 22543.769
Nonlinear model elapsed time profile, Grid: 01
Allocation and array initialization .............. 0.094 ( 0.0009 %)
Ocean state initialization ....................... 0.015 ( 0.0001 %)
Reading of input data ............................ 0.450 ( 0.0041 %)
Processing of input data ......................... 0.596 ( 0.0055 %)
Computation of vertical boundary conditions ...... 0.086 ( 0.0008 %)
Computation of global information integrals ...... 1.635 ( 0.0149 %)
Writing of output data ........................... 1.442 ( 0.0132 %)
Model 2D kernel .................................. 37.881 ( 0.3462 %)
2D/3D coupling, vertical metrics ................. 76.908 ( 0.7030 %)
Omega vertical velocity .......................... 60.869 ( 0.5564 %)
Equation of state for seawater ................... 118.920 ( 1.0870 %)
Atmosphere-Ocean bulk flux parameterization ...... 0.971 ( 0.0089 %)
GLS vertical mixing parameterization ............. 28.591 ( 0.2613 %)
3D equations right-side terms .................... 3.249 ( 0.0297 %)
3D equations predictor step ...................... 7.552 ( 0.0690 %)
Pressure gradient ................................ 2.072 ( 0.0189 %)
Harmonic mixing of tracers, geopotentials ........ 3.532 ( 0.0323 %)
Harmonic stress tensor, S-surfaces ............... 1.700 ( 0.0155 %)
Corrector time-step for 3D momentum .............. 3.660 ( 0.0335 %)
Corrector time-step for tracers .................. 3.338 ( 0.0305 %)
Total: 353.561 3.2317
Tangent linear model elapsed time profile, Grid: 01
Ocean state initialization ....................... 0.567 ( 0.0052 %)
Reading of input data ............................ 41.545 ( 0.3797 %)
Processing of input data ......................... 183.269 ( 1.6752 %)
Computation of vertical boundary conditions ...... 2.944 ( 0.0269 %)
Computation of global information integrals ...... 32.821 ( 0.3000 %)
Writing of output data ........................... 2.365 ( 0.0216 %)
Model 2D kernel .................................. 2042.182 (18.6667 %)
2D/3D coupling, vertical metrics ................. 63.879 ( 0.5839 %)
Omega vertical velocity .......................... 88.458 ( 0.8086 %)
Equation of state for seawater ................... 249.450 ( 2.2801 %)
3D equations right-side terms .................... 177.238 ( 1.6201 %)
3D equations predictor step ...................... 431.872 ( 3.9476 %)
Pressure gradient ................................ 119.952 ( 1.0964 %)
Harmonic mixing of tracers, geopotentials ........ 137.110 ( 1.2533 %)
Harmonic stress tensor, S-surfaces ............... 77.808 ( 0.7112 %)
Corrector time-step for 3D momentum .............. 273.983 ( 2.5044 %)
Corrector time-step for tracers .................. 293.129 ( 2.6794 %)
Total: 4218.573 38.5602
Adjoint model elapsed time profile, Grid: 01
Ocean state initialization ....................... 0.425 ( 0.0039 %)
Reading of input data ............................ 42.638 ( 0.3897 %)
Processing of input data ......................... 197.069 ( 1.8013 %)
Computation of vertical boundary conditions ...... 3.994 ( 0.0365 %)
Computation of global information integrals ...... 94.125 ( 0.8604 %)
Writing of output data ........................... 4.823 ( 0.0441 %)
Model 2D kernel .................................. 3047.378 (27.8548 %)
2D/3D coupling, vertical metrics ................. 218.972 ( 2.0015 %)
Omega vertical velocity .......................... 127.261 ( 1.1632 %)
Equation of state for seawater ................... 170.072 ( 1.5546 %)
3D equations right-side terms .................... 260.863 ( 2.3844 %)
3D equations predictor step ...................... 583.112 ( 5.3300 %)
Pressure gradient ................................ 169.630 ( 1.5505 %)
Harmonic mixing of tracers, geopotentials ........ 532.145 ( 4.8641 %)
Harmonic stress tensor, S-surfaces ............... 86.664 ( 0.7922 %)
Corrector time-step for 3D momentum .............. 363.944 ( 3.3267 %)
Corrector time-step for tracers .................. 305.798 ( 2.7952 %)
Total: 6208.914 56.7531
Nonlinear model message Passage profile, Grid: 01
Message Passage: 2D halo exchanges ............... 9.305 ( 0.0851 %)
Message Passage: 3D halo exchanges ............... 61.940 ( 0.5662 %)
Message Passage: 4D halo exchanges ............... 1.017 ( 0.0093 %)
Message Passage: lateral boundary exchanges ...... 1.622 ( 0.0148 %)
Message Passage: data broadcast .................. 3.228 ( 0.0295 %)
Message Passage: data reduction .................. 0.101 ( 0.0009 %)
Message Passage: data gathering .................. 0.308 ( 0.0028 %)
Message Passage: data scattering.................. 1.469 ( 0.0134 %)
Message Passage: point data gathering ............ 0.073 ( 0.0007 %)
Total: 79.063 0.7227
Tangent linear model message Passage profile, Grid: 01
Message Passage: 2D halo exchanges ............... 285.993 ( 2.6141 %)
Message Passage: 3D halo exchanges ............... 101.907 ( 0.9315 %)
Message Passage: 4D halo exchanges ............... 56.638 ( 0.5177 %)
Message Passage: lateral boundary exchanges ...... 6.171 ( 0.0564 %)
Message Passage: data broadcast .................. 42.248 ( 0.3862 %)
Message Passage: data reduction .................. 2.566 ( 0.0235 %)
Message Passage: data gathering .................. 0.822 ( 0.0075 %)
Message Passage: data scattering.................. 9.092 ( 0.0831 %)
Message Passage: point data gathering ............ 0.589 ( 0.0054 %)
Total: 506.028 4.6254
Adjoint model message Passage profile, Grid: 01
Message Passage: 2D halo exchanges ............... 392.205 ( 3.5850 %)
Message Passage: 3D halo exchanges ............... 163.995 ( 1.4990 %)
Message Passage: 4D halo exchanges ............... 58.984 ( 0.5392 %)
Message Passage: lateral boundary exchanges ...... 1.265 ( 0.0116 %)
Message Passage: data broadcast .................. 34.569 ( 0.3160 %)
Message Passage: data reduction .................. 3.558 ( 0.0325 %)
Message Passage: data gathering .................. 5.227 ( 0.0478 %)
Message Passage: data scattering.................. 9.032 ( 0.0826 %)
Message Passage: point data gathering ............ 1.186 ( 0.0108 %)
Total: 670.022 6.1244
All percentages are with respect to total time = 10940.230
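If you want to rank the entries in a listing like the one above, a small script can do it. The following is only a sketch in Python, not part of ROMS: the regular expressions and the function name parse_profile are my own assumptions about the layout of the printed profile, and the sample contains just a few lines copied from the output above.

```python
import re

# Assumed layout of one entry, e.g.
# "Model 2D kernel ........ 2042.182 (18.6667 %)"
ENTRY = re.compile(
    r"^(?P<label>.+?)\s*\.{2,}\s*(?P<secs>\d+\.\d+)"
    r"\s*\(\s*(?P<pct>\d+\.\d+)\s*%\)\s*$"
)
# Assumed layout of a section header, e.g.
# "Adjoint model elapsed time profile, Grid: 01"
SECTION = re.compile(r"^(?P<name>.+?) profile, Grid: \d+\s*$")


def parse_profile(text):
    """Return (section, label, seconds, percent) tuples from the profile text."""
    rows, section = [], None
    for line in text.splitlines():
        line = line.strip()
        if (m := SECTION.match(line)):
            section = m["name"]
        elif (m := ENTRY.match(line)):
            rows.append((section, m["label"], float(m["secs"]), float(m["pct"])))
    return rows


# A few lines copied from the profile above.
sample = """\
Tangent linear model elapsed time profile, Grid: 01
Model 2D kernel .................................. 2042.182 (18.6667 %)
3D equations predictor step ...................... 431.872 ( 3.9476 %)
Adjoint model elapsed time profile, Grid: 01
Model 2D kernel .................................. 3047.378 (27.8548 %)
Harmonic mixing of tracers, geopotentials ........ 532.145 ( 4.8641 %)
Adjoint model message Passage profile, Grid: 01
Message Passage: data scattering.................. 9.032 ( 0.0826 %)
"""

rows = parse_profile(sample)
# Sort by elapsed seconds, largest first, and keep the three most expensive.
top = sorted(rows, key=lambda r: r[2], reverse=True)[:3]
for section, label, secs, pct in top:
    print(f"{secs:10.3f} s  {pct:7.4f} %  {section}: {label}")
```

Run on the full listing, this puts the adjoint and tangent linear 2D kernels at the top.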
We need to be careful when changing ROMS profiling because of how the MPI phase is profiled. You cannot just add another number to the
wclock routines. I am currently looking at this. Let me know what you want to separate and I will take a look at it. The routine
rpcg_lanzos is called only a few times in 4DVAR, and I think its impact is minimal compared with the tangent linear and adjoint models, which are each iterated 100 times (Nouter x Ninner = 2 x 50) in the above statistics.
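To put numbers on that, here is a quick check in Python using only the section totals from the profile above. It is a rough sketch: the variable names are mine, and the per-iteration figures are plain averages that assume each kernel total is spread evenly over the 100 iterations.

```python
# Section totals copied from the profile above (CPU seconds).
kernel_totals = {
    "nonlinear": 353.561,
    "tangent_linear": 4218.573,
    "adjoint": 6208.914,
}
total_time = 10940.230  # "All percentages are with respect to total time"

nouter, ninner = 2, 50
iterations = nouter * ninner  # tangent linear and adjoint runs, 100 each


def share(seconds, total=total_time):
    """Percentage of the total time taken by `seconds`."""
    return 100.0 * seconds / total


tl_ad_seconds = kernel_totals["tangent_linear"] + kernel_totals["adjoint"]
tl_ad_share = share(tl_ad_seconds)

for name, seconds in kernel_totals.items():
    print(f"{name:15s} {seconds:10.3f} s  {share(seconds):7.4f} %")
print(f"TLM + ADM share: {tl_ad_share:.2f} %")

# Average cost per iteration (assumes an even split over all iterations).
for name in ("tangent_linear", "adjoint"):
    print(f"{name:15s} {kernel_totals[name] / iterations:8.3f} s per iteration")
```

The tangent linear and adjoint kernels together account for about 95% of the total, which is why a routine called only a few times is unlikely to show up in the profile.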