Dear Users,
Has anybody ever experienced the following problem? It happened in the sixth year of the model run. The model is configured with two one-way nested domains. I am not really sure, but I guess it could be related to memory handling or to the calculations carried out by the CPU. Thank you in advance for your kind help.
664885 2308 15:05:00 4.384476E-03 1.947892E+04 1.947893E+04 1.179191E+15 01
(071,100,32) 0.000000E+00 2.165410E-02 1.365106E+00 1.204906E+00
1994655 2308 15:05:00 4.166940E-03 9.709681E+03 9.709686E+03 2.807089E+13 02
(120,108,26) 1.272484E-02 1.510138E-03 7.822898E-02 8.277512E-01
1994656 2308 15:06:40 4.149592E-03 9.709604E+03 9.709608E+03 2.807119E+13 02
(120,108,26) 1.256059E-02 1.479506E-03 7.514652E-02 8.372069E-01
Blowing-up: Saving latest model state into RESTART file
WRT_RST - wrote re-start fields (Index=1,2) in record = 0000003 01
Blowing-up: Saving latest model state into RESTART file
Node # 30 CPU: 948932.114
WRT_RST - wrote re-start fields (Index=1,1) in record = 0000003 02
Elapsed CPU time (seconds):
Node # 0 CPU: 948231.009
Node # 22 CPU: 948987.058
Node # 1 CPU: 948848.587
Node # 2 CPU: 949108.055
Node # 3 CPU: 948931.968
Node # 4 CPU: 948258.722
Node # 5 CPU: 949246.523
Node # 6 CPU: 949428.057
Node # 7 CPU: 949589.333
Node # 8 CPU: 949136.765
Node # 9 CPU: 948345.301
Node # 10 CPU: 948864.655
Node # 11 CPU: 948901.377
Node # 12 CPU: 949084.637
Node # 13 CPU: 949021.165
Node # 14 CPU: 948570.448
Node # 15 CPU: 949175.940
Node # 16 CPU: 949067.197
Node # 17 CPU: 949038.260
Node # 18 CPU: 949279.973
Node # 19 CPU: 949803.201
Node # 20 CPU: 949101.272
Node # 21 CPU: 949068.425
Node # 31 CPU: 948373.140
Node # 23 CPU: 948167.371
Node # 24 CPU: 948593.581
Node # 25 CPU: 948825.095
Node # 26 CPU: 948294.588
Node # 27 CPU: 948344.970
Node # 28 CPU: 948572.975
Node # 29 CPU: 949112.105
Total: 30364303.869
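The step counter and the model clock in the diagnostic lines above can be cross-checked to see which DT produced them. A minimal sketch in Python, assuming both the step count and the model time started at zero (a restart with a nonzero starting step or time would break this), that divides elapsed model seconds by the step number:

```python
import re

# First line of each diag.F record, copied from the log above.
LOG = """\
664885 2308 15:05:00 4.384476E-03 1.947892E+04 1.947893E+04 1.179191E+15 01
1994655 2308 15:05:00 4.166940E-03 9.709681E+03 9.709686E+03 2.807089E+13 02
1994656 2308 15:06:40 4.149592E-03 9.709604E+03 9.709608E+03 2.807119E+13 02
"""

# STEP, Day, HH:MM:SS, four energy/volume columns, then the two-digit grid number.
PATTERN = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d{2}):(\d{2}):(\d{2})\s+.*\s(\d{2})\s*$")


def implied_dt(line):
    """Return (grid, seconds per step), assuming step and clock both started at zero."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"not a diagnostic line: {line!r}")
    step, day, hh, mm, ss, grid = (int(g) for g in m.groups())
    elapsed = day * 86400 + hh * 3600 + mm * 60 + ss
    return grid, elapsed / step


for line in LOG.splitlines():
    grid, dt = implied_dt(line)
    print(f"Grid {grid:02d}: {dt:g} s per step")
```

For the three lines copied here this gives 300 s for Grid 01 and 100 s for Grid 02, a 3:1 ratio.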
Cheers,
Farshid
CPU problem for the Nesting run
Re: CPU problem for the Nesting run
I've had plenty of single-grid domains blow up like that after some years of running; it could have nothing whatsoever to do with the nesting. I always look at the saved restart file(s) to see where and how things go bad, and I've made diag.F more verbose when things go bad to help find the trouble. Usually I can then restart with a shorter timestep and get through the trouble. If that doesn't work, that's when the real work begins.
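In the same spirit as making diag.F more verbose, the second line of each diagnostic record can be scanned for large Courant numbers. A sketch in Python, assuming (check this against the header at the top of your own log) that the four numbers after (i,j,k) are Cu, Cv, Cw and the maximum speed, as in the "C => (i,j,k) Cu Cv Cw Max Speed" header of recent ROMS versions. The limit of 1.0 is an illustrative threshold, not a ROMS rule:

```python
import re

# Log lines copied from the first post (both lines of each diag.F record).
LOG = """\
664885 2308 15:05:00 4.384476E-03 1.947892E+04 1.947893E+04 1.179191E+15 01
(071,100,32) 0.000000E+00 2.165410E-02 1.365106E+00 1.204906E+00
1994655 2308 15:05:00 4.166940E-03 9.709681E+03 9.709686E+03 2.807089E+13 02
(120,108,26) 1.272484E-02 1.510138E-03 7.822898E-02 8.277512E-01
1994656 2308 15:06:40 4.149592E-03 9.709604E+03 9.709608E+03 2.807119E+13 02
(120,108,26) 1.256059E-02 1.479506E-03 7.514652E-02 8.372069E-01
"""

# ASSUMPTION: "(i,j,k) Cu Cv Cw MaxSpeed" layout for the second line of each record.
COURANT_LINE = re.compile(r"^\s*\((\d+),(\d+),(\d+)\)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def courant_warnings(log_text, limit=1.0):
    """Yield (line_no, (i, j, k), name, value) for each Courant number >= limit."""
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        m = COURANT_LINE.match(line)
        if m is None:
            continue  # not an (i,j,k) line
        i, j, k = (int(g) for g in m.groups()[:3])
        cu, cv, cw, _speed = (float(g) for g in m.groups()[3:])
        for name, value in (("Cu", cu), ("Cv", cv), ("Cw", cw)):
            if value >= limit:
                yield line_no, (i, j, k), name, value


for warning in courant_warnings(LOG):
    print(warning)
```

On these six lines it flags only Cw = 1.365106 at (71,100,32), on the line following the Grid 01 record.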
Re: CPU problem for the Nesting run
Dear users,
I think the problem may be caused by the tiling configuration. This is what I have:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
I have 2 CPU sockets, each CPU has 8 cores, and each core can run 2 threads. The maximum thread count is 2 sockets x 8 cores x 2 threads per core = 32, while the number of physical cores is 2 x 8 = 16.
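The core and thread arithmetic can be written out explicitly and compared with the number of MPI processes. A small sketch, taking the process count from the "Node # 0" to "Node # 31" CPU-time listing in the first post:

```python
# Values copied from the lscpu output above.
SOCKETS = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2

physical_cores = SOCKETS * CORES_PER_SOCKET        # 2 x 8 = 16
logical_cpus = physical_cores * THREADS_PER_CORE   # 16 x 2 = 32

# Number of MPI processes, from the Node # 0 ... Node # 31 listing in the first post.
mpi_processes = 32
per_core = mpi_processes / physical_cores

print(f"physical cores: {physical_cores}, logical CPUs: {logical_cpus}")
print(f"{mpi_processes} MPI processes -> {per_core:g} per physical core")
```

So the run places two MPI processes on each physical core, one per hardware thread. Whether that helps or hurts throughput depends on the machine; it is generally a performance question rather than a correctness one.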
Lm == 114 120
Mm == 114 120
NtileI == 4 4
NtileJ == 8 8
A 4x8 partitioning of the 114x114 grid (Grid 01) gives an MPI subdomain size of 28.5x14.25, which does not look right, although it works out evenly for Grid 02 (30x15). Anyway, I would greatly appreciate any comments. Regards, Farshid
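Tiles are made of whole grid points, so a fractional size such as 28.5 only means the tiles cannot all be the same size. A sketch of a generic balanced split, where the first (points mod tiles) chunks get one extra point; this is an illustration and not necessarily the exact rule ROMS uses to set tile bounds:

```python
def balanced_split(n_points, n_tiles):
    """Split n_points interior points into n_tiles contiguous integer chunks.

    Generic balanced split: the first (n_points % n_tiles) chunks get one extra point.
    Not necessarily the exact rule ROMS uses to place tile bounds.
    """
    base, extra = divmod(n_points, n_tiles)
    return [base + 1 if t < extra else base for t in range(n_tiles)]


# (Lm, Mm) per grid, with NtileI = 4 and NtileJ = 8 as in the input above.
for grid, (lm, mm) in {"01": (114, 114), "02": (120, 120)}.items():
    xi = balanced_split(lm, 4)
    eta = balanced_split(mm, 8)
    print(f"Grid {grid}: xi chunks {xi}, eta chunks {eta}")
```

For Grid 01 this gives tiles of 29 or 28 points in xi and 15 or 14 in eta, so the imbalance is about one row or column of points per tile.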
Re: CPU problem for the Nesting run
That tiling will make things slightly inefficient, but it won't cause a blow-up. A blow-up from a misconfigured run happens in the first timestep, not after six years.
Re: CPU problem for the Nesting run
Dear Kate,
I've tried the following two timestep configurations:
NTIMES == 1036800 3110400
DT == 900.0d0 300.0d0
NDTFAST == 30 30
&
NTIMES == 3110400 9331200
DT == 300.0d0 100.0d0
NDTFAST == 50
The second one runs too slowly. What would you suggest as a suitable timestep here?
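The two configurations can be compared by simulated length and step counts. A sketch, assuming the single NDTFAST value in the second configuration applies to both grids, and using baroclinic steps x NDTFAST only as a rough proxy for barotropic work (these are step counts, not a wall-clock estimate):

```python
SECONDS_PER_DAY = 86400


def summarize(grids):
    """grids: list of (NTIMES, DT in seconds, NDTFAST) tuples, parent grid first.

    Returns (simulated days per grid, parent/child DT ratio,
             total baroclinic steps, total baroclinic steps x NDTFAST).
    """
    days = [ntimes * dt / SECONDS_PER_DAY for ntimes, dt, _ in grids]
    ratio = grids[0][1] / grids[-1][1]
    slow_steps = sum(ntimes for ntimes, _, _ in grids)
    fast_steps = sum(ntimes * ndtfast for ntimes, _, ndtfast in grids)
    return days, ratio, slow_steps, fast_steps


config_a = [(1036800, 900.0, 30), (3110400, 300.0, 30)]
# ASSUMPTION: NDTFAST == 50 is taken to apply to both grids.
config_b = [(3110400, 300.0, 50), (9331200, 100.0, 50)]

a = summarize(config_a)
b = summarize(config_b)
print("A:", a)
print("B:", b)
print(f"B/A baroclinic steps: {b[2] / a[2]:g}, baroclinic x NDTFAST: {b[3] / a[3]:g}")
```

Both setups cover 10800 days on each grid with a 3:1 parent/child ratio, but the second needs 3 times as many baroclinic steps and, under the NDTFAST assumption, about 5 times as much barotropic stepping, which is consistent with it running much slower.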
Regards, Farshid
Re: CPU problem for the Nesting run
The timestep that worked for six years should be close. If that was the 900/300 pair, then try 840/280 or 810/270.
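Both suggested pairs keep the 3:1 parent/child ratio. Before editing the input file, it may help to check what NTIMES each choice implies for the same run length. A sketch, assuming the 10800-day length of the 900/300 configuration is kept and that NTIMES must be a whole number of steps:

```python
SECONDS_PER_DAY = 86400
# Run length of the 900/300 configuration: parent NTIMES x DT (10800 days).
RUN_SECONDS = 1036800 * 900


def check_pair(parent_dt, child_dt, run_seconds=RUN_SECONDS):
    """Check a (parent, child) DT pair for the same run length.

    Reports the refinement ratio and the NTIMES each grid would need;
    a non-integer NTIMES means the run length would have to be rounded.
    """
    report = {"ratio": parent_dt / child_dt}
    for label, dt in (("parent", parent_dt), ("child", child_dt)):
        ntimes = run_seconds / dt
        report[label] = {
            "ntimes": ntimes,
            "whole_ntimes": ntimes.is_integer(),
            "steps_per_day": SECONDS_PER_DAY / dt,
        }
    return report


for parent_dt, child_dt in [(900.0, 300.0), (840.0, 280.0), (810.0, 270.0)]:
    r = check_pair(parent_dt, child_dt)
    print(f"{parent_dt:g}/{child_dt:g}: ratio {r['ratio']:g}, "
          f"NTIMES {r['parent']['ntimes']:.2f} / {r['child']['ntimes']:.2f}")
```

With these numbers, 810/270 gives whole NTIMES values (1152000 and 3456000), while 840/280 does not (about 1110857.14 and 3332571.43), so NTIMES would have to be rounded for that pair. If the run that reached year six used a different pair, the same check applies with those values.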