problem with perfect restart
Hi all,
I found a possible bug with the PERFECT_RESTART option in ROMS.
Without PERFECT_RESTART, we can write records to the restart NetCDF file continuously (say, up to the 5th or 10th record).
With PERFECT_RESTART defined, however, ROMS cannot write more than 3 records to the rst NetCDF file.
The error message is
'WRT_RST - error while writing variable Hsbl into restart Netcdf file for time record : 4'
I'm using svn 453.
Is there any solution for this error?
Regards,
-Peter
Joonho Lee
Re: problem with perfect restart
It sounds like you're running into a file size limit. PERFECT_RESTART makes the restart files bigger because you need to save more of the state of the model. Are your files 2 GB in size when you get into trouble?
Re: problem with perfect restart
Hi, Kate. Thanks for the reply.
My rst file size is only 504 MB when it stops.
I have just found out that the problem is related to the vertical mixing scheme.
When I ran the model with LMD and PERFECT_RESTART activated, it couldn't save more than 3 records.
However, when I defined a different vertical mixing scheme (say gen) together with PERFECT_RESTART, it went through.
Any suggestions?
Best,
-Peter
Joonho Lee
Re: problem with perfect restart
Hi susonic and Kate,
I get the same result from my run. Without the PERFECT_RESTART option the run goes smoothly, but with it turned on the run fails within a few steps.
So far I have no idea what causes it. Since susonic has reported the same thing, I take it the problem is not mine alone.
susonic wrote:
The error message is
'WRT_RST - error while writing variable Hsbl into restart Netcdf file for time record : 4'
The only difference from susonic's case is that the variable in my error message is "wetdry_mask_rho". I thought there might also be an error in my mask, but I have checked it over and over again and it looks normal.
So, how can I get rid of this error?
Thanks in advance; any suggestion is appreciated!
-
- Posts: 15
- Joined: Tue May 06, 2008 8:46 pm
- Location: FURG
Re: problem with perfect restart
Hello,
I think I have found this bug.
It is all about LMD_SKPP or LMD_BKPP being defined together with PERFECT_RESTART.
If the user sets LMD_MIXING + (LMD_SKPP || LMD_BKPP) + PERFECT_RESTART + LcycleRST == F, an error occurs when writing the 4th time record of the rst file.
If only LMD_BKPP is defined:
WRT_RST - error while writing variable: Hbbl
into restart NetCDF file for time record: 4
If only LMD_SKPP is defined, or both:
WRT_RST - error while writing variable: Hsbl
into restart NetCDF file for time record: 4
This can easily be reproduced by running the wc13 test case, replacing GLS_MIXING with LMD + SKPP or BKPP (and LcycleRST = F). Without SKPP and BKPP the model writes everything fine and gets past the 4th record. So the bug seems to be in the dimensions of these variables when defining/writing the restart file...
Looking inside def_rst.F:
Code: Select all
      status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHsbl),       &
     &               NF_FRST, nvd3, t2dgrd, Aval, Vinfo, ncname)
      status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHbbl),       &
     &               NF_FRST, nvd3, t2dgrd, Aval, Vinfo, ncname)
So I think these fields are expected to have the same shape as the free surface, and the input to def_var should be:
Code: Select all
      status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHsbl),       &
     &               NF_FRST, nvd4, t2dgrd, Aval, Vinfo, ncname)
      status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHbbl),       &
     &               NF_FRST, nvd4, t2dgrd, Aval, Vinfo, ncname)
The variable nvd4 points to 4 dimensions (time, 3, xi, eta). But after compiling and running the model again, the same error occurs, now with NaNs inside the variables (ocean_time++), maybe because of some shape mismatch around these variables.
This behaviour made me doubt whether these variables are written correctly in the default code. In the default code, when 2 time records are written to the rst file, the first write stores all variables correctly (zeta, u, v, temp, etc.), while Hsbl and Hbbl are written with only 3 dimensions (three, eta, xi), without the unlimited ocean_time dimension. So on the next write all the other fields go to the next ocean_time record, while Hsbl and Hbbl go to the second index of the "three" dimension (not ocean_time), and so on until the 4th write, when the code complains about writing beyond the size of the variable.
So these variables are written incorrectly. The cause is probably their shape, or a missing redirect to a new shape that stores the 3 time levels (dimension "three"), etc. Maybe the WET_DRY case is similar.
Actually, it seems that the variables Hsbl and Hbbl inside the restart file are not used to restart the model. If the user saves only 1 record in the restart file, removes these fields from it, and restarts the model with this file, no error is raised about the missing variables.
So I think that, for now, it is safe to remove the writing of these variables to the restart file in order to allow runs with LcycleRST = F.
Sorry about the long text...
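The failure mode described above can be mimicked without NetCDF at all. The following is a minimal pure-Python sketch, not ROMS code: the `Variable` class and its `write_record` method are hypothetical stand-ins for a NetCDF variable whose leading dimension is either the fixed `three` (size 3) or the unlimited `ocean_time`. Assuming each restart write targets the next index along the leading dimension, the fixed-size variable accepts records 1 to 3 and rejects record 4.

```python
class Variable:
    """Hypothetical stand-in for a NetCDF variable (not a real API)."""

    def __init__(self, name, leading_dim, size=None):
        self.name = name
        self.leading_dim = leading_dim
        self.size = size          # None means an unlimited dimension
        self.records = {}

    def write_record(self, rec, data):
        # Mimic bounds checking: a 1-based record index must fit inside
        # a fixed dimension; an unlimited dimension grows on demand.
        if self.size is not None and rec > self.size:
            raise IndexError(
                f"{self.name}: index {rec} exceeds dimension "
                f"'{self.leading_dim}' of size {self.size}"
            )
        self.records[rec] = data


def first_failing_record(var, nrecords):
    """Return the first record that cannot be written, or None."""
    for rec in range(1, nrecords + 1):
        try:
            var.write_record(rec, data=0.0)
        except IndexError:
            return rec
    return None


# Hsbl as defined with nvd3: the leading dimension is "three" (size 3).
hsbl = Variable("Hsbl", "three", size=3)
# zeta: the leading dimension is the unlimited "ocean_time".
zeta = Variable("zeta", "ocean_time")

print(first_failing_record(hsbl, 10))  # 4
print(first_failing_record(zeta, 10))  # None
```

This only illustrates why the error shows up at "time record: 4"; it says nothing about what the correct definition in def_rst.F should be.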
-
- Posts: 128
- Joined: Tue Feb 01, 2005 8:21 pm
- Location: Istanbul Technical University (ITU)
Re: problem with perfect restart
Hi,
I have the same problem:
WRT_RST - error while writing variable: wetdry_mask_rho
into restart NetCDF file for time record: 4
In my configuration the following CPP options are defined:
Code: Select all
....
#define PERFECT_RESTART
#define WET_DRY
....
I also set the restart file to be written once a day, with the following options:
Code: Select all
LcycleRST == F
NRST == 288
Is there any solution for this problem? The model can only write three time records into the restart file, and it is not a file size issue (the size is around 180 MB). It might be solved by removing PERFECT_RESTART, but I want to keep using it to be consistent with my previous runs.
Regards,
--ufuk
Re: problem with perfect restart
Things to try:
1. Set LcycleRST to T
2. Find out what's really going wrong. What is the netcdf error code when the failure happens? ROMS can be modified to print this out on error. In mod_netcdf.F, I added
Code: Select all
PRINT *, trim(nf90_strerror(status))
to each block inside:
Code: Select all
IF (status.ne.nf90_noerr) THEN
-
- Posts: 128
- Joined: Tue Feb 01, 2005 8:21 pm
- Location: Istanbul Technical University (ITU)
Re: problem with perfect restart
Hi Kate,
I added the print statement inside the IF statement. The code complains about an "index", as follows:
Code: Select all
NetCDF: Index exceeds dimension bound
WRT_RST - error while writing variable: wetdry_mask_rho
into restart NetCDF file for time record: 4
I think I have found the problem. In the ROMS/Utility/def_rst.F file there is a definition like this:
Code: Select all
!
!  Set unlimited time record dimension to current value.
!
      IF (LcycleRST(ng)) THEN
        RST(ng)%Rindex=0
      ELSE
        RST(ng)%Rindex=rec_size
      END IF
So, if I set LcycleRST to true, it will create the file with an unlimited time dimension, but in my case I want to set it true. What do you think? BTW, this is the ice branch.
Regards,
--ufuk
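One possible reason why the earlier suggestion to set LcycleRST to T sidesteps the error: with cycling on, ROMS keeps only the latest two restart records, so the record index never reaches 4. The sketch below is plain Python, not ROMS code. The function name `record_indices` is hypothetical, and the alternating 1, 2, 1, 2 pattern is an assumption based on that documented behaviour rather than a transcription of the ROMS source.

```python
def record_indices(nwrites, cycle):
    """Yield the 1-based restart record index used for each write.

    Assumption: with cycle=True the writes alternate between
    records 1 and 2; with cycle=False the index keeps growing.
    """
    rindex = 0
    for _ in range(nwrites):
        rindex += 1
        if cycle:
            rindex = (rindex - 1) % 2 + 1
        yield rindex


FIXED_DIM_SIZE = 3  # size of the "three" dimension

for cycle in (True, False):
    indices = list(record_indices(6, cycle))
    too_big = [i for i in indices if i > FIXED_DIM_SIZE]
    print(cycle, indices, too_big)
# True [1, 2, 1, 2, 1, 2] []
# False [1, 2, 3, 4, 5, 6] [4, 5, 6]
```

Under this assumption the cycling run stays within the size-3 dimension by accident, which would hide the problem rather than fix it.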
Re: problem with perfect restart
Thanks for pointing out that this is the ice branch, but of course we want to know if the trunk also has the same issue.
The time dimension should be unlimited no matter how you set LcycleRST. I take it you meant that you want LcycleRST to be false?
I tried a simple 2-D problem with LcycleRST set to false and WET_DRY #defined. I got up to 9 restart records before I killed it. The only weirdness I experienced is that zeta has values inside the land mask. I don't know what to suggest.
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
Re: problem with perfect restart
I doubt that it is possible to have a perfect restart with wetting and drying with the current design. In wetting and drying, the free-surface and the land/sea mask are changing at every time-step! So we would need the land/sea masking for three consecutive time-steps averaged over all barotropic time-steps... We cannot reproduce the changes in land/sea masking for every barotropic time-step. We would be able to get a restart but it won't be a perfect restart. But I wonder if that really matters...
Re: problem with perfect restart
I tried again with PERFECT_RESTART and I can reproduce the problem. In this case, we have:
Code: Select all
	float wetdry_mask_rho(three, eta_rho, xi_rho) ;
		wetdry_mask_rho:long_name = "wet/dry mask on RHO-points" ;
		wetdry_mask_rho:flag_values = 0.f, 1.f ;
		wetdry_mask_rho:flag_meanings = "land water" ;
		wetdry_mask_rho:time = "ocean_time" ;
		wetdry_mask_rho:coordinates = "lon_rho lat_rho ocean_time" ;
		wetdry_mask_rho:field = "wetdry_mask_rho, scalar, series" ;
	double zeta(ocean_time, three, eta_rho, xi_rho) ;
		zeta:long_name = "free-surface" ;
		zeta:units = "meter" ;
		zeta:time = "ocean_time" ;
		zeta:coordinates = "lon_rho lat_rho ocean_time" ;
		zeta:field = "free-surface, scalar, series" ;
Note that the dimensions of the wetdry masks have a "three" in them, so the write fails on the fourth record. This should be "ocean_time". The problem comes about in def_rst.F, in which t2dgrd is correct for zeta, but is used for both zeta and the mask.
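A header like the one above can also be screened automatically. The following is a rough sketch using only the Python standard library: it scans CDL text (as printed by `ncdump -h`) for variables that carry a `time` attribute but do not list that dimension in their declaration. The function name `missing_time_dimension` and the trimmed `CDL` sample are illustrative, and the regular expressions assume the simple one-declaration-per-line layout shown in this thread.

```python
import re

# Trimmed sample of the header shown in this thread.
CDL = """
float wetdry_mask_rho(three, eta_rho, xi_rho) ;
    wetdry_mask_rho:time = "ocean_time" ;
double zeta(ocean_time, three, eta_rho, xi_rho) ;
    zeta:time = "ocean_time" ;
"""

# Variable declaration: type name(dim1, dim2, ...) ;
DECL = re.compile(
    r"^\s*(?:float|double|int|short|byte|char)\s+(\w+)\(([^)]*)\)\s*;",
    re.M,
)
# Attribute of the form  name:time = "dimension" ;
TIME_ATT = re.compile(r'^\s*(\w+):time\s*=\s*"(\w+)"\s*;', re.M)


def missing_time_dimension(cdl):
    """List variables whose 'time' attribute names a dimension they lack."""
    dims = {
        name: [d.strip() for d in dimlist.split(",")]
        for name, dimlist in DECL.findall(cdl)
    }
    bad = []
    for name, time_dim in TIME_ATT.findall(cdl):
        if name in dims and time_dim not in dims[name]:
            bad.append(name)
    return bad


print(missing_time_dimension(CDL))  # ['wetdry_mask_rho']
```

Running the same check on the full restart header would be one way to spot other fields defined with the wrong grid type.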
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
Re: problem with perfect restart
OK, thank you for looking at it. I will add it to several corrections that I am currently making and will update the repository early next week.
Re: problem with perfect restart
Hi, all
I know this isn't a new subject in the forum, but I can't work out whether it is currently possible to use PERFECT_RESTART correctly together with WET_DRY. In my case (changeset_624) the restart file contains WETDRY_MASK_RHO = 1 at all points for both records (I use LcycleRST == T), WETDRY_MASK_U = 1 and WETDRY_MASK_V = 1 for the first and second layers, and WETDRY_MASK_U = NOVALUE and WETDRY_MASK_V = NOVALUE for the third layer. However, at the time the restart file was written, WETDRY_MASK_RHO did contain points with 0 values. When the model is run from this restart file, at the first step the temperature starts with a zero value at the points where WETDRY_MASK_RHO should be 0 (but is 1 in the restart file). If PERFECT_RESTART can be used with WET_DRY, how do I do it?
Thanks in advance
Boris
Re: problem with perfect restart
I recently posted a bug fix to Trac for LMD plus perfect_restart. It required getting the right indices for Hsbl and Hbbl in def_rst, also storing Ghats. Not to mention reading in all of the above and using the stored values instead of computing new ones on the first step.