"missing_value" for masked regions in NetCDF output files?
ROMS folk,
Although ROMS 3.0 NetCDF output files are "CF-compliant", there is currently no CF standard for specifying masking. As a result, CF-aware visualization and data-access tools plot whatever values are in the masked regions, which can be ugly at best and misleading at worst.
Since CF visualization and access clients already handle missing values, could we simply write -99999.0 (or some other special value) into the values of "temp", "salt", and other dependent variables in the NetCDF output files in regions that are masked and add the "missing_value" attribute to these variables?
Would this cause any problems for ROMS? If so, I guess this would mean that the values of variables in masked regions DO matter! If not, we should add this to the ROMS wish list, as then the existing clients could correctly show masked regions without implementing new standards and features.
Thanks,
Rich
Re: "missing_value" for masked regions in NetCDF output files?
Good suggestion! The only caveat I can think of is that ROMS should do an extra multiply by the mask when reading in fields. We don't want that -999 in the land salinity!
Speaking of masks, the ROMS mask has a _FillValue of 1, which causes trouble for tools expecting that to be the special value. I've had to work around it once or twice.
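To make the caveat about reading concrete, here is a minimal Python sketch of multiplying a field by the land/sea mask as it is read. ROMS itself does this in Fortran; the function name is hypothetical, and the mask is assumed to use 1 = water and 0 = land, as mask_rho does. Any finite special value stored over land, such as -999, becomes zero.

```python
def apply_mask_on_read(field, mask):
    """Multiply a 2-D field by the land/sea mask (1 = water, 0 = land).

    A finite special value stored over land (e.g. -999 or 1.0E+37) becomes
    0.0, so it cannot leak into the model state. Note that NaN * 0 is still
    NaN, so this simple multiply assumes a finite special value.
    """
    return [[value * m for value, m in zip(row, mrow)]
            for row, mrow in zip(field, mask)]
```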
- arango
- Site Admin
- Posts: 1367
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
Re: "missing_value" for masked regions in NetCDF output files?
I am moving away from the missing_value attribute and using the _FillValue attribute instead. Notice that the missing_value attribute is not treated in any special way by the library or conforming generic applications.
Now, I think that over-writing the masked areas with a special value is kind of dangerous. What about adding a new attribute, say mask, which points to the array used for masking:
float temp(ocean_time, s_rho, eta_rho, xi_rho) ;
temp:long_name = "potential temperature" ;
temp:units = "Celsius" ;
temp:time = "ocean_time" ;
temp:coordinates = "lon_rho lat_rho s_rho ocean_time" ;
temp:mask = "mask_rho" ;
So any application just needs to multiply by the specified masking array. We will need some logic here for 3D variables, since the mask is a 2D array in the ROMS case. I wonder if this should be inside or part of the coordinates attribute. Perhaps you can bring this issue to the standards committee.
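A minimal Python sketch of how a client might honour such an attribute. It assumes one time record is held as a nested list indexed [s_rho][eta_rho][xi_rho] and that the file's other variables are available in a dict; the mask attribute is only the proposal above, not an existing convention, and the function name is hypothetical. The 2D mask is reused at every vertical level, which is the extra logic 3D variables need. Land points are set to NaN rather than multiplied to zero so that plotting tools leave them blank; that is a design choice of this sketch.

```python
import math


def apply_mask_attribute(data, attrs, variables):
    """Blank out land points using the proposed (non-standard) 'mask' attribute.

    data:      3-D nested list [s_rho][eta_rho][xi_rho] for one time record.
    attrs:     the variable's attributes, e.g. {"mask": "mask_rho"}.
    variables: other variables in the file, keyed by name.
    """
    mask_name = attrs.get("mask")
    if mask_name is None:
        return data  # no mask attribute: leave the data untouched
    mask = variables[mask_name]  # 2-D [eta_rho][xi_rho], 1 = water, 0 = land
    # Reuse the same 2-D mask for every vertical level.
    return [[[v if m == 1 else math.nan for v, m in zip(row, mrow)]
             for row, mrow in zip(level, mask)]
            for level in data]
```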
- arango
Re: "missing_value" for masked regions in NetCDF output files?
kate wrote: Speaking of masks, the ROMS mask has a _FillValue of 1, which causes trouble for tools expecting that to be the special value. I've had to work around it once or twice.
There is no _FillValue attribute in the definition of the land/sea mask arrays in ROMS:
double mask_rho(eta_rho, xi_rho) ;
mask_rho:long_name = "mask on RHO-points" ;
mask_rho:option_0 = "land" ;
mask_rho:option_1 = "water" ;
mask_rho:coordinates = "lon_rho lat_rho" ;
Re: "missing_value" for masked regions in NetCDF output files?
arango wrote: I am moving away from the missing_value attribute and using the _FillValue attribute instead. Notice that the missing_value attribute is not treated in any special way by the library or conforming generic applications.
What about adding a new attribute, say mask, which points to the array used for masking:
float temp(ocean_time, s_rho, eta_rho, xi_rho) ;
temp:long_name = "potential temperature" ;
temp:units = "Celsius" ;
temp:time = "ocean_time" ;
temp:coordinates = "lon_rho lat_rho s_rho ocean_time" ;
temp:mask = "mask_rho" ;
So any application just needs to multiply by the specified masking array. We will need some logic here for 3D variables, since the mask is a 2D array in the ROMS case. I wonder if this should be inside or part of the coordinates attribute. Perhaps you can bring this issue to the standards committee.
Hernan, I agree that if we were to propose a "masking" convention to the CF standards committee, it would look exactly as you have indicated, and indeed, it's very simple. But then, of course, the existing clients (like NCVIEW, the Matlab/NetCDF tools, etc.) would need to implement this standard. Since these tools already handle "missing_value" and "_FillValue" (preferred, as you mention), I was just wondering why we couldn't simply write the "_FillValue" instead, so that we wouldn't have to add a masking standard.
arango wrote: Now, I think that over-writing the masked areas with a special value is kind of dangerous.
I'm very curious why writing a special value in masked regions would be dangerous. Don't the values in the masked regions always get zeroed out by the mask in ROMS? If the values in the masked regions affect the solution, that would indeed seem dangerous!
-Rich
Re: "missing_value" for masked regions in NetCDF output files?
I can't think of a single good reason not to use _FillValue. I've wondered for years why ROMS does not do this (but have been too lazy to write the wrapper). Other models, like GETM, do this, and suffer no ill effects. And it makes quick looks in ncview much nicer. Note, the values in the model would not need to be changed, just the values written to the file. In the model, the salinity on land could still be zero.
As for reading in values in other programs, most netcdf readers worth their salt obey _FillValue attributes. If not, using -999 makes it pretty clear pretty fast that that is not a real salinity or temperature....
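The point that only the file needs to change, not the model, can be sketched in Python as follows. This is a hypothetical helper, not ROMS's nf_fwrite code, and the special value is assumed to be 1.0E+37. The writer builds a separate copy for output, so the model's own array, with zero salinity on land, is left alone.

```python
SPVAL = 1.0e37  # assumed special value written over land


def fill_for_output(field, mask, fill_value=SPVAL):
    """Return a copy of a 2-D field with land points (mask == 0) replaced.

    Only the copy that goes to the file carries the special value; the
    model's own array is not modified.
    """
    return [[v if m == 1 else fill_value for v, m in zip(row, mrow)]
            for row, mrow in zip(field, mask)]


# Attribute to write alongside the data so readers recognise the value.
FILL_ATTRS = {"_FillValue": SPVAL}
```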
- arango
Re: "missing_value" for masked regions in NetCDF output files?
rsignell wrote: I'm very curious why writing a special value in masked regions would be dangerous. Don't the values in the masked regions always get zeroed out by the mask in ROMS? If the values in the masked regions affect the solution, that would indeed seem dangerous!
It is just what Kate mentioned about applying the mask during reading to make sure that the fill value is removed. We need to be careful with objectively analyzed data in case the masking needs to be revisited. We need to be very aware of such values during restart, adjoint-based extensive IO, and interpolation weights for coupling, nesting, and so on.
I assume that implementing this will not be hard. Only a few changes are needed in nf_fread*d.F and nf_fwrite*d.F, which already pass the mask array.
There is a lot of discussion about the use of _FillValue versus missing_value attributes; just google this to find plenty of discussion about deprecating missing_value. As I mentioned above, I personally prefer the _FillValue attribute. The issue is that we need to define this attribute with the same type as the declared variable. Otherwise, we will get something like:
ERROR: Abnormal termination: NetCDF OUTPUT.
REASON: NetCDF: Not a valid data type or _FillValue type mismatch
In NetCDF the default fill value for 4-byte reals is set by the parameter NF90_FILL_REAL, which has a value of 9.9692099683868690E+36. This is an awful number which is very difficult to remember. Of course, there are ways to change this value or select any other special value.
Mark Hadfield and I talked about this recently. There is no need to set the _FillValue for a variable explicitly if we use 9.9692099683868690E+36 in masked places, for instance. However, I prefer to have this value defined explicitly for completeness. We currently use 1.0E+35 in the floats. I believe that a well-coded application should inquire about the value of this attribute and process the data accordingly. That's what we do in basic Matlab scripts to process NetCDF files.
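A small Python sketch of why the attribute's type matters; Python's struct module stands in for the netCDF library's type conversion, so this is an illustration rather than library code. A 4-byte real cannot hold 1.0E+35 exactly, so a value read back from a float variable no longer equals the 8-byte double 1.0E+35, whereas the netCDF default 9.9692099683868690E+36 is exactly representable as a 4-byte real.

```python
import struct

NF90_FILL_REAL = 9.9692099683868690e+36  # netCDF default fill for 4-byte reals
SPVAL = 1.0e35                            # special value ROMS used at this point


def as_float32(x):
    """Round a Python float (an 8-byte double) to the nearest 4-byte real."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_fill(stored, fill_value):
    """Compare a value read from a float variable against its fill value.

    The fill is rounded to float32 first, i.e. the comparison happens in the
    variable's own type, which is what a matching-type _FillValue provides.
    """
    return stored == as_float32(fill_value)
```

Usage: `as_float32(SPVAL) == SPVAL` is false, while `is_fill(as_float32(SPVAL), SPVAL)` is true.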
- m.hadfield
- Posts: 521
- Joined: Tue Jul 01, 2003 4:12 am
- Location: NIWA
Re: "missing_value" for masked regions in NetCDF output files?
As Hernan says, he and I had a discussion a week ago about what fill value to set (or whether to use the netCDF library's default) and how to communicate this info to downstream applications. This was relating to float trajectories. See
https://www.myroms.org/projects/src/ticket/217
and revision 243. However the tricky bit was related to a bug in the netCDF library on the Cray T3E, namely that it can't write 4-byte real values into attributes. This is a bug on a specific, little-used platform, probably fixable with a bit of effort, and I wouldn't like to see it be an impediment to making ROMS behave sensibly.
I argued that the sensible fill value to use is the netCDF default, which as Hernan says is 9.9692099683868690E+36. Yes, this is an awkward-looking number (in decimal notation anyway), but netCDF-aware applications should never have to deal with it in that form: they should all know it as NF90_FILL_REAL. In a CDL file it is represented with an underscore (_).
I also argued that, if you use this fill value, then you don't need to set a _FillValue attribute for each variable. However on this issue I have come around to Hernan's point of view: if you're going to use the fill value to represent missing data, then it's wise always to set a _FillValue attribute for the variable in question, as some downstream applications may expect it. (I believe this is true of the NCO utilities.)
Anyway, these are technicalities. I concur with Rob & others that fill values are the way to go.
Re: "missing_value" for masked regions in NetCDF output files?
Hernan,
I totally agree with you that we should use "_FillValue" instead of "missing_value" and that we should specify this explicitly. For floats and doubles, I guess the value of 1.0e35 is fine. But for the rest, I suggest we go with the default Unidata values, but also specify them explicitly as attributes in the NetCDF files:
parameter (nf_fill_byte = -127)
parameter (nf_fill_int1 = nf_fill_byte)
parameter (nf_fill_char = 0)
parameter (nf_fill_short = -32767)
parameter (nf_fill_int2 = nf_fill_short)
parameter (nf_fill_int = -2147483647)
parameter (nf_fill_float = 9.9692099683868690e+36)
parameter (nf_fill_real = nf_fill_float)
parameter (nf_fill_double = 9.9692099683868690e+36)
parameter (nf_fill_ubyte = 255)
parameter (nf_fill_ushort = 65535)
Finally, as you say, we need to pay attention and make sure the "_FillValue" attribute has the same type as the variable it's associated with.
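The default fill values listed above can be checked against their own types with a short Python sketch. Here struct format codes stand in for the netCDF external types, which is an assumption for illustration; the char, int1 and int2 entries are omitted. A value that does not fit the variable's type, such as 65535 for a signed short, cannot be stored as a same-type _FillValue at all.

```python
import struct

# Default netCDF fill values from the parameter list, keyed by type, with the
# struct format code of the matching fixed-size binary type.
DEFAULT_FILLS = {
    "byte":   ("b", -127),
    "short":  ("h", -32767),
    "int":    ("i", -2147483647),
    "float":  ("f", 9.9692099683868690e+36),
    "double": ("d", 9.9692099683868690e+36),
    "ubyte":  ("B", 255),
    "ushort": ("H", 65535),
}


def fill_in_range(var_type, fill):
    """True if `fill` can be stored in a variable of `var_type`."""
    fmt, _ = DEFAULT_FILLS[var_type]
    try:
        struct.pack("<" + fmt, fill)  # raises if out of range or wrong kind
    except (struct.error, OverflowError):
        return False
    return True
```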
- arango
Re: "missing_value" for masked regions in NetCDF output files?
I implemented this request. See Trac ticket 222. The change is generic and uses the parameter spval = 1.0E+35, which is declared in mod_scalars.F. The default NetCDF value nf_fill_double = 9.9692099683868690e+36 is not fully written in the CDL file produced by ncdump. This kind of annoys me. Anyway, this declaration is very easy to change to any other value.
- m.hadfield
Re: "missing_value" for masked regions in NetCDF output files?
I'd like to request one change: set spval to 1.0E37. Unlike the present value, 1.0E35, this is greater than NF90_FILL_REAL, so, even if a variable does not have an explicit _FillValue attribute, a well-behaved netCDF application will recognise spval as being outside the valid range, according to the conventions described here:
http://www.unidata.ucar.edu/software/ne ... onventions
Specifically:
If neither valid_min, valid_max nor valid_range is defined then generic applications should define a valid range as follows. If the data type is byte and _FillValue is not explicitly defined, then the valid range should include all possible values. Otherwise, the valid range should exclude the _FillValue (whether defined explicitly or by default) as follows. If the _FillValue is positive then it defines a valid maximum, otherwise it defines a valid minimum. For integer types, there should be a difference of 1 between the _FillValue and this valid minimum or maximum. For floating point types, the difference should be twice the minimum possible (1 in the least significant bit) to allow for rounding error.
PS: I believe the value of 1.E35 comes from NCAR Graphics.
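The quoted rule can be sketched in Python for a 4-byte real variable. This is an illustrative reading of the convention, not code from the netCDF library or ROMS: the bit pattern of the fill value is moved two units toward zero to get the valid bound, assuming a normal (non-tiny) fill. It shows why 1.0E37 lies outside the default valid range implied by NF90_FILL_REAL while 1.0E35 does not.

```python
import struct

NF_FILL_FLOAT = 9.9692099683868690e+36  # netCDF default fill for 4-byte reals


def _f32_bits(x):
    """Bit pattern of x rounded to a 4-byte IEEE real."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def _bits_f32(bits):
    """4-byte IEEE real with the given bit pattern, as a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def default_valid_range(fill):
    """Valid range implied by a _FillValue when no valid_* attributes exist.

    For floating-point types the bound sits two units in the last place
    inside the fill value. Assumes a float32 variable and a normal fill.
    """
    inner = _bits_f32(_f32_bits(abs(fill)) - 2)  # |fill| moved 2 ulps toward 0
    if fill > 0:
        return float("-inf"), inner  # positive fill defines a valid maximum
    return -inner, float("inf")      # negative fill defines a valid minimum


def is_valid(value, fill=NF_FILL_FLOAT):
    """True if value (rounded to float32) falls inside the default range."""
    lo, hi = default_valid_range(fill)
    v = _bits_f32(_f32_bits(value))
    return lo <= v <= hi
```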
- arango
Re: "missing_value" for masked regions in NetCDF output files?
Yes, good idea. Done. I will change the ROMS plotting package tomorrow, since it uses the NCAR library.
Re: "missing_value" for masked regions in NetCDF output files?
Hernan,
I did an SVN update to grab the latest ROMS (r247), ran RIVERPLUME2, and then fired up NCVIEW on ocean_his.nc.
Yes!!!! The land is masked:
I tried it with "nc_varget" in Matlab, and it works great there too.
No more multiplying by the mask every time you want to make a plot in Matlab!
Fabulous!
Thanks,
-Rich
- arango
Re: "missing_value" for masked regions in NetCDF output files?
Great. I did screw up the grid variables (f, h, lon, lat, pm, pn, etc.). I knew about this but forgot to add the conditional that avoids overwriting such variables; I just needed to process variables with tindex>0. Please upgrade again. See Trac ticket 223.
Re: "missing_value" for masked regions in NetCDF output files?
Rich-
A lot of my matlab scripts read in variables such as:
nc=netcdf('ocean_his.nc');
zeta=nc{'zeta'}(:);
When I do a pcolor, I now get large values (because of the new fill values).
Is there a way to have those fill values set to something else when I read
the variable in, or do I now always have to read the mask and multiply by it?
Re: "missing_value" for masked regions in NetCDF output files?
John,
Most Matlab/netcdf routines default to handling missing values via
"_FillValue" and scaling via "add_offset"/"scale_factor", but not the
NetCDF Toolbox: you have to turn these options on. So in my Matlab
startup.m, I have these lines:
% Netcdf Toolbox global options (turn autoscale & autonan on)
global nctbx_options
nctbx_options.theAutoNaN=1;
nctbx_options.theAutoscale=1;
-Rich
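Outside the NetCDF Toolbox, the same two options can be emulated by hand. A minimal Python sketch, assuming the raw values and the variable's attributes have already been read into a list and a dict (the function name is hypothetical). Note that _FillValue is compared against the raw, packed values before scale_factor and add_offset are applied.

```python
import math


def unpack_variable(raw, attrs):
    """Mimic 'autoNaN' + 'autoscale': fill -> NaN, then apply scale/offset.

    raw:   values as stored in the file (packed units).
    attrs: the variable's attributes, e.g. {"_FillValue": 1.0e37}.
    """
    fill = attrs.get("_FillValue")
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    return [math.nan if (fill is not None and v == fill) else v * scale + offset
            for v in raw]
```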
Re: "missing_value" for masked regions in NetCDF output files?
Having coherent fill values is good, but for us the real problem with the mask is the excessive use of disk space. The READ_WATER and WRITE_WATER options of ROMS allow writing only the sea points and are certainly nice.
However, Hernan told me that the resulting files are not CF-compliant. What would be a longer-term solution?
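The disk saving from writing only sea points comes from gathering the water values into a 1D list and scattering them back onto the grid with the mask when reading. The Python sketch below illustrates that idea only; it is not the actual READ_WATER/WRITE_WATER file layout, and the function names are hypothetical.

```python
def pack_water(field, mask):
    """Keep only water points (mask == 1), flattened in row-major order."""
    return [v for row, mrow in zip(field, mask)
            for v, m in zip(row, mrow) if m == 1]


def unpack_water(packed, mask, fill_value):
    """Scatter packed water values back onto the grid; land gets fill_value."""
    it = iter(packed)
    return [[next(it) if m == 1 else fill_value for m in mrow] for mrow in mask]
```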