Missing Data Handling in Iris

This document provides a brief overview of how Iris handles missing data values when datasets are loaded as cubes, and when cubes are saved or modified.

A missing data value, or fill-value, is the value used within a dataset to indicate that a data point is missing or not set. This value is included as part of a dataset’s metadata.

For example, in a gridded global ocean dataset, no data values will be recorded over land, so land points will be missing data. In such a case, land points could be indicated by being set to the dataset’s missing data value.
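The ocean example can be sketched with a plain NumPy masked array. This is only an illustration: the grid values and the missing data value `-999.0` are made up, and a real dataset would record its own value in its metadata.

```python
import numpy as np

# Hypothetical missing data value for this illustration; real datasets
# record their own value in the dataset metadata.
MISSING = -999.0

# A tiny 2x3 "sea surface temperature" grid where MISSING marks land points.
raw = np.array([[14.2, MISSING, 15.1],
                [MISSING, 16.0, 15.7]])

# Mask every point equal to the missing data value; masked_values also
# records that value as the array's fill_value.
sst = np.ma.masked_values(raw, MISSING)

n_land = int(sst.mask.sum())    # number of masked (land) points
ocean_mean = float(sst.mean())  # mean over unmasked (ocean) points only
```

Statistics such as the mean skip the masked land points, so the fill-value itself never contaminates the result.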

Loading

On load, any fill-value or missing data value defined in the loaded dataset should be used as the fill_value of the NumPy masked array held in the data attribute of the Cube. This fill_value only becomes visible once the cube’s data is realised.

Saving

On save, the fill-value of a cube’s masked data array is not used. Instead, Iris always uses the default fill-value for the file format, unless the user specifies a fill-value via a format-specific saver.

For example:

>>> import iris
>>> iris.save(my_cube, 'my_file.nc', fill_value=-99999)

Note

Not all savers accept the fill_value keyword argument.

Iris checks for fill-value ‘collisions’ and issues a warning when it finds one (exception: NetCDF, see the heading below). A collision occurs whenever unmasked values would read back as masked; the warning also suggests a workaround.

This will occur in the following cases:

  • where masked data contains unmasked points matching the fill-value, or

  • where unmasked data contains the fill-value (either the format-specific default fill-value, or a fill-value specified by the user in the save call).
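Both cases reduce to the same test: does any unmasked point equal the fill-value that will be written? The following is a minimal NumPy sketch of that test. `has_fill_collision` is a hypothetical helper written for this illustration, not part of the Iris API, and `-999.0` is an arbitrary fill-value.

```python
import numpy as np


def has_fill_collision(data, fill_value):
    """Return True if any unmasked point in `data` equals `fill_value`."""
    # getmaskarray gives a full boolean mask even for plain (unmasked) input.
    unmasked = ~np.ma.getmaskarray(data)
    values = np.ma.getdata(data)
    return bool(np.any(unmasked & (values == fill_value)))


# Case 1: masked data with an unmasked point matching the fill-value.
masked = np.ma.masked_array([1.0, -999.0, 3.0], mask=[False, False, True])
# Case 2: plain unmasked data containing the fill-value.
plain = np.array([1.0, 2.0, -999.0])
# No collision: the only matching point is already masked.
safe = np.ma.masked_array([1.0, -999.0], mask=[False, True])

collisions = [has_fill_collision(a, -999.0) for a in (masked, plain, safe)]
```

Here `collisions` holds one flag per array, in the order `masked`, `plain`, `safe`.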

NetCDF

NetCDF is a special case, because all ordinary variable data is “potentially masked”, owing to the use of default fill values. The default fill-value used depends on the type of the variable data.

The exceptions to this are:

  • One-byte values are not masked unless the variable has an explicit _FillValue attribute. That is, there is no default fill-value for byte types in NetCDF.

  • Data may be tagged with a _NoFill attribute. This is not currently officially documented or widely implemented.

  • Small integers create problems because they do not have the exemption that applies to byte data. Thus, in principle, int16 data cannot use the full range of 2**16 valid values.
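To make the small-integer point concrete, the sketch below counts the values left for 16-bit signed data once one value is reserved as the default fill. It assumes the NetCDF default fill-value for 16-bit signed integers is -32767, which is what `netCDF4.default_fillvals["i2"]` reports; the value is hard-coded so that the example needs only NumPy.

```python
import numpy as np

# Assumption: the NetCDF default fill-value for 16-bit signed integers,
# as reported by netCDF4.default_fillvals["i2"].
NC_FILL_SHORT = -32767

info = np.iinfo(np.int16)
total_values = int(info.max) - int(info.min) + 1  # all representable values
usable_values = total_values - 1                  # one value is reserved as fill

data = np.array([info.min, NC_FILL_SHORT, 0, info.max], dtype=np.int16)
# Points equal to the default fill-value would read back as masked.
would_be_masked = data == NC_FILL_SHORT
```

Only the second element collides, but there is no way to store that particular value as valid data without setting a different `_FillValue`.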

Warnings are not issued for NetCDF fill-value collisions. Increasingly large and complex parallel I/O operations made this feature unmaintainable, and it was retired in Iris 3.9 (PR #5833).

If you need to know about collisions, you can perform your own checks ahead of saving. Such checks can be run lazily (see Lazy Data). Here is an example:

>>> import netCDF4
>>> # Look up the NetCDF default fill-value by dtype code, e.g. 'f4'.
>>> default_fill = netCDF4.default_fillvals[my_cube.dtype.str[1:]]
>>> fill_present = (my_cube.lazy_data() == default_fill).any().compute()

Merging

Merged data should have a fill-value equal to that of the components, if they all have the same fill-value. If the components have differing fill-values, a default fill-value will be used instead.
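The rule can be written down as a small function. This is a sketch of the stated behaviour only: `merged_fill_value` is a hypothetical helper, not the Iris implementation, and the default passed in the usage lines (`1e20`) is an assumed placeholder.

```python
def merged_fill_value(component_fill_values, default):
    """Hypothetical helper: pick the fill-value for merged data."""
    unique = set(component_fill_values)
    # All components agree: keep their shared fill-value.
    if len(unique) == 1:
        return unique.pop()
    # Components disagree (or there are none): fall back to the default.
    return default


same = merged_fill_value([-999.0, -999.0, -999.0], default=1e20)
mixed = merged_fill_value([-999.0, -1.0], default=1e20)
```

With matching components `same` keeps the shared value, while `mixed` falls back to the supplied default.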

Other Operations

Other operations, such as Cube arithmetic operations, generally produce output with a default (NumPy) fill-value. That is, these operations ignore the fill-values of the input(s) to the operation.
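For reference, the default NumPy fill-value depends on the dtype and can be inspected directly. The sketch below uses only NumPy and does not go through Iris, so it shows what the default values are rather than how Iris applies them.

```python
import numpy as np

# NumPy's default fill-values for two common dtypes.
float_default = np.ma.default_fill_value(np.dtype("float64"))
int_default = np.ma.default_fill_value(np.dtype("int64"))

# A masked array created without an explicit fill_value reports the default.
arr = np.ma.masked_array([1.0, 2.0], mask=[False, True])
arr_fill = arr.fill_value
```

If a particular fill-value matters after such an operation, it needs to be set again explicitly, for example at save time via the `fill_value` keyword where the saver supports it.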