Iris ❤️ Xarray#

Iris and Xarray overlap a great deal, but there are important differences too. The summary below covers the most significant ones, so that you know what to expect and can choose the best package for your use case. See Package Phrasebook for a broad comparison of terminology.

Overall Experience#

Iris is the more specialised package, focused on making it as easy as possible to work with meteorological and climatological data. Iris is built to natively handle many key concepts, such as the CF conventions, coordinate systems and bounded coordinates. Iris offers a smaller toolkit of operations than Xarray, particularly in its API for sophisticated computation such as array manipulation and multi-processing.

Xarray’s more generic data model and community-driven development give it a richer range of operations and broader possible uses. Using Xarray specifically for meteorology/climatology may require deeper knowledge than using Iris, and you may prefer to add Xarray plugins such as cf-xarray to get the best experience. Advanced users can likely achieve better performance with Xarray than with Iris.

Conversion#

There are multiple ways to convert between Iris and Xarray objects.

Xarray includes the to_iris() and from_iris() methods - detailed in the Xarray IO notes on Iris. Since Iris evolves independently of Xarray, be vigilant for concepts that may be lost during the conversion.
Because both packages are closely linked to the NetCDF Format, it is feasible to save a NetCDF file using one package then load that file using the other package. This will be lossy in places, as both Iris and Xarray are opinionated on how certain NetCDF concepts relate to their data models.
ncdata is a package developed by the Iris development team to manage NetCDF data, and it can act as an improved ‘bridge’ between Iris and Xarray:

Ncdata can convert Iris cubes to an Xarray dataset, or vice versa, with minimal overhead and as little loss of information as possible.

For example:

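from ncdata.iris_xarray import cubes_from_xarray, cubes_to_xarray

# `dataset` is assumed to be an existing xarray.Dataset.
cubes = cubes_from_xarray(dataset)  # Xarray dataset -> Iris cubes
xrds = cubes_to_xarray(cubes)  # Iris cubes -> Xarray dataset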

Ncdata avoids the limitations of Xarray’s to_iris() and from_iris() mentioned above, because it doesn’t replicate any logic of either Xarray or Iris. Instead, it uses the NetCDF file interfaces of both to exchange data “as if” via a NetCDF file. So, these conversions behave just like exchanging data via a file, but are far more efficient because they can transfer data without copying arrays or fetching lazy data.

Regridding#

Iris and Xarray offer a range of regridding methods - both natively and via additional packages such as iris-esmf-regrid and xESMF - which overlap in places but tend to cover different sets of use cases (e.g. Iris handles unstructured meshes but offers access to fewer ESMF methods). The behaviour of these regridders also differs slightly (even between different regridders attached to the same package), so the appropriate package to use depends heavily on the particulars of the use case.

Plotting#

Xarray and Iris have a large overlap of functionality when creating Matplotlib plots and both support the plotting of multidimensional coordinates. This means the experience is largely similar using either package.

Xarray supports further plotting backends through external packages (e.g. Bokeh through hvPlot) and its interface should feel familiar to users who already know pandas. It also supports some plot types that Iris does not, so it can be used for a wider variety of plots, and it makes quick, “out of the box” customisations easy. However, any further customisation still requires knowledge of Matplotlib.

Cartopy can be used with both packages. Iris does more of the work automatically here, creating Cartopy GeoAxes for latitude and longitude coordinates, whereas in Xarray the user has to do this manually.

Statistics#

The two libraries are broadly comparable, with similar capabilities, performance and laziness. Iris is more specialised in some cases: it offers some functions not found in Xarray, and most of its statistics accept a tolerance for masked data. Xarray, however, seems more approachable, with solutions that are less specialised but more convenient (these tend to be wrappers around Dask functions).

Laziness and Multi-Processing with Dask#

Iris and Xarray both support lazy data and out-of-core processing through utilisation of Dask.

While both Iris and Xarray expose NumPy conveniences at the API level (e.g. the ndim property), only Xarray exposes Dask conveniences. For example, xarray.DataArray.chunks reports the chunks of the underlying Dask array, and xarray.DataArray.chunk() lets the user set them directly. The Iris API instead takes control of such concepts, and user control is only possible by manipulating the underlying Dask array directly (accessed via iris.cube.Cube.core_data()).
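The Xarray side of this can be sketched as below. It is a minimal illustration, assuming Xarray and Dask are installed; the array shape and chunk sizes are arbitrary, and the Iris counterpart (working on the array returned by core_data()) is not shown.

```python
import numpy as np
import xarray as xr

# A NumPy-backed array has no Dask chunks.
da = xr.DataArray(np.zeros((4, 6)), dims=("y", "x"))
eager_chunks = da.chunks  # None for in-memory data

# Re-express the data as a lazy Dask array with explicit chunk sizes.
lazy = da.chunk({"y": 2, "x": 3})
lazy_chunks = lazy.chunks  # per-dimension tuples of chunk sizes

# The computation is only carried out when explicitly requested.
total = lazy.sum().compute()

print(eager_chunks, lazy_chunks, float(total))
```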

xarray.DataArrays comply with NEP-18, allowing NumPy arrays to be based on them, and they also include the necessary extra members for Dask arrays to be based on them too. Neither of these is currently possible with Iris Cubes, although it is an ambition for the future.

NetCDF File Control#

(More info: NetCDF I/O Handling in Iris)

Unlike Iris, Xarray generally provides full control of major file structures, i.e. dimensions and variables, including their order in the file. It mostly respects these when reading a file, and can reproduce them on output. However, attribute handling is not so complete: like Iris, it interprets and modifies some recognised aspects, and can add some extra attributes not in the input.

Iris is primarily designed to handle netCDF data encoded according to the CF Conventions. This is less important to Xarray, which can make it harder to manage this type of data correctly. Xarray’s CF support is not complete, though it may improve, and cf-xarray may be relevant here. There is also relevant documentation at this page.

In some particular aspects, CF data is not loaded well (or at all), and in many cases output is not fully CF compliant (as per the cf-checker):

Xarray has its own interpretation of coordinates, which is different from the CF-based approach in Iris, and means that the use of the “coordinates” attribute in output is often not CF compliant.
Dates are converted to datetime-like objects internally. There are special features providing support for non-standard calendars; however, date units may not always be saved correctly.
CF-style coordinate bounds variables are not fully understood. The CF approach, where bounds variables do not usually define their units or standard_names, can cause problems. Certain files containing bounds variables with more than 2 bounds (e.g. unstructured data) may not load at all.
Missing points are always represented as NaNs, as per pandas usage (see Missing Data). This means that fill values are not preserved, and that masked integer data is converted to floats. The netCDF default fill values are not supported, so in variables with no “_FillValue” attribute, missing points appear as the default fill value rather than as NaNs. By default, output variables generally have _FillValue = NaN.

Ultimately, however, nearly any desired output file can be produced with Xarray by using the provided override mechanisms (loading keywords and the ‘encoding’ dictionaries).
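As a rough illustration of those override mechanisms, the sketch below writes a variable with an explicit integer type and fill value through the encoding dictionary, then reads it back both with and without Xarray’s default masking. It assumes a NetCDF backend for Xarray is installed; the variable name, file name and fill value are invented for the example.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# In memory, the missing point is a NaN in float data.
ds = xr.Dataset({"count": ("x", np.array([1.0, np.nan, 3.0]))})

# On output, request a 16-bit integer variable with an explicit fill value.
path = os.path.join(tempfile.mkdtemp(), "example.nc")
ds.to_netcdf(path, encoding={"count": {"dtype": "int16", "_FillValue": -999}})

# Default loading: the fill value is decoded back to NaN, so data are floats.
with xr.open_dataset(path) as decoded:
    decoded_values = decoded["count"].values

# Loading keyword override: keep the stored integers and the raw fill value.
with xr.open_dataset(path, mask_and_scale=False) as raw:
    raw_values = raw["count"].values
    raw_fill = raw["count"].attrs["_FillValue"]

print(decoded_values.dtype, raw_values.dtype, raw_fill)
```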

Missing Data#

Xarray uses numpy.nan to represent missing values, which supports many simple use cases provided the data are floats. Iris enables more sophisticated missing data handling by representing missing values as masks (numpy.ma.MaskedArray for real data and dask.array.Array for lazy data), which allows data to be of any data type and to include a mask, NaNs, or both.
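The difference between the two representations can be seen with plain NumPy, on which both approaches rest. The sketch below is illustrative only and does not use either package’s API; the values are arbitrary.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)

# NaN-style: integers cannot hold NaN, so the data must become floating point.
nan_style = ints.astype(np.float64)
nan_style[1] = np.nan

# Mask-style: the original integer dtype is kept alongside a boolean mask.
mask_style = np.ma.masked_array(ints, mask=[False, True, False])

# A masked float array can carry a mask and NaNs at the same time.
both = np.ma.masked_array([1.0, np.nan, 3.0], mask=[True, False, False])

print(nan_style.dtype, np.nanmean(nan_style))
print(mask_style.dtype, mask_style.mean())
print(both.count(), np.isnan(both[1]))
```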

cf-xarray#

Iris has a data model entirely based on the CF Conventions. Xarray has a data model based on the NetCDF Format, with cf-xarray acting as a translation layer into CF. Xarray/cf-xarray methods can be called and data accessed with CF-like arguments (e.g. axis, standard name), and there are some CF-specific utilities (similar to Iris utilities). Iris tends to cover more of CF and to be stricter about it.
