Iris ❤️ Xarray
There is a lot of overlap between Iris and Xarray, but some important differences too. Below is a summary of the most important differences, so that you can be prepared, and to help you choose the best package for your use case.
Iris is the more specialised package, focused on making it as easy as possible to work with meteorological and climatological data. Iris is built to natively handle many key concepts, such as the CF conventions, coordinate systems and bounded coordinates. Iris offers a smaller toolkit of operations than Xarray, particularly in its API for sophisticated computation such as array manipulation and multi-processing.
Xarray’s more generic data model and community-driven development give it a richer range of operations and broader possible uses. Using Xarray specifically for meteorology/climatology may require deeper knowledge compared to using Iris, and you may prefer to add Xarray plugins such as cf-xarray to get the best experience. Advanced users can likely achieve better performance with Xarray than with Iris.
There are multiple ways to convert between Iris and Xarray objects.
Xarray includes the to_iris() and from_iris() methods, detailed in the Xarray IO notes on Iris. Since Iris evolves independently of Xarray, be vigilant for concepts that may be lost during the conversion.
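As a minimal sketch of the in-memory route, the example below converts a small DataArray to a Cube with DataArray.to_iris() and back with DataArray.from_iris(), then checks what survived. The grid, values and attributes are invented for illustration. Iris is an optional dependency of Xarray, so the conversion is guarded in case it is not installed.

```python
import numpy as np
import xarray as xr

# Illustrative data only: a tiny latitude/longitude grid with made-up values.
da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("latitude", "longitude"),
    coords={"latitude": [10.0, 20.0], "longitude": [0.0, 5.0, 10.0]},
    name="air_temperature",
    attrs={"standard_name": "air_temperature", "units": "K"},
)

try:
    cube = da.to_iris()                           # xarray.DataArray -> iris.cube.Cube
    round_tripped = xr.DataArray.from_iris(cube)  # ... and back again
except ImportError:
    # Iris is not installed (it is an optional dependency of Xarray).
    cube = round_tripped = None

if round_tripped is not None:
    # Anything that differs here was changed or lost on the way through.
    print(cube.summary(shorten=True))
    print("dims preserved: ", round_tripped.dims == da.dims)
    print("attrs preserved:", round_tripped.attrs == da.attrs)
```

Comparing the round-tripped object against the original like this is one simple way to spot concepts that did not survive a conversion for your own data.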
Because both packages are closely linked to the NetCDF Format, it is feasible to save a NetCDF file using one package then load that file using the other package. This will be lossy in places, as both Iris and Xarray are opinionated on how certain NetCDF concepts relate to their data models.
The Iris development team are exploring an improved ‘bridge’ between the two packages. Follow the conversation on GitHub: iris#4994. This project is expressly intended to be as lossless as possible.
Iris and Xarray offer a range of regridding methods, both natively and via additional packages such as iris-esmf-regrid and xESMF. These overlap in places but tend to cover different sets of use cases (e.g. Iris handles unstructured meshes but offers access to fewer ESMF methods). The behaviour of these regridders also differs slightly (even between different regridders attached to the same package), so the appropriate package to use depends heavily on the particulars of the use case.
Xarray and Iris have a large overlap of functionality when creating Matplotlib plots and both support the plotting of multidimensional coordinates. This means the experience is largely similar using either package.
Xarray supports further plotting backends through external packages (e.g. Bokeh through hvPlot) and its interface should feel familiar to users who already know pandas. It also supports some plot types that Iris does not, so it can be used for a wider variety of plots, and it makes quick, "out of the box" customisations to plots easier. However, knowledge of Matplotlib is still needed for any further customisation.
In both cases, Cartopy is/can be used. Iris does more work automatically for the user here, creating Cartopy GeoAxes for latitude and longitude coordinates, whereas the user has to do this manually in Xarray.
Both libraries are broadly comparable, with generally similar capabilities, performance and laziness. Iris offers more specificity in some cases, such as a few specialised functions that are unique to it and tolerance of masked data in most statistics. Xarray seems more approachable, however, with some less unique but more convenient solutions (these tend to be wrappers around Dask functions).
Laziness and Multi-Processing with Dask
Iris and Xarray both support lazy data and out-of-core processing through utilisation of Dask.
While both Iris and Xarray expose NumPy conveniences at the API level (e.g. the ndim() method), only Xarray exposes Dask conveniences. One example is xarray.DataArray.chunks, which gives the user direct control over the underlying Dask array chunks. The Iris API instead takes control of such concepts, and user control is only possible by manipulating the underlying Dask array directly, once it has been accessed from the Cube.
xarray.DataArrays comply with NEP-18, allowing NumPy arrays to be based on them, and they also include the necessary extra members for Dask arrays to be based on them too. Neither of these is currently possible with Iris Cubes, although it is an ambition for the future.
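The Xarray side of this can be sketched as follows. The example is Xarray-only, assumes Dask is installed, and uses an invented array and chunk sizes purely for illustration.

```python
import numpy as np
import xarray as xr

# Illustrative in-memory array; the dimension names are arbitrary.
da = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("y", "x"))
print(da.chunks)    # None: NumPy-backed data has no Dask chunks

# Re-express the same data as a lazy Dask array with user-chosen chunk sizes.
lazy = da.chunk({"y": 1, "x": 2})
print(lazy.chunks)  # one tuple of chunk lengths per dimension

# Operations stay lazy until a result is explicitly requested.
total = lazy.sum()
result = float(total.compute())
print(result)
```

Here the chunk layout is chosen and inspected entirely through the DataArray API, without touching the underlying Dask array.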
NetCDF File Control
(More info: NetCDF Format)
Unlike Iris, Xarray generally provides full control of major file structures, i.e. dimensions + variables, including their order in the file. It mostly respects these in a file input, and can reproduce them on output. However, attribute handling is not so complete: like Iris, it interprets and modifies some recognised aspects, and can add some extra attributes not in the input.
Handling of dates and fill values has some special problems here.
Ultimately, nearly everything wanted in a particular result file can be achieved in Xarray, via the provided override mechanisms (loading keywords and the 'encoding' dictionaries).
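A minimal sketch of those two override mechanisms is shown below: an 'encoding' dictionary on save, and a loading keyword (mask_and_scale) on load. The variable name, values and fill value are invented for illustration, and the example assumes a NetCDF backend for Xarray (such as netCDF4, h5netcdf or SciPy) is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Illustrative dataset with one missing value (NaN) in a float variable.
ds = xr.Dataset(
    {"temp": (("y", "x"), np.array([[np.nan, 1.5], [2.5, 3.5]]))},
    coords={"y": [0.0, 1.0], "x": [10.0, 20.0]},
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.nc"

    # The 'encoding' dictionary overrides how the variable is stored on disk.
    ds.to_netcdf(path, encoding={"temp": {"dtype": "float32", "_FillValue": -999.0}})

    # Default load: the fill value is decoded back to NaN and remembered in .encoding.
    with xr.open_dataset(path) as decoded:
        decoded_values = decoded["temp"].values.copy()
        stored_encoding = dict(decoded["temp"].encoding)

    # Loading keyword: switch off fill-value decoding to see the stored numbers.
    with xr.open_dataset(path, mask_and_scale=False) as raw:
        raw_values = raw["temp"].values.copy()

print(decoded_values[0, 0], raw_values[0, 0], stored_encoding["_FillValue"])
```

The same pattern applies to other stored properties; which encoding keys are accepted depends on the backend in use.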
Xarray uses numpy.nan to represent missing values, which supports many simple use cases, assuming the data are floats. Iris enables more sophisticated missing data handling by representing missing values as masks (numpy.ma.MaskedArray for real data and a masked Dask array for lazy data), which allows data to be of any data type and to include either or both of a mask and NaN values.
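The difference between the two representations can be sketched with NumPy alone, without either package. The arrays below are invented for illustration.

```python
import numpy as np

# NaN-based missing data only works for floating point arrays.
floats = np.array([1.0, np.nan, 3.0])
plain_mean = floats.mean()         # NaN propagates through ordinary statistics
nan_aware_mean = np.nanmean(floats)  # NaN-aware functions skip the missing value

# A mask works for any data type, including integers.
ints = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
masked_mean = ints.mean()          # the masked point is ignored

# A masked array can also carry NaN values that are not masked.
both = np.ma.masked_array([1.0, np.nan, 3.0], mask=[False, False, True])
n_masked = int(np.ma.count_masked(both))
n_nan = int(np.isnan(both.compressed()).sum())

print(plain_mean, nan_aware_mean, masked_mean, ints.dtype, n_masked, n_nan)
```

Note that the integer array keeps its integer dtype when masked, which a NaN-based representation cannot offer.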
Iris has a data model entirely based on the CF Conventions. Xarray has a data model based on the NetCDF Format, with cf-xarray acting as a translation layer into CF. Xarray/cf-xarray methods can be called, and data accessed, with CF-like arguments (e.g. axis, standard name), and there are some CF-specific utilities (similar to Iris utilities). Iris tends to cover more of CF and to be stricter about it.