Which Regridder to Use#
This section compares the regridding schemes available in Iris, and externally in iris-esmf-regrid, to help you choose the right regridder for your workflow. The choice of regridder is usually limited by the kind of data you are regridding from and to, but performance and numerical accuracy are also factors to consider. This section is a reference for how the regridders differ with respect to these factors. It begins with a set of short tables summarising their differences and ends with a more in-depth look at how these differences might play out in different contexts.
For an introduction to using regridders, see the user guide.
Regridder Comparison#
We will highlight here some of the properties of each regridder in a table of the following form:
API |
Link to API documentation. |
Method |
The type of algorithm used to calculate the result. See section on comparing methods. |
Source Grid |
The type of coordinates required on the source cube. |
Target Grid |
The type of coordinates required on the target cube. |
Coordinate System |
The type of coordinate system required on the source and target cube coordinates. |
Lazy Regridding |
If the result is calculated lazily. See real and lazy data. |
Weights Caching |
|
Notes |
Additional details. |
AreaWeighted#
API |
|
Method |
Conservative |
Source Grid |
Pair of 1D lat/lon coordinates, must have bounds. |
Target Grid |
Pair of 1D lat/lon coordinates, must have bounds. |
Coordinate System |
Must be equal on |
Lazy Regridding |
|
Weights Caching |
|
Notes |
Supports masked data with |
Linear#
API |
|
Method |
Linear |
Source Grid |
Pair of 1D lat/lon coordinates. |
Target Grid |
Pair of 1D lat/lon coordinates. |
Coordinate System |
May be present on both |
Lazy Regridding |
|
Weights Caching |
|
Notes |
Supports extrapolation outside source data bounds. |
Nearest#
API |
|
Method |
Nearest (destination to source) |
Source Grid |
Pair of 1D lat/lon coordinates. |
Target Grid |
Pair of 1D lat/lon coordinates. |
Coordinate System |
May be present on both |
Lazy Regridding |
|
Weights Caching |
|
UnstructuredNearest#
API |
|
Method |
Nearest (destination to source) |
Source Grid |
Pair of lat/lon coordinates with any dimensionality (e.g., 1D or 2D). Must be associated to the same axes on the source cube. |
Target Grid |
Pair of 1D lat/lon coordinates. |
Coordinate System |
Must be equal on |
Lazy Regridding |
|
Weights Caching |
|
PointInCell#
API |
|
Method |
Point in cell |
Source Grid |
Pair of lat/lon coordinates with any dimensionality (e.g., 1D or 2D). Must be associated to the same axes on the source cube. |
Target Grid |
Pair of 1D lat/lon coordinates, must have bounds. |
Coordinate System |
Must be equal on |
Lazy Regridding |
|
Weights Caching |
|
External Regridders#
ESMFAreaWeighted#
API |
|
Method |
Conservative |
Source Grid |
May be either:
|
Target Grid |
Any of the above. May be a different type to |
Coordinate System |
|
Lazy Regridding |
|
Weights Caching |
|
Notes |
Supports masked data with |
ESMFBilinear#
API |
|
Method |
Linear |
Source Grid |
May be either:
|
Target Grid |
Any of the above. May be a different type to |
Coordinate System |
|
Lazy Regridding |
|
Weights Caching |
|
ESMFNearest#
API |
|
Method |
Nearest (destination to source) |
Source Grid |
May be either:
|
Target Grid |
Any of the above. May be a different type to |
Coordinate System |
|
Lazy Regridding |
|
Weights Caching |
|
Comparing Methods#
The various regridding algorithms are implementations of the following methods. While there may be slight differences in the way each regridder implements a given method, each regridder broadly follows the principles of that method. We give here a very brief overview of the situations each method is best suited to, followed by a more detailed discussion.
Conservative#
Good for representing the entirety of the underlying data. Designed for data represented by cell faces. A fuller description of what it means to be conservative can be found in the section on area conservation.
Linear#
Good for approximating data represented at precise points in space and in cases where it is desirable for the resulting data to be smooth. For more detail, see the section on regridder smoothness.
Nearest#
Tends to be the fastest regridding method. Ensures each resulting data value represents a data value in the source. Good in cases where averaging is inappropriate, e.g., for discontinuous data.
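As a rough illustration of the nearest (destination to source) idea, here is a minimal NumPy sketch in one dimension. It is not the Iris implementation; the function name `nearest_regrid_1d` and the sample data are hypothetical and chosen only for this example.

```python
import numpy as np


def nearest_regrid_1d(src_points, src_values, tgt_points):
    """For each target point, copy the value of the closest source point."""
    src_points = np.asarray(src_points, dtype=float)
    tgt_points = np.asarray(tgt_points, dtype=float)
    # Distance from every target point (rows) to every source point (columns).
    distances = np.abs(tgt_points[:, np.newaxis] - src_points[np.newaxis, :])
    # Index of the closest source point for each target point.
    nearest_index = distances.argmin(axis=1)
    return np.asarray(src_values)[nearest_index]


# A discontinuous, categorical field (hypothetical class codes).
src_points = np.array([0.0, 10.0, 20.0, 30.0])
src_values = np.array([1, 1, 5, 5])
tgt_points = np.array([2.0, 12.0, 17.0, 29.0])

result = nearest_regrid_1d(src_points, src_values, tgt_points)
```

Every value in `result` is one of the values in `src_values`; no averaging takes place, so no intermediate class codes such as 3 can appear.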
Point in cell#
Similarly to the conservative method, represents the entirety of the underlying data. Works well with data whose source is an unstructured series of points.
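The point in cell idea can be sketched in one dimension as follows. This is an illustrative assumption about the general approach (an unweighted mean of the source points inside each target cell), not the Iris code; `point_in_cell_1d` is a hypothetical name, and empty target cells are set to NaN here purely for simplicity.

```python
import numpy as np


def point_in_cell_1d(src_points, src_values, tgt_bounds):
    """Average the source points that fall inside each target cell."""
    src_points = np.asarray(src_points, dtype=float)
    src_values = np.asarray(src_values, dtype=float)
    # Target cells that contain no source points stay as NaN in this sketch.
    result = np.full(len(tgt_bounds), np.nan)
    for i, (lower, upper) in enumerate(tgt_bounds):
        inside = (src_points >= lower) & (src_points < upper)
        if inside.any():
            result[i] = src_values[inside].mean()
    return result


# An unstructured series of source points and three bounded target cells.
src_points = np.array([0.5, 1.5, 2.5, 7.0])
src_values = np.array([10.0, 20.0, 30.0, 40.0])
tgt_bounds = [(0.0, 3.0), (3.0, 6.0), (6.0, 9.0)]

result = point_in_cell_1d(src_points, src_values, tgt_bounds)
```

Note that the target cells need bounds so that it is possible to decide which source points they contain, while the source points need no structure at all.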
Numerical Accuracy#
An important thing to understand when regridding is that no regridding method is perfect. That is to say, you will tend to lose information when you regrid so that if you were to regrid from a source grid to a target and then back onto the original source, you will usually end up with slightly different data. Furthermore, statistical properties such as min, max and standard deviation are not guaranteed to be preserved. While regridding is inherently imperfect, there are some properties which can be better preserved by choosing the appropriate regridding method. These include:
Global Area Weighted Average#
Area weighted regridding schemes such as AreaWeighted and ESMFAreaWeighted use conservative regridding methods. The property which these regridders conserve is the global area weighted average of the data (or equivalently, the area weighted sum). More precisely, this means that:
When regridding from a source cube to a target cube defined over the same area (e.g., the entire globe), assuming there are no masked data points, the area weighted average (weighted by the area covered by each data point) of the source cube ought to be equal (within minor tolerances) to the area weighted average of the result.
This property will be particularly important to consider if you are intending to calculate global properties such as average temperature or total rainfall over a given area. It may be less important if you are only interested in local behaviour, e.g., temperature at particular locations.
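The conservation property can be checked with a small one dimensional sketch, in which cell length stands in for cell area. This is an illustration under simplifying assumptions (flat geometry, no masked data), not the Iris or ESMF implementation, and `conservative_weights_1d` is a hypothetical helper.

```python
import numpy as np


def conservative_weights_1d(src_edges, tgt_edges):
    """Return W where W[i, j] is the fraction of target cell i covered by source cell j."""
    src_edges = np.asarray(src_edges, dtype=float)
    tgt_edges = np.asarray(tgt_edges, dtype=float)
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    tgt_lo, tgt_hi = tgt_edges[:-1], tgt_edges[1:]
    # Overlap length between each target cell (rows) and source cell (columns).
    overlap = (np.minimum(tgt_hi[:, None], src_hi[None, :])
               - np.maximum(tgt_lo[:, None], src_lo[None, :]))
    overlap = np.clip(overlap, 0.0, None)
    return overlap / (tgt_hi - tgt_lo)[:, None]


src_edges = np.linspace(0.0, 12.0, 13)        # 12 source cells of width 1
tgt_edges = np.array([0.0, 2.5, 7.0, 12.0])   # 3 uneven target cells, same extent
src_data = np.arange(12, dtype=float) ** 2

weights = conservative_weights_1d(src_edges, tgt_edges)
tgt_data = weights @ src_data

# Averages weighted by the size of each cell.
src_mean = np.average(src_data, weights=np.diff(src_edges))
tgt_mean = np.average(tgt_data, weights=np.diff(tgt_edges))
```

Because the source and target grids cover the same extent, `src_mean` and `tgt_mean` agree to within floating point tolerance, even though the individual values in `tgt_data` differ from any value in `src_data`.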
When there are masked points in your data, the same global conservative properties no longer strictly hold. This is because the area which the unmasked points in the source cover is no longer the same as the area covered by unmasked points in the target. With the keyword argument mdtol=0 this means that there will be an area around the source mask which will be masked in the result and therefore unaccounted for in the area weighted average calculation. Conversely, with the keyword argument mdtol=1 there will be an unmasked area in the result that is masked in the source. This may be particularly important if you are intending to calculate properties which depend on area, e.g., calculating the total global rainfall based on data in units of kg m-2 as an area weighted sum. With mdtol=0 this will consistently underestimate the total, and with mdtol=1 it will consistently overestimate it. This can be somewhat mitigated with a choice of mdtol=0.5, but you should still be aware of potential inaccuracies. It should be noted that this choice of mdtol is highly context dependent and there will likely be occasions where a choice of mdtol=0 or mdtol=1 is more suitable. The important thing is to know your data, know what you’re doing with your data and know how your regridder fits in this process.
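The effect of the missing data tolerance on an area weighted sum can be illustrated with a one dimensional NumPy sketch. The function `regrid_with_mdtol` below is a hypothetical stand-in that block-averages masked data and masks a target cell when its masked source fraction exceeds `mdtol`; it only mimics the behaviour described above and is not the Iris implementation. The data values and cell areas are invented.

```python
import numpy as np


def regrid_with_mdtol(src, cells_per_target, mdtol):
    """Block-average a 1D masked array, masking targets with too much missing data."""
    blocks = src.reshape(-1, cells_per_target)
    # Fraction of each target cell that is covered by masked source cells.
    masked_fraction = np.ma.getmaskarray(blocks).mean(axis=1)
    # Mean of the unmasked source cells in each block.
    means = blocks.mean(axis=1)
    return np.ma.masked_where(masked_fraction > mdtol, means)


# Hypothetical rainfall amounts per unit area on 8 source cells, 4 of them masked.
src = np.ma.masked_array(np.arange(1.0, 9.0), mask=[0, 1, 0, 0, 1, 1, 1, 0])
src_area = 1.0   # area of each source cell
tgt_area = 4.0   # each target cell covers 4 source cells

# Area weighted sum over the unmasked source cells.
source_total = float(src.filled(0.0).sum() * src_area)

totals = {}
for mdtol in (0.0, 0.5, 1.0):
    tgt = regrid_with_mdtol(src, 4, mdtol)
    totals[mdtol] = float(tgt.filled(0.0).sum() * tgt_area)
```

In this example the total for `mdtol=0` falls below `source_total`, the total for `mdtol=1` exceeds it, and `mdtol=0.5` lands in between. How close any given choice gets will depend on the data and the shape of the mask.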
Data Gradient/Smoothness#
Alternatively, rather than conserving global properties, it may be more important to approximate each individual point of data as accurately as possible. In this case, it may be more appropriate to use a linear regridder such as Linear or ESMFBilinear.
The linear method calculates each target point as the weighted average of the four surrounding source points. This average is weighted according to how close the target point is to each of the surrounding points. Notably, the value assigned to a target point varies continuously with its position (as opposed to nearest neighbour regridding).
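The weighting of the four surrounding points can be written out explicitly. The following is a minimal sketch of standard bilinear interpolation on a rectangle with flat geometry; the function name `bilinear` and its argument names are hypothetical and not part of any Iris API.

```python
def bilinear(x, y, x0, x1, y0, y1, v00, v10, v01, v11):
    """Interpolate at (x, y) inside the rectangle [x0, x1] x [y0, y1].

    vXY is the source value at the corner (xX, yY), e.g. v10 is at (x1, y0).
    """
    # Fractional position of the target point within the rectangle.
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    # Each corner is weighted by how close the target point is to it.
    return ((1 - tx) * (1 - ty) * v00
            + tx * (1 - ty) * v10
            + (1 - tx) * ty * v01
            + tx * ty * v11)


corners = dict(x0=0.0, x1=1.0, y0=0.0, y1=1.0, v00=0.0, v10=10.0, v01=20.0, v11=30.0)

at_centre = bilinear(0.5, 0.5, **corners)   # equal weight on all four corners
at_corner = bilinear(0.0, 0.0, **corners)   # reproduces the source value v00
near_edge = bilinear(0.25, 0.0, **corners)  # a quarter of the way from v00 to v10
```

The four weights always sum to one, and moving the target point slightly changes the result only slightly, which is the continuity property mentioned above.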
Such regridders work best when the data in question can be considered as a collection of measurements made at points on a smoothly varying field. The difference in behaviour between linear and conservative regridders can be seen most clearly when there is a large difference between the source and target grid resolution.
Suppose you were regridding from a high resolution to a low resolution. If you were using a conservative method, each result point would be the average of many source points. On the other hand, if you were using a linear method, each result point would only be a weighted average of the 4 nearest source points. This means that, while conservative methods will give you a better idea of the totality of the source data, linear methods will give you a better idea of the source data at a particular point.
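A one dimensional sketch makes the contrast concrete (in 1D, linear interpolation uses the 2 nearest source points rather than 4). The data below are invented: a single sharp spike on an otherwise zero field, regridded from 8 cells to 2. Block averaging stands in for a conservative method and `numpy.interp` for a linear one.

```python
import numpy as np

# High resolution source: 8 unit-width cells with a single sharp spike.
src_centres = np.arange(8) + 0.5
src_data = np.array([0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0])

# Low resolution target: 2 cells of width 4, centred at 2.0 and 6.0.
tgt_centres = np.array([2.0, 6.0])

# Conservative-style result: each target cell averages the 4 source cells it covers.
conservative = src_data.reshape(2, 4).mean(axis=1)

# Linear result: each target point only sees its 2 neighbouring source points.
linear = np.interp(tgt_centres, src_centres, src_data)
```

Here `conservative` retains the contribution of the spike, spread over the first target cell, whereas `linear` reports only the local values at the target centres and so misses the spike entirely.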
Conversely, suppose you were regridding from a low resolution to a high resolution. For other regridding methods (conservative and nearest), most of the target points covered by a given source point would have the same value and there would be a steep difference between target points near the cell boundary. For linear regridding however, the resulting data will vary smoothly.
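The low to high resolution case can be sketched in the same way. The example below uses invented 1D data and compares a simple nearest neighbour lookup with `numpy.interp`; it illustrates the general behaviour rather than reproducing any particular regridder.

```python
import numpy as np

# Low resolution source: 3 points.
src_points = np.array([0.0, 1.0, 2.0])
src_data = np.array([0.0, 10.0, 0.0])

# High resolution target: 9 evenly spaced points over the same range.
tgt_points = np.linspace(0.0, 2.0, 9)

# Nearest: copy the closest source value (ties go to the lower index).
nearest_index = np.abs(tgt_points[:, None] - src_points[None, :]).argmin(axis=1)
nearest = src_data[nearest_index]

# Linear: interpolate between the neighbouring source points.
linear = np.interp(tgt_points, src_points, src_data)

# Largest jump between adjacent target points for each method.
nearest_max_step = np.abs(np.diff(nearest)).max()
linear_max_step = np.abs(np.diff(linear)).max()
```

The nearest result is made of flat plateaus separated by jumps as large as the difference between neighbouring source values, while the linear result changes by a small, even amount from one target point to the next.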
Consistency#
As noted above, each regridding method has its own unique effect on the data. While this can be manageable when contained within the context of a particular workflow, you should take care not to compare data which has been regridded with different regridding methods, as the artefacts of each regridding method may dominate the underlying differences.
It should also be noted that some implementations of the same method (e.g., Nearest and UnstructuredNearest) may differ slightly and so may yield slightly different results when applied to equivalent data. However, this difference will be significantly less than the difference between regridders based on different methods.
Performance#
Regridding can be an expensive operation, but there are ways to work with regridders to mitigate this cost. For most regridders, the regridding process can be broken down into two steps:
Preparing the regridder by comparing the source and target grids and generating weights.
Performing the regridding by applying those weights to the source data.
Generally, the prepare step is the more expensive of the two. It is better to avoid repeating this step unnecessarily. This can be done by reusing a regridder, as described in the user guide.
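The two steps can be sketched with a small, hypothetical 1D class in which the weights are built once in the constructor (prepare) and applied on each call (perform). `LinearRegridder1D` is invented for illustration and assumes the target points lie within the range of sorted source points; it is not how any Iris regridder is implemented.

```python
import numpy as np


class LinearRegridder1D:
    """Hypothetical 1D regridder that separates 'prepare' from 'perform'."""

    def __init__(self, src_points, tgt_points):
        # Prepare: compare the grids and build the weights matrix once.
        src_points = np.asarray(src_points, dtype=float)
        tgt_points = np.asarray(tgt_points, dtype=float)
        # Index of the source point to the right of each target point.
        right = np.clip(np.searchsorted(src_points, tgt_points),
                        1, len(src_points) - 1)
        left = right - 1
        frac = ((tgt_points - src_points[left])
                / (src_points[right] - src_points[left]))
        rows = np.arange(len(tgt_points))
        self.weights = np.zeros((len(tgt_points), len(src_points)))
        self.weights[rows, left] = 1.0 - frac
        self.weights[rows, right] = frac

    def __call__(self, src_data):
        # Perform: applying the cached weights is a single matrix product.
        return self.weights @ np.asarray(src_data, dtype=float)


src_points = [0.0, 1.0, 2.0, 3.0]
tgt_points = [0.5, 1.25, 3.0]

# Prepare once...
regridder = LinearRegridder1D(src_points, tgt_points)

# ...then reuse the same regridder for every field on the same source grid.
fields = [np.array([0.0, 10.0, 20.0, 30.0]), np.array([5.0, 5.0, 5.0, 5.0])]
results = [regridder(field) for field in fields]
```

Constructing a new `LinearRegridder1D` for every field would repeat the grid comparison each time, which is the cost that reusing a regridder avoids.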