.. _interpolation_and_regridding:

.. testsetup:: *

    import numpy as np
    import iris
    import warnings
    warnings.simplefilter('ignore')

=================================
Cube Interpolation and Regridding
=================================

Iris provides powerful cube-aware interpolation and regridding functionality,
exposed through Iris cube methods. This functionality is provided by building
upon existing interpolation schemes implemented by SciPy.

In Iris we refer to the available types of interpolation and regridding as
`schemes`. The following are the interpolation schemes that are currently
available in Iris:

* linear interpolation (:class:`iris.analysis.Linear`), and
* nearest-neighbour interpolation (:class:`iris.analysis.Nearest`).

The following are the regridding schemes that are currently available in Iris:

* linear regridding (:class:`iris.analysis.Linear`),
* nearest-neighbour regridding (:class:`iris.analysis.Nearest` and
  :class:`iris.analysis.UnstructuredNearest`),
* point-in-cell regridding (:class:`iris.analysis.PointInCell`), and
* area-weighted regridding (:class:`iris.analysis.AreaWeighted`,
  first-order conservative).

The linear, nearest-neighbour, and area-weighted regridding schemes support
lazy regridding, i.e. if the source cube has lazy data, the resulting cube
will also have lazy data.
See :doc:`real_and_lazy_data` for an introduction to lazy data.

See :doc:`../further_topics/which_regridder_to_use` for a more in-depth
overview of the different regridders.

.. _interpolation:

Interpolation
-------------

Interpolating a cube is achieved with the :meth:`~iris.cube.Cube.interpolate`
method. This method expects two arguments:

#. the sample points to interpolate, and
#. the interpolation scheme to use.

The result is a new cube, interpolated at the sample points.

Sample points must be defined as an iterable of ``(coord, value(s))`` pairs.
The `coord` argument can be either a coordinate name or coordinate instance.
The specified coordinate must exist on the cube being interpolated! For example:

* coordinate names and scalar sample points: ``[('latitude', 51.48), ('longitude', 0)]``,
* a coordinate instance and a scalar sample point: ``[(cube.coord('latitude'), 51.48)]``, and
* a coordinate name and a NumPy array of sample points:
  ``[('longitude', np.linspace(-11, 2, 14))]``

are all examples of valid sample points.

The values for coordinates that correspond to date/times can be supplied as
datetime.datetime or cftime.datetime instances,
e.g. ``[('time', datetime.datetime(2009, 11, 19, 10, 30))]``.

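
For instance, a minimal sketch of date/time interpolation, using the
``A1B_north_america.nc`` sample file (which has a multi-decade time
coordinate); the particular date chosen here is illustrative only::

    >>> import datetime
    >>> cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
    >>> # Linearly interpolate the time series at a single date/time.
    >>> sample_points = [('time', datetime.datetime(1980, 6, 1))]
    >>> result = cube.interpolate(sample_points, iris.analysis.Linear())
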
Let's take the air temperature cube we've seen previously:

    >>> air_temp = iris.load_cube(iris.sample_data_path('air_temp.pp'))
    >>> print(air_temp)
    air_temperature / (K)               (latitude: 73; longitude: 96)
        Dimension coordinates:
            latitude                             x              -
            longitude                            -              x
        Scalar coordinates:
            forecast_period             6477 hours, bound=(-28083.0, 6477.0) hours
            forecast_reference_time     1998-03-01 03:00:00
            pressure                    1000.0 hPa
            time                        1998-12-01 00:00:00, bound=(1994-12-01 00:00:00, 1998-12-01 00:00:00)
        Cell methods:
            0                           time: mean within years
            1                           time: mean over years
        Attributes:
            STASH                       m01s16i203
            source                      'Data from Met Office Unified Model'

We can interpolate specific values from the coordinates of the cube:

    >>> sample_points = [('latitude', 51.48), ('longitude', 0)]
    >>> print(air_temp.interpolate(sample_points, iris.analysis.Linear()))
    air_temperature / (K)               (scalar cube)
        Scalar coordinates:
            forecast_period             6477 hours, bound=(-28083.0, 6477.0) hours
            forecast_reference_time     1998-03-01 03:00:00
            latitude                    51.48 degrees
            longitude                   0 degrees
            pressure                    1000.0 hPa
            time                        1998-12-01 00:00:00, bound=(1994-12-01 00:00:00, 1998-12-01 00:00:00)
        Cell methods:
            0                           time: mean within years
            1                           time: mean over years
        Attributes:
            STASH                       m01s16i203
            source                      'Data from Met Office Unified Model'

As we can see, the resulting cube is scalar and has longitude and latitude
coordinates with the values defined in our sample points.

It isn't necessary to specify sample points for every dimension, only those
that you wish to interpolate over:

    >>> result = air_temp.interpolate([('longitude', 0)], iris.analysis.Linear())
    >>> print('Original: ' + air_temp.summary(shorten=True))
    Original: air_temperature / (K)                 (latitude: 73; longitude: 96)
    >>> print('Interpolated: ' + result.summary(shorten=True))
    Interpolated: air_temperature / (K)                 (latitude: 73)

The sample points for a coordinate can be an array of values. When multiple
coordinates are provided with arrays instead of scalar sample points, the
coordinates on the resulting cube will be orthogonal:

    >>> sample_points = [('longitude', np.linspace(-11, 2, 14)),
    ...                  ('latitude', np.linspace(48, 60, 13))]
    >>> result = air_temp.interpolate(sample_points, iris.analysis.Linear())
    >>> print(result.summary(shorten=True))
    air_temperature / (K)               (latitude: 13; longitude: 14)


Interpolating Non-Horizontal Coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Interpolation in Iris is not limited to horizontal-spatial coordinates - any
coordinate satisfying the prerequisites of the chosen scheme may be
interpolated over. For instance, the :class:`iris.analysis.Linear` scheme
requires 1D, numeric, monotonic coordinates.

Supposing we have a single column cube such as the one defined below:

    >>> cube = iris.load_cube(iris.sample_data_path('hybrid_height.nc'),
    ...                       'air_potential_temperature')
    >>> column = cube[:, 0, 0]
    >>> print(column.summary(shorten=True))
    air_potential_temperature / (K)     (model_level_number: 15)

This cube has a "hybrid-height" vertical coordinate system, meaning that the
vertical coordinate is unevenly spaced in altitude:

    >>> print(column.coord('altitude').points)
    [ 418.69836  434.5705   456.7928   485.3665   520.2933   561.5752
      609.2145   663.2141   723.57697  790.30664  863.4072   942.8823
     1028.737   1120.9764  1219.6051 ]

We could regularise the vertical coordinate by defining 10 equally spaced
altitude sample points between 400 and 1250 and interpolating our vertical
coordinate onto these sample points:

    >>> sample_points = [('altitude', np.linspace(400, 1250, 10))]
    >>> new_column = column.interpolate(sample_points, iris.analysis.Linear())
    >>> print(new_column.summary(shorten=True))
    air_potential_temperature / (K)     (model_level_number: 10)

Let's look at the original data, the interpolation line and the new data in a
plot. This will help us to see what is going on:

.. plot:: userguide/regridding_plots/interpolate_column.py

The red diamonds on the extremes of the altitude values show that we have
extrapolated data beyond the range of the original data. In some cases this
is desirable but in other cases it is not. For example, this column defines
a surface altitude value of 414m, so extrapolating an "air potential
temperature" at 400m makes little physical sense in this case.

We can control the extrapolation mode when defining the interpolation scheme.
Controlling the extrapolation mode allows us to avoid situations like the
above where extrapolating values makes little physical sense.

The extrapolation mode is controlled by the ``extrapolation_mode`` keyword.
For the interpolation schemes currently available in Iris, the
``extrapolation_mode`` keyword must be one of:

* ``extrapolate`` -- the extrapolation points will be calculated by extending
  the gradient of the closest two points,
* ``error`` -- a ValueError exception will be raised, notifying an attempt to
  extrapolate,
* ``nan`` -- the extrapolation points will be set to NaN,
* ``mask`` -- the extrapolation points will always be masked, even if the
  source data is not a MaskedArray, or
* ``nanmask`` -- if the source data is a MaskedArray the extrapolation points
  will be masked. Otherwise they will be set to NaN.

Using an extrapolation mode is achieved by constructing an interpolation
scheme with the extrapolation mode keyword set as required. The constructed
scheme is then passed to the :meth:`~iris.cube.Cube.interpolate` method.
For example, to mask values that lie beyond the range of the original data:

    >>> scheme = iris.analysis.Linear(extrapolation_mode='mask')
    >>> new_column = column.interpolate(sample_points, scheme)
    >>> print(new_column.coord('altitude').points)
    [-- 494.44451904296875 588.888916015625 683.333251953125 777.77783203125
     872.2222290039062 966.666748046875 1061.111083984375 1155.555419921875 --]

.. _caching_an_interpolator:

Caching an Interpolator
^^^^^^^^^^^^^^^^^^^^^^^

If you need to interpolate a cube on multiple sets of sample points you can
'cache' an interpolator to be used for each of these interpolations. This can
shorten the execution time of your code as the most computationally intensive
part of an interpolation is setting up the interpolator.

To cache an interpolator you must set up an interpolator scheme and call the
scheme's interpolator method. The interpolator method takes as arguments:

#. a cube to be interpolated, and
#. an iterable of coordinate names or coordinate instances of the coordinates
   that are to be interpolated over.

For example:

    >>> air_temp = iris.load_cube(iris.sample_data_path('air_temp.pp'))
    >>> interpolator = iris.analysis.Nearest().interpolator(air_temp, ['latitude', 'longitude'])

When this cached interpolator is called you must pass it an iterable of sample
points that have the same form as the iterable of coordinates passed to the
constructor. So, to use the cached interpolator defined above:

    >>> latitudes = np.linspace(48, 60, 13)
    >>> longitudes = np.linspace(-11, 2, 14)
    >>> for lat, lon in zip(latitudes, longitudes):
    ...     result = interpolator([lat, lon])

In each case ``result`` will be a cube interpolated from the ``air_temp`` cube
that we passed to the interpolator.

Note that you must specify the required extrapolation mode when setting up the
cached interpolator. For example::

    >>> interpolator = iris.analysis.Nearest(extrapolation_mode='nan').interpolator(cube, coords)

.. _regridding:

Regridding
----------

Regridding is conceptually a very similar process to interpolation in Iris.
The primary difference is that interpolation is based on sample points, while
regridding is based on the **horizontal** grid of *another cube*.

Regridding a cube is achieved with the :meth:`cube.regrid() <iris.cube.Cube.regrid>`
method. This method expects two arguments:

#. *another cube* that defines the target grid onto which the cube should be
   regridded, and
#. the regridding scheme to use.

.. note::

    Regridding is a common operation needed to allow comparisons of data on
    different grids. The powerful mapping functionality provided by cartopy,
    however, means that regridding is often not necessary if performed just
    for visualisation purposes.

Let's load two cubes that have different grids and coordinate systems:

    >>> global_air_temp = iris.load_cube(iris.sample_data_path('air_temp.pp'))
    >>> rotated_psl = iris.load_cube(iris.sample_data_path('rotated_pole.nc'))

We can visually confirm that they are on different grids by plotting the two
cubes:

.. plot:: userguide/regridding_plots/regridding_plot.py

Let's regrid the ``global_air_temp`` cube onto a rotated pole grid using a
linear regridding scheme. To achieve this we pass the ``rotated_psl`` cube to
the regridder to supply the target grid to regrid the ``global_air_temp``
cube onto:

    >>> rotated_air_temp = global_air_temp.regrid(rotated_psl, iris.analysis.Linear())

.. plot:: userguide/regridding_plots/regridded_to_rotated.py

We could regrid the pressure values onto the global grid, but this will
involve some form of extrapolation. As with interpolation, we can control the
extrapolation mode when defining the regridding scheme.

For the regridding schemes currently available in Iris, the
``extrapolation_mode`` keyword must be one of:

* ``extrapolate`` --

  * for :class:`~iris.analysis.Linear` the extrapolation points will be
    calculated by extending the gradient of the closest two points.
  * for :class:`~iris.analysis.Nearest` the extrapolation points will take
    their value from the nearest source point.

* ``nan`` -- the extrapolation points will be set to NaN.
* ``error`` -- a ValueError exception will be raised, notifying an attempt to
  extrapolate.
* ``mask`` -- the extrapolation points will always be masked, even if the
  source data is not a MaskedArray.
* ``nanmask`` -- if the source data is a MaskedArray the extrapolation points
  will be masked. Otherwise they will be set to NaN.

The ``rotated_psl`` cube is defined on a limited area rotated pole grid. If we
regridded the ``rotated_psl`` cube onto the global grid as defined by the
``global_air_temp`` cube any linearly extrapolated values would quickly become
dominant and highly inaccurate. We can control this behaviour by defining the
``extrapolation_mode`` in the constructor of the regridding scheme to mask
values that lie outside of the domain of the rotated pole grid:

    >>> scheme = iris.analysis.Linear(extrapolation_mode='mask')
    >>> global_psl = rotated_psl.regrid(global_air_temp, scheme)

.. plot:: userguide/regridding_plots/regridded_to_global.py

Notice that although we can still see the approximate shape of the rotated
pole grid, the cells have now become rectangular in a plate carrée
(equirectangular) projection. The spatial grid of the resulting cube is
really global, with a large proportion of the data being masked.

Area-Weighted Regridding
^^^^^^^^^^^^^^^^^^^^^^^^

It is often the case that a point-based regridding scheme (such as
:class:`iris.analysis.Linear` or :class:`iris.analysis.Nearest`) is not
appropriate when you need to conserve quantities when regridding. The
:class:`iris.analysis.AreaWeighted` scheme is less general than
:class:`~iris.analysis.Linear` or :class:`~iris.analysis.Nearest`, but is a
conservative regridding scheme, meaning that the area-weighted total is
approximately preserved across grids.

With the :class:`~iris.analysis.AreaWeighted` regridding scheme, each target
grid-box's data is computed as a weighted mean of all grid-boxes from the
source grid. The weighting for any given target grid-box is the area of the
intersection with each of the source grid-boxes. This scheme performs well
when regridding from a high resolution source grid to a lower resolution
target grid, since all source data points will be accounted for in the
target grid.

Let's demonstrate this with the global air temperature cube we saw previously,
along with a limited area cube containing total concentration of volcanic ash:

    >>> global_air_temp = iris.load_cube(iris.sample_data_path('air_temp.pp'))
    >>> print(global_air_temp.summary(shorten=True))
    air_temperature / (K)               (latitude: 73; longitude: 96)
    >>>
    >>> regional_ash = iris.load_cube(iris.sample_data_path('NAME_output.txt'))
    >>> regional_ash = regional_ash.collapsed('flight_level', iris.analysis.SUM)
    >>> print(regional_ash.summary(shorten=True))
    VOLCANIC_ASH_AIR_CONCENTRATION / (g/m3) (latitude: 214; longitude: 584)

One of the key limitations of the :class:`~iris.analysis.AreaWeighted`
regridding scheme is that the two input grids must be defined in the same
coordinate system as each other. Both input grids must also contain monotonic,
bounded, 1D spatial coordinates.

.. note::

    The :class:`~iris.analysis.AreaWeighted` regridding scheme requires
    spatial areas, therefore the longitude and latitude coordinates must
    be bounded.
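
Whether a cube's coordinates are bounded can be checked with
:meth:`~iris.coords.Coord.has_bounds`. For example, for the cube loaded
above::

    >>> global_air_temp.coord('latitude').has_bounds()
    False
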
If the longitude and latitude bounds are not defined in the cube we can guess
the bounds based on the coordinates' point values:

    >>> global_air_temp.coord('longitude').guess_bounds()
    >>> global_air_temp.coord('latitude').guess_bounds()

Using NumPy's masked array module we can mask any data that falls below a
meaningful concentration:

    >>> regional_ash.data = np.ma.masked_less(regional_ash.data, 5e-6)

Finally, we can regrid the data using the :class:`~iris.analysis.AreaWeighted`
regridding scheme:

    >>> scheme = iris.analysis.AreaWeighted(mdtol=0.5)
    >>> global_ash = regional_ash.regrid(global_air_temp, scheme)
    >>> print(global_ash.summary(shorten=True))
    VOLCANIC_ASH_AIR_CONCENTRATION / (g/m3) (latitude: 73; longitude: 96)

Note that the :class:`~iris.analysis.AreaWeighted` regridding scheme allows us
to define a missing data tolerance (``mdtol``), which specifies the tolerated
fraction of masked data in any given target grid-box. If the fraction of
masked data within a target grid-box exceeds this value, the data in this
target grid-box will be masked in the result.

The fraction of masked data is calculated based on the area of masked source
grid-boxes that overlaps with each target grid-box. Defining an ``mdtol`` in
the :class:`~iris.analysis.AreaWeighted` regridding scheme allows fine control
of masked data tolerance. It is worth remembering that defining an ``mdtol``
of anything other than 1 will prevent the scheme from being fully
conservative, as some data will be disregarded if it lies close to masked
data.

To visualise the above regrid, let's plot the original data, along with 3
distinct ``mdtol`` values to compare the result:

.. plot:: userguide/regridding_plots/regridded_to_global_area_weighted.py

.. _caching_a_regridder:

Caching a Regridder
^^^^^^^^^^^^^^^^^^^

If you need to regrid multiple cubes with a common source grid onto a common
target grid you can 'cache' a regridder to be used for each of these regrids.
This can shorten the execution time of your code as the most computationally
intensive part of a regrid is setting up the regridder.

To cache a regridder you must set up a regridder scheme and call the scheme's
regridder method. The regridder method takes as arguments:

#. a cube (that is to be regridded) defining the source grid, and
#. a cube defining the target grid to regrid the source cube to.

For example:

    >>> global_air_temp = iris.load_cube(iris.sample_data_path('air_temp.pp'))
    >>> rotated_psl = iris.load_cube(iris.sample_data_path('rotated_pole.nc'))
    >>> regridder = iris.analysis.Nearest().regridder(global_air_temp, rotated_psl)

When this cached regridder is called you must pass it a cube on the same grid
as the source grid cube (in this case ``global_air_temp``) that is to be
regridded to the target grid. For example::

    >>> for cube in list_of_cubes_on_source_grid:
    ...     result = regridder(cube)

In each case ``result`` will be the input cube regridded to the grid defined
by the target grid cube (in this case ``rotated_psl``) that we used to define
the cached regridder.

Regridding Lazy Data
^^^^^^^^^^^^^^^^^^^^

If you are working with large cubes, especially when you are regridding to a
high resolution target grid, you may run out of memory when trying to regrid
a cube. When this happens, make sure the input cube has lazy data:

    >>> air_temp = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
    >>> air_temp
    <iris 'Cube' of air_temperature / (K) (time: 240; latitude: 37; longitude: 49)>
    >>> air_temp.has_lazy_data()
    True

and that the regridding scheme supports lazy data. All regridding schemes
described here support lazy data.
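
We can quickly confirm that regridding this lazy cube also yields a lazy
result. A minimal sketch, reusing the ``rotated_psl`` target grid loaded
earlier::

    >>> result = air_temp.regrid(rotated_psl, iris.analysis.Linear())
    >>> result.has_lazy_data()
    True
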
If you still run out of memory even while using lazy data, inspect the
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__:

    >>> air_temp.lazy_data().chunks
    ((240,), (37,), (49,))

The cube above consists of a single chunk, because it is fairly small. For
larger cubes, Iris will automatically create chunks of an optimal size when
loading the data. However, because regridding to a high resolution grid may
dramatically increase the size of the data, the automatically chosen chunks
might be too large.

As an example of how to solve this, we could manually re-chunk the time
dimension, to regrid it in 8 chunks of 30 timesteps at a time:

    >>> air_temp.data = air_temp.lazy_data().rechunk([30, None, None])
    >>> air_temp.lazy_data().chunks
    ((30, 30, 30, 30, 30, 30, 30, 30), (37,), (49,))

Assuming that Dask is configured such that it processes only a few chunks of
the data array at a time, this will further reduce memory use.

Note that chunking in the horizontal dimensions is not supported by the
regridding schemes. Chunks in these dimensions will automatically be combined
before regridding.
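
Putting these pieces together, a minimal sketch of a memory-friendly regrid
might look like the following, where ``target_cube`` stands in for whichever
cube defines your target grid (that name and the output filename are
illustrative only)::

    >>> air_temp = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
    >>> # Re-chunk along time so the regrid is computed 30 timesteps at a time.
    >>> air_temp.data = air_temp.lazy_data().rechunk([30, None, None])
    >>> result = air_temp.regrid(target_cube, iris.analysis.Linear())
    >>> # Saving streams the lazy result to disk chunk by chunk, rather than
    >>> # realising the whole array in memory at once.
    >>> iris.save(result, 'air_temp_regridded.nc')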