Merge and Concatenate

We saw in the Loading Iris Cubes chapter that Iris tries to load as few cubes as possible. This is done by collecting together multiple fields with a shared standard name (and other key metadata) into a single multidimensional cube. The processes that perform this behaviour in Iris are known as merge and concatenate.

This chapter describes the merge and concatenate processes, explains why common issues occur when using them, and gives advice on how to prevent these issues from occurring.

Both merge and concatenate take multiple cubes as input and result in fewer cubes as output. The following diagram illustrates the two processes:

Pictographic of merge and concatenation.

There is one major difference between the merge and concatenate processes.

  • The merge process combines multiple input cubes into a single resultant cube with new dimensions created from the scalar coordinate values of the input cubes.

  • The concatenate process combines multiple input cubes into a single resultant cube with the same number of dimensions as the input cubes, but with the length of one or more dimensions extended by joining together sequential dimension coordinates.

Let’s imagine 28 individual cubes representing the temperature at a location (y, x); one cube for each day of February. We can use merge() to combine the 28 (y, x) cubes into a single (t, y, x) cube, where the length of the t dimension is 28.

Now imagine 12 individual cubes representing daily temperature at a time and location (t, y, x); one cube for each month in the year. We can use concatenate() to combine the 12 (t, y, x) cubes into a single (t, y, x) cube, where the length of the t dimension is now 365 (assuming a non-leap year).
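
The difference in resulting shapes can be sketched with plain NumPy arrays. This is only an analogy, not how Iris works internally: np.stack stands in for merge (a new dimension is created) and np.concatenate stands in for concatenate (an existing dimension is extended). The grid size (3, 4) and the year 2023 are arbitrary assumptions chosen for illustration.

```python
import calendar

import numpy as np

ny, nx = 3, 4  # assumed grid size, for illustration only

# Merge analogy: 28 daily (y, x) fields gain a new leading t dimension.
february = [np.zeros((ny, nx)) for _ in range(28)]
merged_like = np.stack(february)
print(merged_like.shape)  # (28, 3, 4)

# Concatenate analogy: 12 monthly (t, y, x) blocks are joined along the
# existing t dimension. 2023 is an assumed non-leap year.
year = 2023
months = [
    np.zeros((calendar.monthrange(year, month)[1], ny, nx))
    for month in range(1, 13)
]
concatenated_like = np.concatenate(months, axis=0)
print(concatenated_like.shape)  # (365, 3, 4)
```

Unlike these array operations, the Iris processes also combine the coordinates and check the metadata of the input cubes, as described in the rest of this chapter.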

Merge

We’ve seen that the merge process combines multiple input cubes into a single resultant cube with new dimensions created from the scalar coordinate values of the input cubes.

In order to construct new coordinates for the new dimensions, the merge process requires input cubes with scalar coordinates that can be combined together into monotonic sequences. The order of the input cubes does not affect the merge process.

The merge process can produce a cube that has more than one new dimension, if the scalar coordinate sequences form an orthogonal basis.

Important

The shape, metadata, attributes, coordinates, coordinates metadata, fill value and other aspects of the input cubes must be consistent across all of the input cubes.

The merge process will fail if these are not consistent. Such failures are covered in the Common Issues With Merge and Concatenate section.

The merge process can be accessed through two CubeList methods, merge() and merge_cube(), which are described below.

Using CubeList.merge

The CubeList.merge method operates on a list of cubes and returns a new CubeList containing the cubes that have been merged.

Let’s have a look at the merge() method in operation. In this example we have a list of three lateral (x, y) cubes in a variable called cubes, each with a scalar z coordinate of differing value. We can merge these cubes by stacking the scalar z coordinates to make a new z dimension coordinate:

>>> print(cubes)
0: air_temperature / (kelvin)          (y: 4; x: 5)
1: air_temperature / (kelvin)          (y: 4; x: 5)
2: air_temperature / (kelvin)          (y: 4; x: 5)

>>> print(cubes[0])
air_temperature / (kelvin)          (y: 4; x: 5)
 ...
     Scalar coordinates:
          z: 1 meters
>>> print(cubes[1])
air_temperature / (kelvin)          (y: 4; x: 5)
 ...
     Scalar coordinates:
          z: 2 meters
>>> print(cubes[2])
air_temperature / (kelvin)          (y: 4; x: 5)
 ...
     Scalar coordinates:
          z: 3 meters

>>> print(cubes.merge())
0: air_temperature / (kelvin)          (z: 3; y: 4; x: 5)

The following diagram illustrates what has taken place in this example:

Pictographic of merge.

The diagram illustrates that we have three input cubes of identical shape that stack on the z dimension. After merging our three input cubes we get a new CubeList containing one cube with a new z coordinate.

Using CubeList.merge_cube

The merge_cube() method guarantees that exactly one cube will be returned as a result of merging the input cubes. If merge_cube() cannot fulfil this guarantee, a descriptive error will be raised providing details to help diagnose the differences between the input cubes. In contrast, the merge() method makes no check on the number of cubes returned.

To demonstrate the differences between merge() and merge_cube(), let’s return to our three cubes from the earlier merge example.

For the purposes of this example a Conventions attribute has been added to the first cube’s attributes dictionary. Remember that the attributes must be consistent across all cubes in order to merge into a single cube:

>>> print(cubes)
0: air_temperature / (kelvin)          (y: 4; x: 5)
1: air_temperature / (kelvin)          (y: 4; x: 5)
2: air_temperature / (kelvin)          (y: 4; x: 5)

>>> print(cubes[0].attributes)
{'Conventions': 'CF-1.5'}
>>> print(cubes[1].attributes)
{}
>>> print(cubes[2].attributes)
{}

>>> print(cubes.merge())
0: air_temperature / (kelvin)          (y: 4; x: 5)
1: air_temperature / (kelvin)          (z: 2; y: 4; x: 5)

>>> print(cubes.merge_cube())
Traceback (most recent call last):
    ...
    raise iris.exceptions.MergeError(msgs)
iris.exceptions.MergeError: failed to merge into a single cube.
  cube.attributes keys differ: 'Conventions'

Note that merge() returns two cubes here. All the cubes that can be merged have been merged. Any cubes that can’t be merged are included unchanged in the returned CubeList. When merge_cube() is called on cubes it raises a descriptive error that highlights the difference in the attributes dictionaries. It is this difference that is preventing cubes being merged into a single cube. An example of fixing an issue like this can be found in the Common Issues With Merge and Concatenate section.

Merge in Iris Load

The CubeList’s merge() method is used internally by the three main Iris load functions introduced in Loading Iris Cubes. For file formats such as GRIB and PP, which store fields as many individual 2D arrays, Iris loading uses the merge process to produce a more intuitive higher dimensional cube of each phenomenon where possible.

Sometimes the merge process doesn’t behave as expected. In almost all cases this is due to the input cubes containing unexpected or inconsistent metadata. For this reason, a fourth Iris file loading function, iris.load_raw(), exists. The load_raw() function is intended as a diagnostic tool that can be used to load cubes from files without the merge process taking place. The return value of iris.load_raw() is always a CubeList instance. You can then call the merge_cube() method on this returned CubeList to help identify merge related load issues.

Concatenate

We’ve seen that the concatenate process combines multiple input cubes into a single resultant cube with the same number of dimensions as the input cubes, but with the length of one or more dimensions extended by joining together sequential dimension coordinates.

In order to extend the dimension lengths, the concatenate process requires input cubes with dimension coordinates that can be combined together into monotonic sequences. The order of the input cubes does not affect the concatenate process.

Important

The shape, metadata, attributes, coordinates, coordinates metadata, fill value and other aspects of the input cubes must be consistent across all of the input cubes.

The concatenate process will fail if these are not consistent. Such failures are covered in the Common Issues With Merge and Concatenate section.

The concatenate process can be accessed through two CubeList methods, concatenate() and concatenate_cube(), which are described below.

Using CubeList.concatenate

The CubeList.concatenate method operates on a list of cubes and returns a new CubeList containing the cubes that have been concatenated.

Let’s have a look at the concatenate() method in operation. In the example below we have three 3D (t, y, x) cubes whose t coordinates have sequentially increasing ranges. These cubes can be concatenated by combining the t coordinates of the input cubes to form a new cube with an extended t coordinate:

>>> print(cubes)
0: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)
1: air_temperature / (kelvin)          (t: 28; y: 3; x: 4)
2: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)

>>> print(cubes.concatenate())
0: air_temperature / (kelvin)          (t: 90; y: 3; x: 4)

The following diagram illustrates what has taken place in this example:

Pictographic of concatenate.

The diagram illustrates that we have three 3D input cubes that line up on the t dimension. After concatenating our three input cubes we get a new CubeList containing one cube with an extended t coordinate.

Using CubeList.concatenate_cube

The concatenate_cube() method guarantees that exactly one cube will be returned as a result of concatenating the input cubes. If concatenate_cube() cannot fulfil this guarantee, a descriptive error will be raised providing details to help diagnose the differences between the input cubes. In contrast, the concatenate() method makes no check on the number of cubes returned.

To demonstrate the differences between concatenate() and concatenate_cube(), let’s return to our three cubes from the earlier concatenate example.

For the purposes of this example we’ll add a History attribute to the first cube’s attributes dictionary. Remember that the attributes must be consistent across all cubes in order to concatenate into a single cube:

>>> print(cubes)
0: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)
1: air_temperature / (kelvin)          (t: 28; y: 3; x: 4)
2: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)

>>> print(cubes[0].attributes)
{'History': 'Created 2010-06-30'}
>>> print(cubes[1].attributes)
{}

>>> print(cubes.concatenate())
0: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)
1: air_temperature / (kelvin)          (t: 59; y: 3; x: 4)
>>> print(cubes.concatenate_cube())
Traceback (most recent call last):
    ...
    raise iris.exceptions.ConcatenateError(msgs)
iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
  Cube metadata differs for phenomenon: air_temperature

Note that concatenate() returns two cubes here. All the cubes that can be concatenated have been concatenated. Any cubes that can’t be concatenated are included unchanged in the returned CubeList. When concatenate_cube() is called on cubes it raises a descriptive error that highlights the difference in the attributes dictionaries. It is this difference that is preventing cubes being concatenated into a single cube. An example of fixing an issue like this can be found in the Common Issues With Merge and Concatenate section.

Common Issues With Merge and Concatenate

The Iris algorithms that drive merge() and concatenate() are complex and depend on a number of different elements of the input cubes being consistent across all input cubes. If this consistency is not maintained then the merge() or concatenate() process can fail in a seemingly arbitrary manner.

The methods merge_cube() and concatenate_cube() were introduced to Iris to help you locate differences in input cubes that prevent the input cubes merging or concatenating. Nevertheless, certain difficulties with using merge() and concatenate() occur frequently. This section describes these common difficulties, why they arise and what you can do to avoid them.

Merge

Attributes Mismatch

Differences in the attributes of the input cubes probably cause the greatest number of merge-related difficulties. In recognition of this, Iris has a helper function, equalise_attributes(), to equalise attributes differences in the input cubes.

Note

The functionality provided by iris.util.describe_diff() and iris.cube.Cube.is_compatible() is not designed to indicate whether two cubes can be merged.

To demonstrate using equalise_attributes(), let’s return to our non-merging list of input cubes from the merge_cube example from earlier. We’ll call equalise_attributes() on the input cubes before merging the input cubes using merge_cube():

>>> from iris.util import equalise_attributes
>>> print(cubes)
0: air_temperature / (kelvin)          (y: 4; x: 5)
1: air_temperature / (kelvin)          (y: 4; x: 5)
2: air_temperature / (kelvin)          (y: 4; x: 5)

>>> print(cubes[0].attributes)
{'Conventions': 'CF-1.5'}
>>> print(cubes[1].attributes)
{}
>>> print(cubes[2].attributes)
{}

>>> print(cubes.merge_cube())
Traceback (most recent call last):
    ...
    raise iris.exceptions.MergeError(msgs)
iris.exceptions.MergeError: failed to merge into a single cube.
  cube.attributes keys differ: 'Conventions'

>>> equalise_attributes(cubes)

>>> print(cubes[0].attributes)
{}

>>> print(cubes.merge_cube())
air_temperature / (kelvin)          (z: 3; y: 4; x: 5)
     Dimension coordinates:
          z                           x     -     -
          y                           -     x     -
          x                           -     -     x

Incomplete Data

Merging input cubes with inconsistent dimension lengths can cause misleading results. This is a common problem when merging cubes generated by different ensemble members in a model run.

In the misleading result, the merged cube gains an anonymous leading dimension, and all the merged coordinates appear as auxiliary coordinates on that dimension. This is shown in the example below:

>>> print(cube)
surface_temperature / (K)           (-- : 5494; latitude: 325; longitude: 432)
     Dimension coordinates:
          latitude                      -               x               -
          longitude                     -               -               x
     Auxiliary coordinates:
          forecast_month                x               -               -
          forecast_period               x               -               -
          forecast_reference_time       x               -               -
          realization                   x               -               -
          time                          x               -               -

Merging Duplicate Cubes

The Iris load process does not merge duplicate cubes (two or more identical cubes in the input cubes) by default. This behaviour can be changed by setting the unique keyword argument to merge() to False.

Merging duplicate cubes can cause misleading results. Let’s demonstrate these behaviours and misleading results with the following example. In this example we have three input cubes. The first has a scalar z coordinate with value 1, the second has a scalar z coordinate with value 2 and the third has a scalar z coordinate with value 1. The first and third cubes are thus identical. We will demonstrate the effect of merging the input cubes with unique=False (duplicate cubes allowed) and unique=True (duplicate cubes not allowed, which is the default behaviour):

>>> print(cubes)
0: air_temperature / (kelvin)          (y: 4; x: 5)
1: air_temperature / (kelvin)          (y: 4; x: 5)
2: air_temperature / (kelvin)          (y: 4; x: 5)

>>> print(cubes.merge(unique=False))
0: air_temperature / (kelvin)          (z: 2; y: 4; x: 5)
1: air_temperature / (kelvin)          (z: 2; y: 4; x: 5)

>>> print(cubes.merge())  # unique=True is the default.
Traceback (most recent call last):
  ...
iris.exceptions.DuplicateDataError: failed to merge into a single cube.
  Duplicate 'air_temperature' cube, with scalar coordinates z=Cell(point=1, bound=None)

Notice how merging the input cubes with duplicate cubes allowed produces a result with four z coordinate values. Closer inspection of these two resultant cubes demonstrates that the scalar z coordinate with value 2 is found in both cubes.

Trying to merge the input cubes with duplicate cubes not allowed raises an error highlighting the presence of the duplicate cube.

Single Value Coordinates

Coordinates containing only a single value can cause confusion when combining input cubes. Remember:

  • The merge process combines multiple input cubes into a single resultant cube with new dimensions created from the scalar coordinate values of the input cubes.

  • The concatenate process combines multiple input cubes into a single resultant cube with the same number of dimensions as the input cubes, but with the length of one or more dimensions extended by joining together sequential dimension coordinates.

In Iris terminology a scalar coordinate is a coordinate of length 1 which does not describe a data dimension.

Let’s look at two example cubes to demonstrate this.

If your cubes are similar to those below (the single value z coordinate is not on a dimension) then use merge() to combine your cubes:

>>> print(cubes[0])
air_temperature / (kelvin)          (y: 4; x: 5)
     Dimension coordinates:
          y                           x      -
          x                           -      x
     Scalar coordinates:
          z: 1
>>> print(cubes[1])
air_temperature / (kelvin)          (y: 4; x: 5)
     Dimension coordinates:
          y                           x      -
          x                           -      x
     Scalar coordinates:
          z: 2

If your cubes are similar to those below (the single value z coordinate is associated with a dimension) then use concatenate() to combine your cubes:

>>> print(cubes)
0: air_temperature / (kelvin)          (z: 1; y: 4; x: 5)
1: air_temperature / (kelvin)          (z: 1; y: 4; x: 5)

Concatenate

Time Units

Differences in the units of the time coordinates of the input cubes probably cause the greatest number of concatenate-related difficulties. In recognition of this, Iris has a helper function, unify_time_units(), to apply a common time unit to all the input cubes.

To demonstrate using unify_time_units(), let’s adapt our list of input cubes from the concatenate_cube example from earlier. We’ll give the input cubes unequal time coordinate units and call unify_time_units() on the input cubes before concatenating the input cubes using concatenate_cube():

>>> from iris.util import unify_time_units
>>> print(cubes)
0: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)
1: air_temperature / (kelvin)          (t: 28; y: 3; x: 4)
2: air_temperature / (kelvin)          (t: 31; y: 3; x: 4)

>>> print(cubes[0].coord('t').units)
days since 1990-02-15
>>> print(cubes[1].coord('t').units)
days since 1970-01-01

>>> print(cubes.concatenate_cube())
Traceback (most recent call last):
 ...
ConcatenateError: failed to concatenate into a single cube.
  Dimension coordinates metadata differ: t != t

>>> unify_time_units(cubes)

>>> print(cubes[1].coord('t').units)
days since 1990-02-15

>>> print(cubes.concatenate_cube())
air_temperature / (kelvin)          (t: 90; y: 3; x: 4)
     Dimension coordinates:
          t                           x      -     -
          y                           -      x     -
          x                           -      -     x

Attributes Mismatch

The concatenate process is affected by attributes mismatch on input cubes in the same way that the merge process is. The Attributes Mismatch section earlier in this chapter gives further information on attributes mismatch.