Iris Handling of PP and Fieldsfiles#

This document provides a basic account of how PP and Fieldsfiles data is represented within Iris. It describes how Iris represents data from the Met Office Unified Model (UM), in terms of the metadata elements found in PP and Fieldsfile formats.

For simplicity, we shall describe this mostly in terms of loading of PP data into Iris (i.e. into cubes). However most of the details are identical for Fieldsfiles, and are relevant to saving in these formats as well as loading.

Notes:

Iris treats Fieldsfile data almost exactly as if it were PP – i.e. it treats each field’s lookup table entry like a PP header.
The Iris data model is based on NetCDF CF conventions, so most of this can also be seen as a metadata translation between PP and CF terms, but it is easier to discuss in terms of Iris elements.

For details of Iris terms (cubes, coordinates, attributes), refer to Iris data structures.

For details of CF conventions, see https://cfconventions.org/.

Overview of Loading Process#

The basics of Iris loading are explained at Loading Iris Cubes. Loading as it specifically applies to PP and Fieldsfile data can be summarised as follows:

Input fields are first loaded from the given sources, using iris.fileformats.pp.load(). This returns an iterator, which provides a ‘stream’ of PPField input field objects. Each PPfield object represents a single source field:
- PP header elements are provided as named object attributes (e.g. lbproc).
- Some extra, calculated “convenience” properties are also provided (e.g. t1 and t2 time values).
- There is a iris.fileformats.pp.PPField.data attribute, but the field data is not actually loaded unless/until this is accessed, for greater speed and space efficiency.
Each input field is translated into a two-dimensional Iris cube (with dimensions of latitude and longitude). These are the ‘raw’ cubes, as returned by iris.load_raw(). Within these:
- There are 2 horizontal dimension coordinates containing the latitude and longitude values for the field.
- Certain other header elements are interpreted as ‘coordinate’-type values applying to the input fields, and stored as auxiliary ‘scalar’ (i.e. 1-D) coordinates. These include all header elements defining vertical and time coordinate values, and also more specialised factors such as ensemble number and pseudo-level.
- Other metadata is encoded on the cube in a variety of other forms, such as the cube ‘name’ and ‘units’ properties, attribute values and cell methods.
Lastly, Iris attempts to merge the raw cubes into higher-dimensional ones (using merge()): This combines raw cubes with different values of a scalar coordinate to produce a higher-dimensional cube with the values contained in a new vector coordinate. Where possible, the new vector coordinate is also a dimension coordinate, describing the new dimension. Apart from the original 2 horizontal dimensions, all cube dimensions and dimension coordinates arise in this way – for example, ‘time’, ‘height’, ‘forecast_period’, ‘realization’.

Note

This document covers the essential features of the UM data loading process. The complete details are implemented as follows:

The conversion of fields to raw cubes is performed by the function iris.fileformats.pp_rules.convert(), which is called from iris.fileformats.pp.load_cubes() during loading.
The corresponding save functionality for PP output is implemented by the iris.fileformats.pp.save() function. The relevant ‘save rules’ are defined in a text file (“lib/iris/etc/pp_save_rules.txt”), in a form defined by the iris.fileformats.rules module.

The rest of this document describes various independent sections of related metadata items.

Horizontal Grid#

UM Field elements

LBCODE, BPLAT, BPLON, BZX, BZY, BDX, BDY, X, Y, X_LOWER_BOUNDS, Y_LOWER_BOUNDS

Cube components

(unrotated) : coordinates longitude, latitude

(rotated pole) : coordinates grid_latitude, grid_longitude

Details

At present, only latitude-longitude projections are supported (both normal and rotated). In these cases, LBCODE is typically 1 or 101 (though, in fact, cross-sections with latitude and longitude axes are also supported).

For an ordinary latitude-longitude grid, the cubes have coordinates called ‘longitude’ and ‘latitude’:

These are mapped to the appropriate data dimensions.
They have units of ‘degrees’.
They have a coordinate system of type iris.coord_systems.GeogCS.
The coordinate points are normally set to the regular sequence ZDX/Y + BDX/Y * (1 .. LBNPT/LBROW) (except, if BDX/BDY is zero, the values are taken from the extra data vector X/Y, if present).
If X/Y_LOWER_BOUNDS extra data is available, this appears as bounds values of the horizontal coordinates.

For rotated latitude-longitude coordinates (as for LBCODE=101), the horizontal coordinates differ only slightly –

The names are ‘grid_latitude’ and ‘grid_longitude’.
The coord_system is a iris.coord_systems.RotatedGeogCS, created with a pole defined by BPLAT, BPLON.

For example:

>>> # Load a PP field.
... fname = iris.sample_data_path('air_temp.pp')
>>> fields_iter = iris.fileformats.pp.load(fname)
>>> field = next(fields_iter)
>>>
>>> # Show grid details and first 5 longitude values.
>>> print(' '.join(str(_) for _ in (field.lbcode, field.lbnpt, field.bzx,
...                                 field.bdx)))
1 96 -3.749999 3.749999
>>> print(field.bzx + field.bdx * np.arange(1, 6))
[ 0.    3.75  7.5  11.25 15.  ]
>>>
>>> # Show Iris equivalent information.
... cube = iris.load_cube(fname)
>>> print(cube.coord('longitude').points[:5])
[ 0.    3.75  7.5  11.25 15.  ]

Note

Note that in Iris (as in CF) there is no special distinction between “regular” and “irregular” coordinates. Thus on saving, X and Y extra data sections are written only if the actual values are unevenly spaced.

Phenomenon Identification#

UM Field elements: LBFC, LBUSER4 (aka “stashcode”), LBUSER7 (aka “model code”)
Cube components: cube.standard_name, cube.units, cube.attributes['STASH']

Details

This information is normally encoded in the cube standard_name property. Iris identifies the stash section and item codes from LBUSER4 and the model code in LBUSER7, and compares these against a list of phenomenon types with known CF translations. If the stashcode is recognised, it then defines the appropriate standard_name and units properties of the cube (i.e. iris.cube.Cube.standard_name and iris.cube.Cube.units).

Where any parts of the stash information are outside the valid range, Iris will instead attempt to interpret LBFC, for which a set of known translations is also stored. This is often the case for fieldsfiles, where LBUSER4 is frequently left as 0.

In all cases, Iris also constructs a STASH item to identify the phenomenon, which is stored as a cube attribute named STASH. This preserves the original STASH coding (as standard name translation is not always one-to-one), and can be used when no standard_name translation is identified (for example, to load only certain stashcodes with a constraint – see example at Load constraint examples).

For example:

>>> # Show PPfield phenomenon details.
>>> print(field.lbuser[3])
16203
>>> print(field.lbuser[6])
1
>>>
>>>
>>> # Show Iris equivalents.
>>> print(cube.standard_name)
air_temperature
>>> print(cube.units)
K
>>> print(cube.attributes['STASH'])
m01s16i203

Note

On saving data, no attempt is made to translate a cube standard_name into a STASH code, but any attached ‘STASH’ attribute will be stored into the LBUSER4 and LBUSER7 elements.

Vertical Coordinates#

UM Field elements

LBVC, LBLEV, BRSVD1 (aka “bulev”), BRSVD2 (aka “bhulev”), BLEV, BRLEV, BHLEV, BHRLEV

Cube components

for height levels : coordinate height

for pressure levels : coordinate pressure

for hybrid height levels :

coordinates model_level_number, sigma, level_height, altitude
cube.aux_factories()[0].orography

for hybrid pressure levels :

coordinates model_level_number, sigma, level_pressure, air_pressure
cube.aux_factories()[0].surface_air_pressure

Details

Several vertical coordinate forms are supported, according to different values of LBVC. The commonest ones are:

lbvc=1 : height levels
lbvc=8 : pressure levels
lbvc=65 : hybrid height

In all these cases, vertical coordinates are created, with points and bounds values taken from the appropriate header elements. In the raw cubes, each vertical coordinate is just a single value, but multiple values will usually occur. The subsequent merge operation will then convert these into multiple-valued coordinates, and create a new vertical data dimension (i.e. a “Z” axis) which they map onto.

For height levels (LBVC=1):

A height coordinate is created. This has units ‘m’, points from BLEV, and no bounds. When there are multiple vertical levels, this will become a dimension coordinate mapping to the vertical dimension.

For pressure levels (LBVC=8):

A pressure coordinate is created. This has units ‘hPa’, points from BLEV, and no bounds. When there are multiple vertical levels, this will become a dimension coordinate mapping a vertical dimension.

For hybrid height levels (LBVC=65):

Three basic vertical coordinates are created:

model_level is dimensionless, with points from LBLEV and no bounds.
sigma is dimensionless, with points from BHLEV and bounds from BHRLEV and BHULEV.
level_height has units of ‘m’, points from BLEV and bounds from BRLEV and BULEV.

Also in this case, a HybridHeightFactory is created, which references the ‘level_height’ and ‘sigma’ coordinates. Following raw cube merging, an extra load stage occurs where the attached HybridHeightFactory is called to manufacture a new altitude coordinate:

The altitude coordinate is 3D, mapping to the 2 horizontal dimensions and the new vertical dimension.
Its units are ‘m’.
Its points are calculated from those of the ‘level_height’ and ‘sigma’ coordinates, and an orography field. If ‘sigma’ and ‘level_height’ possess bounds, then bounds are also created for ‘altitude’.

To make the altitude coordinate, there must be an orography field present in the load sources. This is a surface altitude reference field, identified (by stashcode) during the main loading operation, and recorded for later use in the hybrid height calculation. If it is absent, a warning message is printed, and no altitude coordinate is produced.

Note that on merging hybrid height data into a cube, only the ‘model_level’ coordinate becomes a dimension coordinate: The other vertical coordinates remain as auxiliary coordinates, because they may be (variously) multidimensional or non-monotonic.

See an example printout of a hybrid height cube, here. Notice that this contains all of the above coordinates – model_level_number, sigma, level_height and the derived altitude.

Note

Hybrid pressure levels can also be handled (for LBVC=9). Without going into details, the mechanism is very similar to that for hybrid height: it produces basic coordinates ‘model_level_number’, ‘sigma’ and ‘level_pressure’, and a manufactured 3D ‘air_pressure’ coordinate.

Time Information#

UM Field elements

“T1” (i.e. LBYR, LBMON, LBDAT, LBHR, LBMIN, LBDAY/LBSEC),
“T2” (i.e. LBYRD, LBMOND, LBDATD, LBHRD, LBMIND, LBDAYD/LBSECD),
LBTIM, LBFT

Cube components: coordinates time, forecast_reference_time, forecast_period

Details

In Iris (as in CF) times and time intervals are both expressed as simple numbers, following the approach of the UDUNITS project. These values are stored as cube coordinates, where the scaling and calendar information is contained in the units property.

The units of a time interval (e.g. ‘forecast_period’), can be ‘seconds’ or a simple derived unit such as ‘hours’ or ‘days’ – but it does not contain a calendar, so ‘months’ or ‘years’ are not valid.
The units of calendar-based times (including ‘time’ and ‘forecast_reference_time’), are of the general form “<time-unit> since <base-date>”, interpreted according to the unit’s calendar property. The base date for this is always 1st Jan 1970 (times before this are represented as negative values).

The units.calendar property of time coordinates is set from the lowest decimal digit of LBTIM, known as LBTIM.IC. Note that the non-standard calendars (e.g. 360-day ‘model’ calendar) are defined in CF, not udunits.

There are a number of different time encoding methods used in UM data, but the important distinctions are controlled by the next-to-lowest decimal digit of LBTIM, known as “LBTIM.IB”. The most common cases are as follows:

Data at a single measurement timepoint (LBTIM.IB=0):: A single time coordinate is created, with points taken from T1 values. It has no bounds, units of ‘hours since 1970-01-01 00:00:00’ and a calendar defined according to LBTIM.IC.
Values forecast from T2, valid at T1 (LBTIM.IB=1):: Coordinates time and forecast_reference_time are created from the T1 and T2 values, respectively. These have no bounds, and units of ‘hours since 1970-01-01 00:00:00’, with the appropriate calendar. A forecast_period coordinate is also created, with values T1-T2, no bounds and units of ‘hours’.
Time mean values between T1 and T2 (LBTIM.IB=2):: The time coordinates time, forecast_reference_times and forecast_reference_time, are all present, as in the previous case. In this case, however, the ‘time’ and ‘forecast_period’ coordinates also have associated bounds: The ‘time’ bounds are from T1 to T2, and the ‘forecast_period’ bounds are from “LBFT - (T2-T1)” to “LBFT”.

Note that, in those more complex cases where the input defines all three of the ‘time’, ‘forecast_reference_time’ and ‘forecast_period’ values, any or all of these may become dimensions of the resulting data cube. This will depend on the values actually present in the source fields for each of the elements.

See an example printout of a forecast data cube, here. Notice that this example contains all of the above coordinates – time, forecast_period and forecast_reference_time. In this case the data are forecasts, so time is a dimension, forecast_period` varies with time and forecast_reference_time is a constant.

Statistical Measures#

UM Field elements: LBPROC, LBTIM
Cube components: cube.cell_methods

Details

Where a field contains statistically processed data, Iris will add an appropriate iris.coords.CellMethod to the cube, representing the aggregation operation which was performed.

This is implemented for certain binary flag bits within the LBPROC element value. For example:

time mean, when (LBPROC & 128):
Cube has a cell_method of the form “CellMethod(‘mean’, ‘time’).
time period minimum value, when (LBPROC & 4096):
Cube has a cell_method of the form “CellMethod(‘minimum’, ‘time’).
time period maximum value, when (LBPROC & 8192):
Cube has a cell_method of the form “CellMethod(‘maximum’, ‘time’).

In all these cases, if the field LBTIM is also set to denote a time aggregate field (i.e. “LBTIM.IB=2”, see above Time Information), then the second-to-last digit of LBTIM, aka “LBTIM.IA” may also be non-zero, in which case this indicates the aggregation time-interval. In that case, the cell-method intervals attribute is also set to this many hours.

For example:

>>> # Show stats metadata in a test PP field.
... fname = iris.sample_data_path('pre-industrial.pp')
>>> eg_field = next(iris.fileformats.pp.load(fname))
>>> print(eg_field.lbtim)
622
>>> print(eg_field.lbproc)
128
>>>
>>> # Print out the Iris equivalent information.
>>> print(iris.load_cube(fname).cell_methods)
(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),)

Other Metadata#

LBRSVD4#

If non-zero, this is interpreted as an ensemble number. This produces a cube scalar coordinate named ‘realization’ (as defined in the CF conventions).

LBUSER5#

If non-zero, this is interpreted as a ‘pseudo_level’ number. This produces a cube scalar coordinate named ‘pseudo_level’. In the UM documentation LBUSER5 is also sometimes referred to as LBPLEV.