Iris Handling of PP and Fieldsfiles
This document provides a basic account of how PP and Fieldsfiles data is represented within Iris. It describes how Iris represents data from the Met Office Unified Model (UM), in terms of the metadata elements found in PP and Fieldsfile formats.
For simplicity, we describe this mostly in terms of loading PP data into Iris (i.e. into cubes). However, most of the details are identical for Fieldsfiles, and are relevant to saving in these formats as well as loading.
Notes:
Iris treats Fieldsfile data almost exactly as if it were PP – i.e. it treats each field’s lookup table entry like a PP header.
The Iris data model is based on NetCDF CF conventions, so most of this can also be seen as a metadata translation between PP and CF terms, but it is easier to discuss in terms of Iris elements.
For details of Iris terms (cubes, coordinates, attributes), refer to Iris data structures.
For details of CF conventions, see https://cfconventions.org/.
Overview of Loading Process
The basics of Iris loading are explained at Loading Iris Cubes. Loading as it specifically applies to PP and Fieldsfile data can be summarised as follows:
Input fields are first loaded from the given sources, using iris.fileformats.pp.load(). This returns an iterator, which provides a ‘stream’ of PPField input field objects. Each PPField object represents a single source field:
PP header elements are provided as named object attributes (e.g. lbproc).
Some extra, calculated “convenience” properties are also provided (e.g. t1 and t2 time values).
There is a iris.fileformats.pp.PPField.data attribute, but the field data is not actually loaded unless/until this is accessed, for greater speed and space efficiency.
Each input field is translated into a two-dimensional Iris cube (with dimensions of latitude and longitude). These are the ‘raw’ cubes, as returned by iris.load_raw(). Within these:
There are 2 horizontal dimension coordinates containing the latitude and longitude values for the field.
Certain other header elements are interpreted as ‘coordinate’-type values applying to the input fields, and stored as auxiliary ‘scalar’ (i.e. 1-D) coordinates. These include all header elements defining vertical and time coordinate values, and also more specialised factors such as ensemble number and pseudo-level.
Other metadata is encoded on the cube in a variety of other forms, such as the cube ‘name’ and ‘units’ properties, attribute values and cell methods.
Lastly, Iris attempts to merge the raw cubes into higher-dimensional ones (using merge()). This combines raw cubes with different values of a scalar coordinate to produce a higher-dimensional cube with the values contained in a new vector coordinate. Where possible, the new vector coordinate is also a dimension coordinate, describing the new dimension. Apart from the original 2 horizontal dimensions, all cube dimensions and dimension coordinates arise in this way – for example, ‘time’, ‘height’, ‘forecast_period’, ‘realization’.
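As a rough illustration of the merge step, the toy sketch below stacks several 2D ‘raw’ fields that differ only in one scalar value. It uses plain dictionaries rather than Iris cubes, and the names toy_merge and raw_fields are hypothetical; it is only meant to show the idea, not how Iris implements merge().

```python
def toy_merge(raw_fields, coord_name):
    """Stack 2D fields that differ only in one scalar coordinate value."""
    # Order the fields by the scalar value, so the new coordinate is
    # monotonic and could therefore act as a dimension coordinate.
    ordered = sorted(raw_fields, key=lambda field: field[coord_name])
    return {
        "name": ordered[0]["name"],
        # The scalar values become a single vector coordinate ...
        coord_name: [field[coord_name] for field in ordered],
        # ... and the 2D data arrays are stacked along a new leading dimension.
        "data": [field["data"] for field in ordered],
    }


# Two hypothetical raw fields at different heights (values are made up).
raw_fields = [
    {"name": "air_temperature", "height": 20.0, "data": [[2.0, 2.1], [2.2, 2.3]]},
    {"name": "air_temperature", "height": 10.0, "data": [[1.0, 1.1], [1.2, 1.3]]},
]
merged = toy_merge(raw_fields, "height")
print(merged["height"])     # [10.0, 20.0]
print(len(merged["data"]))  # 2
```

The sorted scalar values play the role of the new vector coordinate, and the length of the stacked data list is the size of the new dimension.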
Note
This document covers the essential features of the UM data loading process. The complete details are implemented as follows:
The conversion of fields to raw cubes is performed by the function iris.fileformats.pp_rules.convert(), which is called from iris.fileformats.pp.load_cubes() during loading.
The corresponding save functionality for PP output is implemented by the iris.fileformats.pp.save() function. The relevant ‘save rules’ are defined in a text file (“lib/iris/etc/pp_save_rules.txt”), in a form defined by the iris.fileformats.rules module.
The rest of this document describes various independent sections of related metadata items.
Horizontal Grid
- UM Field elements
LBCODE, BPLAT, BPLON, BZX, BZY, BDX, BDY, X, Y, X_LOWER_BOUNDS, Y_LOWER_BOUNDS
- Cube components
(unrotated) : coordinates longitude, latitude
(rotated pole) : coordinates grid_latitude, grid_longitude
Details
At present, only latitude-longitude projections are supported (both normal and rotated). In these cases, LBCODE is typically 1 or 101 (though, in fact, cross-sections with latitude and longitude axes are also supported).
For an ordinary latitude-longitude grid, the cubes have coordinates called ‘longitude’ and ‘latitude’:
These are mapped to the appropriate data dimensions.
They have units of ‘degrees’.
They have a coordinate system of type iris.coord_systems.GeogCS.
The coordinate points are normally set to the regular sequence BZX/Y + BDX/Y * (1 .. LBNPT/LBROW) (except, if BDX/BDY is zero, the values are taken from the extra data vector X/Y, if present).
If X/Y_LOWER_BOUNDS extra data is available, this appears as bounds values of the horizontal coordinates.
For rotated latitude-longitude coordinates (as for LBCODE=101), the horizontal coordinates differ only slightly –
The names are ‘grid_latitude’ and ‘grid_longitude’.
The coord_system is a iris.coord_systems.RotatedGeogCS, created with a pole defined by BPLAT, BPLON.
- For example:
>>> import iris
>>> import iris.fileformats.pp
>>> import numpy as np
>>>
>>> # Load a PP field.
... fname = iris.sample_data_path('air_temp.pp')
>>> fields_iter = iris.fileformats.pp.load(fname)
>>> field = next(fields_iter)
>>>
>>> # Show grid details and first 5 longitude values.
>>> print(' '.join(str(_) for _ in (field.lbcode, field.lbnpt, field.bzx,
...                                 field.bdx)))
1 96 -3.749999 3.749999
>>> print(field.bzx + field.bdx * np.arange(1, 6))
[ 0. 3.75 7.5 11.25 15. ]
>>>
>>> # Show Iris equivalent information.
... cube = iris.load_cube(fname)
>>> print(cube.coord('longitude').points[:5])
[ 0. 3.75 7.5 11.25 15. ]
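The regular-grid rule can also be checked without Iris. The sketch below is plain Python; the helper name regular_points is hypothetical, and the header values are rounded from those printed in the example above.

```python
def regular_points(zeroth, step, count):
    """Return zeroth + step * (1 .. count), as for BZX, BDX and LBNPT."""
    return [zeroth + step * index for index in range(1, count + 1)]


# Illustrative values: BZX = -3.75, BDX = 3.75, first 5 points only.
longitudes = regular_points(zeroth=-3.75, step=3.75, count=5)
print(longitudes)  # [0.0, 3.75, 7.5, 11.25, 15.0]
```

Note that the sequence starts at index 1, so the ‘zeroth’ value itself is never a grid point.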
Note
Note that in Iris (as in CF) there is no special distinction between “regular” and “irregular” coordinates. Thus on saving, X and Y extra data sections are written only if the actual values are unevenly spaced.
Phenomenon Identification
- UM Field elements
LBFC, LBUSER4 (aka “stashcode”), LBUSER7 (aka “model code”)
- Cube components
cube.standard_name, cube.units, cube.attributes['STASH']
Details
This information is normally encoded in the cube standard_name property.
Iris identifies the stash section and item codes from LBUSER4 and the model code in LBUSER7, and compares these against a list of phenomenon types with known CF translations. If the stashcode is recognised, it then defines the appropriate standard_name and units properties of the cube (i.e. iris.cube.Cube.standard_name and iris.cube.Cube.units).
Where any parts of the stash information are outside the valid range, Iris will instead attempt to interpret LBFC, for which a set of known translations is also stored. This is often the case for fieldsfiles, where LBUSER4 is frequently left as 0.
In all cases, Iris also constructs a STASH item to identify the phenomenon, which is stored as a cube attribute named STASH. This preserves the original STASH coding (as standard name translation is not always one-to-one), and can be used when no standard_name translation is identified (for example, to load only certain stashcodes with a constraint – see example at Load constraint examples).
- For example:
>>> # Show PPfield phenomenon details.
>>> print(field.lbuser[3])
16203
>>> print(field.lbuser[6])
1
>>>
>>> # Show Iris equivalents.
>>> print(cube.standard_name)
air_temperature
>>> print(cube.units)
K
>>> print(cube.attributes['STASH'])
m01s16i203
Note
On saving data, no attempt is made to translate a cube standard_name into a STASH code, but any attached ‘STASH’ attribute will be stored into the LBUSER4 and LBUSER7 elements.
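The relationship between the header values and the STASH string in the example can be sketched in plain Python. This assumes the usual packing of the stashcode as 1000 * section + item; the helper names stash_string and stash_to_header are hypothetical and are not part of Iris.

```python
def stash_string(lbuser4, lbuser7):
    """Build an 'mXXsXXiXXX' string from stashcode (LBUSER4) and model (LBUSER7)."""
    # Assumption: stashcode = 1000 * section + item.
    section, item = divmod(lbuser4, 1000)
    return "m{:02d}s{:02d}i{:03d}".format(lbuser7, section, item)


def stash_to_header(stash):
    """Invert stash_string, returning (lbuser4, lbuser7)."""
    model = int(stash[1:3])
    section = int(stash[4:6])
    item = int(stash[7:10])
    return 1000 * section + item, model


print(stash_string(16203, 1))         # m01s16i203
print(stash_to_header("m01s16i203"))  # (16203, 1)
```

The second function mirrors the save direction described in the note above, where an attached ‘STASH’ attribute is written back to LBUSER4 and LBUSER7.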
Vertical Coordinates
- UM Field elements
LBVC, LBLEV, BRSVD1 (aka “bulev”), BRSVD2 (aka “bhulev”), BLEV, BRLEV, BHLEV, BHRLEV
- Cube components
for height levels : coordinate height
for pressure levels : coordinate pressure
for hybrid height levels :
coordinates model_level_number, sigma, level_height, altitude
cube.aux_factories()[0].orography
for hybrid pressure levels :
coordinates model_level_number, sigma, level_pressure, air_pressure
cube.aux_factories()[0].surface_air_pressure
Details
Several vertical coordinate forms are supported, according to different values of LBVC. The commonest ones are:
lbvc=1 : height levels
lbvc=8 : pressure levels
lbvc=65 : hybrid height
In all these cases, vertical coordinates are created, with points and bounds values taken from the appropriate header elements. In the raw cubes, each vertical coordinate is just a single value, but multiple values will usually occur. The subsequent merge operation will then convert these into multiple-valued coordinates, and create a new vertical data dimension (i.e. a “Z” axis) which they map onto.
- For height levels (LBVC=1):
A height coordinate is created. This has units ‘m’, points from BLEV, and no bounds. When there are multiple vertical levels, this will become a dimension coordinate mapping to the vertical dimension.
- For pressure levels (LBVC=8):
A pressure coordinate is created. This has units ‘hPa’, points from BLEV, and no bounds. When there are multiple vertical levels, this will become a dimension coordinate mapping to the vertical dimension.
- For hybrid height levels (LBVC=65):
Three basic vertical coordinates are created:
model_level_number is dimensionless, with points from LBLEV and no bounds.
sigma is dimensionless, with points from BHLEV and bounds from BHRLEV and BHULEV.
level_height has units of ‘m’, points from BLEV and bounds from BRLEV and BULEV.
Also in this case, a HybridHeightFactory is created, which references the ‘level_height’ and ‘sigma’ coordinates. Following raw cube merging, an extra load stage occurs where the attached HybridHeightFactory is called to manufacture a new altitude coordinate:
The altitude coordinate is 3D, mapping to the 2 horizontal dimensions and the new vertical dimension.
Its units are ‘m’.
Its points are calculated from those of the ‘level_height’ and ‘sigma’ coordinates, and an orography field. If ‘sigma’ and ‘level_height’ possess bounds, then bounds are also created for ‘altitude’.
To make the altitude coordinate, there must be an orography field present in the load sources. This is a surface altitude reference field, identified (by stashcode) during the main loading operation, and recorded for later use in the hybrid height calculation. If it is absent, a warning message is printed, and no altitude coordinate is produced.
Note that on merging hybrid height data into a cube, only the ‘model_level_number’ coordinate becomes a dimension coordinate: the other vertical coordinates remain as auxiliary coordinates, because they may be (variously) multidimensional or non-monotonic.
See an example printout of a hybrid height cube, here. Notice that this contains all of the above coordinates – model_level_number, sigma, level_height and the derived altitude.
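The altitude calculation follows the CF hybrid height formula, altitude = level_height + sigma * orography. The sketch below shows it with plain Python lists; the function name hybrid_altitude and all the values are illustrative assumptions, and Iris itself performs this calculation through the HybridHeightFactory.

```python
def hybrid_altitude(level_height, sigma, orography):
    """Return a 3D [level][y][x] altitude from 1D level values and 2D orography."""
    return [
        [[height + sig * surface for surface in row] for row in orography]
        for height, sig in zip(level_height, sigma)
    ]


# Two model levels over a 2 x 2 orography field (all values in metres, made up).
level_height = [20.0, 80.0]
sigma = [1.0, 0.5]
orography = [[0.0, 100.0], [200.0, 400.0]]

altitude = hybrid_altitude(level_height, sigma, orography)
print(altitude[0])  # [[20.0, 120.0], [220.0, 420.0]]
print(altitude[1])  # [[80.0, 130.0], [180.0, 280.0]]
```

The result varies over both horizontal dimensions and the vertical one, which is why the derived coordinate is 3D and cannot be a dimension coordinate.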
Note
Hybrid pressure levels can also be handled (for LBVC=9). Without going into details, the mechanism is very similar to that for hybrid height: it produces basic coordinates ‘model_level_number’, ‘sigma’ and ‘level_pressure’, and a manufactured 3D ‘air_pressure’ coordinate.
Surface Fields
In order for surface fields to be recognised when saving, you must pass label_surface_fields=True to iris.fileformats.pp.save() or iris.save(). When surface fields are encountered with this flag set to True, LBLEV will be set to 9999 and LBVC to 129.
Time Information
- UM Field elements
“T1” (i.e. LBYR, LBMON, LBDAT, LBHR, LBMIN, LBDAY/LBSEC),
“T2” (i.e. LBYRD, LBMOND, LBDATD, LBHRD, LBMIND, LBDAYD/LBSECD),
LBTIM, LBFT
- Cube components
coordinates time, forecast_reference_time, forecast_period
Details
In Iris (as in CF) times and time intervals are both expressed as simple numbers, following the approach of the UDUNITS project. These values are stored as cube coordinates, where the scaling and calendar information is contained in the units property.
The units of a time interval (e.g. ‘forecast_period’), can be ‘seconds’ or a simple derived unit such as ‘hours’ or ‘days’ – but it does not contain a calendar, so ‘months’ or ‘years’ are not valid.
The units of calendar-based times (including ‘time’ and ‘forecast_reference_time’), are of the general form “<time-unit> since <base-date>”, interpreted according to the unit’s calendar property. The base date for this is always 1st Jan 1970 (times before this are represented as negative values).
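As a small illustration of this encoding, the sketch below converts dates to ‘hours since 1970-01-01 00:00:00’ using the Python standard library. It assumes the standard calendar only (Python's datetime cannot represent a 360-day calendar), and the helper name hours_since_epoch is hypothetical.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def hours_since_epoch(when):
    """Express a standard-calendar datetime as hours since 1970-01-01 00:00:00."""
    return (when - EPOCH).total_seconds() / 3600.0


print(hours_since_epoch(datetime(1970, 1, 2)))        # 24.0
print(hours_since_epoch(datetime(1969, 12, 31, 12)))  # -12.0
```

The second value shows a time before the base date appearing as a negative number.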
The units.calendar property of time coordinates is set from the lowest decimal digit of LBTIM, known as LBTIM.IC. Note that the non-standard calendars (e.g. 360-day ‘model’ calendar) are defined in CF, not udunits.
There are a number of different time encoding methods used in UM data, but the important distinctions are controlled by the next-to-lowest decimal digit of LBTIM, known as “LBTIM.IB”. The most common cases are as follows:
- Data at a single measurement timepoint (LBTIM.IB=0):
A single time coordinate is created, with points taken from T1 values. It has no bounds, units of ‘hours since 1970-01-01 00:00:00’ and a calendar defined according to LBTIM.IC.
- Values forecast from T2, valid at T1 (LBTIM.IB=1):
Coordinates time and forecast_reference_time are created from the T1 and T2 values, respectively. These have no bounds, and units of ‘hours since 1970-01-01 00:00:00’, with the appropriate calendar. A forecast_period coordinate is also created, with values T1-T2, no bounds and units of ‘hours’.
- Time mean values between T1 and T2 (LBTIM.IB=2):
The time coordinates time, forecast_reference_time and forecast_period are all present, as in the previous case. In this case, however, the ‘time’ and ‘forecast_period’ coordinates also have associated bounds: the ‘time’ bounds are from T1 to T2, and the ‘forecast_period’ bounds are from “LBFT - (T2-T1)” to “LBFT”.
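The LBTIM digits and the time-mean bounds rule can be sketched in plain Python. This assumes the packing LBTIM = 100 * IA + 10 * IB + IC; the helper names and the sample hour values are hypothetical.

```python
def split_lbtim(lbtim):
    """Split LBTIM into its (IA, IB, IC) components."""
    ia, remainder = divmod(lbtim, 100)
    ib, ic = divmod(remainder, 10)
    return ia, ib, ic


def forecast_period_bounds(lbft, t1_hours, t2_hours):
    """Bounds for a time-mean field (LBTIM.IB=2): LBFT - (T2-T1) to LBFT."""
    return (lbft - (t2_hours - t1_hours), lbft)


print(split_lbtim(622))  # (6, 2, 2)
print(split_lbtim(11))   # (0, 1, 1)

# A hypothetical 6-hour mean ending 24 hours into the forecast.
print(forecast_period_bounds(lbft=24, t1_hours=18, t2_hours=24))  # (18, 24)
```

Here T1 and T2 are given as hours on a common time axis, so only their difference matters for the bounds.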
Note that, in those more complex cases where the input defines all three of the ‘time’, ‘forecast_reference_time’ and ‘forecast_period’ values, any or all of these may become dimensions of the resulting data cube. This will depend on the values actually present in the source fields for each of the elements.
See an example printout of a forecast data cube, here. Notice that this example contains all of the above coordinates – time, forecast_period and forecast_reference_time. In this case the data are forecasts, so time is a dimension, forecast_period varies with time and forecast_reference_time is a constant.
Statistical Measures
- UM Field elements
LBPROC, LBTIM
- Cube components
cube.cell_methods
Details
Where a field contains statistically processed data, Iris will add an appropriate iris.coords.CellMethod to the cube, representing the aggregation operation which was performed.
This is implemented for certain binary flag bits within the LBPROC element value. For example:
- time mean, when (LBPROC & 128):
Cube has a cell_method of the form “CellMethod(‘mean’, ‘time’)”.
- time period minimum value, when (LBPROC & 4096):
Cube has a cell_method of the form “CellMethod(‘minimum’, ‘time’)”.
- time period maximum value, when (LBPROC & 8192):
Cube has a cell_method of the form “CellMethod(‘maximum’, ‘time’)”.
In all these cases, if the field LBTIM is also set to denote a time aggregate field (i.e. “LBTIM.IB=2”, see above Time Information), then the leading (hundreds) digit of LBTIM, aka “LBTIM.IA”, may also be non-zero, in which case this indicates the aggregation time-interval. In that case, the cell-method intervals attribute is also set to this many hours.
- For example:
>>> # Show stats metadata in a test PP field.
... fname = iris.sample_data_path('pre-industrial.pp')
>>> eg_field = next(iris.fileformats.pp.load(fname))
>>> print(eg_field.lbtim)
622
>>> print(eg_field.lbproc)
128
>>>
>>> # Print out the Iris equivalent information.
>>> print(iris.load_cube(fname).cell_methods)
(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),)
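The flag tests listed above can be sketched with bitwise operations in plain Python. The table and function names are hypothetical, and only the three LBPROC bits named in this section are covered.

```python
# LBPROC flag bits described in this section, mapped to cell method names.
TIME_STATISTIC_FLAGS = {128: "mean", 4096: "minimum", 8192: "maximum"}


def time_cell_methods(lbproc):
    """Return (method, coord_name) pairs for the time statistics set in LBPROC."""
    return [
        (method, "time")
        for flag, method in TIME_STATISTIC_FLAGS.items()
        if lbproc & flag
    ]


print(time_cell_methods(128))          # [('mean', 'time')]
print(time_cell_methods(4096 + 8192))  # [('minimum', 'time'), ('maximum', 'time')]
print(time_cell_methods(0))            # []
```

Because each statistic has its own bit, a single LBPROC value can in principle carry more than one flag, which is why the sketch returns a list.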
Other Metadata
LBRSVD4
If non-zero, this is interpreted as an ensemble number. This produces a cube scalar coordinate named ‘realization’ (as defined in the CF conventions).
LBUSER5
If non-zero, this is interpreted as a ‘pseudo_level’ number. This produces a cube scalar coordinate named ‘pseudo_level’. In the UM documentation LBUSER5 is also sometimes referred to as LBPLEV.