Introduction

ODB-2 is a compact data format for the storage, transmission and archival of tabular meteorological observation data. ODB-2 data streams consist of independent, self-describing messages. Each message contains a number of rows of data sharing the same columnar format.

odc provides C, C++ and Fortran libraries for encoding and decoding ODB-2 data. It also provides an interface for inspecting the data and metadata without decoding them, and a collection of command line tools for handling and manipulating ODB-2 data.

Observation Data

ODB-2 supports encoding of observation data from a range of different scientific instruments and sources. The data is in a tabular format, with the types and data sizes of each column of data normally defined by an appropriate external schema. Each cell in the table may contain a value, or be marked as missing. Each row in the table corresponds to an observation, and is treated independently of other rows.

A stream of ODB-2 data consists of a sequence of these tables, which may be unrelated to each other in content or structure. These tables may be grouped according to the needs of the data producer. For archival purposes, a subset of the columns will be used for indexing the data – and in these cases the tables should be grouped such that the data in the index columns is constant within a table.
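The grouping rule for archival can be sketched in a few lines of Python. This is a minimal illustration, not part of odc: the function name `group_by_index`, the sample rows and the choice of `expver` and `date@hdr` as index columns are all assumptions made for the example. A missing cell is represented here by `None`.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names and values are illustrative only.
rows = [
    {"expver": "0001", "date@hdr": 20210420, "statid@hdr": "stat00", "obsvalue@body": 0.0},
    {"expver": "0001", "date@hdr": 20210421, "statid@hdr": "stat01", "obsvalue@body": 12.3456},
    {"expver": "0002", "date@hdr": 20210420, "statid@hdr": "stat02", "obsvalue@body": None},  # missing value
    {"expver": "0001", "date@hdr": 20210420, "statid@hdr": "stat03", "obsvalue@body": 37.0368},
]


def group_by_index(rows, index_columns):
    """Split rows into tables in which every index column holds a single value."""
    tables = defaultdict(list)
    for row in rows:
        # Rows that share the same index values end up in the same table.
        key = tuple(row[col] for col in index_columns)
        tables[key].append(row)
    return dict(tables)


tables = group_by_index(rows, ["expver", "date@hdr"])
for key, table in tables.items():
    print(key, [row["statid@hdr"] for row in table])
```

Each resulting table could then be encoded separately, so that an external index only needs to record one set of index values per table.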

An Example of Tabular Data

expver  date@hdr  statid@hdr  wigos@hdr        obsvalue@body
------  --------  ----------  ---------------  -------------
0001    20210420  stat00      0-12345-0-67890  0.0000
0001    20210420  stat01      0-12345-0-67891  12.3456
0001    20210420  stat02      0-12345-0-67892  24.6912
0001    20210420  stat03      0-12345-0-67893  37.0368
0001    20210420  stat04      0-12345-0-67894  49.3824
0001    20210420  stat05      0-12345-0-67895  61.7280
0001    20210420  stat06      0-12345-0-67896  74.0736
0001    20210420  stat07      0-12345-0-67897  86.4192
0001    20210420  stat08      0-12345-0-67898  98.7648
0001    20210420  stat09      0-12345-0-67899  111.1104

This row-and-column structure maps naturally onto pandas data frames. Please see the pyodc package for a Python library handling ODB-2 data.

Note

For information on ODB-2 format governance and schemas, please visit https://apps.ecmwf.int/odbgov.

Data Types

Each ODB-2 column stores a specific type of information, which is defined as the data type.

Type Name  Numeric Value  Corresponding API Type
---------  -------------  ----------------------
IGNORE     0              NULL
INTEGER    1              long [1]
REAL       2              double [2] [3]
STRING     3              string
BITFIELD   4              long [1] [3]
DOUBLE     5              double [3]

[1] 64-bit integral types are used in the API. Please note the section on Integer Handling.

[2] The REAL data type truncates a 64-bit floating point value to a 32-bit floating point value prior to encoding, which results in smaller encoded data at a cost of a loss of precision. Encoding is lossless by default, so the use of the REAL type must be explicit.

[3] The data types stored in the encoded data are not identical to the types used in the API. In particular, BITFIELD data is represented in integral form, and both REAL and DOUBLE data is presented as 64-bit doubles in the API even though they have different encoded precision.
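The type codes in the table above, and the precision cost of REAL, can be illustrated with a short Python sketch. The names `ColumnType` and `as_real` are hypothetical and are not part of the odc API. The round trip uses the standard IEEE 754 single-precision conversion provided by `struct`, which is assumed here to approximate what a REAL column stores.

```python
import struct
from enum import IntEnum


class ColumnType(IntEnum):
    """Numeric type codes from the table above (the class name is illustrative)."""
    IGNORE = 0
    INTEGER = 1
    REAL = 2
    STRING = 3
    BITFIELD = 4
    DOUBLE = 5


def as_real(value: float) -> float:
    """Round-trip a 64-bit float through 32-bit storage and back to a 64-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


original = 12.3456
stored = as_real(original)

print(ColumnType(2).name)           # name of the type with numeric value 2
print(repr(original), repr(stored))  # the REAL round trip changes the low-order digits
print(as_real(0.5) == 0.5)          # values exactly representable in 32 bits survive unchanged
```

A DOUBLE column would keep `original` bit for bit, which is why the choice of REAL has to be made explicitly.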

Note

The REAL/DOUBLE naming is a legacy of the Fortran history of this data format.

Bitfields

The BITFIELD type exists to provide a packed data type for flags.

Bitfield Flags Example

Within a (32-bit) integer, bits can be identified and named by their offset. Groups of bits can be named and identified in the same way as individual bits, and hence each element has an offset and a size.
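Reading a named element from a bitfield amounts to a shift and a mask. The sketch below assumes a hypothetical layout; the element names `active`, `rejected` and `level`, and the helper `extract_field`, are invented for illustration and do not come from any ODB-2 schema.

```python
def extract_field(value: int, offset: int, size: int) -> int:
    """Return the `size` bits of `value` that start at bit `offset` (bit 0 is the least significant)."""
    mask = (1 << size) - 1
    return (value >> offset) & mask


# Hypothetical layout: element name -> (offset, size).
layout = {
    "active": (0, 1),    # single bit
    "rejected": (1, 1),  # single bit
    "level": (2, 3),     # group of three bits
}

packed = 0b10110  # bits 4..2 = 101, bit 1 = 1, bit 0 = 0
fields = {name: extract_field(packed, offset, size) for name, (offset, size) in layout.items()}
print(fields)  # {'active': 0, 'rejected': 1, 'level': 5}
```

Single flags are simply elements of size 1, so the same helper covers both individual bits and groups.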

Data Format

To encode ODB-2 data, large tables are first split into a sequence of smaller chunks. The columns within these chunks are then sorted and compressed to form frames, which constitute the ODB-2 messages. These frames are self-contained and self-describing.
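The first step, splitting a large table into chunks, can be sketched as follows. The function name `split_into_chunks` and the chunk size are assumptions chosen for the example. How odc selects chunk sizes and how it sorts and compresses the columns are not shown.

```python
def split_into_chunks(rows, max_rows):
    """Split a list of rows into consecutive chunks of at most `max_rows` rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return [rows[start:start + max_rows] for start in range(0, len(rows), max_rows)]


# Ten hypothetical rows of (station id, value).
rows = [(f"stat{i:02d}", i) for i in range(10)]

chunks = split_into_chunks(rows, 4)
print([len(chunk) for chunk in chunks])  # [4, 4, 2]

# Within a chunk the data can be viewed column by column, which is the form
# a columnar encoder would work on.
columns = list(zip(*chunks[0]))
print(columns[0])  # ('stat00', 'stat01', 'stat02', 'stat03')
```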

ODB-2 Data Structure

The frames can be concatenated in any order to form a valid stream of ODB-2 data, even if the encoded tables do not have the same structure and are therefore incompatible. This capability suits the needs of data archival: large amounts of data can be packed and indexed externally and, because the data is self-describing, it can be validated against the index.
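The idea of a stream built from independent, self-describing frames can be illustrated with a toy format. This is not the ODB-2 encoding: JSON, zlib and the 4-byte length prefix are stand-ins chosen for the sketch, and `encode_frame` and `decode_stream` are hypothetical helpers. The point is only that each frame carries its own column description, so frames with different structures can be concatenated in any order and still be decoded.

```python
import json
import struct
import zlib


def encode_frame(columns, rows):
    """Pack one table as a length-prefixed, compressed frame that carries its own column names."""
    payload = zlib.compress(json.dumps({"columns": columns, "rows": rows}).encode("utf-8"))
    return struct.pack("<I", len(payload)) + payload


def decode_stream(stream):
    """Yield (columns, rows) for each frame, using only what the frame itself contains."""
    pos = 0
    while pos < len(stream):
        (length,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        frame = json.loads(zlib.decompress(stream[pos:pos + length]).decode("utf-8"))
        pos += length
        yield frame["columns"], frame["rows"]


# Two tables with unrelated structures.
frame_a = encode_frame(["statid@hdr", "obsvalue@body"], [["stat00", 0.0], ["stat01", 12.3456]])
frame_b = encode_frame(["expver", "date@hdr"], [["0001", 20210420]])

# Concatenation in either order gives a stream that decodes frame by frame.
stream = frame_b + frame_a
decoded = list(decode_stream(stream))
for columns, rows in decoded:
    print(columns, len(rows))
```

Because no frame depends on its neighbours, an external index can point at any frame in the stream, and the column description decoded from that frame can be checked against what the index claims.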

ODB-2 data streams need not be stored in files; they are used equally as an in-memory format and for transmitting collections of observation data over the network. As such, ODB-2 is considered to be a message format rather than a file format.