Introduction
ODB-2 is a compact data format for the storage, transmission and archival of tabular meteorological observation data. ODB-2 data streams consist of independent, self-describing messages. Each of these messages contains a number of rows of data sharing the same columnar format.
odc provides C, C++ and Fortran libraries for encoding and decoding ODB-2 data. It also provides an interface for accessing the data and metadata without decoding them, and a collection of command line tools for handling and manipulating ODB-2 data.
Observation Data
ODB-2 supports encoding of observation data from a range of different scientific instruments and sources. The data is in a tabular format, with the types and data sizes of each column of data normally defined by an appropriate external schema. Each cell in the table may contain a value, or be marked as missing. Each row in the table corresponds to an observation, and is treated independently of other rows.
A stream of ODB-2 data consists of a sequence of these tables, which may be unrelated to each other in content or structure. These tables may be grouped according to the needs of the data producer. For archival purposes, a subset of the columns will be used for indexing the data – and in these cases the tables should be grouped such that the data in the index columns is constant within a table.
expver | date@hdr | statid@hdr | wigos@hdr | obsvalue@body |
---|---|---|---|---|
0001 | 20210420 | stat00 | 0-12345-0-67890 | 0.0000 |
0001 | 20210420 | stat01 | 0-12345-0-67891 | 12.3456 |
0001 | 20210420 | stat02 | 0-12345-0-67892 | 24.6912 |
0001 | 20210420 | stat03 | 0-12345-0-67893 | 37.0368 |
0001 | 20210420 | stat04 | 0-12345-0-67894 | 49.3824 |
0001 | 20210420 | stat05 | 0-12345-0-67895 | 61.7280 |
0001 | 20210420 | stat06 | 0-12345-0-67896 | 74.0736 |
0001 | 20210420 | stat07 | 0-12345-0-67897 | 86.4192 |
0001 | 20210420 | stat08 | 0-12345-0-67898 | 98.7648 |
0001 | 20210420 | stat09 | 0-12345-0-67899 | 111.1104 |
This structure matches pandas data frames extremely well. Please see the pyodc package for a Python library handling ODB-2 data.
Note
For information on ODB-2 format governance and schemas, please visit https://apps.ecmwf.int/odbgov.
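The grouping rule for archival (index columns constant within each table) can be illustrated with a minimal sketch in plain Python. It uses neither odc nor pyodc. The sample rows, the choice of `expver` and `date@hdr` as index columns, the use of `None` for a missing cell, and the helper `group_by_index` are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, loosely modelled on the example table in this section.
# None stands in for a cell marked as missing (an assumption of this sketch).
rows = [
    {"expver": "0001", "date@hdr": 20210420, "statid@hdr": "stat00", "obsvalue@body": 0.0},
    {"expver": "0001", "date@hdr": 20210421, "statid@hdr": "stat01", "obsvalue@body": 12.3456},
    {"expver": "0002", "date@hdr": 20210420, "statid@hdr": "stat02", "obsvalue@body": None},
    {"expver": "0001", "date@hdr": 20210420, "statid@hdr": "stat03", "obsvalue@body": 37.0368},
]

# Assumed choice of index columns; a real archive defines its own.
INDEX_COLUMNS = ("expver", "date@hdr")


def group_by_index(rows, index_columns):
    """Split rows into tables in which every index column is constant.

    Expects at least two index columns, so that the key is a tuple.
    """
    key = itemgetter(*index_columns)
    tables = {}
    # groupby only merges adjacent rows, so sort by the same key first.
    for index_values, group in groupby(sorted(rows, key=key), key=key):
        tables[index_values] = list(group)
    return tables


tables = group_by_index(rows, INDEX_COLUMNS)
for index_values, table in tables.items():
    print(index_values, [row["statid@hdr"] for row in table])
```

Each resulting table could then be encoded as its own group of frames, so that an external index only needs to record one set of index values per table.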
Data Types
Each ODB-2 column stores a specific type of information, which is defined as the data type.
Type Name | Numeric Value | Corresponding API Type |
---|---|---|
 | | |
 | | |
 | | |
 | | |
 | | |
 | | |
- [1] 64-bit integral types are used in the API. Please note the section on Integer Handling.
- [2] The `REAL` data type truncates a 64-bit floating point value to a 32-bit floating point value prior to encoding, which results in smaller encoded data at a cost of a loss of precision. Encoding is lossless by default, so the use of the `REAL` type must be explicit.
- [3] The data types stored in the encoded data are not identical to the types used in the API. In particular, `BITFIELD` data is represented in integral form, and both `REAL` and `DOUBLE` data is presented as 64-bit doubles in the API even though they have different encoded precision.
Note
The `REAL`/`DOUBLE` naming is a legacy of the Fortran history of this data format.
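The precision cost of `REAL` relative to `DOUBLE` can be seen with a short sketch that round-trips a value through 32-bit and 64-bit storage using Python's `struct` module. This is not the ODB-2 encoder and does not call odc. The helper names are hypothetical, and `struct` rounds to the nearest 32-bit value, which is assumed here to be a fair stand-in for the narrowing described in the footnote above.

```python
import struct


def store_as_real(value: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def store_as_double(value: float) -> float:
    """Round-trip a Python float through 64-bit storage (lossless)."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


value = 12.3456
as_real = store_as_real(value)
as_double = store_as_double(value)

# Raw storage sizes in bytes, before any compression: 4 versus 8.
print(len(struct.pack("<f", value)), len(struct.pack("<d", value)))

# The 64-bit round trip is exact; the 32-bit one is not for this value.
print(as_double == value, as_real == value)
print("absolute error introduced by 32-bit storage:", abs(as_real - value))
```

In both cases the value handed back is an ordinary 64-bit float, which mirrors the statement that the API presents `REAL` and `DOUBLE` data as 64-bit doubles regardless of the encoded precision.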
Bitfields
The `BITFIELD` type exists to provide a packed data type for flags.
Within a 32-bit integer, bits can be identified and named by their offset. Groups of bits can be named and identified in the same way as individual bits, and hence each element has an offset and a size.
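The offset-and-size scheme can be sketched with ordinary shifts and masks. The field names and layout below are hypothetical and are not taken from any ODB-2 schema, and the helpers `pack_bitfield` and `unpack_bitfield` are illustrative rather than part of the odc API.

```python
from typing import Dict, Tuple

# Hypothetical layout: element name -> (offset, size in bits).
LAYOUT: Dict[str, Tuple[int, int]] = {
    "active": (0, 1),       # single bit at offset 0
    "blacklisted": (1, 1),  # single bit at offset 1
    "level": (2, 3),        # group of three bits starting at offset 2
}


def pack_bitfield(fields: Dict[str, int], layout: Dict[str, Tuple[int, int]]) -> int:
    """Combine named elements into one integer."""
    value = 0
    for name, field_value in fields.items():
        offset, size = layout[name]
        mask = (1 << size) - 1
        if field_value & ~mask:
            raise ValueError(f"{name}={field_value} does not fit in {size} bit(s)")
        value |= field_value << offset
    return value


def unpack_bitfield(value: int, layout: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Extract every named element from the packed integer."""
    return {
        name: (value >> offset) & ((1 << size) - 1)
        for name, (offset, size) in layout.items()
    }


packed = pack_bitfield({"active": 1, "blacklisted": 0, "level": 5}, LAYOUT)
print(packed, bin(packed))
print(unpack_bitfield(packed, LAYOUT))
```

A single-bit flag is simply an element of size 1, so individual bits and groups of bits are handled by the same code path.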
Data Format
To encode data as ODB-2, large tables are first split into a sequence of smaller chunks. The columns within these chunks are then sorted and compressed to form Frames, which comprise the ODB-2 messages. These frames are self-contained and self-describing.
The frames can be concatenated in any order to form a valid stream of ODB-2 data, even if the encoded tables do not have the same structure and are therefore incompatible. This capability suits the needs of data archival: large amounts of data can be packed and indexed externally and, since the data is self-describing, it can be validated against the index.
ODB-2 data streams need not be stored in files. The format is used equally as an in-memory format, and for transmitting collections of observation data over the network. As such, ODB-2 is considered to be a message format rather than a file format.
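The ideas of chunking, self-describing frames, and order-independent concatenation can be illustrated with a toy sketch. It deliberately does not use the ODB-2 binary encoding: frames here are length-prefixed JSON documents, chosen only to keep the example short, and no column sorting or compression is attempted. All function names are hypothetical.

```python
import io
import json
import struct


def split_into_frames(columns, rows, max_rows):
    """Yield toy frames, each carrying its own column list and at most max_rows rows."""
    for start in range(0, len(rows), max_rows):
        yield {"columns": columns, "rows": rows[start:start + max_rows]}


def write_frame(stream, frame):
    """Append one frame to the stream as a 4-byte length prefix plus a JSON payload."""
    payload = json.dumps(frame).encode("utf-8")
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Decode frames one at a time; each needs no knowledge of its neighbours."""
    while True:
        header = stream.read(4)
        if not header:
            return
        (length,) = struct.unpack("<I", header)
        yield json.loads(stream.read(length).decode("utf-8"))


# Two tables with different structures.
table_a = split_into_frames(
    ["statid@hdr", "obsvalue@body"],
    [["stat00", 0.0], ["stat01", 12.3456], ["stat02", 24.6912]],
    max_rows=2,
)
table_b = split_into_frames(["expver"], [["0001"]], max_rows=2)

# An in-memory stream: no file is involved, and the frame order is arbitrary.
stream = io.BytesIO()
for frame in [*table_b, *table_a]:
    write_frame(stream, frame)

stream.seek(0)
decoded = list(read_frames(stream))
for frame in decoded:
    print(frame["columns"], len(frame["rows"]))
```

Because every frame carries its own column description, a reader can process the stream sequentially and handle each frame independently, whether the bytes came from a file, a memory buffer, or a network connection.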