xcube Multi-Resolution Datasets
Version 1.0 Draft, 2023-04-28
Definition
An xcube multi-resolution dataset is an N-D image pyramid, where an image is a 2-D dataset with two spatial dimensions in some horizontal coordinate system.
A multi-resolution dataset comprises a fixed number of
levels, which are regular datasets covering the same spatial area at
different resolutions. Level zero represents the original resolution
`res(L=0)`; the resolution of each higher level decreases by a factor of two:
`res(L) = res(0) / 2^L`.
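As a small illustration of the formula above, the following sketch lists the resolution of every level for a given level-zero resolution. The function name `level_resolutions` is made up for this example and is not part of xcube.

```python
def level_resolutions(res0: float, num_levels: int) -> list[float]:
    """Return res(L) = res(0) / 2^L for L = 0 .. num_levels - 1."""
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    return [res0 / 2**level for level in range(num_levels)]


# Four levels, starting from a level-zero resolution of 1024.
resolutions = level_resolutions(1024.0, 4)
print(resolutions)
```

Each entry is half of its predecessor, so the list has as many entries as the dataset has levels.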
Implementation in xcube
In xcube, multi-resolution datasets are represented by the abstract class
`xcube.core.mldataset.MultiLevelDataset`. The xcube data store framework
refers to this datatype using the alias `mldataset`. The corresponding
default data format is the xcube Levels format, named `levels`.
xcube also supports the Cloud Optimized GeoTIFF (COG) format
for reading multi-resolution datasets.
The xcube Levels Format
The xcube Levels format is basically a single top-level directory.
By convention, the filename extension of that directory should be `.levels`.
The directory entries are Zarr datasets named after their zero-based level index,
`{level}.zarr`. Each entry is a representation of a regular xarray dataset
that complies with the xcube Dataset Convention.
The following is a multi-resolution dataset with three levels:
- test_pyramid.levels/
  - 0.zarr/
  - 1.zarr/
  - 2.zarr/
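The naming rule can be checked with a few lines of standard-library Python. The sketch below builds the three-level layout in a temporary directory and lists its level entries in index order. It only recognises `{level}.zarr` entries, and the helper names `level_index` and `list_levels` are hypothetical, not xcube functions.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def level_index(name: str) -> Optional[int]:
    """Return the level index encoded in an entry name, or None."""
    match = re.fullmatch(r"([0-9]+)\.zarr", name)
    return int(match.group(1)) if match else None


def list_levels(levels_dir: Path) -> list[str]:
    """Return the level entry names of a .levels directory, ordered by index."""
    indexed = {}
    for entry in levels_dir.iterdir():
        index = level_index(entry.name)
        if index is not None:
            indexed[index] = entry.name
    return [indexed[index] for index in sorted(indexed)]


# Recreate the example layout with empty directories standing in for Zarr datasets.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "test_pyramid.levels"
    for level in range(3):
        (root / f"{level}.zarr").mkdir(parents=True)
    levels = list_levels(root)

print(levels)
```

Sorting by the parsed integer rather than by name keeps the order correct for datasets with ten or more levels.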
An important use case is generating image pyramids from existing large datasets without the need to create a copy of level zero.
To support this, the level zero dataset may be a link to an existing
Zarr dataset. The filename is then `0.link` rather than `0.zarr`.
The link file contains the path to the actual Zarr dataset
to be used as level zero as a plain text string. It may be an absolute
path or a path relative to the top-level dataset.
- test_pyramid.levels/
  - 0.link  # --> link to actual level zero dataset
  - 1.zarr/
  - 2.zarr/
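A reader of the format has to decide, per level, whether to open `{level}.zarr` directly or to follow the link file. The following is a minimal sketch of that decision, not xcube's actual code. The helper `resolve_level_path` is hypothetical, and the sketch assumes that a relative link path is resolved against the `.levels` directory itself and that surrounding whitespace in the link file can be ignored.

```python
import tempfile
from pathlib import Path


def resolve_level_path(levels_dir: Path, level: int) -> Path:
    """Return the path of the Zarr dataset to use for the given level."""
    link_path = levels_dir / f"{level}.link"
    if level == 0 and link_path.is_file():
        # The link file holds the target path as a plain text string.
        target = Path(link_path.read_text().strip())
        if target.is_absolute():
            return target
        # Assumption: relative paths are relative to the .levels directory.
        return (levels_dir / target).resolve()
    return levels_dir / f"{level}.zarr"


# Build a pyramid whose level zero links to a dataset stored next to it.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base.zarr"
    base.mkdir()
    root = Path(tmp) / "test_pyramid.levels"
    (root / "1.zarr").mkdir(parents=True)
    (root / "2.zarr").mkdir()
    (root / "0.link").write_text("../base.zarr")
    level_0 = resolve_level_path(root, 0)
    level_1 = resolve_level_path(root, 1)
    link_followed = level_0 == base.resolve()

print(level_0.name, level_1.name)
```

Here level zero resolves to the existing `base.zarr`, so no copy of the original data is needed inside the pyramid directory.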
Starting with xcube 0.13.1, an additional, optional file `.zlevels`
is part of the Levels format:
- test_pyramid.levels/
  - .zlevels
  - 0.zarr/
  - 1.zarr/
  - 2.zarr/
If present, it is a text file comprising a JSON object with the following properties:
| Name               | Type               | Description                                                    |
|--------------------|--------------------|----------------------------------------------------------------|
| `version`          | `"1.0"`            | Levels format version.                                         |
| `num_levels`       | integer            | Number of levels in this dataset.                              |
| `use_saved_levels` | boolean            | Whether the next level is computed from its predecessor level. |
| `tile_size`        | [integer, integer] | Tile width and height in pixels.                               |
| `agg_methods`      | object             | Mapping from variable name to aggregation method.              |
Only `version` and `num_levels` are required.
The property names of the `agg_methods` object are the names of data variables.
The values are aggregation methods. Valid values are:
| Value    | Description                                                  |
|----------|--------------------------------------------------------------|
| `first`  | Select the first pixel at (0,0) of a window of N x N pixels. |
| `min`    | Minimum value of a window of N x N pixels.                   |
| `max`    | Maximum value of a window of N x N pixels.                   |
| `mean`   | Mean value of a window of N x N pixels.                      |
| `median` | Median value of a window of N x N pixels.                    |
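To make the window semantics concrete, here is a minimal pure-Python sketch that applies each aggregation method to non-overlapping N x N windows of a small 2-D image. It illustrates the definitions in the table and is not how xcube implements them. The names `AGG_FUNCS` and `downsample` are invented for this example, and dropping incomplete windows at the image border is an assumption of the sketch.

```python
import statistics

# Aggregation functions keyed by method name.
# Each receives the pixels of one window in row-major order,
# so index 0 is the pixel at (0,0) of the window.
AGG_FUNCS = {
    "first": lambda window: window[0],
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
}


def downsample(image: list[list[float]], n: int, method: str) -> list[list[float]]:
    """Aggregate non-overlapping n x n windows of a 2-D image."""
    agg = AGG_FUNCS[method]
    height, width = len(image), len(image[0])
    result = []
    # Assumption: rows and columns that do not fill a whole window are dropped.
    for y in range(0, height - n + 1, n):
        row = []
        for x in range(0, width - n + 1, n):
            window = [image[y + dy][x + dx] for dy in range(n) for dx in range(n)]
            row.append(agg(window))
        result.append(row)
    return result


# A 2 x 4 image that splits into two 2 x 2 windows.
image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
]
for method in AGG_FUNCS:
    print(method, downsample(image, 2, method))
```

Note that `first` only reads one pixel per window, which is why it is the cheapest method and the only one that never produces values absent from the input.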
The following is an example of the `.zlevels` file for a dataset with the
data variables `CHL` (chlorophyll) of type `float32` and
`qflags` of type `uint16`:
{
  "version": "1.0",
  "num_levels": 8,
  "use_saved_levels": true,
  "tile_size": [2048, 2048],
  "agg_methods": {
    "CHL": "median",
    "qflags": "first"
  }
}
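A consumer of the format might parse and check such a file as follows. This is a sketch using only the Python standard library. The function `parse_zlevels` is hypothetical and not an xcube API. Rejecting versions other than `"1.0"` and requiring `num_levels` to be a positive integer are assumptions drawn from the property table.

```python
import json

REQUIRED = ("version", "num_levels")


def parse_zlevels(text: str) -> dict:
    """Parse the content of a .zlevels file and check the required properties."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError(".zlevels must contain a JSON object")
    missing = [name for name in REQUIRED if name not in config]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    if config["version"] != "1.0":
        raise ValueError(f"unsupported version: {config['version']!r}")
    # Assumption: a dataset has at least one level; bool is not accepted as integer.
    if type(config["num_levels"]) is not int or config["num_levels"] < 1:
        raise ValueError("num_levels must be a positive integer")
    return config


# A reduced variant of the example above that omits some optional properties.
example = """
{
  "version": "1.0",
  "num_levels": 8,
  "agg_methods": {"CHL": "median", "qflags": "first"}
}
"""
config = parse_zlevels(example)
print(config["num_levels"], config.get("tile_size"))
```

Optional properties are read with `dict.get`, so their absence does not raise an error.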
xcube implementation note:
When writing datasets as multi-level datasets and the `agg_methods`
parameter is missing, or a data variable’s name is not contained in
the given `agg_methods`, then `first` is used for variables that have
an integer data type and `median` for a floating point data type.
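The fallback rule in this note can be written down as a small function. The sketch below is illustrative only: `select_agg_method` is a hypothetical helper, data types are assumed to be given as NumPy-style names such as `"uint16"` or `"float32"`, and raising an error for any other kind of type is a choice of this sketch that the note does not cover.

```python
from typing import Optional


def select_agg_method(
    var_name: str,
    dtype_name: str,
    agg_methods: Optional[dict[str, str]] = None,
) -> str:
    """Pick the aggregation method for one data variable."""
    # An explicit entry in agg_methods always wins.
    if agg_methods and var_name in agg_methods:
        return agg_methods[var_name]
    # Fallback by data type, following the implementation note.
    if dtype_name.startswith(("int", "uint")):
        return "first"
    if dtype_name.startswith("float"):
        return "median"
    # Assumption: other data types are outside the scope of this sketch.
    raise ValueError(f"no default aggregation method for dtype {dtype_name!r}")


print(select_agg_method("CHL", "float32"))
print(select_agg_method("qflags", "uint16"))
print(select_agg_method("CHL", "float32", {"CHL": "mean"}))
```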
In xcube Server, when opening datasets and converting them into
multi-level datasets on-the-fly, the aggregation method is `first` for all
data variables, for best performance.
To be discussed
- Allow links for all levels?
- Do not write the `0.link` file. Instead, provide in `.zlevels` where to find each level.
- No longer use the `.zarr` extension for levels. Just use the index as name.
- Make the top-level directory a Zarr group (`.zgroup`), so the multi-level dataset can be opened as a group using the `zarr` package.