Generating an xcube dataset

In the following example, a tiny demo xcube dataset is generated.

Analysed Sea Surface Temperature over the Global Ocean

The input data for this example are located in the xcube repository. The input files contain analysed sea surface temperature and sea surface temperature anomaly over the global ocean and are provided by the Copernicus Marine Environment Monitoring Service (CMEMS). The data are described in a dedicated Product User Manual.

Before starting the example, you need to activate the xcube environment:

$ conda activate xcube

If you want to take a look at the input data, you can use xcube dump to print the metadata of a selected input file:

$ xcube dump examples/gen/data/20170605120000-UKMO-L4_GHRSST-SSTfnd-OSTIAanom-GLOB-v02.0-fv02.0.nc
<xarray.Dataset>
Dimensions:       (lat: 720, lon: 1440, time: 1)
Coordinates:
  * lat           (lat) float32 -89.875 -89.625 -89.375 ... 89.375 89.625 89.875
  * lon           (lon) float32 0.125 0.375 0.625 ... 359.375 359.625 359.875
  * time          (time) object 2017-06-05 12:00:00
Data variables:
    sst_anomaly   (time, lat, lon) float32 ...
    analysed_sst  (time, lat, lon) float32 ...
Attributes:
    Conventions:                CF-1.4
    title:                      Global SST & Sea Ice Anomaly, L4 OSTIA, 0.25 ...
    summary:                    A merged, multi-sensor L4 Foundation SST anom...
    references:                 Donlon, C.J., Martin, M., Stark, J.D., Robert...
    institution:                UKMO
    history:                    Created from sst:temperature regridded with a...
    comment:                    WARNING Some applications are unable to prope...
    license:                    These data are available free of charge under...
    id:                         UKMO-L4LRfnd_GLOB-OSTIAanom
    naming_authority:           org.ghrsst
    product_version:            2.0
    uuid:                       5c1665b7-06e8-499d-a281-857dcbfd07e2
    gds_version_id:             2.0
    netcdf_version_id:          3.6
    date_created:               20170606T061737Z
    start_time:                 20170605T000000Z
    time_coverage_start:        20170605T000000Z
    stop_time:                  20170606T000000Z
    time_coverage_end:          20170606T000000Z
    file_quality_level:         3
    source:                     UKMO-L4HRfnd-GLOB-OSTIA
    platform:                   Aqua, Envisat, NOAA-18, NOAA-19, MetOpA, MSG1...
    sensor:                     AATSR, AMSR, AVHRR, AVHRR_GAC, SEVIRI, TMI
    metadata_conventions:       Unidata Observation Dataset v1.0
    metadata_link:              http://data.nodc.noaa.gov/NESDIS_DataCenters/...
    keywords:                   Oceans > Ocean Temperature > Sea Surface Temp...
    keywords_vocabulary:        NASA Global Change Master Directory (GCMD) Sc...
    standard_name_vocabulary:   NetCDF Climate and Forecast (CF) Metadata Con...
    westernmost_longitude:      0.0
    easternmost_longitude:      360.0
    southernmost_latitude:      -90.0
    northernmost_latitude:      90.0
    spatial_resolution:         0.25 degree
    geospatial_lat_units:       degrees_north
    geospatial_lat_resolution:  0.25 degree
    geospatial_lon_units:       degrees_east
    geospatial_lon_resolution:  0.25 degree
    acknowledgment:             Please acknowledge the use of these data with...
    creator_name:               Met Office as part of CMEMS
    creator_email:              servicedesk.cmems@mercator-ocean.eu
    creator_url:                http://marine.copernicus.eu/
    project:                    Group for High Resolution Sea Surface Tempera...
    publisher_name:             GHRSST Project Office
    publisher_url:              http://www.ghrsst.org
    publisher_email:            ghrsst-po@nceo.ac.uk
    processing_level:           L4
    cdm_data_type:              grid

The example xcube dataset created below will contain only the variable analysed_sst. The metadata of a specific variable can be viewed with the --var option:

$ xcube dump examples/gen/data/20170605120000-UKMO-L4_GHRSST-SSTfnd-OSTIAanom-GLOB-v02.0-fv02.0.nc --var analysed_sst
<xarray.DataArray 'analysed_sst' (time: 1, lat: 720, lon: 1440)>
[1036800 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 -89.875 -89.625 -89.375 ... 89.375 89.625 89.875
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.375 359.625 359.875
  * time     (time) object 2017-06-05 12:00:00
Attributes:
    long_name:      analysed sea surface temperature
    standard_name:  sea_surface_foundation_temperature
    type:           foundation
    units:          kelvin
    valid_min:      -300
    valid_max:      4500
    source:         UKMO-L4HRfnd-GLOB-OSTIA
    comment:

To create the demo xcube dataset, execute the command line below, adjusting the paths to your needs:

$ xcube gen -o "your/output/path/demo_SST_xcube.zarr" -c examples/gen/config_files/xcube_sst_demo_config.yml --sort examples/gen/data/*.nc

The configuration file specifies the input processor, which in this case is the default one. The output size is 10240 × 5632 pixels (longitude × latitude), and the bounding box of the data cube is given by output_region. The output format is defined by output_writer_name. The chunking of the dimensions can be set with the chunksizes entry of the output_writer_params parameter; in the example configuration file, chunk sizes are set for latitude and longitude. If no chunking is set, automatic chunking is applied. The spatial resampling method (output_resampling) is set to ‘nearest’, and the configuration file lists only one variable to be included in the xcube dataset: ‘analysed_sst’.
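Output size and bounding box together determine the pixel size of the generated cube. The sketch below assumes that output_region holds the bounds that xcube dump later reports for the generated cube (geospatial_lon_min/max and geospatial_lat_min/max) and that output_size is given as width (lon) and height (lat). Under these assumptions it reproduces the geospatial_lon_resolution and geospatial_lat_resolution attributes, which both amount to 1/384 degree.

```python
import math

# Bounding box assumed from the attributes of the generated cube
lon_min, lon_max = -16.0, 10.666666666666664
lat_min, lat_max = 48.0, 62.666666666666664

# output_size from the configuration: width (lon) x height (lat) in pixels
width, height = 10240, 5632

res_lon = (lon_max - lon_min) / width
res_lat = (lat_max - lat_min) / height

print(res_lon, res_lat)                        # both about 0.0026042 degrees
print(math.isclose(res_lon, res_lat))          # True: square pixels
print(round(1 / res_lon), round(1 / res_lat))  # 384 384 pixels per degree
```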

The Analysed Sea Surface Temperature dataset already provides the variable in the required form, so no pixel masking needs to be applied. However, this may differ depending on the input data. You can take a look at a configuration file that takes Sentinel-3 Ocean and Land Colour Instrument (OLCI) data as input, which is somewhat more complex. The advantage of using pixel expressions is that the generated cube contains only valid pixels, so users of the data cube do not have to deal with land masking or invalid values themselves. Furthermore, the generated data cube is spatially regular: the data are aligned on a common spatial grid and cover the same region. The time stamps of the input data are kept.
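Conceptually, a valid-pixel expression is a per-pixel condition: wherever it evaluates to false, the value is replaced by NaN, so land or cloudy pixels never reach the cube. The plain-Python sketch below illustrates only this idea. The band names chl, water and cloud are hypothetical, and the sketch does not reproduce xcube's own expression syntax or implementation.

```python
import math


def mask_invalid(pixels, variable, is_valid):
    """Return the values of `variable`, replaced by NaN wherever is_valid(pixel) is False.

    pixels: one dict per pixel holding the variable and any flag bands.
    is_valid: a predicate playing the role of a valid-pixel expression.
    """
    return [px[variable] if is_valid(px) else math.nan for px in pixels]


# Hypothetical pixels with a chlorophyll value and two flag bands
pixels = [
    {"chl": 1.2, "water": True, "cloud": False},
    {"chl": 0.8, "water": False, "cloud": False},  # land pixel
    {"chl": 3.4, "water": True, "cloud": True},    # cloudy pixel
]

masked = mask_invalid(pixels, "chl", lambda px: px["water"] and not px["cloud"])
print(masked)  # [1.2, nan, nan]
```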

Caution: If the file names of your input data vary not only in the time stamp but also in another part, e.g. a platform identifier such as A and B, sorting the inputs by name with --sort may not put them in chronological order. In that case, pass the input files in the desired order via a text file in which each line contains the path to one input file, and do not use the parameter --sort of the command-line interface.
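Such a text file can be generated with a few lines of Python. The sketch below assumes, hypothetically, that every file name contains the acquisition time in the form YYYYMMDDTHHMMSS; the file names and the helper names time_key and write_input_list are made up for illustration. It shows why plain name order fails (all S3A files sort before all S3B files) and writes the paths in time order, one per line. Check xcube gen --help for how your xcube version accepts such a list file as input.

```python
import re
from pathlib import Path

# Assumed naming scheme: the acquisition time appears as YYYYMMDDTHHMMSS
TIMESTAMP = re.compile(r"\d{8}T\d{6}")


def time_key(path):
    """Return the time stamp embedded in the file name.

    Strings of the form YYYYMMDDTHHMMSS sort chronologically as plain text.
    """
    match = TIMESTAMP.search(Path(path).name)
    if match is None:
        raise ValueError(f"no time stamp found in {path!r}")
    return match.group(0)


def write_input_list(paths, list_file):
    """Write the paths ordered by time stamp, one per line, and return them."""
    ordered = sorted(paths, key=time_key)
    Path(list_file).write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return ordered


if __name__ == "__main__":
    files = [
        "S3B_OL_2_20170605T094500.nc",
        "S3A_OL_2_20170605T101500.nc",
        "S3A_OL_2_20170604T100000.nc",
    ]
    print(sorted(files))                          # name order: both S3A files first
    print(write_input_list(files, "inputs.txt"))  # time order: S3B file in between
```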

Optimizing and pruning an xcube dataset

If you want to optimize your generated xcube dataset, e.g. for publishing it in xcube viewer via xcube serve, you can use xcube optimize:

$ xcube optimize demo_SST_xcube.zarr -C

Executing the command above creates an optimized xcube dataset called demo_SST_xcube-optimized.zarr. If you compare the directory of the original xcube dataset with that of the optimized one, you will notice that the optimized one contains an additional file called .zmetadata. .zmetadata combines the contents of the .zattrs and .zarray files of all variables of the xcube dataset in a single file, so that the metadata can be read with one request instead of many, which makes metadata access faster. The option -C optimizes coordinate variables by converting any chunked arrays into single, non-chunked, contiguous arrays.
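.zmetadata is a JSON file. In the Zarr version 2 layout it contains a zarr_consolidated_format entry and a metadata mapping from the path of each metadata file inside the store (e.g. analysed_sst/.zarray) to that file's JSON content. The standard-library sketch below builds such a mapping from a directory store to show what gets collected; the function name consolidate is hypothetical. In practice, zarr.consolidate_metadata() writes this file and xarray.open_zarr(..., consolidated=True) reads it.

```python
import json
import os

METADATA_FILES = (".zgroup", ".zattrs", ".zarray")


def consolidate(store_dir):
    """Collect all Zarr v2 metadata files below store_dir into one dict.

    The result mirrors the structure of a .zmetadata file: keys are paths
    relative to the store root using '/', values are the parsed JSON documents.
    Chunk files (e.g. analysed_sst/0.0.0) are skipped.
    """
    metadata = {}
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            if name in METADATA_FILES:
                path = os.path.join(root, name)
                key = os.path.relpath(path, store_dir).replace(os.sep, "/")
                with open(path, encoding="utf-8") as f:
                    metadata[key] = json.load(f)
    return {"zarr_consolidated_format": 1, "metadata": metadata}


if __name__ == "__main__":
    consolidated = consolidate("demo_SST_xcube.zarr")
    print(sorted(consolidated["metadata"]))
```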

To delete empty chunks, xcube prune can be used. It deletes all data files associated with empty (NaN-only) chunks of an xcube dataset and is restricted to the Zarr format.

$ xcube prune demo_SST_xcube-optimized.zarr

The pruned xcube dataset is saved in place, so no output path is needed. In this example, the size of the xcube dataset was 6.8 MB before pruning and 6.5 MB afterwards. According to the output printed to the terminal, 30 block files were deleted.
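The check behind pruning can be pictured as follows: split the array into its chunks and flag every chunk whose values are all NaN. Because Zarr reads a missing chunk file back as the array's fill value, deleting such files loses no information as long as that fill value is NaN. The sketch below works on a small 2-D Python list and returns Zarr-style chunk keys such as "0.0"; it is an illustration only, not xcube's implementation, and empty_chunk_keys is a hypothetical name.

```python
import math
from itertools import product


def empty_chunk_keys(data, chunks):
    """Return Zarr-style keys ("row.col") of the chunks of a 2-D array that contain only NaN.

    data: list of equally long rows of floats; chunks: (rows, cols) per chunk.
    Chunks at the lower/right edge may be smaller than the nominal chunk size.
    """
    n_rows, n_cols = len(data), len(data[0])
    ch_rows, ch_cols = chunks
    grid = (math.ceil(n_rows / ch_rows), math.ceil(n_cols / ch_cols))
    keys = []
    for i, j in product(range(grid[0]), range(grid[1])):
        block = [
            data[r][c]
            for r in range(i * ch_rows, min((i + 1) * ch_rows, n_rows))
            for c in range(j * ch_cols, min((j + 1) * ch_cols, n_cols))
        ]
        if all(math.isnan(v) for v in block):
            keys.append(f"{i}.{j}")
    return keys


nan = math.nan
data = [
    [nan, nan, 1.0, 2.0],
    [nan, nan, 3.0, 4.0],
    [5.0, 6.0, nan, nan],
    [7.0, 8.0, nan, 9.0],
]
print(empty_chunk_keys(data, (2, 2)))  # ['0.0']
```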

The metadata of the xcube dataset can be viewed with xcube dump as well:

$ xcube dump demo_SST_xcube-optimized.zarr
<xarray.Dataset>
Dimensions:       (bnds: 2, lat: 5632, lon: 10240, time: 3)
Coordinates:
  * lat           (lat) float64 62.67 62.66 62.66 62.66 ... 48.01 48.0 48.0
    lat_bnds      (lat, bnds) float64 dask.array<shape=(5632, 2), chunksize=(5632, 2)>
  * lon           (lon) float64 -16.0 -16.0 -15.99 -15.99 ... 10.66 10.66 10.67
    lon_bnds      (lon, bnds) float64 dask.array<shape=(10240, 2), chunksize=(10240, 2)>
  * time          (time) datetime64[ns] 2017-06-05T12:00:00 ... 2017-06-07T12:00:00
    time_bnds     (time, bnds) datetime64[ns] dask.array<shape=(3, 2), chunksize=(3, 2)>
Dimensions without coordinates: bnds
Data variables:
    analysed_sst  (time, lat, lon) float64 dask.array<shape=(3, 5632, 10240), chunksize=(1, 704, 640)>
Attributes:
    acknowledgment:             Data Cube produced based on data provided by ...
    comment:
    contributor_name:
    contributor_role:
    creator_email:              info@brockmann-consult.de
    creator_name:               Brockmann Consult GmbH
    creator_url:                https://www.brockmann-consult.de
    date_modified:              2019-09-25T08:50:32.169031
    geospatial_lat_max:         62.666666666666664
    geospatial_lat_min:         48.0
    geospatial_lat_resolution:  0.002604166666666666
    geospatial_lat_units:       degrees_north
    geospatial_lon_max:         10.666666666666664
    geospatial_lon_min:         -16.0
    geospatial_lon_resolution:  0.0026041666666666665
    geospatial_lon_units:       degrees_east
    history:                    xcube/reproj-snap-nc
    id:                         demo-bc-sst-sns-l2c-v1
    institution:                Brockmann Consult GmbH
    keywords:
    license:                    terms and conditions of the DCS4COP data dist...
    naming_authority:           bc
    processing_level:           L2C
    project:                    xcube
    publisher_email:            info@brockmann-consult.de
    publisher_name:             Brockmann Consult GmbH
    publisher_url:              https://www.brockmann-consult.de
    references:                 https://dcs4cop.eu/
    source:                     CMEMS Global SST & Sea Ice Anomaly Data Cube
    standard_name_vocabulary:
    summary:
    time_coverage_end:          2017-06-08T00:00:00.000000000
    time_coverage_start:        2017-06-05T00:00:00.000000000
    title:                      CMEMS Global SST Anomaly Data Cube

The metadata of the variable analysed_sst can be viewed with the --var option:

$ xcube dump demo_SST_xcube-optimized.zarr --var analysed_sst
<xarray.DataArray 'analysed_sst' (time: 3, lat: 5632, lon: 10240)>
dask.array<shape=(3, 5632, 10240), dtype=float64, chunksize=(1, 704, 640)>
Coordinates:
  * lat      (lat) float64 62.67 62.66 62.66 62.66 ... 48.01 48.01 48.0 48.0
  * lon      (lon) float64 -16.0 -16.0 -15.99 -15.99 ... 10.66 10.66 10.66 10.67
  * time     (time) datetime64[ns] 2017-06-05T12:00:00 ... 2017-06-07T12:00:00
Attributes:
    comment:
    long_name:           analysed sea surface temperature
    source:              UKMO-L4HRfnd-GLOB-OSTIA
    spatial_resampling:  Nearest
    standard_name:       sea_surface_foundation_temperature
    type:                foundation
    units:               kelvin
    valid_max:           4500
    valid_min:           -300
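From the shape and chunk size reported above, one can work out how many chunk (block) files analysed_sst consists of: 3 × 8 × 16 = 384, each holding 704 × 640 float64 values, i.e. about 3.4 MiB per chunk before compression. This gives the scale for the 30 block files that xcube prune reported as deleted. A minimal Python sketch of the arithmetic (chunk_grid is a hypothetical helper name):

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (a partial last chunk counts as one)."""
    return tuple(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))


shape = (3, 5632, 10240)  # (time, lat, lon) of analysed_sst, from the dump
chunks = (1, 704, 640)    # chunksize reported by xcube dump

grid = chunk_grid(shape, chunks)
n_chunks = math.prod(grid)
chunk_bytes = math.prod(chunks) * 8  # float64 = 8 bytes per value, before compression

print(grid, n_chunks)  # (3, 8, 16) 384
print(chunk_bytes)     # 3604480
```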