This chapter is a work in progress and currently less than a draft.
Generating an xcube dataset
In the following example a tiny demo xcube dataset is generated.
Analysed Sea Surface Temperature over the Global Ocean
Input data for this example is located in the xcube repository. The input files contain analysed sea surface temperature and sea surface temperature anomaly over the global ocean and are provided by the Copernicus Marine Environment Monitoring Service. The data is described in a dedicated Product User Manual.
Before starting the example, you need to activate the xcube environment:
$ conda activate xcube
If you want to take a look at the input data, you can use xcube dump to print out the metadata of a selected input file:
$ xcube dump examples/gen/data/20170605120000-UKMO-L4_GHRSST-SSTfnd-OSTIAanom-GLOB-v02.0-fv02.0.nc
<xarray.Dataset>
Dimensions: (lat: 720, lon: 1440, time: 1)
Coordinates:
  * lat (lat) float32 -89.875 -89.625 -89.375 ... 89.375 89.625 89.875
  * lon (lon) float32 0.125 0.375 0.625 ... 359.375 359.625 359.875
  * time (time) object 2017-06-05 12:00:00
Data variables:
    sst_anomaly (time, lat, lon) float32 ...
    analysed_sst (time, lat, lon) float32 ...
Attributes:
    Conventions: CF-1.4
    title: Global SST & Sea Ice Anomaly, L4 OSTIA, 0.25 ...
    summary: A merged, multi-sensor L4 Foundation SST anom...
    references: Donlon, C.J., Martin, M., Stark, J.D., Robert...
    institution: UKMO
    history: Created from sst:temperature regridded with a...
    comment: WARNING Some applications are unable to prope...
    license: These data are available free of charge under...
    id: UKMO-L4LRfnd_GLOB-OSTIAanom
    naming_authority: org.ghrsst
    product_version: 2.0
    uuid: 5c1665b7-06e8-499d-a281-857dcbfd07e2
    gds_version_id: 2.0
    netcdf_version_id: 3.6
    date_created: 20170606T061737Z
    start_time: 20170605T000000Z
    time_coverage_start: 20170605T000000Z
    stop_time: 20170606T000000Z
    time_coverage_end: 20170606T000000Z
    file_quality_level: 3
    source: UKMO-L4HRfnd-GLOB-OSTIA
    platform: Aqua, Envisat, NOAA-18, NOAA-19, MetOpA, MSG1...
    sensor: AATSR, AMSR, AVHRR, AVHRR_GAC, SEVIRI, TMI
    metadata_conventions: Unidata Observation Dataset v1.0
    metadata_link: http://data.nodc.noaa.gov/NESDIS_DataCenters/...
    keywords: Oceans > Ocean Temperature > Sea Surface Temp...
    keywords_vocabulary: NASA Global Change Master Directory (GCMD) Sc...
    standard_name_vocabulary: NetCDF Climate and Forecast (CF) Metadata Con...
    westernmost_longitude: 0.0
    easternmost_longitude: 360.0
    southernmost_latitude: -90.0
    northernmost_latitude: 90.0
    spatial_resolution: 0.25 degree
    geospatial_lat_units: degrees_north
    geospatial_lat_resolution: 0.25 degree
    geospatial_lon_units: degrees_east
    geospatial_lon_resolution: 0.25 degree
    acknowledgment: Please acknowledge the use of these data with...
    creator_name: Met Office as part of CMEMS
    creator_email: email@example.com
    creator_url: http://marine.copernicus.eu/
    project: Group for High Resolution Sea Surface Tempera...
    publisher_name: GHRSST Project Office
    publisher_url: http://www.ghrsst.org
    publisher_email: firstname.lastname@example.org
    processing_level: L4
    cdm_data_type: grid
Below an example xcube dataset will be created, which will contain the variable analysed_sst. The metadata for a specific variable can be viewed by:
$ xcube dump examples/gen/data/20170605120000-UKMO-L4_GHRSST-SSTfnd-OSTIAanom-GLOB-v02.0-fv02.0.nc --var analysed_sst
<xarray.DataArray 'analysed_sst' (time: 1, lat: 720, lon: 1440)>
[1036800 values with dtype=float32]
Coordinates:
  * lat (lat) float32 -89.875 -89.625 -89.375 ... 89.375 89.625 89.875
  * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.375 359.625 359.875
  * time (time) object 2017-06-05 12:00:00
Attributes:
    long_name: analysed sea surface temperature
    standard_name: sea_surface_foundation_temperature
    type: foundation
    units: kelvin
    valid_min: -300
    valid_max: 4500
    source: UKMO-L4HRfnd-GLOB-OSTIA
    comment:
To create a toy xcube dataset, you can execute the command line below. Please adjust the paths to your needs:
$ xcube gen -o "your/output/path/demo_SST_xcube.zarr" -c examples/gen/config_files/xcube_sst_demo_config.yml --sort examples/gen/data/*.nc
The configuration file specifies the input processor, which in this case is the default one. The output size is 10240 x 5632 grid cells (longitude x latitude). The bounding box of the data cube is given by output_region in the configuration file. The output format (output_writer_name) is defined as well. The chunking of the dimensions can be set with the chunksizes attribute; in the example configuration file the chunking is set for latitude and longitude. If the chunking is not set, automatic chunking is applied. The spatial resampling method (output_resampling) is set to 'nearest', and the configuration file contains only one variable to be included in the xcube dataset: analysed_sst.
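The relationship between output size, bounding box, grid resolution and chunking can be checked with a few lines of arithmetic. The sketch below is only an illustration: the bounding box and chunk sizes are the values that xcube dump reports for the generated dataset later in this chapter, and the variable names are not part of xcube.

```python
# Values taken from the xcube dump output of the generated dataset.
width, height = 10240, 5632                # output size (lon, lat)
lon_min, lat_min = -16.0, 48.0             # lower-left corner of the bounding box
lon_max, lat_max = 10.666666666666664, 62.666666666666664
chunk_lat, chunk_lon = 704, 640            # chunk sizes along lat and lon

# Cell size in degrees along each axis.
lon_res = (lon_max - lon_min) / width
lat_res = (lat_max - lat_min) / height

# Number of chunks along each spatial axis for a single time step.
chunks_lat = height // chunk_lat
chunks_lon = width // chunk_lon

print(f"lon resolution: {lon_res:.10f} degrees")
print(f"lat resolution: {lat_res:.10f} degrees")
print(f"chunks per time step: {chunks_lat} x {chunks_lon} = {chunks_lat * chunks_lon}")
```

Both resolutions agree with the geospatial_lon_resolution and geospatial_lat_resolution attributes of the generated dataset, and the chunk sizes divide the output size without remainder.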
The Analysed Sea Surface Temperature dataset already contains the variable as needed, so no pixel masking has to be applied. This may differ depending on the input data. For a more complex case, you can take a look at a configuration file that takes Sentinel-3 Ocean and Land Colour Instrument (OLCI) data as input. The advantage of using pixel expressions is that the generated cube contains only valid pixels, and the user of the data cube does not have to worry about land masking or invalid values. Furthermore, the generated data cube is spatially regular: the data are aligned on a common spatial grid and cover the same region. The time stamps are kept from the input dataset.
Caution: If the file names of your input data vary not only by time stamp but also by, e.g., A and B, you need to pass the input files in the desired order via a text file. Each line of the text file should contain the path to one input file. If you pass the input files in a desired order, do not use the --sort parameter of the command-line interface.
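Such a text file can be written by hand or generated with a short script. The following is a minimal sketch that orders files chronologically and writes one path per line. The file name pattern (a platform letter A or B followed by a time stamp), the paths and the helper names are hypothetical and only serve as an illustration; adapt the regular expression to your own file names.

```python
import re
import tempfile
from pathlib import Path


def sort_key(path):
    """Order by time stamp first, then by platform letter (A before B)."""
    # Hypothetical naming pattern: S3<platform>_<YYYYMMDD>T<HHMMSS>.nc
    match = re.match(r"S3([AB])_(\d{8}T\d{6})\.nc$", Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    platform, timestamp = match.groups()
    return timestamp, platform


def write_input_list(paths, list_file):
    """Write one input path per line, in the desired processing order."""
    ordered = sorted(paths, key=sort_key)
    Path(list_file).write_text("\n".join(ordered) + "\n")
    return ordered


# Hypothetical input files; a plain alphabetical sort would group them
# by platform (all A files first) instead of by time.
inputs = [
    "data/S3A_20170605T100000.nc",
    "data/S3A_20170606T100000.nc",
    "data/S3B_20170605T110000.nc",
    "data/S3B_20170606T110000.nc",
]

list_file = Path(tempfile.mkdtemp()) / "input_files.txt"
ordered = write_input_list(inputs, list_file)
print(list_file.read_text(), end="")
```

Sorting on an explicit key keeps the order independent of where the time stamp appears in the file name.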
Optimizing and pruning an xcube dataset
If you want to optimize your generated xcube dataset, e.g. for publishing it in an xcube viewer via xcube serve, you can use xcube optimize:
$ xcube optimize demo_SST_xcube.zarr -C
By executing the command above, an optimized xcube dataset called demo_SST_xcube-optimized.zarr will be created.
If you look into the directories of the original xcube dataset and the optimized one, you will notice that the optimized one contains a file called .zmetadata. This file contains the information stored in .zattrs and .zarray of each variable of the xcube dataset and makes metadata requests faster. The option -C optimizes coordinate variables by converting any chunked arrays into single, non-chunked, contiguous arrays.
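To see what the consolidated metadata looks like, the .zmetadata file can be read with a JSON parser. The sketch below assumes the Zarr version 2 consolidated metadata layout, i.e. a JSON document whose "metadata" mapping holds the contents of the individual .zattrs and .zarray files. The helper name is hypothetical, and the temporary store is a hand-written stand-in so that the example runs without real data.

```python
import json
import tempfile
from pathlib import Path


def summarize_consolidated(store_path):
    """Return {array name: (shape, chunks)} read from a store's .zmetadata."""
    with open(Path(store_path) / ".zmetadata") as f:
        consolidated = json.load(f)
    summary = {}
    for key, value in consolidated["metadata"].items():
        # Array metadata is stored under keys such as "analysed_sst/.zarray".
        if key.endswith("/.zarray"):
            name = key[: -len("/.zarray")]
            summary[name] = (tuple(value["shape"]), tuple(value["chunks"]))
    return summary


# Minimal stand-in for an optimized dataset (assumed layout, not real data);
# point store_path at your own optimized dataset instead.
store_path = Path(tempfile.mkdtemp()) / "demo.zarr"
store_path.mkdir()
(store_path / ".zmetadata").write_text(json.dumps({
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "analysed_sst/.zarray": {"shape": [3, 5632, 10240], "chunks": [1, 704, 640]},
        "analysed_sst/.zattrs": {"units": "kelvin"},
    },
}))

summary = summarize_consolidated(store_path)
print(summary)
```

Because all array metadata is available from this single file, a client does not need to open one .zarray and one .zattrs file per variable.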
To delete empty chunks, xcube prune can be used. It deletes all data files associated with empty (NaN-only) chunks of an xcube dataset and is restricted to the Zarr format.
$ xcube prune demo_SST_xcube-optimized.zarr
The pruned xcube dataset is saved in place and does not need an output path. The size of the xcube dataset was 6.8 MB before pruning and 6.5 MB afterwards. According to the output printed to the terminal, 30 block files were deleted.
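The idea of a NaN-only chunk can be illustrated on a small grid. The following pure-Python sketch is not how xcube prune is implemented; it only shows, under simplified assumptions (a 2D grid held in nested lists, a hypothetical helper name), which chunks would count as empty.

```python
import math


def empty_chunks(grid, chunk_rows, chunk_cols):
    """Return (row, col) indices of chunks that contain only NaN values."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            # Collect all values that fall into the current chunk.
            block = [
                value
                for row in grid[r0:r0 + chunk_rows]
                for value in row[c0:c0 + chunk_cols]
            ]
            if all(math.isnan(v) for v in block):
                result.append((r0 // chunk_rows, c0 // chunk_cols))
    return result


nan = float("nan")
# A 4 x 4 grid split into 2 x 2 chunks; only the top-right chunk is NaN-only.
grid = [
    [1.0, 2.0, nan, nan],
    [3.0, nan, nan, nan],
    [5.0, 6.0, 7.0, nan],
    [8.0, 9.0, nan, 1.5],
]
prunable = empty_chunks(grid, 2, 2)
print(prunable)
```

A chunk with even a single valid value is kept, so pruning removes files without changing the values a reader sees.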
The metadata of the xcube dataset can be viewed with xcube dump as well:
$ xcube dump demo_SST_xcube-optimized.zarr
<xarray.Dataset>
Dimensions: (bnds: 2, lat: 5632, lon: 10240, time: 3)
Coordinates:
  * lat (lat) float64 62.67 62.66 62.66 62.66 ... 48.01 48.0 48.0
    lat_bnds (lat, bnds) float64 dask.array<shape=(5632, 2), chunksize=(5632, 2)>
  * lon (lon) float64 -16.0 -16.0 -15.99 -15.99 ... 10.66 10.66 10.67
    lon_bnds (lon, bnds) float64 dask.array<shape=(10240, 2), chunksize=(10240, 2)>
  * time (time) datetime64[ns] 2017-06-05T12:00:00 ... 2017-06-07T12:00:00
    time_bnds (time, bnds) datetime64[ns] dask.array<shape=(3, 2), chunksize=(3, 2)>
Dimensions without coordinates: bnds
Data variables:
    analysed_sst (time, lat, lon) float64 dask.array<shape=(3, 5632, 10240), chunksize=(1, 704, 640)>
Attributes:
    acknowledgment: Data Cube produced based on data provided by ...
    comment:
    contributor_name:
    contributor_role:
    creator_email: email@example.com
    creator_name: Brockmann Consult GmbH
    creator_url: https://www.brockmann-consult.de
    date_modified: 2019-09-25T08:50:32.169031
    geospatial_lat_max: 62.666666666666664
    geospatial_lat_min: 48.0
    geospatial_lat_resolution: 0.002604166666666666
    geospatial_lat_units: degrees_north
    geospatial_lon_max: 10.666666666666664
    geospatial_lon_min: -16.0
    geospatial_lon_resolution: 0.0026041666666666665
    geospatial_lon_units: degrees_east
    history: xcube/reproj-snap-nc
    id: demo-bc-sst-sns-l2c-v1
    institution: Brockmann Consult GmbH
    keywords:
    license: terms and conditions of the DCS4COP data dist...
    naming_authority: bc
    processing_level: L2C
    project: xcube
    publisher_email: firstname.lastname@example.org
    publisher_name: Brockmann Consult GmbH
    publisher_url: https://www.brockmann-consult.de
    references: https://dcs4cop.eu/
    source: CMEMS Global SST & Sea Ice Anomaly Data Cube
    standard_name_vocabulary:
    summary:
    time_coverage_end: 2017-06-08T00:00:00.000000000
    time_coverage_start: 2017-06-05T00:00:00.000000000
    title: CMEMS Global SST Anomaly Data Cube
The metadata for the variable analysed_sst can be viewed with:
$ xcube dump demo_SST_xcube-optimized.zarr --var analysed_sst
<xarray.DataArray 'analysed_sst' (time: 3, lat: 5632, lon: 10240)>
dask.array<shape=(3, 5632, 10240), dtype=float64, chunksize=(1, 704, 640)>
Coordinates:
  * lat (lat) float64 62.67 62.66 62.66 62.66 ... 48.01 48.01 48.0 48.0
  * lon (lon) float64 -16.0 -16.0 -15.99 -15.99 ... 10.66 10.66 10.66 10.67
  * time (time) datetime64[ns] 2017-06-05T12:00:00 ... 2017-06-07T12:00:00
Attributes:
    comment:
    long_name: analysed sea surface temperature
    source: UKMO-L4HRfnd-GLOB-OSTIA
    spatial_resampling: Nearest
    standard_name: sea_surface_foundation_temperature
    type: foundation
    units: kelvin
    valid_max: 4500
    valid_min: -300