Python API

Data Store Framework

Functions

xcube.core.store.new_data_store(data_store_id: str, extension_registry: ExtensionRegistry | None = None, **data_store_params) DataStore | MutableDataStore[source]

Create a new data store instance for given data_store_id and data_store_params.

Parameters:
  • data_store_id – A data store identifier.

  • extension_registry – Optional extension registry. If not given, the global extension registry will be used.

  • **data_store_params – Data store specific parameters.

Returns:

A new data store instance
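
For example, a data store for a local directory can be created through this generic factory (a minimal sketch; the “file” store identifier is provided by xcube’s filesystem data stores, and the root path is a hypothetical example):

    from xcube.core.store import new_data_store

    # Create a data store for data located in a local directory (hypothetical path).
    store = new_data_store("file", root="/path/to/data")
    # List the identifiers of all data resources the store can open.
    print(store.list_data_ids())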

xcube.core.store.new_fs_data_store(protocol: str, root: str = '', max_depth: int | None = 1, read_only: bool = False, includes: Sequence[str] | None = None, excludes: Sequence[str] | None = None, storage_options: Dict[str, Any] = None) FsDataStore[source]

Create a new instance of a filesystem-based data store.

The data store is capable of filtering the data identifiers reported by get_data_ids(). For this purpose the optional keywords excludes and includes are used which can both take the form of a wildcard pattern or a sequence of wildcard patterns:

  • excludes: if given and if any pattern matches the identifier, the identifier is not reported.

  • includes: if not given or if any pattern matches the identifier, the identifier is reported.

Parameters:
  • protocol – The filesystem protocol, for example “file”, “s3”, “memory”.

  • root – Root or base directory. Defaults to “”.

  • max_depth – Maximum recursion depth. None means limitless. Defaults to 1.

  • read_only – Whether this is a read-only store. Defaults to False.

  • includes – Optional sequence of wildcards that include certain filesystem paths. Affects the data identifiers (paths) returned by get_data_ids(). By default, all paths are included.

  • excludes – Optional sequence of wildcards that exclude certain filesystem paths. Affects the data identifiers (paths) returned by get_data_ids(). By default, no paths are excluded.

  • storage_options – Options specific to the underlying filesystem identified by protocol. Used to instantiate the filesystem.

Returns:

A new data store instance of type FsDataStore.
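
A minimal sketch of a filtered, read-only store on S3; the bucket name, prefix, and the anonymous-access option passed via storage_options are assumptions for illustration:

    from xcube.core.store import new_fs_data_store

    store = new_fs_data_store(
        "s3",
        root="my-bucket/cubes",          # hypothetical bucket and prefix
        read_only=True,
        includes=["*.zarr"],             # report only Zarr datasets
        excludes=["*test*"],             # hide anything containing "test"
        storage_options={"anon": True},  # s3fs option for anonymous access
    )
    print(store.list_data_ids())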

xcube.core.store.find_data_store_extensions(predicate: Callable[[Extension], bool] = None, extension_registry: ExtensionRegistry | None = None) List[Extension][source]

Find data store extensions using the optional filter function predicate.

Parameters:
  • predicate – An optional filter function.

  • extension_registry – Optional extension registry. If not given, the global extension registry will be used.

Returns:

List of data store extensions.

xcube.core.store.get_data_store_class(data_store_id: str, extension_registry: ExtensionRegistry | None = None) Type[DataStore] | Type[MutableDataStore][source]

Get the class for the data store identified by data_store_id.

Parameters:
  • data_store_id – A data store identifier.

  • extension_registry – Optional extension registry. If not given, the global extension registry will be used.

Returns:

The class for the data store.

xcube.core.store.get_data_store_params_schema(data_store_id: str, extension_registry: ExtensionRegistry | None = None) JsonObjectSchema[source]

Get the JSON schema for instantiating a new data store identified by data_store_id.

Parameters:
  • data_store_id – A data store identifier.

  • extension_registry – Optional extension registry. If not given, the global extension registry will be used.

Returns:

The JSON schema for the data store’s parameters.

Classes

class xcube.core.store.DataStore[source]

A data store represents a collection of data resources that can be enumerated, queried, and opened in order to obtain in-memory representations of the data. The same data resource may be made available using different data types. Therefore, many methods allow specifying a data_type parameter.

A store implementation may use any existing openers/writers, or define its own, or not use any openers/writers at all.

Store implementers should follow the conventions outlined in https://xcube.readthedocs.io/en/latest/storeconv.html .

The DataStore is an abstract base class that both read-only and mutable data stores must implement.

classmethod get_data_store_params_schema() JsonObjectSchema[source]

Get descriptions of parameters that must or can be used to instantiate a new DataStore object. Parameters are named and described by the properties of the returned JSON object schema. The default implementation returns a JSON object schema that allows any properties.

abstract classmethod get_data_types() Tuple[str, ...][source]

Get alias names for all data types supported by this store. The first entry in the tuple represents this store’s default data type.

Returns:

The tuple of supported data types.

abstract get_data_types_for_data(data_id: str) Tuple[str, ...][source]

Get alias names of all data types that are supported by this store for the given data_id.

Parameters:

data_id – An identifier of data that is provided by this store

Returns:

A tuple of data types that apply to the given data_id.

Raises:

DataStoreError – If an error occurs.

abstract get_data_ids(data_type: str | None | type | DataType = None, include_attrs: Container[str] = None) Iterator[str] | Iterator[Tuple[str, Dict[str, Any]]][source]

Get an iterator over the data resource identifiers for the given type data_type. If data_type is omitted, all data resource identifiers are returned.

If a store implementation supports only a single data type, it should verify that data_type is either None or compatible with the supported data type.

If include_attrs is provided, it must be a sequence of names of metadata attributes. The store will then return extra metadata for each returned data resource identifier according to the names of the metadata attributes as tuples (data_id, attrs).

Hence, the type of the returned iterator items depends on the value of include_attrs:

  • If include_attrs is None (the default), the method returns an iterator of dataset identifiers data_id of type str.

  • If include_attrs is a sequence of attribute names, even an empty one, the method returns an iterator of tuples (data_id, attrs) of type Tuple[str, Dict], where attrs is a dictionary filled according to the names in include_attrs. If a store cannot provide a given attribute, it should simply ignore it. This may even result in an empty dictionary for a given data_id.

The individual attributes do not have to exist in the dataset’s metadata, they may also be generated on-the-fly. An example for a generic attribute name is “title”. A store should try to resolve include_attrs=["title"] by returning items such as ("ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_CDR2.1-v02.0-fv01.0.zarr", {"title": "Level-4 GHRSST Analysed Sea Surface Temperature"}).

Parameters:
  • data_type – If given, only data identifiers that are available as this type are returned. If this is omitted, all available data identifiers are returned.

  • include_attrs – A sequence of names of attributes to be returned for each dataset identifier. If given, the store will attempt to provide the set of requested dataset attributes in addition to the data ids. (added in xcube 0.8.0)

Returns:

An iterator over the identifiers and titles of data resources provided by this data store.

Raises:

DataStoreError – If an error occurs.
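
The following sketch illustrates both iterator forms; store is assumed to be any DataStore instance, e.g. obtained from new_data_store(), and the availability of a “title” attribute depends on the store:

    # include_attrs is None (default): plain identifiers of type str.
    for data_id in store.get_data_ids(data_type="dataset"):
        print(data_id)

    # include_attrs given: (data_id, attrs) tuples; attrs may be empty
    # if the store cannot provide the requested attribute.
    for data_id, attrs in store.get_data_ids("dataset", include_attrs=["title"]):
        print(data_id, attrs.get("title", "<no title>"))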

list_data_ids(data_type: str | None | type | DataType = None, include_attrs: Container[str] = None) List[str] | List[Tuple[str, Dict[str, Any]]][source]

Convenience version of get_data_ids() that returns a list rather than an iterator.

Parameters:
  • data_type – If given, only data identifiers that are available as this type are returned. If this is omitted, all available data identifiers are returned.

  • include_attrs – A sequence of names of attributes to be returned for each dataset identifier. If given, the store will attempt to provide the set of requested dataset attributes in addition to the data ids. (added in xcube 0.8.0)

Returns:

A list comprising the identifiers and titles of data resources provided by this data store.

Raises:

DataStoreError – If an error occurs.

abstract has_data(data_id: str, data_type: str | None | type | DataType = None) bool[source]

Check if the data resource given by data_id is available in this store.

Parameters:
  • data_id – A data identifier

  • data_type – An optional data type. If given, it will also be checked whether the data is available as the specified type. May be given as type alias name, as a type, or as a DataType instance.

Returns:

True, if the data resource is available in this store, False otherwise.

abstract describe_data(data_id: str, data_type: str | None | type | DataType = None) DataDescriptor[source]

Get the descriptor for the data resource given by data_id.

Raises a DataStoreError if data_id does not exist in this store or the data is not available as the specified data_type.

Parameters:
  • data_id – An identifier of data provided by this store

  • data_type – If given, the descriptor of the data will describe the data as specified by the data type. May be given as type alias name, as a type, or as a DataType instance.

Returns:

A data-type specific data descriptor.

Raises:

DataStoreError – If an error occurs.

abstract get_data_opener_ids(data_id: str = None, data_type: str | None | type | DataType = None) Tuple[str, ...][source]

Get identifiers of data openers that can be used to open data resources from this store.

If data_id is given, data accessors are restricted to the ones that can open the identified data resource. Raises if data_id does not exist in this store.

If data_type is given, only openers that are compatible with this data type are returned.

If a store implementation supports only a single data type, it should verify that data_type is either None or equal to that single data type.

Parameters:
  • data_id – An optional data resource identifier that is known to exist in this data store.

  • data_type – An optional data type that is known to be supported by this data store. May be given as type alias name, as a type, or as a DataType instance.

Returns:

A tuple of identifiers of data openers that can be used to open data resources.

Raises:

DataStoreError – If an error occurs.

abstract get_open_data_params_schema(data_id: str = None, opener_id: str = None) JsonObjectSchema[source]

Get the schema for the parameters passed as open_params to open_data().

If data_id is given, the returned schema will be tailored to the constraints implied by the identified data resource. Some openers might not support this, therefore data_id is optional, and if it is omitted, the returned schema will be less restrictive. If given, the method raises if data_id does not exist in this store.

If opener_id is given, the returned schema will be tailored to the constraints implied by the identified opener. Some openers might not support this, therefore opener_id is optional, and if it is omitted, the returned schema will be less restrictive.

For maximum compatibility of stores, it is strongly encouraged to apply the following conventions on parameter names, types, and their interpretation.

Let P be the value of an optional, data constraining open parameter, then it should be interpreted as follows:

  • If P is None, the parameter has not been given, hence no constraint applies and no additional restrictions are placed on the requested data.

  • If not P (e.g. an empty list or tuple), data that would be included by default is excluded.

  • Otherwise, the given constraint applies.

Given here are names, types, and descriptions of common, constraining open parameters for gridded datasets. Note, whether any of these is optional or mandatory depends on the individual data store. A store may also define other open parameters or support only a subset of the following. Note that all parameters may be optional; the Python types given here refer to given, non-None parameters:

  • variable_names: List[str]: Included data variables. Available coordinate variables will be auto-included for any dimension of the data variables.

  • bbox: Tuple[float, float, float, float]: Spatial coverage as xmin, ymin, xmax, ymax.

  • crs: str: Spatial CRS, e.g. “EPSG:4326” or OGC CRS URI.

  • spatial_res: float: Spatial resolution in coordinates of the spatial CRS.

  • time_range: Tuple[Optional[str], Optional[str]]: Time range interval in UTC date/time units using ISO format. Start or end time may be missing which means everything until available start or end time.

  • time_period: str: Pandas-compatible period/frequency string, e.g. “8D”, “2W”.

E.g. applied to an optional variable_names parameter, this means

  • variable_names is None - include all data variables

  • variable_names == [] - do not include data variables (schema only)

  • variable_names == [“<var_1>”, “<var_2>”, …] - only include data variables named “<var_1>”, “<var_2>”, …

Parameters:
  • data_id – An optional data identifier that is known to exist in this data store.

  • opener_id – An optional data opener identifier.

Returns:

The schema for the parameters in open_params.

Raises:

DataStoreError – If an error occurs.

abstract open_data(data_id: str, opener_id: str = None, **open_params) Any[source]

Open the data given by the data resource identifier data_id using the supplied open_params.

The data type of the return value depends on the data opener used to open the data resource.

If opener_id is given, the identified data opener will be used to open the data resource and open_params must comply with the schema of the opener’s parameters. Note that some store implementations may not support using different openers or just support a single one.

Raises if data_id does not exist in this store.

Parameters:
  • data_id – The data identifier that is known to exist in this data store.

  • opener_id – An optional data opener identifier.

  • **open_params – Opener-specific parameters.

Returns:

An in-memory representation of the data resources identified by data_id and open_params.

Raises:

DataStoreError – If an error occurs.
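
A sketch of inspecting the open parameters and opening a dataset using some of the conventional parameters listed above; the data identifier, variable name, and the assumption that the store supports these particular parameters are for illustration only:

    # Which open parameters does the store accept for this resource?
    schema = store.get_open_data_params_schema(data_id="demo.zarr")  # hypothetical data_id
    print(schema.to_dict())

    # Open a spatio-temporal subset of the data resource.
    dataset = store.open_data(
        "demo.zarr",
        variable_names=["analysed_sst"],          # hypothetical variable name
        bbox=(0.0, 50.0, 5.0, 52.5),              # xmin, ymin, xmax, ymax
        time_range=("2020-01-01", "2020-12-31"),
    )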

class xcube.core.store.MutableDataStore[source]

A mutable data store is a data store that also allows for adding, updating, and removing data resources.

MutableDataStore is an abstract base class that any mutable data store must implement.

abstract get_data_writer_ids(data_type: str | None | type | DataType = None) Tuple[str, ...][source]

Get identifiers of data writers that can be used to write data resources to this store.

If data_type is given, only writers that support this data type are returned.

If a store implementation supports only a single data type, it should verify that data_type is either None or equal to that single data type.

Parameters:

data_type – An optional data type specifier that is known to be supported by this data store. May be given as type alias name, as a type, or as a DataType instance.

Returns:

A tuple of identifiers of data writers that can be used to write data resources.

Raises:

DataStoreError – If an error occurs.

abstract get_write_data_params_schema(writer_id: str = None) JsonObjectSchema[source]

Get the schema for the parameters passed as write_params to write_data().

If writer_id is given, the returned schema will be tailored to the constraints implied by the identified writer. Some writers might not support this, therefore writer_id is optional, and if it is omitted, the returned schema will be less restrictive.

Given here is a pseudo-code implementation for stores that support multiple writers and where the store has common parameters with the writer:

    store_params_schema = self.get_data_store_params_schema()
    writer_params_schema = get_writer(writer_id).get_write_data_params_schema()
    return subtract_param_schemas(writer_params_schema, store_params_schema)

Parameters:

writer_id – An optional data writer identifier.

Returns:

The schema for the parameters in write_params.

Raises:

DataStoreError – If an error occurs.

abstract write_data(data: Any, data_id: str = None, writer_id: str = None, replace: bool = False, **write_params) str[source]

Write an in-memory data instance using the supplied data_id and write_params.

If data identifier data_id is not given, a writer-specific default will be generated, used, and returned.

If writer_id is given, the identified data writer will be used to write the data resource and write_params must comply with the schema of the writer’s parameters. Note that some store implementations may not support using different writers or just support a single one.

Given here is a pseudo-code implementation for stores that support multiple writers:

    data_id = data_id or self.gen_data_id()
    path = self.resolve_data_id_to_path(data_id)
    write_params = add_params(self.get_data_store_params(), write_params)
    get_writer(writer_id).write_data(data, path, **write_params)
    self.register_data(data_id, data)

Raises if data_id does not exist in this store.

Parameters:
  • data – The data in-memory instance to be written.

  • data_id – An optional data identifier that is known to be unique in this data store.

  • writer_id – An optional data writer identifier.

  • replace – Whether to replace an existing data resource.

  • **write_params – Writer-specific parameters.

Returns:

The data identifier used to write the data.

Raises:

DataStoreError – If an error occurs.
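
A sketch of writing an in-memory dataset to a mutable store; the writable filesystem store and its root directory are assumptions:

    from xcube.core.new import new_cube
    from xcube.core.store import new_fs_data_store

    store = new_fs_data_store("file", root="/tmp/cubes")  # hypothetical root directory
    cube = new_cube(variables=dict(precipitation=0.5))

    # Write the in-memory cube; the returned identifier is the one actually used.
    data_id = store.write_data(cube, data_id="precip.zarr", replace=True)
    print(data_id)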

abstract delete_data(data_id: str, **delete_params)[source]

Delete the data resource identified by data_id.

Typically, an implementation would delete the data resource from the physical storage and also remove any registered metadata from an associated database.

Raises if data_id does not exist in this store.

Parameters:

data_id – A data identifier that is known to exist in this data store.

Raises:

DataStoreError – If an error occurs.

abstract register_data(data_id: str, data: Any)[source]

Register the in-memory representation of a data resource data using the given data resource identifier data_id.

This method can be used to register data resources that are already physically stored in the data store, but are not yet searchable or otherwise accessible by the given data_id.

Typically, an implementation would extract metadata from data and store it in a store-specific database. An implementation should just store the metadata of data. It should not write data.

Parameters:
  • data_id – A data resource identifier that is known to be unique in this data store.

  • data – An in-memory representation of a data resource.

Raises:

DataStoreError – If an error occurs.

abstract deregister_data(data_id: str)[source]

De-register a data resource identified by data_id from this data store.

This method can be used to de-register data resources so that they are no longer searchable or otherwise accessible by the given data_id.

Typically, an implementation would remove the data resource’s metadata from an associated, store-specific database. It should only remove the metadata; it should not delete data from its physical storage space.

Raises if data_id does not exist in this store.

Parameters:

data_id – A data resource identifier that is known to exist in this data store.

Raises:

DataStoreError – If an error occurs.

class xcube.core.store.DataOpener[source]

An interface that specifies a parameterized open_data() operation.

Possible open parameters are implementation-specific and are described by a JSON Schema.

Note this interface uses the term “opener” to underline the expected laziness of the operation. For example, when an xarray.Dataset is returned from a Zarr directory, the actual data is represented by Dask arrays and will be loaded only on demand.

abstract get_open_data_params_schema(data_id: str = None) JsonObjectSchema[source]

Get the schema for the parameters passed as open_params to open_data(). If data_id is given, the returned schema will be tailored to the constraints implied by the identified data resource. Some openers might not support this, therefore data_id is optional, and if it is omitted, the returned schema will be less restrictive.

Parameters:

data_id – An optional data resource identifier.

Returns:

The schema for the parameters in open_params.

Raises:

DataStoreError – If an error occurs.

abstract open_data(data_id: str, **open_params) Any[source]

Open the data resource given by the data resource identifier data_id using the supplied open_params.

Raises if data_id does not exist.

Parameters:
  • data_id – The data resource identifier.

  • **open_params – Opener-specific parameters.

Returns:

An xarray.Dataset instance.

Raises:

DataStoreError – If an error occurs.

class xcube.core.store.DataSearcher[source]

Allow searching data in a data store.

abstract classmethod get_search_params_schema(data_type: str | None | type | DataType = None) JsonObjectSchema[source]

Get the schema for the parameters that can be passed as search_params to search_data(). Parameters are named and described by the properties of the returned JSON object schema.

Parameters:

data_type – If given, the search parameters will be tailored to search for data for the given data_type.

Returns:

A JSON object schema whose properties describe this store’s search parameters.

abstract search_data(data_type: str | None | type | DataType = None, **search_params) Iterator[DataDescriptor][source]

Search this store for data resources. If data_type is given, the search is restricted to data resources of that type.

Returns an iterator over the search results which are returned as DataDescriptor objects.

If a store implementation supports only a single data type, it should verify that data_type is either None or compatible with the supported data type specifier.

Parameters:
  • data_type – An optional data type that is known to be supported by this data store.

  • **search_params – The search parameters.

Returns:

An iterator of data descriptors for the found data resources.

Raises:

DataStoreError – If an error occurs.
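
A sketch for stores that implement DataSearcher; which search parameters are actually supported is store-specific and given by get_search_params_schema(), so the bbox keyword below is a hypothetical example:

    # Which search parameters does this store support for datasets?
    print(store.get_search_params_schema("dataset").to_dict())

    # Iterate over matching data descriptors.
    for descriptor in store.search_data("dataset", bbox=(0.0, 40.0, 10.0, 50.0)):
        print(descriptor.data_id, descriptor.bbox)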

class xcube.core.store.DataWriter[source]

An interface that specifies a parameterized write_data() operation.

Possible write parameters are implementation-specific and are described by a JSON Schema.

abstract get_write_data_params_schema() JsonObjectSchema[source]

Get the schema for the parameters passed as write_params to write_data().

Returns:

The schema for the parameters in write_params.

Raises:

DataStoreError – If an error occurs.

abstract write_data(data: Any, data_id: str, replace: bool = False, **write_params) str[source]

Write a data resource using the supplied data_id and write_params.

Parameters:
  • data – The data resource’s in-memory representation to be written.

  • data_id – A unique data resource identifier.

  • replace – Whether to replace an existing data resource.

  • **write_params – Writer-specific parameters.

Returns:

The data resource identifier used to write the data resource.

Raises:

DataStoreError – If an error occurs.

class xcube.core.store.DataStoreError(message: str)[source]

Raised on error in any of the data store, opener, or writer methods.

Parameters:

message – The error message.

class xcube.core.store.DataDescriptor(data_id: str, data_type: str | None | type | DataType, *, crs: str = None, bbox: Tuple[float, float, float, float] = None, time_range: Tuple[str | None, str | None] = None, time_period: str = None, open_params_schema: JsonObjectSchema = None, **additional_properties)[source]

A generic descriptor for any data. Also serves as a base class for more specific data descriptors.

Parameters:
  • data_id – An identifier for the data

  • data_type – A type specifier for the data

  • crs – A coordinate reference system identifier, as an EPSG, PROJ or WKT string

  • bbox – A bounding box of the data

  • time_range – Start and end time delimiting this data’s temporal extent

  • time_period – The data’s periodicity if it is evenly temporally resolved.

  • open_params_schema – A JSON schema describing the parameters that may be used to open this data.

classmethod get_schema() JsonObjectSchema[source]

Get JSON object schema.

class xcube.core.store.DatasetDescriptor(data_id: str, *, data_type: str | None | type | DataType = 'dataset', crs: str = None, bbox: Tuple[float, float, float, float] = None, time_range: Tuple[str | None, str | None] = None, time_period: str = None, spatial_res: float = None, dims: Mapping[str, int] = None, coords: Mapping[str, VariableDescriptor] = None, data_vars: Mapping[str, VariableDescriptor] = None, attrs: Mapping[Hashable, any] = None, open_params_schema: JsonObjectSchema = None, **additional_properties)[source]

A descriptor for a gridded, N-dimensional dataset represented by xarray.Dataset. Comprises a description of the data variables contained in the dataset.

Regarding the time_range and time_period parameters, please refer to https://github.com/dcs4cop/xcube/blob/main/docs/source/storeconv.md#date-time-and-duration-specifications

Parameters:
  • data_id – An identifier for the data

  • data_type – The data type of the data described

  • crs – A coordinate reference system identifier, as an EPSG, PROJ or WKT string

  • bbox – A bounding box of the data

  • time_range – Start and end time delimiting this data’s temporal extent

  • time_period – The data’s periodicity if it is evenly temporally resolved

  • spatial_res – The spatial extent of a pixel in crs units

  • dims – A mapping of the dataset’s dimensions to their sizes

  • coords – mapping of the dataset’s data coordinate names to instances of VariableDescriptor

  • data_vars – A mapping of the dataset’s variable names to instances of VariableDescriptor

  • attrs – A mapping containing arbitrary attributes of the dataset

  • open_params_schema – A JSON schema describing the parameters that may be used to open this data

classmethod get_schema() JsonObjectSchema[source]

Get JSON object schema.

class xcube.core.store.MultiLevelDatasetDescriptor(data_id: str, num_levels: int, *, data_type: str | None | type | DataType = 'mldataset', **kwargs)[source]

A descriptor for a gridded, N-dimensional, multi-level, multi-resolution dataset represented by xcube.core.mldataset.MultiLevelDataset.

Parameters:
  • data_id – An identifier of the multi-level dataset

  • num_levels – The number of levels of this multi-level dataset

  • data_type – A type specifier for the multi-level dataset

classmethod get_schema() JsonObjectSchema[source]

Get JSON object schema.

class xcube.core.store.VariableDescriptor(name: str, dtype: str, dims: Sequence[str], *, chunks: Sequence[int] = None, attrs: Mapping[Hashable, any] = None, **additional_properties)[source]

A descriptor for a dataset variable represented by an xarray.DataArray instance. Variable descriptors are part of a dataset descriptor for a gridded, N-dimensional dataset represented by xarray.Dataset.

Parameters:
  • name – The variable name

  • dtype – The data type of the variable.

  • dims – A list of the names of the variable’s dimensions.

  • chunks – A list of the chunk sizes of the variable’s dimensions

  • attrs – A mapping containing arbitrary attributes of the variable

property ndim: int

Number of dimensions.

classmethod get_schema() JsonObjectSchema[source]

Get JSON object schema.

class xcube.core.store.GeoDataFrameDescriptor(data_id: str, *, data_type: str | None | type | DataType = 'geodataframe', feature_schema: JsonObjectSchema = None, **kwargs)[source]

A descriptor for a geo-vector dataset represented by a geopandas.GeoDataFrame instance.

Parameters:
  • data_id – An identifier of the geopandas.GeoDataFrame

  • feature_schema – A schema describing the properties of the vector data

  • kwargs – Parameters passed to super DataDescriptor

classmethod get_schema() JsonObjectSchema[source]

Get JSON object schema.

Cube generation

xcube.core.gen.gen.gen_cube(input_paths: Sequence[str] = None, input_processor_name: str = None, input_processor_params: Dict = None, input_reader_name: str = None, input_reader_params: Dict[str, Any] = None, output_region: Tuple[float, float, float, float] = None, output_size: Tuple[int, int] = [512, 512], output_resampling: str = 'Nearest', output_path: str = 'out.zarr', output_writer_name: str = None, output_writer_params: Dict[str, Any] = None, output_metadata: Dict[str, Any] = None, output_variables: List[Tuple[str, Dict[str, Any] | None]] = None, processed_variables: List[Tuple[str, Dict[str, Any] | None]] = None, profile_mode: bool = False, no_sort_mode: bool = False, append_mode: bool = None, dry_run: bool = False, monitor: Callable[[...], None] = None) bool[source]

Generate a xcube dataset from one or more input files.

Parameters:
  • no_sort_mode

  • input_paths – The input paths.

  • input_processor_name – Name of a registered input processor (xcube.core.gen.inputprocessor.InputProcessor) to be used to transform the inputs.

  • input_processor_params – Parameters to be passed to the input processor.

  • input_reader_name – Name of a registered input reader (xcube.core.util.dsio.DatasetIO).

  • input_reader_params – Parameters passed to the input reader.

  • output_region – Output region as tuple of floats: (lon_min, lat_min, lon_max, lat_max).

  • output_size – The spatial dimensions of the output as tuple of ints: (width, height).

  • output_resampling – The resampling method for the output.

  • output_path – The output directory.

  • output_writer_name – Name of an output writer (xcube.core.util.dsio.DatasetIO) used to write the cube.

  • output_writer_params – Parameters passed to the output writer.

  • output_metadata – Extra metadata passed to output cube.

  • output_variables – Output variables.

  • processed_variables – Processed variables computed on-the-fly.

  • profile_mode – Whether profiling should be enabled.

  • append_mode – Deprecated. The function will always either insert, replace, or append new time slices.

  • dry_run – Doesn’t write any data. For testing.

  • monitor – A progress monitor.

Returns:

True for success.

xcube.core.new.new_cube(title='Test Cube', width=360, height=180, x_name='lon', y_name='lat', x_dtype='float64', y_dtype=None, x_units='degrees_east', y_units='degrees_north', x_res=1.0, y_res=None, x_start=-180.0, y_start=-90.0, inverse_y=False, time_name='time', time_dtype='datetime64[s]', time_units='seconds since 1970-01-01T00:00:00', time_calendar='proleptic_gregorian', time_periods=5, time_freq='D', time_start='2010-01-01T00:00:00', use_cftime=False, drop_bounds=False, variables=None, crs=None, crs_name=None, time_encoding_dtype='int64')[source]

Create a new empty cube. Useful for creating cube templates with predefined coordinate variables and metadata. The function is also heavily used by xcube’s unit tests.

The values of the variables dictionary can be either constants, array-like objects, or functions that compute their return value from passed coordinate indexes. The expected signature is:

    def my_func(time: int, y: int, x: int) -> Union[bool, int, float]

Parameters:
  • title – A title. Defaults to ‘Test Cube’.

  • width – Horizontal number of grid cells. Defaults to 360.

  • height – Vertical number of grid cells. Defaults to 180.

  • x_name – Name of the x coordinate variable. Defaults to ‘lon’.

  • y_name – Name of the y coordinate variable. Defaults to ‘lat’.

  • x_dtype – Data type of x coordinates. Defaults to ‘float64’.

  • y_dtype – Data type of y coordinates. Defaults to ‘float64’.

  • x_units – Units of the x coordinates. Defaults to ‘degrees_east’.

  • y_units – Units of the y coordinates. Defaults to ‘degrees_north’.

  • x_start – Minimum x value. Defaults to -180.

  • y_start – Minimum y value. Defaults to -90.

  • x_res – Spatial resolution in x-direction. Defaults to 1.0.

  • y_res – Spatial resolution in y-direction. Defaults to 1.0.

  • inverse_y – Whether to create an inverse y axis. Defaults to False.

  • time_name – Name of the time coordinate variable. Defaults to ‘time’.

  • time_periods – Number of time steps. Defaults to 5.

  • time_freq – Duration of each time step. Defaults to ‘D’.

  • time_start – First time value. Defaults to ‘2010-01-01T00:00:00’.

  • time_dtype – Numpy data type for time coordinates. Defaults to ‘datetime64[s]’. If used, parameter ‘use_cftime’ must be False.

  • time_units – Units for time coordinates. Defaults to ‘seconds since 1970-01-01T00:00:00’.

  • time_calendar – Calendar for time coordinates. Defaults to ‘proleptic_gregorian’.

  • use_cftime – If True, the time will be given as data types according to the ‘cftime’ package. If used, the time_calendar parameter must be also be given with an appropriate value such as ‘gregorian’ or ‘julian’. If used, parameter ‘time_dtype’ must be None.

  • drop_bounds – If True, coordinate bounds variables are not created. Defaults to False.

  • variables – Dictionary of data variables to be added. None by default.

  • crs – pyproj-compatible CRS string or instance of pyproj.CRS or None

  • crs_name – Name of the variable that will hold the CRS information. Ignored, if crs is not given.

  • time_encoding_dtype – data type used to encode the time variable when serializing the dataset

Returns:

A cube instance
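
A minimal sketch creating a small cube with one constant variable and one variable computed from the passed coordinate indexes:

    from xcube.core.new import new_cube

    def chl(time: int, y: int, x: int) -> float:
        # Compute a value from the coordinate indexes.
        return 0.1 * time + 0.01 * (y + x)

    cube = new_cube(
        width=90, height=45, x_res=4.0,  # 4-degree global grid
        time_periods=3,
        variables=dict(
            sst=280.0,  # constant value
            chl=chl,    # computed per (time, y, x) index
        ),
    )
    print(cube)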

Cube computation

xcube.core.compute.compute_cube(cube_func: ~typing.Callable[[...], ~xarray.core.dataarray.DataArray | ~numpy.ndarray | ~typing.Sequence[~xarray.core.dataarray.DataArray | ~numpy.ndarray]], *input_cubes: ~xarray.core.dataset.Dataset, input_cube_schema: ~xcube.core.schema.CubeSchema = None, input_var_names: ~typing.Sequence[str] = None, input_params: ~typing.Dict[str, ~typing.Any] = None, output_var_name: str = 'output', output_var_dtype: ~typing.Any = <class 'numpy.float64'>, output_var_attrs: ~typing.Dict[str, ~typing.Any] = None, vectorize: bool = None, cube_asserted: bool = False) Dataset[source]

Compute a new output data cube with a single variable named output_var_name from variables named input_var_names contained in zero, one, or more input data cubes in input_cubes using a cube factory function cube_func.

For a more detailed description of the function usage, please refer to compute_dataset().

Parameters:
  • cube_func – The cube factory function.

  • input_cubes – An optional sequence of input cube datasets, must be provided if input_cube_schema is not.

  • input_cube_schema – An optional input cube schema, must be provided if input_cubes is not. Will be ignored if input_cubes is provided.

  • input_var_names – A sequence of variable names

  • input_params – Optional dictionary with processing parameters passed to cube_func.

  • output_var_name – Optional name of the output variable, defaults to 'output'.

  • output_var_dtype – Optional numpy data type of the output variable, defaults to numpy.float64.

  • output_var_attrs – Optional metadata attributes for the output variable.

  • vectorize – Whether all input_cubes have the same variables which are concatenated and passed as vectors to cube_func. Not implemented yet.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns:

A new dataset that contains the computed output variable.
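
A sketch deriving a new variable from two input variables of a test cube; it assumes cube_func is called with array blocks of the variables named in input_var_names:

    import numpy as np

    from xcube.core.compute import compute_cube
    from xcube.core.new import new_cube

    cube = new_cube(variables=dict(a=1.5, b=2.5))

    def cube_func(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Must return an array of the same shape as the inputs.
        return a + b

    result = compute_cube(
        cube_func,
        cube,
        input_var_names=["a", "b"],
        output_var_name="a_plus_b",
    )
    print(result.a_plus_b)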

xcube.core.evaluate.evaluate_dataset(dataset: Dataset, processed_variables: List[Tuple[str, Dict[str, Any] | None]] = None, errors: str = 'raise') Dataset[source]

Compute new variables or mask existing variables in dataset by the evaluation of Python expressions that may refer to other existing or new variables. Returns a new dataset that contains the old and new variables, where both may now be masked.

Expressions may be given by attributes of existing variables in dataset or passed via the processed_variables argument, which is a sequence of variable name / attributes tuples.

Two types of expression attributes are recognized:

  1. The attribute expression generates a new variable computed from its attribute value.

  2. The attribute valid_pixel_expression masks out invalid variable values.

In both cases the attribute value must be a string forming a valid Python expression that can reference any other preceding variables by name. The expression can also reference any flags defined by another variable according to their CF attributes flag_meaning and flag_values.

Invalid variable values may be masked out using the valid_pixel_expression attribute, whose value should form a Boolean Python expression. Where the expression returns zero or false, the value of the _FillValue attribute or NaN will be used in the new variable.

Other attributes will be stored as variable metadata as-is.

Parameters:
  • dataset – A dataset.

  • processed_variables – Optional list of variable name / attributes pairs that will be processed in the given order.

  • errors – How to deal with errors while evaluating expressions. May be one of “raise”, “warn”, or “ignore”.

Returns:

A new dataset with the computed variables.
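
A sketch computing a new variable from an expression and masking it; the variable names and threshold are illustrative:

    from xcube.core.evaluate import evaluate_dataset
    from xcube.core.new import new_cube

    cube = new_cube(variables=dict(a=2.0, b=3.0))

    result = evaluate_dataset(
        cube,
        processed_variables=[
            ("c", dict(expression="a + b")),
            ("c_masked", dict(expression="c",
                              valid_pixel_expression="c > 4.0")),
        ],
    )
    print(list(result.data_vars))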

Cube data extraction

xcube.core.extract.get_cube_values_for_points(cube: Dataset, points: Dataset | DataFrame | Mapping[str, ndarray | Array | DataArray | Series | Sequence[int | float]], var_names: Sequence[str] = None, include_coords: bool = False, include_bounds: bool = False, include_indexes: bool = False, index_name_pattern: str = '{name}_index', include_refs: bool = False, ref_name_pattern: str = '{name}_ref', method: str = 'nearest', cube_asserted: bool = False) Dataset[source]

Extract values from cube variables at given coordinates in points.

Returns a new dataset with values of variables from cube selected by the coordinate columns provided in points. All variables will be 1-D and have the same order as the rows in points.

Parameters:
  • cube – The cube dataset.

  • points – Dictionary that maps dimension name to coordinate arrays.

  • var_names – An optional list of names of data variables in cube whose values shall be extracted.

  • include_coords – Whether to include the cube coordinates for each point in return value.

  • include_bounds – Whether to include the cube coordinate boundaries (if any) for each point in return value.

  • include_indexes – Whether to include computed indexes into the cube for each point in return value.

  • index_name_pattern – A naming pattern for the computed index columns. Must include “{name}” which will be replaced by the index’s dimension name.

  • include_refs – Whether to include point (reference) values from points in return value.

  • ref_name_pattern – A naming pattern for the computed point data columns. Must include “{name}” which will be replaced by the point’s attribute name.

  • method – “nearest” or “linear”.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns:

A new data frame whose columns are values from cube variables at given points.
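
A sketch extracting values of a test cube variable at two points; the point coordinates are illustrative:

    import pandas as pd

    from xcube.core.extract import get_cube_values_for_points
    from xcube.core.new import new_cube

    cube = new_cube(variables=dict(temperature=290.0))

    points = pd.DataFrame(dict(
        time=pd.to_datetime(["2010-01-02", "2010-01-04"]),
        lat=[10.5, -20.25],
        lon=[30.0, 60.75],
    ))

    values = get_cube_values_for_points(cube, points, var_names=["temperature"])
    print(values.temperature.values)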

xcube.core.extract.get_cube_point_indexes(cube: ~xarray.core.dataset.Dataset, points: ~xarray.core.dataset.Dataset | ~pandas.core.frame.DataFrame | ~typing.Mapping[str, ~numpy.ndarray | ~dask.array.core.Array | ~xarray.core.dataarray.DataArray | ~pandas.core.series.Series | ~typing.Sequence[int | float]], dim_name_mapping: ~typing.Mapping[str, str] = None, index_name_pattern: str = '{name}_index', index_dtype=<class 'numpy.float64'>, cube_asserted: bool = False) Dataset[source]

Get indexes of given point coordinates points into the given dataset.

Parameters:
  • cube – The cube dataset.

  • points – A mapping from column names to column data arrays, which must all have the same length.

  • dim_name_mapping – A mapping from dimension names in cube to column names in points.

  • index_name_pattern – A naming pattern for the computed indexes columns. Must include “{name}” which will be replaced by the dimension name.

  • index_dtype – Numpy data type for the indexes. If it is a floating point type (default), then indexes will contain fractions, which may be used for interpolation. For out-of-range coordinates in points, indexes will be -1 if index_dtype is an integer type, and NaN if index_dtype is a floating point type.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns:

A dataset containing the index columns.

xcube.core.extract.get_cube_values_for_indexes(cube: Dataset, indexes: Dataset | DataFrame | Mapping[str, Any], include_coords: bool = False, include_bounds: bool = False, data_var_names: Sequence[str] = None, index_name_pattern: str = '{name}_index', method: str = 'nearest', cube_asserted: bool = False) Dataset[source]

Get values from the cube at given indexes.

Parameters:
  • cube – A cube dataset.

  • indexes – A mapping from column names to index and fraction arrays for all cube dimensions.

  • include_coords – Whether to include the cube coordinates for each point in return value.

  • include_bounds – Whether to include the cube coordinate boundaries (if any) for each point in return value.

  • data_var_names – An optional list of names of data variables in cube whose values shall be extracted.

  • index_name_pattern – A naming pattern for the computed indexes columns. Must include “{name}” which will be replaced by the dimension name.

  • method – “nearest” or “linear”.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns:

A new data frame whose columns are values from cube variables at given indexes.

xcube.core.extract.get_dataset_indexes(dataset: ~xarray.core.dataset.Dataset, coord_var_name: str, coord_values: ~numpy.ndarray | ~dask.array.core.Array | ~xarray.core.dataarray.DataArray | ~pandas.core.series.Series | ~typing.Sequence[int | float], index_dtype=<class 'numpy.float64'>) DataArray | ndarray[source]

Compute the indexes and their fractions into a coordinate variable coord_var_name of a dataset for the given coordinate values coord_values.

The coordinate variable’s labels must be monotonic increasing or decreasing, otherwise the result will be nonsense.

For any value in coord_values that is out of the bounds of the coordinate variable’s values, the index depends on the value of index_dtype. If index_dtype is an integer type, invalid indexes are encoded as -1 while for floating point types, NaN will be used.

Returns a tuple of indexes as int64 array and fractions as float64 array.

Parameters:
  • dataset – A cube dataset.

  • coord_var_name – Name of a coordinate variable.

  • coord_values – Array-like coordinate values.

  • index_dtype – Numpy data type for the indexes. If it is a floating point type (default), then indexes contain fractions, which may be used for interpolation. For out-of-range coordinates, indexes will be -1 if index_dtype is an integer type, and NaN if it is a floating point type.

Returns:

The indexes and their fractions as a tuple of numpy int64 and float64 arrays.

xcube.core.timeseries.get_time_series(cube: Dataset, grid_mapping: GridMapping | None = None, geometry: BaseGeometry | Dict[str, Any] | str | Sequence[float | int] | None = None, var_names: Sequence[str] | None = None, start_date: datetime64 | str | None = None, end_date: datetime64 | str | None = None, agg_methods: str | Sequence[str] | AbstractSet[str] = 'mean', use_groupby: bool = False, cube_asserted: bool | None = None) Dataset | None[source]

Get a time series dataset from a data cube.

geometry may be provided as a (shapely) geometry object, a valid GeoJSON object, a valid WKT string, a sequence of box coordinates (x1, y1, x2, y2), or point coordinates (x, y). If geometry covers an area, i.e. is not a point, the function aggregates the variables to compute a mean value and if desired, the number of valid observations and the standard deviation.

start_date and end_date may be provided as a numpy.datetime64 or an ISO datetime string.

Returns a time-series dataset whose data variables have a time dimension but no longer have spatial dimensions, hence the resulting dataset’s variables will only have N-2 dimensions. A global attribute max_number_of_observations will be set to the maximum number of observations that could have been made in each time step. If the given geometry does not overlap the cube’s boundaries, or if no output variables remain, the function returns None.

Parameters:
  • cube – The xcube dataset

  • grid_mapping – Grid mapping of cube.

  • geometry – Optional geometry

  • var_names – Optional sequence of names of variables to be included.

  • start_date – Optional start date.

  • end_date – Optional end date.

  • agg_methods – Aggregation methods. May be single string or sequence of strings. Possible values are ‘mean’, ‘median’, ‘min’, ‘max’, ‘std’, ‘count’. Defaults to ‘mean’. Ignored if geometry is a point.

  • use_groupby – Use group-by operation. May increase or decrease runtime performance and/or memory consumption.

  • cube_asserted – Deprecated and ignored since xcube 0.11.0. No replacement.
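
A sketch computing mean and standard deviation time series over a bounding box of a test cube; the box coordinates are illustrative:

    from xcube.core.new import new_cube
    from xcube.core.timeseries import get_time_series

    cube = new_cube(variables=dict(chl=0.8))

    # Aggregate over a lon/lat box given as (x1, y1, x2, y2).
    ts = get_time_series(
        cube,
        geometry=(0.0, 40.0, 10.0, 50.0),
        var_names=["chl"],
        agg_methods=["mean", "std"],
    )
    print(ts)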

Cube Resampling

xcube.core.resampling.affine_transform_dataset(dataset: Dataset, source_gm: GridMapping, target_gm: GridMapping, var_configs: Mapping[Hashable, Mapping[str, Any]] = None, encode_cf: bool = True, gm_name: str | None = None, reuse_coords: bool = False) Dataset[source]

Resample dataset according to an affine transformation.

The affine transformation will be applied only if the CRS of source_gm and the CRS of target_gm are both geographic or equal. Otherwise, a ValueError will be raised.

Parameters:
  • dataset – The source dataset

  • source_gm – Source grid mapping of dataset. Must be regular. Must have same CRS as target_gm.

  • target_gm – Target grid mapping. Must be regular. Must have same CRS as source_gm.

  • var_configs – Optional resampling configurations for individual variables.

  • encode_cf – Whether to encode the target grid mapping into the resampled dataset in a CF-compliant way. Defaults to True.

  • gm_name – Name for the grid mapping variable. Defaults to “crs”. Used only if encode_cf is True.

  • reuse_coords – Whether to reuse target coordinate arrays from target_gm or to compute new ones.

Returns:

The resampled target dataset.

xcube.core.resampling.resample_ndimage(image: ~numpy.ndarray | ~dask.array.core.Array, scale: float | ~typing.Tuple[float, float] = 1, offset: float | ~typing.Tuple[float, float] = None, shape: int | ~typing.Tuple[int, int] = None, chunks: ~typing.Sequence[int] = None, spline_order: int = 1, aggregator: ~typing.Callable[[~numpy.ndarray | ~dask.array.core.Array], ~numpy.ndarray | ~dask.array.core.Array] | None = <function nanmean>, recover_nan: bool = False) Array[source]
xcube.core.resampling.encode_grid_mapping(ds: Dataset, gm: GridMapping, gm_name: str | None = None, force: bool | None = None) Dataset[source]

Encode the given grid mapping gm into a copy of ds in a CF-compliant way and return the dataset copy. The function removes any existing grid mappings.

If the CRS of gm is geographic and the spatial dimension and coordinate names are “lat”, “lon” and force is False, or force is None and no former grid mapping was encoded in ds, then nothing else is done and the dataset copy is returned without further action.

Otherwise, for every spatial data variable with dims=(…, y, x), the function sets the attribute “grid_mapping” to gm_name. The grid mapping CRS is encoded in a new 0-D variable named gm_name.

Parameters:
  • ds – The dataset.

  • gm – The dataset’s grid mapping.

  • gm_name – Name for the grid mapping variable. Defaults to “crs”.

  • force – Whether to force encoding of grid mapping even if CRS is geographic and spatial dimension names are “lon”, “lat”. Optional value, if not provided, force will be assumed True if a former grid mapping was encoded in ds.

Returns:

A copy of ds with gm encoded into it.

xcube.core.resampling.rectify_dataset(source_ds: Dataset, *, var_names: str | Sequence[str] = None, source_gm: GridMapping = None, target_gm: GridMapping = None, encode_cf: bool = True, gm_name: str | None = None, tile_size: int | Tuple[int, int] = None, is_j_axis_up: bool = None, output_ij_names: Tuple[str, str] = None, compute_subset: bool = True, uv_delta: float = 0.001, interpolation: str | None = None, xy_var_names: Tuple[str, str] = None) Dataset | None[source]

Reproject dataset source_ds using its per-pixel x,y coordinates or the given source_gm.

The function expects source_ds or the given source_gm to have either one- or two-dimensional coordinate variables that provide spatial x,y coordinates for every data variable with the same spatial dimensions.

For example, a dataset may comprise variables with spatial dimensions var(..., y_dim, x_dim); the function then expects the coordinates to be provided in one of two forms:

  1. One-dimensional x_var(x_dim) and y_var(y_dim) (coordinate) variables.

  2. Two-dimensional x_var(y_dim, x_dim) and y_var(y_dim, x_dim) (coordinate) variables.

If target_gm is given and it defines a tile size or tile_size is given, and the number of tiles is greater than one in the output’s x- or y-direction, then the returned dataset will be composed of lazy, chunked dask arrays. Otherwise, the returned dataset will be composed of ordinary numpy arrays.

Parameters:
  • source_ds – Source dataset.

  • var_names – Optional variable name or sequence of variable names.

  • source_gm – Source dataset grid mapping.

  • target_gm – Optional target geometry. If not given, output geometry will be computed to spatially fit dataset and to retain its spatial resolution.

  • encode_cf – Whether to encode the target grid mapping into the resampled dataset in a CF-compliant way. Defaults to True.

  • gm_name – Name for the grid mapping variable. Defaults to “crs”. Used only if encode_cf is True.

  • tile_size – Optional tile size for the output.

  • is_j_axis_up – Whether y coordinates are increasing with positive image j axis.

  • output_ij_names – If given, a tuple of variable names in which to store the computed source pixel coordinates in the returned output.

  • compute_subset – Whether to compute a spatial subset from dataset using target_gm. If set, the function may return None in case there is no overlap.

  • uv_delta – A normalized value that is used to determine whether x,y coordinates in the output are contained in the triangles defined by the input x,y coordinates. The higher this value, the more inaccurate the rectification will be.

  • interpolation – Interpolation method for computing output pixels. If given, must be “nearest”, “triangular”, or “bilinear”. The default is “nearest”. The “triangular” interpolation is performed between 3 and “bilinear” between 4 adjacent source pixels. Both are applied only to variables of floating point type. If you need to interpolate between integer data you should cast it to float first.

  • xy_var_names – Deprecated. No longer used since 1.0.0, no replacement.

Returns:

a reprojected dataset, or None if the requested output does not intersect with dataset.

For implementation details refer to Spatial Rectification Algorithm.

xcube.core.resampling.resample_in_space(dataset: Dataset, source_gm: GridMapping = None, target_gm: GridMapping = None, var_configs: Mapping[Hashable, Mapping[str, Any]] = None, encode_cf: bool = True, gm_name: str | None = None, rectify_kwargs: dict | None = None)[source]

Resample a dataset in the spatial dimensions.

If the source grid mapping source_gm is not given, it is derived from dataset: source_gm = GridMapping.from_dataset(dataset).

If the target grid mapping target_gm is not given, it is derived from source_gm: target_gm = source_gm.to_regular().

If source_gm is almost equal to target_gm, this function is a no-op and dataset is returned unchanged.

Otherwise, the function computes a spatially resampled version of dataset and returns it.

Using var_configs, the resampling of individual variables can be configured. If given, var_configs must be a mapping from variable names to configuration dictionaries which can have the following properties:

  • spline_order (int) - The order of spline polynomials used for interpolating. It is used for upsampling only. Possible values are 0 to 5. Default is 1 (bi-linear) for floating point variables, and 0 (= nearest neighbor) for integer and bool variables.

  • aggregator (str) - An optional aggregating function. It is used for downsampling only. Examples are numpy.nanmean, numpy.nanmin, numpy.nanmax. Default is numpy.nanmean for floating point variables, and None (= nearest neighbor) for integer and bool variables.

  • recover_nan (bool) - Whether a special algorithm shall be used that is able to recover values that would otherwise yield NaN during resampling. Default is True for floating point variables, and False for integer and bool variables.

Note that var_configs is only used if the resampling involves an affine transformation. This is true if the CRS of source_gm and CRS of target_gm are equal and one of two cases is given:

  1. source_gm is regular. In this case the resampling is the affine transformation and the result is returned directly.

  2. source_gm is not regular and has a lower resolution than target_gm. In this case dataset is downsampled first using an affine transformation. Then the result is rectified.

In all other cases, no affine transformation is applied and the resampling is a direct rectification.

Parameters:
  • dataset – The source dataset.

  • source_gm – The source grid mapping.

  • target_gm – The target grid mapping. Must be regular.

  • var_configs – Optional resampling configurations for individual variables.

  • encode_cf – Whether to encode the target grid mapping into the resampled dataset in a CF-compliant way. Defaults to True.

  • gm_name – Name for the grid mapping variable. Defaults to “crs”. Used only if encode_cf is True.

Returns:

The spatially resampled dataset.
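
A sketch downsampling a 1-degree test cube to a 2-degree regular grid; the GridMapping import path, the GridMapping.regular() keyword names, and the per-variable configuration are assumptions for illustration:

    from xcube.core.gridmapping import GridMapping
    from xcube.core.new import new_cube
    from xcube.core.resampling import resample_in_space

    cube = new_cube(variables=dict(sst=280.0))       # 1-degree global grid
    source_gm = GridMapping.from_dataset(cube)

    # Target: 2-degree global grid in the same (geographic) CRS.
    target_gm = GridMapping.regular(
        size=(180, 90), xy_min=(-180.0, -90.0), xy_res=2.0, crs="EPSG:4326"
    )

    resampled = resample_in_space(
        cube,
        source_gm=source_gm,
        target_gm=target_gm,
        var_configs={"sst": dict(recover_nan=True)},  # hypothetical per-variable config
    )
    print(resampled.sst.shape)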

xcube.core.resampling.resample_in_time(dataset: Dataset, frequency: str, method: str | Sequence[str], offset=None, base=None, tolerance=None, interp_kind=None, time_chunk_size=None, var_names: Sequence[str] = None, metadata: Dict[str, Any] = None, cube_asserted: bool = False) Dataset[source]

Resample a dataset in the time dimension.

The argument method may be one or a sequence of 'all', 'any', 'argmax', 'argmin', 'count', 'first', 'last', 'max', 'min', 'mean', 'median', 'percentile_<p>', 'std', 'sum', 'var'.

The value 'percentile_<p>' is a placeholder, where '<p>' must be replaced by an integer percentage value, e.g. 'percentile_90' is the 90%-percentile.

Important note: As of xarray 0.14 and dask 2.8, the methods 'median' and 'percentile_<p>' cannot be used if the variables in cube comprise chunked dask arrays. In this case, use the compute() or load() method to convert dask arrays into numpy arrays.

Parameters:
  • dataset – The xcube dataset.

  • frequency – Temporal aggregation frequency. Use format “<count><offset>” where <offset> is one of ‘H’, ‘D’, ‘W’, ‘M’, ‘Q’, ‘Y’.

  • method – Resampling method or sequence of resampling methods.

  • offset – Offset used to adjust the resampled time labels. Uses same syntax as frequency.

  • base – Deprecated since xcube 1.0.4. No longer used as of pandas 2.0.

  • time_chunk_size – If not None, the chunk size to be used for the “time” dimension.

  • var_names – Variable names to include.

  • tolerance – Time tolerance for selective upsampling methods. Defaults to frequency.

  • interp_kind – Kind of interpolation if method is ‘interpolation’.

  • metadata – Output metadata.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns:

A new xcube dataset resampled in time.
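
A short usage sketch, assuming dataset is a valid xcube dataset; the variable names "chl" and "tsm" are placeholders:

    from xcube.core.resampling import resample_in_time

    # Aggregate to monthly means and maxima of the selected variables.
    monthly = resample_in_time(
        dataset,
        frequency="1M",
        method=["mean", "max"],
        var_names=["chl", "tsm"],
    )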

Cube Manipulation

xcube.core.vars2dim.vars_to_dim(cube: Dataset, dim_name: str = 'var', var_name='data', cube_asserted: bool = False)[source]

Convert data variables into a dimension.

Parameters:
  • cube – The xcube dataset.

  • dim_name – The name of the new dimension and coordinate variable. Defaults to ‘var’.

  • var_name – The name of the new, single data variable. Defaults to ‘data’.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns:

A new xcube dataset with data variables turned into a new dimension.

xcube.core.chunk.chunk_dataset(dataset: Dataset, chunk_sizes: Dict[str, int] = None, format_name: str = None, data_vars_only: bool = False) Dataset[source]

Chunk dataset using chunk_sizes and optionally update encodings for given format_name.

Parameters:
  • dataset – input dataset

  • chunk_sizes – mapping from dimension name to new chunk size

  • format_name – optional format, e.g. “zarr” or “netcdf4”

  • data_vars_only – only chunk data variables, not coordinates

Returns:

the (re)chunked dataset
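
A minimal sketch, assuming dataset is an xarray.Dataset with dimensions time, lat, and lon:

    from xcube.core.chunk import chunk_dataset

    # Re-chunk and update the encodings so the chunking is honoured when
    # the dataset is written in Zarr format.
    rechunked = chunk_dataset(
        dataset,
        chunk_sizes={"time": 1, "lat": 512, "lon": 512},
        format_name="zarr",
    )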

xcube.core.unchunk.unchunk_dataset(dataset_path: str, var_names: Sequence[str] = None, coords_only: bool = False)[source]

Unchunk dataset variables in-place.

Parameters:
  • dataset_path – Path to ZARR dataset directory.

  • var_names – Optional list of variable names.

  • coords_only – Un-chunk coordinate variables only.

xcube.core.optimize.optimize_dataset(input_path: str, output_path: str = None, in_place: bool = False, unchunk_coords: bool | str | ~typing.Sequence[str] = False, exception_type: ~typing.Type[Exception] = <class 'ValueError'>)[source]

Optimize a dataset for faster access.

Reduces the number of metadata and coordinate data files in the xcube dataset given by dataset_path. Consolidated cubes open much faster from remote locations, e.g. in object storage, because far fewer HTTP requests are required to fetch the initial cube metadata. That is, it merges all metadata files into a single top-level JSON file “.zmetadata”.

If unchunk_coords is given, it also removes any chunking of coordinate variables so they comprise a single binary data file instead of one file per data chunk. The primary usage of this function is to optimize data cubes for cloud object storage. The function currently works only for data cubes using Zarr format. unchunk_coords can be a name, or a list of names, of the coordinate variable(s) to be consolidated. If boolean True is used, all coordinate variables will be consolidated.

Parameters:
  • input_path – Path to input dataset with ZARR format.

  • output_path – Path to output dataset with ZARR format. May contain “{input}” template string, which is replaced by the input path’s file name without file name extension.

  • in_place – Whether to modify the dataset in place. If False, a copy is made and output_path must be given.

  • unchunk_coords – The name of a coordinate variable or a list of coordinate variables whose chunks should be consolidated. Pass True to consolidate chunks of all coordinate variables.

  • exception_type – Type of exception to be used on value errors.
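
A minimal sketch; the path "my_cube.zarr" is a hypothetical Zarr dataset:

    from xcube.core.optimize import optimize_dataset

    # Consolidate metadata into ".zmetadata" and un-chunk all coordinate
    # variables; "{input}" is replaced by the input file name without extension.
    optimize_dataset(
        "my_cube.zarr",
        output_path="{input}-optimized.zarr",
        unchunk_coords=True,
    )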

Cube Subsetting

xcube.core.select.select_variables_subset(dataset: Dataset, var_names: Collection[str] | None = None) Dataset[source]

Select data variables from the given dataset and create a new dataset.

Parameters:
  • dataset – The dataset from which to select variables.

  • var_names – The names of data variables to select.

Returns:

A new dataset. It is empty if var_names is empty. It is the given dataset if var_names is None.

xcube.core.geom.clip_dataset_by_geometry(dataset: Dataset, geometry: BaseGeometry | Dict[str, Any] | str | Sequence[float | int], save_geometry_wkt: str | bool = False) Dataset | None[source]

Spatially clip a dataset according to the bounding box of a given geometry.

Parameters:
  • dataset – The dataset

  • geometry – A geometry-like object, see convert_geometry().

  • save_geometry_wkt – If the value is a string, the effective intersection geometry is stored as a Geometry WKT string in the global attribute named by save_geometry_wkt. If the value is True, the name “geometry_wkt” is used.

Returns:

The spatial subset of the dataset, or None if the bounding box of the dataset has no intersection or a zero-area intersection with the bounding box of the geometry.
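
A minimal sketch, assuming dataset uses geographic coordinates:

    from xcube.core.geom import clip_dataset_by_geometry

    # Clip to a bounding box given as a sequence of four numbers x1, y1, x2, y2.
    subset = clip_dataset_by_geometry(dataset, (0.0, 50.0, 5.0, 52.5))
    if subset is None:
        print("geometry does not intersect the dataset")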

Cube Masking

xcube.core.geom.mask_dataset_by_geometry(dataset: Dataset, geometry: BaseGeometry | Dict[str, Any] | str | Sequence[float | int], tile_size: int | Tuple[int, int] = None, excluded_vars: Sequence[str] = None, no_clip: bool = False, all_touched: bool = False, save_geometry_mask: str | bool = False, save_geometry_wkt: str | bool = False) Dataset | None[source]

Mask a dataset according to the given geometry. The cells of variables of the returned dataset will have NaN-values where their spatial coordinates are not intersecting the given geometry.

Parameters:
  • dataset – The dataset

  • geometry – A geometry-like object, see convert_geometry().

  • tile_size – If given, the unconditional spatial chunk sizes in x- and y-direction in pixels. May be given as integer scalar or x,y-pair of integers.

  • excluded_vars – Optional sequence of names of data variables that should not be masked (but still may be clipped).

  • no_clip – If True, the function will not clip the dataset before masking, that is, the returned dataset will have the same dimension sizes as the given dataset.

  • all_touched – If True, all pixels intersected by geometry outlines will be included in the mask. If False, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be included in the mask. The default value is set to False.

  • save_geometry_mask – If the value is a string, the effective geometry mask array is stored as a 2D data variable named by save_geometry_mask. If the value is True, the name “geometry_mask” is used.

  • save_geometry_wkt – If the value is a string, the effective intersection geometry is stored as a Geometry WKT string in the global attribute named by save_geometry_wkt. If the value is True, the name “geometry_wkt” is used.

Returns:

The spatial subset of the dataset, or None if the bounding box of the dataset has no intersection or a zero-area intersection with the bounding box of the geometry.
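
A minimal sketch using a WKT polygon; the geometry and the chosen attribute names are placeholders:

    from xcube.core.geom import mask_dataset_by_geometry

    masked = mask_dataset_by_geometry(
        dataset,
        "POLYGON ((0 50, 5 50, 5 52.5, 0 52.5, 0 50))",
        all_touched=True,
        save_geometry_mask="geometry_mask",  # keep the mask as a 2D variable
        save_geometry_wkt=True,              # store the WKT in attribute "geometry_wkt"
    )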

class xcube.core.maskset.MaskSet(flag_var: DataArray)[source]

A set of mask variables derived from a variable flag_var with the following CF attributes:

  • One or both of flag_masks and flag_values

  • flag_meanings (always required)

See https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#flags for details on the use of these attributes.

Each mask is represented by an xarray.DataArray, has the name of the flag, is of type numpy.uint8, and has the dimensions of the given flag_var.

Parameters:

flag_var – an xarray.DataArray that defines flag values. The CF attributes flag_meanings and one or both of flag_masks and flag_values are expected to exist and be valid.

classmethod get_mask_sets(dataset: Dataset) Dict[str, MaskSet][source]

For each “flag” variable in given dataset, turn it into a MaskSet, store it in a dictionary.

Parameters:

dataset – The dataset

Returns:

A mapping of flag names to MaskSet. Will be empty if there are no flag variables in dataset.
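
A minimal sketch, assuming dataset contains a flag variable named "quality_flags" with a flag meaning "cloud"; the individual masks are assumed to be accessible as attributes named after the flags:

    from xcube.core.maskset import MaskSet

    mask_set = MaskSet(dataset["quality_flags"])
    cloud = mask_set.cloud                      # a uint8 DataArray, 1 where the flag is set
    chl_clear = dataset["chl"].where(cloud == 0)

    # Or collect mask sets for all flag variables of a dataset:
    mask_sets = MaskSet.get_mask_sets(dataset)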

Rasterisation of Features

xcube.core.geom.rasterize_features(dataset: Dataset, features: pandas.geodataframe.GeoDataFrame | Sequence[Mapping[str, Any]], feature_props: Sequence[str], var_props: Dict[str, Mapping[str, Mapping[str, Any]]] = None, tile_size: int | Tuple[int, int] = None, all_touched: bool = False, in_place: bool = False) Dataset | None[source]

Rasterize feature properties given by feature_props of vector-data features as new variables into dataset.

dataset must have two spatial 1-D coordinates, either lon and lat in degrees, or projected coordinates x and y, or similar.

feature_props is a sequence of names of feature properties that must exist in each feature of features.

features may be passed as a geopandas.GeoDataFrame or as an iterable of GeoJSON features.

Using the optional var_props, the properties of newly created variables from feature properties can be specified. It is a mapping of feature property names to mappings of variable properties. Here is an example variable properties mapping:

    {
        'name': 'land_class',   # (str) the variable's name;
                                #   default is the feature property name
        'dtype': np.int16,      # (str|np.dtype) the variable's dtype;
                                #   default is np.float64
        'fill_value': -999,     # (bool|int|float|np.ndarray) the variable's fill value;
                                #   default is np.nan
        'attrs': {},            # (Mapping[str, Any]) the variable's attributes;
                                #   default is {}
        'converter': int,       # (Callable[[Any], Any]) a converter function used to
                                #   convert from feature property value to variable value;
                                #   default is float. Deprecated, no longer used.
    }

Note that newly created variables will have data type np.float64 because np.nan is used to encode missing values. fill_value and dtype are used to encode the variables when persisting the data.

Currently, the coordinates of the geometries in the given features must use the same CRS as the given dataset.

Parameters:
  • dataset – The xarray dataset.

  • features – A geopandas.GeoDataFrame instance or a sequence of GeoJSON features.

  • feature_props – Sequence of names of numeric feature properties to be rasterized.

  • var_props – Optional mapping of feature property name to a name or a 5-tuple (name, dtype, fill_value, attributes, converter) for the new variable.

  • tile_size – If given, the unconditional spatial chunk sizes in x- and y-direction in pixels. May be given as integer scalar or x,y-pair of integers.

  • all_touched – If True, all pixels intersected by a feature’s geometry outlines will be included. If False, only pixels whose center is within the feature polygon or that are selected by Bresenham’s line algorithm will be included in the mask. The default is False.

  • in_place – Whether to add new variables to dataset. If False, a copy will be created and returned.

Returns:

dataset with the rasterized feature properties added as new variables
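
A minimal sketch, assuming a hypothetical GeoJSON file land_classes.geojson with a numeric feature property "land_class":

    import geopandas as gpd
    import numpy as np
    from xcube.core.geom import rasterize_features

    features = gpd.read_file("land_classes.geojson")
    dataset = rasterize_features(
        dataset,
        features,
        ["land_class"],
        var_props={
            "land_class": {"name": "land_class", "dtype": np.int16, "fill_value": -999}
        },
    )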

Cube Metadata

xcube.core.update.update_dataset_attrs(dataset: Dataset, global_attrs: Dict[str, Any] = None, update_existing: bool = False, in_place: bool = False) Dataset[source]

Update spatio-temporal CF/THREDDS attributes of the given dataset according to the spatio-temporal coordinate variables time, lat, and lon.

Parameters:
  • dataset – The dataset.

  • global_attrs – Optional global attributes.

  • update_existing – If True, any existing attributes will be updated.

  • in_place – If True, dataset will be modified in place and returned.

Returns:

A new dataset, if in_place is False (default), else the passed and modified dataset.

xcube.core.update.update_dataset_spatial_attrs(dataset: Dataset, update_existing: bool = False, in_place: bool = False) Dataset[source]

Update spatial CF/THREDDS attributes of given dataset.

Parameters:
  • dataset – The dataset.

  • update_existing – If True, any existing attributes will be updated.

  • in_place – If True, dataset will be modified in place and returned.

Returns:

A new dataset, if in_place is False (default), else the passed and modified dataset.

xcube.core.update.update_dataset_temporal_attrs(dataset: Dataset, update_existing: bool = False, in_place: bool = False) Dataset[source]

Update temporal CF/THREDDS attributes of given dataset.

Parameters:
  • dataset – The dataset.

  • update_existing – If True, any existing attributes will be updated.

  • in_place – If True, dataset will be modified in place and returned.

Returns:

A new dataset, if in_place is False (default), else the passed and modified dataset.

Cube verification

xcube.core.verify.assert_cube(dataset: Dataset, name=None) Dataset[source]

Assert that the given dataset is a valid xcube dataset.

Parameters:
  • dataset – The dataset to be validated.

  • name – Optional parameter name.

Raises:

ValueError, if dataset is not a valid xcube dataset

xcube.core.verify.verify_cube(dataset: Dataset) List[str][source]

Verify the given dataset for being a valid xcube dataset.

The function verifies that the dataset

  • defines two spatial x,y coordinate variables that are 1D, non-empty, and use correct units;

  • defines a time coordinate variable that is 1D, non-empty, and uses correct units;

  • has valid bounds variables for the spatial x,y and time coordinate variables, if any;

  • has data variables and that they are valid, e.g. at least 3-D, all having the same dimensions, with at least the dimensions dim(time), dim(y), dim(x) in that order.

Returns a list of issues, which is empty if dataset is a valid xcube dataset.

Parameters:

dataset – A dataset to be verified.

Returns:

List of issues or empty list.

Multi-Resolution Datasets

class xcube.core.mldataset.MultiLevelDataset[source]

A multi-level dataset of decreasing spatial resolutions (a multi-resolution pyramid).

The pyramid level at index zero provides the original spatial dimensions. The size of the spatial dimensions in subsequent levels is computed by the formula size[index + 1] = (size[index] + 1) // 2 with size[index] being the maximum size of the spatial dimensions at level zero.

Any dataset chunks are assumed to be the same in all levels. Usually, the number of chunks is one in one of the spatial dimensions of the highest level.

abstract property ds_id: str

Returns: the dataset identifier.

abstract property grid_mapping: GridMapping

Returns: the CF-conformal grid mapping

abstract property num_levels: int

Returns: the number of pyramid levels.

property resolutions: Sequence[Tuple[float, float]]

Returns: the x,y resolutions for each level given in the spatial units of the dataset’s CRS (i.e. self.grid_mapping.crs).

property avg_resolutions: Sequence[float]

Returns: the average x,y resolutions for each level given in the spatial units of the dataset’s CRS (i.e. self.grid_mapping.crs).

property base_dataset: Dataset

Returns: the base dataset for the lowest level at index 0.

property datasets: Sequence[Dataset]

Get datasets for all levels.

Accessing this property will trigger any lazy dataset instantiation.

Returns:

the datasets for all levels.

abstract get_dataset(index: int) Dataset[source]
Parameters:

index – the level index

Returns:

the dataset for the level at index.

close()[source]

Close all datasets. Default implementation does nothing.

apply(function: Callable[[Dataset, Dict[str, Any]], Dataset], kwargs: Dict[str, Any] = None, ds_id: str = None) MultiLevelDataset[source]

Apply function to all level datasets and return a new multi-level dataset.

derive_tiling_scheme(tiling_scheme: TilingScheme)[source]

Derive a new tiling scheme for the given one with defined minimum and maximum level indices.

get_level_for_resolution(xy_res: float | Tuple[float, float]) int[source]

Get the index of the level that best represents the given resolution.

Parameters:

xy_res – the resolution in x- and y-direction

Returns:

a level ranging from 0 to self.num_levels - 1

class xcube.core.mldataset.BaseMultiLevelDataset(base_dataset: Dataset, grid_mapping: GridMapping | None = None, num_levels: int | None = None, agg_methods: None | str | Mapping[str, None | str] = 'first', ds_id: str | None = None)[source]

A multi-level dataset whose level datasets are created by down-sampling a base dataset.

Parameters:
  • base_dataset – The base dataset for the level at index zero.

  • grid_mapping – Optional grid mapping for base_dataset.

  • num_levels – Optional number of levels.

  • ds_id – Optional dataset identifier.

  • agg_methods – Optional aggregation methods. May be given as string or as mapping from variable name pattern to aggregation method. Valid aggregation methods are None, “first”, “min”, “max”, “mean”, “median”. If None, the default, “first” is used for integer variables and “mean” for floating point variables.
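
A minimal sketch, assuming dataset is a spatial xarray.Dataset; the use of a pattern mapping for agg_methods follows the parameter description above:

    from xcube.core.mldataset import BaseMultiLevelDataset

    # Down-sample integer "*_class" variables with "first", everything else with "mean".
    ml_dataset = BaseMultiLevelDataset(
        dataset,
        agg_methods={"*_class": "first", "*": "mean"},
    )
    print(ml_dataset.num_levels)
    level_1 = ml_dataset.get_dataset(1)   # roughly half the spatial size of level 0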

class xcube.core.mldataset.CombinedMultiLevelDataset(ml_datasets: Sequence[MultiLevelDataset], ds_id: str | None = None, combiner_function: Callable | None = None, combiner_params: Dict[str, Any] | None = None)[source]

A multi-level dataset that is a combination of other multi-level datasets.

Parameters:
  • ml_datasets – The multi-level datasets to be combined. At least two must be provided.

  • ds_id – Optional dataset identifier.

  • combiner_function – An optional function used to combine the datasets, for example xarray.merge. If given, it receives a list of datasets (xarray.Dataset instances) and combiner_params as keyword arguments. If not given or None is passed, a copy of the first dataset is made, which is then subsequently updated by the remaining datasets using xarray.Dataset.update().

  • combiner_params – Parameters to the combiner_function passed as keyword arguments.

close()[source]

Close all datasets. Default implementation does nothing.

class xcube.core.mldataset.ComputedMultiLevelDataset(script_path: str, callable_name: str, input_ml_dataset_ids: ~typing.Sequence[str], input_ml_dataset_getter: ~typing.Callable[[str], ~xcube.core.mldataset.abc.MultiLevelDataset], input_parameters: ~typing.Mapping[str, ~typing.Any] | None = None, ds_id: str = '', exception_type: type = <class 'ValueError'>)[source]

A multi-level dataset whose level datasets are computed by a user function.

The script can import other Python modules located in the same directory as script_path.

class xcube.core.mldataset.FsMultiLevelDataset(path: str, fs: AbstractFileSystem | None = None, fs_root: str | None = None, fs_kwargs: Mapping[str, Any] | None = None, cache_size: int | None = None, consolidate: bool | None = None, **zarr_kwargs)[source]
property size_weights: ndarray

Size weights are used to distribute the cache size over the levels.

class xcube.core.mldataset.IdentityMultiLevelDataset(ml_dataset: MultiLevelDataset, ds_id: str = None)[source]

The identity.

class xcube.core.mldataset.LazyMultiLevelDataset(grid_mapping: GridMapping | None = None, num_levels: int | None = None, ds_id: str | None = None, parameters: Mapping[str, Any] | None = None)[source]

A multi-level dataset where each level dataset is lazily retrieved, i.e. read or computed by the abstract method get_dataset_lazily(index, **kwargs).

Parameters:
  • ds_id – Optional dataset identifier.

  • parameters – Optional keyword arguments that will be passed to the get_dataset_lazily method.

property ds_id: str

Returns: the dataset identifier.

property grid_mapping: GridMapping

Returns: the CF-conformal grid mapping

property num_levels: int

Returns: the number of pyramid levels.

property lock: RLock

Get the reentrant lock used by this object to synchronize lazy instantiation of properties.

get_dataset(index: int) Dataset[source]

Get or compute the dataset for the level at given index.

Parameters:

index – the level index

Returns:

the dataset for the level at index.

set_dataset(index: int, level_dataset: Dataset)[source]

Set the dataset for the level at given index.

Callers need to ensure that the given level_dataset has the correct spatial dimension sizes for the given level at index.

Parameters:
  • index – the level index

  • level_dataset – the dataset for the level at index.

close()[source]

Close all datasets. Default implementation does nothing.

class xcube.core.mldataset.MappedMultiLevelDataset(ml_dataset: MultiLevelDataset, mapper_function: Callable[[Dataset], Dataset], ds_id: str = None, mapper_params: Dict[str, Any] = None)[source]
close()[source]

Close all datasets. Default implementation does nothing.

Zarr Store

class xcube.core.zarrstore.ZarrStoreHolder(dataset: Dataset)[source]

Represents an xarray dataset property zarr_store.

It is used to permanently associate a dataset with its Zarr store, which would otherwise not be possible.

In xcube server, we use the new property to expose datasets via the S3 emulation API.

For that concept to work, datasets must be associated with their Zarr stores explicitly. Therefore, the xcube data store framework sets the Zarr stores of datasets after opening them with xr.open_zarr():

dataset = xr.open_zarr(zarr_store, **open_params)
dataset.zarr_store.set(zarr_store)

Note that the dataset may change after the Zarr store has been set, so that the dataset and its Zarr store are no longer in sync. This may be an issue and limit the applicability of the new property.

Parameters:

dataset – The xarray dataset that is associated with a Zarr store.

get() MutableMapping[source]

Get the Zarr store of a dataset. If no Zarr store has been set, the method will use GenericZarrStore.from_dataset() to create and set one.

Returns:

The Zarr store.

set(zarr_store: MutableMapping) None[source]

Set the Zarr store of a dataset.

Parameters:

zarr_store – The Zarr store.

reset() None[source]

Resets the Zarr store.

class xcube.core.zarrstore.GenericZarrStore(*arrays: GenericArray | Dict[str, Any], attrs: Dict[str, Any] | None = None, array_defaults: GenericArray | Dict[str, Any] | None = None)[source]

A Zarr store that maintains generic arrays in a flat, top-level hierarchy. The root of the store is a Zarr group conforming to the Zarr spec v2.

It is designed to serve as a Zarr store for xarray datasets that compute their data arrays dynamically.

See class GenericArray for specifying the arrays’ properties.

The array data of this store’s arrays are either retrieved from static (numpy) arrays or from a callable that provides the array’s data chunks as bytes or numpy arrays.

Parameters:
  • arrays – Arrays to be added. Typically, these will be instances of GenericArray.

  • attrs – Optional attributes of the top-level group. If given, it must be JSON serializable.

  • array_defaults – Optional array defaults for array properties not passed to add_array. Typically, this will be an instance of GenericArray.

Array

alias of GenericArray

add_array(array: GenericArray | Dict[str, Any] | None = None, **array_kwargs) None[source]

Add a new array to this store.

Parameters:
  • array – Optional array properties. Typically, this will be an instance of GenericArray.

  • array_kwargs – Keyword arguments for the properties of GenericArray.

is_writeable() bool[source]

Return False, because arrays in this store are generative.

listdir(path: str = '') List[str][source]

List a store path.

Parameters:

path – The path.

Returns: List of sorted directory entries.

rmdir(path: str = '') None[source]

The general form removes store paths. This implementation can remove entire arrays only.

Parameters:

path – The array’s name.

rename(src_path: str, dst_path: str) None[source]

The general form renames store paths. This implementation can rename arrays only.

Parameters:
  • src_path – Source array name.

  • dst_path – Target array name.

close() None[source]

Calls the “on_close” handlers, if any, of arrays.

classmethod from_dataset(dataset: Dataset, array_defaults: GenericArray | Dict[str, Any] | None = None) GenericZarrStore[source]

Create a Zarr store for the given dataset. The store’s global attributes are taken from the dataset’s attributes. The following array_defaults properties can be provided (other properties are prescribed by the dataset):

  • fill_value – defaults to None

  • compressor – defaults to None

  • filters – defaults to None

  • order – defaults to “C”

  • chunk_encoding – defaults to “bytes”

Parameters:
  • dataset – The dataset

  • array_defaults – Array default values.

Returns: A new Zarr store instance.

class xcube.core.zarrstore.GenericArray(array: Dict[str, any] | None = None, name: str | None = None, get_data: Callable[[Tuple[int]], bytes | ndarray] | None = None, get_data_params: Dict[str, Any] | None = None, data: ndarray | None = None, dtype: str | dtype | None = None, dims: str | Sequence[str] | None = None, shape: Sequence[int] | None = None, chunks: Sequence[int] | None = None, fill_value: bool | int | float | str | None = None, compressor: Codec | None = None, filters: Sequence[Codec] | None = None, order: str | None = None, attrs: Dict[str, Any] | None = None, on_close: Callable[[Dict[str, Any]], None] | None = None, chunk_encoding: str | None = None, **kwargs)[source]

Represent a generic array in the GenericZarrStore as dictionary of properties.

Although all properties of this class are optional, some of them are mandatory when added to the GenericZarrStore.

When added to the store using GenericZarrStore.add_array(), the array name and dims must always be present. Other mandatory properties depend on the data and get_data properties, which are mutually exclusive:

  • get_data is called for a requested data chunk of an array. It must return a bytes object or a numpy nd-array and is passed the chunk index, the chunk shape, and this array info dictionary. get_data requires the following properties to be present too: name, dims, dtype, shape. chunks is optional and defaults to shape.

  • data must be a bytes object or a numpy nd-array. data requires the following properties to be present too: name, dims. chunks must be same as shape.

The function get_data receives only keyword arguments, which comprise those passed via get_data_params, if any, and two special ones that may occur in the signature of get_data:

  • The keyword argument chunk_info, if given, provides a dictionary that holds information about the current chunk: index: tuple[int, ...] - the chunk’s index; shape: tuple[int, ...] - the chunk’s shape; slices: tuple[slice, ...] - the chunk’s array slices.

  • The keyword argument array_info, if given, provides a dictionary that holds information about the overall array. It contains all array properties passed to the constructor of GenericArray plus ndim: int - the number of dimensions, and num_chunks: tuple[int, ...] - the number of chunks in every dimension.

GenericZarrStore will convert a numpy array returned by get_data or given by data into a bytes object. The chunk data will also be compressed, if a compressor is given. It is important that the array chunks always conform to the chunk layout of the Zarr spec v2; see also https://zarr.readthedocs.io/en/stable/spec/v2.html#chunks

Note that if the value of a named keyword argument is None, it will not be stored.
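
For illustration, a minimal sketch of a generative array served through a GenericZarrStore; all names and values are placeholders, and opening the store with xarray assumes a Zarr v2 stack that accepts mapping stores:

    import numpy as np
    import xarray as xr
    from xcube.core.zarrstore import GenericArray, GenericZarrStore

    def compute_chunk(*, chunk_info, scale=1.0):
        # chunk_info holds "index", "shape", and "slices" for the requested chunk;
        # "scale" is passed in via get_data_params.
        return scale * np.full(chunk_info["shape"],
                               chunk_info["index"][0],
                               dtype=np.float64)

    store = GenericZarrStore(
        GenericArray(name="time", dims="time", data=np.arange(3)),
        GenericArray(
            name="value",
            dims=("time", "y", "x"),
            dtype="<f8",
            shape=(3, 4, 4),
            chunks=(1, 4, 4),
            get_data=compute_chunk,
            get_data_params=dict(scale=2.0),
        ),
    )
    ds = xr.open_zarr(store)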

Parameters:
  • array – Optional array info dictionary

  • name – Optional array name

  • data – Optional array data. Mutually exclusive with get_data. Must be a bytes object or a numpy array.

  • get_data – Optional array data chunk getter. Mutually exclusive with data. Called for a requested data chunk of an array. Must return a bytes object or a numpy array.

  • get_data_params – Optional keyword-arguments passed to get_data.

  • dtype – Optional array data type. Either a string using syntax of the Zarr spec or a numpy.dtype. For string encoded data types, see https://zarr.readthedocs.io/en/stable/spec/v2.html#data- type-encoding

  • dims – Optional sequence of dimension names.

  • shape – Optional sequence of shape sizes for each dimension.

  • chunks – Optional sequence of chunk sizes for each dimension.

  • fill_value – Optional fill value, see https://zarr.readthedocs.io/en/stable/spec/v2.html#fill- value-encoding

  • compressor – Optional compressor. If given, it must be an instance of numcodecs.abc.Codec.

  • filters – Optional sequence of filters, see https://zarr.readthedocs.io/en/stable/spec/v2.html#filters.

  • order – Optional array endian ordering. If given, must be “C” or “F”. Defaults to “C”.

  • attrs – Optional array attributes. If given, must be JSON- serializable.

  • on_close – Optional array close handler. Called if the store is closed.

  • chunk_encoding – Optional encoding type of the chunk data returned for the array. Can be “bytes” (the default) or “ndarray” for array chunks that are numpy.ndarray instances.

  • kwargs – Other keyword arguments passed directly to the dictionary constructor.

finalize() GenericArray[source]

Normalize and validate array properties and return a valid array info dictionary to be stored in the GenericZarrStore.

class xcube.core.zarrstore.CachedZarrStore(store: MutableMapping, cache: MutableMapping)[source]

A read-only Zarr store that is faster than store because it uses a writable cache store.

The cache store is assumed to read values for a given key much faster than store.

Note that iterating keys and containment checks are performed on store only.

Parameters:
  • store – A Zarr store that is known to be slow in reading values.

  • cache – A writable Zarr store that can read values faster than store.

class xcube.core.zarrstore.DiagnosticZarrStore(store: MutableMapping)[source]

A diagnostic Zarr store used for testing and investigating behaviour of Zarr and xarray’s Zarr backend.

Parameters:

store – Wrapped Zarr store.

keys() a set-like object providing a view on D's keys[source]

Utilities

class xcube.core.gridmapping.GridMapping(size: int | Tuple[int, int], tile_size: int | Tuple[int, int] | None, xy_bbox: Tuple[int | float, int | float, int | float, int | float], xy_res: int | float | Tuple[int | float, int | float], crs: CRS, xy_var_names: Tuple[str, str], xy_dim_names: Tuple[str, str], is_regular: bool | None, is_lon_360: bool | None, is_j_axis_up: bool | None, x_coords: DataArray | None = None, y_coords: DataArray | None = None, xy_coords: DataArray | None = None)[source]

An abstract base class for grid mappings that define an image grid and a transformation from image pixel coordinates to spatial Earth coordinates defined in a well-known coordinate reference system (CRS).

This class cannot be instantiated directly. Use one of its factory methods to create instances: regular(), from_dataset(), or from_coords().

Some instance methods can be used to derive new instances: derive(), scale(), transform(), and to_regular().

This class is thread-safe.

derive(xy_var_names: Tuple[str, str] = None, xy_dim_names: Tuple[str, str] = None, tile_size: int | Tuple[int, int] = None, is_j_axis_up: bool = None)[source]

Derive a new grid mapping from this one with some properties changed.

Parameters:
  • xy_var_names – The new x-, and y-variable names.

  • xy_dim_names – The new x-, and y-dimension names.

  • tile_size – The new tile size

  • is_j_axis_up – Whether j-axis points up.

Returns:

A new, derived grid mapping.

scale(xy_scale: int | float | Tuple[int | float, int | float], tile_size: int | Tuple[int, int] = None) GridMapping[source]

Derive a scaled version of this regular grid mapping.

Scaling factors lower than one correspond to up-scaling (pixel sizes decrease, image size increases).

Scaling factors larger than one correspond to down-scaling (pixel sizes increase, image size decreases).

Parameters:
  • xy_scale – The x-, and y-scaling factors. May be a single number or tuple.

  • tile_size – The new tile size

Returns:

A new, scaled grid mapping.

property size: Tuple[int, int]

Image size (width, height) in pixels.

property width: int

Image width in pixels.

property height: int

Image height in pixels.

property tile_size: Tuple[int, int]

Image tile size (width, height) in pixels.

property is_tiled: bool

Whether the image is tiled.

property tile_width: int

Image tile width in pixels.

property tile_height: int

Image tile height in pixels.

property x_coords

The 1D or 2D x-coordinate array of shape (width,) or (height, width).

property y_coords

The 1D or 2D y-coordinate array of shape (width,) or (height, width).

property xy_coords: DataArray

The x,y coordinates as data array of shape (2, height, width). Coordinates are given in units of the CRS.

property xy_coords_chunks: Tuple[int, int, int]

Get the chunks for the xy_coords array.

property xy_var_names: Tuple[str, str]

The variable names of the x,y coordinates as tuple (x_var_name, y_var_name).

property xy_dim_names: Tuple[str, str]

The dimension names of the x,y coordinates as tuple (x_dim_name, y_dim_name).

property xy_bbox: Tuple[float, float, float, float]

The image’s bounding box in CRS coordinates.

property x_min: int | float

Minimum x-coordinate in CRS units.

property y_min: int | float

Minimum y-coordinate in CRS units.

property x_max: int | float

Maximum x-coordinate in CRS units.

property y_max: int | float

Maximum y-coordinate in CRS units.

property xy_res: Tuple[int | float, int | float]

Pixel size in x and y direction.

property x_res: int | float

Pixel size in CRS units per pixel in x-direction.

property y_res: int | float

Pixel size in CRS units per pixel in y-direction.

property crs: CRS

The coordinate reference system.

property is_lon_360: bool | None

Check whether x_max is greater than 180 degrees. Effectively tests whether the range x_min, x_max crosses the anti-meridian at 180 degrees. Works only for geographical coordinate reference systems.

property is_regular: bool | None

Do the x,y coordinates form a regular grid? A regular grid has a constant delta in both x- and y-directions of the x- and y-coordinates.

Returns: None, if this property cannot be determined,

True or False otherwise.

property is_j_axis_up: bool | None

Does the positive image j-axis point up? By default, the positive image j-axis points down.

Returns: None, if this property cannot be determined,

True or False otherwise.

property ij_to_xy_transform: Tuple[Tuple[int | float, int | float, int | float], Tuple[int | float, int | float, int | float]]

The affine transformation matrix from image to CRS coordinates. Defined only for grid mappings with rectified x,y coordinates.

property xy_to_ij_transform: Tuple[Tuple[int | float, int | float, int | float], Tuple[int | float, int | float, int | float]]

The affine transformation matrix from CRS to image coordinates. Defined only for grid mappings with rectified x,y coordinates.

ij_transform_to(other: GridMapping) Tuple[Tuple[int | float, int | float, int | float], Tuple[int | float, int | float, int | float]][source]

Get the affine transformation matrix that transforms image coordinates of other into image coordinates of this image geometry.

Defined only for grid mappings with rectified x,y coordinates.

Parameters:

other – The other image geometry

Returns:

Affine transformation matrix

ij_transform_from(other: GridMapping) Tuple[Tuple[int | float, int | float, int | float], Tuple[int | float, int | float, int | float]][source]

Get the affine transformation matrix that transforms image coordinates of this image geometry to image coordinates of other.

Defined only for grid mappings with rectified x,y coordinates.

Parameters:

other – The other image geometry

Returns:

Affine transformation matrix

property ij_bbox: Tuple[int, int, int, int]

The image’s bounding box in pixel coordinates.

property ij_bboxes: ndarray

The image tiles’ bounding boxes in image pixel coordinates.

property xy_bboxes: ndarray

The image tiles’ bounding boxes in CRS coordinates.

ij_bbox_from_xy_bbox(xy_bbox: Tuple[float, float, float, float], xy_border: float = 0.0, ij_border: int = 0) Tuple[int, int, int, int][source]

Compute bounding box in i,j pixel coordinates given a bounding box xy_bbox in x,y coordinates.

Parameters:
  • xy_bbox – Box (x_min, y_min, x_max, y_max) given in the same CS as x and y.

  • xy_border – If non-zero, grows the bounding box xy_bbox before using it for comparisons. Defaults to 0.

  • ij_border – If non-zero, grows the returned i,j bounding box and clips it to size. Defaults to 0.

Returns:

Bounding box (i_min, j_min, i_max, j_max) in pixel coordinates. Returns (-1, -1, -1, -1) if xy_bbox does not intersect any of the x,y coordinates.

ij_bboxes_from_xy_bboxes(xy_bboxes: ndarray, xy_border: float = 0.0, ij_border: int = 0, ij_bboxes: ndarray = None) ndarray[source]

Compute bounding boxes in pixel coordinates given bounding boxes xy_bboxes [[x_min, y_min, x_max, y_max], …] in x,y coordinates.

The returned array in i,j pixel coordinates has the same shape as xy_bboxes. The value ranges in the returned array [[i_min, j_min, i_max, j_max], …] are:

  • i_min from 0 to width-1, i_max from 1 to width;

  • j_min from 0 to height-1, j_max from 1 to height;

so the i,j pixel coordinates can be used as array index slices.

Parameters:
  • xy_bboxes – Numpy array of x,y bounding boxes [[x_min, y_min, x_max, y_max], …] given in the same CS as x and y.

  • xy_border – If non-zero, grows the bounding box xy_bbox before using it for comparisons. Defaults to 0.

  • ij_border – If non-zero, grows the returned i,j bounding box and clips it to size. Defaults to 0.

  • ij_bboxes – Numpy array of pixel i,j bounding boxes [[i_min, j_min, i_max, j_max], …]. If given, must have the same shape as xy_bboxes.

Returns:

Bounding boxes [[i_min, j_min, i_max, j_max], …] in pixel coordinates.

to_dataset_attrs() Mapping[str, Any][source]

Get spatial dataset attributes as recommended by https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3#Recommended

Returns:

dictionary with dataset coordinate attributes.

to_coords(xy_var_names: Tuple[str, str] = None, xy_dim_names: Tuple[str, str] = None, exclude_bounds: bool = False, reuse_coords: bool = False) Mapping[str, DataArray][source]

Get CF-compliant axis coordinate variables and cell boundary coordinate variables.

Defined only for grid mappings with regular x,y coordinates.

Parameters:
  • xy_var_names – Optional coordinate variable names (x_var_name, y_var_name).

  • xy_dim_names – Optional coordinate dimensions names (x_dim_name, y_dim_name).

  • exclude_bounds – If True, do not create bounds coordinates. Defaults to False.

  • reuse_coords – Whether to either reuse target coordinate arrays from target_gm or to compute new ones.

Returns:

dictionary with coordinate variables

transform(crs: str | CRS, *, tile_size: int | Tuple[int, int] = None, xy_var_names: Tuple[str, str] = None, tolerance: float = 1e-05) GridMapping[source]

Transform this grid mapping so that it uses the given spatial coordinate reference system crs.

Parameters:
  • crs – The new spatial coordinate reference system.

  • tile_size – Optional new tile size.

  • xy_var_names – Optional new coordinate names.

  • tolerance – Absolute tolerance used when comparing coordinates with each other. Must be in the units of the crs and must be greater zero.

Returns:

A new grid mapping that uses crs.

classmethod regular(size: int | Tuple[int, int], xy_min: Tuple[float, float], xy_res: float | Tuple[float, float], crs: str | CRS, *, tile_size: int | Tuple[int, int] = None, is_j_axis_up: bool = False) GridMapping[source]

Create a new regular grid mapping.

Parameters:
  • size – Size in pixels.

  • xy_min – Minimum x- and y-coordinates.

  • xy_res – Resolution in x- and y-directions.

  • crs – Spatial coordinate reference system.

  • tile_size – Optional tile size.

  • is_j_axis_up – Whether positive j-axis points up. Defaults to false.

Returns:

A new regular grid mapping.
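
A minimal sketch creating a global 0.5-degree grid mapping:

    from xcube.core.gridmapping import GridMapping

    gm = GridMapping.regular(
        size=(720, 360),
        xy_min=(-180.0, -90.0),
        xy_res=0.5,
        crs="EPSG:4326",
        tile_size=180,
    )
    print(gm.xy_bbox)   # (-180.0, -90.0, 180.0, 90.0)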

to_regular(tile_size: int | Tuple[int, int] = None, is_j_axis_up: bool = False) GridMapping[source]

Transform this grid mapping into one that is regular.

Parameters:
  • tile_size – Optional tile size.

  • is_j_axis_up – Whether positive j-axis points up. Defaults to false.

Returns:

A new regular grid mapping or this grid mapping, if it is already regular.

classmethod from_dataset(dataset: Dataset, *, crs: str | CRS = None, tile_size: int | Tuple[int, int] = None, prefer_is_regular: bool = True, prefer_crs: str | CRS = None, emit_warnings: bool = False, tolerance: float = 1e-05) GridMapping[source]

Create a grid mapping for the given dataset.

Parameters:
  • dataset – The dataset.

  • crs – Optional spatial coordinate reference system.

  • tile_size – Optional tile size

  • prefer_is_regular – Whether to prefer a regular grid mapping if multiple found. Default is True.

  • prefer_crs – The preferred CRS of a grid mapping if multiple found.

  • emit_warnings – Whether to emit warning for non-CF compliant datasets.

  • tolerance – Absolute tolerance used when comparing coordinates with each other. Must be in the units of the crs and must be greater zero.

Returns:

a new grid mapping instance.

classmethod from_coords(x_coords: DataArray, y_coords: DataArray, crs: str | CRS, *, tile_size: int | Tuple[int, int] = None, tolerance: float = 1e-05) GridMapping[source]

Create a grid mapping from given x- and y-coordinates x_coords, y_coords and spatial coordinate reference system crs.

Parameters:
  • x_coords – The x-coordinates.

  • y_coords – The y-coordinates.

  • crs – The spatial coordinate reference system.

  • tile_size – Optional tile size.

  • tolerance – Absolute tolerance used when comparing coordinates with each other. Must be in the units of the crs and must be greater zero.

Returns:

A new grid mapping.

is_close(other: GridMapping, tolerance: float = 1e-05) bool[source]

Tests whether this grid mapping is close to other.

Parameters:
  • other – The other grid mapping.

  • tolerance – Absolute tolerance used when comparing coordinates with each other. Must be in the units of the crs and must be greater zero.

Returns:

True, if so, False otherwise.

xcube.core.geom.convert_geometry(geometry: BaseGeometry | Dict[str, Any] | str | Sequence[float | int] | None) BaseGeometry | None[source]

Convert a geometry-like object into a shapely geometry object (shapely.geometry.BaseGeometry).

A geometry-like object may be

  • any shapely geometry object,

  • a dictionary that can be serialized to valid GeoJSON,

  • a WKT string,

  • a box given by a string of the form “<x1>,<y1>,<x2>,<y2>” or by a sequence of four numbers x1, y1, x2, y2,

  • a point given by a string of the form “<x>,<y>” or by a sequence of two numbers x, y.

Handling of geometries crossing the anti-meridian:

  • If box coordinates are given, it is allowed to pass x1, x2 where x1 > x2, which is interpreted as a box crossing the anti-meridian. In this case the function splits the box along the anti-meridian and returns a multi-polygon.

  • In all other cases, 2D geometries are assumed not to cross the anti-meridian at all.

Parameters:

geometry – A geometry-like object

Returns:

Shapely geometry object or None.
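
A minimal sketch of the accepted inputs:

    from xcube.core.geom import convert_geometry

    box = convert_geometry((10.0, 50.0, 12.0, 52.0))       # box from four numbers
    point = convert_geometry("11.0,51.0")                   # point from "<x>,<y>" string
    split = convert_geometry((175.0, -5.0, -175.0, 5.0))    # x1 > x2: anti-meridian box,
                                                            # returned as a multi-polygon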

class xcube.core.schema.CubeSchema(shape: Sequence[int], coords: Mapping[str, DataArray], x_name: str = 'lon', y_name: str = 'lat', time_name: str = 'time', dims: Sequence[str] = None, chunks: Sequence[int] = None)[source]

A schema that can be used to create new xcube datasets. The given shape, dims, chunks, and coords apply to all data variables.

Parameters:
  • shape – A tuple of dimension sizes.

  • coords – A dictionary of coordinate variables. Must have values for all dims.

  • dims – A sequence of dimension names. Defaults to ('time', 'lat', 'lon').

  • chunks – A tuple of chunk sizes in each dimension.

property ndim: int

Number of dimensions.

property dims: Tuple[str, ...]

Tuple of dimension names.

property x_name: str

Name of the spatial x coordinate variable.

property y_name: str

Name of the spatial y coordinate variable.

property time_name: str

Name of the time coordinate variable.

property x_var: DataArray

Spatial x coordinate variable.

property y_var: DataArray

Spatial y coordinate variable.

property time_var: DataArray

Time coordinate variable.

property x_dim: str

Name of the spatial x dimension.

property y_dim: str

Name of the spatial y dimension.

property time_dim: str

Name of the time dimension.

property x_size: int

Size of the spatial x dimension.

property y_size: int

Size of the spatial y dimension.

property time_size: int

Size of the time dimension.

property shape: Tuple[int, ...]

Tuple of dimension sizes.

property chunks: Tuple[int] | None

Tuple of dimension chunk sizes.

property coords: Dict[str, DataArray]

Dictionary of coordinate variables.

classmethod new(cube: Dataset) CubeSchema[source]

Create a cube schema from given cube.

xcube.util.dask.new_cluster(provider: str = 'coiled', name: str | None = None, software: str | None = None, n_workers: int = 4, resource_tags: Dict[str, str] | None = None, account: str = None, region: str = 'eu-central-1', **kwargs) Cluster[source]

Create a new Dask cluster.

Cloud resource tags can be specified in an environment variable XCUBE_DASK_CLUSTER_TAGS in the format tag_1=value_1:tag_2=value_2:...:tag_n=value_n. In case of conflicts, tags specified in resource_tags will override tags specified by the environment variable.

The cluster provider account name can be specified in an environment variable XCUBE_DASK_CLUSTER_ACCOUNT. If the account argument is given to new_cluster, it will override the value from the environment variable.

Parameters:
  • provider – identifier of the provider to use. Currently, only ‘coiled’ is supported.

  • name – name to use as an identifier for the cluster

  • software – identifier for the software environment to be used.

  • n_workers – number of workers in the cluster

  • resource_tags – tags to apply to the cloud resources forming the cluster

  • account – cluster provider account name

  • **kwargs – further named arguments will be passed on to the cluster creation function

  • region – default region where workers of the cluster will be deployed. Defaults to “eu-central-1”.
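
A minimal sketch; the cluster name and software environment identifier are hypothetical Coiled settings:

    from dask.distributed import Client
    from xcube.util.dask import new_cluster

    cluster = new_cluster(
        provider="coiled",
        name="my-xcube-cluster",
        software="my-software-environment",
        n_workers=8,
        resource_tags={"project": "demo"},
    )
    client = Client(cluster)   # submit dask computations to the cluster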

Plugin Development

class xcube.util.extension.ExtensionRegistry[source]

A registry of extensions. Typically used by plugins to register extensions.

has_extension(point: str, name: str) bool[source]

Test if an extension with given point and name is registered.

Parameters:
  • point – extension point identifier

  • name – extension name

Returns:

True, if extension exists

get_extension(point: str, name: str) Extension | None[source]

Get registered extension for given point and name.

Parameters:
  • point – extension point identifier

  • name – extension name

Returns:

the extension or None, if no such extension exists

get_component(point: str, name: str) Any[source]

Get extension component for given point and name. Raises a ValueError if no such extension exists.

Parameters:
  • point – extension point identifier

  • name – extension name

Returns:

extension component

find_extensions(point: str, predicate: Callable[[Extension], bool] = None) List[Extension][source]

Find extensions for point and optional filter function predicate.

The filter function is called with an extension and should return a truth value to indicate a match or mismatch.

Parameters:
  • point – extension point identifier

  • predicate – optional filter function

Returns:

list of matching extensions

find_components(point: str, predicate: Callable[[Extension], bool] = None) List[Any][source]

Find extension components for point and optional filter function predicate.

The filter function is called with an extension and should return a truth value to indicate a match or mismatch.

Parameters:
  • point – extension point identifier

  • predicate – optional filter function

Returns:

list of matching extension components

add_extension(point: str, name: str, component: Any = None, loader: Callable[[Extension], Any] = None, **metadata) Extension[source]

Register an extension component or an extension component loader for the given extension point, name, and additional metadata.

Either component or loader must be specified, but not both.

A given loader must be a callable with one positional argument extension of type Extension and is expected to return the actual extension component, which may be of any type. The loader will only be called once and only when the actual extension component is requested for the first time. Consider using the function import_component() to create a loader that lazily imports a component from a module and optionally executes it.

Parameters:
  • point – extension point identifier

  • name – extension name

  • component – extension component

  • loader – extension component loader function

  • **metadata – extension metadata

Returns:

a registered extension
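
A minimal sketch of how a plugin might register a lazily loaded CLI command; the module path "my_plugin.cli:cli_init" and the entry point function init_plugin are hypothetical:

    from xcube.constants import EXTENSION_POINT_CLI_COMMANDS
    from xcube.util.extension import ExtensionRegistry, import_component

    def init_plugin(ext_registry: ExtensionRegistry):
        # The component is imported only when it is first requested.
        ext_registry.add_extension(
            EXTENSION_POINT_CLI_COMMANDS,
            "my_command",
            loader=import_component("my_plugin.cli:cli_init"),
        )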

remove_extension(point: str, name: str)[source]

Remove registered extension name from given point.

Parameters:
  • point – extension point identifier

  • name – extension name

to_dict()[source]

Get a JSON-serializable dictionary representation of this extension registry.

class xcube.util.extension.Extension(point: str, name: str, component: Any = None, loader: Callable[[Extension], Any] = None, **metadata)[source]

An extension that provides a component of any type.

Extensions are registered in a ExtensionRegistry.

Extension objects are not meant to be instantiated directly. Instead, ExtensionRegistry#add_extension() is used to register extensions.

Parameters:
  • point – extension point identifier

  • name – extension name

  • component – extension component

  • loader – extension component loader function

  • metadata – extension metadata

property is_lazy: bool

Whether this is a lazy extension that uses a loader.

property component: Any

Extension component.

property point: str

Extension point identifier.

property name: str

Extension name.

property metadata: Dict[str, Any]

Extension metadata.

to_dict() Dict[str, Any][source]

Get a JSON-serializable dictionary representation of this extension.

xcube.util.extension.import_component(spec: str, transform: Callable[[Any, Extension], Any] = None, call: bool = False, call_args: Sequence[Any] = None, call_kwargs: Mapping[str, Any] = None) Callable[[Extension], Any][source]

Return a component loader that imports a module or module component from spec. To import a module, spec should be the fully qualified module name. To import a component, spec must also append the component name to the fully qualified module name, separated by a colon (“:”) character.

An optional transform callable may be used to transform the imported component. If given, a new component is computed:

component = transform(component, extension)

If the call flag is set, the component is expected to be a callable which will be called using the given call_args and call_kwargs to produce a new component:

component = component(*call_args, **call_kwargs)

Finally, the component is returned.

Parameters:
  • spec – String of the form “module_path” or “module_path:component_name”

  • transform – callable that takes two positional arguments, the imported component and the extension of type Extension

  • call – Whether to finally call the component with given call_args and call_kwargs

  • call_args – arguments passed to a callable component if call flag is set

  • call_kwargs – keyword arguments passed to callable component if call flag is set

Returns:

a component loader
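
A minimal sketch; "my_plugin.processors:MyProcessor" is a hypothetical component:

    from xcube.util.extension import import_component

    # Import MyProcessor lazily and, because call=True, instantiate it with the
    # given keyword arguments when the extension component is first requested.
    loader = import_component(
        "my_plugin.processors:MyProcessor",
        call=True,
        call_kwargs={"threshold": 0.5},
    )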

xcube.constants.EXTENSION_POINT_INPUT_PROCESSORS = 'xcube.core.gen.iproc'

The extension point identifier for input processor extensions

xcube.constants.EXTENSION_POINT_DATASET_IOS = 'xcube.core.dsio'

The extension point identifier for dataset I/O extensions

xcube.constants.EXTENSION_POINT_CLI_COMMANDS = 'xcube.cli'

The extension point identifier for CLI command extensions

xcube.util.plugin.get_extension_registry() ExtensionRegistry[source]

Get populated extension registry.

xcube.util.plugin.get_plugins() Dict[str, Dict][source]

Get mapping of “xcube_plugins” entry point names to JSON-serializable plugin meta-information.