Python API
Data Store Framework
Functions
Classes
Cube generation
- xcube.core.new.new_cube(title='Test Cube', width=360, height=180, x_name='lon', y_name='lat', x_dtype='float64', y_dtype=None, x_units='degrees_east', y_units='degrees_north', x_res=1.0, y_res=None, x_start=-180.0, y_start=-90.0, inverse_y=False, time_name='time', time_dtype='datetime64[s]', time_units='seconds since 1970-01-01T00:00:00', time_calendar='proleptic_gregorian', time_periods=5, time_freq='D', time_start='2010-01-01T00:00:00', use_cftime=False, drop_bounds=False, variables=None, crs=None, crs_name=None)[source]
Create a new empty cube. Useful for creating cube templates with predefined coordinate variables and metadata. The function is also heavily used by xcube's unit tests.
The values of the variables dictionary can be either constants, array-like objects, or functions that compute their return value from passed coordinate indexes. The expected signature is::
def my_func(time: int, y: int, x: int) -> Union[bool, int, float]
- Parameters:
  - title (str) – A title. Defaults to 'Test Cube'.
  - width (int) – Horizontal number of grid cells. Defaults to 360.
  - height (int) – Vertical number of grid cells. Defaults to 180.
  - x_name (str) – Name of the x coordinate variable. Defaults to 'lon'.
  - y_name (str) – Name of the y coordinate variable. Defaults to 'lat'.
  - x_dtype (str) – Data type of x coordinates. Defaults to 'float64'.
  - y_dtype – Data type of y coordinates. Defaults to 'float64'.
  - x_units (str) – Units of the x coordinates. Defaults to 'degrees_east'.
  - y_units (str) – Units of the y coordinates. Defaults to 'degrees_north'.
  - x_start (float) – Minimum x value. Defaults to -180.
  - y_start (float) – Minimum y value. Defaults to -90.
  - x_res (float) – Spatial resolution in x-direction. Defaults to 1.0.
  - y_res – Spatial resolution in y-direction. Defaults to 1.0.
  - inverse_y (bool) – Whether to create an inverse y axis. Defaults to False.
  - time_name (str) – Name of the time coordinate variable. Defaults to 'time'.
  - time_periods (int) – Number of time steps. Defaults to 5.
  - time_freq (str) – Duration of each time step. Defaults to 'D'.
  - time_start (str) – First time value. Defaults to '2010-01-01T00:00:00'.
  - time_dtype (str) – Numpy data type for time coordinates. Defaults to 'datetime64[s]'. If used, parameter 'use_cftime' must be False.
  - time_units (str) – Units for time coordinates. Defaults to 'seconds since 1970-01-01T00:00:00'.
  - time_calendar (str) – Calendar for time coordinates. Defaults to 'proleptic_gregorian'.
  - use_cftime (bool) – If True, the time will be given as data types according to the 'cftime' package. If used, the time_calendar parameter must also be given an appropriate value such as 'gregorian' or 'julian', and parameter 'time_dtype' must be None.
  - drop_bounds (bool) – If True, coordinate bounds variables are not created. Defaults to False.
  - variables – Dictionary of data variables to be added. None by default.
  - crs – pyproj-compatible CRS string, an instance of pyproj.CRS, or None.
  - crs_name – Name of the variable that will hold the CRS information. Ignored if crs is not given.
- Returns:
A cube instance
Cube computation
- xcube.core.evaluate.evaluate_dataset(dataset: Dataset, processed_variables: List[Tuple[str, Dict[str, Any] | None]] = None, errors: str = 'raise') Dataset [source]
Compute new variables or mask existing variables in dataset by evaluating Python expressions that may refer to other existing or new variables. Returns a new dataset that contains the old and new variables, where both may now be masked.
Expressions may be given by attributes of existing variables in dataset or passed via the processed_variables argument, which is a sequence of variable name / attributes tuples.
Two types of expression attributes are recognized in the attributes:
- The attribute expression generates a new variable computed from its attribute value.
- The attribute valid_pixel_expression masks out invalid variable values.
In both cases the attribute value must be a string that forms a valid Python expression and may reference any other preceding variables by name. The expression may also reference any flags defined by another variable according to their CF attributes flag_meanings and flag_values.
Invalid variable values may be masked out using the value of the valid_pixel_expression attribute, whose value should form a Boolean Python expression. If the expression returns zero or false, the value of the _FillValue attribute or NaN will be used in the new variable.
Other attributes will be stored as variable metadata as-is.
- Return type:
  Dataset
- Parameters:
  - dataset (Dataset) – A dataset.
  - processed_variables – Optional list of variable name/attributes pairs that will be processed in the given order.
  - errors (str) – How to deal with errors while evaluating expressions. Must be one of "raise", "warn", or "ignore".
- Returns:
new dataset with computed variables
Cube data extraction
Cube Resampling
Cube Manipulation
- xcube.core.chunk.chunk_dataset(dataset: Dataset, chunk_sizes: Dict[str, int] = None, format_name: str = None) Dataset [source]
Chunk dataset using chunk_sizes and optionally update encodings for given format_name.
- xcube.core.unchunk.unchunk_dataset(dataset_path: str, var_names: Sequence[str] = None, coords_only: bool = False)[source]
Unchunk dataset variables in-place.
- Parameters:
  - dataset_path (str) – Path to ZARR dataset directory.
  - var_names – Optional list of variable names.
  - coords_only (bool) – Un-chunk coordinate variables only.
- xcube.core.optimize.optimize_dataset(input_path: str, output_path: str = None, in_place: bool = False, unchunk_coords: bool | str | ~typing.Sequence[str] = False, exception_type: ~typing.Type[Exception] = <class 'ValueError'>)[source]
Optimize a dataset for faster access.
Reduces the number of metadata and coordinate data files in the xcube dataset given by input_path. Consolidated cubes open much faster from remote locations, e.g. in object storage, because far fewer HTTP requests are required to fetch the initial cube metadata. That is, it merges all metadata files into a single top-level JSON file ".zmetadata".
If unchunk_coords is given, it also removes any chunking of coordinate variables so they comprise a single binary data file instead of one file per data chunk. The primary usage of this function is to optimize data cubes for cloud object storage. The function currently works only for data cubes using Zarr format. unchunk_coords can be a name, or a list of names, of the coordinate variable(s) to be consolidated. If the boolean True is used, all coordinate variables will be consolidated.
- Parameters:
  - input_path (str) – Path to input dataset with ZARR format.
  - output_path (str) – Path to output dataset with ZARR format. May contain "{input}" template string, which is replaced by the input path's file name without file name extension.
  - in_place (bool) – Whether to modify the dataset in place. If False, a copy is made and output_path must be given.
  - unchunk_coords – The name of a coordinate variable or a list of coordinate variables whose chunks should be consolidated. Pass True to consolidate chunks of all coordinate variables.
  - exception_type – Type of exception to be used on value errors.
Cube Subsetting
Cube Masking
- class xcube.core.maskset.MaskSet(flag_var: DataArray)[source]
A set of mask variables derived from a variable flag_var with the following CF attributes:
- One or both of flag_masks and flag_values
- flag_meanings (always required)
See https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#flags for details on the use of these attributes.
Each mask is represented by an xarray.DataArray, has the name of the flag, is of type numpy.uint8, and has the dimensions of the given flag_var.
- Parameters:
  - flag_var – An xarray.DataArray that defines flag values. The CF attributes flag_meanings and one or both of flag_masks and flag_values are expected to exist and be valid.
- classmethod get_mask_sets(dataset: Dataset) Dict[str, MaskSet] [source]
For each "flag" variable in the given dataset, turn it into a MaskSet and store it in a dictionary.
- Parameters:
  - dataset (Dataset) – The dataset
- Returns:
  A mapping of flag names to MaskSet. Will be empty if there are no flag variables in dataset.
Rasterisation of Features
Cube Metadata
- xcube.core.edit.edit_metadata(input_path: str, output_path: str = None, metadata_path: str = None, update_coords: bool = False, in_place: bool = False, monitor: ~typing.Callable[[...], None] = None, exception_type: ~typing.Type[Exception] = <class 'ValueError'>)[source]
Edit the metadata of an xcube dataset.
The metadata may need editing because it is incorrect, inconsistent, or incomplete. The metadata attributes to be edited should be given in a YAML file. The function currently works only for data cubes using ZARR format.
- Parameters:
  - input_path (str) – Path to input dataset with ZARR format.
  - output_path (str) – Path to output dataset with ZARR format. May contain "{input}" template string, which is replaced by the input path's file name without file name extension.
  - metadata_path (str) – Path to the metadata file, which will edit the existing metadata.
  - update_coords (bool) – Whether to update the metadata about the coordinates.
  - in_place (bool) – Whether to modify the dataset in place. If False, a copy is made and output_path must be given.
  - monitor – A progress monitor.
  - exception_type – Type of exception to be used on value errors.
- xcube.core.update.update_dataset_attrs(dataset: Dataset, global_attrs: Dict[str, Any] = None, update_existing: bool = False, in_place: bool = False) Dataset [source]
Update spatio-temporal CF/THREDDS attributes of the given dataset according to the spatio-temporal coordinate variables time, lat, and lon.
- Return type:
  Dataset
- Parameters:
  - dataset (Dataset) – The dataset.
  - global_attrs – Optional global attributes.
  - update_existing (bool) – If True, any existing attributes will be updated.
  - in_place (bool) – If True, dataset will be modified in place and returned.
- Returns:
  A new dataset, if in_place is False (default), else the passed and modified dataset.
- xcube.core.update.update_dataset_spatial_attrs(dataset: Dataset, update_existing: bool = False, in_place: bool = False) Dataset [source]
Update spatial CF/THREDDS attributes of given dataset.
- Return type:
  Dataset
- Parameters:
  - dataset (Dataset) – The dataset.
  - update_existing (bool) – If True, any existing attributes will be updated.
  - in_place (bool) – If True, dataset will be modified in place and returned.
- Returns:
  A new dataset, if in_place is False (default), else the passed and modified dataset.
- xcube.core.update.update_dataset_temporal_attrs(dataset: Dataset, update_existing: bool = False, in_place: bool = False) Dataset [source]
Update temporal CF/THREDDS attributes of given dataset.
- Return type:
  Dataset
- Parameters:
  - dataset (Dataset) – The dataset.
  - update_existing (bool) – If True, any existing attributes will be updated.
  - in_place (bool) – If True, dataset will be modified in place and returned.
- Returns:
  A new dataset, if in_place is False (default), else the passed and modified dataset.
Cube verification
Multi-Resolution Datasets
Zarr Store
- class xcube.core.zarrstore.ZarrStoreHolder(dataset: Dataset)[source]
Represents the xarray dataset property zarr_store.
It is used to permanently associate a dataset with its Zarr store, which would otherwise not be possible.
In xcube server, we use the new property to expose datasets via the S3 emulation API. For that concept to work, datasets must be associated with their Zarr stores explicitly. Therefore, the xcube data store framework sets the Zarr stores of datasets after opening them with xr.open_zarr()::

    dataset = xr.open_zarr(zarr_store, **open_params)
    dataset.zarr_store.set(zarr_store)

Note that the dataset may change after the Zarr store has been set, so that the dataset and its Zarr store are no longer in sync. This may be an issue and limit the application of the new property.
- Parameters:
dataset – The xarray dataset that is associated with a Zarr store.
- get() MutableMapping [source]
Get the Zarr store of a dataset. If no Zarr store has been set, the method will use GenericZarrStore.from_dataset() to create and set one.
- Returns:
  The Zarr store.
- set(zarr_store: MutableMapping) None [source]
Set the Zarr store of a dataset.
- Parameters:
  - zarr_store (MutableMapping) – The Zarr store.
- class xcube.core.zarrstore.GenericZarrStore(*arrays: GenericArray | Dict[str, Any], attrs: Dict[str, Any] | None = None, array_defaults: GenericArray | Dict[str, Any] | None = None)[source]
A Zarr store that maintains generic arrays in a flat, top-level hierarchy. The root of the store is a Zarr group conforming to the Zarr spec v2.
It is designed to serve as a Zarr store for xarray datasets that compute their data arrays dynamically.
See class GenericArray for specifying the arrays' properties.
The data of this store's arrays is either retrieved from static (numpy) arrays or from a callable that provides the array's data chunks as bytes or numpy arrays.
- Parameters:
  - arrays – Arrays to be added. Typically, these will be instances of GenericArray.
  - attrs – Optional attributes of the top-level group. If given, it must be JSON-serializable.
  - array_defaults – Optional array defaults for array properties not passed to add_array. Typically, this will be an instance of GenericArray.
- Array
  Alias of GenericArray.
- add_array(array: GenericArray | Dict[str, Any] | None = None, **array_kwargs) None [source]
Add a new array to this store.
- Parameters:
  - array – Optional array properties. Typically, this will be an instance of GenericArray.
  - array_kwargs – Keyword arguments for the properties of GenericArray.
- listdir(path: str = '') List[str] [source]
List a store path.
- Parameters:
  - path (str) – The path.
- Returns:
  List of sorted directory entries.
- rmdir(path: str = '') None [source]
The general form removes store paths. This implementation can remove entire arrays only.
- Parameters:
  - path (str) – The array's name.
- rename(src_path: str, dst_path: str) None [source]
The general form renames store paths. This implementation can rename arrays only.
- Parameters:
  - src_path (str) – Source array name.
  - dst_path (str) – Target array name.
- classmethod from_dataset(dataset: Dataset, array_defaults: GenericArray | Dict[str, Any] | None = None) GenericZarrStore [source]
Create a Zarr store for the given dataset. The store's top-level group attributes correspond to the dataset's attributes. The following array_defaults properties can be provided (other properties are prescribed by the dataset):
- fill_value – defaults to None
- compressor – defaults to None
- filters – defaults to None
- order – defaults to "C"
- chunk_encoding – defaults to "bytes"
- Parameters:
  - dataset (Dataset) – The dataset
  - array_defaults – Array default values.
- Returns:
  A new Zarr store instance.
- class xcube.core.zarrstore.GenericArray(array: Dict[str, any] | None = None, name: str | None = None, get_data: Callable[[Tuple[int]], bytes | ndarray] | None = None, get_data_params: Dict[str, Any] | None = None, data: ndarray | None = None, dtype: str | dtype | None = None, dims: str | Sequence[str] | None = None, shape: Sequence[int] | None = None, chunks: Sequence[int] | None = None, fill_value: bool | int | float | str | None = None, compressor: Codec | None = None, filters: Sequence[Codec] | None = None, order: str | None = None, attrs: Dict[str, Any] | None = None, on_close: Callable[[Dict[str, Any]], None] | None = None, chunk_encoding: str | None = None, **kwargs)[source]
Represents a generic array in the GenericZarrStore as a dictionary of properties.
Although all properties of this class are optional, some of them are mandatory when the array is added to the GenericZarrStore.
When added to the store using GenericZarrStore.add_array(), the array name and dims must always be present. Other mandatory properties depend on the data and get_data properties, which are mutually exclusive:
- get_data is called for a requested data chunk of an array. It must return a bytes object or a numpy nd-array and is passed the chunk index, the chunk shape, and this array info dictionary. get_data requires the following properties to be present too: name, dims, dtype, shape. chunks is optional and defaults to shape.
- data must be a bytes object or a numpy nd-array. data requires the following properties to be present too: name, dims. chunks must be the same as shape.
The function get_data receives only keyword arguments, which comprise the ones passed by get_data_params, if any, and two special ones that may occur in the signature of get_data:
- The keyword argument chunk_info, if given, provides a dictionary that holds information about the current chunk:
  - index: tuple[int, ...] – the chunk's index
  - shape: tuple[int, ...] – the chunk's shape
  - slices: tuple[slice, ...] – the chunk's array slices
- The keyword argument array_info, if given, provides a dictionary that holds information about the overall array. It contains all array properties passed to the constructor of GenericArray plus:
  - ndim: int – number of dimensions
  - num_chunks: tuple[int, ...] – number of chunks in every dimension
GenericZarrStore will convert a numpy array returned by get_data or given by data into a bytes object, which will also be compressed if a compressor is given. Note that in Zarr format v2 all chunks of an array have the same shape; see https://zarr.readthedocs.io/en/stable/spec/v2.html#chunks
Note that if the value of a named keyword argument is None, it will not be stored.
- Parameters:
array – Optional array info dictionary
name – Optional array name
data – Optional array data. Mutually exclusive with get_data. Must be a bytes object or a numpy array.
get_data – Optional array data chunk getter. Mutually exclusive with data. Called for a requested data chunk of an array. Must return a bytes object or a numpy array.
get_data_params – Optional keyword-arguments passed to get_data.
dtype – Optional array data type. Either a string using the syntax of the Zarr spec or a numpy.dtype. For string-encoded data types, see https://zarr.readthedocs.io/en/stable/spec/v2.html#data-type-encoding
dims – Optional sequence of dimension names.
shape – Optional sequence of shape sizes for each dimension.
chunks – Optional sequence of chunk sizes for each dimension.
fill_value – Optional fill value, see https://zarr.readthedocs.io/en/stable/spec/v2.html#fill-value-encoding
compressor – Optional compressor. If given, it must be an instance of numcodecs.abc.Codec.
filters – Optional sequence of filters, see https://zarr.readthedocs.io/en/stable/spec/v2.html#filters.
order – Optional array endian ordering. If given, must be “C” or “F”. Defaults to “C”.
attrs – Optional array attributes. If given, must be JSON-serializable.
on_close – Optional array close handler. Called if the store is closed.
chunk_encoding – Optional encoding type of the chunk data returned for the array. Can be “bytes” (the default) or “ndarray” for array chunks that are numpy.ndarray instances.
kwargs – Other keyword arguments passed directly to the dictionary constructor.
- finalize() GenericArray [source]
Normalize and validate array properties and return a valid array info dictionary to be stored in the GenericZarrStore.
- class xcube.core.zarrstore.CachedZarrStore(store: MutableMapping, cache: MutableMapping)[source]
A read-only Zarr store that is faster than store because it uses a writable cache store.
The cache store is assumed to read values for a given key much faster than store.
Note that iterating keys and containment checks are performed on store only.
- Parameters:
store – A Zarr store that is known to be slow in reading values.
cache – A writable Zarr store that can read values faster than store.
- class xcube.core.zarrstore.DiagnosticZarrStore(store: MutableMapping)[source]
A diagnostic Zarr store used for testing and investigating behaviour of Zarr and xarray’s Zarr backend.
- Parameters:
store – Wrapped Zarr store.
Utilities
- xcube.util.dask.new_cluster(provider: str = 'coiled', name: str | None = None, software: str | None = None, n_workers: int = 4, resource_tags: Dict[str, str] | None = None, account: str = None, **kwargs) Cluster [source]
Create a new Dask cluster.
Cloud resource tags can be specified in an environment variable XCUBE_DASK_CLUSTER_TAGS in the format tag_1=value_1:tag_2=value_2:...:tag_n=value_n. In case of conflicts, tags specified in resource_tags will override tags specified by the environment variable.
The cluster provider account name can be specified in an environment variable XCUBE_DASK_CLUSTER_ACCOUNT. If the account argument is given to new_cluster, it will override the value from the environment variable.
- Return type:
Cluster
- Parameters:
  - provider (str) – Identifier of the provider to use. Currently, only 'coiled' is supported.
  - name – Name to use as an identifier for the cluster.
  - software – Identifier for the software environment to be used.
  - n_workers (int) – Number of workers in the cluster.
  - resource_tags – Tags to apply to the cloud resources forming the cluster.
  - account (str) – Cluster provider account name.
  - **kwargs – Further named arguments will be passed on to the cluster creation function.
Plugin Development
- class xcube.util.extension.ExtensionRegistry[source]
A registry of extensions. Typically used by plugins to register extensions.
- has_extension(point: str, name: str) bool [source]
Test if an extension with given point and name is registered.
- Return type:
bool
- Parameters:
  - point (str) – extension point identifier
  - name (str) – extension name
- Returns:
True, if extension exists
- get_extension(point: str, name: str) Extension | None [source]
Get registered extension for given point and name.
- Parameters:
  - point (str) – extension point identifier
  - name (str) – extension name
- Returns:
  the extension, or None if no such extension exists
- get_component(point: str, name: str) Any [source]
Get extension component for given point and name. Raises a ValueError if no such extension exists.
- Return type:
Any
- Parameters:
  - point (str) – extension point identifier
  - name (str) – extension name
- Returns:
extension component
- find_extensions(point: str, predicate: Callable[[Extension], bool] = None) List[Extension] [source]
Find extensions for point and optional filter function predicate.
The filter function is called with an extension and should return a truth value to indicate a match or mismatch.
- Parameters:
  - point (str) – extension point identifier
  - predicate – optional filter function
- Returns:
list of matching extensions
- find_components(point: str, predicate: Callable[[Extension], bool] = None) List[Any] [source]
Find extension components for point and optional filter function predicate.
The filter function is called with an extension and should return a truth value to indicate a match or mismatch.
- Parameters:
  - point (str) – extension point identifier
  - predicate – optional filter function
- Returns:
list of matching extension components
- add_extension(point: str, name: str, component: Any = None, loader: Callable[[Extension], Any] = None, **metadata) Extension [source]
Register an extension component or an extension component loader for the given extension point, name, and additional metadata.
Either component or loader must be specified, but not both.
A given loader must be a callable with one positional argument extension of type Extension and is expected to return the actual extension component, which may be of any type. The loader will only be called once, and only when the actual extension component is requested for the first time. Consider using the function import_component() to create a loader that lazily imports a component from a module and optionally executes it.
- Return type:
  Extension
- Parameters:
  - point (str) – extension point identifier
  - name (str) – extension name
  - component (Any) – extension component
  - loader – extension component loader function
  - metadata – extension metadata
- Returns:
a registered extension
- class xcube.util.extension.Extension(point: str, name: str, component: Any = None, loader: Callable[[Extension], Any] = None, **metadata)[source]
An extension that provides a component of any type.
Extensions are registered in an ExtensionRegistry.
Extension objects are not meant to be instantiated directly. Instead, ExtensionRegistry.add_extension() is used to register extensions.
- Parameters:
point – extension point identifier
name – extension name
component – extension component
loader – extension component loader function
metadata – extension metadata
- xcube.util.extension.import_component(spec: str, transform: Callable[[Any, Extension], Any] = None, call: bool = False, call_args: Sequence[Any] = None, call_kwargs: Mapping[str, Any] = None) Callable[[Extension], Any] [source]
Return a component loader that imports a module or module component from spec. To import a module, spec should be the fully qualified module name. To import a component, spec must also append the component name to the fully qualified module name, separated by a colon (":") character.
An optional transform callable may be used to transform the imported component. If given, a new component is computed:
component = transform(component, extension)
If the call flag is set, the component is expected to be a callable which will be called using the given call_args and call_kwargs to produce a new component:
component = component(*call_args, **call_kwargs)
Finally, the component is returned.
- Parameters:
  - spec (str) – String of the form "module_path" or "module_path:component_name"
  - transform – callable that takes two positional arguments, the imported component and the extension of type Extension
  - call (bool) – whether to finally call the component with given call_args and call_kwargs
  - call_args – arguments passed to a callable component if the call flag is set
  - call_kwargs – keyword arguments passed to a callable component if the call flag is set
- Returns:
a component loader
- xcube.constants.EXTENSION_POINT_INPUT_PROCESSORS = 'xcube.core.gen.iproc'
The extension point identifier for input processor extensions
- xcube.constants.EXTENSION_POINT_DATASET_IOS = 'xcube.core.dsio'
The extension point identifier for dataset I/O extensions
- xcube.constants.EXTENSION_POINT_CLI_COMMANDS = 'xcube.cli'
The extension point identifier for CLI command extensions
- xcube.util.plugin.get_extension_registry() ExtensionRegistry [source]
Get populated extension registry.