Python API

Cube I/O

xcube.api.read_cube(input_path: str, format_name: str = None, **kwargs) → xarray.core.dataset.Dataset

Read a xcube dataset from input_path. If format_name is not provided, it will be guessed from input_path.

Parameters
  • input_path – input path

  • format_name – format, e.g. “zarr” or “netcdf4”

  • kwargs – format-specific keyword arguments

Returns

xcube dataset

xcube.api.open_cube(input_path: str, format_name: str = None, **kwargs) → xarray.core.dataset.Dataset

The read_cube function as a context manager that automatically closes the cube after use.

Parameters
  • input_path – input path

  • format_name – format, e.g. “zarr” or “netcdf4”

  • kwargs – format-specific keyword arguments

Returns

xcube dataset
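
A minimal usage sketch (the path "demo.zarr" is only a placeholder):

    from xcube.api import open_cube

    with open_cube("demo.zarr") as cube:
        # the cube is closed automatically when the block is left
        print(cube.data_vars)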

Cube generation

xcube.api.gen_cube(input_paths: Sequence[str] = None, input_processor_name: str = None, input_processor_params: Dict = None, input_reader_name: str = None, input_reader_params: Dict[str, Any] = None, output_region: Tuple[float, float, float, float] = None, output_size: Tuple[int, int] = [512, 512], output_resampling: str = 'Nearest', output_path: str = 'out.zarr', output_writer_name: str = None, output_writer_params: Dict[str, Any] = None, output_metadata: Dict[str, Any] = None, output_variables: List[Tuple[str, Optional[Dict[str, Any]]]] = None, processed_variables: List[Tuple[str, Optional[Dict[str, Any]]]] = None, profile_mode: bool = False, sort_mode: bool = False, append_mode: bool = None, dry_run: bool = False, monitor: Callable[[...], None] = None) → bool

Generate a xcube dataset from one or more input files.

Parameters
  • input_paths – The input paths.

  • input_processor_name – Name of a registered input processor (xcube.api.gen.inputprocessor.InputProcessor) to be used to transform the inputs.

  • input_processor_params – Parameters to be passed to the input processor.

  • input_reader_name – Name of a registered input reader (xcube.api.util.dsio.DatasetIO).

  • input_reader_params – Parameters passed to the input reader.

  • output_region – Output region as tuple of floats: (lon_min, lat_min, lon_max, lat_max).

  • output_size – The spatial dimensions of the output as tuple of ints: (width, height).

  • output_resampling – The resampling method for the output.

  • output_path – The output directory.

  • output_writer_name – Name of an output writer (xcube.api.util.dsio.DatasetIO) used to write the cube.

  • output_writer_params – Parameters passed to the output writer.

  • output_metadata – Extra metadata passed to output cube.

  • output_variables – Output variables.

  • processed_variables – Processed variables computed on-the-fly.

  • profile_mode – Whether profiling should be enabled.

  • sort_mode – Whether to sort the input paths before processing.

  • append_mode – Deprecated. The function will always either insert, replace, or append new time slices.

  • dry_run – If True, does not write any data; useful for testing.

  • monitor – A progress monitor.

Returns

True for success.
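
A hedged sketch of a gen_cube call; the input paths, the "default" input processor name, and the region, size, and output path are placeholders that must be adapted to the installed input processors and your data:

    from xcube.api import gen_cube

    ok = gen_cube(input_paths=["inputs/scene-1.nc", "inputs/scene-2.nc"],  # placeholder inputs
                  input_processor_name="default",                         # assumed processor name
                  output_region=(0.0, 50.0, 5.0, 52.5),                   # lon_min, lat_min, lon_max, lat_max
                  output_size=(2000, 1000),                               # width, height
                  output_path="my-cube.zarr")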

xcube.api.new_cube(title='Test Cube', width=360, height=180, spatial_res=1.0, lon_start=-180.0, lat_start=-90.0, time_periods=5, time_freq='D', time_start='2010-01-01T00:00:00', inverse_lat=False, drop_bounds=False, variables=None)

Create a new empty cube. Useful for creating cube templates with predefined coordinate variables and metadata. The function is also heavily used by xcube’s unit tests.

The values of the variables dictionary can be either constants, array-like objects, or functions that compute their return value from passed coordinate indexes. The expected signature is:

    def my_func(time: int, lat: int, lon: int) -> Union[bool, int, float]

Parameters
  • title – A title.

  • width – Horizontal number of grid cells

  • height – Vertical number of grid cells

  • spatial_res – Spatial resolution in degrees

  • lon_start – Minimum longitude value

  • lat_start – Minimum latitude value

  • time_periods – Number of time steps

  • time_freq – Duration of each time step

  • time_start – First time value

  • inverse_lat – Whether to create an inverse latitude axis

  • drop_bounds – If True, coordinate bounds variables are not created.

  • variables – Dictionary of data variables to be added.

Returns

A cube instance
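
For illustration, a small test cube with one constant variable and one variable computed from coordinate indexes (the variable names and values are arbitrary):

    from xcube.api import new_cube

    cube = new_cube(width=36, height=18, spatial_res=10.0,
                    time_periods=3,
                    variables=dict(precipitation=0.6,
                                   temperature=lambda time, lat, lon: 270.0 + time))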

Cube data extraction

xcube.api.get_cube_values_for_points(cube: xarray.core.dataset.Dataset, points: Union[xarray.core.dataset.Dataset, pandas.core.frame.DataFrame, Mapping[str, Any]], var_names: Sequence[str] = None, include_coords: bool = False, include_bounds: bool = False, include_indexes: bool = False, index_name_pattern: str = '{name}_index', include_refs: bool = False, ref_name_pattern: str = '{name}_ref', method: str = 'nearest', cube_asserted: bool = False) → xarray.core.dataset.Dataset

Extract values from cube variables at given coordinates in points.

Parameters
  • cube – The cube dataset.

  • points – Dictionary that maps dimension name to coordinate arrays.

  • var_names – An optional list of names of data variables in cube whose values shall be extracted.

  • include_coords – Whether to include the cube coordinates for each point in return value.

  • include_bounds – Whether to include the cube coordinate boundaries (if any) for each point in return value.

  • include_indexes – Whether to include computed indexes into the cube for each point in return value.

  • index_name_pattern – A naming pattern for the computed index columns. Must include “{name}” which will be replaced by the index’s dimension name.

  • include_refs – Whether to include point (reference) values in return value.

  • ref_name_pattern – A naming pattern for the computed point data columns. Must include “{name}” which will be replaced by the point’s attribute name.

  • method – “nearest” or “linear”.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns

A new dataset whose variables hold the values of the cube variables at the given points.
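
A sketch of extracting values at two points from the test cube created above; the variable name "precipitation" and the coordinates are arbitrary:

    import numpy as np
    from xcube.api import get_cube_values_for_points

    points = dict(time=np.array(["2010-01-01", "2010-01-03"], dtype="datetime64[ns]"),
                  lat=np.array([5.0, -35.0]),
                  lon=np.array([15.0, -25.0]))
    values = get_cube_values_for_points(cube, points,
                                        var_names=["precipitation"],
                                        include_indexes=True)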

xcube.api.get_cube_point_indexes(cube: xarray.core.dataset.Dataset, points: Union[xarray.core.dataset.Dataset, pandas.core.frame.DataFrame, Mapping[str, Any]], dim_name_mapping: Mapping[str, str] = None, index_name_pattern: str = '{name}_index', index_dtype=<class 'numpy.float64'>, cube_asserted: bool = False) → xarray.core.dataset.Dataset

Get the indexes of the given point coordinates (points) into the given cube dataset.

Parameters
  • cube – The cube dataset.

  • points – A mapping from column names to column data arrays, which must all have the same length.

  • dim_name_mapping – A mapping from dimension names in cube to column names in points.

  • index_name_pattern – A naming pattern for the computed indexes columns. Must include “{name}” which will be replaced by the dimension name.

  • index_dtype – Numpy data type for the indexes. If it is a floating point type (default), then indexes will contain fractions, which may be used for interpolation. For out-of-range coordinates in points, indexes will be -1 if index_dtype is an integer type, and NaN if it is a floating point type.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns

A dataset containing the index columns.

xcube.api.get_cube_values_for_indexes(cube: xarray.core.dataset.Dataset, indexes: Union[xarray.core.dataset.Dataset, pandas.core.frame.DataFrame, Mapping[str, Any]], include_coords: bool = False, include_bounds: bool = False, data_var_names: Sequence[str] = None, index_name_pattern: str = '{name}_index', method: str = 'nearest', cube_asserted: bool = False) → xarray.core.dataset.Dataset

Get values from the cube at given indexes.

Parameters
  • cube – A cube dataset.

  • indexes – A mapping from column names to index and fraction arrays for all cube dimensions.

  • include_coords – Whether to include the cube coordinates for each point in return value.

  • include_bounds – Whether to include the cube coordinate boundaries (if any) for each point in return value.

  • data_var_names – An optional list of names of data variables in cube whose values shall be extracted.

  • index_name_pattern – A naming pattern for the computed indexes columns. Must include “{name}” which will be replaced by the dimension name.

  • method – “nearest” or “linear”.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns

A new dataset whose variables hold the values of the cube variables at the given indexes.
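
get_cube_point_indexes and get_cube_values_for_indexes are typically used together: first compute fractional indexes for a set of points, then fetch cube values at those indexes. A sketch, reusing the points mapping from the previous example:

    from xcube.api import get_cube_point_indexes, get_cube_values_for_indexes

    indexes = get_cube_point_indexes(cube, points)
    values = get_cube_values_for_indexes(cube, indexes, method="nearest")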

xcube.api.get_dataset_indexes(dataset: xarray.core.dataset.Dataset, coord_var_name: str, coord_values: Union[xarray.core.dataarray.DataArray, numpy.ndarray], index_dtype=<class 'numpy.float64'>) → Union[xarray.core.dataarray.DataArray, numpy.ndarray]

Compute the indexes and their fractions into a coordinate variable coord_var_name of a dataset for the given coordinate values coord_values.

The coordinate variable’s labels must be monotonically increasing or decreasing, otherwise the result is undefined.

For any value in coord_values that is out of the bounds of the coordinate variable’s values, the index depends on the value of index_dtype. If index_dtype is an integer type, invalid indexes are encoded as -1 while for floating point types, NaN will be used.

Returns a tuple of indexes as int64 array and fractions as float64 array.

Parameters
  • dataset – A cube dataset.

  • coord_var_name – Name of a coordinate variable.

  • coord_values – Array-like coordinate values.

  • index_dtype – Numpy data type for the indexes. If it is a floating point type (default), then indexes contain fractions, which may be used for interpolation. Out-of-range coordinates are indicated by index -1 if index_dtype is an integer type, and by NaN if it is a floating point type.

Returns

The indexes and their fractions as a tuple of numpy int64 and float64 arrays.

xcube.api.get_time_series(cube: xarray.core.dataset.Dataset, geometry: Union[shapely.geometry.base.BaseGeometry, Dict[str, Any], str, Sequence[Union[float, int]]] = None, var_names: Sequence[str] = None, start_date: Union[numpy.datetime64, str] = None, end_date: Union[numpy.datetime64, str] = None, include_count: bool = False, include_stdev: bool = False, use_groupby: bool = False, cube_asserted: bool = False) → Optional[xarray.core.dataset.Dataset]

Get a time series dataset from a data cube.

geometry may be provided as a (shapely) geometry object, a valid GeoJSON object, a valid WKT string, a sequence of box coordinates (x1, y1, x2, y2), or point coordinates (x, y). If geometry covers an area, i.e. is not a point, the function aggregates the variables to compute a mean value and, if desired, the number of valid observations and the standard deviation.

start_date and end_date may be provided as a numpy.datetime64 or an ISO datetime string.

Returns a time-series dataset whose data variables have a time dimension but no longer have spatial dimensions, hence the resulting dataset’s variables will only have N-2 dimensions. A global attribute max_number_of_observations will be set to the maximum number of observations that could have been made in each time step. If the given geometry does not overlap the cube’s boundaries, or if no output variables remain, the function returns None.

Parameters
  • cube – The xcube dataset

  • geometry – Optional geometry

  • var_names – Optional sequence of names of variables to be included.

  • start_date – Optional start date.

  • end_date – Optional end date.

  • include_count – Whether to include the number of valid observations for each time step. Ignored if geometry is a point.

  • include_stdev – Whether to include standard deviation for each time step. Ignored if geometry is a point.

  • use_groupby – Use group-by operation. May increase or decrease runtime performance and/or memory consumption.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns

A new dataset with time-series for each variable.
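
For example, a mean time series over a bounding box (the box coordinates and variable name are placeholders) might be computed as follows:

    from xcube.api import get_time_series

    ts = get_time_series(cube,
                         geometry=(0.0, 50.0, 5.0, 52.5),   # box: x1, y1, x2, y2
                         var_names=["precipitation"],
                         include_count=True)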

Cube manipulation

xcube.api.resample_in_time(cube: xarray.core.dataset.Dataset, frequency: str, method: Union[str, Sequence[str]], offset=None, base: str = 0, tolerance=None, interp_kind=None, var_names: Sequence[str] = None, metadata: Dict[str, Any] = None)

Resample a xcube dataset in the time dimension.

Parameters
  • cube – The xcube dataset.

  • frequency – Resampling frequency.

  • method – Resampling method or sequence of resampling methods.

  • offset – Offset used to adjust the resampled time labels. Some pandas date offset strings are supported.

  • base – Offset base for the resampling intervals; passed through to the underlying resampling operation.

  • var_names – Variable names to include.

  • tolerance – Time tolerance for selective upsampling methods. Defaults to frequency.

  • interp_kind – Kind of interpolation if method is ‘interpolation’.

  • metadata – Output metadata.

Returns

A new xcube dataset resampled in time.
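
A sketch of downsampling a cube to weekly means; frequency strings follow the pandas offset conventions, and "mean" is one of the supported aggregation methods:

    from xcube.api import resample_in_time

    weekly = resample_in_time(cube, frequency="1W", method="mean")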

xcube.api.vars_to_dim(cube: xarray.core.dataset.Dataset, dim_name: str = 'var', var_name='data', cube_asserted: bool = False)

Convert data variables into a dimension.

Parameters
  • cube – The xcube dataset.

  • dim_name – The name of the new dimension and coordinate variable. Defaults to ‘var’.

  • var_name – The name of the new, single data variable. Defaults to ‘data’.

  • cube_asserted – If False, cube will be verified, otherwise it is expected to be a valid cube.

Returns

A new xcube dataset with data variables turned into a new dimension.
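
For example, to stack all data variables of a cube along a new "var" dimension:

    from xcube.api import vars_to_dim

    stacked = vars_to_dim(cube, dim_name="var", var_name="data")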

xcube.api.chunk_dataset(dataset: xarray.core.dataset.Dataset, chunk_sizes: Dict[str, int] = None, format_name: str = None) → xarray.core.dataset.Dataset

Chunk dataset and update encodings for given format.

Parameters
  • dataset – input dataset

  • chunk_sizes – mapping from dimension name to new chunk size

  • format_name – format, e.g. “zarr” or “netcdf4”

Returns

the re-chunked dataset
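
A sketch with placeholder chunk sizes for a dataset that is to be written in ZARR format:

    from xcube.api import chunk_dataset

    rechunked = chunk_dataset(dataset,
                              chunk_sizes=dict(time=1, lat=256, lon=256),
                              format_name="zarr")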

xcube.api.unchunk_dataset(dataset_path: str, var_names: Sequence[str] = None, coords_only: bool = False)

Unchunk dataset variables in-place.

Parameters
  • dataset_path – Path to ZARR dataset directory.

  • var_names – Optional list of variable names.

  • coords_only – Un-chunk coordinate variables only.
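
For example, to consolidate only the coordinate chunks of an existing ZARR cube in place (the path is a placeholder):

    from xcube.api import unchunk_dataset

    unchunk_dataset("my-cube.zarr", coords_only=True)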

Cube subsetting

xcube.api.select_vars(dataset: xarray.core.dataset.Dataset, var_names: Collection[str] = None) → xarray.core.dataset.Dataset

Select data variables from the given dataset and create a new dataset.

Parameters
  • dataset – The dataset from which to select variables.

  • var_names – The names of data variables to select.

Returns

A new dataset. It is empty if var_names is empty. It is the given dataset if var_names is None.
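
For example, to keep only two (hypothetical) variables of a dataset:

    from xcube.api import select_vars

    subset = select_vars(dataset, var_names=["precipitation", "temperature"])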

xcube.api.clip_dataset_by_geometry(dataset: xarray.core.dataset.Dataset, geometry: Union[shapely.geometry.base.BaseGeometry, Dict[str, Any], str, Sequence[Union[float, int]]], save_geometry_wkt: Union[str, bool] = False) → Optional[xarray.core.dataset.Dataset]

Spatially clip a dataset according to the bounding box of a given geometry.

Parameters
  • dataset – The dataset

  • geometry – A geometry-like object, see convert_geometry().

  • save_geometry_wkt – If the value is a string, the effective intersection geometry is stored as a Geometry WKT string in the global attribute named by save_geometry_wkt. If the value is True, the name “geometry_wkt” is used.

Returns

The spatial subset of the dataset, or None if the bounding box of the dataset has no intersection or a zero-area intersection with the bounding box of the geometry.
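
A sketch that clips a dataset to the bounding box of a (placeholder) box geometry:

    from xcube.api import clip_dataset_by_geometry

    clipped = clip_dataset_by_geometry(dataset, geometry=(0.0, 50.0, 5.0, 52.5))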

Cube masking

xcube.api.mask_dataset_by_geometry(dataset: xarray.core.dataset.Dataset, geometry: Union[shapely.geometry.base.BaseGeometry, Dict[str, Any], str, Sequence[Union[float, int]]], excluded_vars: Sequence[str] = None, no_clip: bool = False, save_geometry_mask: Union[str, bool] = False, save_geometry_wkt: Union[str, bool] = False) → Optional[xarray.core.dataset.Dataset]

Mask a dataset according to the given geometry. The cells of the variables of the returned dataset will have NaN values where their spatial coordinates do not intersect the given geometry.

Parameters
  • dataset – The dataset

  • geometry – A geometry-like object, see convert_geometry().

  • excluded_vars – Optional sequence of names of data variables that should not be masked (but still may be clipped).

  • no_clip – If True, the function will not clip the dataset before masking, that is, the returned dataset will have the same dimension sizes as the given dataset.

  • save_geometry_mask – If the value is a string, the effective geometry mask array is stored as a 2D data variable named by save_geometry_mask. If the value is True, the name “geometry_mask” is used.

  • save_geometry_wkt – If the value is a string, the effective intersection geometry is stored as a Geometry WKT string in the global attribute named by save_geometry_wkt. If the value is True, the name “geometry_wkt” is used.

Returns

The spatial subset of the dataset, or None if the bounding box of the dataset has no intersection or a zero-area intersection with the bounding box of the geometry.
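
A sketch that masks a dataset with a WKT polygon (the polygon is a placeholder) and stores the effective mask as an extra variable:

    from xcube.api import mask_dataset_by_geometry

    masked = mask_dataset_by_geometry(dataset,
                                      geometry="POLYGON ((0 50, 5 50, 5 52.5, 0 52.5, 0 50))",
                                      save_geometry_mask=True)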

class xcube.api.MaskSet(flag_var: xarray.core.dataarray.DataArray)

A set of mask variables derived from a variable flag_var with CF attributes “flag_masks” and “flag_meanings”.

Each mask is represented by an xarray.DataArray, has the name of the flag, is of type numpy.uint8, and has the dimensions of the given flag_var.

Parameters

flag_var – an xarray.DataArray that defines flag values. The CF attributes “flag_masks” and “flag_meanings” are expected to exist and be valid.
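
A sketch, assuming the dataset has a flag variable named quality_flags that carries a flag meaning "land"; each derived mask carries the name of its flag:

    from xcube.api import MaskSet

    mask_set = MaskSet(dataset.quality_flags)   # hypothetical flag variable
    land_mask = mask_set.land                   # mask DataArray for the "land" flag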

Cube optimization

xcube.api.optimize_dataset(input_path: str, output_path: str = None, in_place: bool = False, unchunk_coords: bool = False, exception_type: Type[Exception] = <class 'ValueError'>)

Optimize a dataset for faster access.

Reduces the number of metadata and coordinate data files in the xcube dataset given by input_path. Consolidated cubes open much faster from remote locations, e.g. in object storage, because far fewer HTTP requests are required to fetch the initial cube metadata. To this end, the function merges all metadata files into a single top-level JSON file “.zmetadata”. If unchunk_coords is set, it also removes any chunking of coordinate variables so that each of them comprises a single binary data file instead of one file per data chunk. The primary use of this function is to optimize data cubes for cloud object storage. The function currently works only for data cubes using the ZARR format.

Parameters
  • input_path – Path to input dataset with ZARR format.

  • output_path – Path to output dataset with ZARR format. May contain the “{input}” template string, which is replaced by the input path’s file name without the file name extension.

  • in_place – Whether to modify the dataset in place. If False, a copy is made and output_path must be given.

  • unchunk_coords – Whether to also consolidate coordinate chunk files.

  • exception_type – Type of exception to be used on value errors.
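
A minimal sketch with placeholder paths:

    from xcube.api import optimize_dataset

    optimize_dataset("my-cube.zarr",
                     output_path="my-cube-optimized.zarr",
                     unchunk_coords=True)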

Cube metadata

xcube.api.update_dataset_attrs(dataset: xarray.core.dataset.Dataset, global_attrs: Dict[str, Any] = None, update_existing: bool = False, in_place: bool = False) → xarray.core.dataset.Dataset

Update the spatio-temporal CF/THREDDS attributes of the given dataset according to the spatio-temporal coordinate variables time, lat, and lon.

Parameters
  • dataset – The dataset.

  • global_attrs – Optional global attributes.

  • update_existing – If True, any existing attributes will be updated.

  • in_place – If True, dataset will be modified in place and returned.

Returns

A new dataset, if in_place is False (default), else the passed and modified dataset.
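
For example, to add a (placeholder) title and refresh the spatio-temporal attributes of a dataset:

    from xcube.api import update_dataset_attrs

    updated = update_dataset_attrs(dataset,
                                   global_attrs=dict(title="My Cube"),
                                   update_existing=True)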

xcube.api.update_dataset_spatial_attrs(dataset: xarray.core.dataset.Dataset, update_existing: bool = False, in_place: bool = False) → xarray.core.dataset.Dataset

Update spatial CF/THREDDS attributes of given dataset.

Parameters
  • dataset – The dataset.

  • update_existing – If True, any existing attributes will be updated.

  • in_place – If True, dataset will be modified in place and returned.

Returns

A new dataset, if in_place is False (default), else the passed and modified dataset.

xcube.api.update_dataset_temporal_attrs(dataset: xarray.core.dataset.Dataset, update_existing: bool = False, in_place: bool = False) → xarray.core.dataset.Dataset

Update temporal CF/THREDDS attributes of given dataset.

Parameters
  • dataset – The dataset.

  • update_existing – If True, any existing attributes will be updated.

  • in_place – If True, dataset will be modified in place and returned.

Returns

A new dataset, if in_place is False (default), else the passed and modified dataset.

Cube verification

xcube.api.assert_cube(dataset: xarray.core.dataset.Dataset, name=None) → xarray.core.dataset.Dataset

Assert that the given dataset is a valid xcube dataset.

Parameters
  • dataset – The dataset to be validated.

  • name – Optional parameter name.

Raises

ValueError, if dataset is not a valid xcube dataset

xcube.api.verify_cube(dataset: xarray.core.dataset.Dataset) → List[str]

Verify the given dataset for being a valid xcube dataset.

The tool verifies that the dataset

  • defines the dimensions “time”, “lat”, “lon”;

  • has corresponding “time”, “lat”, “lon” coordinate variables and that they are valid, e.g. 1-D, non-empty, using correct units;

  • has valid bounds variables for “time”, “lat”, “lon” coordinate variables, if any;

  • has any data variables and that they are valid, e.g. min. 3-D, all have same dimensions, have at least dimensions “time”, “lat”, “lon”.

Returns a list of issues, which is empty if dataset is a valid xcube dataset.

Parameters

dataset – A dataset to be verified.

Returns

List of issues or empty list.
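
A sketch combining both functions; verify_cube reports issues as strings, while assert_cube raises ValueError if the dataset is not a valid cube:

    from xcube.api import assert_cube, verify_cube

    issues = verify_cube(dataset)
    if issues:
        print("\n".join(issues))

    cube = assert_cube(dataset)   # raises ValueError for an invalid cube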

Multi-resolution pyramids

xcube.api.compute_levels(dataset: xarray.core.dataset.Dataset, spatial_dims: Tuple[str, str] = None, spatial_shape: Tuple[int, int] = None, spatial_tile_shape: Tuple[int, int] = None, var_names: Sequence[str] = None, num_levels_max: int = None, post_process_level: Callable[[xarray.core.dataset.Dataset, int, int], Optional[xarray.core.dataset.Dataset]] = None, progress_monitor: Callable[[xarray.core.dataset.Dataset, int, int], Optional[xarray.core.dataset.Dataset]] = None) → List[xarray.core.dataset.Dataset]

Transform the given dataset into the levels of a multi-level pyramid with spatial resolution decreasing by a factor of two in both spatial dimensions.

It is assumed that the spatial dimensions of each variable are the inner-most, that is, the last two elements of a variable’s shape provide the spatial dimension sizes.

Parameters
  • dataset – The input dataset to be turned into a multi-level pyramid.

  • spatial_dims – If given, only variables are considered whose last two dimension elements match the given spatial_dims.

  • spatial_shape – If given, only variables are considered whose last two shape elements match the given spatial_shape.

  • spatial_tile_shape – If given, chunking will match the provided spatial_tile_shape.

  • var_names – Variables to consider. If None, all variables with at least two dimensions are considered.

  • num_levels_max – If given, the maximum number of pyramid levels.

  • post_process_level – If given, the function will be called for each level and must return a dataset.

  • progress_monitor – If given, the function will be called for each level.

Returns

A list of dataset instances representing the multi-level pyramid.

xcube.api.read_levels(dir_path: str, progress_monitor: Callable[[xarray.core.dataset.Dataset, int, int], Optional[xarray.core.dataset.Dataset]] = None) → List[xarray.core.dataset.Dataset]

Read the levels of a multi-level pyramid with spatial resolution decreasing by a factor of two in both spatial dimensions.

Parameters
  • dir_path – The directory path.

  • progress_monitor – An optional progress monitor.

Returns

A list of dataset instances representing the multi-level pyramid.

xcube.api.write_levels(output_path: str, dataset: xarray.core.dataset.Dataset = None, input_path: str = None, link_input: bool = False, progress_monitor: Callable[[xarray.core.dataset.Dataset, int, int], Optional[xarray.core.dataset.Dataset]] = None, **kwargs) → List[xarray.core.dataset.Dataset]

Transform the dataset, given either as a dataset instance or by an input_path string, into the levels of a multi-level pyramid with spatial resolution decreasing by a factor of two in both spatial dimensions, and write them to output_path.

One of dataset and input_path must be given.

Parameters
  • output_path – Output path

  • dataset – Dataset to be converted and written as levels.

  • input_path – Input path to a dataset to be transformed and written as levels.

  • link_input – Just link the dataset at level zero instead of writing it.

  • progress_monitor – An optional progress monitor.

  • kwargs – Keyword-arguments accepted by the compute_levels() function.

Returns

A list of dataset instances representing the multi-level pyramid.
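
A sketch that writes a pyramid for an in-memory dataset; the output path and tile shape are placeholders, and spatial_tile_shape is forwarded to compute_levels():

    from xcube.api import write_levels

    levels = write_levels("my-cube.levels",
                          dataset=dataset,
                          spatial_tile_shape=(512, 512))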

Utilities

xcube.api.convert_geometry(geometry: Union[shapely.geometry.base.BaseGeometry, Dict[str, Any], str, Sequence[Union[float, int]], None]) → Optional[shapely.geometry.base.BaseGeometry]

Convert a geometry-like object into a shapely geometry object (shapely.geometry.BaseGeometry).

A geometry-like object may be

  • any shapely geometry object,

  • a dictionary that can be serialized to valid GeoJSON,

  • a WKT string,

  • a box given by a string of the form “<x1>,<y1>,<x2>,<y2>” or by a sequence of four numbers x1, y1, x2, y2,

  • a point given by a string of the form “<x>,<y>” or by a sequence of two numbers x, y.

Handling of geometries crossing the antimeridian:

  • If box coordinates are given, it is allowed to pass x1, x2 where x1 > x2, which is interpreted as a box crossing the antimeridian. In this case the function splits the box along the antimeridian and returns a multi-polygon.

  • In all other cases, 2D geometries are assumed to not cross the antimeridian at all.

Parameters

geometry – A geometry-like object

Returns

Shapely geometry object or None.
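
For example (the coordinates are arbitrary):

    from xcube.api import convert_geometry

    box = convert_geometry((0.0, 50.0, 5.0, 52.5))   # box from four numbers
    point = convert_geometry("POINT (2.5 51.0)")     # WKT string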