[xcube.core.store]: https://github.com/dcs4cop/xcube/tree/main/xcube/core/store
[xcube Dataset Convention]: ./cubespec.md
[xcube Multi-Level Dataset Convention]: ./mldatasets.md
[xcube Data Store Conventions]: ./storeconv.md
[xarray.Dataset]: https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html
[geopandas.GeoDataFrame]: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html
[Dask arrays]: https://docs.dask.org/en/stable/array.html
[JSON Object Schema]: https://json-schema.org/understanding-json-schema/reference/object.html
[setuptools entry point]: https://setuptools.pypa.io/en/latest/userguide/entry_point.html

[CDSE STAC API]: https://browser.stac.dataspace.copernicus.eu
[Copernicus Marine Service]: https://marine.copernicus.eu/
[Copernicus Climate Data Store]: https://cds.climate.copernicus.eu/
[Copernicus Land Monitoring Service]: https://land.copernicus.eu/en
[EOPF Sentinel Zarr Samples]: https://zarr.eopf.copernicus.eu/
[ESA Climate Data Centre]: https://climate.esa.int/en/odp/
[ESA Soil Moisture and Ocean Salinity]: https://earth.esa.int/eogateway/missions/smos
[Global Ecosystem Dynamics Investigation]: https://gedi.umd.edu/
[gedidb]: https://gedidb.readthedocs.io/en/latest/
[Open Data Portal]: https://climate.esa.int/en/data/#/dashboard
[SpatioTemporal Asset Catalogs]: https://stacspec.org/en/
[Sentinel Hub]: https://www.sentinel-hub.com/
[Zenodo]: https://zenodo.org/

[xcube-cci]: https://github.com/dcs4cop/xcube-cci
[xcube-cds]: https://github.com/dcs4cop/xcube-cds
[xcube-clms]: https://github.com/xcube-dev/xcube-clms
[xcube-cmems]: https://github.com/dcs4cop/xcube-cmems
[xcube-gedidb]: https://github.com/xcube-dev/xcube-gedidb
[xcube-sh]: https://github.com/dcs4cop/xcube-sh
[xcube-smos]: https://github.com/dcs4cop/xcube-smos
[xcube-stac]: https://github.com/xcube-dev/xcube-stac
[xcube-zenodo]: https://github.com/xcube-dev/xcube-zenodo

[API reference]: https://xcube.readthedocs.io/en/latest/api.html#data-store-framework
[DataStore]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.store.DataStore
[MutableDataStore]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.store.MutableDataStore
[DataOpener]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.store.DataOpener
[DataWriter]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.store.DataOpener
[DataDescriptor]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.store.DataDescriptor
[DatasetDescriptor]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.store.DatasetDescriptor
[GenericZarrStore]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.zarrstore.GenericZarrStore
[MultiLevelDataset]: https://xcube.readthedocs.io/en/latest/api.html#xcube.core.mldataset.MultiLevelDataset
[Server]: https://xcube.readthedocs.io/en/latest/cli/xcube_serve.html

# Data Access

In xcube, data cubes are raster datasets that are basically a collection 
of N-dimensional geo-physical variables represented by
[xarray.Dataset] Python objects (see also [xcube Dataset Convention]). 
Data cubes may be provided by a variety of sources and may be stored using 
different data formats. In the simplest case you have a NetCDF file or a 
Zarr directory in your local filesystem that already represents a data cube. 
Data cubes may also be stored on AWS S3 or Google Cloud Storage using the 
Zarr format. Sometimes a set of NetCDF or GeoTIFF files in some storage 
must first be concatenated to form a data cube. In other cases, data cubes 
can be generated on-the-fly by suitable requests to some cloud-hosted data 
API such as the [ESA Climate Data Centre] or [Sentinel Hub].

## Data Store Framework

The _xcube data store framework_ provides a simple and consistent Python 
interface that is used to open [xarray.Dataset] and other data objects from
_data stores_ which abstract away the individual data sources, protocols, 
formats and hides involved data processing steps. For example, the following 
two lines open a data cube from the [ESA Climate Data Centre] comprising the 
essential climate variable Sea Surface Temperature (SST):

```python
store = new_data_store("cciodp")
cube = store.open_data("esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.1-1.r1")
```

Often, and in the example above, data stores create data cube _views_ on
a given data source. That is, the actual data arrays are subdivided into 
chunks and each chunk is fetched from the source in a "lazy" manner.
In such cases, the [xarray.Dataset]'s variables are backed by [Dask arrays].
This allows data cubes to be virtually of any size.

Data stores can provide the data using different Python in-memory 
representations or data types.
The most common representation for a data cube is an [xarray.Dataset]
instance, multi-resolution data cubes would be represented as a xcube 
[MultiLevelDataset] instance 
(see also [xcube Multi-Level Dataset Convention]). 
Vector data is usually provided as an instance of [geopandas.GeoDataFrame].

Data stores can also be writable. All read-only data stores share the same 
functional interface and so do writable
data stores. Of course, different data stores will have different
configuration parameters. Also, the parameters passed to the `open_data()`
method, or respectively the `write_data()` method, may change based on the 
store's capabilities. 
Depending on what is offered by a given data store, also the parameters 
passed to the `open_data()` method may change. 

The xcube data store framework is exported from the [xcube.core.store] 
package, see also its [API reference].

The [DataStore] abstract base class is the primary user interface for 
accessing data in xcube. The most important operations of a data store are:

* `list_data_ids()` - enumerate the datasets of a data store by returning
  their data identifiers;
* `describe_data(data_id)` - describe a given dataset in terms of its metadata
  by returning a specific [DataDescriptor], e.g., a [DatasetDescriptor];
* `search_data(...)` - search for datasets in the data store and return a
   [DataDescriptor] iterator;
* `open_data(data_id, ...)` - open a given dataset and return, e.g., 
  an [xarray.Dataset] instance.

The [MutableDataStore] abstract base class represents a writable data 
store and extends [DataStore] by the following operations:

* `write_data(dataset, data_id, ...)` - write a dataset to the data store;
* `delete_data(data_id)` - delete a dataset from the data store;

Above, the ellipses `...` are used to indicate store-specific parameters
that are passed as keyword-arguments. For a given data store instance, 
it is not obvious what are parameters are allowed. Therefore, data stores 
provide a programmatic way to describe the allowed parameters for the  
operations of a given data store by the means of a parameter schema:

* `get_open_data_params_schema()` - describes parameters of `open_data()`;
* `get_search_data_params_schema()` - describes parameters of `search_data()`;
* `get_write_data_params_schema()` - describes parameters of `write_data()`.

All operations return an instance of a [JSON Object Schema].
The JSON object's properties describe the set of allowed and required 
parameters as well as the type and value range of each parameter. The 
schemas are also used internally to validate the parameters passed by the 
user.

xcube comes with a predefined set of writable, filesystem-based data stores.
Since data stores are xcube extensions, additional data stores can be added 
by xcube plugins. The data store framework provides a number of global 
functions that can be used to access the available data stores: 

* `find_data_store_extensions() -> list[Extension]` - get a list of 
  xcube data store extensions;
* `new_data_store(store_id, ...) -> DataStore` - instantiate a data store with 
  store-specific parameters;
* `get_data_store_params_schema(store_id) -> Schema` - describe the 
  store-specific parameters that must/can be passed to `new_data_store()` 
  as [JSON Object Schema].

The following example outputs all installed data stores:

```python
from xcube.core.store import find_data_store_extensions

for ex in find_data_store_extensions():
    store_id = ex.name
    store_md = ex.metadata
    print(store_id, "-", store_md.get("description"))
```

If one of the installed data stores is, e.g. `sentinelhub`, you could
further introspect its specific parameters and datasets as shown in the 
following example:  

```python
from xcube.core.store import get_data_store_params_schema
from xcube.core.store import new_data_store

store_schema = get_data_store_params_schema("sentinelhub")
store = new_data_store("sentinelhub",
                       # The following parameters are specific to the 
                       # "sentinelhub" data store. 
                       # Refer to the store_schema.
                       client_id="YOURID",
                       client_secret="YOURSECRET",
                       num_retries=250,
                       enable_warnings=True)

data_ids = store.list_data_ids()
# Among others, we find "S2L2A" in data_ids 

open_schema = store.get_open_data_params_schema("S2L2A")
cube = store.open_data("S2L2A",
                       # The following parameters are specific to 
                       # "sentinelhub" datasets, such as "S2L2A". 
                       # Refer to the open_schema.
                       variable_names=["B03", "B06", "B8A"],             
                       bbox=[9, 53, 20, 62],             
                       spatial_res=0.025,             
                       crs="WGS-84",              
                       time_range=["2022-01-01", "2022-01-05"],             
                       time_period="1D")
```


## Available Data Stores

This sections lists briefly the official data stores available for xcube.
We provide the store identifier, list the store parameters, and list 
the common parameters used to open data cubes, i.e., [xarray.Dataset] 
instances.

Note that in some data stores, the open parameters may differ from dataset to 
dataset depending on the actual dataset layout, coordinate references 
system or data type. Some data stores may also provide vector data.

For every data store we also provide a dedicated example Notebook that 
demonstrates its specific usage in 
[examples/notebooks/datastores](https://github.com/dcs4cop/xcube/tree/main/examples/notebooks/datastores).

Use `list_data_store_ids()` to list all data stores available in your current Python environment.
The output depends on the installed xcube plugins. 

```python
from xcube.core.store import list_data_store_ids

list_data_store_ids()
```

### Filesystem-based data stores

The following filesystem-based data stores are available in xcube:

* `"file"` for the local filesystem;
* `"s3"` for AWS S3 compatible object storage; 
* `"abfs"` for Azure blob storage;
* `"memory"` for mimicking an in-memory filesystem;
* `"https"` for https protocols;
* `"ftp"` for FTP server;
* `"reference"` for read-only `fsspec` reference file systems.

All filesystem-based data store have the following parameters:

* `root: str` - The root directory of the store in the filesystem. Defaults to `''`.
* `max_depth: int` - Maximum directory traversal depth. Defaults to `1`.
* `read_only: bool` - Whether this store is read-only. Defaults to `False`.
* `includes: list[str]` - A list of paths to include into the store. 
  May contain wildcards `*` and `?`. Defaults to `UNDEFINED`.
* `excludes: list[str]` - A list of paths to exclude from the store. 
  May contain wildcards `*` and `?`. Defaults to `UNDEFINED`.
* `storage_options: dict[str, any]` - Filesystem-specific options.

The `reference` store has the following additional parameters:
* `refs: list` - List of references to use for this instance. Items can be:
    * A path or URL to a reference JSON file, or
    * A dictionary with:
        * `ref_path: str` - Path or URL to the reference file. Required.
        * `data_id: str` - Optional identifier for the referenced data.
        * `data_descriptor: dict` - Optional metadata or descriptor.
* `target_protocol: str` - Target Protocol. If not provided,
  derived from the given path.
* `target_options: str` - Additional options for loading reference files. 
* `remote_protocol: str` - Protocol of the filesystem on which the references. Derived from 
   the first reference with a protocol, if not given.
   will be evaluated.
* `remote_options: str` - Additional options for loadings reference files.
* `max_gap: int` - Max byte-range gap allowed when merging concurrent requests.
* `max_block: int` - Max size of merged byte ranges.
* `cache_size: int` - Max size of LRU cache.

The parameter `storage_options` is filesystem-specific. Valid 
`storage_options` for all filesystem data stores are: 

* `use_listings_cache: bool` 
* `listings_expiry_time: float` 
* `max_paths: int` 
* `skip_instance_cache: bool` 
* `asynchronous: bool` 

The following `storage_options` can be used for the `file` data store:

* `auto_mkdirs: bool` - Whether, when opening a file, the directory 
  containing it should be created (if it doesn't already exist).

The following `storage_options` can be used for the `s3` data store:

* `anon: bool` - Whether to anonymously connect to AWS S3.
* `key: str` - AWS access key identifier.
* `secret: str` - AWS secret access key.
* `token: str` - Session token.
* `use_ssl: bool` - Whether to use SSL in connections to S3; 
  may be faster without, but insecure. 
  Defaults to `True`.
* `requester_pays: bool` - If "RequesterPays" buckets are supported.
  Defaults to `False`.
* `s3_additional_kwargs: dict` - parameters that are used when calling 
  S3 API methods. Typically, used for things like "ServerSideEncryption".
* `client_kwargs: dict` - Parameters for the botocore client.

The following `storage_options` can be used for the `abfs` data store:

* `anon: bool` - Whether to anonymously connect to Azure Blob Storage.
* `account_name: str` - Azure storage account name.
* `account_key: str` - Azure storage account key.
* `connection_string: str` - Connection string for Azure blob storage.

The following `storage_options` can be used for the `ftp` data store:

* `host` - Remote server name/ip 
* `port` - FTP Port, min: 0, max: 65535,`default`: 21
* `username` - User's identifier, if using
* `password` - User's password, if using

All filesystem data stores can open datasets from various data formats. 
Datasets in Zarr, GeoTIFF / COG, or NetCDF format will be provided either by
[xarray.Dataset] or xcube [MultiLevelDataset] instances.
Datasets stored in GeoJSON or ESRI Shapefile will yield 
[geopandas.GeoDataFrame] instances.

Common parameters for opening [xarray.Dataset] instances:

* `cache_size: int` - Defaults to `UNDEFINED`.
* `group: str` - Group path. (a.k.a. path in zarr terminology.). 
  Defaults to `UNDEFINED`.
* `chunks: dict[str, int | str]` - Optional chunk sizes along each dimension. 
  Chunk size values may be None, "auto" or an integer value. 
  Defaults to `UNDEFINED`.
* `decode_cf: bool` - Whether to decode these variables, assuming they 
  were saved according to CF conventions. Defaults to `True`.
* `mask_and_scale: bool` - If True, replace array values equal to 
  attribute "_FillValue" with NaN.  Use "scale_factor" and "add_offset" 
  attributes to compute actual values.. Defaults to `True`.
* `decode_times: bool` - If True, decode times encoded in the standard 
  NetCDF datetime format into datetime objects. 
  Otherwise, leave them encoded as numbers.. Defaults to `True`.
* `decode_coords: bool` - If True, decode the "coordinates" 
  attribute to identify coordinates in the resulting dataset. 
  Defaults to `True`.
* `drop_variables: list[str]` - List of names of variables to be dropped. 
  Defaults to `UNDEFINED`.
* `consolidated: bool` - Whether to open the store using Zarr's 
  consolidated metadata capability. Only works for stores that have already 
  been consolidated. Defaults to `False`.
* `log_access: bool` - Defaults to `False`.

### Copernicus Climate Data Store `cds`

The data store `cds` provides datasets of the [Copernicus Climate Data Store].

This data store is provided by the xcube plugin [xcube-cds].
You can install it using `conda install -c conda-forge xcube-cds`.

Data store parameters:

* `cds_api_key: str` - User API key for Copernicus Climate Data Store.
* `endpoint_url: str` - API endpoint URL.
* `num_retries: int` - Defaults to `200`.
* `normalize_names: bool` - Defaults to `False`.

Common parameters for opening [xarray.Dataset] instances:

* `bbox: (float, float, float, float)` - Bounding box in geographical 
  coordinates. 
* `time_range: (str, str)` - Time range.
* `variable_names: list[str]` - List of names of variables to be included. 
  Defaults to all.
* `spatial_res: float` - Spatial resolution. Defaults to `0.1`.

### Copernicus Marine Service `cmems`

The data store `cmems` provides datasets of the [Copernicus Marine Service]. 

This data store is provided by the xcube plugin [xcube-cmems].
You can install it using `conda install -c conda-forge xcube-cmems`.

Data store parameters:

* `cmems_username: str` - CMEMS API username
* `cmems_password: str` - CMEMS API password
* `cas_url: str` - Defaults to `'https://cmems-cas.cls.fr/cas/login'`.
* `csw_url: str` - Defaults to `'https://cmems-catalog-ro.cls.fr/geonetwork/srv/eng/csw-MYOCEAN-CORE-PRODUCTS?'`.
* `databases: str` - One of `['nrt', 'my']`.
* `server: str` - Defaults to `'cmems-du.eu/thredds/dodsC/'`.

Common parameters for opening [xarray.Dataset] instances:

* `variable_names: list[str]` - List of variable names.
* `time_range: [str, str]` - Time range.

### Copernicus Land Monitoring Service `clms`

The data store `clms` provides datasets of the [Copernicus Land Monitoring Service].

This data store is provided by the xcube plugin [xcube-clms].
You can install it using `conda install -c conda-forge xcube-clms`.

Data store parameters:

* `credentials: dict`: CLMS API credentials that can be obtained following the steps outlined
  [here](https://eea.github.io/clms-api-docs/authentication.html).
  These are the credentials parameters:

    * `client_id: str` - Required.
    * `issued: str`
    * `private_key: str` - Required.
    * `key_id: str`
    * `title: str`
    * `token_uri: str` - Required.
    * `user_id: str` - Required.

* `cache_store_id: str` - Store ID of cache data store. Defaults to `file`.
* `cache_store_params: dict` - Store parameters of a filesystem-based data
  store.

Before opening a specific dataset from CLMS, it's required to preload the data first.
Preloading lets you create data requests ahead of time, which may sit in a queue before
being processed. Once processed, the data is downloaded as zip files, unzipped,
extracted to a cache, and prepared for use. After this, it can be accessed through the
cache data store.

The preload parameters are:
* `blocking: bool` - Switch to make the preloading process blocking or non-blocking.
  If True, the preloading process blocks the script. Defaults to `True`.
* `silent: bool` - Silence the output of Preload API. If True, no preload state
  output is given. Defaults to `False`.
* `cleanup: bool` - Cleanup the download directory before and after the
  preload job and the cache directory when preload_handle.close() is called.
  Defaults to `True`.

Its common dataset open parameters for opening [xarray.Dataset] instances are the same
as for the filesystem-based data stores described above.

### EOPF Sample Service `eopf-zarr`

The data store `eopf-zarr` provides access to the [EOPF Sentinel Zarr Samples] as an
analysis-ready datacube (ARDC).

This data store is provided by the xcube plugin `xcube-eopf`.
You can install it using `conda install -c conda-forge xcube-eopf`.

No data store parameters needed.

Common parameters for opening [xarray.Dataset] instances:

* `bbox: ?[float|int, float|int, float|int, float|int]?`- Bounding box 
   ["west", "south", "est", "north"] in CRS coordinates. 
* `time_range: [str, str]` - Temporal extent ["YYYY-MM-DD", "YYYY-MM-DD"]. 
* `spatial_res: int|float` - Spatial resolution in meter of degree (depending on the CRS). 
* `crs: str` - Coordinate reference system (e.g. `"EPSG:4326"`). 
* `variables: ?str | list[str]?` - Variables to include in the dataset. Can be a name 
   or regex pattern or iterable of the latter. 
* `query: Any (not specified)` - Additional query options for filtering STAC Items by 
   properties. See [STAC Query Extension](https://github.com/stac-api-extensions/query) 
   for details.

### ESA Climate Data Centre (ESA CCI) `cciodp`, `ccizarr`, `esa-cci-kc`

Three data stores are provided by the xcube plugin [xcube-cci].
You can install the plugin using `conda install -c conda-forge xcube-cci`.

#### `cciodp`

The data store `cciodp` provides the datasets of
the [ESA Climate Data Centre].

Data store parameters:

* `endpoint_url: str` - Defaults to
  `'https://archive.opensearch.ceda.ac.uk/opensearch/request'`.
* `endpoint_description_url: str` - Defaults to
  `'https://archive.opensearch.ceda.ac.uk/opensearch/description.xml?parentIdentifier=cci'`.
* `enable_warnings: bool` - Whether to output warnings. Defaults to `False`.
* `num_retries: int` - Number of retries when requesting data fails.
  Defaults to `200`.
* `retry_backoff_max: int` - Defaults to `40`.
* `retry_backoff_base: float` - Defaults to `1.001`.

Common parameters for opening [xarray.Dataset] instances:

* `variable_names: list[str]` - List of variable names. Defaults to all.
* `bbox: (float, float, float, float)` - Bounding box in geographical
  coordinates.
* `time_range: (str, str)` - Time range.
* `normalize_data: bool` - Whether to normalize and sanitize the data.
  Defaults to `True`.

#### `ccizarr`

A subset of the datasets of the `cciodp` store have been made available
using the Zarr format using the data store `ccizarr`. It provides
much better data access performance.

It has no dedicated data store parameters.

Its common dataset open parameters for opening [xarray.Dataset] instances are
the same as for the filesystem-based data stores described above.

#### `esa-cci-kc`

The data store `esa-cci-kc` accesses datasets that are offered by the [Open Data Portal]
via the references format.

Data store parameters are the same as for the filesystem-based `reference` store.

Its common dataset open parameters for opening [xarray.Dataset] instances are
the same as for the filesystem-based data stores described above.

### ESA SMOS `smos`

The data store `smos` provides L2C datasets of the [ESA Soil Moisture and Ocean Salinity]
mission.

This data store is provided by the xcube plugin [xcube-smos]. You can install it using
`conda install -c conda-forge xcube-smos`.

Data store parameters:
* `source_path: str` - Path or URL into SMOS archive filesystem.
* `source_protocol: str`: Protocol name for the SMOS archive filesystem.
* `source_storage_options: dict`: Storage options for the SMOS archive filesystem. See
  fsspec documentation for specific filesystems. Any option can be overriden by
  passing it as additional data store parameter.
* `cache_path: str`: Path to local cache directory. Must be given, if file caching
  is desired.
* `xarray_kwargs: dict`: Extra keyword arguments accepted by `xarray.open_dataset`.

Common parameters for opening [xarray.Dataset] instances:

* `time_range: (str, str)` - Time range given as pair of start and stop dates.
  Format: `YYYY-MM-DD`. Required.
* `bbox: (float, float, float, float)` - Bounding box in geographical
  coordinates.
* `res_level: int` - Spatial resolution level in the range 0 to 4. Zero refers to
  the max resolution of 0.0439453125 degrees.

### Global Ecosystem Dynamics Investigation `gedidb`

The data store `gedidb` provides access to [Global Ecosystem Dynamics Investigation] 
(GEDI) data. The store is developed using the API from [gedidb] which is licensed under
[European Union Public License 1.2](https://github.com/simonbesnard1/gedidb/blob/main/LICENSE).

This data store is provided by the xcube plugin [xcube-gedidb]. Due to the 
unavailability of `gedidb` as a conda package, `xcube-gedidb` is packaged via PyPi. 
To install it, please make sure you have an activated conda environment created from the 
[environment.yml](https://github.com/xcube-dev/xcube-gedidb/blob/main/environment.yml), 
and then do `pip install xcube-gedi`.

It has no dedicated data store parameters.

This data store can be requested to open the datasets in one of two ways:

- request all available data within a **bounding box** by specifying a `bbox` in 
  the `open_data` method.
- request all available data around a given **point** by specifying a `point` in 
  the `open_data` method.

Parameters for opening [xarray.Dataset] instances:

Either  
* `bbox: (float, float, float, float)` - A bounding box in the form of 
  `(xmin, ymin, xmax, ymax)`. Required.

Or  
* `point: (float, float)` - Reference point for nearest query. Required
* `num_shots: int` - Number of shots to retrieve. Defaults to `10`.
* `radius: float` - Radius in degrees around the point Defaults to`0.1`.

Common:  
* `time_range: (str, str)` - Time range. Required.
* `variables: list[str]` - List of variables to retrieve from the database.

### Sentinel Hub API 

The data store `sentinelhub` provides the datasets of the 
[Sentinel Hub] API. 

This data store is provided by the xcube plugin [xcube-sh].
You can install it using `conda install -c conda-forge xcube-sh`.

Data store parameters:

* `client_id: str` - Sentinel Hub API client identifier
* `client_secret: str` - Sentinel Hub API client secret
* `api_url: str` - Sentinel Hub API URL. 
  Defaults to `'https://services.sentinel-hub.com'`.
* `oauth2_url: str` - Sentinel Hub API authorisation URL. 
  Defaults to `'https://services.sentinel-hub.com/oauth'`.
* `enable_warnings: bool` - Whether to output warnings. 
  Defaults to `False`.
* `error_policy: str` - Policy for errors while requesting data. 
  Defaults to `'fail'`.
* `num_retries: int` - Number of retries when requesting data fails. 
  Defaults to `200`.
* `retry_backoff_max: int` - Defaults to `40`.
* `retry_backoff_base: number` - Defaults to `1.001`.

Common parameters for opening [xarray.Dataset] instances:

* `bbox: (float, float, float, float)` - Bounding box in coordinate 
  units of `crs`. Required. 
* `crs: str` - Defaults to `'WGS84'`.
* `time_range: (str, str)` - Time range. Required. 
* `variable_names: list[str]` - List of variable names. Defaults to all.
* `variable_fill_values: list[float]` - List of fill values 
  according to `variable_names`
* `variable_sample_types: list[str]` - List of sample types 
  according to `variable_names`
* `variable_units: list[str]` - List of sample units 
  according to `variable_names`
* `tile_size: (int, int)` - Defaults to `(1000, 1000)`.
* `spatial_res: float` - Required. 
* `upsampling: str` - Defaults to `'NEAREST'`.
* `downsampling: str` - Defaults to `'NEAREST'`.
* `mosaicking_order: str` - Defaults to `'mostRecent'`.
* `time_period: str` - Defaults to `'1D'`.
* `time_tolerance: str` - Defaults to `'10M'`.
* `collection_id: str` - Name of the collection.
* `four_d: bool` - Defaults to `False`.
* `extra_search_params: dict` - Extra search parameters passed to a 
  catalogue query.
* `max_cache_size: int` - Maximum chunk cache size in bytes.

### SpatioTemporal Asset Catalogs `stac`, `stac-xcube`, `stac-cdse`

The data stores `stac`, `stac-xcube`, and `stac-cdse` provide access to datasets of
the [SpatioTemporal Asset Catalogs].

The three data stores are provided by the xcube plugin [xcube-stac].
You can install it using `conda install -c conda-forge xcube-stac.`


#### `stac`
The data store `stac` provides datasets from a user-defined STAC API.

Specific parameters for this store are:
* `url: str` - URL to STAC catalog. Required.
* `stack_mode: bool` - Stacking of STAC items. Transforms data into analysis-ready
  format. Defaults to `False`.
* `**store_params`: Store parameters to configure the store used to access the data, 
  which are the same as those used for `https` and `s3` stores. The hrefs in the 
  STAC assets determines whether data is accessed via `https` or `s3`. 

#### `stac-xcube`
The data store `stac-xcube` connects to STAC catalogs published on a xcube [Server].

Specific parameters for this store are:
* `url: str` - URL to STAC catalog. Required.
* `stack_mode: bool` - Stacking of STAC items. Transforms data into analysis-ready
  format. Defaults to `False`.
* `**store_params`: Store parameters to configure the `s3` store used to access 
  the data. 

#### `stac-cdse`
The data store `stac-cdse` provides direct access datasets published by the 
[CDSE STAC API].

* `stack_mode: bool` - Stacking of STAC items. Transforms data into analysis-ready
  format. Defaults to `False`. Available for `data_id="sentinel-2-l2"`, which allows to 
  build 3D spatiotemporal data cubes from multiple Sentinel-2 Level-2A tiles. 
  Commen opening parameter:

  * `float, float, float, float)` - Bounding box ["west", "south", "est", "north"] 
    in CRS coordinates. 
  * `time_range: [str, str]`: Temporal extent ["YYYY-MM-DD", "YYYY-MM-DD"]. 
  * `spatial_res: int | float` - Spatial resolution in meter of degree (depending on the CRS). 
  * `crs: str` - Coordinate reference system (e.g. `"EPSG:4326"`).

* `key: str`- S3 key credential for CDSE data access
* `secret: str`- S3 secret credential for CDSE data access. In order to access [EO data via S3 from CDSE](https://documentation.dataspace.copernicus.eu/APIs/S3.html)
one needs to [generate S3 credentials](https://documentation.dataspace.copernicus.eu/APIs/S3.html#generate-secrets).


There are no common parameters for opening datasets with the three stores.
As the available datasets are varying across a wide spectrum of datatypes
no specific opening parameters can be named here. The stores delegate to the
general xcube DataOpener which offers a variety of parameters depending on the
datatype of the dataset.

Use the following function to access the parameters fitting for the dataset of interest:

```python
open_schema = store.get_open_data_params_schema("data_id")
```

### Zenodo `zenodo`

The data store `zenodo` provides access to datasets published on [Zenodo].

This data store is provided by the xcube plugin [xcube-zenodo]. 
You can install it using `conda install -c conda-forge xcube-zenodo`.

Data store parameters:

* `root: str` - Zenodo record ID. Required.
* `cache_store_id: str` - Store ID of cache data store. Defaults to `file`.
* `cache_store_params: dict` - Store parameters of a filesystem-based data store.
   Defaults to: `{"root":"zenodo_cache","max_depth":10}`.

Before opening a specific dataset in .zip format, it's required to preload the data first.
Preloading lets you create data requests ahead of time, which may sit in a queue before
being processed. Once processed, the data is downloaded as zip files, unzipped,
extracted to a cache, and prepared for use. After this, it can be accessed through the
cache data store.

The preload parameters are:
* `blocking: bool` - Switch to make the preloading process blocking or non-blocking.
  If True, the preloading process blocks the script. Defaults to `True`.
* `silent: bool` - Switch to visualize the preloading process. If False, the 
  preloading progress will be visualized in a table. If True, the visualization will 
  be suppressed. Defaults to `True`.

There are no common parameters for opening datasets with the `xcube-zenodo` store.
As the datasets uploaded on Zenodo are varying across a wide spectrum of datatypes 
no specific opening parameters can be named here. `xcube-zenodo` delegates to the
general xcube DataOpener which offers a variety of open parameters depending on the
datatype of the dataset.

Use the following function to access the parameters fitting for the dataset of interest:

```python
open_schema = store.get_open_data_params_schema("data_id")
```

## Developing new data stores

### Implementing the data store

New data stores can be developed by implementing the xcube [DataStore]
interface for read-only data store, or the [MutableDataStore] interface for
writable data stores, and should follow the [xcube Data Store Conventions].

If a data store supports combinations of Python data types, external storages  
types, and/or data formats it should consider the following design pattern:

![DataStore and MutableDataStore](uml/datastore-uml.png)

Here, we implement a dedicated [DataOpener] for a suitable combination of 
supported Python data types, external storages types, and/or data formats.
The [DataStore], which implements the [DataOpener] interface delegates
to specialized [DataOpener] implementations based on the open
parameters passed to the `open_data()` method. The same holds for the 
[DataWriter] implementations for a [MutableDataStore].

New data stores that are backed by some cloud-based data API can 
make use the xcube [GenericZarrStore] to implement the lazy fetching
of data array chunks from the API.

### Registering the data store

To register the new data store with xcube, it must be provided as
a Python package. Based on the package's name there are to ways 
to register it with xcube. If your package name matches the pattern 
`xcube_*`, then you would need to provide a function `init_plugin()` 
in the package's `plugin` module (hence `{package}.plugin.init_plugin()`). 

Alternatively, the package can have any name, but then it must register 
a [setuptools entry point] in the slot "xcube_plugins". In this case the 
function `init_plugin()` can also be placed anywhere in your code.

If you use `pyproject.toml`:
```
[project.entry-points.xcube_plugins]
{your_name} = "{your_name}.plugin:init_plugin"
```

If you use `setup.cfg`:

```
[options.entry_points]
xcube_plugins =
    {your_name} = {your_package}.plugin:init_plugin
```

If you use `setup.py`:

```python
from setuptools import setup

setup(
    # ...,
    entry_points={
        'xcube_plugins': [
            '{your_name} = {your_package}.plugin:init_plugin',
        ]
    }
)
```

The function `init_plugin` will be implemented as follows:

```python
from xcube.constants import EXTENSION_POINT_DATA_OPENERS
from xcube.constants import EXTENSION_POINT_DATA_STORES
from xcube.constants import EXTENSION_POINT_DATA_WRITERS
from xcube.util import extension

def init_plugin(ext_registry: extension.ExtensionRegistry):

    # register your DataStore extension
    ext_registry.add_extension(
        loader=extension.import_component(
            '{your_package}.store:{YourStoreClass}'),
        point=EXTENSION_POINT_DATA_STORES,
        name="{your_store_id}",
        description='{your store description}'
    )

    # register any extra DataOpener (EXTENSION_POINT_DATA_OPENERS)
    # or DataWriter (EXTENSION_POINT_DATA_WRITERS) extensions (optional)
    ext_registry.add_extension(
        loader=extension.import_component(
            '{your_package}.opener:{YourOpenerClass}'),
        point=EXTENSION_POINT_DATA_OPENERS, 
        name="{your_opener_id}",
        description='{your opener description}'
    )
```