Commit 6b90be82 authored by Matt Pryor

Document the use of existing catalogs + local cache

parent 1bb9ae3a
+9 −5
@@ -26,10 +26,14 @@ This project is under heavy active development, with the implementation dependin
Future Architecture discussions.

Currently, only an unauthenticated data node is implemented. The data node uses THREDDS to serve
catalog and OPeNDAP endpoints and Nginx to serve files, using
[datasetScan elements](https://www.unidata.ucar.edu/software/tds/current/reference/DatasetScan.html)
for a catalog-free configuration. As such, it is designed to work with the next-generation publisher
being developed at LLNL that does not rely on THREDDS catalogs for publishing metadata.
catalog and OPeNDAP endpoints, but uses Nginx for direct file serving, which should be more
performant than THREDDS.

The data node is capable of using existing catalogs from the current publisher to specify the
available data; however, it can also use a catalog-free configuration which utilises
[datasetScan elements](https://www.unidata.ucar.edu/software/tds/current/reference/DatasetScan.html)
to serve all files under a given dataset root. This is designed to work with the next-generation
publisher being developed at LLNL that does not rely on THREDDS catalogs for publishing metadata.

## Image tags

@@ -70,7 +74,7 @@ cd esgf-docker
These changes have not yet been committed to `master`, so you will need to check out the development branch:

```sh
git checkout issue/112/nginx-data-node
git checkout future-architecture
```

Then follow the deployment guide for your chosen deployment method:
+1 −1
@@ -6,7 +6,7 @@ data:
  # Mount the /test_data volume on the host as /test_data in the container
  mounts:
    - mountPath: /test_data
      volume:
      volumeSpec:
        hostPath:
          path: /test_data

+37 −5
@@ -11,6 +11,18 @@ of the [playbook roles](../deploy/ansible/roles/). The defaults have extensive c
explain how to use these variables. This document describes how to apply some common
configurations.

<!-- TOC depthFrom:2 -->

- [Running the playbook](#running-the-playbook)
- [Local test installation with Vagrant](#local-test-installation-with-vagrant)
- [Configuring the installation](#configuring-the-installation)
    - [Setting the version](#setting-the-version)
    - [Setting the web address](#setting-the-web-address)
    - [Configuring the available datasets](#configuring-the-available-datasets)
    - [Using existing THREDDS catalogs](#using-existing-thredds-catalogs)

<!-- /TOC -->

## Running the playbook

Before attempting to run the playbook, make sure that you have
@@ -114,18 +126,17 @@ service like [AWS Route 53](https://aws.amazon.com/route53/) to do this.

### Configuring the available datasets

The data node uses a catalog-free configuration where the available data is defined simply by a
series of datasets. For each dataset, all files under the specified path will be served using both
By default, the data node uses a catalog-free configuration where the available data is defined simply
by a series of datasets. For each dataset, all files under the specified path will be served using both
OPeNDAP (for NetCDF files) and plain HTTP. The browsable interface and OPeNDAP are provided by
THREDDS and direct file serving is provided by Nginx.

The configuration of the datasets is done using two variables:

  * `data_mounts`: List of directories to mount from the host into the container. Each item should contain
    the keys:
  * `data_mounts`: List of directories to mount from the host into the data-serving containers. Each item should contain the keys:
    * `host_path`: The path on the host
    * `mount_path`: The path in the container
  * `data_datasets`: List of datasets to expose. Each item should contain the keys:
  * `data_datasets`: List of datasets to expose using the data-serving containers. Each item should contain the keys:
    * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
    * `path`: The URL path part for the dataset
    * `location`: The directory path to the root of the dataset in the container
@@ -153,3 +164,24 @@ data_datasets:
    path: esg_cordex
    location: /data/cordex
```

### Using existing THREDDS catalogs

The data node can be configured to serve data based on pre-existing THREDDS catalogs, for
example those generated by the ESGF publisher. This is done by specifying a single additional
variable - `thredds_catalog_host_path` - pointing to a directory containing the pre-existing
catalogs:

```yaml
thredds_catalog_host_path: /path/to/existing/catalogs
```

> **NOTE**
>
> You must still configure `data_mounts` and `data_datasets` as above, except in this case the
> datasets should correspond to the `datasetRoot`s in your THREDDS catalogs.
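
For illustration, a hedged sketch (the names and paths below are hypothetical) of entries that
mirror a `datasetRoot` declared in an existing catalog:

```yaml
# Hypothetical sketch: entries mirroring an existing datasetRoot, e.g.
#   <datasetRoot path="esg_cmip6" location="/data/cmip6"/>
data_mounts:
  - host_path: /datacentre/archive/cmip6   # assumed host directory
    mount_path: /data/cmip6

data_datasets:
  - name: CMIP6
    path: esg_cmip6         # matches the datasetRoot path
    location: /data/cmip6   # matches the datasetRoot location
```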

When the catalogs change, run the Ansible playbook in order to restart the containers and
load the new catalogs. THREDDS is configured to use a persistent volume for cache files, meaning
that although the first start may be slow for large catalogs, subsequent restarts should be
much faster (depending on how many files have changed).
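
As a sketch of that step, re-running the playbook might look like the following (the inventory and
playbook file names are placeholders, use whatever you used for the initial deployment):

```sh
# Placeholder inventory/playbook names - substitute the ones used for your deployment
ansible-playbook -i inventory.yml playbook.yml
```
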
+105 −9
@@ -11,6 +11,18 @@ For a complete list of all the variables that are available, please look at the
comments that explain how to use these variables. This document describes how to apply some common
configurations.

<!-- TOC depthFrom:2 -->

- [Installing/upgrading ESGF](#installingupgrading-esgf)
- [Local test installation with Minikube](#local-test-installation-with-minikube)
- [Configuring the installation](#configuring-the-installation)
    - [Setting the version](#setting-the-version)
    - [Configuring the available datasets](#configuring-the-available-datasets)
    - [Using existing THREDDS catalogs](#using-existing-thredds-catalogs)
    - [Improving pod startup time for large catalogs](#improving-pod-startup-time-for-large-catalogs)

<!-- /TOC -->

## Installing/upgrading ESGF

Before attempting to install the ESGF Helm chart, you must have the following:
@@ -34,7 +46,9 @@ ESGF deployment will be available:
hostname: esgf.example.org
```

> **NOTE:** The Helm chart does not create a DNS entry for the hostname. This must be separately configured
> **NOTE**
>
> The Helm chart does not create a DNS entry for the hostname. This must be separately configured
> to point to the ingress controller for your Kubernetes cluster.

Once you have configured your `values.yaml`, you can install or upgrade ESGF using the Helm chart. If no
@@ -91,8 +105,8 @@ image:

### Configuring the available datasets

The data node uses a catalog-free configuration where the available data is defined simply by a
series of datasets. For each dataset, all files under the specified path will be served using both
By default, the data node uses a catalog-free configuration where the available data is defined simply by
a series of datasets. For each dataset, all files under the specified path will be served using both
OPeNDAP (for NetCDF files) and plain HTTP. The browsable interface and OPeNDAP are provided by
THREDDS and direct file serving is provided by Nginx.

@@ -100,8 +114,10 @@ The configuration of the datasets is done using two variables:

  * `data.mounts`: List of volumes to mount into the container. Each item should contain the keys:
    * `mountPath`: The path to mount the volume inside the container
    * `volume`: A [Kubernetes volume specification](https://kubernetes.io/docs/concepts/storage/volumes/)
    * Any additional keys are set as options on the volume mount, e.g. `mountPropagation` for `hostPath` volumes
    * `volumeSpec`: A [Kubernetes volume specification](https://kubernetes.io/docs/concepts/storage/volumes/) for
      the volume containing the data
    * `name` (optional): A name for the volume - by default, a name is derived from the `mountPath`
    * `mountOptions` (optional): Options for the volume mount, e.g. `mountPropagation` for `hostPath` volumes
  * `data.datasets`: List of datasets to expose. Each item should contain the keys:
    * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
    * `path`: The URL path part for the dataset
@@ -112,7 +128,8 @@ The configuration of the datasets is done using two variables:
> When using `hostPath` volumes, the data must exist at the same path on all cluster hosts where the THREDDS
> or file server pods might be scheduled.
>
> If your data is on a shared filesystem, just mount the filesystem on your cluster nodes as you normally would.
> If your data is on a shared filesystem, just mount the filesystem on your cluster nodes as you would
> with any other host.

These variables should be defined in your `values.yaml`, e.g.:

@@ -121,11 +138,12 @@ data:
  mounts:
    # This uses a hostPath volume to mount /datacentre/archive on the host as /data in the container
    - mountPath: /data
      # mountPropagation is particularly important if the filesystem has automounted sub-mounts
      mountPropagation: HostToContainer
      volume:
      volumeSpec:
        hostPath:
          path: /datacentre/archive
      mountOptions:
        # mountPropagation is particularly important if the filesystem has automounted sub-mounts
        mountPropagation: HostToContainer

  datasets:
    # This will expose files at /data/cmip6/[path] in the container
@@ -139,3 +157,81 @@ data:
      path: esg_cordex
      location: /data/cordex
```

### Using existing THREDDS catalogs

The data node can be configured to serve data based on pre-existing THREDDS catalogs, for
example those generated by the ESGF publisher. To do this, you must specify the volume
containing the catalogs using the variable `data.thredds.catalogVolume`. This volume must
be available to all nodes where THREDDS pods might be scheduled and must be mountable in
multiple pods at once, for example a `hostPath` on a shared filesystem.
This variable should contain the keys `volumeSpec` and `mountOptions`, which have the
same meaning as for `data.mounts` above, e.g.:

```yaml
data:
  thredds:
    catalogVolume:
      volumeSpec:
        hostPath:
          path: /path/to/shared/catalogs
      mountOptions:
        mountPropagation: HostToContainer
```

> **NOTE**
>
> You must still configure `data.mounts` and `data.datasets` as above, except in this case the
> datasets should correspond to the `datasetRoot`s in your THREDDS catalogs.
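
As an illustrative sketch (hypothetical names and paths), a dataset entry mirroring a
`datasetRoot` from an existing catalog might look like:

```yaml
# Hypothetical sketch: a dataset mirroring an existing datasetRoot, e.g.
#   <datasetRoot path="esg_cmip6" location="/data/cmip6"/>
data:
  datasets:
    - name: CMIP6
      path: esg_cmip6         # matches the datasetRoot path
      location: /data/cmip6   # matches the datasetRoot location
```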

When the catalogs change, run the Helm chart in order to create new pods which will
load the new catalogs. This will be done using a rolling upgrade with no downtime: the
old pods will continue to serve requests with the old catalogs until the new pods are ready.
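
For example, assuming a release named `esgf` and a local checkout of the chart (both are
placeholders - use whatever you used for the initial install), the rolling upgrade is just a
normal Helm upgrade:

```sh
# Release name, chart path and values file are placeholders
helm upgrade esgf ./chart -f values.yaml
```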

For large catalogs, you may also need to adjust the startup time for the THREDDS container
as THREDDS must build the catalog cache before it can start serving requests. To do this,
set `data.thredds.startTimeout`, the number of seconds to wait for
THREDDS to start before assuming there is a problem and trying again (default `300`):

```yaml
data:
  thredds:
    startTimeout: 3600  # Large catalogs may take an hour or more
```

### Improving pod startup time for large catalogs

Pods in Kubernetes are ephemeral, meaning they do not preserve state across restarts.
This includes the THREDDS caches, so every time a pod starts it will spend time
rebuilding the catalog cache before serving requests, even if the catalogs have not changed.
This is exacerbated by the fact that the catalogs will likely be on network-attached storage
in order to facilitate sharing across nodes, which means higher latency for stat and read
operations.

For large catalogs, this can result in THREDDS pods taking an hour or more to start. This is not
merely an inconvenience: to benefit from advanced Kubernetes features such as
recovery from failure and demand-based auto-scaling, pods must start quickly so they can begin
taking load as soon as possible. There are two things that can be done to address this problem:

  * Keep a copy of the catalogs on the local disk of each node that may have THREDDS pods scheduled
  * Pre-build the catalog cache (again on the local disk of each node) and use it to seed the cache for new THREDDS pods

In an ESGF deployment, this is achieved by having a
[DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) that runs on
each node. When the Helm chart is run or a new node is added to the cluster, this `DaemonSet`
will synchronise the THREDDS catalogs to each node's local disk and run THREDDS to build the catalog
cache. The THREDDS pods will wait for the `DaemonSet` to finish updating the cache before starting,
using the pre-built cache as a seed for their own local caches. While they are waiting, the old
pods will continue to serve requests using the old catalogs, so the upgrade is zero-downtime.
Using this approach, copying the catalogs to local disk and rebuilding the cache are one-time
operations and the THREDDS pods start much faster (less than one minute for a large catalog at
CEDA in testing).

To enable local caching of catalogs for a deployment, just set `data.thredds.localCache.enabled`:

```yaml
data:
  thredds:
    localCache:
      enabled: true
```