Commit 612f831d authored by Matt Pryor

Documentation for Kubernetes deployment

parent 91b08bba
+0 −15
@@ -77,18 +77,3 @@ Then follow the deployment guide for your chosen deployment method:

  * [Deploy ESGF using Ansible](./docs/deploy-ansible.md)
  * [Deploy ESGF to Kubernetes using Helm](./docs/deploy-kubernetes.md)

## Test server using Vagrant

This repository includes a [Vagrantfile](./Vagrantfile) that deploys a simple test server using the
Ansible method. This test server is configured to serve data from
[roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data).

To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and
[Vagrant](https://www.vagrantup.com/), then run:

```sh
vagrant up
```

After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds.
+54 −12
# The hostname for the deployment
hostname:

###
# Image defaults
###
# All image properties can be overridden on a per-service basis
image:
  # The image repository prefix
  # The image prefix to use
  # If using a private registry, change this, e.g. registry.ceda.ac.uk/esgfdeploy
  prefix: esgfdeploy
  # The tag to use
  tag: latest
  # The image pull policy
  # Indicates whether images should be pulled every time a pod starts
  # When using mutable tags, like latest or branch names, this should be Always
  # When using immutable tags, like commit shas or release tags, this should be IfNotPresent
  pullPolicy: Always
  # A list of names of existing secrets providing Docker registry credentials
  # Required if using a private registry that requires authentication
  # See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  pullSecrets:

###
# Ingress configuration
###
ingress:
  # The annotations for the ingress
  # Depending on your Kubernetes cluster, this can be used to configure things like Let's Encrypt certificates
  annotations:
  # TLS configuration
  tls:
    # Either give an existing secret name
    # Either give the name of an existing TLS secret
    # See https://kubernetes.io/docs/concepts/services-networking/ingress/#tls
    secretName:
    # Or provide PEM-encoded certificate (including chain) and key files
    # Or provide a PEM-encoded certificate (including chain) and key as variables
    pem:
      cert:
      key:
    # If neither is given, then a self-signed certificate is generated

###
# Data node configuration
###
data:
  # The mounts that are required to serve data, as defined by the given datasets
  #
  # Each specified mount should include at least the following keys:
  #
  #   mountPath: The path to mount the volume inside the container
  #   volume: A Kubernetes volume specification
  #   volume: A Kubernetes volume specification - see https://kubernetes.io/docs/concepts/storage/volumes/
  #
  # Any additional keys are set as options on the volume mount, e.g. mountPropagation for hostPath volumes
  mounts: []
@@ -52,9 +71,12 @@ data:
    #   path: esg_dataroot
    #   location: /datacentre/archiveroots/archive/badc/cmip5/data

  # The pod and container security contexts for the THREDDS and Nginx pods
  # These may especially be required if using hostPath volumes for data, depending
  # The pod and container security contexts for data serving pods
  # In particular, these may be required if using hostPath volumes for data, depending
  # on the permissions of that data
  # See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
  # WARNING: Due to permissions set inside the container, the user *must* belong to group 1000
  #          in addition to the groups required to access data
  podSecurityContext: {}
  securityContext: {}

@@ -68,16 +90,36 @@ data:
    # The number of replicas for the THREDDS pod
    replicaCount: 1
    # The resource allocations for the THREDDS container
    # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    resources: {}
    # The node selector for the THREDDS pod
    # See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
    nodeSelector:
    # The affinity rules for the THREDDS pod
    # See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
    affinity:
    # The tolerations for the THREDDS pod
    # See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    tolerations:

  # Configuration for the Nginx file server pod
  # Configuration for the file server pod
  fileServer:
    # Indicates if the Nginx file server should be deployed or not
    # Indicates if the file server should be deployed or not
    enabled: true
    # Image overrides for the Nginx image
    # Image overrides for the file server image
    image:
      repository: nginx
    # The number of replicas for the Nginx file server pod
    # The number of replicas for the file server pod
    replicaCount: 1
    # The resource allocations for the Nginx file server container
    # The resource allocations for the file server container
    # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    resources: {}
    # The node selector for the file server pod
    # See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
    nodeSelector:
    # The affinity rules for the file server pod
    # See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
    affinity:
    # The tolerations for the file server pod
    # See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    tolerations:
+20 −0
#####
# Values for running in Minikube with test data from roocs/mini-esgf-data
#####

data:
  # Mount the /test_data volume on the host as /test_data in the container
  mounts:
    - mountPath: /test_data
      volume:
        hostPath:
          path: /test_data

  # Configure the datasets in the test data
  datasets:
    - name: "CMIP5"
      path: "esg_cmip5"
      location: "/test_data/badc/cmip5/data"
    - name: "CORDEX"
      path: "esg_cordex"
      location: "/test_data/group_workspaces/jasmin2/cp4cds1/data/c3s-cordex"
+18 −3
@@ -45,6 +45,21 @@ Once you have configured your inventory and host/group variables, you can run th
ansible-playbook -i /my/esgf/config/inventory.ini ./deploy/ansible/playbook.yml
```

## Local test installation with Vagrant

This repository includes a [Vagrantfile](../Vagrantfile) that deploys a simple test server using the
Ansible method. This test server is configured to serve data from
[roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data).

To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and
[Vagrant](https://www.vagrantup.com/), then run:

```sh
vagrant up
```

After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds.

## Configuring the installation

This section describes the most commonly modified configuration options. For a full list of available
@@ -113,7 +128,7 @@ The configuration of the datasets is done using two variables:
  * `data_datasets`: List of datasets to expose. Each item should contain the keys:
    * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
    * `path`: The URL path part for the dataset
    * `location`: The directory path to the root of the dataset
    * `location`: The directory path to the root of the dataset in the container

These variables should be defined in your configuration directory using
`/my/esgf/config/group_vars/data.yml`, e.g.:
@@ -127,12 +142,12 @@ data_mounts:
    mount_path: /data

data_datasets:
  # This will expose files at /data/cmip6/[path]
  # This will expose files at /data/cmip6/[path] in the container
  # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path]
  - name: CMIP6
    path: esg_cmip6
    location: /data/cmip6
  # Similarly, this exposes files at /data/cordex/[path]
  # Similarly, this exposes files at /data/cordex/[path] in the container
  # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path]
  - name: CORDEX
    path: esg_cordex
+141 −0
# Deploy ESGF using Kubernetes

This project provides a [Helm chart](https://helm.sh/docs/topics/charts/) to deploy ESGF resources
on a [Kubernetes](https://kubernetes.io/) cluster.

The chart is in [deploy/kubernetes/chart](../deploy/kubernetes/chart/). Review the files there to
see exactly which resources are created.

For a complete list of all the variables that are available, please look at the
[values.yaml for the chart](../deploy/kubernetes/chart/values.yaml). The defaults there have extensive
comments that explain how to use these variables. This document describes how to apply some common
configurations.

## Installing/upgrading ESGF

Before attempting to install the ESGF Helm chart, you must have the following:

  * A Kubernetes cluster with an
    [Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) enabled
  * [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) installed and configured to talk
    to your cluster
  * [Helm](https://helm.sh/docs/intro/install/) installed

Next, create a configuration directory. This can be anywhere on your machine that is **not** under
`esgf-docker`. You can place this directory under version control, which is useful for tracking
changes to the configuration or even for triggering deployments automatically when the
configuration changes.

In your configuration directory, make a new YAML file called `values.yaml` and override any variables to fit
your deployment. The only required variable is `hostname`, which should be the DNS name at which your
ESGF deployment will be available:

```yaml
hostname: esgf.example.org
```

> **NOTE:** The Helm chart does not create a DNS entry for the hostname. This must be separately configured
> to point to the ingress controller for your Kubernetes cluster.
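
The chart's `values.yaml` also documents an `ingress.tls` block: if neither `secretName` nor `pem` is
given, a self-signed certificate is generated. As a minimal sketch, assuming a TLS secret called
`esgf-example-org-tls` (a hypothetical name) has already been created in the target namespace, a
slightly fuller `values.yaml` might look like this:

```yaml
hostname: esgf.example.org

ingress:
  tls:
    # Hypothetical name of a TLS secret that already exists in the target namespace
    secretName: esgf-example-org-tls
```

Such a secret is typically created with `kubectl create secret tls`. The secret name above is purely
illustrative.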

Once you have configured your `values.yaml`, you can install or upgrade ESGF using the Helm chart. If no
namespace is specified, Helm will use the default namespace from your `kubectl` configuration:

```sh
helm upgrade -i [-n <namespace>] -f /my/esgf/config/values.yaml --wait esgf ./deploy/kubernetes/chart
```

## Local test installation with Minikube

For local test deployments, you can use [Minikube](https://kubernetes.io/docs/setup/learning-environment/minikube/)
with data from [roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data):

```sh
# Start the minikube cluster
minikube start
# Enable the ingress addon
minikube addons enable ingress
# Install the test data
minikube ssh "curl -fsSL https://github.com/roocs/mini-esgf-data/tarball/master | sudo tar -xz --strip-components=1 -C / --wildcards */test_data"
```

Configure the chart to serve the test data (see [minikube-values.yaml](../deploy/kubernetes/minikube-values.yaml)),
using a `nip.io` domain pointing to the Minikube server:

```sh
helm install esgf ./deploy/kubernetes/chart/ \
  -f ./deploy/kubernetes/minikube-values.yaml \
  --set hostname="$(minikube ip).nip.io"
```

Once the containers have started, the THREDDS interface will be available at `http://$(minikube ip).nip.io/thredds`.

## Configuring the installation

This section describes the most commonly modified configuration options. For a full list of available
variables, please consult the chart [values.yaml](../deploy/kubernetes/chart/values.yaml).

### Setting the version

By default, the Helm chart will use the `latest` tag when specifying Docker images. For production
installations, it is recommended to use an immutable tag (see [Image tags](../README.md#image-tags)).

To set the tag to something other than `latest`, set the following variables in your `values.yaml`:

```yaml
image:
  # Use the images that were built for a particular commit
  tag: a031a2ca
  # If using an immutable tag, don't do unnecessary pulls
  pullPolicy: IfNotPresent
```
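
The same `image` block also carries the registry settings described in the chart's `values.yaml`. The
following is a sketch only. It assumes that `pullSecrets` takes a plain list of secret names, as the
comment in `values.yaml` suggests, and that a registry credentials secret called
`esgf-registry-credentials` (a hypothetical name) already exists in the namespace:

```yaml
image:
  # Pull images from a private registry instead of the default esgfdeploy prefix
  prefix: registry.ceda.ac.uk/esgfdeploy
  # Use the images that were built for a particular commit
  tag: a031a2ca
  pullPolicy: IfNotPresent
  # Hypothetical name of an existing secret holding Docker registry credentials
  pullSecrets:
    - esgf-registry-credentials
```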

### Configuring the available datasets

The data node uses a catalog-free configuration where the available data is defined simply by a
series of datasets. For each dataset, all files under the specified path will be served using both
OPeNDAP (for NetCDF files) and plain HTTP. The browsable interface and OPeNDAP are provided by
THREDDS and direct file serving is provided by Nginx.

The configuration of the datasets is done using two variables:

  * `data.mounts`: List of volumes to mount into the container. Each item should contain the keys:
    * `mountPath`: The path to mount the volume inside the container
    * `volume`: A [Kubernetes volume specification](https://kubernetes.io/docs/concepts/storage/volumes/)
    * Any additional keys are set as options on the volume mount, e.g. `mountPropagation` for `hostPath` volumes
  * `data.datasets`: List of datasets to expose. Each item should contain the keys:
    * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
    * `path`: The URL path part for the dataset
    * `location`: The directory path to the root of the dataset in the container
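
Because `volume` accepts any Kubernetes volume specification, data does not have to come from a
`hostPath`. As a sketch, assuming a PersistentVolumeClaim called `esgf-data` (a hypothetical name) already
exists in the namespace, a mount backed by that claim might look like this:

```yaml
data:
  mounts:
    - mountPath: /data
      volume:
        persistentVolumeClaim:
          # Hypothetical claim name; the claim must already exist in the namespace
          claimName: esgf-data
          # Mount the data read-only
          readOnly: true
```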

> **WARNING**
>
> When using `hostPath` volumes, the data must exist at the same path on all cluster hosts where the THREDDS
> or file server pods might be scheduled.
>
> If your data is on a shared filesystem, just mount the filesystem on your cluster nodes as you normally would.

These variables should be defined in your `values.yaml`, e.g.:

```yaml
data:
  mounts:
    # This uses a hostPath volume to mount /datacentre/archive on the host as /data in the container
    - mountPath: /data
      # mountPropagation is particularly important if the filesystem has automounted sub-mounts
      mountPropagation: HostToContainer
      volume:
        hostPath:
          path: /datacentre/archive

  datasets:
    # This will expose files at /data/cmip6/[path] in the container
    # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path]
    - name: CMIP6
      path: esg_cmip6
      location: /data/cmip6
    # Similarly, this exposes files at /data/cordex/[path] in the container
    # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path]
    - name: CORDEX
      path: esg_cordex
      location: /data/cordex
```
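
When the files behind a `hostPath` volume are only readable by particular users or groups, the chart's
`values.yaml` provides `data.podSecurityContext` and `data.securityContext`, with a warning that the user
must belong to group 1000 in addition to the groups required to access the data. The sketch below
illustrates this. The UID and the data-access GID are hypothetical and must be replaced with values that
match the permissions on your data:

```yaml
data:
  podSecurityContext:
    # Hypothetical UID/GID of an account that can read the data on the host
    runAsUser: 20001
    runAsGroup: 20001
    supplementalGroups:
      # Required by the permissions set inside the container images
      - 1000
      # Hypothetical group that grants read access to the data
      - 26059
```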