Documentation for Kubernetes deployment (612f831d) · Commits · ESGF / mirrors / Esgf Docker

README.md

+0 −15

Original line number	Diff line number	Diff line
		@@ -77,18 +77,3 @@ Then follow the deployment guide for your chosen deployment method:

		* [Deploy ESGF using Ansible](./docs/deploy-ansible.md)
		* [Deploy ESGF to Kubernetes using Helm](./docs/deploy-kubernetes.md)

		## Test server using Vagrant

		This repository includes a [Vagrantfile](./Vagrantfile) that deploys a simple test server using the
		Ansible method. This test server is configured to serve data from
		[roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data).

		To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and
		[Vagrant](https://www.vagrantup.com/), then run:

		```sh
		vagrant up
		```

		After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds.

deploy/kubernetes/chart/values.yaml

+54 −12

		# The hostname for the deployment
		hostname:

		###
		# Image defaults
		###
		# All image properties can be overridden on a per-service basis
		image:
		  # The image repository prefix
		  # The image prefix to use
		  # If using a private registry, change this, e.g. registry.ceda.ac.uk/esgfdeploy
		  prefix: esgfdeploy
		  # The tag to use
		  tag: latest
		  # The image pull policy
		  # Indicates whether images should be pulled every time a pod starts
		  # When using mutable tags, like latest or branch names, this should be Always
		  # When using immutable tags, like commit shas or release tags, this should be IfNotPresent
		  pullPolicy: Always
		  # A list of names of existing secrets providing Docker registry credentials
		  # Required if using a private registry that requires authentication
		  # See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
		  pullSecrets:

		###
		# Ingress configuration
		###
		ingress:
		  # The annotations for the ingress
		  # Depending on your Kubernetes cluster, this can be used to configure things like Let's Encrypt certificates
		  annotations:
		  # TLS configuration
		  tls:
		    # Either give an existing secret name
		    # Either give the name of an existing TLS secret
		    # See https://kubernetes.io/docs/concepts/services-networking/ingress/#tls
		    secretName:
		    # Or provide PEM-encoded certificate (including chain) and key files
		    # Or provide a PEM-encoded certificate (including chain) and key as variables
		    pem:
		      cert:
		      key:
		    # If neither are given, then a self-signed certificate is generated

		###
		# Data node configuration
		###
		data:
		  # The mounts that are required to serve data, as defined by the given datasets
		  #
		  # Each specified mount should include at least the following keys:
		  #
		  # mountPath: The path to mount the volume inside the container
		  # volume: A Kubernetes volume specification
		  # volume: A Kubernetes volume specification - see https://kubernetes.io/docs/concepts/storage/volumes/
		  #
		  # Any additional keys are set as options on the volume mount, e.g. mountPropagation for hostPath volumes
		  mounts: []
		@@ -52,9 +71,12 @@ data:
		# path: esg_dataroot
		# location: /datacentre/archiveroots/archive/badc/cmip5/data

		# The pod and container security contexts for the THREDDS and Nginx pods
		# These may especially be required if using hostPath volumes for data, depending
		# The pod and container security contexts for data serving pods
		# In particular, these may be required if using hostPath volumes for data, depending
		# on the permissions of that data
		# See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
		# WARNING: Due to permissions set inside the container, the user must belong to group 1000
		# in addition to the groups required to access data
		podSecurityContext: {}
		securityContext: {}

		@@ -68,16 +90,36 @@ data:
		# The number of replicas for the THREDDS pod
		replicaCount: 1
		# The resource allocations for the THREDDS container
		# See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
		resources: {}
		# The node selector for the THREDDS pod
		# See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
		nodeSelector:
		# The affinity rules for the THREDDS pod
		# See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
		affinity:
		# The tolerations for the THREDDS pod
		# See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
		tolerations:

		# Configuration for the Nginx file server pod
		# Configuration for the file server pod
		fileServer:
		# Indicates if the Nginx file server should be deployed or not
		# Indicates if the file server should be deployed or not
		enabled: true
		# Image overrides for the Nginx image
		# Image overrides for the file server image
		image:
		repository: nginx
		# The number of replicas for the Nginx file server pod
		# The number of replicas for the file server pod
		replicaCount: 1
		# The resource allocations for the Nginx file server container
		# The resource allocations for the file server container
		# See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
		resources: {}
		# The node selector for the file server pod
		# See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
		nodeSelector:
		# The affinity rules for the file server pod
		# See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
		affinity:
		# The tolerations for the file server pod
		# See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
		tolerations:

deploy/kubernetes/minikube-values.yaml

0 → 100644

+20 −0

		#####
		# Values for running in Minikube with test data from roocs/mini-esgf-data
		#####

		data:
		  # Mount the /test_data volume on the host as /test_data in the container
		  mounts:
		    - mountPath: /test_data
		      volume:
		        hostPath:
		          path: /test_data

		  # Configure the datasets in the test data
		  datasets:
		    - name: "CMIP5"
		      path: "esg_cmip5"
		      location: "/test_data/badc/cmip5/data"
		    - name: "CORDEX"
		      path: "esg_cordex"
		      location: "/test_data/group_workspaces/jasmin2/cp4cds1/data/c3s-cordex"

docs/deploy-ansible.md

+18 −3

		@@ -45,6 +45,21 @@ Once you have configured your inventory and host/group variables, you can run th
		ansible-playbook -i /my/esgf/config/inventory.ini ./deploy/ansible/playbook.yml
		```

		## Local test installation with Vagrant

		This repository includes a [Vagrantfile](../Vagrantfile) that deploys a simple test server using the
		Ansible method. This test server is configured to serve data from
		[roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data).

		To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and
		[Vagrant](https://www.vagrantup.com/), then run:

		```sh
		vagrant up
		```

		After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds.

		## Configuring the installation

		This section describes the most commonly modified configuration options. For a full list of available
		@@ -113,7 +128,7 @@ The configuration of the datasets is done using two variables:
		* `data_datasets`: List of datasets to expose. Each item should contain the keys:
		  * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
		  * `path`: The URL path part for the dataset
		  * `location`: The directory path to the root of the dataset
		  * `location`: The directory path to the root of the dataset in the container

		These variables should be defined in your configuration directory using
		`/my/esgf/config/group_vars/data.yml`, e.g.:
		@@ -127,12 +142,12 @@ data_mounts:
		mount_path: /data

		data_datasets:
		  # This will expose files at /data/cmip6/[path]
		  # This will expose files at /data/cmip6/[path] in the container
		  # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path]
		  - name: CMIP6
		    path: esg_cmip6
		    location: /data/cmip6
		  # Similarly, this exposes files at /data/cordex/[path]
		  # Similarly, this exposes files at /data/cordex/[path] in the container
		  # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path]
		  - name: CORDEX
		    path: esg_cordex

docs/deploy-kubernetes.md

0 → 100644

+141 −0

		# Deploy ESGF using Kubernetes

		This project provides a [Helm chart](https://helm.sh/docs/topics/charts/) to deploy ESGF resources
		on a [Kubernetes](https://kubernetes.io/) cluster.

		The chart is in [deploy/kubernetes/chart](../deploy/kubernetes/chart/). Please look at the files to
		understand exactly what resources are being created.

		For a complete list of all the variables that are available, please look at the
		[values.yaml for the chart](../deploy/kubernetes/chart/values.yaml). The defaults there have extensive
		comments that explain how to use these variables. This document describes how to apply some common
		configurations.

		## Installing/upgrading ESGF

		Before attempting to install the ESGF Helm chart, you must have the following:

		* A Kubernetes cluster with an
		[Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) enabled
		* [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) installed and configured to talk
		to your cluster
		* [Helm](https://helm.sh/docs/intro/install/) installed

		Next, make a configuration directory - this can be anywhere on your machine that is not under
		`esgf-docker`. You can also place this directory under version control if you wish - this can be very
		useful for tracking changes to the configuration, or even triggering deployments automatically when
		configuration changes.

		In your configuration directory, make a new YAML file called `values.yaml` and override any variables to fit
		your deployment. The only required variable is `hostname`, which should be the DNS name at which your
		ESGF deployment will be available:

		```yaml
		hostname: esgf.example.org
		```

		> NOTE: The Helm chart does not create a DNS entry for the hostname. This must be separately configured
		> to point to the ingress controller for your Kubernetes cluster.
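
		If your cluster already holds a TLS certificate for the hostname, the `ingress.tls` variables from the chart's `values.yaml` can reference it. A minimal sketch, assuming a TLS secret named `esgf-tls` (a hypothetical name) already exists in the target namespace:

		```yaml
		hostname: esgf.example.org

		ingress:
		  tls:
		    # Hypothetical name of an existing kubernetes.io/tls secret
		    secretName: esgf-tls
		```

		If neither `secretName` nor the `pem` values are set, the chart generates a self-signed certificate instead.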

		Once you have configured your `values.yaml`, you can install or upgrade ESGF using the Helm chart. If no
		namespace is specified, it will use the default namespace for your `kubectl` configuration:

		```sh
		helm upgrade -i [-n <namespace>] -f /my/esgf/config/values.yaml --wait esgf ./deploy/kubernetes/chart
		```

		## Local test installation with Minikube

		For local test deployments, you can use [Minikube](https://kubernetes.io/docs/setup/learning-environment/minikube/)
		with data from [roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data):

		```sh
		# Start the minikube cluster
		minikube start
		# Enable the ingress addon
		minikube addons enable ingress
		# Install the test data
		minikube ssh "curl -fsSL https://github.com/roocs/mini-esgf-data/tarball/master | sudo tar -xz --strip-components=1 -C / --wildcards */test_data"
		```

		Configure the chart to serve the test data (see [minikube-values.yaml](../deploy/kubernetes/minikube-values.yaml)),
		using a `nip.io` domain pointing to the Minikube server:

		```sh
		helm install esgf ./deploy/kubernetes/chart/ \
		-f ./deploy/kubernetes/minikube-values.yaml \
		--set hostname="$(minikube ip).nip.io"
		```

		Once the containers have started, the THREDDS interface will be available at `http://$(minikube ip).nip.io/thredds`.

		## Configuring the installation

		This section describes the most commonly modified configuration options. For a full list of available
		variables, please consult the chart [values.yaml](../deploy/kubernetes/chart/values.yaml).

		### Setting the version

		By default, the Helm chart will use the `latest` tag when specifying Docker images. For production
		installations, it is recommended to use an immutable tag (see [Image tags](../README.md#image-tags)).

		To set the tag to something other than `latest`, set the following variables in your `values.yaml`:

		```yaml
		image:
		  # Use the images that were built for a particular commit
		  tag: a031a2ca
		  # If using an immutable tag, don't do unnecessary pulls
		  pullPolicy: IfNotPresent
		```
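
		The same `image` block also controls where images are pulled from. As a sketch, assuming the images have been mirrored to a private registry and that a registry credentials secret named `registry-credentials` (a hypothetical name) already exists in the namespace:

		```yaml
		image:
		  # Registry prefix, following the example given in the chart's values.yaml
		  prefix: registry.ceda.ac.uk/esgfdeploy
		  tag: a031a2ca
		  pullPolicy: IfNotPresent
		  # Names of existing secrets holding registry credentials (hypothetical name)
		  pullSecrets:
		    - registry-credentials
		```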

		### Configuring the available datasets

		The data node uses a catalog-free configuration where the available data is defined simply by a
		series of datasets. For each dataset, all files under the specified path will be served using both
		OPeNDAP (for NetCDF files) and plain HTTP. The browsable interface and OPeNDAP are provided by
		THREDDS and direct file serving is provided by Nginx.

		The configuration of the datasets is done using two variables:

		* `data.mounts`: List of volumes to mount into the container. Each item should contain the keys:
		  * `mountPath`: The path to mount the volume inside the container
		  * `volume`: A [Kubernetes volume specification](https://kubernetes.io/docs/concepts/storage/volumes/)
		  * Any additional keys are set as options on the volume mount, e.g. `mountPropagation` for `hostPath` volumes
		* `data.datasets`: List of datasets to expose. Each item should contain the keys:
		  * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
		  * `path`: The URL path part for the dataset
		  * `location`: The directory path to the root of the dataset in the container

		> WARNING
		>
		> When using `hostPath` volumes, the data must exist at the same path on all cluster hosts where the THREDDS
		> or file server pods might be scheduled.
		>
		> If your data is on a shared filesystem, just mount the filesystem on your cluster nodes as you normally would.

		These variables should be defined in your `values.yaml`, e.g.:

		```yaml
		data:
		  mounts:
		    # This uses a hostPath volume to mount /datacentre/archive on the host as /data in the container
		    - mountPath: /data
		      # mountPropagation is particularly important if the filesystem has automounted sub-mounts
		      mountPropagation: HostToContainer
		      volume:
		        hostPath:
		          path: /datacentre/archive

		  datasets:
		    # This will expose files at /data/cmip6/[path] in the container
		    # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path]
		    - name: CMIP6
		      path: esg_cmip6
		      location: /data/cmip6
		    # Similarly, this exposes files at /data/cordex/[path] in the container
		    # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path]
		    - name: CORDEX
		      path: esg_cordex
		      location: /data/cordex
		```
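
		With `hostPath` volumes, the pods may also need to run as a user that can read the data, and should only be scheduled on nodes where the data is present. The sketch below assumes that `podSecurityContext` and `fileServer` sit under `data`, as the chart's `values.yaml` suggests; the UID, the data group and the node label are hypothetical:

		```yaml
		data:
		  podSecurityContext:
		    # Hypothetical UID
		    runAsUser: 5000
		    # The chart's values.yaml warns that the user must belong to group 1000
		    runAsGroup: 1000
		    # Hypothetical GID of the group that can read the data on the host
		    supplementalGroups: [5001]

		  fileServer:
		    # Only schedule file server pods on nodes carrying this hypothetical label
		    nodeSelector:
		      example.org/esgf-data: "true"
		```

		The same `nodeSelector`, `affinity` and `tolerations` options are documented for the THREDDS pod in the chart's `values.yaml`.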