Loading README.md +90 −3 Original line number Diff line number Diff line # esgf-docker ESGF software stack as Docker images. This repository contains the `Dockerfile`s and associated deployment artifacts for building and running the ESGF stack as Docker images. ## Documentation Images are built automatically for every commit that modifies the `images` directory and pushed to Docker Hub under the [esgfdeploy organisation](https://hub.docker.com/u/esgfdeploy). For documentation, please visit [cedadev.github.io/esgf-docker](https://cedadev.github.io/esgf-docker). The ESGF stack can be deployed in one of two ways: * Using Ansible to deploy and configure containers on specific hosts * Using Helm to deploy containers to a Kubernetes cluster The Kubernetes deployment is recommended if possible, but we recognise that not all sites will be comfortable configuring and maintaining a Kubernetes cluster. However Ansible-based deployments will not benefit from many features provided by Kubernetes, including: * Zero downtime upgrades * Health checks providing increased resilience * Automatic scaling and load-balancing * Aggregated logging and metrics ## Current status This project is under heavy active development, with the implementation depending on the ESGF Future Architecture discussions. Currently, only an unauthenticated data node is implemented. The data node uses THREDDS to serve catalog and OPeNDAP endpoints and Nginx to serve files, using [datasetScan elements](https://www.unidata.ucar.edu/software/tds/current/reference/DatasetScan.html) for a catalog-free configuration. As such, it is designed to work with the next-generation publisher being developed at LLNL that does not rely on THREDDS catalogs for publishing metadata. ## Image tags Each image that is built for ESGF Docker is given several tags. Some of these are immutable, which means they refer to a fixed version of the image for all time, and some are mutable which means that the underlying image will change over time. ESGF Docker will apply the following tags when building images: * Mutable tags * `latest`: the latest build for the `master` branch * `<slugified-branch-name>`: the latest build for the given branch name, as a slug, e.g. for the branch `issue/112/nginx-data-node` use `issue-112-nginx-data-node` * Immutable tags * The short Git hash for the commit that triggered the build, e.g. `d65ca162`, `a031a2ca` * The tag name for any tagged releases By default, both the Ansible and Kubernetes installations use the `latest` tag when specifying Docker images, which is a mutable tag. For production installations it is recommended to use an immutable tag, either for a tagged release or a particular commit, in order to avoid unexpected code changes or differences in the container image between load-balanced nodes. You can check the [available tags on Docker Hub](https://hub.docker.com/r/esgfdeploy/thredds/tags). All the ESGF Docker images are built together, so any given tag will always be available for all images. ## Making a deployment Whether deploying ESGF using Kubernetes or Ansible, the first step is to clone the repository: ```sh git clone https://github.com/ESGF/esgf-docker.git cd esgf-docker ``` These changes have not yet been committed to `master`, so you will need to check out the development branch: ```sh git checkout issue/112/nginx-data-node ``` Then follow the deployment guide for your chosen deployment method: * [Deploy ESGF using Ansible](./docs/deploy-ansible.md) * [Deploy ESGF to Kubernetes using Helm](./docs/deploy-kubernetes.md) ## Test server using Vagrant This repository includes a [Vagrantfile](./Vagrantfile) that deploys a simple test server using the Ansible method. This test server is configured to serve data from [roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data). To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and [Vagrant](https://www.vagrantup.com/), then run: ```sh vagrant up ``` After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds. docs/deploy-ansible.md 0 → 100644 +140 −0 Original line number Diff line number Diff line # Deploy ESGF using Ansible This project provides an [Ansible playbook](https://docs.ansible.com/ansible/latest/index.html) that will place [Docker containers](https://www.docker.com/) onto specific hosts. The playbook and associated roles and variables are in [deploy/ansible/](../deploy/ansible/). Please look at these files to understand exactly what the playbook is doing. For a complete list of all variables that are available, please look at the defaults for each of the [playbook roles](../deploy/ansible/roles/). The defaults have extensive comments that explain how to use these variables. This document describes how to apply some common configurations. ## Running the playbook Before attempting to run the playbook, make sure that you have [installed Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html). Next, make a configuration directory - this can be anywhere on your machine that is **not** under `esgf-docker`. You can also place this directory under version control if you wish - this can be very useful for tracking changes to the configuration, or even triggering deployments automatically when configuration changes. In your configuration directory, make an [inventory file](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html) defining the hosts that you want to deploy to: ```ini # /my/esgf/config/inventory.ini [data] esgf.example.org ``` Currently, ESGF deployments only respect the `data` group. Hosts in this group will be deployed as data nodes. Variables can be overridden on a per-group or per-host basis by placing YAML files at `/my/esgf/config/group_vars/[group name].yaml` or `/my/esgf/config/host_vars/[host name].yml`. See below for some common examples, and consult the [role defaults](../deploy/ansible/roles/) for a complete list of available variables. Once you have configured your inventory and host/group variables, you can run the playbook: ```sh ansible-playbook -i /my/esgf/config/inventory.ini ./deploy/ansible/playbook.yml ``` ## Configuring the installation This section describes the most commonly modified configuration options. For a full list of available variables, please consult the playbook [role defaults](../deploy/ansible/roles/). ### Setting the version By default, the Ansible playbook will use the `latest` tag when specifying Docker images. For production installations, it is recommended to use an immutable tag (see [Image tags](../README.md#image-tags)). To set the tag to something other than `latest`, create a file at `/my/esgf/config/group_vars/all.yml`: ```yaml # /my/esgf/config/group_vars/all.yml image_defaults: # Use the images that were built for a particular commit tag: a031a2ca # If using an immutable tag, don't do unnecessary pulls pull: false ``` ### Setting the web address By default, the web address is the FQDN of the host (i.e. the output of `hostname --fqdn`). This can be changed on a host-by-host basis using the variable `hostname`. For convenience, this can be set directly in the inventory file: ```ini # /my/esgf/config/inventory.ini [data] esgf-data01.example.org hostname=esgf-data.example.org ``` It is even possible to provision multiple hosts with the same `hostname` and use DNS load-balancing to distribute the load across those hosts: ```ini # /my/esgf/config/inventory.ini [data] esgf-data[01:10].example.org hostname=esgf-data.example.org # Or .... esgf-data01.example.org hostname=esgf-data.example.org esgf-data02.example.org hostname=esgf-data.example.org ``` The Ansible playbook does **not** configure the DNS load-balancing automatically - you will need to separately configure [Round-robin DNS](https://en.wikipedia.org/wiki/Round-robin_DNS) or a more sophisticated service like [AWS Route 53](https://aws.amazon.com/route53/) to do this. ### Configuring the available datasets The Docker-based data node uses a catalog-free configuration to serve data - the available data is defined simply by a series of datasets, under which all files will be served using both OPeNDAP (for NetCDF files) and plain HTTP. The browsable interface and OPeNDAP are provided by THREDDS and, direct file serving is provided by Nginx. The configuration of the datasets is done using two variables: * `data.mounts`: List of directories to mount from the host into the container. Each item should contain the keys: * `hostPath`: The path on the host * `mountPath`: The path in the container * `data.datasets`: List of datasets to expose via THREDDS/Nginx. Each item should contain the keys: * `name`: The human-readable name of the dataset, displayed in the THREDDS UI * `path`: The URL path part for the dataset * `location`: The directory path to the root of the dataset These variables should be defined in your configuration directory using `/my/esgf/config/group_vars/data.yml`, e.g.: ```yaml # /my/esgf/config/group_vars/data.yml data: mounts: # This will mount /datacentre/archive on the host as /data in the containers - hostPath: /datacentre/archive mountPath: /data datasets: # This will expose files at /data/cmip6/[path] # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path] - name: CMIP6 path: esg_cmip6 location: /data/cmip6 # Similarly, this exposes files at /data/cordex/[path] # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path] - name: CORDEX path: esg_cordex location: /data/cordex ``` Loading
README.md +90 −3 Original line number Diff line number Diff line # esgf-docker ESGF software stack as Docker images. This repository contains the `Dockerfile`s and associated deployment artifacts for building and running the ESGF stack as Docker images. ## Documentation Images are built automatically for every commit that modifies the `images` directory and pushed to Docker Hub under the [esgfdeploy organisation](https://hub.docker.com/u/esgfdeploy). For documentation, please visit [cedadev.github.io/esgf-docker](https://cedadev.github.io/esgf-docker). The ESGF stack can be deployed in one of two ways: * Using Ansible to deploy and configure containers on specific hosts * Using Helm to deploy containers to a Kubernetes cluster The Kubernetes deployment is recommended if possible, but we recognise that not all sites will be comfortable configuring and maintaining a Kubernetes cluster. However Ansible-based deployments will not benefit from many features provided by Kubernetes, including: * Zero downtime upgrades * Health checks providing increased resilience * Automatic scaling and load-balancing * Aggregated logging and metrics ## Current status This project is under heavy active development, with the implementation depending on the ESGF Future Architecture discussions. Currently, only an unauthenticated data node is implemented. The data node uses THREDDS to serve catalog and OPeNDAP endpoints and Nginx to serve files, using [datasetScan elements](https://www.unidata.ucar.edu/software/tds/current/reference/DatasetScan.html) for a catalog-free configuration. As such, it is designed to work with the next-generation publisher being developed at LLNL that does not rely on THREDDS catalogs for publishing metadata. ## Image tags Each image that is built for ESGF Docker is given several tags. Some of these are immutable, which means they refer to a fixed version of the image for all time, and some are mutable which means that the underlying image will change over time. ESGF Docker will apply the following tags when building images: * Mutable tags * `latest`: the latest build for the `master` branch * `<slugified-branch-name>`: the latest build for the given branch name, as a slug, e.g. for the branch `issue/112/nginx-data-node` use `issue-112-nginx-data-node` * Immutable tags * The short Git hash for the commit that triggered the build, e.g. `d65ca162`, `a031a2ca` * The tag name for any tagged releases By default, both the Ansible and Kubernetes installations use the `latest` tag when specifying Docker images, which is a mutable tag. For production installations it is recommended to use an immutable tag, either for a tagged release or a particular commit, in order to avoid unexpected code changes or differences in the container image between load-balanced nodes. You can check the [available tags on Docker Hub](https://hub.docker.com/r/esgfdeploy/thredds/tags). All the ESGF Docker images are built together, so any given tag will always be available for all images. ## Making a deployment Whether deploying ESGF using Kubernetes or Ansible, the first step is to clone the repository: ```sh git clone https://github.com/ESGF/esgf-docker.git cd esgf-docker ``` These changes have not yet been committed to `master`, so you will need to check out the development branch: ```sh git checkout issue/112/nginx-data-node ``` Then follow the deployment guide for your chosen deployment method: * [Deploy ESGF using Ansible](./docs/deploy-ansible.md) * [Deploy ESGF to Kubernetes using Helm](./docs/deploy-kubernetes.md) ## Test server using Vagrant This repository includes a [Vagrantfile](./Vagrantfile) that deploys a simple test server using the Ansible method. This test server is configured to serve data from [roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data). To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and [Vagrant](https://www.vagrantup.com/), then run: ```sh vagrant up ``` After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds.
docs/deploy-ansible.md 0 → 100644 +140 −0 Original line number Diff line number Diff line # Deploy ESGF using Ansible This project provides an [Ansible playbook](https://docs.ansible.com/ansible/latest/index.html) that will place [Docker containers](https://www.docker.com/) onto specific hosts. The playbook and associated roles and variables are in [deploy/ansible/](../deploy/ansible/). Please look at these files to understand exactly what the playbook is doing. For a complete list of all variables that are available, please look at the defaults for each of the [playbook roles](../deploy/ansible/roles/). The defaults have extensive comments that explain how to use these variables. This document describes how to apply some common configurations. ## Running the playbook Before attempting to run the playbook, make sure that you have [installed Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html). Next, make a configuration directory - this can be anywhere on your machine that is **not** under `esgf-docker`. You can also place this directory under version control if you wish - this can be very useful for tracking changes to the configuration, or even triggering deployments automatically when configuration changes. In your configuration directory, make an [inventory file](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html) defining the hosts that you want to deploy to: ```ini # /my/esgf/config/inventory.ini [data] esgf.example.org ``` Currently, ESGF deployments only respect the `data` group. Hosts in this group will be deployed as data nodes. Variables can be overridden on a per-group or per-host basis by placing YAML files at `/my/esgf/config/group_vars/[group name].yaml` or `/my/esgf/config/host_vars/[host name].yml`. See below for some common examples, and consult the [role defaults](../deploy/ansible/roles/) for a complete list of available variables. Once you have configured your inventory and host/group variables, you can run the playbook: ```sh ansible-playbook -i /my/esgf/config/inventory.ini ./deploy/ansible/playbook.yml ``` ## Configuring the installation This section describes the most commonly modified configuration options. For a full list of available variables, please consult the playbook [role defaults](../deploy/ansible/roles/). ### Setting the version By default, the Ansible playbook will use the `latest` tag when specifying Docker images. For production installations, it is recommended to use an immutable tag (see [Image tags](../README.md#image-tags)). To set the tag to something other than `latest`, create a file at `/my/esgf/config/group_vars/all.yml`: ```yaml # /my/esgf/config/group_vars/all.yml image_defaults: # Use the images that were built for a particular commit tag: a031a2ca # If using an immutable tag, don't do unnecessary pulls pull: false ``` ### Setting the web address By default, the web address is the FQDN of the host (i.e. the output of `hostname --fqdn`). This can be changed on a host-by-host basis using the variable `hostname`. For convenience, this can be set directly in the inventory file: ```ini # /my/esgf/config/inventory.ini [data] esgf-data01.example.org hostname=esgf-data.example.org ``` It is even possible to provision multiple hosts with the same `hostname` and use DNS load-balancing to distribute the load across those hosts: ```ini # /my/esgf/config/inventory.ini [data] esgf-data[01:10].example.org hostname=esgf-data.example.org # Or .... esgf-data01.example.org hostname=esgf-data.example.org esgf-data02.example.org hostname=esgf-data.example.org ``` The Ansible playbook does **not** configure the DNS load-balancing automatically - you will need to separately configure [Round-robin DNS](https://en.wikipedia.org/wiki/Round-robin_DNS) or a more sophisticated service like [AWS Route 53](https://aws.amazon.com/route53/) to do this. ### Configuring the available datasets The Docker-based data node uses a catalog-free configuration to serve data - the available data is defined simply by a series of datasets, under which all files will be served using both OPeNDAP (for NetCDF files) and plain HTTP. The browsable interface and OPeNDAP are provided by THREDDS and, direct file serving is provided by Nginx. The configuration of the datasets is done using two variables: * `data.mounts`: List of directories to mount from the host into the container. Each item should contain the keys: * `hostPath`: The path on the host * `mountPath`: The path in the container * `data.datasets`: List of datasets to expose via THREDDS/Nginx. Each item should contain the keys: * `name`: The human-readable name of the dataset, displayed in the THREDDS UI * `path`: The URL path part for the dataset * `location`: The directory path to the root of the dataset These variables should be defined in your configuration directory using `/my/esgf/config/group_vars/data.yml`, e.g.: ```yaml # /my/esgf/config/group_vars/data.yml data: mounts: # This will mount /datacentre/archive on the host as /data in the containers - hostPath: /datacentre/archive mountPath: /data datasets: # This will expose files at /data/cmip6/[path] # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path] - name: CMIP6 path: esg_cmip6 location: /data/cmip6 # Similarly, this exposes files at /data/cordex/[path] # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path] - name: CORDEX path: esg_cordex location: /data/cordex ```