Commit f04fad4a authored by Matt Pryor

Add docs for Ansible installer

parent 9d4706cd
# esgf-docker

ESGF software stack as Docker images.
This repository contains the `Dockerfile`s and associated deployment artifacts for building
and running the ESGF stack as Docker images.

## Documentation
Images are built automatically for every commit that modifies the `images` directory and pushed
to Docker Hub under the [esgfdeploy organisation](https://hub.docker.com/u/esgfdeploy).

For documentation, please visit [cedadev.github.io/esgf-docker](https://cedadev.github.io/esgf-docker).
The ESGF stack can be deployed in one of two ways:

  * Using Ansible to deploy and configure containers on specific hosts
  * Using Helm to deploy containers to a Kubernetes cluster

The Kubernetes deployment is recommended where possible, but we recognise that not all sites will
be comfortable configuring and maintaining a Kubernetes cluster. However, Ansible-based deployments
do not benefit from many of the features provided by Kubernetes, including:

  * Zero downtime upgrades
  * Health checks providing increased resilience
  * Automatic scaling and load-balancing
  * Aggregated logging and metrics

## Current status

This project is under heavy active development, with the implementation depending on the ESGF
Future Architecture discussions.

Currently, only an unauthenticated data node is implemented. The data node uses THREDDS to serve
catalog and OPeNDAP endpoints and Nginx to serve files, using
[datasetScan elements](https://www.unidata.ucar.edu/software/tds/current/reference/DatasetScan.html)
for a catalog-free configuration. As such, it is designed to work with the next-generation publisher
being developed at LLNL that does not rely on THREDDS catalogs for publishing metadata.

## Image tags

Each image that is built for ESGF Docker is given several tags. Some of these are immutable, which
means they always refer to the same version of the image, and some are mutable, which means
that the image they refer to will change over time.

ESGF Docker will apply the following tags when building images:

  * Mutable tags
    * `latest`: the latest build for the `master` branch
    * `<slugified-branch-name>`: the latest build for the given branch name, as a slug, e.g.
      for the branch `issue/112/nginx-data-node` use `issue-112-nginx-data-node`
  * Immutable tags
    * The short Git hash for the commit that triggered the build, e.g. `d65ca162`, `a031a2ca`
    * The tag name for any tagged releases

By default, both the Ansible and Kubernetes installations use the `latest` tag when specifying
Docker images, which is a mutable tag.

For production installations it is recommended to use an immutable tag, either for a tagged
release or a particular commit, in order to avoid unexpected code changes or differences in
the container image between load-balanced nodes.

You can check the [available tags on Docker Hub](https://hub.docker.com/r/esgfdeploy/thredds/tags).
All the ESGF Docker images are built together, so any given tag will always be available for all
images.
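
As a rough illustration of the branch tag naming, the sketch below converts a branch name into a slug by replacing `/` with `-`. The function name `slugify_branch` is hypothetical, and the exact slug rules applied by the build are an assumption; the only thing the text above confirms is the `issue/112/nginx-data-node` example.

```sh
# Hypothetical helper: approximate the branch slug by replacing "/" with "-".
# The real build may normalise other characters as well; that is not covered here.
slugify_branch() {
  printf '%s\n' "$1" | tr '/' '-'
}

# Prints: issue-112-nginx-data-node
slugify_branch "issue/112/nginx-data-node"
```

The resulting slug can be checked against the tag list on Docker Hub before it is used in a deployment.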

## Making a deployment

Whether deploying ESGF using Kubernetes or Ansible, the first step is to clone the repository:

```sh
git clone https://github.com/ESGF/esgf-docker.git
cd esgf-docker
```

These changes have not yet been merged into `master`, so you will need to check out the development branch:

```sh
git checkout issue/112/nginx-data-node
```

Then follow the deployment guide for your chosen deployment method:

  * [Deploy ESGF using Ansible](./docs/deploy-ansible.md)
  * [Deploy ESGF to Kubernetes using Helm](./docs/deploy-kubernetes.md)

## Test server using Vagrant

This repository includes a [Vagrantfile](./Vagrantfile) that deploys a simple test server using the
Ansible method. This test server is configured to serve data from
[roocs/mini-esgf-data](https://github.com/roocs/mini-esgf-data).

To deploy a test server, first install [VirtualBox](https://www.virtualbox.org/) and
[Vagrant](https://www.vagrantup.com/), then run:

```sh
vagrant up
```

After waiting for the containers to start, the THREDDS interface will be available at http://192.168.100.100.nip.io/thredds.

docs/deploy-ansible.md

# Deploy ESGF using Ansible

This project provides an [Ansible playbook](https://docs.ansible.com/ansible/latest/index.html)
that will place [Docker containers](https://www.docker.com/) onto specific hosts.

The playbook and associated roles and variables are in [deploy/ansible/](../deploy/ansible/). Please look at
these files to understand exactly what the playbook is doing.

For a complete list of all variables that are available, please look at the defaults for each
of the [playbook roles](../deploy/ansible/roles/). The defaults have extensive comments that
explain how to use these variables. This document describes how to apply some common
configurations.

## Running the playbook

Before attempting to run the playbook, make sure that you have
[installed Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html).

Next, make a configuration directory - this can be anywhere on your machine that is **not** under
`esgf-docker`. You can also place this directory under version control if you wish - this can be very
useful for tracking changes to the configuration, or even for triggering deployments automatically when
the configuration changes.

In your configuration directory, make an
[inventory file](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html)
defining the hosts that you want to deploy to:

```ini
# /my/esgf/config/inventory.ini

[data]
esgf.example.org
```

Currently, ESGF deployments only respect the `data` group. Hosts in this group will be deployed as data nodes.

Variables can be overridden on a per-group or per-host basis by placing YAML files at
`/my/esgf/config/group_vars/[group name].yml` or `/my/esgf/config/host_vars/[host name].yml`. See below
for some common examples, and consult the [role defaults](../deploy/ansible/roles/) for a complete list
of available variables.
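
The layout described above can be sketched as a short script. This is only an illustration: the `CONFIG_DIR` variable is an assumption (it defaults to a temporary directory so the script is safe to run as-is), and the two variable files are created as empty placeholders to be filled in later.

```sh
# Assumption: CONFIG_DIR is wherever you keep your configuration.
# A temporary directory is used by default so nothing existing is touched.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

mkdir -p "$CONFIG_DIR/group_vars" "$CONFIG_DIR/host_vars"

# Inventory with a single host in the `data` group
cat > "$CONFIG_DIR/inventory.ini" <<'EOF'
[data]
esgf.example.org
EOF

# Placeholders: variables for every host in `data`, and for one specific host
touch "$CONFIG_DIR/group_vars/data.yml"
touch "$CONFIG_DIR/host_vars/esgf.example.org.yml"

# Show the resulting layout
(cd "$CONFIG_DIR" && find . -type f | sort)
```

Setting `CONFIG_DIR=/my/esgf/config` before running the script would create the same layout at the path used in the examples in this document.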

Once you have configured your inventory and host/group variables, you can run the playbook:

```sh
ansible-playbook -i /my/esgf/config/inventory.ini ./deploy/ansible/playbook.yml
```
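
When iterating on a configuration, it can help to wrap the command so that extra `ansible-playbook` options are easy to add. The sketch below is one way to do that; the `run_playbook` function and the `CONFIG_DIR` variable are hypothetical names, and it assumes the current directory is the root of the `esgf-docker` checkout. The options shown in the comments are standard `ansible-playbook` flags.

```sh
# Assumption: CONFIG_DIR points at your configuration directory and the
# current directory is the root of the esgf-docker checkout.
CONFIG_DIR="${CONFIG_DIR:-/my/esgf/config}"

# Hypothetical wrapper: forwards any extra arguments to ansible-playbook
run_playbook() {
  ansible-playbook -i "$CONFIG_DIR/inventory.ini" ./deploy/ansible/playbook.yml "$@"
}

# Examples (commented out so that nothing runs by accident):
# run_playbook --syntax-check            # parse the playbook without executing it
# run_playbook --list-hosts              # show which hosts would be targeted
# run_playbook --limit esgf.example.org  # deploy to a single host only
```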

## Configuring the installation

This section describes the most commonly modified configuration options. For a full list of available
variables, please consult the playbook [role defaults](../deploy/ansible/roles/).

### Setting the version

By default, the Ansible playbook will use the `latest` tag when specifying Docker images. For production
installations, it is recommended to use an immutable tag (see [Image tags](../README.md#image-tags)).

To set the tag to something other than `latest`, create a file at `/my/esgf/config/group_vars/all.yml`:

```yaml
# /my/esgf/config/group_vars/all.yml

image_defaults:
  # Use the images that were built for a particular commit
  tag: a031a2ca
  # If using an immutable tag, don't do unnecessary pulls
  pull: false
```

### Setting the web address

By default, the web address is the FQDN of the host (i.e. the output of `hostname --fqdn`). This can
be changed on a host-by-host basis using the variable `hostname`. For convenience, this can be set directly
in the inventory file:

```ini
# /my/esgf/config/inventory.ini

[data]
esgf-data01.example.org  hostname=esgf-data.example.org
```

It is even possible to provision multiple hosts with the same `hostname` and use DNS load-balancing to
distribute the load across those hosts:

```ini
# /my/esgf/config/inventory.ini

[data]
esgf-data[01:10].example.org  hostname=esgf-data.example.org

# Or ....
esgf-data01.example.org  hostname=esgf-data.example.org
esgf-data02.example.org  hostname=esgf-data.example.org
```

The Ansible playbook does **not** configure the DNS load-balancing automatically - you will need to separately
configure [Round-robin DNS](https://en.wikipedia.org/wiki/Round-robin_DNS) or a more sophisticated service like
[AWS Route 53](https://aws.amazon.com/route53/) to do this.
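
Once DNS has been configured, one way to confirm that the shared name resolves to every host is to list its address records. The helper below is a hypothetical sketch that assumes `dig` is installed and that the hosts are published as IPv4 `A` records; it is defined but not invoked here.

```sh
# Hypothetical helper: print the A records returned for a name, one per line.
# Requires the `dig` utility.
list_a_records() {
  dig +short "$1" A | sort
}

# Example (not run here): expect one address per provisioned host
# list_a_records esgf-data.example.org
```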

### Configuring the available datasets

The Docker-based data node uses a catalog-free configuration to serve data - the available data is defined simply
by a series of datasets, under which all files will be served using both OPeNDAP (for NetCDF files) and plain
HTTP. The browsable interface and OPeNDAP are provided by THREDDS, and direct file serving is provided by Nginx.

The configuration of the datasets is done using two variables:

  * `data.mounts`: List of directories to mount from the host into the container. Each item should contain
    the keys:
    * `hostPath`: The path on the host
    * `mountPath`: The path in the container
  * `data.datasets`: List of datasets to expose via THREDDS/Nginx. Each item should contain the keys:
    * `name`: The human-readable name of the dataset, displayed in the THREDDS UI
    * `path`: The URL path part for the dataset
    * `location`: The directory path to the root of the dataset

These variables should be defined in your configuration directory using `/my/esgf/config/group_vars/data.yml`, e.g.:

```yaml
# /my/esgf/config/group_vars/data.yml

data:
  mounts:
    # This will mount /datacentre/archive on the host as /data in the containers
    - hostPath: /datacentre/archive
      mountPath: /data

  datasets:
    # This will expose files at /data/cmip6/[path]
    # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cmip6/[path]
    - name: CMIP6
      path: esg_cmip6
      location: /data/cmip6
    # Similarly, this exposes files at /data/cordex/[path]
    # as http://esgf-data.example.org/thredds/{dodsC,fileServer}/esg_cordex/[path]
    - name: CORDEX
      path: esg_cordex
      location: /data/cordex
```
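
To make the mapping in the comments above concrete, the sketch below computes the URL for a file from a dataset's `path` and `location`. The `dataset_url` function is a hypothetical helper written for illustration, not part of the playbook, and the file path in the example is made up.

```sh
# Hypothetical helper mirroring the mapping described in the comments above.
# Arguments: hostname, dataset path, dataset location, service (dodsC or
# fileServer), and the file path as seen inside the container.
dataset_url() {
  du_host=$1 du_path=$2 du_location=$3 du_service=$4 du_file=$5
  case "$du_file" in
    "$du_location"/*) ;;
    *) echo "file is not under $du_location" >&2; return 1 ;;
  esac
  # Strip the dataset location to get the path relative to the dataset root
  du_rel=${du_file#"$du_location"/}
  printf 'http://%s/thredds/%s/%s/%s\n' "$du_host" "$du_service" "$du_path" "$du_rel"
}

# Prints: http://esgf-data.example.org/thredds/fileServer/esg_cmip6/some/file.nc
dataset_url esgf-data.example.org esg_cmip6 /data/cmip6 fileServer /data/cmip6/some/file.nc
```

Note that the `location` is a path inside the container, so it must fall under one of the `mountPath` values defined in `data.mounts`.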