Commit 25a268fc authored by Matt Pryor

Add docs for Ansible index node

parent 4c995963
+118 −12
@@ -3,8 +3,8 @@
This project provides an [Ansible playbook](https://docs.ansible.com/ansible/latest/index.html)
that will place [Docker containers](https://www.docker.com/) onto specific hosts.

The playbook and associated roles and variables are in [deploy/ansible/](../deploy/ansible/).
Please look at these files to understand exactly what the playbook is doing.

For a complete list of all variables that are available, please look at the defaults for each
of the [playbook roles](../deploy/ansible/roles/). The defaults have extensive comments that
@@ -16,10 +16,13 @@ configurations.
- [Running the playbook](#running-the-playbook)
- [Local test installation with Vagrant](#local-test-installation-with-vagrant)
- [Configuring the installation](#configuring-the-installation)
    - [Setting the image version](#setting-the-image-version)
    - [Setting the web address](#setting-the-web-address)
    - [Enabling and disabling components](#enabling-and-disabling-components)
    - [Configuring the available datasets](#configuring-the-available-datasets)
    - [Using existing THREDDS catalogs](#using-existing-thredds-catalogs)
    - [Configuring Solr replicas](#configuring-solr-replicas)
    - [Using external Solr instances](#using-external-solr-instances)

<!-- /TOC -->

@@ -31,7 +34,9 @@ Before attempting to run the playbook, make sure that you have
Next, make a configuration directory - this can be anywhere on your machine that is **not** under
`esgf-docker`. You can also place this directory under version control if you wish - this can be very
useful for tracking changes to the configuration, or even triggering deployments automatically when
configuration changes. If you do, make sure not to commit any plain-text secrets to your
version control repository (e.g. by using an encryption tool such as
[Ansible Vault](https://docs.ansible.com/ansible/latest/user_guide/vault.html)).

In your configuration directory, make an
[inventory file](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html)
@@ -42,14 +47,19 @@ defining the hosts that you want to deploy to:

[data]
esgf.example.org

[index]
esgf.example.org
```

Currently, ESGF deployments respect the `data` and `index` groups. Hosts in these groups will be
deployed as data and/or index nodes respectively. A host can be in both groups, and will be deployed
as a combined data and index node.
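
Ansible also accepts inventories written in YAML instead of the INI style shown above. As a sketch, a deployment with a dedicated data node and a dedicated index node, rather than one combined node, might use an inventory like the following. The host names are hypothetical.

```yaml
# Hypothetical YAML inventory: one dedicated data node and one dedicated index node
all:
  children:
    data:
      hosts:
        esgf-data.example.org:
    index:
      hosts:
        esgf-index.example.org:
```

Listing the same host under both `data` and `index` would instead deploy it as a combined node.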

Variables can be overridden on a per-group or per-host basis by placing YAML files in your
configuration directory at `/my/esgf/config/group_vars/[group name].yaml` or
`/my/esgf/config/host_vars/[host name].yml`. See below for some common examples, and consult the
[role defaults](../deploy/ansible/roles/) for a complete list of available variables.
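
As an illustration of the per-group mechanism, the following group variables file would pin the image tag for every host in the `data` group. The tag value is only an example. In Ansible, values in a `host_vars` file take precedence over those in `group_vars`, so an individual host can still be given a different value.

```yaml
# /my/esgf/config/group_vars/data.yaml
# Applies to every host in the [data] group (the tag value is illustrative)
image_tag: a031a2ca
```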

Once you have configured your inventory and host/group variables, you can run the playbook:

@@ -77,7 +87,7 @@ After waiting for the containers to start, the THREDDS interface will be availab
This section describes the most commonly modified configuration options. For a full list of available
variables, please consult the playbook [role defaults](../deploy/ansible/roles/).

### Setting the image version

By default, the Ansible playbook will use the `latest` tag when specifying Docker images. For production
installations, it is recommended to use an immutable tag (see [Image tags](../README.md#image-tags)).
@@ -93,6 +103,21 @@ image_tag: a031a2ca
image_pull: false
```

To use images from a custom registry, e.g. if you need to perform additional security checks:

```yaml
# Set the prefix for the images
image_prefix: registry.example.com/esgf
```

Properties can also be overridden on a per-image basis, e.g.:

```yaml
# Use a different branch for the THREDDS image
thredds_image_tag: my-branch
thredds_image_pull: true
```

### Setting the web address

By default, the web address is the FQDN of the host (i.e. the output of `hostname --fqdn`). This can
@@ -121,8 +146,23 @@ esgf-data02.example.org hostname=esgf-data.example.org
```

The Ansible playbook does **not** configure the DNS load-balancing automatically - you will need to
separately configure [Round-robin DNS](https://en.wikipedia.org/wiki/Round-robin_DNS) or use a more
sophisticated service like [AWS Route 53](https://aws.amazon.com/route53/) to do this.

### Enabling and disabling components

As well as defining each node as a data and/or index node using groups, the Ansible playbook allows
individual components to be enabled or disabled using variables. By default, all components for the
node type (as determined by the groups) will be deployed.

The following variables control which components are deployed:

```yaml
thredds_enabled: true/false
fileserver_enabled: true/false
solr_enabled: true/false
search_enabled: true/false
```
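
These variables can be set per host or per group like any other. As a sketch, the following host variables file would skip the file server on the example host while leaving every other component at its default. The choice of component here is purely illustrative. Check that the remaining components still meet your needs before disabling any of them.

```yaml
# /my/esgf/config/host_vars/esgf.example.org.yml
# Illustrative only: do not deploy the file server on this host;
# all other components keep their group-derived defaults
fileserver_enabled: false
```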

### Configuring the available datasets

@@ -185,3 +225,69 @@ When the catalogs change, run the Ansible playbook in order to restart the conta
load the new catalogs. THREDDS is configured to use a persistent volume for cache files, meaning
that although the first start may be slow for large catalogs, subsequent restarts should be
much faster (depending how many files have changed).

### Configuring Solr replicas

By default, the Ansible playbook configures local master and slave Solr instances for locally
published data and configures the `esg-search` application to talk to them.

However, `esg-search` can also include results from indexes at other sites, which are
replicated locally. Each replica gets its own Solr instance, and the `esg-search` application is
configured to use these replicas.

To configure the available replicas, use the `solr_replicas` variable. The value should
be a list in which each item has the following required keys:

  * `name`: Used to identify the replica, e.g. in the name of its Solr container
  * `master_url`: The URL to replicate, including scheme, port and path, e.g.
    `https://esgf-index1.ceda.ac.uk/solr`

For example, the following configures two replicas, and will result in four Solr containers running:

  * `master`
  * `slave`
  * `ceda-index-3`
  * `llnl`

```yaml
solr_replicas:
  - name: ceda-index-3
    master_url: https://esgf-index3.ceda.ac.uk/solr
  - name: llnl
    master_url: https://esgf-node.llnl.gov/solr
```

Additional variables are available to customise behaviour, e.g. poll intervals - please see the
[role defaults for the index role](../deploy/ansible/roles/index/defaults/main.yml).

### Using external Solr instances

If you have existing Solr instances that you do not wish to migrate, or need to run Solr
outside of Docker for persistence or performance reasons, the Ansible playbook can configure the
`esg-search` application to use external Solr instances.

To do this, disable the local Solr instances and set the external URLs for the Solr master and slave.
For any replicas that are specified, no local container is deployed and `esg-search` is configured
to use the `master_url` directly.

> **WARNING**
>
> If you want to use a Solr instance configured using `esgf-ansible` as an external Solr instance,
> you will need to configure the firewall on that host to expose port `8984`, on which the
> master listens.

Example configuration using external Solr instances:

```yaml
# Disable local Solr instances
solr_enabled: false
# Set the external URLs for Solr
solr_master_external_url: http://external.solr:8984/solr
solr_slave_external_url: http://external.solr:8983/solr
# Configure the replicas
# No local containers will be deployed - esg-search will use the master_url directly
solr_replicas:
  - name: ceda-index-3
    master_url: https://esgf-index3.ceda.ac.uk/solr
  - name: llnl
    master_url: https://esgf-node.llnl.gov/solr
```