Commit ab73ab46 authored by Matt Pryor

Add documentation for access log forwarding

parent c6a223bc
+1 −1
@@ -17,7 +17,7 @@
# If the access log exporter is enabled, we use a FIFO pipe for the Nginx access log
# This will be followed by the exporter to get the access logs
# Unfortunately, in order to get the container to pick this up we have to splat
-# the whole logs directory, so we also need to set up the other logs as symlinks to stdout
+# the whole logs directory, so we also need to set up the other logs as symlinks to stdout/err
- name: Set up fileserver logs directory
  block:
    - name: Ensure fileserver logs directory exists
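# Illustrative sketch (not part of this commit): one way to implement the
# FIFO-plus-symlinks approach described in the comment above. The logs
# directory path used here is hypothetical.
- name: Create a FIFO pipe for the Nginx access log
  command: mkfifo /esg/fileserver/logs/access.log
  args:
    creates: /esg/fileserver/logs/access.log

- name: Symlink the other logs to stdout/err
  file:
    src: "{{ item.dest }}"
    path: "/esg/fileserver/logs/{{ item.log }}"
    state: link
    force: true
  loop:
    - { log: error.log, dest: /dev/stderr }
    - { log: static.log, dest: /dev/stdout }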
+19 −0
@@ -23,6 +23,7 @@ configurations.
    - [Using existing THREDDS catalogs](#using-existing-thredds-catalogs)
    - [Configuring Solr replicas](#configuring-solr-replicas)
    - [Using external Solr instances](#using-external-solr-instances)
    - [Forwarding access logs](#forwarding-access-logs)

<!-- /TOC -->

@@ -291,3 +292,21 @@ solr_replicas:
  - name: llnl
    master_url: https://esgf-node.llnl.gov/solr
```

### Forwarding access logs

ESGF data nodes can be configured to forward access logs to [CMCC](https://www.cmcc.it/)
for processing in order to produce download statistics for the federation.

Before enabling this functionality, you must contact CMCC to arrange for the IP addresses
of your ESGF nodes, as visible from the internet, to be whitelisted.

Then set the following variable to enable the forwarding of access logs:

```yaml
logstash_enabled: true
```

Additional variables are available to configure the server to which logs are forwarded
(see the [role defaults for the data role](../deploy/ansible/roles/data/defaults/main.yml));
however, the vast majority of deployments will not need to change these.
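
For example, a deployment that did need to target a different ingest endpoint could
override variables along these lines in its inventory. The names below are illustrative,
not the role's actual variable names; check the linked defaults for the real ones:

```yaml
# Hypothetical variable names -- consult the role defaults linked above
logstash_host: logstash.example.org
logstash_port: 5044
```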
+85 −57
@@ -7,9 +7,10 @@ For a full list of available variables, please consult the chart at
<!-- TOC depthFrom:2 -->

- [Configuring the available datasets](#configuring-the-available-datasets)
- [Forwarding access logs](#forwarding-access-logs)
- [Enabling demand-based autoscaling](#enabling-demand-based-autoscaling)
- [Using existing THREDDS catalogs](#using-existing-thredds-catalogs)
- [Improving pod startup time for large catalogs](#improving-pod-startup-time-for-large-catalogs)

<!-- /TOC -->

@@ -68,6 +69,89 @@ data:
      location: /data/cordex
```

## Forwarding access logs

The THREDDS and Nginx file server components can be configured to forward access logs to
[CMCC](https://www.cmcc.it/) for processing in order to produce download statistics for
the federation.

Before enabling this functionality, you must contact CMCC to arrange for the IP addresses
of your Kubernetes nodes, as visible from the internet, to be whitelisted.

> If your Kubernetes nodes are not directly exposed to the internet then they are probably using
> [Network Address Translation (NAT)](https://en.wikipedia.org/wiki/Network_address_translation)
> when accessing resources on the internet.
>
> In this case, the address that you need to give to CMCC is the translated address.

To enable the forwarding of access logs for THREDDS and Nginx file server pods, add the following
to your `values.yaml`:

```yaml
data:
  accessLogSidecar:
    enabled: true
```

Additional variables are available to configure the server to which logs are forwarded;
however, the vast majority of deployments will not need to change these.
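
A deployment that did need to change them might set something like the following. The keys
under `accessLogSidecar` other than `enabled` are illustrative rather than the chart's
actual names, so consult the chart's default values before using them:

```yaml
data:
  accessLogSidecar:
    enabled: true
    # Hypothetical keys -- check the chart's default values for the real names
    host: logstash.example.org
    port: 5044
```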

## Enabling demand-based autoscaling

Kubernetes allows the number of pods backing a service to be scaled up and down automatically using
a [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/).
This allows the service to respond to spikes in demand by creating more pods to handle requests.
A Kubernetes `Service` ensures that requests are routed to the new replicas as they become ready.

An HPA can be configured to automatically adjust the number of replicas based on any metrics that are exposed via
the [Metrics API](https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/).
By default, this allows scaling based on the CPU or memory usage of the pods backing a service. However,
it is possible to integrate other metrics-gathering systems, such as [Prometheus](https://prometheus.io/),
to allow scaling based on any of the collected metrics (e.g. network I/O or requests per second).
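
As a sketch of what such a custom metric could look like, an HPA entry consuming a
Prometheus-backed pod metric might read as follows. The metric name is hypothetical
and assumes a metrics adapter (e.g. prometheus-adapter) is installed in the cluster:

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # hypothetical metric exposed via an adapter
      target:
        type: AverageValue
        averageValue: "100"
```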

By default, autoscaling is disabled in the ESGF Helm chart. To enable autoscaling for the THREDDS and
Nginx file server components, the chart allows `HorizontalPodAutoscaler` resources to be defined using
the `data.{thredds,fileServer}.hpa` variables. These variables define the `spec` section of the HPA, except
for the `scaleTargetRef` section which is automatically populated with the correct reference.
For more information about HPA configuration, see the
[Kubernetes HPA Walkthrough](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/).

> **WARNING**
>
> In order to scale based on utilisation (as opposed to absolute value), you must define
> `resources.requests` for the service
> (see [Configuring container resources](#configuring-container-resources) above).
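
A minimal request block for THREDDS might look like the following (illustrative values;
see the section linked above for the chart's exact keys):

```yaml
data:
  thredds:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
```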

For example, the following configuration targets an average CPU utilisation of 80% of the
requested amount, scaling out to a maximum of 10 replicas:

```yaml
data:
  thredds:
    hpa:
      minReplicas: 1
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80

  fileServer:
    hpa:
      minReplicas: 1
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
```

## Using existing THREDDS catalogs

The data node can be configured to serve data based on pre-existing THREDDS catalogs, for
@@ -145,59 +229,3 @@ data:
    localCache:
      enabled: true
```
