Commit 66dd9bf4 authored by Sebastien Gardoll

fix docker tips page

parent bce37b74

+1 −1
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 5181bd2b8280e663a350317fc93a53af
+config: 4d3597b4651c71301cf359fad5392948
tags: 645f666f9bcd5a90fca523b33c5a78b7
+18 −12
.. _data_publishing:

***************
Data Publishing
***************

-*Tested with ESGF_VERSION=1.1*
+*Tested with ESGF_VERSION=dev_1.4*

Description
===========

This page instructs on how to use the esgf-publisher Python client to
-publish a sample dataset and file to the ESGF data node running as a
-Docker container. These instructions are meant to be executed inside the
-data node container::
+publish a sample dataset and file to the ESGF **tds** and **index** node
+containers. These publishing instructions are meant to be executed inside the
+**publisher** container. Additionally, the **postgres** container is needed
+to store publishing metadata, the **orp** container is needed to enforce access
+control (for publishing and downloading data), and the **idp** container is needed
+for authentication and to retrieve the group memberships of the data publisher::

  docker-compose up
-  docker exec -it -u 0 data-node /bin/bash
+  docker exec -it -u 0 <publisher container id> /bin/bash

One-Time Setup
==============
@@ -50,14 +55,14 @@ Generate the mapfile listing the dataset and files to be published::
  cd /esg/config/esgcet
  source ${CDAT_HOME}/bin/activate esgf-pub 
  esgprep mapfile --project test /esg/data/test
-  ls -l test.test.map
+  ls -l mapfiles/test.test.map

Step 2
======

Publish to the postgres database::
  
-  esgpublish --project test --map test.test.map --service fileservice
+  esgpublish --project test --map mapfiles/test.test.map --service fileservice
  esglist_datasets test

Step 3
@@ -65,7 +70,7 @@ Step 3

Publish to the TDS::

-  esgpublish --project test --map test.test.map --service fileservice --noscan --thredds
+  esgpublish --project test --map mapfiles/test.test.map --service fileservice --noscan --thredds

Note: this operation will use the credentials contained in the *esg.ini*
file to invoke the TDS re-initialization URL: 
@@ -79,7 +84,7 @@ from the TDS main catalog page:

and downloadable using any OpenID and password combination that is trusted
by the data-node. The authorization required for downloading the file is
-specified in the access policy file: */esg/config/esgf_policies_local.xml*
+specified inside the **orp** container in the access policy file: */esg/config/esgf_policies_local.xml*
as an XML statement of the form::

   <policy resource=".*test.*" attribute_type="AUTH_ONLY" attribute_value="" action="Read"/>
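The *resource* attribute in these policy statements is a regular expression: ``.*test.*`` matches any resource identifier containing the substring ``test``. As a quick illustration of what such a pattern covers (the identifiers below are made up for this example), the same extended-regex syntax can be previewed with ``grep``:

```shell
# ".*test.*" matches any identifier containing "test"; grep -E accepts the
# same extended-regex syntax, so it can be used to preview a policy pattern.
printf 'test.test.v1\ncmip5.output1.tas\n' | grep -E '.*test.*'
```

Only ``test.test.v1`` is printed, so a Read policy with that pattern applies to the sample dataset published here.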
@@ -95,20 +100,21 @@ the identity provider container. In the meantime, the following
workarounds can be adopted.

Obtain a short-term X509 certificate from any other trusted ESGF
-identity provider, and copy it into the data-node container in the location
+identity provider, and copy it into the **publisher** container in the location
referenced by the file esg.ini::

  cp certificate-file /root/.globus/certificate-file

-Then, disable specific authorization for publishing test data, requiring only
+Then, disable the specific authorization for publishing test data, requiring only
the availability of an X509 certificate. Edit the file: */esg/config/esgf_policies_local.xml*
+inside the **orp** container
and insert the following policy statement (as XML)::

  <policy resource=".*test.*" attribute_type="ANY" attribute_value="" action="Write"/>

At this point, you can issue the publishing command::

-  esgpublish --project test --map test.test.map --service fileservice --noscan --publish
+  esgpublish --project test --map mapfiles/test.test.map --service fileservice --noscan --publish

After about a minute, the dataset and file should be returned when
querying the "slave" Solr index:
+1 −1
@@ -36,7 +36,7 @@ Remove dangling images::

Remove all volumes::

-  docker image ls -q | xargs docker volume rm --force
+  docker image ls -q | xargs docker image rm --force
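Note that the updated command removes *images*; removing all volumes (as the heading says) would presumably be ``docker volume ls -q | xargs docker volume rm --force``. The list-then-pipe-to-``xargs`` pattern itself can be sanity-checked without Docker by using ordinary files:

```shell
# Same list-then-remove pipeline, demonstrated on plain files instead of
# Docker objects; "ls" stands in for "docker image ls -q" here.
mkdir -p /tmp/xargs-demo
touch /tmp/xargs-demo/a /tmp/xargs-demo/b
ls /tmp/xargs-demo | xargs -I{} rm /tmp/xargs-demo/{}
ls /tmp/xargs-demo | wc -l   # no files remain
```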

Remove image according to a given pattern::

+1 −0
@@ -24,3 +24,4 @@ Table of Contents:
   esgf_solr.rst
   testing_guide.rst
   docker_tips.rst
+   kubernetes.rst
+82 −0
*******************
Kubernetes for ESGF
*******************

This document contains preliminary instructions on how to manage ESGF service containers with Kubernetes.

Currently, the following ESGF services can be started through Kubernetes:

* Solr

* ESGF Index Node (i.e. the search web application)


Setup
=====

Before using Kubernetes, you need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster.
For example, you can install *minikube* and *kubectl* on a Mac laptop, then start a Kubernetes cluster as follows::

  minikube start --vm-driver=xhyve

  kubectl config use-context minikube

All Kubernetes files needed to follow this tutorial are contained in the *kubernetes* sub-directory.


Solr
====

To start a pod that contains the *esgf-solr* container::

  kubectl create -f solr-deployment.yaml

The *esgf-solr* container includes the ESGF Solr master instance (port 8984) and slave instance (port 8983).
The Solr indexes are written to a Kubernetes PersistentVolume that is mounted into the location */esg/solr-index* inside the container.
The configuration file above also includes a Kubernetes service which makes the two Solr instances available
within the cluster at the URLs *http://esgf-solr:8984/* and *http://esgf-solr:8983/*, respectively.
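Putting these pieces together, a minimal sketch of what the Service portion of *solr-deployment.yaml* might look like (a hypothetical reconstruction from the labels and URLs mentioned here, not a copy of the actual file):

```yaml
# Hypothetical sketch only: a Service named esgf-solr selecting the Solr pod
# (label app=solr) and exposing the master and slave ports within the cluster.
apiVersion: v1
kind: Service
metadata:
  name: esgf-solr
spec:
  selector:
    app: solr
  ports:
    - name: master
      port: 8984
    - name: slave
      port: 8983
```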

To inspect the Kubernetes deployment::

  kubectl get pods -l app=solr
  kubectl describe deployment esgf-solr
  kubectl describe service esgf-solr

To test that the two Solr instances are working, enter the container and query localhost::

  kubectl exec -it esgf-solr-<pod hash-id> -- /bin/bash
  /]# curl 'http://localhost:8983/solr/datasets/select?q=*%3A*&wt=json&indent=true'
  /]# curl 'http://localhost:8984/solr/datasets/select?q=*%3A*&wt=json&indent=true'
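In the queries above, ``q=*%3A*`` is simply the URL-encoded form of Solr's match-all query ``*:*`` (``%3A`` is the percent-encoding of a colon):

```shell
# Decode the query parameter: %3A is the percent-encoded colon character.
printf '%s\n' 'q=*%3A*' | sed 's/%3A/:/g'
```

This prints ``q=*:*``, i.e. a request for every document in the *datasets* core.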


Index Node
==========

To start a pod that contains the ESGF Index Node (i.e. the ESGF search web application running within Tomcat)::

  kubectl create -f index-node-deployment.yaml 

The *esgf-index-node* container mounts an archive file containing all the necessary configuration files, which is expanded into the location
*/esg/config* inside the container. The web application connects to the Solr indexes exposed by the Solr service
at the URLs *http://esgf-solr:8984/* and *http://esgf-solr:8983/*.
This deployment also includes its own service which exposes the web application at the URL *http://esgf-index-node:8080/esg-search/search* to other containers in the cluster.

To inspect the Kubernetes deployment::

  kubectl get pods -l app=index-node
  kubectl describe deployment esgf-index-node
  kubectl describe service esgf-index-node

To test that the ESGF web app is working, connect inside the container and query localhost::

  kubectl exec -it esgf-index-node-<pod hash-id> -- /bin/bash
  /]# curl 'http://localhost:8080/esg-search/search'
  /]# curl -k 'https://localhost:8443/esg-search/search'


Cleanup
=======

To clean up all pods, services and deployments::

  kubectl delete deployment,svc esgf-solr esgf-index-node