Commit e8c826fb authored by Belhorn, Matt

Initial commit of NCCS Python reference scripts.

README.md

Custom Python installations on NCCS Resources
=============================================

These are personal notes and scripts for installing and managing custom Python
installations on various NCCS resources.

# Python

Python is available through environment modules. The base `python` environment
module (2016-09-01) provides either Python v2.7 or Python v3 and requires extra
environment modules to be loaded to provide core extensions such as `pip` and
`virtualenv` as well as center-provided builds of common packages like `numpy`.

Python packages loaded from extra environment modules should supersede any that
are provided in the base environment module because environment modules
*prepend* packages to the `PYTHONPATH`.
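As a toy illustration of that ordering (throwaway temp directories and a hypothetical `numpyish` module, not real OLCF paths):

```shell
# Two stand-in package directories; the one earlier in PYTHONPATH wins.
BASE="$(mktemp -d)"; EXTRAS="$(mktemp -d)"
printf 'VERSION = "base"\n'   > "$BASE/numpyish.py"
printf 'VERSION = "extras"\n' > "$EXTRAS/numpyish.py"
# Prepending EXTRAS shadows the same-named module in BASE.
RESULT="$(PYTHONPATH="$EXTRAS:$BASE" python3 -c 'import numpyish; print(numpyish.VERSION)')"
echo "$RESULT"   # extras
```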

# Extending Python

Python packages not available through environment modules can be installed by
individual users. There are several methods by which users can extend Python
for their needs, each with its own benefits and drawbacks. The principal
considerations when choosing the best method are:

1. Will the package be used on Titan/Eos compute nodes?
2. Will the package be used by many users?

If the package will be used on different resources or under different
environment modules, also consider:

3. Does the package provide compiled non-python binaries or shared objects?
4. Does the package depend on shared libraries from other environment modules?

Questions (1) and (2) establish *on which filesystem* the package should be
installed. Packages to be used on Cray compute nodes must reside in a
**virtual environment** or **alternate root** on a filesystem that is visible
to those nodes, nominally `/ccs/proj/${PROJECT_ID}`, which is readable by the
compute nodes and is not purged.

Packages that will be shared by multiple users should be installed to an
appropriately accessible **virtual environment** or **alternate root** such as a
subdirectory of `/ccs/proj/${PROJECT_ID}`.

Single-user packages that are not used on the Cray compute nodes may be
installed under your home directory. In this case, if questions (3) and (4) do
not apply or are answered 'no', the package can be installed to the standard
**user install directory**, which is automatically added to the python package
search path.

Questions (3) and (4) establish *how* the package should be installed if it will
be used on different resources or under different runtime environments.

Packages that provide non-python binaries or shared objects ('yes' to question
(3)) cannot generally be assumed to produce architecture-independent code that
runs on all OLCF resources or Cray node types. In simple cases, the package
can be installed as a generically pre-compiled **python wheel** in the standard
package search path. However, this may cause illegal-instruction errors at
runtime on some resources. This is a major issue with distributions like
*anaconda*, which prefer wheels, when run on Cray compute nodes.

Likewise for question (4), CPU instruction sets are generally different for each
system and in the case of Titan and Eos are different between the service nodes
and the compute nodes. The available shared libraries also generally change
between systems and CrayPE programming environments. 

Packages with specific runtime or architecture dependencies should be installed to either

* a **virtual environment** that is activated in the appropriate environment,
* an **alternate root** explicitly added to the PYTHONPATH when appropriate,
* or provided by the OLCF as an environment module. 

Examples of such packages include optimized `numpy`, `mpi4py`, `h5py`,
and `python-netcdf`.

# Installing Packages

Popular packages can generally be installed from online package indexes such as
the Python Package Index, *PyPI*, using the tool `pip` (aka `pip2`) or `pip3`
depending on which version of python is being used. These commands are added to
your `PATH` when the base python environment module is loaded.

Alternatively, many packages can be installed by running a `setup.py` script
provided with the package.  These scripts use a number of distribution tools
that are provided with the core python installation.

## User install directory

Packages can be easily installed to your **user install directory** (typically
`$HOME/.local/lib/pythonV.v/site-packages`) using 

`pip install --user -v [--no-binary :all:] PACKAGE`

where the optional flag `--no-binary` instructs `pip` to avoid pre-compiled
binaries and wheels and instead compile any binaries for the current environment
using the system compiler. Packages installed this way are known to the python
interpreter without any extra setup.
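To see where that directory actually is on a given system (nothing OLCF-specific is assumed here, only a working interpreter):

```shell
# Ask the interpreter for its per-user site-packages directory; pip's
# --user flag installs into exactly this path.
USER_SITE="$(python3 -c 'import site; print(site.getusersitepackages())')"
echo "$USER_SITE"
```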

## Virtual Environments

A robust way to build a customized python stack is to build it from
scratch in a **virtual environment**, which is sometimes shortened to
*virtualenv* or simply *venv*.

Virtual environments allow you to maintain a personal python stack that is fully
under your control. To create a venv, load the base python environment module
and issue the command

`virtualenv [-p PYTHON] VENVPATH`

which will create a clean Python distribution directory structure at any
arbitrary virtual environment path `VENVPATH`. The optional flag `-p PYTHON`
allows you to specify a specific python interpreter version for the venv to use.

It is recommended to give your virtual environments clear names and organize
them in standard locations. For example, given shared applications or projects
named `foo` and `bar` which have environment-specific binaries and private apps
`baz` and `widget`, one might choose the following virtual environment paths:

```
/ccs/proj/<PROJECTID>/venvs/titan-pgi-foo
/ccs/proj/<PROJECTID>/venvs/titan-intel-foo
/ccs/proj/<PROJECTID>/venvs/rhea-bar
/home/$USER/.venvs/baz
/home/$USER/.venvs/widget
```

To use a venv, it must be *activated* by sourcing:

`. VENVPATH/bin/activate`

A venv can be *de-activated* by calling a shell function

`deactivate`

This function is created when the venv is activated.
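A self-contained sketch of the whole cycle, using python3's built-in `venv` module in a throwaway directory for illustration (the module-provided `virtualenv` behaves the same way):

```shell
# Create a venv in a temp dir; --without-pip keeps the example offline.
VENVPATH="$(mktemp -d)/demo-venv"
python3 -m venv --without-pip "$VENVPATH"

. "$VENVPATH/bin/activate"   # activation just sources a script...
echo "$VIRTUAL_ENV"          # ...which exports VIRTUAL_ENV...
command -v python            # ...and puts the venv's python first in PATH

deactivate                   # shell function defined by activate
```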

It is important that environment modules are not changed while a venv is
activated. Any environment modules that are dependencies of packages installed
in the venv **must be loaded prior to both creating and subsequently
activating** the virtual environment. An active venv must be de-activated before
making any changes to the loaded environment modules.

While a venv is active, the `python` interpreter used will be the one installed
in the virtual environment path. Likewise, all packages installed to the
"system" site-packages directory, for instance using:

`pip install -v [--no-binary :all:] PACKAGE`

will in fact be installed under the virtual environment site-packages. This
allows you, as an unprivileged user, to install any package you like using
`pip` or `setup.py` into a customized python stack. 

It is typically necessary to install all packages that you would like to use
into the virtualenv. In this way, it is possible to create a customized python
stack for each resource or programming environment which is optimized with
potentially architecture specific binaries and any extra python packages that
are not available in the base distribution.

### Library links in `$VIRTUAL_ENV/lib`

Shared libraries that are not made available through environment modules can be
linked to within `$VIRTUAL_ENV/lib`.

### Enhancing `activate`

Adventurous users can add additional commands to `activate`, if they wish. This
is dangerous as parts of the script are called multiple times during activation. A
safer alternative is to write a source-able script to activate the environment.
See `venv-activator.sh` in this repo, for instance.

## Alternate Roots

It is possible to install most python packages to any location that you like and
make them available by manually adding them to your `PYTHONPATH`.
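The mechanism can be demonstrated without pip at all, since any directory added to `PYTHONPATH` is searched for modules (the temp directory and `mypkg` module below are hypothetical). In practice you would populate such a root with `pip install --target=ROOT PACKAGE` and then export the same path:

```shell
# A throwaway "alternate root" containing one module.
ROOT="$(mktemp -d)"
printf 'GREETING = "hello from the alternate root"\n' > "$ROOT/mypkg.py"
# Adding the root to PYTHONPATH makes its packages importable.
OUT="$(PYTHONPATH="$ROOT${PYTHONPATH:+:$PYTHONPATH}" \
       python3 -c 'import mypkg; print(mypkg.GREETING)')"
echo "$OUT"
```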

## Anaconda

The author does not like Anaconda, primarily because it conflicts with the
system python and with python inadvertently loaded from environment modules,
and because it tends to favor pre-compiled binaries that can cause runtime
errors when used on Crays. However, it can be used successfully on Rhea. If a
user needs additional packages, an Anaconda virtual environment (clone) should
be made in a directory where the user has write permissions.

## The Nuclear Option: Installing a core Python stack directly.

For when a virtualenv just isn't enough, users can build Python (including dual
Python2+Python3 deployments) directly from source in a directory of their
choosing. All that needs to be done to use it is to add the relevant parts of
the install to the `PATH`, `LD_LIBRARY_PATH`, and possibly `PYTHONPATH`
variables. To keep these changes from conflicting with modifications made by 
environment modules, it is recommended to construct and use a modulefile to
enable the custom stack under the module name `python`.

See `build_raw_python.sh` for an overview of what is involved to deploy a custom
python stack from source.
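Whether done in a modulefile or a sourced shell fragment, the environment changes amount to the following (the install prefix below is hypothetical):

```shell
# Put the custom stack's binaries and shared libraries ahead of the system's.
PYROOT="/ccs/proj/abc123/opt/python/usr"   # hypothetical install prefix
export PATH="$PYROOT/bin:$PATH"
export LD_LIBRARY_PATH="$PYROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```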

build_raw_python.sh

#!/bin/bash

# Installs python2 and python3 side-by-side in a customized directory $TOPDIR.
# Installation includes pip, virtualenv, and core packages for an all-in-one
# useful python stack.

PY2_VER="2.7.12"
PY3_VER="3.5.2"

WHEEL_EXTRAS=(nose
              PyYAML
              jsonschema
              pep8
              argcomplete
              psutil
             )
COMPILED_EXTRAS=(numpy
                 cython
                 matplotlib
                 ipython
                 pandas
                 sympy
                 )

# NOTE: bash's special GROUPS array ignores assignments, so use our own name.
declare -a PROJ_GROUPS
for g in $(groups | grep -m 1 -oE "\<[a-z]{3}[0-9]{3}\>"); do
  PROJ_GROUPS+=("$g")
done
echo "Under which project do you want to install Python?"
select grp in "${PROJ_GROUPS[@]}"; do break; done
TOPDIR="/ccs/proj/$grp/opt/python"
BUILD="$TOPDIR/build"
USR="$TOPDIR/usr"
PY2LOG="$TOPDIR/py2_build.log"
PY3LOG="$TOPDIR/py3_build.log"

notify () {
  if [ $1 -gt 0 ]; then
    printf "[FAILED]\n"
  else
    printf "[  OK  ]\n"
  fi
}

echo "Removing existing installation"
rm -fr $TOPDIR
mkdir -p $BUILD $USR
# FIXME: Installation group and permissions should be considered. It may be
#        prudent to set the group sticky bit.
cd $BUILD

echo "Obtaining source files"
echo "==============="
echo "Python2"
wget https://www.python.org/ftp/python/${PY2_VER}/Python-${PY2_VER}.tgz
echo "Python3"
wget https://www.python.org/ftp/python/${PY3_VER}/Python-${PY3_VER}.tar.xz
echo "Pip bootstrap script"
curl -O https://bootstrap.pypa.io/get-pip.py
printf "===============\n\n"

printf "%-41s" "Installing Python2"
cd $BUILD
tar xf Python-${PY2_VER}.tgz >> $PY2LOG 2>&1
cd $BUILD/Python-${PY2_VER}
./configure --prefix=$TOPDIR/usr --enable-shared >> $PY2LOG 2>&1
make >> $PY2LOG 2>&1
make install  >> $PY2LOG 2>&1
notify $?

printf "%-41s" "Installing Python3"
cd $BUILD
tar xf Python-${PY3_VER}.tar.xz >> $PY3LOG 2>&1
cd $BUILD/Python-${PY3_VER}
./configure --prefix=$TOPDIR/usr --enable-shared >> $PY3LOG 2>&1
make >> $PY3LOG 2>&1
make install >> $PY3LOG 2>&1
notify $?

export LD_LIBRARY_PATH="$USR/lib:$LD_LIBRARY_PATH"
export PATH="$USR/bin:$PATH"

cd $BUILD
printf "%-41s" "Installing pip3"
python3 get-pip.py  >> $PY3LOG 2>&1 # Install pip3 first, so that
notify $?

printf "%-41s" "Installing pip2"
python get-pip.py   >> $PY2LOG 2>&1 # pip2 overwrites default `pip`
notify $?

printf "%-41s" "Installing virtualenv for Python2"
# Version Specific Packages
## Python3 provides pyvenv
pip2 install -v virtualenv >> $PY2LOG 2>&1
notify $?

printf "\nInstalling extra packages:\n"
# Install packages for python3 first so that entry points with version-less
# names (like 'ipython' as opposed to 'ipython2') end up pointing at python2.
for pipx in $(which pip3) $(which pip2); do
  case "$(basename ${pipx})" in
    pip2) log="$PY2LOG"; name="python2" ;;
    pip3) log="$PY3LOG"; name="python3" ;;
    *) exit 1
  esac
  for package in ${WHEEL_EXTRAS[@]}; do
    printf "  %s: %-30s" $name $package
    $pipx install -v $package >> $log 2>&1
    notify $?
  done

  for package in ${COMPILED_EXTRAS[@]}; do
    printf "  %s: %-30s" $name $package
    $pipx install -v --no-binary :all: $package >> $log 2>&1
    notify $?
  done
done


printf "\nFinished! See build logs\n  python2: $PY2LOG\n  python3: $PY3LOG\nfor details.\n\n"

build_virtualenv.sh

#!/bin/bash

# NOTE: bash's special GROUPS array ignores assignments, so use our own name.
declare -a PROJ_GROUPS
for g in $(groups | grep -m 1 -oE "\<[a-z]{3}[0-9]{3}\>"); do
  PROJ_GROUPS+=("$g")
done
echo "Under which project do you want to install Python?"
select grp in "${PROJ_GROUPS[@]}"; do break; done

# Setup all the optional paths.
VENV_DIR="/ccs/proj/$grp/.venvs"
VENV="$VENV_DIR/rhea-pyms"
TMPDIR=/tmp/$USER/venvbuild

# Set the environment.
# !!!!!!!!!!!!!!!!!!!!!!!!!!! WARNING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# ENVIRONMENT MODULE CHANGES CANNOT BE MADE INSIDE AN ACTIVE VIRTUALENV
module swap PE-intel PE-gnu
module load netcdf python/2.7.9 python_virtualenv/12.0.7 

# Make the necessary directories
mkdir -p $VENV_DIR $TMPDIR

# Make and activate a virtualenv for this python stack
virtualenv $VENV
source $VENV/bin/activate

# Link against RHEL atlas - this is kind of gross. These are probably the least
# optimized lapack/blas implementations I've ever seen...
for i in /usr/lib64/atlas/*.so.3; do
  BASELIB="$(basename "$i")"
  ln -s "$i" "$VIRTUAL_ENV/lib/${BASELIB%.3}"
done

# Install indexed dependencies outright.
pip install --upgrade pip
pip install -v --no-binary :all: numpy
pip install -v --no-binary :all: scipy
pip install -v jupyter matplotlib nose mock ipyparallel mpi4py

# Install non-indexed packages manually.
cd $TMPDIR

wget -O pycdf-0.6-3b.tar.gz "http://downloads.sourceforge.net/project/pysclint/pycdf/pycdf-0.6.3b/pycdf-0.6-3b.tar.gz?r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fpysclint%2Ffiles%2Fpycdf%2Fpycdf-0.6.3b%2F&ts=1471285723&use_mirror=superb-sea2"
tar xf pycdf-0.6-3b.tar.gz
cd pycdf-0.6-3b
python setup.py install

jupyter-on-rhea.pbs

#!/bin/bash -l
#PBS -A FIXME
#PBS -q batch
#PBS -l walltime=48:00:00,nodes=1
#PBS -o jupyter.log
#PBS -j oe

# Setup all the optional paths.
# WORK defined in user's bashrc
VENV_DIR="$HOME/.venvs"
VENV="$VENV_DIR/rhea-pyms"

# Change the login and client ports to suitable values.
# Be aware your preferred login port may be in use by other users. A login port
# used by another project will cause dire confusion at runtime.
CLIENT_PORT=8080
LOGIN_PORT=XXXXX # FIXME: Choose a *RANDOM* unused port number in the range 10k-64k.
SERVER_PORT=8082
COMMAND="${HOME}/.jupyter_connect"

# Setup the environment.
source $HOME/.venvs/venv-activator.sh
venvctl-rhea-app

cd $HOME

function finish {
  rm $COMMAND
}

if [ -f "$COMMAND" ]; then
  echo "A Jupyter server is already running."
  echo "See '$COMMAND' for details."
  exit 1
fi

cat << EOF > $COMMAND
#!/bin/bash
# To open a tunnel to the notebook server/kernels running on the compute node,
# issue the following command from your local machine:
#
# ssh -f -L 127.0.0.1:$CLIENT_PORT:127.0.0.1:$LOGIN_PORT $USER@rhea.ccs.ornl.gov $COMMAND
#
# Then, on your local machine, navigate to "http://127.0.0.1:$CLIENT_PORT" in
# the browser of your choice. Use 'https' if the server is configured to use
# TLS/SSL encryption.

ssh -q -L 127.0.0.1:$LOGIN_PORT:127.0.0.1:$SERVER_PORT \
  $HOSTNAME.ccs.ornl.gov sleep $PBS_WALLTIME
EOF

trap finish EXIT
chmod a+x $COMMAND

jupyter-notebook --no-browser --port=$SERVER_PORT --log-level='DEBUG'

venv-activator.sh

#!/bin/bash
#
# This script provides shell function utilities for activating and
# deactivating python virtual environments that have dependencies on Tcl
# environment modules.
#
#------------------------------------------------------------------------------
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#------------------------------------------------------------------------------
#

function venvmodulectl () {
  # A single function to activate and deactivate a python virtualenv that has
  # prerequisite environment module dependencies. The function will apply a
  # sequence of environment module commands prior to activating a specific venv.
  # If the venv is already active, it will be deactivated and the sequence of
  # module commands will be reversed to undo the environment changes.
  #
  # USAGE:
  # venvmodulectl "/PATH/TO/VENV" "MODULE CMD" ["MODULE CMD"...]
  # 
  # Where "/PATH/TO/VENV" points to the root of the virtual environment and
  # each "MODULE CMD" is a double-quoted string of instructions to `modulecmd`
  # of the forms:
  #    "swap MODULEA MODULEB" or
  #    "load MODULE1 MODULE2 ... MODULEN"
  #
  # The script will likely fail if the sequence of module commands conflicts
  # with the modules that are loaded when the function is first called.  It is
  # intended that the sequence of module commands be chosen such that they are
  # applied from a clean login environment. 

  declare _ENVNAME="$1"
  declare -a _COMMANDS
  for jj in "${@:2}"; do
    _COMMANDS+=("$jj")
  done
  if [ -z "$_ENVNAME" ] || [ "${#_COMMANDS[@]}" -eq 0 ]; then
    echo "Could not interpret input."
    return
  fi

  declare _CMD
  if [ -z "$VIRTUAL_ENV" ] && [ -z "$MYENV" ]; then
    # Load the modules.
    for _CMD in "${_COMMANDS[@]}"; do 
      echo "module ${_CMD}"
      eval "module ${_CMD}"
    done
    # Activate the virtualenv.
    . "$_ENVNAME/bin/activate"
    # Keep track of what's been done.
    export MYENV="$_ENVNAME"
  elif [[ "$MYENV" == "$_ENVNAME" ]]; then
    # Deactivate the virtualenv.
    [ -n "$VIRTUAL_ENV" ] && deactivate
    # Unload the modules. The double loop is gross and inefficient
    # but more shell-agnostic and readable than other approaches.
    declare -a SDNAMMOC_
    for _CMD in "${_COMMANDS[@]}"; do
      _CMD="$(echo ${_CMD} | sed 's/^lo\(.*\)$/unlo\1/' | \
              awk '{ printf("%s ", $1)
                     for (i=NF; i>2; i--) printf("%s ",$i)
                     print $2 }')"
      SDNAMMOC_=("$_CMD" "${SDNAMMOC_[@]}")
    done
    for _CMD in "${SDNAMMOC_[@]}"; do
      echo "module $_CMD"
      eval "module $_CMD"
    done
    # Cleanup the environment.
    unset MYENV
  else
    echo "ERROR - Cannot alter $_ENVNAME environment:"
    [ -n "$VIRTUAL_ENV" ] && echo "        $VIRTUAL_ENV already active."
    [ -n "$MYENV" ] && echo "        Run script for $MYENV first"
  fi
}

# Usage Examples
# ==============
# Environment module requirements for a venv are unlikely to change often.
# Therefore it makes sense to use the above function in other functions or
# aliases crafted for specific venvs deployed on OLCF resources.

# These examples assume a venv is located at $HOME/.venvs/rhea-app that was
# built using the python/2.7.9 module on Rhea and has dependencies on the PE-gnu
# and netcdf environment modules.

# Within a shell function
function venvctl-rhea-app () {
  declare -a COMMANDS
  COMMANDS=(
    "swap PE-intel PE-gnu"
    "load netcdf python/2.7.9"
  )
  venvmodulectl "$HOME/.venvs/rhea-app" "${COMMANDS[@]}"
}

# Within an alias
alias venvctl-rhea-app-alias='venvmodulectl "$HOME/.venvs/rhea-app" "swap PE-intel PE-gnu" "load netcdf python/2.7.9"'