README.md
---------

Custom Python installations on NCCS Resources
=============================================

These are personal notes and scripts for installing and managing custom
Python installations on various NCCS resources.

# Python

Python is available through environment modules. The base `python`
environment module (2016-09-01) provides either Python v2.7 or Python v3 and
requires extra environment modules to be loaded to provide core extensions
such as `pip` and `virtualenv`, as well as center-provided builds of common
packages like `numpy`. Python packages loaded from extra environment modules
supersede any provided in the base environment module, because environment
modules *prepend* packages to the `PYTHONPATH`.

# Extending Python

Python packages not available through environment modules can be installed by
individual users. There are several methods by which users can extend Python
for their needs, each with its own benefits and drawbacks. The principal
considerations when choosing the best method are:

1. Will the package be used on Titan/Eos compute nodes?
2. Will the package be used by many users?

If the package will be used on different resources or under different
environment modules:

3. Does the package provide compiled non-python binaries or shared objects?
4. Does the package depend on shared libraries from other environment modules?

Questions (1) and (2) establish *on which filesystem* the package should be
installed. Packages to be used on Cray compute nodes must reside in a
**virtual environment** or **alternate root** on a filesystem that is visible
to those nodes, nominally `/ccs/proj/${PROJECT_ID}`, which is readable by the
compute nodes and is not purged. Packages that will be shared by multiple
users should likewise be installed to an appropriately accessible **virtual
environment** or **alternate root**, such as a subdirectory of
`/ccs/proj/${PROJECT_ID}`.
Single-user packages that are not used on the Cray compute nodes may be
installed under your home directory. In this case, if questions (3) and (4)
do not apply or are 'no', the package can be installed to the standard
**user install directory**, which is automatically added to the python
package search path.

Questions (3) and (4) establish *how* the package should be installed if it
will be used on different resources or under different runtime environments.
Packages that provide non-python binaries or shared objects ('yes' to
question (3)) cannot generally be assumed to produce architecture-independent
code that can run on all OLCF resources or Cray node types. In simple cases,
the package can be installed as a generically pre-compiled **python wheel**
in the standard package search path; however, this may cause
illegal-instruction errors at runtime on some resources. This is a major
issue with distributions like *anaconda*, which prefer wheels, when run on
Cray compute nodes. Likewise for question (4): CPU instruction sets generally
differ between systems, and in the case of Titan and Eos differ between the
service nodes and the compute nodes. The available shared libraries also
generally change between systems and CrayPE programming environments.
Packages with specific runtime or architecture dependencies should live in
one of:

* a **virtual environment** that is activated in the appropriate environment,
* an **alternate root** explicitly added to the `PYTHONPATH` when appropriate,
* an environment module provided by the OLCF.

Examples of such packages include optimized `numpy`, `mpi4py`, `h5py`, and
`python-netcdf`.

# Installing Packages

Popular packages can generally be installed from online package indexes such
as the Python Package Index (*PyPI*) using the tool `pip` (aka `pip2`) or
`pip3`, depending on which version of python is being used. These commands
are added to your `PATH` when the base python environment module is loaded.
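For instance, a minimal sketch of a user-level install. The module names in the comments are illustrative (check `module avail` for the names on your system), and `mock` is just an example package; the final command runs anywhere `python3` is available and prints the directory that `pip install --user` targets:

```shell
# Site-specific setup (module names are examples):
#   module load python python_pip
#
# Install an example package into the user install directory:
#   pip install --user -v mock
#
# The target directory is derived from Python's `site` module and is on the
# default package search path, so no PYTHONPATH changes are needed:
python3 -c 'import site; print(site.getusersitepackages())'
```

The printed path is the per-version `site-packages` directory under your home, so packages installed there are importable with no extra setup.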
Alternatively, many packages can be installed by running a `setup.py` script
provided with the package. These scripts use a number of distribution tools
that are provided with the core python installation.

## User install directory

Packages can be easily installed to your **user install directory**
(typically `$HOME/.local/lib/pythonV.v/site-packages`) using

`pip install --user -v [--no-binary :all:] PACKAGE`

where the optional flag `--no-binary` instructs `pip` to avoid pre-compiled
binaries and wheels and instead compile any binaries for the current
environment using the system compiler. Packages installed this way are known
to the python interpreter without any extra setup.

## Virtual Environments

A robust way to build a customized python stack is to build it from scratch
in a **virtual environment**, sometimes shortened to *virtualenv* or simply
*venv*. Virtual environments allow you to maintain a personal python stack
that is fully under your control.

To create a venv, load the base python environment module and issue the
command

`virtualenv [-p PYTHON] VENVPATH`

which creates a clean Python distribution directory structure at any
arbitrary virtual environment path `VENVPATH`. The optional flag `-p PYTHON`
allows you to specify a particular python interpreter for the venv to use.

It is recommended to give your virtual environments clear names and to
organize them in standard locations. For example, given shared applications
or projects named `foo` and `bar` which have environment-specific binaries,
and private apps `baz` and `widget`, one might choose the following virtual
environment paths:

```
/ccs/proj/<PROJECTID>/venvs/titan-pgi-foo
/ccs/proj/<PROJECTID>/venvs/titan-intel-foo
/ccs/proj/<PROJECTID>/venvs/rhea-bar
/home/$USER/.venvs/baz
/home/$USER/.venvs/widget
```

To use a venv, it must be *activated* by sourcing: `.
VENVPATH/bin/activate`

A venv can be *de-activated* by calling the shell function `deactivate`,
which is created when the venv is activated.

It is important that environment modules are not changed while a venv is
active. Any environment modules that are dependencies of packages installed
in the venv **must be loaded prior to both creating and subsequently
activating** the virtual environment. An active venv must be de-activated
before making any changes to the loaded environment modules.

While a venv is active, the `python` interpreter used is the one installed
in the virtual environment path. Likewise, all packages installed to the
"system" site-packages directory, for instance using

`pip install -v [--no-binary :all:] PACKAGE`

will in fact be installed under the virtual environment's site-packages.
This allows you, as an unprivileged user, to install any package you like
using `pip` or `setup.py` into a customized python stack. It is typically
necessary to install all packages that you would like to use into the
virtualenv. In this way, it is possible to create a customized python stack
for each resource or programming environment, optimized with potentially
architecture-specific binaries and any extra python packages that are not
available in the base distribution.

### Library links in `$VIRTUAL_ENV/lib`

Shared libraries that are not made available through environment modules can
be linked to from within `$VIRTUAL_ENV/lib`.

### Enhancing `activate`

Adventurous users can add additional commands to `activate` if they wish.
This is dangerous, as parts of the script are called multiple times during
activation. A safer alternative is to write a source-able script that
activates the environment; see `venv-activator.sh` in this repo, for
instance.

## Alternate Roots

It is possible to install most python packages to any location you like and
make them available by manually adding that location to your `PYTHONPATH`.
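A minimal sketch of the alternate-root approach. The root path here is a throwaway placeholder (on NCCS resources it would be somewhere under `/ccs/proj/${PROJECT_ID}`), and `pip`'s `--target` option is one way to populate such a root:

```shell
# A hypothetical alternate root; any writable directory works.
ALTROOT=/tmp/demo-alt-root/lib/python/site-packages
mkdir -p "$ALTROOT"

# Packages would be installed there with something like:
#   pip install -v --no-binary :all: --target="$ALTROOT" PACKAGE

# Prepending the root to PYTHONPATH makes its packages importable; the entry
# appears in the interpreter's module search path:
PYTHONPATH="$ALTROOT:$PYTHONPATH" \
    python3 -c 'import sys, os; print(os.environ["PYTHONPATH"].split(":")[0] in sys.path)'
# prints: True
```

Because the root is added manually, it is easy to keep one root per resource or programming environment and prepend only the appropriate one.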
## Anaconda

The author does not like Anaconda, primarily because it conflicts with the
system python and with python inadvertently loaded from environment modules,
and because it tends to favor pre-compiled binaries that can cause runtime
errors when used on Crays. However, it can be used successfully on Rhea. If
additional packages are needed by a user, an Anaconda virtual environment
(clone) should be made in a directory where the user has write permissions.

## The Nuclear Option: Installing a core Python stack directly

For when a virtualenv just isn't enough, users can build Python (including
dual Python2+Python3 deployments) directly from source in a directory of
their choosing. All that needs to be done to use it is to add the relevant
parts of the install to the `PATH`, `LD_LIBRARY_PATH`, and possibly
`PYTHONPATH` variables. To keep these changes from conflicting with
modifications made by environment modules, it is recommended to construct
and use a modulefile that enables the custom stack under the module name
`python`. See `build_raw_python.sh` for an overview of what is involved in
deploying a custom python stack from source.

build_raw_python.sh
-------------------

#!/bin/bash
# Installs python2 and python3 side-by-side in a customized directory $TOPDIR.
# Installation includes pip, virtualenv, and core packages for an all-in-one
# useful python stack.

PY2_VER="2.7.12"
PY3_VER="3.5.2"

WHEEL_EXTRAS=(nose PyYAML jsonschema pep8 argcomplete psutil)
COMPILED_EXTRAS=(numpy cython matplotlib ipython pandas sympy)

# Collect candidate project groups. Note: bash's built-in GROUPS array is
# read-only, so a different variable name must be used here.
declare -a PROJ_GROUPS
for g in $(groups | grep -m 1 -oE "\<[a-z]{3}[0-9]{3}\>"); do
    PROJ_GROUPS+=("$g")
done

echo "Under which project do you want to install Python?"
select grp in "${PROJ_GROUPS[@]}"; do break; done

TOPDIR="/ccs/proj/$grp/opt/python"
BUILD="$TOPDIR/build"
USR="$TOPDIR/usr"
PY2LOG="$TOPDIR/py2_build.log"
PY3LOG="$TOPDIR/py3_build.log"

notify () {
    if [ $1 -gt 0 ]; then
        printf "[FAILED]\n"
    else
        printf "[  OK  ]\n"
    fi
}

echo "Removing existing installation"
rm -fr $TOPDIR
mkdir -p $BUILD $USR
# FIXME: Installation group and permissions should be considered. It may be
# prudent to set the group sticky bit.

cd $BUILD
echo "Obtaining source files"
echo "==============="
echo "Python2"
wget https://www.python.org/ftp/python/${PY2_VER}/Python-${PY2_VER}.tgz
echo "Python3"
wget https://www.python.org/ftp/python/${PY3_VER}/Python-${PY3_VER}.tar.xz
echo "Pip bootstrap script"
curl -O https://bootstrap.pypa.io/get-pip.py
printf "===============\n\n"

printf "%-41s" "Installing Python2"
cd $BUILD
tar xf Python-${PY2_VER}.tgz >> $PY2LOG 2>&1
cd $BUILD/Python-${PY2_VER}
./configure --prefix=$TOPDIR/usr --enable-shared >> $PY2LOG 2>&1
make >> $PY2LOG 2>&1
make install >> $PY2LOG 2>&1
notify $?

printf "%-41s" "Installing Python3"
cd $BUILD
tar xf Python-${PY3_VER}.tar.xz >> $PY3LOG 2>&1
cd $BUILD/Python-${PY3_VER}
./configure --prefix=$TOPDIR/usr --enable-shared >> $PY3LOG 2>&1
make >> $PY3LOG 2>&1
make install >> $PY3LOG 2>&1
notify $?

export LD_LIBRARY_PATH="$USR/lib:$LD_LIBRARY_PATH"
export PATH="$USR/bin:$PATH"
cd $BUILD

printf "%-41s" "Installing pip3"
python3 get-pip.py >> $PY3LOG 2>&1    # Install pip3 first, so that
notify $?
printf "%-41s" "Installing pip2"
python get-pip.py >> $PY2LOG 2>&1     # pip2 overwrites the default `pip`
notify $?

# Version-specific packages (Python3 already provides pyvenv)
printf "%-41s" "Installing virtualenv for Python2"
pip2 install -v virtualenv >> $PY2LOG 2>&1
notify $?
printf "\nInstalling extra packages:\n"
# Install packages for python3 first so that python2 packages writing binaries
# with version-less names (like 'ipython' as opposed to 'ipython2') use python2
for pipx in $(which pip3) $(which pip2); do
    case "$(basename ${pipx})" in
        pip2) log="$PY2LOG"; name="python2" ;;
        pip3) log="$PY3LOG"; name="python3" ;;
        *) exit 1 ;;
    esac
    for package in "${WHEEL_EXTRAS[@]}"; do
        printf "    %s: %-30s" $name $package
        $pipx install -v $package >> $log 2>&1
        notify $?
    done
    for package in "${COMPILED_EXTRAS[@]}"; do
        printf "    %s: %-30s" $name $package
        $pipx install -v --no-use-wheel $package >> $log 2>&1
        notify $?
    done
done

printf "\nFinished! See build logs\n    python2: $PY2LOG\n    python3: $PY3LOG\nfor details.\n\n"

build_virtualenv.sh
-------------------

#!/bin/bash

# Collect candidate project groups. Note: bash's built-in GROUPS array is
# read-only, so a different variable name must be used here.
declare -a PROJ_GROUPS
for g in $(groups | grep -m 1 -oE "\<[a-z]{3}[0-9]{3}\>"); do
    PROJ_GROUPS+=("$g")
done
echo "Under which project do you want to install Python?"
select grp in "${PROJ_GROUPS[@]}"; do break; done

# Setup all the optional paths.
VENV_DIR="/ccs/proj/$grp/.venvs"
VENV="$VENV_DIR/rhea-pyms"
TMPDIR=/tmp/$USER/venvbuild

# Set the environment.
# !!!!!!!!!!!!!!!!!!!!!!!!!!! WARNING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# ENVIRONMENT MODULE CHANGES CANNOT BE MADE INSIDE AN ACTIVE VIRTUALENV
module swap PE-intel PE-gnu
module load netcdf python/2.7.9 python_virtualenv/12.0.7

# Make the necessary directories
mkdir -p $VENV_DIR $TMPDIR

# Make and activate a virtualenv for this python stack
virtualenv $VENV
source $VENV/bin/activate

# Link against RHEL atlas - this is kind of gross. These are probably the least
# optimized lapack/blas implementations I've ever seen...
for i in $(ls /usr/lib64/atlas/*.so.3); do
    BASELIB=${i##*atlas/}
    ln -s $i $VIRTUAL_ENV/lib/${BASELIB%%.3}
done

# Install indexed dependencies outright.
pip install --upgrade pip
pip install -v --no-binary :all: numpy
pip install -v --no-binary :all: scipy
pip install -v jupyter matplotlib nose mock ipyparallel mpi4py

# Install non-indexed packages manually.
cd $TMPDIR
wget -O pycdf-0.6-3b.tar.gz "http://downloads.sourceforge.net/project/pysclint/pycdf/pycdf-0.6.3b/pycdf-0.6-3b.tar.gz?r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fpysclint%2Ffiles%2Fpycdf%2Fpycdf-0.6.3b%2F&ts=1471285723&use_mirror=superb-sea2"
tar xf pycdf-0.6-3b.tar.gz
cd pycdf-0.6-3b
python setup.py install

jupyter-on-rhea.pbs
-------------------

#!/bin/bash -l
#PBS -A FIXME
#PBS -q batch
#PBS -l walltime=48:00:00,nodes=1
#PBS -o jupyter.log
#PBS -j oe

# Setup all the optional paths.
# WORK defined in user's bashrc
VENV_DIR="$HOME/.venvs"
VENV="$VENV_DIR/rhea-pyms"

# Change the login and client ports to suitable values.
# Be aware your preferred login port may be in use by other users. A login port
# used by another project will cause dire confusion at runtime.
CLIENT_PORT=8080
LOGIN_PORT=XXXXX  # FIXME: Choose a *RANDOM* unused port number in the range 10k-64k.
SERVER_PORT=8082
COMMAND="${HOME}/.jupyter_connect"

# Setup the environment.
source $HOME/.venvs/venv-activator.sh
venvctl-rhea-app

cd $HOME

function finish {
    rm $COMMAND
}

if [ -f "$COMMAND" ]; then
    echo "A Jupyter server is already running."
    echo "See '$COMMAND' for details."
    exit 1
fi

cat << EOF > $COMMAND
#!/bin/bash
# To open a tunnel to the notebook server/kernels running on the compute node,
# issue the following command from your local machine:
#
#     ssh -f -L 127.0.0.1:$CLIENT_PORT:127.0.0.1:$LOGIN_PORT $USER@rhea.ccs.ornl.gov $COMMAND
#
# Then, on your local machine, navigate to "http://127.0.0.1:$CLIENT_PORT" in
# the browser of your choice. Use 'https' if the server is configured to use
# TLS/SSL encryption.
ssh -q -L 127.0.0.1:$LOGIN_PORT:127.0.0.1:$SERVER_PORT \
    $HOSTNAME.ccs.ornl.gov sleep $PBS_WALLTIME
EOF

trap finish EXIT
chmod a+x $COMMAND
jupyter-notebook --no-browser --port=$SERVER_PORT --log-level='DEBUG'

venv-activator.sh
-----------------

#!/bin/bash
#
# This script provides shell function utilities for activating and
# deactivating python virtual environments that have dependencies on Tcl
# environment modules.
#
#------------------------------------------------------------------------------
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#------------------------------------------------------------------------------

function venvmodulectl () {
    # A single function to activate and deactivate a python virtualenv that
    # has prerequisite environment module dependencies. The function applies
    # a sequence of environment module commands prior to activating a specific
    # venv. If the venv is already active, it is deactivated and the sequence
    # of module commands is reversed to undo the environment changes.
    #
    # USAGE:
    #     venvmodulectl "/PATH/TO/VENV" "MODULE CMD" ["MODULE CMD"...]
    #
    # Where "/PATH/TO/VENV" points to the root of the virtual environment and
    # each "MODULE CMD" is a double-quoted string of instructions to
    # `modulecmd` of the forms:
    #     "swap MODULEA MODULEB" or
    #     "load MODULE1 MODULE2 ... MODULEN"
    #
    # The script will likely fail if the sequence of module commands conflicts
    # with the modules that are loaded when the function is first called. It
    # is intended that the sequence of module commands be chosen such that
    # they are applied from a clean login environment.

    declare _ENVNAME="$1"
    declare -a _COMMANDS
    for jj in "${@:2}"; do
        _COMMANDS+=("$jj")
    done

    if [ -z "$_ENVNAME" ] || [ ${#_COMMANDS[@]} -eq 0 ]; then
        echo "Could not interpret input."
        return
    fi

    declare _CMD
    if [ -z "$VIRTUAL_ENV" -a -z "$MYENV" ]; then
        # Load the modules.
        for _CMD in "${_COMMANDS[@]}"; do
            echo "module ${_CMD}"
            eval "module ${_CMD}"
        done
        # Activate the virtualenv.
        . "$_ENVNAME/bin/activate"
        # Keep track of what's been done.
        export MYENV="$_ENVNAME"
    elif [[ "$MYENV" == "$_ENVNAME" ]]; then
        # Deactivate the virtualenv.
        [ -n "$VIRTUAL_ENV" ] && deactivate
        # Unload the modules in reverse: "load A B" becomes "unload B A" and
        # "swap A B" becomes "swap B A". The double loop is gross and
        # inefficient but more shell-agnostic and readable than other
        # approaches.
        declare -a SDNAMMOC_
        for _CMD in "${_COMMANDS[@]}"; do
            _CMD="$(echo ${_CMD} | sed 's/^lo\(.*\)$/unlo\1/' | \
                awk '{ printf("%s ", $1); for (i=NF; i>2; i--) printf("%s ", $i); print $2 }')"
            SDNAMMOC_=("$_CMD" "${SDNAMMOC_[@]}")
        done
        for _CMD in "${SDNAMMOC_[@]}"; do
            echo "module $_CMD"
            eval "module $_CMD"
        done
        # Cleanup the environment.
        unset MYENV
    else
        echo "ERROR - Cannot alter $_ENVNAME environment:"
        [ -n "$VIRTUAL_ENV" ] && echo "    $VIRTUAL_ENV already active."
        [ -n "$MYENV" ] && echo "    Run the script for $MYENV first."
    fi
}

# Usage Examples
# ==============
# Environment module requirements for a venv are unlikely to change often.
# Therefore it makes sense to use the above function in other functions or
# aliases crafted for specific venvs deployed on OLCF resources.
# These examples assume a venv located at $HOME/.venvs/rhea-app that was
# built using the python/2.7.9 module on Rhea and has dependencies on the
# PE-gnu and netcdf environment modules.

# Within a shell function
function venvctl-rhea-app () {
    declare -a COMMANDS
    COMMANDS=(
        "swap PE-intel PE-gnu"
        "load netcdf python/2.7.9"
    )
    venvmodulectl "$HOME/.venvs/rhea-app" "${COMMANDS[@]}"
}

# Within an alias (single-quoted so the inner double quotes survive)
alias venvctl-rhea-app-alias='venvmodulectl "$HOME/.venvs/rhea-app" "swap PE-intel PE-gnu" "load netcdf python/2.7.9"'
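As a footnote to the README's "Nuclear Option" section: until a `python` modulefile is written, the stack deployed by `build_raw_python.sh` can be enabled by hand with exports along these lines. The project ID `abc123` is a placeholder, and the layout matches the `$TOPDIR/usr` prefix used by that script:

```shell
# Placeholder prefix; adjust to the directory chosen during the build.
TOPDIR=/ccs/proj/abc123/opt/python
export PATH="$TOPDIR/usr/bin:$PATH"
export LD_LIBRARY_PATH="$TOPDIR/usr/lib:$LD_LIBRARY_PATH"

# Sanity check: the custom bin directory now leads the search path.
echo "${PATH%%:*}"
# prints: /ccs/proj/abc123/opt/python/usr/bin
```

Wrapping exactly these `prepend`-style changes in a modulefile, as the README recommends, keeps them reversible alongside other environment modules.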
README.md 0 → 100644 +196 −0 Original line number Diff line number Diff line Custom Python installations on NCCS Resources ============================================= These are personal notes and scripts for installing and managing custom python installations on various NCCS resources. # Python Python is available through environment modules. The base `python` environment module (2016-09-01) provides either Python v2.7 or Python v3 and requires extra environment modules to be loaded to provide core extensions such as `pip` and `virtualenv` as well as center-provided builds of common packages like `numpy`. Python packages loaded from extra environment modules should supersede any that are provided in the base environment module because environment modules *prepend* packages to the `PYTHONPATH`. # Extending Python Python packages not available through environment modules can be installed by individual users. There are several methods by which users can extend Python for their needs, each having their own benefits and drawbacks. The principle considerations to choosing the best method are: 1. Will the package be used on Titan/Eos compute nodes? 2. Will the package be used by many users? * Will the package be used on different resources or under different environment modules? If yes: 3. Does the package provide compiled non-python binaries or shared objects? 4. Does the package depend on shared libraries from other environment modules? Questions (1) and (2) establish *on which filesystem* the package should be installed. Packages to be used on Cray compute nodes must reside on a filesystem that is visible to those nodes in a **virtual environment** or **alternate root**, nominally `/ccs/proj/${PROJECT_ID}` which is readable by the compute nodes and is not purged. Packages that will be shared by multiple users should be installed to an appropriately accessible **virtual environment** or **alternate root** such as a subdirectory of `/ccs/proj/${PROJECT_ID}`. 
Single-user packages that are not used on the Cray compute nodes may be installed under your home directory. In this case, if questions (3) and (4) do not apply or are 'no', then it can be installed to the standard **User install directory** which is automatically added to the python package search path. Questions (3) and (4) establish *how* the package should be installed if it will be used on different resources or under different runtime environments. Packages that provide non-python binaries or shared objects ('yes' to question (3)) cannot generally be assumed to produce architecture independent code that can be run on all OLCF resources or Cray node types. Simple cases, the package can be installed as a generically pre-compiled **python wheel** in the standard package search path. However, this may cause instruction errors at runtime on some resources. This is a major issue with distributions like *anaconda*, which prefers wheels, when run on Cray compute nodes. Likewise for question (4), CPU instruction sets are generally different for each system and in the case of Titan and Eos are different between the service nodes and the compute nodes. The available shared libraries also generally change between systems and CrayPE programming environments. Packages with specific runtime or architecture dependencies should be installed to either * a **virtual environment** that is activated in the appropriate environment, * an **alternate root** explicitly added to the PYTHONPATH when appropriate, * or provided by the OLCF as an environment module. Examples of such packages include optimized `numpy`, `mpi4py`, `h5py`, and `python-netcdf`. # Installing Packages Popular packages can generally be installed from online package indexes such as the Python Package Index, *PyPI*, using the tool `pip` (aka `pip2`) or `pip3` depending on which version of python is being used. These commands are added to your `PATH` when the base python environment module is loaded. 
Alternatively, many packages can be installed by running a `setup.py` script provided with the package. These scripts use a number of distribution tools that are provided with the core python installation. ## User install directory Packages can be easily installed to your **user install directory** (typically `$HOME/.local/lib/pythonV.v/site-packages`) using `pip install --user -v [--no-binary :all:] PACKAGE` where the optional flag `--no-binary` instructs `pip` to avoid pre-compiled binaries and wheels and instead compile any binaries for the current environment using the system compiler. Packages installed this way are known to the python interpreter without any extra setup. ## Virtual Environments A robust way to build a customized python stack is to build it from scratch in a **virtual environment**, which is sometimes shortened to *virtualenv* or simply *venv*. Virtual environments allow you to maintain a personal python stack that is fully under your control. To create a venv, load the base python environment module and issue the command `virtualenv [-p PYTHON] VENVPATH` which will create a clean Python distribution directory structure at any arbitrary virtual environment path `VENVPATH`. The optional flag `-p PYTHON` allows you to specify a specific python interpreter version for the venv to use. It is recommended to give your virtual environments clear names and organize them in standard locations. For example, given shared applications or projects named `foo` and `bar` which have environment-specific binaries and private apps `baz` and `widget`, one might choose the following virtual environment paths: ``` /ccs/proj/<PROJECTID>/venvs/titan-pgi-foo /ccs/proj/<PROJECTID>/venvs/titan-intel-foo /ccs/proj/<PROJECTID>/venvs/rhea-bar /home/$USER/.venvs/baz /home/$USER/.venvs/widget ``` To use a venv, it must by *activated* by sourcing: `. 
VENVPATH/bin/activate` A venv can be *de-activated* by calling a shell function `deactivate` This function is created when the venv is activated. It is important that environment modules are not changed while a venv is activated. Any environment modules that are dependencies of packages installed in the venv **must be loaded prior to both creating and subsequently activating** the virtual environment. An active venv must be de-activated before making any changes to the loaded environment modules. While a venv is active, the `python` interpreter used will be the one installed in the virtual environment path. Likewise, all packages installed to the "system" site-packages directory, for instance using: `pip install -v [--no-binary :all:] PACKAGE` will in fact be installed under the virtual environment site-packages. This allows you, as an unprivileged user, to install any package you like using `pip` or `setup.py` into a customized python stack. It is typically necessary to install all packages that you would like to use into the virtualenv. In this way, it is possible to create a customized python stack for each resource or programming environment which is optimized with potentially architecture specific binaries and any extra python packages that are not available in the base distribution. ### Library links in `$VIRTUAL_ENV/lib` Shared libraries that are not made available through environment modules can be linked to within `$VIRTUAL_ENV/lib`. ### Enhancing `activate` Adventurous users can add additional commands to `activate`, if they wish. This is dangerous as parts of the script are called multiple times during activation. A safer alternative is to write a source-able script to activate the environment. See `venv_activator.sh` in this repo, for instance. ## Alternate Roots It is possible to install most python packages to any location that you like and make them available by manually adding them to your `PYTHONPATH`. 
## Anaconda The author does not like Anaconda primarily because it conflicts with the system python, python inadvertently loaded from environment modules, and tends to favor pre-compiled binaries that can cause runtime errors when used on Crays. However, it can be used successfully on Rhea. If additional packages are needed by a user, an Anaconda virtual environment (clone) should be made in a directory where the user has write permissions. ## The Nuclear Option: Installing a core Python stack directly. For when a virtualenv just isn't enough, users can build Python (including dual Python2+Python3 deployments) directly from source in a directory of their choosing. All that needs to be done to use it is to add the relevant parts of the install to the `PATH`, `LD_LIBRARY_PATH`, and possibly `PYTHONPATH` variables. To keep these changes from conflicting with modifications made by environment modules, it is recommended to construct and use a modulefile to enable the custom stack under the module name `python`. See `build_raw_python.sh` for an overview of what is involved to deploy a custom python stack from source.
build_raw_python.sh 0 → 100755 +121 −0 Original line number Diff line number Diff line #!/bin/bash # Installs python2 and python3 side-by-side in a customized directory $TOPDIR. # Installation includes pip, virtualenv, and core packages for an-all-in-one # useful python stack. PY2_VER="2.7.12" PY3_VER="3.5.2" WHEEL_EXTRAS=(nose PyYAML jsonschema pep8 argcomplete psutil ) COMPILED_EXTRAS=(numpy cython matplotlib ipython pandas sympy ) declare -a GROUPS for g in $(groups | grep -m 1 -oE "\<[a-z]{3}[0-9]{3}\>"); do GROUPS+=("$g") done echo "Under which project do you want to install Python?" select grp in ${GROUPS[@]}; do break; done TOPDIR="/ccs/proj/$grp/opt/python" BUILD="$TOPDIR/build" USR="$TOPDIR/usr" PY2LOG="$TOPDIR/py2_build.log" PY3LOG="$TOPDIR/py3_build.log" notify () { if [ $1 -gt 0 ]; then printf "[FAILED]\n" else printf "[ OK ]\n" fi } echo "Removing existing installation" rm -fr $TOPDIR mkdir -p $BUILD $USR # FIXME: Installation group and permissions should be considered. It may be # prudent to set the group sticky bit. cd $BUILD echo "Obtaining source files" echo "===============" echo "Python2" wget https://www.python.org/ftp/python/${PY2_VER}/Python-${PY2_VER}.tgz echo "Python3" wget https://www.python.org/ftp/python/${PY3_VER}/Python-${PY3_VER}.tar.xz echo "Pip boostrap script" curl -O https://bootstrap.pypa.io/get-pip.py printf "===============\n\n" printf "%-41s" "Installing Python2" cd $BUILD tar xf Python-${PY2_VER}.tgz >> $PY2LOG 2>&1 cd $BUILD/Python-${PY2_VER} ./configure --prefix=$TOPDIR/usr --enable-shared >> $PY2LOG 2>&1 make >> $PY2LOG 2>&1 make install >> $PY2LOG 2>&1 notify $? printf "%-41s" "Installing Python3" cd $BUILD tar xf Python-${PY3_VER}.tar.xz >> $PY3LOG 2>&1 cd $BUILD/Python-${PY3_VER} ./configure --prefix=$TOPDIR/usr --enable-shared >> $PY3LOG 2>&1 make >> $PY3LOG 2>&1 make install >> $PY3LOG 2>&1 notify $? 
export LD_LIBRARY_PATH="$USR/lib:$LD_LIBRARY_PATH" export PATH="$USR/bin:$PATH" cd $BUILD printf "%-41s" "Installing pip3" python3 get-pip.py >> $PY3LOG 2>&1 # Install pip3 first, so that notify $? printf "%-41s" "Installing pip2" python get-pip.py >> $PY2LOG 2>&1 # pip2 overwrites default `pip` notify $? printf "%-41s" "Installing virtualenv for Python2" # Version Specific Packages ## Python3 provides pyvenv pip2 install -v virtualenv >> $PY2LOG 2>&1 notify $? printf "\nInstalling extra packages:\n" # Install packages for python3 first so that python2 packages writing binaries # with version-less names (like 'ipython' as opposed to 'ipython2') use python2 for pipx in $(which pip3) $(which pip2); do case "$(basename ${pipx})" in pip2) log="$PY2LOG"; name="python2" ;; pip3) log="$PY3LOG"; name="python3" ;; *) exit 1 esac for package in ${WHEEL_EXTRAS[@]}; do printf " %s: %-30s" $name $package $pipx install -v $package >> $log 2>&1 notify $? done for package in ${COMPILED_EXTRAS[@]}; do printf " %s: %-30s" $name $package $pipx install -v --no-use-wheel $package >> $log 2>&1 notify $? done done printf "\nFinished! See build logs\n python2: $PY2LOG\n python3: $PY3LOG\nfor details.\n\n"
build_virtualenv.sh 0 → 100755 +48 −0 Original line number Diff line number Diff line #!/bin/bash declare -a GROUPS for g in $(groups | grep -m 1 -oE "\<[a-z]{3}[0-9]{3}\>"); do GROUPS+=("$g") done echo "Under which project do you want to install Python?" select grp in ${GROUPS[@]}; do break; done # Setup all the optional paths. VENV_DIR="/ccs/proj/$grp/.venvs" VENV="$VENV_DIR/rhea-pyms" TMPDIR=/tmp/$USER/venvbuild # Set the environment. # !!!!!!!!!!!!!!!!!!!!!!!!!!! WARNING !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! # ENVIRONMENT MODULE CHANGES CANNOT BE MADE INSIDE AN ACTIVE VIRTUALENV module swap PE-intel PE-gnu module load netcdf python/2.7.9 python_virtualenv/12.0.7 # Make the necessary directories mkdir -p $VENV_DIR $TMPDIR # Make and activate a virtualenv for this python stack virtualenv $VENV source $VENV/bin/activate # Link against RHEL atlas - this is kind of gross. These are probably the least # optimized lapack/blas implementations I've ever seen... for i in $(ls /usr/lib64/atlas/*.so.3); do BASELIB=${i##*atlas/} ln -s $i $VIRTUAL_ENV/lib/${BASELIB%%.3} done # Install indexed dependancies outright. pip install --upgrade pip pip install -v --no-binary :all: numpy pip install -v --no-binary :all: scipy pip install -v jupyter matplotlib nose mock ipyparallel mpi4py # Install non-indexed packages manually. cd $TMPDIR wget -O pycdf-0.6-3b.tar.gz "http://downloads.sourceforge.net/project/pysclint/pycdf/pycdf-0.6.3b/pycdf-0.6-3b.tar.gz?r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fpysclint%2Ffiles%2Fpycdf%2Fpycdf-0.6.3b%2F&ts=1471285723&use_mirror=superb-sea2" tar xf pycdf-0.6-3b.tar.gz cd pycdf-0.6-3b python setup.py install
# jupyter-on-rhea.pbs

```bash
#!/bin/bash -l
#PBS -A FIXME
#PBS -q batch
#PBS -l walltime=48:00:00,nodes=1
#PBS -o jupyter.log
#PBS -j oe

# Setup all the optional paths.
# WORK defined in user's bashrc
VENV_DIR="$HOME/.venvs"
VENV="$VENV_DIR/rhea-pyms"

# Change the login and client ports to suitable values.
# Be aware your preferred login port may be in use by other users. A login port
# used by another project will cause dire confusion at runtime.
CLIENT_PORT=8080
LOGIN_PORT=XXXXX  # FIXME: Choose a *RANDOM* unused port number in the range 10k-64k.
SERVER_PORT=8082
COMMAND="${HOME}/.jupyter_connect"

# Setup the environment.
source $HOME/.venvs/venv-activator.sh
venvctl-rhea-app
cd $HOME

function finish {
    rm $COMMAND
}

if [ -f "$COMMAND" ]; then
    echo "A Jupyter server is already running."
    echo "See '$COMMAND' for details."
    exit 1
fi

cat << EOF > $COMMAND
#!/bin/bash
# To open a tunnel to the notebook server/kernels running on the compute node,
# issue the following command from your local machine:
#
#   ssh -f -L 127.0.0.1:$CLIENT_PORT:127.0.0.1:$LOGIN_PORT $USER@rhea.ccs.ornl.gov $COMMAND
#
# Then, on your local machine, navigate to "http://127.0.0.1:$CLIENT_PORT" in
# the browser of your choice. Use 'https' if the server is configured to use
# TLS/SSL encryption.
ssh -q -L 127.0.0.1:$LOGIN_PORT:127.0.0.1:$SERVER_PORT \
    $HOSTNAME.ccs.ornl.gov sleep $PBS_WALLTIME
EOF

trap finish EXIT
chmod a+x $COMMAND
jupyter-notebook --no-browser --port=$SERVER_PORT --log-level='DEBUG'
```
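The job script writes `$COMMAND` with an *unquoted* heredoc, so the port variables are expanded when the helper file is generated, not when it runs. A small demonstration of that behavior (the file name, ports, and host below are placeholders):

```shell
# Show that an unquoted heredoc expands variables at write time.
LOGIN_PORT=12345
SERVER_PORT=8082
TMPFILE=$(mktemp)
cat << EOF > "$TMPFILE"
ssh -q -L 127.0.0.1:$LOGIN_PORT:127.0.0.1:$SERVER_PORT example-host
EOF
cat "$TMPFILE"    # the chosen ports appear literally in the generated file
rm -f "$TMPFILE"
```

Quoting the delimiter (`<< 'EOF'`) would instead write the variable names verbatim, which is not what the job script wants here.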
# venv-activator.sh

```bash
#!/bin/bash
#
# This script provides shell function utilities for activating and
# deactivating python virtual environments that have dependencies on Tcl
# environment modules.
#
#------------------------------------------------------------------------------
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#------------------------------------------------------------------------------

function venvmodulectl () {
    # A single function to activate and deactivate a python virtualenv that has
    # prerequisite environment module dependencies. The function will apply a
    # sequence of environment module commands prior to activating a specific
    # venv. If the venv is already active, it will be deactivated and the
    # sequence of module commands will be reversed to undo the environment
    # changes.
    #
    # USAGE:
    #   venvmodulectl "/PATH/TO/VENV" "MODULE CMD" ["MODULE CMD"...]
    #
    # Where "/PATH/TO/VENV" points to the root of the virtual environment and
    # each "MODULE CMD" is a double-quoted string of instructions to `modulecmd`
    # of the forms:
    #   "swap MODULEA MODULEB" or
    #   "load MODULE1 MODULE2 ... MODULEN"
    #
    # The script will likely fail if the sequence of module commands conflicts
    # with the modules that are loaded when the function is first called. It is
    # intended that the sequence of module commands be chosen such that they
    # are applied from a clean login environment.
    declare _ENVNAME="$1"
    declare -a _COMMANDS
    for jj in "${2:+"${@:2}"}"; do
        _COMMANDS+=("$jj")
    done
    if [ -z "$_ENVNAME" -o -z "$_COMMANDS" ]; then
        echo "Could not interpret input."
        return
    fi

    declare _CMD
    if [ -z "$VIRTUAL_ENV" -a -z "$MYENV" ]; then
        # Load the modules.
        for _CMD in "${_COMMANDS[@]}"; do
            echo "module ${_CMD}"
            eval "module ${_CMD}"
        done
        # Activate the virtualenv.
        . "$_ENVNAME/bin/activate"
        # Keep track of what's been done.
        export MYENV="$_ENVNAME"
    elif [[ "$MYENV" == "$_ENVNAME" ]]; then
        # Deactivate the virtualenv.
        [ -n "$VIRTUAL_ENV" ] && deactivate
        # Unload the modules. The double loop is gross and inefficient
        # but more shell-agnostic and readable than other approaches.
        declare -a SDNAMMOC_
        for _CMD in "${_COMMANDS[@]}"; do
            # Turn 'load' into 'unload' and reverse the module order.
            _CMD="$(echo ${_CMD} | sed 's/^lo\(.*\)$/unlo\1/' | \
                awk '{ printf("%s ", $1); for (i=NF; i>2; i--) printf("%s ", $i); print $2 }')"
            SDNAMMOC_=("$_CMD" "${SDNAMMOC_[@]}")
        done
        for _CMD in "${SDNAMMOC_[@]}"; do
            echo "module $_CMD"
            eval "module $_CMD"
        done
        # Cleanup the environment.
        unset MYENV
    else
        echo "ERROR - Cannot alter $_ENVNAME environment:"
        [ -n "$VIRTUAL_ENV" ] && echo "    $VIRTUAL_ENV already active."
        [ -n "$MYENV" ] && echo "    Run script for $MYENV first"
    fi
}

# Usage Examples
# ==============
# Environment module requirements for a venv are unlikely to change often.
# Therefore it makes sense to use the above function in other functions or
# aliases crafted for specific venvs deployed on OLCF resources.
# These examples assume a venv is located at $HOME/.venvs/rhea-app that was
# built using the python/2.7.9 module on Rhea and has dependencies on the
# PE-gnu and netcdf environment modules.

# Within a shell function
function venvctl-rhea-app () {
    declare -a COMMANDS
    COMMANDS=(
        "swap PE-intel PE-gnu"
        "load netcdf python/2.7.9"
    )
    venvmodulectl "$HOME/.venvs/rhea-app" "${COMMANDS[@]}"
}

# Within an alias (single-quote the body so the inner double quotes survive)
alias venvctl-rhea-app-alias='venvmodulectl "$HOME/.venvs/rhea-app" "swap PE-intel PE-gnu" "load netcdf python/2.7.9"'
```
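The sed/awk pipeline inside `venvmodulectl` is dense. Wrapped in a throwaway helper (the function name here is just for illustration), its effect on a module command is easy to see: `load` becomes `unload` and the module arguments are reversed, so modules unload in the opposite order from how they were loaded.

```shell
# Isolate the command-inversion step used when tearing down the environment.
reverse_cmd () {
    echo "$1" | sed 's/^lo\(.*\)$/unlo\1/' | \
        awk '{ printf("%s ", $1); for (i=NF; i>2; i--) printf("%s ", $i); print $2 }'
}

reverse_cmd "load netcdf python/2.7.9"    # -> unload python/2.7.9 netcdf
reverse_cmd "swap PE-intel PE-gnu"        # -> swap PE-gnu PE-intel
```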