Adds troubleshooting and better security. (8fb0321e) · Belhorn, Matt / nccs_python_reference

jupyter-on-rhea.pbs

+13 −10

		@@ -5,10 +5,6 @@
		#PBS -o jupyter.log
		#PBS -j oe

		-# Setup all the optional paths.
		-# WORK defined in user's bashrc
		-VENV_DIR="$HOME/.venvs"
		-VENV="$VENV_DIR/rhea-pyms"

		# Change the login and client ports to suitable values.
		# Be aware your preferred login port may be in use by other users. A login port
		@@ -18,9 +14,14 @@ LOGIN_PORT=XXXXX # FIXME: Choose a RANDOM unused port number in the range 10k-
		SERVER_PORT=8082
		COMMAND="${HOME}/.jupyter_connect"

		-# Setup the environment.
		-source $HOME/.venvs/venv-activator.sh
		-venvctl-rhea-app
		+# Setup the environment. This block assumes the use of a Python virtualenv under
		+# which Jupyter and all the Python packages needed for your work are installed.
		+# It also assumes there is a script `$VENV/bin/setup_environment_modules` that
		+# has all the environment module commands needed by packages installed to the
		+# virtualenv. Change this block as needed.
		+VENV="$HOME/.venvs/jupyter-on-rhea"
		+source $VENV/bin/setup_environment_modules
		+source $VENV/bin/activate

		cd $HOME

		@@ -41,9 +42,9 @@ cat << EOF > $COMMAND
		#
		# ssh -f -L 127.0.0.1:$CLIENT_PORT:127.0.0.1:$LOGIN_PORT $USER@rhea.ccs.ornl.gov $COMMAND
		#
		-# Then, on your local machine, navigate to "http://127.0.0.1:$CLIENT_PORT" in
		-# the browser of your choice. Use 'https' if the server is configured to use
		-# TLS/SSL encryption.
		+# Then, on your local machine, navigate to "https://127.0.0.1:$CLIENT_PORT" in
		+# the browser of your choice. It is ill-advised to leave your server unencrypted,
		+# but replace 'https' with 'http' in the URL if you have not set up TLS/SSL encryption.

		ssh -q -L 127.0.0.1:$LOGIN_PORT:127.0.0.1:$SERVER_PORT \
		$HOSTNAME.ccs.ornl.gov sleep $PBS_WALLTIME
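The two ssh hops in this script chain three ports: on the laptop, `CLIENT_PORT` forwards to `LOGIN_PORT` on the login node, and the generated `.jupyter_connect` script then forwards `LOGIN_PORT` to the notebook's `SERVER_PORT` on the compute node. The Python sketch below only assembles the two argument lists (it does not run ssh) so the chain is explicit. The function names, the user `jdoe`, the compute host `rhea42`, the login port `12345`, and the `/ccs/home/<user>` location of the connect script are illustrative assumptions.

```python
import shlex
from typing import List


def forward_spec(listen_port: int, target_port: int) -> str:
    """ssh -L spec that binds the loopback interface on both ends of a hop."""
    return f"127.0.0.1:{listen_port}:127.0.0.1:{target_port}"


def tunnel_commands(client_port: int, login_port: int, server_port: int,
                    user: str, compute_host: str, walltime_s: int) -> List[List[str]]:
    """Build the laptop -> login node -> compute node ssh invocations."""
    # Hop 1 (run on the laptop): CLIENT_PORT -> LOGIN_PORT on the login node,
    # then execute the connect script that the batch job generated there.
    laptop_hop = ["ssh", "-f", "-L", forward_spec(client_port, login_port),
                  f"{user}@rhea.ccs.ornl.gov", f"/ccs/home/{user}/.jupyter_connect"]
    # Hop 2 (what the connect script runs on the login node): LOGIN_PORT -> SERVER_PORT
    # on the compute node, held open for the job's walltime.
    login_hop = ["ssh", "-q", "-L", forward_spec(login_port, server_port),
                 f"{compute_host}.ccs.ornl.gov", "sleep", str(walltime_s)]
    return [laptop_hop, login_hop]


if __name__ == "__main__":
    for cmd in tunnel_commands(8080, 12345, 8082, "jdoe", "rhea42", 48 * 3600):
        print(shlex.join(cmd))
```

Each hop binds only `127.0.0.1`, as in the original commands. That keeps the forwarded ports off external interfaces, but other users logged in to the same login node can still reach them, which is why the password and TLS steps matter.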
		@@ -52,4 +53,6 @@ EOF
		trap finish EXIT
		chmod a+x $COMMAND

		+# Change the log-level to a preferred value. DEBUG is useful for troubleshooting
		+# new server deployments.
		jupyter-notebook --no-browser --port=$SERVER_PORT --log-level='DEBUG'
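The `LOGIN_PORT` FIXME in this script asks for a random, unused port. One way to pick one is to try binding random candidates on the login node's loopback interface, since that is where the `.jupyter_connect` tunnel listens. This is a Python sketch under that assumption: run it on the login node you will tunnel through. The function name is hypothetical, and because the upper bound of the range is cut off in the script, the `20000` below is only a placeholder. A port that is free now can still be taken by another user before your job starts, so this reduces collisions rather than preventing them.

```python
import random
import socket


def pick_free_port(low: int, high: int, attempts: int = 50) -> int:
    """Return a random port in [low, high] that can currently be bound on 127.0.0.1."""
    for _ in range(attempts):
        port = random.randint(low, high)  # inclusive on both ends
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # already in use (e.g. another user's tunnel); try another
        return port  # socket is closed again, leaving the port free for ssh -L
    raise RuntimeError(f"no free port found in [{low}, {high}] after {attempts} tries")


if __name__ == "__main__":
    # 10000 comes from the script's "10k" lower bound; 20000 is a placeholder.
    print(pick_free_port(10000, 20000))
```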

jupyter_on_rhea.md

+107 −63

		@@ -2,38 +2,56 @@ Setting up Jupyter on Rhea
		==========================

		The following procedure can be used to run Jupyter server instances on Rhea
		-while allowing connections to them from a local (i.e. a laptop) browser. This
		-is a band-aid procedure in leiu of dedicated infrastructure for spinning up
		+while allowing connections to them from a local (e.g., a laptop) browser.
		+
		+This is a band-aid procedure in lieu of dedicated infrastructure for spinning up
		Jupyter instances at the OLCF. Ideally we would offer a host running a
		JupyterHub service where users could spin up secured, private Jupyter servers
		that offload work transparently to dynamically started ipycluster backend jobs
		through the batch system.

		-If you like this idea, please mention it in the OLCF User Survey - with enough
		-voices and support behind it we could push to aquire the necessary
		-infrastructure.
		+If you like the above idea, please mention it in the OLCF User Survey. With enough
		+voices of support, we could push to acquire the necessary infrastructure.

		Meanwhile
		---------

		-The jupyter compute kernels should be run on reserved batch nodes (ie, not
		-shared login nodes where they can be killed without warning) and the web
		-browser used to access the notebook interface is best run on your local machine
		-as you are already used to.
		+The Jupyter compute kernels should be run on reserved batch nodes, not on the
		+shared login nodes, where compute- or resource-intensive processes can be
		+killed without warning. The web browser used to access the notebook interface
		+is best run on your local machine, so the connection to the notebook server
		+must be tunneled out of the compute nodes.

		## Installation

		-The `jupyter-on-rhea.pbs` batch script launches a jupyter notebook server on a
		-single batch node and sets up a script to create the necessary SSH tunnel to
		-access it. In order to work, you will need to have jupyter installed somewhere
		-in your PYTHONPATH. This can either be in `/ccs/proj/...` or (for Rhea) simply
		-in your home directory using pip:
		+The [jupyter-on-rhea.pbs](jupyter-on-rhea.pbs) batch script in this repo
		+launches a Jupyter server on a single batch node and sets up a script to create
		+the necessary SSH tunnel to access it. For this to work, you will need to have
		+Jupyter installed somewhere in your PYTHONPATH. This can either be in
		+`/ccs/proj/...` or, on Rhea, simply in your user site-packages directory. It is
		+recommended that you use a virtualenv or alternate Python install to manage your
		+Python environment for this app.

		```bash
		-$ module load python/2.7.9 python_setuptools python_pip
		-$ pip install --user jupyter
		+$ module load python/2.7.9 python_virtualenv
		+$ virtualenv $MYVENVPATH # install wherever you prefer
		+$ . $MYVENVPATH/bin/activate
		+(myvenv)$ pip install --trusted-host pypi.python.org -U pip
		+(myvenv)$ pip install jupyter
		```

		-You should then create a skeleton configuration file and set an access password
		-(see the caveats below for an explanation):
		+The rest of these instructions assume that the installed Jupyter package is in
		+your PATH, either through an activated virtualenv or another scheme.
		+
		+## Secure the Server
		+
		+By default, the connection to the server is neither encrypted nor password
		+protected. As anyone can SSH to any node on Rhea, it is possible for other users
		+to connect to your notebook server and then **generate and run code as your
		+user if left unsecured!**
		+
		+To harden the server, create a skeleton configuration file and **set an access
		+password**:

		```bash
		$ jupyter notebook --generate-config
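For reference, the hashed value that ends up in `c.NotebookApp.password` has the layout `algorithm:salt:hexdigest`. The Python sketch below reproduces the salted SHA-1 variant of that layout, assuming it matches what older classic-notebook releases produced (newer releases may default to a different algorithm). It can help when checking whether a passphrase matches the hash in your config, for example when a login is rejected. It illustrates the format only and is not a replacement for generating the real hash with Jupyter's own tooling; both function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase: str, algorithm: str = "sha1", salt_hex_chars: int = 12) -> str:
    """Return 'algorithm:salt:hexdigest', hashing passphrase bytes followed by salt bytes."""
    salt = secrets.token_hex(salt_hex_chars // 2)  # token_hex(n) yields 2*n hex characters
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_hash(hashed: str, passphrase: str) -> bool:
    """True if passphrase matches an 'algorithm:salt:hexdigest' string."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
        h = hashlib.new(algorithm)  # ValueError for an unknown algorithm name
        h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    except ValueError:
        return False  # malformed string, unsupported algorithm, or non-ASCII salt
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(h.hexdigest(), digest)
```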
		@@ -46,77 +64,71 @@ c.NotebookApp.password = 'sha1:123:some:456:secret:789:password:012:hash:3456789
		add the output hashed password line to the profile config file (typically
		`$HOME/.jupyter/jupyter_notebook_config.py`)

		-Starting the server is done by launching the PBS script with appropriate PBS
		-options:
		+Now that access is password protected, you must **encrypt all communication
		+traffic**; otherwise, someone can simply intercept the unencrypted stream and
		+read your data and passwords. To enable encryption, generate a self-signed x509
		+server certificate/key pair. The DN data used is mostly arbitrary, but you must
		+use `127.0.0.1` for the hostname/server name, as modern browsers will reject the
		+certificate if it does not match the URL that will be used to access the
		+notebook server:

		```bash
		-$ qsub jupyter-on-rhea.pbs
		+$ openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout ~/.jupyter/mykey.key -out ~/.jupyter/mycert.pem
		+$ chmod go= ~/.jupyter/mykey.key
		```

		-The job will place an executable at `$HOME/.jupyter_connect` which contains
		-instructions on how to attach to the server. Typically you would issue a
		-variation of
		+and set the options in `$HOME/.jupyter/jupyter_notebook_config.py`:

		-```bash
		-ssh -f -L 127.0.0.1:8080:127.0.0.1:8081 $USER@rhea.ccs.ornl.gov /ccs/home/$USER/.jupyter_connect
		+```python
		+import os
		+# Python does not expand $USER inside string literals, so build the paths from ~.
		+c.NotebookApp.certfile = os.path.expanduser('~/.jupyter/mycert.pem')
		+c.NotebookApp.keyfile = os.path.expanduser('~/.jupyter/mykey.key')
		```

		-on your local workstation and direct your local browser to `http://127.0.0.1:8080`.

		+When using TLS encryption, you must explicitly use `https` instead of `http` in
		+the URL used to access the server.

		## Caveats

		-### Security
		+### Running the Server

		-The connection to the server is not encrypted nor password protected by
		-default.
		-
		-As anyone can SSH to any node on Rhea, it is possible for other users
		-to connect to your notebook server and then generate and run code as your user
		-if left unsecured. It is a good idea to setup TLS/SSL encryption if you are
		-concerned about the possibility of someone sniffing data packets sent to the
		-notebook server. To enable encryption, generate a self-signed x509 server
		-certificate/key pair. The DN data used is mostly arbitrary, but you must use
		-`127.0.0.1` for the hostname/server name as modern browsers will reject the
		-certificate if it does not match the URL in the browser:
		+Starting the server is done by launching the PBS script with appropriate PBS
		+options:

		```bash
		-$ openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout ~/.jupyter/mykey.key -out ~/.jupyter/mycert.pem
		-$ chmod go= ~/.jupyter/mykey.key
		+$ qsub jupyter-on-rhea.pbs
		```

		-and set the options in `$HOME/.jupyter/jupyter_notebook_config.py`:
		+The job will place an executable at `$HOME/.jupyter_connect` which contains
		+instructions on how to attach to the server. Typically, you must issue a
		+variation of

		-```python
		-c.NotebookApp.certfile = '/ccs/home/$USER/.jupyter/mycert.pem'
		-c.NotebookApp.keyfile = '/ccs/home/$USER/.jupyter/mykey.pem'
		+```bash
		+ssh -f -L 127.0.0.1:8080:127.0.0.1:8081 $USER@rhea.ccs.ornl.gov /ccs/home/$USER/.jupyter_connect
		```

		-If you add TLS encryption (you should), you must connect using `https://127.0.0.1:8080`
		-instead of `http`.
		+on your local workstation and direct your local browser to `https://127.0.0.1:8080`.

		-### MPI Capabilities and Ipyparallel
		+## MPI Capabilities and Ipyparallel

		-The kernels all run on a single node. It is possible to extend this
		-setup to use the ipython cluster 'ipcluster' backend and the $PBS_NODEFILE to
		-allow the kernel to run parallel tasks. See the notebook
		-`interactive_notebooks_with_mpi_on_rhea.ipynb` in this repo for instructions on
		-setting up an ipycluster backend.
		+Set up as per the above instructions, the kernels all run on a single node. It is
		+possible to extend this setup to use the IPython cluster (`ipcluster`) backend and
		+the `$PBS_NODEFILE` to allow the kernel to run parallel tasks. See the notebook
		+[interactive_notebooks_with_mpi_on_rhea.ipynb](interactive_notebooks_with_mpi_on_rhea.ipynb)
		+in this repo for instructions on setting up an `ipcluster` backend.

		-### Server Uptime
		+## Server Uptime

		-The server is killed at least every 48 hours so you will want to make sure
		-your work is saved often. You can add a line like:
		+The server is killed at least every 48 hours, so you will want to **make sure
		+your work is saved often**. You can add a line like:

		```bash
		qsub -W depend=afternotok:$PBS_JOBID jupyter-on-rhea.pbs
		```

		-near the top of `ok_jupyter.pbs` to resubmit the job automatically to keep a
		-server up, but you will still need to re-establish the tunnel each time it goes
		-down.
		+near the top of the `jupyter-on-rhea.pbs` batch script to resubmit the job
		+automatically to keep a server up, but you will still need to re-establish the
		+tunnel each time it goes down.

		-### Allocation Consumption
		+## Allocation Consumption

		This does consume your Rhea allocation, so just keeping the server up and not
		using it to crunch numbers is wasteful. It is perhaps the best practice to
		@@ -124,10 +136,42 @@ do interactive development work on a local jupyter instance and then run a
		dedicated python script in a batch job to make the most efficient use of your
		allocation.

		-### Custom Re-configuration
		+## Custom Re-configuration

		Any of the configuration details should be tuned to your needs.
		Specifically, the ports may need to be different for your case. You may want to
		change `c.NotebookApp.notebook_dir` to use a different path than the default
		so as to keep your top-level `$HOME` directory tidy.

		+## Troubleshooting
		+
		+Issues with this approach that could be fixed but have not yet been addressed
		+include:
		+
		+1. Cannot connect through local `CLIENT_PORT`.
		+   * Check that a server job is actually running with `showq -u $USER`.
		+   * Check that you are using the `https` prefix in your URL if using TLS encryption.
		+   * Check that you only have one tunnel open on the local machine. The tunnels
		+     are supposed to close when the connection is broken, but if they hang open,
		+     new tunnels will be assigned different ports than `CLIENT_PORT`. There is a
		+     fix for this, but I have not documented it...
		+
		+1. Login password is rejected even though it is entered correctly.
		+   * This is a symptom of multiple users attempting to access different servers
		+     through the same login node and port number. The connection will succeed,
		+     but the password will be rejected because the server you are accessing is
		+     someone else's. Even though the login screen will look the same, you can
		+     use the TLS certificate information in your browser to verify whether you
		+     are interacting with your server (i.e., it is using your certificate) or
		+     someone else's. If this happens, use a different random `LOGIN_PORT` in the
		+     batch script.
		+
		+1. Cannot start a new server job even though no server job is running.
		+   * New servers won't start while a `$HOME/.jupyter_connect` script exists.
		+     This script is used as a lockfile, but is sometimes not removed when the
		+     last server job exits. Verify that no server job is running using
		+     `showq -u $USER`, and if none is running, simply delete
		+     `$HOME/.jupyter_connect` to allow a new server job to start. This is a
		+     symptom of the crude method of creating and clearing lockfiles used by
		+     this technique and should be improved.
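Two of the troubleshooting checks above can be scripted. In the Python sketch below, the helper names are hypothetical and not part of this repo. `port_in_use` tests whether something, such as a leftover tunnel, is still listening on your local `CLIENT_PORT`. `clear_stale_lock` deletes `$HOME/.jupyter_connect` only when you have already confirmed with `showq -u $USER` that no server job is running. It does not query the batch system itself; that answer is passed in as a flag.

```python
import socket
from pathlib import Path


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if a process (e.g. a hung ssh tunnel) is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def clear_stale_lock(lock_path: Path, server_job_running: bool) -> bool:
    """Delete the lockfile only if no server job is running; return True if deleted."""
    if server_job_running or not lock_path.exists():
        return False  # never remove the lock of a live server job
    lock_path.unlink()
    return True


if __name__ == "__main__":
    print("CLIENT_PORT 8080 busy:", port_in_use(8080))
    # Set this from what `showq -u $USER` reports; it is not detected automatically.
    job_running = True
    removed = clear_stale_lock(Path.home() / ".jupyter_connect", job_running)
    print("removed stale lockfile:", removed)
```

Passing the job state in explicitly keeps the destructive step behind a deliberate human check, which matches the manual procedure described in the third troubleshooting item.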