diff --git a/.gitignore b/.gitignore
index 26dcf87f8d574bc8f9efe78ee9e43f9c647ec0cf..31a3d855b01d91c45e4d05db543b28cde5621ee6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -27,3 +27,4 @@ toolbox
 dependencies
 dependency_resolvers_conf.xml
 job_metrics_conf.xml
+.DS_Store
diff --git a/docs/files/job_conf_sample_mq.xml b/docs/files/job_conf_sample_mq.xml
index 3978a32b0eb2f559c6851870dcfb03c92e9f8270..b3b85761d3fdf8250ac2356c592837b185aae0e0 100644
--- a/docs/files/job_conf_sample_mq.xml
+++ b/docs/files/job_conf_sample_mq.xml
@@ -18,8 +18,8 @@
             <param id="native_specification">-P littlenodes -R y -pe threads 4</param>
         </destination>
         <destination id="remote_cluster" runner="pulsar">
-            <!-- Tell Galaxy where files are being store on remote system, no
-                 web server it can simply ask for this information.
+            <!-- Tell Galaxy where files are being stored on remote system, so
+                 the web server can simply ask for this information.
             -->
             <param id="jobs_directory">/path/to/remote/pulsar/files/staging/</param>
             <!-- Remaining parameters same as previous example -->
diff --git a/docs/galaxy_conf.rst b/docs/galaxy_conf.rst
index 2f59164c80eb9567d9c669d5cfffa0bf1bfc6f28..19046bdc72e77ce0d06562ad540455bee422f881 100644
--- a/docs/galaxy_conf.rst
+++ b/docs/galaxy_conf.rst
@@ -9,8 +9,8 @@ Examples
 The most complete and updated documentation for configuring Galaxy job destinations is Galaxy's ``job_conf.xml.sample_advanced`` file (check it out on
-`Bitbucket
-<https://bitbucket.org/galaxy/galaxy-dist/src/tip/job_conf.xml.sample_advanced?at=default>`_).
+`GitHub
+<https://github.com/galaxyproject/galaxy/blob/dev/config/job_conf.xml.sample_advanced>`_).
 These examples just provide a different Pulsar-centric perspective on some of the documentation in that file.
@@ -39,34 +39,34 @@ files requiring the use of the Pulsar.
 This variant routes some larger assembly jobs to the remote cluster - namely the `trinity` and `abyss` tools.
 Be sure the underlying applications required by the ``trinity`` and ``abyss`` tools are on the Pulsar path or set ``tool_dependency_dir`` in ``app.yml`` and setup
-Galaxy env.sh-style packages definitions for these applications).
+Galaxy env.sh-style packages definitions for these applications.
 .. literalinclude:: files/job_conf_sample_remote_cluster.xml
    :language: xml
-For this configuration, on the Pulsar side be sure to set a
+For this configuration, on the Pulsar side be sure to also set a
 ``DRMAA_LIBRARY_PATH`` in ``local_env.sh``, install the Python ``drmaa``
-module, and configure a DRMAA job manager (example ``job_managers.ini``
-follows).
+module, and configure a DRMAA job manager for Pulsar in ``job_managers.ini`` as
+follows:
 .. literalinclude:: files/job_managers_sample_remote_cluster.ini
 Targeting a Linux Cluster (Pulsar over Message Queue)
 `````````````````````````````````````````````````````
-For Pulsar instances sitting behind a firewall a web server may be impossible. If
+For Pulsar instances sitting behind a firewall, a web server may be impossible. If
 the same Pulsar configuration discussed above is additionally configured with a ``message_queue_url`` of ``amqp://rabbituser:rabb8pa8sw0d@mqserver:5672//`` in
-``app.yml`` the following Galaxy configuration will cause this message
+``app.yml``, the following Galaxy configuration will cause this message
 queue to be used for communication. This is also likely better for large file transfers since typically your production Galaxy server will be sitting behind
-a high-performance proxy but not the Pulsar.
+a high-performance proxy while Pulsar will not.
 .. literalinclude:: files/job_conf_sample_mq.xml
    :language: xml
 For those interested in this deployment option and new to Message Queues, there
-is more documentation in :ref:`gx-pulsar-mq-setup`
+is more documentation in :ref:`gx-pulsar-mq-setup`.
 Additionally, Pulsar now ships with an RSync and SCP transfer action rather than making use of the HTTP transport method.
@@ -89,35 +89,37 @@ document how here).
 Etc...
 ``````
-There are many more options for configuring what paths get staging/unstaged
-how, how Galaxy metadata is generated, running jobs as the real user, defining
+There are many more options for configuring what paths get staged/unstaged,
+how Galaxy metadata is generated, running jobs as the real user, defining
 multiple job managers on the Pulsar side, etc.... If you ever have any questions
-please don't hesistate to ask John Chilton (jmchilton@gmail.com).
+please don't hesitate to ask John Chilton (jmchilton@gmail.com).
-File Actions
+Data Staging
 ------------
 Most of the parameters settable in Galaxy's job configuration file
-``job_conf.xml`` are straight forward - but specifing how Galaxy and the Pulsar
-stage various files may benefit from more explaination.
+``job_conf.xml`` are straight forward - but specifying how Galaxy and the Pulsar
+stage various files may benefit from more explanation.
-As demonstrated in the above ``default_file_action`` describes how inputs,
-outputs, etc... are staged. The default ``transfer`` has Galaxy initiate HTTP
-transfers. This makes little sense in the context of message queues so this
-should be overridden and set to ``remote_transfer`` which causes the Pulsar to
-initiate the file transfers. Additional options are available including
-``none``, ``copy``, and ``remote_copy``.
+``default_file_action`` defined in Galaxy's `job_conf.xml` describes how
+inputs, outputs, indexed reference data, etc... are staged. The default
+``transfer`` has Galaxy initiate HTTP transfers. This makes little sense in the
+context of message queues so this should be set to ``remote_transfer``, which
+causes Pulsar to initiate the file transfers. Additional options are available
+including ``none``, ``copy``, and ``remote_copy``.
 In addition to this default - paths may be overridden based on various
-patterns to allow optimization of file transfers in real production
+patterns to allow optimization of file transfers in production
 infrastructures where various systems mount different file stores and file stores with different paths on different systems.
-To do this, the Pulsar destination in ``job_conf.xml`` may specify a parameter
-named ``file_action_config``. This needs to be some config file path (if
-relative, relative to Galaxy's root) like ``pulsar_actions.yaml`` (can be YAML or JSON - but older Galaxy's only supported JSON).
+To do this, the defined Pulsar destination in Galaxy's ``job_conf.xml`` may
+specify a parameter named ``file_action_config``. This needs to be a config
+file path (if relative, relative to Galaxy's root) like
+``config/pulsar_actions.yaml`` (can be YAML or JSON - but older Galaxy's only
+supported JSON).
 The following captures available options:
 .. literalinclude:: files/file_actions_sample_1.yaml
    :language: yaml
diff --git a/docs/upgrading.rst b/docs/upgrading.rst
index 6b775fcd0807d9a5bcaea17c814f35754e341262..ee7c8a465b4446c1ca6d1ed61c362fd69a8823f5 100644
--- a/docs/upgrading.rst
+++ b/docs/upgrading.rst
@@ -7,7 +7,7 @@ Pulsar was born out of the poorly named
 `LWR <https://usegalaxyp.org/>`_ project. This section outlines broadly how to upgrade from an LWR server to a Pulsar one.
-The tenative plan is to allow Galaxy to support both targets for
+The tentative plan is to allow Galaxy to support both targets for
 sometime - but at some point LWR servers should be upgraded to the Pulsar servers.
diff --git a/pulsar/client/__init__.py b/pulsar/client/__init__.py
index a42c60173cef27dd2005ffaa0ff5bd10f2cb401d..951cb3c7c3a2a0ca82e13ee74cd941c3e0151895 100644
--- a/pulsar/client/__init__.py
+++ b/pulsar/client/__init__.py
@@ -10,7 +10,7 @@ Configuring Galaxy
 Galaxy job runners are configured in Galaxy's ``job_conf.xml`` file.
 See ``job_conf.xml.sample_advanced`` in your Galaxy code base or on
-`Bitbucket <https://bitbucket.org/galaxy/galaxy-dist/src/tip/config/job_conf.xml.sample_advanced?at=default>`_
+`Github <https://github.com/galaxyproject/galaxy/blob/dev/config/job_conf.xml.sample_advanced>`_
 for information on how to configure Galaxy to interact with the Pulsar. Galaxy also supports an older, less rich configuration of job runners directly
diff --git a/pulsar/client/client.py b/pulsar/client/client.py
index d6b6d68066d27c5259137976b13a69b44f2e5d00..ae539906f6f627692773c28481eefca08ef0efa9 100644
--- a/pulsar/client/client.py
+++ b/pulsar/client/client.py
@@ -184,7 +184,7 @@ class JobClient(BaseJobClient):
         if action_type in ['transfer', 'message']:
             if isinstance(contents, string_types):
                 contents = contents.encode("utf-8")
-            message = "Uplodaing path [%s] (action_type: [%s])"
+            message = "Uploading path [%s] (action_type: [%s])"
             log.debug(message, path, action_type)
             return self._upload_file(args, contents, input_path)
         elif action_type == 'copy':