Create Pulsar "pull" and "push" hooks that transfer data to and from a remote data management API service

As a User, I want remote data management enabled on the Pulsar instance so that it can pull data from and push data to the remote data management API, letting us handle "big data" created for jobs and move it to other resources that a workflow will use.

Acceptance Criteria:

  • Have tool input be a JSON file that has a "pointer" to data location and/or metadata about the dataset
  • The tool will run "remote" via Pulsar
  • Pulsar will use this input JSON file to pull the data from a remote data management API service and put it in a location accessible for the tool
    • Prepare CliTokenQueueManager to also handle remote data functions; refactor the token-specific code in the launch() method into separate methods
    • Rename CliTokenQueueManager to a more general manager term
    • Similar to how we get the output data, determine how to access the input data from Galaxy in the Manager
    • At execution, make REST API calls to the remote data management API to download the remote input data
  • The tool will perform some operation on the data (e.g., append a timestamp to the data)
  • The tool will save its output as a JSON file that has a "pointer" to the new data location (not yet pushed)
  • Pulsar will push the modified data back to the remote data management service via the remote data management API
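
The flow in the criteria above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: the pointer fields ("name", "location", "pushed"), the function names pull, run_tool, and push, and the "remote://" locations are all hypothetical, and the REST calls to the remote data management API are replaced by injected download/upload callables backed by an in-memory dict.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def pull(pointer_file: Path, workdir: Path, download) -> Path:
    """Pull hook: read the pointer JSON and fetch the data into workdir."""
    pointer = json.loads(pointer_file.read_text())
    local = workdir / pointer["name"]
    # `download` stands in for the REST call to the remote data API (assumption).
    local.write_bytes(download(pointer["location"]))
    return local


def run_tool(local_input: Path, workdir: Path) -> Path:
    """Example tool: append a UTC timestamp, then write an output pointer JSON."""
    stamped = workdir / (local_input.name + ".stamped")
    stamp = datetime.now(timezone.utc).isoformat()
    stamped.write_text(local_input.read_text() + "\n" + stamp + "\n")
    out_pointer = workdir / "output.json"
    # The data is not pushed yet, so the pointer still refers to the local file.
    out_pointer.write_text(json.dumps(
        {"name": stamped.name, "location": str(stamped), "pushed": False}))
    return out_pointer


def push(pointer_file: Path, upload) -> dict:
    """Push hook: upload the data named by the pointer and update the pointer."""
    pointer = json.loads(pointer_file.read_text())
    data = Path(pointer["location"]).read_bytes()
    # `upload` stands in for the REST call to the remote data API (assumption).
    pointer["location"] = upload(pointer["name"], data)
    pointer["pushed"] = True
    pointer_file.write_text(json.dumps(pointer))
    return pointer


def demo() -> dict:
    """Run pull -> tool -> push against an in-memory stand-in for the service."""
    remote = {"remote://demo/input.txt": b"hello"}

    def download(location):
        return remote[location]

    def upload(name, data):
        location = "remote://demo/" + name
        remote[location] = data
        return location

    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        in_pointer = workdir / "input.json"
        in_pointer.write_text(json.dumps(
            {"name": "input.txt", "location": "remote://demo/input.txt"}))
        local = pull(in_pointer, workdir, download)
        out_pointer = run_tool(local, workdir)
        result = push(out_pointer, upload)
        result["content"] = remote[result["location"]].decode()
        return result
```

Passing the transfer functions in as arguments keeps the hooks independent of the backend, so a Rucio-backed client could presumably be swapped in later without changing the tool.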

Description, Additional Detail, Context:

The remote data manager REST API work is tracked in this Issue: https://code.ornl.gov/ndip/remote-data-managers/-/issues/1

Will use the https://code.ornl.gov/ndip/galaxy-pulsar-docker-compose project to create the testbed environment for this story.

Will use our Rucio instance as the first remote data management backend.

Need to use the Rucio client (our docker file is found here) in the remote data management service to perform the transfers.

Edited by McDonnell, Marshall