Create Pulsar "pull" and "push" hooks that transfer data to and from a remote data management API service
As a User, I want remote data management enabled on the Pulsar instance to pull data from and push data to the remote data management API, so that we can handle "big data" created for jobs and move it to other resources that a workflow will utilize.
Acceptance Criteria:
- Have tool input be a JSON file that has a "pointer" to the data location and/or metadata about the dataset
- The tool will run "remote" via Pulsar
- Pulsar will use this input JSON file to pull the data from a remote data management API service and put it in a location accessible to the tool
  - Prepare the `CliTokenQueueManager` manager to also handle remote data functions; refactor the token-specific code in the `launch()` method into separate methods
  - Rename `CliTokenQueueManager` to a more general manager term
  - Similar to how we get the output data, determine how to access the input data from Galaxy in the manager
  - At execution, make REST API calls to the remote data management API to download the remote input data
- The tool will perform some operation on the data (e.g. append a timestamp to the data)
- The tool will save the output file, a JSON file that has a "pointer" to the new data location (not yet pushed)
- Pulsar will push the modified data back to the remote data management service via the remote data API
  - Similar to how we get the output data, get the local file path for the output file
  - As the manager checks the status of the job, when the job reaches an "OK" / "completed" state, execute the REST calls to the remote data management service to upload the output file
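
A minimal sketch of how the pull and push hooks above could fit together, to make the criteria concrete. Everything named here is an assumption for illustration and not part of Pulsar or the remote data API: the `RemoteDataHooks` class, the `API_BASE` URL, the `/datasets/<id>` endpoint layout, and the `dataset_id` / `local_path` JSON fields are hypothetical. The status strings are taken from the wording of this issue and would need to be checked against Pulsar's real status constants.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical base URL of the remote data management API (assumption).
API_BASE = "https://remote-data.example.org/api"


def http_get(url):
    """Default transport: fetch raw bytes with the standard library."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def http_put(url, payload):
    """Default transport: upload raw bytes with an HTTP PUT."""
    req = urllib.request.Request(url, data=payload, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


class RemoteDataHooks:
    """Pull/push helpers a Pulsar manager could call (illustrative only)."""

    def __init__(self, api_base=API_BASE, get=http_get, put=http_put):
        self.api_base = api_base.rstrip("/")
        self.get = get  # injectable so the hooks can be tested offline
        self.put = put

    def pull(self, pointer_file, dest_dir):
        """Read the input pointer JSON and download the dataset it names."""
        pointer = json.loads(Path(pointer_file).read_text())
        dataset_id = pointer["dataset_id"]  # assumed field name
        data = self.get(f"{self.api_base}/datasets/{dataset_id}")
        dest = Path(dest_dir) / dataset_id
        dest.write_bytes(data)
        return dest

    def push(self, pointer_file):
        """Read the output pointer JSON and upload the local file it names."""
        pointer = json.loads(Path(pointer_file).read_text())
        payload = Path(pointer["local_path"]).read_bytes()  # assumed field name
        url = f"{self.api_base}/datasets/{pointer['dataset_id']}"
        return self.put(url, payload)

    def on_status(self, status, output_pointer_file):
        """Push only once the job reports a finished state."""
        if status in ("OK", "completed"):  # strings as worded in this issue
            return self.push(output_pointer_file)
        return None
```

In this sketch the manager would call `pull()` from its launch path before the tool starts, and call `on_status()` each time it polls the job, so the upload happens once the job reports a finished state. A real manager would also need to guard against pushing twice if it keeps polling after completion.
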
Description, Additional Detail, Context:
The remote data manager REST API work is tracked in this issue: https://code.ornl.gov/ndip/remote-data-managers/-/issues/1
Will use the https://code.ornl.gov/ndip/galaxy-pulsar-docker-compose project to create the testbed environment for this story.
Will use our Rucio instance as the first remote data management backend.
Need to use the Rucio client (our docker file is found here) in the remote data management service to perform the transfers.
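
As a rough illustration of the service side, the transfers could be wrapped around the Rucio Python client roughly as follows. This is a sketch under assumptions: the wrapper names `download_with_rucio` and `upload_with_rucio` are hypothetical, and the item keys (`did`, `base_dir`, `path`, `rse`, `did_scope`) reflect our understanding of `DownloadClient.download_dids` and `UploadClient.upload`, so they should be verified against the Rucio version installed in the service image.

```python
def download_with_rucio(did, base_dir, client=None):
    """Download one DID (``scope:name``) into base_dir via the Rucio client."""
    if client is None:
        # Imported lazily so this module loads even without Rucio installed.
        from rucio.client.downloadclient import DownloadClient
        client = DownloadClient()
    # Item keys are assumptions to verify against the installed Rucio release.
    return client.download_dids([{"did": did, "base_dir": base_dir}])


def upload_with_rucio(path, rse, scope, client=None):
    """Upload a local file to the given RSE under the given scope."""
    if client is None:
        from rucio.client.uploadclient import UploadClient
        client = UploadClient()
    # Item keys are assumptions to verify against the installed Rucio release.
    return client.upload([{"path": path, "rse": rse, "did_scope": scope}])
```

Accepting the client as a parameter keeps the REST handlers testable without a live Rucio server; the default path builds a real client from the usual Rucio configuration.
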