John Chilton authored
Currently the only container-based scheduling approach (where we schedule containers directly rather than scheduling jobs that launch containers) is the Kubernetes message queue (MQ) based co-execution approach.

On one hand, we have some intriguing applications, TES and AWS Batch, where we would like to schedule containers directly; on the other, the MQ approach with Kubernetes can easily be generalized to not require the MQ (falling back to polling the way the Kubernetes job runner in Galaxy does).

The goal of this work is to generalize the MQ Kubernetes approach into six approaches:

- MQ + Kubernetes (the current recommendation)
- Kubernetes w/polling API (a simpler Kubernetes approach that retains all the advantages of the Pulsar approach over the Kubernetes runner in Galaxy without requiring a MQ).
- MQ + TES.
- TES w/polling.
- MQ + AWS Batch.
- AWS Batch w/polling.
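
As a rough illustration of what the "w/polling" variants imply, here is a minimal sketch of a polling loop that replaces MQ status updates. It is not the Pulsar implementation: `poll_until_terminal`, the `get_state` callable, and the state names are all hypothetical, and a real backend (Kubernetes, TES, AWS Batch) would supply its own status call and state vocabulary.

```python
import time
from typing import Callable

# Hypothetical state names; each real backend uses its own vocabulary.
TERMINAL_STATES = frozenset({"complete", "failed", "cancelled"})


def poll_until_terminal(
    get_state: Callable[[], str],
    interval: float = 5.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() repeatedly until it reports a terminal state.

    get_state stands in for whatever status query the scheduler exposes;
    it is an assumption of this sketch, not an existing Pulsar function.
    """
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # Without an MQ pushing updates, we simply wait and ask again.
        sleep(interval)
    raise TimeoutError(f"no terminal state after {max_polls} polls")
```

The trade-off is the usual one: polling removes the MQ as an infrastructure requirement at the cost of some latency and repeated status requests.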

TES
----------

I've developed a client library for TES called pydantic-tes (https://github.com/jmchilton/pydantic-tes) that uses validated models to communicate with a TES server and is tested against Funnel. It also distributes a pytest fixture that can build and launch Funnel for writing automated tests; the fixture works with tox and GitHub Actions, as demonstrated by the pydantic-tes CI.

AWS Batch
----------

TODO:

Sequential vs Parallel Container Execution
-------------------------------------------

This work contains a generalization of the co-execution of Pulsar and Biocontainers used in Kubernetes, but the model for TES and AWS Batch is serial container execution: this runs, then that, then that, and so on. In TES this is given as a list of "Executors", and AWS Batch has the idea of job dependencies that I believe can capture this. Either way, this will require a slightly different (and probably simpler) approach than the K8S co-execution approach, in which the containers wait on each other to write files in order to coordinate.
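
To make the serial model concrete, here is a minimal sketch of a TES task body with an ordered list of executors sharing a volume. It builds a plain dictionary following the TES task schema (`name`, `executors`, `volumes`) rather than using the pydantic-tes API. The helper `build_serial_task`, the image names, the commands, and the `/pulsar_staging` path are hypothetical placeholders, not what Pulsar actually submits.

```python
import json
from typing import Dict, List, Sequence, Tuple


def build_serial_task(
    name: str,
    steps: Sequence[Tuple[str, List[str]]],
    shared_volume: str = "/pulsar_staging",  # hypothetical path
) -> Dict:
    """Build a TES task dict whose executors run one after another.

    steps is an ordered sequence of (image, command) pairs. TES runs the
    executors in list order, so no file-based waiting between containers
    is needed.
    """
    return {
        "name": name,
        "executors": [
            {"image": image, "command": list(command)}
            for image, command in steps
        ],
        # A volume shared across executors, used here for staged files.
        "volumes": [shared_volume],
    }


# Hypothetical three-step job: stage inputs, run the tool, collect outputs.
task = build_serial_task(
    "example-job",
    [
        ("example/pulsar-stager", ["stage-in"]),
        ("example/biocontainer-tool", ["run-tool"]),
        ("example/pulsar-stager", ["stage-out"]),
    ],
)
print(json.dumps(task, indent=2))
```

Because ordering is carried by the list itself, the coordination logic from the K8S approach (containers waiting on files written by each other) has no counterpart here.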
8fd077b7