Commit 8fd077b7 authored by John Chilton

Generalize orchestrated container scheduling.

Currently the only container-based scheduling approach (where we schedule containers rather than scheduling jobs that launch containers) is the Kubernetes message-queue-based co-execution approach.

On one hand, we have some intriguing applications, TES and AWS Batch, where we would like to schedule containers directly; on the other, the MQ approach with Kubernetes can easily be generalized to not require the MQ (falling back to polling the way the Kubernetes job runner in Galaxy does).

The goal of this work is to generalize the MQ Kubernetes approach into six approaches:

- MQ + Kubernetes (the current recommendation)
- Kubernetes w/polling API (a simpler Kubernetes approach that retains all the advantages of the Pulsar approach over the Kubernetes runner in Galaxy without requiring a MQ).
- MQ + TES.
- TES w/polling.
- MQ + AWS Batch
- AWS Batch w/polling.

TES
----------

I've developed a client library for TES called pydantic-tes (https://github.com/jmchilton/pydantic-tes) that uses validated models to communicate with a TES server and is tested against Funnel. It also distributes a pytest fixture that can build and launch Funnel for writing automated tests and that works with tox and GitHub Actions, as demonstrated by the pydantic-tes CI.
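
The sketch below shows the kind of raw TES exchange that pydantic-tes wraps with validated models; the base URL and API prefix are assumptions for a local Funnel server and may need adjusting for other TES implementations:

    # Minimal sketch of submitting a task to a GA4GH TES server (e.g. a local
    # Funnel instance) over its REST API. pydantic-tes wraps these same calls
    # with validated pydantic models.
    import requests

    TES_BASE = "http://localhost:8000/v1"  # assumed Funnel default; adjust as needed

    task = {
        "name": "pulsar-smoke-test",
        "executors": [
            # A single executor that just echoes; a real Pulsar task would
            # stage inputs, run the tool container, and stage outputs.
            {"image": "alpine:3.16", "command": ["echo", "Hello from TES"]},
        ],
    }

    response = requests.post(f"{TES_BASE}/tasks", json=task)
    response.raise_for_status()
    task_id = response.json()["id"]

    # Fetch the current state of the submitted task.
    state = requests.get(f"{TES_BASE}/tasks/{task_id}").json().get("state")
    print(task_id, state)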

AWS Batch
----------

TODO:

Sequential vs Parallel Container Execution
-------------------------------------------

This work contains a generalization of the approach used in Kubernetes of co-executing Pulsar and Biocontainers, but the model for TES and AWS Batch is more serial container execution - this runs, then that, then that, etc. In TES this is given as a list of "Executors", and AWS Batch has the idea of job dependencies that I believe can capture this - but it will require a slightly different (probably simpler) approach than the K8S co-execution approach, in which the containers wait on each other to write files in order to coordinate.
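
A rough, hedged sketch of chaining stage-in, tool, and stage-out as dependent AWS Batch jobs with boto3 (not the committed implementation; queue and job definition names are hypothetical placeholders):

    # Express the serial stage-in -> tool -> stage-out flow as AWS Batch job
    # dependencies. Queue and job definition names below are hypothetical.
    import boto3

    batch = boto3.client("batch")
    queue = "pulsar-queue"  # assumed to already exist

    stage_in = batch.submit_job(
        jobName="pulsar-stage-in",
        jobQueue=queue,
        jobDefinition="pulsar-staging",  # hypothetical job definition
    )
    tool = batch.submit_job(
        jobName="tool",
        jobQueue=queue,
        jobDefinition="biocontainer-tool",  # hypothetical job definition
        dependsOn=[{"jobId": stage_in["jobId"]}],  # run only after stage-in
    )
    batch.submit_job(
        jobName="pulsar-stage-out",
        jobQueue=queue,
        jobDefinition="pulsar-staging",
        dependsOn=[{"jobId": tool["jobId"]}],  # run only after the tool job
    )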
parent 87bd6722
Showing 1162 additions and 63 deletions
@@ -6,6 +6,7 @@ dist
*.egg
*.egg-info
docs/_build
docs/plantuml.jar
cache
*.pyc
*~
...
@@ -2,6 +2,7 @@
pycurl
kombu
pykube
boto3
# For testing
pytest
...
@@ -35,7 +35,7 @@ RUN apt-get update \
ADD pulsar_app-*-py2.py3-none-any.whl /pulsar_app-*-py2.py3-none-any.whl
RUN pip install --upgrade setuptools && pip install pyOpenSSL --upgrade && pip install cryptography --upgrade
RUN pip install --no-cache-dir /pulsar_app-*-py2.py3-none-any.whl[galaxy_extended_metadata] && rm /pulsar_app-*-py2.py3-none-any.whl
RUN pip install --upgrade 'importlib-metadata<5.0'
RUN _pulsar-configure-galaxy-cvmfs
RUN _pulsar-conda-init --conda_prefix=/pulsar_dependencies/conda
.. _containers:
-------------------------------
Containers
-------------------------------
Galaxy and Shared File Systems
-------------------------------
Galaxy can be configured to run Pulsar with traditional job managers and just submit jobs
that launch containers. Simply setting ``docker_enabled`` on the job environment in Galaxy's
job_conf.yml file will accomplish this.
There are limitations to using DRM systems that submit job scripts that launch containers
though. Modern container scheduling environments (AWS Batch or Kubernetes, for instance) are
capable of scheduling containers directly. This is conceptually cleaner, presumably scales better,
and sidesteps all sorts of issues for the deployer and developer, such as configuring Docker and
managing the interaction between the DRM and the container host server (i.e. the Docker server).
There are a couple of approaches to scheduling containers directly in Galaxy - such as the Galaxy
Kubernetes runner and the Galaxy AWS Batch runner. These approaches require that Galaxy be deployed
alongside the compute infrastructure (i.e. on Amazon with the same EFS volume or inside of Kubernetes
with the same mounts).
These two scenarios and some of their limitations are described below.
.. figure:: gx_aws_deployment.plantuml.svg
Deployment diagram for Galaxy's AWS Batch job runner.
.. figure:: gx_k8s_deployment.plantuml.svg
Deployment diagram for Galaxy's Kubernetes job runner.
The most glaring disadvantage of not using Pulsar in the above scenarios is that Galaxy must
be deployed in the same cluster with the same mounts as the job execution environment. This
prevents leveraging external cloud compute or multi-cloud compute, and makes it unsuitable for
common Galaxy use cases such as large public instances or Galaxy instances leveraging institutional
non-cloud storage. Even within the same cloud, a large shared file system can be an expensive prospect,
and Pulsar may make using buckets and similar object stores more tractable. Finally, Pulsar offers more
options in terms of how metadata is collected, which can have significant performance implications.
Co-execution
-------------------------------
Galaxy job inputs and outputs are very flexible, and staging up job inputs, configs, and scripts
and staging down results doesn't map cleanly to cloud APIs and cannot be fully reasoned about
until job runtime. For this reason, the code that knows how to stage Galaxy jobs up and down needs
to run in the cloud when disk isn't shared, and Galaxy cannot do this directly. Galaxy jobs, however,
are typically executed in Biocontainers, which are minimal containers for just the tool being executed
and not appropriate for executing Galaxy code.
For this reason, the Pulsar runners that schedule containers will run a container beside the tool
container (or before and after it) that is responsible for staging the job up and down, communicating
with Galaxy, etc. Perhaps the most typical scenario is using the Kubernetes Job API along with a message
queue for communication with Galaxy and a Biocontainer for the tool. A diagram for this deployment would
look something like:
.. figure:: pulsar_k8s_coexecution_mq_deployment.plantuml.svg
The modern Galaxy landscape is much more container driven, but the setup can be simplified to use
Galaxy dependency resolution from within the "pulsar" container. This allows the tool and the staging
code to live side-by-side and results in requesting only one container for the execution from the target
container environment. The default Pulsar staging container has a Conda environment configured out of the
box and has some initial tooling for connecting to a CVMFS-available Conda directory.
This one-container approach (staging plus Conda) is available with or without MQ and on either
Kubernetes or against a GA4GH TES server. The TES version of this with RabbitMQ mediating communication
looks like:
.. figure:: pulsar_tes_coexecution_mq_deployment.plantuml.svg
Notice that when executing jobs on Kubernetes, the containers of the pod run concurrently. The Pulsar
container will compute a command line and write it out, the tool container will wait for it on boot and
execute it when available, and the Pulsar container will wait for a return code from the tool container
before proceeding to stage out the job. In the GA4GH TES case, 3 containers are used instead of 2, but
they run sequentially, one at a time.
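
The handshake between the two concurrent containers amounts to a pair of sentinel files in a shared
job directory; a simplified sketch of the tool-container side of that protocol (file names here are
illustrative, not Pulsar's actual layout)::

    # Simplified sketch of the co-execution handshake over a shared job
    # directory. File names are illustrative only.
    import subprocess
    import time
    from pathlib import Path

    job_dir = Path("/pulsar_staging/job_1")
    command_file = job_dir / "tool_command"     # written by the pulsar container
    return_code_file = job_dir / "return_code"  # read back by the pulsar container

    # Tool container: wait for the command to appear, run it, record the exit code.
    while not command_file.exists():
        time.sleep(1)
    result = subprocess.run(["/bin/sh", str(command_file)])
    return_code_file.write_text(str(result.returncode))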
Typically, a MQ is needed to communicate between Pulsar and Galaxy even though the status of the job
could potentially be inferred from the container scheduling environment. This is because Pulsar needs
to transfer information about job state, etc. after the job is complete.
More experimentally, this shouldn't be needed if extended metadata is being collected, because then the
whole job state that needs to be ingested by Galaxy should be populated as part of the job itself. In this
case it may be possible to get away without a MQ.
.. figure:: pulsar_k8s_coexecution_deployment.plantuml.svg
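
In the MQ-less variants, Galaxy-side code instead polls the scheduling API until the task reaches a
terminal state. A rough sketch of such a poll loop against a TES API (base URL, task id, and timings
are illustrative only)::

    # Poll the GA4GH TES API until the task reaches a terminal state.
    import time
    import requests

    TES_BASE = "http://localhost:8000/v1"  # assumed Funnel default
    TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}

    def wait_for_task(task_id, poll_interval=5.0):
        while True:
            task = requests.get(f"{TES_BASE}/tasks/{task_id}").json()
            state = task.get("state", "UNKNOWN")
            if state in TERMINAL_STATES:
                return state
            time.sleep(poll_interval)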
Deployment Scenarios
-------------------------------
Kubernetes
~~~~~~~~~~
.. figure:: pulsar_k8s_coexecution_mq_deployment.plantuml.svg
Kubernetes job execution with a biocontainer for the tool and RabbitMQ for communicating with
Galaxy.
.. figure:: pulsar_k8s_mq_deployment.plantuml.svg
Kubernetes job execution with Conda dependencies for the tool and RabbitMQ for communicating with
Galaxy.
.. figure:: pulsar_k8s_coexecution_deployment.plantuml.svg
Kubernetes job execution with a biocontainer for the tool and no message queue.
.. figure:: pulsar_k8s_deployment.plantuml.svg
Kubernetes job execution with Conda dependencies for the tool and no message queue.
GA4GH TES
~~~~~~~~~~
.. figure:: pulsar_tes_coexecution_mq_deployment.plantuml.svg
GA4GH TES job execution with a biocontainer for the tool and RabbitMQ for communicating with
Galaxy.
.. figure:: pulsar_tes_mq_deployment.plantuml.svg
GA4GH TES job execution with Conda dependencies for the tool and RabbitMQ for communicating with
Galaxy.
.. figure:: pulsar_tes_coexecution_deployment.plantuml.svg
GA4GH TES job execution with a biocontainer for the tool and no message queue.
.. figure:: pulsar_tes_deployment.plantuml.svg
GA4GH TES job execution with Conda dependencies for the tool and no message queue.
AWS Batch
~~~~~~~~~~
Work in progress.
@startuml
!include plantuml_options.txt
!define AWSPuml https://raw.githubusercontent.com/awslabs/aws-icons-for-plantuml/v14.0/dist
!include AWSPuml/AWSCommon.puml
!include AWSPuml/Groups/all.puml
!include AWSPuml/Containers/Fargate.puml
AWSCloudGroup(cloud) {
Fargate(api, "Batch", "")
component galaxy as "galaxy" {
}
frame pod as "Job Description" {
component job as "a galaxy job" {
}
}
storage disk as "shared efs" {
}
note left of disk
Disk must be fully accessible to Galaxy
and any AWS-spawned job containers,
and live in the same cloud as Galaxy
end note
}
galaxy --> disk
galaxy --> api : register_job_definition, submit_job, describe_jobs
job --> disk
api -[dashed]-> pod : [manages]
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesApi.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl KubernetesPuml/OSS/KubernetesPv.puml
Cluster_Boundary(cluster, "Kubernetes Cluster") {
KubernetesApi(api, "Kubernetes Jobs API", "")
component galaxy as "galaxy" {
}
frame pod as "Job Pod" {
component job as "biocontainer" {
}
}
KubernetesPv(disk, "shared volume", "")
note left of disk
Disk must be fully accessible to Galaxy
and any Kubernetes-spawned job pods,
and live in the same cloud as Galaxy
end note
}
galaxy --> disk
galaxy --> api
job --> disk
api -[dashed]-> pod : [manages]
@enduml
MINDMAPS := $(wildcard *.mindmap.yml)
INPUTS := $(wildcard *.plantuml.txt)
OUTPUTS := $(INPUTS:.txt=.svg)
all: plantuml.jar $(MINDMAPS) $(OUTPUTS)

$(OUTPUTS): $(INPUTS) $(MINDMAPS)
	java -jar plantuml.jar -c plantuml_options.txt -tsvg $(INPUTS)

plantuml.jar:
	wget http://jaist.dl.sourceforge.net/project/plantuml/plantuml.jar || curl --output plantuml.jar http://jaist.dl.sourceforge.net/project/plantuml/plantuml.jar
@@ -16,6 +16,7 @@ Contents:
install
configure
job_managers
containers
galaxy_conf
scripts
conduct
...
' skinparam handwritten true
' skinparam roundcorner 20
skinparam class {
ArrowFontColor DarkOrange
BackgroundColor #FFEFD5
ArrowColor Orange
BorderColor DarkOrange
}
skinparam object {
ArrowFontColor DarkOrange
BackgroundColor #FFEFD5
BackgroundColor #FFEFD5
ArrowColor Orange
BorderColor DarkOrange
}
skinparam ComponentBackgroundColor #FFEFD5
skinparam ComponentBorderColor DarkOrange
skinparam DatabaseBackgroundColor #FFEFD5
skinparam DatabaseBorderColor DarkOrange
skinparam StorageBackgroundColor #FFEFD5
skinparam StorageBorderColor DarkOrange
skinparam QueueBackgroundColor #FFEFD5
skinparam QueueBorderColor DarkOrange
skinparam note {
BackgroundColor #FFEFD5
BorderColor #BF5700
}
skinparam sequence {
ArrowColor Orange
ArrowFontColor DarkOrange
ActorBorderColor DarkOrange
ActorBackgroundColor #FFEFD5
ParticipantBorderColor DarkOrange
ParticipantBackgroundColor #FFEFD5
LifeLineBorderColor DarkOrange
LifeLineBackgroundColor #FFEFD5
DividerBorderColor DarkOrange
GroupBorderColor DarkOrange
}
<style>
mindmapDiagram {
node {
BackgroundColor #FFEFD5
BorderColor DarkOrange
LineColor Orange
}
}
</style>
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesApi.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl KubernetesPuml/OSS/KubernetesPv.puml
component galaxy as "galaxy" {
}
note as galaxynote
Use extended metadata to write
results right from Pulsar and
skip the need for RabbitMQ.
end note
galaxy ... galaxynote
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
Cluster_Boundary(cluster, "Kubernetes Cluster") {
KubernetesApi(api, "Kubernetes Jobs API", "")
frame pod as "Job Pod" {
component staging as "pulsar" {
}
component tool as "biocontainer" {
}
}
}
galaxy --> disk
galaxy --> api : submit, cancel, status
api -[dashed]-> pod : [manages]
staging --> disk : stage in and out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesApi.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl KubernetesPuml/OSS/KubernetesPv.puml
component galaxy as "galaxy" {
}
queue queue as "RabbitMQ" {
}
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
Cluster_Boundary(cluster, "Kubernetes Cluster") {
KubernetesApi(api, "Kubernetes Jobs API", "")
frame pod as "Job Pod" {
component staging as "pulsar" {
}
component tool as "biocontainer" {
}
}
}
galaxy --> disk
galaxy --> api : submit, cancel
staging --> queue : status updates, final job status
galaxy -[dotted]-> queue
api -[dashed]-> pod : [manages]
staging --> galaxy : stage in and out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesApi.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl KubernetesPuml/OSS/KubernetesPv.puml
component galaxy as "galaxy" {
}
note right of galaxy
Use extended metadata to write
results right from Pulsar and
skip the need for RabbitMQ.
end note
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
Cluster_Boundary(cluster, "Kubernetes Cluster") {
KubernetesApi(api, "Kubernetes Jobs API", "")
frame pod as "Job Pod" {
component staging as "pulsar+conda" {
}
}
}
galaxy --> disk
galaxy --> api : submit, cancel, get status
api -[dashed]-> pod : [manages]
staging --> disk : stage in and out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
!includeurl KubernetesPuml/OSS/KubernetesApi.puml
!includeurl KubernetesPuml/OSS/KubernetesPod.puml
!includeurl KubernetesPuml/OSS/KubernetesPv.puml
component galaxy as "galaxy" {
}
queue queue as "RabbitMQ" {
}
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
Cluster_Boundary(cluster, "Kubernetes Cluster") {
KubernetesApi(api, "Kubernetes Jobs API", "")
frame pod as "Job Pod" {
component staging as "pulsar+conda" {
}
}
}
galaxy --> disk
galaxy --> api : submit, cancel
staging --> queue : status updates, final job status
galaxy -[dotted]-> queue
api -[dashed]-> pod : [manages]
staging --> galaxy : stage in and out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
component galaxy as "galaxy" {
}
note as galaxynote
Use extended metadata to write
results right from Pulsar and
skip the need for RabbitMQ.
end note
galaxy ... galaxynote
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
cloud cluster as "GA4GH TES Cluster" {
queue api as "GA4GH TES API" {
}
frame pod as "TesTask" {
component stageout as "TesExecutor - pulsar stage-out" {
}
component tool as "TesExecutor - tool container" {
}
component stagein as "TesExecutor - pulsar stage-in" {
}
stagein <.. tool : depends on
tool <.. stageout : depends on
}
}
galaxy --> disk
galaxy --> api : submit, cancel, status
api -[dashed]-> pod : [manages]
stagein --> disk : stage in
stageout --> disk : stage out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
component galaxy as "galaxy" {
}
queue queue as "RabbitMQ" {
}
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
cloud cluster as "GA4GH TES Cluster" {
queue api as "GA4GH TES API" {
}
frame pod as "TesTask" {
component stageout as "TesExecutor - pulsar stage-out" {
}
component tool as "TesExecutor - tool container" {
}
component stagein as "TesExecutor - pulsar stage-in" {
}
stagein <.. tool : depends on
tool <.. stageout : depends on
}
}
galaxy --> disk
galaxy --> api : submit, cancel
stagein --> queue : status updates
stageout --> queue : status updates
galaxy -[dotted]-> queue
api -[dashed]-> pod : [manages]
stagein --> galaxy : stage in
stageout --> galaxy : stage out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
component galaxy as "galaxy" {
}
note right of galaxy
Use extended metadata to write
results right from Pulsar and
skip the need for RabbitMQ.
end note
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
cloud cluster as "GA4GH TES Cluster" {
queue api as "GA4GH TES API" {
}
frame pod as "TesTask" {
component staging as "TesExecutor - pulsar+conda" {
}
}
}
galaxy --> disk
galaxy --> api : submit, cancel, status
api -[dashed]-> pod : [manages]
staging --> disk : stage in and out
@enduml
@startuml
!include plantuml_options.txt
!define KubernetesPuml https://raw.githubusercontent.com/dcasati/kubernetes-PlantUML/master/dist
!includeurl KubernetesPuml/kubernetes_Common.puml
!includeurl KubernetesPuml/kubernetes_Context.puml
!includeurl KubernetesPuml/kubernetes_Simplified.puml
component galaxy as "galaxy" {
}
queue queue as "RabbitMQ" {
}
storage disk as "Object Store" {
}
note as disknote
Disk is unrestricted and does
not need to be shared between
Pulsar and Galaxy.
end note
disk ... disknote
cloud cluster as "GA4GH TES Cluster" {
queue api as "GA4GH TES API" {
}
frame pod as "TesTask" {
component staging as "TesExecutor - pulsar+conda" {
}
}
}
galaxy --> disk
galaxy --> api : submit, cancel
galaxy -[dotted]-> queue
staging --> queue : status updates, final job status
api -[dashed]-> pod : [manages]
staging --> galaxy : stage in and out
@enduml
[mypy-galaxy.*]
ignore_missing_imports = True
[mypy-boto3.*]
ignore_missing_imports = True
[mypy-mesos.*]
ignore_missing_imports = True
...