A `Datastore` or `Data Store` in nova-galaxy represents a Galaxy history. It serves as a container for organizing your data and tool outputs within Galaxy.

.. code-block:: python

    galaxy_url = "your_galaxy_url"
    galaxy_key = "your_galaxy_api_key"
    connection = Connection(galaxy_url, galaxy_key)
    with connection.connect() as conn:
        data_store = conn.create_data_store("My Data Store")

By default, data stores are persisted: their jobs and outputs remain available for retrieval even after the connection is closed.

Data stores also keep their names after the application exits. If you create a data store named "Data1" and later create a new data store with the same name, nova-galaxy automatically connects the new instance to the existing one, provided it has not been deleted.
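
The naming rule above can be pictured with a small in-memory model. This is a hypothetical sketch, not nova-galaxy code: `ToyServer`, `create_data_store` on it, and `delete_data_store` are illustrative stand-ins that assume a data store is found by name and reused until it is deleted.

.. code-block:: python

    # Hypothetical in-memory model of name-based reconnection; not the nova-galaxy API.
    class ToyServer:
        """Stands in for Galaxy: keeps data stores (histories) by name."""

        def __init__(self):
            self.histories = {}  # data store name -> list of output names

        def create_data_store(self, name):
            # An existing, undeleted store with the same name is reused.
            if name not in self.histories:
                self.histories[name] = []
            return self.histories[name]

        def delete_data_store(self, name):
            self.histories.pop(name, None)


    server = ToyServer()

    # First application run: create "Data1" and produce an output.
    store = server.create_data_store("Data1")
    store.append("result_a")

    # A later run asks for "Data1" again and sees the earlier output.
    store_again = server.create_data_store("Data1")
    print(store_again)  # ['result_a']

    # Once deleted, the same name yields a fresh, empty store.
    server.delete_data_store("Data1")
    print(server.create_data_store("Data1"))  # []
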

To delete a data store and clean it up (i.e., remove all outputs and resources associated with it), there are a few options. First, you can mark a data store to be cleaned up automatically when you close your connection:

.. code-block:: python

    with connection.connect() as conn:
        data_store = conn.create_data_store("My Data Store")
        data_store.mark_for_cleanup()
    # When the 'with' block exits, the data store is cleaned up.

This also works when the connection is used without the `with` statement:

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    data_store.mark_for_cleanup()
    active_connection.close()
    # When close() is called, the data store is cleaned up.

You can also clean up a data store manually by calling its `cleanup` method:

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    # Do work
    data_store.cleanup()
    # To use this data store again, call create_data_store again.
    # The new store will be empty because the previous one was cleaned up.
    data_store = active_connection.create_data_store("My Data Store")

If you later decide to keep a store that has been marked for cleanup, call its `persist` method:

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your first tool
    data_store.cleanup()
    data_store = active_connection.create_data_store("My Data Store")
    data_store.mark_for_cleanup()
    # Run your second tool
    # Keep the results after all: undo the cleanup mark
    data_store.persist()
    active_connection.close()
    # The second tool's outputs are persisted, whereas the first tool's outputs are gone.
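
The cleanup rules above (persisted by default, `mark_for_cleanup` defers cleanup to connection close, `persist` undoes the mark, `cleanup` acts immediately) can be summarized in a toy model. This is a hypothetical sketch only: `ToyDataStore` and `ToyConnection` are illustrative classes whose method names mirror the documentation, but they are not nova-galaxy code and do not talk to Galaxy.

.. code-block:: python

    # Hypothetical model of data store cleanup rules; method names mirror the docs
    # (mark_for_cleanup, persist, cleanup, close) but this is not nova-galaxy code.
    class ToyDataStore:
        def __init__(self, name):
            self.name = name
            self.outputs = []
            self.marked_for_cleanup = False  # data stores persist by default

        def mark_for_cleanup(self):
            self.marked_for_cleanup = True

        def persist(self):
            # Undo an earlier mark_for_cleanup().
            self.marked_for_cleanup = False

        def cleanup(self):
            # Delete everything associated with the store right away.
            self.outputs.clear()


    class ToyConnection:
        def __init__(self):
            self.stores = []

        def create_data_store(self, name):
            store = ToyDataStore(name)
            self.stores.append(store)
            return store

        def close(self):
            # Only stores still marked for cleanup are cleaned on close.
            for store in self.stores:
                if store.marked_for_cleanup:
                    store.cleanup()

        # Support "with" so exiting the block behaves like close().
        def __enter__(self):
            return self

        def __exit__(self, *exc):
            self.close()
            return False


    with ToyConnection() as conn:
        kept = conn.create_data_store("Kept")  # persisted by default
        kept.outputs.append("tool-1")
        temp = conn.create_data_store("Temporary")
        temp.outputs.append("tool-2")
        temp.mark_for_cleanup()  # cleaned when the block exits
        rescued = conn.create_data_store("Rescued")
        rescued.outputs.append("tool-3")
        rescued.mark_for_cleanup()
        rescued.persist()  # mark undone: kept after close

    print(kept.outputs, temp.outputs, rescued.outputs)
    # ['tool-1'] [] ['tool-3']
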

Datasets can be marked as remote files if you don't want to upload them from your local machine. Remote files are files that your upstream Galaxy instance can access directly.
For example, if your upstream Galaxy instance has access to a directory named `/SNS`, you can load a file from there as a dataset:
Note that when the `remote_files` flag is set to `True`, the files are not actually uploaded. Instead, they are ingested into Galaxy as links to the original files, so large files should not slow down the system.
When running tools, any Dataset that is used as an input parameter will be automatically uploaded/ingested, unless that dataset has already been uploaded.
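
The upload rule above, together with the `force_upload` override described next, can be modeled as follows. This is a hypothetical sketch, not nova-galaxy code: `ToyDataset`, its `uploaded` attribute, and `prepare_inputs` are illustrative names, and the list `upload_log` merely stands in for the real upload or ingestion step.

.. code-block:: python

    # Hypothetical model of the input-upload rule; not nova-galaxy code.
    from dataclasses import dataclass


    @dataclass
    class ToyDataset:
        path: str
        force_upload: bool = False
        uploaded: bool = False


    def prepare_inputs(datasets, upload_log):
        """Upload each input unless it was already uploaded,
        but always upload it when force_upload is set."""
        for ds in datasets:
            if ds.force_upload or not ds.uploaded:
                upload_log.append(ds.path)  # stands in for the real upload/ingest
                ds.uploaded = True


    log = []
    cached = ToyDataset("input.txt")
    forced = ToyDataset("config.txt", force_upload=True)

    prepare_inputs([cached, forced], log)  # first run: both are uploaded
    prepare_inputs([cached, forced], log)  # second run: only the forced one
    print(log)  # ['input.txt', 'config.txt', 'config.txt']
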
To force a dataset to be uploaded when a tool runs, even if it has been uploaded before, mark it with `force_upload`:
If, instead of loading a file from disk or ingesting a remote file, you want to upload text or another serializable Python value directly, you can set the dataset's content:

.. code-block:: python

    my_dataset = Dataset()
    my_dataset.set_content("Some text that will be uploaded as a text file", file_type=".txt")

The `file_type` argument is optional and defaults to a text file.

To fetch the content of a dataset, you can either download it to a path or load the content directly into memory (be careful with the latter for large files).
    url = my_tool.run_interactive(data_store, params)
    print(f"Interactive tool URL: {url}")

By default, interactive tools are not stopped when the connection is closed. To change this, call the data store's `mark_for_cleanup` method; the tool is then stopped automatically once the connection is closed (or the `with` block exits). You can also stop these tools manually with the Tool `stop_all_tools_in_store` method.

To retrieve the URL of an interactive tool later, use the `get_url` method: