A `Datastore` or `Data Store` in nova-galaxy represents a Galaxy history. It serves as a container for organizing your data and tool outputs within Galaxy.
Create your connection as usual, with the `galaxy_url` and `galaxy_key` values set:

.. literalinclude:: ../../tests/conftest.py
    :start-after: setup nova connection
    :end-before: setup nova connection complete
    :dedent:

Then open the connection and create a data store:

.. code-block:: python

    from nova.galaxy import Connection

    galaxy_url = "your_galaxy_url"
    galaxy_key = "your_galaxy_api_key"
    connection = Connection(galaxy_url, galaxy_key)
    with connection.connect() as conn:
        data_store = conn.create_data_store("My Data Store")

When `connection.close()` is called, the data store is cleaned up. You can also clean up a data store manually by calling its `cleanup()` method:
.. code-block:: python

    # `connection` is the Connection object created earlier
    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your first tool
    data_store.cleanup()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your second tool
    data_store.persist()
    active_connection.close()
    # All data in the store from the second tool will be persisted, whereas the first tool's outputs will be gone.

To use the data store again after it has been cleaned up, you have to call `create_data_store` again. If at any point you want to persist a store that has been marked for cleanup, call its `persist()` method.
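The cleanup and persist rules above can be illustrated with a small toy model. This is a hypothetical sketch, not nova-galaxy code: `ToyDataStore` and `ToyConnection` are illustrative names, and the model assumes only the rules stated here (a store is cleaned up when the connection closes unless `persist()` was called, and `cleanup()` empties a store immediately).

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class ToyDataStore:
        """Hypothetical stand-in for a data store; not the nova-galaxy class."""
        name: str
        outputs: list = field(default_factory=list)
        persisted: bool = False
        cleaned: bool = False

        def persist(self) -> None:
            # Keep this store when the connection closes.
            self.persisted = True

        def cleanup(self) -> None:
            # Remove the store's contents immediately.
            self.outputs.clear()
            self.cleaned = True


    class ToyConnection:
        """Hypothetical stand-in that tracks the stores created on a connection."""

        def __init__(self) -> None:
            self.stores = []

        def create_data_store(self, name: str) -> ToyDataStore:
            store = ToyDataStore(name)
            self.stores.append(store)
            return store

        def close(self) -> None:
            # On close, every store that was not persisted is cleaned up.
            for store in self.stores:
                if not store.persisted:
                    store.cleanup()


    conn = ToyConnection()
    first = conn.create_data_store("My Data Store")
    first.outputs.append("first tool output")
    first.cleanup()
    second = conn.create_data_store("My Data Store")
    second.outputs.append("second tool output")
    second.persist()
    conn.close()
    print(first.outputs, second.outputs)  # [] ['second tool output']

Only the persisted store keeps its outputs, mirroring the comment at the end of the example above.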
By default, a Dataset takes its name from the given file path, but you can give it a custom name by passing a string to the `name` parameter of the constructor.
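The naming rule can be sketched as a tiny helper. `dataset_display_name` is a hypothetical function, not part of nova-galaxy, and it assumes the default name is the last component of the path (the file name) rather than the full path.

.. code-block:: python

    from pathlib import Path
    from typing import Optional


    def dataset_display_name(file_path: str, name: Optional[str] = None) -> str:
        """Hypothetical helper: an explicit name wins, otherwise use the file name."""
        if name:
            return name
        # Assumption: the default is the final path component, not the full path.
        return Path(file_path).name


    print(dataset_display_name("/data/run_42/peaks.csv"))  # peaks.csv
    print(dataset_display_name("/data/run_42/peaks.csv", name="Run 42 peaks"))  # Run 42 peaks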
Datasets can be marked as remote files if you don't want to upload them from your local machine. Remote files are files that your upstream Galaxy instance can already access.
For example, if your upstream Galaxy instance has access to a directory named `/SNS`, you can load a file from there as a dataset:
Note that when the `remote_files` flag is set to `True`, the files are not actually uploaded. Instead, they are ingested into Galaxy as links to the actual files, so file size should not slow down the system.
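The difference between remote and local datasets can be summarized as a simple decision. This is a hypothetical sketch: `ToyDatasetSpec`, its `is_remote` field, and `ingest_action` are illustrative names rather than the nova-galaxy API, and the path under `/SNS` is made up.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class ToyDatasetSpec:
        """Hypothetical description of a dataset input; not the nova-galaxy Dataset class."""
        path: str
        is_remote: bool = False  # stands in for the remote-file flag described above


    def ingest_action(spec: ToyDatasetSpec) -> str:
        # Remote files are linked in place, so their size does not matter;
        # local files have to be transferred to Galaxy.
        return "link" if spec.is_remote else "upload"


    print(ingest_action(ToyDatasetSpec("/SNS/example/file.nxs", is_remote=True)))  # link
    print(ingest_action(ToyDatasetSpec("./local/file.txt")))  # upload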
When running tools, any Dataset that is used as an input parameter will be automatically uploaded/ingested, unless that dataset has already been uploaded.
To force a dataset to be uploaded when a tool runs, even if it has been uploaded before, pass a boolean to the `force_upload` parameter of the `Dataset` constructor.
Note that `force_upload` defaults to `True`; pass `False` to skip re-uploading datasets that are already in Galaxy.
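Taken together, the two rules above reduce to a one-line condition. `should_upload` is a hypothetical helper written only to show how the `force_upload` default interacts with the "already uploaded" check; it is not a nova-galaxy function.

.. code-block:: python

    def should_upload(already_uploaded: bool, force_upload: bool = True) -> bool:
        """Hypothetical rule: upload if forced, or if the dataset is not in Galaxy yet."""
        return force_upload or not already_uploaded


    # With the default force_upload=True, a dataset is sent again even if it exists.
    print(should_upload(already_uploaded=True))                      # True
    # Opting out of forced uploads reuses the existing copy.
    print(should_upload(already_uploaded=True, force_upload=False))  # False
    print(should_upload(already_uploaded=False, force_upload=False)) # True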
If, instead of loading a file from disk or ingesting a remote file, you want to upload some text or another serializable Python value directly, you can set the dataset content:

.. code-block:: python

    from nova.galaxy import Dataset

    my_dataset = Dataset()
    my_dataset.set_content("Some text that will be uploaded as a text file", file_type=".txt")

.. literalinclude:: ../../tests/test_dataset.py
    :start-after: set dataset content
    :end-before: set dataset content complete
    :dedent:

The `file_type` argument is optional and defaults to a text file.
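How a non-string value becomes uploadable text is not specified here. The sketch below assumes JSON serialization purely for illustration; `prepare_content` is a hypothetical helper, not the nova-galaxy implementation of `set_content`.

.. code-block:: python

    import json
    from typing import Any, Tuple


    def prepare_content(value: Any, file_type: str = ".txt") -> Tuple[str, str]:
        """Hypothetical helper: turn a value into text plus a file type for upload.

        Strings are used unchanged; any other value is assumed to be JSON-serializable.
        """
        text = value if isinstance(value, str) else json.dumps(value)
        return text, file_type


    print(prepare_content("Some text that will be uploaded as a text file"))
    print(prepare_content({"runs": [1, 2, 3]}, file_type=".json"))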
To fetch the content of a dataset, you can either download it to a path with `download()` or load the content directly into memory with `get_content()` (be careful using the latter with large files):

.. code-block:: python

    dataset_content = my_dataset.get_content()  # will store content in memory

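The memory caveat comes from the general difference between streaming to disk and reading everything at once. The sketch below shows that difference with standard-library streams only; `save_to_path` and `read_into_memory` are hypothetical names, and this is not how nova-galaxy implements `download()` or `get_content()`.

.. code-block:: python

    import io
    import shutil
    import tempfile
    from pathlib import Path
    from typing import BinaryIO


    def save_to_path(stream: BinaryIO, target: Path, chunk_size: int = 64 * 1024) -> int:
        """Copy a stream to disk in fixed-size chunks; memory use stays bounded."""
        with open(target, "wb") as out:
            shutil.copyfileobj(stream, out, chunk_size)
        return target.stat().st_size


    def read_into_memory(stream: BinaryIO) -> bytes:
        """Read the whole stream at once; memory use grows with the file size."""
        return stream.read()


    payload = b"x" * 200_000  # stand-in for a dataset's bytes
    with tempfile.TemporaryDirectory() as tmp:
        size_on_disk = save_to_path(io.BytesIO(payload), Path(tmp) / "dataset.bin")
    in_memory = read_into_memory(io.BytesIO(payload))
    print(size_on_disk, len(in_memory))  # 200000 200000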
DatasetCollections currently have less functionality than individual Datasets, as most collections will come from tool outputs.
nova-galaxy allows running Galaxy tools in interactive mode, which is especially useful when tools generate URLs that need to be accessed during runtime.
.. code-block:: python

    from nova.galaxy import Tool, Parameters

    # Define tool parameters
    params = Parameters()

    # Get a tool instance
    my_tool = Tool("tool_id")  # Replace with your tool id from Galaxy

    # Run the tool in interactive mode
    url = my_tool.run_interactive(data_store, params)
    print(f"Interactive tool URL: {url}")

.. literalinclude:: ../../tests/test_run_tool.py
    :start-after: run interactive tool
    :end-before: run interactive tool complete
    :dedent:

By default, interactive tools are not stopped when the Nova connection is closed. To change this, call the data store's `mark_for_cleanup()` method; the tool will then stop automatically once the connection is closed (or the `with` block is exited). You can also stop these tools manually with the `stop_all_tools_in_store()` method of `Tool`.
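The default-versus-marked behavior can be sketched as a toy model. `ToyStore`, its `stop_on_close` flag, and `close_connection` are hypothetical names chosen for illustration; they model only the rule stated above and are not the nova-galaxy API.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class ToyStore:
        """Hypothetical stand-in tracking interactive tools started in a data store."""
        running_tools: list = field(default_factory=list)
        stop_on_close: bool = False  # set by the toy mark_for_cleanup()

        def mark_for_cleanup(self) -> None:
            self.stop_on_close = True

        def stop_all_tools(self) -> None:
            self.running_tools.clear()


    def close_connection(stores) -> None:
        # Interactive tools keep running unless their store was marked for cleanup.
        for store in stores:
            if store.stop_on_close:
                store.stop_all_tools()


    default_store = ToyStore(["viewer"])
    marked_store = ToyStore(["viewer"])
    marked_store.mark_for_cleanup()
    close_connection([default_store, marked_store])
    print(default_store.running_tools, marked_store.running_tools)  # ['viewer'] []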
If you want to get the URL of an interactive tool at a later point, you can use the `get_url` method: