Unverified Commit 07312fbf authored by Gregory Cage's avatar Gregory Cage Committed by GitHub

Merge pull request #46 from nova-model/update-user-docs

Update docs and usage examples.
parents 579d0c76 040c5850
Data Stores
-------------------------

A `Datastore` (or `Data Store`) in nova-galaxy represents a Galaxy history. It serves as a container for organizing your data and tool outputs within Galaxy.

.. code-block:: python

    from nova.galaxy import Connection

    galaxy_url = "your_galaxy_url"
    galaxy_key = "your_galaxy_api_key"
    connection = Connection(galaxy_url, galaxy_key)

    with connection.connect() as conn:
        data_store = conn.create_data_store("My Data Store")

By default, data stores are persisted, meaning that their jobs and outputs remain available even after the connection is closed.
Data stores also keep their namespace after the application exits: if you name a data store "Data1" and later create a new data store named "Data1", nova-galaxy will automatically connect the new instance to the old one, assuming it has not been deleted.
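That name-reuse behavior can be sketched in plain Python (a hypothetical stand-in, not nova-galaxy internals): stores are keyed by name, so creating one with an existing name hands back the old store.

```python
# Hypothetical sketch of name-keyed data stores; nova-galaxy's real
# create_data_store talks to a Galaxy server instead.
class SketchConnection:
    def __init__(self):
        self._stores = {}

    def create_data_store(self, name):
        # Reuse the existing store if one with this name already exists.
        return self._stores.setdefault(name, {"name": name, "outputs": []})

conn = SketchConnection()
first = conn.create_data_store("Data1")
first["outputs"].append("result.txt")
second = conn.create_data_store("Data1")
print(second["outputs"])  # ['result.txt'] (the old store was reconnected)
```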

In order to delete and clean up your data stores (i.e. delete all outputs and resources associated with the data store), there are a few methods.

First, you can mark a data store for automatic cleanup when you close your connection.

.. code-block:: python

    with connection.connect() as conn:
        data_store = conn.create_data_store("My Data Store")
        data_store.mark_for_cleanup()
        # when the 'with' block exits, the data store will be cleaned up.

This will also work when the connection class is used without the 'with' syntax.

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    data_store.mark_for_cleanup()
    active_connection.close()
    # when close() is called, the data store will be cleaned up.


You can also clean a data store manually by invoking its `cleanup` method:

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    # Do work
    data_store.cleanup()
    data_store = active_connection.create_data_store("My Data Store")
    # To use this data store again, you must call create_data_store again.
    # The new store will be empty, since the previous one was cleaned up.

If at any point you want to persist a store that has been marked for cleanup, you can call its `persist` method:

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your first tool
    data_store.cleanup()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your second tool
    data_store.persist()
    active_connection.close()
    # All data in the store from the second tool will be persisted, whereas the first tool's outputs will be gone.

Datasets
-------------------------

nova-galaxy provides abstractions for handling individual files (`Dataset`) and collections of files (`DatasetCollection`).

.. code-block:: python

   from nova.galaxy import DatasetCollection

   # Create a DatasetCollection (implementation for upload pending)
   my_collection = DatasetCollection("path/to/my/collection")


By default, a Dataset takes its name from the file path given, but it can be given a unique name by passing a string into the constructor.

.. code-block:: python

    my_dataset = Dataset(path="path/to/file.txt", name="cool_dataset_name")
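As a pure-Python illustration (not nova-galaxy code), the default name is presumably the final component of the file path:

```python
from pathlib import Path

# Illustrative sketch only: derive a dataset's default name from its file
# path, as described above (nova-galaxy's internals may differ).
def default_dataset_name(path: str) -> str:
    return Path(path).name

print(default_dataset_name("path/to/file.txt"))  # file.txt
```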

A Dataset can be marked as a remote file if you don't want to upload it from your local machine. Remote files are files that your upstream Galaxy instance has access to.
For example, if your upstream Galaxy instance has access to a directory named `/SNS`, you can load a file from there as a dataset:

.. code-block:: python

    my_dataset = Dataset(path="/SNS/path/to/file.txt", remote_file=True)

Datasets can be uploaded to a store by calling the upload method.

.. code-block:: python

    connection = Connection("galaxy_url", "galaxy_key").connect()
    store = connection.create_data_store("store")
    my_dataset = Dataset("filepath/file.txt")
    my_dataset.upload(store, name="optional name")


Note that when the `remote_file` flag is set to `True`, the files are not actually "uploaded". Instead, they are ingested into Galaxy as links to the actual files, so file size should not slow down the system.
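As a rough sketch of that distinction (hypothetical code, not nova-galaxy internals), a remote file is recorded as a link while a local file's bytes would be transferred:

```python
# Hypothetical sketch of the upload-vs-ingest distinction described above;
# nova-galaxy's real implementation is not shown in these docs.
def ingest(path: str, remote_file: bool = False) -> dict:
    if remote_file:
        # No bytes move: Galaxy records a link to a file it can already see.
        return {"kind": "link", "target": path}
    # A local upload would transfer the file's contents to Galaxy.
    return {"kind": "upload", "target": path}

print(ingest("/SNS/path/to/file.txt", remote_file=True)["kind"])  # link
```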

When running tools, any Dataset used as an input parameter will be automatically uploaded or ingested, unless that dataset has already been uploaded.
To force the dataset to be uploaded when a tool runs, even if it has been uploaded before, mark the dataset with `force_upload`:

.. code-block:: python

     my_dataset = Dataset(path="/SNS/path/to/file.txt", force_upload=True)

Note that `force_upload` defaults to `True`.

If, instead of loading a file from disk or ingesting a remote file, you want to directly upload some text or another serializable Python value, you can set the dataset content directly:

.. code-block:: python

    my_dataset = Dataset()
    my_dataset.set_content("Some text that will be uploaded as a text file", file_type=".txt")

The `file_type` argument is optional and will default to a text file.

In order to fetch the content of a dataset, you can either download the dataset to a path or fetch the content and store it directly in memory (be careful using this with large files):

.. code-block:: python

    my_dataset.download("/path/to/local/location/where/you/want/to/download/this.txt")
    dataset_content = my_dataset.get_content() # will store content in memory


DatasetCollections currently have less functionality than individual Datasets, as most collections will come from tool outputs.
The `get_content()` method will return a list of info on each element in the collection rather than the content of each element.
The `download()` method will save the collection (with all content included) as a zip archive to the given path.
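Assuming the downloaded collection archive is a standard zip file (an assumption; check your Galaxy instance), it can be inspected with Python's `zipfile` module:

```python
import io
import zipfile

# Build a small in-memory zip standing in for a downloaded collection
# archive, since creating a real one requires a live Galaxy connection.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("element_1.txt", "first element")
    zf.writestr("element_2.txt", "second element")

# Inspect the archive as you would a collection downloaded to disk.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    first_element = zf.read("element_1.txt").decode()
print(names)  # ['element_1.txt', 'element_2.txt']
```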

Interactive Tools
-------------------------

nova-galaxy allows running Galaxy tools in interactive mode, which is especially useful for tools that serve a user interface.

.. code-block:: python

    url = my_tool.run_interactive(data_store, params)
    print(f"Interactive tool URL: {url}")

By default, interactive tools are not stopped automatically when the Nova connection is closed. To override this behavior, use the `Datastore` `mark_for_cleanup` method, which causes the tool to stop automatically once the connection is closed (or the `with` block is exited). You can also stop these tools manually with the `Tool` `stop_all_tools_in_store` method.

If you want to get the url of an interactive tool at a later point, you can use the `get_url` method:

.. code-block:: python

     my_tool.get_url()

Tools
-------------------------

The `Tool` class represents a Galaxy tool. You can run tools, manage their inputs, and retrieve their outputs.

.. code-block:: python

   from nova.galaxy import Connection, Tool, Parameters, Dataset

   # Get a tool instance
   my_tool = Tool("tool_id")

   connection = Connection("galaxy_url", "galaxy_key")
   active_connection = connection.connect()
   data_store = active_connection.create_data_store("cool store")
   inputs = Parameters()
   # Run the tool
   outputs = my_tool.run(data_store, inputs)

By default, tools run synchronously. In order to run a tool in an "async" manner, set the `wait` argument to `False`.

.. code-block:: python

    outputs = my_tool.run(data_store=data_store, params=inputs, wait=False)
    # any code after will be executed immediately. Outputs will be None in this case.

You can get the status of the tool as a `WorkState` enum value (from the nova-common library):

.. code-block:: python

    status = my_tool.get_status()
    print(status) # could print "running", "queued", "error", etc
    full_status = my_tool.get_full_status()
    print(full_status) # Gives you details on error states, etc
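For long-running tools, a simple polling loop over the status can be sketched as follows (the status callable below is a stand-in for `Tool.get_status`, since the real method needs a live Galaxy connection):

```python
import time

# Hedged sketch: poll a status callable until it leaves "queued"/"running".
def wait_until_done(get_status, poll=0.01, timeout=5.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status not in ("queued", "running"):
            return status
        time.sleep(poll)
    raise TimeoutError("tool did not finish in time")

# Stand-in status sequence for demonstration.
statuses = iter(["queued", "running", "ok"])
print(wait_until_done(lambda: next(statuses)))  # ok
```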

If a tool has already been run, and you want to get the results/outputs again:

.. code-block:: python

    outputs = my_tool.get_results()

If you have run a tool asynchronously, and at a later point, you want to wait for the tool, you can use the `wait_for_results` method:

.. code-block:: python

    my_tool.run(data_store=data_store, params=inputs, wait=False)

    # do some stuff

    my_tool.wait_for_results()
    # Any code after will be executed after tool has finished running

If you want to stop a tool from running, but keep any existing outputs from the Tool, use the `stop` method.

.. code-block:: python

    my_tool.run(data_store=data_store, params=inputs, wait=False)
    my_tool.stop()
    outputs = my_tool.get_results()

If you want to cancel a tool from running and throw away any output from it, use the `cancel` method:

.. code-block:: python

    my_tool.run(data_store=data_store, params=inputs, wait=False)
    my_tool.cancel()

You can get any current stdout and stderr from a Tool:

.. code-block:: python

    stdout = my_tool.get_stdout() # Get current stdout
    stderr = my_tool.get_stderr(position=10, length=300) # Gets 300 characters of stderr, starting from index 10.
    # The starting position and length can be specified for both stdout and stderr.

These methods work regardless of whether the job is running or has been completed.
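The position/length arguments make it possible to page through long output in fixed-size chunks; here is a pure-Python stand-in for that pattern (not the nova-galaxy API itself):

```python
# Sketch of paging through long tool output with position/length,
# mirroring the get_stdout/get_stderr arguments shown above.
def iter_chunks(text, length=300):
    # Yield successive (position, chunk) pairs until the text is exhausted.
    for position in range(0, len(text), length):
        yield position, text[position:position + length]

log = "x" * 650
chunks = list(iter_chunks(log, length=300))
print([(pos, len(chunk)) for pos, chunk in chunks])  # [(0, 300), (300, 300), (600, 50)]
```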

Advanced users may find they need to access the underlying job ID for a tool, which they can do with `get_uid`:

.. code-block:: python

    upstream_id = my_tool.get_uid() # Galaxy job ID

Tools can also be assigned to already running or completed jobs by using `assign_id`:

.. code-block:: python

    second_tool = Tool("tool_id")
    second_tool.assign_id(upstream_id)
    # second_tool now can access status, outputs, stdout, stderr, etc from first tool

Example 1: Uploading a Dataset and Running a Tool
--------------------------------------------------

This example demonstrates how to upload a dataset to Galaxy and run a tool using nova-galaxy.

.. code-block:: python

       # Get the content of the output dataset
       content = output_dataset.get_content()
       print(content)
       # Because data stores persist by default, this content will still be saved after the with block is exited.


Example 2: Manually managing a Connection
--------------------------------------------------
.. code-block:: python

    from nova.galaxy import Connection, Dataset, Tool, Parameters

    galaxy_url = "your_galaxy_url"
    galaxy_key = "your_galaxy_api_key"
    connection = Connection(galaxy_url, galaxy_key)

    # Open the connection
    conn = connection.connect()

    # Create a data store
    data_store = conn.create_data_store("Example Data Store")

    # Create a dataset from a local file
    my_dataset = Dataset("path/to/your/file.txt")

    # Define tool parameters
    params = Parameters()
    params.add_input("input", my_dataset)

    # Get the tool
    my_tool = Tool("some_tool_id") # Replace with the actual tool ID

    # Run the tool asynchronously
    my_tool.run(data_store, params, wait=False)

    # Get Tool Status
    print(my_tool.get_status())

    # Wait for tool to finish
    my_tool.wait_for_results()

    # Get the results from the tool
    results = my_tool.get_results()
    output_coll = results.get_collection("my_output_collection")

    # Download the output collection to a local path
    output_coll.download("/local/path/where/I/want/to/download/")

    # Mark the data store for cleanup (remove all files and outputs) when the connection is closed
    data_store.mark_for_cleanup()

    # Manually close connection
    conn.close()
    # Results have been removed from upstream since the data store was cleaned up.