Commit 9c1ef8cd authored by Cage, Gregory

Refactor docs to integrate tests

parent 07312fbf

Data Stores
===========

A `Datastore` or `Data Store` in nova-galaxy represents a Galaxy history. It serves as a container for organizing your data and tool outputs within Galaxy.

You will need to create your connection, as is standard, with the `galaxy_url` and `galaxy_key` values set.

.. literalinclude:: ../../tests/conftest.py
    :start-after: setup nova connection
    :end-before: setup nova connection complete
    :dedent:

.. literalinclude:: ../../tests/test_data_store.py
    :start-after: create new datastore
    :end-before: create new datastore complete
    :dedent:

By default, data stores are persisted, meaning that their jobs and outputs will be available to retrieve even after the connection is closed.
Data stores also keep their namespace after the application exits: if you name your data store "Data1", then creating a data store named "Data1" in a later session will give you access to the same underlying history and its contents.

In order to delete and clean up your data stores (i.e. delete all outputs/resources in the underlying Galaxy history), there are a few options.

First you can mark a data store for cleanup automatically when you close your nova connection.

.. literalinclude:: ../../tests/test_data_store.py
    :start-after: create new datastore
    :end-before: mark for cleanup complete
    :dedent:

When the 'with' block exits, the data store will be cleaned up. This will also work when the connection class is used without the 'with' syntax.

.. literalinclude:: ../../tests/test_data_store.py
    :start-after: manual connection start
    :end-before: manual connection complete
    :dedent:

When the `connection.close()` method is called, the data store will be cleaned up. You can also manually clean a data store by invoking its `cleanup()` method.

In order to use a data store again after it has been cleaned up, you will have to call `create_data_store` again; this returns an empty store, since the previous one was cleaned up. If at any point you want to persist a store that has been marked for cleanup, you can call the `persist()` method.
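
A minimal sketch of how `cleanup()` and `persist()` interact (assuming `connection` is a `Connection` configured with your Galaxy URL and key; the store name is a placeholder):

.. code-block:: python

    active_connection = connection.connect()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your first tool
    data_store.cleanup()
    data_store = active_connection.create_data_store("My Data Store")
    # Run your second tool
    data_store.persist()
    active_connection.close()
    # All data in the store from the second tool is persisted,
    # whereas the first tool's outputs are gone.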

Datasets and Dataset Collections
================================

nova-galaxy provides abstractions for handling individual files (`Dataset`) and collections of files (`DatasetCollection`) within Galaxy.

.. literalinclude:: ../../tests/test_dataset.py
    :start-after: create dataset
    :end-before: create dataset complete
    :dedent:

.. literalinclude:: ../../tests/test_dataset.py
    :start-after: create dataset collection
    :end-before: create dataset collection complete
    :dedent:

By default, Datasets will take their name from the filepath given, but they can be given unique names by passing a string to the `name` parameter of the constructor.

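
For example, a unique name can be set like this (a sketch; the path and name are placeholders):

.. code-block:: python

    from nova.galaxy import Dataset

    # Use an explicit name instead of the filename
    my_dataset = Dataset(path="path/to/file.txt", name="cool_dataset_name")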
Datasets can be marked as remote files if you don't want to upload them from your local machine. Remote files are files that your upstream Galaxy instance has access to.
For example, if your upstream Galaxy instance has access to a directory named `/SNS`, you can load a file from there as a dataset:

.. literalinclude:: ../../tests/test_dataset.py
    :start-after: create remote dataset
    :end-before: create remote dataset complete
    :dedent:

Datasets can be uploaded to a store by calling the `upload()` method.

.. literalinclude:: ../../tests/test_run_tool.py
    :start-after: existing dataset input
    :end-before: existing dataset input complete
    :dedent:

Note: when the `remote_file` flag is set to `True`, the files are not actually "uploaded". Instead, they are ingested into Galaxy as links to the actual files, so file size should not slow down the system.

When running tools, any Dataset that is used as an input parameter will be automatically uploaded/ingested, unless that dataset has already been uploaded.
In order to force the dataset to be uploaded when a tool runs, even if it has been uploaded before, pass a boolean value to the `force_upload` parameter of the constructor.

By default, `force_upload` is `True`.
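
A minimal sketch (the `/SNS` path is a placeholder for a file your upstream Galaxy instance can access):

.. code-block:: python

    from nova.galaxy import Dataset

    # Force this dataset to be re-uploaded whenever a tool runs with it
    my_dataset = Dataset(path="/SNS/path/to/file.txt", force_upload=True)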

If, instead of loading a file from disk or ingesting a remote file, you want to directly upload some text or another serializable Python value, you can set the dataset content directly:

.. literalinclude:: ../../tests/test_dataset.py
    :start-after: set dataset content
    :end-before: set dataset content complete
    :dedent:

The `file_type` argument is optional and will default to a text file.

In order to fetch the content of a dataset, you can either download the dataset to a path using `download()` or fetch the content and store it directly in memory using `get_content()` (be careful using this with large files).
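
For example (a sketch; assumes `my_dataset` has been uploaded to a data store, and the local path is a placeholder):

.. code-block:: python

    # Download the dataset to a local file
    my_dataset.download("/path/to/local/copy.txt")

    # Or fetch the content and hold it in memory
    dataset_content = my_dataset.get_content()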


DatasetCollections currently have less functionality than individual Datasets, as most collections will come from tool outputs.

Interactive Tools
=================

nova-galaxy allows running Galaxy tools in interactive mode, which is especially useful when tools generate URLs that need to be accessed during runtime.

.. literalinclude:: ../../tests/test_run_tool.py
    :start-after: run interactive tool
    :end-before: run interactive tool complete
    :dedent:

By default, interactive tools are not stopped automatically when the Nova connection is closed. To override this behavior, use the DataStore `mark_for_cleanup()` method; the tool will then stop automatically once the connection is closed (or the `with` block is exited). You can also manually stop these tools by using the Tool `stop_all_tools_in_store` method.

If you want to get the url of an interactive tool at a later point, you can use the `get_url` method:

.. literalinclude:: ../../tests/test_run_tool.py
    :start-after: interactive tool get link
    :end-before: interactive tool get link complete
    :dedent:
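
Putting this together, the interactive pattern looks roughly like this (a sketch; `tool_id` is a placeholder for a real Galaxy tool id, and `data_store` is assumed to exist as shown in the Data Stores section):

.. code-block:: python

    from nova.galaxy import Tool, Parameters

    # Define tool parameters
    params = Parameters()

    # Get a tool instance
    my_tool = Tool("tool_id")  # replace with your tool id from Galaxy

    # Run the tool in interactive mode
    url = my_tool.run_interactive(data_store, params)
    print(f"Interactive tool URL: {url}")

    # Fetch the URL again later
    url = my_tool.get_url()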

Outputs
=======

The `Outputs` class encapsulates the output datasets and collections generated by a tool run.

.. literalinclude:: ../../tests/test_run_tool.py
    :start-after: outputs example
    :end-before: outputs example complete
    :dedent:
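
For reference, fetching specific results looks roughly like this (a sketch; `outputs` is assumed to come from a tool run, and the output names are placeholders):

.. code-block:: python

    # Get a specific dataset by name
    specific_dataset = outputs.get_dataset("my_output_dataset")

    # Get a specific collection by name
    specific_collection = outputs.get_collection("my_output_collection")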

Parameters
==========

The `Parameters` class is used to define the input parameters for a Galaxy tool.

.. literalinclude:: ../../tests/test_run_tool.py
    :start-after: run interactive tool
    :end-before: run interactive tool complete
    :dedent:
You can remove an existing input with `remove_input()` or change its value with `change_input_value()`.
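
A minimal sketch of building parameters (names and values are placeholders):

.. code-block:: python

    from nova.galaxy import Parameters, Dataset

    # Create a dataset from a local file
    my_dataset = Dataset("path/to/my/file.txt")

    # Define tool parameters
    params = Parameters()
    params.add_input("input_file", my_dataset)
    params.add_input("param_name", "param_value")

    # Change an existing input value
    params.change_input_value("param_name", "new_value")

    # Remove an input
    params.remove_input("param_name")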