> For geospatial data processing, Modules A and P are available to run as standalone modules. Module O currently requires Module P processing first, and therefore cannot be standalone. See the multi-module configuration options to incorporate Module O processing.
## Understanding the Entire Config
Single module configurations are relatively straightforward, as there are no variables to align between multiple modules. Instead, users simply need to modify four main parts of the config based on their specific processing needs.
# Configuration for: Atmospheric Correction, Pansharpening, and Orthorectification
## Breaking Down the Configuration
- Specify the input directory or series of directories:
- All standalone modules will use the `input` value to parse through imagery in parallel.
- In `module-a-config.json`, there is a single directory listed. PIPE will parse through this main directory and concurrently process all available subdirectories.
- Use the `input` value to parse through imagery in parallel.
- In the associated `config.json`, there is a single directory listed. Xylem will parse through this main directory and concurrently process all available subdirectories.
```json
"input":
"/path/to/input/directory"
```
- In `module-p-config.json`, there are multiple directories listed. Note that when listing multiple directories, they are listed within `[ ]`, signaling to PIPE that these specific directories should be concurrently processed.
- Multiple directories may also be specified. Note that when listing multiple directories, they are listed within `[ ]`, signaling to Xylem that these specific directories should be concurrently processed.
```json
"input":[
"/path/to/input/order1",
"/path/to/input/order2",
"/path/to/input/order3",
"/path/to/input/order4"
]
```
- Modules A and P can both use either of these `input` configurations; they are not limited to strictly following the examples shown here. These input variables are only different between example `config.json` files to showcase the variation in `input` options.
- Define the module:
- Within the module variable, specify the `name` and `uri` to the module. The `uri` specifically informs PIPE from where to install the module environment. For example, if a user has PIPE installed and has cloned each of the modules in their `dev` directory within a Docker container, this path may look something like:
- Define each module:
- Within the module variable, specify the `name` and `uri` to the module. The `uri` specifically informs Xylem from where to install the module environment. For example, if a user has Xylem installed and has cloned each of the modules in their `dev` directory within a Docker container, this path may look something like:
```json
"uri":"file:///root/dev/module-a"
```
- Define the module variables:
- Variables are unique to each module (details below).
- For standalone modules specifically, pay close attention to the `input`: users do **not** need to specify `input` as a variable within the module. Rather, the `input` is noted in the `template`.
- Speaking of the `template`:
- Define each module's variables:
- Variables are unique to each module (detailed below).
- Simply follow the `template`:
- This section is built from the module's `Makefile`. Here, users can specify arguments and associated variables. Below is an example template for Module A:
```json
"template":{
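"command":"python",
"environment":{
"name":"module-a",
"manager":"conda"
},
"arguments":[
"-m",
"lib",
"--input_directory",
"{{ INPUT }}",
"--output_directory",
"{{ OUTPUT_DIRECTORY }}",
"--method",
"{{ METHOD }}",
"--profile",
"{{ PROFILE }}"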
]
}
```
- **NOTE**:
- For standalone modules the `input_directory` for Module A and `source_directory` for Module P will **always** be `INPUT` in the template.
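The two accepted shapes of `input` (a single path or a bracketed list of paths) can be treated uniformly once loaded. The sketch below is only an illustration under stated assumptions: `normalize_input` is a hypothetical helper, not part of PIPE or Xylem, and it says nothing about how either tool actually discovers subdirectories or schedules parallel work.

```python
from typing import List, Union


def normalize_input(value: Union[str, List[str]]) -> List[str]:
    """Return the `input` value as a list of directory paths.

    Hypothetical helper: a single string becomes a one-element list,
    and a list is copied unchanged.
    """
    if isinstance(value, str):
        return [value]
    return list(value)


# Single directory, as in the first example above.
single = normalize_input("/path/to/input/directory")

# Multiple directories, as in the bracketed example above.
multiple = normalize_input(["/path/to/input/order1", "/path/to/input/order2"])

print(single)
print(multiple)
```

Normalizing early means any later step can iterate over directories without checking which form the config used.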
## Module Variables
While Module O is not an option as a standalone module, the variable definitions are included here as well for continuity. For more thorough discussions on variables and structure for individual modules, please browse the README for each:
- `input_directory`: the directory containing the raw, Level 1B Maxar imagery. Module A currently expects a Maxar data directory structure. Because Module A is either run as a standalone module or always the first in a multi-module sequence, this argument will always be `INPUT` in the template.
- `input_directory`: the directory containing the raw, Level 1B Maxar imagery. Module A currently expects a Maxar data directory structure. This argument will always be `INPUT` in the template.
- `output_directory`: the directory in which to store the output of Module A processing.
- `method`: specification for users to define if they want the output to contain top-of-atmosphere reflectance (`toa_reflectance`) or bottom-of-atmosphere reflectance (`boa_reflectance`). For the best representation of true surface reflectance (e.g., the removal of the blue effects of the atmosphere), users should select the latter.
- `profile`: specification for users to define the aerosol profile selected in the Py6S model. Module A is currently optimized for `urban` applications, so consider using `maritime` sparingly.
- Future work for Module A will incorporate the maritime-optimized atmospheric correction efforts led by Matt McCarthy.
### Module P: ***P***ansharpening
- `source_directory`: the directory containing the raw, Level 1B Maxar imagery (or the output of Module A for multi-module implementations). When Module P is run as a standalone module, this argument will be `INPUT` in the template. For multi-module configurations, this argument should match the `output_directory` of Module A.
- `output_directory`: the directory in which to store the output of Module P processing. For multi-module configurations, this argument should match both the `source_directory` for Module P and the `output_directory` of Module A.
- `source_directory`: the directory containing the output of Module A. This argument should match the `output_directory` of Module A.
- `output_directory`: the directory in which to store the output of Module P processing. This argument should match both the `source_directory` for Module P and the `output_directory` of Module A.
- `method`: specification for users to define a pansharpening method. Currently, only `nn_diffuse` is supported.
- `module_list`: specification for users to identify all modules used in the given configuration. The presence, or absence, of the string `MODA` in this argument determines how the input directories are processed, as Module P can be used standalone (where input is a Maxar directory) or as a multi-module configuration (where input is the output of Module A). Options for this argument are: `MODP`, `MODP, MODO`, `MODA, MODP`, `MODA, MODP, MODO`.
- `module_list`: specification for users to identify all modules used in the given configuration. The presence, or absence, of the string `MODA` in this argument determines how the input directories are processed. In this example `config.json`, users would specify Modules A, P, and O. Options for this argument are: `MODP`, `MODP, MODO`, `MODA, MODP`, `MODA, MODP, MODO`.
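As a rough illustration of the `module_list` rule described above, the sketch below checks a value against the four listed options and reports whether `MODA` is present. The function names are hypothetical and the parsing (splitting on commas) is an assumption; Module P's own handling of this argument may differ.

```python
# The four options listed for `module_list`, as ordered tuples.
VALID_MODULE_LISTS = {
    ("MODP",),
    ("MODP", "MODO"),
    ("MODA", "MODP"),
    ("MODA", "MODP", "MODO"),
}


def parse_module_list(module_list: str) -> tuple:
    """Split a comma-separated `module_list` and reject unlisted combinations."""
    modules = tuple(part.strip() for part in module_list.split(","))
    if modules not in VALID_MODULE_LISTS:
        raise ValueError(f"Unsupported module_list: {module_list!r}")
    return modules


def expects_module_a_output(module_list: str) -> bool:
    """True when `MODA` is listed, i.e. the input is the output of Module A."""
    return "MODA" in parse_module_list(module_list)


print(expects_module_a_output("MODA, MODP, MODO"))
print(expects_module_a_output("MODP"))
```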
### Module O: ***O***rthorectification
- `source_directory`: the directory containing the output of Module P. This argument should match the `output_directory` of Module P, as Module O is only supported in multi-module configurations.
- `output_directory`: the directory in which to store the output of Module O processing. This argument should match both the `source_directory` for Module O and the `output_directory` of Module P, as Module O is only supported in multi-module configurations.
- `source_directory`: the directory containing the output of Module P. This argument should match the `output_directory` of Module P.
- `output_directory`: the directory in which to store the output of Module O processing. This argument should match both the `source_directory` for Module O and the `output_directory` of Module P.
## Running the Module
- To run the standalone module independently of any workflow:
```bash
make run
```
- To run the module with Xylem, at maximum verbosity, using the `config.json`:
```bash
xylem run -vvvv
```
## General Notes for a Multi-Module Configuration
- The general structure of the configuration remains the same:
- Specify the input directory or series of directories:
- There will still be the initial `input` value to include outside of the module definitions. This is generally going to be the directory (or specific subdirectories) of raw, level 1B imagery.
- Define the modules:
- Just as with a standalone configuration, users will need to identify all of the modules being used for processing, in the correct order and correct paths.
- Define each module's variables:
- All variables are defined as above. However, the only caveat with the multi-module implementation is aligning the input/source and output directories between modules.
- **The output of a given module will always be the same path as the input to the following module.**
- **All non-primary modules will have the same path for their source and output directories.**
- Follow the template:
- Just as with a standalone configuration, users can specify module arguments within each module's template.
- The multi-module configuration will provide outputs at each stage of processing, which are used as input to subsequent stages. This design is meant to support easier troubleshooting between modules, if needed.
- As imagery is processed, output files will be appended with the name of the processing module. Therefore, imagery processed with Modules A, P, and O will have filenames ending in `MODA_MODP_MODO`.
- Module P also updates the middle of the filenames from `P1BS` for panchromatic imagery and `M1BS` for multispectral imagery to `S3XS`, which is a Maxar naming standard and a naming convention upheld throughout the use of Legacy PIPE.
- For multi-module processing, combinations can **only** be made in these two orders:
> Module A -> Module P -> Module O
> Module P -> Module O
- Atmospheric correction is **always** the first processing step if it is included, and imagery **must** be pansharpened before it is orthorectified.
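The naming behavior in these notes can be sketched as follows. This is a guess at the mechanics for illustration only: the underscore separator, the `output_stem` helper, and the example filename stem are all assumptions, and the real modules may build names differently.

```python
def output_stem(stem: str, modules: list) -> str:
    """Sketch of how a filename stem might evolve across modules.

    Assumptions (not confirmed by the modules themselves): each module
    appends its tag after an underscore, and Module P swaps the
    `P1BS`/`M1BS` token for `S3XS` before appending its tag.
    """
    for module in modules:
        if module == "MODP":
            stem = stem.replace("P1BS", "S3XS").replace("M1BS", "S3XS")
        stem = f"{stem}_{module}"
    return stem


# Hypothetical stem; real Maxar filenames carry more fields.
print(output_stem("example_M1BS_scene", ["MODA", "MODP", "MODO"]))
# -> example_S3XS_scene_MODA_MODP_MODO
```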
- Specify the input directory or series of directories:
- Use the `input` value to parse through imagery in parallel.
- In the associated `config.json`, there is a single directory listed. Xylem will parse through this main directory and concurrently process all available subdirectories.
```json
"input":
"/path/to/input/directory"
```
- Multiple directories may also be specified. Note that when listing multiple directories, they are listed within `[ ]`, signaling to Xylem these specific directories should be concurrently processed.
```json
"input":[
"/path/to/input/order1",
"/path/to/input/order2",
"/path/to/input/order3",
"/path/to/input/order4"
]
```
- Define the module:
- Within the module variable, specify the `name` and `uri` to the module. The `uri` specifically informs Xylem from where to install the module environment. For example, if a user has Xylem installed and has cloned each of the modules in their `dev` directory within a Docker container, this path may look something like:
```json
"uri":"file:///root/dev/module-a"
```
- Define the module variables:
- Variables are unique to each module (detailed below).
- Simply follow the `template`:
- This section is built from the module's `Makefile`. Here, users can specify arguments and associated variables. Below is an example template for Module A:
```json
"template":{
"command":"python",
"environment":{
"name":"module-a",
"manager":"conda"
},
"arguments":[
"-m",
"lib",
"--input_directory",
"{{ INPUT }}",
"--output_directory",
"{{ OUTPUT_DIRECTORY }}",
"--method",
"{{ METHOD }}",
"--profile",
"{{ PROFILE }}"
]
}
```
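To make the role of the `{{ ... }}` placeholders concrete, here is a minimal sketch of how such a template's `arguments` could be filled in from a set of variables. It is not Xylem's implementation: `render_arguments` is a hypothetical helper, and the variable values shown are placeholders chosen for the example.

```python
import re

# Matches placeholders such as "{{ INPUT }}" and captures the name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_arguments(arguments, variables):
    """Replace each {{ NAME }} placeholder with variables["NAME"].

    Raises KeyError if a placeholder has no matching variable.
    """
    def substitute(match):
        return variables[match.group(1)]

    return [PLACEHOLDER.sub(substitute, argument) for argument in arguments]


arguments = [
    "-m", "lib",
    "--input_directory", "{{ INPUT }}",
    "--output_directory", "{{ OUTPUT_DIRECTORY }}",
    "--method", "{{ METHOD }}",
    "--profile", "{{ PROFILE }}",
]

# Example values only; the output path is a placeholder.
variables = {
    "INPUT": "/path/to/input/directory",
    "OUTPUT_DIRECTORY": "/path/to/output/directory",
    "METHOD": "boa_reflectance",
    "PROFILE": "urban",
}

rendered = render_arguments(arguments, variables)
print(" ".join(["python"] + rendered))
```

Failing loudly on a missing variable is a deliberate choice in this sketch, since a silently empty argument would be harder to diagnose.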
## Module Variables
For more thorough discussions on variables and overall structure, along with a link to technical documentation, please browse the README:
- `input_directory`: the directory containing the raw, Level 1B Maxar imagery. Module A currently expects a Maxar data directory structure. Because Module A is either run as a standalone module or always the first in a multi-module sequence, this argument will always be `INPUT` in the template.
- `output_directory`: the directory in which to store the output of Module A processing.
- `method`: specification for users to define if they want the output to contain top-of-atmosphere reflectance (`toa_reflectance`) or bottom-of-atmosphere reflectance (`boa_reflectance`). For the best representation of true surface reflectance (e.g., the removal of the blue effects of the atmosphere), users should select the latter.
- `profile`: specification for users to define the aerosol profile selected in the Py6S model. Module A is currently optimized for `urban` applications, so consider using `maritime` sparingly.
- Future work for Module A will incorporate the maritime-optimized atmospheric correction efforts led by Matt McCarthy.
## Running the Module
- To run the standalone module independently of any workflow:
```bash
make run
```
- To run the module with Xylem, at maximum verbosity, using the `config.json`:
> For geospatial data processing, there are two routes to take for multi-module processing: either with or without including Module A. If included, Module A will always be listed first in the multi-module workflow, followed by Module P and then finally Module O.
## Understanding the Entire Config
Multi-module configurations follow the same overall structure as single module configurations, simply with more than one module listed. Please reference the `README-single-modules.md` before attempting a multi-module approach.
The general structure of the configuration remains the same:
- Specify the input directory or series of directories:
- There will still be the initial `input` value to include outside of the module definitions. This is generally going to be the directory (or specific subdirectories) of raw, level 1B imagery.
- Define the modules:
- Just as with a standalone configuration, users will need to identify all of the modules being used for processing, in the correct order and correct paths.
- Define each module's variables:
- All variables are still defined as previously mentioned in the `README-single-modules.md`. However, the only caveat with the multi-module implementation is aligning the input/source and output directories between modules.
- **The output of a given module will always be the same path as the input to the following module.**
- **All non-primary modules will have the same path for their source and output directories.**
- Follow the template:
- Just as with a standalone configuration, users can specify module arguments within each module's template.
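The two alignment rules above lend themselves to a quick pre-flight check. The sketch below assumes the relevant values have already been pulled out of the config into a simple ordered list; the `name`, `source`, and `output` keys are hypothetical and do not reflect the actual `config.json` schema.

```python
def check_directory_alignment(modules):
    """Return a list of alignment problems for an ordered module sequence.

    Each entry is a dict with hypothetical keys `name`, `source`, `output`.
    Rule 1: a module's source equals the previous module's output.
    Rule 2: every non-primary module uses the same path for source and output.
    """
    problems = []
    for index in range(1, len(modules)):
        previous, current = modules[index - 1], modules[index]
        if current["source"] != previous["output"]:
            problems.append(
                f"{current['name']}: source does not match output of {previous['name']}"
            )
        if current["source"] != current["output"]:
            problems.append(f"{current['name']}: source and output differ")
    return problems


sequence = [
    {"name": "module-a", "source": "INPUT", "output": "/path/to/output"},
    {"name": "module-p", "source": "/path/to/output", "output": "/path/to/output"},
    {"name": "module-o", "source": "/path/to/output", "output": "/path/to/output"},
]

print(check_directory_alignment(sequence))
```

An empty list means the sequence satisfies both rules; any returned message names the module whose paths need attention.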
## General Comments
- The multi-module configuration will provide outputs at each stage of processing, which are used as input to subsequent stages. This design is meant to support easier troubleshooting between modules, if needed.
- As imagery is processed, output files will be appended with the name of the processing module. Therefore, imagery processed with Modules A, P, and O will have filenames ending in `MODA_MODP_MODO`.
- Module P also updates the middle of the filenames from `P1BS` for panchromatic imagery and `M1BS` for multispectral imagery to `S3XS`, which is a Maxar naming standard and a naming convention upheld throughout the use of Legacy PIPE.
- For multi-module processing, combinations can **only** be made in these two orders:
> Module A -> Module P -> Module O
> Module P -> Module O
- Atmospheric correction is **always** the first processing step if it is included, and imagery **must** be pansharpened before it is orthorectified.
- For more thorough discussions on variables and structure for individual modules, please browse the README for each: