@@ -14,11 +14,13 @@ Note: Spark 1.6.x + Hadoop 2.6.x (stand-alone) must be installed and the SPARK_H
To configure a Spark job, run spark_setup.py with the following parameters (an example invocation is shown after the list):
-s <file> : The PBS script file to generate
-a <account> : Name of account to charge
-n <num> : Number of nodes*
-w <time> : Maximum walltime
-d <path> : Spark deployment directory
*Note: the number of nodes must be 2 or greater (tasks are not run on the master node)
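For example, a 4-node job might be configured as follows. This is only a sketch: the script name, account, walltime, and deployment path are placeholder values, and the script is assumed to be invoked with the system Python.

    # Placeholder values -- substitute your own PBS script name, account,
    # walltime, and scratch path.
    python spark_setup.py -s spark_job.pbs \
                          -a MY_ACCOUNT \
                          -n 4 \
                          -w 02:00:00 \
                          -d /scratch/$USER/spark_deploy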
The deployment directory must be unique for each Spark batch job being executed and should be located in a scratch space (Spark uses this directory to write temporary files). After running spark_setup.py, the specified deployment directory will be created (or re-initialized if it already exists) and template configuration files/scripts will be copied into the "templates" subdirectory under the deployment directory. If needed, these template files may be modified before the Spark job is submitted. When the job is submitted, the template files are copied into per-node configuration directories and used by Spark to configure the worker nodes.
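A sketch of the overall workflow, assuming the PBS script was generated as spark_job.pbs and the deployment directory is /scratch/$USER/spark_deploy (the template file name shown is illustrative; check the templates subdirectory for the actual files):

    # Inspect the generated templates (actual file names depend on the tool).
    ls /scratch/$USER/spark_deploy/templates

    # Optionally adjust a template before submission, e.g. worker environment
    # settings (spark-env.sh is illustrative; edit whichever templates exist).
    vi /scratch/$USER/spark_deploy/templates/spark-env.sh

    # Submit the generated PBS script; at job start the templates are copied
    # into per-node configuration directories and used to configure the workers.
    qsub spark_job.pbs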