Commit 1ee9b89d authored by Belviranli, Mehmet E's avatar Belviranli, Mehmet E
This repository includes the source code for Juggler host and device
runtimes and the sources for the applications used in the evaluation.
The scripts to compile the source code, generate inputs, execute
binaries, validate results and parse the outputs are also included in
the repo and explained below in detail.
- **Program:** CUDA 8.0 APIs.
- **Compilation:** NVIDIA `nvcc` version 8.0.
- **Binary:** CUDA host (x86-64) and device executables. A Linux binary
(CentOS 7 recommended) is included. Source code and scripts to
re-generate the binaries are also included.
- **Data set:** Dynamically generated data, prior to the execution.
- **Run-time environment:** CUDA 8.0 APIs and drivers, which are
included with the CUDA 8.0 toolkit distribution.
- **Hardware:** NVIDIA Tesla P100 (12GB, PCI-e v3), and Intel CPU
(Intel Xeon CPU E5-2683 as tested).
- **Output:** Verification results and detailed timings such as
execution times and runtime overhead.
- **Experiment workflow:** Linux `bash` scripts.
- **Publicly available?:** Yes
### How software can be obtained (if available)
The source code for the Juggler host & device runtimes and the evaluated
applications (both baseline and Juggler-integrated versions) can be
accessed via this repository.
### Hardware dependencies
We performed our experiments on an NVIDIA P100 GPU. The supplied
makefile works only for the Pascal architecture or later.
### Software dependencies
The CUDA 8.0 toolkit is required for compilation and profiling. At the
time of CUDA driver installation, GCC 5.2 was installed on the system.
To run the scripts, `bash` and `egrep` are sufficient.
### Datasets
Due to their large sizes, each application dynamically generates and
populates its input dataset as part of its initialization stage.
Clone Juggler runtime and application suite from the git repo:
``` {.bash language="bash"}
$ git clone
$ JUGGLER_HOME=$(pwd)/juggler
```
### Experiment workflow
To repeat the experiments presented in the evaluation section, we have
created a script named `exp`. It is located under `$JUGGLER_HOME/build`
and the parameters to the script are as follows:
``` {.bash language="bash"}
exp {scriptMode} {outFilePrefix} {runGB?} {runJUG?} {nRuns} {nProfRuns}
```
To repeat the main set of experiments as presented in the paper, run:
``` {.bash language="bash"}
$ cd $JUGGLER_HOME/build
$ bash exp 0 output 1 1 5 1
```
This script compiles (i.e. `scriptMode=0`) each application for each
scheduling policy; then runs each of them five (i.e. `nRuns=5`) times
with global barriers (i.e. `runGB?=1`) and five more times with the
Juggler-integrated versions (i.e. `runJUG?=1`). It also performs an
additional run with profiling enabled (i.e. `nProfRuns=1`). The program
output for each application is written into separate files prefixed by
the given `outFilePrefix`.
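Conceptually, the six positional arguments map onto the parameter names listed above. The following is a minimal bash sketch of that mapping, using the values from the command above; it is hypothetical, and the real `exp` script's internals may differ:

```shell
# Hypothetical sketch of exp's positional-argument handling
# (variable names taken from the parameter list above).
set -- 0 output 1 1 5 1
scriptMode=$1; outFilePrefix=$2; runGB=$3; runJUG=$4
nRuns=$5; nProfRuns=$6
echo "mode=$scriptMode prefix=$outFilePrefix runs=$nRuns"
```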
### Evaluation and expected result
The execution of the `exp` script with the parameters above will produce
a series of output files named `output.$APPNAME` and
`output.PROF.$APPNAME`, one pair per application. The `exp` script can
also be used to parse (i.e. `scriptMode=1`) these output files and
combine the values from all runs in a *tab-separated value* format.
1. **To list the kernel execution times** (i.e. the values used to draw
Figure 4) for all 7 applications, in separate columns:
``` {.bash language="bash"}
$ bash exp 1 output execTime 2 formattedResults1.csv
```
Parsed values will be written into `formattedResults1.csv` in a
tabular format. There will be 7 columns in total, one for each
application. The number of rows in the csv file will be equal to
$nRuns \times 4$ (i.e. 20, when the main experiment is run with 5
runs). The first 15 rows will be for $LRR$, $GRR$ and $LF$,
respectively, in groups of five. The last 5 rows will be for global
barriers.
2. **To see verification results** against serial execution:
``` {.bash language="bash"}
$ bash exp 1 output check 2 formattedResults2.csv
```
The parsed output will be written into `formattedResults2.csv` file,
and the values will be either SUCCESS or FAIL.
3. **To parse task load deviations** (i.e. Figure 5), run:
``` {.bash language="bash"}
$ bash exp 1 output avgTaskLoad 2 formattedResults3.csv
```
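Under the hood, this style of parsing amounts to selecting output lines by a key string and printing one whitespace-separated column. The following is a minimal, self-contained sketch on a fabricated sample file; it is not the actual `exp` implementation, and the sample values are invented:

```shell
# Fabricated sample output file; real output.$APPNAME files differ.
printf 'execTime 12.5\ncheck SUCCESS\nexecTime 13.1\n' > sample.out

# Select lines matching the key and print the requested column,
# conceptually mimicking "bash exp 1 <prefix> <key> <column> <file>".
key="execTime"; col=2
egrep "$key" sample.out | awk -v c="$col" '{ print $c }' > parsed.tsv
cat parsed.tsv
```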
### Experiment customization
**Number of runs** for each type of run in the main experiment can be
modified by changing the input parameters of `exp` script, as explained
in A.4.
Additionally, if only a specific subset of the applications is to be
tested, the values in the bash array named `$APPS` in the `exp` script
can be modified.
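For example, restricting the suite to two entries might look like the sketch below; the names are placeholders, not the actual application names used in `exp`:

```shell
# Placeholder subset; edit the real $APPS array inside exp with the
# suite's actual application names.
APPS=("appA" "appB")
for app in "${APPS[@]}"; do
  echo "selected: $app"
done
```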
**More information** can be parsed from the output files by providing
the *key string* and *column number* to the value parser (i.e. *\"exp
1\"*). A few examples:
1. Cache miss data: Keys include `write_sector_misses`,
`write_sector_queries`, and `read_sector_misses`:
``` {.bash language="bash"}
$ bash exp 1 output.PROF write_sector_misses 7 formatted.tsv
```
2. Host runtime and inspection loop overhead breakdown:
``` {.bash language="bash"}
$ bash exp 1 output initAppContext_H 2 formatted.tsv
$ bash exp 1 output initAppContext_D 2 formatted.tsv
$ bash exp 1 output initRtContext_D 2 formatted.tsv
$ bash exp 1 output inspectionLoop 2 formatted.tsv
$ bash exp 1 output buildCSR 2 formatted.tsv
```
3. Total application runtime, including user data initialization and
other stages:
``` {.bash language="bash"}
$ bash exp 1 output totalTime 2 formatted.tsv
```
**Compiling & running a single application:** In our test suite,
applications are distinguished with compiler directives, to optimize
resource usage for those that share common kernels. Similarly, the
Juggler runtime requires re-compilation if internal run-time
parameters (e.g. the scheduling policy) are changed.
1. To recompile Juggler for the desired application and scheduling
policy:
``` {.bash language="bash"}
$ cd $JUGGLER_HOME/build
```
2. To run the compiled application with Juggler runtime:
``` {.bash language="bash"}
$ cd $JUGGLER_HOME/build
$ ./OMP_CUDART -n {matrix_size} -b {block_size} -d {1|2} [-c]
```
The `-d` parameter indicates the run mode: 1 for Juggler, and 2 for
global barriers. The optional `-c` parameter enables verification
against the serial version and compares the two outputs; `-c` is
enabled by default.