Update README and scripts. (808dadc1) · Commits · Lee, Seyong / OpenARCExamples

README.md

+99 −11

Original line number	Diff line number	Diff line
		@@ -76,7 +76,7 @@ EXPERIMENTS
		$ cd ${openarcexamples}/matmul
		//Run the OpenARC compiler.
		$ ./O2GBuild.script
		//Check generated output files.
		//Check the generated output files.
		$ vi ./cetus_output/openarc_kernel.cu
		$ vi ./cetus_output/matmul.cpp
		//Compile the generated output files.
		@@ -84,52 +84,140 @@ EXPERIMENTS
		//Submit a job to run the output program.
		$ cd bin; sbatch gpu.sh

		- Task2: learn how to use an OpenARC commandline option: showInternalAnnotations
		* Experiment2: learn how to use OpenARC commandline options.
		- Task1: learn how to use an OpenARC commandline option: showInternalAnnotations
		$ cd ${openarcexamples}/matmul
		//Open the OpenARC configuration file and set showInternalAnnotations to 1 (e.g., showInternalAnnotations=1)
		$ vi ./matmul/openarcConf.txt
		//Run O2GBuild.script again.
		$ ./O2GBuild.script
		//Check generated output files
		//Check the generated output files
		$ vi ./cetus_output/openarc_kernel.cu
		$ vi ./cetus_output/matmul.cpp
		//Repeat the above steps by changing showInternalAnnotations up to 3.
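The edit-and-rebuild loop in this task can also be scripted instead of done by hand in vi. The following is a minimal sketch, not part of OpenARC: the helper name set_openarc_option is hypothetical, and it assumes the configuration file holds one name=value entry per line, with a leading '#' marking a disabled entry. It works on a scratch file so that no real configuration is modified.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenARC): set NAME=VALUE in a config file.
# Assumes one "name=value" entry per line; a leading '#' marks a disabled entry.
set_openarc_option() {
    conf=$1; name=$2; value=$3
    if grep -q "^#\{0,1\}${name}=" "$conf"; then
        # Rewrite the existing (possibly commented-out) entry in place.
        sed "s/^#\{0,1\}${name}=.*/${name}=${value}/" "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        # No entry yet: append one.
        echo "${name}=${value}" >> "$conf"
    fi
}

# Demonstration on a scratch file with made-up contents.
demo_conf=$(mktemp)
printf '#targetArch=0\nshowInternalAnnotations=0\n' > "$demo_conf"
for level in 1 2 3; do
    set_openarc_option "$demo_conf" showInternalAnnotations "$level"
    # In the real workflow, ./O2GBuild.script would be rerun here
    # and the files under ./cetus_output inspected.
done
final_value=$(sed -n 's/^showInternalAnnotations=//p' "$demo_conf")
echo "showInternalAnnotations is now $final_value"
rm -f "$demo_conf"
```

The same helper would apply to the other options in this experiment (for example defaultNumWorkers or maxNumGangs), provided they follow the same name=value layout.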

		- Task3: learn how to use commandline options for caching optimizations: shrdArryCachingOnTM
		- Task2: learn how to use commandline options for caching optimizations: shrdArryCachingOnTM
		$ cd ${openarcexamples}/matmul
		//Open the OpenARC configuration file and set shrdArryCachingOnTM to 1 (e.g., shrdArryCachingOnTM=1)
		$ vi ./matmul/openarcConf.txt
		//Run O2GBuild.script again.
		$ ./O2GBuild.script
		//Check generated output files
		//Check the generated output files
		$ vi ./cetus_output/openarc_kernel.cu
		$ vi ./cetus_output/matmul.cpp

		- Task4: learn how to use a commandline option needed for manual profiling: forceSyncKernelCall
		- Task3: learn how to use a commandline option needed for manual profiling: forceSyncKernelCall
		$ cd ${openarcexamples}/matmul
		//Open the OpenARC configuration file and set forceSyncKernelCall to 1 (e.g., forceSyncKernelCall=1)
		$ vi ./matmul/openarcConf.txt
		//Run O2GBuild.script again.
		$ ./O2GBuild.script
		//Check generated output files
		//Check the generated output files
		$ vi ./cetus_output/matmul.cpp

		- Task5: learn how to use a commandline option to set default number of workers: defaultNumWorkers
		- Task4: learn how to use a commandline option to set default number of workers: defaultNumWorkers
		$ cd ${openarcexamples}/matmul
		//Open the OpenARC configuration file and set defaultNumWorkers to 128 (e.g., defaultNumWorkers=128)
		$ vi ./matmul/openarcConf.txt
		//Run O2GBuild.script again.
		$ ./O2GBuild.script
		//Check generated output files
		//Check the generated output files
		$ vi ./cetus_output/openarc_kernel.cu
		$ vi ./cetus_output/matmul.cpp

		- Task6: learn how to use a commandline option to set the maximum number of gangs: maxNumGangs
		- Task5: learn how to use a commandline option to set the maximum number of gangs: maxNumGangs
		$ cd ${openarcexamples}/matmul
		//Open the OpenARC configuration file and set maxNumGangs to 32 (e.g., maxNumGangs=32)
		$ vi ./matmul/openarcConf.txt
		//Run O2GBuild.script again.
		$ ./O2GBuild.script
		//Check generated output files
		//Check the generated output files
		$ vi ./cetus_output/openarc_kernel.cu
		$ vi ./cetus_output/matmul.cpp

		- Task6: learn how to use a commandline option (targetArch) to generate output programs for non-CUDA targets (e.g., AMD GPU).
		$ cd ${openarcexamples}/matmul
		//Open the OpenARC configuration file and set targetArch to 1 (e.g., targetArch=1)
		$ vi ./matmul/openarcConf.txt
		//Run O2GBuild.script again.
		$ ./O2GBuild.script
		//Check the generated output files
		$ vi ./cetus_output/openarc_kernel.cu
		$ vi ./cetus_output/matmul.cpp
		//To compile and run the generated output programs, environment variable OPENARC_ARCH should also be set accordingly (e.g., OPENARC_ARCH=1)
		//If the commandline option targetArch is not used, environment variable OPENARC_ARCH will be used to decide the target architecture.
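The fallback described in the last two comments can be pictured with a small sketch. It is an illustration only, not how OpenARC itself resolves the target: the function name effective_target_arch is hypothetical, and the sketch assumes that an uncommented targetArch entry in the configuration file takes priority over OPENARC_ARCH. What happens when neither is set is not stated above, so the sketch simply reports "unspecified".

```shell
#!/bin/sh
# Hypothetical helper: report which target architecture value would apply,
# assuming an uncommented targetArch entry wins over OPENARC_ARCH.
effective_target_arch() {
    conf=$1
    # Pick up the last uncommented "targetArch=value" entry, if any.
    from_conf=$(sed -n 's/^targetArch=\(.*\)$/\1/p' "$conf" | tail -n 1)
    if [ -n "$from_conf" ]; then
        echo "$from_conf"
    else
        echo "${OPENARC_ARCH:-unspecified}"
    fi
}

# Demonstration on a scratch file with made-up contents.
demo_conf=$(mktemp)
printf '#targetArch=0\n' > "$demo_conf"
OPENARC_ARCH=1
export OPENARC_ARCH
from_env=$(effective_target_arch "$demo_conf")    # option disabled: environment decides
printf 'targetArch=0\n' >> "$demo_conf"
from_option=$(effective_target_arch "$demo_conf") # option present: configuration decides
echo "environment only: $from_env, with targetArch set: $from_option"
rm -f "$demo_conf"
```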

		* Experiment3: learn how to use CUDA Unified Memory with OpenARC
		- Task1: learn how to use CUDA Unified Memory with OpenARC.
		- To use Unified Memory, 1) host data should be allocated using OpenARC's Unified Memory APIs such as acc_create_unified(), and
		2) environment variable OPENARCRT_UNIFIEDMEM should be set to 1 (e.g., OPENARCRT_UNIFIEDMEM=1).
		- To create an OpenACC program that works both with and without Unified Memory, write the OpenACC program as if targeting devices without Unified Memory, and use OpenARC's Unified Memory APIs for host memory allocation.
		- If OPENARCRT_UNIFIEDMEM=0, OpenARC's Unified Memory APIs behave like the corresponding OpenACC data management runtime APIs (acc_create_unified() ==> acc_create()).
		- If OPENARCRT_UNIFIEDMEM=1, OpenACC data clauses are ignored for the data allocated on Unified Memory.
		- Using the OpenARC's Unified Memory APIs allows users to apply CUDA

		- Task2: compile and run an example program using OpenARC's Unified Memory APIs.
		$ cd ${openarcexamples}/unifiedmemory
		//Run O2GBuild.script.
		$ ./O2GBuild.script
		//Check the generated output files
		$ vi ./cetus_output/jacobi.cpp
		//Compile the generated output files.
		$ make
		//Submit a job to run the output program.
		$ cd bin; sbatch gpu.sh
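Because OPENARCRT_UNIFIEDMEM is read at run time, the same compiled program can be run once with each setting to compare the two modes. The sketch below shows one way to scope the variable to a single run. The helper name run_with_unified_mem is hypothetical, and a small sh -c command stands in for the generated binary, whose name and launch method are not given here.

```shell
#!/bin/sh
# Hypothetical helper: run a command with OPENARCRT_UNIFIEDMEM set
# for that process only, leaving the calling shell's environment untouched.
run_with_unified_mem() {
    value=$1
    shift
    env OPENARCRT_UNIFIEDMEM="$value" "$@"
}

# Stand-in for the compiled example; a real run would launch the
# generated program instead of this echo.
for um in 0 1; do
    run_with_unified_mem "$um" sh -c 'echo "run with OPENARCRT_UNIFIEDMEM=$OPENARCRT_UNIFIEDMEM"'
done
last_seen=$(run_with_unified_mem 1 sh -c 'echo "$OPENARCRT_UNIFIEDMEM"')
```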

		* Experiment4: learn how to use OpenMP with OpenARC
		- Task1: learn how to use OpenMP with OpenARC.
		- To use OpenMP with OpenACC, 1) environment variable OMP_NUM_THREADS should be explicitly set (e.g., OMP_NUM_THREADS=10), and
		2) the OMP macro in the Makefile should be set to 1.
		- When OMP=1, the backend compiler will use the OpenMP version of the OpenARC runtime library.

		- Task2: compile and run the openmp example.
		$ cd ${openarcexamples}/openmp
		//Check the OMP macro in Makefile.
		$ vi ./Makefile
		//Run the OpenARC compiler.
		$ ./O2GBuild.script
		//Check the generated output files.
		$ vi ./cetus_output/openmp.cpp
		//Compile the generated output files.
		$ make
		//Submit a job to run the output program.
		$ cd bin; sbatch gpu.sh

		* Experiment5: learn how to use MPI with OpenARC
		- Task1: learn how to use MPI with OpenARC.
		- To use MPI with OpenACC, 1) commandline option addIncludePath should be set to the MPI include path if it is not in the default include search path, and
		2) compile the OpenARC-generated output program using an MPI compiler (e.g., mpicxx).
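One way to find the MPI include path for step 1 is to read it from the flags that the MPI compiler wrapper passes to the underlying compiler. The sketch below only covers the extraction step. The helper name include_dirs_from_flags is hypothetical, the flag string is a made-up sample with placeholder paths, and writing the option as addIncludePath=path assumes it follows the same name=value form as the other options in openarcConf.txt.

```shell
#!/bin/sh
# Hypothetical helper: print each -I directory found in a compiler
# command line, one per line.
include_dirs_from_flags() {
    for flag in $1; do
        case $flag in
            -I?*) echo "${flag#-I}" ;;
        esac
    done
}

# Made-up sample of a wrapper's compile line; the paths are placeholders.
# On a real system the string would come from the wrapper itself
# (the query flag differs between MPI implementations).
sample_flags="g++ -I/opt/mpi/include -pthread -L/opt/mpi/lib -lmpi"
mpi_include=$(include_dirs_from_flags "$sample_flags" | head -n 1)
echo "addIncludePath=$mpi_include"
```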

		- Task2: compile and run the jacobi_mpi example.
		$ cd ${openarcexamples}/jacobi_mpi
		//Run the OpenARC compiler.
		$ ./O2GBuild.script
		//Check the generated output files.
		$ vi ./cetus_output/jacobi_mpi.cpp
		//Compile the generated output files.
		$ make
		//Submit a job to run the output program.
		$ cd bin; sbatch gpu.sh

		* Experiment6: learn how to use OpenARC's built-in interactive debugging tools
		- Task1: learn how to use OpenARC's built-in interactive debugging tools.
		- Use commandline option programVerification:
		- programVerification=1 //verify the correctness of CPU-GPU memory transfers
		- programVerification=2 //verify the correctness of GPU kernel translation
		- OpenARC offers various sub-options to control the interactive debugging tools.
		- verificationOptions, defaultMarginOfError, minValueToCheck, etc.

		- Task2: compile and run the matmul_debug example.
		$ cd ${openarcexamples}/matmul_debug
		//Run the OpenARC compiler.
		$ ./O2GBuild.script
		//Check the generated output files.
		$ vi ./cetus_output/matmul.cpp
		//Compile the generated output files.
		$ make
		//Submit a job to run the output program.
		$ cd bin; sbatch gpu.sh
		//Repeat the above steps by changing the programVerification option value (e.g., programVerification=1)

openmp/Makefile

+1 −2

		@@ -3,7 +3,6 @@ include $(openarc)/make.header
		################################################
		# Makefile options that the user can overwrite #
		################################################
		OMP = 1

		########################
		# Set the program name #
		@@ -32,7 +31,7 @@ DEFSET_ACC = -D_N_=$(_N_)
		# verbosity level set by OPENARCRT_VERBOSITY #
		# environment variable. #
		#########################################################
		OMP ?= 0
		OMP ?= 1
		MODE ?= normal
		POSTCMD ?= cp ./gpu.sh ./bin/

openmp/openarcConf_NORMAL.txt

+1 −1

		@@ -321,7 +321,7 @@
		#targetArch=0
		#AccAnalysisOnly=1
		#SkipGPUTranslation
		showInternalAnnotations=1
		showInternalAnnotations=0
		##########################
		# Analysis configuration #
		##########################

openmp/openmp.c

+2 −2

		@@ -41,7 +41,7 @@ int main() {
		#ifdef _OPENACC
		is_sync = acc_async_test(thread_id);
		if( is_sync != 0 ) {
		printf("Thread %d fails async test1\n", thread_id);
		printf("Thread %d fails async test1 (compute region launched by thread %d has not finished yet)\n",thread_id, thread_id);
		}

		#pragma acc wait(thread_id);
		@@ -49,7 +49,7 @@ int main() {

		is_sync = acc_async_test(thread_id);
		if( is_sync == 0 ) {
		printf("Thread %d fails async test2\n", thread_id);
		printf("Thread %d fails async test2 (compute region launched by thread %d has not finished yet)\n",thread_id, thread_id);
		}
		#endif
		}