Commit a709b986 authored by Joel E. Denny

[Clacc][OpenACC] Extend implicit worker/vector clause options

This patch adds `-fopenacc-implicit-worker=vector`, which instructs
Clacc's Clang to implicitly determine `worker` clauses only on `loop`
constructs with explicit `vector` clauses.  This choice can be useful
when compiling OpenACC applications primarily employing explicit
`gang` and `vector` clauses while targeting an OpenMP implementation
(like Clacc's) for which `omp simd` (to which Clacc translates
`vector`) does not increase parallelism for the given offload devices.

This patch also adds `-fopenacc-implicit-(worker|vector)=none|outer`
and makes `-f[no-]openacc-implicit-(worker|vector)` aliases for them.

Finally, it adds `-fopenacc-implicit-worker=vector-outer`, which
instructs Clacc's Clang to apply `-fopenacc-implicit-worker=vector`
followed by `-fopenacc-implicit-worker=outer`.
parent 36a30419
+51 −25
@@ -50,11 +50,18 @@ OpenACC-related and OpenMP-related command-line options, run Clacc's
    * See the section "Using" in `../README.md` for more usage details.
    * See the section "Interaction with OpenMP Support" in
      `README-OpenACC-design.md` for design details.
-* `-f[no-]openacc-implicit-worker` and `-f[no-]openacc-implicit-vector`
-    * Enables (or disables) implicitly determining `worker` and `vector` clauses
-      on `loop` constructs to try to increase parallelism and thus performance.
-    * Currently, they are disabled by default, but that might change in the
-      future.
+* Options controlling the implicit determination of `worker` and `vector`
+  clauses on `loop` constructs
+    * The goal is to try to increase parallelism and thus performance.
+    * Currently, implicit `worker` and `vector` clauses are disabled by default.
+    * `-fopenacc-implicit-worker=none|vector|outer|vector-outer` specifies the
+      `loop` constructs on which `worker` clauses are implicitly determined.
+    * `-fopenacc-implicit-vector=none|outer` specifies the `loop` constructs on
+       which `vector` clauses are implicitly determined.
+    * `-f[no-]openacc-implicit-worker` are currently aliases for
+      `-fopenacc-implicit-worker=none|outer`.
+    * `-f[no-]openacc-implicit-vector` are currently aliases for
+      `-fopenacc-implicit-vector=none|outer`.
    * See the section "`loop` Directive" below for details.
* Other OpenMP options
    * `-fopenmp` produces an error when OpenACC support is enabled as
@@ -67,14 +74,10 @@ OpenACC-related and OpenMP-related command-line options, run Clacc's
      warnings for them.
* Options controlling the translation to OpenMP and their associated
  diagnostics
-    * `-fopenacc-update-present-omp=KIND` where `KIND` is either
-      `present` or `no-present`
-    * `-fopenacc-structured-ref-count-omp=KIND` where `KIND` is either
-      `ompx-hold` or `no-ompx-hold`
-    * `-fopenacc-present-omp=KIND` where `KIND` is either `present` or
-      `no-present`
-    * `-fopenacc-no-create-omp=KIND` where `KIND` is either
-      `ompx-no-alloc` or `no-ompx-no-alloc`
+    * `-fopenacc-update-present-omp=present|no-present`
+    * `-fopenacc-structured-ref-count-omp=ompx-hold|no-ompx-hold`
+    * `-fopenacc-present-omp=present|no-present`
+    * `-fopenacc-no-create-omp=ompx-no-alloc|no-ompx-no-alloc`
    * `-Wopenacc-omp-update-present`
    * `-Wopenacc-omp-map-ompx-hold`
    * `-Wopenacc-omp-map-present`
@@ -356,23 +359,46 @@ Run-Time Environment Variables
        * This is always enabled because it is specified by OpenACC (introduced
          in 3.1) and can be important for correct behavior of the application.
    * Implicit `worker` and `vector` clauses
-        * These are enabled/disabled by `-f[no-]openacc-implicit-worker` and
-          `-f[no-]openacc-implicit-vector`.  Currently, they are disabled by
-          default, but that might change in the future.
-        * Their purpose is to try to increase parallelism and thus performance.
+        * Currently, they are disabled by default, but that might change in the
+          future.
+        * Their goal is to try to increase parallelism and thus performance.
        * For a conforming OpenACC application (e.g., `loop` constructs are
          never misidentified as `independent`), they should never affect
          behavioral correctness.  Thus, the OpenACC specification does not
          specify whether or how they are determined but also does not prohibit
          them, and OpenACC compilers typically implement them.
-        * The current algorithm for determining which `loop` constructs should
-          receive implicit `worker` and `vector` clauses uses a simple heuristic
-          similar to the algorithm specified by OpenACC for implicit `gang`
-          clauses: after any conversion of `auto` clauses to `seq`, and after
-          determining implicit `routine` directives (see "Implicit `routine`
-          directive" below), select each loop nest's outermost `loop` constructs
-          on which `worker` and `vector` clauses are permitted.  A more
-          sophisticated analysis might be employed in the future.
+        * The OpenACC spec does specify the implicit determination of `gang`
+          clauses.  Like implicit `gang` clauses, implicit `worker` and `vector`
+          clauses are determined after any conversion of `auto` clauses to `seq`
+          and after implicit `routine` directives are determined (see "Implicit
+          `routine` directive" below).
+        * `-fopenacc-implicit-worker=none|vector|outer|vector-outer` specifies
+          the `loop` constructs on which `worker` clauses are implicitly
+          determined:
+          - `none` suppresses all implicit `worker` clauses.
+            `-fno-openacc-implicit-worker` is an alias.
+          - `vector` specifies `loop` constructs with explicit `vector` clauses.
+            This choice can be useful when compiling OpenACC applications
+            primarily employing explicit `gang` and `vector` clauses while
+            targeting an OpenMP implementation (like Clacc's) for which
+            `omp simd` (to which Clacc translates `vector`) does not increase
+            parallelism for the given offload devices.
+          - `outer` specifies each loop nest's outermost `loop` constructs on
+             which `worker` clauses are permitted.  This is similar to how the
+             OpenACC spec places implicit `gang` clauses.
+            `-fopenacc-implicit-worker` is currently an alias.
+          - `vector-outer` applies `vector` followed by `outer`.
+        * `-fopenacc-implicit-vector=none|outer` specifies the `loop` constructs
+          on which `vector` clauses are implicitly determined:
+          - `none` suppresses all implicit `vector` clauses.
+            `-fno-openacc-implicit-vector` is an alias.
+          - `outer` specifies each loop nest's outermost `loop` constructs on
+             which `vector` clauses are permitted.  This is similar to how the
+             OpenACC spec places implicit `gang` clauses.
+            `-fopenacc-implicit-vector` is currently an alias.
+        * The algorithms selected by `-fopenacc-implicit-worker` and
+          `-fopenacc-implicit-vector` might change in the future as we determine
+          better defaults.
    * For now, if none of these clauses appear (explicitly or
      implicitly), then a sequential loop is produced.
* Supported data attributes and clauses
+6 −2
@@ -257,8 +257,12 @@ LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel
LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.")
LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU libraries.")
LANGOPT(OpenACC           , 1, 0, "OpenACC support")
-LANGOPT(OpenACCImplicitWorker, 1, 0, "Add implicit worker clauses to loop constructs")
-LANGOPT(OpenACCImplicitVector, 1, 0, "Add implicit vector clauses to loop constructs")
+ENUM_LANGOPT(OpenACCImplicitWorker, OpenACCImplicitWorkerKind, 2,
+             OpenACCImplicitWorker_None,
+             "Loop constructs to which implicit worker clauses are added")
+ENUM_LANGOPT(OpenACCImplicitVector, OpenACCImplicitVectorKind, 2,
+             OpenACCImplicitVector_None,
+             "Loop constructs to which implicit vector clauses are added")
ENUM_LANGOPT(OpenACCUpdatePresentOMP, OpenACCUpdatePresentOMPKind, 1,
             OpenACCUpdatePresentOMP_Present,
             "The OpenMP translation of the OpenACC 'update' directive without 'if_present'")
+38 −0
@@ -136,6 +136,44 @@ public:
    DCC_RegCall
  };

+  enum OpenACCImplicitWorkerKind {
+    OpenACCImplicitWorker_None,
+    OpenACCImplicitWorker_Vector,
+    OpenACCImplicitWorker_Outer,
+    OpenACCImplicitWorker_VectorOuter,
+    OpenACCImplicitWorker_Last = OpenACCImplicitWorker_VectorOuter
+  };
+  static StringRef
+  getOpenACCImplicitWorkerValue(OpenACCImplicitWorkerKind K) {
+    switch (K) {
+    case OpenACCImplicitWorker_None:
+      return "none";
+    case OpenACCImplicitWorker_Vector:
+      return "vector";
+    case OpenACCImplicitWorker_Outer:
+      return "outer";
+    case OpenACCImplicitWorker_VectorOuter:
+      return "vector-outer";
+    }
+    llvm_unreachable("unexpected OpenACCImplicitWorkerKind");
+  }
+
+  enum OpenACCImplicitVectorKind {
+    OpenACCImplicitVector_None,
+    OpenACCImplicitVector_Outer,
+    OpenACCImplicitVector_Last = OpenACCImplicitVector_Outer
+  };
+  static StringRef
+  getOpenACCImplicitVectorValue(OpenACCImplicitVectorKind K) {
+    switch (K) {
+    case OpenACCImplicitVector_None:
+      return "none";
+    case OpenACCImplicitVector_Outer:
+      return "outer";
+    }
+    llvm_unreachable("unexpected OpenACCImplicitVectorKind");
+  }
+
  enum OpenACCUpdatePresentOMPKind {
    OpenACCUpdatePresentOMP_Present,
    OpenACCUpdatePresentOMP_NoPresent,
+38 −12
@@ -2776,18 +2776,44 @@ def fopenacc_ast_print_EQ : Joined<["-"], "fopenacc-ast-print=">,
           " a result, the output will have preprocessor expansions and"
           " reformatting.">,
  Values<"acc,omp,acc-omp,omp-acc">;
-defm openacc_implicit_worker: BoolFOption<"openacc-implicit-worker",
-  LangOpts<"OpenACCImplicitWorker">, DefaultFalse,
-  PosFlag<SetTrue, [CC1Option, NoArgumentUnused],
-          "Enable implicitly determining worker clauses on loop constructs">,
-  NegFlag<SetFalse, [CC1Option, NoArgumentUnused],
-          "Disable implicitly determining worker clauses on loop constructs">>;
-defm openacc_implicit_vector: BoolFOption<"openacc-implicit-vector",
-  LangOpts<"OpenACCImplicitVector">, DefaultFalse,
-  PosFlag<SetTrue, [CC1Option, NoArgumentUnused],
-          "Enable implicitly determining vector clauses on loop constructs">,
-  NegFlag<SetFalse, [CC1Option, NoArgumentUnused],
-          "Disable implicitly determining vector clauses on loop constructs">>;
+def fopenacc_implicit_worker_EQ :
+  Joined<["-"], "fopenacc-implicit-worker=">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>, MetaVarName<"<where>">,
+  HelpText<"Implicitly determine worker clauses on loop constructs specified "
+           "by <where> as permitted.  'none' (default) suppresses all implicit "
+           "worker clauses.  'vector' specifies loop constructs with explicit "
+           "vector clauses.  'outer' specifies the outermost loop constructs "
+           "permitted in every loop nest.  'vector-outer' applies 'vector' "
+           "followed by 'outer'.">,
+  Values<"none,vector,outer,vector-outer">;
+def fopenacc_implicit_worker:
+  Flag<["-"], "fopenacc-implicit-worker">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>,
+  Alias<fopenacc_implicit_worker_EQ>, AliasArgs<["outer"]>,
+  HelpText<"Equivalent to -fopenacc-implicit-worker=outer">;
+def fno_openacc_implicit_worker:
+  Flag<["-"], "fno-openacc-implicit-worker">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>,
+  Alias<fopenacc_implicit_worker_EQ>, AliasArgs<["none"]>,
+  HelpText<"Equivalent to -fopenacc-implicit-worker=none">;
+def fopenacc_implicit_vector_EQ :
+  Joined<["-"], "fopenacc-implicit-vector=">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>, MetaVarName<"<where>">,
+  HelpText<"Implicitly determine vector clauses on loop constructs specified "
+           "by <where> as permitted.  'none' (default) suppresses all implicit "
+           "vector clauses.  'outer' specifies the outermost loop constructs "
+           "permitted in every loop nest.">,
+  Values<"none,outer">;
+def fopenacc_implicit_vector:
+  Flag<["-"], "fopenacc-implicit-vector">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>,
+  Alias<fopenacc_implicit_vector_EQ>, AliasArgs<["outer"]>,
+  HelpText<"Equivalent to -fopenacc-implicit-vector=outer">;
+def fno_openacc_implicit_vector:
+  Flag<["-"], "fno-openacc-implicit-vector">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>,
+  Alias<fopenacc_implicit_vector_EQ>, AliasArgs<["none"]>,
+  HelpText<"Equivalent to -fopenacc-implicit-vector=none">;
def fopenacc_update_present_omp_EQ :
  Joined<["-"], "fopenacc-update-present-omp=">,
  Flags<[CC1Option, NoArgumentUnused]>, Group<f_Group>,
+2 −4
@@ -6242,12 +6242,10 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
    Args.addOptOutFlag(CmdArgs, options::OPT_fopenmp_extensions,
                       options::OPT_fno_openmp_extensions);
  }
-  Args.addOptInFlag(CmdArgs, options::OPT_fopenacc_implicit_worker,
-                    options::OPT_fno_openacc_implicit_worker);
-  Args.addOptInFlag(CmdArgs, options::OPT_fopenacc_implicit_vector,
-                    options::OPT_fno_openacc_implicit_vector);
  Args.AddLastArg(CmdArgs, options::OPT_fopenacc_print_EQ);
  Args.AddLastArg(CmdArgs, options::OPT_fopenacc_ast_print_EQ);
+  Args.AddAllArgs(CmdArgs, options::OPT_fopenacc_implicit_worker_EQ);
+  Args.AddAllArgs(CmdArgs, options::OPT_fopenacc_implicit_vector_EQ);
  Args.AddAllArgs(CmdArgs, options::OPT_fopenacc_update_present_omp_EQ);
  Args.AddAllArgs(CmdArgs, options::OPT_fopenacc_structured_ref_count_omp_EQ);
  Args.AddAllArgs(CmdArgs, options::OPT_fopenacc_present_omp_EQ);