Commit 2ed2097d authored by Joel E. Denny

[Clacc][OpenACC] Implement tile clause

Thus, deprecate `-fopenacc-fake-tile-clause` and make it a no-op.

Implement `-Wopenacc-omp-tile-in-teams` to warn when an OpenMP
extension is used to support the `tile` clause in source-to-source
mode.
parent 14378aa8
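The following hand-written C sketch may help when reading the `tile` documentation below. It shows the loop structure that `tile(2,3)` on a two-deep nest asks for, which is also the structure an `omp tile sizes(2,3)` directive generates. There are two outer tile (grid) loops that step by whole tiles, and two inner element (intra-tile) loops that are clamped at the nest's bounds for partial tiles. The names `tiled_nest`, `visits_each_once`, and `same_as_untiled` and the 5 x 7 bounds are hypothetical. This illustrates the technique and is not Clacc's generated code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounds of the original nest: for i < NI, for j < NJ. */
enum { NI = 5, NJ = 7 };

/* Execute the nest as tiled with sizes (ti, tj).  counts[i][j] records how
   often (i, j) was visited, and order[i][j] records its position in the
   traversal. */
static void tiled_nest(int ti, int tj, int counts[NI][NJ], int order[NI][NJ]) {
  assert(ti > 0 && tj > 0); /* tile sizes must be positive */
  int step = 0;
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j)
      counts[i][j] = 0;
  /* Tile (grid) loops: one iteration per tile. */
  for (int ii = 0; ii < NI; ii += ti)
    for (int jj = 0; jj < NJ; jj += tj)
      /* Element (intra-tile) loops, clamped for partial tiles at the edges. */
      for (int i = ii; i < ii + ti && i < NI; ++i)
        for (int j = jj; j < jj + tj && j < NJ; ++j) {
          ++counts[i][j];
          order[i][j] = step++;
        }
}

/* True if tiling with (ti, tj) visits every iteration exactly once. */
static bool visits_each_once(int ti, int tj) {
  int counts[NI][NJ], order[NI][NJ];
  tiled_nest(ti, tj, counts, order);
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j)
      if (counts[i][j] != 1)
        return false;
  return true;
}

/* True if tiling with (ti, tj) preserves the untiled row-major order. */
static bool same_as_untiled(int ti, int tj) {
  int counts[NI][NJ], order[NI][NJ];
  tiled_nest(ti, tj, counts, order);
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j)
      if (order[i][j] != i * NJ + j)
        return false;
  return true;
}
```

With sizes (2, 3), every iteration is still visited exactly once. However, (0, 3) is reached only after the whole first 2 x 3 tile, so the order differs from the untiled nest. With sizes (1, 1), which is what Clacc uses when every size is `*` or non-constant, the element loops are degenerate and the original order is reproduced.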
+1 −2
@@ -1039,8 +1039,7 @@ debug (test):
    # correctly without a full implementation of the associated OpenACC
    # features.  The V&V suite is a convenient way to check for unexpected
    # issues with the incomplete implementations.
    - export VV_CFLAGS="-fopenacc -fopenacc-fake-async-wait
                        -fopenacc-fake-tile-clause -lm $VV_CFLAGS"
    - export VV_CFLAGS="-fopenacc -fopenacc-fake-async-wait -lm $VV_CFLAGS"
    - if test x"$GPU_TRIPLE" != x; then
        export VV_CFLAGS="$VV_CFLAGS -fopenmp-targets=$GPU_TRIPLE";
      fi
+41 −20
@@ -1322,7 +1322,7 @@ clarify these points in future versions of the OpenACC specification.

* For an `acc loop` directive with *exp* `seq` such that the loop
  control variable is just assigned instead of declared in the init of
  the attached `for` loop, the loop control variable is *imp*
  an attached `for` loop, the loop control variable is *imp*
  `shared`.  Notes:
    * Otherwise, there appears to be no way to tell an aggressive
      OpenACC compiler to leave such a loop as a normal sequential
@@ -1386,7 +1386,7 @@ clarify these points in future versions of the OpenACC specification.
      rule applies for sequential `acc loop` directives even
      though the mapping for them discards the `reduction`.
    * If the loop control variable is declared instead of just
      assigned in the init of the attached `for` loop, any reference
      assigned in the init of an attached `for` loop, any reference
      to the variable's name in the directive's clauses refers to a
      different variable, so this rule does not apply.
    * Clang's OpenMP implementation also enforces this constraint.
@@ -1926,17 +1926,16 @@ following are true:
Clacc's current mapping of a sequential `acc loop` directive and its
clauses to OpenMP is as follows:

* Translation discards the `acc loop` directive and the following
  clauses or attributes:
* Translation discards the following clauses or attributes:
    * *exp* `seq`, *exp* `independent`, *exp* `auto`
    * *exp* `gang`, *exp* `worker`, *exp* `vector`
    * *exp* `collapse`
    * *pre* `private` for a loop control variable that is declared in
      the init of the attached `for` loop
      the init of an attached `for` loop
    * *imp* `shared`, *exp* `reduction`
    * Notes:
        * For a loop control variable that is declared in the init of
          the attached `for` loop, a private copy is already made for
          an attached `for` loop, a private copy is already made for
          the one thread executing the loop.
        * *imp* `shared` is only for variables referenced within the
          loop but declared outside the loop, and these are already
@@ -1954,6 +1953,11 @@ clauses to OpenMP is as follows:
    * *exp*|*pre* `private` just needs to be local to the one thread
      executing the loop, and so creating a new local variable is
      sufficient.
* If *exp* `tile(`*sizes*`)`, then `acc loop` -> `omp tile sizes(`*sizes*`)`
  with any `*` or non-constant expression in *sizes* replaced by `1`.  Note: See
  `tile` documentation in `README-OpenACC-status.md` for why some arguments are
  converted to `1`.
* Else, translation discards the `acc loop` directive.

A sequential `acc loop` directive is gang-redundant, worker-single,
vector-single mode.  Thus, as far as partitioning is concerned, simple
@@ -1994,17 +1998,26 @@ its clauses to OpenMP is as follows:

* `acc loop` -> `omp`
* *exp*|*imp* `gang` -> `distribute`
* *exp* `worker` -> `parallel for`
* *exp* `vector` -> `simd`
* If *exp* `tile` and *not* `vector`, then translation discards *exp* `worker`.
  Note: In this case, `worker` should be applied to the element/intra-tile loop
  generated by the `tile` clause according to OpenACC 3.3, but there appears to
  be no way to express that behavior in OpenMP 5.2, as discussed in the `tile`
  documentation in `README-OpenACC-status.md`.
* Else, *exp* `worker` -> `parallel for`
* If *exp* `tile`, then translation discards *exp* `vector`.
  Note: In this case, `vector` should be applied to the element/intra-tile loop
  generated by the `tile` clause according to OpenACC 3.3, but there appears to
  be no way to express that behavior in OpenMP 5.2, as discussed in the `tile`
  documentation in `README-OpenACC-status.md`.
* Else, *exp* `vector` -> `simd`
* The output `distribute`, `parallel for`, and `simd` OpenMP directive
  components are sorted in the above order before all clauses regardless of the
  input clause order.
* If *exp* `worker`, then *exp* `num_workers` from ancestor `acc
  parallel` -> *exp* `num_threads` where the argument is either (1)
  the original *exp* `num_workers` argument if it is a constant
  expression or (2) otherwise an expression containing only a
  reference to the local `const` variable generated for that *exp*
  `num_workers`.  Notes:
* If *exp* `worker` and either *not* `tile` or *exp* `vector`, then *exp*
  `num_workers` from ancestor `acc parallel` -> *exp* `num_threads` where the
  argument is either (1) the original *exp* `num_workers` argument if it is a
  constant expression or (2) otherwise an expression containing only a reference
  to the local `const` variable generated for that *exp* `num_workers`.  Notes:
    * For the ancestor `acc parallel` and for all OpenACC directives
      nested between it and this `acc loop`, Clacc leaves the OpenMP
      data sharing attribute for the local `const` variable for
@@ -2013,12 +2026,20 @@ its clauses to OpenMP is as follows:
      efficient, but not all OpenMP directives permit an *exp*
      `shared` clause.  Thus, relying on implicit data sharing
      attributes throughout simplifies the implementation.
* If *exp* `vector`, then *exp* `vector_length` with a
  constant-expression argument from ancestor `acc parallel` -> *exp*
  `simdlen`.
* If *exp* `vector` and *not* `tile`, then *exp* `vector_length` with a
  constant-expression argument from ancestor `acc parallel` -> *exp* `simdlen`.
* `static:*` within `gang` -> `dist_schedule(static)`
* `static:`*N* within `gang` -> `dist_schedule(static,`*N*`)`
* `collapse` -> `collapse`
* If *exp* `tile(`*sizes*`)`, where *N* is the number of arguments in *sizes*,
  then:
    * If *N* > 1, then -> `collapse(`*N*`)`.  Note: The `collapse` clause
      applies loop partitioning to all generated tile/grid loops.
    * An `omp tile sizes(`*sizes*`)` is inserted as the next directive with any
      `*` or non-constant expression in *sizes* replaced by `1`.  Note: See
      `tile` documentation in `README-OpenACC-status.md` for why some arguments
      are converted to `1` and why the generated element/intra-tile loops remain
      sequential.
* The translation discards *imp* `shared`.  Notes:
    * We have not found a scenario in which *imp* `shared` -> *exp* `shared`
      would benefit behavior.
@@ -2030,11 +2051,11 @@ its clauses to OpenMP is as follows:
    * Because it is apparently useless, it would be confusing in the OpenMP
      source generated in source-to-source mode.
* *pre* `private` for a loop control variable that is declared in the
  init of the attached `for` loop -> *pre* `private`.  Notes:
  init of an attached `for` loop -> *pre* `private`.  Notes:
    * Mapping to *exp* `private` would be erroneous because it would
      refer to a variable from the enclosing scope.
* If *exp* `vector` and the loop control variable is just assigned
  instead of declared in the init of the attached `for` loop, then
  instead of declared in the init of an attached `for` loop, then
  *exp*|*pre* `private` for that variable -> *pre* `linear`.  Then,
  wrap the `omp simd` in a compound statement, and declare an
  uninitialized local copy of the loop control variable.  Notes:
@@ -2045,7 +2066,7 @@ its clauses to OpenMP is as follows:
      OpenMP spec says the step must be the increment from the
      attached loop, (2) the OpenMP spec says the default step for an
      *exp* `linear` is 1, and (3) we don't want to have to implement
      extracting the increment from the attached loop when we can just
      extracting the increment from an attached loop when we can just
      rely on the behavior of *pre* `linear` and thus on Clang's or
      some other target compiler's OpenMP implementation to extract it
      for us.
+74 −30
@@ -74,6 +74,7 @@ OpenACC-related and OpenMP-related command-line options, run Clacc's
    * `-Wopenacc-omp-map-present`
    * `-Wopenacc-omp-map-ompx-no-alloc`
    * `-Wopenacc-omp-atomic-in-teams`
    * `-Wopenacc-omp-tile-in-teams`
    * `-Wopenacc-omp-ext`
    * See the section "OpenMP Extensions" below for details.
* Other diagnostic options
@@ -119,34 +120,8 @@ OpenACC-related and OpenMP-related command-line options, run Clacc's
          still produce compile-time diagnostics.  We are adding them as the
          need arises in the applications we are investigating.
    * `-fopenacc-fake-tile-clause`
        * Clacc accepts but mostly discards the OpenACC `tile` clause.  That is,
          it has no OpenMP translation in source-to-source mode, but it can
          cause predetermined `private` clauses, as described below.
        * If a `collapse` clause and a `tile` clause appear on the same `loop`
          construct, a compile-time error diagnostic is produced.  While OpenACC
          3.3 does not specify this restriction, both NVHPC 22.11 and GCC 12.2.0
          enforce it.
        * If a `tile` clause contains *N* size expressions, there must be *N*
          tightly nested loops following the `loop` construct, or a compile-time
          error diagnostic is produced.
        * Predetermined `private` is computed for the loop control variables in
          those *N* loops in the same manner as it would be in the case of a
          `collapse(`*N*`)` clause.  This appears to mimic the behavior of GCC
          12.2.0.  OpenACC 3.3 uses the term "associated loop" in the
          specification of predetermined `private` clauses and `collapse`
          clauses but not in the specification of `tile` clauses, so this
          behavior is not clear.  NVHPC 22.11 instead performs a liveness
          analysis to determine data attributes.
        * Each size expression within a `tile` clause must be either `*` or a
          positive constant integer expression.  Otherwise, a compile-time error
          diagnostic is produced.  There is one exception for now: a size
          expression can also be a non-constant integer expression.  OpenACC 3.3
          does not require support for non-constant integer expressions, GCC
          12.2.0 rejects them with compile-time error diagnostics, and NVHPC
          22.11 ignores the entire `tile` clause if it contains one (as reported
          by `-Minfo`).   Clacc accepts them for now just because Kokkos's
          OpenACC backend currently uses them (even though they are apparently
          discarded by NVHPC).
        * Clacc now fully supports the `tile` clause, so this option is
          deprecated and has no effect.

Run-Time Environment Variables
------------------------------
@@ -375,7 +350,6 @@ Run-Time Environment Variables
    * Implicit `gang` clause
    * For now, if none of these clauses appear (explicitly or
      implicitly), then a sequential loop is produced.
* The `collapse` clause is supported.
* Supported data attributes and clauses
    * A loop control variable is:
        * Implicit `shared` if `seq` is explicitly specified and loop
@@ -407,6 +381,52 @@ Run-Time Environment Variables
          Clarifications" section in `README-OpenACC-design.md`.
    * See "Data Expressions in Clauses" below for details of their support in
      these clauses.
* Supported multiloop clauses
    * `collapse`
    * `tile`
    * If a `collapse` clause and a `tile` clause appear on the same `loop`
      construct, a compile-time error diagnostic is produced.
        * While OpenACC 3.3 does not specify this restriction, both NVHPC 22.11
          and GCC 12.2.0 enforce it.
    * If a `collapse` clause's argument is *N*, or if a `tile` clause contains
      *N* size expressions, there must be *N* tightly nested loops following the
      `loop` construct, or a compile-time error diagnostic is produced.
    * Predetermined `private` is computed for the loop control variables in
      those *N* loops in the same manner as it would be for a single loop
      without a `collapse` or `tile` clause.
        * This appears to mimic the behavior of GCC 12.2.0.
        * OpenACC 3.3 uses the term "associated loop" in the specification of
          predetermined `private` clauses and `collapse` clauses but not in the
          specification of `tile` clauses, so this behavior is not clear.
        * NVHPC 22.11 instead performs a liveness analysis to determine data
          attributes.
    * A `collapse` clause's argument must be a positive constant integer
      expression, and each size expression within a `tile` clause must be either
      `*` or a positive constant integer expression.  Otherwise, a compile-time
      error diagnostic is produced.
        * There is one exception for now: a size expression in a `tile` clause
          can also be a non-constant integer expression.
            * Clacc currently implements each such size expression as `1` but
              might produce a compile-time error diagnostic in the future.
            * Clacc accepts them for now just because the version of Kokkos's
              OpenACC backend that we target uses them (even though they are
              apparently discarded by NVHPC).
            * OpenACC 3.3 does not require support for non-constant integer
              expressions.
            * GCC 12.2.0 rejects them with compile-time error diagnostics.
            * NVHPC 22.11 ignores the entire `tile` clause if it contains one,
              as reported by `-Minfo`.
        * `*` is currently implemented as `1` because there currently is no
          corresponding OpenMP 5.2 feature.
    * Where OpenACC 3.3 specifies that `worker` and `vector` are applied to the
      generated element loop, Clacc currently discards them instead because
      there appears to be no way to express this behavior in OpenMP 5.2.  Thus,
      the generated element loops are always sequential.
    * The above rules apply and tiling is performed regardless of whether the
      associated loop nest is partitioned or executed sequentially (e.g., due to
      a `seq` clause).
    * See the section "OpenMP Extensions" below for caveats related to
      source-to-source mode and the `tile` clause.
* Detection of `break` statement for the associated loop
    * Compile error if implicit/explicit `independent`.
    * No error if `seq` or `auto`.
@@ -1235,7 +1255,7 @@ translations from OpenACC to OpenMP. Thus, it is not yet recommended
for use in hand-written OpenMP code as it might not integrate well
with some OpenMP features.

### `atomic` in Gang-Redundant Mode ###
### `atomic` Construct in Gang-Redundant Mode ###

* OpenACC Features Affected
    * `atomic` construct
@@ -1258,6 +1278,30 @@ with some OpenMP features.
* Translation Options
    * None.

### `tile` Clause in Gang-Redundant Mode ###

* OpenACC Features Affected
    * `tile` clause on sequential `loop` construct
* OpenMP Extension Employed
    * `tile` construct strictly nested in `target teams` region (not permitted
      in OpenMP 5.2)
* OpenACC Semantics Required
    * OpenACC 3.3 permits a sequential `loop` construct with a `tile` clause in
      gang-redundant mode.  This case occurs when a sequential `loop` construct
      with a `tile` clause is encountered within a `parallel` region and there
      is no partitioned `loop` region nested in between.
    * Clacc strictly nests OpenMP's `tile` construct within a `target teams`
      region to implement the case that an OpenACC sequential `loop` construct
      with a `tile` clause appears in gang-redundant mode.
* Diagnostic Options
    * `-Wopenacc-omp-tile-in-teams`
    * `-Wno-error=openacc-omp-tile-in-teams`
    * `-Wno-openacc-omp-tile-in-teams`
    * These warnings diagnose use of the above OpenMP extension only when the
      nesting is lexical.  Dynamic cases are not diagnosed by Clacc's compiler.
* Translation Options
    * None.

OpenMP Runtime Library API
--------------------------

+2 −2
@@ -1794,10 +1794,10 @@ public:
  unsigned sizelist_size() const { return NumSizeExprs; }
  bool sizelist_empty() const { return NumSizeExprs == 0; }

  sizelist_range sizelists() {
  sizelist_range sizelist() {
    return sizelist_range(sizelist_begin(), sizelist_end());
  }
  sizelist_const_range sizelists() const {
  sizelist_const_range sizelist() const {
    return sizelist_const_range(sizelist_begin(), sizelist_end());
  }

+1 −1
@@ -4090,7 +4090,7 @@ bool RecursiveASTVisitor<Derived>::VisitACCCollapseClause(

template <typename Derived>
bool RecursiveASTVisitor<Derived>::VisitACCTileClause(ACCTileClause *C) {
  for (auto *E : C->sizelists())
  for (auto *E : C->sizelist())
    TRY_TO(TraverseStmt(E));
  return true;
}
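For a partitioned `acc loop` with `tile(`*sizes*`)` and *N* > 1, the design notes in this commit map the clause to `collapse(`*N*`)` on the partitioning directive, followed by `omp tile sizes(`*sizes*`)`. As a result, the *N* generated tile loops form a single iteration space that is partitioned (for example, among teams for `gang` -> `distribute`), and each tile's element loops remain sequential. The C sketch below emulates that structure on the host. It assumes a hypothetical round-robin assignment of collapsed tile iterations to `nteams` teams. The actual schedule is up to the OpenMP implementation and any `dist_schedule` clause. The names `run_team` and `covers_each_once` and all bounds and sizes are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nest bounds and tile sizes, as in tile(TI, TJ). */
enum { NI = 5, NJ = 7, TI = 2, TJ = 3 };

/* Number of tiles along each dimension (ceiling division). */
enum { TILES_I = (NI + TI - 1) / TI, TILES_J = (NJ + TJ - 1) / TJ };

/* Work done by one team.  The two tile loops are collapsed into a single
   index t in [0, TILES_I * TILES_J), which is split round-robin among
   nteams teams (a hypothetical schedule).  Each tile's element loops run
   sequentially inside the team. */
static void run_team(int team, int nteams, int counts[NI][NJ]) {
  for (int t = team; t < TILES_I * TILES_J; t += nteams) {
    int ii = (t / TILES_J) * TI; /* recover the tile loop indices */
    int jj = (t % TILES_J) * TJ;
    for (int i = ii; i < ii + TI && i < NI; ++i)
      for (int j = jj; j < jj + TJ && j < NJ; ++j)
        ++counts[i][j];
  }
}

/* True if running every team covers each (i, j) exactly once. */
static bool covers_each_once(int nteams) {
  assert(nteams > 0);
  int counts[NI][NJ] = {{0}};
  for (int team = 0; team < nteams; ++team)
    run_team(team, nteams, counts);
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j)
      if (counts[i][j] != 1)
        return false;
  return true;
}
```

Collapsing only the tile loops means the unit of parallel work is a whole tile. When there are more teams than tiles, the extra teams simply receive no work.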