Commit c094e7dc authored by Mariya Podchishchaeva's avatar Mariya Podchishchaeva Committed by Alexey Bader
Browse files

[SYCL] Add sycl_kernel attribute for accelerated code outlining

SYCL is single source offload programming model relying on compiler to
separate device code (i.e. offloaded to an accelerator) from the code
executed on the host.

Here is code example of the SYCL program to demonstrate compiler
outlining work:

```
int foo(int x) { return ++x; }
int bar(int x) { throw std::exception("CPU code only!"); }
...
using namespace cl::sycl;
queue Q;
buffer<int, 1> a(range<1>{1024});
Q.submit([&](handler& cgh) {
  auto A = a.get_access<access::mode::write>(cgh);
  cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
    A[index] = index[0] + foo(42);
  });
}
...
```

SYCL device compiler must compile lambda expression passed to
cl::sycl::handler::parallel_for method and function foo called from this
lambda expression for an "accelerator". SYCL device compiler also must
ignore bar function as it's not required for offloaded code execution.

This patch adds the sycl_kernel attribute, which is used to mark code
passed to cl::sycl::handler::parallel_for as "accelerated code".

Attribute must be applied to function templates which parameters include
at least "kernel name" and "kernel function object". These parameters
will be used to establish an ABI between the host application and
offloaded part.

Reviewers: jlebar, keryell, Naghasan, ABataev, Anastasia, bader, aaron.ballman, rjmccall, rsmith

Reviewed By: keryell, bader

Subscribers: mgorny, OlegM, ArturGainullin, agozillon, aaron.ballman, ebevhan, Anastasia, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60455



Signed-off-by: default avatarAlexey Bader <alexey.bader@intel.com>
parent 0e9b0b6d
Loading
Loading
Loading
Loading
+13 −0
Original line number Diff line number Diff line
@@ -121,6 +121,11 @@ def GlobalVar : SubsetSubject<Var,
def InlineFunction : SubsetSubject<Function,
                             [{S->isInlineSpecified()}], "inline functions">;

def FunctionTmpl
    : SubsetSubject<Function, [{S->getTemplatedKind() ==
                                 FunctionDecl::TK_FunctionTemplate}],
                    "function templates">;

// FIXME: this hack is needed because DeclNodes.td defines the base Decl node
// type to be a class, not a definition. This makes it impossible to create an
// attribute subject which accepts a Decl. Normally, this is not a problem,
@@ -296,6 +301,7 @@ def MicrosoftExt : LangOpt<"MicrosoftExt">;
def Borland : LangOpt<"Borland">;
def CUDA : LangOpt<"CUDA">;
def HIP : LangOpt<"HIP">;
def SYCL : LangOpt<"SYCLIsDevice">;
def COnly : LangOpt<"COnly", "!LangOpts.CPlusPlus">;
def CPlusPlus : LangOpt<"CPlusPlus">;
def OpenCL : LangOpt<"OpenCL">;
@@ -1056,6 +1062,13 @@ def CUDAShared : InheritableAttr {
  let Documentation = [Undocumented];
}

def SYCLKernel : InheritableAttr {
  let Spellings = [Clang<"sycl_kernel">];
  let Subjects = SubjectList<[FunctionTmpl]>;
  let LangOpts = [SYCL];
  let Documentation = [SYCLKernelDocs];
}

def C11NoReturn : InheritableAttr {
  let Spellings = [Keyword<"_Noreturn">];
  let Subjects = SubjectList<[Function], ErrorDiag>;
+73 −0
Original line number Diff line number Diff line
@@ -253,6 +253,79 @@ any option of a multiversioned function is undefined.
  }];
}

def SYCLKernelDocs : Documentation {
  let Category = DocCatFunction;
  let Content = [{
The ``sycl_kernel`` attribute specifies that a function template will be used
to outline device code and to generate an OpenCL kernel.
Here is a code example of the SYCL program, which demonstrates the compiler's
outlining job:
.. code-block:: c++

  int foo(int x) { return ++x; }

  using namespace cl::sycl;
  queue Q;
  buffer<int, 1> a(range<1>{1024});
  Q.submit([&](handler& cgh) {
    auto A = a.get_access<access::mode::write>(cgh);
    cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
      A[index] = index[0] + foo(42);
    });
  }

A C++ function object passed to the ``parallel_for`` is called a "SYCL kernel".
A SYCL kernel defines the entry point to the "device part" of the code. The
compiler will emit all symbols accessible from a "kernel". In this code
example, the compiler will emit "foo" function.  More details about the
compilation of functions for the device part can be found in the SYCL 1.2.1
specification Section 6.4.
To show to the compiler entry point to the "device part" of the code, the SYCL
runtime can use the ``sycl_kernel`` attribute in the following way:
.. code-block:: c++
namespace cl {
namespace sycl {
class handler {
  template <typename KernelName, typename KernelType/*, ...*/>
  __attribute__((sycl_kernel)) void sycl_kernel_function(KernelType KernelFuncObj) {
    // ...
    KernelFuncObj();
  }

  template <typename KernelName, typename KernelType, int Dims>
  void parallel_for(range<Dims> NumWorkItems, KernelType KernelFunc) {
#ifdef __SYCL_DEVICE_ONLY__
    sycl_kernel_function<KernelName, KernelType, Dims>(KernelFunc);
#else
    // Host implementation
#endif
  }
};
} // namespace sycl
} // namespace cl

The compiler will also generate an OpenCL kernel using the function marked with
the ``sycl_kernel`` attribute.
Here is the list of SYCL device compiler expectations with regard to the
function marked with the ``sycl_kernel`` attribute:

- The function must be a template with at least two type template parameters.
  The compiler generates an OpenCL kernel and uses the first template parameter
  as a unique name for the generated OpenCL kernel. The host application uses
  this unique name to invoke the OpenCL kernel generated for the SYCL kernel
  specialized by this name and second template parameter ``KernelType`` (which
  might be an unnamed function object type).
- The function must have at least one parameter. The first parameter is
  required to be a function object type (named or unnamed i.e. lambda). The
  compiler uses function object type fields to generate OpenCL kernel
  parameters.
- The function must return void. The compiler reuses the body of marked functions to
  generate the OpenCL kernel body, and the OpenCL kernel must return `void`.

The SYCL kernel in the previous code sample meets these expectations.
  }];
}

def C11NoReturnDocs : Documentation {
  let Category = DocCatFunction;
  let Content = [{
+15 −0
Original line number Diff line number Diff line
@@ -10112,4 +10112,19 @@ def err_bit_cast_non_trivially_copyable : Error<
  "__builtin_bit_cast %select{source|destination}0 type must be trivially copyable">;
def err_bit_cast_type_size_mismatch : Error<
  "__builtin_bit_cast source size does not equal destination size (%0 vs %1)">;

// SYCL-specific diagnostics
def warn_sycl_kernel_num_of_template_params : Warning<
  "'sycl_kernel' attribute only applies to a function template with at least"
  " two template parameters">, InGroup<IgnoredAttributes>;
def warn_sycl_kernel_invalid_template_param_type : Warning<
  "template parameter of a function template with the 'sycl_kernel' attribute"
  " cannot be a non-type template parameter">, InGroup<IgnoredAttributes>;
def warn_sycl_kernel_num_of_function_params : Warning<
  "function template with 'sycl_kernel' attribute must have a single parameter">,
  InGroup<IgnoredAttributes>;
def warn_sycl_kernel_return_type : Warning<
  "function template with 'sycl_kernel' attribute must have a 'void' return type">,
  InGroup<IgnoredAttributes>;

} // end of sema component.
+42 −0
Original line number Diff line number Diff line
@@ -6412,6 +6412,45 @@ static void handleOpenCLAccessAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
  D->addAttr(::new (S.Context) OpenCLAccessAttr(S.Context, AL));
}

static void handleSYCLKernelAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
  // The 'sycl_kernel' attribute applies only to function templates.
  const auto *FD = cast<FunctionDecl>(D);
  const FunctionTemplateDecl *FT = FD->getDescribedFunctionTemplate();
  assert(FT && "Function template is expected");

  // Function template must have at least two template parameters.
  const TemplateParameterList *TL = FT->getTemplateParameters();
  if (TL->size() < 2) {
    S.Diag(FT->getLocation(), diag::warn_sycl_kernel_num_of_template_params);
    return;
  }

  // Template parameters must be typenames.
  for (unsigned I = 0; I < 2; ++I) {
    const NamedDecl *TParam = TL->getParam(I);
    if (isa<NonTypeTemplateParmDecl>(TParam)) {
      S.Diag(FT->getLocation(),
             diag::warn_sycl_kernel_invalid_template_param_type);
      return;
    }
  }

  // Function must have at least one argument.
  if (getFunctionOrMethodNumParams(D) != 1) {
    S.Diag(FT->getLocation(), diag::warn_sycl_kernel_num_of_function_params);
    return;
  }

  // Function must return void.
  QualType RetTy = getFunctionOrMethodResultType(D);
  if (!RetTy->isVoidType()) {
    S.Diag(FT->getLocation(), diag::warn_sycl_kernel_return_type);
    return;
  }

  handleSimpleAttribute<SYCLKernelAttr>(S, D, AL);
}

static void handleDestroyAttr(Sema &S, Decl *D, const ParsedAttr &A) {
  if (!cast<VarDecl>(D)->hasGlobalStorage()) {
    S.Diag(D->getLocation(), diag::err_destroy_attr_on_non_static_var)
@@ -6739,6 +6778,9 @@ static void ProcessDeclAttribute(Sema &S, Scope *scope, Decl *D,
  case ParsedAttr::AT_Flatten:
    handleSimpleAttribute<FlattenAttr>(S, D, AL);
    break;
  case ParsedAttr::AT_SYCLKernel:
    handleSYCLKernelAttr(S, D, AL);
    break;
  case ParsedAttr::AT_Format:
    handleFormatAttr(S, D, AL);
    break;
+14 −0
Original line number Diff line number Diff line
// RUN: %clang_cc1 -std=c++11 -fsyntax-only -fsycl-is-device -verify %s
// RUN: %clang_cc1 -std=c++11 -fsyntax-only -verify -x c++ %s

#ifndef __SYCL_DEVICE_ONLY__
// expected-warning@+7 {{'sycl_kernel' attribute ignored}}
// expected-warning@+8 {{'sycl_kernel' attribute ignored}}
#else
// expected-no-diagnostics
#endif

template <typename T, typename A, int B>
__attribute__((sycl_kernel)) void foo(T P);
template <typename T, typename A, int B>
[[clang::sycl_kernel]] void foo1(T P);
Loading