[Clacc][OpenACC] num_workers -> thread_limit

That is, instead of translating `num_workers` on a `parallel`
construct to `num_threads` at every lexically enclosed worker loop,
translate it to `thread_limit` on the `target teams` construct to
which the `parallel` construct is translated. This change offers
multiple improvements:
* `num_workers` now affects orphaned loops, as expected. This
  resolves some fixmes in
  `clang/test/OpenACC/directives/Tests/loop-tile.c`.
* It simplifies the generated OpenMP source. In particular, when the
`num_workers` argument is a non-constant expression, a local
variable no longer has to be inserted to capture its current value.
* It eliminates bugs in the old translation's implementation:
  * The aforementioned local variable was inserted unnecessarily
    when the only enclosed apparent worker parallelism was from a
    worker function call or from a loop's worker clause that was
    discarded in the translation due to a tile clause.
  * The aforementioned local variable was mistakenly not inserted if
    the only enclosed worker parallelism was from an implicit worker
    clause.
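
For illustration, the following is a simplified, hand-written sketch of the new translation for an orphaned worker loop. It is not verbatim Clacc output: the OpenACC input in the leading comment, the function names `scale` and `run`, the `num_gangs(1)` -> `num_teams(1)` mapping, and the use of a plain `map(tofrom: ...)` clause are assumptions made for the example. The point it shows is that `num_workers(nw)` becomes `thread_limit(nw)` on `target teams`, and that limit also bounds the `parallel for` inside `scale`, even though that loop is lexically outside the `parallel` construct.

```c
/* Simplified sketch, NOT verbatim Clacc output; names are hypothetical.
 *
 * Assumed OpenACC input:
 *
 *   #pragma acc routine worker
 *   void scale(int *a, int n) {
 *     #pragma acc loop worker
 *     for (int i = 0; i < n; ++i) a[i] *= 2;
 *   }
 *   ...
 *   #pragma acc parallel num_gangs(1) num_workers(nw) copy(a[0:n])
 *   scale(a, n);
 */

/* Orphaned worker loop: its translation carries no num_threads clause,
 * so a num_threads-based translation of num_workers cannot reach it. */
#pragma omp declare target
void scale(int *a, int n) {
#pragma omp parallel for
  for (int i = 0; i < n; ++i)
    a[i] *= 2;
}
#pragma omp end declare target

/* num_workers(nw) becomes thread_limit(nw) on target teams. The clause is
 * evaluated once when the construct is encountered, so no local variable
 * capturing nw is needed, and the limit applies to every parallel region
 * in the team, including the one inside scale(). num_teams(1) stands in
 * for num_gangs(1) (assumed mapping). */
void run(int *a, int n, int nw) {
#pragma omp target teams num_teams(1) thread_limit(nw) map(tofrom: a[0:n])
  scale(a, n);
}
```

Calling `run(a, n, nw)` doubles each element of `a`; `nw` only bounds how many threads execute the loop, not the result. Because compilers without OpenMP enabled ignore the pragmas, the sketch also compiles and runs as plain C.
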

This patch adds `openmp/libacc2omp/test/directives/num-workers.c` to
test whether `num_workers` actually produces the expected number of
workers. As noted in a fixme comment there, in some cases it does not
at `-O0`. In our experiments, the old translation to `num_threads`
was not better for any use case and, as described above, was worse
for some.