Fix call_once_per_loop to handle work stealing where id >> counter
When a stolen task is far ahead of the current loop index
(id >> counter), it must yield until the counter catches up before it
can proceed. This fixes a problem with HPX threads, which can be stolen
in any order.
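
The yield-until-caught-up idea can be sketched roughly as below. This is
a minimal illustration, not the actual call_once_per_loop code: the names
ordered_gate, run and run_out_of_order are hypothetical, plain std::thread
stands in for HPX threads, and std::this_thread::yield stands in for the
HPX yield call. It also assumes each id is submitted exactly once.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the loop counter; NOT the real call_once_per_loop.
struct ordered_gate {
    std::atomic<std::size_t> counter{0};

    // Runs f for iteration `id` only after all earlier iterations finished.
    template <typename F>
    void run(std::size_t id, F&& f) {
        // A stolen task may arrive with id far ahead of counter, so give up
        // the core until the counter has caught up instead of proceeding.
        while (counter.load(std::memory_order_acquire) < id) {
            std::this_thread::yield();
        }
        f();
        // Publish completion so the task holding id + 1 may proceed.
        counter.store(id + 1, std::memory_order_release);
    }
};

// Starts the tasks in reverse id order (assumed worst case for stealing)
// and records the order in which their bodies actually ran.
std::vector<std::size_t> run_out_of_order(std::size_t n) {
    ordered_gate gate;
    std::vector<std::size_t> order;
    std::vector<std::thread> workers;
    for (std::size_t id = n; id-- > 0;) {
        workers.emplace_back([&gate, &order, id] {
            gate.run(id, [&order, id] { order.push_back(id); });
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return order;
}
```

Even though the highest id is started first here, the bodies run in
increasing id order, because each task yields until the counter reaches
its own id.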