Fix issue in CUDA context where kernel prefix was wrong if an output was...
Fix an issue in the CUDA context where the kernel prefix was wrong if an output was reduced to a variable. Fix the JIT test tolerance.