Switch to manual copies to FFT temporary buffers
This MR replaces the sliced deep copies into the temporary FFT views with manual Kokkos loops. The goal is to get batched FFTs working again, and explicit loops make it clearer which element is copied where. Single FFTs work with this change, but I'm still having issues with batching.
Even if it doesn't fix the batching, this might be worth merging.
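As an illustration of the difference, here is a minimal sketch of a manual gather/scatter into a contiguous FFT buffer. It is written in plain C++ rather than Kokkos so it runs without Kokkos installed. It assumes a row-major (ny x nx) complex field whose columns are transformed as one batch. The function names are hypothetical and not taken from this MR. In Kokkos, the loop body would typically sit inside a `Kokkos::parallel_for`. The sliced alternative would instead take `Kokkos::subview`s of the field and buffer and call `Kokkos::deep_copy` once per slice.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: gather each column of a row-major (ny x nx) field into a
// contiguous (nx x ny) buffer, so every column becomes one contiguous FFT input.
// The index mapping is spelled out explicitly instead of being hidden in slices.
void copy_columns_to_buffer(const std::vector<cplx>& field, std::vector<cplx>& buf,
                            std::size_t nx, std::size_t ny) {
  assert(field.size() == nx * ny && buf.size() == nx * ny);
  for (std::size_t b = 0; b < nx; ++b) {    // b: batch index (which column)
    for (std::size_t j = 0; j < ny; ++j) {  // j: position along the FFT axis
      buf[b * ny + j] = field[j * nx + b];
    }
  }
}

// Hypothetical helper: scatter the (possibly transformed) buffer back into the field
// using the inverse of the mapping above.
void copy_buffer_to_columns(const std::vector<cplx>& buf, std::vector<cplx>& field,
                            std::size_t nx, std::size_t ny) {
  assert(field.size() == nx * ny && buf.size() == nx * ny);
  for (std::size_t b = 0; b < nx; ++b) {
    for (std::size_t j = 0; j < ny; ++j) {
      field[j * nx + b] = buf[b * ny + j];
    }
  }
}
```

Because each batch entry occupies its own contiguous `ny`-long run of the buffer, a batched FFT can be pointed at the buffer with a fixed stride of 1 and a distance of `ny` between transforms.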
EDIT: I ran some performance tests and this version is ~2% faster. That alone wouldn't justify the change, but I think this is the better approach going forward, and since there is no performance reason to keep the old approach, we'll switch to this one.