Explore redistributing the fields so multiple fields are doing FFTs at once
In many cases we have a batch of 3D FFTs to do, corresponding to multiple fields (phi or conc) or multiple components (u or uGrad). Perhaps we could save on communication costs by decomposing each field over a subset of the MPI tasks, so that several fields are transformed concurrently on smaller groups instead of one after another across all tasks.
I have no idea if this would be worth it.
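
To make the idea concrete, here is a minimal sketch of one possible way to split the ranks into per-field groups. It is plain Python with no MPI calls. The function names `assign_field_groups` and `fields_for_group` are hypothetical and not part of the existing code. The `(color, key)` pairs are the values one would pass to `MPI_Comm_split` to build a sub-communicator per group. The sketch assumes contiguous blocks of ranks per group and round-robin assignment of fields when there are more fields than groups. It says nothing about whether the extra redistribution of each field onto its group pays for itself.

```python
def assign_field_groups(n_ranks, n_fields):
    """Return a list of (color, key) pairs, one per world rank.

    color: index of the rank group (one group per concurrent FFT)
    key:   rank within that group
    Assumption: groups are contiguous blocks of world ranks, with
    sizes differing by at most one.
    """
    if n_ranks < 1 or n_fields < 1:
        raise ValueError("n_ranks and n_fields must be positive")
    # Never create more groups than ranks or than fields.
    n_groups = min(n_ranks, n_fields)
    base, extra = divmod(n_ranks, n_groups)
    assignment = []
    for color in range(n_groups):
        # The first `extra` groups get one additional rank.
        size = base + (1 if color < extra else 0)
        for key in range(size):
            assignment.append((color, key))
    return assignment


def fields_for_group(color, n_groups, n_fields):
    """Fields handled by group `color`, assigned round-robin."""
    return list(range(color, n_fields, n_groups))


if __name__ == "__main__":
    n_ranks, n_fields = 8, 3  # e.g. the three components of u
    groups = assign_field_groups(n_ranks, n_fields)
    n_groups = min(n_ranks, n_fields)
    for rank, (color, key) in enumerate(groups):
        print(rank, color, key, fields_for_group(color, n_groups, n_fields))
```

With something like this, each group would run an ordinary distributed 3D FFT on its own sub-communicator. The open question is whether the smaller transposes inside each group save more than the redistribution costs.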