MPI index fixes
Index mistakes in a few places were causing the MPI issues; those are now fixed.
Nominally, master now has all of the fixes from my "integration_testing_updated" branch, which works on CPU and GPU with both 1 and 4 MPI tasks and matches the Fortran code's results to about 13 digits.