More parallel+serial discussion
Created by: germasch
This is follow-up from #1443 (closed) and PR #1466. The PR addresses the fundamentals of the issue, but the work isn't really complete.
Open issues:
- Python bindings support (I think @pnorbert is working on this; it should be relatively straightforward).
- Which parts of adios2 make sense to support in serial mode? (InsituMPI probably not, but maybe all or most of the rest.) Right now BP3/BP4 and HDF5 are supported, if lightly tested.
- SST should be supported, but is harder because it's a separate lib and uses C.
- corner cases with the current implementation.
The current approach is to replace `MPI_*` functions with their `SMPI_*` wrapper equivalents. These wrappers call through to the real MPI if it's initialized (and not finalized); otherwise they do the mpidummy equivalent. I somewhat deliberately changed only selected pieces to use those wrappers, since I'm not convinced it's a really clean approach, though I'm torn about that. One could, however, consider a global search-and-replace to use the wrappers everywhere. Some things still wouldn't work (e.g., there's no easy way to implement `MPI_Send`/`MPI_Recv` properly on a single proc). The current state isn't horrible, i.e., if someone were to use some part that hasn't been converted to `SMPI_*` wrappers without calling `MPI_Init`, they'd get MPI complaining about just that, rather than some mysterious subtle problems.
What I think would be cleaner, but potentially involves some code duplication, would be to have separate code paths for the serial and parallel cases where both are supported; e.g., in the serial case, `AggregateMetadata` probably shouldn't even be called. This, however, requires being able to distinguish whether a function is meant to execute serially. I think there is one straightforward way to make that distinction, i.e., define `MPI_COMM_NULL` as meaning "don't use MPI, this executes serially". That's clearer than the current reliance on `MPI_Initialized`/`MPI_Finalized`; that status, one should note, might actually change along the way. So it's possible to have `SMPI_Comm_dup` return the communicator itself when called before `MPI_Init` (because that's what mpidummy does), then initialize MPI, and then eventually have `SMPI_Comm_free` call the real `MPI_Comm_free`, which will rightfully complain that the communicator hadn't been `dup`'d in the first place. This actually happens in real life, which is why I didn't change the `MPI_Comm_dup` handling inside of `core::ADIOS` for now.
Switching between real MPI and mpidummy based on the communicator (`MPI_COMM_NULL` vs. not) resolves a lot of that ambiguity. It's also cheaper than having to check `MPI_Initialized` and `MPI_Finalized` on every MPI call, so I'd be in favor of going that way. That's also a pretty small, localized change.
On the bigger issue (wrap everything with SMPI vs. separate code paths), I think one would have to try both and look at the result to decide which is better.