Fix scalability bottleneck in InSituMPIReader
Created by: keichi
Currently, InSituMPIReader receives and deserializes messages from each writer one by one. This design limits scalability in terms of the number of writers per reader. This PR changes the behavior to a bulk receive followed by a bulk deserialization.
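For reviewers skimming the diff, here is a minimal sketch of the pattern the PR moves toward. This is not the actual InSituMPIReader code; the writer ranks, message sizes, tag, and the `DeserializeMetadata()` helper are placeholders for illustration only.

```cpp
// Sketch of the per-writer vs. bulk receive patterns (MPI, C++).
// Not the real ADIOS2 implementation; names and sizes are placeholders.
#include <mpi.h>
#include <vector>

// Placeholder for the reader-side deserialization step.
static void DeserializeMetadata(const std::vector<char> & /*buf*/) {}

// Before: one blocking receive followed by an immediate deserialize per
// writer, so communication and computation are fully serialized.
void ReceivePerWriter(MPI_Comm comm, const std::vector<int> &writerRanks,
                      const std::vector<int> &msgSizes, int tag)
{
    for (size_t i = 0; i < writerRanks.size(); ++i)
    {
        std::vector<char> buf(msgSizes[i]);
        MPI_Recv(buf.data(), msgSizes[i], MPI_CHAR, writerRanks[i], tag,
                 comm, MPI_STATUS_IGNORE);
        DeserializeMetadata(buf);
    }
}

// After: post all non-blocking receives up front, wait for them once,
// then deserialize everything in bulk.
void ReceiveBulk(MPI_Comm comm, const std::vector<int> &writerRanks,
                 const std::vector<int> &msgSizes, int tag)
{
    const size_t n = writerRanks.size();
    std::vector<std::vector<char>> bufs(n);
    std::vector<MPI_Request> reqs(n);
    for (size_t i = 0; i < n; ++i)
    {
        bufs[i].resize(msgSizes[i]);
        MPI_Irecv(bufs[i].data(), msgSizes[i], MPI_CHAR, writerRanks[i],
                  tag, comm, &reqs[i]);
    }
    MPI_Waitall(static_cast<int>(n), reqs.data(), MPI_STATUSES_IGNORE);
    for (const auto &buf : bufs)
    {
        DeserializeMetadata(buf);
    }
}
```

The point of the second variant is that all incoming messages can be in flight concurrently instead of the reader blocking on one writer at a time.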
I tested this PR with 64 writers (brusselator) + 1 reader (norm_calc) and observed a 33% speedup in total execution time.
@pnorbert This is the patch I was talking about the other day. Could you take a look?