Quadratically increasing costs for Engine::Put() during one step when using the SST engine
Created by: franzpoeschel
While using ADIOS2 in PIConGPU for streaming IO, I noticed that within a single step, each call to Engine::Put()
took at least as long as the previous call. While investigating this, I found the following:
- The SST engine does not distinguish between sync and deferred mode (see → and →), so each written dataset is immediately marshalled and copied into an internal buffer. The total amount of incoming data is not known up front, so the buffer has to be reallocated as it grows.
- Reallocation in the BP3 serializer (which is used by SST) intentionally overrides the GNU STL's default power-of-two reallocation and instead enforces linear growth (see →).
This means that the buffer holding the data written during one step in the SST engine is reallocated once for every subsequent chunk that is written, and each reallocation has to copy everything written so far. This is probably fine for BP3, where deferred workflows that avoid this behavior are encouraged, but it will not work for SST.
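To illustrate why linear growth leads to quadratic cost: if n chunks of s bytes each are appended and the buffer is resized to fit exactly on every append, the i-th append copies the (i-1)*s bytes already present, so the total copied is s*n*(n-1)/2. The following is a minimal standalone sketch of that effect using a plain `std::vector<char>`. It is not ADIOS2 code; `simulate_puts` and `GrowthStats` are hypothetical names made up for this illustration, and the exact-fit branch assumes that `reserve()` allocates exactly the requested capacity, which is the case in the common standard library implementations but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this illustration (not part of ADIOS2).
struct GrowthStats
{
    std::size_t reallocations = 0; // how often the buffer moved to new storage
    std::size_t bytes_copied = 0;  // bytes already in the buffer at those moments
};

// Appends `chunks` chunks of `chunk_bytes` bytes each to a byte buffer.
// exact_fit == true : reserve exactly the required size before each append
//                     (linear growth, mimicking the behavior described above).
// exact_fit == false: leave growth to std::vector (geometric in practice).
GrowthStats simulate_puts(std::size_t chunks, std::size_t chunk_bytes,
                          bool exact_fit)
{
    assert(chunk_bytes > 0);
    std::vector<char> buffer;
    GrowthStats stats;
    for (std::size_t i = 0; i < chunks; ++i)
    {
        const std::size_t old_size = buffer.size();
        const std::size_t old_capacity = buffer.capacity();
        if (exact_fit)
        {
            // Assumption: reserve() allocates exactly this much.
            buffer.reserve(old_size + chunk_bytes);
        }
        buffer.insert(buffer.end(), chunk_bytes, '\0');
        if (buffer.capacity() != old_capacity)
        {
            // A capacity change means new storage was allocated and the
            // old contents were copied over.
            ++stats.reallocations;
            stats.bytes_copied += old_size;
        }
    }
    return stats;
}
```

With 100 chunks of 1024 bytes, `simulate_puts(100, 1024, true)` reallocates on every append, while `simulate_puts(100, 1024, false)` should only need a handful of reallocations; the exact number depends on the growth factor of the standard library in use.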
By uncommenting this line, I was able to avoid this issue for now.