Corrupted data in first output step in some cases of multi-block writing
Created by: pnorbert
The standalone test program PerfManyVars exhibits this problem, as does its test counterpart TestManyVars if the parameter combinations that trigger it are uncommented in the source.
The test writes multiple variables, with multiple blocks per process, over multiple timesteps; then it reads them back and checks the data value by value. Beyond some limit of variables x blocks, there will be corrupted blocks in the output (a write problem). This only happens for the first output step; all other steps are always correct for any combination. Known examples that exhibit the problem: 8 x 9 and 5 x 15. Note that 8 x 8 works, and none of the 8 x N cases with N >= 9 work.
$ ./bin/PerfManyVars 8 9 2
fmt=[v%1.1d]
varname[0]=v0
varname[7]=v7
-- Define variables.
[rank=000, line 273]: Write step 0 to many_vars.bp
[rank=000, line 280]: Write block 0, value 0 to many_vars.bp
[rank=000, line 280]: Write block 1, value 1 to many_vars.bp
[rank=000, line 280]: Write block 2, value 2 to many_vars.bp
[rank=000, line 280]: Write block 3, value 3 to many_vars.bp
[rank=000, line 280]: Write block 4, value 4 to many_vars.bp
[rank=000, line 280]: Write block 5, value 5 to many_vars.bp
[rank=000, line 280]: Write block 6, value 6 to many_vars.bp
[rank=000, line 280]: Write block 7, value 7 to many_vars.bp
[rank=000, line 280]: Write block 8, value 8 to many_vars.bp
[rank=000, line 295]: Write time for step 0 was 0.004 seconds
[rank=000, line 273]: Write step 1 to many_vars.bp
[rank=000, line 280]: Write block 0, value 10000 to many_vars.bp
[rank=000, line 280]: Write block 1, value 10001 to many_vars.bp
[rank=000, line 280]: Write block 2, value 10002 to many_vars.bp
[rank=000, line 280]: Write block 3, value 10003 to many_vars.bp
[rank=000, line 280]: Write block 4, value 10004 to many_vars.bp
[rank=000, line 280]: Write block 5, value 10005 to many_vars.bp
[rank=000, line 280]: Write block 6, value 10006 to many_vars.bp
[rank=000, line 280]: Write block 7, value 10007 to many_vars.bp
[rank=000, line 280]: Write block 8, value 10008 to many_vars.bp
[rank=000, line 295]: Write time for step 1 was 0.002 seconds
[rank=000, line 366]: Read and check data in many_vars.bp
[rank=000, line 377]: Check variable definitions... many_vars.bp
[rank=000, line 387]: Time to check all vars' info: 0.000 seconds
[rank=000, line 390]: Check variable content...
[rank=000, line 410]: Step 0 block 0: value=0
[rank=000, line 410]: Step 0 block 1: value=1
[rank=000, line 410]: Step 0 block 2: value=2
[rank=000, line 410]: Step 0 block 3: value=3
[rank=000, line 410]: Step 0 block 4: value=4
[rank=000, line 410]: Step 0 block 5: value=5
[rank=000, line 410]: Step 0 block 6: value=6
[rank=000, line 410]: Step 0 block 7: value=7
[rank=000, line 410]: Step 0 block 8: value=8
[rank=000, line 421]: ERROR: v6[0] step 0 block 8: wrote 8 but read 0
[rank=000, line 421]: ERROR: v7[0] step 0 block 8: wrote 8 but read 0
[rank=000, line 435]: Read time for step 0 was 0.008s
[rank=000, line 410]: Step 1 block 0: value=10000
[rank=000, line 410]: Step 1 block 1: value=10001
[rank=000, line 410]: Step 1 block 2: value=10002
[rank=000, line 410]: Step 1 block 3: value=10003
[rank=000, line 410]: Step 1 block 4: value=10004
[rank=000, line 410]: Step 1 block 5: value=10005
[rank=000, line 410]: Step 1 block 6: value=10006
[rank=000, line 410]: Step 1 block 7: value=10007
[rank=000, line 410]: Step 1 block 8: value=10008
[rank=000, line 435]: Read time for step 1 was 0.007s
Source code is in testing/adios2/performance/manyvars: C code for the standalone program, and C++ code for the test, where the failing combinations can be added to break the test.