# Python SST BeginStep segmentation fault / hang with timeout other than -1 or 0

ADIOS2 issue 1862: https://code.ornl.gov/pnb/ADIOS2/-/issues/1862 (2019-11-09T14:28:56Z, Podhorszki, Norbert)

*Created by: suchyta1*

I've been running into some errors when I try to read using SST from Python and the timeout parameter is set to something other than -1 or 0. To reproduce the behavior, I wrote a simple writer/reader pair, included below.
With a timeout of 0.5 and ADIOS 2.4.0, ADIOS 2.5.0, or newer, I get either a segmentation fault or a reader hang when I run on my Mac or my Ubuntu virtual machine. Based on the print statements, the reader hangs or segfaults inside the BeginStep() call. (Sometimes the job runs fine, but not reliably.) As far as I can tell, if I change the timeout to -1 or 0.0, it runs fine.
I'm not sure whether any of the other bindings are affected. I haven't noticed a problem there, but I haven't tested exhaustively.
## writer.py
```
#!/usr/bin/env python
from mpi4py import MPI
import adios2
import numpy as np
import time
import sys

if __name__ == "__main__":
    tmp = np.ones(10)
    comm = MPI.COMM_WORLD

    adios = adios2.ADIOS(comm, adios2.DebugON)
    io = adios.DeclareIO("test")
    io.SetEngine('SST')
    # Open() waits until one reader has connected
    io.SetParameter('RendezvousReaderCount', '1')
    # Queue at most one step; drop steps when the queue is full
    io.SetParameter("QueueLimit", "1")
    io.SetParameter("QueueFullPolicy", "Discard")
    io.DefineVariable("t", tmp, [10], [10], [10])

    time.sleep(1)
    engine = io.Open("test.bp", adios2.Mode.Write, comm)

    # Write one step per second
    for i in range(10):
        engine.BeginStep()
        v = io.InquireVariable("t")
        engine.Put(v, tmp)
        engine.EndStep()
        print("Wrote Step:", i); sys.stdout.flush()
        time.sleep(1)

    engine.Close()
```
## reader.py
```
#!/usr/bin/env python
from mpi4py import MPI
import adios2
import numpy as np
import time
import sys

if __name__ == "__main__":
    comm = MPI.COMM_WORLD

    adios = adios2.ADIOS(comm, adios2.DebugON)
    io = adios.DeclareIO("test")
    io.SetEngine('SST')
    """
    io.SetParameter('RendezvousReaderCount', '0')
    io.SetParameter("QueueLimit", "1")
    io.SetParameter("QueueFullPolicy", "Discard")
    """
    engine = io.Open("test.bp", adios2.Mode.Read, comm)

    while True:
        print("Read begin step"); sys.stdout.flush()
        # Wait up to 0.5 s for the next step; the crash or hang happens in this call
        StepStatus = engine.BeginStep(adios2.StepMode.Read, 0.5)
        print("Done read begin step"); sys.stdout.flush()

        if StepStatus == adios2.StepStatus.EndOfStream:
            print("Done looping"); sys.stdout.flush()
            break
        elif StepStatus == adios2.StepStatus.OK:
            print("Read found step:", engine.CurrentStep(), "\n"); sys.stdout.flush()
            engine.EndStep(); sys.stdout.flush()
        elif StepStatus == adios2.StepStatus.NotReady:
            print("Not ready\n"); sys.stdout.flush()
            continue
        elif StepStatus == adios2.StepStatus.OtherError:
            print("Other"); sys.stdout.flush()
            break

    engine.Close()
```
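
Since timeouts of -1 and 0.0 appear to run fine, one possible stopgap is to emulate a finite timeout in Python by polling with a zero timeout and sleeping between attempts. This is only a sketch of a workaround, not a fix for the underlying problem. `begin_step_with_polling`, its parameters, and the default poll interval are hypothetical; it relies only on the `BeginStep(mode, timeout)` call and the `StepStatus.NotReady` value already used in reader.py.

```python
import time


def begin_step_with_polling(begin_step, not_ready, timeout, poll_interval=0.05):
    """Emulate a finite BeginStep timeout using only zero-timeout calls.

    begin_step: zero-argument callable that makes one non-blocking attempt,
        e.g. lambda: engine.BeginStep(adios2.StepMode.Read, 0.0)
    not_ready: status value meaning "no step available yet",
        e.g. adios2.StepStatus.NotReady
    timeout: total time in seconds to keep polling
    poll_interval: sleep between attempts in seconds (arbitrary default)
    """
    deadline = time.monotonic() + timeout
    while True:
        status = begin_step()
        # Return at once on anything other than NotReady (OK, EndOfStream,
        # OtherError), or return NotReady once the deadline has passed.
        if status != not_ready or time.monotonic() >= deadline:
            return status
        time.sleep(poll_interval)
```

In reader.py the `BeginStep` line would then become something like `StepStatus = begin_step_with_polling(lambda: engine.BeginStep(adios2.StepMode.Read, 0.0), adios2.StepStatus.NotReady, 0.5)`, with the rest of the loop unchanged.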
## job.sh
```
mpirun -n 1 ./writer.py &
mpirun -n 1 ./reader.py &
wait
```
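
Because the failure is intermittent, it may help to run the pair many times and tally the outcomes. The following is a rough sketch of such a harness. `run_pair` and `tally` are hypothetical helper names, the wall-clock limit is an arbitrary choice, and the classification assumes that `mpirun` exits with a non-zero code when the reader crashes.

```python
import subprocess
from collections import Counter


def run_pair(writer_cmd, reader_cmd, limit_s):
    """Run writer and reader concurrently and classify the reader's outcome."""
    writer = subprocess.Popen(writer_cmd)
    reader = subprocess.Popen(reader_cmd)
    try:
        code = reader.wait(timeout=limit_s)
        outcome = "ok" if code == 0 else "failed"
    except subprocess.TimeoutExpired:
        # The reader did not finish within the limit: count it as a hang.
        # Note: killing a launcher may leave its child processes behind.
        reader.kill()
        reader.wait()
        outcome = "hang"
    # Let the writer finish, and clean it up if it does not.
    try:
        writer.wait(timeout=limit_s)
    except subprocess.TimeoutExpired:
        writer.kill()
        writer.wait()
    return outcome


def tally(writer_cmd, reader_cmd, runs, limit_s):
    """Repeat the run and count how often each outcome occurs."""
    return Counter(run_pair(writer_cmd, reader_cmd, limit_s) for _ in range(runs))
```

For the scripts in this report, a call might look like `tally(["mpirun", "-n", "1", "./writer.py"], ["mpirun", "-n", "1", "./reader.py"], runs=20, limit_s=60)`, where the run count and the 60 second limit are arbitrary.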
## Example of the error output, from the virtual machine
```
Read begin step
Wrote Step: 0
Done read begin step
Read found step: 0
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 1
Done read begin step
Read found step: 1
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 2
[adiosVM:03069] *** Process received signal ***
[adiosVM:03069] Signal: Segmentation fault (11)
[adiosVM:03069] Signal code: Address not mapped (1)
[adiosVM:03069] Failing at address: 0x38
[adiosVM:03069] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fdb2ac50390]
[adiosVM:03069] [ 1] /opt/adios2/lib/libadios2_cmselect.so(+0x1db0)[0x7fdb0cac3db0]
[adiosVM:03069] [ 2] /opt/adios2/lib/libadios2_cmselect.so(libcmselect_LTX_remove_periodic+0x19)[0x7fdb0cac3e59]
[adiosVM:03069] [ 3] /opt/adios2/lib/libadios2_evpath.so(INT_CMremove_task+0x20)[0x7fdb19009400]
[adiosVM:03069] [ 4] /opt/adios2/lib/libadios2_evpath.so(CMremove_task+0x28)[0x7fdb1901d248]
[adiosVM:03069] [ 5] /opt/adios2/lib/libadios2_sst.so.2(+0xe3da)[0x7fdb1a1f13da]
[adiosVM:03069] [ 6] /opt/adios2/lib/libadios2_sst.so.2(SstAdvanceStep+0x3a6)[0x7fdb1a1f3406]
[adiosVM:03069] [ 7] /opt/adios2/lib/libadios2.so.2(_ZN6adios24core6engine9SstReader9BeginStepENS_8StepModeEf+0x12f)[0x7fdb1ac3d56f]
[adiosVM:03069] [ 8] /opt/adios2/lib/python3.5/site-packages/adios2.cpython-35m-x86_64-linux-gnu.so(+0x29903)[0x7fdb1cc2b903]
[adiosVM:03069] [ 9] /opt/adios2/lib/python3.5/site-packages/adios2.cpython-35m-x86_64-linux-gnu.so(+0x3fe9b)[0x7fdb1cc41e9b]
[adiosVM:03069] [10] /opt/adios2/lib/python3.5/site-packages/adios2.cpython-35m-x86_64-linux-gnu.so(+0x16ca3)[0x7fdb1cc18ca3]
[adiosVM:03069] [11] python(PyCFunction_Call+0x77)[0x4ea117]
[adiosVM:03069] [12] python(PyEval_EvalFrameEx+0x5a36)[0x529816]
[adiosVM:03069] [13] python[0x52d2b9]
[adiosVM:03069] [14] python(PyEval_EvalCode+0x1f)[0x52dfbf]
[adiosVM:03069] [15] python[0x5fc652]
[adiosVM:03069] [16] python(PyRun_FileExFlags+0x9a)[0x5feafa]
[adiosVM:03069] [17] python(PyRun_SimpleFileExFlags+0x1bc)[0x5fecec]
[adiosVM:03069] [18] python(Py_Main+0x456)[0x63ec96]
[adiosVM:03069] [19] python(main+0xe1)[0x4d02e1]
[adiosVM:03069] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fdb2a895830]
[adiosVM:03069] [21] python(_start+0x29)[0x5d4b99]
[adiosVM:03069] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3069 on node adiosVM exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Wrote Step: 3
Wrote Step: 4
Wrote Step: 5
Wrote Step: 6
Wrote Step: 7
Wrote Step: 8
Wrote Step: 9
```
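
To make the call path in the backtrace above easier to read, the named frames can be pulled out with a short script. This is a sketch only: `named_frames` and the regular expression are hypothetical and tuned to the Open MPI trace format shown here, and the sample is a subset of the lines above. For reference, the mangled symbol in frame 7 corresponds to `adios2::core::engine::SstReader::BeginStep(adios2::StepMode, float)`.

```python
import re

# Matches backtrace lines such as
# "[host:pid] [ 6] /path/lib.so(Symbol+0x3a6)[0x...]"
# and skips frames that carry only an offset, e.g. "lib.so(+0x1db0)".
FRAME = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\(([A-Za-z_]\w*)\+0x[0-9a-f]+\)")


def named_frames(trace):
    """Return (frame number, library basename, symbol) for frames with a symbol."""
    frames = []
    for line in trace.splitlines():
        m = FRAME.search(line)
        if m:
            number, path, symbol = m.groups()
            frames.append((int(number), path.rsplit("/", 1)[-1], symbol))
    return frames


sample = """\
[adiosVM:03069] [ 1] /opt/adios2/lib/libadios2_cmselect.so(+0x1db0)[0x7fdb0cac3db0]
[adiosVM:03069] [ 2] /opt/adios2/lib/libadios2_cmselect.so(libcmselect_LTX_remove_periodic+0x19)[0x7fdb0cac3e59]
[adiosVM:03069] [ 3] /opt/adios2/lib/libadios2_evpath.so(INT_CMremove_task+0x20)[0x7fdb19009400]
[adiosVM:03069] [ 6] /opt/adios2/lib/libadios2_sst.so.2(SstAdvanceStep+0x3a6)[0x7fdb1a1f3406]
"""

for frame in named_frames(sample):
    print(frame)
```

Read from the bottom up, the named frames run from `BeginStep` through `SstAdvanceStep` into `CMremove_task` and `libcmselect_LTX_remove_periodic`, which would be consistent with the crash depending on a finite timeout, though the trace alone does not prove that.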
## Hanging looks similar
```
Read begin step
Wrote Step: 0
Done read begin step
Read found step: 0
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 1
Done read begin step
Read found step: 1
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 2
Done read begin step
Read found step: 2
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 3
Done read begin step
Read found step: 3
Read begin step
Done read begin step
Not ready
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 4
Done read begin step
Read found step: 4
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 5
Done read begin step
Read found step: 5
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 6
Done read begin step
Read found step: 6
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 7
Done read begin step
Read found step: 7
Read begin step
Done read begin step
Not ready
Read begin step
Wrote Step: 8
Wrote Step: 9
```
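
The print statements also make the hang easy to detect mechanically: the reader's last message is "Read begin step" with no matching "Done read begin step". A small sketch of that check follows; `reader_stuck_in_begin_step` is a hypothetical helper that assumes the exact message strings printed by reader.py.

```python
def reader_stuck_in_begin_step(log):
    """Return True if the last reader message is an unmatched 'Read begin step'."""
    last = None
    for line in log.splitlines():
        text = line.strip()
        # Only the two messages that bracket the BeginStep() call matter here;
        # writer output and other reader messages are ignored.
        if text in ("Read begin step", "Done read begin step"):
            last = text
    return last == "Read begin step"
```

Applied to the captured output of a run, it returns `True` for a log that ends like the one above and `False` for a run in which every `BeginStep()` call returned.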
@eisenhauer (Eisenhauer, Greg)