SST + compression bug
Created by: pnorbert
adios_iotest can be used to demonstrate the bug: the reader side eventually segfaults because of bad reads. The configuration files are attached at the end of this issue.
Run the two commands at the same time in different terminals.
Writer
$ mpirun -n 2 /opt/adios2/bin/adios_iotest -a 1 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 2
Step 1:
Step 2:
Step 3:
Reader
$ mpirun -n 1 /opt/adios2/bin/adios_iotest -a 2 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 1
Step 1:
Step 2:
Step 3:
Step 4:
[adiosVM:08487] *** Process received signal ***
[adiosVM:08487] Signal: Segmentation fault (11)
[adiosVM:08487] Signal code: (128)
[adiosVM:08487] Failing at address: (nil)
The number of steps the reader completes before crashing is random. The output data shows that the reads are wrong: the data contains only one process's output (twice). Variables a, b, and c are compressed, and d is uncompressed. d is correct and the others are wrong, although all four should be identical. Compare a and d:
$ bpls -l stream_T2.bp -D a
double a 3*{64, 32, 32} = 1 / 1.2
step 0:
block 0: [ 0:63, 0:31, 0:31] = 1 / 1
step 1:
block 0: [ 0:63, 0:31, 0:31] = 1.1 / 1.1
step 2:
block 0: [ 0:63, 0:31, 0:31] = 1.2 / 1.2
$ bpls -l stream_T2.bp -D d
double d 3*{64, 32, 32} = 0 / 1.2
step 0:
block 0: [ 0:63, 0:31, 0:31] = 0 / 1
step 1:
block 0: [ 0:63, 0:31, 0:31] = 0.1 / 1.1
step 2:
block 0: [ 0:63, 0:31, 0:31] = 0.2 / 1.2
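The symptom can also be checked programmatically. The following is a minimal C++ sketch. It assumes the reader holds each variable's global 64x32x32 array in a row-major std::vector<double>, and that the two writer blocks are contiguous along the first (slowest) dimension. It also assumes the accuracy parameter from the XML acts as an absolute error bound. WithinAccuracy and BlocksDuplicated are hypothetical helpers, not part of ADIOS2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True if every value read back (e.g. decompressed variable a) lies within
// `accuracy` of the reference (e.g. uncompressed variable d).
// Assumes `accuracy` is an absolute error bound.
bool WithinAccuracy(const std::vector<double> &got,
                    const std::vector<double> &ref, double accuracy)
{
    assert(got.size() == ref.size()); // both must cover the same selection
    for (std::size_t i = 0; i < got.size(); ++i)
    {
        if (std::fabs(got[i] - ref[i]) > accuracy)
            return false;
    }
    return true;
}

// True if the global array, split into `nblocks` equal contiguous writer
// blocks along the slowest dimension, is the same block repeated: the
// "one process's output, twice" symptom.
bool BlocksDuplicated(const std::vector<double> &global, std::size_t nblocks)
{
    assert(nblocks > 0 && global.size() % nblocks == 0);
    const std::size_t blockElems = global.size() / nblocks;
    for (std::size_t b = 1; b < nblocks; ++b)
    {
        for (std::size_t i = 0; i < blockElems; ++i)
        {
            if (global[b * blockElems + i] != global[i])
                return false;
        }
    }
    return true;
}
```

With the step 0 values above, BlocksDuplicated(a, 2) would be true, because a is 1 everywhere. If d's two halves hold 0 and 1, as its min/max suggest, BlocksDuplicated(d, 2) would be false.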
If I run the reader under valgrind, it reports an invalid write already during the very first read:
$ valgrind /opt/adios2/bin/adios_iotest -a 2 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 1
==8536== Memcheck, a memory error detector
==8536== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8536== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==8536== Command: /opt/adios2/bin/adios_iotest -a 2 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 1
==8536==
Step 1:
==8536== Thread 3:
==8536== Invalid write of size 8
==8536== at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536== by 0x7DA994E: EvpathReadReplyHandler (evpath_dp.c:343)
==8536== by 0x97DE9BA: CMact_on_data (cm.c:2554)
==8536== by 0x97DC581: CMDataAvailable (cm.c:2074)
==8536== by 0x1454500A: socket_select (cmselect.c:449)
==8536== by 0x14546406: libcmselect_LTX_blocking_function (cmselect.c:1050)
==8536== by 0x97D4D38: CMcontrol_list_wait (cm.c:676)
==8536== by 0x97D2C8C: CMpoll_forever (cm.c:172)
==8536== by 0x97D2F38: server_thread_func (cm.c:195)
==8536== by 0x755C6B9: start_thread (pthread_create.c:333)
==8536== by 0x787941C: clone (clone.S:109)
==8536== Address 0x10f39fa0 is 0 bytes inside a block of size 80 free'd
==8536== at 0x4C2F24B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536== by 0x42A877: __gnu_cxx::new_allocator<char>::deallocate(char*, unsigned long) (new_allocator.h:110)
==8536== by 0x428438: std::allocator_traits<std::allocator<char> >::deallocate(std::allocator<char>&, char*, unsigned long) (alloc_traits.h:517)
==8536== by 0x425A85: std::_Vector_base<char, std::allocator<char> >::_M_deallocate(char*, unsigned long) (stl_vector.h:178)
==8536== by 0x595A860: std::vector<char, std::allocator<char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (vector.tcc:527)
==8536== by 0x595A455: std::vector<char, std::allocator<char> >::insert(__gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (stl_vector.h:1054)
==8536== by 0x595A2B8: std::vector<char, std::allocator<char> >::resize(unsigned long, char const&) (stl_vector.h:696)
==8536== by 0x5A14A1C: void adios2::format::BP3Deserializer::PreDataRead<double>(adios2::core::Variable<double>&, adios2::core::Variable<double>::Info&, adios2::helper::SubStreamBoxInfo const&, char*&, unsigned long&, unsigned long&, unsigned long) (BP3Deserializer.tcc:492)
==8536== by 0x5C73170: void adios2::core::engine::SstReader::ReadVariableBlocks<double>(adios2::core::Variable<double>&) (SstReader.tcc:56)
==8536== by 0x5C678A0: adios2::core::engine::SstReader::PerformGets() (in /opt/adios2/lib/libadios2.so.2.3.1)
==8536== by 0x5C6314A: adios2::core::engine::SstReader::EndStep() (SstReader.cpp:305)
==8536== by 0x5D20625: adios2::Engine::EndStep() (Engine.cpp:100)
==8536== Block was alloc'd at
==8536== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536== by 0x4383F7: __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (new_allocator.h:104)
==8536== by 0x43811B: std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (alloc_traits.h:491)
==8536== by 0x437D85: std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) (stl_vector.h:170)
==8536== by 0x595A74B: std::vector<char, std::allocator<char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (vector.tcc:491)
==8536== by 0x595A455: std::vector<char, std::allocator<char> >::insert(__gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (stl_vector.h:1054)
==8536== by 0x595A2B8: std::vector<char, std::allocator<char> >::resize(unsigned long, char const&) (stl_vector.h:696)
==8536== by 0x5A14A1C: void adios2::format::BP3Deserializer::PreDataRead<double>(adios2::core::Variable<double>&, adios2::core::Variable<double>::Info&, adios2::helper::SubStreamBoxInfo const&, char*&, unsigned long&, unsigned long&, unsigned long) (BP3Deserializer.tcc:492)
==8536== by 0x5C73170: void adios2::core::engine::SstReader::ReadVariableBlocks<double>(adios2::core::Variable<double>&) (SstReader.tcc:56)
==8536== by 0x5C678A0: adios2::core::engine::SstReader::PerformGets() (in /opt/adios2/lib/libadios2.so.2.3.1)
==8536== by 0x5C6314A: adios2::core::engine::SstReader::EndStep() (SstReader.cpp:305)
==8536== by 0x5D20625: adios2::Engine::EndStep() (Engine.cpp:100)
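In the trace, both the freed block and the later write target come from std::vector::resize inside BP3Deserializer::PreDataRead. This suggests that a staging buffer is resized, and therefore reallocated, while an earlier SST read into it is still pending. EvpathReadReplyHandler would then memcpy into freed memory. This is an inference from the trace, not a confirmed diagnosis. The C++ sketch below illustrates that failure mode and one safer alternative. PendingRead, SharedStagingReallocates, and StageAndDeliver are hypothetical names, and reply delivery is simulated with memcpy rather than EVPath.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical model of an asynchronous read request: the transport keeps
// only a raw destination pointer and fills it later, when the reply arrives.
struct PendingRead
{
    char *dest;
    std::size_t size;
};

// Unsafe pattern (assumed): one std::vector<char> is reused as the staging
// buffer for every block. Returns true if some resize() would have to
// reallocate, and thus free the storage an earlier pending read points into.
bool SharedStagingReallocates(const std::vector<std::size_t> &blockSizes)
{
    std::vector<char> staging;
    bool readPending = false;
    for (std::size_t size : blockSizes)
    {
        // resize() past capacity() must reallocate, which invalidates every
        // pointer previously taken from staging.data().
        if (readPending && size > staging.capacity())
            return true;
        staging.resize(size);
        readPending = true; // a read into staging.data() is now outstanding
    }
    return false;
}

// Safer pattern: one buffer per outstanding read. std::deque::emplace_back
// never relocates existing elements, so earlier dest pointers stay valid
// until every reply has been delivered.
std::deque<std::vector<char>>
StageAndDeliver(const std::vector<std::vector<char>> &replies)
{
    std::deque<std::vector<char>> buffers;
    std::vector<PendingRead> pending;
    for (const auto &reply : replies)
    {
        buffers.emplace_back(reply.size());
        pending.push_back({buffers.back().data(), reply.size()});
    }
    // Simulated replies arrive only after all requests have been issued.
    for (std::size_t i = 0; i < pending.size(); ++i)
    {
        assert(pending[i].size == replies[i].size());
        if (pending[i].size > 0)
            std::memcpy(pending[i].dest, replies[i].data(), pending[i].size);
    }
    return buffers;
}
```

If a later block is larger than the shared buffer's capacity, for example SharedStagingReallocates({80, 1000000}), the check returns true. The per-read buffers in StageAndDeliver stay valid no matter how many reads are queued. They cost extra memory, but they stop pointer lifetime from depending on resize().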
sst-compression-bug.xml
<?xml version="1.0"?>
<adios-config>
<io name="io_T1">
<engine type="SST">
<parameter key="RendezvousReaderCount" value="1"/>
<parameter key="QueueLimit" value="1"/>
<parameter key="QueueFullPolicy" value="Block"/>
</engine>
<variable name="a">
<operation type="sz">
<parameter key="accuracy" value="0.00001"/>
</operation>
</variable>
<variable name="b">
<operation type="zfp">
<parameter key="accuracy" value="0.00001"/>
</operation>
</variable>
<variable name="c">
<operation type="mgard">
<parameter key="accuracy" value="0.00001"/>
</operation>
</variable>
</io>
<io name="io_T2_in">
<engine type="SST">
</engine>
</io>
<io name="io_T2_out">
<engine type="BP4">
</engine>
</io>
</adios-config>
sst-compression-bug.txt
group io_T1
# item type varname N [dim1 dim2 ... dimN decomp1 decomp2 ... decompN]
array double a 3 32 32 32 XYZ 1 1
array double b 3 32 32 32 XYZ 1 1
array double c 3 32 32 32 XYZ 1 1
array double d 3 32 32 32 XYZ 1 1
group io_T2_in
# item type varname N [dim1 dim2 ... dimN decomp1 decomp2 ... decompN]
array double a 3 64 32 32 XYZ 1 1
array double b 3 64 32 32 XYZ 1 1
array double c 3 64 32 32 XYZ 1 1
array double d 3 64 32 32 XYZ 1 1
group io_T2_out
# use all variables read into io_T2_in in the output
link group io_T2_in
app 1
steps 3
sleep 2.0
write stream_T1.bp io_T1
app 2
steps over stream_T1.bp
read next stream_T1.bp io_T2_in -1.0
cond stream_T1.bp sleep 1.0
cond stream_T1.bp write stream_T2.bp io_T2_out
sleep 0.1
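The dimensions in io_T1 and io_T2_in are meant to describe the same global array. The arithmetic can be sketched in C++ under two assumptions: -w (weak scaling) makes adios_iotest treat the listed dimensions as per-process sizes, and -d gives the number of processes along each decomposed dimension. The writer's 2 processes along X then each write 32x32x32, for a 64x32x32 global array, which the single reader process requests as one 64x32x32 block. GlobalShape and BlockBytes are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global shape under weak scaling: each process writes a `local` block and
// `procs[i]` processes are laid out along dimension i.
std::vector<std::size_t> GlobalShape(const std::vector<std::size_t> &local,
                                     const std::vector<std::size_t> &procs)
{
    assert(local.size() == procs.size());
    std::vector<std::size_t> global(local.size());
    for (std::size_t i = 0; i < local.size(); ++i)
        global[i] = local[i] * procs[i];
    return global;
}

// Uncompressed size in bytes of one block with the given dimensions.
std::size_t BlockBytes(const std::vector<std::size_t> &dims,
                       std::size_t elementSize)
{
    std::size_t bytes = elementSize;
    for (std::size_t d : dims)
        bytes *= d;
    return bytes;
}
```

For example, BlockBytes({32, 32, 32}, sizeof(double)) gives 262144. Under these assumptions, each writer block of a, b, c, or d is 262144 bytes before compression.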