SST + compression bug

Created by: pnorbert

adios_iotest can be used to demonstrate a bug in which the reader side eventually segfaults because of bad reads. The configuration files are included at the end of this issue.

Run the two commands at the same time in different terminals.

Writer

$ mpirun -n 2 /opt/adios2/bin/adios_iotest -a 1 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 2
Step 1: 
Step 2: 
Step 3: 

Reader

$ mpirun -n 1 /opt/adios2/bin/adios_iotest -a 2 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 1 
Step 1: 
Step 2: 
Step 3: 
Step 4: 
[adiosVM:08487] *** Process received signal ***
[adiosVM:08487] Signal: Segmentation fault (11)
[adiosVM:08487] Signal code:  (128)
[adiosVM:08487] Failing at address: (nil)

The number of steps the reader completes before crashing varies from run to run. The output data shows that the reads were wrong: the data contains only one writer process's output, repeated twice. Variables a, b and c are compressed; d is uncompressed. d is correct and the others are wrong, although all four should be identical.

$ bpls -l stream_T2.bp -D a
  double   a     3*{64, 32, 32} = 1 / 1.2
        step 0: 
          block 0: [ 0:63,  0:31,  0:31] = 1 / 1
        step 1: 
          block 0: [ 0:63,  0:31,  0:31] = 1.1 / 1.1
        step 2: 
          block 0: [ 0:63,  0:31,  0:31] = 1.2 / 1.2

$ bpls -l stream_T2.bp -D d
  double   d     3*{64, 32, 32} = 0 / 1.2
        step 0: 
          block 0: [ 0:63,  0:31,  0:31] = 0 / 1
        step 1: 
          block 0: [ 0:63,  0:31,  0:31] = 0.1 / 1.1
        step 2: 
          block 0: [ 0:63,  0:31,  0:31] = 0.2 / 1.2

Running the reader under valgrind shows an error in the very first read:

$ valgrind /opt/adios2/bin/adios_iotest -a 2 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 1 
==8536== Memcheck, a memory error detector
==8536== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8536== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==8536== Command: /opt/adios2/bin/adios_iotest -a 2 -c sst-compression-bug.txt -w -x sst-compression-bug.xml -d 1
==8536== 
Step 1: 
==8536== Thread 3:
==8536== Invalid write of size 8
==8536==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536==    by 0x7DA994E: EvpathReadReplyHandler (evpath_dp.c:343)
==8536==    by 0x97DE9BA: CMact_on_data (cm.c:2554)
==8536==    by 0x97DC581: CMDataAvailable (cm.c:2074)
==8536==    by 0x1454500A: socket_select (cmselect.c:449)
==8536==    by 0x14546406: libcmselect_LTX_blocking_function (cmselect.c:1050)
==8536==    by 0x97D4D38: CMcontrol_list_wait (cm.c:676)
==8536==    by 0x97D2C8C: CMpoll_forever (cm.c:172)
==8536==    by 0x97D2F38: server_thread_func (cm.c:195)
==8536==    by 0x755C6B9: start_thread (pthread_create.c:333)
==8536==    by 0x787941C: clone (clone.S:109)
==8536==  Address 0x10f39fa0 is 0 bytes inside a block of size 80 free'd
==8536==    at 0x4C2F24B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536==    by 0x42A877: __gnu_cxx::new_allocator<char>::deallocate(char*, unsigned long) (new_allocator.h:110)
==8536==    by 0x428438: std::allocator_traits<std::allocator<char> >::deallocate(std::allocator<char>&, char*, unsigned long) (alloc_traits.h:517)
==8536==    by 0x425A85: std::_Vector_base<char, std::allocator<char> >::_M_deallocate(char*, unsigned long) (stl_vector.h:178)
==8536==    by 0x595A860: std::vector<char, std::allocator<char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (vector.tcc:527)
==8536==    by 0x595A455: std::vector<char, std::allocator<char> >::insert(__gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (stl_vector.h:1054)
==8536==    by 0x595A2B8: std::vector<char, std::allocator<char> >::resize(unsigned long, char const&) (stl_vector.h:696)
==8536==    by 0x5A14A1C: void adios2::format::BP3Deserializer::PreDataRead<double>(adios2::core::Variable<double>&, adios2::core::Variable<double>::Info&, adios2::helper::SubStreamBoxInfo const&, char*&, unsigned long&, unsigned long&, unsigned long) (BP3Deserializer.tcc:492)
==8536==    by 0x5C73170: void adios2::core::engine::SstReader::ReadVariableBlocks<double>(adios2::core::Variable<double>&) (SstReader.tcc:56)
==8536==    by 0x5C678A0: adios2::core::engine::SstReader::PerformGets() (in /opt/adios2/lib/libadios2.so.2.3.1)
==8536==    by 0x5C6314A: adios2::core::engine::SstReader::EndStep() (SstReader.cpp:305)
==8536==    by 0x5D20625: adios2::Engine::EndStep() (Engine.cpp:100)
==8536==  Block was alloc'd at
==8536==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536==    by 0x4383F7: __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (new_allocator.h:104)
==8536==    by 0x43811B: std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (alloc_traits.h:491)
==8536==    by 0x437D85: std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) (stl_vector.h:170)
==8536==    by 0x595A74B: std::vector<char, std::allocator<char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (vector.tcc:491)
==8536==    by 0x595A455: std::vector<char, std::allocator<char> >::insert(__gnu_cxx::__normal_iterator<char const*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (stl_vector.h:1054)
==8536==    by 0x595A2B8: std::vector<char, std::allocator<char> >::resize(unsigned long, char const&) (stl_vector.h:696)
==8536==    by 0x5A14A1C: void adios2::format::BP3Deserializer::PreDataRead<double>(adios2::core::Variable<double>&, adios2::core::Variable<double>::Info&, adios2::helper::SubStreamBoxInfo const&, char*&, unsigned long&, unsigned long&, unsigned long) (BP3Deserializer.tcc:492)
==8536==    by 0x5C73170: void adios2::core::engine::SstReader::ReadVariableBlocks<double>(adios2::core::Variable<double>&) (SstReader.tcc:56)
==8536==    by 0x5C678A0: adios2::core::engine::SstReader::PerformGets() (in /opt/adios2/lib/libadios2.so.2.3.1)
==8536==    by 0x5C6314A: adios2::core::engine::SstReader::EndStep() (SstReader.cpp:305)
==8536==    by 0x5D20625: adios2::Engine::EndStep() (Engine.cpp:100)
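
The trace above is consistent with a destination pointer being taken from a std::vector<char> before that vector is resized again: the block is both allocated and freed by vector::resize inside BP3Deserializer::PreDataRead, and the later memcpy in EvpathReadReplyHandler writes to the old address. The following is only a minimal sketch of that general hazard, not the ADIOS2 code; the names ResizeResult and GrowAfterHandingOutPointer are hypothetical, and the size 80 in the usage note is borrowed from the block size in the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of one experiment: did growing the vector move its storage?
struct ResizeResult
{
    std::uintptr_t before; // buffer address handed out before the resize
    std::uintptr_t after;  // buffer address after the resize
    bool moved;            // true if 'before' no longer refers to the vector
};

// Hypothetical stand-in for a destination pointer that is registered for an
// asynchronous copy and then invalidated by a later resize of the same vector.
// The old address is kept only as an integer and is never dereferenced.
ResizeResult GrowAfterHandingOutPointer(std::size_t initialSize)
{
    std::vector<char> buffer(initialSize);
    assert(buffer.size() == initialSize);

    // Step 1: remember where a pending copy would have been told to write.
    const auto before = reinterpret_cast<std::uintptr_t>(buffer.data());

    // Step 2: grow past the current capacity, which forces a reallocation
    // and frees the storage that 'before' referred to.
    buffer.resize(buffer.capacity() + 1);

    const auto after = reinterpret_cast<std::uintptr_t>(buffer.data());
    return {before, after, before != after};
}
```

Calling GrowAfterHandingOutPointer(80) returns a result with moved set to true. A copy that still targets the old address after the resize writes into freed memory, which is the kind of access Memcheck reports as an invalid write inside a free'd block. Whether this is the actual mechanism in the SST reader would need to be confirmed against the source.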

sst-compression-bug.xml

<?xml version="1.0"?>
<adios-config>
    <io name="io_T1">
        <engine type="SST">
            <parameter key="RendezvousReaderCount" value="1"/>
            <parameter key="QueueLimit" value="1"/>
            <parameter key="QueueFullPolicy" value="Block"/>
        </engine>

        <variable name="a">
            <operation type="sz">
                <parameter key="accuracy" value="0.00001"/>
            </operation>
        </variable>
        <variable name="b">
            <operation type="zfp">
                <parameter key="accuracy" value="0.00001"/>
            </operation>
        </variable>
        <variable name="c">
            <operation type="mgard">
                <parameter key="accuracy" value="0.00001"/>
            </operation>
        </variable>
    </io>
    
    <io name="io_T2_in">
        <engine type="SST">
        </engine>
    </io>

    <io name="io_T2_out">
        <engine type="BP4">
        </engine>
    </io>
</adios-config>

sst-compression-bug.txt

group  io_T1
  # item  type    varname     N   [dim1 dim2 ... dimN  decomp1 decomp2 ... decompN]
  array   double  a           3    32    32    32      XYZ     1       1
  array   double  b           3    32    32    32      XYZ     1       1
  array   double  c           3    32    32    32      XYZ     1       1
  array   double  d           3    32    32    32      XYZ     1       1

group  io_T2_in
  # item  type    varname     N   [dim1 dim2 ... dimN  decomp1 decomp2 ... decompN]
  array   double  a           3    64    32    32      XYZ     1       1
  array   double  b           3    64    32    32      XYZ     1       1
  array   double  c           3    64    32    32      XYZ     1       1
  array   double  d           3    64    32    32      XYZ     1       1

group  io_T2_out
  # use all variables read into io_T2_in in the output 
  link group io_T2_in

app 1
  steps   3
  sleep   2.0 
  write   stream_T1.bp    io_T1

app 2
  steps   over stream_T1.bp   
  read  next  stream_T1.bp    io_T2_in  -1.0  
  cond stream_T1.bp   sleep   1.0     
  cond stream_T1.bp   write   stream_T2.bp    io_T2_out  
  sleep   0.1