SST hangs for heat2d code on Titan
Created by: swatisgupta
I am trying to run the heat2d (cpp) example with the SST engine, but both codes appear to hang while trying to connect to each other (the same setup works fine with the BPFile engine). The output and runtime configuration are as follows:
(1) aprun -n 12 ./heatSimulation sim.bp 4 3 5 10 10 10
Process decomposition : 4 x 3
Array size per process : 5 x 10
Number of output steps : 10
Iterations per step : 10
Simulation step 0: initialization
Simulation step 1
Simulation step 2
Simulation step 3
Simulation step 4
Simulation step 5
Simulation step 6
Simulation step 7
Simulation step 8
Simulation step 9
Total runtime = 37.5355s
(2) aprun -n 2 ./heatAnalysis sim.bp analysis.bp 2 1
rank 1 reads 2D slice gndx = 20 10 x 30 from offset (10,0)
gndy = 30
rank 0 reads 2D slice 10 x 30 from offset (0,0)
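As a sanity check on the numbers in (1) and (2), the sketch below recomputes the global array shape and the per-rank read blocks. It is only an illustration: the function names are hypothetical, and it assumes an even block split with reader rank = posy * npx + posx, which matches the offsets printed above but is not taken from the heat2d source.

```python
def global_shape(npx, npy, ndx, ndy):
    """Global array shape for an npx x npy process grid of ndx x ndy local blocks."""
    return npx * ndx, npy * ndy


def reader_blocks(gndx, gndy, npx, npy):
    """Return (rank, count_x, count_y, offset_x, offset_y) for each reader rank.

    Assumption: the global array divides evenly and ranks are laid out
    with x varying fastest (rank = posy * npx + posx).
    """
    ndx, ndy = gndx // npx, gndy // npy
    blocks = []
    for rank in range(npx * npy):
        posx, posy = rank % npx, rank // npx
        blocks.append((rank, ndx, ndy, posx * ndx, posy * ndy))
    return blocks


# Writer side from (1): 4 x 3 processes, 5 x 10 elements per process.
gndx, gndy = global_shape(4, 3, 5, 10)
print("gndx =", gndx, "gndy =", gndy)

# Reader side from (2): 2 x 1 processes.
for rank, nx, ny, ox, oy in reader_blocks(gndx, gndy, 2, 1):
    print(f"rank {rank} reads 2D slice {nx} x {ny} from offset ({ox},{oy})")
```

Under these assumptions the reader output in (2) is consistent with the writer decomposition in (1), so the hang does not appear to come from a mismatch in array sizes.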
(3) Contents of adios2.xml
<?xml version="1.0"?>
<adios-config>
<io name="SimulationOutput">
<engine type="SST">
<parameter key="DataTransport" value="wan"/>
</engine>
</io>
<io name="AnalysisOutput">
<engine type="SST">
<parameter key="DataTransport" value="wan"/>
</engine>
</io>
<io name="VizInput">
<engine type="SST">
<parameter key="DataTransport" value="wan"/>
</engine>
</io>
</adios-config>
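To double-check which engine and transport each IO block requests, a short standard-library sketch such as the following can summarize the file. It does not use ADIOS2 itself; the helper name `summarize_engines` is hypothetical, and the XML is embedded as a string here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Copy of the adios2.xml shown above, embedded for a self-contained example.
CONFIG = """<?xml version="1.0"?>
<adios-config>
<io name="SimulationOutput">
<engine type="SST">
<parameter key="DataTransport" value="wan"/>
</engine>
</io>
<io name="AnalysisOutput">
<engine type="SST">
<parameter key="DataTransport" value="wan"/>
</engine>
</io>
<io name="VizInput">
<engine type="SST">
<parameter key="DataTransport" value="wan"/>
</engine>
</io>
</adios-config>
"""


def summarize_engines(xml_text):
    """Map each io name to (engine type, {parameter key: value})."""
    root = ET.fromstring(xml_text)
    summary = {}
    for io in root.findall("io"):
        engine = io.find("engine")
        if engine is None:
            continue  # io block without an explicit engine element
        params = {p.get("key"): p.get("value") for p in engine.findall("parameter")}
        summary[io.get("name")] = (engine.get("type"), params)
    return summary


for name, (engine_type, params) in summarize_engines(CONFIG).items():
    print(name, engine_type, params)
```

For the file above this reports that all three IO blocks, including AnalysisOutput, are set to SST with DataTransport=wan.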
(4) Changes to heat2d Makefile:
override ADIOS_INC=-I/lustre/atlas2/csc143/scratch/ssinghal/BUILDS/include
override ADIOS_LIB=-L/lustre/atlas2/csc143/scratch/ssinghal/BUILDS/lib -ladios2 -L/usr/lib64
override ADIOS_LIB+= -ladios2_sst -ladios2_evpath -ladios2_enet -ladios2_dill -ladios2_ffs -ladios2_dill -lfabric -ladios2_atl -lzmq -ldl -lbz2
(5) List of loaded modules:
- modules/3.2.10.6
- nodestat/2.2-1.0502.60539.1.31.gem
- sdb/1.1-1.0502.63652.4.27.gem
- alps/5.2.4-2.0502.9950.37.1.gem
- lustre-cray_gem_s/2.8.2_3.0.101_0.46.1_1.0502.8871-1.0502.0.32.1
- udreg/2.3.2-1.0502.10518.2.17.gem
- ugni/6.0-1.0502.10863.8.28.gem
- gni-headers/4.0-1.0502.10859.9.19.gem
- dmapp/7.0.1-1.0502.11080.8.74.gem
- xpmem/0.1-2.0502.64982.7.19.gem
- hss-llm/7.2.0
- Base-opts/1.0.2-1.0502.60680.2.4.gem
- cray-mpich/7.6.3
- craype-network-gemini
- craype-interlagos
- craype/2.5.13
- lustredu/1.4
- xalt/0.7.5
- git/2.13.0
- module_msg/0.1
- modulator/1.2.0
- hsi/5.0.2.p1
- DefApps
- gcc/7.3.0
- cmake3/3.6.0
- cray-libsci/16.11.1
- pmi/5.0.9-1.0000.10911.175.4.gem
- atp/2.1.1
- PrgEnv-gnu/5.2.82
(6) ADIOS cmake configuration options used:
cmake -DCMAKE_INSTALL_PREFIX=/lustre/atlas/scratch/ssinghal/csc143/BUILDS ../ADIOS2
-- Cray Programming Environment 2.5.13 C
-- Cray Programming Environment 2.5.13 CXX
-- Could NOT find ZFP (missing: ZFP_LIBRARY ZFP_INCLUDE_DIR)
-- Could NOT find SZ (missing: SZ_LIBRARY SZ_INCLUDE_DIR)
-- Cray Programming Environment 2.5.13 Fortran
-- Found MPI: TRUE (found version "3.1") found components: C Fortran
-- Could NOT find HDF5 (missing: HDF5_LIBRARIES HDF5_INCLUDE_DIRS C)
-- Checking for module 'libfabric>=1.6'
-- Found libfabric, version 1.6.1
-- ADIOS2 ThirdParty: Configuring KWSys
-- ADIOS2 ThirdParty: Configuring GTest
-- ADIOS2 ThirdParty: Configuring pugixml
-- ADIOS2 ThirdParty: Configuring nlohmann_json
-- Using the single-header code from /lustre/atlas/scratch/ssinghal/csc143/ADIOS2/thirdparty/nlohmann_json/nlohmann_json/single_include/
-- ADIOS2 ThirdParty: Configuring atl
-- Found atl: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/atl/atl/atl-config.cmake (found version "2.2.1")
-- ADIOS2 ThirdParty: Configuring dill
-- Found dill: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/dill/dill/dill-config.cmake (found version "2.3.2")
-- ADIOS2 ThirdParty: Configuring ffs
-- Found dill: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/dill/dill/dill-config.cmake (found suitable version "2.3.2", minimum required is "2.3.1")
-- Found atl: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/atl/atl/atl-config.cmake (found suitable version "2.2.1", minimum required is "2.2.1")
-- Found ffs: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/ffs/ffs/ffs-config.cmake (found version "1.5.2")
-- ADIOS2 ThirdParty: Configuring enet
-- Found enet: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/enet/enet/enet-config.cmake (found version "1.3.13")
-- ADIOS2 ThirdParty: Configuring EVPath
-- Found ffs: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/ffs/ffs/ffs-config.cmake (found suitable version "1.5.2", minimum required is "1.5.1")
-- Could NOT find nvml (missing: NVML_LIBRARY NVML_INCLUDE_DIR)
-- Found enet: /lustre/atlas/scratch/ssinghal/csc143/builds/thirdparty/enet/enet/enet-config.cmake (found suitable version "1.3.13", minimum required is "1.3.13")
-- Checking for module 'libfabric'
-- Found libfabric, version 1.6.1
-- Could NOT find nnti (missing: NNTI_INCLUDE_DIR NNTI_trios_nnti_LIBRARY NNTI_trios_support_LIBRARY)
-- Found MPI: TRUE (found version "3.1") found components: C

ADIOS2 build configuration:
  ADIOS Version: 2.2.0
  C++ Compiler : GNU 7.3.0 CrayPrgEnv
    /opt/cray/craype/2.5.13/bin/CC
  Fortran Compiler : GNU 7.3.0 CrayPrgEnv
    /opt/cray/craype/2.5.13/bin/ftn
  Installation prefix: /lustre/atlas/scratch/ssinghal/csc143/BUILDS
    bin: bin
    lib: lib
    include: include
    cmake: lib/cmake/adios2
  Features:
    Library Type: static (without PIC)
    Build Type: Release
    Testing: ON
    Build Options:
      BZip2 : ON
      ZFP : OFF
      SZ : OFF
      MPI : ON
      DataMan : ON
      SST : ON
      ZeroMQ : ON
      HDF5 : OFF
      ADIOS1 : OFF
      Python : OFF
      Fortran : ON
      SysVShMem: ON

-- Configuring done
-- Generating done
-- Build files have been written to: /lustre/atlas/scratch/ssinghal/csc143/builds