SST: RDMA data plane issues on InfiniBand
Created by: keichi
I found some issues in the RDMA data plane on InfiniBand.
RDMA DP fails to detect InfiniBand network interfaces
The RDMA DP decides whether an interface supports RDMA by checking if the name of its underlying fabric provider matches one of a set of predefined values (verbs, gni and psm2), unless a specific interface is forced via FABRIC_IFACE or DataInterface. However, the provider name can also be something like verbs;ofi_rxm or verbs;ofi_rxd, where ofi_rxm and ofi_rxd are utility providers layered on top of a base provider (see the libfabric docs on prov_name). In that case the exact-match check fails and the RDMA DP ignores the fabric interface, even though its base provider is verbs. For example, on Summit fi_info reports only layered provider names for FI_EP_RDM endpoints:
$ jsrun -n 1 $PROJWORK/csc143/keichi/libfabric-install/bin/fi_info -p verbs -t FI_EP_RDM
provider: verbs;ofi_rxm
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
provider: verbs;ofi_rxd
    fabric: IB-0xfe80000000000000
    domain: mlx5_1-dgram
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: verbs;ofi_rxd
    fabric: IB-0xfe80000000000000
    domain: mlx5_3-dgram
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: verbs;ofi_rxd
    fabric: IB-0xfe80000000000000
    domain: mlx5_0-dgram
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
provider: verbs;ofi_rxd
    fabric: IB-0xfe80000000000000
    domain: mlx5_2-dgram
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
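One possible way to accept layered provider names like those in the output above is to split prov_name on ';' and compare each component against the known base providers. The following is only a minimal sketch under that assumption: the helper name prov_name_has_rdma_base and the rdma_base_providers table are hypothetical and are not taken from the RDMA DP source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Base providers that, according to this issue, the RDMA DP accepts. */
static const char *const rdma_base_providers[] = {"verbs", "gni", "psm2"};

/*
 * Hypothetical helper (not part of the RDMA DP): returns true if any
 * ';'-separated component of prov_name is exactly equal to one of the
 * base providers listed above.
 */
static bool prov_name_has_rdma_base(const char *prov_name)
{
    const size_t n_bases =
        sizeof(rdma_base_providers) / sizeof(rdma_base_providers[0]);

    if (prov_name == NULL)
        return false;

    const char *component = prov_name;
    while (*component != '\0') {
        /* Length of the current component, up to the next ';' or the end. */
        size_t len = strcspn(component, ";");

        for (size_t i = 0; i < n_bases; i++) {
            const char *base = rdma_base_providers[i];
            if (strlen(base) == len && strncmp(component, base, len) == 0)
                return true;
        }

        /* Advance past this component and its trailing ';', if any. */
        component += len;
        if (*component == ';')
            component++;
    }
    return false;
}
```

With this check, both "verbs" and "verbs;ofi_rxm" are accepted regardless of where the base provider appears in the string, while a name that merely starts with "verbs" (for example a hypothetical "verbsx") is not.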
RDMA DP hangs on InfiniBand
The RDMA DP reader hangs inside RdmaWaitForCompletion() while synchronously waiting for RMA read requests to complete. This is because the RDMA DP does not explicitly request automatic progress (FI_PROGRESS_AUTO). If the provider chooses manual progress (FI_PROGRESS_MANUAL), the application needs to periodically call fi_cq_read() or a similar function to progress the communication. Currently only the reader side calls fi_cq_read() in the RDMA DP. Although all fabric providers are required to support automatic progress, some (including the verbs provider) choose manual progress by default. RMA reads won’t complete on these fabric providers unless the writer side also calls fi_cq_read().
Both issues listed above have been confirmed on Summit and on JAXA JSS2, using the latest libfabric release (v1.7.2).