## Summary
Add support for mlx5 Dynamically Connected (DC) transport, which enables scalable RDMA without per-peer QP setup. DC is critical for large clusters where RC (Reliable Connection) creates N² QP state.
## Motivation
With RC, each node needs a dedicated QP per remote connection. At 100+ nodes with 64 connections to each peer, that is 6400+ QPs per node. DC reduces this to 1 DCT (target) plus a small pool of DCI (initiator) QPs (typically one per worker thread) per node, regardless of cluster size.
## What DC provides

- **DCI (DC Initiator)**: client-side QP that can talk to any DCT on any remote node
- **DCT (DC Target)**: server-side QP that accepts traffic from any DCI
- **Per-WR addressing**: `wr_set_dc_addr(ah, remote_dctn, dc_key)` sets the target per work request, not per QP (see the sketch after this list)
- **`dc_key`**: 64-bit shared secret for authentication
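To make the addressing model concrete, here is a minimal sketch of how per-WR destinations could look on an extended send path. It is illustrative only: `DcInitiatorQueuePair`, `Peer`, `RegisteredBuffer`, and the builder methods are assumed names for this proposal, not existing sideway API.

```rust
// Illustrative sketch only: all sideway-side names below are assumed, not existing API.
struct Peer {
    ah: AddressHandle, // path to the remote port, resolved once per node
    dctn: u32,         // remote DCT number, exchanged out of band
    dc_key: u64,       // 64-bit DC access key the target expects
}

fn send_to_all(dci: &mut DcInitiatorQueuePair, payload: &RegisteredBuffer, peers: &[Peer]) {
    // One DCI reaches every peer: the destination is chosen per work request,
    // not fixed at connection time as it is with an RC QP.
    let mut batch = dci.start_post_send();
    for peer in peers {
        batch
            .construct_wr(/* wr_id */ 0, WorkRequestFlags::Signaled)
            .setup_send()
            .setup_sge(payload.lkey(), payload.addr(), payload.len())
            .wr_set_dc_addr(&peer.ah, peer.dctn, peer.dc_key);
    }
    batch.post().expect("post_send failed");
}
```

This mirrors the underlying C flow, where the opcode is set for the current work request first and `mlx5dv_wr_set_dc_addr()` then attaches the destination to that same request.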
## Required API additions

### rdma-mummy-sys (C bindings)

- Bind `mlx5dv.h` (see the binding sketch after this list), specifically:
  - `mlx5dv_create_qp()` for DCI/DCT creation
  - `mlx5dv_context` and `mlx5dv_query_device()` for capability detection
  - `MLX5DV_QP_INIT_ATTR_MASK_DC`, `MLX5DV_DCTYPE_DCT`, `MLX5DV_DCTYPE_DCI`
- DC-related fields in `mlx5dv_qp_init_attr`
- Link against the `libmlx5` provider library
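For orientation, the DC-specific declarations look roughly like this when written out by hand as Rust FFI. The constants, struct layouts, and signatures mirror `mlx5dv.h` from upstream rdma-core (newer releases place `dct_access_key` in a union with DCI-stream attributes); in practice they would presumably be generated by bindgen, and the `ibv_*` / `mlx5dv_qp_ex` types are assumed to come from the crate's existing verbs bindings.

```rust
// Sketch of the DC additions, mirroring rdma-core's mlx5dv.h. The ibv_* and
// mlx5dv_qp_ex types are assumed to already exist in the crate's bindings.
use crate::{ibv_ah, ibv_context, ibv_qp, ibv_qp_init_attr_ex, mlx5dv_qp_ex};

pub const MLX5DV_QP_INIT_ATTR_MASK_DC: u64 = 1 << 1; // comp_mask bit enabling dc_init_attr

#[repr(C)]
#[derive(Clone, Copy)]
pub enum mlx5dv_dc_type {
    MLX5DV_DCTYPE_DCT = 1,
    MLX5DV_DCTYPE_DCI = 2,
}

#[repr(C)]
pub struct mlx5dv_dc_init_attr {
    pub dc_type: mlx5dv_dc_type,
    pub dct_access_key: u64, // the 64-bit dc_key a DCT requires from initiators
}

#[repr(C)]
pub struct mlx5dv_qp_init_attr {
    pub comp_mask: u64,                    // enum mlx5dv_qp_init_attr_mask bits
    pub create_flags: u32,                 // enum mlx5dv_qp_create_flags
    pub dc_init_attr: mlx5dv_dc_init_attr, // read when ..._MASK_DC is set
    pub send_ops_flags: u64,               // enum mlx5dv_qp_create_send_ops_flags
}

extern "C" {
    // Creates mlx5-specific QPs (including DCI/DCT) from extended verbs attributes.
    pub fn mlx5dv_create_qp(
        context: *mut ibv_context,
        qp_attr: *mut ibv_qp_init_attr_ex,
        mlx5_qp_attr: *mut mlx5dv_qp_init_attr,
    ) -> *mut ibv_qp;

    // Attaches the DC destination (AH, remote DCT number, DC key) to the current WR.
    pub fn mlx5dv_wr_set_dc_addr(
        mqp: *mut mlx5dv_qp_ex,
        ah: *mut ibv_ah,
        remote_dctn: u32,
        remote_dc_key: u64,
    );
}
```

On the verbs side both DC flavors are created with `qp_type = IBV_QPT_DRIVER`; the DCT additionally expects an SRQ and the `dct_access_key`.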
### sideway (Rust API)

- `Mlx5DeviceContext` to query DC capabilities
- `DcInitiatorQueuePair` / `DcTargetQueuePair` types created via `mlx5dv_create_qp()`
- `wr_set_dc_addr(ah, dctn, dc_key)` on the extended send path
- `QueuePairType::DcInitiator` / `QueuePairType::DcTarget` enum variants
- Feature-gated behind `#[cfg(feature = "mlx5")]`, since DC is Mellanox/NVIDIA-specific (see the usage sketch after this list)
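As a rough picture of how this could fit together end to end, a hypothetical sketch of the proposed surface is below: capability query, one DCT, one DCI, all behind the feature gate. Every sideway-side name here (`Mlx5DeviceContext`, `supports_dc`, the builder methods, `dct_number`, `publish_dct_info`, the error variants) is assumed for illustration and does not exist yet.

```rust
// Hypothetical proposed usage; none of these sideway-side names exist yet.
#[cfg(feature = "mlx5")]
fn setup_dc(
    ctx: &Mlx5DeviceContext,
    pd: &ProtectionDomain,
    cq: &CompletionQueue,
) -> Result<(), Error> {
    // Capability detection (mlx5dv_query_device() under the hood).
    if !ctx.supports_dc() {
        return Err(Error::DcNotSupported);
    }

    // Shared secret distributed out of band to every node in the cluster.
    const DC_KEY: u64 = 0x1122_3344_5566_7788;

    // One DCT per node accepts traffic from any DCI that presents DC_KEY.
    let dct = ctx
        .create_queue_pair(QueuePairType::DcTarget)
        .dc_key(DC_KEY)
        .build(pd, cq)?;

    // A small pool of DCIs (one shown) can each address any DCT in the cluster.
    let _dci: DcInitiatorQueuePair = ctx
        .create_queue_pair(QueuePairType::DcInitiator)
        .build(pd, cq)?;

    // Peers only need our DCT number and the key to reach us; no per-peer QP exchange.
    // publish_dct_info() is a placeholder for the application's own exchange mechanism.
    publish_dct_info(dct.dct_number(), DC_KEY);
    Ok(())
}
```

Keeping the DC variants behind `QueuePairType` plus `#[cfg(feature = "mlx5")]` leaves the portable RC/UD paths untouched for non-mlx5 providers.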
## Test environment
- ConnectX-6 (`vendor_part_id` 4129), firmware 28.44.1036
- MLNX_OFED 25.01, rdma-core 2501mlnx56
- `mlx5dv.h` present, with `MLX5DV_QP_INIT_ATTR_MASK_DC` confirmed
Happy to contribute a PR for this. Can test on real ConnectX-6 hardware.