fix: relay reconnection does not trigger when relay is down on startup #899

@rekmarks

Problem

When a kernel initializes remote comms with a known relay that is currently unreachable, the libp2p node starts successfully but has no /p2p-circuit address. Relay-based connections (both inbound and outbound) are unavailable, and the kernel does not recover when the relay comes back up.

The custom relay reconnection logic in ConnectionFactory (packages/ocap-kernel/src/remotes/platform/connection-factory.ts) only triggers on connection:close events:

this.#libp2p.addEventListener('connection:close', (evt) => {
  const remotePeerId = evt.detail.remotePeer.toString();
  if (this.#relayPeerIds.has(remotePeerId)) {
    this.#scheduleRelayReconnect(remotePeerId);
  }
});

A connection:close event is never emitted for a relay that was never successfully connected, so #scheduleRelayReconnect and #reconnectRelay are never invoked. libp2p's autoDial does not reliably compensate for this; in practice, the kernel remains in a broken state with respect to that relay until it is restarted.

Expected behavior

If a relay is unreachable on startup, the kernel should actively retry connecting to it with the same exponential-backoff mechanism used for post-connection relay dropouts (#reconnectRelay, base delay 5s, max delay 60s, max 10 attempts), and recover automatically when the relay comes back up.
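To make the quoted parameters concrete, here is a minimal sketch of the delay schedule they imply. It assumes plain doubling per attempt with no jitter, which this issue does not confirm; `relayBackoffDelayMs` and the constant names are hypothetical and are not taken from connection-factory.ts.

```typescript
// Values quoted in this issue for #reconnectRelay; the names are hypothetical.
const BASE_DELAY_MS = 5000;
const MAX_DELAY_MS = 60000;
const MAX_ATTEMPTS = 10;

/**
 * Delay to wait before the given 1-based attempt, or undefined once the
 * attempt budget is exhausted.
 * ASSUMPTION: the delay doubles on each attempt and is capped at
 * MAX_DELAY_MS; the real implementation may use a different curve or jitter.
 */
function relayBackoffDelayMs(attempt: number): number | undefined {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError('attempt must be a positive integer');
  }
  if (attempt > MAX_ATTEMPTS) {
    return undefined;
  }
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}
```

Under these assumptions the first five delays are 5 s, 10 s, 20 s, 40 s, and 60 s, and every later attempt up to the tenth also waits 60 s.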

Suggested fix

After libp2p.start(), check whether each known relay is connected. For any relay not yet connected, call #scheduleRelayReconnect immediately so the retry path is exercised symmetrically for both startup failures and post-connection dropouts.
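The startup check could look roughly like the sketch below. It is written against a minimal structural type rather than the real ConnectionFactory, so the helper names (`findUnconnectedRelays`, `scheduleStartupRelayReconnects`) and the `scheduleReconnect` callback are hypothetical stand-ins for the private #scheduleRelayReconnect method. The only libp2p call assumed is `getPeers()`, which lists the peer IDs that currently have open connections.

```typescript
// Minimal structural type covering only what this sketch needs from the node.
type PeerLister = { getPeers(): { toString(): string }[] };

/** Returns the known relay peer IDs that have no open connection right now. */
function findUnconnectedRelays(
  node: PeerLister,
  relayPeerIds: Iterable<string>,
): string[] {
  const connected = new Set(node.getPeers().map((peer) => peer.toString()));
  return Array.from(relayPeerIds).filter((id) => !connected.has(id));
}

/**
 * Intended to run once, right after the node has started.
 * scheduleReconnect is a hypothetical stand-in for #scheduleRelayReconnect.
 */
function scheduleStartupRelayReconnects(
  node: PeerLister,
  relayPeerIds: Iterable<string>,
  scheduleReconnect: (relayPeerId: string) => void,
): string[] {
  const missing = findUnconnectedRelays(node, relayPeerIds);
  for (const relayPeerId of missing) {
    scheduleReconnect(relayPeerId);
  }
  return missing;
}
```

Returning the list of unconnected relays is a convenience for logging and tests; the fix itself only needs the scheduling side effect.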

As a more robust alternative, a periodic watchdog could detect disconnected relays and reschedule reconnection for them.
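One possible shape for such a watchdog is sketched below. Everything here is hypothetical: the option names, the `isReconnectPending` guard (added so the watchdog does not stack a second retry on top of one already in flight), and the interval are illustrative assumptions, not part of the existing code.

```typescript
type RelayWatchdogOptions = {
  relayPeerIds: Iterable<string>;
  // Hypothetical probes and callback supplied by the caller.
  isConnected: (relayPeerId: string) => boolean;
  isReconnectPending: (relayPeerId: string) => boolean;
  scheduleReconnect: (relayPeerId: string) => void;
  intervalMs: number;
};

function createRelayWatchdog(options: RelayWatchdogOptions) {
  let timer: ReturnType<typeof setInterval> | undefined;

  // One sweep: reschedule every relay that is down and has no retry pending.
  const check = (): string[] => {
    const rescheduled: string[] = [];
    for (const relayPeerId of Array.from(options.relayPeerIds)) {
      if (
        !options.isConnected(relayPeerId) &&
        !options.isReconnectPending(relayPeerId)
      ) {
        options.scheduleReconnect(relayPeerId);
        rescheduled.push(relayPeerId);
      }
    }
    return rescheduled;
  };

  // start() is idempotent; stop() clears the timer so it cannot leak.
  const start = (): void => {
    if (timer === undefined) {
      timer = setInterval(check, options.intervalMs);
    }
  };
  const stop = (): void => {
    if (timer !== undefined) {
      clearInterval(timer);
      timer = undefined;
    }
  };

  return { check, start, stop };
}
```

A design question this sketch leaves open is how the watchdog should interact with the 10-attempt cap: if a sweep reschedules a relay whose attempts are exhausted, the cap effectively becomes per cycle rather than absolute.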

Affected file

packages/ocap-kernel/src/remotes/platform/connection-factory.ts
