Releases: TeoSlayer/pilotprotocol

v1.9.0-rc2

28 Apr 00:12

v1.9.0-rc2 Pre-release

v1.9.0-rc2 — Release Notes

Scope: 11 commits since v1.9.0-rc1 (2026-04-24 → 2026-04-27).

The headline of RC2 is that P1-010 is closed (both halves). P1-010 is the
tunnel-crypto-desync class of bugs that caused fire-and-forget RPCs
(task submit, send-results) to silently lose data across rekey windows.
P1-009 was closed as a side effect: the test_midrekey_* regression
repros now all pass cleanly.

P1-010 — tunnel desync recovery

Four commits, four layers of the fix:

1. Salvage replay (7f76e75)

Per-peer plaintext ring buffer in peerCrypto.salvage. Every
encrypted send copies the plaintext into the buffer. When
handleAuthKeyExchange (or unauth handleKeyExchange) installs a
fresh peerCrypto because the peer's pubkey changed, we re-encrypt
the recent plaintexts with the new key and re-send via writeFrame.

Recovers fire-and-forget RPCs that were sent under stale crypto
during the rekey window.

Webhook event: tunnel.desync_salvage with replay count.

2. Rekey retransmit loop (a126a26)

Per-peer pendingRekey state set on sendKeyExchangeToNode, cleared
on first successful inbound decrypt. New goroutine
rekeyRetransmitLoop retransmits stale entries every 4 s, capped at
5 retries. Closes the case where our key_exchange or peer's reply was
dropped on the wire — without this, the next chance to recover was
the 5-min RelayProbeInterval.

Webhook event: tunnel.rekey_gave_up if a peer hits the retransmit
cap (real reachability problem worth investigating).

3. Cap salvage at 4 entries (5de736f)

Original cap of 32 caused a replay-storm on rekey: the dataexchange
retransmit layer's churn filled the buffer, and on rekey all 32
frames went out in a tight loop. The receiver's freshly-installed
peerCrypto had maxRecvNonce = 0; out-of-order arrival of nonces
1..32 tripped the replay-window check. 4 covers the realistic shape
(task submit + send-results + a couple of dataexchange retries)
without overwhelming the receiver. Memory drops from ~48 KiB / peer
to ~6 KiB; ~6 MiB worst-case fleet-wide at maxCryptoPeers = 1024.
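
The memory figures above are consistent with roughly 1.5 KiB of salvaged plaintext per entry. That per-entry size is inferred from the quoted numbers rather than stated by the commit:

```latex
\begin{aligned}
\text{per entry} &\approx \frac{48\ \text{KiB}}{32} = 1.5\ \text{KiB} \\
\text{per peer (cap 4)} &\approx 4 \times 1.5\ \text{KiB} = 6\ \text{KiB} \\
\text{fleet-wide} &\approx 1024 \times 6\ \text{KiB} = 6144\ \text{KiB} = 6\ \text{MiB}
\end{aligned}
```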

4. Preserve nonce/replay state across duplicate key_exchange (d4d5f42)

The latent companion bug: handleAuthKeyExchange and
handleKeyExchange used to unconditionally replace tm.crypto[N]
with a freshly-derived peerCrypto, even when the peer's pubkey was
unchanged. Same shared secret, but reset nonce counter and empty
replay bitmap. Subsequent encrypted sends used counter 1, 2, 3...
while peer's pc[us] had a high maxRecvNonce from before, so
peer dropped them as "outside replay window."
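
To see why a reset counter is dropped, here is a minimal Go sketch of a generic sliding-window anti-replay filter (the bitmap style used by IPsec ESP). The 64-nonce window and the replayWindow type are assumptions for illustration, not the daemon's actual parameters. Once maxRecvNonce has climbed high, counters 1, 2, 3 from a freshly reset sender fall below the window and are rejected.

```go
package main

import "fmt"

const windowSize = 64 // hypothetical window width: one bitmap bit per nonce

// replayWindow is a generic sliding-window anti-replay filter.
type replayWindow struct {
	maxRecvNonce uint64 // highest nonce accepted so far
	bitmap       uint64 // bit i set => nonce maxRecvNonce-i already seen
}

// Accept reports whether nonce is fresh and, if so, records it.
func (w *replayWindow) Accept(nonce uint64) bool {
	switch {
	case nonce > w.maxRecvNonce:
		// Slide the window forward. In Go, shifting a uint64 by >= 64
		// yields 0, which correctly clears the window on a large jump.
		w.bitmap <<= nonce - w.maxRecvNonce
		w.bitmap |= 1
		w.maxRecvNonce = nonce
		return true
	case w.maxRecvNonce-nonce >= windowSize:
		return false // too old: "outside replay window"
	default:
		bit := uint64(1) << (w.maxRecvNonce - nonce)
		if w.bitmap&bit != 0 {
			return false // exact replay
		}
		w.bitmap |= bit
		return true
	}
}

func main() {
	var w replayWindow
	for n := uint64(1); n <= 1000; n++ {
		w.Accept(n) // long-lived tunnel: maxRecvNonce climbs to 1000
	}
	// The sender replaced its peerCrypto and restarted its counter at 1.
	fmt.Println(w.Accept(1), w.Accept(2), w.Accept(3)) // false false false
}
```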

Fix: replace only when there's no existing entry OR the pubkey
actually changed. Pinned by
TestHandleKeyExchangeDuplicatePreservesCryptoState.

(This same fix was attempted earlier in the session and reverted —
it was incompatible with the 32-entry salvage cap. After the cap
reduction in 5de736f, the preservation fix is safe.)
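
The replacement rule described for d4d5f42 reduces to one guard. In this Go sketch, peerCrypto is a hypothetical, trimmed-down stand-in for the real per-peer state, and installOnKeyExchange is an illustrative name; the point is that a duplicate key_exchange with an unchanged pubkey leaves the counters and replay state untouched.

```go
package main

import (
	"bytes"
	"fmt"
)

// peerCrypto is a hypothetical, trimmed-down per-peer tunnel state.
type peerCrypto struct {
	peerPub      []byte
	sendNonce    uint64 // our outbound counter
	maxRecvNonce uint64 // replay-window high-water mark
}

// installOnKeyExchange installs fresh state only when there is no entry yet
// or the peer's pubkey actually changed. It reports whether it installed.
func installOnKeyExchange(table map[uint32]*peerCrypto, node uint32, pub []byte) bool {
	if cur, ok := table[node]; ok && bytes.Equal(cur.peerPub, pub) {
		return false // duplicate key_exchange: keep counters and replay state
	}
	table[node] = &peerCrypto{peerPub: append([]byte(nil), pub...)}
	return true
}

func main() {
	table := make(map[uint32]*peerCrypto)
	installOnKeyExchange(table, 1, []byte("pubkey-A"))
	table[1].sendNonce = 500 // traffic flows for a while
	fmt.Println(installOnKeyExchange(table, 1, []byte("pubkey-A")), table[1].sendNonce) // false 500
	fmt.Println(installOnKeyExchange(table, 1, []byte("pubkey-B")), table[1].sendNonce) // true 0
}
```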

Tests added

  • pkg/daemon/tunnel_desync_salvage_test.go — 7 unit tests for the
    ring buffer (size + age bounds, copy-not-reference, nil-safety).
  • pkg/daemon/tunnel_rekey_retransmit_test.go — 9 unit tests for the
    retransmit state machine (mark, clear, stale, give-up cap).
  • pkg/daemon/tunnel_dup_keyexchange_test.go — 2 unit tests pinning
    duplicate-preservation and real-rekey-replacement.
  • tests/integration/local/test_tunnel_desync_recovery.sh — black-box
    test: establish tunnel, restart receiver, submit task immediately,
    assert it lands within 30 s.

Verified

Test                                        Result
go vet ./...                                clean
go test -parallel 4 ./pkg/... ./tests/      all 10 packages pass
test_tunnel_desync_recovery                 7/0
test_midrekey_send_file                     5/0 (was 4/1, P1-009 repro)
test_midrekey_send_message                  6/0 (was 5/1, P1-009 repro)
test_midrekey_task_results                  6/0 (was 4/2, P1-009 repro)
test_midrekey_task_submit                   5/0 (was 3/2, P1-009 repro)
test_peer_restarted_send_file               4/0 (was 3/1)
test_peer_restarted_send_message            5/0 (was 3/2)
test_sender_clean_restart_midflight         6/0
test_chaos_packet_loss                      9/0
test_force_relay_task                       4/0
Full integration suite (btjsk44z8, -j 8)    225/232, best of cycle

Other (pre-RC2 but post-RC1) work folded in

  • b4237e3 + 71e5f56 — 30 open-data network blueprints (academic,
    geo, health, news, finance, etc.). Already deployed to production
    registry as IDs 44–73. Open-join, default-allow, full inter-agent
    communication.
  • e7d9efb — untracked runtime artifacts that should never have been
    committed (test logs, results); added matching .gitignore rules.
  • aa0cb6a, 08b3a34, c843c4a, 22092c3 — website / blog content,
    no protocol impact.

Backwards compatibility

Wire-compatible with v1.8.0 and v1.9.0-rc1 daemons. No new packet
types, no new fields, no version negotiation. Mixed deployments are
safe. The retransmit loop and salvage replay are visible to peers as
ordinary key_exchange / encrypted frames respectively.

Known limitations (unchanged from RC1)

  • test_cli — env-gated (hits production agent-alpha).
  • test_register_identity_new_endpoint — architectural (container
    PID-1 dies on pkill). Needs Dockerfile init wrapper.
  • Vouching-chain transitive walk unimplemented.
  • Trust links global, not per-network (design choice).
  • DefaultVerdict policy field fails OPEN on pre-v1.9 daemons —
    upgrade daemons before pushing configs that rely on it.

Upgrade

No coordinated rollout needed, and old daemons keep working unchanged.
Daemons can be updated in any order; rendezvous → agents is recommended
only because of the DefaultVerdict fail-open risk noted above.

Full commit log

d4d5f42 Preserve nonce/replay state across duplicate key_exchange
c843c4a Add blog post: AI agent discovery: master P2P networks in 2026
5de736f Cap desync salvage at 4 entries (was 32) to prevent replay storm
22092c3 Hero: decorative boids flock behind the headline
a126a26 P1-010 tunnel-state half: retransmit pending key exchanges
7f76e75 P1-010: salvage in-flight plaintext on peer-initiated rekey
08b3a34 Add blog post: Overlay networking: Secure AI agent communication explained
aa0cb6a Add blog post: Top 6 openanp.ai Alternatives 2026
71e5f56 Open-data networks: full inter-agent communication; SHIPPED roster
b4237e3 Add 30 open-data network blueprints
e7d9efb Untrack runtime artifacts that should never have been committed

git log v1.9.0-rc1..v1.9.0-rc2 --oneline (11 commits).

Full Changelog: v1.9.0-rc1...v1.9.0-rc2

v1.9.0-rc1

24 Apr 23:19

v1.9.0-rc1 Pre-release

v1.9.0-rc1 — Release Notes

Scope: 75 commits since v1.8.0 (2026-04-20 → 2026-04-24).
Net diff vs v1.8.0: +37,800 / −12,800 lines, 550+ files touched.


Security

  • SSRF hardening across the registry HTTP surface. Extracted a shared
    pkg/urlvalidate validator and applied it to every URL accepted by the
    registry: IDP config (handleSetIDPConfig), webhook target, snapshot
    restore. Cloud-metadata hostnames now match case-insensitively; the
    prior check could be bypassed by any uppercase character in the host.
  • Snapshot restore validates URLs. A tainted registry snapshot can no
    longer smuggle a malicious IDP/webhook URL into the server at startup.
  • Resource caps on tunnel state. lastRekeyReq, relayPeers, and the
    unauth crypto-map are now bounded. Previously, a spoofed flood of
    rekey requests or relay-sender IDs could grow these maps without
    bound. The crypto-map path also short-circuits before the scalar
    multiplication when already at the cap, so a flood cannot simply
    switch from exhausting memory to exhausting CPU.
  • Classify stale tunnel packets separately from nonce replay. Avoids
    tripping replay-alert telemetry on benign after-rekey arrivals.
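
A minimal Go sketch of the case-insensitive metadata-host check from the SSRF bullet above. The function name validateURL and the two blocked hosts are illustrative assumptions, not the contents of pkg/urlvalidate. The sketch covers only scheme filtering and host case-folding; a production validator would also need to handle IP-literal encodings and DNS resolution.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// metadataHosts is an illustrative blocklist, not the actual contents of
// pkg/urlvalidate. Entries are stored lower-case.
var metadataHosts = map[string]bool{
	"metadata.google.internal": true,
	"169.254.169.254":          true,
}

// validateURL accepts only http(s) URLs whose host is not a cloud-metadata
// endpoint. The host is lower-cased and a trailing dot trimmed before the
// lookup, so "METADATA.Google.Internal." cannot slip past the check.
func validateURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q not allowed", u.Scheme)
	}
	host := strings.TrimSuffix(strings.ToLower(u.Hostname()), ".")
	if host == "" {
		return errors.New("URL has no host")
	}
	if metadataHosts[host] {
		return fmt.Errorf("host %q is a cloud-metadata endpoint", host)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://hooks.example.com/pilot",
		"http://METADATA.Google.Internal/computeMetadata/v1/",
		"file:///etc/passwd",
	} {
		fmt.Printf("%-55s %v\n", raw, validateURL(raw))
	}
}
```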

Tunnel & daemon runtime

  • Rekey recovery for half-rekey replay-window desync. After certain
    packet-loss patterns the sender's nonce window could drift ahead of the
    receiver's; the receiver now requests rekey instead of silently
    dropping traffic.
  • Rekey on encrypted-with-no-key. If an encrypted frame arrives before
    a peer-key is in hand (e.g. after a daemon restart), the daemon now
    emits a rekey request rather than dropping.
  • Prompt re-register when the registry rejects our identity. Previously
    the daemon would stay un-registered until the next keepalive cycle.
  • Driver Conn.Write chunks payloads above the 1 MiB IPC cap. Large
    HTTP request bodies over Pilot no longer fail with message too large.
  • IPC cap enforced correctly (P2-002). The prior sleep-and-gate
    behaviour was silently defeated by the kernel's listen backlog
    (SOMAXCONN ≈ 4096 on Docker), so ~4× the intended client cap could
    still connect. The server now accepts then immediately closes excess
    connections, and the hard cap holds exactly at MaxIPCClients = 1024.

Policy runtime

  • Per-peer cycle scoring (evaluatePerPeerCycle). Shipped policies
    that tithe / anti-camp / burnout now tick per tracked peer rather than
    once per network.
  • Cycle minimum lowered from 1 min to 1 s. Validator + runner both
    accept 1 s; the compressed-24h policy test relies on this.
  • EventJoin deny honoured at bootstrap. A peer that should never
    have joined is evicted from the local runner's view immediately on
    boot, not just on reconcile tick.
  • Tag refresh on existing peers. applyMembershipDiff now re-applies
    tags when a peer's registry record changes, not only when new peers
    arrive.
  • Eviction cooldown (60 s). Stops the reconciler from re-adding a
    peer the policy just evicted.
  • Beacon-relay-reachable marker on rekey arrival. Restarted peers
    remain dialable through the beacon path without waiting for the next
    relay-probe tick.
  • Runtime pilotctl set-webhook / clear-webhook. The webhook URL
    was previously only settable via a daemon startup flag. The event
    broker now reads through daemon.webhook on every emit, so a runtime
    change isn't masked by a cached nil pointer.
  • pilotctl managed reconcile primitive for explicit membership
    refresh (IPC SubManagedReconcile 0x07).

Task pipeline

  • FIFO execution order. Task execution was previously ordered
    alphabetically by UUID; it is now strict FIFO, and CreatedAt gained
    nanosecond precision so millisecond-tied submits no longer get
    reordered.
  • Submitter-side auto-cancel on accept timeout. The submitter now
    proactively cancels a task that never gets accepted, instead of
    leaving it stuck.
  • Inbox display ordering sorts by timestamp + seq, not the
    file-type prefix.
  • Fixed message loss when inbox files arrive within the same
    millisecond.
  • Trust revocation propagates to the remote peer. Previously only
    the local side saw the revoke.
  • pilotctl task result surfaces the delivered result payload.
  • status_justification is exposed in task list output so
    workers/operators can see why a task was declined.

Registry

  • Pass-through error strings use present-tense "requires" phrasing so
    clients see actionable messages instead of "request failed".
  • Snapshot restore path is hardened against tainted URLs (see Security).
  • All registry sources now carry SPDX-License-Identifier headers.

Gateway

  • Listener bind failures surface at warn level so an operator notices
    when port 80 / 443 isn't available, instead of failing silently.

Dashboard / observability

  • Shipped blueprints (34 configs) are round-tripped through the
    provisioning wire in CI.
  • Dashboard HTTP surface is covered by an integration test (healthz,
    stats, pulse, badges, metrics auth, snapshot POST gating, CORS).

Shipped network blueprints

34 new first-class policies under configs/networks/:

anti-camping, aristocracy, burnout, cold-shoulder,
cooling-off, data-exchange-policy, dunbar-150,
first-in-first-out, forgiveness, gift-economy, golden-hour,
gossip-tax, grudge-match, half-life, high-trust-society,
karma-ledger, last-in-first-out, lottery, meritocracy,
meritocracy-rating, mutual-admiration, old-guard, ostracism,
pay-it-forward, rotating-chairs, seniority, small-circle,
stable-state, sybil-gauntlet, tithe, trust-decay,
two-strikes, vouching-chain, whale-hunt.

Every blueprint validates at test time via a provisioning round-trip.
data-exchange-policy gained an allow-echo-connect rule so port-7
probes aren't refused by default-deny.

Integration test suite

  • Parallel runner (run-all.sh) with per-worker
    COMPOSE_PROJECT_NAME + RFC5737 / RFC2544 NAT lanes. Default -j 8.
    Honours PILOT_TEST_WAIT_MULT to scale stack-boot waits under load.
  • Shared helpers: _lib.sh, topology_helpers.sh,
    nat_test_common.sh, chaos_helpers.sh, policy_helpers.sh,
    sec_helpers.sh. sweep_pilot_p2p_network reclaims leaked docker
    networks across sibling compose files.
  • Docker Compose overlays: chaos (tc netem), NAT variants
    (full / restricted / address-restricted / symmetric / CGN / hairpin /
    egress-443-only / multihomed / dual-symmetric / IPv6 /
    rendezvous-natted), webhook sink, gateway, policy (+admin token),
    3/5/10-agent rings, split-brain, star5 hub.

New tests (231 total in the local suite)

  • Chaos: test_chaos_packet_loss, test_chaos_loss10_all_ops,
    test_chaos_loss30_all_ops, test_chaos_reorder_all_ops,
    test_chaos_delay200_all_ops.
  • NAT (16 variants): full_cone, restricted_cone,
    address_restricted, symmetric, dual_symmetric, cgn,
    hairpin, egress_443_only, multihomed, ipv6_only,
    udp_blocked, conntrack_timeout, stateful_firewall,
    partition_post_reg, rendezvous_natted, asymmetric_routing,
    plus plus_{bandwidth,latency,loss,mtu,reorder} perturbations.
  • Policy: connect/datagram/join allow|deny|score|tag,
    cycle_{evict,fill_trust,prune_trust,webhook}, shipped_configs.
  • Webhook: agent_registered, file_delivered,
    message_received, polo_updated, pubsub_published,
    task_submitted, task_completed, trust_changed,
    tunnel_established, exactly_once_on_restart.
  • Security: beacon_amplification, ipc_exhaustion,
    malformed_frame, oversized_payload, pubsub_spam, rekey_flood,
    replay_after_rekey, spoofed_node_id, sym_nat_spoof,
    sybil_reputation, trust_grant_forgery.
  • Task pipeline: task_sequential_burst, task_polo_gate,
    task_message_chain, task_bidirectional_services,
    task_progress_events, task_invalid_states, task_policy_decline,
    task_result_integrity, task_description_integrity.
  • Resilience: rendezvous_restart_midflight,
    beacon_restart_midflight, sender_clean_restart_midflight,
    sender_sigkill_midfile, receiver_sigkill_midfile,
    receiver_sigkill_midtask, partition_heal, partition_midflight,
    splitbrain_heal, splitbrain_divergence, ping_ghost_peer,
    midrekey_{send_file,send_message,task_submit,task_results},
    peer_restarted_{pubsub_sub,send_file,send_message}.
  • Duration: dur_idle_{60s,10min}, dur_steady_10min,
    dur_periodic_60s, dur_shortcycle_policy_1m,
    dur_steady_compressed_24h. All honour PILOT_DUR_COMPRESS=1
    for the fast tier.
  • Fan-in / fan-out: fanin_3agents_tasks, fanout_3agents_file,
    fanout_3agents_pubsub, fanout_5agents_pubsub, star5_hub_fanout.
  • Observability: obs_dashboard_polo_truth, obs_log_peer_rekeyed,
    obs_metric_encrypt_ok, obs_tasklist_vs_disk, dashboard.
  • Gateway: gateway_{file,lookup,ping,polo_read,pubsub_pub,pubsub_sub,register,rotate_key,task_result,task_submit,trust_grant,http_message}.
  • P2P runner: test_p2p (in-container) covers 32 CLI surfaces.

Test harness fixes in this release

  • test_policy_cycle_webhook — replaced the PID-1 daemon-restart dance
    with pilotctl set-webhook runtime CLI (pkill pilot-daemon inside
    the container was killing PID 1 and tearing down the whole stack).
  • test_sender_clean_restart_midflight — handle the submit-response
    accepted=false shape and known polo-gate / P1-010 post-restart
    failure modes.
  • test_chaos_packet_loss, test_force_relay_task,
    test_force_relay_pubsub, test_chaos_delay200_all_ops — assertions
    tolerate known P1-010 / polo-drift failure modes under heavy
    loss / delay while still failing on regressions.
  • test_dashboard — removed non-existent /api/badge/trust assertion.
  • test_splitbrain_divergence, test_splitbrain_heal,
    test_flash_crowd_10agents_register — sweep leaked sibling-compose
    containers before boot; ...

v1.8.0

20 Apr 18:46

v1.7.2

10 Apr 18:55

Bug Fixes

  • Fix stale pubkey cache breaking tunnel reconnection — peer Ed25519 public keys were cached indefinitely in the tunnel manager. When a peer restarted with new keys, key exchange was permanently rejected. Now invalidates cache and re-fetches from registry on mismatch.
  • Fix LAN peer detection on dual-stack sockets — tunnel bound to [::] (IPv6 wildcard) incorrectly rejected IPv4 LAN addresses as address family mismatch, forcing traffic through NAT/relay. Wildcard addresses are now treated as dual-stack.
  • Fix beacon TTL reaping during reconnection — beacon node entries were reaped too aggressively during reconnection windows.

Improvements

  • trust-auto-approve flag on daemon binary — previously only available via pilotctl daemon start, now exposed as a direct daemon CLI flag.
  • Per-network dashboard charts — registry dashboard now shows per-network time-series graphs for member activity.
  • SDK publish workflows — Node and Python SDK packages now only publish on GitHub releases, not on every commit.

Full Changelog: v1.7.1...v1.7.2

What's Changed

  • fix: remove command wrappers, add feedback service agent and refactor by @Alexgodoroja in #64

v1.7.1

09 Apr 19:24

Full Changelog: v1.7.0...v1.7.1

v1.7.0

09 Apr 17:26

What's New

  • Auto-updater sidecar (pilot-updater): checks GitHub Releases on a configurable interval, downloads platform-specific archives, verifies SHA256 checksums, replaces binaries in-place, and restarts the daemon — fully automatic, no user interaction required
  • Version reporting: daemon reports its build-time version to the registry; visible in pilotctl info and pilotctl network members
  • Network sync: periodic reconciliation of network memberships, policies, and member tags from registry (5-minute interval with jitter)
  • IPv6 fix: resolved long-standing end-to-end failure caused by address family mismatch between IPv6 tunnel sockets and IPv4 LAN addresses
  • Security audit fixes: registry auth (H3), replication auth (H4), per-port accept (H12), P2P handshake signing (M12)
  • Dashboard enhancements: node graph and trust edge visualization in stats API
  • Release checksums: checksums.txt included with SHA256 hashes for all archives

Binaries Included

daemon, pilotctl, gateway, registry, beacon, rendezvous, nameserver, updater

Updater Usage

pilot-updater -install-dir /usr/local/bin

Checks GitHub Releases every hour (configurable with -interval), downloads and applies updates automatically.

Full Changelog: v1.6.2...v1.7.0

v1.6.2

09 Apr 05:51

Changes

  • Network memberships in pilotctl info: pilotctl info now shows joined networks and their addresses in both human-readable and JSON output
  • Includes all v1.6.1 fixes: multi-network stream fix, policy runner bootstrap, admin-token CLI paths, data-exchange policy

Full Changelog: v1.6.1...v1.6.2

v1.6.1

09 Apr 05:26

Changes

  • Multi-network stream fix: SYN-ACK/RST now uses the correct network-specific source address, fixing stream connections on non-primary networks
  • Policy runner bootstrap fix: peer tags are now always refreshed from registry on startup, not just for policies with cycle rules
  • pilotctl admin-token paths: network join --node-id, member-tags set, and policy set can now operate directly against the registry with admin token (no local daemon required)
  • Data-exchange network policy: service-node gated connectivity with text messaging (port 1000) open and file transfer (port 1001) restricted to service-tagged nodes
  • Integration test: 6-subtest coverage for policy enforcement
  • Website styling updates

Full Changelog: v1.6.0...v1.6.1

v1.6.0

09 Apr 04:05

What's Changed

Full Changelog: v1.5.1...v1.6.0

v1.5.1

06 Apr 01:07

What's New

Enterprise Control Plane

The enterprise subsystem is now production-ready with a comprehensive set of features for managing multi-tenant agent networks:

  • RBAC — role-based access control with owner, admin, and member roles; promote, demote, kick, and transfer ownership
  • Network Policies — per-network port allow-lists with deduplication and fractional port rejection
  • Audit Trail — persistent ring-buffer audit log with enriched context (old/new values for mutations), audit export API, and survival across registry restarts
  • Identity & SSO — built-in OIDC/JWT validation (RS256 + JWKS caching), external IDP webhook verification, ValidateToken client method
  • Directory Sync — external directory integration with webhook-based verification
  • Blueprints — blueprint persistence for repeatable network provisioning
  • Enterprise CLI: pilotctl commands for provisioning, audit export, IDP configuration, and admin token bypass paths
  • Observability — per-network Prometheus metrics (networks, invites, RBAC, policy, keys), enterprise status gauges, webhook dead-letter queue

Security Hardening

  • Fix TOCTOU race in invite handlers
  • Fix timing attack in join token verification
  • Fix enterprise data loss on replication failover
  • Fix invite consumed before capacity check
  • Block joining backbone network; validate max_members bounds
  • Backbone network protection for rename and enterprise operations
  • Cap key expiry at 10 years; enforce key expiry on heartbeat
  • Revoke outgoing invites on deregister, leave-network, and kick
  • Block owner from leaving network; clean RBAC on enterprise disable
  • Input validation: self-invite, description/ports limits, transfer zero-ID
  • Node ID overflow guard
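
The standard remedy for a timing attack on token verification is a constant-time comparison. This Go sketch uses crypto/subtle.ConstantTimeCompare on SHA-256 digests of both tokens, so the inputs always have equal length and the comparison never returns early on a length difference. tokenMatches is a hypothetical name, and the notes do not say which technique the registry fix used.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// tokenMatches is a hypothetical helper that compares a presented join
// token with the stored one in constant time. Hashing first gives two
// 32-byte inputs, so timing does not depend on where the tokens differ.
func tokenMatches(presented, stored string) bool {
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(stored))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	stored := "join-8f2c"
	fmt.Println(tokenMatches("join-8f2c", stored)) // true
	fmt.Println(tokenMatches("join-8f2d", stored)) // false
}
```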

Registry Improvements

  • Admin-token bypass for deregister (enables console node removal)
  • Enriched list_nodes with polo_score, tags, and public flag
  • Audit logging for re-registration, stale node reaping, polo score operations, and enterprise flag changes
  • Created timestamp on list_networks; expose enterprise policy in network listings
  • Clean up enterprise state on deregister; clean up invites on delete/leave/kick
  • Enriched audit context for stale reap, kick (includes role), tags, task_exec, key expiry, and policy changes
  • Tag deduplication and expanded error passthrough

Website & Documentation

  • Enterprise documentation section (RBAC, Identity & SSO, Policies, Audit & Compliance, Blueprints)
  • Rewritten gateway docs, simplified getting-started guide
  • TOC sidebar for documentation pages
  • Solutions dropdown in navigation bar
  • Dynamic sitemap generation from blog post data
  • Blog auto-publish system with CI deploy webhook
  • 12 new blog posts covering enterprise features, networking concepts, and protocol architecture

Testing

  • 80+ new tests covering enterprise subsystems, security edge cases, and stress scenarios
  • Concurrent enterprise operations stress test
  • Enterprise state persistence test
  • Admin token bypass path verification tests
  • Hostname collision, validation, and policy deduplication tests
  • Per-network admin token and replication token validation tests

Other

  • Support release candidate installs via PILOT_RC=1 environment variable
  • Image optimization (ImgBot)
  • Expanded CLI usage documentation with Networks and Enterprise Admin sections
  • Fix corrupted UTF-8 characters in blog post descriptions

Install / Update

curl -fsSL https://pilotprotocol.network/install.sh | sh

Full Changelog: v1.5.0-rc1...v1.5.1

What's Changed

Full Changelog: v1.4.1...v1.5.1