Releases: TeoSlayer/pilotprotocol
v1.9.0-rc2
v1.9.0-rc2 — Release Notes
Scope: 11 commits since v1.9.0-rc1 (2026-04-24 → 2026-04-27).
The headline of RC2 is P1-010 closed (both halves) — the
tunnel-crypto-desync class of bugs that caused fire-and-forget RPCs
(task submit, send-results) to silently lose data across rekey windows.
P1-009 closed as a side effect: the test_midrekey_* regression repros
all pass cleanly now.
P1-010 — tunnel desync recovery
Four commits, four layers of the fix:
1. Salvage replay (7f76e75)
Per-peer plaintext ring buffer in peerCrypto.salvage. Every
encrypted send copies the plaintext into the buffer. When
handleAuthKeyExchange (or unauth handleKeyExchange) installs a
fresh peerCrypto because the peer's pubkey changed, we re-encrypt
the recent plaintexts with the new key and re-send via writeFrame.
Recovers fire-and-forget RPCs that were sent under stale crypto
during the rekey window.
Webhook event: tunnel.desync_salvage with replay count.
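The salvage buffer described above can be pictured as a small bounded ring of plaintext copies. The following is a minimal sketch, not the daemon's code: the type `salvageRing`, its methods, and the 10 s age bound are hypothetical names and values chosen for illustration; only the ideas (per-peer buffer, copy on send, size and age bounds, nil-safety) come from these notes.

```go
package main

import (
	"fmt"
	"time"
)

// salvageEntry holds a private copy of one plaintext and the time it was sent.
type salvageEntry struct {
	plaintext []byte
	sentAt    time.Time
}

// salvageRing is a bounded buffer of recent plaintexts for one peer.
// Hypothetical type: the real peerCrypto.salvage layout is not shown in the notes.
type salvageRing struct {
	entries []salvageEntry
	max     int
	maxAge  time.Duration
}

func newSalvageRing(max int, maxAge time.Duration) *salvageRing {
	return &salvageRing{max: max, maxAge: maxAge}
}

// Record stores a copy of p, so later mutation by the caller cannot corrupt a replay.
func (r *salvageRing) Record(p []byte, now time.Time) {
	if r == nil || len(p) == 0 {
		return
	}
	cp := append([]byte(nil), p...)
	r.entries = append(r.entries, salvageEntry{plaintext: cp, sentAt: now})
	if len(r.entries) > r.max {
		r.entries = r.entries[len(r.entries)-r.max:] // drop the oldest entries
	}
}

// Drain returns the plaintexts still within maxAge, oldest first, and empties the ring.
func (r *salvageRing) Drain(now time.Time) [][]byte {
	if r == nil {
		return nil
	}
	var out [][]byte
	for _, e := range r.entries {
		if now.Sub(e.sentAt) <= r.maxAge {
			out = append(out, e.plaintext)
		}
	}
	r.entries = nil
	return out
}

func main() {
	ring := newSalvageRing(4, 10*time.Second) // age bound is an assumed value
	t0 := time.Now()
	for i := 0; i < 6; i++ {
		ring.Record([]byte(fmt.Sprintf("frame-%d", i)), t0)
	}
	// On a peer-initiated rekey, each surviving plaintext would be
	// re-encrypted under the new key and re-sent.
	for _, p := range ring.Drain(t0.Add(time.Second)) {
		fmt.Println(string(p))
	}
}
```

Draining empties the ring so the same plaintext is not replayed twice on back-to-back rekeys; whether the daemon does the same is not stated in the notes.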
2. Rekey retransmit loop (a126a26)
Per-peer pendingRekey state set on sendKeyExchangeToNode, cleared
on first successful inbound decrypt. New goroutine
rekeyRetransmitLoop retransmits stale entries every 4 s, capped at
5 retries. Closes the case where our key_exchange or peer's reply was
dropped on the wire — without this, the next chance to recover was
the 5-min RelayProbeInterval.
Webhook event: tunnel.rekey_gave_up if a peer hits the retransmit
cap (real reachability problem worth investigating).
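As a rough illustration of the mark / clear / stale / give-up states, here is a minimal single-goroutine sketch. The 4 s interval and the cap of 5 retries are taken from the notes; `rekeyTracker`, `Tick`, and the field names are hypothetical, and the real `rekeyRetransmitLoop` runs in its own goroutine and would need locking around this state.

```go
package main

import (
	"fmt"
	"time"
)

const (
	rekeyRetransmitInterval = 4 * time.Second // interval from the notes
	rekeyMaxRetries         = 5               // retry cap from the notes
)

// pendingRekey tracks one unanswered key_exchange (hypothetical layout).
type pendingRekey struct {
	lastSent time.Time
	retries  int
}

type rekeyTracker struct {
	pending map[uint32]*pendingRekey
}

func newRekeyTracker() *rekeyTracker {
	return &rekeyTracker{pending: map[uint32]*pendingRekey{}}
}

// Mark records that a key_exchange was just sent to the peer.
func (t *rekeyTracker) Mark(peer uint32, now time.Time) {
	t.pending[peer] = &pendingRekey{lastSent: now}
}

// Clear is called on the first successful inbound decrypt from the peer.
func (t *rekeyTracker) Clear(peer uint32) {
	delete(t.pending, peer)
}

// Tick returns the peers that need a retransmit and the peers that hit the cap.
func (t *rekeyTracker) Tick(now time.Time) (resend, gaveUp []uint32) {
	for peer, p := range t.pending {
		if now.Sub(p.lastSent) < rekeyRetransmitInterval {
			continue // not stale yet
		}
		if p.retries >= rekeyMaxRetries {
			delete(t.pending, peer) // would emit tunnel.rekey_gave_up here
			gaveUp = append(gaveUp, peer)
			continue
		}
		p.retries++
		p.lastSent = now
		resend = append(resend, peer)
	}
	return resend, gaveUp
}

func main() {
	tr := newRekeyTracker()
	now := time.Now()
	tr.Mark(42, now)
	for i := 1; i <= 7; i++ {
		now = now.Add(rekeyRetransmitInterval)
		resend, gaveUp := tr.Tick(now)
		fmt.Printf("tick %d: resend=%v gaveUp=%v\n", i, resend, gaveUp)
	}
}
```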
3. Cap salvage at 4 entries (5de736f)
Original cap of 32 caused a replay-storm on rekey: the dataexchange
retransmit layer's churn filled the buffer, and on rekey all 32
frames went out in a tight loop. The receiver's freshly-installed
peerCrypto had maxRecvNonce = 0; out-of-order arrival of nonces
1..32 tripped the replay-window check. 4 covers the realistic shape
(task submit + send-results + a couple of dataexchange retries)
without overwhelming the receiver. Memory drops from ~48 KiB / peer
to ~6 KiB; ~6 MiB worst-case fleet-wide at maxCryptoPeers = 1024.
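The memory figures can be checked with a few lines of arithmetic. This sketch assumes per-peer memory scales linearly with the entry count, which implies roughly 1.5 KiB per entry (48 KiB / 32); that per-entry size is derived here, not stated in the notes.

```go
package main

import "fmt"

const (
	kib             = 1024
	oldCap          = 32       // original salvage cap
	newCap          = 4        // cap after 5de736f
	maxCryptoPeers  = 1024     // from the notes
	oldPerPeerBytes = 48 * kib // approximate figure from the notes

	// Derived under the linear-scaling assumption: 48 KiB / 32 entries.
	bytesPerEntry = oldPerPeerBytes / oldCap
)

// perPeerBytes estimates salvage memory for one peer at the given cap.
func perPeerBytes(entries int) int {
	return entries * bytesPerEntry
}

func main() {
	fmt.Printf("per entry: %d B\n", bytesPerEntry)
	fmt.Printf("per peer at cap %d: %d KiB\n", newCap, perPeerBytes(newCap)/kib)
	fmt.Printf("worst case at %d peers: %d MiB\n",
		maxCryptoPeers, perPeerBytes(newCap)*maxCryptoPeers/(kib*kib))
}
```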
4. Preserve nonce/replay state across duplicate key_exchange (d4d5f42)
The latent companion bug: handleAuthKeyExchange and
handleKeyExchange used to unconditionally replace tm.crypto[N]
with a freshly-derived peerCrypto, even when the peer's pubkey was
unchanged. Same shared secret, but reset nonce counter and empty
replay bitmap. Subsequent encrypted sends used counter 1, 2, 3...
while peer's pc[us] had a high maxRecvNonce from before, so
peer dropped them as "outside replay window."
Fix: replace only when there's no existing entry OR the pubkey
actually changed. Pinned by
TestHandleKeyExchangeDuplicatePreservesCryptoState.
(This same fix was attempted earlier in the session and reverted —
it was incompatible with the 32-entry salvage cap. After the cap
reduction in 5de736f, the preservation fix is safe.)
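The replace-only-on-change rule can be sketched as follows. The struct fields and the helper `installPeerCrypto` are hypothetical stand-ins for whatever `handleAuthKeyExchange` and `handleKeyExchange` do internally; key derivation and the replay bitmap are omitted.

```go
package main

import (
	"bytes"
	"fmt"
)

// peerCrypto is a stripped-down stand-in for the daemon's per-peer state.
type peerCrypto struct {
	peerPub      []byte
	sendNonce    uint64
	maxRecvNonce uint64
}

type tunnelManager struct {
	crypto map[uint32]*peerCrypto
}

// installPeerCrypto installs fresh state only when there is no entry for the
// node or the peer's pubkey changed. It reports whether state was replaced.
func (tm *tunnelManager) installPeerCrypto(node uint32, peerPub []byte) bool {
	if cur, ok := tm.crypto[node]; ok && bytes.Equal(cur.peerPub, peerPub) {
		return false // duplicate key_exchange: keep nonce and replay state
	}
	tm.crypto[node] = &peerCrypto{peerPub: append([]byte(nil), peerPub...)}
	return true
}

func main() {
	tm := &tunnelManager{crypto: map[uint32]*peerCrypto{}}
	pub := []byte("peer-pubkey-A")

	tm.installPeerCrypto(1, pub)
	tm.crypto[1].sendNonce = 1000

	// Same pubkey again: counters survive.
	fmt.Println(tm.installPeerCrypto(1, pub), tm.crypto[1].sendNonce)
	// Different pubkey: a real rekey, counters reset.
	fmt.Println(tm.installPeerCrypto(1, []byte("peer-pubkey-B")), tm.crypto[1].sendNonce)
}
```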
Tests added
- `pkg/daemon/tunnel_desync_salvage_test.go`: 7 unit tests for the
  ring buffer (size + age bounds, copy-not-reference, nil-safety).
- `pkg/daemon/tunnel_rekey_retransmit_test.go`: 9 unit tests for the
  retransmit state machine (mark, clear, stale, give-up cap).
- `pkg/daemon/tunnel_dup_keyexchange_test.go`: 2 unit tests pinning
  duplicate-preservation and real-rekey-replacement.
- `tests/integration/local/test_tunnel_desync_recovery.sh`: black-box
  test: establish tunnel, restart receiver, submit task immediately,
  assert it lands within 30 s.
Verified
| Test | Result |
|---|---|
| `go vet ./...` | clean |
| `go test -parallel 4 ./pkg/... ./tests/` | all 10 packages pass |
| `test_tunnel_desync_recovery` | 7/0 |
| `test_midrekey_send_file` | 5/0 (was 4/1, P1-009 repro) |
| `test_midrekey_send_message` | 6/0 (was 5/1, P1-009 repro) |
| `test_midrekey_task_results` | 6/0 (was 4/2, P1-009 repro) |
| `test_midrekey_task_submit` | 5/0 (was 3/2, P1-009 repro) |
| `test_peer_restarted_send_file` | 4/0 (was 3/1) |
| `test_peer_restarted_send_message` | 5/0 (was 3/2) |
| `test_sender_clean_restart_midflight` | 6/0 |
| `test_chaos_packet_loss` | 9/0 |
| `test_force_relay_task` | 4/0 |
| Full integration suite (btjsk44z8, -j 8) | 225/232 (best of cycle) |
Other (pre-RC2 but post-RC1) work folded in
- `b4237e3` + `71e5f56`: 30 open-data network blueprints (academic,
  geo, health, news, finance, etc.). Already deployed to production
  registry as IDs 44–73. Open-join, default-allow, full inter-agent
  communication.
- `e7d9efb`: untracked runtime artifacts that should never have been
  committed (test logs, results); added matching `.gitignore` rules.
- `aa0cb6a`, `08b3a34`, `c843c4a`, `22092c3`: website / blog content,
  no protocol impact.
Backwards compatibility
Wire-compatible with v1.8.0 and v1.9.0-rc1 daemons. No new packet
types, no new fields, no version negotiation. Mixed deployments are
safe. The retransmit loop and salvage replay are visible to peers as
ordinary key_exchange / encrypted frames respectively.
Known limitations (unchanged from RC1)
- `test_cli`: env-gated (hits production `agent-alpha`).
- `test_register_identity_new_endpoint`: architectural (container
  PID-1 dies on `pkill`). Needs Dockerfile init wrapper.
- Vouching-chain transitive walk unimplemented.
- Trust links global, not per-network (design choice).
- `DefaultVerdict` policy field fails OPEN on pre-v1.9 daemons:
  upgrade daemons before pushing configs that rely on it.
Upgrade
No coordinated rollout needed. Update daemons in any order:
rendezvous → agents is the recommended sequence purely for the
DefaultVerdict ordering risk. Old daemons keep working unchanged.
Full commit log
d4d5f42 Preserve nonce/replay state across duplicate key_exchange
c843c4a Add blog post: AI agent discovery: master P2P networks in 2026
5de736f Cap desync salvage at 4 entries (was 32) to prevent replay storm
22092c3 Hero: decorative boids flock behind the headline
a126a26 P1-010 tunnel-state half: retransmit pending key exchanges
7f76e75 P1-010: salvage in-flight plaintext on peer-initiated rekey
08b3a34 Add blog post: Overlay networking: Secure AI agent communication explained
aa0cb6a Add blog post: Top 6 openanp.ai Alternatives 2026
71e5f56 Open-data networks: full inter-agent communication; SHIPPED roster
b4237e3 Add 30 open-data network blueprints
e7d9efb Untrack runtime artifacts that should never have been committed
git log v1.9.0-rc1..v1.9.0-rc2 --oneline (11 commits).
Full Changelog: v1.9.0-rc1...v1.9.0-rc2
v1.9.0-rc1
v1.9.0-rc1 — Release Notes
Scope: 75 commits since v1.8.0 (2026-04-20 → 2026-04-24).
Net diff vs v1.8.0: +37,800 / −12,800 lines, 550+ files touched.
Security
- SSRF hardening across the registry HTTP surface. Extracted a shared
  `pkg/urlvalidate` validator and applied it to every URL accepted by the
  registry: IDP config (handleSetIDPConfig), webhook target, snapshot
  restore. Cloud-metadata hostnames now match case-insensitively; the
  prior check could be bypassed with any uppercase character.
- Snapshot restore validates URLs. A tainted registry snapshot can no
  longer smuggle a malicious IDP/webhook URL into the server at startup.
- Resource caps on tunnel state. `lastRekeyReq`, `relayPeers`, and the
  unauth crypto-map are now bounded. Prior to this, a spoofed flood of
  rekey requests or relay-sender IDs could grow these maps without
  bound. The crypto-map path also short-circuits before scalar-mult when
  already at the cap, so CPU exhaustion isn't cheaper than memory.
- Classify stale tunnel packets separately from nonce replay. Avoids
  tripping replay-alert telemetry on benign after-rekey arrivals.
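To illustrate why case matters in the metadata-hostname check, here is a minimal sketch of a URL validator. It is not the `pkg/urlvalidate` API: the function name `validateURL`, the blocklist contents, and the decision to reject private and link-local IP literals are all assumptions made for the example. It also does not resolve DNS, so it says nothing about hostnames that resolve to internal addresses.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// blockedHosts is an illustrative blocklist; the real list is not in the notes.
var blockedHosts = map[string]bool{
	"metadata.google.internal": true,
}

// validateURL rejects non-HTTP(S) schemes, blocked hostnames (compared
// case-insensitively), and IP literals in internal ranges.
func validateURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("scheme must be http or https")
	}
	// Lower-casing before lookup is the step a case-sensitive check misses.
	host := strings.ToLower(strings.TrimSuffix(u.Hostname(), "."))
	if host == "" {
		return errors.New("missing host")
	}
	if blockedHosts[host] {
		return errors.New("blocked hostname")
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return errors.New("blocked address")
		}
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/hook",
		"http://Metadata.Google.Internal/computeMetadata/v1/",
		"http://169.254.169.254/latest/meta-data/",
	} {
		fmt.Printf("%s -> %v\n", raw, validateURL(raw))
	}
}
```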
Tunnel & daemon runtime
- Rekey recovery for half-rekey replay-window desync. After certain
  packet-loss patterns the sender's nonce window could drift ahead of the
  receiver's; the receiver now requests rekey instead of silently
  dropping traffic.
- Rekey on encrypted-with-no-key. If an encrypted frame arrives before
  a peer-key is in hand (e.g. after a daemon restart), the daemon now
  emits a rekey request rather than dropping.
- Prompt re-register when the registry rejects our identity. Previously
  the daemon would stay un-registered until the next keepalive cycle.
- Driver Conn.Write chunks payloads above the 1 MiB IPC cap. Large
  HTTP request bodies over Pilot no longer fail with `message too large`.
- IPC cap enforced correctly (P2-002). The prior sleep-and-gate
  behaviour was silently defeated by the kernel's listen backlog
  (SOMAXCONN ≈ 4096 on Docker), so ~4× the intended client cap could
  still connect. The server now accepts then immediately closes excess
  connections, and the hard cap holds exactly at `MaxIPCClients = 1024`.
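The chunked-write behaviour can be sketched with a plain `io.Writer`. This is an illustration under assumptions, not the driver's `Conn.Write`: the helper `writeChunked` and the `countingWriter` test double are hypothetical, and only the 1 MiB cap comes from the notes.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const maxIPCMessage = 1 << 20 // 1 MiB IPC cap from the notes

// writeChunked splits p into consecutive writes of at most max bytes each.
func writeChunked(w io.Writer, p []byte, max int) (int, error) {
	if max <= 0 {
		return 0, fmt.Errorf("invalid chunk size %d", max)
	}
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > max {
			n = max
		}
		m, err := w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		if m < n {
			return written, io.ErrShortWrite
		}
		p = p[n:]
	}
	return written, nil
}

// countingWriter records the size of every Write call it receives.
type countingWriter struct {
	sizes []int
	buf   bytes.Buffer
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.sizes = append(c.sizes, len(p))
	return c.buf.Write(p)
}

func main() {
	payload := make([]byte, 2*maxIPCMessage+512)
	cw := &countingWriter{}
	n, err := writeChunked(cw, payload, maxIPCMessage)
	fmt.Println(n, err, cw.sizes)
}
```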
Policy runtime
- Per-peer cycle scoring (`evaluatePerPeerCycle`). Shipped policies
  that tithe / anti-camp / burnout now tick per tracked peer rather than
  once per network.
- Cycle minimum lowered from 1 min to 1 s. Validator + runner both
  accept 1 s; the compressed-24h policy test relies on this.
- EventJoin deny honoured at bootstrap. A peer that should never
  have joined is evicted from the local runner's view immediately on
  boot, not just on reconcile tick.
- Tag refresh on existing peers. `applyMembershipDiff` now re-applies
  tags when a peer's registry record changes, not only when new peers
  arrive.
- Eviction cooldown (60 s). Stops the reconciler from re-adding a
  peer the policy just evicted.
- Beacon-relay-reachable marker on rekey arrival. Restarted peers
  remain dialable through the beacon path without waiting for the next
  relay-probe tick.
- Runtime `pilotctl set-webhook` / `clear-webhook`. The webhook URL
  was previously only settable via a daemon startup flag. The event
  broker now reads through `daemon.webhook` on every emit, so a runtime
  change isn't masked by a cached nil pointer.
- `pilotctl managed reconcile` primitive for explicit membership
  refresh (`IPCSubManagedReconcile` `0x07`).
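The eviction cooldown can be pictured as a timestamp check inside the reconcile step. In this sketch only the 60 s value comes from the notes; the `reconciler` type, its fields, and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const evictionCooldown = 60 * time.Second // value from the notes

// reconciler is a hypothetical stand-in for the policy runner's membership view.
type reconciler struct {
	members   map[uint32]bool
	evictedAt map[uint32]time.Time
}

func newReconciler() *reconciler {
	return &reconciler{
		members:   map[uint32]bool{},
		evictedAt: map[uint32]time.Time{},
	}
}

// Evict removes the peer and remembers when the policy evicted it.
func (r *reconciler) Evict(peer uint32, now time.Time) {
	delete(r.members, peer)
	r.evictedAt[peer] = now
}

// Reconcile re-adds registry members, skipping peers still in cooldown.
func (r *reconciler) Reconcile(registry []uint32, now time.Time) {
	for _, peer := range registry {
		if t, ok := r.evictedAt[peer]; ok {
			if now.Sub(t) < evictionCooldown {
				continue // evicted too recently, do not re-add
			}
			delete(r.evictedAt, peer)
		}
		r.members[peer] = true
	}
}

func main() {
	r := newReconciler()
	now := time.Now()
	r.Reconcile([]uint32{1, 2}, now)
	r.Evict(2, now)

	r.Reconcile([]uint32{1, 2}, now.Add(10*time.Second))
	fmt.Println("after 10s, peer 2 member:", r.members[2])

	r.Reconcile([]uint32{1, 2}, now.Add(61*time.Second))
	fmt.Println("after 61s, peer 2 member:", r.members[2])
}
```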
Task pipeline
- FIFO execution order. Task execute was alphabetical by UUID; it is
  now strict FIFO, and `CreatedAt` gained nanosecond precision so
  millisecond-tied submits no longer get reordered.
- Submitter-side auto-cancel on accept timeout. The submitter now
  proactively cancels a task that never gets accepted, instead of
  leaving it stuck.
- Inbox display ordering sorts by `timestamp + seq`, not the
  file-type prefix.
- Message loss fix when inbox files arrive within the same
  millisecond.
- Trust revocation propagates to the remote peer. Previously only
  the local side saw the revoke.
- `pilotctl task result` surfaces the delivered result payload.
- `status_justification` is exposed in `task list` output so
  workers/operators can see why a task was declined.
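A small sketch of the FIFO rule: sort by creation time at nanosecond precision instead of by identifier. The `task` struct and `sortFIFO` are hypothetical; only the `CreatedAt` field name and the ordering rule come from the notes.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// task is a hypothetical, minimal view of a queued task.
type task struct {
	ID        string
	CreatedAt time.Time
}

// sortFIFO orders tasks by creation time; the stable sort keeps submission
// order for any tasks whose timestamps are exactly equal.
func sortFIFO(tasks []task) {
	sort.SliceStable(tasks, func(i, j int) bool {
		return tasks[i].CreatedAt.Before(tasks[j].CreatedAt)
	})
}

func main() {
	base := time.Date(2026, 4, 24, 12, 0, 0, 0, time.UTC)
	// Both tasks fall in the same millisecond; sorting by ID alone would
	// put "0a-second" ahead of the task that was actually submitted first.
	tasks := []task{
		{ID: "0a-second", CreatedAt: base.Add(200 * time.Nanosecond)},
		{ID: "ff-first", CreatedAt: base.Add(100 * time.Nanosecond)},
	}
	sortFIFO(tasks)
	for _, t := range tasks {
		fmt.Println(t.ID, t.CreatedAt.Format(time.RFC3339Nano))
	}
}
```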
Registry
- Pass-through error strings use present-tense "requires" phrasing so
  clients see actionable messages instead of "request failed".
- Snapshot restore path is hardened against tainted URLs (see Security).
- All registry sources now carry `SPDX-License-Identifier` headers.
Gateway
- Listener bind failures surface at warn level so an operator notices
when port 80 / 443 isn't available, instead of silent-fail.
Dashboard / observability
- Shipped blueprints (34 configs) are round-tripped through the
  provisioning wire in CI.
- Dashboard HTTP surface is covered by an integration test (healthz,
  stats, pulse, badges, metrics auth, snapshot POST gating, CORS).
Shipped network blueprints
34 new first-class policies under configs/networks/:
anti-camping, aristocracy, burnout, cold-shoulder,
cooling-off, data-exchange-policy, dunbar-150,
first-in-first-out, forgiveness, gift-economy, golden-hour,
gossip-tax, grudge-match, half-life, high-trust-society,
karma-ledger, last-in-first-out, lottery, meritocracy,
meritocracy-rating, mutual-admiration, old-guard, ostracism,
pay-it-forward, rotating-chairs, seniority, small-circle,
stable-state, sybil-gauntlet, tithe, trust-decay,
two-strikes, vouching-chain, whale-hunt.
Every blueprint validates at test time via a provisioning round-trip.
data-exchange-policy gained an allow-echo-connect rule so port-7
probes aren't refused by default-deny.
Integration test suite
- Parallel runner (`run-all.sh`) with per-worker
  `COMPOSE_PROJECT_NAME` + RFC5737 / RFC2544 NAT lanes. Default `-j 8`.
  Honours `PILOT_TEST_WAIT_MULT` to scale stack-boot waits under load.
- Shared helpers: `_lib.sh`, `topology_helpers.sh`,
  `nat_test_common.sh`, `chaos_helpers.sh`, `policy_helpers.sh`,
  `sec_helpers.sh`. `sweep_pilot_p2p_network` reclaims leaked docker
  networks across sibling compose files.
- Docker Compose overlays: chaos (tc netem), NAT variants
  (full / restricted / address-restricted / symmetric / CGN / hairpin /
  egress-443-only / multihomed / dual-symmetric / IPv6 /
  rendezvous-natted), webhook sink, gateway, policy (+admin token),
  3/5/10-agent rings, split-brain, star5 hub.
New tests (231 total in the local suite)
- Chaos: `test_chaos_packet_loss`, `test_chaos_loss10_all_ops`,
  `test_chaos_loss30_all_ops`, `test_chaos_reorder_all_ops`,
  `test_chaos_delay200_all_ops`.
- NAT (16 variants): `full_cone`, `restricted_cone`,
  `address_restricted`, `symmetric`, `dual_symmetric`, `cgn`,
  `hairpin`, `egress_443_only`, `multihomed`, `ipv6_only`,
  `udp_blocked`, `conntrack_timeout`, `stateful_firewall`,
  `partition_post_reg`, `rendezvous_natted`, `asymmetric_routing`,
  plus `plus_{bandwidth,latency,loss,mtu,reorder}` perturbations.
- Policy: connect/datagram/join `allow|deny|score|tag`,
  `cycle_{evict,fill_trust,prune_trust,webhook}`, `shipped_configs`.
- Webhook: `agent_registered`, `file_delivered`,
  `message_received`, `polo_updated`, `pubsub_published`,
  `task_submitted`, `task_completed`, `trust_changed`,
  `tunnel_established`, `exactly_once_on_restart`.
- Security: `beacon_amplification`, `ipc_exhaustion`,
  `malformed_frame`, `oversized_payload`, `pubsub_spam`, `rekey_flood`,
  `replay_after_rekey`, `spoofed_node_id`, `sym_nat_spoof`,
  `sybil_reputation`, `trust_grant_forgery`.
- Task pipeline: `task_sequential_burst`, `task_polo_gate`,
  `task_message_chain`, `task_bidirectional_services`,
  `task_progress_events`, `task_invalid_states`, `task_policy_decline`,
  `task_result_integrity`, `task_description_integrity`.
- Resilience: `rendezvous_restart_midflight`,
  `beacon_restart_midflight`, `sender_clean_restart_midflight`,
  `sender_sigkill_midfile`, `receiver_sigkill_midfile`,
  `receiver_sigkill_midtask`, `partition_heal`, `partition_midflight`,
  `splitbrain_heal`, `splitbrain_divergence`, `ping_ghost_peer`,
  `midrekey_{send_file,send_message,task_submit,task_results}`,
  `peer_restarted_{pubsub_sub,send_file,send_message}`.
- Duration: `dur_idle_{60s,10min}`, `dur_steady_10min`,
  `dur_periodic_60s`, `dur_shortcycle_policy_1m`,
  `dur_steady_compressed_24h`. All honour `PILOT_DUR_COMPRESS=1`
  for the fast tier.
- Fan-in / fan-out: `fanin_3agents_tasks`, `fanout_3agents_file`,
  `fanout_3agents_pubsub`, `fanout_5agents_pubsub`, `star5_hub_fanout`.
- Observability: `obs_dashboard_polo_truth`, `obs_log_peer_rekeyed`,
  `obs_metric_encrypt_ok`, `obs_tasklist_vs_disk`, `dashboard`.
- Gateway:
  `gateway_{file,lookup,ping,polo_read,pubsub_pub,pubsub_sub,register,rotate_key,task_result,task_submit,trust_grant,http_message}`.
- P2P runner: `test_p2p` (in-container) covers 32 CLI surfaces.
Test harness fixes in this release
- `test_policy_cycle_webhook`: replaced the PID-1 daemon-restart dance
  with `pilotctl set-webhook` runtime CLI (`pkill pilot-daemon` inside
  the container was killing PID 1 and tearing down the whole stack).
- `test_sender_clean_restart_midflight`: handle the submit-response
  `accepted=false` shape and known polo-gate / P1-010 post-restart
  failure modes.
- `test_chaos_packet_loss`, `test_force_relay_task`,
  `test_force_relay_pubsub`, `test_chaos_delay200_all_ops`: assertions
  tolerate known P1-010 / polo-drift failure modes under heavy
  loss / delay while still failing on regressions.
- `test_dashboard`: removed non-existent `/api/badge/trust` assertion.
- `test_splitbrain_divergence`, `test_splitbrain_heal`,
  `test_flash_crowd_10agents_register`: sweep leaked sibling-compose
  containers before boot; ...
v1.8.0
Full Changelog: v1.7.2...v1.8.0
v1.7.2
Bug Fixes
- Fix stale pubkey cache breaking tunnel reconnection: peer Ed25519 public keys were cached indefinitely in the tunnel manager. When a peer restarted with new keys, key exchange was permanently rejected. Now invalidates cache and re-fetches from registry on mismatch.
- Fix LAN peer detection on dual-stack sockets: tunnel bound to `[::]` (IPv6 wildcard) incorrectly rejected IPv4 LAN addresses as address family mismatch, forcing traffic through NAT/relay. Wildcard addresses are now treated as dual-stack.
- Fix beacon TTL reaping during reconnection: beacon node entries were reaped too aggressively during reconnection windows.
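The dual-stack fix amounts to skipping the address-family comparison when the local socket is bound to a wildcard. A minimal sketch, assuming a simple family check; the function names are hypothetical and the real tunnel code is not shown in these notes.

```go
package main

import (
	"fmt"
	"net"
)

// sameFamilyOrWildcard reports whether a socket bound to local may be used
// to reach peer. A wildcard bind ([::] or 0.0.0.0) is treated as dual-stack.
func sameFamilyOrWildcard(local, peer net.IP) bool {
	if local.IsUnspecified() {
		return true
	}
	return (local.To4() != nil) == (peer.To4() != nil)
}

// reachable is a string-based convenience wrapper for the check above.
func reachable(localAddr, peerAddr string) (bool, error) {
	local := net.ParseIP(localAddr)
	peer := net.ParseIP(peerAddr)
	if local == nil || peer == nil {
		return false, fmt.Errorf("invalid IP: %q or %q", localAddr, peerAddr)
	}
	return sameFamilyOrWildcard(local, peer), nil
}

func main() {
	for _, pair := range [][2]string{
		{"::", "192.168.1.10"},      // wildcard bind, IPv4 LAN peer
		{"fe80::1", "192.168.1.10"}, // specific IPv6 bind, IPv4 peer
		{"10.0.0.5", "192.168.1.10"},
	} {
		ok, err := reachable(pair[0], pair[1])
		fmt.Println(pair[0], pair[1], ok, err)
	}
}
```

Whether a `[::]` socket actually accepts IPv4 traffic depends on the platform's IPv6-only socket setting, so a production check may need to consult that as well.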
Improvements
- `trust-auto-approve` flag on daemon binary: previously only available via `pilotctl daemon start`, now exposed as a direct daemon CLI flag.
- Per-network dashboard charts: registry dashboard now shows per-network time-series graphs for member activity.
- SDK publish workflows: Node and Python SDK packages now only publish on GitHub releases, not on every commit.
Full Changelog: v1.7.1...v1.7.2
What's Changed
- fix: remove command wrappers, add feedback service agent and refactor by @Alexgodoroja in #64
v1.7.1
Full Changelog: v1.7.0...v1.7.1
v1.7.0
What's New
- Auto-updater sidecar (`pilot-updater`): checks GitHub Releases on a configurable interval, downloads platform-specific archives, verifies SHA256 checksums, replaces binaries in-place, and restarts the daemon; fully automatic, no user interaction required
- Version reporting: daemon reports its build-time version to the registry; visible in `pilotctl info` and `pilotctl network members`
- Network sync: periodic reconciliation of network memberships, policies, and member tags from registry (5-minute interval with jitter)
- IPv6 fix: resolved long-standing end-to-end failure caused by address family mismatch between IPv6 tunnel sockets and IPv4 LAN addresses
- Security audit fixes: registry auth (H3), replication auth (H4), per-port accept (H12), P2P handshake signing (M12)
- Dashboard enhancements: node graph and trust edge visualization in stats API
- Release checksums: `checksums.txt` included with SHA256 hashes for all archives
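Checksum verification of a downloaded archive might look like the sketch below. It assumes `checksums.txt` uses the common `sha256sum` layout (hex digest, whitespace, file name), which these notes do not confirm; the helper names and the archive name `pilot-example.tar.gz` are hypothetical.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseChecksums reads "<hex digest>  <file name>" lines into a map keyed by
// file name. The file layout is an assumption, see the lead-in.
func parseChecksums(text string) map[string]string {
	sums := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 {
			name := strings.TrimPrefix(fields[1], "*") // binary-mode marker
			sums[name] = strings.ToLower(fields[0])
		}
	}
	return sums
}

// verifyArchive compares the SHA-256 of data with the expected digest.
func verifyArchive(name string, data []byte, sums map[string]string) error {
	want, ok := sums[name]
	if !ok {
		return fmt.Errorf("no checksum listed for %s", name)
	}
	got := sha256.Sum256(data)
	if hex.EncodeToString(got[:]) != want {
		return fmt.Errorf("checksum mismatch for %s", name)
	}
	return nil
}

func main() {
	// SHA-256 of the three bytes "abc", used here as stand-in archive content.
	const listing = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad  pilot-example.tar.gz\n"
	sums := parseChecksums(listing)
	fmt.Println(verifyArchive("pilot-example.tar.gz", []byte("abc"), sums))
	fmt.Println(verifyArchive("pilot-example.tar.gz", []byte("tampered"), sums))
}
```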
Binaries Included
daemon, pilotctl, gateway, registry, beacon, rendezvous, nameserver, updater
Updater Usage
    pilot-updater -install-dir /usr/local/bin

Checks GitHub Releases every hour (configurable with -interval), downloads and applies updates automatically.
Full Changelog: v1.6.2...v1.7.0
What's Changed
- Implement feature agents by @Alexgodoroja in #62
- Feature/service agents by @Alexgodoroja in #63
v1.6.2
Changes
- Network memberships in pilotctl info: `pilotctl info` now shows joined networks and their addresses in both human-readable and JSON output
- Includes all v1.6.1 fixes: multi-network stream fix, policy runner bootstrap, admin-token CLI paths, data-exchange policy
Full Changelog: v1.6.1...v1.6.2
v1.6.1
Changes
- Multi-network stream fix: SYN-ACK/RST now uses the correct network-specific source address, fixing stream connections on non-primary networks
- Policy runner bootstrap fix: peer tags are now always refreshed from registry on startup, not just for policies with cycle rules
- pilotctl admin-token paths: network join --node-id, member-tags set, and policy set can now operate directly against the registry with admin token (no local daemon required)
- Data-exchange network policy: service-node gated connectivity with text messaging (port 1000) open and file transfer (port 1001) restricted to service-tagged nodes
- Integration test: 6-subtest coverage for policy enforcement
- Website styling updates
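The data-exchange policy above gates connectivity by port and tag. A minimal sketch of that decision follows; the ports 1000 and 1001 come from the notes, while the tag string `"service"`, the default-deny fallback, and the function names are assumptions. The notes also do not say whether the tag is checked on the caller, the callee, or both; this sketch checks the connecting peer.

```go
package main

import "fmt"

const (
	portText   uint16 = 1000      // text messaging, open
	portFile   uint16 = 1001      // file transfer, restricted
	serviceTag        = "service" // hypothetical tag name
)

// hasTag reports whether tags contains want.
func hasTag(tags []string, want string) bool {
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

// allowConnect decides whether a peer with the given tags may connect to port.
func allowConnect(port uint16, peerTags []string) bool {
	switch port {
	case portText:
		return true
	case portFile:
		return hasTag(peerTags, serviceTag)
	default:
		return false // default-deny for every other port (assumed)
	}
}

func main() {
	fmt.Println(allowConnect(portText, nil))
	fmt.Println(allowConnect(portFile, []string{"worker"}))
	fmt.Println(allowConnect(portFile, []string{"worker", serviceTag}))
}
```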
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
- fix: point website to correct email by @Alexgodoroja in #60
- fix: fix slack invite link by @Alexgodoroja in #61
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's New
Enterprise Control Plane
The enterprise subsystem is now production-ready with a comprehensive set of features for managing multi-tenant agent networks:
- RBAC: role-based access control with owner, admin, and member roles; promote, demote, kick, and transfer ownership
- Network Policies: per-network port allow-lists with deduplication and fractional port rejection
- Audit Trail: persistent ring-buffer audit log with enriched context (old/new values for mutations), audit export API, and survival across registry restarts
- Identity & SSO: built-in OIDC/JWT validation (RS256 + JWKS caching), external IDP webhook verification, `ValidateToken` client method
- Directory Sync: external directory integration with webhook-based verification
- Blueprints: blueprint persistence for repeatable network provisioning
- Enterprise CLI: `pilotctl` commands for provisioning, audit export, IDP configuration, and admin token bypass paths
- Observability: per-network Prometheus metrics (networks, invites, RBAC, policy, keys), enterprise status gauges, webhook dead-letter queue
Security Hardening
- Fix TOCTOU race in invite handlers
- Fix timing attack in join token verification
- Fix enterprise data loss on replication failover
- Fix invite consumed before capacity check
- Block joining backbone network; validate `max_members` bounds
- Backbone network protection for rename and enterprise operations
- Cap key expiry at 10 years; enforce key expiry on heartbeat
- Revoke outgoing invites on deregister, leave-network, and kick
- Block owner from leaving network; clean RBAC on enterprise disable
- Input validation: self-invite, description/ports limits, transfer zero-ID
- Node ID overflow guard
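For the join-token timing fix, the usual remedy is a constant-time comparison. The sketch below shows one common way to do that in Go with `crypto/subtle`; it is not the registry's code, and `tokenEqual` is a hypothetical helper. Both inputs are hashed first so the comparison always runs over equal-length values.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// tokenEqual compares two tokens without an early exit on the first
// differing byte. Hashing first gives both sides a fixed length.
func tokenEqual(presented, expected string) bool {
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	fmt.Println(tokenEqual("join-token-123", "join-token-123"))
	fmt.Println(tokenEqual("join-token-124", "join-token-123"))
}
```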
Registry Improvements
- Admin-token bypass for deregister (enables console node removal)
- Enriched `list_nodes` with polo_score, tags, and public flag
- Audit logging for re-registration, stale node reaping, polo score operations, and enterprise flag changes
- Created timestamp on `list_networks`; expose enterprise policy in network listings
- Clean up enterprise state on deregister; clean up invites on delete/leave/kick
- Enriched audit context for stale reap, kick (includes role), tags, task_exec, key expiry, and policy changes
- Tag deduplication and expanded error passthrough
Website & Documentation
- Enterprise documentation section (RBAC, Identity & SSO, Policies, Audit & Compliance, Blueprints)
- Rewritten gateway docs, simplified getting-started guide
- TOC sidebar for documentation pages
- Solutions dropdown in navigation bar
- Dynamic sitemap generation from blog post data
- Blog auto-publish system with CI deploy webhook
- 12 new blog posts covering enterprise features, networking concepts, and protocol architecture
Testing
- 80+ new tests covering enterprise subsystems, security edge cases, and stress scenarios
- Concurrent enterprise operations stress test
- Enterprise state persistence test
- Admin token bypass path verification tests
- Hostname collision, validation, and policy deduplication tests
- Per-network admin token and replication token validation tests
Other
- Support release candidate installs via `PILOT_RC=1` environment variable
- Image optimization (ImgBot)
- Expanded CLI usage documentation with Networks and Enterprise Admin sections
- Fix corrupted UTF-8 characters in blog post descriptions
Install / Update
    curl -fsSL https://pilotprotocol.network/install.sh | sh

Full Changelog
What's Changed
- docs: update, improve and simplify docs by @Alexgodoroja in #55
- [ImgBot] Optimize images by @imgbot[bot] in #56
- blog: add blog about scriptorium by @Alexgodoroja in #57
- fix: add contact us section by @Alexgodoroja in #59
Full Changelog: v1.4.1...v1.5.1