
smartcontract: add flex-algo topology accounts and link classification (RFC-18, PR 1/4)#3474

Draft
ben-malbeclabs wants to merge 55 commits into main from bc/rfc18-smartcontract

@ben-malbeclabs ben-malbeclabs commented Apr 7, 2026

Summary of Changes

  • Adds TopologyInfo as a new onchain account type. DZF creates topologies with an auto-assigned IS-IS TE admin-group bit (1–62), a derived flex-algo number (128 + bit), and a constraint type (include-any or include-all). Topologies are enforced to be unique by name and limited to 62 total via an AdminGroupBits resource extension.
  • Adds link_topologies: Vec<Pubkey> (capped at 8) and link_flags: u8 (bit 0 = unicast-drained) to the Link account. Link activation now requires the UNICAST-DEFAULT topology to exist — the activate processor enforces this.
  • Adds include_topologies to the Tenant account, allowing tenants to opt into topology-filtered routing.
  • Adds full topology lifecycle CLI commands: doublezero link topology create/delete/clear/list/backfill.
  • Extends doublezero link update with --link-topology (assign a named topology, or default to clear) and --unicast-drained true/false.
  • Extends doublezero link list with --topology <name> filter (default = untagged links).
  • Extends doublezero tenant update with --include-topologies <name>.
  • Adds doublezero-admin migrate flex-algo [--dry-run] to tag existing links with UNICAST-DEFAULT and backfill Vpnv4 loopback node segments for pre-existing accounts.
  • Updates Rust, Go, Python, and TypeScript SDKs with TopologyInfo deserialization and the new link/tenant fields.
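
The two new Link fields described above can be illustrated with a minimal sketch. It assumes only what the summary states (bit 0 of `link_flags` is unicast-drained, and `link_topologies` is capped at 8); the type and method names (`LinkSketch`, `set_topologies`, and so on) are hypothetical, and a `[u8; 32]` array stands in for `Pubkey` so the snippet has no dependencies. It is not the program's actual account layout.

```rust
/// Bit 0 of `link_flags` marks the link as unicast-drained (per the PR summary).
pub const LINK_FLAG_UNICAST_DRAINED: u8 = 1 << 0;
/// Maximum number of topologies a link may carry (per the PR summary).
pub const MAX_LINK_TOPOLOGIES: usize = 8;

/// Stand-in for a 32-byte Solana `Pubkey` (assumption, keeps the sketch dependency-free).
pub type Pubkey = [u8; 32];

/// Hypothetical reduced view of the Link account: only the two new fields.
#[derive(Debug, Default, PartialEq)]
pub struct LinkSketch {
    pub link_topologies: Vec<Pubkey>,
    pub link_flags: u8,
}

impl LinkSketch {
    /// Returns true when bit 0 of `link_flags` is set.
    pub fn is_unicast_drained(&self) -> bool {
        self.link_flags & LINK_FLAG_UNICAST_DRAINED != 0
    }

    /// Sets or clears bit 0 without touching the other flag bits.
    pub fn set_unicast_drained(&mut self, drained: bool) {
        if drained {
            self.link_flags |= LINK_FLAG_UNICAST_DRAINED;
        } else {
            self.link_flags &= !LINK_FLAG_UNICAST_DRAINED;
        }
    }

    /// Replaces the topology list, rejecting lists longer than the cap.
    pub fn set_topologies(&mut self, topologies: Vec<Pubkey>) -> Result<(), String> {
        if topologies.len() > MAX_LINK_TOPOLOGIES {
            return Err(format!(
                "at most {} topologies per link, got {}",
                MAX_LINK_TOPOLOGIES,
                topologies.len()
            ));
        }
        self.link_topologies = topologies;
        Ok(())
    }
}
```

Keeping the drained state in a bitmask rather than a `bool` leaves the remaining seven bits of `link_flags` available for later flags without another account-layout change.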

Stacked PRs: This is PR 1/4. #3475 (activator) and #3476 (controller) depend on this one. Merge this first. #3475 and #3476 can merge in either order; #3477 (e2e) requires all three.

RFC: rfcs/rfc18-link-classification-flex-algo.md

Diff Breakdown

| Category | Files | Lines (+/-) | Net |
|---|---|---|---|
| Core logic (program + Rust SDK) | ~55 | +2,100 / -120 | +1,980 |
| CLI + admin commands | ~25 | +1,400 / -55 | +1,345 |
| Language SDKs (Go/Python/TS) + fixtures | ~20 | +430 / -15 | +415 |
| Tests | ~25 | +3,700 / -5 | +3,695 |
| Scaffolding | ~10 | +180 / -0 | +180 |

Test-heavy: ~52% of additions are program integration tests. The topology test file alone is 2,410 lines.

Key files
  • smartcontract/programs/doublezero-serviceability/tests/topology_test.rs: 2,410-line integration test suite covering all topology processor paths
  • smartcontract/programs/doublezero-serviceability/src/processors/topology/create.rs: auto-assigns admin-group bit, derives flex-algo number, validates caps
  • smartcontract/programs/doublezero-serviceability/src/processors/link/update.rs: topology assignment and link_flags bitmask handling
  • smartcontract/programs/doublezero-serviceability/src/processors/link/activate.rs: enforces UNICAST-DEFAULT exists before activation
  • smartcontract/programs/doublezero-serviceability/src/state/topology.rs: TopologyInfo account definition
  • smartcontract/cli/src/topology/: create, delete, clear (with auto-discovery), list, backfill
  • smartcontract/cli/src/link/list.rs: --topology filter; displays topology and unicast-drained in output
  • controlplane/doublezero-admin/src/cli/migrate.rs: migrate flex-algo with --dry-run support

Testing Verification

  • make rust-test passes: 675+ tests across workspace, program integration tests, and account compat checks
  • Topology lifecycle, update validation, and error cases fully covered by program integration tests
  • Link activation failure without UNICAST-DEFAULT covered in tests/link_wan_test.rs
  • Tenant include_topologies serialization verified against binary fixtures in Go, Python, and TypeScript SDKs

@ben-malbeclabs ben-malbeclabs self-assigned this Apr 7, 2026
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-smartcontract branch from fa02ab9 to 25b32e7 Compare April 7, 2026 04:04
ben-malbeclabs added a commit that referenced this pull request Apr 7, 2026
- interface_test.rs: add admin_group_bits_pda to SetGlobalConfig accounts
  in test_interface_create_invalid_mtu_{non_cyoa,cyoa}; RFC-18 added
  AdminGroupBits as a required account for SetGlobalConfig
- sdk/go/serviceability/state.go: fix gofmt formatting violation
- sdk/rs/client.rs: fix payer deduplication when user_payer==payer;
  check `a.is_signer` before skipping payer insertion so the payer
  is added as a signer even when already present as a non-signer
- e2e/compatibility_test.go: add testnet envOverride for all interface
  and link steps covering v0.10.0–v0.16.x (testnet v0.10.0–v0.11.0
  predate the commands; testnet v0.16.0 was built before RFC-18
  InterfaceV2 change); create unicast-default topology in compat test
  setup so the activator can activate links (required by RFC-18)
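
The payer-deduplication fix mentioned in this commit message can be sketched as follows. This is only an illustration of the described behavior (skip inserting the payer only when it is already present as a signer); `AccountMetaSketch` and `ensure_payer` are hypothetical names, not the SDK's actual types or functions.

```rust
/// Hypothetical stand-in for an instruction account entry.
#[derive(Debug, Clone, PartialEq)]
pub struct AccountMetaSketch {
    pub pubkey: [u8; 32],
    pub is_signer: bool,
    pub is_writable: bool,
}

/// Appends the payer as a writable signer unless it is already present AS A SIGNER.
/// Checking only the pubkey would wrongly skip the insertion when the same key
/// already appears as a non-signer (the user_payer == payer case).
pub fn ensure_payer(accounts: &mut Vec<AccountMetaSketch>, payer: [u8; 32]) {
    let already_signer = accounts
        .iter()
        .any(|a| a.pubkey == payer && a.is_signer);
    if !already_signer {
        accounts.push(AccountMetaSketch {
            pubkey: payer,
            is_signer: true,
            is_writable: true,
        });
    }
}
```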
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-smartcontract branch from d771fbc to d431ac6 Compare April 7, 2026 18:20

@nikw9944 nikw9944 left a comment

Code Review: RFC-18 Smartcontract — Flex-Algo Topology Accounts

Overall this is a well-structured, well-tested PR that cleanly introduces the topology lifecycle. The onchain processors have proper authorization checks, the CLI provides good UX with auto-discovery, and the test coverage is thorough. A few issues worth addressing:

Bug: Python SDK AccountType mismatch

sdk/serviceability/python/serviceability/state.py sets TOPOLOGY = 16, but the Rust AccountType enum defines Topology = 17 (with Index = 16). This will cause Python SDK consumers to misidentify topology accounts as index accounts and vice versa.

The fixture JSON metadata (topology_info.json) also has "account_type": 16 which appears to be a copy from the Rust enum ordinal rather than the discriminant value. The binary fixture should be correct (generated from borsh::to_vec which uses the Rust discriminant of 17), but the JSON metadata is misleading.

Fix: Change TOPOLOGY = 16 to TOPOLOGY = 17 in the Python SDK, and update topology_info.json's account_type and AccountType.value from 16 to 17.
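
To make the mismatch concrete, here is a minimal Rust sketch of the two discriminants quoted above. Only `Index = 16` and `Topology = 17` come from this review; the enum name `AccountTypeSketch`, the helper `peek_account_type`, and the assumption that the account type is the first byte of the serialized account are illustrative, not confirmed by the PR.

```rust
/// Reduced, hypothetical mirror of the Rust AccountType enum: only the two
/// variants whose discriminants are quoted in the review.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccountTypeSketch {
    Index = 16,
    Topology = 17,
}

impl TryFrom<u8> for AccountTypeSketch {
    type Error = u8;

    /// Maps a raw discriminant byte to a variant; unknown bytes are returned as the error.
    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            16 => Ok(AccountTypeSketch::Index),
            17 => Ok(AccountTypeSketch::Topology),
            other => Err(other),
        }
    }
}

/// Assumption: the account type is stored as the first byte of the account data.
pub fn peek_account_type(data: &[u8]) -> Option<AccountTypeSketch> {
    data.first().and_then(|b| AccountTypeSketch::try_from(*b).ok())
}
```

Under these assumptions, an SDK that maps 16 to "topology" would classify an account whose first byte is 16 as a topology account, while the Rust side reads the same byte as `Index`.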

Bug: PR description says include-all but code has Exclude

The PR summary says "constraint type (include-any or include-all)" but the actual TopologyConstraint enum has IncludeAny = 0 and Exclude = 1. The CLI parser also accepts "include-any" or "exclude". Minor, but worth fixing the PR description to avoid confusion for reviewers of downstream PRs.
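
For reference, the shape described above looks roughly like the following sketch. The variant names, discriminants, and accepted strings are taken from this review; the `FromStr` implementation and its error message are assumptions about how such a parser could be written, not a copy of the CLI code.

```rust
use std::str::FromStr;

/// Constraint variants as quoted in the review: IncludeAny = 0, Exclude = 1.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TopologyConstraint {
    IncludeAny = 0,
    Exclude = 1,
}

impl FromStr for TopologyConstraint {
    type Err = String;

    /// Accepts the two CLI spellings named in the review; anything else
    /// (including "include-all") is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "include-any" => Ok(TopologyConstraint::IncludeAny),
            "exclude" => Ok(TopologyConstraint::Exclude),
            other => Err(format!(
                "invalid constraint '{}': expected include-any or exclude",
                other
            )),
        }
    }
}
```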

Observation: resolve_topology_names duplicated 4 times

The identical resolve_topology_names function is copy-pasted across link/get.rs, link/list.rs, tenant/get.rs, and tenant/list.rs. Consider extracting it to a shared utility (e.g., in smartcontract/cli/src/util.rs) to avoid drift. Not blocking, but worth a follow-up.
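
A shared helper could look something like the sketch below. The real function's signature is not shown in this review, so everything here is an assumption: the parameter types, the `"default"` label for an empty list, the `"<unknown>"` fallback, and the `[u8; 32]` stand-in for `Pubkey`.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte Solana `Pubkey` (assumption, keeps the sketch dependency-free).
pub type Pubkey = [u8; 32];

/// Hypothetical shared helper: turns a list of topology pubkeys into a
/// comma-separated list of names for CLI display.
pub fn resolve_topology_names(
    keys: &[Pubkey],
    names_by_key: &HashMap<Pubkey, String>,
) -> String {
    if keys.is_empty() {
        // Assumed label for links or tenants with no topology assigned.
        return "default".to_string();
    }
    keys.iter()
        .map(|key| {
            names_by_key
                .get(key)
                .cloned()
                // Assumed fallback when the topology account could not be found.
                .unwrap_or_else(|| "<unknown>".to_string())
        })
        .collect::<Vec<_>>()
        .join(", ")
}
```

With a single definition in one module, the four call sites would each import it rather than carry their own copy.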

Observation: MTU validation removal

The PR removes the LINK_MTU check from process_create_link (link create processor) and process_update_link (link update processor). This is a separate behavioral change from topology support — links can now be created/updated with any MTU value. If this is intentional (not just an artifact of the topology work), it might be worth calling out explicitly in the PR description for reviewer awareness, since it changes the validation contract.

Observation: AdminGroupBits range vs PR description

PR description says "capped at 62 topologies" and "admin-group bit (1–62)" but the AdminGroupBits resource range is IdRange(1, 127), allowing up to 127 topologies. The code is more permissive than the description suggests — just a documentation nit.

Observation: topology_info.json missing trailing newline

Both topology_info.json and the updated link.json / tenant.json fixtures are missing a trailing newline (\ No newline at end of file). Minor, but most linters flag this.

Minor: Unused binding in topology/list.rs

let _ = get_topology_pda(&program_id, &t.name); // kept for symmetry; use key directly

This binding is explicitly unused (commented "kept for symmetry"). It should be removed — dead code with a comment explaining why it exists is worse than just not having it.

Design: Auto-tagging at activation

The process_activate_link processor unconditionally sets link.link_topologies = vec![*unicast_default_topology_account.key] — this overwrites any topology that might have been assigned between create and activate. In the current flow this is fine (links are created with empty topologies), but it's worth noting that any pre-activation topology assignment would be lost. A push + dedup approach would be more defensive, but given the lifecycle this is acceptable as-is.
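
The push + dedup alternative could be sketched as below. This is not the processor's code: `tag_with_default` is a hypothetical helper, `[u8; 32]` stands in for `Pubkey`, and the cap of 8 is taken from the PR summary.

```rust
/// Stand-in for a 32-byte Solana `Pubkey` (assumption, keeps the sketch dependency-free).
pub type Pubkey = [u8; 32];
/// Per-link topology cap, as stated in the PR summary.
pub const MAX_LINK_TOPOLOGIES: usize = 8;

/// Adds the UNICAST-DEFAULT topology to a link without discarding topologies
/// that were assigned before activation. Idempotent: a second call is a no-op.
pub fn tag_with_default(
    link_topologies: &mut Vec<Pubkey>,
    unicast_default: Pubkey,
) -> Result<(), &'static str> {
    if link_topologies.contains(&unicast_default) {
        return Ok(());
    }
    if link_topologies.len() >= MAX_LINK_TOPOLOGIES {
        return Err("link topology cap reached");
    }
    link_topologies.push(unicast_default);
    Ok(())
}
```

The trade-off is an extra failure mode (the cap check) in exchange for never silently dropping an existing assignment.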


Verdict: The AccountType mismatch in the Python SDK is a real bug that should be fixed before merge. The rest are minor observations. Solid work overall — the test coverage (52% of additions) and the stacked-PR approach are well done.

…location

Implements the TopologyCreate instruction (variant 104) for the
doublezero-serviceability program. The instruction creates a TopologyInfo PDA,
allocates the lowest available bit from the AdminGroupBits ResourceExtension
(skipping pre-reserved bit 1 / UNICAST-DRAINED), derives flex_algo_number =
128 + admin_group_bit, and optionally backfills Vpnv4 loopback interfaces on
Device accounts with a FlexAlgoNodeSegment entry.

Also adds stub TopologyDelete (105) and TopologyClear (106) instructions for
future implementation, and fixes missing flex_algo_node_segments field in CLI
test fixtures for InterfaceV3.
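
The allocation rule in this commit message (take the lowest free bit, skip pre-reserved bit 1, derive flex_algo_number = 128 + admin_group_bit) can be sketched as follows. The `u64` bitmap, the type name `AdminGroupBitsSketch`, and the `release` method are assumptions for illustration; the upper bound of 62 follows the PR description, whereas the review notes that the program's actual range is wider.

```rust
pub const MIN_BIT: u8 = 1;
/// Upper bound from the PR description (assumption; the program's range may differ).
pub const MAX_BIT: u8 = 62;
/// Bit 1 is pre-reserved (UNICAST-DRAINED) and never handed out.
pub const RESERVED_BIT: u8 = 1;
pub const FLEX_ALGO_BASE: u8 = 128;

/// Hypothetical in-memory model of the AdminGroupBits allocator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AdminGroupBitsSketch {
    used: u64,
}

impl AdminGroupBitsSketch {
    /// Starts with only the reserved bit marked as used.
    pub fn new() -> Self {
        Self { used: 1u64 << RESERVED_BIT }
    }

    /// Allocates the lowest free bit in MIN_BIT..=MAX_BIT, or None when exhausted.
    pub fn allocate(&mut self) -> Option<u8> {
        for bit in MIN_BIT..=MAX_BIT {
            let mask = 1u64 << bit;
            if self.used & mask == 0 {
                self.used |= mask;
                return Some(bit);
            }
        }
        None
    }

    /// Frees a previously allocated bit; the reserved bit and out-of-range
    /// values are refused. Returns true if a bit was actually freed.
    pub fn release(&mut self, bit: u8) -> bool {
        if bit == RESERVED_BIT || !(MIN_BIT..=MAX_BIT).contains(&bit) {
            return false;
        }
        let mask = 1u64 << bit;
        let was_used = self.used & mask != 0;
        self.used &= !mask;
        was_used
    }
}

/// flex_algo_number = 128 + admin_group_bit; None if the sum would overflow a u8.
pub fn flex_algo_number(admin_group_bit: u8) -> Option<u8> {
    FLEX_ALGO_BASE.checked_add(admin_group_bit)
}
```

In this model the first topology created receives bit 2 and therefore flex-algo number 130.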
Wire AdminGroupBits resource creation into process_set_globalconfig,
following the same pattern as DeviceTunnelBlock, LinkIds, etc. The PDA
is created on first initialization and a migration path handles existing
deployments that predate RFC-18.

Remove the separate create_admin_group_bits test helper and replace all
call sites with a plain PDA derivation; update all SetGlobalConfig
account lists to include admin_group_bits_pda as the 9th account.
  compile errors in cli link commands and activator tests
- interface_test.rs: add admin_group_bits_pda to SetGlobalConfig accounts
  in test_interface_create_invalid_mtu_{non_cyoa,cyoa}; RFC-18 added
  AdminGroupBits as a required account for SetGlobalConfig
- sdk/go/serviceability/state.go: fix gofmt formatting violation
- sdk/rs/client.rs: fix payer deduplication when user_payer==payer;
  check `a.is_signer` before skipping payer insertion so the payer
  is added as a signer even when already present as a non-signer
- e2e/compatibility_test.go: add testnet envOverride for all interface
  and link steps covering v0.10.0–v0.16.x (testnet v0.10.0–v0.11.0
  predate the commands; testnet v0.16.0 was built before RFC-18
  InterfaceV2 change); create unicast-default topology in compat test
  setup so the activator can activate links (required by RFC-18)
…nd deserialize.go

- topology/create.rs: pass &Pubkey directly to validate_program_account! pda
  param instead of Some(&Pubkey)
- deserialize.go: remove duplicate DeserializeTenant function introduced by
  rebase conflict resolution
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-smartcontract branch from b9b2b3b to 3c428f2 Compare April 7, 2026 20:20
… topology and wrong loopback MTU

- devnet/smartcontract_init.go: create unicast-default topology during devnet init so link
  activation succeeds (RFC-18 requires this PDA to exist)
- topology_test.rs: change Vpnv4 loopback interface MTU from 1500 to 9000 to satisfy the
  non-CYOA/DIA interface MTU validation
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-smartcontract branch from 0914071 to d6881e1 Compare April 7, 2026 22:30
@ben-malbeclabs ben-malbeclabs force-pushed the bc/rfc18-smartcontract branch from ecda500 to a92da56 Compare April 7, 2026 22:35
- topology_test.rs: fix assign_link_topology helper to pass topology
  pubkeys as extra accounts in the UpdateLink transaction; the processor
  validates topology accounts as trailing accounts after system_program,
  but the helper was only including them in the instruction args
- e2e/compatibility_test.go: extend default known-incompatible range for
  interface-update and link steps from [0.12.0, 0.16.0) to [0.10.0, 0.16.0);
  RFC-18 added flex_algo_node_segments to InterfaceV2 which causes v0.10.0
  and v0.11.0 to fail deserialization ("Interface not found") when reading
  back accounts written by the new program (trailing bytes); interface
  create steps are unaffected (old CLIs don't read back existing interfaces)
- Add admin_group_bits_pda to SetGlobalConfig call in telemetry
  ServiceabilityProgramHelper::new(), which RFC-18 requires as the 10th account
- Create the unicast-default topology in ServiceabilityProgramHelper::new()
  and pass it to ActivateLink, which RFC-18 requires for auto-tagging
- Add unicast_default_topology_pubkey field to ServiceabilityProgramHelper
- Restore WAN link mtu values to 9000 in initialize_device_latency_samples_tests.rs
  (rebase artifact: incorrectly changed to 0/1500/4500)
- Set compute_max_units(1_000_000) on all inline ProgramTest setups that call
  SetGlobalConfig; RFC-18 creates an additional AdminGroupBits resource extension
  account which can exceed the default 200k CU limit under parallel test load
…S compat timeout

The testnet envOverride for device_interface_create* was incorrectly including
v0.10.0, v0.11.0, and v0.16.0. These CLI versions can successfully create
interfaces (the create instruction doesn't read back the account), so the step
was passing but our known-incompatible table said it should fail.

Testnet v0.16.0 can't deserialize the new InterfaceV2 format (built before
RFC-18), but that only affects operations that READ interface accounts. The
create step itself succeeds. The default range [0.12.0, 0.16.0) covers the MTU
issue (versions that send the wrong MTU value).

Also increase the TypeScript compat test setDefaultTimeout from 30s to 120s.
The getProgramData test fetches all mainnet accounts via RPC, which can take
60-90s during busy periods, causing the test to timeout.
@ben-malbeclabs ben-malbeclabs marked this pull request as draft April 8, 2026 18:18