Skip to content

Try out CAP-0071 xdr with xdr ifdefs#5214

Open
sisuresh wants to merge 5 commits intostellar:masterfrom
sisuresh:CAP71-xdr
Open

Try out CAP-0071 xdr with xdr ifdefs#5214
sisuresh wants to merge 5 commits intostellar:masterfrom
sisuresh:CAP71-xdr

Conversation

@sisuresh
Copy link
Copy Markdown
Contributor

@sisuresh sisuresh commented Apr 9, 2026

Description

Resolves #5154

Related to https://github.com/stellar/pod-fsr/issues/29

List of changes

  • Single XDR source with feature ifdefs: Replace the curr/next branch model with a single main branch where next-protocol XDR changes are behind #ifdef guards (e.g., #ifdef CAP_0071)
  • Per-feature configure flags: --enable-cap-0071 enables a specific feature on both C++ (xdrc + preprocessor) and Rust (cargo feature on soroban-env-host → rs-stellar-xdr); --enable-next-protocol-version-unsafe-for-production enables all
    features
  • BUILDING_NEXT_PROTOCOL: Derived conditional that's true when any next-protocol feature is enabled; controls protocol version bump, soroban p27 inclusion, and replaces ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION in all C++ code
  • XDR hash check updated: Both C++ (hash-xdrs.sh) and Rust (rs-stellar-xdr xdrgen) strip #ifdef/#else/#endif blocks and remove whitespace before hashing, so hashes match regardless of ifdefs; skipped for p26 since the released crate uses raw
    hashes
  • Feature flag validation: At startup, C++ and Rust report their enabled XDR features and compare; catches misconfigurations where one side enables a feature the other doesn't

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@sisuresh sisuresh marked this pull request as ready for review April 13, 2026 21:07
Copilot AI review requested due to automatic review settings April 13, 2026 21:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Stellar Core’s XDR identity checking and build plumbing to support XDR files that contain feature-gated #ifdef blocks (CAP-0071 and a test fixture), and adds a new Soroban p27 “next” wiring path to trial the updated Rust XDR metadata.

Changes:

  • Introduces XDR feature-flag propagation across configure/make/C++/Rust and validates enabled XDR features match between C++ and Rust at startup.
  • Switches XDR hashing generation to canonicalize XDR inputs by stripping #ifdef blocks and whitespace before hashing.
  • Adds Soroban p27 wiring behind the Rust next feature and updates build conditionals to use BUILDING_NEXT_PROTOCOL.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/rust/src/soroban_proto_any.rs Refactors version/XDR metadata reporting to use per-protocol helper getters and adds reporting of XDR feature flags.
src/rust/src/soroban_proto_all.rs Enables soroban_curr selection of p27 under feature = "next" and adds p27 adaptor module + XDR metadata getters.
src/rust/src/soroban_module_cache.rs Adds a p27 module cache and correct protocol 27 dispatch under feature = "next".
src/rust/src/common.rs Updates Rust-side cross-crate XDR identity check to use the updated quorum-analyzer XDR symbol path.
src/rust/src/bridge.rs Extends SorobanVersionInfo with xdr_features for C++↔Rust feature parity checks.
src/rust/Cargo.toml Adds optional soroban-env-host-p27 dependency and includes it in the unified feature; bumps quorum-analyzer rev.
src/main/main.cpp Adds C++↔Rust XDR feature parity validation and changes when XDR identity checks run at startup.
src/main/Config.cpp Switches protocol bump gating from ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION to BUILDING_NEXT_PROTOCOL.
src/Makefile.am Plumbs XDR feature flags into xdrc invocation and Soroban cargo feature flags; updates next-protocol conditional usage.
hash-xdrs.sh Canonicalizes XDR hashing by stripping feature blocks and whitespace prior to hashing.
configure.ac Adds --enable-cap-0071 / --enable-test-feature and introduces derived BUILDING_NEXT_PROTOCOL conditional.
common.mk Adds -DBUILDING_NEXT_PROTOCOL, -DCAP_0071, -DTEST_FEATURE, and removes protocol-next include path usage.
.gitmodules Removes protocol-next submodule and adds p27 Soroban submodule; repoints protocol-curr branch to main.
Cargo.lock / src/rust/src/dep-trees/p27-expect.txt Lockfile and dependency-tree expectation updates for new Rust dependencies.

Comment thread src/rust/src/soroban_proto_any.rs Outdated
Comment thread src/main/main.cpp
Comment on lines +410 to +419
// The p26 rs-stellar-xdr crate uses raw file
// hashes, which can't match the ifdef-stripped hashes used here.
// The easiest thing to do was to just skip the check for p26. Which
// should be fine as the xdr on the rust side shouldn't change, and the
// xdr on the core should always be backwards compatible. This is
// temporary until we bump to p27.
if (Config::CURRENT_LEDGER_PROTOCOL_VERSION != 26)
{
checkXDRFileIdentity();
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping checkXDRFileIdentity() for protocol 26 disables both the Rust-side cross-crate XDR consistency check (check_xdr_version_identities) and the C++↔Rust file-hash comparison on the current stable protocol. This weakens a startup safety invariant and can let incompatible XDR definitions ship unnoticed. Prefer keeping the check enabled by making the Rust-side hashes comparable for p26 (eg. compute the same canonical/ifdef-stripped hash there too), or gate only the specific soroban hash comparison that is known to be incompatible (while still running the Rust-side cross-crate check).

Suggested change
// The p26 rs-stellar-xdr crate uses raw file
// hashes, which can't match the ifdef-stripped hashes used here.
// The easiest thing to do was to just skip the check for p26. Which
// should be fine as the xdr on the rust side shouldn't change, and the
// xdr on the core should always be backwards compatible. This is
// temporary until we bump to p27.
if (Config::CURRENT_LEDGER_PROTOCOL_VERSION != 26)
{
checkXDRFileIdentity();
}
// Always run the XDR identity check during startup so we preserve both
// the Rust-side cross-crate consistency validation and the C++↔Rust
// file identity check on the current stable protocol.
checkXDRFileIdentity();

Copilot uses AI. Check for mistakes.
Comment thread configure.ac
Comment thread src/main/Config.cpp
@sisuresh sisuresh requested a review from dmkozh April 13, 2026 21:20
@sisuresh sisuresh requested a review from jayz22 April 13, 2026 23:25
Comment thread src/main/Config.cpp
{
uint32 const Config::CURRENT_LEDGER_PROTOCOL_VERSION = 26
#ifdef ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION
#ifdef BUILDING_NEXT_PROTOCOL
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the purpose of this rename. It might be quite disruptive and I don't see what is the benefit that it brings.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUILDING_NEXT_PROTOCOL is set if either the ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION flag is set, or if any of the feature flags are enabled. I agree that this could be confusing for future development (someone could gate on the old flag when they should gate on the new flag). I'll think about ways to improve this, but it'll also depend on the answer to your other comment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the other discussion, could you please explain the difference between the two flags?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUILDING_NEXT_PROTOCOL just bumps CURRENT_LEDGER_PROTOCOL_VERSION, which is what ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION did prior to this PR. BUILDING_NEXT_PROTOCOL is set by other flags, not the user.

ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION now enables all protocol feature flags and bumps CURRENT_LEDGER_PROTOCOL_VERSION by enabling BUILDING_NEXT_PROTOCOL.

The individual flags like --enable-cap71 need to be able to bump the protocol version without enabling other XDR features, so they will also set BUILDING_NEXT_PROTOCOL.

This is confusing though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the reason for why I'm arguing for only adding the flags to disable features (when necessary).

ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION would bump the protocol version and also enable all the XDR features. Then on the off-chance we need to be able to disable the feature, we just remove it from the list of features enabled by ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION. That seems much simpler to me.

Comment thread configure.ac
AM_CONDITIONAL(ENABLE_NEXT_PROTOCOL_VERSION_UNSAFE_FOR_PRODUCTION,
[test x$enable_next_protocol_version_unsafe_for_production = xyes])

AC_ARG_ENABLE(cap-0071,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why do we need detailed per-feature flags, or at least why are they opt-in.
I thought that we still will have a single next protocol flag that would enable the features that have been implemented. We may add flags that disable the features on-demand in case if that helps with downstream integration, though I would imagine it shouldn't matter most of the time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to let downstream systems enable exactly what they have support for. What your talking about is somewhat disruptive in that a core update would require downstream to pass in a new flag because it doesn't have support for that feature.

Copy link
Copy Markdown
Contributor

@dmkozh dmkozh Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downstream systems should generally not create custom core builds and I don't think they should normally pick the features from core - it's up to us to provide a proper build.

I thought the idea is that they can set the feature flags in XDR when necessary in order to avoid supporting stuff like new enum variants etc. while trying to implement some other feature. CAP-71 is actually a good example: building CAP-71 transactions requires rather involved downstream integration, but they can still use a Core build that accepts CAP-71 XDR even if they have none of the support. Since XDR is generally backwards-compatible, I think we should rarely need intermediate builds that disable features, which is why opt-out seems more reasonable than opt-in to me. The main way to break downstream with XDR is likely meta, but it probably could be guarded by a runtime config flag anyways.

Basically I'd like to keep the build configuration as simple as possible. There is seemingly no need to guard CAP-71 on the core side (and of course there shouldn't be a need to guard 'test feature'), so we shouldn't make the configs more complex because of these.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of this was to be able to release a subset of the protocol changes whenever we want. Lets say for some reason CAP-71 isn't implemented in core in time, or downstream hasn't adapted in time. We don't want to have to go in and remove or opt out of CAP-71 everywhere. Maybe we need to rethink this priority, but that's the point of this change. CAP-71 isn't as complex, but a more complex CAP that takes a while to implement might not be ready by our self-imposed deadline, which will then need to get pushed. Also re: config-complexity - these flags will be removed once a change is included into a protocol.

You do make an interesting point about changes that downstream should be able to just pass through like CAP-71 though. Maybe in this specific case the ifdefs make less sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of this was to be able to release a subset of the protocol changes whenever we want.

That's a pretty general goal that doesn't dictate the implementation. I don't think that encoding releases in build flags is a good idea (do you imagine making a release by just specifying the flags e.g. in Jenkins? that would be really brittle and error-prone). I think that should be driven by checked-in artifacts. Build flags should only be necessary for one-off/integration builds etc.

We don't want to have to go in and remove or opt out of CAP-71 everywhere

Well, that's what we can use a disable flag for. Though again, I would argue a better way would be to merge a respective change that explicitly disables it (e.g. in Config.h or wherever we find suitable). Basically we should have a single canonical definition that says 'p27 is features X,Y,Z'. Then if we do want to disable feature X, we change the definition to 'p27 is features Y,Z'. The question when do we promote feature to be canonical is debatable, but probably it's better to do sooner than later (to avoid integration issues), and then disable if necessary. Also, keep in mind that disabling a feature is supposed to be simple, but I'm sure that there may be more complex situations where disabling it would lead to test failures, which would require a test PR anyways. But I think that's good and still much better than running CI/tests on all the possible feature combinations, or ignoring the features until the very last moment.

You do make an interesting point about changes that downstream should be able to just pass through like CAP-71 though. Maybe in this specific case the ifdefs make less sense.

Yes, disabling the feature for downstream integration is very different from disabling it from the release. The former should be rarely necessary at the core level, and as I've mentioned before we might be better off with runtime flags anyways (like meta). But for the release configuration we need something more stable than a build config flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that encoding releases in build flags is a good idea (do you imagine making a release by just specifying the flags e.g. in Jenkins? that would be really brittle and error-prone)

To be clear. a release should not enable any of these ifdefs. We should remove the ifdefs and feature flags for the CAPs being released, which should happen when we decide to cut the protocol. These flags should be used by downstream teams for integration.

Well, that's what we can use a disable flag for. Though again, I would argue a better way would be to merge a respective change that explicitly disables it (e.g. in Config.h or wherever we find suitable).

Are you saying that a release build will contain these disable flags? That seems error prone, and I don't see how this is easier to automate integration tests with. A code change also isn't desirable because downstream will have to update and pick it up.

But I think that's good and still much better than running CI/tests on all the possible feature combinations, or ignoring the features until the very last moment.

Our CI will continue running for vnext, which will include all features. We won't be using these individual flags in CI.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags should be used by downstream teams for integration.

To reiterate my point, I'm not convinced downstream teams should do custom core builds, and also shouldn't normally need to customize this at all. We can use these ourselves of course, but again, I'd prefer to avoid this as much as possible until actually necessary (i.e. when we have features that we're concerned about).

Are you saying that a release build will contain these disable flags?

No, these are compile-time flags, aren't they? What I'm suggesting is to keep the single vnext config flag that we use for vnext builds, and add --disable-$feature flags for features that actually need it for on-demand builds (so that you have a non-prod vnext build with a feature disabled explicitly).

A code change also isn't desirable because downstream will have to update and pick it up.

I would argue a code change is desirable and necessary if we want to change the protocol contents. Obviously downstream will have to pick it up (because the protocol has been changed), but that's find and by design, isn't it?

Our CI will continue running for vnext, which will include all features. We won't be using these individual flags in CI.

I think we agree on this point, my main suggestion is to not introduce the flags for the features that don't need to be disabled (and also to switch from opt-in to opt-out, though this is debatable and not as important).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all of this is an experiment that we can rollback if it doesn't yield what we're looking for. While CAP-0071 doesn't need to be behind a flag for downstream integration, it'd be helpful to see how this flag will flow downstream. Also, disable flags work against what we've been discussing wrt FSR.

It'd also be good to get @Shaptic's input on this, and I'd prefer not to stall on what we're experimenting with here unless there's a good reason.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the point of experiment is to figure out what works/doesn't work. I find the proposed approach concerning. Yes, we can merge this, but I don't see what exactly does this achieve or prove, given that it doesn't seem like the build granularity is not necessary now and I will repeat myself yet again, I believe it's conceptually problematic.
I don't really want to block this, but on the other hand I'm not sure I understand the intended design fully (like 'disable flags work against what we've been discussing wrt FSR' point), so maybe this is worth some kind of a process doc and a team-wide discussion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I don't want to dismiss your feedback, but I'd like to get input from other before deviating from what we have here. We can wait for @Shaptic and @leighmcculloch to chime in before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update XDR check between core and the host to handle ifdefs

4 participants