Skip to content

replicator: move EC local/remote logic from policer#3854

Open
End-rey wants to merge 1 commit intomasterfrom
fix-shard-evacuation-for-ec
Open

replicator: move EC local/remote logic from policer#3854
End-rey wants to merge 1 commit intomasterfrom
fix-shard-evacuation-for-ec

Conversation

@End-rey
Copy link
Contributor

@End-rey End-rey commented Feb 27, 2026

Closes #3809.

@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 0% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 25.78%. Comparing base (6ac061a) to head (bee32e1).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
pkg/services/replicator/process.go 0.00% 21 Missing ⚠️
pkg/services/policer/ec.go 0.00% 19 Missing ⚠️
pkg/services/replicator/replicator.go 0.00% 3 Missing ⚠️
cmd/neofs-node/object.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3854      +/-   ##
==========================================
- Coverage   25.79%   25.78%   -0.02%     
==========================================
  Files         670      670              
  Lines       43139    43122      -17     
==========================================
- Hits        11128    11118      -10     
+ Misses      30994    30984      -10     
- Partials     1017     1020       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit otherwise.

type Network interface {
// IsLocalNodePublicKey checks whether given binary-encoded public key is
// assigned in the network map to a local storage node running [Replicator].
IsLocalNodePublicKey([]byte) bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I still have no idea why we're not passing this slice into services directly to perform any comparisons locally. It can't change for a node. But that's a bit different topic, every service has it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I just wanted to pass the key itself, but then I saw that it was taken from netmap, so I assumed that it could change.

func nodeKeyFromNetmap(c *cfg) []byte {
ni, ok := c.cfgNetmap.state.getNodeInfo()
if ok {
return ni.PublicKey()
}
return nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not support key hot swap, so it is always a slice for a node that does not restart

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be changed for all services, here we can follow whatever is done everywhere already.

}

for i := range iec.NodeSequenceForPart(partIdx, int(rule.DataPartNum)+int(rule.ParityPartNum), len(sortedNodes)) {
if p.network.IsLocalNodePublicKey(sortedNodes[i].PublicKey()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this check is practical at all. A node can't detect it's own missing part. Even if netmap changes the set of nodes is the same from check to recreate. We also have some part already (which is why we're here). So we should replicate this part (if needed) and check for other parts (on other nodes) and then replicate new parts to that remote nods (if needed).

One thing I can imagine netmap change + a lost part in which case we'd have a part, but a wrong one and we'd like to both push our part to another node and recreate this missing part for ourselves. How practical that is (given that other nodes will recreate/replicate it too) is an open question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this check then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@End-rey End-rey force-pushed the fix-shard-evacuation-for-ec branch from 35b1188 to 59bc938 Compare March 4, 2026 16:05

if p.localNodeKey.IsLocalNodePublicKey(task.nodes[i].PublicKey()) {
if task.obj == nil {
log.Error("cannot put object to local storage: object not provided in task")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would say this is a dev err, so either panic or do not Error, admin can do nothing about it

zap.Stringer("object", task.addr),
)

if p.localNodeKey.IsLocalNodePublicKey(task.nodes[i].PublicKey()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, how is it possible? we are the node that started replication process, and we are the node that should store it? i think i do not understand smth, i thought in evacuation process replication is only called when we already failed to put it locally

type Network interface {
// IsLocalNodePublicKey checks whether given binary-encoded public key is
// assigned in the network map to a local storage node running [Replicator].
IsLocalNodePublicKey([]byte) bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be changed for all services, here we can follow whatever is done everywhere already.

@End-rey End-rey force-pushed the fix-shard-evacuation-for-ec branch 2 times, most recently from 0f35f9d to bee32e1 Compare March 6, 2026 12:12
Always use replication API instead of `PutObject` for all replication paths,
including when the object is provided directly in the task. Local/remote node
decision logic is moved from policer's `recreateECPart` into `HandleTask`, so
`recreateECPart` now simply forms the EC part object and delegates everything to
the generic `HandleTask`. `PutObjectToNode` is dropped from replicator.

Closes #3809.

Signed-off-by: Andrey Butusov <andrey@nspcc.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shard evacuation fault handler doesn't work correctly for EC parts

3 participants