[FLINK-39308][runtime] Skip empty file-merging operator state snapshots#27814

Open
infocusmodereal wants to merge 1 commit into apache:master from infocusmodereal:fix/FLINK-39308/skip-empty-operator-state

@infocusmodereal

What is the purpose of the change

This pull request avoids materializing file-merging operator state handles when operator list state is registered but empty.

Today DefaultOperatorStateBackendSnapshotStrategy only takes the empty fast path when there are no registered operator states at all. If empty operator list states are registered, file-merging checkpoints can still create tiny segment-backed handles. During restore, OperatorStateRestoreOperation opens those handles even though they contain no operator-state partitions. On object stores this adds avoidable restore overhead.

Brief change log

  • Detect the case where there are no broadcast states and all registered operator list states are empty
  • Reuse the existing empty snapshot fast path for that case
  • Return EmptyFileMergingOperatorStreamStateHandle for file-merging checkpoints and SnapshotResult.empty() otherwise
  • Add tests for empty registered operator state snapshots and file-merging restore
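
The detection and fast-path selection listed above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the class `EmptyOperatorStateSketch`, the enum `SnapshotOutcome`, and both method names are hypothetical, and plain Java collections stand in for Flink's registered list and broadcast states. It also assumes that any registered broadcast state, even an empty one, disables the fast path, which is one reading of "no broadcast states".

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class EmptyOperatorStateSketch {

    /** Hypothetical stand-ins for the possible snapshot outcomes. */
    enum SnapshotOutcome {
        EMPTY_FILE_MERGING_HANDLE, // stands in for EmptyFileMergingOperatorStreamStateHandle
        EMPTY_RESULT,              // stands in for SnapshotResult.empty()
        MATERIALIZED               // a real state handle has to be written
    }

    /**
     * Returns true when no broadcast state is registered and every
     * registered list state has zero elements. This also covers the
     * case where nothing is registered at all.
     */
    static boolean isEffectivelyEmpty(
            Map<String, ? extends Collection<?>> listStates,
            Map<String, ? extends Map<?, ?>> broadcastStates) {
        if (!broadcastStates.isEmpty()) {
            return false;
        }
        for (Collection<?> state : listStates.values()) {
            if (!state.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    /** Picks the outcome for a snapshot, taking the empty fast path when possible. */
    static SnapshotOutcome chooseOutcome(
            Map<String, ? extends Collection<?>> listStates,
            Map<String, ? extends Map<?, ?>> broadcastStates,
            boolean fileMergingEnabled) {
        if (!isEffectivelyEmpty(listStates, broadcastStates)) {
            return SnapshotOutcome.MATERIALIZED;
        }
        return fileMergingEnabled
                ? SnapshotOutcome.EMPTY_FILE_MERGING_HANDLE
                : SnapshotOutcome.EMPTY_RESULT;
    }

    public static void main(String[] args) {
        // A registered but empty list state takes the fast path.
        System.out.println(chooseOutcome(Map.of("offsets", List.<Long>of()), Map.of(), true));
        // A non-empty list state still has to be materialized.
        System.out.println(chooseOutcome(Map.of("offsets", List.of(1L, 2L)), Map.of(), true));
    }
}
```

In this sketch the check is a linear scan over the registered states, so it costs little compared with writing and later reopening a segment-backed handle.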

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change added tests and can be verified as follows:

  • Added testSnapshotWithEmptyRegisteredOperatorState
  • Added testFileMergingSnapshotRestoreWithEmptyRegisteredUnionState
  • Ran JAVA_HOME=$(/usr/libexec/java_home -v 17) ./mvnw -pl flink-runtime -Dtest=OperatorStateBackendTest,OperatorStateRestoreOperationTest,SharedStateRegistryTest test -Djdk17 -Pjava17-target

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

flinkbot commented Mar 23, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build
