NIFI-15682: Add Bulk Replay feature for provenance events #11016

Open
Scrooge-McDucks wants to merge 1 commit into apache:main from Scrooge-McDucks:NIFI-15682
Scrooge-McDucks (Contributor) commented Mar 18, 2026

Summary

This PR adds Bulk Replay support for provenance events.

NiFi already allows replaying individual provenance events, but recovering from a broader issue can still be slow and repetitive. When a processor is misconfigured, a downstream dependency fails, or a large set of FlowFiles is processed incorrectly, users may need to replay many related events rather than one at a time. This change introduces a bulk replay workflow so users can search for provenance events for a processor, select the events they want, and submit them as a server-side replay job.

This is also my first frontend feature in NiFi, so I would especially welcome feedback on the UI approach and overall user experience.

Motivation

Replay is an important recovery mechanism in NiFi, but the current experience is centered on single-event replay. That works well for targeted recovery, but it becomes inefficient when operators need to recover from larger operational issues affecting many FlowFiles.

This feature is intended to make that recovery workflow more practical by:

  • allowing users to search and select many replayable provenance events for a processor
  • executing replay as a managed server-side job instead of a one-off UI action
  • giving users visibility into progress, failures, and job outcomes
  • handling clustered execution more clearly, including disconnected-node scenarios

What changed

  • Added a Bulk Replay action to the processor context menu
  • Added a processor-scoped provenance search and selection dialog for replay
  • Added server-side bulk replay job execution and tracking
  • Added a Bulk Replay Status page in the global menu
  • Added a job details view showing per-item replay status and failure reasons
  • Added support for clearing successful jobs, finished jobs, or all jobs from the status list
  • Added cluster-aware handling for disconnected nodes during replay
  • Added configuration properties for bulk replay behavior and limits
  • Added backend and frontend test coverage for the new functionality
  • Added user and administration documentation for the feature

User experience

Users can right-click a processor and select Bulk Replay to open a replay search dialog scoped to that processor. From there they can:

  • search provenance events for the processor
  • filter results by date range and other fields
  • select one or more events to replay
  • optionally provide a job name
  • submit the selection as a bulk replay job

Submitted jobs are visible from the Bulk Replay Status page, where users can:

  • see job-level status and counts
  • open a details view for a specific job
  • monitor replay progress
  • inspect failures and error messages
  • cancel active jobs
  • clear old jobs from the list
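To illustrate how job-level status and counts could be derived from per-item results, here is a minimal sketch. The class name `BulkReplayJobSummary`, the enum values, and the roll-up rules are assumptions made for illustration; they are not taken from the PR's code.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical roll-up of per-item replay results into a job-level view. */
final class BulkReplayJobSummary {

    // Illustrative status names; the PR may use different ones.
    enum ItemStatus { QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED }
    enum JobStatus { QUEUED, RUNNING, COMPLETED, PARTIAL_SUCCESS, FAILED, CANCELLED }

    /** Counts items per status, including zero entries for unused statuses. */
    static Map<ItemStatus, Integer> countByStatus(List<ItemStatus> items) {
        Map<ItemStatus, Integer> counts = new EnumMap<>(ItemStatus.class);
        for (ItemStatus status : ItemStatus.values()) {
            counts.put(status, 0);
        }
        for (ItemStatus status : items) {
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    /** Derives a single job status from the item counts (assumed rules). */
    static JobStatus deriveJobStatus(List<ItemStatus> items) {
        Map<ItemStatus, Integer> counts = countByStatus(items);
        int queued = counts.get(ItemStatus.QUEUED);
        int running = counts.get(ItemStatus.RUNNING);

        // Nothing has started yet.
        if (!items.isEmpty() && queued == items.size()) {
            return JobStatus.QUEUED;
        }
        // Some work is in flight or still pending.
        if (running > 0 || queued > 0) {
            return JobStatus.RUNNING;
        }
        // All items are in a terminal state from here on.
        if (counts.get(ItemStatus.CANCELLED) > 0) {
            return JobStatus.CANCELLED;
        }
        if (counts.get(ItemStatus.FAILED) == 0) {
            return JobStatus.COMPLETED;
        }
        return counts.get(ItemStatus.SUCCEEDED) == 0
                ? JobStatus.FAILED
                : JobStatus.PARTIAL_SUCCESS;
    }
}
```

Deriving the job status from item counts, rather than storing it separately, would keep the status list and the details view consistent with each other.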

Permissions

Bulk Replay is gated by provenance access permissions. A user must have permission to query and view provenance events for the target component before they can discover eligible events and submit them for replay. This keeps bulk replay aligned with NiFi’s existing provenance security model.

Cluster behavior

Bulk replay jobs execute on the primary node.

If a replay item depends on content located on a disconnected node, the worker waits up to the configured timeout for that node to reconnect before marking the item as failed. This gives the cluster a chance to recover before replay is abandoned for affected items.
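The wait-then-fail behavior described above can be sketched roughly as follows. This is an illustrative helper only: `DisconnectedNodeWait`, `awaitNode`, the `Outcome` values, and the polling approach are assumptions, and the actual worker in this PR may detect reconnection differently.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper showing a bounded wait for a node to reconnect. */
final class DisconnectedNodeWait {

    enum Outcome { READY, FAILED_TIMEOUT, INTERRUPTED }

    /**
     * Polls nodeConnected until it reports true or the timeout elapses.
     * A caller would replay the item on READY and mark it failed otherwise.
     */
    static Outcome awaitNode(BooleanSupplier nodeConnected, Duration timeout, Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (nodeConnected.getAsBoolean()) {
                return Outcome.READY;
            }
            final long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Outcome.FAILED_TIMEOUT;
            }
            try {
                // Sleep for the poll interval, but never past the deadline.
                long sleepMillis = Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L);
                Thread.sleep(Math.max(1L, sleepMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag so job cancellation can be observed upstream.
                Thread.currentThread().interrupt();
                return Outcome.INTERRUPTED;
            }
        }
    }
}
```

A call might look like `DisconnectedNodeWait.awaitNode(() -> isConnected(nodeId), Duration.ofMinutes(5), Duration.ofSeconds(1))`, where `isConnected` and `nodeId` are placeholders for whatever cluster-state lookup the implementation uses.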

If the primary node is lost during execution, the job can be resumed by the newly elected primary node.

Configuration

This PR adds configuration for bulk replay limits and behavior, including:

  • maximum concurrent bulk replay jobs
  • maximum retained jobs
  • disconnected-node wait timeout

Design notes

This implementation keeps the replay workflow close to the processor and provenance experience users already know, while moving execution into a managed server-side job model. That provides better visibility, better operational control, and a more scalable workflow for replaying many events.

Bulk Replay jobs are currently stored in memory. This keeps the initial implementation simpler, but it also means job state and job history are lost after a full restart. A future enhancement could add a persistent job store so replay jobs and history survive restart and provide stronger operational durability.
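As a rough illustration of what an in-memory store with a retention limit implies, the sketch below keeps jobs in a heap-only map and evicts the oldest finished jobs once the limit is exceeded. `InMemoryJobStore`, its `State` values, and the eviction policy are assumptions for illustration, not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical heap-only job store; its contents vanish when the JVM exits. */
final class InMemoryJobStore {

    enum State { ACTIVE, SUCCEEDED, FAILED }

    private final int maxRetained;
    // LinkedHashMap iterates in insertion order, so the oldest job comes first.
    private final Map<String, State> jobs = new LinkedHashMap<>();

    InMemoryJobStore(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be at least 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Adds or updates a job, then trims finished jobs beyond the limit. */
    synchronized void put(String jobId, State state) {
        jobs.put(jobId, state);
        Iterator<Map.Entry<String, State>> it = jobs.entrySet().iterator();
        while (jobs.size() > maxRetained && it.hasNext()) {
            // Active jobs are never evicted, only finished ones.
            if (it.next().getValue() != State.ACTIVE) {
                it.remove();
            }
        }
    }

    /** Removes only jobs that finished successfully. */
    synchronized void clearSuccessful() {
        jobs.values().removeIf(state -> state == State.SUCCEEDED);
    }

    /** Removes every job that is no longer active. */
    synchronized void clearFinished() {
        jobs.values().removeIf(state -> state != State.ACTIVE);
    }

    /** Returns job ids from oldest to newest. */
    synchronized List<String> jobIds() {
        return new ArrayList<>(jobs.keySet());
    }
}
```

Because nothing here is written to disk, a full restart starts with an empty map, which matches the limitation noted above. A persistent store would need to replace the map with durable storage.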

Testing

Testing performed includes:

  • backend unit tests for execution, resource handling, job lifecycle, and configuration
  • frontend test coverage for status, dialog, reducer, and interaction flows
  • validation of the end-to-end replay flow from processor context menu through job submission
  • validation of status updates, details view, cancellation, and clear-job behavior
  • validation of disconnected-node handling and timeout-driven failure behavior

Feedback welcome

This is my first frontend feature in NiFi, so I would particularly welcome feedback on the UI flow, terminology, and any areas where the user experience could be improved.

Screenshots

Processor context menu

[Screenshot: context-menu]

Bulk Replay search and selection dialog

[Screenshots: bulk-replay-search-window, bulk-replay-search-window-node-disconnected]

Bulk Replay status page

[Screenshot: status]

Bulk Replay job details

[Screenshots: details-part-small, details-part-success]

Bulk Replay job details while waiting for disconnected node

[Screenshot: Waiting For Node]

Clear replay jobs dialog

[Screenshot: clear-jobs]

Summary

NIFI-15682

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files
