Description
Describe the problem
When Spark tasks or stages are retried (due to failures, speculation, or task resubmission), write operations may be re-executed. This can cause:
- Duplicate data files - Multiple copies of the same logical write are created with different file names
- Corrupted/partial Parquet files - When speculative tasks are killed mid-write, they leave behind incomplete files (often just headers)
- Downstream job failures - Subsequent read, ingestion, or compaction jobs fail when encountering corrupt or duplicate files
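To make the duplicate-file failure mode concrete, the sketch below mimics Hudi's base-file naming convention (fileId_writeToken_instantTime.parquet), where the write token encodes the partition, stage, and attempt IDs. The helper names here are illustrative, not Hudi's actual API; the point is that a retried or speculative attempt gets a new attempt ID and therefore writes a second physical file for the same logical file slice.

```java
// Sketch: why a retried Spark task produces a duplicate base file.
// Base-file names follow <fileId>_<writeToken>_<instantTime>.parquet, and the
// write token varies per task attempt, so each attempt writes a distinct file.
public class DuplicateFileSketch {
    // Hypothetical helper mirroring a "partitionId-stageId-attemptId" token.
    static String writeToken(int partitionId, int stageId, long attemptId) {
        return partitionId + "-" + stageId + "-" + attemptId;
    }

    static String baseFileName(String fileId, String writeToken, String instantTime) {
        return fileId + "_" + writeToken + "_" + instantTime + ".parquet";
    }

    public static void main(String[] args) {
        String fileId = "f3a1", instant = "20240101120000";
        // First attempt writes the file; the retried/speculative attempt writes again.
        String first = baseFileName(fileId, writeToken(0, 5, 0), instant);
        String retry = baseFileName(fileId, writeToken(0, 5, 1), instant);
        // Same logical file slice, two physical files: duplicate data on storage.
        assert !first.equals(retry);
        System.out.println(first);
        System.out.println(retry);
    }
}
```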
Related issues
- [SUPPORT] After enable speculation execution of spark compaction job, some broken parquet files might be generated #9615 - After enabling speculative execution, broken parquet files are generated
- Hi, about spark retry problem, still exists in 0.4.6 with consistency on #697 - Spark retry problem causing duplicate files
- [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles #1764 - Commits stay INFLIGHT due to duplicate file cleanup failures
Describe the solution
When DIRECT markers are configured:
- Store WriteStatus in completion markers - After a file is successfully written, serialize and store the WriteStatus in the completed marker file (with a checksum for integrity)
- Recover WriteStatus on retry - When a task retry attempts to create an in-progress marker, check whether a completed marker already exists. If it does:
  - Deserialize and recover the WriteStatus from the existing marker
  - Skip the write operation entirely
  - Reuse the original data file
- Support file splitting - Handle scenarios where a single executor splits records into multiple files by recovering the WriteStatus for all files with a matching fileId prefix
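The recover-on-retry flow above can be sketched as follows. This is a minimal in-memory model, not the proposed implementation: the marker store is a plain map standing in for marker files on storage, and the method names (createInProgressMarker, completeMarker) and the recoveredWriteStatuses list are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the proposed recover-on-retry flow using an in-memory marker store.
public class MarkerRecoverySketch {
    // Stand-in for completed marker files, keyed by marker name, holding the
    // serialized WriteStatus bytes.
    static final Map<String, byte[]> completedMarkers = new ConcurrentHashMap<>();

    // Returns true if the caller should proceed with the write, false if a
    // completed marker already exists and its WriteStatus was recovered.
    static boolean createInProgressMarker(String markerName, List<byte[]> recoveredWriteStatuses) {
        byte[] existing = completedMarkers.get(markerName);
        if (existing != null) {
            recoveredWriteStatuses.add(existing); // reuse the original file's status
            return false;                         // skip the write entirely
        }
        return true; // no completed marker: safe to write
    }

    static void completeMarker(String markerName, byte[] serializedWriteStatus) {
        completedMarkers.put(markerName, serializedWriteStatus);
    }

    public static void main(String[] args) {
        List<byte[]> recovered = new ArrayList<>();
        // First attempt: no completed marker yet, so the write proceeds and completes.
        assert createInProgressMarker("p1/f3a1.marker", recovered);
        completeMarker("p1/f3a1.marker", "status-bytes".getBytes());
        // Retried attempt: completed marker found, so the write is skipped.
        assert !createInProgressMarker("p1/f3a1.marker", recovered);
        assert recovered.size() == 1;
    }
}
```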
Key changes required
- HoodieWriteHandle: Add a hasRecoveredWriteStatus flag and a recoveredWriteStatuses list; modify createInProgressMarkerFile to return a boolean and recover write statuses from existing completed markers
- DirectWriteMarkers: Add methods to write serialized data with a checksum to completion markers and to read it back
- Various write handles (HoodieAppendHandle, HoodieCreateHandle, HoodieMergeHandle): Store the serialized WriteStatus in completed markers
- Table classes: Check for recovered write statuses and skip re-writing if the file is already completed
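For the DirectWriteMarkers change, the checksum protection could look like the sketch below: a simple [8-byte CRC32][payload] layout, where a failed checksum means the marker itself was corrupted or only partially written and must not be trusted for recovery. The layout and method names are assumptions for illustration, not the actual encoding.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of checksum-protected marker content: [8-byte CRC32][payload].
public class ChecksumMarkerSketch {
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return ByteBuffer.allocate(8 + payload.length)
                .putLong(crc.getValue())
                .put(payload)
                .array();
    }

    // Returns the payload, or null if the stored checksum does not match
    // (e.g. the marker itself was only partially written).
    static byte[] decode(byte[] stored) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        long expected = buf.getLong();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue() == expected ? payload : null;
    }

    public static void main(String[] args) {
        byte[] status = "serialized WriteStatus".getBytes();
        byte[] marker = encode(status);
        assert java.util.Arrays.equals(decode(marker), status);
        marker[marker.length - 1] ^= 0x1; // simulate corruption of the stored marker
        assert decode(marker) == null;    // recovery rejects the corrupted marker
    }
}
```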
Benefits
- Prevents duplicate data files during task/stage retries
- Avoids corrupted partial files from killed speculative tasks
- Improves job reliability when speculation is enabled
- No data loss since the original successfully written file is preserved
Additional context
This enhancement works with DIRECT markers only. Timeline-server-based markers do not support storing content in markers.