@gengdy1545 gengdy1545 commented Feb 11, 2026

Pixels Visibility Checkpoint Mechanism

The Checkpoint mechanism in Pixels is a critical design for handling long-running transactions (LRTs) and optimizing Garbage Collection (GC) in the Retina service. It ensures that LRTs can maintain a consistent view of the data without preventing the system from reclaiming memory occupied by old visibility bitmaps.

1. Core Objectives

  • GC Blocking Prevention: Prevent LRTs from pinning the Global Safe Timestamp, so the system can still prune old visibility versions from memory.
  • Scalability: Offload large visibility bitmaps to external storage (HDFS/S3) for long-running queries, reducing JVM heap pressure on Retina nodes.
  • Reliability: Support Retina node recovery by persisting the system state (GC Checkpoint).

2. Implementation Mechanism

The implementation is primarily located in RetinaResourceManager.java and interacts with Trino through PixelsOffloadDetector.java.

A. Lifecycle of a Long-Running Query Checkpoint

  1. Detection (PixelsOffloadDetector.java):
    A background thread in Trino monitors active transactions. If currentTime - startTime > threshold, it triggers an offload:

    // 1. RPC to Retina to create checkpoint
    this.retinaService.registerOffload(context.getTimestamp());
    // 2. Notify Daemon side TransService
    this.transService.markTransOffloaded(context.getTransId());
  2. Persistence (RetinaResourceManager.java):
    When registerOffload(timestamp) is called, Retina performs the following:

    • Parallel Capture: It iterates through all RGVisibility objects in memory.
    • Async Write: It uses a BlockingQueue and checkpointExecutor to write the bitmaps to a file in a producer-consumer pattern.
    • Filename Convention: RetinaUtils generates a unique name: offload_<hostname>_<timestamp>.bin.
    • Storage: The file is written to the path defined by pixels.retina.checkpoint.dir (typically on shared storage).
  3. Routing & Loading (RetinaServerImpl.java & VisibilityCheckpointCache.java):

    • Daemon Routing: When a Worker asks for visibility via queryVisibility, Retina checks offloadedCheckpoints. If a path exists, it returns the path instead of the bitmaps.
    • Worker Loading: The Worker's PixelsReader receives the path and delegates to VisibilityCheckpointCache, which downloads and parses the .bin file into a local Caffeine cache.
  4. Cleanup:
    When the query commits or rolls back, Trino calls unregisterOffload(timestamp). Retina decrements a refCount and deletes the physical file once the count reaches zero.
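The capture-and-write flow in step 2 can be sketched with a `BlockingQueue` drained by a single-threaded executor. This is a minimal illustration, not the Pixels implementation: `RgSnapshot`, `END`, and `offload` are hypothetical names, and the consumer collects snapshots into a list where the real code would append each bitmap to the offload file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class OffloadWriteSketch {
    // One captured row-group bitmap; a stand-in for data taken from an RGVisibility object.
    static final class RgSnapshot {
        final long fileId; final int rgId; final long[] bitmap;
        RgSnapshot(long fileId, int rgId, long[] bitmap) {
            this.fileId = fileId; this.rgId = rgId; this.bitmap = bitmap;
        }
    }

    // Poison pill signaling that the capture phase has finished.
    static final RgSnapshot END = new RgSnapshot(-1, -1, new long[0]);

    /** Producer-consumer checkpoint capture: the caller enqueues snapshots,
     *  a checkpointExecutor thread drains and "writes" them. */
    static List<RgSnapshot> offload(List<RgSnapshot> inMemory) throws Exception {
        BlockingQueue<RgSnapshot> queue = new LinkedBlockingQueue<>();
        ExecutorService checkpointExecutor = Executors.newSingleThreadExecutor();
        List<RgSnapshot> written = new ArrayList<>(); // stands in for the .bin file
        Future<?> consumer = checkpointExecutor.submit(() -> {
            try {
                for (RgSnapshot s = queue.take(); s != END; s = queue.take()) {
                    written.add(s); // real code would append s to the offload file here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        for (RgSnapshot s : inMemory) {
            queue.put(s);   // producer: iterate the in-memory bitmaps
        }
        queue.put(END);     // signal end of capture
        consumer.get();     // wait for the write to finish
        checkpointExecutor.shutdown();
        return written;
    }
}
```

Decoupling capture from writing this way keeps the bitmap iteration fast while the slower storage write proceeds on the executor thread.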

B. System State Checkpoint (GC Checkpoint)

In addition to LRT offloading, Retina periodically runs GC:

  • Trigger: runGC() runs every retina.gc.interval seconds.
  • Mechanism: Before clearing old bitmaps from memory, it calls createCheckpoint(timestamp, CheckpointType.GC).
  • Recovery: On startup, recoverCheckpoints() scans the directory, finds the latest gc_*.bin file, and populates rgVisibilityMap, effectively restoring the system state to the last GC point.
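The recovery scan can be sketched as a filename filter that picks the newest GC checkpoint. The `gc_<hostname>_<timestamp>.bin` naming here is an assumption mirroring the documented `offload_<hostname>_<timestamp>.bin` convention, and `latestGcCheckpoint` is an illustrative helper, not a Pixels API.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcRecoverySketch {
    // Assumed convention: gc_<hostname>_<timestamp>.bin, mirroring the offload_* naming.
    private static final Pattern GC_FILE = Pattern.compile("gc_.+_(\\d+)\\.bin");

    /** Returns the name of the newest gc_*.bin file among the given names,
     *  comparing by the timestamp embedded in the filename. */
    static Optional<String> latestGcCheckpoint(List<String> fileNames) {
        String best = null;
        long bestTs = Long.MIN_VALUE;
        for (String name : fileNames) {
            Matcher m = GC_FILE.matcher(name);
            if (m.matches()) {
                long ts = Long.parseLong(m.group(1));
                if (ts > bestTs) { bestTs = ts; best = name; }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Offload files in the same directory are simply skipped by the pattern, so a mixed checkpoint directory still recovers from the right file.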

3. Data Format (.bin File)

The checkpoint file is a flat binary format optimized for sequential reading:

| Field | Type | Description |
| --- | --- | --- |
| totalRgs | int | Number of Row Groups in this checkpoint |
| *Repeated block, for each Row Group:* | | |
| fileId | long | Unique identifier for the data file |
| rgId | int | Row Group index within the file |
| recordNum | int | Total number of rows in the RG |
| bitmapLen | int | Length of the long array |
| bitmap | long[] | The actual visibility bits |
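A round-trip serializer for this layout might look like the following sketch. `RgEntry` is an illustrative name, and `DataOutputStream`'s big-endian encoding is an assumption; the real format may use a different byte order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CheckpointFormatSketch {
    // One row group's checkpoint record, matching the table above.
    static final class RgEntry {
        final long fileId; final int rgId; final int recordNum; final long[] bitmap;
        RgEntry(long fileId, int rgId, int recordNum, long[] bitmap) {
            this.fileId = fileId; this.rgId = rgId;
            this.recordNum = recordNum; this.bitmap = bitmap;
        }
    }

    /** Writes totalRgs, then the repeated per-RG block, in order. */
    static byte[] serialize(List<RgEntry> rgs) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(rgs.size());                   // totalRgs
        for (RgEntry rg : rgs) {
            out.writeLong(rg.fileId);
            out.writeInt(rg.rgId);
            out.writeInt(rg.recordNum);
            out.writeInt(rg.bitmap.length);         // bitmapLen
            for (long word : rg.bitmap) out.writeLong(word);
        }
        return bos.toByteArray();
    }

    /** Sequentially reads the flat layout back into RgEntry records. */
    static List<RgEntry> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int totalRgs = in.readInt();
        List<RgEntry> rgs = new ArrayList<>(totalRgs);
        for (int i = 0; i < totalRgs; i++) {
            long fileId = in.readLong();
            int rgId = in.readInt();
            int recordNum = in.readInt();
            long[] bitmap = new long[in.readInt()];
            for (int j = 0; j < bitmap.length; j++) bitmap[j] = in.readLong();
            rgs.add(new RgEntry(fileId, rgId, recordNum, bitmap));
        }
        return rgs;
    }
}
```

Because the layout is flat and length-prefixed, a reader can consume it in a single sequential pass with no seeking.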

4. Key Components Summary

| Class | Function |
| --- | --- |
| PixelsOffloadDetector | Trino-side monitor that identifies and triggers offloads for LRTs. |
| RetinaResourceManager | The "brain": manages in-memory bitmaps, triggers async writes, and handles ref-counting. |
| RGVisibility | Stores the actual versioned bitmaps and performs the low-level GC/deletion. |
| VisibilityCheckpointCache | Worker-side cache that prevents redundant IO for the same checkpoint file. |
| RetinaUtils | Utility for path/filename generation, ensuring no conflicts in multi-node setups. |
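The ref-counting cleanup from step 4 of the lifecycle can be sketched as follows. This is a minimal illustration of the counting logic only; the class name and the `deletedFiles` counter (standing in for deleting the physical checkpoint file) are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OffloadRefCountSketch {
    // timestamp -> number of live queries still using that checkpoint
    private final ConcurrentMap<Long, AtomicInteger> refCounts = new ConcurrentHashMap<>();
    int deletedFiles = 0; // stands in for deleting the physical .bin file

    /** Called when a query's checkpoint is created or reused. */
    void registerOffload(long timestamp) {
        refCounts.computeIfAbsent(timestamp, t -> new AtomicInteger()).incrementAndGet();
    }

    /** Called on commit/rollback; deletes the file once no query references it. */
    void unregisterOffload(long timestamp) {
        AtomicInteger count = refCounts.get(timestamp);
        if (count != null && count.decrementAndGet() == 0) {
            refCounts.remove(timestamp);
            deletedFiles++; // real code would delete the checkpoint file here
        }
    }
}
```

Ref-counting lets multiple concurrent LRTs share one checkpoint at the same timestamp without the file being deleted out from under any of them.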

5. Distributed Consistency

  • Multi-Retina: Each node writes its own managed RGs to a file tagged with its hostname. A Worker query for a specific file is routed to the Retina node that manages it, which returns its node-specific checkpoint path.
  • Multi-Worker: Workers are stateless. They obtain either in-memory data or a checkpoint path from the Retina nodes. Shared storage ensures all Workers can see the checkpoint files.

@gengdy1545 gengdy1545 requested a review from bianhq February 11, 2026 04:49
@gengdy1545 gengdy1545 self-assigned this Feb 11, 2026
@gengdy1545 gengdy1545 added the enhancement New feature or request label Feb 11, 2026
@gengdy1545 gengdy1545 added this to the Real-time CRUD milestone Feb 11, 2026

Development

Successfully merging this pull request may close these issues.

[pixels-daemon, common, core, retina] visibility checkpoint needs to have a read cache