@bakjos bakjos commented Feb 9, 2026

Purpose

Querying the data lake table is much slower than querying remote storage. This PR adds per-task lazy caching of the Iceberg Catalog and Table inside IcebergLakeSource, so that createRecordReader reuses a single loadTable result for all lake splits in a Flink source task. With a REST catalog, this eliminates the O(splits) REST round-trips.

Before: N splits → N × (createCatalog + loadTable) → N REST calls per task.
After: N splits → 1 × (createCatalog + loadTable) on the first split, then N-1 reuses → 1 REST loadTable call per task. With TTL enabled, the cache is refreshed once the TTL expires, so externally changed table metadata is picked up.
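
For reviewers, the caching behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `LazyTtlCache`, its constructor parameters, and the injected clock are hypothetical names, and the loader stands in for whatever performs createCatalog + loadTable. A TTL of zero or less is assumed to mean "never expire".

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not the PR's actual class): holds one lazily loaded
 * value per task and optionally reloads it after a TTL has elapsed.
 */
final class LazyTtlCache<T> {
    private final Supplier<T> loader;     // assumed to wrap createCatalog + loadTable
    private final long ttlNanos;          // <= 0 disables expiry (assumption)
    private final LongSupplier nanoClock; // injectable clock, e.g. System::nanoTime

    private T value;
    private long loadedAtNanos;
    private boolean loaded;

    LazyTtlCache(Supplier<T> loader, long ttlNanos, LongSupplier nanoClock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns the cached value, loading it on first use or after the TTL expires. */
    synchronized T get() {
        long now = nanoClock.getAsLong();
        boolean expired = loaded && ttlNanos > 0 && now - loadedAtNanos >= ttlNanos;
        if (!loaded || expired) {
            value = loader.get();  // the only place the expensive call happens
            loadedAtNanos = now;
            loaded = true;
        }
        return value;
    }
}
```

With a cache like this held per source task, each split's reader would call `get()` instead of loading the table itself, so N splits trigger one load until the TTL elapses.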

Linked issue: close #2619

Brief change log

Tests

API and Format

Documentation

Development

Successfully merging this pull request may close these issues.

Iceberg s3 table tearing bugs
