Spark: preload delete files to avoid deadlocks#15712

Open
kinolaev wants to merge 1 commit into apache:main from kinolaev:preload-delete-files
Conversation

@kinolaev
In Spark, data file loading holds a connection while lazy delete file loading tries to acquire another one. When the number of simultaneous connections is limited (for example, by http-client.apache.max-connections), this leads to a deadlock as soon as all connections are held by data file loading. To avoid the deadlock, this PR adds delete file preloading to the SparkDeleteFilter constructor.
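The contention described above can be modeled with a one-permit semaphore standing in for the capped HTTP connection pool. This is a hypothetical sketch, not Iceberg code: the class and method names are invented, and the semaphore merely mimics the pool behavior of http-client.apache.max-connections=1.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ConnectionPoolDeadlock {
    // Model the capped HTTP connection pool
    // (http-client.apache.max-connections=1) as a one-permit semaphore.
    static final Semaphore pool = new Semaphore(1);

    // Returns whether a lazy delete file load could get a connection
    // while the data file read still holds the pool's only connection.
    static boolean deleteFileLoadSucceeds() throws InterruptedException {
        pool.acquire(); // data file read: holds a connection while streaming rows
        try {
            // Lazy delete file load: needs a second connection, but the pool
            // is exhausted, so the wait times out. In the real scenario this
            // surfaces as a ConnectionPoolTimeoutException.
            return pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        } finally {
            pool.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("delete file got a connection: " + deleteFileLoadSucceeds());
        // prints: delete file got a connection: false
    }
}
```

Preloading the delete files before the data file connection is acquired reorders the two acquisitions so they never overlap, which is why the constructor is the natural place for the eager load.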

The problem can be reproduced using spark-sql with S3FileIO and spark.sql.catalog.iceberg.http-client.apache.max-connections=1:

create table sparkdeletefilter(id bigint)
  tblproperties('write.delete.mode'='merge-on-read');
-- create a data file
insert into sparkdeletefilter select id from range(2);
-- create a delete file
delete from sparkdeletefilter where id in (select id from range(0, 2, 2));
-- Reader opens the data file first, keeps it open and
-- fails to load the delete file with ConnectionPoolTimeoutException
select count(id) from sparkdeletefilter;
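The shape of the fix can be sketched as replacing a lazily invoked loader with an eager load in the constructor. This is a simplified illustration only: SparkDeleteFilter's real fields, types, and load path differ, and DeleteFilterSketch and its loader parameter are invented names.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the change; it only illustrates moving the
// delete file load from lazy access into the constructor.
class DeleteFilterSketch {
    // Before: a Supplier invoked mid-read, while a data file
    // connection was already held. After (this PR): loaded eagerly.
    private final List<String> deletes;

    DeleteFilterSketch(Supplier<List<String>> loader) {
        // Load delete files while no data file connection is held yet,
        // so a capped connection pool cannot be exhausted mid-scan.
        this.deletes = loader.get();
    }

    List<String> deletes() {
        return deletes;
    }
}
```

Usage: constructing the filter now performs the load up front, e.g. `new DeleteFilterSketch(() -> List.of("pos-deletes.parquet"))` fetches the delete file list before any data file stream is opened.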

Signed-off-by: Sergei Nikolaev <kinolaev@gmail.com>
