Spark: preload delete files to avoid deadlocks#15712

Open
kinolaev wants to merge 1 commit into apache:main from kinolaev:preload-delete-files
Conversation

@kinolaev
In Spark, data file loading holds a connection while lazy delete file loading tries to acquire another one. When the number of simultaneous connections is limited (for example, by http-client.apache.max-connections), this leads to a deadlock as soon as all connections are held by data file loading. To avoid the deadlock, this PR adds delete file preloading to the SparkDeleteFilter constructor.
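The contention described above can be modeled with a one-permit semaphore standing in for the capped HTTP connection pool. This is a hypothetical sketch, not Iceberg code: the class and method names are invented, and the semaphore merely mimics the pool behavior of http-client.apache.max-connections=1.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ConnectionPoolDeadlock {
    // Model the capped HTTP connection pool
    // (http-client.apache.max-connections=1) as a one-permit semaphore.
    static final Semaphore pool = new Semaphore(1);

    // Returns whether a lazy delete file load could get a connection
    // while the data file read still holds the pool's only connection.
    static boolean deleteFileLoadSucceeds() throws InterruptedException {
        pool.acquire(); // data file read: holds a connection while streaming rows
        try {
            // Lazy delete file load: needs a second connection, but the pool
            // is exhausted, so the wait times out. In the real scenario this
            // surfaces as a ConnectionPoolTimeoutException.
            return pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        } finally {
            pool.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("delete file got a connection: " + deleteFileLoadSucceeds());
        // prints: delete file got a connection: false
    }
}
```

Preloading the delete files before the data file connection is acquired reorders the two acquisitions so they never overlap, which is why the constructor is the natural place for the eager load.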

The problem can be reproduced using spark-sql with S3FileIO and spark.sql.catalog.iceberg.http-client.apache.max-connections=1:

create table sparkdeletefilter(id bigint)
  tblproperties('write.delete.mode'='merge-on-read');
-- create a data file
insert into sparkdeletefilter select id from range(2);
-- create a delete file
delete from sparkdeletefilter where id in (select id from range(0, 2, 2));
-- Reader opens the data file first, keeps it open and
-- fails to load the delete file with ConnectionPoolTimeoutException
select count(id) from sparkdeletefilter;
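The shape of the fix can be sketched as replacing a lazily invoked loader with an eager load in the constructor. This is a simplified illustration only: SparkDeleteFilter's real fields, types, and load path differ, and DeleteFilterSketch and its loader parameter are invented names.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the change; it only illustrates moving the
// delete file load from lazy access into the constructor.
class DeleteFilterSketch {
    // Before: a Supplier invoked mid-read, while a data file
    // connection was already held. After (this PR): loaded eagerly.
    private final List<String> deletes;

    DeleteFilterSketch(Supplier<List<String>> loader) {
        // Load delete files while no data file connection is held yet,
        // so a capped connection pool cannot be exhausted mid-scan.
        this.deletes = loader.get();
    }

    List<String> deletes() {
        return deletes;
    }
}
```

Usage: constructing the filter now performs the load up front, e.g. `new DeleteFilterSketch(() -> List.of("pos-deletes.parquet"))` fetches the delete file list before any data file stream is opened.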

Signed-off-by: Sergei Nikolaev <kinolaev@gmail.com>
