ext4: refactor partial block zero-out for iomap conversion#953
Closed
vfsci-bot[bot] wants to merge 13 commits intovfs.base.cifrom
Closed
ext4: refactor partial block zero-out for iomap conversion#953vfsci-bot[bot] wants to merge 13 commits intovfs.base.cifrom
vfsci-bot[bot] wants to merge 13 commits intovfs.base.cifrom
Conversation
Add a bool *did_zero output parameter to ext4_block_zero_page_range() and __ext4_block_zero_page_range(). The parameter reports whether a partial block was zeroed out, which is needed for the upcoming iomap buffered I/O conversion. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Rename ext4_block_truncate_page() to ext4_block_zero_eof() and extend its signature to accept an explicit 'end' offset instead of calculating the block boundary. This helper function now can replace all cases requiring zeroing of the partial EOF block, including the append buffered write paths in ext4_*_write_end(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Refactor __ext4_block_zero_page_range() by separating the block zeroing
operations for ordered data mode and journal data mode into two distinct
functions:
- ext4_block_do_zero_range(): handles non-journal data mode with
ordered data support
- ext4_block_journalled_zero_range(): handles journal data mode
Also extract a common helper, ext4_load_tail_bh(), to handle buffer head
and folio retrieval, along with the associated error handling. This
prepares for converting the partial block zero range to the iomap
infrastructure.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Rename ext4_block_zero_page_range() to ext4_block_zero_range() since the "page" naming is no longer appropriate for current context. Also change its signature to take an inode pointer instead of an address_space. This aligns with the caller ext4_block_zero_eof() and ext4_zero_partial_blocks(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Remove the handle parameter from ext4_block_do_zero_range() and move the ordered data handling to ext4_block_zero_eof(). This is necessary for truncate up and append writes across a range extending beyond EOF. The ordered data must be committed before updating i_disksize to prevent exposing stale on-disk data from concurrent post-EOF mmap writes during previous folio writeback or in case of system crash during append writes. This is unnecessary for partial block hole punching because the entire punch operation does not provide atomicity guarantees and can already expose intermediate results in case of crash. Hole punching can only ever expose data that was there before the punch but missed zeroing during append / truncate could expose data that was not visible in the file before the operation. Since ordered data handling is no longer performed inside ext4_zero_partial_blocks(), ext4_punch_hole() no longer needs to attach jinode. This is prepared for the conversion to the iomap infrastructure, which does not use ordered data mode while zeroing post-EOF partial blocks. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Only journal data mode requires an active journal handle when zeroing partial blocks. Stop passing handle_t *handle to ext4_zero_partial_blocks() and related functions, and make ext4_block_journalled_zero_range() start a handle independently. This change has no practical impact now because all callers invoke these functions within the context of an active handle. It prepares for moving ext4_block_zero_eof() out of an active handle in the next patch, which is a prerequisite for converting block zero range operations to iomap infrastructure. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Change ext4_alloc_file_blocks() to accept offset and len in byte granularity instead of block granularity. This allows callers to pass byte offsets and lengths directly, and this prepares for moving the ext4_zero_partial_blocks() call from the while(len) loop for unaligned append writes, where it only needs to be invoked once before doing block allocation. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Move ext4_block_zero_eof() and ext4_zero_partial_blocks() calls out of the active handle context, making them independent operations, and also add return value checks. This is safe because it still ensures data is updated before metadata for data=ordered mode and data=journal mode because we still zero data and ordering data before modifying the metadata. This change is required for iomap infrastructure conversion because the iomap buffered I/O path does not use the same journal infrastructure for partial block zeroing. The lock ordering of folio lock and starting transactions is "folio lock -> transaction start", which is opposite of the current path. Therefore, zeroing partial blocks cannot be performed under the active handle. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
In ext4_zero_range() and ext4_punch_hole(), when operating in SYNC mode and zeroing a partial block, only data=journal modes guarantee that the zeroed data is synchronously persisted after the operation completes. For data=ordered/writeback mode and non-journal modes, this guarantee is missing. Introduce a partial_zero parameter to explicitly trigger writeback for all scenarios where a partial block is zeroed, ensuring the zeroed data is durably persisted. Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
In the ext4 fallocate call chain, SYNC mode handling is inconsistent: some places check the inode state, while others check the open file descriptor state. Unify these checks by evaluating both conditions to ensure consistent behavior across all fallocate operations. Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
The ctime and mtime update is already handled by file_modified() in ext4_fallocate(), the caller of ext4_alloc_file_blocks(). So remove the redundant calls to inode_set_ctime_current() and inode_set_mtime_to_ts() in ext4_alloc_file_blocks(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
In ext4_alloc_file_blocks(), pagecache_isize_extended() is called under an active handle and may also hold folio lock if the block size is smaller than the folio size. This also breaks the "folio lock -> transaction start" lock ordering for the upcoming iomap buffered I/O path. Therefore, move pagecache_isize_extended() outside of an active handle. Additionally, it is unnecessary to update the file length during each iteration of the allocation loop. Instead, update the file length only to the position where the allocation is successful. Postpone updating the inode size until after the allocation loop completes or is interrupted due to an error. Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
In cases of appending write beyond EOF, ext4_zero_partial_blocks() is called within ext4_*_write_end() to zero out the partial block beyond EOF. This prevents exposing stale data that might be written through mmap. However, supporting only the regular buffered write path is insufficient. It is also necessary to support the DAX path as well as the upcoming iomap buffered write path. Therefore, move this operation to ext4_write_checks(). In addition, this may introduce a race window in which a post-EOF buffered write can race with an mmap write after the old EOF block has been zeroed. As a result, the data in this block written by the buffer-write and the data written by the mmap-write may be mixed. However, this is safe because users should not rely on the result of the race condition. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Author
|
This PR is older than 14 days. Closing automatically. If the series is still relevant, a new version will create a new PR. Automated by ml2pr |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Series: https://patchwork.kernel.org/project/linux-fsdevel/list/?series=1073490
Submitter: Zhang Yi
Version: 4
Patches: 13/13
Message-ID:
<20260327102939.1095257-1-yi.zhang@huaweicloud.com>Base: vfs.base.ci
Lore: https://lore.kernel.org/linux-fsdevel/20260327102939.1095257-1-yi.zhang@huaweicloud.com
Automated by ml2pr