When a buffer gets committed early during buffer selection in `io_ring_buffer_select` (for non-pollable files), the amount of committed data is based on the requested length, but the length reported to userspace is the actual length read.
Steps to reproduce:
- Have a small file (e.g. 12 bytes: `Hello World\n`)
- Issue multiple reads at offset 0 with a requested length of 16 bytes
- The CQEs contain `res=12` and `flags=BUFFER | BUF_MORE`, but the next data will be placed at offset 16
This is only a problem if an explicit length is requested (length in request != 0). Otherwise the remaining length of the buffer will be requested and the buffer is fully committed anyways.
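The offset drift from the steps above can be modelled with plain arithmetic. The following is only a sketch of the bookkeeping, not kernel or liburing code: `simulate_reads` and its parameters are hypothetical names, and it assumes every read hits offset 0 of the small file and therefore returns a short read.

```python
def simulate_reads(file_len, requested_len, num_reads):
    """Model the mismatch: return (kernel_offset, userspace_offset, res) per read."""
    kernel_head = 0  # where the kernel places the next data in the buffer
    user_head = 0    # where userspace expects the next data, derived from res
    trace = []
    for _ in range(num_reads):
        # Reading at file offset 0 of a small file yields a short read.
        res = min(file_len, requested_len)
        trace.append((kernel_head, user_head, res))
        kernel_head += requested_len  # early commit uses the requested length
        user_head += res              # userspace advances by the reported res
    return trace


for kernel_off, user_off, res in simulate_reads(12, 16, 3):
    print(f"res={res}: data at offset {kernel_off}, userspace expects {user_off}")
```

With a 12-byte file and a requested length of 16, the two offsets agree only for the first read and diverge by 4 bytes per read afterwards.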
Solutions:
- Introduce a new CQE flag indicating that the committed length of the buffer was the requested length and not the actual length of the operation. (This does not work, because of the second bug)
- Deny explicitly requested lengths for incremental buffers and force non-pollable files to always commit all of the remaining buffer (length in request == 0).
- Allow explicitly requested lengths but commit the full remaining buffer anyways.
- Find a way to tell the userspace the offset (or address) within the buffer
The first solution might still be problematic if the following can happen:
Is it possible that two non-pollable operations each commit a chunk of the same buffer (committed during buffer selection) but complete in a different order (e.g. because the first disk I/O was slower)?
Then the userspace would see the CQE for the second chunk first and calculate the wrong offset in the buffer.
This would be a problem in itself even without the actual length being shorter.
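The ordering problem can be illustrated the same way. This is a minimal sketch under stated assumptions, not a reproducer: `userspace_view` is a hypothetical helper, chunks are assumed to be committed in submission order, and userspace is assumed to derive offsets purely from the order in which CQEs arrive.

```python
def userspace_view(committed_offsets, completion_order, chunk_len):
    """Return (assumed_offset, actual_offset) for each CQE in arrival order."""
    assumed = 0
    view = []
    for req in completion_order:
        # Userspace assumes the next CQE describes the next chunk in the buffer.
        view.append((assumed, committed_offsets[req]))
        assumed += chunk_len
    return view


# Two 16-byte reads: request 0 got offset 0, request 1 got offset 16.
committed = {0: 0, 1: 16}
# Request 1 completes first, e.g. because the first disk I/O was slower.
print(userspace_view(committed, [1, 0], 16))
```

Every pair whose two values differ is a CQE for which userspace would look at the wrong part of the buffer, even though both reads returned the full requested length.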
Edit: The out-of-order completion of read operations can actually happen, see reproducers below.