For some use-cases it might be preferable to have partitions split into multiple ordered blocks.
The most important use-case is archiving (or even deletion) of old data. Right now, this would be achieved by creating a new partition (=stream), writing a "consolidated event" (or snapshot) to it, and then continuing to work on that new stream.
Another use-case is scanning a partition for the latest document without an index, e.g. during auto-repair (see #107). If the partition is stored in a single file, the whole partition needs to be scanned because the file format only supports forward reading. If the partition is chunked though, only the last chunk needs to be scanned, which is likely orders of magnitude faster.
It would also potentially help in replication logic, since only a small file would need "hot" replication.
A simple solution would be to append the starting partition offset to the filename (e.g. partition-foo.0, partition-foo.4096, etc.) and start a new file whenever either a configured number of documents or a configured chunk size is reached. Reads can then easily find the correct file by searching for the chunk where filename-offset < position < filename-offset + filesize (- header); the list of chunks could also be pre-scanned and indexed when the partition is opened.
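A minimal sketch of that lookup, assuming Node's fs API; the names (ChunkInfo, loadChunkIndex, findChunkForPosition) are hypothetical and only illustrate the idea, not the actual implementation:

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical description of one chunk file, derived from its filename.
interface ChunkInfo {
  fileName: string;    // e.g. "partition-foo.4096"
  startOffset: number; // offset encoded in the filename
  size: number;        // payload size on disk (header excluded)
}

// Scan the data directory once when the partition is opened and build an
// ordered chunk index (the "pre-scanned and indexed" idea from above).
function loadChunkIndex(dataDir: string, partition: string, headerSize: number): ChunkInfo[] {
  return fs.readdirSync(dataDir)
    .filter(name => name.startsWith(partition + "."))
    .map(name => ({
      fileName: name,
      startOffset: Number(name.slice(partition.length + 1)),
      size: fs.statSync(path.join(dataDir, name)).size - headerSize,
    }))
    .sort((a, b) => a.startOffset - b.startOffset);
}

// Find the chunk that contains a given partition position, i.e.
// startOffset <= position < startOffset + size.
function findChunkForPosition(chunks: ChunkInfo[], position: number): ChunkInfo | undefined {
  return chunks.find(c => position >= c.startOffset && position < c.startOffset + c.size);
}
```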
The drawback is that the file handle would need to be switched multiple times, so there is more file open/close overhead involved, especially when scanning the whole partition (=projection rebuild).
This could be made fully backwards-compatible (b/c) if a missing filename offset is interpreted as 0. Old chunks could then potentially even be merged together to reduce the number of files per partition again, though an ever-growing partition (stream) is a bad pattern anyway.
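For the backwards-compatibility part, one possible interpretation (the parseChunkOffset helper is hypothetical, not from the codebase):

```typescript
// Extract the starting offset from a chunk filename. A legacy file without a
// numeric suffix (e.g. "partition-foo") is treated as a single chunk starting
// at offset 0, which keeps existing partitions readable without migration.
function parseChunkOffset(fileName: string, partition: string): number {
  const suffix = fileName.slice(partition.length + 1); // "" for legacy files
  const offset = Number.parseInt(suffix, 10);
  return Number.isNaN(offset) ? 0 : offset;
}

// parseChunkOffset("partition-foo", "partition-foo")      -> 0 (legacy, no suffix)
// parseChunkOffset("partition-foo.4096", "partition-foo") -> 4096
```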
Chunking should also be easy to turn off via configuration:
```json
{
    "chunkSize": 4096,
    "chunkDocuments": 100
}
```
=> start a new chunk every 4 KB or every 100 documents, whichever happens first.
```json
{
    "chunkSize": 0,
    "chunkDocuments": 0
}
```
=> disable chunking (=current behaviour)