Skip to content

Refactor Run modules to use modern record extensions#830

Open
mwihoti wants to merge 4 commits intoIntersectMBO:mainfrom
mwihoti:feature/811-modern-record-extensions
Open

Refactor Run modules to use modern record extensions#830
mwihoti wants to merge 4 commits intoIntersectMBO:mainfrom
mwihoti:feature/811-modern-record-extensions

Conversation

@mwihoti
Copy link
Copy Markdown
Contributor

@mwihoti mwihoti commented Mar 25, 2026

Description

Following the successful pilot with Internal.Arena, this PR refactors the Run module and several related components to use the modern record language extensions.

changes

  • Record Modernization: Refactored the Run record fields to use shorter, prefix-free names (e.g., runIndex - index, runFilter - bloomFilter), which significantly improves the developer experience with dot notation.
  • Test Suite Cleanup: Extensively updated the StateMachine and Lookup test suites to use dot notation instead of traditional field accessors.

cabal build lsm-tree:lib:core -- Builds successfully
cabal test lsm-tree-test --pattern "Run" -- 129/129 tests pass
cabal test all -- Verified for compilation and linking

link
to the issue.

Checklist

  • Read our contribution guidelines at CONTRIBUTING.md, and make sure that this PR complies with the guidelines.

@mwihoti
Copy link
Copy Markdown
Contributor Author

mwihoti commented Mar 25, 2026

Hi @jorisdral , I just pushed an update for Run module with the new record language extensions, I'm moving next to Index

Copy link
Copy Markdown
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the contribution, I like the new field names for the Run and RunReader module 😄.

Two high-level comments:

  • In modules where we refactor field names (Run/RunReader), we should enable DuplicateRecordFields, NoFieldSelectors, and OverloadedRecordDot. In other modules where we only want to use the .field syntax, it should be fine to only enable OverloadedRecordDot, so let's change that so that we only use the .field syntax on values of type Run and RunReader. That probably makes the diff of this PR also smaller
  • There are some residual parentheses here and there. In particular, we don't need parentheses around syntax like value.field

Comment on lines +14 to +15
Run (Run, index, hasFS, hasBlockIO, dataCaching,
blobFile, bloomFilter, kOpsFile, fsPaths, numEntries, refCounter)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Run (Run, index, hasFS, hasBlockIO, dataCaching,
blobFile, bloomFilter, kOpsFile, fsPaths, numEntries, refCounter)
Run (Run, index, hasFS, hasBlockIO, dataCaching,
blobFile, bloomFilter, kOpsFile)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fsPaths, numEntries, and refCounter fields are not exported on purpose. We should keep them internal

Comment on lines +106 to +112
rnf r =
rnf r.numEntries `seq` rwhnf r.refCounter `seq` rnf r.fsPaths `seq`
rnf r.bloomFilter `seq` rnf r.index `seq` rnf r.kOpsFile `seq`
rnf r.blobFile `seq` rnf r.dataCaching `seq` rwhnf r.hasFS `seq` rwhnf r.hasBlockIO



Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written like this before:

  rnf (Run a b c  d e f g h i j) = ...

I think the previous definition was better, because the explicit pattern match will result in compiler errors if we forget to use one of the fields (and forgetting to use one of the fields is bad for performance tests)


sizeInPages :: Ref (Run m h) -> NumPages
sizeInPages (DeRef run) = Index.sizeInPages (runIndex run)
sizeInPages (DeRef run) = Index.sizeInPages (run.index)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There a number of cases like this where there are residual parentheses even though they are not necessary anymore:

Suggested change
sizeInPages (DeRef run) = Index.sizeInPages (run.index)
sizeInPages (DeRef run) = Index.sizeInPages run.index

Let's make sure that we remove obsolete parentheses in the rest of PR as well

Comment on lines +237 to +248
(\refCounter -> Run {
numEntries = runNumEntries
, refCounter = refCounter
, fsPaths = runRunFsPaths
, bloomFilter = runFilter
, index = runIndex
, kOpsFile = runKOpsFile
, blobFile = runBlobFile
, dataCaching = runRunDataCaching
, hasFS = runHasFS
, hasBlockIO = runHasBlockIO
})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(\refCounter -> Run {
numEntries = runNumEntries
, refCounter = refCounter
, fsPaths = runRunFsPaths
, bloomFilter = runFilter
, index = runIndex
, kOpsFile = runKOpsFile
, blobFile = runBlobFile
, dataCaching = runRunDataCaching
, hasFS = runHasFS
, hasBlockIO = runHasBlockIO
})
(\refCounter -> Run {
numEntries = runNumEntries
, refCounter = refCounter
, fsPaths = runRunFsPaths
, bloomFilter = runFilter
, index = runIndex
, kOpsFile = runKOpsFile
, blobFile = runBlobFile
, dataCaching = runRunDataCaching
, hasFS = runHasFS
, hasBlockIO = runHasBlockIO
})

, dataCaching = runRunDataCaching
, hasFS = fs
, hasBlockIO = hbio

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines +6 to +8
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordDot #-}

Comment on lines +1 to +6
{-# LANGUAGE CPP #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE UnboxedTuples #-}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{-# LANGUAGE CPP #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE UnboxedTuples #-}

Comment on lines +2 to +4
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
Copy link
Copy Markdown
Collaborator

@jorisdral jorisdral Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}

Comment on lines +1 to +3
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordDot #-}

Comment on lines +2 to +4
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NoFieldSelectors #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordDot #-}

@mwihoti
Copy link
Copy Markdown
Contributor Author

mwihoti commented Mar 26, 2026

Thanks, I will work on the changes

mwihoti added 2 commits April 3, 2026 09:00
  extension scope

  - Remove unnecessary parentheses around dot-access expressions (value.field)
  - Only enable DuplicateRecordFields/NoFieldSelectors in Run and RunReader
    where field names were actually refactored; other modules get
  OverloadedRecordDot only
@mwihoti mwihoti force-pushed the feature/811-modern-record-extensions branch from 343017b to 5823d35 Compare April 3, 2026 06:01
Snapshot.hs: use exported runFsPaths accessor instead of internal r.fsPaths
Unsafe.hs: restore parens around function applications
 Config/Override.hs: fix corrupted override instance for MergeBatchSize
Benchmarks: update to dot syntax with renamed fields (bloomFilter, index,
    kOpsFile) and add OverloadedRecordDot pragma to both benchmark files
@mwihoti
Copy link
Copy Markdown
Contributor Author

mwihoti commented Apr 3, 2026

Hi @jorisdral I've updated changes based on the feedback
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants