docs: explain how node table scanning with masks works #227

Open
adsharma wants to merge 1 commit into `master` from `doc_semi_mask`

@adsharma

Related to: #223

A **node** represents an entity in the graph database. Each node has:
- A unique **node ID** (`nodeID_t`) consisting of:
  - `tableID`: The table the node belongs to
  - `offset`: The position of the node within its node group

@aheev commented on Feb 25, 2026

I think `offset` is the position within the node table (i.e., global_offset), not the position within its node group.
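
To illustrate the distinction, here is a minimal C++ sketch of how a table-wide offset could map to a node group and a position inside that group. `NodeID`, `GroupLocation`, `locate`, `toGlobalOffset`, and the group size `kNodeGroupSize` are hypothetical names and values chosen for illustration; they are not the project's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed rows-per-group value, for illustration only.
// The real constant is defined by the storage layer.
constexpr uint64_t kNodeGroupSize = 1ULL << 17;

// Simplified stand-in for nodeID_t.
struct NodeID {
    uint64_t tableID; // table the node belongs to
    uint64_t offset;  // position within the node table (global offset)
};

// Where a row lives once the table is split into fixed-size groups of rows.
struct GroupLocation {
    uint64_t nodeGroupIdx;  // which node group holds the row
    uint64_t offsetInGroup; // row position inside that group
};

// Split a table-wide offset into (node group index, offset within the group).
inline GroupLocation locate(const NodeID& id) {
    return GroupLocation{id.offset / kNodeGroupSize, id.offset % kNodeGroupSize};
}

// Inverse mapping: rebuild the table-wide offset from a group location.
inline uint64_t toGlobalOffset(const GroupLocation& loc) {
    assert(loc.offsetInGroup < kNodeGroupSize);
    return loc.nodeGroupIdx * kNodeGroupSize + loc.offsetInGroup;
}
```

Under these assumptions, the node ID stores only the table-wide offset, and the node group index is derived from it when a scan needs to find the physical group.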

Nodes are stored in node tables, which are the primary storage unit for graph entities.

### 2. Node Groups
A **node group** is a physical storage unit that contains a contiguous range of nodes. Key characteristics:

I think it would be good to mention horizontal partitioning here, i.e., that a node group is a group of rows.

- Uses compression for efficient storage

### 4. Data Chunk
A **data chunk** is the in-memory representation during query execution:

This doesn't specify what it is a representation of.


## Summary

### Local Tables and Semi Masks

I think this should go into a separate doc; it certainly doesn't belong in the summary of this one.


```cpp
// From column.cpp - scanSegment function
if (!resultVector->state || resultVector->state->getSelVector().isUnfiltered()) {
```

I think `SelVector.isUnfiltered` is misused here (thanks to its misleading name). As per the usages in the codebase and DeepWiki:

  • unfiltered means a contiguous range of `sel_t`, i.e., 0..selSize
  • filtered means a non-contiguous, sorted array of `sel_t`, e.g., [1, 3, 5, 8, ...]
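
To make the two states concrete, here is a minimal C++ sketch of a selection vector with those semantics. `SimpleSelVector`, its members, and the `sel_t` width are hypothetical stand-ins written for this comment; this is not the project's actual `SelVector` class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using sel_t = uint64_t; // stand-in type; the real width is an assumption

// Hypothetical, simplified selection vector.
struct SimpleSelVector {
    sel_t selSize = 0;              // number of selected positions
    bool explicitPositions = false; // true once positions are stored explicitly
    std::vector<sel_t> positions;   // only meaningful when explicitPositions is true

    // Unfiltered: implicitly selects the contiguous range 0..size-1.
    static SimpleSelVector unfiltered(sel_t size) {
        SimpleSelVector v;
        v.selSize = size;
        return v;
    }

    // Filtered: selects an explicit, strictly increasing list of positions.
    static SimpleSelVector filtered(std::vector<sel_t> pos) {
        for (std::size_t i = 1; i < pos.size(); ++i) {
            assert(pos[i - 1] < pos[i]);
        }
        SimpleSelVector v;
        v.selSize = pos.size();
        v.explicitPositions = true;
        v.positions = std::move(pos);
        return v;
    }

    // Describes the layout only; it says nothing about whether a mask was applied.
    bool isUnfiltered() const { return !explicitPositions; }

    // Map the i-th selected slot to a position in the underlying vector.
    sel_t operator[](sel_t i) const {
        assert(i < selSize);
        return explicitPositions ? positions[i] : i;
    }
};
```

In this sketch, `isUnfiltered()` only reports whether the positions are the implicit range 0..selSize, which is why reading it as "no semi mask was applied" can be misleading.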
