
Fix last_by returning null row after deleting all table data (#16985) #17411

Open
PDGGK wants to merge 1 commit into apache:master from PDGGK:fix/last-by-null-after-delete

PDGGK commented Mar 31, 2026

Description

Fixes #16985

In the table model, after all data for a device had been deleted, last_by queries still returned one row of null values instead of returning no rows:

-- After deleting all data:
select last_by(s0, time) from table where deviceId = 'd0'
-- Expected: 0 rows
-- Actual: 1 row with null

Root Cause

Three related issues:

  1. LastQueryAggTableScanOperator could still emit a result row for non-grouped last_by scans even when every LastByAccumulator had no initialized result, causing evaluateFinal() to produce a null row.

  2. FileLoaderUtils read aligned time-column modifications using the concrete measurement ID instead of AlignedFullPath.VECTOR_PLACEHOLDER, inconsistent with aligned deletion semantics used in other paths.

  3. ModificationUtils did not set modified = true when an aligned value chunk was fully deleted, allowing statistics-based shortcuts to use stale metadata.
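The guard described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Accumulator`, `CountAccumulator`, `LastByEmitGuard`, `hasInitResult()` and `shouldSkipResultRow()` are hypothetical stand-ins, and the `LastByAccumulator` here is a simplified model of the IoTDB class of the same name.

```java
import java.util.List;

// Hypothetical stand-ins for the accumulator types named in the PR; the real
// IoTDB classes have different and much larger APIs.
interface Accumulator {}

final class LastByAccumulator implements Accumulator {
  private boolean initResult; // becomes true once at least one row is observed
  private long lastTime = Long.MIN_VALUE;
  private Object lastValue;

  void add(long time, Object value) {
    // Keep the value that belongs to the latest timestamp seen so far.
    if (!initResult || time >= lastTime) {
      initResult = true;
      lastTime = time;
      lastValue = value;
    }
  }

  boolean hasInitResult() {
    return initResult;
  }

  // With no observed rows this yields null, which is how the null row arose.
  Object evaluateFinal() {
    return initResult ? lastValue : null;
  }
}

// Any non-last_by aggregation, used only to show that the guard does not apply.
final class CountAccumulator implements Accumulator {
  long count;
}

final class LastByEmitGuard {
  // Returns true only when every accumulator is a LastByAccumulator and none
  // of them has an initialized result, i.e. the scan saw no surviving rows.
  static boolean shouldSkipResultRow(List<? extends Accumulator> accumulators) {
    if (accumulators.isEmpty()) {
      return false;
    }
    for (Accumulator accumulator : accumulators) {
      if (!(accumulator instanceof LastByAccumulator)) {
        return false; // other aggregations keep their existing behavior
      }
      if (((LastByAccumulator) accumulator).hasInitResult()) {
        return false; // at least one real value exists, so a row is valid
      }
    }
    return true;
  }
}
```

Under these assumptions, an operator would consult `shouldSkipResultRow` before calling `evaluateFinal()`, so a query that mixes `last_by` with another aggregation such as `count` still emits its row.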

Changes

| File | Change |
| --- | --- |
| LastQueryAggTableScanOperator.java | Skip emitting a result row when all aggregators are LastByAccumulator and none has an initialized result. Applied to both the normal aggregation path and the cache-hit path. |
| FileLoaderUtils.java | Use AlignedFullPath.VECTOR_PLACEHOLDER for time-column modification lookups |
| ModificationUtils.java | Set modified = true when a value chunk is fully removed |
| IoTDBDeletionTableIT.java | Add regression tests for 2-arg and 3-arg last_by after delete |
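To illustrate why the `modified` flag matters for the ModificationUtils change, here is a small sketch of a statistics shortcut that is only safe on unmodified chunks. All names (`ChunkMetaSketch`, `DeletionSketch`, `applyDeletion`, `canUseStatistics`) are hypothetical and the model is deliberately simplified; it is not the IoTDB implementation.

```java
// Hypothetical, simplified chunk metadata: a closed time range plus two flags.
final class ChunkMetaSketch {
  final long startTime;
  final long endTime;
  boolean modified;     // true if any deletion touches this chunk
  boolean fullyDeleted; // true if a deletion covers the whole time range

  ChunkMetaSketch(long startTime, long endTime) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

final class DeletionSketch {
  // Applies a deletion over the closed range [delStart, delEnd] to one chunk.
  static void applyDeletion(ChunkMetaSketch chunk, long delStart, long delEnd) {
    boolean coversAll = delStart <= chunk.startTime && chunk.endTime <= delEnd;
    boolean overlaps = delStart <= chunk.endTime && chunk.startTime <= delEnd;
    if (coversAll) {
      chunk.fullyDeleted = true;
      // The step the PR describes as missing: without it, the chunk would
      // still look untouched to the statistics shortcut below.
      chunk.modified = true;
    } else if (overlaps) {
      chunk.modified = true;
    }
  }

  // Pre-computed statistics (for example a stored last value) describe the
  // chunk as written, so they can only be trusted if nothing was deleted.
  static boolean canUseStatistics(ChunkMetaSketch chunk) {
    return !chunk.modified;
  }
}
```

In this model a fully deleted chunk reports `canUseStatistics(...) == false`, which forces the reader to look at the actual (now empty) data instead of returning a stale last value.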

Semantic Note

This changes non-grouped last_by on empty results from returning one null row to returning zero rows, consistent with grouped-query behavior (which already returns zero rows for empty devices). Other aggregations like count/max/last are not affected by this change. Please confirm this is the intended semantics.

Verification

Integration tests pass locally:

mvn -P with-integration-tests,TableSimpleIT -pl integration-test -am \
  -DskipUTs -Dspotless.skip=true -Dcheckstyle.skip=true -Drat.skip=true \
  -Dit.test=IoTDBDeletionTableIT#testLastByWithTimeShouldReturnNoRowsAfterDeletingAllData+testThreeArgLastByShouldReturnNoRowsAfterDeletingAllData \
  verify

Result: 2 tests run, 0 failures, 0 errors.

…16985)

After deleting all data of a device in table model, queries like
`select last_by(s0, time)` could still return one row with null values
instead of returning no rows. Three related issues were fixed:

1. LastQueryAggTableScanOperator: skip emitting a result row when all
   aggregators are LastByAccumulator and none has an initialized result.
   Applied to both the normal aggregation path and the cache-hit path.

2. FileLoaderUtils: read time-column modifications using
   AlignedFullPath.VECTOR_PLACEHOLDER instead of the concrete measurement
   ID, matching aligned deletion semantics used in other paths.

3. ModificationUtils: set modified=true when an aligned value chunk is
   fully deleted, preventing statistics-based shortcuts from using stale
   metadata.

Signed-off-by: Zihan Dai <1436286758@qq.com>
Development

Successfully merging this pull request may close these issues.

[Bug] In version 2.0.5, querying with the last_by function after data has been deleted still returns a row, and that row is null
