Problem
When the cache watermark gap is detected (CommandLog was archived/truncated and the cached watermark ID exceeds the current CommandLog max ID), the history table watermark is reset to 0 but the cache table watermark is reset to @cl_max_id_cache (the current CommandLog max ID):
History table reset (lines ~720–728):
IF @cl_max_id > 0 AND @history_max_id > @cl_max_id
BEGIN
RAISERROR(N' WARNING: History watermark (%i) exceeds CommandLog max ID (%i). Resetting watermark...', 10, 1, ...);
SET @history_max_id = 0; /* #240: reset to 0, not @cl_max_id — NOT EXISTS dedup guard prevents duplicates */
END;
Cache table reset (lines ~800–808):
IF @cl_max_id_cache > 0 AND @cache_watermark > @cl_max_id_cache
BEGIN
RAISERROR(N' WARNING: Cache watermark (%i) exceeds CommandLog max ID (%i). Resetting...', 10, 1, ...);
SET @cache_watermark = @cl_max_id_cache; -- ← different sentinel value!
END;
The history reset uses 0 (full re-scan on next run), while the cache reset uses the current max CommandLog ID (skip everything currently in CommandLog). The comment on the history reset explains this: "#240: reset to 0, not @cl_max_id — NOT EXISTS dedup guard prevents duplicates."
But there is no such dedup guard on the cache table INSERT — the cache table's PRIMARY KEY on CommandLogID handles deduplication. So the cache reset to @cl_max_id_cache means: on the next run after an archive event, all newly-added rows after the archive are cached correctly, but the rows that existed in CommandLog at archive time are never re-cached.
This creates a race: if CommandLog was archived and immediately re-populated (common in rolling-window retention patterns where old rows are deleted and new rows are inserted in the same batch), the window between SET @cache_watermark = @cl_max_id_cache and the next run's cache INSERT will silently skip any rows with IDs ≤ the old max. These are exactly the rows from the active maintenance run that was in progress during the archive event.
Evidence
Cache watermark reset (lines ~800–808):
IF @cl_max_id_cache > 0 AND @cache_watermark > @cl_max_id_cache
BEGIN
RAISERROR(N' WARNING: Cache watermark (%i) exceeds CommandLog max ID (%i). Resetting — CommandLog may have been archived.', 10, 1,
@cache_watermark, @cl_max_id_cache) WITH NOWAIT;
SET @cache_watermark = @cl_max_id_cache; -- ← should be 0 for consistency with history reset
END;
Compare with history table reset (line ~724):
SET @history_max_id = 0; /* #240: reset to 0, not @cl_max_id — NOT EXISTS dedup guard prevents duplicates */
Proposed Fix
Reset both watermarks to 0 for consistency, and rely on the cache table's PRIMARY KEY to handle deduplication on re-insert:
-- Cache watermark reset: use 0 (not @cl_max_id_cache) so full re-scan occurs
-- PK_StatUpdateDiagCache on CommandLogID prevents duplicate inserts
SET @cache_watermark = 0;
Add a comment explaining the reasoning (mirroring the #240 comment on the history reset).
Impact
Medium. In environments using CommandLog retention policies that delete old rows and immediately insert new ones (rolling 30-day windows are common), the cache table will silently skip the newly-inserted rows from the active maintenance window. This produces a gap in the #stat_updates data that is invisible to the user — no error, no warning, just missing rows from recent runs. The diagnostic will appear to analyze less data than expected, potentially missing current-window failures entirely.
Problem
When the cache watermark gap is detected (CommandLog was archived/truncated and the cached watermark ID exceeds the current CommandLog max ID), the history table watermark is reset to 0 but the cache table watermark is reset to
@cl_max_id_cache(the current CommandLog max ID):History table reset (lines ~720–728):
Cache table reset (lines ~800–808):
The history reset uses 0 (full re-scan on next run), while the cache reset uses the current max CommandLog ID (skip everything currently in CommandLog). The comment on the history reset explains this: "#240: reset to 0, not @cl_max_id — NOT EXISTS dedup guard prevents duplicates."
But there is no such dedup guard on the cache table INSERT — the cache table's PRIMARY KEY on
CommandLogIDhandles deduplication. So the cache reset to@cl_max_id_cachemeans: on the next run after an archive event, all newly-added rows after the archive are cached correctly, but the rows that existed in CommandLog at archive time are never re-cached.This creates a race: if CommandLog was archived and immediately re-populated (common in rolling-window retention patterns where old rows are deleted and new rows are inserted in the same batch), the window between
SET @cache_watermark = @cl_max_id_cacheand the next run's cache INSERT will silently skip any rows with IDs ≤ the old max. These are exactly the rows from the active maintenance run that was in progress during the archive event.Evidence
Cache watermark reset (lines ~800–808):
Compare with history table reset (line ~724):
Proposed Fix
Reset both watermarks to 0 for consistency, and rely on the cache table's PRIMARY KEY to handle deduplication on re-insert:
Add a comment explaining the reasoning (mirroring the #240 comment on the history reset).
Impact
Medium. In environments using CommandLog retention policies that delete old rows and immediately insert new ones (rolling 30-day windows are common), the cache table will silently skip the newly-inserted rows from the active maintenance window. This produces a gap in the
#stat_updatesdata that is invisible to the user — no error, no warning, just missing rows from recent runs. The diagnostic will appear to analyze less data than expected, potentially missing current-window failures entirely.