-
Notifications
You must be signed in to change notification settings - Fork 0
Feat: Usage library #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Database adapter - ClickHouse adapter
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughAdds a Utopia Usage subsystem with public API types: Adapter (abstract), SQL base adapter, Usage façade, Metric, Query, and concrete adapters Database and ClickHouse (multi-tenant, namespace, shared-tables, batch/counter logging, querying, aggregation, purge). Introduces extensive PHPUnit tests and a Usage test trait. Adds CI workflows (Tests, Linter, CodeQL), Dockerfile, docker-compose (mariadb, clickhouse, usage), phpunit.xml, composer.json updates, pint.json, CONTRIBUTING.md, CODE_OF_CONDUCT.md, LICENSE.md, and README.md documentation. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
shimonewman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
src/Usage/Adapter/ClickHouse.php
Outdated
| * | ||
| * @throws Exception | ||
| */ | ||
| public function log(string $metric, int $value, string $period = '1h', array $tags = []): bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use constants for the periods
| 'tags' => json_encode($tags), | ||
| ]; | ||
|
|
||
| if ($this->sharedTables) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we aware for the concept of shared vs not shared tables in this scope? We can completely abstract this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstract this where? for database library, Database already abstracts it, but for Clickhouse we don't have a separate library, either we create a clickhouse client that handles these stuff, that can be shared here and in audits as well, or we duplicate this logic both here in audit library in clickhouse adapter
| $tableName = $this->getTableName(); | ||
| $escapedDatabaseAndTable = $this->escapeIdentifier($this->database) . '.' . $this->escapeIdentifier($tableName); | ||
|
|
||
| $sql = " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we protecting against SQL injections here?
src/Usage/Adapter/ClickHouse.php
Outdated
| foreach ($metrics as $metricData) { | ||
| $period = $metricData['period'] ?? '1h'; | ||
|
|
||
| if (! isset(self::PERIODS[$period])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add better validation here of the structure we expect. Might even introduce a Metric class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 737-750: The buildInsertColumns() method appends $attribute['$id']
which is currently typed as mixed; update the loop in buildInsertColumns() to
assert and handle the type (e.g., use is_string($attribute['$id']) and either
cast to string before appending or skip/throw on invalid types) so the returned
array contains only strings and matches the declared return type; reference the
getAttributes() iteration and the '$id' key in the attribute to locate where to
add the type check and handling.
- Around line 706-717: The loop uses $attribute['$id'] without guaranteeing it's
a string before calling getColumnType; cast and validate it first: retrieve the
raw id from getAttributes() into a local variable, assert/cast it to string
(e.g., (string)$rawId) and optionally handle missing/empty ids (skip or throw)
before calling getColumnType($attrId), and then use that typed $attrId for
building $attrKey, $queryParams and $valuePlaceholders so getColumnType always
receives a string.
- Around line 850-863: The CodeQL/static-analysis warnings come from untyped
array accesses in logBatchCounter (and similarly in logBatch); after
validateMetricsBatch has run, add explicit type assertions/casts when extracting
$metricData elements in logBatchCounter (cast $metric to string, $value to
int/float as appropriate, $tags to array, $period to string/int, and $tenant to
string|null when using resolveTenantFromMetric) before passing them into
buildInsertValuesForMetric; update the extraction lines in logBatchCounter (and
the analogous lines in logBatch) to perform these casts so static analysis
recognizes the narrowed types while leaving validation logic unchanged.
🧹 Nitpick comments (4)
tests/Usage/UsageBase.php (2)
302-330: Remove unused variables$aggCountand$counterCount.These variables are assigned but never used in assertions. Either remove them or add assertions to verify the count behavior matches expectations.
Proposed fix
$aggSum = $this->usage->sumByPeriod('agg-vs-counter', '1h'); - $aggCount = $this->usage->countByPeriod('agg-vs-counter', '1h'); // Clear for counter test $this->usage->purge(DateTime::now()); // Log same counter metric 3 times (last one wins) $this->assertTrue($this->usage->logCounter('counter-vs-agg', 10, '1h', [])); $this->assertTrue($this->usage->logCounter('counter-vs-agg', 20, '1h', [])); $this->assertTrue($this->usage->logCounter('counter-vs-agg', 30, '1h', [])); $counterSum = $this->usage->sumByPeriod('counter-vs-agg', '1h'); - $counterCount = $this->usage->countByPeriod('counter-vs-agg', '1h');
173-195: Assertions always pass - consider strengthening.Both
testLessThanEqualQueryandtestGreaterThanEqualQueryuseassertGreaterThanOrEqual(0, count($results))which always passes. If the intent is only to verify no exceptions are thrown, consider usingassertIsArray($results)instead for clarity.src/Usage/Adapter/Database.php (2)
93-140:$batchSizeparameter is unused - consider implementing or documenting.The
logBatchmethod ignores the$batchSizeparameter and processes all metrics in a single upsert operation. While this may be acceptable for the Database adapter, it differs from the ClickHouse adapter which respects the batch size for chunked inserts. Consider either:
- Implementing batch chunking for consistency
- Adding a comment explaining why batching isn't needed for this adapter
388-407: Consider using batch deletion for better performance.The
purgemethod deletes documents one-by-one in a loop. For large datasets, this could be slow. Consider using a batch deletion approach if the underlying database supports it.
src/Usage/Adapter/ClickHouse.php
Outdated
|
|
||
| // Add attributes dynamically - must include ALL attributes in schema order | ||
| foreach ($this->getAttributes() as $attribute) { | ||
| $attrId = $attribute['$id']; | ||
|
|
||
| $attrKey = $attrId . ($paramCounter > 0 ? '_' . $paramCounter : ''); | ||
| $type = $this->getColumnType($attrId); | ||
|
|
||
| // Use the value from map, or null if not present | ||
| $queryParams[$attrKey] = $attributeMap[$attrId] ?? null; | ||
| $valuePlaceholders[] = '{' . $attrKey . ':' . $type . '}'; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type assertion for $attrId before calling getColumnType.
Proposed fix
// Add attributes dynamically - must include ALL attributes in schema order
foreach ($this->getAttributes() as $attribute) {
- $attrId = $attribute['$id'];
+ $attrId = $attribute['$id'];
+ if (!is_string($attrId)) {
+ continue;
+ }
$attrKey = $attrId . ($paramCounter > 0 ? '_' . $paramCounter : '');
$type = $this->getColumnType($attrId);🧰 Tools
🪛 GitHub Actions: CodeQL
[error] 712-712: Parameter #1 $id of method Utopia\Usage\Adapter\ClickHouse::getColumnType() expects string, mixed given.
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 706 - 717, The loop uses
$attribute['$id'] without guaranteeing it's a string before calling
getColumnType; cast and validate it first: retrieve the raw id from
getAttributes() into a local variable, assert/cast it to string (e.g.,
(string)$rawId) and optionally handle missing/empty ids (skip or throw) before
calling getColumnType($attrId), and then use that typed $attrId for building
$attrKey, $queryParams and $valuePlaceholders so getColumnType always receives a
string.
src/Usage/Adapter/ClickHouse.php
Outdated
| private function buildInsertColumns(): array | ||
| { | ||
| $insertColumns = ['id']; | ||
|
|
||
| foreach ($this->getAttributes() as $attribute) { | ||
| $insertColumns[] = $attribute['$id']; | ||
| } | ||
|
|
||
| if ($this->sharedTables) { | ||
| $insertColumns[] = 'tenant'; | ||
| } | ||
|
|
||
| return $insertColumns; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type assertion for attribute ID to fix return type.
$attribute['$id'] is typed as mixed, causing the return type mismatch. Add a type check.
Proposed fix
private function buildInsertColumns(): array
{
$insertColumns = ['id'];
foreach ($this->getAttributes() as $attribute) {
- $insertColumns[] = $attribute['$id'];
+ $id = $attribute['$id'];
+ $insertColumns[] = is_string($id) ? $id : '';
}
if ($this->sharedTables) {
$insertColumns[] = 'tenant';
}
return $insertColumns;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private function buildInsertColumns(): array | |
| { | |
| $insertColumns = ['id']; | |
| foreach ($this->getAttributes() as $attribute) { | |
| $insertColumns[] = $attribute['$id']; | |
| } | |
| if ($this->sharedTables) { | |
| $insertColumns[] = 'tenant'; | |
| } | |
| return $insertColumns; | |
| } | |
| private function buildInsertColumns(): array | |
| { | |
| $insertColumns = ['id']; | |
| foreach ($this->getAttributes() as $attribute) { | |
| $id = $attribute['$id']; | |
| $insertColumns[] = is_string($id) ? $id : ''; | |
| } | |
| if ($this->sharedTables) { | |
| $insertColumns[] = 'tenant'; | |
| } | |
| return $insertColumns; | |
| } |
🧰 Tools
🪛 GitHub Actions: CodeQL
[error] 749-749: Method Utopia\Usage\Adapter\ClickHouse::buildInsertColumns() should return array but returns array<int, mixed>.
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 737 - 750, The
buildInsertColumns() method appends $attribute['$id'] which is currently typed
as mixed; update the loop in buildInsertColumns() to assert and handle the type
(e.g., use is_string($attribute['$id']) and either cast to string before
appending or skip/throw on invalid types) so the returned array contains only
strings and matches the declared return type; reference the getAttributes()
iteration and the '$id' key in the attribute to locate where to add the type
check and handling.
| foreach ($metricsBatch as $metricData) { | ||
| $period = $metricData['period'] ?? Usage::PERIOD_1H; | ||
| $metric = $metricData['metric']; | ||
| $value = $metricData['value']; | ||
| $tags = (array) ($metricData['tags'] ?? []); | ||
|
|
||
| // Build values for this metric | ||
| $tenant = $this->sharedTables ? $this->resolveTenantFromMetric($metricData) : null; | ||
| $result = $this->buildInsertValuesForMetric($metric, $value, $period, $tags, $tenant, $paramCounter); | ||
|
|
||
| $queryParams = array_merge($queryParams, $result['queryParams']); | ||
| $valueClauses[] = '(' . implode(', ', $result['valuePlaceholders']) . ')'; | ||
| $paramCounter++; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type assertions to satisfy static analysis after validation.
The batch processing in logBatchCounter extracts values from $metricData without type assertions, causing CodeQL errors. Since validateMetricsBatch already validates the types, add explicit casts to narrow the types for static analysis.
Proposed fix
foreach ($metricsBatch as $metricData) {
$period = $metricData['period'] ?? Usage::PERIOD_1H;
$metric = $metricData['metric'];
$value = $metricData['value'];
$tags = (array) ($metricData['tags'] ?? []);
+
+ // Type assertions (validated by validateMetricsBatch)
+ $metric = is_string($metric) ? $metric : '';
+ $value = is_int($value) ? $value : 0;
+ $period = is_string($period) ? $period : Usage::PERIOD_1H;
// Build values for this metric
$tenant = $this->sharedTables ? $this->resolveTenantFromMetric($metricData) : null;
$result = $this->buildInsertValuesForMetric($metric, $value, $period, $tags, $tenant, $paramCounter);Also applies to logBatch at lines 966-974.
🧰 Tools
🪛 GitHub Actions: CodeQL
[error] 858-858: Parameter #1 $metric of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects string, mixed given.
[error] 858-858: Parameter #2 $value of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects int, mixed given.
[error] 858-858: Parameter #3 $period of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects string, mixed given.
[error] 858-858: Parameter #4 $tags of method Utopia\Usage\Adapter\ClickHouse::buildInsertValuesForMetric() expects array<string, mixed>, array given.
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 850 - 863, The
CodeQL/static-analysis warnings come from untyped array accesses in
logBatchCounter (and similarly in logBatch); after validateMetricsBatch has run,
add explicit type assertions/casts when extracting $metricData elements in
logBatchCounter (cast $metric to string, $value to int/float as appropriate,
$tags to array, $period to string/int, and $tenant to string|null when using
resolveTenantFromMetric) before passing them into buildInsertValuesForMetric;
update the extraction lines in logBatchCounter (and the analogous lines in
logBatch) to perform these casts so static analysis recognizes the narrowed
types while leaving validation logic unchanged.
…remove ClickHouseBatchTest
… Adapter and Database classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 683-688: When PERIOD_INF is used the code sets $time to null which
yields an empty string param and lets ClickHouse coerce to 1970 implicitly;
instead set a deterministic sentinel datetime for $timestamp when
Usage::PERIOD_INF is selected. Update the block around $now/$time/$timestamp
(used with Usage::PERIODS and formatDateTime) to assign a concrete sentinel
string (e.g. '1970-01-01 00:00:00.000') and pass that through
$this->formatDateTime so $timestamp is never null, ensuring partition key
behavior is explicit and deterministic.
src/Usage/Adapter/ClickHouse.php
Outdated
| $now = new \DateTime(); | ||
| $time = $period === Usage::PERIOD_INF | ||
| ? null | ||
| : $now->format(Usage::PERIODS[$period]); | ||
| $timestamp = $time !== null ? $this->formatDateTime($time) : null; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -type f -name "ClickHouse.php" 2>/dev/null | head -5Repository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# First, let's locate the file
find . -type f -name "ClickHouse.php" -path "*/Usage/Adapter/*" 2>/dev/nullRepository: utopia-php/usage
Length of output: 94
🏁 Script executed:
# Let's search from the repo root
git ls-files | grep -i clickhouseRepository: utopia-php/usage
Length of output: 131
🏁 Script executed:
# Read the relevant file section
head -n 700 src/Usage/Adapter/ClickHouse.php | tail -n 50Repository: utopia-php/usage
Length of output: 1626
🏁 Script executed:
# Get more context around lines 683-688
sed -n '670,700p' src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 978
🏁 Script executed:
# Search for table schema or CREATE TABLE statement
rg -i "CREATE TABLE|table_schema" src/Usage/Adapter/ClickHouse.php | head -20Repository: utopia-php/usage
Length of output: 202
🏁 Script executed:
# Search for the column mapping/attribute map where timestamp is used
sed -n '680,750p' src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 2342
🏁 Script executed:
# Search for how attributes/columns are mapped in the insert
rg -A 30 "attributeMap.*=" src/Usage/Adapter/ClickHouse.php | head -60Repository: utopia-php/usage
Length of output: 1246
🏁 Script executed:
# Find the complete CREATE TABLE statement
rg -B 5 -A 50 "CREATE TABLE IF NOT EXISTS" src/Usage/Adapter/ClickHouse.php | head -150Repository: utopia-php/usage
Length of output: 2819
🏁 Script executed:
# Also check if there are any migration files or schema definitions
find . -type f \( -name "*.php" -o -name "*.sql" \) | xargs grep -l "time.*DateTime" 2>/dev/null | grep -i schemaRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Search for column type definitions or schema initialization
rg "time.*DateTime|PARTITION BY" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 438
🏁 Script executed:
# Search for tests related to PERIOD_INF or this code path
rg -i "PERIOD_INF|period_inf" tests/ src/Repository: utopia-php/usage
Length of output: 263
🏁 Script executed:
# Check if there's error handling around NULL inserts
rg -B 3 -A 3 "try|catch|Exception.*time|Exception.*NULL" src/Usage/Adapter/ClickHouse.php | head -50Repository: utopia-php/usage
Length of output: 1853
🏁 Script executed:
# Search for how the query is actually executed with parameters
rg -B 5 -A 10 "query.*INSERT.*valuePlaceholders|execute.*INSERT" src/Usage/Adapter/ClickHouse.php | head -80Repository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Search for formatParamValue function
rg -A 20 "function formatParamValue|formatParamValue.*=" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 571
🏁 Script executed:
# Also check the full picture - how parameters are formatted in the HTTP request
rg -B 10 -A 15 "formatParamValue" src/Usage/Adapter/ClickHouse.php | head -80Repository: utopia-php/usage
Length of output: 3381
🏁 Script executed:
# Check the Usage.php file for PERIODS mapping
rg -B 3 -A 10 "PERIODS.*=" src/Usage/Usage.phpRepository: utopia-php/usage
Length of output: 394
🏁 Script executed:
# Search for getColumnType function
rg -A 30 "function getColumnType" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 1070
🏁 Script executed:
# Also check how 'time' is defined as an attribute
rg -B 5 -A 15 "getAttributes.*=|'time'.*DateTime" src/Usage/Adapter/ClickHouse.php | head -60Repository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Search for where 'time' attribute is defined or set
rg -B 5 -A 5 "'time'" src/Usage/Adapter/ClickHouse.php | grep -A 5 -B 5 "required\|type"Repository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Check the attributes array or getAttributes implementation
rg -B 10 -A 30 "function getAttributes|protected.*attributes" src/Usage/Adapter/ClickHouse.php | head -100Repository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Look for attribute initialization in parent class or base
fd "Adapter.php" src/ --type fRepository: utopia-php/usage
Length of output: 81
🏁 Script executed:
# Check the base Adapter class
cat -n src/Usage/Adapter.phpRepository: utopia-php/usage
Length of output: 3630
🏁 Script executed:
# Search for 'time' attribute definition specifically
rg -B 3 -A 3 "'time'" src/Usage/Adapter.phpRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Search for how attributes are defined - looking for 'metric', 'value', 'period', 'time', 'tags'
rg -B 5 -A 20 "metric.*value.*period|'metric'|'period'|'tags'" src/Usage/Adapter/ClickHouse.php | head -100Repository: utopia-php/usage
Length of output: 3660
🏁 Script executed:
# Look at how 'metric', 'value', etc. are mapped to columns
rg -B 3 -A 3 "getAttribute.*time|'time'.*required" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Search for where the schema attributes are hardcoded or predefined
rg -B 5 -A 20 "buildInsertColumns|columnDefs.*=" src/Usage/Adapter/ClickHouse.php | head -80Repository: utopia-php/usage
Length of output: 2698
🏁 Script executed:
# Search for getAttributes implementation
rg -A 30 "function getAttributes" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Check if 'time', 'metric', 'value', 'period', 'tags' are hardcoded or predefined
rg -B 10 "columns.*metric.*value|columns.*=.*\[\]" src/Usage/Adapter/ClickHouse.php | head -80Repository: utopia-php/usage
Length of output: 294
🏁 Script executed:
# Look for where columns array is built and check if time is always included
rg -B 20 "time.*DateTime64|columns\[\].*time" src/Usage/Adapter/ClickHouse.php | head -100Repository: utopia-php/usage
Length of output: 2718
🌐 Web query:
ClickHouse DateTime64 non-nullable column empty string parameter NULL behavior insert
💡 Result:
If you insert an empty string ('') into a non-nullable DateTime64 column, ClickHouse does not treat that as NULL. What happens depends on how you’re inserting (format/settings):
- CSV input: empty fields are treated as the column default value when
input_format_csv_empty_as_default=1(this is true by default)—so an empty field forDateTime64becomes its default (typically the “zero” datetime) rather thanNULL. [2] - JSONEachRow input: empty strings are only replaced with defaults if you enable
input_format_json_empty_as_default=1. Otherwise, the empty string is just a string and parsing can fail. [3] - Actual
NULLinto a non-nullable column throws an error unless you enableinput_format_null_as_default=1, in which caseNULLis replaced by the column’s default. [1][6]
How to handle “empty string means missing” safely
Option A (recommended): make the column nullable
ts Nullable(DateTime64(3))Then send NULL when you mean missing.
Option B: convert empty string to NULL, then choose a fallback
INSERT INTO t (ts)
VALUES (coalesce(toDateTime64OrNull(nullif({p:String}, ''), 3), toDateTime64('1970-01-01 00:00:00', 3)));toDateTime64OrNull returns NULL on invalid/empty input. [4]
Option C: convert empty/invalid directly to the default
INSERT INTO t (ts)
VALUES (toDateTime64OrDefault({p:String}, 3));toDateTime64OrDefault returns the provided default, or 1970-01-01 00:00:00.000 if you don’t provide one. [5]
Sources: [1] Format setting input_format_null_as_default [2] CSV input_format_csv_empty_as_default [3] JSON input_format_json_empty_as_default [4] toDateTime64OrNull [5] toDateTime64OrDefault [6] non-nullable NULL insert error
Set a deterministic timestamp for PERIOD_INF instead of relying on implicit ClickHouse coercion.
The time column is non-nullable DateTime64(3) and used as a partition key. When PERIOD_INF is used, $timestamp is set to null, which becomes an empty string when sent as a parameter. ClickHouse then silently coerces the empty string to its default datetime (1970-01-01 00:00:00), making the behavior implicit and fragile.
Use an explicit sentinel timestamp (e.g., '1970-01-01 00:00:00.000') or reconsider the column design:
Proposed fix (sentinel timestamp)
- $time = $period === Usage::PERIOD_INF
- ? null
- : $now->format(Usage::PERIODS[$period]);
- $timestamp = $time !== null ? $this->formatDateTime($time) : null;
+ $time = $period === Usage::PERIOD_INF
+ ? '1970-01-01 00:00:00.000'
+ : $now->format(Usage::PERIODS[$period]);
+ $timestamp = $this->formatDateTime($time);🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 683 - 688, When PERIOD_INF is
used the code sets $time to null which yields an empty string param and lets
ClickHouse coerce to 1970 implicitly; instead set a deterministic sentinel
datetime for $timestamp when Usage::PERIOD_INF is selected. Update the block
around $now/$time/$timestamp (used with Usage::PERIODS and formatDateTime) to
assign a concrete sentinel string (e.g. '1970-01-01 00:00:00.000') and pass that
through $this->formatDateTime so $timestamp is never null, ensuring partition
key behavior is explicit and deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/Database.php`:
- Around line 482-486: The setSharedTables method currently calls itself causing
infinite recursion; change the internal call from
$this->setSharedTables($sharedTables) to call the underlying database adapter
$this->db->setSharedTables($sharedTables) (ensure $this->db is the correct
property and initialized) and then return $this to preserve fluent interface in
the setSharedTables method.
🧹 Nitpick comments (3)
src/Usage/Adapter/Database.php (3)
93-93: Unused$batchSizeparameter.The
$batchSizeparameter is declared but never used. Either implement batching logic to process documents in chunks of$batchSize, or remove the parameter to avoid confusion.Option 1: Implement batching
public function logBatch(array $metrics, int $batchSize = 1000): bool { $this->db->getAuthorization()->skip(function () use ($metrics, $batchSize) { $documentsById = []; // ... existing aggregation logic ... - if (!empty($documents)) { - $this->db->upsertDocumentsWithIncrease($this->collection, 'value', $documents); - } + if (!empty($documents)) { + foreach (array_chunk($documents, $batchSize) as $batch) { + $this->db->upsertDocumentsWithIncrease($this->collection, 'value', $batch); + } + } });
188-188: Unused$batchSizeparameter (same aslogBatch).Apply the same batching fix as suggested for
logBatch(), or remove the parameter.
388-407: Purge implementation is functional but could be optimized.The current approach deletes documents one-by-one in a loop. While this works correctly, it may be slow for large datasets. Consider using a bulk delete operation if available in the Utopia\Database library.
| public function setSharedTables(bool $sharedTables): self | ||
| { | ||
| $this->setSharedTables($sharedTables); | ||
| return $this; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Infinite recursion in setSharedTables().
Line 484 calls $this->setSharedTables($sharedTables) instead of $this->db->setSharedTables($sharedTables), causing a stack overflow at runtime.
Required fix
public function setSharedTables(bool $sharedTables): self
{
- $this->setSharedTables($sharedTables);
+ $this->db->setSharedTables($sharedTables);
return $this;
}🤖 Prompt for AI Agents
In `@src/Usage/Adapter/Database.php` around lines 482 - 486, The setSharedTables
method currently calls itself causing infinite recursion; change the internal
call from $this->setSharedTables($sharedTables) to call the underlying database
adapter $this->db->setSharedTables($sharedTables) (ensure $this->db is the
correct property and initialized) and then return $this to preserve fluent
interface in the setSharedTables method.
… for improved efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 912-915: When Usage::PERIOD_INF is detected the code sets
$timestamp to null which relies on ClickHouse coercion into the non-nullable
DateTime64(3) time column; change the branch that sets $timestamp so it uses an
explicit sentinel timestamp (for example the epoch or another agreed sentinel)
formatted via the existing formatDateTime() helper instead of null so the
$timestamp variable written to the time column is deterministic (refer to
Usage::PERIOD_INF, $timestamp, formatDateTime(), and the time column
DateTime64(3)).
- Line 896: The prepareMetricRow function is declared to return ?array but
always returns an array; update its signature to return array (remove the
nullable) and adjust any PHPDoc or type hints that reference ?array to array;
specifically change the method declaration for prepareMetricRow(array
$metricData): ?array to prepareMetricRow(array $metricData): array and ensure
callers and static analyzers remain consistent with the non-nullable return
type.
🧹 Nitpick comments (2)
src/Usage/Adapter/ClickHouse.php (2)
1243-1256: Simplify value extraction —$singleValueonly assigned conditionally but used unconditionally.
Query::getValues()always returns an array, makingis_array($values)always true. However,$singleValueis only assigned inside theif (is_array($values))block but used unconditionally at line 1248. This creates fragile code structure.♻️ Proposed refactor
case Query::TYPE_LESSER_EQUAL: $this->validateAttributeName($attribute); $escapedAttr = $this->escapeIdentifier($attribute); $paramName = 'param_' . $paramCounter++; + $singleValue = $values[0] ?? null; if ($attribute === 'time') { - if (is_array($values)) { - /** `@var` \DateTime|string|null $singleValue */ - $singleValue = $values[0] ?? null; - } $filters[] = "{$escapedAttr} <= {{$paramName}:DateTime64(3)}"; $params[$paramName] = $this->formatDateTime($singleValue); } else { - if (is_array($values)) { - /** `@var` bool|float|int|string $singleValue */ - $singleValue = $values[0] ?? null; - } $filters[] = "{$escapedAttr} <= {{$paramName}:String}"; $params[$paramName] = $this->formatParamValue($singleValue); } break;
1264-1277: Same refactor needed forTYPE_GREATER_EQUAL.Same issue as
TYPE_LESSER_EQUAL—$singleValueis conditionally assigned but unconditionally used.♻️ Proposed refactor
case Query::TYPE_GREATER_EQUAL: $this->validateAttributeName($attribute); $escapedAttr = $this->escapeIdentifier($attribute); $paramName = 'param_' . $paramCounter++; + $singleValue = $values[0] ?? null; if ($attribute === 'time') { - if (is_array($values)) { - /** `@var` \DateTime|string|null $singleValue */ - $singleValue = $values[0] ?? null; - } $filters[] = "{$escapedAttr} >= {{$paramName}:DateTime64(3)}"; $params[$paramName] = $this->formatDateTime($singleValue); } else { - if (is_array($values)) { - /** `@var` bool|float|int|string $singleValue */ - $singleValue = $values[0] ?? null; - } $filters[] = "{$escapedAttr} >= {{$paramName}:String}"; $params[$paramName] = $this->formatParamValue($singleValue); } break;
| $time = $period === Usage::PERIOD_INF | ||
| ? null | ||
| : $now->format(Usage::PERIODS[$period]); | ||
| $timestamp = $time !== null ? $this->formatDateTime($time) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null timestamp for PERIOD_INF relies on implicit ClickHouse coercion.
When period === Usage::PERIOD_INF, $timestamp becomes null (line 915), which is then assigned to the time column (line 929). Since time is a non-nullable DateTime64(3) column defined at line 489, ClickHouse will silently coerce the null/empty value to 1970-01-01 00:00:00, making the behavior implicit and fragile.
Consider using an explicit sentinel timestamp for deterministic behavior.
🐛 Proposed fix (explicit sentinel)
// Period-aligned time
$now = new \DateTime();
$time = $period === Usage::PERIOD_INF
- ? null
+ ? '1970-01-01 00:00:00.000'
: $now->format(Usage::PERIODS[$period]);
- $timestamp = $time !== null ? $this->formatDateTime($time) : null;
+ $timestamp = $this->formatDateTime($time);Also applies to: 929-929
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 912 - 915, When
Usage::PERIOD_INF is detected the code sets $timestamp to null which relies on
ClickHouse coercion into the non-nullable DateTime64(3) time column; change the
branch that sets $timestamp so it uses an explicit sentinel timestamp (for
example the epoch or another agreed sentinel) formatted via the existing
formatDateTime() helper instead of null so the $timestamp variable written to
the time column is deterministic (refer to Usage::PERIOD_INF, $timestamp,
formatDateTime(), and the time column DateTime64(3)).
…s, update parsing logic, and add comprehensive tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1078-1085: In the count() method after $result =
$this->query($sql, $params) and $json = json_decode($result, true), guard the
decoded value before indexing: verify that $json is an array, that
isset($json['data']) and is_array($json['data']), that isset($json['data'][0])
and is_array($json['data'][0]), and that isset($json['data'][0]['total']) before
casting and returning it; if any check fails return 0. Use the existing symbols
query(), $result, $json and the count() method to locate where to add these
checks.
- Around line 1559-1566: The JSON decode in sumByPeriod() is used without
guarding against failures or non-array shapes; update the code around $result =
$this->query($sql, $parsed['params']); and $json = json_decode($result, true) to
validate that $json is an array, contains a 'data' key that is an array, that
index 0 exists, and that 'grand_total' exists before indexing; if any check
fails return 0 (optionally also handle json_decode errors via json_last_error()
for clarity). Ensure you reference the existing query() call and the $json
variable so the checks are added right before accessing
$json['data'][0]['grand_total'].
- Around line 1330-1372: In parseResults(), guard against json_decode and mixed
types: verify $json is an array and that isset($json['data']) yields an array
before iterating (e.g., ensure $rows is an array), handle json_decode($result,
true) possibly returning null; for each $row ensure it is an array before
foreach; when handling 'tenant' cast only if is_numeric($value) or not null;
when handling 'time' ensure $value is a string before calling strpos/str_replace
and otherwise leave as null or format-safe fallback; when handling 'tags' check
is_string($value) and that json_decode($value, true) returns an array (use
is_array() fallback to empty array) and otherwise preserve arrays/objects; and
generally use is_array/is_string checks to satisfy PHPStan for mixed-offset and
type errors.
In `@tests/Usage/Adapter/ClickHouseTest.php`:
- Around line 319-326: Remove the unused local variables $now and $oldTime from
the test — delete the lines that assign $now = DateTime::now(); and $oldTime =
(new \DateTime())->modify('-2 hours'); and update or retain only the explanatory
comments around the calls to $this->usage->log('metric-A', ...),
$this->usage->log('metric-B', ...), and the comment about relying on creation
times; ensure no other code references $now or $oldTime so PHPMD warnings go
away.
| $now = DateTime::now(); | ||
| // metric A: value 10, time NOW | ||
| $this->usage->log('metric-A', 10, '1h', ['category' => 'cat1']); | ||
| // metric B: value 20, time NOW | ||
| $this->usage->log('metric-B', 20, '1h', ['category' => 'cat2']); | ||
| // metric C: value 30, time NOW - 2 hours | ||
| $oldTime = (new \DateTime())->modify('-2 hours'); | ||
| // We can't easily force time in log(), so we just rely on metrics created now being "newer" than this timestamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused locals flagged by PHPMD.
$now and $oldTime are unused; drop them to keep the test clean.
🛠️ Proposed fix
- $now = DateTime::now();
- // metric A: value 10, time NOW
+ // metric A: value 10, time NOW
$this->usage->log('metric-A', 10, '1h', ['category' => 'cat1']);
// metric B: value 20, time NOW
$this->usage->log('metric-B', 20, '1h', ['category' => 'cat2']);
- // metric C: value 30, time NOW - 2 hours
- $oldTime = (new \DateTime())->modify('-2 hours');
- // We can't easily force time in log(), so we just rely on metrics created now being "newer" than this timestamp
+ // We can't easily force time in log(), so we just rely on metrics created now being "newer" than this timestamp🧰 Tools
🪛 PHPMD (2.15.0)
[warning] 319-319: Avoid unused local variables such as '$now'. (undefined)
(UnusedLocalVariable)
[warning] 325-325: Avoid unused local variables such as '$oldTime'. (undefined)
(UnusedLocalVariable)
🤖 Prompt for AI Agents
In `@tests/Usage/Adapter/ClickHouseTest.php` around lines 319 - 326, Remove the
unused local variables $now and $oldTime from the test — delete the lines that
assign $now = DateTime::now(); and $oldTime = (new \DateTime())->modify('-2
hours'); and update or retain only the explanatory comments around the calls to
$this->usage->log('metric-A', ...), $this->usage->log('metric-B', ...), and the
comment about relying on creation times; ensure no other code references $now or
$oldTime so PHPMD warnings go away.
…nd OFFSET application; update test documentation
…se method for improved readability and maintainability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1264-1278: The branch in ClickHouse.php uses $singleValue after
only assigning it inside the inner is_array($values) check, which can leave it
undefined; update the TYPE_LESS_EQUAL handling (and mirror the same change for
TYPE_GREATER_EQUAL) to initialize $singleValue = null before the is_array check
so both the time branch (used by formatDateTime) and the non-time branch (used
by formatParamValue) always have a defined variable; locate the blocks
referencing $singleValue, $paramName, $escapedAttr, and $values and ensure
$singleValue is set to null (or a sensible default) prior to the conditional
assignment.
🧹 Nitpick comments (1)
src/Usage/Adapter/ClickHouse.php (1)
388-389: Optional: Consider URL-encoding the query string for defensive consistency.While backticks are valid in URL query strings and ClickHouse correctly handles them, the query string could be URL-encoded for defensive consistency with standard URL practices. However, the current implementation works correctly since table names are internally controlled and alphanumeric. If you want to add URL encoding for robustness:
- $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow"; + $query = "INSERT INTO {$escapedTable} FORMAT JSONEachRow"; + $url = "{$scheme}://{$this->host}:{$this->port}/?query=" . urlencode($query);This is not required for correctness but improves alignment with standard URL encoding practices.
| if ($attribute === 'time') { | ||
| if (is_array($values)) { | ||
| /** @var \DateTime|string|null $singleValue */ | ||
| $singleValue = $values[0] ?? null; | ||
| } | ||
| $filters[] = "{$escapedAttr} <= {{$paramName}:DateTime64(3)}"; | ||
| $params[$paramName] = $this->formatDateTime($singleValue); | ||
| } else { | ||
| if (is_array($values)) { | ||
| /** @var bool|float|int|string $singleValue */ | ||
| $singleValue = $values[0] ?? null; | ||
| } | ||
| $filters[] = "{$escapedAttr} <= {{$paramName}:String}"; | ||
| $params[$paramName] = $this->formatParamValue($singleValue); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially undefined variable $singleValue.
$singleValue is assigned only inside the if (is_array($values)) block but used unconditionally afterwards. While Query::getValues() always returns an array, this code structure is fragile:
if ($attribute === 'time') {
if (is_array($values)) {
$singleValue = $values[0] ?? null; // Only assigned here
}
$filters[] = "...";
$params[$paramName] = $this->formatDateTime($singleValue); // Used unconditionallyProposed fix
case Query::TYPE_LESSER_EQUAL:
$this->validateAttributeName($attribute);
$escapedAttr = $this->escapeIdentifier($attribute);
$paramName = 'param_' . $paramCounter++;
+ $singleValue = $values[0] ?? null;
if ($attribute === 'time') {
- if (is_array($values)) {
- /** `@var` \DateTime|string|null $singleValue */
- $singleValue = $values[0] ?? null;
- }
$filters[] = "{$escapedAttr} <= {{$paramName}:DateTime64(3)}";
$params[$paramName] = $this->formatDateTime($singleValue);
} else {
- if (is_array($values)) {
- /** `@var` bool|float|int|string $singleValue */
- $singleValue = $values[0] ?? null;
- }
$filters[] = "{$escapedAttr} <= {{$paramName}:String}";
$params[$paramName] = $this->formatParamValue($singleValue);
}
break;Apply the same pattern to TYPE_GREATER_EQUAL (lines 1281-1300).
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 1264 - 1278, The branch in
ClickHouse.php uses $singleValue after only assigning it inside the inner
is_array($values) check, which can leave it undefined; update the
TYPE_LESS_EQUAL handling (and mirror the same change for TYPE_GREATER_EQUAL) to
initialize $singleValue = null before the is_array check so both the time branch
(used by formatDateTime) and the non-time branch (used by formatParamValue)
always have a defined variable; locate the blocks referencing $singleValue,
$paramName, $escapedAttr, and $values and ensure $singleValue is set to null (or
a sensible default) prior to the conditional assignment.
…od for cleaner code
…atus and server info
…TP request timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 970-1017: In validateMetricsBatch(), the tenant validation reads
the wrong array key ('$tenant') causing tenant checks to be skipped or wrong;
change the lookup to use the correct key 'tenant' (e.g., $tenantValue =
$metricData['tenant']), keep the surrounding existence check
(array_key_exists('tenant', $metricData)), and retain the existing validation
logic (allow non-negative integers or numeric strings, reject negatives and
other types) so that metric-level tenant overrides are validated correctly.
🧹 Nitpick comments (1)
src/Usage/Adapter/Database.php (1)
133-180:$batchSizeparameter is unused.The
logBatchmethod accepts$batchSizebut processes all metrics in a singleupsertDocumentsWithIncreasecall. Either implement batching or document that the Database adapter handles all documents at once regardless of batch size.Option 1: Document the behavior
/** * Log multiple metrics in batch * * `@param` array<array{metric: string, value: int, period?: string, tags?: array<string,mixed>}> $metrics - * `@param` int $batchSize Maximum number of metrics per INSERT statement + * `@param` int $batchSize Unused - Database adapter processes all metrics at once via upsert */ + #[\SuppressWarnings(PHPMD.UnusedFormalParameter)] public function logBatch(array $metrics, int $batchSize = 1000): bool
| private function validateMetricsBatch(array $metrics): void | ||
| { | ||
| foreach ($metrics as $index => $metricData) { | ||
| // Validate required fields exist | ||
| if (!isset($metricData['metric'])) { | ||
| throw new Exception("Metric #{$index}: 'metric' is required"); | ||
| } | ||
| if (!isset($metricData['value'])) { | ||
| throw new Exception("Metric #{$index}: 'value' is required"); | ||
| } | ||
|
|
||
| $metric = $metricData['metric']; | ||
| $value = $metricData['value']; | ||
| $period = $metricData['period'] ?? Usage::PERIOD_1H; | ||
|
|
||
| // Validate types | ||
| if (!is_string($metric)) { | ||
| throw new Exception("Metric #{$index}: 'metric' must be a string, got " . gettype($metric)); | ||
| } | ||
| if (!is_int($value)) { | ||
| throw new Exception("Metric #{$index}: 'value' must be an integer, got " . gettype($value)); | ||
| } | ||
| if (!is_string($period)) { | ||
| throw new Exception("Metric #{$index}: 'period' must be a string, got " . gettype($period)); | ||
| } | ||
|
|
||
| /** @var array<string, mixed> */ | ||
| $tags = $metricData['tags'] ?? []; | ||
| $this->validateMetricData($metric, $value, $period, $tags, $index); | ||
|
|
||
| // Validate tenant when provided (metric-level tenant overrides adapter tenant) | ||
| if (array_key_exists('tenant', $metricData)) { | ||
| $tenantValue = $metricData['$tenant']; | ||
|
|
||
| if ($tenantValue !== null) { | ||
| if (is_int($tenantValue)) { | ||
| if ($tenantValue < 0) { | ||
| throw new Exception("Metric #{$index}: 'tenant' cannot be negative"); | ||
| } | ||
| } elseif (is_string($tenantValue) && ctype_digit($tenantValue)) { | ||
| // ok numeric string | ||
| } else { | ||
| throw new Exception("Metric #{$index}: 'tenant' must be a non-negative integer, got " . gettype($tenantValue)); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thorough batch validation but tenant key mismatch needs fixing.
The validateMetricsBatch() method performs comprehensive validation of required fields, types, and constraints. However, the tenant key mismatch issue at lines 1001-1002 (noted above) affects this validation block.
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 970 - 1017, In
validateMetricsBatch(), the tenant validation reads the wrong array key
('$tenant') causing tenant checks to be skipped or wrong; change the lookup to
use the correct key 'tenant' (e.g., $tenantValue = $metricData['tenant']), keep
the surrounding existence check (array_key_exists('tenant', $metricData)), and
retain the existing validation logic (allow non-negative integers or numeric
strings, reject negatives and other types) so that metric-level tenant overrides
are validated correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1026-1041: The validation is reading the wrong array key: when
checking array_key_exists('tenant', $metricData) the code then uses
$metricData['$tenant'] (with a stray dollar sign) which yields null and skips
validation; update the access to use $metricData['tenant'] and keep the existing
null/int/string checks (including the ctype_digit branch) so that the tenant
validation in this block (referencing $metricData and $index) correctly enforces
non-negative integer values and throws the same Exception messages on invalid
input.
| // Validate tenant when provided (metric-level tenant overrides adapter tenant) | ||
| if (array_key_exists('tenant', $metricData)) { | ||
| $tenantValue = $metricData['$tenant']; | ||
|
|
||
| if ($tenantValue !== null) { | ||
| if (is_int($tenantValue)) { | ||
| if ($tenantValue < 0) { | ||
| throw new Exception("Metric #{$index}: 'tenant' cannot be negative"); | ||
| } | ||
| } elseif (is_string($tenantValue) && ctype_digit($tenantValue)) { | ||
| // ok numeric string | ||
| } else { | ||
| throw new Exception("Metric #{$index}: 'tenant' must be a non-negative integer, got " . gettype($tenantValue)); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tenant key mismatch causes validation to read wrong array key.
Line 1027 checks array_key_exists('tenant', $metricData) but line 1028 reads $metricData['$tenant'] (with dollar sign prefix). This means validation passes when the tenant key exists, but then reads from the non-existent $tenant key, returning null and bypassing all validation logic.
🐛 Proposed fix
// Validate tenant when provided (metric-level tenant overrides adapter tenant)
- if (array_key_exists('tenant', $metricData)) {
- $tenantValue = $metricData['$tenant'];
+ if (array_key_exists('$tenant', $metricData)) {
+ $tenantValue = $metricData['$tenant'];
if ($tenantValue !== null) {Or alternatively, if the key should be tenant without dollar sign:
if (array_key_exists('tenant', $metricData)) {
- $tenantValue = $metricData['$tenant'];
+ $tenantValue = $metricData['tenant'];🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 1026 - 1041, The validation is
reading the wrong array key: when checking array_key_exists('tenant',
$metricData) the code then uses $metricData['$tenant'] (with a stray dollar
sign) which yields null and skips validation; update the access to use
$metricData['tenant'] and keep the existing null/int/string checks (including
the ctype_digit branch) so that the tenant validation in this block (referencing
$metricData and $index) correctly enforces non-negative integer values and
throws the same Exception messages on invalid input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 782-883: The INSERT should enable ClickHouse deduplication instead
of disabling retries: modify the SQL in the ClickHouse Adapter where $sql and
$url are built (variables $sql = "INSERT INTO {$escapedTable} FORMAT
JSONEachRow" and $url =
".../?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow") to include a
SETTINGS clause (e.g. " SETTINGS non_replicated_deduplication_window=1") and
ensure the URL encodes that same SQL (or append the equivalent settings query
parameter) so retries remain enabled but server-side deduplication prevents
duplicate aggregated rows; update any logging (buildErrorMessage/logQuery) that
references $sql if needed to reflect the added SETTINGS.
src/Usage/Adapter/ClickHouse.php
Outdated
| $attempt = 0; | ||
| $lastException = null; | ||
|
|
||
| while ($attempt <= $this->maxRetries) { | ||
| $startTime = microtime(true); | ||
| $scheme = $this->secure ? 'https' : 'http'; | ||
| $escapedTable = $this->escapeIdentifier($table); | ||
| $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow"; | ||
|
|
||
| // Update the database header | ||
| $this->client->addHeader('X-ClickHouse-Database', $this->database); | ||
| $this->client->addHeader('Content-Type', 'application/x-ndjson'); | ||
|
|
||
| // Enable keep-alive for connection pooling | ||
| if ($this->enableKeepAlive) { | ||
| $this->client->addHeader('Connection', 'keep-alive'); | ||
| } else { | ||
| $this->client->addHeader('Connection', 'close'); | ||
| } | ||
|
|
||
| // Enable compression if configured | ||
| if ($this->enableCompression) { | ||
| $this->client->addHeader('Accept-Encoding', 'gzip'); | ||
| } | ||
|
|
||
| // Track request count for statistics (only on first attempt) | ||
| if ($attempt === 0) { | ||
| $this->requestCount++; | ||
| } | ||
|
|
||
| // Join JSON strings with newlines | ||
| $body = implode("\n", $data); | ||
|
|
||
| $sql = "INSERT INTO {$escapedTable} FORMAT JSONEachRow"; | ||
| $params = ['rows' => count($data), 'bytes' => strlen($body)]; | ||
|
|
||
| try { | ||
| $response = $this->client->fetch( | ||
| url: $url, | ||
| method: Client::METHOD_POST, | ||
| body: $body | ||
| ); | ||
|
|
||
| $httpCode = $response->getStatusCode(); | ||
|
|
||
| if ($httpCode !== 200) { | ||
| $bodyStr = $response->getBody(); | ||
| $bodyStr = is_string($bodyStr) ? $bodyStr : ''; | ||
| $duration = microtime(true) - $startTime; | ||
| $rowCount = count($data); | ||
| $baseError = "ClickHouse insert failed with HTTP {$httpCode}: {$bodyStr}"; | ||
| $errorMsg = $this->buildErrorMessage($baseError, $table, "INSERT INTO {$table} ({$rowCount} rows)"); | ||
| $this->logQuery($sql, $params, $duration, false, $errorMsg, $attempt); | ||
|
|
||
| // Clean up Content-Type before retry | ||
| $this->client->removeHeader('Content-Type'); | ||
|
|
||
| // Check if error is retryable | ||
| if ($attempt < $this->maxRetries && $this->isRetryableError($httpCode, $baseError)) { | ||
| $attempt++; | ||
| $delay = $this->retryDelay * (2 ** ($attempt - 1)); // Exponential backoff | ||
| usleep($delay * 1000); // Convert ms to microseconds | ||
| continue; | ||
| } | ||
|
|
||
| throw new Exception($errorMsg); | ||
| } | ||
|
|
||
| $duration = microtime(true) - $startTime; | ||
| $this->logQuery($sql, $params, $duration, true, null, $attempt); | ||
|
|
||
| // Clean up Content-Type after successful insert | ||
| $this->client->removeHeader('Content-Type'); | ||
| return; | ||
| } catch (Exception $e) { | ||
| $duration = microtime(true) - $startTime; | ||
| $this->logQuery($sql, $params, $duration, false, $e->getMessage(), $attempt); | ||
| $lastException = $e; | ||
|
|
||
| // Clean up Content-Type before retry | ||
| $this->client->removeHeader('Content-Type'); | ||
|
|
||
| // Check if error is retryable | ||
| if ($attempt < $this->maxRetries && $this->isRetryableError(null, $e->getMessage())) { | ||
| $attempt++; | ||
| $delay = $this->retryDelay * (2 ** ($attempt - 1)); // Exponential backoff | ||
| usleep($delay * 1000); // Convert ms to microseconds | ||
| continue; | ||
| } | ||
|
|
||
| $rowCount = count($data); | ||
| $baseError = "ClickHouse insert execution failed after " . ($attempt + 1) . " attempt(s): {$e->getMessage()}"; | ||
| $errorMsg = $this->buildErrorMessage($baseError, $table, "INSERT INTO {$table} ({$rowCount} rows)"); | ||
| throw new Exception($errorMsg, 0, $e); | ||
| } | ||
| } | ||
|
|
||
| // Should never reach here, but just in case | ||
| $rowCount = count($data); | ||
| $baseError = "ClickHouse insert execution failed after " . ($this->maxRetries + 1) . " attempt(s)"; | ||
| $errorMsg = $this->buildErrorMessage($baseError, $table, "INSERT INTO {$table} ({$rowCount} rows)"); | ||
| throw new Exception($errorMsg, 0, $lastException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's see the full context of the method and class to understand the retry logic better
head -100 src/Usage/Adapter/ClickHouse.php | tail -50Repository: utopia-php/usage
Length of output: 1928
🏁 Script executed:
# Let's look at the method around lines 782-883 and understand the context
sed -n '700,900p' src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 8777
🏁 Script executed:
# Check the isRetryableError method to understand what errors trigger retries
rg -A 15 "function isRetryableError" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 699
🏁 Script executed:
# Look for deduplication-related configuration or methods
rg -i "dedup|query_id|insert_dedup" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Check what maxRetries is set to
rg "maxRetries\s*=" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 133
🌐 Web query:
ClickHouse INSERT idempotency query_id deduplication retry safety
💡 Result:
What ClickHouse actually deduplicates (and what it doesn’t)
- ClickHouse “retry safety” for
INSERTis block-based deduplication for*MergeTreetables: each inserted block gets ablock_id(hash of the block’s data). If the sameblock_idis seen again within the table’s dedup log window, the duplicate block is skipped. This is what makes retries effectively idempotent for ambiguous failures like timeouts/network drops. [1] - This is not primarily driven by
query_id.query_idis useful for tracing/observability, but the documented retry deduplication mechanism isblock_id+ dedup logs (and optionally a token). [1][2]
How to make INSERT retries safe (recommended approach)
- Use a
*MergeTreetable (dedup-on-insert is only supported there). [1] - Keep retries byte-for-byte and setting-for-setting identical:
- For
INSERT ... VALUES, block splitting is deterministic given the same settings, so retries should reuse the same settings. [1] - For
INSERT ... SELECT, dedup only works if theSELECTreturns the same rows in the same order; the docs recommend usingORDER BY ALLfor stable ordering. [1]
- For
- Ensure deduplication is enabled:
- For
Replicated*MergeTree, it’s enabled by default and governed byreplicated_deduplication_window/_seconds. [1][2] - For non-replicated
MergeTree, usenon_replicated_deduplication_window(and make sureinsert_deduplicate=1is not disabled for your inserts). [1][2]
- For
When you should use a deduplication token
- If you may legitimately insert the “same” block content multiple times (or you want an explicit idempotency key), set
insert_deduplication_tokento a stable value for the logical operation you’re retrying; ClickHouse will use it to control dedup behavior (works at partition level). [1][2]
Important limitations (so you don’t assume “exactly-once”)
- Deduplication only works within a finite deduplication window; if too many other inserts happen before your retry, duplicates can slip through. [1]
- With materialized views, ambiguous failures can still leave you unsure which tables received data if you stop retrying before success. [1]
Asynchronous inserts caveat
- For
async_insert, deduplication is not automatically on; enable it explicitly (e.g.,async_insert_deduplicate=1) and preferwait_for_async_insert=1for production reliability. [3][4]
Sources: ClickHouse docs on deduplicating inserts/retries and settings, plus insert-strategy guidance. [1][2][3][4]
🏁 Script executed:
# Search for table creation or engine configuration
rg -i "summing\|mergetree\|engine" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Check if there's any ClickHouse settings configuration
rg -i "insert_deduplicate|insert_deduplication_token|settings" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 199
🏁 Script executed:
# Look for table creation methods or schema setup
rg -A 10 "CREATE TABLE" src/Repository: utopia-php/usage
Length of output: 1533
🏁 Script executed:
# Check if there are any test files that show how the insert is used
fd -e php -path "*/test*" -o -path "*/spec*" | head -20Repository: utopia-php/usage
Length of output: 228
🏁 Script executed:
# Check the larger context of the adapter class to understand its purpose
rg -B 5 "class ClickHouse" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 305
🏁 Script executed:
# Check if insert_deduplication_token is used anywhere
rg "insert_deduplication_token|deduplicate" src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# Check the full insert method signature and if there are any headers/settings related to dedup
sed -n '775,895p' src/Usage/Adapter/ClickHouse.php | head -150Repository: utopia-php/usage
Length of output: 5209
🏁 Script executed:
# Check if there are any tests that show the expected behavior for retries
find . -name "*.php" -type f | xargs grep -l "ClickHouse" | grep -i test | head -5Repository: utopia-php/usage
Length of output: 100
🌐 Web query:
ClickHouse block_id deduplication SummingMergeTree default behavior insert
💡 Result:
-
SummingMergeTree uses the standard MergeTree insert-dedup mechanism: incoming data is split into blocks, each block gets a
block_id= hash of the inserted block’s bytes (data + schema), and ClickHouse can skip re-inserting a block if itsblock_idis already in the table’s deduplication log. [1] -
Default behavior depends on whether the table is replicated:
- Replicated SummingMergeTree (
ReplicatedSummingMergeTree): deduplication on INSERT retries is enabled by default viainsert_deduplicate=1, with a default deduplication log window of 100 recent blocks per partition (and also time-limited byreplicated_deduplication_window_seconds, default 1 week). [2][3] - Non-replicated SummingMergeTree (
SummingMergeTree): deduplication is OFF by default becausenon_replicated_deduplication_windowdefaults to 0 (disabled). You must set it > 0 to enable deduplication. [3]
- Replicated SummingMergeTree (
-
Practical implication: if you run the same INSERT again (same rows, same order, and effectively the same block splitting), ClickHouse may drop it as a duplicate only within the deduplication window. If you intentionally want to insert identical data multiple times, use
insert_deduplication_token(vary it per insert) to avoid “same block_id” deduplication. [1][2]
Sources:
[1] ClickHouse “Deduplicating Inserts on Retries” (block_id definition and behavior) (clickhouse.com)
[2] ClickHouse settings: insert_deduplicate, insert_deduplication_token defaults/notes (clickhouse.com)
[3] ClickHouse MergeTree settings: replicated_deduplication_window default 100; non_replicated_deduplication_window default 0 (docs-content.clickhouse.tech)
Citations:
- 1: https://clickhouse.com/docs/en/guides/developer/deduplicating-inserts-on-retries?utm_source=openai
- 2: https://clickhouse.com/docs/operations/settings/settings?utm_source=openai
- 3: https://docs-content.clickhouse.tech/docs/en/operations/settings/merge-tree-settings?utm_source=openai
🏁 Script executed:
# Check if there's any configuration for non_replicated_deduplication_window
rg "non_replicated_deduplication_window|replicated_deduplication" src/Repository: utopia-php/usage
Length of output: 42
🏁 Script executed:
# See if there's a README or documentation about the ClickHouse adapter
cat README.md | head -100Repository: utopia-php/usage
Length of output: 3114
🏁 Script executed:
# Check the test file to understand expected behavior
cat tests/Usage/Adapter/ClickHouseTest.php | head -200Repository: utopia-php/usage
Length of output: 6898
🏁 Script executed:
# Check if there are any settings passed with INSERT queries
rg -B 5 -A 5 "INSERT" src/Usage/Adapter/ClickHouse.php | head -100Repository: utopia-php/usage
Length of output: 3939
🏁 Script executed:
# Check the exact URL being constructed for INSERT to see if any dedup settings are passed
sed -n '790,810p' src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 867
🏁 Script executed:
# Look at the exact INSERT URL construction
sed -n '782,815p' src/Usage/Adapter/ClickHouse.phpRepository: utopia-php/usage
Length of output: 1404
Enable ClickHouse-level deduplication for INSERT retries instead of disabling them.
INSERT retries with SummingMergeTree can produce duplicate rows if the server acknowledges the insert but the response is lost. This inflates aggregated totals. ClickHouse provides built-in deduplication via block_id (hash of inserted data), but for non-replicated MergeTree tables this is disabled by default. Set non_replicated_deduplication_window > 0 in the INSERT query settings to enable it.
Recommended fix:
Add deduplication settings to the INSERT URL:
- $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow";
+ $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow&settings[non_replicated_deduplication_window]=100";This preserves retry resilience while preventing duplicate aggregation. Disabling retries entirely sacrifices availability.
🤖 Prompt for AI Agents
In `@src/Usage/Adapter/ClickHouse.php` around lines 782 - 883, The INSERT should
enable ClickHouse deduplication instead of disabling retries: modify the SQL in
the ClickHouse Adapter where $sql and $url are built (variables $sql = "INSERT
INTO {$escapedTable} FORMAT JSONEachRow" and $url =
".../?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow") to include a
SETTINGS clause (e.g. " SETTINGS non_replicated_deduplication_window=1") and
ensure the URL encodes that same SQL (or append the equivalent settings query
parameter) so retries remain enabled but server-side deduplication prevents
duplicate aggregated rows; update any logging (buildErrorMessage/logQuery) that
references $sql if needed to reflect the added SETTINGS.
…ickHouse operations
Usage
Automatically construct ID using md5 hash
Automatically aggregate metric in DB library using upsertDocumentWithIncrease and in ClickHouse using
summingMergeTreeengine and final select to make sure the aggregation happensDatabase adapter (utopia-php/database)
ClickHouse adapter
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.