
Add compression, timeout, and health check support to ClickHouse adapter #110

Open
lohanidamodar wants to merge 5 commits into main from claude/clickhouse-compression-support-LZhdW

Conversation

lohanidamodar (Contributor) commented Feb 2, 2026

Summary

Enhanced the ClickHouse audit adapter with HTTP compression support, configurable connection timeouts, and server health checking capabilities. These features improve performance, reduce bandwidth usage, and provide better connection reliability monitoring.

Key Changes

Compression Support

  • Added support for three compression types: none, gzip, and lz4
  • Compression can be configured via constructor parameter or setCompression() method
  • Automatically compresses request bodies for INSERT operations when enabled
  • Handles response decompression via Accept-Encoding headers
  • Gracefully falls back to uncompressed data if the required compression extension is unavailable (see the sketch after this list)
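
A minimal sketch of the request-body compression path described above, assuming a hypothetical compressRequestBody() helper; the adapter's actual method and naming may differ:

// Hypothetical helper (not the adapter's actual code): compress an INSERT body
// according to the configured compression type, falling back to the raw body
// when the required extension is unavailable.
private function compressRequestBody(string $body, string $compression): string
{
    if ($compression === self::COMPRESSION_GZIP) {
        $compressed = gzencode($body);       // gzip is built into PHP via zlib
        return $compressed !== false ? $compressed : $body;
    }

    if ($compression === self::COMPRESSION_LZ4 && function_exists('lz4_compress')) {
        $compressed = lz4_compress($body);   // requires the optional lz4 extension
        return $compressed !== false ? $compressed : $body;
    }

    // COMPRESSION_NONE, unknown type, or missing extension: send uncompressed
    return $body;
}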

Connection Timeout Management

  • Introduced configurable timeout with sensible defaults (30 seconds)
  • Enforced timeout range: 1-600 seconds (1,000-600,000 milliseconds); see the validation sketch after this list
  • Timeout can be set via constructor or setTimeout() method
  • Timeout applied to HTTP client for all requests
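
A sketch of the timeout validation implied by the range above (hypothetical helper name; the adapter's actual validation may differ):

// Hypothetical validation mirroring the documented 1-600 second range.
private function validateTimeout(int $seconds): void
{
    if ($seconds < 1 || $seconds > 600) {
        throw new Exception("Invalid timeout {$seconds}s. Timeout must be between 1 and 600 seconds");
    }
}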

Health Checking & Diagnostics

  • Added a ping() method to verify server connectivity and authentication (usage sketched after this list)
  • Added getServerVersion() method to retrieve ClickHouse version information
  • Both methods gracefully handle connection failures without throwing exceptions
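
A usage sketch for the two health-check methods, assuming $adapter is an already-constructed ClickHouse audit adapter as described in this PR:

if ($adapter->ping()) {
    // Connectivity and authentication are fine; safe to run queries.
    $version = $adapter->getServerVersion(); // exact failure value (null/empty string) is not specified in this PR
    error_log("ClickHouse reachable, server version: {$version}");
} else {
    // ping() reports failure instead of throwing, per the description above.
    error_log('ClickHouse is unreachable or credentials are invalid');
}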

Code Quality Improvements

  • Enhanced documentation with detailed parameter descriptions
  • Added comprehensive validation for compression and timeout parameters
  • Improved error handling with handleQueryError() and parseClickHouseError() methods (error parsing sketched after this list)
  • Better error messages with HTTP status code context (401, 403, 404)
  • Extracted client initialization to initializeClient() for reusability
  • Fixed duplicate variable assignment in parseQueries() method
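
A sketch of the kind of extraction parseClickHouseError() performs, assuming ClickHouse's usual plain-text error format ("Code: NNN. DB::Exception: ..."); the adapter's actual parsing may differ:

// Hypothetical parser: pull the human-readable message out of a ClickHouse error body,
// e.g. "Code: 60. DB::Exception: Table default.audit doesn't exist. (UNKNOWN_TABLE)".
private function parseClickHouseErrorSketch(string $responseBody): string
{
    if (preg_match('/DB::Exception:\s*(.+)$/m', $responseBody, $matches)) {
        return trim($matches[1]);
    }

    // Unexpected format: fall back to a truncated raw body.
    $body = trim($responseBody);
    if ($body === '') {
        return 'Unknown error';
    }
    return strlen($body) > 200 ? substr($body, 0, 200) . '...' : $body;
}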

Testing

  • Added 11 new test cases covering:
    • Compression type validation (valid and invalid); one representative case is sketched after this list
    • Timeout range validation
    • Getter/setter methods for compression and timeout
    • Health check functionality (ping)
    • Server version retrieval
    • End-to-end operations with gzip compression enabled
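
One representative case in the spirit of the tests listed above; a hypothetical PHPUnit sketch that assumes setCompression() rejects unknown types with an exception (the real tests in this PR may be structured differently):

public function testInvalidCompressionTypeIsRejected(): void
{
    // $this->adapter is assumed to be created in setUp() with valid connection settings.
    $this->expectException(Exception::class);
    $this->expectExceptionMessageMatches('/Invalid compression type/');

    $this->adapter->setCompression('brotli'); // not one of: none, gzip, lz4
}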

Implementation Details

  • Compression headers (Accept-Encoding, Content-Encoding) configured in the HTTP client (header selection sketched after this list)
  • gzip compression uses PHP's built-in gzencode() function
  • lz4 compression requires optional lz4 PHP extension
  • Error parsing extracts meaningful messages from ClickHouse's standard error format
  • All new features maintain backward compatibility with existing code
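
A sketch of how the Accept-Encoding value mentioned above might be derived from the configured compression type (hypothetical helper; the adapter's initializeClient() may wire this differently):

// Hypothetical mapping from the compression setting to the Accept-Encoding header value.
// Response decompression (see decompressResponse() in the review below) must match this.
private function acceptEncodingFor(string $compression): ?string
{
    return match ($compression) {
        self::COMPRESSION_GZIP => 'gzip',
        self::COMPRESSION_LZ4  => 'lz4',
        default                => null,  // COMPRESSION_NONE: omit the header entirely
    };
}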

https://claude.ai/code/session_01KCdXs3NCKaAR57A8b3QdVt

Summary by CodeRabbit

Release Notes

  • New Features

    • Added HTTP compression support with GZIP and LZ4 options for optimized data transfer
    • Configurable connection timeout settings for flexible network environments
    • Server health check (ping) functionality to verify connectivity
    • Server version retrieval capability for compatibility checks
    • Enhanced error handling with improved message parsing
  • Tests

    • Comprehensive test coverage added for compression validation, timeout handling, and health check operations

Add compression, timeout, and health check support to ClickHouse adapter

- Add HTTP compression support (gzip, lz4) for reduced bandwidth
- Add configurable connection timeout settings
- Add ping() method for connection health checking
- Add getServerVersion() method to retrieve server version
- Add better error handling with contextual error messages
- Fix duplicate line in parseQueries method
- Add comprehensive tests for new functionality

https://claude.ai/code/session_01KCdXs3NCKaAR57A8b3QdVt
coderabbitai bot (Contributor) commented Feb 2, 2026

Warning

Rate limit exceeded

@lohanidamodar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 33 minutes and 35 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.


Walkthrough

This pull request adds HTTP compression support and configurable timeout functionality to the ClickHouse audit adapter. Changes include introduction of compression type constants (GZIP, LZ4, NONE) and timeout validation bounds, extension of the constructor to accept these parameters, and new public methods for getting/setting compression and timeout values. Additionally, the adapter now includes health check functionality via a ping method, server version retrieval, and enhanced error parsing capabilities. Request body compression and error handling logic have been integrated into the query flow. The test file adds comprehensive coverage for compression validation, timeout bounds checking, health check behavior, and end-to-end operations with compression enabled.
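
A brief configuration sketch using the public surface described in this walkthrough; the setter names come from the PR body and the getters from the test list, while the exact constructor signature is not shown here:

// Reconfigure an existing adapter instance at runtime.
$adapter->setCompression(ClickHouse::COMPRESSION_GZIP); // none | gzip | lz4
$adapter->setTimeout(60);                               // seconds, within the 1-600 range

\assert($adapter->getCompression() === ClickHouse::COMPRESSION_GZIP);
\assert($adapter->getTimeout() === 60);                 // assumes the getter reports seconds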

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main changes: adding compression, timeout, and health check support to the ClickHouse adapter, which aligns with the substantial modifications across both the adapter implementation and test files.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Audit/Adapter/ClickHouse.php (1)

1479-1489: ⚠️ Potential issue | 🔴 Critical

Missing response decompression causes JSON parsing failures.

The pipeline failure "Control character error, possibly incorrectly encoded" indicates the response body is compressed but not decompressed before JSON parsing. When Accept-Encoding: gzip is set, ClickHouse returns compressed responses that must be decompressed before parsing.

🐛 Proposed fix: Add response decompression

Add a helper method and call it before parsing:

/**
 * Decompress response body if needed.
 *
 * @param string $body The potentially compressed body
 * @return string The decompressed body
 */
private function decompressResponse(string $body): string
{
    if ($this->compression === self::COMPRESSION_NONE || empty($body)) {
        return $body;
    }

    if ($this->compression === self::COMPRESSION_GZIP) {
        $decompressed = @gzdecode($body);
        if ($decompressed !== false) {
            return $decompressed;
        }
        // If decompression fails, assume response wasn't compressed
        return $body;
    }

    if ($this->compression === self::COMPRESSION_LZ4 && function_exists('lz4_uncompress')) {
        $decompressed = @lz4_uncompress($body);
        if ($decompressed !== false) {
            return $decompressed;
        }
        return $body;
    }

    return $body;
}

Then in query() method after receiving response (around line 769):

-            return $responseBody;
+            return $this->decompressResponse($responseBody);
🤖 Fix all issues with AI agents
In src/Audit/Adapter/ClickHouse.php:
- Around line 386-410: The ping() and getServerVersion() return compressed
binary when HTTP compression is enabled because query() doesn't decompress
responses; modify the query() method (used by ping() and getServerVersion()) to
detect and handle compressed responses (check response headers like
Content-Encoding or ClickHouse-specific compression flags) and decompress the
body (e.g., using gzdecode/zlib_decode or appropriate decompressor) before
returning a UTF-8 string; ensure query() still throws on transport errors and
returns a string that ping() can trim() and getServerVersion() can parse, with a
safe fallback to original body if decompression fails and appropriate exception
handling/logging.
- Around line 163-169: The Content-Encoding header is being added globally in
the constructor (check the compression check using $this->compression and
self::COMPRESSION_NONE and the calls to $this->client->addHeader), which
incorrectly marks non-INSERT/multipart requests as compressed; change the logic
to only set Content-Encoding per request when you actually send compressed
payloads (e.g. in the query() method where $jsonRows/INSERT requests are
detected), by removing the unconditional addHeader('Content-Encoding', ...) from
initialization and adding it just before dispatching compressed INSERT bodies
(or use a separate request/client instance for compressed INSERTs).
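
A minimal sketch of the per-request approach this comment describes: compress the body and set Content-Encoding only for the individual INSERT request, leaving the shared client untouched (names are illustrative, not the adapter's actual code):

// Hypothetical helper: returns [$body, $extraHeaders] for a single request,
// compressing only INSERT payloads and adding Content-Encoding per request.
private function prepareRequestBody(string $body, bool $isInsert): array
{
    if ($isInsert && $this->compression === self::COMPRESSION_GZIP) {
        $compressed = gzencode($body);
        if ($compressed !== false) {
            return [$compressed, ['Content-Encoding' => 'gzip']];
        }
    }

    // SELECTs, COMPRESSION_NONE, or failed compression: send as-is, no Content-Encoding.
    return [$body, []];
}

The per-request headers are then merged into the single POST instead of being added to the shared client in the constructor.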
🧹 Nitpick comments (2)
src/Audit/Adapter/ClickHouse.php (2)

804-816: LZ4 compression silently falls back without warning.

When LZ4 compression is configured but the lz4 extension is not available, the code silently falls back to uncompressed requests. This could lead to confusion during debugging when users expect compression but don't see reduced bandwidth.

Consider logging a warning or throwing an exception during validateCompression() if LZ4 is selected but the extension is unavailable.

♻️ Option: Validate extension availability during compression validation
     private function validateCompression(string $compression): void
     {
         if (!in_array($compression, self::VALID_COMPRESSION_TYPES, true)) {
             $validTypes = implode(', ', self::VALID_COMPRESSION_TYPES);
             throw new Exception("Invalid compression type '{$compression}'. Valid types are: {$validTypes}");
         }
+
+        if ($compression === self::COMPRESSION_LZ4 && !function_exists('lz4_compress')) {
+            throw new Exception("LZ4 compression requires the 'lz4' PHP extension to be installed");
+        }
     }

829-846: Unused $sql parameter in handleQueryError().

The $sql parameter is declared but never used. Either include it in error messages for debugging context or remove it.

♻️ Option 1: Use the parameter for better error context
     private function handleQueryError(int $statusCode, string $responseBody, string $sql): void
     {
         // Extract meaningful error message from ClickHouse response
         $errorMessage = $this->parseClickHouseError($responseBody);

         $context = '';
         if ($statusCode === 401) {
             $context = ' (authentication failed - check username/password)';
         } elseif ($statusCode === 403) {
             $context = ' (access denied - check permissions)';
         } elseif ($statusCode === 404) {
             $context = ' (database or table not found)';
         }

+        // Truncate SQL for error message to avoid overly long exceptions
+        $sqlPreview = strlen($sql) > 100 ? substr($sql, 0, 100) . '...' : $sql;
+
         throw new Exception(
-            "ClickHouse query failed with HTTP {$statusCode}{$context}: {$errorMessage}"
+            "ClickHouse query failed with HTTP {$statusCode}{$context}: {$errorMessage}. Query: {$sqlPreview}"
         );
     }
♻️ Option 2: Remove unused parameter
-    private function handleQueryError(int $statusCode, string $responseBody, string $sql): void
+    private function handleQueryError(int $statusCode, string $responseBody): void

And update the call site at line 766:

-                $this->handleQueryError($statusCode, $responseBody, $sql);
+                $this->handleQueryError($statusCode, $responseBody);

The previous implementation set the Content-Encoding header globally but only
compressed INSERT request bodies, causing ClickHouse to fail when parsing
uncompressed SELECT queries. This fix:

- Removes request body compression (was causing JSON parse errors)
- Keeps response compression via Accept-Encoding header (the bigger win)
- Updates documentation to clarify compression is for responses only

https://claude.ai/code/session_01KCdXs3NCKaAR57A8b3QdVt
The Fetch Client doesn't automatically decompress gzip responses. Added
decompressResponse() method that:
- Checks Content-Encoding header for gzip/deflate/lz4
- Decompresses the response body accordingly
- Falls back to original body if decompression fails

This allows the compression feature to work properly - ClickHouse sends
compressed responses when Accept-Encoding is set.

https://claude.ai/code/session_01KCdXs3NCKaAR57A8b3QdVt
Based on patterns from utopia-php/usage ClickHouse adapter:

Retry Logic:
- Exponential backoff with configurable max retries (0-10)
- Configurable base delay (10-5000ms)
- Automatic retry on network errors and server overload
- Retryable status codes: 408, 429, 500, 502, 503, 504 (see the sketch after this message)

Health Check:
- healthCheck() returns comprehensive status object
- Includes: version, uptime, response time, configuration
- Returns error message on failure

Query Logging:
- enableQueryLogging() to toggle logging
- getQueryLog() returns SQL, params, duration, success/error
- clearQueryLog() to reset

Statistics:
- getStats() returns query counts, success rate, configuration
- resetStats() to reset counters
- Automatic tracking of all queries

https://claude.ai/code/session_01KCdXs3NCKaAR57A8b3QdVt
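
A minimal sketch of the retry behaviour this commit message describes: exponential backoff over the listed retryable status codes, with configurable retry count and base delay (illustrative helper, not the adapter's actual implementation):

// Hypothetical retry wrapper mirroring the description above.
private const RETRYABLE_STATUS_CODES = [408, 429, 500, 502, 503, 504];

private function sendWithRetries(callable $send, int $maxRetries = 3, int $baseDelayMs = 100): array
{
    $attempt = 0;
    while (true) {
        [$status, $body] = $send(); // $send() returns [statusCode, responseBody]

        if (!in_array($status, self::RETRYABLE_STATUS_CODES, true) || $attempt >= $maxRetries) {
            return [$status, $body];
        }

        // Exponential backoff: baseDelay * 2^attempt (100 ms, 200 ms, 400 ms, ...).
        usleep($baseDelayMs * (2 ** $attempt) * 1000);
        $attempt++;
    }
}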