Signed-off-by: slach <bloodjazman@gmail.com>
…mpose YAMLs, cleanup references

- Rename docker-compose/ -> docker/, keep only scripts (custom_entrypoint.sh, dynamic_settings.sh)
- Delete docker-compose.yml, clickhouse-service.yml, kafka-service.yml, zookeeper-service.yml
- Rename docker_compose_project_dir -> docker_dir, _compose_dir -> _docker_dir in cluster.py
- Remove unused docker_compose/docker_compose_file params from Cluster.__init__
- Add port 7171 conflict detection and logging in _do_down()
- Make --debug flag in run.sh conditional on TESTFLOWS_DEBUG env var
- Update README.md and argparser.py help text

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: slach <bloodjazman@gmail.com>
- Replace fixed 7171:7171 port binding with dynamic host port mapping
- Add Cluster.get_mapped_port() for querying mapped ports at runtime
- api.py uses dynamic backup_api_port from context instead of hardcoded 7171
- Always clean up containers in Cluster.down() (remove local mode skip)
- run.sh: auto-discover suites from regression.py, run in parallel via xargs
- RUN_PARALLEL=1 by default, each suite gets its own Cluster (~11 containers)
- Suite results collected from log files, summary printed at end

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…fixes

- Each regression.py process creates its own configs/backup_<PID>/ dir
- Storage path prefix set to testflows_<PID> for s3/gcs/azblob/ftp/sftp/cos
- Cluster accepts backup_config_dir to mount per-process config into container
- Per-process config dir cleaned up in finally block
- Fixes cloud_storage and api test failures when running with RUN_PARALLEL>1

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…run.sh

- Wire TestContainers into pool factory when USE_TESTCONTAINERS=1
- Add TestMain to clean up containers after test run
- Add cleanupStaleTestContainers() to remove leftover tc_ resources (containers, networks, volumes) from interrupted runs
- Create Docker named volumes before using them in container binds
- Add "azure" network alias for Azurite container (ClickHouse configs reference http://azure:10000)
- Support extra network aliases in startContainer()
- Update run.sh: USE_TESTCONTAINERS=1 is the new default and skips all docker compose up/down logic; legacy compose mode is still available

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: slach <bloodjazman@gmail.com>
Each test now creates its own containers in NewTestEnvironment and destroys them in Cleanup. Concurrency is controlled by go test -parallel. Removes go-commons-pool dependency and simplifies TestMain. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The old run.sh used CLICKHOUSE_VERSION == 2* to select the advanced compose file, which included dynamic_settings.sh (storage policies). CH 20.3+ needs hot_and_cold policy for TestHardlinksExistsFiles. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ough

Simplify the build.yaml testflows step to call ./test/testflows/run.sh. run.sh now handles tfs report generation, coverage formatting, and permission fixes. Adds RUN_PARALLEL=3 and DEBUG/NO_COLORS env vars.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Avoids slow inline pull that gets mixed into the SAS token output. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	test/testflows/.gitignore
Binary is already built and downloaded as artifact in CI. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ilure

log.Fatal kills the entire test process, including all parallel tests. When a ClickHouse container restarts, port bindings temporarily disappear. Now returns an error to let connectWithWait retry. Also increased retries from 10 to 30 with a 1s sleep to tolerate container restarts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Shows container status, health, exit code, OOMKilled flag, and last 50 lines of logs when a container fails to become healthy. Helps diagnose why ClickHouse or other containers fail to start in CI. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…to 300s

ClickHouse 26.1 needs 2+ minutes to initialize S3/Azure object storage disks. With StartPeriod=2s, Docker marks the container unhealthy before ClickHouse finishes startup. Increase StartPeriod so health failures during init don't count as retries, and wait up to 5 minutes in total.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Pull Request Test Coverage Report for Build 23679338147

💛 - Coveralls
…kip redundant pulls

- Start all independent support services (sshd, ftp, minio, gcs, azure, zookeeper, mysql, pgsql) in parallel goroutines instead of sequentially
- Wait for all health checks in parallel
- Pre-pull all Docker images once in TestMain before tests start, so parallel tests don't race to pull the same images
- Skip Docker pull if the image already exists locally (ImageInspect check)
- Add sync.Mutex to protect concurrent map writes during parallel startup
- Enable TEST_LOG_LEVEL=debug in CI for better diagnostics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Creating containers per-test added ~40-80 minutes of overhead for 41 tests. Now pre-creates RUN_PARALLEL environments in TestMain and reuses them via a buffered channel pool. Tests acquire env from pool, clean shared state (disk_s3, backups, rsync, restic, kopia) in Cleanup, and return to pool. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
RUN_TESTS='*' in CI was treated as a specific filter, bypassing the parallel xargs branch. Now '*' falls through to the parallel suite discovery path. Also guard source .env for CI where file doesn't exist. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
restore uses ON CLUSTER for CREATE TABLE, so DROP DATABASE without ON CLUSTER leaves pending DDL tasks in ZooKeeper that can recreate tables after the database is dropped. This fixes TestSkipEmptyTables flakiness where empty_table reappeared after being skipped. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
RBAC restore does SYSTEM SHUTDOWN internally. Without an explicit container restart, the immediate reconnect hits an unready ClickHouse. Replace commented-out compose restart with tc.RestartContainer. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
After SYSTEM SHUTDOWN, ClickHouse briefly accepts TCP connections while shutting down. Connect+Ping succeeds but the next query gets EOF. Add 5s delay for shutdown to complete and verify with SELECT 1 after reconnect to ensure ClickHouse is truly ready, not just accepting TCP. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…s on test failure

- Increase reconnect timeout from 180s to 300s (CH 23.3 with S3/Azure disks needs ~3.5 min to restart)
- Use a per-query 5s timeout for SELECT 1 instead of the outer closeCtx, which may be nearly expired
- Increase retry count from 60 to 120
- Dump all container state plus the last 50 log lines when a test fails (DumpAllContainerLogs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse may still be loading RBAC objects after restart, causing EOF on first query. Add retry loop with reconnect for SHOW queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Named Docker volumes have significant overhead for file-heavy operations. Replace with host bind-mount directories in /tmp for native filesystem speed. This fixes TestGCS timeout (67 min -> should be ~40 min like on master). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse creates files as root inside the container, so the host Go process cannot delete them. Clean shared dirs via docker exec before stopping containers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On shared environments parallel tests add IO pressure to minio, causing cached list to occasionally be slower than uncached. Retry cached measurement up to 3 times before failing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace CH 26.1 with 26.3 in the CI matrix (26.1 has a known BlobKillerThread bug)
- Update default CLICKHOUSE_VERSION to 26.3 in run.sh scripts
- Increase go test timeout from 90m to 120m (TestGCS needs ~50 min)
- Add fail-fast: false to the CI matrix to avoid cascading cancellations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…vior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…leanup

- TestNamedCollections: drop the database before the named collection (CH 26.3 forbids DROP NAMED COLLECTION while tables reference it)
- checkObjectStorageIsEmpty: call SYSTEM WAIT BLOBS CLEANUP before checking minio (the CH 26.2+ async BlobKillerThread leaves disk_s3 behind)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g grep

The Results section only checked for "Failing" in log files, missing suites that crashed with exit code 1 (e.g. a missing docker image). Now tracks the exit code via .rc files and prints stdout on failure for CI visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…use exit codes only

- Start all N test environments concurrently instead of sequentially (~19s vs ~77s for 4 envs)
- Stop all environments and their containers in parallel on teardown
- Remove the grep "Failing" fallback from testflows/run.sh and rely solely on exit codes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix transient DNS/network failures (exit code 6) when downloading yq, restic, and kopia inside containers during CI. Add --retry 5 --retry-delay 5 --retry-connrefused to all curl commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rtPeriod

- Revert shared volumes from host bind-mount directories back to Docker named volumes (matching working commit cdb05d3); bind mounts plus rm -rf /var/lib/clickhouse was destroying ClickHouse data
- Fix the CUR_DIR fallback: go test already sets the cwd to test/integration, so don't append test/integration again
- Restore the ClickHouse healthcheck StartPeriod to 120s (it was incorrectly reduced to 10s)
- Keep the parallelized env startup/shutdown and container stop improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…estcontainers_migration

# Conflicts:
#	go.mod
#	go.sum
…eLocalDownloadRestore

Race condition: the async download/restore API returns immediately, but the fixed sleep 2/sleep 8 was insufficient; restore could start before the download's pid file was cleaned up via defer. Now polls /backup/status by operation_id until completion, then waits 1s for the defer cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The query checking s3 parts used `name='table_s3'` (part name column) instead of `table='table_s3'` (table name column), making the assertion always pass regardless of whether data was actually restored. Also reset the variable before reuse to prevent stale values from prior query. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: slach <bloodjazman@gmail.com>
SYSTEM WAIT BLOBS CLEANUP is only available in CH 26.3+, not 26.2. checkObjectStorageIsEmpty is called before runMainIntegrationScenario, which means env.ch is nil; connect/disconnect around the query.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…in testAPIDeleteLocalDownloadRestore

The /backup/status?operationid= endpoint returns a single JSON object (via sendJSONEachRow), not a JSON array. Changed the jq filter from .[0].status to .status. Also narrowed the error assertion to match "status":"error" instead of bare "error", which falsely matched bash -xe trace output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
to increase parallelism and flexibility