Skip to content

[WIP] feat: rewrite cloudbuild.yaml as full CI/CD pipeline matching deploy.sh#40

Open
luis5tb wants to merge 4 commits intoRHEcosystemAppEng:mainfrom
luis5tb:cloudbuilds
Open

[WIP] feat: rewrite cloudbuild.yaml as full CI/CD pipeline matching deploy.sh#40
luis5tb wants to merge 4 commits intoRHEcosystemAppEng:mainfrom
luis5tb:cloudbuilds

Conversation

@luis5tb
Copy link
Copy Markdown
Collaborator

@luis5tb luis5tb commented Mar 12, 2026

The old cloudbuild.yaml was outdated and unused — it only handled the agent image with inline CLI flags. This rewrite creates a complete 10-step pipeline that mirrors deploy.sh: builds both service images, copies the MCP image from Quay.io to GCR, deploys both services using the YAML configs with sed substitution, configures Pub/Sub, and sets service URLs. Also documents the Cloud Build workflow in the Cloud Run and main READMEs.

The old cloudbuild.yaml was outdated and unused — it only handled the
agent image with inline CLI flags. This rewrite creates a complete
10-step pipeline that mirrors deploy.sh: builds both service images,
copies the MCP image from Quay.io to GCR, deploys both services using
the YAML configs with sed substitution, configures Pub/Sub, and sets
service URLs. Also documents the Cloud Build workflow in the Cloud Run
and main READMEs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yuvalk
Copy link
Copy Markdown
Collaborator

yuvalk commented Mar 13, 2026

do we want to test it somehow?

@luis5tb luis5tb changed the title feat: rewrite cloudbuild.yaml as full CI/CD pipeline matching deploy.sh [WIP] feat: rewrite cloudbuild.yaml as full CI/CD pipeline matching deploy.sh Mar 13, 2026
yuvalk added 3 commits March 14, 2026 04:38
The images section declares build artifacts, but when
_BUILD_FROM_SOURCE=false, images are pulled/tagged/pushed in copy
steps, not built by Cloud Build. This could cause the post-build
push to fail or overwrite with stale layers.
Matches deploy.sh validation: when using a fully-qualified topic
path (projects/...), the subscription name must be explicitly set
since the default ${topic}-sub would produce an invalid name.
Also adds set -eo pipefail for safer bash execution.
…henticated

Both steps modify Cloud Run services concurrently (IAM binding vs env
var update). Add allow-unauthenticated to the waitFor list of
update-agent-urls to serialize these operations.
@yuvalk
Copy link
Copy Markdown
Collaborator

yuvalk commented Mar 14, 2026

Summary

Rewrites the existing minimal cloudbuild.yaml (which only built and deployed the agent image) into a comprehensive 10-step CI/CD pipeline that mirrors deploy.sh. The new pipeline builds (or pulls pre-built) both service images, copies the MCP sidecar image from Quay.io to GCR, deploys both services using YAML configs with sed substitution, configures Pub/Sub, sets IAM policies, and updates service URLs. Also adds Cloud Build documentation to both READMEs.

Critical Issues

  1. [Critical] Missing SERVICE_CONTROL_SERVICE_NAME substitution in handler deploy step -> handler YAML will contain a literal ${SERVICE_CONTROL_SERVICE_NAME} placeholder, breaking marketplace integration. deploy.sh (line 242) substitutes this variable but cloudbuild.yaml deploy-handler step omits it entirely. There is also no corresponding _SERVICE_CONTROL_SERVICE_NAME substitution variable defined. [cloudbuild.yaml: deploy-handler step, ~line 186]

    Fix: Add _SERVICE_CONTROL_SERVICE_NAME to the substitutions block and add the sed replacement:

    -e 's|$${SERVICE_CONTROL_SERVICE_NAME}|${_SERVICE_CONTROL_SERVICE_NAME}|g' \
    
  2. [Critical] images directive will fail when using pre-built images from Quay.io. The images section at the bottom declares three images as build artifacts. When _BUILD_FROM_SOURCE=false, no docker build runs -- the images are pulled/tagged/pushed in the copy-* steps. Cloud Build's images directive expects to push images it built; images pushed manually in steps may cause the post-build push to fail or overwrite with stale layers. [cloudbuild.yaml: lines ~347-349]

    Fix: Either remove the images section entirely (since all images are already pushed explicitly in the pipeline steps), or document that this is intentional and tested with both paths.

Important Issues

  1. [Important] Missing cross-project Pub/Sub topic handling. deploy.sh (lines 304-308) detects fully-qualified topic paths (projects/...) and uses --impersonate-service-account for the subscription create command. The cloudbuild.yaml configure-pubsub step does not replicate this logic. Cross-project marketplace topics will fail to create subscriptions. [cloudbuild.yaml: configure-pubsub step]

    Fix: Add the same fully-qualified topic detection and impersonation logic from deploy.sh.

  2. [Important] Missing fully-qualified topic path validation. deploy.sh (lines 82-87) validates that if PUBSUB_TOPIC is a fully-qualified path, PUBSUB_SUBSCRIPTION must be explicitly set (because ${topic}-sub would produce an invalid name). The cloudbuild.yaml does not perform this validation. [cloudbuild.yaml: configure-pubsub step]

    Fix: Add a guard at the start of the configure-pubsub step script that checks if _PUBSUB_TOPIC starts with projects/ and _PUBSUB_SUBSCRIPTION is empty, and exits with an error.

  3. [Important] gcr.io Container Registry is deprecated. Google has deprecated Container Registry (gcr.io) in favor of Artifact Registry (REGION-docker.pkg.dev). New pipelines should use Artifact Registry to avoid future migration. [cloudbuild.yaml: throughout]

    Fix: Consider using ${_REGION}-docker.pkg.dev/${PROJECT_ID}/REPO_NAME/IMAGE format, or at minimum add a comment acknowledging the planned migration.

  4. [Important] Potential race condition in post-deploy steps. The update-agent-urls step waits for ['deploy-handler', 'deploy-agent'] but runs in parallel with allow-unauthenticated and configure-pubsub. If allow-unauthenticated modifies the service while update-agent-urls also modifies it, the concurrent gcloud run services update and gcloud run services add-iam-policy-binding calls could conflict. [cloudbuild.yaml: update-agent-urls step]

    Fix: Add 'allow-unauthenticated' to the waitFor list of update-agent-urls, or verify that IAM binding and env var update are safe to run concurrently on Cloud Run.

Suggestions

  1. [Minor] No ENABLE_MARKETPLACE equivalent. deploy.sh supports ENABLE_MARKETPLACE=false to skip Pub/Sub configuration entirely. The cloudbuild.yaml always runs configure-pubsub. Consider adding a _ENABLE_MARKETPLACE substitution variable for parity. [cloudbuild.yaml: configure-pubsub step]

  2. [Minor] Bash steps lack set -euo pipefail. Most bash entrypoint scripts in the pipeline steps do not start with set -e. While Cloud Build treats a non-zero exit as a step failure, individual commands within a multi-command script block (like configure-pubsub) could silently fail without set -e. The gcloud run services add-iam-policy-binding ... || true in configure-pubsub already handles expected failures, but other commands could silently fail. [cloudbuild.yaml: all bash steps]

    Fix: Add set -euo pipefail at the top of each multi-command bash block.

  3. [Minor] README documentation table says "Copy" phase runs only when _BUILD_FROM_SOURCE=false, but the "Build" row says steps are "Skipped when _BUILD_FROM_SOURCE=false" without mentioning that copy-mcp always runs regardless of this flag. This could confuse readers. [deploy/cloudrun/README.md: pipeline table]

  4. [Nit] Inconsistent error handling between configure-pubsub and update-agent-urls. The configure-pubsub step soft-fails with a WARNING and exit 0 when the handler URL is missing, while update-agent-urls hard-fails with exit 1 when the agent URL is missing. The inconsistency is reasonable given their different criticality, but worth a comment explaining why. [cloudbuild.yaml: both steps]

What's Done Well

  • Well-structured parallelism: Build steps run in parallel with waitFor: ['-'], and the dependency graph (deploy-handler before deploy-agent, push before deploy) is correctly modeled.
  • Dual-path flexibility: The _BUILD_FROM_SOURCE flag enabling both build-from-source and pull-from-Quay workflows is a thoughtful design that covers different deployment scenarios.
  • Faithful port of deploy.sh logic: The sed substitution patterns, Pub/Sub subscription create-or-update logic, and URL update flow closely mirror the shell script.
  • Thorough documentation: The README additions include a clear comparison table between deploy.sh and cloudbuild.yaml, complete substitution variable reference, and multiple usage examples.
  • Sensible defaults: All substitution variables have defaults matching deploy.sh, reducing the barrier to entry.
  • Good use of conditional skipping: The if/else pattern in build/copy steps cleanly handles the two image sourcing modes without duplicating step definitions.

Verdict: Request Changes

The missing SERVICE_CONTROL_SERVICE_NAME substitution is a functional bug that will cause the marketplace handler to deploy with a literal placeholder, breaking marketplace integration. The images directive behavior with the pre-built image path needs verification. The missing cross-project Pub/Sub handling is a significant gap compared to deploy.sh that could cause production failures for cross-project marketplace setups. These issues should be addressed before merge. The [WIP] tag in the title also suggests the author considers this not yet ready.

Review-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants