
docs: add external services inventory #29

Open
yuvalk wants to merge 12 commits into RHEcosystemAppEng:main from yuvalk:external-services

Conversation


@yuvalk commented Mar 11, 2026

Summary

  • Add EXTERNAL_SERVICES.md cataloging all external service dependencies
  • Covers Google Cloud services, Red Hat services, databases, caching/rate limiting, observability, and container registries
  • Includes key config variables and required/optional status for each service

Test plan

  • Review document for accuracy against current codebase
  • Verify all listed services and config variables are correct

🤖 Generated with Claude Code

| Service | Purpose | Required | Key Config / Reference |
| --- | --- | --- | --- |
| **Cloud Run** | Production serverless deployment (2 services: agent + handler) | For production | `deploy/cloudrun/` |
| **Cloud Pub/Sub** | Receives marketplace provisioning events asynchronously | For marketplace | Topic: `marketplace-entitlements` |
| **Commerce Procurement API** | Approve/manage marketplace accounts & entitlements | For marketplace | `https://cloudcommerceprocurement.googleapis.com/v1` |
| **Service Control API** | Usage metering & billing reporting to GCP Marketplace | For marketplace | `SERVICE_CONTROL_SERVICE_NAME`, `SERVICE_CONTROL_ENABLED` |

We have this in here but never tested it, and there is no need to use it for now given that it is going to be a free/public offering. We can still keep it here.
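Since the table above shows the integration is gated by `SERVICE_CONTROL_ENABLED`, keeping the untested path documented but dormant is cheap. A minimal sketch of such a feature gate (the helper name and the truthy-value parsing rule are assumptions for illustration, not the agent's actual code):

```python
# Sketch of a feature gate: usage metering stays dormant unless the
# operator explicitly enables it, so the untested Service Control path
# can stay in the inventory without ever being exercised.
def service_control_enabled(env: dict) -> bool:
    # Anything other than an explicit truthy value counts as disabled.
    value = env.get("SERVICE_CONTROL_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes"}

assert not service_control_enabled({})  # disabled by default
assert service_control_enabled({"SERVICE_CONTROL_ENABLED": "true"})
```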


yuvalk commented Mar 14, 2026

Summary

Adds a new EXTERNAL_SERVICES.md file that catalogs all external service dependencies for the Google Lightspeed Agent. The document covers LLM models, Google Cloud services, Red Hat services, databases, caching/rate limiting, observability, container registries, and JWT validation endpoints.

Critical Issues

  1. Fabricated config variables LIGHTSPEED_CLIENT_ID and LIGHTSPEED_CLIENT_SECRET
    The document lists these as key config for console.redhat.com (Insights APIs), but they do not appear anywhere in the codebase. A grep across src/ for LIGHTSPEED_CLIENT returns zero matches. The MCP server handles Insights API authentication independently; the agent itself has RED_HAT_SSO_CLIENT_ID and RED_HAT_SSO_CLIENT_SECRET (src/lightspeed_agent/config/settings.py:49-55), which serve a different purpose (token introspection). Listing nonexistent config variables will confuse operators.

  2. Redis container image is wrong
The document states the Redis image is docker.io/redis:7-alpine. The actual image used in Podman deployments is quay.io/fedora/redis-7 (deploy/podman/redis-pod.yaml:20). For Cloud Run, Redis is provided by Cloud Memorystore (a managed service, no container image at all). The docker.io image is not referenced anywhere in the repository.
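The mix-up in issue 1 is easy to see in a sketch. Everything here is illustrative (the helper and the dict-based environment are hypothetical); the grounded facts are only that RED_HAT_SSO_CLIENT_ID / RED_HAT_SSO_CLIENT_SECRET exist in settings.py and the LIGHTSPEED_CLIENT_* pair does not:

```python
# Illustrative sketch, not the agent's real settings API: the
# token-introspection credentials that actually exist are the
# RED_HAT_SSO_* pair; the LIGHTSPEED_CLIENT_* names the document
# lists resolve to nothing.
def load_sso_credentials(env: dict) -> dict:
    """Collect the introspection credentials the agent really uses."""
    return {
        "client_id": env.get("RED_HAT_SSO_CLIENT_ID"),
        "client_secret": env.get("RED_HAT_SSO_CLIENT_SECRET"),
    }

env = {
    "RED_HAT_SSO_CLIENT_ID": "agent",
    "RED_HAT_SSO_CLIENT_SECRET": "s3cret",
}
creds = load_sso_credentials(env)
# The fabricated variables simply are not there:
missing = env.get("LIGHTSPEED_CLIENT_ID")  # None
```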

Important Issues

  1. Missing Google Cloud services that are actually used
    The document omits several GCP services that are configured in deploy/cloudrun/setup.sh:86-97 and referenced throughout the deployment:

    • Cloud SQL -- the actual PostgreSQL provider in production (referenced in deploy/cloudrun/setup.sh:322, service.yaml:35, IAM role roles/cloudsql.client at line 145)
    • Secret Manager -- stores all secrets (API keys, DB URLs, Redis URL) per deploy/cloudrun/setup.sh:157-187
    • Cloud Scheduler -- schedules usage reporting jobs per deploy/cloudrun/setup.sh:80
    • Cloud Logging / Cloud Monitoring -- IAM roles roles/logging.logWriter and roles/monitoring.metricWriter are granted at setup.sh:143-144
    • Serverless VPC Access -- required to connect Cloud Run to Cloud Memorystore Redis (setup.sh:85, service.yaml:37)
      These are all production-required infrastructure services.
  2. Artifact Registry vs. Google Container Registry (gcr.io)
    The document lists "Artifact Registry" in the Google Cloud Services table with gcr.io/{PROJECT_ID}/.... However, gcr.io is Google Container Registry (GCR), which is a distinct (and deprecated) service. Artifact Registry uses REGION-docker.pkg.dev URLs. The codebase consistently uses gcr.io (deploy/cloudrun/deploy.sh:144-147, deploy/cloudrun/service.yaml:46), so the service name in the table should say "Container Registry (gcr.io)" rather than "Artifact Registry".

  3. MCP server container image references are imprecise
    The doc lists quay.io and ghcr.io with image name red-hat-lightspeed-mcp, but the actual full image paths differ:

    • ghcr.io/redhatinsights/red-hat-lightspeed-mcp:latest (src/lightspeed_agent/tools/mcp_config.py:16)
    • quay.io/redhat-services-prod/insights-management-tenant/insights-mcp/red-hat-lightspeed-mcp:latest (deploy/podman/lightspeed-agent-pod.yaml:147)

    The abbreviated references are ambiguous and may lead operators to pull the wrong image.

  4. Missing console exporter type for OpenTelemetry
    The Observability table lists only Jaeger and Zipkin as alternatives to the OTLP collector. The code also supports console, otlp, and otlp-http exporter types (src/lightspeed_agent/config/settings.py:290). This is a minor completeness gap but worth noting since console is useful for debugging.

  5. Google JWT endpoint description is misleading
    The document says the endpoint "Fetches public keys to validate DCR software_statement JWTs from cloud-agentspace@system.gserviceaccount.com". In the code (src/lightspeed_agent/dcr/google_jwt.py:20-23), the full URL https://www.googleapis.com/service_accounts/v1/metadata/x509/cloud-agentspace@system.gserviceaccount.com is used both as the certificate fetch URL AND as the expected JWT issuer (GOOGLE_DCR_ISSUER). The doc only describes the fetch purpose but omits that this same URL is verified as the iss claim in the JWT (line 247), which is an important security detail.

Suggestions

  1. Add a "Managed GCP Infrastructure" section to capture Cloud SQL, Secret Manager, Cloud Scheduler, Cloud Logging, Cloud Monitoring, and Serverless VPC Access. These are distinct from the "application-level" services but are required for production operation.

  2. Include full container image paths in the Container Registries table rather than just the image short name. Operators need the exact pull path.

  3. Add OTEL_EXPORTER_OTLP_HTTP_ENDPOINT to the observability table -- it is a separate config variable from OTEL_EXPORTER_OTLP_ENDPOINT and is used when otel_exporter_type=otlp-http (settings.py:287-289).

  4. Clarify the Pub/Sub topic name: The document states Topic: marketplace-entitlements which matches docs/marketplace.md:287, but this is a user-chosen name during setup, not a fixed system requirement. Stating it as a fixed value could be misleading.

  5. Consider listing Keycloak test server (quay.io/keycloak/keycloak:26.0 referenced in scripts/test_dcr.py:122) in the container registries table if the goal is a complete inventory.
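Suggestion 3's point about the two endpoint variables can be sketched as a dispatch over exporter types. The helper is hypothetical (the real selection logic lives in settings.py and differs in detail); the exporter type names and the two variable names come from the review above:

```python
from typing import Optional

# Hypothetical dispatch over the exporter types the review lists
# (console, otlp, otlp-http, jaeger, zipkin).
def pick_endpoint(exporter_type: str, env: dict) -> Optional[str]:
    if exporter_type == "console":
        return None  # console writes to stdout; no endpoint, handy for debugging
    if exporter_type == "otlp-http":
        # A separate variable from the gRPC one, per suggestion 3.
        return env.get("OTEL_EXPORTER_OTLP_HTTP_ENDPOINT")
    return env.get("OTEL_EXPORTER_OTLP_ENDPOINT")

env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector:4317",
    "OTEL_EXPORTER_OTLP_HTTP_ENDPOINT": "http://collector:4318",
}
```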

What's Done Well

  • The overall structure is clear and well-organized, with logical grouping by service category.
  • The required/optional distinction per service is helpful for understanding deployment flexibility.
  • The "Key Architectural Notes" section at the bottom provides useful context about the two-service architecture and development mode flexibility.
  • The LLM Models section accurately captures both access paths (AI Studio vs. Vertex AI) with correct config variables and defaults, verified against settings.py:22-41.
  • The Procurement API base URL https://cloudcommerceprocurement.googleapis.com/v1 is correct per src/lightspeed_agent/marketplace/service.py:33.
  • The Red Hat SSO default issuer URL is correct per settings.py:44-47.
  • Database configuration (PostgreSQL + SQLite fallback, two separate databases) is accurately described and matches settings.py:189-214.
  • Redis rate limiting config variables and default URL are correct per settings.py:137-139.

Verdict: Request Changes

The document contains two critical inaccuracies (fabricated LIGHTSPEED_CLIENT_* config variables and wrong Redis image), multiple missing GCP services that are production-required, and the Artifact Registry naming error. These issues would mislead operators attempting to use this document as an authoritative inventory. The foundation is solid and the structure is good, but the factual errors need to be corrected before merging.

Review-Authored-By Claude Opus 4.6 (1M context) <noreply@anthropic.com>


@luis5tb left a comment


Just a couple of nits.

yuvalk and others added 8 commits March 27, 2026 13:13
Catalog all external service dependencies including Google Cloud
services, Red Hat services, databases, caching, and observability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace fabricated LIGHTSPEED_CLIENT_ID/LIGHTSPEED_CLIENT_SECRET with
the actual config used for Insights API interaction. The agent accesses
Insights APIs via the MCP server, forwarding the user's JWT token
through MCP headers. The key config is MCP_SERVER_URL and the JWT
forwarding mechanism in mcp_headers.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The actual image used is quay.io/fedora/redis-7 (per redis-pod.yaml),
not docker.io/redis:7-alpine. Also note that production uses Cloud
Memorystore (managed Redis).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gcr.io is Google Container Registry (GCR), not Artifact Registry.
Artifact Registry uses REGION-docker.pkg.dev URLs. The codebase
consistently uses gcr.io.
Add Cloud SQL, Secret Manager, Cloud Scheduler, Cloud Logging,
Cloud Monitoring, and Serverless VPC Access. These are all
production-required infrastructure services configured in setup.sh.
Replace abbreviated image names with full pull paths to avoid
ambiguity. Paths verified against mcp_config.py and
lightspeed-agent-pod.yaml.
The Google x509 certificate URL is both the certificate fetch
endpoint AND the expected JWT issuer (iss claim). This is an
important security detail for DCR validation.
@yuvalk force-pushed the external-services branch from 62b18b7 to dc2665b on March 27, 2026 at 13:14
yuvalk and others added 4 commits March 27, 2026 13:15
Address reviewer comment that Vertex AI can deploy models other than
Gemini, not just serve as an alternative access path for Gemini.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address reviewer comment to use "Lightspeed APIs" instead of
"Insights APIs" for console.redhat.com and MCP Server descriptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address reviewer comment that Memorystore (managed Redis) and Cloud
Storage (agent card hosting) were missing from the GCP infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address reviewer comment: MCP images are pushed to quay.io and then
uploaded to gcr.io (Google Container Registry) for Cloud Run consumption,
not ghcr.io (GitHub Container Registry).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>