
Add weekly forward compatibility testing #1884

Open
ArangoGutierrez wants to merge 11 commits into NVIDIA:main from ArangoGutierrez:integration-tests

ArangoGutierrez (Collaborator) commented Nov 10, 2025

Implement automated forward-compatibility tests that validate the GPU Operator
against the latest published images from the NVIDIA component repositories.

Changes

Workflow Refactoring:

  • Refactor monolithic ci.yaml into reusable workflow modules (variables.yaml, golang-checks.yaml, config-checks.yaml, image-builds.yaml, e2e-tests.yaml, release.yaml)

Forward Compatibility Testing:

  • Add forward-compatibility.yaml workflow (weekly + manual trigger)
  • Create get-latest-images.sh to fetch the latest commit-based images, with retry and backoff
  • Create generate-values-overrides.sh to produce a Helm values override file
  • Use an artifact-based values override approach (per @cdesiniotis's review)
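
A minimal sketch of the kind of file generate-values-overrides.sh could emit, assuming the chart reads `repository`, `image`, and `version` keys under `toolkit`, `devicePlugin`, and `migManager`, and that each input is a fully tagged image reference (digest references are not handled). The function names are hypothetical and not taken from the PR:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build a Helm values override from full image refs.
# ASSUMPTION: the chart reads <component>.repository/.image/.version.
set -euo pipefail

# split_ref REF -> prints "repository image version" for REF=repo/image:tag.
split_ref() {
  local ref="$1"
  # Require an explicit tag after the last "/" (a registry port is not a tag).
  if [[ "${ref##*/}" != *:* ]]; then
    echo "image reference has no tag: $ref" >&2
    return 1
  fi
  local name="${ref%:*}"   # drop ":tag"
  local tag="${ref##*:}"   # keep only the tag
  printf '%s %s %s\n' "${name%/*}" "${name##*/}" "$tag"
}

# emit_component KEY REF -> one top-level YAML block for an operand.
emit_component() {
  local key="$1" parts repo image version
  parts=$(split_ref "$2") || return 1
  read -r repo image version <<<"$parts"
  cat <<EOF
$key:
  repository: $repo
  image: $image
  version: $version
EOF
}

# generate_values_overrides TOOLKIT_REF DEVICE_PLUGIN_REF MIG_MANAGER_REF
generate_values_overrides() {
  emit_component toolkit "$1" || return 1
  emit_component devicePlugin "$2" || return 1
  emit_component migManager "$3" || return 1
}
```

Redirecting the output to a file gives something that could be uploaded as a workflow artifact and later passed to `helm install -f`.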

E2E Test Enhancements:

  • Extend e2e-tests.yaml with an optional use_values_override input
  • Create env-to-values.sh to convert environment variables into a Helm values file
  • Update install-operator.sh to merge YAML properly via yq
  • Handle vGPU options correctly with the values file approach
  • Pass VALUES_FILE to remote test instances (per @rajathagasthya's review)
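
One way an env-to-values converter might behave is to emit a key only for variables that are actually set, so anything unset falls back to the chart default. This is a sketch under stated assumptions: the environment variable names, the keys they map to, and the one-field-per-section layout are all hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an env-to-values converter. Variable names and
# target keys below are ASSUMPTIONS, not the ones used in the PR.
set -euo pipefail

# ENV_VAR=section.field; this sketch assumes one field per section.
MAPPINGS=(
  "TOOLKIT_VERSION=toolkit.version"
  "DEVICE_PLUGIN_VERSION=devicePlugin.version"
  "MIG_MANAGER_VERSION=migManager.version"
)

# env_to_values -> YAML on stdout for every mapped variable that is non-empty.
env_to_values() {
  local entry var key value
  for entry in "${MAPPINGS[@]}"; do
    var="${entry%%=*}"
    key="${entry#*=}"
    value="${!var:-}"             # indirect lookup; empty when unset
    [[ -n "$value" ]] || continue # unset: leave the chart default alone
    printf '%s:\n  %s: "%s"\n' "${key%%.*}" "${key#*.}" "$value"
  done
}
```

Quoting each value keeps tags such as all-digit commit prefixes from being read as numbers by a YAML parser.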

Notifications:

  • Add Slack notifications for test failures (JSON payload, configurable mentions)
  • Guard notification step against missing secrets
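
A minimal sketch of a guarded Slack notification, assuming an incoming-webhook URL in a hypothetical SLACK_WEBHOOK_URL variable and an optional mention prefix in a hypothetical SLACK_MENTIONS variable. The escaping covers only backslash, double quote, newline, tab, and carriage return, which suits short status text but is not a general JSON encoder:

```shell
#!/usr/bin/env bash
# Hypothetical sketch; SLACK_WEBHOOK_URL and SLACK_MENTIONS are assumed names.
set -euo pipefail

# json_escape STR -> STR escaped for use inside a JSON string literal.
# Handles backslash, double quote, newline, tab and CR only.
json_escape() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      '\')   out+='\\' ;;
      '"')   out+='\"' ;;
      $'\n') out+='\n' ;;
      $'\t') out+='\t' ;;
      $'\r') out+='\r' ;;
      *)     out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# build_payload TEXT [MENTIONS] -> {"text":"..."} with mentions prepended.
build_payload() {
  local text="$1" mentions="${2:-}"
  if [[ -n "$mentions" ]]; then
    text="$mentions $text"
  fi
  printf '{"text":"%s"}' "$(json_escape "$text")"
}

# notify_slack TEXT -> posts to the webhook, or skips if no secret is set.
notify_slack() {
  if [[ -z "${SLACK_WEBHOOK_URL:-}" ]]; then
    echo "SLACK_WEBHOOK_URL is not set; skipping notification" >&2
    return 0
  fi
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(build_payload "$1" "${SLACK_MENTIONS:-}")" \
    "$SLACK_WEBHOOK_URL"
}
```

Returning success when the secret is missing keeps runs without secrets (for example, from forks) from failing on the notification step itself.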

Components Tested

  • ghcr.io/nvidia/container-toolkit
  • ghcr.io/nvidia/k8s-device-plugin
  • ghcr.io/nvidia/k8s-mig-manager

We could add other operands later, but I wanted to start with the core ones.

Schedule: Every Monday at 2 AM UTC

Review Feedback Addressed

  • Use values override file instead of individual input vars (@cdesiniotis)
  • Move operator_image/operator_version to variables workflow (@cdesiniotis)
  • Remove unnecessary workflow inputs from forward-compatibility (@cdesiniotis)
  • Remove year from copyright headers (@rajathagasthya)
  • Pass VALUES_FILE to remote test instance (@rajathagasthya)
  • Fix Slack JSON payload format
  • Use yq for proper YAML merging
  • Handle vGPU with values file approach
  • Add retry with exponential backoff for image verification
  • Remove push trigger from e2e-tests (secrets not inherited)
  • Restore golangci-lint-action to v9
  • Write SSH key before SCP step
  • Restore Holodeck to v0.2.18
  • Remove unused operator_version input from release workflow
  • Guard Slack notification against missing secrets
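
The retry with exponential backoff used for image verification can be sketched as a small Bash helper. The function name, argument order, and doubling policy are assumptions for illustration, not the code in get-latest-images.sh:

```shell
#!/usr/bin/env bash
# Hypothetical retry helper; name, arguments, and policy are assumptions.
set -euo pipefail

# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds, sleeping BASE, 2*BASE, 4*BASE, ... seconds
# between attempts. Returns CMD's last exit status if every attempt fails.
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1 status
  shift 2
  while true; do
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry_with_backoff 5 2 docker manifest inspect "$IMAGE"` (with `$IMAGE` as a placeholder) would wait 2, 4, 8, and 16 seconds between up to five attempts.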


6 participants