C2D: build custom images updates by alexcos20 · Pull Request #1297 · oceanprotocol/ocean-node

alexcos20 · 2026-03-26T12:04:46Z

Closes #1294

Summary

This PR tightens Docker compute job accounting and image-build behavior:

Free jobs: building images from Dockerfiles is not allowed.
Paid jobs: image build time is included in billable duration and counts toward maxJobDuration (same limit applies to build + run).
Image build robustness: after the build stream completes, the engine now verifies the image tag exists via docker.getImage(...).inspect(); if missing, the job is marked BuildImageFailed.
Resource tracking: running-time calculations prefer build start time when valid.
CPU affinity: allocateCpus() now reuses an existing allocation for a jobId rather than allocating new cores.

Implementation notes

Added buildStartTimestamp / buildStopTimestamp to DBComputeJob and persisted them to sqlite internal blob.
buildImage() now records build start/stop timestamps and uses per-job CPU/RAM build options.
Payment claim duration now uses (algoDuration + buildDuration) when build timestamps are valid (start > 0, stop >= start, numeric), and still applies maxJobDuration and minJobDuration rules.

alexcos20 · 2026-03-26T12:05:40Z

/run-security-scan

alexcos20

AI automated code review (Gemini 3).

Overall risk: high

Summary:
This pull request introduces significant changes to the compute-to-data (C2D) image building process, allowing image builds to consume job resources and duration. Key enhancements include tracking build start/stop timestamps, incorporating build time into the overall job duration, enforcing build time limits, and dynamically allocating resources (CPU, RAM) for the build process. A new unit test for image building has been added. However, there is a critical security concern regarding the commented-out authentication check for free compute jobs.

Comments:
• [INFO][style] The phrase "most likley are very limited" could be rephrased for better grammar and formality, e.g., "are likely very limited" or "are typically very limited."
• [INFO][performance] The CPU affinity reuse logic (const existing = this.cpuAllocations.get(jobId)) is a good optimization to avoid re-calculating CPU sets if already allocated for a job. This is a nice addition, although somewhat independent of the primary build image changes.
• [WARNING][performance] Changing timeoutMs to job.maxJobDuration * 1000 means the image build process can now consume the entire allocated job duration. While this allows for more complex builds, it implicitly reduces the time available for the actual algorithm computation. If a job has a maxJobDuration of 60 seconds and the build takes 59 seconds, only 1 second remains for the algorithm. This is handled by the check on line 1606, but it's important to consider if this is the desired default behavior for all paid jobs, especially smaller ones. Perhaps a minimum algorithm duration guarantee or a separate maxBuildDuration might be considered in the future.
• [INFO][other] Adding the docker.getImage(job.containerImage).inspect() check after the build stream completes is a robust improvement. It ensures that even if the Docker build stream reports success, the image genuinely exists and can be used, preventing potential downstream issues.
• [ERROR][security] CRITICAL SECURITY CONCERN: The validateTokenOrSignature call for FreeComputeStartHandler has been commented out. This bypasses the authentication and authorization check for all free compute jobs. This means anyone could potentially request free compute jobs without any validation of their identity or nonce, which is a significant security vulnerability. This change needs immediate justification, or the code should be restored. Free jobs, even without payment, still require a validated consumerAddress to prevent abuse and track usage correctly. If this is intentional, it should be clearly documented as a major design decision with associated risks, but ideally, authentication should remain for all job types.
• [INFO][other] This new unit test file is a great addition, providing focused testing for the buildImage functionality and improving test coverage for these critical new features.

alexcos20 · 2026-03-26T12:09:22Z

[ERROR][security] CRITICAL SECURITY CONCERN:

fix in 11806ae

alexcos20 · 2026-03-27T09:31:15Z

/run-security-scan

alexcos20

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces significant enhancements to the C2D compute engine, primarily focusing on allowing and properly accounting for Docker image builds within compute jobs. It distinguishes between free and paid environments, ensuring that image building is only permitted in environments explicitly configured for it and that build time is accurately factored into job duration and resource consumption. The changes span documentation, type definitions, core compute engine logic, database persistence, configuration schemas, and include a dedicated unit test for the image build process's robustness.

Comments:
• [INFO][other] This change correctly incorporates the build duration into the overall job duration, which is crucial for accurate billing and job lifecycle management. This is a core part of the feature.
• [INFO][security] This allowImageBuild check is important. It prevents users from circumventing free environment restrictions by submitting Dockerfiles for image building where it's not allowed, which could lead to resource abuse. Good enforcement of policies.
• [INFO][performance] This new logic ensures that if the image build process itself consumes the entire maxJobDuration, the job is terminated early and marked as finished. This prevents jobs from running indefinitely on the build phase if it's too long, respecting the allocated time budget. This is a good failure mode handling.
• [INFO][other] Adjusting the job expiry to account for build duration ensures that the actual algorithm execution time adheres to the maxJobDuration minus the time spent building the image. This is vital for correct resource accounting and charging.
• [INFO][performance] Reusing existing CPU allocations for a job is a nice optimization. It avoids unnecessary re-allocations and ensures CPU affinity if a job's CPU request changes or is re-evaluated during its lifecycle, though for this specific change it's more about preventing re-allocation if the function is called multiple times for the same job. This improves efficiency and predictability.
• [INFO][other] Setting the timeoutMs for buildImage to job.maxJobDuration * 1000 means the image build itself is constrained by the overall job duration. This aligns the build phase with the job's resource budget, which is a key part of this feature.
• [INFO][performance] Dynamically setting Docker image build options (memory, cpuquota) based on the job's requested resources is a significant improvement. This ensures the build process itself operates within the allocated resource limits, preventing resource exhaustion on the node during image builds and reflecting the actual cost implications.
• [WARNING][bug] This added check after the build stream completes, which inspects the Docker image, is a crucial robustness improvement. The buildStream.emit('end') only indicates that the Docker daemon finished processing the stream, not necessarily that an image was successfully created and is available. Explicitly checking for the image prevents silent failures and ensures the job transitions correctly. Without this, a build could 'complete' but yield no usable image, leading to subsequent job failures.

Could we add a test case where the build stream succeeds but docker.getImage().inspect() fails? Oh, wait, the buildImage.test.ts already covers this with getImage: sinon.stub().returns({ inspect: sinon.stub().rejects(new Error('no such image')) }). Excellent!
• [INFO][other] The addition of a new dedicated unit test file for buildImage is excellent. It demonstrates good test coverage for the critical new functionality and robustness checks introduced in this PR. The tests specifically target the new docker.getImage().inspect() logic after a build, which is a key failure point.
• [INFO][other] Adding allowImageBuild to the schema with a default of false is a good practice. It ensures that the feature is opt-in and provides a safe default, preventing unexpected behavior on existing installations.

build images updates

146870a

alexcos20 commented Mar 26, 2026

View reviewed changes

fix typo

11806ae

alexcos20 marked this pull request as ready for review March 26, 2026 12:09

alexcos20 requested review from andreip136, bogdanfazakas, giurgiur99 and ndrpp as code owners March 26, 2026 12:09

alexcos20 self-assigned this Mar 26, 2026

alexcos20 changed the title ~~build images updates~~ C2D: build custom images updates Mar 26, 2026

make allowImageBuild configurable for free envs

3750cbe

alexcos20 commented Mar 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C2D: build custom images updates#1297

C2D: build custom images updates#1297
alexcos20 wants to merge 3 commits intomainfrom
feature/c2d_build_images_changes

alexcos20 commented Mar 26, 2026 •

edited

Loading

Uh oh!

alexcos20 commented Mar 26, 2026

Uh oh!

alexcos20 left a comment

Uh oh!

alexcos20 commented Mar 26, 2026

Uh oh!

alexcos20 commented Mar 27, 2026

Uh oh!

alexcos20 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alexcos20 commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Implementation notes

Uh oh!

alexcos20 commented Mar 26, 2026

Uh oh!

alexcos20 left a comment

Choose a reason for hiding this comment

Uh oh!

alexcos20 commented Mar 26, 2026

Uh oh!

alexcos20 commented Mar 27, 2026

Uh oh!

alexcos20 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

alexcos20 commented Mar 26, 2026 •

edited

Loading