Skip to content

set cpu pinning for envs#1289

Open
denisiuriet wants to merge 1 commit intomainfrom
cpu-affinity
Open

set cpu pinning for envs#1289
denisiuriet wants to merge 1 commit intomainfrom
cpu-affinity

Conversation

@denisiuriet
Copy link
Contributor

@denisiuriet denisiuriet commented Mar 24, 2026

Fixes #1279 .

Changes proposed in this PR:

  • Pin each Docker compute container to dedicated physical CPU cores using CpusetCpus, preventing jobs from competing for the same cores
  • Each compute environment gets its own range of cores via a cpuOffset, so multiple environments don't overlap
  • On node restart, rebuild the CPU allocation map by inspecting running containers (rebuildCpuAllocations)
  • Release pinned cores back to the pool when a job finishes (cleanupJob → releaseCpus)
  • When not enough free cores are available, the job proceeds without pinning

@denisiuriet denisiuriet changed the title set cpu pinning for envs, release cpu once the job is done, handle th… set cpu pinning for envs Mar 25, 2026
@denisiuriet denisiuriet marked this pull request as ready for review March 25, 2026 08:47
@alexcos20
Copy link
Member

/run-security-scan

Copy link
Member

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces CPU affinity capabilities to the Docker-based C2D compute engine. It allows pinning compute jobs to specific physical CPU cores, manages CPU core allocation and deallocation, and supports multi-cluster configurations by offsetting CPU core indices. The changes include adding cpuOffset to the engine constructor, calculating available environment CPU cores, integrating CpusetCpus into Docker container creation, and implementing mechanisms to rebuild CPU allocations on node restart and release cores upon job cleanup. This is a significant enhancement for resource management and potentially performance for CPU-bound tasks.

Comments:
• [WARNING][style] The // eslint-disable-next-line require-await comment on this line seems misplaced, as parseCpusetString is a synchronous function and does not need this directive. Please remove it.
• [INFO][other] The TODO comment /* TODO - get namedresources & discreete one is still present. Consider addressing it in this PR or creating a follow-up task if it's not critical for immediate functionality.
• [INFO][other] The addition of await this.rebuildCpuAllocations() during initialization is a crucial improvement for robustness. It correctly handles node restarts by restoring CPU allocations for already running containers, which is excellent for maintaining consistent resource management.
• [WARNING][performance] In allocateCpus, if freeCores.length < count, the method logs a warning and returns null, meaning the container will proceed without CPU affinity (i.e., Docker will schedule it on any available CPU cycles). Is this the desired fallback behavior? For applications where strict CPU isolation is critical or resources are highly contended, it might be preferable to throw an error, fail the job, or implement a queuing/retry mechanism. Please confirm if this graceful degradation is acceptable or if a stricter approach is needed based on business requirements.
• [WARNING][other] The cpuOffset logic in C2DEngines (specifically cpuOffset += cpuRes.total) relies heavily on cluster.connection.resources.find((r: any) => r.id === 'cpu') and cpuRes?.total to accurately sum up the total CPUs for each cluster. This implies that these configurations must be perfectly accurate and that physical CPU indices can be assumed to be contiguous across multiple Docker engines/clusters when cpuOffset is applied. Please ensure documentation or configuration guidelines clearly explain this dependency for multi-cluster setups to prevent misconfigurations leading to CPU allocation overlaps or inefficiencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cpu afinty on docker containers

2 participants