Conversation
…e case when the node restarts
|
/run-security-scan |
alexcos20
left a comment
There was a problem hiding this comment.
AI automated code review (Gemini 3).
Overall risk: medium
Summary:
This pull request introduces CPU affinity capabilities to the Docker-based C2D compute engine. It allows pinning compute jobs to specific physical CPU cores, manages CPU core allocation and deallocation, and supports multi-cluster configurations by offsetting CPU core indices. The changes include adding cpuOffset to the engine constructor, calculating available environment CPU cores, integrating CpusetCpus into Docker container creation, and implementing mechanisms to rebuild CPU allocations on node restart and release cores upon job cleanup. This is a significant enhancement for resource management and potentially performance for CPU-bound tasks.
Comments:
• [WARNING][style] The // eslint-disable-next-line require-await comment on this line seems misplaced, as parseCpusetString is a synchronous function and does not need this directive. Please remove it.
• [INFO][other] The TODO comment /* TODO - get namedresources & discreete one is still present. Consider addressing it in this PR or creating a follow-up task if it's not critical for immediate functionality.
• [INFO][other] The addition of await this.rebuildCpuAllocations() during initialization is a crucial improvement for robustness. It correctly handles node restarts by restoring CPU allocations for already running containers, which is excellent for maintaining consistent resource management.
• [WARNING][performance] In allocateCpus, if freeCores.length < count, the method logs a warning and returns null, meaning the container will proceed without CPU affinity (i.e., Docker will schedule it on any available CPU cycles). Is this the desired fallback behavior? For applications where strict CPU isolation is critical or resources are highly contended, it might be preferable to throw an error, fail the job, or implement a queuing/retry mechanism. Please confirm if this graceful degradation is acceptable or if a stricter approach is needed based on business requirements.
• [WARNING][other] The cpuOffset logic in C2DEngines (specifically cpuOffset += cpuRes.total) relies heavily on cluster.connection.resources.find((r: any) => r.id === 'cpu') and cpuRes?.total to accurately sum up the total CPUs for each cluster. This implies that these configurations must be perfectly accurate and that physical CPU indices can be assumed to be contiguous across multiple Docker engines/clusters when cpuOffset is applied. Please ensure documentation or configuration guidelines clearly explain this dependency for multi-cluster setups to prevent misconfigurations leading to CPU allocation overlaps or inefficiencies.
Fixes #1279 .
Changes proposed in this PR: