Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/post_training/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,7 @@ class SlurmConfig:
"""SLURM job scheduler parameters."""

partition: str = "gpu"
account: str | None = "OELLM_prod2026"
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says the SLURM account is optional, but the new config default sets account to a concrete value ("OELLM_prod2026"). This makes the option effectively enabled by default (once the renderer passes it through) and also hard-codes an environment-specific account into the library default. Consider defaulting to None and letting users set it via YAML/CLI overrides when needed.

Suggested change
account: str | None = "OELLM_prod2026"
account: str | None = None

Copilot uses AI. Check for mistakes.
num_nodes: int = 1
gpus_per_node: int = 4
cpus_per_task: int = 32
Expand Down
3 changes: 2 additions & 1 deletion src/post_training/slurm/job.sh.jinja
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@

#SBATCH --job-name={{ job_name }}
#SBATCH --partition={{ partition }}
#SBATCH --nodes={{ num_nodes }}
{% if account %}#SBATCH --account={{ account }}
{% endif %}#SBATCH --nodes={{ num_nodes }}
Comment on lines +9 to +10
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

account is referenced in the template but is not currently provided in the template.render(...) context in src/post_training/slurm/launcher.py (render_*_slurm_script functions). As a result the --account directive will never be emitted (Jinja treats missing vars as Undefined/falsy by default), so this feature won’t work as intended. Pass account=config.slurm.account into the render context for each script renderer (and consider keeping the {% if %} / #SBATCH directives on separate lines for consistency with the other conditional blocks in this template).

Suggested change
{% if account %}#SBATCH --account={{ account }}
{% endif %}#SBATCH --nodes={{ num_nodes }}
{% if account %}
#SBATCH --account={{ account }}
{% endif %}
#SBATCH --nodes={{ num_nodes }}

Copilot uses AI. Check for mistakes.
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:{{ gpus_per_node }}
#SBATCH --cpus-per-task={{ cpus_per_task }}
Expand Down
3 changes: 2 additions & 1 deletion src/post_training/slurm/job_llamafactory.sh.jinja
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@

#SBATCH --job-name={{ job_name }}
#SBATCH --partition={{ partition }}
#SBATCH --nodes={{ num_nodes }}
{% if account %}#SBATCH --account={{ account }}
{% endif %}#SBATCH --nodes={{ num_nodes }}
Comment on lines +9 to +10
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

account is referenced in the template but is not currently provided in the template.render(...) context in src/post_training/slurm/launcher.py. Without passing account=config.slurm.account, the #SBATCH --account directive will never be written into the generated script for this backend.

Suggested change
{% if account %}#SBATCH --account={{ account }}
{% endif %}#SBATCH --nodes={{ num_nodes }}
{% if account is defined and account %}
#SBATCH --account={{ account }}
{% endif %}
#SBATCH --nodes={{ num_nodes }}

Copilot uses AI. Check for mistakes.
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:{{ gpus_per_node }}
{% if cpus_per_gpu is integer and cpus_per_gpu > 0 -%}
Expand Down
3 changes: 2 additions & 1 deletion src/post_training/slurm/job_trl_container.sh.jinja
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@

#SBATCH --job-name={{ job_name }}
#SBATCH --partition={{ partition }}
#SBATCH --nodes={{ num_nodes }}
{% if account %}#SBATCH --account={{ account }}
{% endif %}#SBATCH --nodes={{ num_nodes }}
Comment on lines +9 to +10
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

account is referenced in the template but is not currently provided in the template.render(...) context in src/post_training/slurm/launcher.py. This makes the {% if account %} block effectively dead code, so containerized TRL jobs will never set #SBATCH --account even when configured. Ensure account=config.slurm.account is passed when rendering this template.

Suggested change
{% if account %}#SBATCH --account={{ account }}
{% endif %}#SBATCH --nodes={{ num_nodes }}
{% if account is defined and account %}
#SBATCH --account={{ account }}
{% endif %}
#SBATCH --nodes={{ num_nodes }}

Copilot uses AI. Check for mistakes.
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:{{ gpus_per_node }}
{% if cpus_per_gpu is integer and cpus_per_gpu > 0 -%}
Expand Down
Loading