fix: pass all arguments through gradient checkpoint in BasicTransformerBlock #474
Open
Mr-Neutr0n wants to merge 1 commit into Stability-AI:main from
Conversation
Summary
`BasicTransformerBlock.forward()` silently drops `additional_tokens` and `n_times_crossframe_attn_in_self` when gradient checkpointing is enabled. The `checkpoint()` call only forwards `x` and `context`, causing the other two arguments to revert to their defaults (`None` and `0`):
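For context, the checkpointed branch currently looks roughly like this (a simplified sketch of `BasicTransformerBlock.forward()` in `sgm/modules/attention.py`, using the repository's custom `checkpoint` helper as I recall its signature; surrounding details may differ slightly):

```python
# kwargs holds all four forward() arguments, but the checkpointed branch
# packs only x and context, so additional_tokens and
# n_times_crossframe_attn_in_self silently fall back to their defaults
# inside _forward.
if self.checkpoint:
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
else:
    return self._forward(**kwargs)
```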
This means that during training with checkpointing enabled:

- `additional_tokens` is silently ignored (defaults to `None`), so any extra conditioning tokens passed to self-attention and cross-attention are lost.
- `n_times_crossframe_attn_in_self` is silently ignored (defaults to `0`), so cross-frame attention repeat behavior in self-attention is disabled.

The non-checkpointed path (`self._forward(**kwargs)`) correctly passes all arguments, so this bug only manifests when `checkpoint=True`, which is the default setting, making it easy to miss. A minimal standalone illustration of the failure mode follows.
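The failure mode can be reproduced in isolation with `torch.utils.checkpoint` (an illustrative stand-in for the repository's custom `checkpoint` helper; `_forward` and `scale` are made-up names for this demo):

```python
import torch
from torch.utils.checkpoint import checkpoint

def _forward(x, scale=0):
    # scale plays the role of an optional argument such as
    # n_times_crossframe_attn_in_self: if the caller does not forward it,
    # it silently reverts to its default.
    return x * (1 + scale)

x = torch.ones(2, requires_grad=True)

# Forwarding only x: scale falls back to 0 and has no effect.
print(checkpoint(_forward, x, use_reentrant=False))     # tensor([1., 1.], ...)

# Forwarding every argument preserves the intended behavior.
print(checkpoint(_forward, x, 3, use_reentrant=False))  # tensor([4., 4.], ...)
```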
Fix

Pass all four arguments through `checkpoint()`:
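A sketch of the corrected branch, under the same assumptions as the snippet above:

```python
# Forward all four arguments so _forward receives the same inputs whether
# or not gradient checkpointing is enabled.
if self.checkpoint:
    return checkpoint(
        self._forward,
        (x, context, additional_tokens, n_times_crossframe_attn_in_self),
        self.parameters(),
        self.checkpoint,
    )
else:
    return self._forward(**kwargs)
```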
This ensures `_forward` receives the same arguments regardless of whether gradient checkpointing is enabled.