Bug report
`expansion_factor_real_data > 1` generates placeholder data and then truncates the inputs in `loss_fn` to the correct size, which is supposed to eliminate the placeholder data. With gradient accumulation, `train_step` reshapes the inputs to introduce a `gradient_accumulation_steps` dimension.
If we truncated first and then reshaped, this would work correctly. However, we reshape and then truncate, which means later gradient accumulation steps use the placeholder data.
I believe `max_checkify` does not catch this issue because it happens too early in the process.
(Internally we're on an older fork of this codebase, so I apologize if this has already been fixed. I looked through the relevant code and it looked like it would have the same issue.)
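The ordering problem above can be reproduced with a minimal NumPy sketch (hypothetical shapes and variable names, not the actual codebase's code): placeholder rows appended after the real data survive into later microbatches when we reshape before truncating.

```python
import numpy as np

# Hypothetical sizes for illustration only.
grad_accum_steps = 2
real_rows = 4                    # real rows per accumulation step
expansion = 2                    # stand-in for expansion_factor_real_data

# Batch layout: real data first, placeholder data appended after it.
batch = np.concatenate([
    np.ones(real_rows * grad_accum_steps),                      # real
    np.zeros(real_rows * grad_accum_steps * (expansion - 1)),   # placeholder
])

# Correct order: truncate away the placeholder rows, then reshape.
correct = batch[: real_rows * grad_accum_steps].reshape(grad_accum_steps, real_rows)

# Buggy order: reshape first, then truncate each microbatch.
micro = batch.reshape(grad_accum_steps, -1)   # placeholder rows fall into later steps
buggy = micro[:, :real_rows]

print(correct)   # every microbatch is real data (all ones)
print(buggy)     # step 0 is real data, step 1 is all placeholder (zeros)
```

With truncate-then-reshape every microbatch contains only real data; with reshape-then-truncate the placeholder rows land in the later accumulation steps, which is exactly the bug described above.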
Logs/Output
No response
Environment Information
No response
Additional Context
No response