Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks!
The CI is red:

```
FAILED tests/test_dpo_trainer.py::TestDPOTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train[trl-internal-testing/tiny-NemotronHForCausalLM] - RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
```
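The constraint behind this error can be reproduced without the kernel: a contiguous `(batch, seq, channels)` tensor transposed into channel-last layout has strides `(seq * channels, 1, channels)`, so a tiny hidden size that is not a multiple of 8 violates the requirement. A minimal sketch (plain Python; the shapes and helper names are illustrative, not taken from the failing tests):

```python
# Sketch of the constraint behind the RuntimeError: causal_conv1d's
# channel-last path requires x.stride(0) and x.stride(2) to be
# multiples of 8. For zeros(batch, seq, channels).transpose(1, 2),
# the strides (in elements) are (seq * channels, 1, channels).
def channel_last_strides(batch, seq, channels):
    return (seq * channels, 1, channels)

def strides_ok(batch, seq, channels):
    s = channel_last_strides(batch, seq, channels)
    return s[0] % 8 == 0 and s[2] % 8 == 0

print(strides_ok(2, 7, 4))  # channel count not a multiple of 8 -> False
print(strides_ok(2, 7, 8))  # channels a multiple of 8 -> True
```

This suggests the tiny config's channel-like dimension needs to be a multiple of 8 for the fused kernel path.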
qgallouedec
left a comment
thanks!! just a few comments
```python
    use_mamba_kernels=False,  # CPU-friendly for testing
)
model = NemotronHForCausalLM(config).to(dtype=torch.bfloat16)
init_weights_tiny_model(model)
```
Can you cast `backbone.layers.[N].mixer.D` and `backbone.layers.[N].mixer.A_log` to fp32? It seems like these two parameters are in fp32, and we want to be as close as possible to the reference model.
Check how we do it here for Qwen3.5: https://github.com/huggingface/trl/pull/5278/changes#diff-dd3349f840a26de373fc88378e6fcded0b75423da8a34f7cfa6ac573b7398b8bL404
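The requested cast could look like the following sketch (the helper name and the toy module are illustrative stand-ins, not the actual script code):

```python
import torch
import torch.nn as nn

def cast_mixer_params_to_fp32(model: nn.Module) -> None:
    # After casting the tiny model to bfloat16, move each mixer's D and
    # A_log back to float32 to match the reference model's dtypes.
    for name, param in model.named_parameters():
        if name.endswith(("mixer.D", "mixer.A_log")):
            param.data = param.data.float()

# Toy stand-in for a hybrid Mamba mixer layer (illustrative only).
class Mixer(nn.Module):
    def __init__(self):
        super().__init__()
        self.D = nn.Parameter(torch.zeros(4))
        self.A_log = nn.Parameter(torch.zeros(4))
        self.proj = nn.Linear(4, 4)

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.mixer = Mixer()

model = Toy().to(dtype=torch.bfloat16)
cast_mixer_params_to_fp32(model)
print(model.mixer.D.dtype, model.mixer.proj.weight.dtype)
```

Matching on the parameter-name suffix keeps the rest of the model in bfloat16 while only `D` and `A_log` end up back in float32.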
```python
kwargs = {}
if "NemotronH" in model_id:
    kwargs["gradient_checkpointing"] = False
    kwargs["use_cpu"] = True
```
Really not sure about this. We don't train on CPU, so why test it? Plus, we wouldn't know if a GPU-specific issue is introduced.
Is it possible that this error originates from the params used to build the model?
Fixes applied. There is an issue with some dependencies that needs to be addressed in transformers.
Hi @sergiopaniego, is there any update on this or the corresponding upstream PR?
Upstream PR has just been merged!
yes! It could be approved and merged if tests are green 😄

The failed test is unrelated.
# Conflicts:
#	tests/test_data_utils.py
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
@qgallouedec @albertvillanova
Note

Low risk: changes are limited to test coverage and the tiny-model generation script, with runtime guarded by transformers>=5.3.0 skip conditions.

Overview

- Adds generation of a tiny NVIDIA Nemotron 3 hybrid Mamba/Attention NemotronHForCausalLM in scripts/generate_tiny_models.py, including a small config, CPU-friendly settings, and explicit float32 casting for the Mamba mixer parameters (D, A_log).
- Extends test parameter matrices to run the existing chat-template/data-utils, DPOTrainer, and SFTTrainer training smoke tests against trl-internal-testing/tiny-NemotronHForCausalLM, gated behind a transformers>=5.3.0 skip.
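The transformers>=5.3.0 gate boils down to a numeric version comparison; a minimal sketch (helper names are illustrative — the real tests presumably build a `pytest.mark.skipif` on top of such a check):

```python
def version_tuple(v: str) -> tuple:
    # Compare numerically, not lexically: "5.10.0" must rank above "5.3.0".
    return tuple(int(part) for part in v.split("."))

def requires_skip(installed: str, minimum: str = "5.3.0") -> bool:
    # True when the installed transformers is too old for NemotronH support.
    return version_tuple(installed) < version_tuple(minimum)

print(requires_skip("5.2.1"))   # True: too old, skip the NemotronH tests
print(requires_skip("5.3.0"))   # False: exactly the minimum, run them
print(requires_skip("5.10.0"))  # False: lexically "smaller", numerically newer
```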