update qwen quant #880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ajrasane/qwen3-omni-30B
Conversation
```python
# Note: thinker_max_new_tokens controls the thinker's generation limit (default 1024),
# which is separate from max_new_tokens. Cap it to avoid long waits.
result = full_model.generate(
    **calib_batch, max_new_tokens=100, thinker_max_new_tokens=100
)
```
updated here
```python
result = full_model.generate(**calib_batch, max_new_tokens=100)
print("[DEBUG] pre_quantize: starting qwen3omni preview generation (max_new_tokens=100)...", flush=True)
result = full_model.generate(
    **calib_batch, max_new_tokens=100, thinker_max_new_tokens=100
)
```
updated here
```python
# For Qwen3-Omni Thinking models, the thinker's token limit is controlled by
# a separate `thinker_max_new_tokens` param (default 1024), not `max_new_tokens`.
# Cap it to avoid unbounded chain-of-thought generation during calibration.
if "qwen3omni" in model.__class__.__name__.lower():
```
updated here
Qwen3-Omni Thinking models have a separate `thinker_max_new_tokens` parameter (default 1024) that is independent of `max_new_tokens`. During calibration, setting `max_new_tokens=1` only limits the talker; the thinker still generates up to 1024 tokens per sample, causing a ~500x slowdown that makes calibration extremely slow.
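For reference, a minimal sketch of how the guard above might be applied end to end during calibration. The class-name check and the `thinker_max_new_tokens` kwarg come from the hunks in this PR; `run_calibration` and `calib_batches` are hypothetical names introduced only for illustration, and the sketch assumes a HuggingFace-style `generate()` that forwards `thinker_max_new_tokens` to the thinker submodule.

```python
def run_calibration(model, calib_batches, max_new_tokens=1):
    # Hypothetical calibration loop illustrating the fix in this PR.
    gen_kwargs = {"max_new_tokens": max_new_tokens}
    # For Qwen3-Omni Thinking models, max_new_tokens only bounds the talker;
    # without this extra cap the thinker would still emit up to its default
    # of 1024 tokens per sample (the ~500x slowdown described above).
    if "qwen3omni" in model.__class__.__name__.lower():
        gen_kwargs["thinker_max_new_tokens"] = max_new_tokens
    for batch in calib_batches:
        model.generate(**batch, **gen_kwargs)
```

Keying the cap off the model class name, as the PR does, keeps the calibration path unchanged for non-Omni models, since other architectures would reject an unknown `thinker_max_new_tokens` argument.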