Motivation
We are adding support for multimodal models such as Qwen-VL, which require additional model-specific metadata during quantization and export.
For example, Qwen-VL needs vision_grid_thw to construct position embeddings statically. This information is not part of quantization policy, but rather execution-time metadata required by certain wrappers.
Currently, PTQConfig only supports:
- global defaults (dtype, observer, qscheme)
- per-scope overrides (observer configuration)
There is no clean way to pass such model-specific metadata through the existing
configuration system.
Problem
We need a mechanism to provide additional inputs (e.g., vision grid shape)
to wrappers without:
- polluting HuggingFace model config
- modifying wrapper constructor signatures
- abusing overrides for non-quantization purposes
Proposal
Introduce a new field in PTQConfig:
model_args: Mapping[str, Any] = field(default_factory=dict)
This field is intended to store model-specific execution metadata required by wrappers.
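To make the shape of the proposal concrete, here is a minimal sketch of the new field together with the `get_model_arg` accessor that the wrapper example relies on. `PTQConfigSketch` is a hypothetical stand-in, not the real `PTQConfig`: the existing fields are omitted, and the accessor's signature (a key plus an optional default) is an assumption inferred from the usage shown in this proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class PTQConfigSketch:
    """Hypothetical stand-in for PTQConfig; only the proposed field is shown."""

    # Execution-time metadata for wrappers, kept apart from quantization policy.
    model_args: Mapping[str, Any] = field(default_factory=dict)

    def get_model_arg(self, key: str, default: Any = None) -> Any:
        """Return the top-level entry stored under `key`, or `default` if absent."""
        # Assumed behavior: a plain top-level lookup with no nested-path handling.
        return self.model_args.get(key, default)


cfg = PTQConfigSketch(model_args={"vision": {"grid_thw": (1, 36, 36)}})
vision_args = cfg.get_model_arg("vision", {})
```

Grouping entries under a top-level namespace such as "vision" keeps unrelated wrappers from colliding on key names, and a config created without `model_args` simply yields the caller's default.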
Example Usage
cfg = PTQConfig(
    default_dtype=DType.int(16),
    default_qscheme=QScheme.PER_TENSOR_SYMM,
    wrapper_variant="prefill",
    model_args={
        "vision": {
            "grid_thw": (1, 36, 36),
        }
    },
)
In the wrapper:
vision_args = qcfg.get_model_arg("vision", {}) if qcfg else {}
vision_grid_thw = vision_args.get("grid_thw")