[quantization] Add model_args to PTQConfig for passing model-specific execution metadata #560

@mhs4670go

Motivation

We are adding support for multimodal models such as Qwen-VL, which require additional model-specific metadata during quantization and export.

For example, Qwen-VL needs vision_grid_thw to construct position embeddings statically. This information is not part of the quantization policy; it is execution-time metadata that certain wrappers require.

Currently, PTQConfig only supports:

  • global defaults (dtype, observer, qscheme)
  • per-scope overrides (observer configuration)

There is no clean way to pass such model-specific metadata through the existing
configuration system.

Problem

We need a mechanism to provide additional inputs (e.g., vision grid shape)
to wrappers without:

  • polluting HuggingFace model config
  • modifying wrapper constructor signatures
  • abusing overrides for non-quantization purposes

Proposal

Introduce a new field in PTQConfig:

model_args: Mapping[str, Any] = field(default_factory=dict)

This field is intended to store model-specific execution metadata required by wrappers.
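
As a rough illustration of the proposal, the field and an accessor could look like the sketch below. `PTQConfigSketch` is a hypothetical stand-in that omits every existing PTQConfig field, and the body of `get_model_arg` is an assumption: the issue only shows it being called, not how it is defined.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class PTQConfigSketch:
    """Hypothetical stand-in for PTQConfig; only the proposed field is shown."""

    # Model-specific execution metadata, grouped under a namespace such as "vision".
    model_args: Mapping[str, Any] = field(default_factory=dict)

    def get_model_arg(self, key: str, default: Any = None) -> Any:
        """Return model_args[key], or `default` when the key is absent (assumed behavior)."""
        return self.model_args.get(key, default)


cfg = PTQConfigSketch(model_args={"vision": {"grid_thw": (1, 36, 36)}})
vision_args = cfg.get_model_arg("vision", {})
grid_thw = vision_args.get("grid_thw")
```

Using `default_factory=dict` gives each config its own empty mapping, so a model that needs no extra metadata can leave the field unset.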

Example Usage

cfg = PTQConfig(
    default_dtype=DType.int(16),
    default_qscheme=QScheme.PER_TENSOR_SYMM,
    wrapper_variant="prefill",
    model_args={
        "vision": {
            "grid_thw": (1, 36, 36),
        }
    },
)

In the wrapper:

vision_args = qcfg.get_model_arg("vision", {}) if qcfg else {}
vision_grid_thw = vision_args.get("grid_thw")
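
Because the metadata arrives as an untyped mapping, a wrapper may want to validate it before building anything static. The sketch below is one possible shape for that step, not the project's implementation: `read_grid_thw` and `static_patch_positions` are hypothetical helpers, and the row-major patch ordering is a simplification that does not claim to match Qwen-VL's actual position layout.

```python
from typing import Any, List, Mapping, Optional, Tuple


def read_grid_thw(model_args: Optional[Mapping[str, Any]]) -> Tuple[int, int, int]:
    """Fetch and validate vision.grid_thw from a model_args-style mapping."""
    vision_args = (model_args or {}).get("vision", {})
    grid_thw = vision_args.get("grid_thw")
    if grid_thw is None:
        raise ValueError("model_args['vision']['grid_thw'] is required")
    if len(grid_thw) != 3 or any(
        not isinstance(d, int) or d <= 0 for d in grid_thw
    ):
        raise ValueError(f"grid_thw must be three positive ints, got {grid_thw!r}")
    t, h, w = grid_thw
    return t, h, w


def static_patch_positions(grid_thw: Tuple[int, int, int]) -> List[Tuple[int, int]]:
    """List (row, col) for every patch, row-major, repeated per temporal step.

    Assumption: a simplified ordering for illustration only.
    """
    t, h, w = grid_thw
    frame = [(row, col) for row in range(h) for col in range(w)]
    return frame * t


grid = read_grid_thw({"vision": {"grid_thw": (1, 36, 36)}})
positions = static_patch_positions(grid)
```

Failing early with a clear error when `grid_thw` is missing keeps the problem visible at configuration time rather than deep inside export.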
