
[PyTorch FE] Add 2-bit (u2) weight decompression subgraph support #34542

Draft
ljaljushkin wants to merge 2 commits into openvinotoolkit:master from ljaljushkin:nl/torch_fe_2bit_subgraph_support

Conversation

@ljaljushkin (Contributor) commented Mar 6, 2026

Details:

  • Added u2_compression_stack pattern matcher in utils_quantize.cpp that detects the NNCF
    unpack_uint2 subgraph: aten::stack([bitwise_and(packed, 3), bitwise_and(bitwise_right_shift(packed, 2), 3), bitwise_and(bitwise_right_shift(packed, 4), 3), bitwise_and(bitwise_right_shift(packed, 6), 3)], dim=-1)
    and replaces it with a single element::u2 constant.
  • Added U2ConvertReshape transformation pass that folds Reshape on u2 constants
    into the constant itself (analogous to U4ConvertReshape for u4).
  • Added MarkCompressedWeightConstants transformation pass that marks Convert nodes
    consuming u2/u4/i4 constants with disable_constant_folding and mark_as_decompression
    to prevent MOC from expanding compressed weight constants.
  • Integrated u2 pattern detection in both translate_stack (TorchScript) and
    translate_stack_fx (torch.compile) code paths.
  • Added u2 → u8 type promotion in CPU plugin's transformation_pipeline.cpp.
  • No need for a standalone aten::bitwise_right_shift op converter — the pattern matcher
    consumes the entire unpack subgraph at the aten::stack level.
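To make the matched pattern concrete, here is a minimal NumPy sketch of the unpack semantics the matcher recognizes: each packed uint8 byte holds four 2-bit values (lowest bits first), and the four `bitwise_and`/`bitwise_right_shift` branches of the `aten::stack` recover them. The function names and the little-endian-within-byte packing convention are assumptions for illustration, not OpenVINO or NNCF APIs:

```python
import numpy as np

def pack_uint2(values: np.ndarray) -> np.ndarray:
    """Pack 2-bit values (0..3), last dim a multiple of 4, into uint8 bytes.

    Hypothetical helper: the first value of each group lands in the byte's
    lowest two bits, mirroring the shifts 0/2/4/6 in the matched subgraph.
    """
    v = values.reshape(*values.shape[:-1], -1, 4).astype(np.uint8)
    return v[..., 0] | (v[..., 1] << 2) | (v[..., 2] << 4) | (v[..., 3] << 6)

def unpack_uint2(packed: np.ndarray) -> np.ndarray:
    """Mirror of the aten::stack subgraph described above:
    stack([packed & 3, (packed >> 2) & 3, (packed >> 4) & 3, (packed >> 6) & 3], dim=-1).
    """
    fields = [(packed >> shift) & 3 for shift in (0, 2, 4, 6)]
    return np.stack(fields, axis=-1).reshape(*packed.shape[:-1], -1)

# Round trip: unpacking the packed bytes reproduces the original 2-bit values.
weights = np.random.randint(0, 4, size=(2, 8), dtype=np.uint8)
assert np.array_equal(unpack_uint2(pack_uint2(weights)), weights)
```

The pattern matcher effectively replaces the right-hand side (`unpack_uint2`) with the already-packed data stored directly as an `element::u2` constant, so the bitwise subgraph never appears in the IR.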

This enables export of NNCF models with INT2SymmetricWeightsDecompressor to OpenVINO IR,
following the same approach used for INT4 decompression patterns (PR #27048).
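For context on what the u2 constant feeds downstream, a symmetric 2-bit decompression can be sketched as a per-group dequantization. The fixed zero point of 2 (mapping unsigned codes 0..3 to the signed range -2..1) and all names here are assumptions for illustration, not NNCF's actual `INT2SymmetricWeightsDecompressor` implementation:

```python
import numpy as np

def decompress_u2(codes: np.ndarray, scales: np.ndarray, group_size: int) -> np.ndarray:
    """Hypothetical symmetric dequantization of u2 codes.

    Shifts unsigned 2-bit codes by an assumed zero point of 2, then applies
    one float scale per contiguous group of `group_size` weights.
    """
    signed = codes.reshape(-1, group_size).astype(np.float32) - 2.0
    return (signed * scales.reshape(-1, 1)).reshape(codes.shape)

codes = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=np.uint8)
scales = np.array([0.5, 0.25], dtype=np.float32)  # one scale per group of 4
print(decompress_u2(codes, scales, group_size=4))
```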

Tickets:

AI Assistance:

  • AI assistance used: yes
  • Manually converted u2/u4 Qwen/Qwen3-4B and evaluated in OpenVINO:

```
lm_eval \
  --model openvino \
  --model_args pretrained=$IR_DIR \
  --device cpu \
  --tasks lambada_openai
```

| model | quant_config | group_size | Lambada acc | Lambada ppl | Lambada OV acc | Lambada OV ppl |
|---|---|---|---|---|---|---|
| Qwen/Qwen3-4B | FQ_LORA (avg 3-bit, mix of 2/4-bit) | 128/64 | 0.5604 | 9.6826 | 0.5572 | 9.6666 |


Labels

category: CPU (OpenVINO CPU plugin), category: PyTorch FE (OpenVINO PyTorch Frontend)
