
Releases: vlora-dev/vlora

v0.3.0 — NF4 Quantization & QLoRA Support

31 Mar 03:12


Highlights

First-class QLoRA support — vLoRA now integrates with QLoRA workflows for maximum compression. QLoRA compresses the base model (FP16 → NF4), vLoRA compresses the adapter space — these stack multiplicatively.
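
As a back-of-the-envelope illustration of the multiplicative stacking (all sizes and ratios below are hypothetical, chosen only to show the arithmetic, not measured numbers from this release):

```python
# Hypothetical sizes for a 7B-parameter setup -- illustration only.
base_fp16_gb = 14.0       # base model weights at FP16
adapters_fp16_mb = 640.0  # a collection of LoRA adapters at FP16

# QLoRA compresses the base: FP16 (16 bits) -> NF4 (4 bits) = 4x.
base_nf4_gb = base_fp16_gb * 4 / 16

# vLoRA compresses the adapter space; assume an 8x subspace ratio here.
adapters_vlora_mb = adapters_fp16_mb / 8

print(f"base:     {base_fp16_gb:.1f} GB -> {base_nf4_gb:.1f} GB")
print(f"adapters: {adapters_fp16_mb:.0f} MB -> {adapters_vlora_mb:.0f} MB")
```

Because the two techniques act on disjoint parts of the checkpoint, each ratio applies independently to its part.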

New Features

  • NF4 quantization — subspace.quantize(method="nf4") uses QLoRA's 4-bit NormalFloat data type with per-block absmax scaling
  • Double quantization — quantize the NF4 block scales to FP8 via double_quant=True
  • NF4 packed storage — save_quantized() packs to uint8 for ~7× disk savings; load() auto-detects format
  • QLoRA-aware VLoRAModel — compute_dtype for mixed precision, qlora_info for base-model introspection
  • full_stack_compression() — combined base model + adapter compression reporting
  • Layer shapes stored in metadata, __repr__ on core objects, adaptive_k preserved through absorb
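
To make the NF4 features above concrete, here is a self-contained NumPy sketch of the ingredients: blockwise absmax scaling, nearest-neighbor lookup against the NF4 codebook (code values approximated from the QLoRA paper), and 4-bit packing into uint8. The function names are illustrative, not the library's API — vLoRA's entry points are subspace.quantize(method="nf4") and save_quantized().

```python
import numpy as np

# Approximate 4-bit NormalFloat code values from the QLoRA paper.
NF4 = np.array([-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848,
                -0.0911, 0.0, 0.0796, 0.1609, 0.2461, 0.3379,
                0.4407, 0.5626, 0.7230, 1.0])

def quantize_nf4(x, block_size=64):
    """Blockwise absmax NF4 quantization (NumPy stand-in sketch)."""
    x = x.reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True)  # per-block absmax
    normed = x / scales                            # now in [-1, 1]
    # Nearest code value: bucket by midpoints between adjacent codes
    # (np.digitize here plays the role of torch.bucketize).
    midpoints = (NF4[1:] + NF4[:-1]) / 2
    codes = np.digitize(normed, midpoints).astype(np.uint8)  # 0..15
    return codes, scales

def pack_uint8(codes):
    """Pack two 4-bit codes per byte for compact storage."""
    flat = codes.reshape(-1)
    return (flat[0::2] << 4) | flat[1::2]

def dequantize_nf4(codes, scales, block_size=64):
    """Look up code values and rescale per block."""
    return (NF4[codes.reshape(-1, block_size)] * scales).reshape(-1)
```

Packing halves the already-4x-smaller code array, which (after the per-block scale overhead) is where savings of roughly the quoted order come from.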

Bug Fixes

  • absorb_incremental re-projection — existing tasks now properly re-projected when basis rotates
  • VLoRACallback was a no-op — now uses differentiable hooks + steps optimizer
  • TIES merge normalization — fixed over-scaling when elements are trimmed
  • 7 additional correctness and robustness fixes (see CHANGELOG.md)
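
For context on the TIES normalization fix, a minimal NumPy sketch of TIES merging's three steps (trim, sign election, disjoint mean); the per-coordinate normalization in the last step is where trimming interacts with scaling. This is the textbook formulation, not vLoRA's internal code.

```python
import numpy as np

def ties_merge(task_vectors, trim_frac=0.2):
    """TIES-merging sketch: trim -> elect sign -> disjoint mean."""
    T = np.stack(task_vectors)                      # (num_tasks, dim)
    # Trim: keep only the top trim_frac entries of each task by magnitude.
    k = max(1, int(trim_frac * T.shape[1]))
    thresh = np.sort(np.abs(T), axis=1)[:, -k][:, None]
    T = np.where(np.abs(T) >= thresh, T, 0.0)
    # Elect a sign per coordinate from the total trimmed mass.
    sign = np.sign(T.sum(axis=0))
    # Disjoint mean: average only entries that survived trimming AND
    # agree with the elected sign, normalizing by the number of
    # contributing tasks per coordinate (not the total task count).
    agree = (np.sign(T) == sign) & (T != 0)
    count = np.maximum(agree.sum(axis=0), 1)
    return (T * agree).sum(axis=0) / count
```

Normalizing by the full task count when many entries were trimmed away is the kind of mismatch that produces over-scaled merges.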

Performance

  • gram_schmidt → QR factorization
  • Module handle caching in VLoRAModel
  • NF4 uses torch.bucketize (O(N) memory vs O(N×16))
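
The Gram-Schmidt replacement follows a standard pattern: a reduced QR factorization produces the same orthonormal basis as classical Gram-Schmidt in exact arithmetic, but with better numerical stability and a single BLAS-backed call. A NumPy sketch (vLoRA itself runs on torch, where torch.linalg.qr is the analogue):

```python
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis for the column space of A via reduced QR."""
    Q, _ = np.linalg.qr(A, mode="reduced")
    return Q

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 8))   # 8 basis candidates in 128-dim space
Q = orthonormal_basis(A)            # columns satisfy Q.T @ Q == I
```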

Test suite: 196 passed, 5 skipped without transformers installed

Full changelog: CHANGELOG.md

pip install vlora-dev==0.3.0