Releases: vlora-dev/vlora
v0.3.0 — NF4 Quantization & QLoRA Support
Highlights
First-class QLoRA support — vLoRA now integrates with QLoRA workflows for maximum compression. QLoRA compresses the base model (FP16 → NF4), vLoRA compresses the adapter space — these stack multiplicatively.
New Features
- NF4 quantization — `subspace.quantize(method="nf4")` uses QLoRA's 4-bit NormalFloat data type with per-block absmax scaling
- Double quantization — quantize the NF4 block scales to FP8 via `double_quant=True`
- NF4 packed storage — `save_quantized()` packs to uint8 for ~7× disk savings; `load()` auto-detects format
- QLoRA-aware VLoRAModel — `compute_dtype` for mixed-precision, `qlora_info` for base model introspection
- `full_stack_compression()` — combined base model + adapter compression reporting
- Layer shapes stored in metadata, `__repr__` on core objects, `adaptive_k` preserved through absorb
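To make the NF4 scheme concrete, here is a minimal NumPy sketch of 4-bit NormalFloat quantization with per-block absmax scaling, as described in the QLoRA paper. This is illustrative only, not vLoRA's implementation: the function names and the 64-element block size are my own choices, and the codebook values are rounded from the published NF4 levels.

```python
import numpy as np

# The 16 NF4 levels: quantiles of N(0, 1) normalized to [-1, 1]
# (values rounded from the published NF4 codebook).
NF4_LEVELS = np.array([
    -1.0, -0.6961928, -0.52507305, -0.39491749,
    -0.28444138, -0.18477343, -0.09105004, 0.0,
    0.0795803, 0.1609302, 0.2461123, 0.33791524,
    0.44070983, 0.562617, 0.72295684, 1.0,
])

def nf4_quantize(x, block_size=64):
    """Quantize a 1-D float array to NF4 codes with per-block absmax scales."""
    n = x.size
    pad = (-n) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)  # per-block absmax
    scales[scales == 0] = 1.0                           # avoid divide-by-zero
    normed = blocks / scales                            # now within [-1, 1]
    # Nearest codebook level per element (simple O(N*16) broadcast here).
    codes = np.abs(normed[..., None] - NF4_LEVELS).argmin(axis=-1)
    return codes.astype(np.uint8), scales.ravel(), n

def nf4_dequantize(codes, scales, n, block_size=64):
    """Invert nf4_quantize: look up levels and rescale by each block's absmax."""
    out = NF4_LEVELS[codes] * scales[:, None]
    return out.ravel()[:n]
```

Because the codebook levels are quantiles of a normal distribution, round-trip error stays small for roughly Gaussian weights; the per-element error is bounded by the block's absmax times half the widest gap between adjacent levels.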
Bug Fixes
- `absorb_incremental` re-projection — existing tasks now properly re-projected when the basis rotates
- `VLoRACallback` was a no-op — now uses differentiable hooks + steps optimizer
- TIES merge normalization — fixed over-scaling when elements are trimmed
- 7 additional correctness and robustness fixes (see CHANGELOG.md)
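For context on the TIES normalization fix: a generic sketch of TIES merging (trim, elect sign, disjoint merge) is below. This is not vLoRA's code; the point it illustrates is that each merged parameter should be divided by the number of *surviving* contributors, since dividing by the total task count over-shrinks parameters where most tasks were trimmed away.

```python
import numpy as np

def ties_merge(task_vectors, density=0.2):
    """Minimal TIES merge sketch: trim, elect sign, disjoint mean."""
    tv = np.stack(task_vectors)                         # (tasks, params)
    k = max(1, int(density * tv.shape[1]))
    # Trim: keep each task's top-k magnitudes, zero the rest.
    thresh = np.sort(np.abs(tv), axis=1)[:, -k][:, None]
    trimmed = np.where(np.abs(tv) >= thresh, tv, 0.0)
    # Elect sign: majority sign by summed value per parameter.
    elected = np.sign(trimmed.sum(axis=0))
    elected[elected == 0] = 1.0
    # Disjoint merge: average only entries agreeing with the elected sign,
    # normalizing by surviving contributors -- not the total task count.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return (trimmed * agree).sum(axis=0) / counts
```

With the buggy normalization (dividing by `len(task_vectors)` everywhere), a parameter kept by only one task would be scaled down by the full task count, which is the over-scaling the release note describes.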
Performance
- `gram_schmidt` → QR factorization
- Module handle caching in VLoRAModel
- NF4 uses `torch.bucketize` (O(N) memory vs O(N×16))
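The bucketize win can be sketched with `np.searchsorted`, the NumPy analogue of `torch.bucketize`: nearest-level lookup via binary search against midpoints needs no N×16 distance matrix. The codebook below is illustrative, and the function names are my own.

```python
import numpy as np

LEVELS = np.array([-1.0, -0.5, 0.0, 0.3, 1.0])   # any sorted codebook

def nearest_broadcast(x):
    # O(N * L) memory: materializes the full |x - level| distance matrix.
    return np.abs(x[:, None] - LEVELS).argmin(axis=1)

def nearest_bucketize(x):
    # O(N) memory: binary-search each value against the midpoints
    # between adjacent levels (np.searchsorted ~ torch.bucketize).
    mids = (LEVELS[:-1] + LEVELS[1:]) / 2
    return np.searchsorted(mids, x)
```

For a sorted codebook the two agree everywhere except exactly on a midpoint, so the search-based version is a drop-in replacement at a fraction of the peak memory.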
196 tests passed (5 skipped without transformers)
Full changelog: CHANGELOG.md
`pip install vlora-dev==0.3.0`