Commit 7b46d93
Add LoRA multimethod export to CoreML static attention
Add support for exporting LoRA adapters as separate methods in a CoreML
PTE file. CoreML POSITIONAL weight sharing deduplicates base weights
across methods so the binary overhead is just the lora_a/lora_b weights.
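
For context, a minimal sketch of the LoRA computation each adapter method adds on top of the shared base weights; the class name, defaults, and layout below are illustrative, not the repo's actual implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + (alpha / rank) * lora_b(lora_a(x)).

    The base weight is identical across methods, so POSITIONAL sharing can
    deduplicate it; only the low-rank lora_a/lora_b weights differ per adapter.
    """

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```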
- StaticAttention: LoRA-aware projection creation for split_mha=False
- utils.py: skip_names + LoRALinear guard in replace_linear_with_split_linear (sketched after this list)
- export: --adapter CLI, adapter loading, _exclude_lora quantization filter (also sketched below),
  skip_split_names for POSITIONAL sharing, multimethod export branches
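
A hedged sketch of the skip_names + LoRALinear guard, reusing the LoRALinear class from the sketch above; the real replace_linear_with_split_linear has a different signature, and make_split stands in for the actual split-linear construction:

```python
from typing import Callable, Sequence

def replace_linear_skipping_lora(
    module: nn.Module,
    make_split: Callable[[nn.Linear], nn.Module],
    skip_names: Sequence[str] = (),
    prefix: str = "",
) -> None:
    # Recursively swap plain nn.Linear children for a split variant, leaving
    # LoRALinear modules (and any fully qualified name in skip_names) intact
    # so their weights keep the layout POSITIONAL sharing expects.
    for name, child in module.named_children():
        fqn = f"{prefix}.{name}" if prefix else name
        if fqn in skip_names or isinstance(child, LoRALinear):
            continue  # guard: never split inside a LoRA adapter
        if isinstance(child, nn.Linear):
            setattr(module, name, make_split(child))
        else:
            replace_linear_skipping_lora(child, make_split, skip_names, fqn)
```

And a hypothetical _exclude_lora-style filter in the (module, fqn) -> bool shape that torchao's quantize_ accepts; the actual filter in export.py may differ:

```python
def _exclude_lora(module: nn.Module, fqn: str) -> bool:
    # True = quantize this module. Base linears get quantized; the small,
    # precision-sensitive lora_a/lora_b projections stay in full precision.
    is_lora = "lora_a" in fqn or "lora_b" in fqn
    return isinstance(module, nn.Linear) and not is_lora
```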
Authored with Claude.
ghstack-source-id: 16ab852
ghstack-comment-id: 4093471437
Pull-Request: #183441
3 files changed: 264 additions, 144 deletions
Changed paths:
- examples/apple/coreml/llama
- examples/models/llama