Merged
Commits
22 commits
cfac59e
feat(mllm_kernel): simplify JIT usage in README and update kernel exa…
chenghuaWang Feb 17, 2026
8f3485a
feat: update dependencies and refactor mobile module structure
chenghuaWang Feb 18, 2026
abf1fa4
feat: enhance configuration management and update dependencies
chenghuaWang Feb 18, 2026
ec71258
feat: add main entry points and configuration for pymllm and mllm-kernel
chenghuaWang Feb 18, 2026
731ea71
feat: enhance layer implementations and add new components
chenghuaWang Feb 19, 2026
02255d8
feat: add initial files for pymllm architecture and launch functionality
chenghuaWang Feb 19, 2026
d40396c
Merge branch 'UbiquitousLearning:main' into wch-main
chenghuaWang Feb 20, 2026
28b75fb
feat: update dependencies and enhance configuration structure
chenghuaWang Feb 21, 2026
6c4aa44
feat: implement store_cache functionality and related components
chenghuaWang Feb 21, 2026
e5e1b78
refactor: improve socket initialization in TokenizerProcess
chenghuaWang Feb 21, 2026
73fe4fd
Merge branch 'UbiquitousLearning:main' into wch-main
chenghuaWang Feb 27, 2026
65f00b4
feat(engine): support batch generation and enable shared memory queue…
chenghuaWang Feb 27, 2026
b057360
feat(mllm-kernel): add high-performance create_kv_indices CUDA kernel…
chenghuaWang Mar 2, 2026
9bc959f
feat(sampling): add sampling module with FlashInfer acceleration and …
chenghuaWang Mar 2, 2026
2cf50f4
feat(cuda): add fused GDN decode and RMSNorm+SiLU gating kernels for …
chenghuaWang Mar 9, 2026
31b0ff9
fix(attention): refine FlashInfer backend logic and improve RadixCach…
chenghuaWang Mar 17, 2026
4d3d0c5
refactor: improve code readability and structure across multiple modules
chenghuaWang Mar 17, 2026
9c27ad8
chore: update installation instructions and add new skills for pymllm
chenghuaWang Mar 17, 2026
0327b8d
refactor: enhance installation instructions and improve cache management
chenghuaWang Mar 17, 2026
0c31ee9
refactor: enhance configuration management and improve process health…
chenghuaWang Mar 18, 2026
6835ab8
feat(mllm-kernel): introduce new Marlin kernel implementations for ef…
chenghuaWang Mar 18, 2026
b3fb68c
feat(quantization): implement quantization configuration loading and …
chenghuaWang Mar 18, 2026
486 changes: 486 additions & 0 deletions .claude/skills/impl-jit-kernel/SKILL.md

Large diffs are not rendered by default.

73 changes: 73 additions & 0 deletions .claude/skills/install-pymllm/SKILL.md
@@ -0,0 +1,73 @@
---
name: install-pymllm
description: Install the pymllm Python package. Asks the user whether to do a full build (with CMake C++ compilation) or a fast install (Python-only, skip CMake). Use when the user asks to install, set up, or reinstall pymllm.
---

# Install pymllm

## Goal

Help the user install the `pymllm` package with the right configuration for their use case.

## Workflow

### Step 1: Ask the user which install mode they want

Use `AskUserQuestion` to present two options:

**Full Install (with C++ build)**
- Compiles the C++ mllm runtime and FFI extension via CMake
- Required if the user needs mobile inference, model conversion with FFI, or CPU/QNN backends
- Slower (several minutes depending on the machine)
- Command: `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall`

**Fast Install (Python-only, skip CMake)**
- Skips the entire CMake build step
- Only installs the pure Python package
- Recommended for users who only use CUDA backends (FlashInfer, TileLang) and do not need the C++ mllm runtime
- Much faster (seconds)
- Command: `SKBUILD_WHEEL_CMAKE=false pip install -e .`

### Step 2: Ask editable or non-editable

Use `AskUserQuestion` to ask:

- **Editable (`pip install -e .`)**: For active development. Python imports point to the source tree. Changes to `.py` files take effect immediately without reinstalling.
- **Non-editable (wheel)**: For stable usage. Installs a wheel into site-packages.

### Step 3: Ask whether the user needs CUDA optional dependencies

Use `AskUserQuestion` to ask whether the user needs CUDA support (FlashInfer, TileLang, pyzmq, etc.).

This determines whether to append `[cuda]` to the install specifier (e.g. `pip install -e ".[cuda]"` instead of `pip install -e .`).

**This applies to ALL install modes.** For fast-install users this is especially important since the CUDA packages are the primary compute backend.

### Step 4: Execute the install

Based on user choices, compose and run the appropriate command. The install specifier is either `.` or `".[cuda]"` depending on Step 3.

| Mode | Editable | CUDA | Command |
|------|----------|------|---------|
| Full | Yes | No | `pip install -v -e .` |
| Full | Yes | Yes | `pip install -v -e ".[cuda]"` |
| Full | No | No | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` |
| Full | No | Yes | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` |
| Fast | Yes | No | `SKBUILD_WHEEL_CMAKE=false pip install -e .` |
| Fast | Yes | Yes | `SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"` |
| Fast | No | No | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` |
| Fast | No | Yes | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` |
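
The decision tree above can be sketched as a small helper. This is a simplified illustration only (`compose_install_cmd` is a hypothetical name, not a project script); verbosity flags and the trailing `pip install "pymllm[cuda]"` step for non-editable CUDA installs are omitted:

```shell
# Sketch: compose the base install command from the three answers.
compose_install_cmd() {
  local mode="$1" editable="$2" cuda="$3"   # full|fast, yes|no, yes|no
  local prefix="" spec="."
  [ "$mode" = "fast" ] && prefix="SKBUILD_WHEEL_CMAKE=false "
  [ "$cuda" = "yes" ] && spec='".[cuda]"'
  if [ "$editable" = "yes" ]; then
    echo "${prefix}pip install -e ${spec}"
  else
    echo "${prefix}pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall"
  fi
}

compose_install_cmd fast yes yes
# -> SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"
```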

### Step 5: Post-install for editable + full build

If the user chose **editable + full build**, the compiled `.so` files live in a build directory (e.g. `build/bin/`), not in the source tree. The Python code at `pymllm/__init__.py` looks for libraries at `pymllm/lib/MllmFFIExtension.so`. A symlink is needed to bridge this gap.

**Invoke the `/link-pymllm-lib` skill** to help the user set up the symlink.

## Important Notes

- The project root must contain `pyproject.toml` with `scikit-build-core` as the build backend.
- The `wheel.cmake = true` flag in `pyproject.toml` controls whether CMake runs. The env var `SKBUILD_WHEEL_CMAKE=false` overrides it at install time without modifying the file.
- For non-editable full builds, the `.so` files are bundled inside the wheel automatically — no symlink needed.
- For fast installs, `pymllm.is_mobile_available()` will return `False` since no C++ libraries are present. This is expected.
- The `[cuda]` optional dependencies are defined in `pyproject.toml` under `[project.optional-dependencies]`.
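
For orientation, the relevant `pyproject.toml` pieces look roughly like this. This is an illustrative sketch only — the exact keys and the contents of the `cuda` extra in the repo may differ:

```toml
[build-system]
requires = ["scikit-build-core"]
build-backend = "scikit_build_core.build"

[tool.scikit-build]
wheel.cmake = true   # overridden at install time by SKBUILD_WHEEL_CMAKE=false

[project.optional-dependencies]
cuda = ["flashinfer-python", "tilelang", "pyzmq"]  # illustrative package list
```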
83 changes: 83 additions & 0 deletions .claude/skills/link-pymllm-lib/SKILL.md
@@ -0,0 +1,83 @@
---
name: link-pymllm-lib
description: Create or update the pymllm/lib symlink to point to a C++ build directory's bin/ folder. Required after editable installs with C++ builds so that Python can find the compiled .so libraries. Use when the user asks to link, fix, or set up pymllm native libraries.
---

# Link pymllm lib

## Goal

Create a symlink at `pymllm/lib` pointing to the correct build output directory so that an editable-installed pymllm can load the compiled C++ shared libraries (`MllmFFIExtension.so`, `libMllmRT.so`, etc.).

## Background

When pymllm is installed in editable mode (`pip install -e .`), Python imports from the source tree directly. The C++ libraries are compiled into `<build-dir>/bin/` by CMake, but pymllm looks for them at `pymllm/lib/`. A symlink bridges this gap:

```
pymllm/lib -> <project-root>/<build-dir>/bin
```

## Workflow

### Step 1: Detect available build directories

Scan the project root for directories matching the pattern `build*/bin/` that contain `MllmFFIExtension.so` (or `.dylib` on macOS). List all valid candidates.

Common build directories and their corresponding platforms:

| Build directory | Platform / Config | Typical build command |
|----------------|-------------------|----------------------|
| `build/bin` | X86 CPU only | `python task.py tasks/build_x86.yaml` |
| `build-x86-cuda/bin` | X86 + CUDA | `python task.py tasks/build_x86_cuda.yaml` |
| `build-qnn-aot/bin` | X86 + QNN AOT | `python task.py tasks/build_x86_qnn_aot.yaml` |
| `build-android-arm64-v8a-qnn/bin` | Android ARM + QNN | `python task.py tasks/build_android_qnn.yaml` |
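
The detection step can be sketched in a few lines of Python (`find_build_dirs` is a hypothetical helper name, not project API):

```python
import glob
import os

def find_build_dirs(project_root: str) -> list[str]:
    """Return build*/bin directories that contain the compiled FFI extension."""
    found = set()
    for lib in ("MllmFFIExtension.so", "MllmFFIExtension.dylib"):  # .dylib on macOS
        for path in glob.glob(os.path.join(project_root, "build*", "bin", lib)):
            found.add(os.path.dirname(path))
    return sorted(found)
```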

### Step 2: Ask the user which build to link

Use `AskUserQuestion` to let the user pick from the detected build directories. Show each option with its path and the platform it corresponds to.

If no build directories with `.so` files are found, inform the user they need to build first:

```bash
pip install -r requirements.txt
python task.py tasks/build_x86.yaml # or another build task
```

### Step 3: Check existing symlink

Before creating a new symlink, check if `pymllm/lib` already exists:

- If it's a symlink, show where it currently points and confirm replacement.
- If it's a real directory, warn the user and ask before removing it.
- If it doesn't exist, proceed directly.
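
A sketch of that pre-check (`classify_lib_path` is an illustrative name). Note the design point: `os.path.islink` must be tested before `os.path.isdir`, because a symlink to a directory satisfies both:

```python
import os

def classify_lib_path(lib_path: str) -> str:
    """Classify pymllm/lib before replacing it."""
    if os.path.islink(lib_path):
        return "symlink -> " + os.readlink(lib_path)  # confirm replacement
    if os.path.isdir(lib_path):
        return "real-directory"                       # warn before removing
    return "absent"                                   # safe to create
```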

### Step 4: Create the symlink

```bash
ln -sfn <project-root>/<build-dir>/bin <project-root>/pymllm/lib
```

Use `ln -sfn` to atomically replace any existing symlink.

### Step 5: Verify

After creating the symlink, verify by checking that the target `.so` file is accessible:

```bash
ls -la pymllm/lib/MllmFFIExtension.so
```

Then run a quick Python check:

```bash
python -c "import pymllm; print('mobile available:', pymllm.is_mobile_available())"
```

If `is_mobile_available()` returns `True`, the link is correct.

## Important Notes

- The symlink target must be an **absolute path** for reliability.
- On macOS, the library extension is `.dylib` instead of `.so`.
- Android build directories (e.g., `build-android-arm64-v8a-qnn/bin`) contain ARM binaries that cannot run on x86 hosts. Warn the user if they select one of these on a non-ARM machine.
- If the user has multiple build directories, they can re-run this skill anytime to switch which build pymllm uses.
44 changes: 44 additions & 0 deletions .claude/skills/update-codeowners/SKILL.md
@@ -0,0 +1,44 @@
---
name: update-codeowners
description: Updates CODEOWNERS entries safely with consistent path and owner formatting. Use when the user asks to add, remove, or modify CODEOWNERS rules, ownership mappings, reviewers, or module maintainers.
---

# Update CODEOWNERS

## Goal
Maintain `CODEOWNERS` accurately while preserving the repository's existing section/comment style.

## Workflow
1. Read the current `CODEOWNERS` file before editing.
2. Identify requested changes as one of:
- Add new path rule
- Modify owners for existing path rule
- Remove obsolete path rule
- Reorganize section comments (only if requested)
3. Update rules in place instead of creating duplicates for the same path.
4. Keep existing section headers and comment style unless the user asks to refactor structure.
5. Return a concise changelog describing which paths were added, changed, or removed.

## Rule Format
- Use one rule per line: `<path-pattern> <owner1> <owner2> ...`
- Owners must be GitHub handles prefixed with `@`.
- Keep path style consistent with the file (in this repo, path patterns typically start with `/`).
- Do not leave rules with empty owner lists.

## Editing Guidelines
- Prefer minimal edits near related sections.
- If a path already exists, update that line instead of adding a second conflicting line.
- If a new rule logically belongs to an existing section, place it in that section.
- Preserve human-readable grouping and blank lines.
- Keep comments intact unless they are clearly outdated and the user asked for cleanup.

## Validation Checklist
- [ ] Every non-comment, non-empty line has at least one owner.
- [ ] Every owner token starts with `@`.
- [ ] No accidental duplicate rule for the exact same path pattern.
- [ ] Existing comments/sections were preserved unless explicitly changed.
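
The checklist above can be mechanized; a minimal sketch (a hypothetical helper, not part of the repo):

```python
def validate_codeowners(text: str) -> list[str]:
    """Check each rule line: at least one '@'-prefixed owner, no duplicate paths."""
    errors, seen_paths = [], set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are exempt
        path, *owners = line.split()
        if not owners:
            errors.append(f"line {lineno}: no owners for {path}")
        errors.extend(
            f"line {lineno}: owner {o!r} does not start with '@'"
            for o in owners if not o.startswith("@")
        )
        if path in seen_paths:
            errors.append(f"line {lineno}: duplicate rule for {path}")
        seen_paths.add(path)
    return errors
```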

## Example Requests
- "Add `/mllm/models/new_model/ @alice @bob` under models."
- "Change `/core/Storage` owner to `@team-core`."
- "Remove ownership rule for deprecated path `/legacy/`."
2 changes: 1 addition & 1 deletion .codespellrc
@@ -1,3 +1,3 @@
[codespell]
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao, flashinfer
skip = *.json,*.jsonl,*.patch,*.txt
4 changes: 2 additions & 2 deletions .gitignore
@@ -4,7 +4,7 @@
.cache/
.tmp/
compile_commands.json
.claude/
settings.local.json

# MLLM Team Specific
tasks/mllmteam*
@@ -13,7 +13,7 @@ tasks/mllmteam*

# Building files and binary
build*/
install*/
/install*/
mllm-sdk-*/
mllm-install-*/

9 changes: 9 additions & 0 deletions README.md
@@ -308,6 +308,15 @@ mllm provides a set of model converters to convert models from other popular mod
bash ./scripts/install_pymllm.sh
```

> **Tip for CUDA-only users:** If you only use CUDA backends (e.g., FlashInfer, TileLang) and do not need the C++ mllm runtime, you can skip the CMake build to speed up installation significantly:
>
> ```shell
> SKBUILD_WHEEL_CMAKE=false pip install -e .
> pip install "pymllm[cuda]"
> ```
>
> This installs only the pure Python package without compiling the C++ components.

**future:**

Once PyPI approves the creation of the mllm organization, we will publish the package there. Afterwards, you will be able to install it with the command below.
Binary file added assets/pymllm-arch.png
11 changes: 11 additions & 0 deletions docs/index.rst
@@ -246,6 +246,17 @@ mllm provides a set of model converters to convert models from other popular mod

bash ./scripts/install_pymllm.sh

.. tip::

**For CUDA-only users:** If you only use CUDA backends (e.g., FlashInfer, TileLang) and do not need the C++ mllm runtime, you can skip the CMake build to speed up installation significantly:

.. code-block:: shell

SKBUILD_WHEEL_CMAKE=false pip install -e .
pip install "pymllm[cuda]"

This installs only the pure Python package without compiling the C++ components.

**future:**

Once PyPI approves the creation of the mllm organization, we will publish the package there. Afterwards, you will be able to install it with the command below.
5 changes: 5 additions & 0 deletions docs/qnn_backend/aot_execute.rst
@@ -60,6 +60,10 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
pip install -e .

# Link the built libs into pymllm's dir so that TVM FFI can find them.
#
# NOTE: build the x86 Qualcomm AOT target first!
source <absolute path to where you install qnn>/bin/envsetup.sh
python task.py tasks/build_x86_qnn_aot.yaml
ln -s <absolute path to where you build mllm>/bin/ mllm/pymllm/lib


@@ -82,6 +86,7 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
.. code-block:: shell

# In the mllm-v2 project root directory
source <absolute path to where you install qnn>/bin/envsetup.sh
python task.py tasks/build_x86_qnn_aot.yaml

# Run the compiler program
1 change: 1 addition & 0 deletions mllm-kernel/.gitignore
@@ -3,3 +3,4 @@ build-py/
.vscode/settings.json
compile_commands.json
.clangd
.pytest_cache/
33 changes: 16 additions & 17 deletions mllm-kernel/README.md
@@ -80,31 +80,30 @@ y = add_constant(x, 8)

Use the helpers in `mllm_kernel.jit_utils`:

- `load_cpu_jit`
- `load_cuda_jit`
- `jit`
- `make_cpp_args`
- `cache_once`

Recommended pattern (CPU example):

```python
import torch

import mllm_kernel


@mllm_kernel.jit(
    args=16,
    device="cpu",
    cpp_files=["my_kernel.cpp"],
    cpp_wrappers=[("my_kernel", "my_namespace::my_kernel<16>")],
    func_name="my_kernel",
)
def _my_kernel_16(compiled_module, dst: torch.Tensor, src: torch.Tensor) -> None:
    compiled_module.my_kernel(dst, src)


def my_kernel(src: torch.Tensor, param: int) -> torch.Tensor:
    if param != 16:
        raise ValueError("This demo only supports param=16.")
    dst = torch.empty_like(src)
    _my_kernel_16(dst, src)
    return dst
```

Expand Down