Merged
Commits
22 commits
cfac59e
feat(mllm_kernel): simplify JIT usage in README and update kernel exa…
chenghuaWang Feb 17, 2026
8f3485a
feat: update dependencies and refactor mobile module structure
chenghuaWang Feb 18, 2026
abf1fa4
feat: enhance configuration management and update dependencies
chenghuaWang Feb 18, 2026
ec71258
feat: add main entry points and configuration for pymllm and mllm-kernel
chenghuaWang Feb 18, 2026
731ea71
feat: enhance layer implementations and add new components
chenghuaWang Feb 19, 2026
02255d8
feat: add initial files for pymllm architecture and launch functionality
chenghuaWang Feb 19, 2026
d40396c
Merge branch 'UbiquitousLearning:main' into wch-main
chenghuaWang Feb 20, 2026
28b75fb
feat: update dependencies and enhance configuration structure
chenghuaWang Feb 21, 2026
6c4aa44
feat: implement store_cache functionality and related components
chenghuaWang Feb 21, 2026
e5e1b78
refactor: improve socket initialization in TokenizerProcess
chenghuaWang Feb 21, 2026
73fe4fd
Merge branch 'UbiquitousLearning:main' into wch-main
chenghuaWang Feb 27, 2026
65f00b4
feat(engine): support batch generation and enable shared memory queue…
chenghuaWang Feb 27, 2026
b057360
feat(mllm-kernel): add high-performance create_kv_indices CUDA kernel…
chenghuaWang Mar 2, 2026
9bc959f
feat(sampling): add sampling module with FlashInfer acceleration and …
chenghuaWang Mar 2, 2026
2cf50f4
feat(cuda): add fused GDN decode and RMSNorm+SiLU gating kernels for …
chenghuaWang Mar 9, 2026
31b0ff9
fix(attention): refine FlashInfer backend logic and improve RadixCach…
chenghuaWang Mar 17, 2026
4d3d0c5
refactor: improve code readability and structure across multiple modules
chenghuaWang Mar 17, 2026
9c27ad8
chore: update installation instructions and add new skills for pymllm
chenghuaWang Mar 17, 2026
0327b8d
refactor: enhance installation instructions and improve cache management
chenghuaWang Mar 17, 2026
0c31ee9
refactor: enhance configuration management and improve process health…
chenghuaWang Mar 18, 2026
6835ab8
feat(mllm-kernel): introduce new Marlin kernel implementations for ef…
chenghuaWang Mar 18, 2026
b3fb68c
feat(quantization): implement quantization configuration loading and …
chenghuaWang Mar 18, 2026
486 changes: 486 additions & 0 deletions .claude/skills/impl-jit-kernel/SKILL.md

Large diffs are not rendered by default.

73 changes: 73 additions & 0 deletions .claude/skills/install-pymllm/SKILL.md
@@ -0,0 +1,73 @@
---
name: install-pymllm
description: Install the pymllm Python package. Asks the user whether to do a full build (with CMake C++ compilation) or a fast install (Python-only, skip CMake). Use when the user asks to install, set up, or reinstall pymllm.
---

# Install pymllm

## Goal

Help the user install the `pymllm` package with the right configuration for their use case.

## Workflow

### Step 1: Ask the user which install mode they want

Use `AskUserQuestion` to present two options:

**Full Install (with C++ build)**
- Compiles the C++ mllm runtime and FFI extension via CMake
- Required if the user needs mobile inference, model conversion with FFI, or CPU/QNN backends
- Slower (several minutes depending on the machine)
- Command: `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall`

**Fast Install (Python-only, skip CMake)**
- Skips the entire CMake build step
- Only installs the pure Python package
- Recommended for users who only use CUDA backends (FlashInfer, TileLang) and do not need the C++ mllm runtime
- Much faster (seconds)
- Command: `SKBUILD_WHEEL_CMAKE=false pip install -e .`

### Step 2: Ask editable or non-editable

Use `AskUserQuestion` to ask:

- **Editable (`pip install -e .`)**: For active development. Python imports point to the source tree. Changes to `.py` files take effect immediately without reinstalling.
- **Non-editable (wheel)**: For stable usage. Installs a wheel into site-packages.

### Step 3: Ask whether the user needs CUDA optional dependencies

Use `AskUserQuestion` to ask whether the user needs CUDA support (FlashInfer, TileLang, pyzmq, etc.).

This determines whether to append `[cuda]` to the install specifier (e.g. `pip install -e ".[cuda]"` instead of `pip install -e .`).

**This applies to ALL install modes.** For fast-install users this is especially important since the CUDA packages are the primary compute backend.

### Step 4: Execute the install

Based on user choices, compose and run the appropriate command. The install specifier is either `.` or `".[cuda]"` depending on Step 3.

| Mode | Editable | CUDA | Command |
|------|----------|------|---------|
| Full | Yes | No | `pip install -v -e .` |
| Full | Yes | Yes | `pip install -v -e ".[cuda]"` |
| Full | No | No | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` |
| Full | No | Yes | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` |
| Fast | Yes | No | `SKBUILD_WHEEL_CMAKE=false pip install -e .` |
| Fast | Yes | Yes | `SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"` |
| Fast | No | No | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` |
| Fast | No | Yes | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` |
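
The decision tree above can be sketched as a small helper. This is a simplified illustration only (`compose_install_cmd` is a hypothetical name, not a project script); verbosity flags and the trailing `pip install "pymllm[cuda]"` step for non-editable CUDA installs are omitted:

```shell
# Sketch: compose the base install command from the three answers.
compose_install_cmd() {
  local mode="$1" editable="$2" cuda="$3"   # full|fast, yes|no, yes|no
  local prefix="" spec="."
  [ "$mode" = "fast" ] && prefix="SKBUILD_WHEEL_CMAKE=false "
  [ "$cuda" = "yes" ] && spec='".[cuda]"'
  if [ "$editable" = "yes" ]; then
    echo "${prefix}pip install -e ${spec}"
  else
    echo "${prefix}pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall"
  fi
}

compose_install_cmd fast yes yes
# -> SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"
```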

### Step 5: Post-install for editable + full build

If the user chose **editable + full build**, the compiled `.so` files live in a build directory (e.g. `build/bin/`), not in the source tree. The Python code at `pymllm/__init__.py` looks for libraries at `pymllm/lib/MllmFFIExtension.so`. A symlink is needed to bridge this gap.

**Invoke the `/link-pymllm-lib` skill** to help the user set up the symlink.

## Important Notes

- The project root must contain `pyproject.toml` with `scikit-build-core` as the build backend.
- The `wheel.cmake = true` flag in `pyproject.toml` controls whether CMake runs. The env var `SKBUILD_WHEEL_CMAKE=false` overrides it at install time without modifying the file.
- For non-editable full builds, the `.so` files are bundled inside the wheel automatically — no symlink needed.
- For fast installs, `pymllm.is_mobile_available()` will return `False` since no C++ libraries are present. This is expected.
- The `[cuda]` optional dependencies are defined in `pyproject.toml` under `[project.optional-dependencies]`.
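
For orientation, the relevant `pyproject.toml` pieces look roughly like this. This is an illustrative sketch only — the exact keys and the contents of the `cuda` extra in the repo may differ:

```toml
[build-system]
requires = ["scikit-build-core"]
build-backend = "scikit_build_core.build"

[tool.scikit-build]
wheel.cmake = true   # overridden at install time by SKBUILD_WHEEL_CMAKE=false

[project.optional-dependencies]
cuda = ["flashinfer-python", "tilelang", "pyzmq"]  # illustrative package list
```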
83 changes: 83 additions & 0 deletions .claude/skills/link-pymllm-lib/SKILL.md
@@ -0,0 +1,83 @@
---
name: link-pymllm-lib
description: Create or update the pymllm/lib symlink to point to a C++ build directory's bin/ folder. Required after editable installs with C++ builds so that Python can find the compiled .so libraries. Use when the user asks to link, fix, or set up pymllm native libraries.
---

# Link pymllm lib

## Goal

Create a symlink at `pymllm/lib` pointing to the correct build output directory so that an editable-installed pymllm can load the compiled C++ shared libraries (`MllmFFIExtension.so`, `libMllmRT.so`, etc.).

## Background

When pymllm is installed in editable mode (`pip install -e .`), Python imports from the source tree directly. The C++ libraries are compiled into `<build-dir>/bin/` by CMake, but pymllm looks for them at `pymllm/lib/`. A symlink bridges this gap:

```
pymllm/lib -> <project-root>/<build-dir>/bin
```

## Workflow

### Step 1: Detect available build directories

Scan the project root for directories matching the pattern `build*/bin/` that contain `MllmFFIExtension.so` (or `.dylib` on macOS). List all valid candidates.

Common build directories and their corresponding platforms:

| Build directory | Platform / Config | Typical build command |
|----------------|-------------------|----------------------|
| `build/bin` | X86 CPU only | `python task.py tasks/build_x86.yaml` |
| `build-x86-cuda/bin` | X86 + CUDA | `python task.py tasks/build_x86_cuda.yaml` |
| `build-qnn-aot/bin` | X86 + QNN AOT | `python task.py tasks/build_x86_qnn_aot.yaml` |
| `build-android-arm64-v8a-qnn/bin` | Android ARM + QNN | `python task.py tasks/build_android_qnn.yaml` |
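
The detection step can be sketched in a few lines of Python (`find_build_dirs` is a hypothetical helper name, not project API):

```python
import glob
import os

def find_build_dirs(project_root: str) -> list[str]:
    """Return build*/bin directories that contain the compiled FFI extension."""
    found = set()
    for lib in ("MllmFFIExtension.so", "MllmFFIExtension.dylib"):  # .dylib on macOS
        for path in glob.glob(os.path.join(project_root, "build*", "bin", lib)):
            found.add(os.path.dirname(path))
    return sorted(found)
```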

### Step 2: Ask the user which build to link

Use `AskUserQuestion` to let the user pick from the detected build directories. Show each option with its path and the platform it corresponds to.

If no build directories with `.so` files are found, inform the user they need to build first:

```bash
pip install -r requirements.txt
python task.py tasks/build_x86.yaml # or another build task
```

### Step 3: Check existing symlink

Before creating a new symlink, check if `pymllm/lib` already exists:

- If it's a symlink, show where it currently points and confirm replacement.
- If it's a real directory, warn the user and ask before removing it.
- If it doesn't exist, proceed directly.
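
A sketch of that pre-check (`classify_lib_path` is an illustrative name). Note the design point: `os.path.islink` must be tested before `os.path.isdir`, because a symlink to a directory satisfies both:

```python
import os

def classify_lib_path(lib_path: str) -> str:
    """Classify pymllm/lib before replacing it."""
    if os.path.islink(lib_path):
        return "symlink -> " + os.readlink(lib_path)  # confirm replacement
    if os.path.isdir(lib_path):
        return "real-directory"                       # warn before removing
    return "absent"                                   # safe to create
```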

### Step 4: Create the symlink

```bash
ln -sfn <project-root>/<build-dir>/bin <project-root>/pymllm/lib
```

Use `ln -sfn` to atomically replace any existing symlink.

### Step 5: Verify

After creating the symlink, verify by checking that the target `.so` file is accessible:

```bash
ls -la pymllm/lib/MllmFFIExtension.so
```

Then run a quick Python check:

```bash
python -c "import pymllm; print('mobile available:', pymllm.is_mobile_available())"
```

If `is_mobile_available()` returns `True`, the link is correct.

## Important Notes

- The symlink target must be an **absolute path** for reliability.
- On macOS, the library extension is `.dylib` instead of `.so`.
- Android build directories (e.g., `build-android-arm64-v8a-qnn/bin`) contain ARM binaries that cannot run on x86 hosts. Warn the user if they select one of these on a non-ARM machine.
- If the user has multiple build directories, they can re-run this skill anytime to switch which build pymllm uses.
44 changes: 44 additions & 0 deletions .claude/skills/update-codeowners/SKILL.md
@@ -0,0 +1,44 @@
---
name: update-codeowners
description: Updates CODEOWNERS entries safely with consistent path and owner formatting. Use when the user asks to add, remove, or modify CODEOWNERS rules, ownership mappings, reviewers, or module maintainers.
---

# Update CODEOWNERS

## Goal
Maintain `CODEOWNERS` accurately while preserving the repository's existing section/comment style.

## Workflow
1. Read the current `CODEOWNERS` file before editing.
2. Identify requested changes as one of:
- Add new path rule
- Modify owners for existing path rule
- Remove obsolete path rule
- Reorganize section comments (only if requested)
3. Update rules in place instead of creating duplicates for the same path.
4. Keep existing section headers and comment style unless the user asks to refactor structure.
5. Return a concise changelog describing which paths were added, changed, or removed.

## Rule Format
- Use one rule per line: `<path-pattern> <owner1> <owner2> ...`
- Owners must be GitHub handles prefixed with `@`.
- Keep path style consistent with the file (in this repo, path patterns typically start with `/`).
- Do not leave rules with empty owner lists.

## Editing Guidelines
- Prefer minimal edits near related sections.
- If a path already exists, update that line instead of adding a second conflicting line.
- If a new rule logically belongs to an existing section, place it in that section.
- Preserve human-readable grouping and blank lines.
- Keep comments intact unless they are clearly outdated and the user asked for cleanup.

## Validation Checklist
- [ ] Every non-comment, non-empty line has at least one owner.
- [ ] Every owner token starts with `@`.
- [ ] No accidental duplicate rule for the exact same path pattern.
- [ ] Existing comments/sections were preserved unless explicitly changed.
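
The checklist above can be mechanized; a minimal sketch (a hypothetical helper, not part of the repo):

```python
def validate_codeowners(text: str) -> list[str]:
    """Check each rule line: at least one '@'-prefixed owner, no duplicate paths."""
    errors, seen_paths = [], set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are exempt
        path, *owners = line.split()
        if not owners:
            errors.append(f"line {lineno}: no owners for {path}")
        errors.extend(
            f"line {lineno}: owner {o!r} does not start with '@'"
            for o in owners if not o.startswith("@")
        )
        if path in seen_paths:
            errors.append(f"line {lineno}: duplicate rule for {path}")
        seen_paths.add(path)
    return errors
```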

## Example Requests
- "Add `/mllm/models/new_model/ @alice @bob` under models."
- "Change `/core/Storage` owner to `@team-core`."
- "Remove ownership rule for deprecated path `/legacy/`."
2 changes: 1 addition & 1 deletion .codespellrc
@@ -1,3 +1,3 @@
[codespell]
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao, flashinfer
skip = *.json,*.jsonl,*.patch,*.txt
4 changes: 2 additions & 2 deletions .gitignore
@@ -4,7 +4,7 @@
.cache/
.tmp/
compile_commands.json
.claude/
settings.local.json

# MLLM Team Specific
tasks/mllmteam*
@@ -13,7 +13,7 @@ tasks/mllmteam*

# Building files and binary
build*/
install*/
/install*/
mllm-sdk-*/
mllm-install-*/

9 changes: 9 additions & 0 deletions README.md
@@ -308,6 +308,15 @@ mllm provides a set of model converters to convert models from other popular mod
bash ./scripts/install_pymllm.sh
```

> **Tip for CUDA-only users:** If you only use CUDA backends (e.g., FlashInfer, TileLang) and do not need the C++ mllm runtime, you can skip the CMake build to speed up installation significantly:
>
> ```shell
> SKBUILD_WHEEL_CMAKE=false pip install -e .
> pip install "pymllm[cuda]"
> ```
>
> This installs only the pure Python package without compiling the C++ components.

**future:**

Once PyPI approves the creation of the mllm organization, we will publish the package there. Afterwards, you will be able to install it with the command below.
Binary file added assets/pymllm-arch.png
11 changes: 11 additions & 0 deletions docs/index.rst
@@ -246,6 +246,17 @@ mllm provides a set of model converters to convert models from other popular mod

bash ./scripts/install_pymllm.sh

.. tip::

**For CUDA-only users:** If you only use CUDA backends (e.g., FlashInfer, TileLang) and do not need the C++ mllm runtime, you can skip the CMake build to speed up installation significantly:

.. code-block:: shell

SKBUILD_WHEEL_CMAKE=false pip install -e .
pip install "pymllm[cuda]"

This installs only the pure Python package without compiling the C++ components.

**future:**

Once PyPI approves the creation of the mllm organization, we will publish the package there. Afterwards, you will be able to install it with the command below.
5 changes: 5 additions & 0 deletions docs/qnn_backend/aot_execute.rst
@@ -60,6 +60,10 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
pip install -e .

# Link the built libs into pymllm's dir so that TVM FFI can find them.
#
# NOTE: build the x86 Qualcomm AOT target first!
source <absolute path to where you install qnn>/bin/envsetup.sh
python task.py tasks/build_x86_qnn_aot.yaml
ln -s <absolute path to where you build mllm>/bin/ mllm/pymllm/lib


@@ -82,6 +86,7 @@ Taking ``qwen3_qnn_aot`` as an example, the detailed steps are as follows.
.. code-block:: shell

# In the mllm-v2 project root directory
source <absolute path to where you install qnn>/bin/envsetup.sh
python task.py tasks/build_x86_qnn_aot.yaml

# Run the compiler program
1 change: 1 addition & 0 deletions mllm-kernel/.gitignore
@@ -3,3 +3,4 @@ build-py/
.vscode/settings.json
compile_commands.json
.clangd
.pytest_cache/
33 changes: 16 additions & 17 deletions mllm-kernel/README.md
@@ -80,31 +80,30 @@ y = add_constant(x, 8)

Use the helpers in `mllm_kernel.jit_utils`:

- `load_cpu_jit`
- `load_cuda_jit`
- `jit`
- `make_cpp_args`
- `cache_once`

Recommended pattern (CPU example):

```python
import torch

import mllm_kernel


@mllm_kernel.jit(
    args=16,
    device="cpu",
    cpp_files=["my_kernel.cpp"],
    cpp_wrappers=[("my_kernel", "my_namespace::my_kernel<16>")],
    func_name="my_kernel",
)
def _my_kernel_16(compiled_module, dst: torch.Tensor, src: torch.Tensor) -> None:
    compiled_module.my_kernel(dst, src)


def my_kernel(src: torch.Tensor, param: int) -> torch.Tensor:
    if param != 16:
        raise ValueError("This demo only supports param=16.")
    dst = torch.empty_like(src)
    _my_kernel_16(dst, src)
    return dst
```

Expand Down