feat(pymllm): VocabParallelEmbedding & pymllm's cuda infra init #640
Merged
22 commits
cfac59e feat(mllm_kernel): simplify JIT usage in README and update kernel exa… (chenghuaWang)
8f3485a feat: update dependencies and refactor mobile module structure (chenghuaWang)
abf1fa4 feat: enhance configuration management and update dependencies (chenghuaWang)
ec71258 feat: add main entry points and configuration for pymllm and mllm-kernel (chenghuaWang)
731ea71 feat: enhance layer implementations and add new components (chenghuaWang)
02255d8 feat: add initial files for pymllm architecture and launch functionality (chenghuaWang)
d40396c Merge branch 'UbiquitousLearning:main' into wch-main (chenghuaWang)
28b75fb feat: update dependencies and enhance configuration structure (chenghuaWang)
6c4aa44 feat: implement store_cache functionality and related components (chenghuaWang)
e5e1b78 refactor: improve socket initialization in TokenizerProcess (chenghuaWang)
73fe4fd Merge branch 'UbiquitousLearning:main' into wch-main (chenghuaWang)
65f00b4 feat(engine): support batch generation and enable shared memory queue… (chenghuaWang)
b057360 feat(mllm-kernel): add high-performance create_kv_indices CUDA kernel… (chenghuaWang)
9bc959f feat(sampling): add sampling module with FlashInfer acceleration and … (chenghuaWang)
2cf50f4 feat(cuda): add fused GDN decode and RMSNorm+SiLU gating kernels for … (chenghuaWang)
31b0ff9 fix(attention): refine FlashInfer backend logic and improve RadixCach… (chenghuaWang)
4d3d0c5 refactor: improve code readability and structure across multiple modules (chenghuaWang)
9c27ad8 chore: update installation instructions and add new skills for pymllm (chenghuaWang)
0327b8d refactor: enhance installation instructions and improve cache management (chenghuaWang)
0c31ee9 refactor: enhance configuration management and improve process health… (chenghuaWang)
6835ab8 feat(mllm-kernel): introduce new Marlin kernel implementations for ef… (chenghuaWang)
b3fb68c feat(quantization): implement quantization configuration loading and … (chenghuaWang)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| --- | ||
| name: install-pymllm | ||
| description: Install the pymllm Python package. Asks the user whether to do a full build (with CMake C++ compilation) or a fast install (Python-only, skip CMake). Use when the user asks to install, set up, or reinstall pymllm. | ||
| --- | ||
|
|
||
| # Install pymllm | ||
|
|
||
| ## Goal | ||
|
|
||
| Help the user install the `pymllm` package with the right configuration for their use case. | ||
|
|
||
| ## Workflow | ||
|
|
||
| ### Step 1: Ask the user which install mode they want | ||
|
|
||
| Use `AskUserQuestion` to present two options: | ||
|
|
||
| **Full Install (with C++ build)** | ||
| - Compiles the C++ mllm runtime and FFI extension via CMake | ||
| - Required if the user needs mobile inference, model conversion with FFI, or CPU/QNN backends | ||
| - Slower (several minutes depending on the machine) | ||
| - Command: `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` | ||
|
|
||
| **Fast Install (Python-only, skip CMake)** | ||
| - Skips the entire CMake build step | ||
| - Only installs the pure Python package | ||
| - Recommended for users who only use CUDA backends (FlashInfer, TileLang) and do not need the C++ mllm runtime | ||
| - Much faster (seconds) | ||
| - Command: `SKBUILD_WHEEL_CMAKE=false pip install -e .` | ||
|
|
||
| ### Step 2: Ask whether the install should be editable or non-editable | ||
|
|
||
| Use `AskUserQuestion` to ask: | ||
|
|
||
| - **Editable (`pip install -e .`)**: For active development. Python imports point to the source tree. Changes to `.py` files take effect immediately without reinstalling. | ||
| - **Non-editable (wheel)**: For stable usage. Installs a wheel into site-packages. | ||
|
|
||
| ### Step 3: Ask whether the user needs CUDA optional dependencies | ||
|
|
||
| Use `AskUserQuestion` to ask whether the user needs CUDA support (FlashInfer, TileLang, pyzmq, etc.). | ||
|
|
||
| This determines whether to append `[cuda]` to the install specifier (e.g. `pip install -e ".[cuda]"` instead of `pip install -e .`). | ||
|
|
||
| **This applies to ALL install modes.** For fast-install users this is especially important since the CUDA packages are the primary compute backend. | ||
|
|
||
| ### Step 4: Execute the install | ||
|
|
||
| Based on user choices, compose and run the appropriate command. The install specifier is either `.` or `".[cuda]"` depending on Step 3. | ||
|
|
||
| | Mode | Editable | CUDA | Command | | ||
| |------|----------|------|---------| | ||
| | Full | Yes | No | `pip install -v -e .` | | ||
| | Full | Yes | Yes | `pip install -v -e ".[cuda]"` | | ||
| | Full | No | No | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` | | ||
| | Full | No | Yes | `pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` | | ||
| | Fast | Yes | No | `SKBUILD_WHEEL_CMAKE=false pip install -e .` | | ||
| | Fast | Yes | Yes | `SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"` | | ||
| | Fast | No | No | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall` | | ||
| | Fast | No | Yes | `SKBUILD_WHEEL_CMAKE=false pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall && pip install "pymllm[cuda]"` | | ||
|
|
||
| ### Step 5: Post-install for editable + full build | ||
|
|
||
| If the user chose **editable + full build**, the compiled `.so` files live in a build directory (e.g. `build/bin/`), not in the source tree. The Python code at `pymllm/__init__.py` looks for libraries at `pymllm/lib/MllmFFIExtension.so`. A symlink is needed to bridge this gap. | ||
|
|
||
| **Invoke the `/link-pymllm-lib` skill** to help the user set up the symlink. | ||
|
|
||
| ## Important Notes | ||
|
|
||
| - The project root must contain `pyproject.toml` with `scikit-build-core` as the build backend. | ||
| - The `wheel.cmake = true` flag in `pyproject.toml` controls whether CMake runs. The env var `SKBUILD_WHEEL_CMAKE=false` overrides it at install time without modifying the file. | ||
| - For non-editable full builds, the `.so` files are bundled inside the wheel automatically — no symlink needed. | ||
| - For fast installs, `pymllm.is_mobile_available()` will return `False` since no C++ libraries are present. This is expected. | ||
| - The `[cuda]` optional dependencies are defined in `pyproject.toml` under `[project.optional-dependencies]`. |
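
The Step 4 table of `install-pymllm` can be read as a small function of the three answers (mode, editable, CUDA). The sketch below is illustrative only: `compose_install_command` is a hypothetical helper, not part of pymllm or of this PR. It places `-v` before `-e` because pip's `-e` option takes the path as its argument.

```python
def compose_install_command(mode: str, editable: bool, cuda: bool) -> str:
    """Build the shell command for one row of the Step 4 table.

    Hypothetical helper for illustration; not part of pymllm.
    """
    if mode not in ("full", "fast"):
        raise ValueError(f"unknown install mode: {mode!r}")

    # Fast installs skip CMake by overriding wheel.cmake through the environment.
    prefix = "SKBUILD_WHEEL_CMAKE=false " if mode == "fast" else ""

    if editable:
        spec = '".[cuda]"' if cuda else "."
        # pip's -e consumes the next argument as the path, so -v goes first.
        verbose = "-v " if mode == "full" else ""
        return f"{prefix}pip install {verbose}-e {spec}"

    # Non-editable: build a wheel, install it, then pull in the CUDA extra.
    cmd = f"{prefix}pip wheel -v -w dist . && pip install dist/*.whl --force-reinstall"
    if cuda:
        cmd += ' && pip install "pymllm[cuda]"'
    return cmd
```

For example, `compose_install_command("fast", True, True)` yields the Fast / editable / CUDA row: `SKBUILD_WHEEL_CMAKE=false pip install -e ".[cuda]"`.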
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| --- | ||
| name: link-pymllm-lib | ||
| description: Create or update the pymllm/lib symlink to point to a C++ build directory's bin/ folder. Required after editable installs with C++ builds so that Python can find the compiled .so libraries. Use when the user asks to link, fix, or set up pymllm native libraries. | ||
| --- | ||
|
|
||
| # Link pymllm lib | ||
|
|
||
| ## Goal | ||
|
|
||
| Create a symlink at `pymllm/lib` pointing to the correct build output directory so that an editable-installed pymllm can load the compiled C++ shared libraries (`MllmFFIExtension.so`, `libMllmRT.so`, etc.). | ||
|
|
||
| ## Background | ||
|
|
||
| When pymllm is installed in editable mode (`pip install -e .`), Python imports from the source tree directly. The C++ libraries are compiled into `<build-dir>/bin/` by CMake, but pymllm looks for them at `pymllm/lib/`. A symlink bridges this gap: | ||
|
|
||
| ``` | ||
| pymllm/lib -> <project-root>/<build-dir>/bin | ||
| ``` | ||
|
|
||
| ## Workflow | ||
|
|
||
| ### Step 1: Detect available build directories | ||
|
|
||
| Scan the project root for directories matching the pattern `build*/bin/` that contain `MllmFFIExtension.so` (or `.dylib` on macOS). List all valid candidates. | ||
|
|
||
| Common build directories and their corresponding platforms: | ||
|
|
||
| | Build directory | Platform / Config | Typical build command | | ||
| |----------------|-------------------|----------------------| | ||
| | `build/bin` | X86 CPU only | `python task.py tasks/build_x86.yaml` | | ||
| | `build-x86-cuda/bin` | X86 + CUDA | `python task.py tasks/build_x86_cuda.yaml` | | ||
| | `build-qnn-aot/bin` | X86 + QNN AOT | `python task.py tasks/build_x86_qnn_aot.yaml` | | ||
| | `build-android-arm64-v8a-qnn/bin` | Android ARM + QNN | `python task.py tasks/build_android_qnn.yaml` | | ||
|
|
||
| ### Step 2: Ask the user which build to link | ||
|
|
||
| Use `AskUserQuestion` to let the user pick from the detected build directories. Show each option with its path and the platform it corresponds to. | ||
|
|
||
| If no build directories with `.so` files are found, inform the user they need to build first: | ||
|
|
||
| ```bash | ||
| pip install -r requirements.txt | ||
| python task.py tasks/build_x86.yaml # or another build task | ||
| ``` | ||
|
|
||
| ### Step 3: Check existing symlink | ||
|
|
||
| Before creating a new symlink, check if `pymllm/lib` already exists: | ||
|
|
||
| - If it's a symlink, show where it currently points and confirm replacement. | ||
| - If it's a real directory, warn the user and ask before removing it. | ||
| - If it doesn't exist, proceed directly. | ||
|
|
||
| ### Step 4: Create the symlink | ||
|
|
||
| ```bash | ||
| ln -sfn <project-root>/<build-dir>/bin <project-root>/pymllm/lib | ||
| ``` | ||
|
|
||
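| Use `ln -sfn` to replace any existing symlink: `-f` removes the old link and `-n` keeps `ln` from following it into the old target directory. The replacement is an unlink followed by a create, not an atomic swap. | ||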
|
|
||
| ### Step 5: Verify | ||
|
|
||
| After creating the symlink, verify by checking that the target `.so` file is accessible: | ||
|
|
||
| ```bash | ||
| ls -la pymllm/lib/MllmFFIExtension.so | ||
| ``` | ||
|
|
||
| Then run a quick Python check: | ||
|
|
||
| ```bash | ||
| python -c "import pymllm; print('mobile available:', pymllm.is_mobile_available())" | ||
| ``` | ||
|
|
||
| If `is_mobile_available()` returns `True`, the link is correct. | ||
|
|
||
| ## Important Notes | ||
|
|
||
| - The symlink target must be an **absolute path** for reliability. | ||
| - On macOS, the library extension is `.dylib` instead of `.so`. | ||
| - Android build directories (e.g., `build-android-arm64-v8a-qnn/bin`) contain ARM binaries that cannot run on x86 hosts. Warn the user if they select one of these on a non-ARM machine. | ||
| - If the user has multiple build directories, they can re-run this skill anytime to switch which build pymllm uses. |
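
Steps 1, 3 and 4 of `link-pymllm-lib` could be scripted roughly as follows. This is a sketch under stated assumptions: `find_build_dirs` and `link_lib` are hypothetical helpers, not part of pymllm, and a POSIX filesystem is assumed. The new link is created under a temporary name (`lib.tmp`, also an assumption) and renamed over the old one, so `pymllm/lib` is never missing during the switch.

```python
import os
from pathlib import Path

# Library file names from Step 1 (.dylib on macOS).
LIB_NAMES = ("MllmFFIExtension.so", "MllmFFIExtension.dylib")


def find_build_dirs(root: Path) -> list[Path]:
    """Return absolute build*/bin directories under root that hold the FFI library."""
    return sorted(
        bin_dir.resolve()
        for bin_dir in root.glob("build*/bin")
        if any((bin_dir / name).is_file() for name in LIB_NAMES)
    )


def link_lib(root: Path, build_bin: Path) -> Path:
    """Point root/pymllm/lib at build_bin, replacing an existing symlink."""
    link = root / "pymllm" / "lib"
    if link.exists() and not link.is_symlink():
        # Step 3: a real file or directory is never removed without asking.
        raise FileExistsError(f"{link} exists and is not a symlink")

    # Create the link under a temporary name, then rename it into place.
    tmp = link.with_name("lib.tmp")
    if tmp.is_symlink():
        tmp.unlink()  # clear a leftover from an interrupted run
    os.symlink(build_bin.resolve(), tmp, target_is_directory=True)
    os.replace(tmp, link)
    return link
```

A caller would list `find_build_dirs(Path.cwd())`, let the user pick one entry, pass it to `link_lib`, and then run the Step 5 checks.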
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| --- | ||
| name: update-codeowners | ||
| description: Updates CODEOWNERS entries safely with consistent path and owner formatting. Use when the user asks to add, remove, or modify CODEOWNERS rules, ownership mappings, reviewers, or module maintainers. | ||
| --- | ||
|
|
||
| # Update CODEOWNERS | ||
|
|
||
| ## Goal | ||
| Maintain `CODEOWNERS` accurately while preserving the repository's existing section/comment style. | ||
|
|
||
| ## Workflow | ||
| 1. Read the current `CODEOWNERS` file before editing. | ||
| 2. Identify requested changes as one of: | ||
| - Add new path rule | ||
| - Modify owners for existing path rule | ||
| - Remove obsolete path rule | ||
| - Reorganize section comments (only if requested) | ||
| 3. Update rules in place instead of creating duplicates for the same path. | ||
| 4. Keep existing section headers and comment style unless the user asks to refactor structure. | ||
| 5. Return a concise changelog describing which paths were added, changed, or removed. | ||
|
|
||
| ## Rule Format | ||
| - Use one rule per line: `<path-pattern> <owner1> <owner2> ...` | ||
| - Owners must be GitHub handles prefixed with `@`. | ||
| - Keep path style consistent with the file (in this repo, path patterns typically start with `/`). | ||
| - Do not leave rules with empty owner lists. | ||
|
|
||
| ## Editing Guidelines | ||
| - Prefer minimal edits near related sections. | ||
| - If a path already exists, update that line instead of adding a second conflicting line. | ||
| - If a new rule logically belongs to an existing section, place it in that section. | ||
| - Preserve human-readable grouping and blank lines. | ||
| - Keep comments intact unless they are clearly outdated and the user asked for cleanup. | ||
|
|
||
| ## Validation Checklist | ||
| - [ ] Every non-comment, non-empty line has at least one owner. | ||
| - [ ] Every owner token starts with `@`. | ||
| - [ ] No accidental duplicate rule for the exact same path pattern. | ||
| - [ ] Existing comments/sections were preserved unless explicitly changed. | ||
|
|
||
| ## Example Requests | ||
| - "Add `/mllm/models/new_model/ @alice @bob` under models." | ||
| - "Change `/core/Storage` owner to `@team-core`." | ||
| - "Remove ownership rule for deprecated path `/legacy/`." |
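
The validation checklist in `update-codeowners` can be expressed as a small check over the file's text. The following is a minimal sketch, not code from this PR: `validate_codeowners` is a hypothetical name, and it assumes that `#` always starts a comment and that patterns and owners are separated by whitespace.

```python
def validate_codeowners(text: str) -> list[str]:
    """Return a list of checklist violations found in CODEOWNERS content."""
    problems = []
    seen = {}  # path pattern -> line number of its first rule
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: '#' always begins a comment (escaped '#' is not handled).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank lines and comments are not rules
        pattern, *owners = line.split()
        if not owners:
            problems.append(f"line {lineno}: {pattern} has no owners")
        for owner in owners:
            if not owner.startswith("@"):
                problems.append(f"line {lineno}: owner {owner!r} lacks '@'")
        if pattern in seen:
            problems.append(
                f"line {lineno}: duplicate of line {seen[pattern]} ({pattern})"
            )
        else:
            seen[pattern] = lineno
    return problems
```

An empty result means the first three checklist items hold. Whether comments and sections were preserved still has to be checked by comparing against the previous version of the file.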
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| [codespell] | ||
| ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao | ||
| ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, bfloat, constexpr, cuda, dlpack, expt, forceinline, ifndef, linalg, LPBQ, mllm, pymllm, Quantizaton, Qwen, ROCM, silu, torchao, flashinfer | ||
| skip = *.json,*.jsonl,*.patch,*.txt |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -3,3 +3,4 @@ build-py/ | ||
| .vscode/settings.json | ||
| compile_commands.json | ||
| .clangd | ||
| .pytest_cache/ | ||