feat: implementation of multimodal runner #892

Open
NorbertKlockiewicz wants to merge 46 commits into main from @nk/lfm-vlm

@NorbertKlockiewicz NorbertKlockiewicz commented Mar 2, 2026

Description

Adds vision/multimodal support to useLLM: load a VLM by passing capabilities: ['vision'], then call sendMessage(text, { imagePath }) to send a message with an image. Under the hood, this introduces a pluggable encoder architecture (IEncoder / VisionEncoder), a dedicated MultimodalRunner, and a refactored BaseLLMRunner with cleaner ownership and shared state. It also exposes a getVisualTokenCount() JSI method so that token counts stay accurate when images are present. The text-only path is unchanged.
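
For readers skimming the description, the call shape it describes might look roughly like the sketch below. Only capabilities: ['vision'] and sendMessage(text, { imagePath }) come from the text above. LLMConfig, SendOptions, Message, FakeLLM, the file path, and the rule that an image is rejected when vision was not requested are all hypothetical stand-ins so the snippet runs without the library. It is not the hook's real implementation.

```typescript
// Hypothetical types mirroring the option names used in the PR description.
type Capability = 'vision';

interface LLMConfig {
  capabilities?: Capability[]; // from the description: capabilities: ['vision']
}

interface SendOptions {
  imagePath?: string; // from the description: sendMessage(text, { imagePath })
}

interface Message {
  role: 'user';
  content: string;
  imagePath?: string;
}

// Stand-in for whatever useLLM returns; NOT the library's implementation.
class FakeLLM {
  readonly history: Message[] = [];
  private readonly config: LLMConfig;

  constructor(config: LLMConfig = {}) {
    this.config = config;
  }

  sendMessage(text: string, options: SendOptions = {}): Message {
    const hasVision = this.config.capabilities?.includes('vision') ?? false;
    if (options.imagePath !== undefined && !hasVision) {
      // Assumed behaviour for illustration: images need the vision capability.
      throw new Error("imagePath requires capabilities: ['vision']");
    }
    const message: Message = { role: 'user', content: text, ...options };
    this.history.push(message);
    return message;
  }
}

// A vision-capable model accepts a message with an image attached.
const vlm = new FakeLLM({ capabilities: ['vision'] });
const sent = vlm.sendMessage('What is in this picture?', {
  imagePath: 'file:///tmp/example.jpg', // hypothetical path
});

// A text-only model keeps working without any image options.
const textOnly = new FakeLLM();
const plain = textOnly.sendMessage('Hello');
```

The point of the sketch is that the image travels as an optional field on the same sendMessage call, which is consistent with the claim that text-only callers need no changes.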

Documentation and tests: to be written once reviewers accept the runner changes.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Run the llm example app and open the multimodal LLM screen. Select an image, then prompt the model.

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

NorbertKlockiewicz and others added 28 commits March 2, 2026 11:13
- Add UnifiedRunner that auto-detects PTE layout at load time
  (forward method → text-only, token_embedding+text_decoder → multimodal)
- Merge MultimodalLLM into LLM using UnifiedRunner
- VLMs now have full feature parity: multi-turn, countTextTokens,
  getMaxContextLength, setCountInterval, setTimeInterval
- Remove Runner, MultimodalRunner, MultimodalLLM classes
- Add sendMessageWithImage to LLMController and useLLM hook
- Remove useMultimodalLLM — callers use useLLM with isMultimodal: true
- Migrate multimodal_llm example app to useLLM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
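
The load-time detection described in this commit message could be sketched as follows. The method names forward, token_embedding, and text_decoder are taken from the message. The function name detectRunnerKind, the RunnerKind type, the order of the checks, and the error for an unrecognised layout are assumptions made for illustration. The actual runner is C++ and may differ.

```typescript
type RunnerKind = 'text-only' | 'multimodal';

// Method names from the commit message; everything else is illustrative.
const TEXT_ONLY_METHODS: readonly string[] = ['forward'];
const MULTIMODAL_METHODS: readonly string[] = ['token_embedding', 'text_decoder'];

// Hypothetical helper: decide which runner to use from the methods a PTE exposes.
function detectRunnerKind(methodNames: readonly string[]): RunnerKind {
  const available = new Set(methodNames);
  const hasAll = (required: readonly string[]): boolean =>
    required.every((name) => available.has(name));

  // Assumption: the multimodal layout is checked first, so a file exposing
  // both layouts would be treated as multimodal.
  if (hasAll(MULTIMODAL_METHODS)) return 'multimodal';
  if (hasAll(TEXT_ONLY_METHODS)) return 'text-only';
  throw new Error(`Unrecognised PTE layout: [${methodNames.join(', ')}]`);
}

const textKind = detectRunnerKind(['forward']);
const vlmKind = detectRunnerKind(['token_embedding', 'text_decoder']);
```

Requiring both token_embedding and text_decoder means a partially exported multimodal model fails loudly at load time rather than falling back to the text-only path.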
…nd fix token generation bugs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…age cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…using generateMultimodal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…emove tokenizerConfig guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndMessage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…modal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… C++ splits on placeholder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e double reset, fix max_context_len fallback, require tokenizerConfigSource, pass tools in multimodal branch, capture callback by value
NorbertKlockiewicz and others added 16 commits March 3, 2026 11:46
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… EOS IDs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kenCount JSI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… runner classes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ad image shape from model metadata

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mage_token from config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NorbertKlockiewicz NorbertKlockiewicz changed the title from "feat: initial implementation of multimodal runner with lfm vlm" to "feat implementation of multimodal runner" on Mar 5, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NorbertKlockiewicz NorbertKlockiewicz marked this pull request as ready for review March 5, 2026 16:19
@msluszniak msluszniak changed the title from "feat implementation of multimodal runner" to "feat: implementation of multimodal runner" on Mar 5, 2026
@msluszniak msluszniak added the "feature" label (PRs that implement a new feature) on Mar 5, 2026
Labels

feature PRs that implement a new feature

2 participants