
Feature/optimize memory usage#17

Open
zhongkaifu wants to merge 4 commits into main from feature/optimize_memory_usage

Conversation

@zhongkaifu
Owner

No description provided.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb9c17005d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +222 to +224
int chunkSize = ResolvePrefillChunkSize(_backend, tokens.Count);
if (chunkSize >= tokens.Count)
return _model.ForwardRefill(CopyTokenRange(tokens, 0, tokens.Count));
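
The snippet above is the entry point of the chunked prefill path: if the resolved chunk size covers the whole prompt, the tokens are forwarded in one call; otherwise the prompt is split into windows. A minimal Python sketch of that loop, with illustrative names (`forward_prompt_prefill`, `forward_refill` stand in for the C# `ForwardPromptPrefill`/`ForwardRefill` and are assumptions, not the repo's API):

```python
def forward_prompt_prefill(tokens, chunk_size, forward_refill):
    """Feed the prompt to the model in chunk_size windows.

    The output of the final window is the prefill result used to
    start decoding; earlier calls only populate the KV cache.
    """
    if chunk_size >= len(tokens):
        # Prompt fits in a single refill call; no chunking needed.
        return forward_refill(tokens)

    output = None
    for start in range(0, len(tokens), chunk_size):
        # Each call sees only a partial token window.
        output = forward_refill(tokens[start:start + chunk_size])
    return output
```

Note that each `forward_refill` call receives only a slice of the prompt, which is what the review comment below flags as unsafe for multimodal inputs.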


P1 Badge Disable CUDA prefill chunking for multimodal prompts

ForwardPromptPrefill now chunks any CUDA prompt larger than 5120 tokens, but multimodal models store image/audio embedding insertion indices relative to the full prompt token sequence. With chunked prefill, ForwardRefill is called on partial token windows while those absolute positions are still applied, which can push embedding insertion past the chunk bounds (triggering Tensor.Narrow range exceptions) or place embeddings in the wrong segment. This breaks long multimodal inference on CUDA; chunking should be skipped for multimodal requests or embedding positions must be rebased per chunk.
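
The second fix the comment proposes, rebasing embedding positions per chunk, can be sketched as follows. This is a hypothetical helper under the assumption that insertion indices are stored relative to the full prompt; the function name and signature are illustrative, not part of the repo:

```python
def rebase_embedding_positions(positions, chunk_start, chunk_end):
    """Given embedding insertion indices relative to the full prompt,
    keep only those that fall inside [chunk_start, chunk_end) and
    shift them to chunk-local coordinates.

    Positions outside the window belong to other chunks and are
    dropped, so no insertion can exceed the chunk bounds (the failure
    mode that triggers the Tensor.Narrow range exception).
    """
    return [p - chunk_start for p in positions if chunk_start <= p < chunk_end]
```

With a 5120-token chunk size, an image embedding at absolute position 6000 would be dropped from the first chunk and rebased to 880 in the second, instead of being applied out of range.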

Useful? React with 👍 / 👎.

