Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb9c17005d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```csharp
int chunkSize = ResolvePrefillChunkSize(_backend, tokens.Count);
if (chunkSize >= tokens.Count)
    return _model.ForwardRefill(CopyTokenRange(tokens, 0, tokens.Count));
```
Disable CUDA prefill chunking for multimodal prompts
ForwardPromptPrefill now chunks any CUDA prompt larger than 5120 tokens, but multimodal models store image/audio embedding insertion indices relative to the full prompt token sequence. With chunked prefill, ForwardRefill is called on partial token windows while those absolute positions are still applied, which can push an embedding insertion past the chunk bounds (triggering Tensor.Narrow range exceptions) or place embeddings in the wrong segment. This breaks long multimodal inference on CUDA: either chunking should be skipped for multimodal requests, or the embedding insertion positions must be rebased per chunk.
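The per-chunk rebasing option could be sketched roughly as follows. This is a minimal illustration, not the repository's actual API: `EmbeddingInsertion`, `PlanChunks`, and the tuple shape are hypothetical names invented here; only the idea (filter insertions to the current window and shift them to chunk-relative offsets) reflects the suggestion above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: an embedding to splice in at an absolute prompt position.
public sealed record EmbeddingInsertion(int Position, float[] Embedding);

public static class ChunkedPrefill
{
    // For each prefill window, keep only the insertions whose absolute position
    // falls inside that window, rebased to chunk-relative offsets so they can
    // never exceed the chunk bounds.
    public static IEnumerable<(int Start, int Length, List<EmbeddingInsertion> Insertions)>
        PlanChunks(int promptLength, int chunkSize, IReadOnlyList<EmbeddingInsertion> insertions)
    {
        for (int start = 0; start < promptLength; start += chunkSize)
        {
            int length = Math.Min(chunkSize, promptLength - start);
            var rebased = insertions
                .Where(ins => ins.Position >= start && ins.Position < start + length)
                .Select(ins => ins with { Position = ins.Position - start })
                .ToList();
            yield return (start, length, rebased);
        }
    }
}
```

Disabling chunking for multimodal requests is the simpler fix (e.g. forcing `chunkSize = tokens.Count` when the request carries embeddings), at the cost of losing the memory benefit of chunked prefill for those prompts.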
Useful? React with 👍 / 👎.