Add tutorial for KV cache compression with TurboQuant by kacperlukawski · Pull Request #438 · deepset-ai/haystack-tutorials

kacperlukawski · 2026-03-30T15:58:12Z

This tutorial shows how to enable the TurboQuant cache for HuggingFaceLocalChatGenerator models. It is based on turboquant-vllm, an unofficial implementation, because Google hasn't released an official one yet.

review-notebook-app · 2026-03-30T15:58:18Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

kacperlukawski added 2 commits March 30, 2026 17:53

Add tutorial for KV cache compression with TurboQuant

d6b2570

Make it clear that we use unofficial turboquant implementation

d6ca261

kacperlukawski requested a review from a team as a code owner March 30, 2026 15:58

kacperlukawski requested a review from bilgeyucel March 30, 2026 15:58

kacperlukawski wants to merge 2 commits into main from turboquant-tutorial
