feat: Add SearchableToolset for dynamic tool discovery from large tool catalogs by vblagoje · Pull Request #10426 · deepset-ai/haystack

vblagoje · 2026-01-22T08:53:08Z

Why

Large tool catalogs overwhelm LLM context windows. Agents need a way to discover tools on-demand rather than receiving all tool definitions upfront.

fixes Add Tool Search Tool #10323

What

SearchableToolset: Toolset subclass with BM25-based tool discovery
Uses Haystack's built-in InMemoryDocumentStore.bm25_retrieval() internally for BM25L search — no custom search engine, no extra dependencies
search_tools(query, k) bootstrap tool for LLM-driven discovery
Passthrough mode for catalogs below search_threshold (default: 8)
clear() method for resetting discovered tools between agent runs
Full serialization support (to_dict/from_dict)
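
The passthrough and discovery behaviour listed above can be pictured with a small stdlib-only sketch. This is not the PR's implementation: `ToyTool` and `ToySearchableToolset` are hypothetical names, plain term overlap stands in for BM25L, and the strict `<` comparison against the threshold is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTool:
    """Hypothetical stand-in for a Haystack Tool: just a name and a description."""
    name: str
    description: str


@dataclass
class ToySearchableToolset:
    """Illustrative only: exposes all tools for small catalogs, else a search tool."""
    catalog: list
    search_threshold: int = 8
    discovered: dict = field(default_factory=dict)

    @property
    def is_passthrough(self) -> bool:
        # Assumption: catalogs strictly smaller than the threshold skip search.
        return len(self.catalog) < self.search_threshold

    def search_tools(self, tool_keywords: str, k: int = 3) -> str:
        """Rank tools by shared terms (a stand-in for BM25L) and record the top k."""
        terms = set(tool_keywords.lower().split())
        scored = []
        for tool in self.catalog:
            text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
            overlap = len(terms & set(text.split()))
            if overlap:
                scored.append((overlap, tool))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        for _, tool in scored[:k]:
            self.discovered[tool.name] = tool
        names = ", ".join(tool.name for _, tool in scored[:k])
        return f"Found tools: {names}" if names else "No tools found"

    def visible_tool_names(self) -> list:
        """What the LLM would see: everything, or search_tools plus discoveries."""
        if self.is_passthrough:
            return [tool.name for tool in self.catalog]
        return ["search_tools", *self.discovered]

    def clear(self) -> None:
        """Forget discovered tools, e.g. between agent runs."""
        self.discovered.clear()


catalog = [ToyTool(f"tool_{i}", f"does thing number {i}") for i in range(9)]
catalog.append(ToyTool("get_weather", "get current weather for a city"))
toolset = ToySearchableToolset(catalog=catalog)
before = toolset.visible_tool_names()
message = toolset.search_tools("weather city")
after = toolset.visible_tool_names()
```

With ten tools and a threshold of 8, only the bootstrap tool is visible until a search adds `get_weather`; a three-tool catalog would be passed through unchanged.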

How can it be used

from haystack.tools import SearchableToolset
from haystack.components.agents import Agent

# Large catalog - the LLM discovers tools via search_tools
# `tools` can be any type of tool or toolset we support, as long as it is ToolsType
tools = ...
toolset = SearchableToolset(catalog=tools)
# `generator` is any chat generator instance
agent = Agent(chat_generator=generator, tools=toolset)

How did you test it

Unit tests for SearchableToolset (passthrough, discovery, iteration, serialization)
Integration with ComponentTool warm-up verification

Notes for the reviewer

BM25L search is delegated to InMemoryDocumentStore.bm25_retrieval(), reusing well-tested Haystack infrastructure instead of a hand-rolled engine
Embedding mode deferred to future PR
Tools warmed up lazily on discovery, not at index time
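
The last note, warming tools up on discovery rather than at index time, might look roughly like the following sketch. `ToyTool` and `LazyCatalog` are hypothetical names used only for illustration; the actual class may structure this differently.

```python
class ToyTool:
    """Hypothetical tool that tracks whether warm_up() has been called."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.warmed_up = False

    def warm_up(self) -> None:
        # A real tool might open a connection or load a model here.
        self.warmed_up = True


class LazyCatalog:
    """Indexes tools without warming them; warms a tool only when it is discovered."""

    def __init__(self, tools: list) -> None:
        # "Index time": tools are only registered by name, no warm-up happens.
        self._by_name = {tool.name: tool for tool in tools}
        self.discovered = {}

    def discover(self, name: str) -> ToyTool:
        tool = self._by_name[name]
        if not tool.warmed_up:
            # "Discovery time": pay the initialization cost only now.
            tool.warm_up()
        self.discovered[name] = tool
        return tool


tools = [ToyTool("get_weather"), ToyTool("send_email")]
catalog = LazyCatalog(tools)
catalog.discover("get_weather")
warm_state = {tool.name: tool.warmed_up for tool in tools}
```

Only the discovered tool has been initialized; tools that are never surfaced by a search never incur warm-up cost.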

…alogs

Implements ToolSearchToolset - a Toolset subclass that enables dynamic tool discovery from large catalogs. Tools are discovered via `search_tools` bm25 based special search tool and become available to the LLM.

Key features:
- Single discovery mode: "bm25", postpone "embedding" for future
- Passthrough mode for small catalogs (< search_threshold)
- Self-contained BM25L search engine implementation
- Full serialization support (to_dict/from_dict)
- Auto warm-up when iterating to ensure bootstrap tool availability

vercel · 2026-01-22T08:53:14Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
haystack-docs	Ignored	Preview	Feb 25, 2026 10:14am

coveralls · 2026-01-22T08:57:50Z

Pull Request Test Coverage Report for Build 22308342425

Details

0 of 0 changed or added relevant lines in 0 files are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.02%) to 92.67%

Files with Coverage Reduction	New Missed Lines	%
core/pipeline/async_pipeline.py	2	66.67%

Totals
Change from base Build 22302678937:	-0.02%
Covered Lines:	15437
Relevant Lines:	16658

💛 - Coveralls

vblagoje · 2026-01-22T09:58:47Z

@sjrl @julian-risch - give me 1-2 days to test it thoroughly with large MCP toolsets, and if all is good I'll open this PR. This is the general direction @mpangrazzi and I talked about. LMK if you agree.

anakin87

Cool implementation!

I left some comments and there's still some work to be done, but it seems a good direction.

sjrl · 2026-01-27T10:58:49Z

Looking very cool! One additional question I had is that this wouldn't work with MCPToolset as it is, right? So after this PR we'd need to add support for a SearchableMCPToolset as well in our MCP integration?

vblagoje · 2026-01-27T13:10:26Z

> Looking very cool! One additional question I had is that this wouldn't work with MCPToolset as it is, right? So after this PR we'd need to add support for a SearchableMCPToolset as well in our MCP integration?

@sjrl @anakin87 I thought it would. It warms up tools, so perhaps it should work. I still haven't gotten around to testing it, but I will today or tomorrow. Will report back!

vblagoje · 2026-01-28T08:28:49Z

@sjrl @anakin87 I tried it with the itinerary agent and it doesn't work that well. Investigating...

Renamed the `query` parameter to `tool_keywords` and refined the description to guide LLMs toward providing vocabulary from tool names/descriptions rather than echoing user requests.

Before: LLMs often passed user intent like "south of france highlights"
After: LLMs provide tool vocabulary like "route weather search"

This improves BM25 matching since it relies on lexical overlap with indexed tool names and descriptions.
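
Why tool vocabulary matches better than user intent can be illustrated with a simple term-overlap count. This is a minimal sketch, not BM25 itself: the tool names and descriptions in `TOOL_DESCRIPTIONS` are hypothetical, and the tokenizer is a simplification of whatever the document store actually uses.

```python
import re

# Hypothetical catalog entries, invented for this illustration.
TOOL_DESCRIPTIONS = {
    "plan_route": "Plan a driving route between two cities",
    "get_weather": "Get the weather forecast for a location",
    "search_places": "Search for attractions near a location",
}


def tokenize(text: str) -> set:
    """Lowercase and split on non-alphanumeric characters (underscores included)."""
    return {token for token in re.split(r"[^a-z0-9]+", text.lower()) if token}


def overlap_scores(query: str) -> dict:
    """Count query terms that also occur in each tool's name or description."""
    query_terms = tokenize(query)
    return {
        name: len(query_terms & tokenize(f"{name} {description}"))
        for name, description in TOOL_DESCRIPTIONS.items()
    }


# The user's phrasing shares no terms with any indexed tool text.
user_intent = overlap_scores("south of france highlights")
# Keywords drawn from tool vocabulary hit one tool each.
tool_vocabulary = overlap_scores("route weather search")
```

A purely lexical ranker has nothing to score for the first query, while the second gives every relevant tool a nonzero match.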

Replace manual Tool construction with create_tool_from_function and Annotated type hints in _create_search_tool. Reduce test duplication by using create_tool_from_function for fixtures and large_catalog. Consolidate 4 integration tests into 1 deterministic math test and remove the redundant integration_catalog fixture.

vblagoje · 2026-01-30T14:17:16Z

These were good improvements. One last thing I want to do is make sure that SearchableToolset is robust and battle-tested, i.e. that it works even if tools are loaded lazily, as in MCPToolset and potentially others. It works great now in the RL demo with the itinerary agent when MCPToolset is eager, but not when it is lazy. A few more days and I'll open this PR @sjrl @anakin87

sjrl · 2026-02-25T07:52:31Z

+        This method allows resetting the toolset's discovered tools between agent runs
+        when the same toolset instance is reused. This can be useful for long-running


We say this in the docstrings which sounds nice, but I don't think our Agent is set up to call this method. If we think it should do that perhaps we should open up a follow-up issue to update Agent to utilize this method? (Not entirely sure what that would look like tbh).

sjrl · 2026-02-25T09:54:27Z

+    Key features include:
+    configurable search threshold for automatic passthrough mode, top-k result limiting,
+    and a ``clear()`` method to reset discovered tools between agent runs.


Still feel a little odd highlighting this since I don't know how I'd use clear in practice, nor is it highlighted in the code example

I can remove the mention from the release note.

Just to share an example

from haystack.tools import create_tool_from_function, SearchableToolset
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage


def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22°C, sunny"


def add_numbers(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b


def multiply_numbers(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


def get_stock_price(symbol: str) -> str:
    """Get stock price by ticker symbol."""
    return f"{symbol}: $150.00"


def search_database(query: str) -> str:
    """Search the database for records."""
    return f"Found 5 records matching '{query}'"


def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a recipient."""
    return f"Email sent to {to}"


def calculate_tax(amount: float, rate: float) -> float:
    """Calculate tax on an amount."""
    return amount * rate


def convert_currency(amount: float, from_currency: str, to_currency: str) -> float:
    """Convert currency from one to another."""
    return amount * 1.1  # Simplified conversion


# Test fixtures
weather_tool = create_tool_from_function(get_weather)
add_tool = create_tool_from_function(add_numbers)
multiply_tool = create_tool_from_function(multiply_numbers)
stock_tool = create_tool_from_function(get_stock_price)
search_tool = create_tool_from_function(search_database)
send_email_tool = create_tool_from_function(send_email)
calculate_tax_tool = create_tool_from_function(calculate_tax)
convert_currency_tool = create_tool_from_function(convert_currency)

large_catalog = [
    weather_tool,
    add_tool,
    multiply_tool,
    stock_tool,
    search_tool,
    send_email_tool,
    calculate_tax_tool,
    convert_currency_tool,
]

searchable_toolset = SearchableToolset(catalog=large_catalog)

agent = Agent(tools=searchable_toolset, chat_generator=OpenAIChatGenerator(model="gpt-4.1-mini"))

result = agent.run(messages=[ChatMessage.from_user("What's the weather in Milan?")])
print(result["messages"])
# ...

print(len(searchable_toolset))
# 2
print(searchable_toolset._discovered_tools)
# {'get_weather': Tool(name='get_weather', description='Get current weather for a city.', parameters={'properties': {'city': {'type': 'string'}}, 'required': ['city'], 'type': 'object'}, function=<function get_weather at 0x1017a6700>, outputs_to_string=None, inputs_from_state=None, outputs_to_state=None)}

searchable_toolset.clear()

print(len(searchable_toolset))
# 1
print(searchable_toolset._discovered_tools)
# {}

result = agent.run(messages=[ChatMessage.from_user("How many records in the database for query: 'apple'. Use the appropriate tool to search the database.")])
print(result["messages"])
# ...

print(len(searchable_toolset))
# 2
print(searchable_toolset._discovered_tools)
# {'search_database': Tool(name='search_database', description='Search the database for records.', parameters={'properties': {'query': {'type': 'string'}}, 'required': ['query'], 'type': 'object'}, function=<function search_database at 0x1189276a0>, outputs_to_string=None, inputs_from_state=None, outputs_to_state=None)}

Thanks for the example and opening the issue! The issue should help us figure out how this can be used in a production app rather than just in a script.

sjrl

Looks good! Just two minor comments about how a user is practically meant to use the clear method. But probably something we can do in a follow up PR

github-actions Bot added the topic:tests label Jan 22, 2026

vblagoje added the ignore-for-release-notes PRs with this flag won't be included in the release notes. label Jan 22, 2026

github-actions Bot added the type:documentation Improvements on the docs label Jan 22, 2026

vblagoje added 6 commits January 22, 2026 10:01

fix: Correct misleading search_tools description about return value

a8c67a7

The description claimed to return "a JSON array of tool definitions" but actually returns a plain text confirmation message with tool names.

fix: Warm up discovered tools before making them available

ca3c2e1

Tools discovered via search_tools were added to _discovered_tools without calling warm_up(), causing tools that require initialization (connections, model loading) to fail when invoked.

fix: Override __getitem__ to return tools from dynamic iteration

717f4c1

The inherited __getitem__ accessed self.tools which is always empty in ToolSearchToolset. This caused IndexError for valid indexes even when tools were available through __iter__.
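
The indexing bug described in this commit can be reproduced in miniature. The sketch below uses hypothetical `BaseToolset` and `DynamicToolset` classes with strings in place of tools; it shows one plausible shape of the fix, not the code that was actually committed.

```python
class BaseToolset:
    """Hypothetical base class whose indexing reads a static `tools` list."""

    def __init__(self) -> None:
        self.tools = []

    def __iter__(self):
        return iter(self.tools)

    def __len__(self) -> int:
        return len(self.tools)

    def __getitem__(self, index: int):
        return self.tools[index]


class DynamicToolset(BaseToolset):
    """Keeps `tools` empty and computes the visible tools on each iteration."""

    def __init__(self, bootstrap: str) -> None:
        super().__init__()
        self._bootstrap = bootstrap
        self._discovered = {}

    def __iter__(self):
        yield self._bootstrap
        yield from self._discovered.values()

    def __len__(self) -> int:
        return 1 + len(self._discovered)

    def __getitem__(self, index: int):
        # Index into the same sequence that __iter__ produces, not self.tools.
        return list(self)[index]


toolset = DynamicToolset(bootstrap="search_tools")
toolset._discovered["get_weather"] = "get_weather"
first, second = toolset[0], toolset[1]
```

Without the override, `toolset[0]` would fall through to the empty `self.tools` list and raise `IndexError` even though iteration yields two items.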

fix: Warm up catalog tools in passthrough mode

62d8604

feat: Add clear method to reset discovered tools

e442cbd

docs: Add release note for ToolSearchToolset

1367bba

vblagoje removed the ignore-for-release-notes PRs with this flag won't be included in the release notes. label Jan 22, 2026

fix: reno note format

1e1f093

vblagoje changed the title ~~feat: Add ToolSearchToolset for dynamic tool discovery from large catalogs~~ feat: Add ToolSearchToolset for dynamic tool discovery from large tool catalogs Jan 22, 2026

anakin87 reviewed Jan 26, 2026

Comment thread haystack/tools/searchable_toolset.py Outdated

Comment thread haystack/tools/searchable_toolset.py Outdated

Comment thread test/tools/test_searchable_toolset.py

Comment thread haystack/tools/tool_search_toolset.py Outdated

vblagoje added 2 commits January 28, 2026 14:37

Merge branch 'main' into tool_search_tool

29fb03a

vercel Bot deployed to Preview January 28, 2026 13:44 View deployment

Rename ToolSearchToolset to SearchableToolset

5eab32c

Rename class and module for clarity: - tool_search_toolset.py -> searchable_toolset.py - ToolSearchToolset -> SearchableToolset - Update all imports, tests, and release notes

vblagoje changed the title ~~feat: Add ToolSearchToolset for dynamic tool discovery from large tool catalogs~~ feat: Add SearchableToolset for dynamic tool discovery from large tool catalogs Jan 29, 2026

vblagoje added 3 commits January 29, 2026 15:19

Replace _BM25SearchEngine with InMemoryDocumentStore

4706261

Remove the hand-rolled BM25L engine (~107 lines) and delegate to Haystack's built-in InMemoryDocumentStore.bm25_retrieval(), which uses the same algorithm and tokenization. Update tests accordingly.

Merge branch 'main' into tool_search_tool

f8a4c06

vercel Bot deployed to Preview January 30, 2026 09:03 View deployment

anakin87 added 2 commits February 24, 2026 15:26

Merge branch 'main' into tool_search_tool

ebc12ca

improvements

595620f

github-actions Bot added topic:build/distribution topic:DX Developer Experience labels Feb 24, 2026

anakin87 added 2 commits February 24, 2026 17:49

fix and simplify

448650f

Merge branch 'main' into tool_search_tool

17acf3b

vercel Bot deployed to Preview February 24, 2026 17:02 View deployment

sjrl reviewed Feb 25, 2026

Comment thread haystack/tools/searchable_toolset.py Outdated

sjrl reviewed Feb 25, 2026

Comment thread haystack/tools/searchable_toolset.py

sjrl reviewed Feb 25, 2026

Comment thread haystack/tools/searchable_toolset.py Outdated

anakin87 added 2 commits February 25, 2026 09:17

more unit tests

dbb0c8a

Merge branch 'tool_search_tool' of https://github.com/deepset-ai/hays…

d05090c

…tack into tool_search_tool

vercel Bot deployed to Preview February 25, 2026 08:20 View deployment

sjrl reviewed Feb 25, 2026

Comment thread haystack/tools/searchable_toolset.py Outdated

sjrl reviewed Feb 25, 2026

Comment thread haystack/tools/searchable_toolset.py

anakin87 added 3 commits February 25, 2026 09:42

amke is_passthrough private

cc8af69

raise notimplementederror

14da4c2

improve/simplify serde

f8a9be7

This was referenced Feb 25, 2026

Searchable Toolset: does it need to inherit from Toolset? #10669

Open

Searchable Toolset: make internal search_tools tool configurable #10670

Closed

sjrl reviewed Feb 25, 2026

sjrl approved these changes Feb 25, 2026

anakin87 mentioned this pull request Feb 25, 2026

Should Agent clear Searchable Toolset? #10671

Open

do not mention clear in relnote

e6324fe

anakin87 mentioned this pull request Feb 25, 2026

Searchable Toolset: add documentation page #10672

Closed

anakin87 approved these changes Feb 25, 2026

anakin87 merged commit 6c01184 into main Feb 25, 2026
23 checks passed

anakin87 deleted the tool_search_tool branch February 25, 2026 10:32

rob-9 mentioned this pull request Mar 1, 2026

feat: make SearchableToolset search_tools tool configurable #10703

Merged

7 tasks
