feat: Add SearchableToolset for dynamic tool discovery from large tool catalogs #10426

Merged
anakin87 merged 28 commits into main from tool_search_tool on Feb 25, 2026

Conversation

@vblagoje
Member

@vblagoje vblagoje commented Jan 22, 2026

Why

Large tool catalogs overwhelm LLM context windows. Agents need a way to discover tools on-demand rather than receiving all tool definitions upfront.

What

  • SearchableToolset: Toolset subclass with BM25-based tool discovery
  • Uses Haystack's built-in InMemoryDocumentStore.bm25_retrieval() internally for BM25L search — no custom search engine, no extra dependencies
  • search_tools(query, k) bootstrap tool for LLM-driven discovery
  • Passthrough mode for catalogs below search_threshold (default: 8)
  • clear() method for resetting discovered tools between agent runs
  • Full serialization support (to_dict/from_dict)
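
The passthrough and discovery behaviour listed above can be pictured with a toy model. This is a minimal sketch and not the Haystack class: `MiniTool`, `MiniSearchableCatalog`, and `visible_tool_names` are hypothetical names, and plain keyword overlap stands in for the BM25 ranking.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MiniTool:
    """Hypothetical stand-in for a tool: just a name and a description."""
    name: str
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MiniSearchableCatalog:
    """Toy model of passthrough vs. discovery; not the Haystack implementation."""
    catalog: list[MiniTool]
    search_threshold: int = 8
    discovered: dict[str, MiniTool] = field(default_factory=dict)

    @property
    def passthrough(self) -> bool:
        # Catalogs below the threshold are exposed directly, without search.
        return len(self.catalog) < self.search_threshold

    def search_tools(self, keywords: str, k: int = 3) -> list[str]:
        """Rank tools by keyword overlap (a placeholder for BM25) and remember the top k."""
        terms = tokenize(keywords)
        scored = []
        for tool in self.catalog:
            overlap = len(terms & tokenize(f"{tool.name} {tool.description}"))
            if overlap:
                scored.append((overlap, tool))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        for _, tool in scored[:k]:
            self.discovered[tool.name] = tool
        return [tool.name for _, tool in scored[:k]]

    def visible_tool_names(self) -> list[str]:
        """Names the LLM would see on its next turn."""
        if self.passthrough:
            return [tool.name for tool in self.catalog]
        return ["search_tools", *self.discovered]

    def clear(self) -> None:
        """Forget discovered tools, for example between agent runs."""
        self.discovered.clear()
```

In this model a catalog at or above the threshold starts with only `search_tools` visible, and each search call widens the visible set until `clear()` resets it.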

How can it be used

from haystack.components.agents import Agent
from haystack.tools import SearchableToolset

# Large catalog: the LLM discovers tools via search_tools.
# Placeholder: any tool or toolset we support, as long as it is ToolsType.
tools = ...
toolset = SearchableToolset(catalog=tools)
# `generator` is assumed to be an already-constructed chat generator.
agent = Agent(chat_generator=generator, tools=toolset)

How did you test it

  • Unit tests for SearchableToolset (passthrough, discovery, iteration, serialization)
  • Integration with ComponentTool warm-up verification

Notes for the reviewer

  • BM25L search is delegated to InMemoryDocumentStore.bm25_retrieval(), reusing well-tested Haystack infrastructure instead of a hand-rolled engine
  • Embedding mode deferred to future PR
  • Tools warmed up lazily on discovery, not at index time
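
The last note, warming tools up on discovery rather than at index time, can be illustrated with a small sketch. `LazyTool` and `discover` are hypothetical helpers written for this illustration; they are not part of Haystack and only mimic the idea that `warm_up()` stands for expensive setup.

```python
class LazyTool:
    """Hypothetical tool whose warm_up() stands in for expensive setup (connections, model loading)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.warmed_up = False

    def warm_up(self) -> None:
        # Idempotent: calling it again does not redo the setup.
        self.warmed_up = True

    def invoke(self) -> str:
        if not self.warmed_up:
            raise RuntimeError(f"{self.name} was invoked before warm_up()")
        return f"{self.name} ok"


def discover(catalog: dict[str, LazyTool], discovered: dict[str, LazyTool], names: list[str]) -> None:
    """Move the named tools into `discovered`, warming each one up at that moment."""
    for name in names:
        tool = catalog[name]
        tool.warm_up()  # deferred until the tool is actually needed
        discovered[name] = tool


# Indexing the catalog does not warm anything up.
catalog = {name: LazyTool(name) for name in ("get_weather", "send_email", "search_database")}
discovered: dict[str, LazyTool] = {}

# Only the discovered tool pays the setup cost.
discover(catalog, discovered, ["get_weather"])
```

Under this scheme a catalog with hundreds of tools costs nothing to index, and tools that are never discovered are never initialized.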

…alogs

Implements ToolSearchToolset, a Toolset subclass that enables dynamic tool
discovery from large catalogs. Tools are discovered via `search_tools`, a
special BM25-based search tool, and then become available to the LLM.

Key features:
- Single discovery mode: "bm25", postpone "embedding" for future
- Passthrough mode for small catalogs (< search_threshold)
- Self-contained BM25L search engine implementation
- Full serialization support (to_dict/from_dict)
- Auto warm-up when iterating to ensure bootstrap tool availability

@vblagoje vblagoje added the ignore-for-release-notes label ("PRs with this flag won't be included in the release notes.") Jan 22, 2026
@github-actions github-actions Bot added the type:documentation label ("Improvements on the docs") Jan 22, 2026
@coveralls
Collaborator

coveralls commented Jan 22, 2026

Pull Request Test Coverage Report for Build 22308342425

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 92.67%

Files with Coverage Reduction | New Missed Lines | %
core/pipeline/async_pipeline.py | 2 | 66.67%

Totals (Coverage Status)
Change from base Build 22302678937: -0.02%
Covered Lines: 15437
Relevant Lines: 16658

💛 - Coveralls

The description claimed to return "a JSON array of tool definitions"
but actually returns a plain text confirmation message with tool names.

Tools discovered via search_tools were added to _discovered_tools
without calling warm_up(), causing tools that require initialization
(connections, model loading) to fail when invoked.

The inherited __getitem__ accessed self.tools which is always empty
in ToolSearchToolset. This caused IndexError for valid indexes even
when tools were available through __iter__.
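
The `__getitem__` problem described in the last commit message can be reproduced in isolation. The classes below are simplified stand-ins written for this illustration, not the real Haystack `Toolset` hierarchy; they only assume a base class that indexes into `self.tools` and a subclass that keeps its tools somewhere else.

```python
class BaseToolset:
    """Simplified stand-in for a toolset that stores its tools in `self.tools`."""

    def __init__(self, tools=None):
        self.tools = list(tools or [])

    def __iter__(self):
        return iter(self.tools)

    def __len__(self):
        return len(self.tools)

    def __getitem__(self, index):
        return self.tools[index]


class BrokenSearchToolset(BaseToolset):
    """Keeps tools elsewhere and overrides only __iter__, so indexing still hits the empty list."""

    def __init__(self, discovered):
        super().__init__()  # self.tools stays empty
        self._discovered = dict(discovered)

    def __iter__(self):
        return iter(self._discovered.values())


class FixedSearchToolset(BrokenSearchToolset):
    """Routes __len__ and __getitem__ through the same view that __iter__ uses."""

    def __len__(self):
        return len(self._discovered)

    def __getitem__(self, index):
        return list(self)[index]


broken = BrokenSearchToolset({"get_weather": "weather tool"})
fixed = FixedSearchToolset({"get_weather": "weather tool"})
```

Here `list(broken)` yields the tool while `broken[0]` raises `IndexError`; `fixed[0]` returns the same item that iteration does.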
@vblagoje vblagoje removed the ignore-for-release-notes label ("PRs with this flag won't be included in the release notes.") Jan 22, 2026
@vblagoje
Member Author

@sjrl @julian-risch - give me 1-2 days to test it thoroughly with large MCP toolsets, and if all is good I'll open this PR. This is the general direction @mpangrazzi and I talked about. LMK if you agree.

@vblagoje vblagoje changed the title from "feat: Add ToolSearchToolset for dynamic tool discovery from large catalogs" to "feat: Add ToolSearchToolset for dynamic tool discovery from large tool catalogs" Jan 22, 2026
Member

@anakin87 anakin87 left a comment

Cool implementation!

I left some comments and there's still some work to be done, but it seems a good direction.

@sjrl
Contributor

sjrl commented Jan 27, 2026

Looking very cool! One additional question I had is that this wouldn't work with MCPToolset as it is, right? So after this PR we'd need to add support for a SearchableMCPToolset as well in our MCP integration?

@vblagoje
Member Author

> Looking very cool! One additional question I had is that this wouldn't work with MCPToolset as it is, right? So after this PR we'd need to add support for a SearchableMCPToolset as well in our MCP integration?

@sjrl @anakin87 I thought it would. It warms up tools, so perhaps it should work. I haven't gotten around to testing it yet, but I will today or tomorrow. Will report back!

@vblagoje
Member Author

@sjrl @anakin87 I tried it with the itinerary agent and it doesn't work that well. Investigating...

Renamed the `query` parameter to `tool_keywords` and refined the
description to guide LLMs toward providing vocabulary from tool
names/descriptions rather than echoing user requests.

Before: LLMs often passed user intent like "south of france highlights"
After: LLMs provide tool vocabulary like "route weather search"

This improves BM25 matching since it relies on lexical overlap with
indexed tool names and descriptions.
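
Why tool vocabulary matches better than user intent can be seen with a small lexical scorer. This is a sketch under stated assumptions: it uses plain Okapi BM25 written from scratch, not the BM25L variant the PR delegates to `InMemoryDocumentStore.bm25_retrieval()`, and the tool names and descriptions in `tool_docs` are invented for the illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue  # no lexical overlap, no contribution
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores


# Hypothetical catalog entries in "name description" form.
tool_docs = [
    "plan_route Plan a driving route between two places",
    "get_weather Get the weather forecast for a location",
    "search_places Search for attractions near a location",
    "send_email Send an email to a recipient",
]

intent_scores = bm25_scores("south of france highlights", tool_docs)
keyword_scores = bm25_scores("route weather search", tool_docs)
```

With this toy catalog the user-intent query shares no terms with any tool and scores zero everywhere, while the keyword query gives non-zero scores to the three travel tools and leaves `send_email` at zero.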
Rename class and module for clarity:
- tool_search_toolset.py -> searchable_toolset.py
- ToolSearchToolset -> SearchableToolset
- Update all imports, tests, and release notes
@vblagoje vblagoje changed the title from "feat: Add ToolSearchToolset for dynamic tool discovery from large tool catalogs" to "feat: Add SearchableToolset for dynamic tool discovery from large tool catalogs" Jan 29, 2026
Remove the hand-rolled BM25L engine (~107 lines) and delegate to
Haystack's built-in InMemoryDocumentStore.bm25_retrieval(), which
uses the same algorithm and tokenization. Update tests accordingly.

Replace manual Tool construction with create_tool_from_function and
Annotated type hints in _create_search_tool. Reduce test duplication
by using create_tool_from_function for fixtures and large_catalog.
Consolidate 4 integration tests into 1 deterministic math test and
remove the redundant integration_catalog fixture.
@vblagoje
Member Author

These were good improvements. One last thing I want to do is make sure that SearchableToolset is robust and battle-tested, i.e. that it works even if tools are loaded lazily, as in MCPToolset and potentially others. It works great now in the RL demo with the itinerary agent when MCPToolset is eager, but not when it is lazy. A few more days and I'll open this PR @sjrl @anakin87

Comment thread haystack/tools/searchable_toolset.py
Comment on lines +124 to +125
This method allows resetting the toolset's discovered tools between agent runs
when the same toolset instance is reused. This can be useful for long-running
Contributor

We say this in the docstrings which sounds nice, but I don't think our Agent is set up to call this method. If we think it should do that perhaps we should open up a follow-up issue to update Agent to utilize this method? (Not entirely sure what that would look like tbh).

Comment thread haystack/tools/searchable_toolset.py
Comment on lines +10 to +12
Key features include:
configurable search threshold for automatic passthrough mode, top-k result limiting,
and a ``clear()`` method to reset discovered tools between agent runs.
Contributor

@sjrl sjrl Feb 25, 2026

I still feel a little odd highlighting this, since I don't know how I'd use clear in practice, nor is it highlighted in the code example.

Member

I can remove the mention from the release note.

Just to share an example:

from haystack.tools import create_tool_from_function, SearchableToolset
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22°C, sunny"


def add_numbers(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b


def multiply_numbers(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


def get_stock_price(symbol: str) -> str:
    """Get stock price by ticker symbol."""
    return f"{symbol}: $150.00"


def search_database(query: str) -> str:
    """Search the database for records."""
    return f"Found 5 records matching '{query}'"


def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a recipient."""
    return f"Email sent to {to}"


def calculate_tax(amount: float, rate: float) -> float:
    """Calculate tax on an amount."""
    return amount * rate


def convert_currency(amount: float, from_currency: str, to_currency: str) -> float:
    """Convert currency from one to another."""
    return amount * 1.1  # Simplified conversion


# Test fixtures
weather_tool = create_tool_from_function(get_weather)
add_tool = create_tool_from_function(add_numbers)
multiply_tool = create_tool_from_function(multiply_numbers)
stock_tool = create_tool_from_function(get_stock_price)
search_tool = create_tool_from_function(search_database)
send_email_tool = create_tool_from_function(send_email)
calculate_tax_tool = create_tool_from_function(calculate_tax)
convert_currency_tool = create_tool_from_function(convert_currency)
large_catalog = [
    weather_tool, add_tool, multiply_tool, stock_tool,
    search_tool, send_email_tool, calculate_tax_tool, convert_currency_tool,
]


searchable_toolset = SearchableToolset(catalog=large_catalog)
agent = Agent(tools=searchable_toolset, chat_generator=OpenAIChatGenerator(model="gpt-4.1-mini"))

result = agent.run(messages=[ChatMessage.from_user("What's the weather in Milan?")])
print(result["messages"])
# ...

print(len(searchable_toolset))
# 2
print(searchable_toolset._discovered_tools)
# {'get_weather': Tool(name='get_weather', description='Get current weather for a city.', parameters={'properties': {'city': {'type': 'string'}}, 'required': ['city'], 'type': 'object'}, function=<function get_weather at 0x1017a6700>, outputs_to_string=None, inputs_from_state=None, outputs_to_state=None)}

searchable_toolset.clear()

print(len(searchable_toolset))
# 1
print(searchable_toolset._discovered_tools)
# {}


result = agent.run(messages=[ChatMessage.from_user("How many records in the database for query: 'apple'. Use the appropriate tool to search the database.")])
print(result["messages"])
# ...

print(len(searchable_toolset))
# 2
print(searchable_toolset._discovered_tools)
# {'search_database': Tool(name='search_database', description='Search the database for records.', parameters={'properties': {'query': {'type': 'string'}}, 'required': ['query'], 'type': 'object'}, function=<function search_database at 0x1189276a0>, outputs_to_string=None, inputs_from_state=None, outputs_to_state=None)}
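
Since the Agent does not call `clear()` itself, one possible way to reset discovery between runs is to wrap each run in a context manager. This is only a sketch of a calling pattern: `fresh_discovery` and `StubToolset` are hypothetical names, the stub exposes just a `clear()` method and a `discovered` dict, and the pattern assumes runs that share a toolset instance do not overlap.

```python
from contextlib import contextmanager


@contextmanager
def fresh_discovery(toolset):
    """Yield the toolset and always call its clear() afterwards, even if the run raises."""
    try:
        yield toolset
    finally:
        toolset.clear()


class StubToolset:
    """Hypothetical stand-in exposing only what this sketch needs."""

    def __init__(self):
        self.discovered = {}

    def clear(self):
        self.discovered.clear()


toolset = StubToolset()
with fresh_discovery(toolset) as ts:
    # Pretend the agent discovered a tool while handling one request.
    ts.discovered["get_weather"] = "tool"
    seen_during_run = list(ts.discovered)
```

After the `with` block exits, the stub's `discovered` dict is empty again, so the next run starts with only the bootstrap tool.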

Contributor

Thanks for the example and opening the issue! The issue should help us figure out how this can be used in a production app rather than just in a script.

Contributor

@sjrl sjrl left a comment

Looks good! Just two minor comments about how a user is practically meant to use the clear method. But probably something we can do in a follow up PR

@anakin87 anakin87 merged commit 6c01184 into main Feb 25, 2026
23 checks passed
@anakin87 anakin87 deleted the tool_search_tool branch February 25, 2026 10:32
Development

Successfully merging this pull request may close these issues.

Add Tool Search Tool

5 participants