2 changes: 1 addition & 1 deletion docs/source/build-workflows/a2a-client.md
@@ -17,7 +17,7 @@ limitations under the License.

# NVIDIA NeMo Agent Toolkit Workflow as an A2A Client

[Agent-to-Agent (A2A) Protocol](https://a2aproject.org/) is an open standard from the Linux Foundation that enables agent-to-agent communication and collaboration. The protocol standardizes how [agents](../components/agents/index.md) discover capabilities, delegate tasks, and exchange information.
[Agent-to-Agent (A2A) Protocol](https://a2a-protocol.org) is an open standard from the Linux Foundation that enables agent-to-agent communication and collaboration. The protocol standardizes how [agents](../components/agents/index.md) discover capabilities, delegate tasks, and exchange information.

You can create a [workflow](./about-building-workflows.md) that connects to remote A2A agents and provides a function interface for interacting with their capabilities.

8 changes: 7 additions & 1 deletion docs/source/build-workflows/object-store.md
@@ -39,12 +39,13 @@ class ObjectStoreItem:
```

### ObjectStore Interface
The `ObjectStore` abstract interface defines the four standard operations:
The `ObjectStore` abstract interface defines the four standard operations and a `read_dataframe()` convenience method:

- **put_object(key, item)**: Store a new object with a unique key. Raises if the key already exists.
- **upsert_object(key, item)**: Update (or insert) an object with the given key.
- **get_object(key)**: Retrieve an object by its key. Raises if the key doesn't exist.
- **delete_object(key)**: Remove an object from the store. Raises if the key doesn't exist.
- **read_dataframe(key, format)**: Read an object and parse it into a pandas DataFrame. The format is inferred from the key's file extension when not specified. Supported formats: `csv`, `json`, `jsonl`, `parquet`, `xls`. Subclasses may override this method for efficient native reads (for example, reading directly from a file path or executing a SQL query).

```python
class ObjectStore(ABC):
@@ -63,11 +64,16 @@ class ObjectStore(ABC):
@abstractmethod
async def delete_object(self, key: str) -> None:
...

async def read_dataframe(self, key: str, format: str | None = None, **kwargs) -> "pd.DataFrame":
"""Read an object and parse it as a pandas DataFrame."""
...
```
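As a hypothetical illustration of these semantics (a minimal sketch, not the toolkit's actual implementation — see the in-memory provider listed below), a store satisfying the four operations might look like:

```python
import asyncio


class MiniStore:
    """Minimal in-memory sketch of the four standard operations.

    Hypothetical illustration only; the toolkit's real providers store
    ObjectStoreItem values with metadata and raise toolkit-specific errors
    rather than KeyError.
    """

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def put_object(self, key: str, item: bytes) -> None:
        # put: the key must not already exist
        if key in self._data:
            raise KeyError(f"key already exists: {key}")
        self._data[key] = item

    async def upsert_object(self, key: str, item: bytes) -> None:
        # upsert: insert or overwrite unconditionally
        self._data[key] = item

    async def get_object(self, key: str) -> bytes:
        # get: the key must exist
        if key not in self._data:
            raise KeyError(f"no such key: {key}")
        return self._data[key]

    async def delete_object(self, key: str) -> None:
        # delete: the key must exist
        if key not in self._data:
            raise KeyError(f"no such key: {key}")
        del self._data[key]


async def demo() -> bytes:
    store = MiniStore()
    await store.put_object("a", b"v1")
    await store.upsert_object("a", b"v2")  # overwriting is allowed for upsert
    value = await store.get_object("a")
    await store.delete_object("a")
    return value
```

The key distinction the sketch captures: `put_object` fails on an existing key while `upsert_object` silently overwrites, and both `get_object` and `delete_object` fail on a missing key.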

## Included Object Stores
The NeMo Agent Toolkit includes several object store providers:

- **File Object Store**: Local filesystem storage. Used automatically when specifying `file_path` in evaluation dataset configuration. Overrides `read_dataframe()` to read files directly from disk for efficiency. See `packages/nvidia_nat_core/src/nat/object_store/file_object_store.py`
- **In-Memory Object Store**: In-memory storage for development and testing. See `packages/nvidia_nat_core/src/nat/object_store/in_memory_object_store.py`
- **S3 Object Store**: Amazon S3 and S3-compatible storage (like MinIO). See `packages/nvidia_nat_s3/src/nat/plugins/s3/s3_object_store.py`
- **MySQL Object Store**: MySQL database-backed storage. See `packages/nvidia_nat_mysql/src/nat/plugins/mysql/mysql_object_store.py`
2 changes: 1 addition & 1 deletion docs/source/components/integrations/a2a.md
@@ -17,7 +17,7 @@ limitations under the License.

# Agent-to-Agent Protocol (A2A)

NVIDIA NeMo Agent Toolkit [Agent-to-Agent Protocol (A2A)](https://a2aproject.org/) integration includes:
NVIDIA NeMo Agent Toolkit [Agent-to-Agent Protocol (A2A)](https://a2a-protocol.org) integration includes:
* An [A2A client](../../build-workflows/a2a-client.md) to connect to and interact with remote A2A [agents](../agents/index.md).
* An [A2A server](../../run-workflows/a2a-server.md) to publish [workflows](../../build-workflows/about-building-workflows.md) as A2A agents that can be discovered and invoked by other A2A clients.

229 changes: 229 additions & 0 deletions docs/source/extend/custom-components/custom-dataset-loader.md
@@ -0,0 +1,229 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- path-check-skip-begin -->

# Custom Data Sources for Evaluation

:::{note}
We recommend reading the [Evaluating NeMo Agent Toolkit Workflows](../../improve-workflows/evaluate.md) guide before proceeding with this detailed documentation.
:::

NeMo Agent Toolkit loads evaluation datasets through the [ObjectStore](../../build-workflows/object-store.md) subsystem. Built-in support covers common file formats (CSV, JSON, JSONL, Parquet, and Excel) and storage backends (local files, S3, Redis, MySQL). This guide shows how to configure dataset loading and how to create a custom ObjectStore implementation for novel data sources.

## Loading from a Local File

The simplest way to specify an evaluation dataset is the `file_path` shorthand. NeMo Agent Toolkit creates a transient `FileObjectStore` behind the scenes and infers the data format from the file extension.

```yaml
eval:
general:
dataset:
file_path: /data/eval.csv
structure:
question_key: input
answer_key: expected_output
```

## Loading from a Remote ObjectStore

To load data from a configured ObjectStore (for example, S3), reference the store by name and provide the object key:

```yaml
object_stores:
s3_data:
_type: s3
bucket: my-eval-datasets

eval:
general:
dataset:
object_store: s3_data
key: v2/eval.parquet
```

The named ObjectStore must be declared in the top-level `object_stores` section of your configuration file. Any ObjectStore backend that NeMo Agent Toolkit supports (S3, Redis, MySQL, or a custom implementation) can be used here.

## Format Inference and Explicit Override

The data format is inferred from the file extension of the `file_path` or `key`:

| Extension | Format |
|-----------|--------|
| `.csv` | csv |
| `.json` | json |
| `.jsonl` | jsonl |
| `.parquet` | parquet |
| `.xls`, `.xlsx` | xls |

If the file extension does not match the actual format, or if there is no extension, you can specify the format explicitly:

```yaml
eval:
general:
dataset:
file_path: /data/eval_data
format: csv
```

## Creating a Custom ObjectStore for Novel Data Sources

If your evaluation data lives in a source not covered by the built-in ObjectStore providers (for example, a REST API, a database query, or a proprietary storage system), you can create a custom ObjectStore implementation. This replaces the former DatasetLoader plugin approach.

### Example: API-backed ObjectStore

The following example shows how to create an ObjectStore that fetches data from a custom REST API.

<!-- path-check-skip-begin -->
```python
# my_plugin/api_object_store.py
import httpx
from nat.data_models.object_store import NoSuchKeyError
from nat.object_store.interfaces import ObjectStore
from nat.object_store.models import ObjectStoreItem
from nat.utils.type_utils import override


class ApiObjectStore(ObjectStore):
"""ObjectStore that reads data from a REST API."""

def __init__(self, base_url: str, api_key: str) -> None:
self._base_url = base_url
self._api_key = api_key

@override
async def get_object(self, key: str) -> ObjectStoreItem:
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self._base_url}/datasets/{key}",
headers={"Authorization": f"Bearer {self._api_key}"},
)
if response.status_code == 404:
raise NoSuchKeyError(key)
response.raise_for_status()
return ObjectStoreItem(
data=response.content,
content_type=response.headers.get("content-type"),
)

@override
async def put_object(self, key: str, item: ObjectStoreItem) -> None:
raise NotImplementedError("Read-only store")

@override
async def upsert_object(self, key: str, item: ObjectStoreItem) -> None:
raise NotImplementedError("Read-only store")

@override
async def delete_object(self, key: str) -> None:
raise NotImplementedError("Read-only store")
```
<!-- path-check-skip-end -->

For dataset loading, only `get_object` needs a real implementation. The base `ObjectStore.read_dataframe()` method will call `get_object` to fetch the raw bytes and then parse them into a pandas DataFrame using the inferred (or explicit) format.
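That default flow can be sketched roughly as follows — a dependency-free approximation that returns a list of records instead of a DataFrame, with hypothetical names throughout:

```python
import asyncio
import json


class StaticStore:
    """Hypothetical read-only store: like ApiObjectStore above, only
    get_object() does real work (here backed by an in-memory dict)."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    async def get_object(self, key: str) -> bytes:
        try:
            return self._objects[key]
        except KeyError:
            raise KeyError(f"no such key: {key}") from None


async def read_records(store: StaticStore, key: str) -> list[dict]:
    # Rough shape of the base read_dataframe(): fetch the raw bytes via
    # get_object(), then parse them (pandas in the real toolkit; plain
    # JSON here to keep the sketch dependency-free).
    raw = await store.get_object(key)
    return json.loads(raw)
```

Usage under these assumptions: `asyncio.run(read_records(StaticStore({"eval.json": b'[{"input": "q1"}]'}), "eval.json"))` returns the parsed records.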

### Overriding `read_dataframe()` for Efficient Native Reads

If your backend can produce a DataFrame directly (for example, via a SQL query or a native API), you can override `read_dataframe()` to skip the bytes-to-DataFrame parsing:

<!-- path-check-skip-begin -->
```python
class ApiObjectStore(ObjectStore):
# ... (same as above)

@override
async def read_dataframe(self, key: str, format: str | None = None, **kwargs):
"""Fetch data directly as a DataFrame from the API."""
import pandas as pd

async with httpx.AsyncClient() as client:
response = await client.get(
f"{self._base_url}/datasets/{key}/records",
headers={"Authorization": f"Bearer {self._api_key}"},
)
response.raise_for_status()
return pd.DataFrame(response.json())
```
<!-- path-check-skip-end -->

### Registering the Custom ObjectStore

Create a config class and registration function following the standard ObjectStore plugin pattern:

<!-- path-check-skip-begin -->
```python
# my_plugin/register.py
from nat.builder.builder import Builder
from nat.cli.register_workflow import register_object_store
from nat.data_models.object_store import ObjectStoreBaseConfig


class ApiObjectStoreConfig(ObjectStoreBaseConfig, name="api_store"):
base_url: str
api_key: str


@register_object_store(config_type=ApiObjectStoreConfig)
async def api_object_store(config: ApiObjectStoreConfig, _builder: Builder):
from .api_object_store import ApiObjectStore
yield ApiObjectStore(base_url=config.base_url, api_key=config.api_key)
```
<!-- path-check-skip-end -->

Add an entry point in your `pyproject.toml` so that NeMo Agent Toolkit discovers the plugin automatically:

```toml
[project.entry-points.'nat.plugins']
my_plugin = "my_plugin.register"
```

### Using the Custom ObjectStore for Evaluation

Once registered, reference the custom ObjectStore in your evaluation configuration:

```yaml
object_stores:
my_api:
_type: api_store
base_url: https://data.example.com
api_key: ${API_KEY}

eval:
general:
dataset:
object_store: my_api
key: eval-set-v3
format: json
```

## Built-in Format Support

The following formats are supported for parsing evaluation datasets:

| Format | Reader | Notes |
|--------|--------|-------|
| `csv` | `pandas.read_csv` | Default for `.csv` files |
| `json` | `pandas.read_json` | Expects a JSON array of records |
| `jsonl` | Custom JSONL reader | One JSON object per line |
| `parquet` | `pandas.read_parquet` | Binary columnar format |
| `xls` | `pandas.read_excel` | Requires `openpyxl`; covers `.xls` and `.xlsx` |
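The per-format dispatch can be approximated with the standard library alone — a sketch with hypothetical names, whereas the toolkit itself delegates to the pandas readers in the table:

```python
import csv
import io
import json


def parse_records(data: bytes, format: str) -> list[dict]:
    """Parse raw dataset bytes into a list of records by format.

    Stdlib-only sketch of the csv/json/jsonl branches; the real toolkit
    uses pandas and also handles the binary parquet and xls formats.
    """
    if format == "csv":
        # Header row becomes the record keys; all values are strings.
        return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))
    if format == "json":
        return json.loads(data)  # expects a JSON array of records
    if format == "jsonl":
        # One JSON object per line; blank lines are skipped.
        return [json.loads(line) for line in data.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {format}")
```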

For more details on ObjectStore configuration and the available built-in providers, see the [Object Stores](../../build-workflows/object-store.md) documentation.

For details on how to create a custom ObjectStore provider, see [Adding an Object Store Provider](object-store.md).

<!-- path-check-skip-end -->
1 change: 1 addition & 0 deletions docs/source/extend/custom-components/index.md
@@ -27,6 +27,7 @@ Authentication Provider <./adding-an-authentication-provider.md>
LLM Provider <./adding-an-llm-provider.md>
Retriever <./adding-a-retriever.md>
Evaluator <./custom-evaluator.md>
Dataset Loader <./custom-dataset-loader.md>
MCP Server Worker <./mcp-server.md>
Memory Provider <./memory.md>
Object Store Provider <./object-store.md>
7 changes: 6 additions & 1 deletion docs/source/extend/custom-components/object-store.md
@@ -26,7 +26,7 @@ This documentation presumes familiarity with the NeMo Agent Toolkit [object stor
- **{py:class}`~nat.data_models.object_store.ObjectStoreBaseConfigT`**: A generic type alias for object store config classes.

* **Object Store Interfaces**
- **{py:class}`~nat.object_store.interfaces.ObjectStore`** (abstract interface): The core interface for object store operations, including put, upsert, get, and delete operations.
- **{py:class}`~nat.object_store.interfaces.ObjectStore`** (abstract interface): The core interface for object store operations, including put, upsert, get, and delete operations, plus a non-abstract `read_dataframe()` method for parsing stored data into pandas DataFrames.
```python
class ObjectStore(ABC):
@abstractmethod
@@ -44,6 +44,11 @@ This documentation presumes familiarity with the NeMo Agent Toolkit [object stor
@abstractmethod
async def delete_object(self, key: str) -> None:
...

async def read_dataframe(self, key: str, format: str | None = None, **kwargs) -> "pd.DataFrame":
# Default: fetches bytes via get_object() and parses them.
# Subclasses may override for efficient native reads.
...
```

* **Object Store Models**
1 change: 1 addition & 0 deletions docs/source/extend/plugins.md
@@ -34,6 +34,7 @@ NeMo Agent Toolkit utilizes this plugin system for all first-party components
NeMo Agent Toolkit currently supports the following plugin types:

- **CLI Commands**: CLI commands extend the `nat` command-line interface with plugin-specific commands. For example, the MCP and A2A plugins provide their own CLI commands for client operations and server management. To register a CLI command, add an entry point in the `nat.cli` group.
- **Data Sources**: [Evaluation datasets](../improve-workflows/evaluate.md#using-datasets) are loaded through the ObjectStore subsystem. Built-in support covers `json`, `jsonl`, `csv`, `xls`, and `parquet` formats from local files or remote ObjectStore backends. You can add support for novel data sources by creating a custom ObjectStore plugin. To register an ObjectStore, use the {py:deco}`nat.cli.register_workflow.register_object_store` decorator. See the [Custom Data Sources](./custom-components/custom-dataset-loader.md) documentation for a step-by-step guide.
- **Embedder Clients**: [Embedder](../build-workflows/embedders.md) Clients are implementations of embedder providers, which are specific to a [LLM](../build-workflows/llms/index.md) framework. For example, when using the OpenAI embedder provider with the LangChain/LangGraph framework, the LangChain/LangGraph OpenAI embedder client needs to be registered. To register an embedder client, you can use the {py:deco}`nat.cli.register_workflow.register_embedder_client` decorator.
- **Embedder Providers**: Embedder Providers are services that provide a way to embed text. For example, OpenAI and NVIDIA NIMs are embedder providers. To register an embedder provider, you can use the {py:deco}`nat.cli.register_workflow.register_embedder_provider` decorator.
- **Evaluators**: [Evaluators](../improve-workflows/evaluate.md) are used by the evaluation framework to evaluate the performance of NeMo Agent Toolkit workflows. To register an evaluator, you can use the {py:deco}`nat.cli.register_workflow.register_evaluator` decorator.