2 changes: 1 addition & 1 deletion docs/source/build-workflows/a2a-client.md
@@ -17,7 +17,7 @@ limitations under the License.

# NVIDIA NeMo Agent Toolkit Workflow as an A2A Client

[Agent-to-Agent (A2A) Protocol](https://a2aproject.org/) is an open standard from the Linux Foundation that enables agent-to-agent communication and collaboration. The protocol standardizes how [agents](../components/agents/index.md) discover capabilities, delegate tasks, and exchange information.
[Agent-to-Agent (A2A) Protocol](https://a2a-protocol.org) is an open standard from the Linux Foundation that enables agent-to-agent communication and collaboration. The protocol standardizes how [agents](../components/agents/index.md) discover capabilities, delegate tasks, and exchange information.

You can create a [workflow](./about-building-workflows.md) that connects to remote A2A agents and provides a function interface for interacting with their capabilities.

8 changes: 7 additions & 1 deletion docs/source/build-workflows/object-store.md
@@ -39,12 +39,13 @@ class ObjectStoreItem:
```

### ObjectStore Interface
The `ObjectStore` abstract interface defines the four standard operations:
The `ObjectStore` abstract interface defines the four standard operations and a `read_dataframe()` convenience method:

- **put_object(key, item)**: Store a new object with a unique key. Raises if the key already exists.
- **upsert_object(key, item)**: Update (or insert) an object with the given key.
- **get_object(key)**: Retrieve an object by its key. Raises if the key doesn't exist.
- **delete_object(key)**: Remove an object from the store. Raises if the key doesn't exist.
- **read_dataframe(key, format)**: Read an object and parse it into a pandas DataFrame. The format is inferred from the key's file extension when not specified. Supported formats: `csv`, `json`, `jsonl`, `parquet`, `xls`. Subclasses may override this method for efficient native reads (for example, reading directly from a file path or executing a SQL query).

```python
class ObjectStore(ABC):
@@ -63,11 +64,16 @@ class ObjectStore(ABC):
@abstractmethod
async def delete_object(self, key: str) -> None:
...

async def read_dataframe(self, key: str, format: str | None = None, **kwargs) -> "pd.DataFrame":
"""Read an object and parse it as a pandas DataFrame."""
...
```
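As a hypothetical illustration of these semantics (a minimal sketch, not the toolkit's actual implementation — see the in-memory provider listed below), a store satisfying the four operations might look like:

```python
import asyncio


class MiniStore:
    """Minimal in-memory sketch of the four standard operations.

    Hypothetical illustration only; the toolkit's real providers store
    ObjectStoreItem values with metadata and raise toolkit-specific errors
    rather than KeyError.
    """

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def put_object(self, key: str, item: bytes) -> None:
        # put: the key must not already exist
        if key in self._data:
            raise KeyError(f"key already exists: {key}")
        self._data[key] = item

    async def upsert_object(self, key: str, item: bytes) -> None:
        # upsert: insert or overwrite unconditionally
        self._data[key] = item

    async def get_object(self, key: str) -> bytes:
        # get: the key must exist
        if key not in self._data:
            raise KeyError(f"no such key: {key}")
        return self._data[key]

    async def delete_object(self, key: str) -> None:
        # delete: the key must exist
        if key not in self._data:
            raise KeyError(f"no such key: {key}")
        del self._data[key]


async def demo() -> bytes:
    store = MiniStore()
    await store.put_object("a", b"v1")
    await store.upsert_object("a", b"v2")  # overwriting is allowed for upsert
    value = await store.get_object("a")
    await store.delete_object("a")
    return value
```

The key distinction the sketch captures: `put_object` fails on an existing key while `upsert_object` silently overwrites, and both `get_object` and `delete_object` fail on a missing key.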

## Included Object Stores
The NeMo Agent Toolkit includes several object store providers:

- **File Object Store**: Local filesystem storage. Used automatically when specifying `file_path` in evaluation dataset configuration. Overrides `read_dataframe()` to read files directly from disk for efficiency. See `packages/nvidia_nat_core/src/nat/object_store/file_object_store.py`
- **In-Memory Object Store**: In-memory storage for development and testing. See `packages/nvidia_nat_core/src/nat/object_store/in_memory_object_store.py`
- **S3 Object Store**: Amazon S3 and S3-compatible storage (like MinIO). See `packages/nvidia_nat_s3/src/nat/plugins/s3/s3_object_store.py`
- **MySQL Object Store**: MySQL database-backed storage. See `packages/nvidia_nat_mysql/src/nat/plugins/mysql/mysql_object_store.py`
2 changes: 1 addition & 1 deletion docs/source/components/integrations/a2a.md
@@ -17,7 +17,7 @@ limitations under the License.

# Agent-to-Agent Protocol (A2A)

NVIDIA NeMo Agent Toolkit [Agent-to-Agent Protocol (A2A)](https://a2aproject.org/) integration includes:
NVIDIA NeMo Agent Toolkit [Agent-to-Agent Protocol (A2A)](https://a2a-protocol.org) integration includes:
* An [A2A client](../../build-workflows/a2a-client.md) to connect to and interact with remote A2A [agents](../agents/index.md).
* An [A2A server](../../run-workflows/a2a-server.md) to publish [workflows](../../build-workflows/about-building-workflows.md) as A2A agents that can be discovered and invoked by other A2A clients.

229 changes: 229 additions & 0 deletions docs/source/extend/custom-components/custom-dataset-loader.md
@@ -0,0 +1,229 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- path-check-skip-begin -->

# Custom Data Sources for Evaluation

:::{note}
We recommend reading the [Evaluating NeMo Agent Toolkit Workflows](../../improve-workflows/evaluate.md) guide before proceeding with this detailed documentation.
:::

NeMo Agent Toolkit loads evaluation datasets through the [ObjectStore](../../build-workflows/object-store.md) subsystem. Built-in support covers common file formats (CSV, JSON, JSONL, Parquet, and Excel) and storage backends (local files, S3, Redis, MySQL). This guide shows how to configure dataset loading and how to create a custom ObjectStore implementation for novel data sources.

## Loading from a Local File

The simplest way to specify an evaluation dataset is the `file_path` shorthand. NeMo Agent Toolkit creates a transient `FileObjectStore` behind the scenes and infers the data format from the file extension.

```yaml
eval:
general:
dataset:
file_path: /data/eval.csv
structure:
question_key: input
answer_key: expected_output
```

## Loading from a Remote ObjectStore

To load data from a configured ObjectStore (for example, S3), reference the store by name and provide the object key:

```yaml
object_stores:
s3_data:
_type: s3
bucket: my-eval-datasets

eval:
general:
dataset:
object_store: s3_data
key: v2/eval.parquet
```

The named ObjectStore must be declared in the top-level `object_stores` section of your configuration file. Any ObjectStore backend that NeMo Agent Toolkit supports (S3, Redis, MySQL, or a custom implementation) can be used here.

## Format Inference and Explicit Override

The data format is inferred from the file extension of the `file_path` or `key`:

| Extension | Format |
|-----------|--------|
| `.csv` | csv |
| `.json` | json |
| `.jsonl` | jsonl |
| `.parquet` | parquet |
| `.xls`, `.xlsx` | xls |

If the file extension does not match the actual format, or if there is no extension, you can specify the format explicitly:

```yaml
eval:
general:
dataset:
file_path: /data/eval_data
format: csv
```

## Creating a Custom ObjectStore for Novel Data Sources

If your evaluation data lives in a source not covered by the built-in ObjectStore providers (for example, a REST API, a database query, or a proprietary storage system), you can create a custom ObjectStore implementation. This replaces the former DatasetLoader plugin approach.

### Example: API-backed ObjectStore

The following example shows how to create an ObjectStore that fetches data from a custom REST API.

<!-- path-check-skip-begin -->
```python
# my_plugin/api_object_store.py
import httpx
from nat.data_models.object_store import NoSuchKeyError
from nat.object_store.interfaces import ObjectStore
from nat.object_store.models import ObjectStoreItem
from nat.utils.type_utils import override


class ApiObjectStore(ObjectStore):
"""ObjectStore that reads data from a REST API."""

def __init__(self, base_url: str, api_key: str) -> None:
self._base_url = base_url
self._api_key = api_key

@override
async def get_object(self, key: str) -> ObjectStoreItem:
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self._base_url}/datasets/{key}",
headers={"Authorization": f"Bearer {self._api_key}"},
)
if response.status_code == 404:
raise NoSuchKeyError(key)
response.raise_for_status()
return ObjectStoreItem(
data=response.content,
content_type=response.headers.get("content-type"),
)

@override
async def put_object(self, key: str, item: ObjectStoreItem) -> None:
raise NotImplementedError("Read-only store")

@override
async def upsert_object(self, key: str, item: ObjectStoreItem) -> None:
raise NotImplementedError("Read-only store")

@override
async def delete_object(self, key: str) -> None:
raise NotImplementedError("Read-only store")
```
<!-- path-check-skip-end -->

For dataset loading, only `get_object` needs a real implementation. The base `ObjectStore.read_dataframe()` method will call `get_object` to fetch the raw bytes and then parse them into a pandas DataFrame using the inferred (or explicit) format.
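That default flow can be sketched roughly as follows — a dependency-free approximation that returns a list of records instead of a DataFrame, with hypothetical names throughout:

```python
import asyncio
import json


class StaticStore:
    """Hypothetical read-only store: like ApiObjectStore above, only
    get_object() does real work (here backed by an in-memory dict)."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    async def get_object(self, key: str) -> bytes:
        try:
            return self._objects[key]
        except KeyError:
            raise KeyError(f"no such key: {key}") from None


async def read_records(store: StaticStore, key: str) -> list[dict]:
    # Rough shape of the base read_dataframe(): fetch the raw bytes via
    # get_object(), then parse them (pandas in the real toolkit; plain
    # JSON here to keep the sketch dependency-free).
    raw = await store.get_object(key)
    return json.loads(raw)
```

Usage under these assumptions: `asyncio.run(read_records(StaticStore({"eval.json": b'[{"input": "q1"}]'}), "eval.json"))` returns the parsed records.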

### Overriding `read_dataframe()` for Efficient Native Reads

If your backend can produce a DataFrame directly (for example, via a SQL query or a native API), you can override `read_dataframe()` to skip the bytes-to-DataFrame parsing:

<!-- path-check-skip-begin -->
```python
class ApiObjectStore(ObjectStore):
# ... (same as above)

@override
async def read_dataframe(self, key: str, format: str | None = None, **kwargs):
"""Fetch data directly as a DataFrame from the API."""
import pandas as pd

async with httpx.AsyncClient() as client:
response = await client.get(
f"{self._base_url}/datasets/{key}/records",
headers={"Authorization": f"Bearer {self._api_key}"},
)
response.raise_for_status()
return pd.DataFrame(response.json())
```
<!-- path-check-skip-end -->

### Registering the Custom ObjectStore

Create a config class and registration function following the standard ObjectStore plugin pattern:

<!-- path-check-skip-begin -->
```python
# my_plugin/register.py
from nat.builder.builder import Builder
from nat.cli.register_workflow import register_object_store
from nat.data_models.object_store import ObjectStoreBaseConfig


class ApiObjectStoreConfig(ObjectStoreBaseConfig, name="api_store"):
base_url: str
api_key: str


@register_object_store(config_type=ApiObjectStoreConfig)
async def api_object_store(config: ApiObjectStoreConfig, _builder: Builder):
from .api_object_store import ApiObjectStore
yield ApiObjectStore(base_url=config.base_url, api_key=config.api_key)
```
<!-- path-check-skip-end -->

Add an entry point in your `pyproject.toml` so that NeMo Agent Toolkit discovers the plugin automatically:

```toml
[project.entry-points.'nat.plugins']
my_plugin = "my_plugin.register"
```

### Using the Custom ObjectStore for Evaluation

Once registered, reference the custom ObjectStore in your evaluation configuration:

```yaml
object_stores:
my_api:
_type: api_store
base_url: https://data.example.com
api_key: ${API_KEY}

eval:
general:
dataset:
object_store: my_api
key: eval-set-v3
format: json
```

## Built-in Format Support

The following formats are supported for parsing evaluation datasets:

| Format | Reader | Notes |
|--------|--------|-------|
| `csv` | `pandas.read_csv` | Default for `.csv` files |
| `json` | `pandas.read_json` | Expects a JSON array of records |
| `jsonl` | Custom JSONL reader | One JSON object per line |
| `parquet` | `pandas.read_parquet` | Binary columnar format |
| `xls` | `pandas.read_excel` | Requires `openpyxl`; covers `.xls` and `.xlsx` |
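The per-format dispatch can be approximated with the standard library alone — a sketch with hypothetical names, whereas the toolkit itself delegates to the pandas readers in the table:

```python
import csv
import io
import json


def parse_records(data: bytes, format: str) -> list[dict]:
    """Parse raw dataset bytes into a list of records by format.

    Stdlib-only sketch of the csv/json/jsonl branches; the real toolkit
    uses pandas and also handles the binary parquet and xls formats.
    """
    if format == "csv":
        # Header row becomes the record keys; all values are strings.
        return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))
    if format == "json":
        return json.loads(data)  # expects a JSON array of records
    if format == "jsonl":
        # One JSON object per line; blank lines are skipped.
        return [json.loads(line) for line in data.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {format}")
```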

For more details on ObjectStore configuration and the available built-in providers, see the [Object Stores](../../build-workflows/object-store.md) documentation.

For details on how to create a custom ObjectStore provider, see [Adding an Object Store Provider](object-store.md).

<!-- path-check-skip-end -->
1 change: 1 addition & 0 deletions docs/source/extend/custom-components/index.md
@@ -27,6 +27,7 @@ Authentication Provider <./adding-an-authentication-provider.md>
LLM Provider <./adding-an-llm-provider.md>
Retriever <./adding-a-retriever.md>
Evaluator <./custom-evaluator.md>
Dataset Loader <./custom-dataset-loader.md>
MCP Server Worker <./mcp-server.md>
Memory Provider <./memory.md>
Object Store Provider <./object-store.md>
7 changes: 6 additions & 1 deletion docs/source/extend/custom-components/object-store.md
@@ -26,7 +26,7 @@ This documentation presumes familiarity with the NeMo Agent Toolkit [object stor
- **{py:class}`~nat.data_models.object_store.ObjectStoreBaseConfigT`**: A generic type alias for object store config classes.

* **Object Store Interfaces**
- **{py:class}`~nat.object_store.interfaces.ObjectStore`** (abstract interface): The core interface for object store operations, including put, upsert, get, and delete operations.
- **{py:class}`~nat.object_store.interfaces.ObjectStore`** (abstract interface): The core interface for object store operations, including put, upsert, get, and delete operations, plus a non-abstract `read_dataframe()` method for parsing stored data into pandas DataFrames.
```python
class ObjectStore(ABC):
@abstractmethod
@@ -44,6 +44,11 @@ This documentation presumes familiarity with the NeMo Agent Toolkit [object stor
@abstractmethod
async def delete_object(self, key: str) -> None:
...

async def read_dataframe(self, key: str, format: str | None = None, **kwargs) -> "pd.DataFrame":
# Default: fetches bytes via get_object() and parses them.
# Subclasses may override for efficient native reads.
...
```

* **Object Store Models**
1 change: 1 addition & 0 deletions docs/source/extend/plugins.md
@@ -34,6 +34,7 @@ NeMo Agent Toolkit utilizes this plugin system for all first-party components
NeMo Agent Toolkit currently supports the following plugin types:

- **CLI Commands**: CLI commands extend the `nat` command-line interface with plugin-specific commands. For example, the MCP and A2A plugins provide their own CLI commands for client operations and server management. To register a CLI command, add an entry point in the `nat.cli` group.
- **Data Sources**: [Evaluation datasets](../improve-workflows/evaluate.md#using-datasets) are loaded through the ObjectStore subsystem. Built-in support covers `json`, `jsonl`, `csv`, `xls`, and `parquet` formats from local files or remote ObjectStore backends. You can add support for novel data sources by creating a custom ObjectStore plugin. To register an ObjectStore, use the {py:deco}`nat.cli.register_workflow.register_object_store` decorator. See the [Custom Data Sources](./custom-components/custom-dataset-loader.md) documentation for a step-by-step guide.
- **Embedder Clients**: [Embedder](../build-workflows/embedders.md) Clients are implementations of embedder providers, which are specific to a [LLM](../build-workflows/llms/index.md) framework. For example, when using the OpenAI embedder provider with the LangChain/LangGraph framework, the LangChain/LangGraph OpenAI embedder client needs to be registered. To register an embedder client, you can use the {py:deco}`nat.cli.register_workflow.register_embedder_client` decorator.
- **Embedder Providers**: Embedder Providers are services that provide a way to embed text. For example, OpenAI and NVIDIA NIMs are embedder providers. To register an embedder provider, you can use the {py:deco}`nat.cli.register_workflow.register_embedder_provider` decorator.
- **Evaluators**: [Evaluators](../improve-workflows/evaluate.md) are used by the evaluation framework to evaluate the performance of NeMo Agent Toolkit workflows. To register an evaluator, you can use the {py:deco}`nat.cli.register_workflow.register_evaluator` decorator.