microsoft · yungshinlintw · Mar 21, 2026 · Mar 21, 2026 · Mar 22, 2026
diff --git a/python/packages/azure-contentunderstanding/.gitignore b/python/packages/azure-contentunderstanding/.gitignore
@@ -0,0 +1,3 @@
+# Local-only files (not committed)
+_local_only/
+*_local_only*
diff --git a/python/packages/azure-contentunderstanding/AGENTS.md b/python/packages/azure-contentunderstanding/AGENTS.md
@@ -0,0 +1,36 @@
+# AGENTS.md — azure-contentunderstanding
+
+## Package Overview
+
+`agent-framework-azure-contentunderstanding` integrates Azure Content Understanding (CU)
+into the Agent Framework as a context provider. It automatically analyzes file attachments
+(documents, images, audio, video) and injects structured results into the LLM context.
+
+## Public API
+
+| Symbol | Type | Description |
+|--------|------|-------------|
+| `ContentUnderstandingContextProvider` | class | Main context provider — extends `BaseContextProvider` |
+| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
+| `ContentLimits` | dataclass | Configurable file size/page/duration limits |
+
+## Architecture
+
+- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect
+  file attachments, call the CU API, manage session state with multi-document tracking,
+  and auto-register retrieval tools for follow-up turns.
+- **`_models.py`** — `AnalysisSection` enum, `ContentLimits` dataclass, `DocumentEntry` TypedDict.
+
+## Key Patterns
+
+- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
+- Uses provider-scoped `state` dict for multi-document tracking across turns.
+- Auto-registers `list_documents()` and `get_analyzed_document()` tools via `context.extend_tools()`.
+- Configurable timeout (`max_wait`): analysis still running when it expires continues as a background task created with `asyncio.create_task()`.
+- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
+
+## Running Tests
+
+```bash
+uv run poe test -P azure-contentunderstanding
+```
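The `max_wait` background fallback and the per-document `pending`/`ready`/`failed` tracking listed under Key Patterns in AGENTS.md above can be sketched with plain `asyncio`. This is a minimal illustration under stated assumptions, not the package's implementation: `analyze_with_fallback`, `_record`, `DocState`, and the `{"status": ..., "result": ...}` entry layout are hypothetical stand-ins for the CU call and the provider-scoped `state` dict.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any

# Hypothetical per-document state layout: {doc_id: {"status": ..., "result": ...}}.
DocState = dict[str, dict[str, Any]]

# Strong references to background tasks so they are not garbage-collected mid-flight.
_BACKGROUND: set[asyncio.Task[str]] = set()


def _record(state: DocState, doc_id: str, task: asyncio.Task[str]) -> None:
    """Copy a finished task's outcome into the document's state entry."""
    if task.cancelled():
        state[doc_id] = {"status": "failed", "result": None}
    elif task.exception() is not None:
        state[doc_id] = {"status": "failed", "result": repr(task.exception())}
    else:
        state[doc_id] = {"status": "ready", "result": task.result()}


async def analyze_with_fallback(
    state: DocState,
    doc_id: str,
    analyze: Callable[[], Coroutine[Any, Any, str]],
    max_wait: float,
) -> str:
    """Wait up to max_wait seconds for analyze(); if it is still running, leave it in the background."""
    state[doc_id] = {"status": "pending", "result": None}
    task = asyncio.create_task(analyze())
    # Unlike asyncio.wait_for, asyncio.wait does not cancel the task when the timeout expires.
    done, _ = await asyncio.wait({task}, timeout=max_wait)
    if task in done:
        _record(state, doc_id, task)
    else:
        _BACKGROUND.add(task)
        task.add_done_callback(_BACKGROUND.discard)
        task.add_done_callback(lambda t: _record(state, doc_id, t))
    return state[doc_id]["status"]


async def demo() -> tuple[str, str, str]:
    state: DocState = {}
    release = asyncio.Event()

    async def fast() -> str:
        return "# Invoice ..."  # completes immediately

    async def slow() -> str:
        await release.wait()  # keeps running until the demo releases it
        return "# Contract ..."

    first = await analyze_with_fallback(state, "invoice.pdf", fast, max_wait=1.0)
    second = await analyze_with_fallback(state, "contract.pdf", slow, max_wait=0.01)
    release.set()
    while state["contract.pdf"]["status"] == "pending":
        await asyncio.sleep(0.01)  # the done callback updates the entry when the task finishes
    return first, second, state["contract.pdf"]["status"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`asyncio.wait` is used instead of `asyncio.wait_for` because the latter cancels the awaited task on timeout, which would defeat the point of a background fallback. A follow-up turn (or a tool such as `list_documents()`) would then read the updated status from the same state dict.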
diff --git a/python/packages/azure-contentunderstanding/LICENSE b/python/packages/azure-contentunderstanding/LICENSE
@@ -0,0 +1,21 @@
+    MIT License
+
+    Copyright (c) Microsoft Corporation.
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to deal
+    in the Software without restriction, including without limitation the rights
+    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+    copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in all
+    copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+    SOFTWARE
diff --git a/python/packages/azure-contentunderstanding/README.md b/python/packages/azure-contentunderstanding/README.md
@@ -0,0 +1,100 @@
+# Azure Content Understanding for Microsoft Agent Framework
+
+[![PyPI](https://img.shields.io/pypi/v/agent-framework-azure-contentunderstanding)](https://pypi.org/project/agent-framework-azure-contentunderstanding/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+Azure Content Understanding (CU) integration for the [Microsoft Agent Framework](https://aka.ms/agent-framework). Provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using Azure Content Understanding and injects structured results into the LLM context.
+
+## Installation
+
+```bash
+pip install --pre agent-framework-azure-contentunderstanding
+```
+
+> **Note:** This package is in preview. The `--pre` flag is required to install pre-release versions.
+
+## Quick Start
+
+```python
+import asyncio
+from pathlib import Path
+
+from agent_framework import Agent, Message, Content
+from agent_framework.azure import AzureOpenAIResponsesClient
+from agent_framework_azure_contentunderstanding import ContentUnderstandingContextProvider
+from azure.identity import DefaultAzureCredential
+
+
+async def main() -> None:
+    credential = DefaultAzureCredential()
+    pdf_bytes = Path("invoice.pdf").read_bytes()  # local file to analyze
+
+    cu = ContentUnderstandingContextProvider(
+        endpoint="https://my-resource.cognitiveservices.azure.com/",
+        credential=credential,
+        analyzer_id="prebuilt-documentSearch",
+    )
+
+    async with cu, AzureOpenAIResponsesClient(credential=credential) as llm_client:
+        agent = Agent(client=llm_client, context_providers=[cu])
+
+        response = await agent.run(Message(role="user", contents=[
+            Content.from_text("What's on this invoice?"),
+            Content.from_data(pdf_bytes, "application/pdf",
+                              additional_properties={"filename": "invoice.pdf"}),
+        ]))
+        print(response.text)
+
+
+asyncio.run(main())
+```
+
+## Features
+
+- **Automatic file detection** — Scans input messages for supported file attachments and analyzes them automatically.
+- **Multi-document sessions** — Tracks multiple analyzed documents per session with status tracking (`pending`/`ready`/`failed`).
+- **Background processing** — Configurable timeout with async background fallback for large files or slow analysis.
+- **Output filtering** — Passes only relevant sections (markdown, fields) to the LLM, reducing token usage by >90%.
+- **Auto-registered tools** — `list_documents()` and `get_analyzed_document()` tools let the LLM query status and retrieve cached content on follow-up turns.
+- **All CU modalities** — Documents, images, audio, and video via prebuilt or custom analyzers.
+
+## Supported File Types
+
+| Category | Types |
+|----------|-------|
+| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
+| Images | JPEG, PNG, TIFF, BMP |
+| Audio | WAV, MP3, M4A, FLAC, OGG |
+| Video | MP4, MOV, AVI, WebM |
+
+## Configuration
+
+```python
+from agent_framework_azure_contentunderstanding import (
+    ContentUnderstandingContextProvider,
+    AnalysisSection,
+    ContentLimits,
+)
+
+cu = ContentUnderstandingContextProvider(
+    endpoint="https://my-resource.cognitiveservices.azure.com/",
+    credential=credential,
+    analyzer_id="my-custom-analyzer",     # default: "prebuilt-documentSearch"
+    max_wait=10.0,                        # default: 5.0 seconds
+    output_sections=[                     # default: MARKDOWN + FIELDS
+        AnalysisSection.MARKDOWN,
+        AnalysisSection.FIELDS,
+        AnalysisSection.FIELD_GROUNDING,
+    ],
+    content_limits=ContentLimits(         # default: 20 pages, 10 MB, 5 min audio, 2 min video
+        max_pages=50,
+        max_file_size_mb=50,
+    ),
+)
+```
+
+## Links
+
+- [Microsoft Agent Framework](https://aka.ms/agent-framework)
+- [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/)
+- [API Reference](https://learn.microsoft.com/python/api/azure-ai-contentunderstanding/)
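The attachment handling described in AGENTS.md and the README (scan input messages for supported files, hand them to CU, and strip the binary parts so they never reach the LLM, subject to `ContentLimits`) can be sketched with stand-in types. Everything here is hypothetical: `Part`, `Msg`, `split_attachments`, and `SUPPORTED_MEDIA_TYPES` are not the package's API, the MIME list is only an illustrative subset of the README's file-type table, and reading "MB" as MiB is an assumption.

```python
from dataclasses import dataclass, field

# Illustrative subset of MIME types for file types listed in the README table (assumption).
SUPPORTED_MEDIA_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "image/jpeg",
    "image/png",
    "image/tiff",
    "image/bmp",
    "audio/wav",
    "audio/mpeg",
    "video/mp4",
    "video/quicktime",
    "video/webm",
}


@dataclass
class Part:
    """Hypothetical stand-in for one content item of a message."""

    kind: str  # "text" or "data"
    text: str = ""
    data: bytes = b""
    media_type: str = ""
    filename: str = ""


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""

    role: str
    parts: list[Part] = field(default_factory=list)


def split_attachments(
    messages: list[Msg], max_file_size_mb: float = 10.0
) -> tuple[list[Msg], list[Part], list[str]]:
    """Return (messages safe to send to the LLM, parts to analyze, names of oversized files)."""
    limit_bytes = max_file_size_mb * 1024 * 1024  # assumes "MB" means MiB
    cleaned: list[Msg] = []
    to_analyze: list[Part] = []
    oversized: list[str] = []
    for msg in messages:
        kept: list[Part] = []
        for part in msg.parts:
            if part.kind == "data" and part.media_type in SUPPORTED_MEDIA_TYPES:
                # Supported binary parts are removed from the message either way.
                if len(part.data) <= limit_bytes:
                    to_analyze.append(part)
                else:
                    oversized.append(part.filename)
                continue
            kept.append(part)  # text and unsupported parts pass through unchanged
        cleaned.append(Msg(msg.role, kept))
    return cleaned, to_analyze, oversized
```

In this sketch an oversized file is only reported back to the caller; how the package itself reacts to a limit violation is not visible in this diff.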
diff --git a/...ackages/azure-contentunderstanding/agent_framework_azure_contentunderstanding/__init__.py b/...ackages/azure-contentunderstanding/agent_framework_azure_contentunderstanding/__init__.py
@@ -0,0 +1,18 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+import importlib.metadata
+
+from ._context_provider import ContentUnderstandingContextProvider
+from ._models import AnalysisSection, ContentLimits
+
+try:
+    __version__ = importlib.metadata.version(__name__)
+except importlib.metadata.PackageNotFoundError:
+    __version__ = "0.0.0"
+
+__all__ = [
+    "AnalysisSection",
+    "ContentLimits",
+    "ContentUnderstandingContextProvider",
+    "__version__",
+]
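The exported `AnalysisSection` enum selects which parts of a CU result are passed to the LLM (the README lists `MARKDOWN`, `FIELDS`, and `FIELD_GROUNDING`). The filtering idea can be sketched as follows, assuming a result shaped like the CU analyze response, with a `contents` list whose items carry `markdown` and `fields`. The `Section` enum, `filter_result`, `GROUNDING_KEYS`, and the grounding key names are hypothetical re-creations, not the package's definitions.

```python
from enum import Enum
from typing import Any


class Section(str, Enum):
    """Hypothetical re-creation of an output-section selector."""

    MARKDOWN = "markdown"
    FIELDS = "fields"
    FIELD_GROUNDING = "field_grounding"


# Assumed names of per-field grounding metadata in the analyze result.
GROUNDING_KEYS = {"spans", "source", "confidence"}


def filter_result(result: dict[str, Any], sections: list[Section]) -> list[dict[str, Any]]:
    """Keep only the requested sections of each content item; drop everything else."""
    filtered: list[dict[str, Any]] = []
    for item in result.get("contents", []):
        kept: dict[str, Any] = {}
        if Section.MARKDOWN in sections and "markdown" in item:
            kept["markdown"] = item["markdown"]
        if Section.FIELDS in sections and "fields" in item:
            fields = item["fields"]
            if Section.FIELD_GROUNDING not in sections:
                # Strip location/confidence metadata unless grounding was requested.
                fields = {
                    name: {k: v for k, v in value.items() if k not in GROUNDING_KEYS}
                    for name, value in fields.items()
                }
            kept["fields"] = fields
        filtered.append(kept)
    return filtered
```

Dropping layout data such as per-page details and grounding metadata is where a filter like this saves tokens; the actual savings depend on the analyzer and the document.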