3 changes: 3 additions & 0 deletions python/packages/azure-contentunderstanding/.gitignore
@@ -0,0 +1,3 @@
# Local-only files (not committed)
_local_only/
*_local_only*
36 changes: 36 additions & 0 deletions python/packages/azure-contentunderstanding/AGENTS.md
@@ -0,0 +1,36 @@
# AGENTS.md — azure-contentunderstanding

## Package Overview

`agent-framework-azure-contentunderstanding` integrates Azure Content Understanding (CU)
into the Agent Framework as a context provider. It automatically analyzes file attachments
(documents, images, audio, video) and injects structured results into the LLM context.

## Public API

| Symbol | Type | Description |
|--------|------|-------------|
| `ContentUnderstandingContextProvider` | class | Main context provider — extends `BaseContextProvider` |
| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
| `ContentLimits` | dataclass | Configurable file size/page/duration limits |

## Architecture

- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect
file attachments, call the CU API, manage session state with multi-document tracking,
and auto-register retrieval tools for follow-up turns.
- **`_models.py`** — `AnalysisSection` enum, `ContentLimits` dataclass, `DocumentEntry` TypedDict.

## Key Patterns

- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
- Uses provider-scoped `state` dict for multi-document tracking across turns.
- Auto-registers `list_documents()` and `get_analyzed_document()` tools via `context.extend_tools()`.
- Configurable timeout (`max_wait`) with `asyncio.create_task()` background fallback.
- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
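The timeout-with-background-fallback pattern can be illustrated with plain `asyncio`. This is a minimal sketch, not the provider's actual code: `analyze_with_fallback` and its return shape are hypothetical names chosen for illustration, and only `max_wait` and the use of `asyncio.create_task()` come from the description above.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple


async def analyze_with_fallback(
    work: Awaitable[Any], max_wait: float
) -> Tuple[str, Optional[Any], Optional["asyncio.Task[Any]"]]:
    """Hypothetical helper: wait up to max_wait seconds for the analysis.

    Returns ("ready", result, None) if the work finished in time, or
    ("pending", None, task) if it is still running in the background.
    """
    task = asyncio.ensure_future(work)  # equivalent to create_task() for a coroutine
    try:
        # shield() prevents wait_for() from cancelling the task on timeout,
        # so the analysis keeps running after the caller gives up waiting.
        result = await asyncio.wait_for(asyncio.shield(task), timeout=max_wait)
        return "ready", result, None
    except asyncio.TimeoutError:
        return "pending", None, task
```

A caller would store the returned task in session state and check it on a later turn. The `shield()` call is the important design choice here: without it, `wait_for()` would cancel the analysis when the timeout expires.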

## Running Tests

```bash
uv run poe test -P azure-contentunderstanding
```
21 changes: 21 additions & 0 deletions python/packages/azure-contentunderstanding/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
91 changes: 91 additions & 0 deletions python/packages/azure-contentunderstanding/README.md
@@ -0,0 +1,91 @@
# Azure Content Understanding for Microsoft Agent Framework

[![PyPI](https://img.shields.io/pypi/v/agent-framework-azure-contentunderstanding)](https://pypi.org/project/agent-framework-azure-contentunderstanding/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Azure Content Understanding (CU) integration for the [Microsoft Agent Framework](https://aka.ms/agent-framework). Provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using Azure Content Understanding and injects structured results into the LLM context.

## Installation

```bash
pip install --pre agent-framework-azure-contentunderstanding
```

> **Note:** This package is in preview. The `--pre` flag is required to install pre-release versions.

## Quick Start

```python
from agent_framework import Agent, Message, Content
from agent_framework.azure import AzureOpenAIResponsesClient
from agent_framework_azure_contentunderstanding import ContentUnderstandingContextProvider
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=credential,
    analyzer_id="prebuilt-documentSearch",
)

async with cu, AzureOpenAIResponsesClient(credential=credential) as llm_client:
    agent = Agent(client=llm_client, context_providers=[cu])

    response = await agent.run(Message(role="user", contents=[
        Content.from_text("What's on this invoice?"),
        Content.from_data(pdf_bytes, "application/pdf",
                          additional_properties={"filename": "invoice.pdf"}),
    ]))
    print(response.text)
```

## Features

- **Automatic file detection** — Scans input messages for supported file attachments and analyzes them automatically.
- **Multi-document sessions** — Tracks multiple analyzed documents per session with status tracking (`pending`/`ready`/`failed`).
- **Background processing** — Configurable timeout with async background fallback for large files or slow analysis.
- **Output filtering** — Passes only relevant sections (markdown, fields) to the LLM, reducing token usage by >90%.
- **Auto-registered tools** — `list_documents()` and `get_analyzed_document()` tools let the LLM query status and retrieve cached content on follow-up turns.
- **All CU modalities** — Documents, images, audio, and video via prebuilt or custom analyzers.
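The multi-document status tracking described above could be modeled roughly as follows. This is only a sketch under stated assumptions: the `"documents"` key, the record layout, and every function name here are hypothetical stand-ins, loosely modeled on the `pending`/`ready`/`failed` statuses and the two retrieval tools, and are not the package's actual internals.

```python
from typing import Dict, List, Optional, TypedDict


class DocumentRecord(TypedDict):
    """Hypothetical per-document record kept in session state."""

    status: str  # "pending", "ready", or "failed"
    content: Optional[str]
    error: Optional[str]


def _documents(state: dict) -> Dict[str, DocumentRecord]:
    # All records live under one assumed key in the provider-scoped state dict.
    return state.setdefault("documents", {})


def register_document(state: dict, name: str) -> None:
    _documents(state)[name] = {"status": "pending", "content": None, "error": None}


def mark_ready(state: dict, name: str, content: str) -> None:
    _documents(state)[name] = {"status": "ready", "content": content, "error": None}


def mark_failed(state: dict, name: str, error: str) -> None:
    _documents(state)[name] = {"status": "failed", "content": None, "error": error}


def summarize_documents(state: dict) -> List[Dict[str, str]]:
    """Stand-in for a list-style tool: names and statuses only."""
    return [{"name": n, "status": r["status"]} for n, r in _documents(state).items()]


def fetch_document(state: dict, name: str) -> str:
    """Stand-in for a retrieval tool: cached content or a status message."""
    record = _documents(state).get(name)
    if record is None:
        return f"Document '{name}' not found."
    if record["status"] == "ready":
        return record["content"] or ""
    if record["status"] == "failed":
        return f"Analysis of '{name}' failed: {record['error']}"
    return f"Analysis of '{name}' is still pending."
```

Returning a plain status message for documents that are not ready lets the LLM tell the user to retry later instead of raising an error mid-conversation.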

## Supported File Types

| Category | Types |
|----------|-------|
| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
| Images | JPEG, PNG, TIFF, BMP |
| Audio | WAV, MP3, M4A, FLAC, OGG |
| Video | MP4, MOV, AVI, WebM |
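For a quick local pre-check, the table above can be turned into an extension lookup. This is a sketch only: the extension spellings (for example `.jpg` and `.jpeg` for JPEG, `.md` for Markdown) and the `classify_file` helper are assumptions made for illustration, and the package may well detect types differently, for instance by media type.

```python
from pathlib import PurePath
from typing import Optional

# Extension sets derived from the table above; the exact spellings are assumptions.
SUPPORTED_EXTENSIONS = {
    "document": {".pdf", ".docx", ".xlsx", ".pptx", ".html", ".txt", ".md"},
    "image": {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp"},
    "audio": {".wav", ".mp3", ".m4a", ".flac", ".ogg"},
    "video": {".mp4", ".mov", ".avi", ".webm"},
}


def classify_file(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if the type is not listed."""
    suffix = PurePath(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```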

## Configuration

```python
from agent_framework_azure_contentunderstanding import (
    ContentUnderstandingContextProvider,
    AnalysisSection,
    ContentLimits,
)

cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=credential,
    analyzer_id="my-custom-analyzer",  # default: "prebuilt-documentSearch"
    max_wait=10.0,                     # default: 5.0 seconds
    output_sections=[                  # default: MARKDOWN + FIELDS
        AnalysisSection.MARKDOWN,
        AnalysisSection.FIELDS,
        AnalysisSection.FIELD_GROUNDING,
    ],
    content_limits=ContentLimits(      # default: 20 pages, 10 MB, 5 min audio, 2 min video
        max_pages=50,
        max_file_size_mb=50,
    ),
)
```
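To make the default limits concrete, here is a rough sketch of how such limits might be checked before a file is sent for analysis. `LimitsSketch` is a hypothetical stand-in, not the real `ContentLimits`: only `max_pages` and `max_file_size_mb` appear in the example above, while the duration field names, the 1 MB = 1024 × 1024 bytes convention, and the `find_violations` helper are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LimitsSketch:
    """Hypothetical mirror of the documented defaults."""

    max_pages: int = 20
    max_file_size_mb: int = 10
    max_audio_duration_s: int = 300  # 5 minutes; field name is an assumption
    max_video_duration_s: int = 120  # 2 minutes; field name is an assumption


def find_violations(
    limits: LimitsSketch,
    *,
    size_bytes: int,
    pages: Optional[int] = None,
    audio_s: Optional[float] = None,
    video_s: Optional[float] = None,
) -> List[str]:
    """Return a human-readable message for each limit the file exceeds."""
    problems: List[str] = []
    size_mb = size_bytes / (1024 * 1024)  # assumed MB convention
    if size_mb > limits.max_file_size_mb:
        problems.append(f"file is {size_mb:.1f} MB, limit is {limits.max_file_size_mb} MB")
    if pages is not None and pages > limits.max_pages:
        problems.append(f"{pages} pages, limit is {limits.max_pages}")
    if audio_s is not None and audio_s > limits.max_audio_duration_s:
        problems.append(f"audio is {audio_s:.0f} s, limit is {limits.max_audio_duration_s} s")
    if video_s is not None and video_s > limits.max_video_duration_s:
        problems.append(f"video is {video_s:.0f} s, limit is {limits.max_video_duration_s} s")
    return problems
```

Collecting every violation, rather than stopping at the first one, gives the user a single complete explanation of why a file was rejected.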

## Links

- [Microsoft Agent Framework](https://aka.ms/agent-framework)
- [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/)
- [API Reference](https://learn.microsoft.com/python/api/azure-ai-contentunderstanding/)
@@ -0,0 +1,18 @@
# Copyright (c) Microsoft. All rights reserved.

import importlib.metadata

from ._context_provider import ContentUnderstandingContextProvider
from ._models import AnalysisSection, ContentLimits

try:
    __version__ = importlib.metadata.version(__name__)
except importlib.metadata.PackageNotFoundError:
    __version__ = "0.0.0"

__all__ = [
"AnalysisSection",
"ContentLimits",
"ContentUnderstandingContextProvider",
"__version__",
]