
Python: [WIP] [Python] Add agent-framework-azure-contentunderstanding package (DO NOT REVIEW) #4829

Draft
yungshinlintw wants to merge 3 commits into microsoft:main from yungshinlintw:yslin/contentunderstanding-context-provider


@yungshinlintw
Member

Draft -- Not Ready for Review

This PR adds agent-framework-azure-contentunderstanding, an optional connector package that integrates Azure Content Understanding (CU) into the Agent Framework as a context provider.

Status: Work in progress -- do not review yet.

What's Included

  • ContentUnderstandingContextProvider -- auto-analyzes file attachments (PDF, images, audio, video) via Azure CU and injects structured results (markdown, fields) into LLM context
  • Multi-document session state with status tracking (pending/ready/failed)
  • Configurable timeout with async background fallback
  • Output filtering via the AnalysisSection enum (>90% token reduction)
  • Auto-registered list_documents() and get_analyzed_document() tools
  • Content limits enforcement (file size; the page and duration limits are defined but not yet enforced, see TODO)
  • 46 unit tests, 91% coverage, all lint/type checks pass
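The "timeout with async background fallback" and per-document status tracking fit together in a small pattern. The sketch below shows one way to implement it with plain asyncio. It is not the package's code: `DocStatus`, `DocEntry`, `SessionState`, and `analyze()` are hypothetical names, and `analyzer` stands in for whatever coroutine actually calls Azure CU. The key idea is that `asyncio.shield()` keeps the analysis running after `asyncio.wait_for()` gives up, and the analysis task records its own outcome.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Awaitable, Callable


class DocStatus(str, Enum):
    # Hypothetical status values mirroring the pending/ready/failed states described above.
    PENDING = "pending"
    READY = "ready"
    FAILED = "failed"


@dataclass
class DocEntry:
    status: DocStatus = DocStatus.PENDING
    result: str | None = None
    error: str | None = None


@dataclass
class SessionState:
    docs: dict[str, DocEntry] = field(default_factory=dict)
    # Strong references so background analyses are not garbage-collected mid-flight.
    tasks: set[asyncio.Task] = field(default_factory=set, repr=False)

    async def analyze(
        self,
        name: str,
        analyzer: Callable[[], Awaitable[str]],
        timeout_s: float,
    ) -> DocEntry:
        """Wait up to timeout_s for the analysis; past that, leave it running in the background."""
        entry = self.docs.setdefault(name, DocEntry())

        async def run() -> None:
            # The task records its own outcome, so the entry is updated whether or not
            # the caller is still waiting when the analysis finishes.
            try:
                entry.result = await analyzer()
                entry.status = DocStatus.READY
            except Exception as exc:
                entry.status = DocStatus.FAILED
                entry.error = str(exc)

        task = asyncio.create_task(run())
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)
        try:
            # shield() stops wait_for() from cancelling the analysis when the timeout fires.
            await asyncio.wait_for(asyncio.shield(task), timeout_s)
        except asyncio.TimeoutError:
            pass  # Still PENDING; run() flips the status when the analysis completes.
        return entry
```

A caller gets back an entry that is already READY or FAILED for fast files and PENDING for slow ones. A later turn (or a tool call) can read the same entry once the background task has finished.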

Still TODO

  • Trim large invoice fixture (~199K lines)
  • Add load_settings() / env var support
  • Add audio/video fixture capture from live API
  • Add samples
  • Enforce max_pages / max_audio_duration_s / max_video_duration_s limits
  • Add telemetry (OpenTelemetry spans)
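The pending max_pages / max_audio_duration_s / max_video_duration_s enforcement amounts to a pre-flight check before a file is sent to CU. The following is a minimal sketch under stated assumptions: page count and media duration are already known (for example, from file metadata), and `ContentLimitsSketch`, `max_file_size_bytes`, `check_limits()`, and all default values are hypothetical. Only the three `max_*` field names listed in the TODO come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentLimitsSketch:
    # max_pages / max_audio_duration_s / max_video_duration_s are named in the PR;
    # max_file_size_bytes and every default value here are hypothetical.
    max_pages: int | None = 20
    max_file_size_bytes: int | None = 50 * 1024 * 1024
    max_audio_duration_s: float | None = 600.0
    max_video_duration_s: float | None = 600.0


def check_limits(
    limits: ContentLimitsSketch,
    *,
    media_kind: str,  # "document", "image", "audio", or "video"
    size_bytes: int,
    pages: int | None = None,
    duration_s: float | None = None,
) -> list[str]:
    """Return human-readable violations; an empty list means the file may be analyzed."""
    violations: list[str] = []
    if limits.max_file_size_bytes is not None and size_bytes > limits.max_file_size_bytes:
        violations.append(f"size {size_bytes} bytes > max_file_size_bytes={limits.max_file_size_bytes}")
    if pages is not None and limits.max_pages is not None and pages > limits.max_pages:
        violations.append(f"{pages} pages > max_pages={limits.max_pages}")
    # Duration caps apply only to time-based media; a None cap disables the check.
    cap = {"audio": limits.max_audio_duration_s, "video": limits.max_video_duration_s}.get(media_kind)
    if duration_s is not None and cap is not None and duration_s > cap:
        violations.append(f"{duration_s}s of {media_kind} > max_{media_kind}_duration_s={cap}")
    return violations
```

Returning a list of all violations, rather than raising on the first one, lets the provider report every reason a file was skipped in a single message.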

Install (preview)

pip install --pre agent-framework-azure-contentunderstanding

Add Azure Content Understanding integration as a context provider for the
Agent Framework. The package automatically analyzes file attachments
(documents, images, audio, video) using Azure CU and injects structured
results (markdown, fields) into the LLM context.
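Injecting results and stripping binaries can be pictured as a single pass over the incoming messages. This sketch uses plain dicts as a stand-in for the Agent Framework's real message and content types, which it does not model. `strip_and_inject()`, `SUPPORTED_PREFIXES`, and the `<document>` wrapper format are all hypothetical. Supported file parts are dropped and replaced with their analyzed markdown, or with a pending note if the analysis has not finished yet.

```python
from __future__ import annotations

# Hypothetical set of media types routed to CU; str.startswith accepts a tuple of prefixes.
SUPPORTED_PREFIXES = ("application/pdf", "image/", "audio/", "video/")


def strip_and_inject(messages: list[dict], analyzed: dict[str, str]) -> list[dict]:
    """Replace supported file parts with their analyzed markdown (or a pending note).

    Messages are plain dicts standing in for the framework's message types:
    {"role": ..., "contents": [{"type": "text", "text": ...} |
                               {"type": "file", "name": ..., "media_type": ..., "data": bytes}]}
    The input list is not mutated.
    """
    out: list[dict] = []
    for msg in messages:
        kept: list[dict] = []
        notes: list[str] = []
        for part in msg["contents"]:
            if part["type"] == "file" and part["media_type"].startswith(SUPPORTED_PREFIXES):
                name = part["name"]
                markdown = analyzed.get(name)
                if markdown is None:
                    notes.append(f"[{name}: analysis pending]")
                else:
                    notes.append(f'<document name="{name}">\n{markdown}\n</document>')
            else:
                kept.append(part)  # Text and unsupported files pass through unchanged.
        if notes:
            kept.append({"type": "text", "text": "\n\n".join(notes)})
        out.append({**msg, "contents": kept})
    return out
```

Because the raw bytes never reach the model, only the (much smaller) extracted markdown counts against the context window.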

Key features:
- Multi-document session state with status tracking (pending/ready/failed)
- Configurable timeout with async background fallback for large files
- Output filtering via AnalysisSection enum
- Auto-registered list_documents() and get_analyzed_document() tools
- Supports all CU modalities: documents, images, audio, video
- Content limits enforcement (file size; page and duration limits are defined but not yet enforced)
- Binary stripping of supported files from input messages
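The auto-registered tools let the model query session state on later turns. Below is a hedged sketch of what such tools could look like as plain closures over a per-session store. The store shape, the `name` parameter on `get_analyzed_document()`, `make_document_tools()`, and the JSON return format are assumptions for illustration, not the package's actual signatures.

```python
from __future__ import annotations

import json


def make_document_tools(store: dict[str, dict]):
    """Build two tool callables over a per-session store.

    Hypothetical store shape: name -> {"status": str, "markdown": str | None, "fields": dict}.
    """

    def list_documents() -> str:
        # Names and statuses only, so the listing stays cheap in tokens.
        return json.dumps([{"name": n, "status": d["status"]} for n, d in sorted(store.items())])

    def get_analyzed_document(name: str) -> str:
        doc = store.get(name)
        if doc is None:
            return json.dumps({"error": f"unknown document: {name}"})
        if doc["status"] != "ready":
            # Pending or failed documents report status only; there is no content to return yet.
            return json.dumps({"name": name, "status": doc["status"]})
        return json.dumps(
            {"name": name, "status": "ready", "markdown": doc["markdown"], "fields": doc["fields"]}
        )

    return list_documents, get_analyzed_document
```

Returning JSON strings keeps the tool outputs easy for the model to parse, and a pending document can simply be requested again on a later turn.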

Public API:
- ContentUnderstandingContextProvider (main class)
- AnalysisSection (output section selector enum)
- ContentLimits (configurable limits dataclass)
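Output filtering with a section-selector enum can be illustrated as a projection over the analysis result. The sketch below assumes a simplified result shape, a `contents` list whose items carry `markdown`, `fields`, and bulky per-page data. `AnalysisSectionSketch` and `filter_result()` are hypothetical stand-ins. Its two members mirror the "markdown, fields" sections mentioned above, and the real `AnalysisSection` members may differ.

```python
from __future__ import annotations

from enum import Enum


class AnalysisSectionSketch(str, Enum):
    # Hypothetical members mirroring the "markdown, fields" sections in the PR description.
    MARKDOWN = "markdown"
    FIELDS = "fields"


def filter_result(raw: dict, sections: set[AnalysisSectionSketch]) -> dict:
    """Keep only the selected sections of each content item; drop everything else."""
    keep = {s.value for s in sections}
    return {
        "contents": [
            {key: value for key, value in item.items() if key in keep}
            for item in raw.get("contents", [])
        ]
    }
```

In this sketch, everything not explicitly selected (here, the per-page word data) is discarded before the result reaches the LLM, which is the kind of pruning that makes large token savings possible.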

Tests: 46 unit tests, 91% coverage, all linting and type checks pass.

- Replace synthetic fixtures with real CU API responses (sanitized)
- Update test assertions to match real data (Contoso vs CONTOSO,
  TotalAmount vs InvoiceTotal, field values from real analysis)
- Add --pre install note in README (preview package)
- Document unenforced ContentLimits fields (max_pages, duration)
@markwallace-microsoft added the documentation and python labels on Mar 22, 2026
@github-actions (bot) changed the title from "[WIP] [Python] Add agent-framework-azure-contentunderstanding package (DO NOT REVIEW)" to "Python: [WIP] [Python] Add agent-framework-azure-contentunderstanding package (DO NOT REVIEW)"