Skip to content

Python: fix: use workflow factory to avoid RuntimeError under parallel requests#4772

Open
LEDazzio01 wants to merge 2 commits intomicrosoft:mainfrom
LEDazzio01:fix/4766-workflow-parallel-requests
Open

Python: fix: use workflow factory to avoid RuntimeError under parallel requests#4772
LEDazzio01 wants to merge 2 commits intomicrosoft:mainfrom
LEDazzio01:fix/4766-workflow-parallel-requests

Conversation

@LEDazzio01
Copy link
Contributor

Summary

Fixes #4766

The hosted agent sample at writer_reviewer_agents_in_workflow/main.py creates a single workflow agent and passes it directly to from_agent_framework(). When the hosted endpoint receives parallel requests, the shared Workflow instance attempts to run concurrently, raising:

RuntimeError: Workflow is already running. Concurrent executions are not allowed.

Root Cause

# Before: single shared agent instance — not safe for concurrent requests
agent = create_workflow(writer, reviewer)
await from_agent_framework(agent).run_async()

from_agent_framework() accepts either a pre-built agent or a factory callable. When given a factory, it creates a fresh agent per request, avoiding shared state.

Fix

# After: factory lambda creates a fresh workflow per request
await from_agent_framework(lambda: create_workflow(writer, reviewer)).run_async()

This ensures each incoming request gets its own Workflow instance, eliminating the RuntimeError under concurrent load.

Changes

  • python/samples/05-end-to-end/hosted_agents/writer_reviewer_agents_in_workflow/main.py: Pass a factory lambda to from_agent_framework() instead of a pre-built agent instance

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No

…ts (microsoft#4766)

Pass a factory lambda to `from_agent_framework()` instead of a pre-built
agent instance so each hosted request gets a fresh workflow. Previously,
the single shared workflow would raise `RuntimeError: Workflow is already
running. Concurrent executions are not allowed.` when parallel requests
arrived.
Copilot AI review requested due to automatic review settings March 18, 2026 23:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a concurrency issue in the Python hosted-agent workflow sample by ensuring each incoming hosted request gets a fresh Workflow instance rather than reusing a shared one (which can raise RuntimeError: Workflow is already running... under parallel load).

Changes:

  • Update the hosted-agent sample to pass a workflow factory callable into from_agent_framework(...) so the workflow is constructed per request.
  • Keep the writer/reviewer agent creation separate from workflow instantiation.

You can also share your feedback on Copilot code review. Take the survey.

@LEDazzio01
Copy link
Contributor Author

Thanks for the review, Copilot! All 4 comments are about pre-existing code in the sample rather than the factory-lambda fix introduced in this PR. I'll keep the scope of this PR focused on the concurrency fix (#4766).

That said, these are valid observations about the original sample — particularly the MSI_ENDPOINT gating and env var naming inconsistency. These could be addressed in a follow-up PR if the maintainers would like.

Copy link
Contributor

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 89%

✗ Correctness

The new sample has two correctness bugs: (1) it calls .run_async() on the object returned by from_agent_framework(), but all existing samples use .run()run_async likely does not exist and will raise AttributeError at runtime; (2) it passes a lambda factory to from_agent_framework() instead of an agent instance, diverging from every other hosted-agent sample which passes the agent directly. Additionally, the entire main() is needlessly async with asyncio.run() when the established pattern is a synchronous main() calling .run().

✓ Security Reliability

New sample introduces a custom credential-selection mechanism using the MSI_ENDPOINT environment variable instead of the established DefaultAzureCredential pattern used by all other hosted-agent samples. This hand-rolled approach is less robust (ignores IDENTITY_ENDPOINT used in newer Azure hosting, workload identity, and other credential types) and reduces reliability. No hard security vulnerabilities were found, but the credential deviation is a reliability concern worth addressing.

✓ Test Coverage

This diff adds a new end-to-end sample file demonstrating a writer-reviewer multi-agent workflow. The samples directory has no unit tests for any existing samples, so the absence of tests here is consistent with project convention. The file is a runnable demo requiring live Azure credentials, not library code that would warrant unit tests. No test coverage concerns specific to this change.

✗ Design Approach

This new sample hand-rolls credential selection by sniffing the MSI_ENDPOINT environment variable, while every other sample in the repo (~20+) uses DefaultAzureCredential — the Azure SDK's built-in credential chain that already handles ManagedIdentity, AzureCLI, environment variables, and more. The custom get_credential() function is a fragile reimplementation of existing SDK behavior: MSI_ENDPOINT is specific to older Azure App Service; newer compute (Container Apps, Functions v4, AKS workload identity) uses IDENTITY_ENDPOINT or other mechanisms. Since samples teach patterns, shipping this trains users to re-invent a credential chain that the SDK already provides correctly.

Flagged Issues

  • .run_async() does not appear to exist on the object returned by from_agent_framework(). All other hosted-agent samples use .run(). This will raise AttributeError at runtime.
  • from_agent_framework() is called with a lambda (factory function) instead of an agent instance. All other hosted-agent samples pass the agent directly (e.g., from_agent_framework(workflow_agent).run()). Passing a lambda may cause a type error or unexpected behavior at runtime.
  • Hand-rolled credential selection via MSI_ENDPOINT check is a fragile reimplementation of DefaultAzureCredential. MSI_ENDPOINT only exists on older App Service SKUs; newer Azure compute (Container Apps, Functions v4, AKS workload identity) uses IDENTITY_ENDPOINT or federation. Every other sample in this repo uses DefaultAzureCredential — this sample should too.

Suggestions

  • Follow the established pattern: make main() synchronous, build the agents/workflow inline, and call from_agent_framework(agent).run() directly — see agents_in_workflow/main.py for reference.
  • If AzureOpenAIResponsesClient requires async context-manager cleanup, consider whether the credential and client can be created without async with, or restructure so the agent is created before calling from_agent_framework.
  • Consider validating that PROJECT_ENDPOINT is set before passing it to AzureOpenAIResponsesClient, to provide a clear error message instead of silently falling through to an alternative initialization path.

Automated review by moonbox3's agents

@moonbox3
Copy link
Contributor

@LEDazzio01 once comments are addressed and no longer relevant, it's really helpful for us if you can please resolve them. Thanks.

- Switch to sync DefaultAzureCredential (matches all other samples)
- Use from_agent_framework(agent).run() instead of .run_async()
- Remove unnecessary async/asyncio patterns
- Change load_dotenv(override=True) to load_dotenv()

Addresses review feedback from @moonbox3 and Copilot.
@LEDazzio01
Copy link
Contributor Author

Thanks for the review @moonbox3! I've addressed all the feedback:

  • ✅ Switched to from_agent_framework(workflow_agent).run() — passes the agent directly, no lambda
  • ✅ Replaced hand-rolled credential with DefaultAzureCredential (from azure.identity, sync)
  • ✅ Changed load_dotenv(override=True)load_dotenv()
  • ✅ Removed all async patterns (asyncio, asynccontextmanager, async def main()) — main() is now fully synchronous, matching the agents_in_workflow reference sample

Re: Copilot's env var naming comment (PROJECT_ENDPOINT vs AZURE_AI_PROJECT_ENDPOINT) — that's pre-existing on main and out of scope for this bug-fix PR. Same for the missing supporting files comment — Dockerfile, agent.yaml, requirements.txt etc. already exist on main.

I'll resolve the comment threads now. Thanks!

@LEDazzio01 LEDazzio01 force-pushed the fix/4766-workflow-parallel-requests branch from eb740d6 to 49153de Compare March 19, 2026 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: Hosted agent sample reuses a single workflow instance and breaks under parallel requests

4 participants