Enhance Multimodal Support in LLM Workflow and Update Documentation #45

Open
eanzhao wants to merge 6 commits into dev from codex/feat/2026-03-16_multimodal-llm-support

Conversation

@eanzhao eanzhao commented Mar 16, 2026

  • Expanded the LLM workflow to support multimodal input and output, including text, images, audio, and video.
  • Updated the ChatRequestEvent and ChatResponseEvent to include input_parts and output_parts for handling diverse content types.
  • Introduced new ContentPart and MediaContentEvent classes to encapsulate various media types and their properties.
  • Refactored the ChatRuntime and RoleGAgent to process and emit multimodal content effectively.
  • Enhanced documentation to reflect the new capabilities and provide clear guidelines for using multimodal features in workflows.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5a50b5e50

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +257 to +258
TryGetStringByKeys(root, "image_base64", "imageBase64") ??
TryGetNestedMediaBase64(root, "image");

P1: Restore legacy base64 aliases in tool media parser

This parser no longer checks the previously supported root aliases base64/data when extracting image payloads, so tool outputs that still return { "base64": "..." } will now fail TryExtractToolContentParts and be sent back as plain text JSON instead of multimodal ContentParts. That regresses existing tool-call flows that relied on the old schema and causes image outputs to be dropped from subsequent LLM turns.
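One way to read this suggestion is sketched below. It is not the PR's code: the class name `ToolMediaAliasSketch`, the helper bodies, and the assumed nested shape `{ "image": { "base64": "..." } }` are hypothetical, and only the two call sites quoted above come from the diff. The sketch keeps the new keys first and appends the legacy root aliases `base64` and `data` as a final fallback.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical sketch; helper bodies are assumptions, not the PR's implementation.
public static class ToolMediaAliasSketch
{
    // Returns the first non-empty string property found under any of the given keys.
    public static string? TryGetStringByKeys(JsonElement root, params string[] keys)
    {
        if (root.ValueKind != JsonValueKind.Object)
            return null;

        foreach (var key in keys)
        {
            if (root.TryGetProperty(key, out var value) &&
                value.ValueKind == JsonValueKind.String)
            {
                var text = value.GetString();
                if (!string.IsNullOrWhiteSpace(text))
                    return text;
            }
        }

        return null;
    }

    // Assumed nested shape: { "<name>": { "base64": "..." } }.
    public static string? TryGetNestedMediaBase64(JsonElement root, string name)
    {
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty(name, out var nested))
        {
            return TryGetStringByKeys(nested, "base64", "data");
        }

        return null;
    }

    public static string? ExtractImageBase64(JsonElement root) =>
        TryGetStringByKeys(root, "image_base64", "imageBase64") ??
        TryGetNestedMediaBase64(root, "image") ??
        // Legacy root aliases, checked last so the new keys take precedence.
        TryGetStringByKeys(root, "base64", "data");

    // Convenience overload for raw JSON text.
    public static string? ExtractImageBase64(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return ExtractImageBase64(doc.RootElement);
    }
}
```

Checking the legacy aliases last means a payload that carries both `image_base64` and `base64` resolves to the new key, while old `{ "base64": "..." }` tool outputs still yield an image payload instead of falling through to plain text.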

Useful? React with 👍 / 👎.

Comment on lines +116 to +117
if (!TryParseContentPartKind(part.Type, out var kind))
continue;

P2: Reject chat requests when all inputParts are unsupported

Unsupported inputParts are silently skipped here, and the normalizer still returns success even if every part is dropped; combined with the new prompt-or-inputParts gate in ChatEndpoints/WebSocket parser (which only checks raw part count), requests like inputParts:[{"type":"foo"}] now get accepted and execute with an empty derived prompt. This should be treated as INVALID_PROMPT (or invalid part type) instead of dispatching a blank run.
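A minimal sketch of the suggested behavior follows, assuming a normalizer that returns a success flag and an error code. The types `RawInputPart` and `NormalizedPart`, the class `InputPartsNormalizerSketch`, and the `TryNormalize` signature are hypothetical; only `TryParseContentPartKind`, the skip-on-unknown loop, and the `INVALID_PROMPT` code are taken from the comment above. The point it illustrates is gating on the count of parts that survive normalization rather than on the raw part count.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; the PR's real ContentPart model may differ.
public enum ContentPartKind { Text, Image, Audio, Video }

public sealed record RawInputPart(string? Type, string? Text = null);

public sealed record NormalizedPart(ContentPartKind Kind, string? Text);

public static class InputPartsNormalizerSketch
{
    // Matches kind names explicitly; anything else is reported as unsupported.
    public static bool TryParseContentPartKind(string? type, out ContentPartKind kind)
    {
        switch (type?.Trim().ToLowerInvariant())
        {
            case "text": kind = ContentPartKind.Text; return true;
            case "image": kind = ContentPartKind.Image; return true;
            case "audio": kind = ContentPartKind.Audio; return true;
            case "video": kind = ContentPartKind.Video; return true;
            default: kind = default; return false;
        }
    }

    // Returns false with an error code when neither a prompt nor any supported part remains.
    public static bool TryNormalize(
        string? prompt,
        IReadOnlyList<RawInputPart>? inputParts,
        out List<NormalizedPart> parts,
        out string? errorCode)
    {
        parts = new List<NormalizedPart>();
        errorCode = null;

        foreach (var part in inputParts ?? Array.Empty<RawInputPart>())
        {
            if (!TryParseContentPartKind(part.Type, out var kind))
                continue; // unsupported parts are still skipped individually

            parts.Add(new NormalizedPart(kind, part.Text));
        }

        // Gate on the normalized count, not the raw count, so an all-unsupported
        // request without a prompt is rejected instead of dispatching a blank run.
        if (string.IsNullOrWhiteSpace(prompt) && parts.Count == 0)
        {
            errorCode = "INVALID_PROMPT";
            return false;
        }

        return true;
    }
}
```

Under this sketch a request with `inputParts:[{"type":"foo"}]` and no prompt fails with `INVALID_PROMPT`, while a request that mixes one supported part with unsupported ones still succeeds with the supported part only.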

Useful? React with 👍 / 👎.

@eanzhao eanzhao changed the base branch from docs/2026-03-14_gagent-service-phase-1-design to dev March 21, 2026 06:05
eanzhao added 5 commits March 21, 2026 14:18
- Added a new CI job for testing and building the console-web application.
- Updated the `console_web` output in the CI workflow to include relevant paths.
- Introduced a new environment variable `AEVATAR_CONSOLE_PUBLIC_PATH` for configuring deployment paths.
- Refactored the public path resolution logic in the console-web configuration.
- Removed deprecated enriched graph API and related decoding logic from the console API.
- Updated authentication configuration to disable NyxID login when required environment variables are missing.
- Added type-checking step for console-web in the CI workflow.
- Removed redundant pnpm setup step to streamline the workflow.
- Updated architecture scorecard documentation to reflect successful compliance with all architecture guards.
- Fixed naming issues and improved clarity in documentation regarding project structure and CI pipeline integration.
…multimodal-llm-support

# Conflicts:
#	apps/aevatar-console-web/src/app.tsx
#	apps/aevatar-console-web/src/pages/actors/index.tsx
#	apps/aevatar-console-web/src/pages/observability/index.tsx
#	apps/aevatar-console-web/src/pages/overview/index.tsx
#	apps/aevatar-console-web/src/pages/playground/index.test.tsx
#	apps/aevatar-console-web/src/pages/playground/index.tsx
#	apps/aevatar-console-web/src/pages/primitives/index.tsx
#	apps/aevatar-console-web/src/pages/runs/index.tsx
#	apps/aevatar-console-web/src/pages/settings/index.tsx
#	apps/aevatar-console-web/src/pages/studio/components/StudioShell.test.tsx
#	apps/aevatar-console-web/src/pages/studio/index.test.tsx
#	apps/aevatar-console-web/src/pages/workflows/index.tsx
#	apps/aevatar-console-web/src/pages/yaml/index.test.tsx
#	apps/aevatar-console-web/src/pages/yaml/index.tsx
#	apps/aevatar-console-web/src/shared/api/consoleApi.ts
#	apps/aevatar-console-web/src/shared/api/decoders.ts
- Created WORKFLOW.md to define project workflow, including issue tracking, execution flow, and verification expectations.
- Added start-local.sh script to set up the local development environment, ensuring required commands are available and configuring necessary environment variables.