Now it's time to introduce image generation to the reference application using DALL-E. DALL-E is an artificial intelligence (AI) model that generates images from textual descriptions. It can create images of objects, scenes, and even abstract concepts based on the descriptive text provided to it. This allows for a wide range of creative uses, from illustrating ideas to creating new visual concepts that might not exist in the real world.
In this challenge, you will deploy an Azure AI Foundry service capable of hosting DALL-E models and integrate it with Semantic Kernel. You will also create a plugin to generate images using DALL-E from a text prompt.
- Create an Azure AI Foundry deployment for DALL-E in a region capable of hosting DALL-E models.
Now that you've deployed the DALL-E model, update the .env file you created in Challenge-02:

    # Add this to your existing .env file for the DALL-E model
    AZURE_OPENAI_TEXT_TO_IMAGE_DEPLOYMENT_NAME="your-dalle-deployment-name"

Note: According to the Semantic Kernel documentation, when using Azure AI Foundry's DALL-E model, you can use the same API key and endpoint you've already configured, but you'll need to specify the deployment name for the text-to-image model.
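A missing or placeholder deployment name tends to surface only when the first image request fails, so it can help to check the setting at startup. The following is a minimal sketch, assuming the .env values have already been loaded into the process environment; the helper name `get_text_to_image_deployment` is hypothetical and not part of the reference application.

```python
import os
from typing import Mapping, Optional

# Variable name taken from the .env entry in this challenge.
DEPLOYMENT_VAR = "AZURE_OPENAI_TEXT_TO_IMAGE_DEPLOYMENT_NAME"
# Placeholder value shown in the challenge text; it must be replaced.
PLACEHOLDER = "your-dalle-deployment-name"


def get_text_to_image_deployment(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the DALL-E deployment name, failing early if it is unset.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    # Strip whitespace and stray quotes that may survive a hand-edited file.
    name = env.get(DEPLOYMENT_VAR, "").strip().strip('"')
    if not name or name == PLACEHOLDER:
        raise RuntimeError(f"Set {DEPLOYMENT_VAR} in your .env file")
    return name
```

Calling this once during application startup gives a clear error message before any request reaches the model.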
- Update the reference application by adding the DALL-E model to Semantic Kernel.
NOTE: We are using Azure OpenAI, so the service name is AzureTextToImage and the base class is OpenAITextToImageBase. Any example that uses OpenAITextToImage also works with AzureTextToImage.
The Semantic Kernel Documentation In-Depth Samples provide examples of using text-to-image models like DALL-E. Be sure to modify the sample to use an Azure AI Foundry model instead of an OpenAI model.
- Create a Semantic Kernel plugin to generate an image using DALL-E from a text prompt. The plugin should accept a text prompt and return the URL string for the image generated by DALL-E.
NOTE: We are using Azure OpenAI, so the service name is AzureTextToImage and the base class is OpenAITextToImageBase.
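The shape of such a plugin can be sketched without any Azure dependency. The example below is an illustration under stated assumptions, not the reference solution: `TextToImageService` and `FakeImageService` are hypothetical stand-ins for the real AzureTextToImage service, and the `generate_image(description, width, height)` signature is assumed for the sketch. The method name `generate_image_from_prompt` and the 1024x1024 size are taken from the diagrams later in this challenge.

```python
import asyncio
from typing import Protocol


class TextToImageService(Protocol):
    """Stand-in for the text-to-image service; the method shape is an assumption."""

    async def generate_image(self, description: str, width: int, height: int) -> str: ...


class ImagePlugin:
    """Accepts a text prompt and returns the URL of the generated image."""

    def __init__(self, service: TextToImageService) -> None:
        if service is None:
            raise ValueError("Missing text-to-image service")
        self._service = service

    # In the real plugin this method would also carry Semantic Kernel's
    # kernel_function decorator so the chat model can discover and call it.
    async def generate_image_from_prompt(self, prompt: str) -> str:
        if not prompt.strip():
            raise ValueError("Prompt must not be empty")
        return await self._service.generate_image(prompt, 1024, 1024)


class FakeImageService:
    """Test double that returns a placeholder URL instead of calling DALL-E."""

    async def generate_image(self, description: str, width: int, height: int) -> str:
        return f"https://example.invalid/{width}x{height}.png"


if __name__ == "__main__":
    plugin = ImagePlugin(FakeImageService())
    print(asyncio.run(plugin.generate_image_from_prompt("a cute kitten wearing a hat")))
```

Injecting the service through the constructor keeps the plugin testable offline; in the application, the injected object would be the service retrieved from the kernel.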
- A simple prompt to test the plugin

    create a picture of a cute kitten wearing a hat

- Working with chat history to generate images
❗ Refresh the browser to clear chat history before entering the next prompt.
NOTE: Feel free to change the details of the story to make it your own.

    Generate a detailed children's story about a dragon and a little girl that go on an adventure together

❌ Without clearing the chat history, create an image from a scene in the story.

    randomly choose a major scene from the story and create a cartoon style image

💡 Set a breakpoint in the image plugin to view the generated prompt sent to the DALL-E model. Notice how the LLM summarized a scene from the story to generate a prompt for the text-to-image model.
❗ Refresh the browser to clear chat history before entering the next set of prompts.
- Write a prompt to call multiple plugins.
Create a prompt that calls the image plugin and at least one other plugin written in the previous challenges. Try to use as many plugins as you can in a single prompt.
- Finally, let's do some product design.
NOTE: Feel free to change the details of the product.
In this final task, have the AI generate a product name, description, and an image for a handheld teleporting device using a single prompt. This will require the AI to construct a multi-step plan that will:
1. Generate a product name
2. Generate a product description
3. Create a prompt from the name and description suitable for a text-to-image AI model
4. Call the image plugin with the generated prompt
5. Generate a prompt that will create a logo for the product
6. Call the image plugin again with the logo prompt
💡 Set a breakpoint in the image plugin to view the generated prompt sent to the DALL-E model. Notice how the LLM summarized the product name and description to generate a prompt for the text-to-image model.
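In the challenge, the model works out this plan on its own from a single prompt. To make the expected order of operations concrete, the six steps can be written out by hand as a small pipeline. This is only a sketch: `design_product`, `generate_text`, and `generate_image` are hypothetical names, and the two callables stand in for the chat model and the image plugin.

```python
import asyncio
from typing import Awaitable, Callable, Dict

TextFn = Callable[[str], Awaitable[str]]   # prompt -> generated text
ImageFn = Callable[[str], Awaitable[str]]  # prompt -> image URL


async def design_product(idea: str, generate_text: TextFn, generate_image: ImageFn) -> Dict[str, str]:
    """Run the six-step plan explicitly; each step feeds the next."""
    # Step 1: product name
    name = await generate_text(f"Suggest a product name for {idea}")
    # Step 2: product description
    description = await generate_text(f"Describe the product '{name}', which is {idea}")
    # Step 3: turn name and description into a text-to-image prompt
    image_prompt = await generate_text(
        f"Write a text-to-image prompt for '{name}': {description}"
    )
    # Step 4: first image plugin call
    image_url = await generate_image(image_prompt)
    # Step 5: prompt for the logo
    logo_prompt = await generate_text(f"Write a text-to-image prompt for a logo for '{name}'")
    # Step 6: second image plugin call
    logo_url = await generate_image(logo_prompt)
    return {
        "name": name,
        "description": description,
        "image_url": image_url,
        "logo_url": logo_url,
    }
```

When you set the breakpoint in the image plugin, you should see it hit twice for a run like this: once for the product image and once for the logo.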
Here's a simplified view of how image generation works in your Semantic Kernel application:
flowchart LR
A[User Request] --> B[Semantic Kernel]
B --> C{Auto Function<br/>Selection}
C --> D[Image Plugin]
D --> E[DALL-E Service]
E --> F[Image URL]
F --> G[Response to User]
classDef userNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000
classDef kernelNode fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
classDef pluginNode fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
classDef serviceNode fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#000
class A,G userNode
class B,C kernelNode
class D,F pluginNode
class E serviceNode
For a clearer understanding of the step-by-step process:
sequenceDiagram
participant U as User
participant SK as Semantic Kernel
participant CS as Chat Service
participant IP as Image Plugin
participant DS as DALL-E Service
U->>SK: "Create a picture of a kitten"
SK->>CS: Process with Auto Function Choice
CS->>SK: Determine image generation needed
SK->>IP: generate_image_from_prompt()
IP->>IP: Validate service exists
IP->>DS: Generate image (1024x1024)
DS-->>IP: Return image URL
IP-->>SK: Image URL string
SK->>CS: Add to chat history
CS-->>U: Text response + Image URL
Your implementation showcases the modular plugin system:
graph TB
subgraph SK[Semantic Kernel]
CS[Chat Service]
FCB[Auto Function Choice]
end
subgraph Plugins[Available Plugins]
TP[Time]
GP[Geo]
WP[Weather]
WI[Work Items]
SP[Search]
IP[Image]
end
subgraph Azure[Azure Services]
ACM[Chat Model]
DM[DALL-E]
AS[AI Search]
TE[Text Embedding]
end
CS --> FCB
FCB --> Plugins
IP --> DM
SP --> AS
SP --> TE
CS --> ACM
classDef kernelStyle fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
classDef pluginStyle fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px,color:#000
classDef azureStyle fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#000
class CS,FCB kernelStyle
class TP,GP,WP,WI,SP,IP pluginStyle
class ACM,DM,AS,TE azureStyle
Your implementation demonstrates several important patterns:

    # Your ImagePlugin constructor ensures the service is available
    if not kernel.get_service(type=AzureTextToImage):
        raise Exception("Missing text-to-image service")
    self.dalle3 = kernel.get_service(type=AzureTextToImage)

The FunctionChoiceBehavior.Auto() setting in your chat completion allows the AI to automatically:
- Analyze user requests
- Determine when image generation is needed
- Call the appropriate plugin function
- Orchestrate multi-step workflows
Your current setup provides:
- Modularity: Each plugin (Time, Geo, Weather, Image, etc.) operates independently
- Composability: Multiple plugins can be called in a single conversation turn
- Extensibility: New plugins can be added without modifying existing code
- Service Abstraction: Plugins interact with Azure services through Semantic Kernel's service layer
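These properties can be illustrated with a toy registry. The sketch below is not how Semantic Kernel is implemented; `PluginRegistry`, its methods, and the `plugin-function` naming are hypothetical and exist only to show independent plugins being registered and several functions being invoked in one turn.

```python
from typing import Callable, Dict, List, Tuple


class PluginRegistry:
    """Toy registry mapping 'plugin-function' names to plain callables."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., str]] = {}

    def add_plugin(self, plugin_name: str, functions: Dict[str, Callable[..., str]]) -> None:
        # Extensibility: adding a plugin never touches existing entries.
        for fn_name, fn in functions.items():
            self._functions[f"{plugin_name}-{fn_name}"] = fn

    def invoke(self, qualified_name: str, /, **kwargs: str) -> str:
        # Modularity: each function is looked up and run on its own.
        if qualified_name not in self._functions:
            raise KeyError(f"Unknown function: {qualified_name}")
        return self._functions[qualified_name](**kwargs)

    def run_turn(self, calls: List[Tuple[str, Dict[str, str]]]) -> List[str]:
        # Composability: several function calls handled in one conversation turn.
        return [self.invoke(name, **args) for name, args in calls]


registry = PluginRegistry()
registry.add_plugin("time", {"now": lambda: "12:00"})
registry.add_plugin("image", {"generate_image_from_prompt": lambda prompt: "url-for:" + prompt})
results = registry.run_turn([
    ("time-now", {}),
    ("image-generate_image_from_prompt", {"prompt": "kitten"}),
])
```

In the real application, the chat model decides which functions to request and the kernel performs the lookup and invocation; the list passed to `run_turn` plays the role of the model's requested calls.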
sequenceDiagram
participant User
participant SK as Semantic Kernel
participant Chat as Chat Service
participant IP as Image Plugin
participant DALLE as DALL-E Service
User->>SK: "Create a picture of a cute kitten"
SK->>Chat: Process with Function Choice Behavior
Chat->>IP: generate_image_from_prompt()
IP->>DALLE: Generate image (1024x1024)
DALLE-->>IP: Return image URL
IP-->>Chat: Image URL
Chat-->>SK: Complete response
SK-->>User: Text + Image URL
When you request a product concept with both description and images, your implementation:
- Text Generation Phase: Uses the chat completion service to generate product name and description
- Prompt Optimization: The AI automatically creates DALL-E-optimized prompts from the product details
- Image Generation: Calls your Image Plugin multiple times (product image + logo)
- Response Integration: Combines all outputs into a cohesive response
Your chat_history.add_message(result) approach ensures:
- Context is maintained across turns
- Previous images can be referenced
- Follow-up image requests can build on prior conversation
This architecture showcases how Semantic Kernel's plugin system enables sophisticated AI workflows where language understanding, planning, and multi-modal generation work together seamlessly.
- Verify that your Image plugin can generate images from simple text prompts.
- Verify that your Image plugin can work with chat history to generate relevant images.
- Verify that your Image plugin can be called from a prompt that also calls other plugins.