Retrieval-Augmented Generation (RAG) is an AI architecture pattern that combines retrieval mechanisms with generative AI models to produce more accurate, relevant, and contextually appropriate responses. In a RAG system, when a query is received, the system first retrieves relevant information from a knowledge base or external data source. This retrieved information is then used to augment or condition the input to a generative AI model, enhancing its ability to provide accurate and comprehensive responses.
The RAG pattern is particularly valuable because it addresses two key limitations of large language models (LLMs): their potential to produce factually incorrect information (hallucinations) and their reliance on the data they were initially trained on, which may become outdated. By retrieving and incorporating up-to-date, relevant information from external sources, RAG systems can provide more current and accurate responses.
In RAG systems, the retrieval component typically involves searching a database, knowledge base, or other structured or unstructured data source to find information relevant to a given query. This retrieval can be based on traditional search techniques, keyword matching, or more sophisticated methods like semantic or vector search, which aim to capture the meaning behind the query rather than just matching specific terms.
The retrieved information is then fed into the generative model as conditioning context along with the original query. This additional context helps the model to generate a response that is not only conversationally appropriate but also grounded in the retrieved information. The generative model might directly incorporate facts from the retrieved documents, paraphrase them, or use them as a reference to ensure the coherence and factual accuracy of its output.
For instance, if a user asks about a company's employee safety policy, a RAG system might first retrieve specific company safety documents that mention safety protocols. The generative model could then use this information to provide an accurate response that reflects the company's specific policies, rather than providing generic information about safety practices or making unsupported assertions about the company's policies.
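The augmentation step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular framework builds its prompts: the function name `build_augmented_prompt`, the prompt wording, and the two sample passages are all hypothetical and stand in for whatever a real retriever would return.

```python
def build_augmented_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages with the user's question into one prompt."""
    # Number each passage so individual sources can be told apart in the prompt.
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical passages standing in for documents returned by the retrieval step.
retrieved = [
    "Employees must wear protective equipment in the warehouse.",
    "Report workplace injuries to a supervisor within 24 hours.",
]
prompt = build_augmented_prompt("What is the employee safety policy?", retrieved)
print(prompt)
```

The generative model receives `prompt` instead of the bare question, which is what lets it ground its answer in the retrieved text.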
Your prompts (inputs), completions (outputs), embeddings, and training data are:
- NOT available to other customers.
- NOT available to OpenAI.
- NOT used to improve OpenAI models.
- NOT used to improve any Microsoft or 3rd party products or services.
- NOT used for automatically improving Azure AI Foundry models for your use in your resource (The models are stateless, unless you explicitly fine-tune models with your training data).
Your fine-tuned Azure AI Foundry models are available exclusively for your use. The Azure AI Foundry Service is fully controlled by Microsoft; Microsoft hosts the OpenAI models in Microsoft's Azure environment and the Service does NOT interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).
For more information, see the Microsoft documentation on data, privacy, and security for the Azure AI Foundry Service.
Embedding models are a way to represent complex data, like words or images, as numbers so that computers can understand and work with them more easily.
Representation: Imagine you have words like "cat," "dog," and "apple." In an embedding model, each word is represented by a list of numbers (like coordinates in a space). For example, "cat" might be represented as [0.2, 0.5, 0.8] and "dog" as [0.3, 0.4, 0.9]. These numbers capture the meaning of the word in a way that the computer can process.
Similarity: Words with similar meanings will have similar numbers (or be close to each other in this space). For example, "cat" and "dog" might be close together, while "apple" would be farther away.
Training: To create these embeddings, the model is trained on lots of text. It learns patterns and relationships between words based on how they are used together. Once trained, it can represent any word as a list of numbers.
Usage: These embeddings are used in tasks like translating languages, finding similar items (like in a recommendation system), or even understanding sentences.
In essence, embedding models take something complex (like words or images) and turn them into a simple, consistent format (numbers) that machines can work with.
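To make the idea of "closeness" concrete, here is a small sketch that compares the example vectors using cosine similarity, one common way to measure how similar two embeddings are. The vectors for "cat" and "dog" are the illustrative values from the text; the vector for "apple" is made up for this example. Real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


embeddings = {
    "cat": [0.2, 0.5, 0.8],    # illustrative value from the text
    "dog": [0.3, 0.4, 0.9],    # illustrative value from the text
    "apple": [0.9, 0.1, 0.2],  # hypothetical value, chosen for this example
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_apple = cosine_similarity(embeddings["cat"], embeddings["apple"])
print(f"cat vs dog:   {cat_dog:.2f}")
print(f"cat vs apple: {cat_apple:.2f}")
```

With these numbers, "cat" and "dog" score much closer to 1.0 than "cat" and "apple", which matches the intuition that the two animals are more alike than an animal and a fruit.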
Document chunking is a technique used in AI, especially in Retrieval-Augmented Generation (RAG) models, to break down large documents into smaller, manageable pieces or "chunks." This makes it easier for the AI to process the relevant information. Imagine you have a long book and you want to find specific information quickly. Instead of reading the entire book, you divide it into chapters or sections. Each chunk can then be indexed and searched individually, making the retrieval process faster and more efficient.
In the context of RAG, these chunks are used to enhance the AI's ability to generate accurate and contextually relevant responses. When a query is made, the retrieval mechanism searches through these smaller chunks to find the most relevant information, which the model then uses to generate a coherent and informative answer. This method improves the performance and accuracy of AI models by ensuring they have access to the most pertinent data without being overwhelmed by the volume of information.
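As a rough sketch of what chunking can look like, the function below splits text into fixed-size groups of words. The function name `chunk_text`, the word-based sizing, and the `overlap` parameter are assumptions made for this example; overlap is a common option that repeats a few words between neighbouring chunks so that a sentence cut at a boundary still appears whole in one of them. The import wizard used later in this challenge applies its own chunking strategy, which may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A tiny made-up document: ten placeholder words.
document = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(document, chunk_size=4, overlap=1):
    print(chunk)
```

Smaller chunks give more precise matches but less surrounding context per match; larger chunks do the opposite. That trade-off is worth keeping in mind when experimenting with chunk sizes.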
You should have completed at least Challenge 03, have a functional version of the solution running, and have a good understanding of plugins.
In this challenge, you will create a Semantic Search Plugin that utilizes an Azure AI Search Index to retrieve information from the Contoso Handbook PDF. The purpose of the plugin is to enable the AI Model to answer questions about your own documents. We achieve this by converting the user's query into an embedding using a Text Embedding model. The embedding is then used to search the AI Search Index for the most relevant information.
- Deploy Azure AI Search
- Deploy Storage Account with CORS enabled
- Use AI Foundry to deploy a Text Embedding model
- Import documents
- Create a Semantic Search Plugin to query the AI Search Index
- In the Azure Portal, search for `AI Search` and select Create.
  - Create it in the same resource group and location as your AI Models.
  - Change the pricing tier to Basic.
  - Leave everything else as default, then click Review + create.
- Once the AI Search resource is created, navigate to the resource.
  - Grab the URL from the Overview section.
  - Grab the Key from the Keys section.
- In the Azure Portal, search for `Storage Account` and select Create.
  - Create it in the same resource group and location as your AI Models.
  - Leave everything as default, then click Review + create.
- Once the Storage Account is created, navigate to the resource. Reference this screenshot for the CORS settings below.
  - Under the Settings section, click on Resource sharing (CORS).
  - Add 2 rows with the following values:
    - Row 1:
      - Allowed origins: `https://documentintelligence.ai.azure.com`
      - Allowed methods: Select All
      - Allowed headers: `*`
      - Exposed headers: `*`
      - Max age: `120`
    - Row 2:
      - Allowed origins: `https://ai.azure.com`
      - Allowed methods: GET, POST, OPTIONS, and PUT
      - Allowed headers: `*`
      - Exposed headers: `*`
      - Max age: `120`

      💡 If you are using AI Studio instead of Azure AI Foundry Studio, you will need to change the origin to `https://ai.azure.com`.
  - Click Save.
Using Azure AI Foundry, deploy a Standard text-embedding-ada-002 model in the same deployment as your previous GPT-4o model.
Now that you have deployed all the necessary resources, you'll need to update your .env file with the appropriate configuration settings.
- Review the `.env_template` file in your `src` directory to identify the required environment variables for Azure AI Search and Text Embeddings.
- Add the necessary environment variables to your existing `.env` file. The variables should include settings for:
  - Azure AI Search configuration
  - Text Embedding model configuration

  Ensure all required settings are properly configured before proceeding to the next step.
Hint: The Semantic Kernel will automatically detect environment variables with the correct naming conventions. Check the template file for the exact variable names needed.
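If you want a quick sanity check before moving on, a small script along these lines can report which settings are still missing. This is only a sketch: the variable names are the ones mentioned in the hints later in this guide, and the `.env_template` file remains the authoritative list. The helper name `missing_settings` and the placeholder values are hypothetical.

```python
import os

# Names taken from the hints later in this guide; treat .env_template as authoritative.
REQUIRED_SETTINGS = [
    "AZURE_AI_SEARCH_ENDPOINT",
    "AZURE_AI_SEARCH_KEY",
    "EMBEDDINGS_DEPLOYMODEL",
    "AOI_ENDPOINT",
    "AOI_API_KEY",
]


def missing_settings(env=None) -> list[str]:
    """Return the required setting names that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Placeholder values for illustration only; a real check would call missing_settings()
# with no argument after the .env file has been loaded into the process environment.
example_env = {
    "AZURE_AI_SEARCH_ENDPOINT": "https://example.search.windows.net",
    "AZURE_AI_SEARCH_KEY": "placeholder-key",
}
print(missing_settings(example_env))
# → ['EMBEDDINGS_DEPLOYMODEL', 'AOI_ENDPOINT', 'AOI_API_KEY']
```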
- In Azure AI Foundry, click on Playground -> Chat Playground.
- Ensure the correct model is selected under Deployment. Then click the drop-down Add your data -> Add new data source.
- Select Data Source = `Upload Files`.
- Choose the AI Search resource set up in the previous step.
- For the Index Name, use `employeehandbook`.

  💡 The AI Search Index Name will be needed by the reference application.
- Click Next.
- Check Add Vector Search.
- Select your Azure OpenAI connection.
- Select the text-embedding-ada-002 model.
- Select your text-embedding-ada-002 model deployment that was created previously.
- Navigate back to the reference application and open the `chat.py` file. Register the service for Azure AI Foundry Text Embedding Generation with the Kernel.

  💡 As an example, look at how you registered the AzureOpenAIChatCompletion service. Also note the 3 variables you added to the .env file: EMBEDDINGS_DEPLOYMODEL, AOI_ENDPOINT, AOI_API_KEY.
- We will be using the Azure AI Search Vector Store connector. The plugin has already been provided for you in your plugins folder.

  This is the Semantic Search Plugin to query the AI Search Index created earlier. The plugin should take the user's query and generate an embedding using the Text Embedding model. The embedding should then be used to query the AI Search Index containing the Contoso Handbook PDF and return the most relevant information.

  💡 Note the 2 environment variables you added to the .env file: AZURE_AI_SEARCH_ENDPOINT, AZURE_AI_SEARCH_KEY.
- The Sample RAG Plugin in the documentation maps the incoming data from AI Search to a class named `EmployeeHandbookModel`. The properties on this class map to fields in the AI Search Index we created earlier. In the portal, you can navigate to the AI Search Index and see the fields that are available.
- Add the plugin to Semantic Kernel in `chat.py`.

  💡 Add the AI Search plugin to the `load_plugins()` helper method. The plugin requires the kernel instance to access the embedding service, so ensure the Text Embedding service is registered in `initialize_kernel()` before `load_plugins()` is called.

  Below is the workflow handled by Semantic Kernel and your plugin:
sequenceDiagram
    participant C as Client
    participant S as Semantic Kernel
    participant A as AI
    box Contoso Search Plugin
        participant P as Plugin
        participant E as Embedding
        participant Search as Azure AI Search
    end
    C->>S: What are the steps for the Contoso Performance Reviews?
    activate C
    S->>+A: What are the steps for the Contoso Performance Reviews?
    A-->>-S: Call contoso_search function
    S->>+P: Query: Steps for the Contoso Performance Reviews?
    P->>+E: Convert query to Embedding
    E-->>-P: Embedding [19,324,12,.......]
    P->>+Search: Search Documents using Embedding
    Search-->>-P: Related Documents
    P-->>-S: Here are the related documents
    S->>+A: Results of contoso_search
    A-->>-S: The steps for the Contoso Performance Reviews are ...
    S->>C: Here are the steps for the Contoso Performance Reviews
    deactivate C

- Test the Plugin

  Set a breakpoint in your plugin to verify that the Contoso search function is being called correctly. Review the incoming query, the generated embedding, and the search results returned from the AI Search Index.

  Test the plugin by running the applications and asking the Chatbot questions about the Contoso Handbook. The Chatbot should be able to answer questions similar to the following:
  - What are the steps for the Contoso Performance Reviews?
  - What is Contoso's policy on Data Security?
  - Who do I contact at Contoso for questions regarding workplace safety?
The following diagram illustrates how the RAG pattern works with Azure AI Search to enhance AI responses with custom knowledge:
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#242424', 'primaryTextColor': '#fff', 'primaryBorderColor': '#555', 'lineColor': '#555', 'secondaryColor': '#444', 'tertiaryColor': '#333'}}}%%
flowchart LR
classDef indexClass fill:#4B6BFD,stroke:#333,stroke-width:1px,color:white
classDef queryClass fill:#28A745,stroke:#333,stroke-width:1px,color:white
classDef userClass fill:#6F42C1,stroke:#333,stroke-width:1px,color:white
classDef resultClass fill:#E83E8C,stroke:#333,stroke-width:1px,color:white
subgraph Indexing["Document Processing (Before Query Time)"]
direction LR
I1[Document intake]:::indexClass --> I2[Split into chunks]:::indexClass
I2 --> I3[Generate embeddings]:::indexClass
I3 --> I4[Store in vector DB]:::indexClass
end
subgraph Querying["Query Processing (At Runtime)"]
direction LR
Q1[User query]:::userClass --> Q2[Convert to embedding]:::queryClass
Q2 --> Q3[Search vector DB]:::queryClass
Q3 --> Q4[Retrieve relevant chunks]:::queryClass
Q4 --> Q5[Add chunks to prompt]:::queryClass
Q5 --> Q6[Send to AI model]:::queryClass
Q6 --> Q7[Generate enhanced response]:::resultClass
Q7 --> Q8[Display to user]:::userClass
end
I4 -.->|Vector similarity search| Q3
This diagram shows the complete RAG workflow:
- Document Processing (Done before query time)
  - Documents are ingested
  - Split into smaller chunks
  - Each chunk is converted into a vector embedding
  - These embeddings are stored in the Azure AI Search vector database
- Query Processing (At runtime)
  - User asks a question
  - The question is converted to a vector embedding
  - This embedding is used to search the vector database for similar content
  - The most relevant document chunks are retrieved
  - The retrieved chunks are combined with the original query
  - This enhanced prompt is sent to the AI model
  - The model generates a response that incorporates specific knowledge from the documents
  - User receives an accurate answer grounded in your organization's data
The RAG pattern helps keep responses factual, up-to-date, and relevant to your specific organization. It reduces the risk of "hallucinations" and gives the model access to information that was not in its original training data.
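The two phases can be imitated end to end with a toy, in-memory index. This sketch is purely illustrative and makes several loudly stated assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model such as text-embedding-ada-002, `InMemoryIndex` is a hypothetical class standing in for Azure AI Search, and the three sample chunks are invented. It does not use the Semantic Kernel or Azure SDKs.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: counts of lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)  # Counter yields 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for a vector index."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Indexing time: embed the chunk once and store it with its text.
        self.entries.append((chunk, toy_embed(chunk)))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Query time: embed the query, rank stored chunks by similarity.
        query_vector = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vector, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


index = InMemoryIndex()
for chunk in [
    "performance reviews happen every quarter",
    "data security requires strong passwords",
    "report workplace safety issues to your manager",
]:
    index.add(chunk)

results = index.search("workplace safety contact", top_k=1)
print(results)
```

The `top_k` parameter controls how many chunks are handed to the model. Returning more than one chunk gives the model more context to draw on, at the cost of a longer prompt.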
- Verify that you deployed the text-embedding-ada-002 Text Embedding model in Azure AI Foundry
- Verify that you deployed an AI Search Index and imported the Contoso Handbook PDF
- Verify that the Chatbot is able to answer questions about the Contoso Handbook by querying the AI Search Index using the Semantic Search Plugin
- Semantic Kernel Blog
- ChatGPT + Enterprise data with Azure AI Foundry and Cognitive Search
- Build Industry-Specific LLMs Using Retrieval Augmented Generation
These are optional challenges for those who want to further explore the capabilities of Semantic Search and plugins.
- Delete the AI Search Index and re-upload the Employee Handbook PDF changing the chunk size. Experiment with different chunk sizes and see how it affects the search results.
- Update the Semantic Search Plugin to return the top 3 most relevant search results, instead of just the top result.






