From 3c51b88861b6cc3753c32451164fdaf2d812a6b0 Mon Sep 17 00:00:00 2001 From: bilgeyucel Date: Thu, 26 Mar 2026 17:18:14 +0100 Subject: [PATCH] Add alternative LLM models --- index.toml | 2 +- tutorials/27_First_RAG_Pipeline.ipynb | 1802 ++++++++++++++++--------- 2 files changed, 1192 insertions(+), 612 deletions(-) diff --git a/index.toml b/index.toml index 2b92622..b6559e4 100644 --- a/index.toml +++ b/index.toml @@ -12,7 +12,7 @@ notebook = "27_First_RAG_Pipeline.ipynb" aliases = [] completion_time = "10 min" created_at = 2023-12-05 -dependencies = ["datasets>=2.6.1", "sentence-transformers>=4.1.0"] +dependencies = ["datasets>=2.6.1", "sentence-transformers>=4.1.0", "mistral-haystack"] featured = true [[tutorial]] diff --git a/tutorials/27_First_RAG_Pipeline.ipynb b/tutorials/27_First_RAG_Pipeline.ipynb index f24153b..9e9937a 100644 --- a/tutorials/27_First_RAG_Pipeline.ipynb +++ b/tutorials/27_First_RAG_Pipeline.ipynb @@ -1,617 +1,1197 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "2OvkPji9O-qX" - }, - "source": [ - "# Tutorial: Creating Your First QA Pipeline with Retrieval-Augmentation\n", - "\n", - "- **Level**: Beginner\n", - "- **Time to complete**: 10 minutes\n", - "- **Components Used**: [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`SentenceTransformersTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), [`PromptBuilder`](https://docs.haystack.deepset.ai/docs/promptbuilder), [`OpenAIChatGenerator`](https://docs.haystack.deepset.ai/docs/openaichatgenerator)\n", - "- **Prerequisites**: You must have an [OpenAI API Key](https://platform.openai.com/api-keys).\n", - "- **Goal**: After completing this tutorial, you'll have learned the new prompt 
syntax and how to use PromptBuilder and OpenAIChatGenerator to build a generative question-answering pipeline with retrieval-augmentation." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "LFqHcXYPO-qZ" - }, - "source": [ - "## Overview\n", - "\n", - "This tutorial shows you how to create a generative question-answering pipeline using the retrieval-augmentation ([RAG](https://www.deepset.ai/blog/llms-retrieval-augmentation)) approach with Haystack. The process involves four main components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder) for creating an embedding for the user query, [InMemoryBM25Retriever](https://docs.haystack.deepset.ai/docs/inmemorybm25retriever) for fetching relevant documents, [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) for creating a template prompt, and [OpenAIChatGenerator](https://docs.haystack.deepset.ai/docs/openaichatgenerator) for generating responses.\n", - "\n", - "For this tutorial, you'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents, but you can replace them with any text you want.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Kww5B_vXO-qZ" - }, - "source": [ - "## Installing Haystack\n", - "\n", - "Install Haystack and other required packages with `pip`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "UQbU8GUfO-qZ", - "outputId": "c33579e9-5557-43bd-a3c5-63b8373770c7" - }, - "outputs": [], - "source": [ - "%%bash\n", - "\n", - "pip install haystack-ai\n", - "pip install \"datasets>=2.6.1\"\n", - "pip install \"sentence-transformers>=4.1.0\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_lvfew16O-qa" - }, - "source": [ - "## Fetching and Indexing Documents\n", - "\n", - "You'll start creating your question 
answering system by downloading the data and indexing the data with its embeddings to a DocumentStore. \n", - "\n", - "In this tutorial, you will take a simple approach to writing documents and their embeddings into the DocumentStore. For a full indexing pipeline with preprocessing, cleaning and splitting, check out our tutorial on [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline).\n", - "\n", - "\n", - "### Initializing the DocumentStore\n", - "\n", - "Initialize a DocumentStore to index your documents. A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. In this tutorial, you'll be using the `InMemoryDocumentStore`." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "id": "CbVN-s5LO-qa" - }, - "outputs": [], - "source": [ - "from haystack.document_stores.in_memory import InMemoryDocumentStore\n", - "\n", - "document_store = InMemoryDocumentStore()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "yL8nuJdWO-qa" - }, - "source": [ - "> `InMemoryDocumentStore` is the simplest DocumentStore to get started with. It requires no external dependencies and it's a good option for smaller projects and debugging. But it doesn't scale up so well to larger Document collections, so it's not a good choice for production systems. To learn more about the different types of external databases that Haystack supports, see [DocumentStore Integrations](https://haystack.deepset.ai/integrations?type=Document+Store)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XvLVaFHTO-qb" - }, - "source": [ - "The DocumentStore is now ready. Now it's time to fill it with some Documents." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "HryYZP9ZO-qb" - }, - "source": [ - "### Fetch the Data\n", - "\n", - "You'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents. We preprocessed the data and uploaded to a Hugging Face Space: [Seven Wonders](https://huggingface.co/datasets/bilgeyucel/seven-wonders). Thus, you don't need to perform any additional cleaning or splitting.\n", - "\n", - "Fetch the data and convert it into Haystack Documents:" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "INdC3WvLO-qb", - "outputId": "1af43d0f-2999-4de4-d152-b3cca9fb49e6" - }, - "outputs": [], - "source": [ - "from datasets import load_dataset\n", - "from haystack import Document\n", - "\n", - "dataset = load_dataset(\"bilgeyucel/seven-wonders\", split=\"train\")\n", - "docs = [Document(content=doc[\"content\"], meta=doc[\"meta\"]) for doc in dataset]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "czMjWwnxPA-3" - }, - "source": [ - "### Initalize a Document Embedder\n", - "\n", - "To store your data in the DocumentStore with embeddings, initialize a [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder) with the model name and call `warm_up()` to download the embedding model.\n", - "\n", - "> If you'd like, you can use a different [Embedder](https://docs.haystack.deepset.ai/docs/embedders) for your documents." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "EUmAH9sEn3R7", - "outputId": "ee54b59b-4d4a-45eb-c1a9-0b7b248f1dd4" - }, - "outputs": [], - "source": [ - "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\n", - "\n", - "doc_embedder = SentenceTransformersDocumentEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")\n", - "doc_embedder.warm_up()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9y4iJE_SrS4K" - }, - "source": [ - "### Write Documents to the DocumentStore\n", - "\n", - "Run the `doc_embedder` with the Documents. The embedder will create embeddings for each document and save these embeddings in Document object's `embedding` field. Then, you can write the Documents to the DocumentStore with `write_documents()` method." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 66, - "referenced_widgets": [ - "7d482188c12d4a7886f20a65d3402c59", - "2a3ec74419ae4a02ac0210db66133415", - "ddeff9a822404adbbc3cad97a939bc0c", - "36d341ab3a044709b5af2e8ab97559bc", - "88fc33e1ab78405e911b5eafa512c935", - "91e5d4b0ede848319ef0d3b558d57d19", - "d2428c21707d43f2b6f07bfafbace8bb", - "7fdb2c859e454e72888709a835f7591e", - "6b8334e071a3438397ba6435aac69f58", - "5f5cfa425cac4d37b2ea29e53b4ed900", - "3c59a82dac5c476b9a3e3132094e1702" - ] - }, - "id": "ETpQKftLplqh", - "outputId": "b9c8658c-90c8-497c-e765-97487c0daf8e" - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Batches: 0%| | 0/5 [00:00 ⚠️ Notice that you used `sentence-transformers/all-MiniLM-L6-v2` model to create embeddings for your documents before. This is why you need to use the same model to embed the user queries." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "id": "LyJY2yW628dl" - }, - "outputs": [], - "source": [ - "from haystack.components.embedders import SentenceTransformersTextEmbedder\n", - "\n", - "text_embedder = SentenceTransformersTextEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0_cj-5m-O-qb" - }, - "source": [ - "### Initialize the Retriever\n", - "\n", - "Initialize a [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever) and make it use the InMemoryDocumentStore you initialized earlier in this tutorial. This Retriever will get the relevant documents to the query." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "id": "-uo-6fjiO-qb" - }, - "outputs": [], - "source": [ - "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n", - "\n", - "retriever = InMemoryEmbeddingRetriever(document_store)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6CEuQpB7O-qb" - }, - "source": [ - "### Define a Template Prompt\n", - "\n", - "Create a custom prompt for a generative question answering task using the RAG approach. The prompt should take in two parameters: `documents`, which are retrieved from a document store, and a `question` from the user. Use the Jinja2 looping syntax to combine the content of the retrieved documents in the prompt.\n", - "\n", - "Next, initialize a [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) instance with your prompt template. The PromptBuilder, when given the necessary values, will automatically fill in the variable values and generate a complete prompt. This approach allows for a more tailored and effective question-answering experience." 
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "id": "ObahTh45FqOT" - }, - "outputs": [], - "source": [ - "from haystack.components.builders import ChatPromptBuilder\n", - "from haystack.dataclasses import ChatMessage\n", - "\n", - "template = [\n", - " ChatMessage.from_user(\n", - " \"\"\"\n", - "Given the following information, answer the question.\n", - "\n", - "Context:\n", - "{% for document in documents %}\n", - " {{ document.content }}\n", - "{% endfor %}\n", - "\n", - "Question: {{question}}\n", - "Answer:\n", - "\"\"\"\n", - " )\n", - "]\n", - "\n", - "prompt_builder = ChatPromptBuilder(template=template)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "HR14lbfcFtXj" - }, - "source": [ - "### Initialize a ChatGenerator\n", - "\n", - "\n", - "ChatGenerators are the components that interact with large language models (LLMs). Now, set `OPENAI_API_KEY` environment variable and initialize a [OpenAIChatGenerator](https://docs.haystack.deepset.ai/docs/openaichatgenerator) that can communicate with OpenAI GPT models. As you initialize, provide a model name:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "SavE_FAqfApo", - "outputId": "1afbf2e8-ae63-41ff-c37f-5123b2103356" - }, - "outputs": [], - "source": [ - "import os\n", - "from getpass import getpass\n", - "from haystack.components.generators.chat import OpenAIChatGenerator\n", - "\n", - "if \"OPENAI_API_KEY\" not in os.environ:\n", - " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n", - "chat_generator = OpenAIChatGenerator(model=\"gpt-4o-mini\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nenbo2SvycHd" - }, - "source": [ - "> You can replace `OpenAIChatGenerator` in your pipeline with another `ChatGenerator`. Check out the full list of chat generators [here](https://docs.haystack.deepset.ai/docs/generators)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1bfHwOQwycHe" - }, - "source": [ - "### Build the Pipeline\n", - "\n", - "To build a pipeline, add all components to your pipeline and connect them. Create connections from `text_embedder`'s \"embedding\" output to \"query_embedding\" input of `retriever`, from `retriever` to `prompt_builder` and from `prompt_builder` to `llm`. Explicitly connect the output of `retriever` with \"documents\" input of the `prompt_builder` to make the connection obvious as `prompt_builder` has two inputs (\"documents\" and \"question\").\n", - "\n", - "For more information on pipelines and creating connections, refer to [Creating Pipelines](https://docs.haystack.deepset.ai/docs/creating-pipelines) documentation." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 1000 - }, - "id": "f6NFmpjEO-qb", - "outputId": "89fd1b48-5189-4401-9cf8-15f55c503676" - }, - "outputs": [], - "source": [ - "from haystack import Pipeline\n", - "\n", - "basic_rag_pipeline = Pipeline()\n", - "# Add components to your pipeline\n", - "basic_rag_pipeline.add_component(\"text_embedder\", text_embedder)\n", - "basic_rag_pipeline.add_component(\"retriever\", retriever)\n", - "basic_rag_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", - "basic_rag_pipeline.add_component(\"llm\", chat_generator)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "🚅 Components\n", - " - text_embedder: SentenceTransformersTextEmbedder\n", - " - retriever: InMemoryEmbeddingRetriever\n", - " - prompt_builder: ChatPromptBuilder\n", - " - llm: OpenAIChatGenerator\n", - "🛤️ Connections\n", - " - text_embedder.embedding -> retriever.query_embedding (List[float])\n", - " - retriever.documents -> prompt_builder.documents (List[Document])\n", - " - prompt_builder.prompt -> llm.messages 
(List[ChatMessage])" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial: Creating Your First QA Pipeline with Retrieval-Augmentation\n", + "\n", + "- **Level**: Beginner\n", + "- **Time to complete**: 10 minutes\n", + "- **Components Used**: [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`SentenceTransformersTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), [`ChatPromptBuilder`](https://docs.haystack.deepset.ai/docs/chatpromptbuilder), and a [`ChatGenerator`](https://docs.haystack.deepset.ai/docs/generators) such as [`OpenAIChatGenerator`](https://docs.haystack.deepset.ai/docs/openaichatgenerator), [`MistralChatGenerator`](https://docs.haystack.deepset.ai/docs/mistralchatgenerator), or [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator).\n", + "- **Prerequisites**: Access to a large language model, either an **API key** from a provider or a **locally or on-premises hosted** model (for example on Colab runtime).\n", + "- **Goal**: After completing this tutorial, you'll have learned the new prompt syntax and how to use ChatPromptBuilder with a ChatGenerator to build a generative question-answering pipeline with retrieval-augmentation." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LFqHcXYPO-qZ" + }, + "source": [ + "## Overview\n", + "\n", + "This tutorial shows you how to create a generative question-answering pipeline using the retrieval-augmentation ([RAG](https://www.deepset.ai/blog/llms-retrieval-augmentation)) approach with Haystack. 
The process involves four main components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder) for creating an embedding for the user query, [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever) for fetching relevant documents, [ChatPromptBuilder](https://docs.haystack.deepset.ai/docs/chatpromptbuilder) for creating a template prompt, and a [ChatGenerator](https://docs.haystack.deepset.ai/docs/generators) for generating the final answer.\n", + "\n", + "The LLM behind that generator can be **hosted in the cloud**, for example with [OpenAI](https://haystack.deepset.ai/integrations/openai), [Anthropic](https://haystack.deepset.ai/integrations/anthropic), [Google](https://haystack.deepset.ai/integrations/google-genai), [Mistral](https://haystack.deepset.ai/integrations/mistral), or other providers, usually by setting an API key in the environment; **run locally**, for example via [Ollama](https://haystack.deepset.ai/integrations/ollama) or [vLLM](https://haystack.deepset.ai/integrations/vllm); or run **on a Colab VM** by loading an open-weight model from Hugging Face. 
The *Initialize a ChatGenerator* section shows three concrete options (OpenAI, Mistral, and a local model).\n", + "\n", + "For this tutorial, you'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents, but you can replace them with any text you want.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kww5B_vXO-qZ" + }, + "source": [ + "## Installing Haystack\n", + "\n", + "Install Haystack and other required packages with `pip`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UQbU8GUfO-qZ", + "outputId": "7a4ef73b-8822-467e-9979-a75169b36729" + }, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "pip install haystack-ai mistral-haystack \"datasets>=2.6.1\" \"sentence-transformers>=4.1.0\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_lvfew16O-qa" + }, + "source": [ + "## Fetching and Indexing Documents\n", + "\n", + "You'll start creating your question answering system by downloading the data and indexing it, along with its embeddings, into a DocumentStore.\n", + "\n", + "In this tutorial, you will take a simple approach to writing documents and their embeddings into the DocumentStore. For a full indexing pipeline with preprocessing, cleaning and splitting, check out our tutorial on [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline).\n", + "\n", + "\n", + "### Initializing the DocumentStore\n", + "\n", + "Initialize a DocumentStore to index your documents. A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. In this tutorial, you'll be using the `InMemoryDocumentStore`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "CbVN-s5LO-qa" + }, + "outputs": [], + "source": [ + "from haystack.document_stores.in_memory import InMemoryDocumentStore\n", + "\n", + "document_store = InMemoryDocumentStore()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yL8nuJdWO-qa" + }, + "source": [ + "> `InMemoryDocumentStore` is the simplest DocumentStore to get started with. It requires no external dependencies and it's a good option for smaller projects and debugging. But it doesn't scale up so well to larger Document collections, so it's not a good choice for production systems. To learn more about the different types of external databases that Haystack supports, see [DocumentStore Integrations](https://haystack.deepset.ai/integrations?type=Document+Store)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XvLVaFHTO-qb" + }, + "source": [ + "The DocumentStore is now ready. Now it's time to fill it with some Documents." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HryYZP9ZO-qb" + }, + "source": [ + "### Fetch the Data\n", + "\n", + "You'll use the Wikipedia pages of [Seven Wonders of the Ancient World](https://en.wikipedia.org/wiki/Wonders_of_the_World) as Documents. We preprocessed the data and uploaded it to Hugging Face as the [Seven Wonders](https://huggingface.co/datasets/bilgeyucel/seven-wonders) dataset. 
Thus, you don't need to perform any additional cleaning or splitting.\n", + "\n", + "Fetch the data and convert it into Haystack Documents:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 256, + "referenced_widgets": [ + "b22af125522742d4a83e95890e582058", + "bab24f7e38c54ecca22a5551bc708a8e", + "879852bee39a471f86528f0f8afe58f7", + "31249898b08240df919cbd951bed2ccd", + "8652edf5598542b990aa91472e6d698f", + "2690be8f1f744602b9b393e6438fd559", + "1fa0537a43e04678880a1d063e4100be", + "d8dee92388ae4ddcb1639ae0625d7136", + "5b690d1088194714b83d401b7667bdd2", + "3137d503b8d34f56ab5c458018d0f575", + "5497fb84edbd4332adbff6de18a33015", + "c3fca5ba78d2467da72fb1abd0b01ec3", + "9f84829b414345a29d5acaf0c0d99bd5", + "7ef46645be63425f857db5349da4a3a9", + "45aa957278ce483e9ddf1aa3932e8789", + "8519f294cae34825afdd69bd03229db3", + "cb78d35eb8e741c1b30b33bb42d04621", + "6886a8c59b6c4ea693d511b874c12c51", + "be5d50edbf2449a59508feab6d3da98c", + "9a2686b5238d412a8df69f996429c9b2", + "6d4989d458d9420491ea50c2e8923597", + "a7adbec1f16f4e39b4aa27373993f532", + "041e6870096b4f8e95a10274524065ab", + "54169d1347cd41b7845616a98a3b38e8", + "db436133e86f425cad8c5cc28eea6c03", + "bb419307a3e547ea8455e89d70390c42", + "9aab6009b2544ce5994e4e26c9182e36", + "2a4c9085eb2348c6a4da76679b390d1f", + "d6c7c125d594416b92376b4417246af1", + "889f5a03fac949ebbfeb4e840dcb57d3", + "5f548252e5b845f9bb54dc7820a8835a", + "01707d970b6443c9bc0ea9602b0b1f5e", + "70d9add61b3e4c26a278b482a1380e26" + ] + }, + "id": "INdC3WvLO-qb", + "outputId": "d1515765-20ea-44db-c8a8-a68776fafdab" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n", + "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", + "To authenticate with the Hugging Face Hub, create a token in your settings tab 
(https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", + "You will be able to reuse this secret in all of your notebooks.\n", + "Please note that authentication is recommended but still optional to access public models or datasets.\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b22af125522742d4a83e95890e582058", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/46.0 [00:00 If you'd like, you can use a different [Embedder](https://docs.haystack.deepset.ai/docs/embedders) for your documents." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "EUmAH9sEn3R7" + }, + "outputs": [], + "source": [ + "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\n", + "\n", + "doc_embedder = SentenceTransformersDocumentEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9y4iJE_SrS4K" + }, + "source": [ + "### Write Documents to the DocumentStore\n", + "\n", + "Run the `doc_embedder` with the Documents. The embedder will create embeddings for each document and store them in that document's `embedding` field. Then, write the Documents to the DocumentStore with the `write_documents()` method." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 576, + "referenced_widgets": [ + "2cc35d93724b4e0f9608fe6999e96baa", + "769840529f8e48378f568109e008dda7", + "5e34616bda1840db92c9afdbcc17b090", + "74296c48c5204c26a186d5e6fe2eb389", + "6ab0bef578534eb3b326ce3f1c7952cb", + "821e521c20eb4952a9f553be1707fe4d", + "b627994ed6fb4e419d0d7b042a2d1845", + "430c4f94f9704dceb7657fdf54e4e979", + "b9f80e63f9f4477e815fb6741e0b7481", + "1bb5874fc5424756839b3119ee88c652", + "adf3861551a74ac2bfd622fe9a4dd8c4", + "f0ab87c57a644c8693e51d2f04a666f3", + "9acd8a5fa709457eb1627bdafed3dc90", + "a99f41226d8c48978417e050c447cf06", + "fa24a76d328e47eeac0c12fc4a50bcd7", + "a3cd8b42b2884452b82f56e81b9f23fa", + "f4a53626b22f458ab46e2bf3ce91cf1f", + "9f2586e4929e47ea8e3ce7788b9d6a09", + "0b3d4bc9fc8e4f7d96a04b9728d966c0", + "00be43d5e49d4d288757d8c3b4566262", + "851f2132407f4602bdb482e09aff7864", + "0a73b3d5dd7e4658ac146786039c005c", + "efe7818df31b4b4fb4ac23447133add1", + "d2b362efd0ad499193b03da90419ee41", + "31f44064c8b64cfcb8a305e111c8465a", + "77eb3fe2195b43d8aa622f23d009dfd6", + "3bc2a31d2fbb4f548c7a7fbbecb4de56", + "32bf3855ebea499c97baa448b4d721d5", + "3fa909ce952b471e9883a4c32586e111", + "23e319859ad64a389e141e6ec6175cb5", + "17d53e1b04ba4196b2d0d9ffd8a6fa29", + "92248f3c7de0496c8ca16a26f528ab5e", + "9fe390e03164438a855d4c041a3e3435", + "be6e75733aa44951a9a2f1e76b3d1627", + "9e58a463153b4aadac2b54ffac39c0b4", + "72f7dd0ba5ce45359b9ed35563b59aee", + "75986772e38e4dbea85249f34f52178e", + "894070da75a948e8af880e57898e5cb8", + "4c903df362c840b5a887a549e86ac62f", + "42166f3fba3b400c858d1e10feb4c751", + "65deef2a69ee47a698b3b6f3e4a6bce6", + "4fd6a3cecf47491bad25a58f563c4a15", + "11053c79eb414fc4976cef3966a7fac1", + "7f5cbe293ec2416f80c97c91fce7df1e", + "132784a9e3d74220abf9748a4d17ca89", + "39e35e2d422245ce833481b43416441e", + "b38ec117ee1741148e3d5e79167e74fe", + "d02863cda9c44772a165b7be9bd2fea9", + 
"41e2ffb4d7a645d5bb5ec794ee21cf86", + "a6c11c68ab754703b2c1c8f6816bb96a", + "25666b5ff5604b8ba366a529c5025230", + "0b4eb164565445b28828357715ef329b", + "4d865aefb4ab48448fcf458ad256ca20", + "b47bd72419e041ce8cdcc43d263a6686", + "f37aad71e98948e480718fa4e3f70217", + "7b2b5ad63a0b4775bbb652453c179d19", + "252525f805004deca52a9a94f9eae057", + "d8d01da923864706b8331af4f361b047", + "51770f17bef74ff59a1993abeb82de5f", + "bb2fbfbc2dc34f5d87c03e46f551ea98", + "836d16710f9e4edbbf6b1233af637928", + "35da2a2f76664857b9390b1d80c2ecd2", + "9d53c7d71c9e400399cf37b2f37c91af", + "2fae67ce866a4e8cb125f8395d96d896", + "4813b61b1f1c4adba5b9fe214459cac9", + "b47368b719da42aaad46070cf40a7a00", + "65d85c964cfb46dab8b9ea69c9506ada", + "5589e6aa47c541ce889beda46435a9c7", + "4694f52453794bb4b873de48ab7c6b3c", + "557161a5e01a40e390c86f344089fc60", + "a8185ea16ee54420a17c1cca14468b7c", + "86f2d32941e845088ceade417b76424a", + "bece87ceb6b04dacb8f01347a9d5bb7d", + "02ab1ee2008f4008b5f19346005ce6f7", + "73b9187ac82b4adb8117f90fb231434d", + "87cd40c733e44790bfa102607827596f", + "62431513a3b643839dcd0076b2ed1a9f", + "98895c06137b4d80b5451f56f06324a4", + "bedd40889bf6433f932729d5db0539f0", + "2812a84fd4e14914a93f37a32c7430db", + "9003b727e1f64fe19386a725d8a344fa", + "952bafd6d58144b28bc2c4ad66ed4f8b", + "b21cbd6d566d4ef293ab98e51a1033f1", + "17d2ae89093f4d2f9962ffa654ddcbad", + "9d9ab19f4caf487694d6483f234aabb7", + "5f52c41b56774eefac83864f42dabe28", + "4884f6f41f4045b5ba0239898fee50c6", + "ff77c736962b458c9341a947cd00a7f3", + "8c9072b8a35849cfbda6f5eb9179f528", + "7efffcc4356344228af97bb22065c750", + "1bc3fd2803e644e284752fb42a92bfc5", + "8d9c6c7014a24d719d36210cdb04b89a", + "f0312ab0e54c4395ab3b24deaf6ac930", + "94bcce13ca63491895532a8b03fbdc50", + "29f6e78fa0d64bee95316178f2af26b4", + "5ca4298d4a8d41d7ab357ab0d6c41607", + "69bb8165fe03447cac2409e52dfb6fba", + "b4f131eac5d145ac99773413639efd30", + "21c0e13bf8814bf8a7698831a81f40ce", + "6ad2981514934ba3a99d9af804b71b83", + 
"8aa676e6e41e4cb68f08b7b6fff394b8", + "69b1631b328e408e846e545accda625c", + "db375dcc92b04b64a3ece532b7058590", + "90fe90981b814463a2106ad7ab9d268e", + "ab355a59d76a46f695b4f40132aab760", + "7df0ad47f4624fb3a61cff463428e2f5", + "5ae61ea745fd4728ac38282300dde003", + "f2a66c6cda7346e4b5f1e0b535ebaca1", + "a24d5e98caa24cf39fb422dff93c7eca", + "8de77145f60b4271b5357abe6a145675", + "ba3552f1892e4334a82d7ff298cd3b5b", + "b733682eb42347f5906cc121927916e1", + "e4cd66a09968401ab56ad2951d4b26d1", + "8d1d910a9a1c46fa988835118b626691", + "a9ee6367c48540678a00d16948920aad", + "7975972153144d30b42aacace4e83274", + "2cb85592bf97403fbad9bb8c519cd8bf", + "f95a49beeb19460b94e318841aab492f", + "0ff989201f644a51adb95470a9292c66", + "70390c6a974a4fc583ca457e22140d9f", + "babcacf75255494f877706da78a2f6b4", + "962f0a845a8141fb9a616a548cf586ae", + "4614a496a0e742a1b7012bc5eaf3c846", + "be0724b67e1140b292ff67811ecae2e3", + "f62e6725df84439c85114851b3bac16b", + "85f395657239437bb8714f4cc31e0b47", + "f16162d038144049a87d95909db044fe", + "78ceb340847a4c10ba8e016746024af8", + "b1e5b5b6a98b4c29b6be1a646ff62c7c", + "80787ac2994845788981beff20ace5d4", + "c848ed8b46d546cfbc8a5a7e92e4ae44", + "9bb104726b3449cca9b3a4a5d60ff4eb", + "347ff1dbd39d441eb77d5cef02415a5e", + "c890c36976494af2b1a161306981b67c", + "cbfa575577d5449b8b23dd4a8600c982", + "dd686226092a4ebda728440cd72ef649", + "08408d415bac43bb82180aeb65e11592", + "507b6215a4d04df4ae38250d0ded7ba7", + "496aacba74bd42e983ffa1ba00d1c7e2", + "00b905e9cfa049f8bc32cb7166489f82", + "20cf7cbc45b345e599dc61c80fef33d6", + "47e583b278674aa38fc7883c10b44a20", + "a0294366add94b328970d7e7b20844d7" + ] + }, + "id": "ETpQKftLplqh", + "outputId": "2b73c450-364e-4483-80b1-449dccff6e6c" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2cc35d93724b4e0f9608fe6999e96baa", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "modules.json: 0%| | 0.00/349 [00:00 ⚠️ Notice that you used 
`sentence-transformers/all-MiniLM-L6-v2` model to create embeddings for your documents before. This is why you need to use the same model to embed the user queries." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "LyJY2yW628dl" + }, + "outputs": [], + "source": [ + "from haystack.components.embedders import SentenceTransformersTextEmbedder\n", + "\n", + "text_embedder = SentenceTransformersTextEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0_cj-5m-O-qb" + }, + "source": [ + "### Initialize the Retriever\n", + "\n", + "Initialize an [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever) and make it use the `InMemoryDocumentStore` you initialized earlier in this tutorial. This Retriever will fetch the documents most relevant to the query." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "-uo-6fjiO-qb" + }, + "outputs": [], + "source": [ + "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n", + "\n", + "retriever = InMemoryEmbeddingRetriever(document_store)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6CEuQpB7O-qb" + }, + "source": [ + "### Define a Template Prompt\n", + "\n", + "Create a [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) object with the `from_user` method and pass the custom prompt for a question answering task using the RAG approach. The prompt should take in two parameters: `documents`, which are retrieved from a document store, and a `question` from the user. Use the Jinja2 looping syntax to combine the content of the retrieved documents in the prompt.\n", + "\n", + "Next, initialize a [ChatPromptBuilder](https://docs.haystack.deepset.ai/docs/chatpromptbuilder) instance with your prompt template. 
At runtime, the `ChatPromptBuilder` fills in the template variables with the values you provide and generates a complete prompt.\n", + "\n", + "> By default, all prompt variables are treated as optional. Set `required_variables=\"*\"` to make every prompt variable mandatory." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "ObahTh45FqOT" + }, + "outputs": [], + "source": [ + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.dataclasses import ChatMessage\n", + "\n", + "template = [\n", + " ChatMessage.from_user(\n", + " \"\"\"\n", + "Given the following information, answer the question.\n", + "\n", + "Context:\n", + "{% for document in documents %}\n", + " {{ document.content }}\n", + "{% endfor %}\n", + "\n", + "Question: {{question}}\n", + "Answer:\n", + "\"\"\"\n", + " )\n", + "]\n", + "\n", + "prompt_builder = ChatPromptBuilder(template=template, required_variables=\"*\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HR14lbfcFtXj" + }, + "source": [ + "### Initialize a ChatGenerator\n", + "\n", + "[ChatGenerators](https://docs.haystack.deepset.ai/docs/generators) are the components that call large language models (LLMs) and return chat completions.\n", + "\n", + "**Before you run the pipeline, decide how you will access the LLM:**\n", + "\n", + "- **Hosted provider API** — Create an API key with a provider. In Colab, you can store it in the *Secrets* tab or set the matching environment variable (`OPENAI_API_KEY`, `MISTRAL_API_KEY`, …). 
The cells below prompt for a key if it is not already set.\n", + "- **Local or self-hosted (including on Colab)** — If you prefer not to use a remote API, you can run an open-weight model on your machine or the Colab runtime with [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator). See the [generators documentation](https://docs.haystack.deepset.ai/docs/generators) for more integrations.\n", + "\n", + "The next three sections show **OpenAI**, **Mistral**, and **Hugging Face** as examples. Run **only one** of them to define `chat_generator`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Use OpenAI's GPT models (requires an API key)**\n", + "\n", + "[Get an OpenAI API key](https://platform.openai.com/api-keys) and set it as the `OPENAI_API_KEY` environment variable. Then initialize `OpenAIChatGenerator` with the model name you want to use. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "SavE_FAqfApo" + }, + "outputs": [], + "source": [ + "import os\n", + "from getpass import getpass\n", + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "\n", + "if \"OPENAI_API_KEY\" not in os.environ:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n", + " \n", + "chat_generator = OpenAIChatGenerator(model=\"gpt-4o-mini\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Use Mistral models (requires a free API key)**\n", + "\n", + "[Get a Mistral API key](https://docs.mistral.ai/) (free tier available) and set it as the `MISTRAL_API_KEY` environment variable. Then initialize `MistralChatGenerator` with the model name you want to use. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0-zl-0WsZI7T", + "outputId": "dffb77e3-8c03-4fc2-d038-692e1d3cc2e7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Enter Mistral API key:··········\n" + ] + } + ], + "source": [ + "import os\n", + "from getpass import getpass\n", + "from haystack_integrations.components.generators.mistral import MistralChatGenerator\n", + "\n", + "if \"MISTRAL_API_KEY\" not in os.environ:\n", + " os.environ[\"MISTRAL_API_KEY\"] = getpass(\"Enter Mistral API key:\")\n", + " \n", + "chat_generator = MistralChatGenerator(model=\"mistral-small-latest\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Use open-weight models from Hugging Face (no API key required for local inference)**\n", + "\n", + "Initialize `HuggingFaceLocalChatGenerator` with an open-weight LLM from Hugging Face, such as [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507). To call models through the **Hugging Face Inference API** instead, use [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator), which requires a Hugging Face API token." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from haystack.components.generators.chat import HuggingFaceLocalChatGenerator\n", + "\n", + "chat_generator = HuggingFaceLocalChatGenerator(model=\"Qwen/Qwen3-4B-Instruct-2507\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nenbo2SvycHd" + }, + "source": [ + "> You can replace the examples above with any Haystack `ChatGenerator` that fits your setup: another API provider or a local / Colab-hosted backend. See the full list of chat generators [here](https://docs.haystack.deepset.ai/docs/generators)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1bfHwOQwycHe" + }, + "source": [ + "### Build the Pipeline\n", + "\n", + "To build the pipeline, add all the components to a `Pipeline` object and connect them. Create connections from `text_embedder`'s \"embedding\" output to the \"query_embedding\" input of `retriever`, from `retriever` to `prompt_builder`, and from `prompt_builder` to `llm`. Spell out the connection from `retriever`'s \"documents\" output to the \"documents\" input of `prompt_builder`, since `prompt_builder` has two inputs (\"documents\" and \"question\") and the connection should be unambiguous.\n", + "\n", + "For more information on pipelines and creating connections, refer to the [Creating Pipelines](https://docs.haystack.deepset.ai/docs/creating-pipelines) documentation." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "f6NFmpjEO-qb" + }, + "outputs": [], + "source": [ + "from haystack import Pipeline\n", + "\n", + "basic_rag_pipeline = Pipeline()\n", + "# Add components to your pipeline\n", + "basic_rag_pipeline.add_component(\"text_embedder\", text_embedder)\n", + "basic_rag_pipeline.add_component(\"retriever\", retriever)\n", + "basic_rag_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", + "basic_rag_pipeline.add_component(\"llm\", chat_generator)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CyqzFsp1qWVM", + "outputId": "591f9e19-13d6-4232-f00a-d6becd89694b" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\n", + "🚅 Components\n", + " - text_embedder: SentenceTransformersTextEmbedder\n", + " - retriever: InMemoryEmbeddingRetriever\n", + " - prompt_builder: ChatPromptBuilder\n", + " - llm: MistralChatGenerator\n", + "🛤️ Connections\n", + " - text_embedder.embedding -> retriever.query_embedding (list[float])\n", + " - retriever.documents -> prompt_builder.documents (list[Document])\n", + " - prompt_builder.prompt -> 
llm.messages (list[ChatMessage])" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Now, connect the components to each other\n", + "basic_rag_pipeline.connect(\"text_embedder.embedding\", \"retriever.query_embedding\")\n", + "basic_rag_pipeline.connect(\"retriever.documents\", \"prompt_builder.documents\")\n", + "basic_rag_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6NqyLhx7O-qc" + }, + "source": [ + "That's it! Your RAG pipeline is ready to generate answers to questions!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DBAyF5tVO-qc" + }, + "source": [ + "## Asking a Question\n", + "\n", + "When asking a question, use the `run()` method of the pipeline. Make sure to provide the question to both the `text_embedder` and the `prompt_builder`. This ensures that the `{{question}}` variable in the template prompt gets replaced with your specific question.\n", + "\n", + "> ⚠️ If you host the model on the Colab runtime (for example with `HuggingFaceLocalChatGenerator`), the first pipeline run can take longer as the LLM is loaded and prepared for inference."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 86, + "referenced_widgets": [ + "4e6e97b6d54f4f80bb7e8b25aba8e616", + "1a820c06a7a049d8b6c9ff300284d06e", + "58ff4e0603a74978a134f63533859be5", + "8bdb8bfae31d4f4cb6c3b0bf43120eed", + "39a68d9a5c274e2dafaa2d1f86eea768", + "d0cfe5dacdfc431a91b4c4741123e2d0", + "e7f1e1a14bb740d18827dd78bbe7b2e3", + "3fda06f905b445a488efdd2dd08c0939", + "2bc341a780f7498ba9cd475468841bb5", + "d7218475e23b420a8c03d00ca4ab8718", + "a694abaf765f4d1b82fa0138e59c6793" + ] + }, + "id": "Vnt283M5O-qc", + "outputId": "d2843a73-3ad5-4daa-8d1e-a58de7aa2bb0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Batches: 100%|██████████| 1/1 [00:00<00:00, 1.77it/s]\n", + "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "To disable this warning, you can either:\n", + "\t- Avoid using `tokenizers` before the fork if possible\n", + "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The Colossus of Rhodes was a statue of the Greek sun-god Helios, standing approximately 70 cubits (about 33 meters or 108 feet) tall. Although no complete descriptions of its appearance exist, scholars believe it featured the following characteristics:\n", + "\n", + "1. **Facial Features**: The head of the statue likely had curly hair, with spikes resembling bronze or silver flames radiating outward. This style is similar to depictions found on contemporary Rhodian coins.\n", + "\n", + "2. **Posture**: While the exact pose is uncertain, it is suggested that the statue may have been constructed in a pose where Helios is depicted shielding his eyes with one hand, a common representation of someone looking toward the sun.\n", + "\n", + "3. 
**Construction Materials**: The structure was built using iron tie bars and brass plates, which formed the skin of the statue. The interior was filled with stone blocks.\n", + "\n", + "4. **Height and Scale**: The Colossus was positioned on a 15-metre-high (49-foot) pedestal, making it one of the tallest statues of the ancient world, towering over the harbor entrance.\n", + "\n", + "5. **Symbolic Representation**: The statue was meant to symbolize the victory and freedom of the Rhodians after successfully defending their city against an invader.\n", + "\n", + "Overall, the Colossus of Rhodes was an impressive and monumental statue designed to celebrate and symbolize the strength and resilience of the city of Rhodes.\n" + ] + } + ], + "source": [ + "question = \"What does Rhodes Statue look like?\"\n", + "\n", + "response = basic_rag_pipeline.run({\"text_embedder\": {\"text\": question}, \"prompt_builder\": {\"question\": question}})\n", + "\n", + "print(response[\"llm\"][\"replies\"][0].text)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 712, + "referenced_widgets": [ + "3a404870f30f48aeb5fce11bcb794a1a", + "6b9eb9888076445c92b80f9aa29121ce", + "1839264932db40d0a40ccbfc08b50896", + "0a893c45730f4ce5a36060dcc880add1", + "699de9f0c89e4cc294b341932c4decc7", + "d283295d0bec454d9bd84256f14904ea", + "b4d7b68ea70b449b95eadc54e37954d6", + "0c9be40eb1064e50a70fe4de5cf9c760", + "96d50cf0bf05451a91c9eed788d36ed0", + "d891ea3f48314c7199f0963277063df8", + "24d82ad3686c4616aff7987647485df6" + ] + }, + "id": "v6bQceW8ZduN", + "outputId": "f8ca4d07-ca67-4810-d07c-354c7888f80f" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3a404870f30f48aeb5fce11bcb794a1a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Batches: 0%| | 0/1 [00:00