Notebook for Semantic search in 5 mins tutorial#95
abdonpijpelink wants to merge 3 commits into master from
kanungle left a comment
A few minor additions to suggest. I think they could help.
" collection_name=COLLECTION_NAME,\n",
" query=models.Document(\n",
" text=\"alien invasion\",\n",
" model=EMBEDDING_MODEL\n",
Add a code comment: "Used in Cloud Inference".
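For readers who are new to vector search, the notebook could also show what this query does conceptually: the query text is embedded with `EMBEDDING_MODEL`, and the stored vectors are ranked by similarity to it. A minimal stdlib-only sketch, assuming cosine similarity and hand-made 3-dimensional toy vectors in place of real 384-dimensional embeddings; `cosine_similarity` and `search` are hypothetical helpers, not `qdrant_client` API. Qdrant itself uses an approximate HNSW index instead of this exhaustive scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, points, limit=3):
    """Exact (brute-force) nearest-neighbour search over (id, vector) pairs.

    Returns the `limit` best (id, score) pairs, highest similarity first.
    """
    scored = [(point_id, cosine_similarity(query_vector, vector)) for point_id, vector in points]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy vectors standing in for real embeddings of book descriptions.
points = [
    (0, [0.9, 0.1, 0.0]),  # e.g. a book about an alien invasion
    (1, [0.1, 0.9, 0.0]),  # e.g. a romance novel
    (2, [0.7, 0.3, 0.1]),  # e.g. a space-travel adventure
]
# With Cloud Inference, Qdrant would compute the query embedding server-side;
# here a toy vector stands in for the embedding of "alien invasion".
query = [1.0, 0.0, 0.0]

print(search(query, points, limit=2))
```

The design choice to show brute force is purely didactic: it makes the scoring visible, while the real engine trades exactness for speed.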
{
"cell_type": "code",
"source": [
"EMBEDDING_MODEL=\"sentence-transformers/all-minilm-l6-v2\"\n",
Add a code comment noting that this is needed for Cloud Inference.
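A possible version of this cell with the requested comments, which also ties the vector size to the model name so that `size=384` in `create_collection` no longer appears out of nowhere. `MODEL_DIMENSIONS`, `vector_size_for`, and `VECTOR_SIZE` are hypothetical names; only the model name comes from the notebook, and 384 is the output dimension of all-MiniLM-L6-v2.

```python
# Embedding model used by Qdrant Cloud Inference to turn text into vectors.
# Needed for Cloud Inference; this model is available for free on Qdrant Cloud.
EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2"

# Hypothetical lookup table: output dimension of each embedding model used here.
# all-MiniLM-L6-v2 produces 384-dimensional vectors.
MODEL_DIMENSIONS = {
    "sentence-transformers/all-minilm-l6-v2": 384,
}


def vector_size_for(model_name: str) -> int:
    """Return the vector size for a known model, failing loudly otherwise."""
    try:
        return MODEL_DIMENSIONS[model_name]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model_name!r}") from None


# The collection's vector size must match the model's output dimension.
VECTOR_SIZE = vector_size_for(EMBEDDING_MODEL)
```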
{
"cell_type": "markdown",
"source": [
"Next, create a client connection to your Qdrant cluster. Ensure that you have added QDRANT_URL and QDRANT_API_KEY as secrets."
I think this is the kind of tutorial where we need to explain again, even in the notebook, how to get these two values, and mention that it costs nothing because the Free Tier is free forever.
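Since the notebook depends on these two secrets, a small helper with an actionable error message could make the setup harder to get wrong. A sketch, assuming the notebook runs either in Google Colab (where secrets are read with `google.colab.userdata.get`) or locally with environment variables of the same names; `load_qdrant_credentials` is a hypothetical helper.

```python
import os


def load_qdrant_credentials() -> tuple[str, str]:
    """Return (url, api_key), preferring Colab secrets and falling back to env vars."""
    try:
        from google.colab import userdata  # only importable inside Google Colab
        return userdata.get("QDRANT_URL"), userdata.get("QDRANT_API_KEY")
    except ImportError:
        pass  # not running in Colab: fall back to environment variables

    missing = [name for name in ("QDRANT_URL", "QDRANT_API_KEY") if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing " + ", ".join(missing) + ". Create a Free Tier cluster in the "
            "Qdrant Cloud Console and copy its URL and API key."
        )
    return os.environ["QDRANT_URL"], os.environ["QDRANT_API_KEY"]
```

The returned values would then be passed to `QdrantClient(url=url, api_key=api_key, cloud_inference=True)`; as I understand it, `cloud_inference=True` makes the client send `models.Document` inputs to Qdrant Cloud for embedding instead of embedding them locally.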
{
"cell_type": "markdown",
"source": [
"All data in Qdrant is organized within collections. Since you're storing books, let's create a collection named `my_books`."
Link to the docs or the course? People starting with this tutorial are probably new to Qdrant too, so I'd add more cross-references.
"client.create_collection(\n",
" collection_name=COLLECTION_NAME,\n",
" vectors_config=models.VectorParams(\n",
" size=384, # Vector size is defined by the model\n",
Which model? We haven't mentioned it before.
Perhaps `cloud_inference=True` deserves a one-line explanation.
I'm being nitpicky because I believe people opening a notebook won't want to switch between the tutorial on the website and the notebook, so it's better to give them some context in the notebook itself.
Also, we could say which models are available (free and paid).
I don't think we should list the models in the docs, because it will be a pain to maintain. Instead, I've added instructions on where to find the list of the available free and paid models in the Inference tab of the Cluster Detail page in the Qdrant Cloud Console.
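To make the role of `size` concrete for newcomers, here is a toy model of what a collection enforces: each point has a unique ID, a vector of exactly the configured size, and an optional payload. This is a hypothetical in-memory illustration (`Point` and `ToyCollection` are not part of `qdrant_client`) and it deliberately leaves out the distance metric and indexing.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    """A stored record: unique id, embedding vector, and JSON-like payload."""
    id: int
    vector: list[float]
    payload: dict = field(default_factory=dict)


@dataclass
class ToyCollection:
    """Hypothetical in-memory stand-in for a collection with a fixed vector size."""
    name: str
    size: int  # must equal the embedding model's output dimension (384 for all-MiniLM-L6-v2)
    points: dict[int, Point] = field(default_factory=dict)

    def upsert(self, point: Point) -> None:
        """Insert or overwrite a point, rejecting vectors of the wrong dimension."""
        if len(point.vector) != self.size:
            raise ValueError(
                f"{self.name!r} expects {self.size}-dim vectors, got {len(point.vector)}"
            )
        self.points[point.id] = point  # re-using an id replaces the old point


books = ToyCollection(name="my_books", size=384)
books.upsert(Point(id=0, vector=[0.0] * 384, payload={"year": 1895}))
```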
"source": [
"Upload the dataset to the `my_books` collection. Each book will be stored as a point with:\n",
"- a unique ID\n",
"- a vector generated by the `sentence-transformers/all-minilm-l6-v2` embedding model (available for free on Qdrant Cloud), based on the book's description. This is achieved by providing a `Document` object with the `model` name and the `text` to embed.\n",
Ah, it's here!
I'd still put a quick TL;DR at the beginning :)
P.S. I feel like having both `model` and `models` here might be confusing for some readers.
I'd cut down on referring to the variables, since they're self-explanatory from the code, and just explain what's happening in one sentence.
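The one-sentence explanation could be backed by viewing the upload as a plain data transformation: each book becomes a record with its list index as ID, its description as the text to embed, and its metadata as payload. A stdlib-only sketch, assuming the notebook stores the whole book entry as payload; the two books and their shortened descriptions are illustrative, and `to_point_records` is a hypothetical helper. The real cell wraps `text` and `model` in a `models.Document`, which this sketch does not import.

```python
EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2"

# Two illustrative entries in the shape of a book dataset (descriptions shortened).
documents = [
    {
        "name": "The Time Machine",
        "description": "A man travels through time and witnesses the evolution of humanity.",
        "author": "H.G. Wells",
        "year": 1895,
    },
    {
        "name": "The Hunger Games",
        "description": "A dystopian society forces teenagers to fight to the death on live television.",
        "author": "Suzanne Collins",
        "year": 2008,
    },
]


def to_point_records(docs, model_name):
    """Turn each book into an (id, text-to-embed, model, payload) record.

    In the notebook, `text` and `model` go into models.Document(text=..., model=...)
    so that Qdrant Cloud Inference computes the vector server-side.
    """
    return [
        {"id": idx, "text": doc["description"], "model": model_name, "payload": doc}
        for idx, doc in enumerate(docs)
    ]


records = to_point_records(documents, EMBEDDING_MODEL)
```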
{
"cell_type": "markdown",
"source": [
"How about the most recent book from the early 2000s? Qdrant, allows you to narrow down query results by applying a filter. To filter for books published after the year 2000, you can filter on the `year` field in the payload. Before filtering on a payload field, create a payload index for that field:"
The comma after "Qdrant" looks off.
Also, I'd add links to the docs on payloads and indexing.
P.S. The order of actions here reminded me that users don't get filterable HNSW if they create payload indexes after uploading points to a collection with HNSW indexing enabled.
I don't know if it's relevant this early on (I guess not), but it might be one more reason to think about how to make this more obvious to users, maybe from the engineering side.
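For the filtering step, a short illustration of the semantics might help: the filter restricts the candidate set by payload, and similarity then decides the order among what remains. The sketch below does this with a linear scan over toy points; `filtered_search` and the toy data are hypothetical. Roughly speaking, Qdrant avoids such a full scan by using the payload index on `year` together with its HNSW graph, which is why the indexing order discussed above matters.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query_vector, points, min_year, limit=3):
    """Keep only points whose payload year is >= min_year, then rank by similarity."""
    candidates = [p for p in points if p["payload"].get("year", -math.inf) >= min_year]
    ranked = sorted(
        candidates,
        key=lambda p: cosine_similarity(query_vector, p["vector"]),
        reverse=True,
    )
    return [p["id"] for p in ranked[:limit]]


# Toy points: id 0 is the most similar to the query but is excluded by the year filter.
points = [
    {"id": 0, "vector": [0.9, 0.1], "payload": {"year": 1895}},
    {"id": 1, "vector": [0.8, 0.2], "payload": {"year": 2008}},
    {"id": 2, "vector": [0.1, 0.9], "payload": {"year": 2005}},
]
print(filtered_search([1.0, 0.0], points, min_year=2000))
```

With the actual client, the corresponding pieces would be along the lines of `client.create_payload_index(collection_name=COLLECTION_NAME, field_name="year", field_schema=models.PayloadSchemaType.INTEGER)` and a `query_filter=models.Filter(must=[models.FieldCondition(key="year", range=models.Range(gte=2000))])` argument on the query.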
Adds a notebook for the updated "Build a Semantic Search Engine in 5 Minutes" tutorial at https://qdrant.tech/documentation/tutorials-basics/search-beginners/, added in qdrant/landing_page#2127