Notebook for Semantic search in 5 mins tutorial#95

Open
abdonpijpelink wants to merge 3 commits into master from semantic-search-101-cloud

abdonpijpelink (Member) commented Feb 6, 2026

Adds a notebook for the updated "Build a Semantic Search Engine in 5 Minutes" tutorial at https://qdrant.tech/documentation/tutorials-basics/search-beginners/, added in qdrant/landing_page#2127

kanungle (Contributor) left a comment

A few minor code comments to add; I think they could help.

" collection_name=COLLECTION_NAME,\n",
" query=models.Document(\n",
" text=\"alien invasion\",\n",
" model=EMBEDDING_MODEL\n",
Contributor

Add a code comment: "Used in Cloud Inference"

{
"cell_type": "code",
"source": [
"EMBEDDING_MODEL=\"sentence-transformers/all-minilm-l6-v2\"\n",
Contributor

Add a code comment noting that this is needed for Cloud Inference.
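One way the two suggested comments might read in the notebook, as a rough sketch only. The helper name `search_books` and the `limit` default are hypothetical; `COLLECTION_NAME`, `EMBEDDING_MODEL`, `models.Document`, and the "alien invasion" query come from the cells under review.

```python
# Needed for Cloud Inference: names the model that Qdrant Cloud runs server-side.
EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2"
COLLECTION_NAME = "my_books"


def search_books(client, text, limit=3):
    """Run a semantic query; `client` is assumed to be a connected QdrantClient."""
    # Imported inside the function so the constants above can be read
    # even where qdrant-client is not installed.
    from qdrant_client import models

    return client.query_points(
        collection_name=COLLECTION_NAME,
        query=models.Document(
            text=text,
            model=EMBEDDING_MODEL,  # Used in Cloud Inference: the cluster embeds `text` with this model
        ),
        limit=limit,
    )
```

In the notebook this would be called as `search_books(client, "alien invasion")`, with the hits available on the response's `points` attribute.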

{
"cell_type": "markdown",
"source": [
"Next, create a client connection to your Qdrant cluster. Ensure that you have added QDRANT_URL and QDRANT_API_KEY as secrets."
Collaborator

I think this is the type of tutorial where we need to repeat how to get these two values (even in the notebook), and mention that it costs nothing and that the Free Tier is free forever.
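A minimal sketch of how the notebook could fail early with a helpful hint when the two secrets are missing. It assumes the secrets are exposed as environment variables; a hosted notebook may use a different secrets mechanism, and the helper name `load_qdrant_credentials` is hypothetical.

```python
import os

REQUIRED_SECRETS = ("QDRANT_URL", "QDRANT_API_KEY")


def load_qdrant_credentials(env=None):
    """Return (url, api_key), or raise with a hint naming the missing secrets."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing secrets: " + ", ".join(missing)
            + ". Create a Free Tier cluster in Qdrant Cloud (no cost) "
            "and copy its URL and API key."
        )
    return env["QDRANT_URL"], env["QDRANT_API_KEY"]
```

Passing a plain dictionary as `env` makes the check easy to try without touching real credentials.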

{
"cell_type": "markdown",
"source": [
"All data in Qdrant is organized within collections. Since you're storing books, let's create a collection named `my_books`."
Collaborator

Link to the docs/course? If people are starting with this tutorial, they're probably starting with Qdrant too, so I'd cross-reference more.

"client.create_collection(\n",
" collection_name=COLLECTION_NAME,\n",
" vectors_config=models.VectorParams(\n",
" size=384, # Vector size is defined by the model\n",
Collaborator

Which model? We haven't mentioned one before this point.
Perhaps cloud_inference = True deserves a one-line explanation.
I'm being nitpicky because I believe people who open a notebook won't want to switch between the tutorial tab on the website and the notebook, so it's better to give them that context in place.
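A possible shape for those one-liners, as a sketch under assumptions: it presumes a qdrant-client version that supports the `cloud_inference` flag, the helper names `connect` and `create_books_collection` are hypothetical, and the cosine distance is assumed because the quoted cell is cut off before its distance setting.

```python
EMBEDDING_MODEL = "sentence-transformers/all-minilm-l6-v2"
VECTOR_SIZE = 384  # all-MiniLM-L6-v2 outputs 384-dimensional vectors


def connect(url, api_key):
    """Create a client whose embeddings are computed by Qdrant Cloud."""
    from qdrant_client import QdrantClient

    # cloud_inference=True: text passed as models.Document is embedded on the
    # cluster, so no embedding model has to be downloaded into the notebook.
    return QdrantClient(url=url, api_key=api_key, cloud_inference=True)


def create_books_collection(client, name="my_books"):
    """Create the collection with a vector size matching EMBEDDING_MODEL."""
    from qdrant_client import models

    client.create_collection(
        collection_name=name,
        vectors_config=models.VectorParams(
            size=VECTOR_SIZE,  # must equal the model's output dimension
            distance=models.Distance.COSINE,  # assumed; not visible in the quoted cell
        ),
    )
```

Keeping the model name and the vector size next to each other makes it visible which model the "defined by the model" comment refers to.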

Collaborator

Also, we could say which models are available (free and paid).

Member Author

I don't think we should list the models in the docs, because it will be a pain to maintain. Instead, I've added instructions on where to find the list of the available free and paid models in the Inference tab of the Cluster Detail page in the Qdrant Cloud Console.

"source": [
"Upload the dataset to the `my_books` collection. Each book will be stored as a point with:\n",
"- a unique ID\n",
"- a vector generated by the `sentence-transformers/all-minilm-l6-v2` embedding model (available for free on Qdrant Cloud), based on the book's description. This is achieved by providing a `Document` object with the `model` name and the `text` to embed.\n",
Collaborator

Ah, it's here!
I'd still put a quick TL;DR at the beginning :)

Collaborator

P.S. I feel like having both the `model` name and `models` might be confusing for some readers.
I'd cut the references to variables, since they're self-explanatory from the code, and just explain what's happening in one sentence.

{
"cell_type": "markdown",
"source": [
"How about the most recent book from the early 2000s? Qdrant, allows you to narrow down query results by applying a filter. To filter for books published after the year 2000, you can filter on the `year` field in the payload. Before filtering on a payload field, create a payload index for that field:"
Collaborator

The comma after "Qdrant" looks off.
Also, I'd add links to the payload and indexing docs.

Collaborator

P.S. The order of actions here reminded me that users don't get filterable HNSW if they create payload indexes after uploading points to a collection that has HNSW indexing on.
I don't know if that's relevant this early on (I guess not), but it might be one more reason to think about how to make this more obvious to users, maybe from the engineering side of things.
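If the notebook were reordered along those lines, the step might look like the sketch below: the `year` payload index is created before any points are uploaded. The helper name `index_then_upload` is hypothetical, `client` is assumed to be a connected QdrantClient with an empty `my_books` collection, and `"integer"` is assumed to be the right schema for the `year` field.

```python
def index_then_upload(client, points, collection_name="my_books"):
    """Create the `year` payload index first, then upload the points."""
    # Index before upload, so the index already exists when vectors are indexed.
    client.create_payload_index(
        collection_name=collection_name,
        field_name="year",
        field_schema="integer",
    )
    client.upsert(collection_name=collection_name, points=points)
```

Whether this ordering is worth the extra explanation in a five-minute tutorial is the open question raised above; the sketch only shows that the change itself is small.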

3 participants