Valkey Search Now Delivers Expanded, Rich Search and Aggregations #466
madolson merged 22 commits into valkey-io:main
Signed-off-by: Karthik Subbarao <karthikrs2021@gmail.com>
Please add the blog tag and add this to the blog project board.
@makubo-aws, I checked, and I don't have permission to do either of these.
We typically just use the issue in the board, but the PR should close the issue. @KarthikSubbarao, reminder: can you split up the blog so each sentence is on its own line? Otherwise, suggesting comments is very difficult.
Yes, I will push the change in a few minutes.
Signed-off-by: Karthik Subbarao <karthikrs2021@gmail.com>
@madolson, done.
> title= "Valkey Search Now Delivers Expanded, Rich Search and Aggregations"
> description = "Unlocking Text, Tag, Numeric Search and Aggregations in Valkey Search"
@cnuthalapati, can you check the title and description here? Both of them sound like titles to me. We can pick one and then add a short description.
Title -> Valkey Search Now Delivers Expanded, Rich Search and Aggregations
Description -> Valkey Search enables filtering across text, tags, and numeric data and analyzing results with aggregations.
> authors= [ "karthiksubbarao", "allenss", "bcathcart", "cnuthalapati"]
> +++
> Valkey Search, the official Valkey module, now enables searching across text, tags, and numeric fields over terabytes of data with microsecond-level latency and up to millions of queries per second.
Reaching millions of queries per second depends on how many nodes are in the cluster, the query string, the command options, and the dataset. Some query operations can reach 50-60K RPS per node (this is from one of the perf scenarios). It could be higher if we look for the optimal query, but it can also be much lower (hundreds or thousands).
We are interested in reporting the numbers achievable for exact-match tag queries, which are by far the most commonly used queries. Based on our discussion, customers that require a scale of a million search RPS are able to achieve it with the right configuration. No change needed.
madolson left a comment
This reads a lot like a how-to and not much like a blog. It reads a lot more like technical documentation.
> It supports real-time updates so new and updated data becomes searchable immediately.
> Running aggregations within your Valkey Cache reduces data movement and eliminates the need to export large result sets to the application layer for post processing.
> Valkey Search provides a flexible, high-performance foundation for querying across a range of use cases, from powering in-app search experiences, and recommendation systems to simple in-cache lookups.
> You can use Valkey Aggregations to power in-app analytics, dashboards, or generate reports on cached data for your application.
Should aggregations be capitalized?
> When you add, update, or delete an indexed key, the module receives the mutation event, extracts indexed attributes, queues the indexing work, and updates the index before acknowledging the write.
> Valkey Search supports multi-threading so you can maximize ingestion throughput by using multiple parallel connections to saturate the index update process without pipelining on a single connection.
> Index updates are parallelized at the attribute level: a mutated key is decomposed into attributes, and each per-attribute index update runs in parallel.
@BCathcart, I'm not sure this is true, since a key is ingested by one worker thread. Do you want to suggest different wording? We can update it.
Suggestion:
Index updates are parallelized at the key level: a mutated key is processed on a single thread. The client request that triggered the mutation is unblocked only after the mutation has been applied to the indexes.
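To make the suggested wording concrete, here is a toy Python model of that behavior. It is not Valkey Search's implementation: `ToyIndexedStore`, `write`, and `search` are hypothetical names, and the single lock stands in for whatever synchronization the module actually uses. Each write runs as one task on a worker pool, the key's new index entries are published together, and the caller is unblocked only after that happens.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class ToyIndexedStore:
    """Hypothetical toy model (not Valkey Search code): background indexing
    where a write returns only after its index update is visible."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._docs = {}   # key -> {attribute: value}
        self._index = {}  # (attribute, value) -> set of keys

    def _index_key(self, key, fields):
        # Runs on a worker thread; each key's update is a single task.
        with self._lock:
            # Remove old postings and add new ones under one lock, so a concurrent
            # search sees either the old key or the new key, never a mix.
            for attr, value in self._docs.get(key, {}).items():
                self._index.get((attr, value), set()).discard(key)
            for attr, value in fields.items():
                self._index.setdefault((attr, value), set()).add(key)
            # Simplification: new fields replace the old ones (a real HSET merges).
            self._docs[key] = dict(fields)

    def write(self, key, fields):
        # Hand the work to the pool, then block this "client" until it is applied.
        self._pool.submit(self._index_key, key, fields).result()
        return len(fields)

    def search(self, attr, value):
        with self._lock:
            return sorted(self._index.get((attr, value), set()))

    def close(self):
        self._pool.shutdown(wait=True)
```

Calling `write` from several client threads at once keeps several workers busy, which mirrors the post's advice to use parallel connections rather than pipelining on a single one.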
> The system ensures atomic visibility by exposing index changes only after a whole key finishes its attribute updates, and it then unblocks the client.
> ## Consistency Your Application Needs
@cnuthalapati, this is a good clarification, but this seems like a section we want to cover clearly in our documentation. We already have a WIP section for this here: https://github.com/valkey-io/valkey-doc/blob/c0060260432a6a700482c33fd2cfa036df42b1a5/topics/search.md?plain=1#L92
Let me know if you agree, or if you think it is important to cover here.
We should retain an explanation of how indexing is parallelized without going into the level of detail outlined in the documentation. We can simplify it to:
"Index updates are processed by background worker threads, and index changes become visible only after the update completes, at which point the client is unblocked."
> The first release of Valkey Search focused on vector search, and this release extends the search to text, tag and numeric attribute types and adds result aggregation capabilities such as filtering, sorting, grouping, and computing metrics.
> Whether you're building cutting-edge AI applications or integrating search into existing systems, we invite you to try it out.
Is this needed? Just dive in to how to get started.
> Valkey Search now enables searching across text, tags, and numeric attributes over terabytes of data with microsecond-level latency and up to millions of queries per second.
> You can now search your Valkey data by combining full-text search, numeric, tag, and vector filters in a single query, then analyze results with server-side processing like grouping, counts, and averages.
> Valkey Search supports real-time updates, so new and updated data becomes searchable immediately. This keeps search results consistent and reduces the risk of acting on stale data.
> It includes built-in support for scaling to terabyte-scale clusters without requiring application or client code changes.
> Valkey Search enables low-latency aggregations on your data by eliminating the need to export large result sets to the application layer for post-processing, which reduces data movement and costs.
> Valkey Search provides a flexible, high-performance foundation for querying across a range of use cases, from powering in-app search experiences to recommendation systems and analyzing Valkey data to support in-app analytics and reporting dashboards.
> The new capabilities require Valkey 9.0 or later and are licensed under the BSD-3-Clause license.
> In this blog, you will learn about how Valkey Search works and understand full-text, tag (exact-match), numeric range, and aggregation queries, and explore the key use cases they enable.
The opening reads like a feature spec sheet rather than a blog post. It lists capabilities without establishing why the reader should care or what problem this solves. The first sentence packs ~5 claims into 30+ words, and the next several paragraphs continue listing features without any narrative thread.
Compare to the vector search blog which opens with a clear value proposition and context, or the Valkey 9.0 blog which leads with the story of what changed and why.
Suggested rewrite:
Until now, Valkey Search focused on vector similarity. This enabled a wide range of workloads, such as semantic search and AI applications, but if you needed to filter products by price range, match a category tag exactly, or search descriptions by keyword, you had to build that yourself.
That changes with the new 1.2 release of valkey-search. You can now combine full-text search, exact-match tags, numeric ranges, and vector similarity in a single query, then analyze the results with server-side aggregations, all with the low latency you expect from Valkey.
In this post, we'll walk through what's new, show how it works with concrete examples, and explore the use cases these capabilities unlock.
This establishes: (1) what existed before (context), (2) what's new (the news), and (3) what you'll learn (reader expectation). The current opening skips straight to (2) without (1), which means readers who aren't already familiar with Valkey Search have no frame of reference.
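If the post adopts this framing, a short query-building sketch could show what "a single query" looks like. The Python below only assembles command arguments; it assumes Valkey Search accepts RediSearch-style query syntax (`@attr:term` for text, `@attr:{value}` for tags, `@attr:[low high]` for numeric ranges, and `(filter)=>[KNN k @vec $param]` for a filtered vector query) and a FLOAT32 vector attribute, all of which should be checked against the valkey-search docs. The index `products` and the attributes `description`, `category`, `price`, and `embedding` are hypothetical.

```python
import struct


def text(attr, term):
    # Full-text match on a TEXT attribute.
    return f"@{attr}:{term}"


def tag(attr, value):
    # Exact match on a TAG attribute.
    return f"@{attr}:{{{value}}}"


def numeric_range(attr, low, high):
    # Inclusive range on a NUMERIC attribute.
    return f"@{attr}:[{low} {high}]"


def all_of(*clauses):
    # Space-separated clauses are intersected (logical AND).
    return " ".join(clauses)


def knn(prefilter, k, vector_attr, param="vec"):
    # Assumed hybrid syntax: k nearest neighbours among documents matching prefilter.
    return f"({prefilter})=>[KNN {k} @{vector_attr} ${param}]"


def float32_blob(vector):
    # Assumes a FLOAT32 vector attribute: packed little-endian 4-byte floats.
    return struct.pack(f"<{len(vector)}f", *vector)


def ft_search_args(index, query, params=None):
    args = ["FT.SEARCH", index, query]
    if params:
        # PARAMS takes the count of name/value items, then the pairs themselves.
        args += ["PARAMS", str(2 * len(params))]
        for name, value in params.items():
            args += [name, value]
    return args
```

For example, `knn(all_of(text("description", "wireless"), tag("category", "headphones"), numeric_range("price", 50, 150)), 5, "embedding")` yields one query string that mixes all four filter types.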
> ## Real-time Search with Multi-threading
The current structure front-loads implementation details (threading, consistency, scaling) before showing the reader what they can actually do. Most readers want to see capabilities first, then understand how it works.
Current structure:
- Feature dump intro
- Real-time Search with Multi-threading (implementation detail)
- Consistency Your Application Needs (implementation detail)
- Scale to Terabytes... (implementation detail)
- Searching Your Valkey Data: 4 subsections (~60% of the blog)
- Aggregations
- Getting Started
Suggested restructure:
- Opening hook: what's new, why it matters, and what the gap was before
- What You Can Search: a brief overview of the 4 query types, condensed into one section
- Show Me: one end-to-end example (e.g., e-commerce product search) that naturally uses text, tag, numeric, and hybrid queries with actual `FT.SEARCH` commands
- Aggregations: with a concrete `FT.AGGREGATE` example
- Under the Hood: condense threading, consistency, and scaling into one shorter section. Link to docs for the deep dive.
- Getting Started: with actual commands and links
The "Searching Your Valkey Data" section is the biggest issue: it's essentially four mini-documentation pages stacked together, each following the same template (definition → capabilities → numbered use-case list). Consider condensing the four query-type descriptions into a single overview paragraph, then showing ONE compelling example that naturally combines them. Link to the Valkey Search docs for the full reference.
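For the suggested "Show Me" section, the example could start from an index definition and one document. The sketch below builds the argument lists for `FT.CREATE` and `HSET`; it assumes a hash-based index declared with the `ON HASH PREFIX ... SCHEMA` form and the `TEXT`, `TAG`, and `NUMERIC` attribute types named in the post. The index name, key prefix, and field values are hypothetical.

```python
def ft_create_args(index, prefix, schema):
    # Index every hash whose key starts with `prefix`, one SCHEMA entry per attribute.
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for attr, attr_type in schema.items():
        args += [attr, attr_type]
    return args


def hset_args(key, fields):
    # Hash field values are strings on the wire, so numbers go through str().
    args = ["HSET", key]
    for field, value in fields.items():
        args += [field, str(value)]
    return args


# Hypothetical e-commerce schema.
PRODUCT_SCHEMA = {
    "title": "TEXT",      # keyword search over product names
    "category": "TAG",    # exact-match filtering
    "price": "NUMERIC",   # range filtering and aggregation
}
```

Each list can be sent unchanged through a client's generic command method, for example `execute_command(*args)` in valkey-py.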
> In cluster mode, Valkey Search creates indexes that span multiple shards by maintaining a separate index on each shard for the keys owned by that shard. When you create, update, or drop an index on any primary, Valkey Search propagates that change to all nodes.
> You can scale read throughput to millions of QPS by distributing queries evenly across the cluster so no single node or shard becomes the bottleneck.
> You can increase throughput by using more CPUs, which leverages multithreading to scale throughput linearly for both querying and ingesting, or add replicas to increase query throughput.
How do we defend the millions of QPS? It feels like that is not true unless we quantify it more.
This sentence needs revision; I can't grok it after multiple reads.
There is a weird dependency between 'using more CPUs' and then 'leverages multithreading'.
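One way to make the cluster-mode paragraph concrete is a toy model of per-shard indexes. The Python below is not how Valkey Search is implemented; it assumes that a query is fanned out to every shard, that each shard returns its own best `k` matches, and that the receiving node merges them into a global top `k` (lower score is better, as with a vector distance). Because any document in the global top `k` must also be in its own shard's top `k`, merging the per-shard lists loses nothing.

```python
import heapq


def shard_top_k(shard, predicate, k):
    # shard: {key: (score, document)} for the keys this shard owns.
    hits = [(score, key) for key, (score, doc) in shard.items() if predicate(doc)]
    return heapq.nsmallest(k, hits)


def cluster_top_k(shards, predicate, k):
    # Scatter: every shard searches only its own keys and returns its local top k.
    partials = []
    for shard in shards:
        partials.extend(shard_top_k(shard, predicate, k))
    # Gather: the global top k is guaranteed to be among the partial results.
    return [key for score, key in heapq.nsmallest(k, partials)]
```

In this model, each node only searches the keys it owns, which is why adding shards spreads the per-query search work as well as the data.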
> ## Consistency Your Application Needs
It might be me, but I would prefer "Read-after-write consistency" or something similar. The title feels cute, but not very clear.
I totally agree. Every time I see the term 'consistency' unqualified, part of me dies.
> A write only completes after its index updates are applied, so any search sent to the same primary after the write returns will see that change.
> If your application can tolerate some staleness, replicas can be used to offload reads.
> On replicas, search is eventually consistent because replication and index maintenance are asynchronous, and each node maintains its own local indexes.
This section uses a lot of passive voice, consider revising.
- "replicas can be used to offload reads" -> "you can offload reads to replicas"
- "The index is only exposed after..." -> "Valkey Search exposes the index only after..."
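The primary/replica trade-off quoted above could be illustrated with a small routing helper. This is a hypothetical sketch: `SearchRouter` is not part of any client library, and it only assumes a client object with an `execute_command` method (valkey-py's `Valkey` client has one). Searches that must see the caller's own writes go to the primary; searches that can tolerate replication lag rotate across replicas.

```python
import itertools


class SearchRouter:
    """Hypothetical helper: read-after-write searches go to the primary,
    staleness-tolerant searches rotate across replicas."""

    def __init__(self, primary, replicas=()):
        self._primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def search(self, index, query, allow_stale=False):
        node = self._primary
        if allow_stale and self._replicas is not None:
            # Replica indexes are maintained asynchronously and may lag the primary.
            node = next(self._replicas)
        return node.execute_command("FT.SEARCH", index, query)
```

With no replicas configured, every search falls back to the primary, so callers can set `allow_stale=True` unconditionally and add replicas later.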
> ---
> title: Chaitanya Nuthalapati
> extra:
> photo: '/assets/media/authors/karthiksubbarao.jpg'
The photo path points to karthiksubbarao.jpg, not a photo for Chaitanya. This needs to be updated to the correct photo, or a placeholder should be used.
The same issue exists in bcathcart.md.
> You can then refine and shape the output with post-aggregation FILTER, SORTBY, and LIMIT, chaining stages together to build multi-step workflows in a single query.
> This makes aggregations a strong fit for lightweight analytics directly on indexed Valkey data, such as:
GROUPBY, REDUCE, COUNT, SUM, AVG, APPLY, FILTER, SORTBY, LIMIT are all-caps here. "Aggregations" is sometimes capitalized as a proper noun and sometimes not. Please pick one convention and be consistent throughout. I'd suggest lowercase "aggregations" when used as a general concept, and only capitalize when referring to a specific command or feature name.
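Since these stage keywords are command syntax, a short example might also settle the casing question. The sketch below assembles an `FT.AGGREGATE` call that groups matches by brand, reduces each group with `COUNT` and `AVG`, adds a computed attribute with `APPLY`, and then chains `FILTER`, `SORTBY`, and `LIMIT`. It assumes RediSearch-style `FT.AGGREGATE` syntax, including an explicit `LOAD` of the attributes used later; the index, attributes, and thresholds are hypothetical.

```python
def reducer(func, args, alias):
    # REDUCE <func> <nargs> <arg>... AS <alias>
    return ["REDUCE", func, str(len(args)), *args, "AS", alias]


def groupby(props, *reducers):
    # GROUPBY <nprops> <prop>... followed by one or more REDUCE clauses.
    stage = ["GROUPBY", str(len(props)), *props]
    for r in reducers:
        stage += r
    return stage


def ft_aggregate_args(index, query, load, *stages):
    args = ["FT.AGGREGATE", index, query]
    if load:
        args += ["LOAD", str(len(load)), *load]
    for stage in stages:
        args += stage  # stages are applied in the order given
    return args


# Brands with at least 5 matching shoes, highest average price first, top 3.
top_brands = ft_aggregate_args(
    "products",
    "@category:{shoes}",
    ["@brand", "@price"],
    groupby(["@brand"], reducer("COUNT", [], "n"), reducer("AVG", ["@price"], "avg_price")),
    ["APPLY", "@avg_price * 100", "AS", "avg_price_cents"],
    ["FILTER", "@n >= 5"],
    ["SORTBY", "2", "@avg_price", "DESC"],
    ["LIMIT", "0", "3"],
)
```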
Also, let's use https://hub.docker.com/r/valkey/valkey-bundle in the getting started section.
Signed-off-by: Karthik Subbarao <karthikrs2021@gmail.com>
stockholmux left a comment
A lot of work is needed here.
- I can't decide who this blog post is for. It covers a lot of technical ground but doesn't provide links or examples.
- It feels overly long for the information it actually contains.
- Generally, there is a lot of telling without showing.
> @@ -0,0 +1,113 @@
> +++
> title= "Search terabytes in microseconds"
I've had this argument before, but 'microseconds' feels like puffery.
Show, don't tell.
I did suggest this. I think it's a fun title, but only if we can back up the contents materially. I'm honestly not convinced we can search terabytes in microseconds.
> Valkey Search now enables searching across text, tags, and numeric attributes over terabytes of data with microsecond-level latency and up to millions of queries per second.
The repo consistently uses "valkey-search" (with a dash), but you're using "Valkey Search" (with a space).
> You can now search your Valkey data by combining full-text search, numeric, tag, and vector filters in a single query, then analyze results with server-side processing like grouping, counts, and averages.
'search your Valkey data' is a weird phrase. It sounds like you're searching data about Valkey, not data in Valkey.
> Valkey Search supports real-time updates, so new and updated data becomes searchable immediately. This keeps search results consistent and reduces the risk of acting on stale data.
'supports' is overloaded and used frequently in the blog post. Rephrase.
Don't use the term 'consistent'; it has a special meaning in databases. Please avoid it.
What does 'reduces the risk of acting on stale data' really mean?
> It includes built-in support for scaling to terabyte-scale clusters without requiring application or client code changes.
Why are you calling out terabytes here? Will it not work on gigabytes? Is there something special that Valkey Search does to enable this?
> ## Transform your Valkey Data with Aggregations
> Aggregations help you analyze and summarize the results of a search query, instead of returning a raw list of matching documents.
> You can use GROUPBY to form groups on any indexed attribute such as category, brand, region, and time, apply REDUCE functions such as COUNT, SUM, and AVG to compute per-group statistics, and use APPLY to create computed attributes on the fly.
Are GROUPBY, REDUCE, etc. reserved words? Should they be in inline code blocks?
For that matter, why aren't they linked to docs?
> 1. Faceted navigation and filtering: Power dynamic filtering UIs using aggregations to compute real-time counts over the current result set (for example by category, brand, price band, rating, or availability), enabling users to narrow down search results with instant feedback on available options.
Could you provide a more concrete example?
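A concrete faceting example could pair one `FT.AGGREGATE` per facet with a small reply parser. This sketch assumes RediSearch-style syntax and a RESP2 reply shaped as a leading count followed by one flat `[name, value, ...]` list per group, with the grouped attribute reported without its `@`; both assumptions should be checked against valkey-search. The facet names and query are hypothetical.

```python
def facet_args(index, query, facet):
    # Count matching documents per value of `facet`, largest bucket first.
    return [
        "FT.AGGREGATE", index, query,
        "LOAD", "1", f"@{facet}",
        "GROUPBY", "1", f"@{facet}",
        "REDUCE", "COUNT", "0", "AS", "count",
        "SORTBY", "2", "@count", "DESC",
    ]


def _text(value):
    return value.decode() if isinstance(value, bytes) else str(value)


def parse_facet_reply(reply, facet):
    # Skip the leading count; each remaining row is a flat name/value list.
    counts = {}
    for row in reply[1:]:
        fields = {_text(k): _text(v) for k, v in zip(row[0::2], row[1::2])}
        counts[fields[facet]] = int(fields["count"])
    return counts  # dicts keep insertion order, so the server's SORTBY order survives
```

A faceted UI would issue one such call per facet (for example brand or category) against the user's current query, then render each dictionary as a list of filter options with counts.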
> The easiest way to get started is by visiting the Valkey Search GitHub repository. Clone the repo, load Valkey Search into Valkey 9.0 or later, and start building high-performance search and aggregation workflows.
Where would I find out how to do this?
> You can connect using official Valkey client libraries such as Valkey GLIDE (Java, Python, Node.js, Go), valkey-py, valkey-go, and valkey-java (Java) and popular Redis-compatible clients.
> Get Involved: Join the valkey-search community, file issues, open pull requests, or suggest improvements.
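The client-library sentence quoted above could be followed by a short runnable snippet. The helper below sends `FT.SEARCH` through any client with an `execute_command` method and turns the reply into dictionaries. It assumes RESP2 replies shaped as `[total, key, [field, value, ...], key, ...]` with no client-side reply parsing; `search_documents` is a hypothetical name.

```python
def _text(value):
    return value.decode() if isinstance(value, bytes) else str(value)


def search_documents(client, index, query, limit=10):
    """Run FT.SEARCH and return (total, {key: {field: value}})."""
    reply = client.execute_command("FT.SEARCH", index, query, "LIMIT", "0", str(limit))
    total, rest = reply[0], reply[1:]
    docs = {}
    # `rest` alternates: key, [field, value, field, value, ...], key, ...
    for key, fields in zip(rest[0::2], rest[1::2]):
        docs[_text(key)] = {_text(f): _text(v) for f, v in zip(fields[0::2], fields[1::2])}
    return total, docs
```

Against a running server, `client` could be `valkey.Valkey(host="localhost", port=6379)` from valkey-py; the host, port, index, and query are placeholders.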
…ypes and aggregations, address PR feedback
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Thank you @stockholmux and @madolson. We are incorporating your feedback. See the current changes here:
…, add MULTI/EXEC, reword vCPU scaling
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
…ology, grammar cleanup, intro bridge sentence
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Signed-off-by: Chaitanya Nuthalapati <cnu@amazon.com>
Signed-off-by: Karthik Subbarao <karthikrs2021@gmail.com>
Signed-off-by: Karthik Subbarao <karthikrs2021@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
stockholmux left a comment
Much improved. Mostly minor changes.
> Get Involved: Join the valkey-search community, file issues, open pull requests, or suggest improvements. We welcome contributions of all kinds - code, documentation, testing, and feedback. Your involvement helps make valkey-search better for everyone.
> **Note:** As of March 13, 2026, if you want to use Valkey Search 1.2 features on docker, use the current valkey/valkey-bundle:unstable image.
When does this go away? I think you should hint that this is temporary and say until when.
This will go away when 9.1 launches.
… bold search type labels, add links
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: chaitanya_nuthalapati <cnu@amazon.com>
Description
This PR adds the blog content for the upcoming valkey-search 1.2 release.
In the blog, we cover the search and aggregation capabilities added since 1.0 and dive into use cases where different search operators are helpful.
The content was put together by the authors listed in the PR.
Issues Resolved
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.