diff --git a/docs/concepts/metric_types.md b/docs/concepts/metric_types.md
index 3b37d7f07..3b9e263b5 100644
--- a/docs/concepts/metric_types.md
+++ b/docs/concepts/metric_types.md
@@ -3,11 +3,17 @@ title: Metric types
 sort_rank: 2
 ---
 
-The Prometheus client libraries offer four core metric types. These are
-currently only differentiated in the client libraries (to enable APIs tailored
-to the usage of the specific types) and in the wire protocol. The Prometheus
-server does not yet make use of the type information and flattens all data into
-untyped time series. This may change in the future.
+The Prometheus instrumentation libraries offer four core metric types. With the
+exception of native histograms, these are currently only differentiated in the
+instrumentation libraries (to enable APIs tailored to the usage of the specific
+types) and in the exposition protocols. The Prometheus server does not yet make
+use of the type information and flattens all types except native histograms
+into untyped time series of floating point values. Native histograms, however,
+are ingested as time series of special composite histogram samples. In the
+future, Prometheus might handle other metric types as [composite
+types](/blog/2026/02/14/modernizing-prometheus-composite-samples/), too. There
+is also ongoing work to persist the type information of the simple float
+samples.
 
 ## Counter
 
@@ -20,7 +26,7 @@ errors.
 Do not use a counter to expose a value that can decrease. For example, do not
 use a counter for the number of currently running processes; instead use a gauge.
 
-Client library usage documentation for counters:
+Instrumentation library usage documentation for counters:
 
    * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Counter)
    * [Java](https://prometheus.github.io/client_java/getting-started/metric-types/#counter)
@@ -38,7 +44,7 @@ Gauges are typically used for measured values like temperatures or current
 memory usage, but also "counts" that can go up and down, like the number of
 concurrent requests.
 
-Client library usage documentation for gauges:
+Instrumentation library usage documentation for gauges:
 
    * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Gauge)
    * [Java](https://prometheus.github.io/client_java/getting-started/metric-types/#gauge)
@@ -51,37 +57,78 @@ Client library usage documentation for gauges:
 
 A _histogram_ samples observations (usually things like request durations or
 response sizes) and counts them in configurable buckets. It also provides a sum
-of all observed values.
-
-A histogram with a base metric name of `<basename>` exposes multiple time series
-during a scrape:
-
-  * cumulative counters for the observation buckets, exposed as `<basename>_bucket{le="<upper inclusive bound>"}`
+of all observed values. As such, a histogram is essentially a bucketed counter.
+However, a histogram can also represent the current state of a distribution, in
+which case it is called a _gauge histogram_. In contrast to the usual
+counter-like histograms, gauge histograms are rarely directly exposed by
+instrumented programs and are thus not (yet) usable in instrumentation
+libraries, but they are represented in newer versions of the protobuf
+exposition format and in [OpenMetrics](https://openmetrics.io/). They are also
+created regularly by PromQL expressions. For example, the outcome of applying
+the `rate` function to a counter histogram is a gauge histogram, in the same
+way as the outcome of applying the `rate` function to a counter is a gauge.
+
+Histograms exist in two fundamentally different versions: The more recent
+_native histograms_ and the older _classic histograms_.
+
+A native histogram is exposed and ingested as composite samples, where each
+sample represents the count and sum of observations together with a dynamic set
+of buckets.
+
+A classic histogram, however, consists of multiple time series of simple float
+samples. A classic histogram with a base metric name of `<basename>` results in
+the following time series:
+
+  * cumulative counters for the observation buckets, exposed as
+    `<basename>_bucket{le="<upper inclusive bound>"}`
   * the **total sum** of all observed values, exposed as `<basename>_sum`
-  * the **count** of events that have been observed, exposed as `<basename>_count` (identical to `<basename>_bucket{le="+Inf"}` above)
-
-Use the
-[`histogram_quantile()` function](/docs/prometheus/latest/querying/functions/#histogram_quantile)
-to calculate quantiles from histograms or even aggregations of histograms. A
-histogram is also suitable to calculate an
-[Apdex score](http://en.wikipedia.org/wiki/Apdex). When operating on buckets,
-remember that the histogram is
-[cumulative](https://en.wikipedia.org/wiki/Histogram#Cumulative_histogram). See
-[histograms and summaries](/docs/practices/histograms) for details of histogram
-usage and differences to [summaries](#summary).
-
-NOTE: Beginning with Prometheus v2.40, there is experimental support for native
-histograms. A native histogram requires only one time series, which includes a
-dynamic number of buckets in addition to the sum and count of
-observations. Native histograms allow much higher resolution at a fraction of
-the cost. Detailed documentation will follow once native histograms are closer
-to becoming a stable feature.
+  * the **count** of events that have been observed, exposed as
+    `<basename>_count` (identical to `<basename>_bucket{le="+Inf"}` above)
+
+Native histograms are generally much more efficient than classic histograms,
+allow much higher resolution, and do not require explicit configuration of
+bucket boundaries during instrumentation. Their bucketing schema ensures that
+they are always aggregatable with each other, even if the resolution might have
+changed, while classic histograms with different bucket boundaries are not
+generally aggregatable. If the instrumentation library you are using supports
+native histograms (currently this is the case for Go and Java), you should
+probably prefer native histograms over classic histograms.
+
+If you are stuck with classic histograms for whatever reason, there is a way to
+get at least some of the benefits of native histograms: You can configure
+Prometheus to ingest classic histograms into a special form of native
+histograms, called Native Histograms with Custom Bucket boundaries (NHCB).
+NHCBs are stored as the same composite samples as usual native histograms with
+the same gain in efficiency. However, their buckets are still the same buckets
+statically configured during instrumentation, with their limited resolution and
+range and the same problems of aggregatability upon changing the bucket
+boundaries.
+
+Use the [`histogram_quantile()`
+function](/docs/prometheus/latest/querying/functions/#histogram_quantile) to
+calculate quantiles from histograms or even aggregations of histograms. It
+works for both classic and native histograms, using a slightly different
+syntax. Histograms are also suitable to calculate an [Apdex
+score](http://en.wikipedia.org/wiki/Apdex).
+
+You can operate directly on the buckets of a classic histogram, as they are
+represented as individual series (called `<basename>_bucket{le="<upper
+inclusive bound>"}` as described above). Remember, however, that these buckets
+are [cumulative](https://en.wikipedia.org/wiki/Histogram#Cumulative_histogram),
+i.e. every bucket counts all observations less than or equal to the upper
+boundary provided as a label. With native histograms, use the
+[`histogram_fraction()`
+function](/docs/prometheus/latest/querying/functions/#histogram_fraction) to
+calculate fractions of observations within given boundaries.
+
+See [histograms and summaries](/docs/practices/histograms) for details of
+histogram usage and differences to [summaries](#summary).
 
 NOTE: Beginning with Prometheus v3.0, the values of the `le` label of classic
 histograms are normalized during ingestion to follow the format of
 [OpenMetrics Canonical Numbers](https://github.com/prometheus/OpenMetrics/blob/main/specification/OpenMetrics.md#considerations-canonical-numbers).
 
-Client library usage documentation for histograms:
+Instrumentation library usage documentation for histograms:
 
    * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Histogram)
    * [Java](https://prometheus.github.io/client_java/getting-started/metric-types/#histogram)
@@ -111,7 +158,7 @@ to [histograms](#histogram).
 NOTE: Beginning with Prometheus v3.0, the values of the `quantile` label are normalized during
 ingestion to follow the format of [OpenMetrics Canonical Numbers](https://github.com/prometheus/OpenMetrics/blob/main/specification/OpenMetrics.md#considerations-canonical-numbers).
 
-Client library usage documentation for summaries:
+Instrumentation library usage documentation for summaries:
 
    * [Go](http://godoc.org/github.com/prometheus/client_golang/prometheus#Summary)
    * [Java](https://prometheus.github.io/client_java/getting-started/metric-types/#summary)
diff --git a/docs/practices/histograms.md b/docs/practices/histograms.md
index fb8c242fa..fc9ca118d 100644
--- a/docs/practices/histograms.md
+++ b/docs/practices/histograms.md
@@ -3,136 +3,388 @@ title: Histograms and summaries
 sort_rank: 4
 ---
 
-NOTE: This document predates native histograms (added as an experimental
-feature in Prometheus v2.40 and becoming stable in v3.8). The intention is to
-thoroughly update this document in the foreseeable future.
-
-Histograms and summaries are more complex metric types. Not only does
-a single histogram or summary create a multitude of time series, it is
-also more difficult to use these metric types correctly. This section
-helps you to pick and configure the appropriate metric type for your
-use case.
-
-## Library support
+Histograms and summaries are more complex metric types. For historical reasons,
+histograms exist in two variants: classic histograms and native histograms. The
+latter even come in a number of sub-variants. This document helps to understand
+the difference between all those metric types, how to use them correctly, and
+how to pick the right metric type for your use case.
+
+The most important lesson to learn from this document is simple: If you can,
+use native histograms and prefer them over both classic histograms and
+summaries.
+
+Where things start to become tricky is if you find yourself in a situation
+where you cannot simply use native histograms. Most commonly, you might have to
+work with existing metrics that include classic histograms or summaries, or
+maybe the instrumentation library you are using does not support native
+histograms yet. Furthermore, there are a few specific use cases where you might
+prefer a summary or a classic histogram.
+
+With this document, you should be able to navigate the related obstacles and
+subtleties.
+
+## Overview
+
+Historically, a sample in the Prometheus world was just a timestamped floating
+point value. This value could be interpreted as a
+[counter](/docs/concepts/metric_types/#counter) or as a
+[gauge](/docs/concepts/metric_types/#gauge), i.e. most of the time Prometheus
+doesn't maintain a notion of “static typing”, and you just have to know what
+kind of metric you are dealing with (assisted by the convention that the name
+of a counter should end in `_total`).
+
+But there are more metric types than counters and gauges. In particular, there
+is a need to represent distributions of observed values (usually simply called
+“observations” in Prometheus terminology). There are fundamentally two
+different approaches:
+
+1. The instrumented program calculates a number of pre-configured quantiles
+   (e.g. the median or the 90th percentile) over pre-configured time windows
+   (e.g. the last ten minutes) and exposes them as additional metrics.
+   Prometheus implements this approach in the form of a metric type called
+   _summary_. Depending on the algorithm used, the pre-calculated quantiles are
+   usually very accurate. But the calculation has a resource cost for the
+   instrumented program. Also, you cannot “recalculate” the quantiles later if
+   you desire another time window or another percentile, and most importantly,
+   you cannot aggregate quantiles (e.g. to calculate the total 90th percentile
+   latency for a service backed by multiple replicated workers).
+2. The instrumented program represents the distribution in a more fundamental
+   way that can later be used to calculate arbitrary quantiles over arbitrary
+   time windows. Most importantly, distributions are represented in a way that
+   allows them to be aggregated with each other. This kind of representation is
+   sometimes called a _digest_. Prometheus implements this approach in the form
+   of a metric type called _histogram_, where observations are counted in
+   buckets, as you might know it from the general concept of a
+   [histogram](https://en.wikipedia.org/wiki/Histogram).
+
+In both approaches, Prometheus also collects the count and the sum of
+observations (see details [below](#count-and-sum-of-observations)).
+
+Common to both approaches is the need to collect a whole lot of numerical
+values per sample, not just a single floating point value as before:
+
+- In any case the count and sum of observations.
+- In the case of summaries the pre-calculated quantiles.
+- In the case of histograms a set of buckets with their population counts and
+  boundaries.
+
+The new types of metrics are also called _composite types_.
+
+In a first approach, Prometheus preserved its data model of simple timestamped
+floating point values and mapped this multitude of values into one time series
+each, distinguished by specific labels. In this way, summaries and classic
+histograms were created. In both, the count and sum of observations are each
+tracked in a separate time series. Similarly, each pre-calculated quantile of a
+summary and each bucket of a histogram is tracked in its own time series.
+PromQL operators and functions act on these individual time series, as
+explained in detail further below.
+
+On the one hand, this approach has worked quite well. While keeping the data
+model simple, it satisfies many use cases. On the other hand, it suffers from
+many limitations, especially when it comes to histograms. Thus, much later in
+Prometheus's lifetime, native histograms were introduced. A native histogram
+sample is a “composition of values”, where a single sample contains the count
+and sum of observations and a dynamic number of buckets with their population
+count and boundaries. In the Prometheus TSDB, one histogram results in one time
+series of native histogram samples rather than a bunch of independent time
+series. PromQL operators and functions now have to act on these composite
+samples rather than on the individual time series of floats before.
+
+You can read everything about native histograms in their
+[specification](/docs/specs/native_histograms/), but be warned that this is a
+very technical and detailed document. If you read on here, you can expect a
+more digestible and usage-focused explanation.
+
+If you are interested in Prometheus's journey towards native representation of
+composite types, you can read more in a [blog
+post](/blog/2026/02/14/modernizing-prometheus-composite-samples/).
+
+## Instrumentation library support
 
 First of all, check the library support for
 [histograms](/docs/concepts/metric_types/#histogram) and
 [summaries](/docs/concepts/metric_types/#summary).
 
-Some libraries support only one of the two types, or they support summaries
-only in a limited fashion (lacking [quantile calculation](#quantiles)).
+Summaries are usually supported by all libraries, but some might only track the
+count and sum of observations and omit the [quantile calculation](#quantiles).
+(Quantile-less summaries are still a legitimate use of summaries; see below.)
+
+Classic histogram support is also widespread, but native histogram support is
+still rare. Currently, the latter requires exposition via the protobuf format,
+limiting the support to protobuf-enabled libraries, like the Java and the Go
+library. Support in a text-based format is underway as part of OpenMetrics v2.
+Things should be moving very soon, so definitely check what your library has to
+offer.
+
+Even if your instrumented program only exposes classic histograms, you can
+configure Prometheus to ingest them as native histograms anyway. This will
+happen in the form of _Native Histograms with Custom Bucket boundaries_ (NHCB).
+These NHCBs have some limitations compared to the usual native histograms
+(which feature so-called standard exponential buckets), but they are still much
+more efficient to store than pure classic histograms. NHCB handling in PromQL
+is the same as for other native histograms, so a later migration to “real”
+native histograms will be easy.
+
+## Ingestion via OpenTelemetry
+
+Maybe you aren't even using a Prometheus instrumentation library, but your
+metrics come from a collector adhering to the OpenTelemetry (OTel) standard.
+When ingesting OTel metrics into a Prometheus-compatible backend, the “normal”
+OTel histograms can be converted into classic histograms or NHCBs on the
+Prometheus side (hint: prefer the latter), while OTel's _exponential histograms_
+are always converted into the usual native histograms (with standard
+exponential buckets).
 
 ## Count and sum of observations
 
-Histograms and summaries both sample observations, typically request
-durations or response sizes. They track the number of observations
-*and* the sum of the observed values, allowing you to calculate the
-*average* of the observed values. Note that the number of observations
-(showing up in Prometheus as a time series with a `_count` suffix) is
-inherently a counter (as described above, it only goes up). The sum of
-observations (showing up as a time series with a `_sum` suffix)
-behaves like a counter, too, as long as there are no negative
-observations. Obviously, request durations or response sizes are
-never negative. In principle, however, you can use summaries and
-histograms to observe negative values (e.g. temperatures in
-centigrade). In that case, the sum of observations can go down, so you
-cannot apply `rate()` to it anymore. In those rare cases where you need to
-apply `rate()` and cannot avoid negative observations, you can use two
-separate summaries, one for positive and one for negative observations
-(the latter with inverted sign), and combine the results later with suitable
-PromQL expressions.
-
-To calculate the average request duration during the last 5 minutes
-from a histogram or summary called `http_request_duration_seconds`,
-use the following expression:
+Histograms and summaries both sample observations, typically request durations
+or response sizes. In all variants (even quantile-less summaries), they track
+the number of observations *and* the sum of the observed values, allowing you
+to calculate the *average* of the observed values.
+
+To do so, you generally first take a `rate` over the desired duration and then
+divide the “rate of the sum” by the “rate of the count”.
+
+For a native histogram (including an NHCB), you extract the sum and count of
+observations with the functions `histogram_sum` and `histogram_count`,
+respectively. For example, to calculate the average request duration over the
+last 5m from a native histogram called `http_request_duration_seconds`, use the
+following PromQL expression:
+
+      histogram_sum(rate(http_request_duration_seconds[5m]))
+    /
+      histogram_count(rate(http_request_duration_seconds[5m]))
+
+In the case of a summary or a classic histogram, you have separate time series
+for the sum and count of observations, marked by the magic suffixes `_sum` and
+`_count`, respectively. Thus, a summary or classic histogram called
+`http_request_duration_seconds` will result in the series
+`http_request_duration_seconds_sum` and `http_request_duration_seconds_count`,
+and the expression to calculate the average request duration over the last 5m
+will look like this:
 
       rate(http_request_duration_seconds_sum[5m])
     /
       rate(http_request_duration_seconds_count[5m])
 
-## Apdex score
-
-A straight-forward use of histograms (but not summaries) is to count
-observations falling into particular buckets of observation
-values.
+The denominator in both expressions above is also useful on its own. It
+represents the requests per second served over the last 5m. Another way of
+putting it is that the `http_request_duration_seconds_count` series behaves
+exactly like a counter for the HTTP requests (which you would call
+`http_requests_total` if you did not already have the histogram or summary to
+replace it). The key property of a counter in Prometheus is that it always goes
+up, unless there is a counter reset.
+
+If your observations are never negative, the
+`http_request_duration_seconds_sum` series also always goes up (unless there is
+a counter reset). However, if negative observations are in the mix, the sum of
+observations may also go down, breaking assumptions made by PromQL. Such a drop
+would erroneously be considered a counter reset in the
+`rate(http_request_duration_seconds_sum[5m])` calculation above, throwing off
+the result. Note that this problem only affects summaries and classic
+histograms. Native histograms (including NHCBs) are `rate`'d as a whole,
+thereby detecting counter resets correctly. In the rare cases where you cannot
+avoid negative observations and are stuck with summaries or classic histograms,
+you can use two separate summaries or histograms, one for positive and one for
+negative observations (the latter with inverted sign), and combine the results
+later with suitable PromQL expressions.
+
+Both the sum and count of observations are additive, so you can easily
+aggregate – after the rate, but before the division, and no matter what the
+underlying metric type is. These are the expressions to calculate the average
+request duration for each `job`:
+
+Native histograms:
+
+      sum by (job) (histogram_sum(rate(http_request_duration_seconds[5m])))
+    /
+      sum by (job) (histogram_count(rate(http_request_duration_seconds[5m])))
 
-You might have an SLO to serve 95% of requests within 300ms. In that
-case, configure a histogram to have a bucket with an upper limit of
-0.3 seconds. You can then directly express the relative amount of
-requests served within 300ms and easily alert if the value drops below
-0.95. The following expression calculates it by job for the requests
-served in the last 5 minutes. The request durations were collected with
-a histogram called `http_request_duration_seconds`.
+Summaries or classic histograms:
 
-      sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
+      sum by (job) (rate(http_request_duration_seconds_sum[5m]))
     /
-      sum(rate(http_request_duration_seconds_count[5m])) by (job)
+      sum by (job) (rate(http_request_duration_seconds_count[5m]))
+
+## Bucketing
+
+Histograms are essentially bucketed counters, so the most obvious use case that
+separates histograms from summaries is to count observations falling into
+particular buckets of observation values.
+
+If you instrument code with classic histograms, you will configure fixed bucket
+boundaries. If you let Prometheus ingest these classic histograms in the
+classic way, each bucket configured in that way will create a series suffixed
+with `_bucket`, no matter if the bucket is populated or not. More buckets give
+you more options and accuracy in the various queries (see below), but the “one
+series per bucket” cost is quite significant.
+
+If you ingest the classic histograms as NHCBs, unpopulated buckets have a
+negligible cost, and even populated ones are handled in a more efficient way
+because each NHCB is represented by a single series of composite samples
+(rather than by a separate series of floats for each bucket and the sum and
+count of observations).
+
+However, picking the right buckets in advance can be challenging. And changing
+buckets later creates a lot of disruption (as you will see below). If you
+instrument code directly with native histograms, you do not pick any bucket
+boundaries explicitly, but you configure a desired resolution. Buckets are
+created dynamically following an exponential bucketing schema, covering the
+whole range of floating point numbers from -Inf to +Inf. Higher resolution
+causes higher resource usage, but generally you can reach much higher resolution
+than with classic histograms for the same resource cost. Instrumentation
+libraries also offer various strategies to limit the count of populated
+buckets, like occasional resets of the histogram or adaptive resolution
+reduction. See the documentation of the instrumentation library you are using
+for details.
+
+To query the fraction of observations falling into a certain range based on a
+native histogram, use an expression like the following:
+
+    histogram_fraction(0, 0.3, sum by (job) (rate(http_request_duration_seconds[5m])))
+
+This calculates the fraction of HTTP requests for each `job` that lasted
+between 0ms and 300ms in the last 5m. (300ms are represented here as `0.3`
+seconds as you should always use base units in Prometheus.) Note how the `sum`
+correctly aggregates by summing up the corresponding buckets in the involved
+histograms. If the histograms have different bucket layouts, they are
+reconciled first. With the usual exponential bucketing schema, this works
+smoothly, essentially by falling back to the lowest common resolution among all
+involved histograms. The same is done to reconcile different bucket layouts
+over time (in the 5m range that is used in the `rate` calculation). With NHCBs,
+the effects of this depend heavily on the details of the different bucket
+layouts. It is quite possible that the reconciled aggregated histogram has just
+one bucket left, containing all observations. Because of the potentially severe
+effects, the query result gets an info-level annotation if NHCBs needed to be
+reconciled. This is also one of the reasons why native histograms with the
+dynamic exponential buckets are much easier to handle.
+
+The calculated fraction is accurate if there happens to be a bucket boundary
+precisely at 0.3. In the common case that there is not, interpolation is used
+to return an estimated fraction. This estimation is more accurate with higher
+bucket resolutions. If you already know in advance that, for example, you have
+an SLO to serve 95% of requests within 300ms, you could use the fixed bucket
+boundaries of a classic histogram to allow an accurate calculation. However, if
+your SLO changes later, changing the fixed bucket layout accordingly will be
+quite tedious. (You have to change the instrumentation of your code. And you
+will run into the issues reconciling different bucket layouts as described
+above.) If you pick native histograms with the dynamic exponential buckets, you
+won't get a bucket boundary at exactly 0.3, but with a decent resolution, the
+interpolated estimate will still be quite accurate. In return, you gain the
+freedom of changing the range boundaries at will, which is not only helpful if
+your SLO changes, but also to explore questions like “Could we maintain a
+stricter SLO based on the data of the last quarter?”.
+
+In the pure legacy case of classic histograms that were also ingested as
+classic histograms, the corresponding PromQL expression looks quite different:
+
+      sum by (job) (rate(http_request_duration_seconds_bucket{le="0.3"}[5m]))
+    /
+      sum by (job) (rate(http_request_duration_seconds_count[5m]))
 
+The `le` label name stands for “less or equal”. This label's value is the upper
+inclusive boundary of a cumulative bucket (i.e. this bucket contains all
+observations less than or equal to 0.3 – including negative observations, which
+we assume wouldn't happen in the case of observing request durations).
 
-You can approximate the well-known [Apdex
-score](http://en.wikipedia.org/wiki/Apdex) in a similar way. Configure
-a bucket with the target request duration as the upper bound and
-another bucket with the tolerated request duration (usually 4 times
-the target request duration) as the upper bound. Example: The target
-request duration is 300ms. The tolerable request duration is 1.2s. The
-following expression yields the Apdex score for each job over the last
-5 minutes:
+Note that this expression strictly requires a bucket boundary configured at
+0.3. If the histograms involved do not have a bucket with that boundary, no
+interpolation is applied. Instead of an estimation, no result is returned at
+all. If only some of the involved histograms have such a bucket, an incomplete
+result is returned, but without any warning, which is a pretty bad situation to
+be in. (Hint: Avoid this “purely classic” case. If you can, ingest classic
+histograms as NHCB. Or instrument with native histograms in the first place.)
 
-    (
-      sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
-    +
-      sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) by (job)
-    ) / 2 / sum(rate(http_request_duration_seconds_count[5m])) by (job)
+## Apdex score
 
-Note that we divide the sum of both buckets. The reason is that the histogram
-buckets are
-[cumulative](https://en.wikipedia.org/wiki/Histogram#Cumulative_histogram). The
-`le="0.3"` bucket is also contained in the `le="1.2"` bucket; dividing it by 2
-corrects for that.
+When reading about fractions of requests served within a certain duration
+range, you might remember the [Apdex
+score](http://en.wikipedia.org/wiki/Apdex). For this score, you set a target
+request duration and a tolerated request duration (usually 4 times the target
+request duration). Let's say your target request duration is 300ms and the
+tolerable request duration is 1.2s. If you want to calculate the Apdex score by
+`job` over the last 5m, the PromQL expression for native histograms (including
+NHCB) is straightforward. Simply add the fraction of requests within your
+duration target to half of the fraction of requests with a duration between the
+target and the tolerated duration:
+
+      histogram_fraction(0, 0.3, sum by (job) (rate(http_request_duration_seconds[5m])))
+    +
+      histogram_fraction(0.3, 1.2, sum by (job) (rate(http_request_duration_seconds[5m]))) / 2
+
+In the “pure classic” case, you _must_ have buckets present at the exact
+boundaries (giving you an accurate calculation in return). The corresponding
+PromQL expression looks quite different because the classic buckets are
+cumulative:
+
+        (
+            sum by (job) (rate(http_request_duration_seconds_bucket{le="0.3"}[5m]))
+          +
+            sum by (job) (rate(http_request_duration_seconds_bucket{le="1.2"}[5m]))
+        )
+      /
+        2
+    /
+      sum by (job) (rate(http_request_duration_seconds_count[5m]))
 
-The calculation does not exactly match the traditional Apdex score, as it
-includes errors in the satisfied and tolerable parts of the calculation.
+(For the sake of simplicity, the above expressions do not explicitly exclude
+failed requests from the satisfied and tolerated parts of the calculation, as
+would be required for a strictly correct Apdex calculation.)
 
 ## Quantiles
 
 You can use both summaries and histograms to calculate so-called φ-quantiles,
 where 0 ≤ φ ≤ 1. The φ-quantile is the observation value that ranks at number
 φ*N among the N observations. Examples for φ-quantiles: The 0.5-quantile is
-known as the median. The 0.95-quantile is the 95th percentile.
+known as the median. The 0.95-quantile is also called the 95th percentile.
 
 The essential difference between summaries and histograms is that summaries
-calculate streaming φ-quantiles on the client side and expose them directly,
-while histograms expose bucketed observation counts and the calculation of
-quantiles from the buckets of a histogram happens on the server side using the
-[`histogram_quantile()`
+calculate streaming φ-quantiles within the instrumented program and expose them
+directly, while histograms expose bucketed observation counts and the
+calculation of quantiles from the buckets of a histogram happens on the
+Prometheus server using the [`histogram_quantile()`
 function](/docs/prometheus/latest/querying/functions/#histogram_quantile).
-
-The two approaches have a number of different implications:
-
-|   | Histogram | Summary
-|---|-----------|---------
-| Required configuration | Pick buckets suitable for the expected range of observed values. | Pick desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later.
-| Client performance | Observations are very cheap as they only need to increment counters. | Observations are expensive due to the streaming quantile calculation.
-| Server performance | The server has to calculate quantiles. You can use [recording rules](/docs/prometheus/latest/configuration/recording_rules/#recording-rules) should the ad-hoc calculation take too long (e.g. in a large dashboard). | Low server-side cost.
-| Number of time series (in addition to the `_sum` and `_count` series) | One time series per configured bucket. | One time series per configured quantile.
-| Quantile error (see below for details) | Error is limited in the dimension of observed values by the width of the relevant bucket. | Error is limited in the dimension of φ by a configurable value.
-| Specification of φ-quantile and sliding time-window | Ad-hoc with [Prometheus expressions](/docs/prometheus/latest/querying/functions/#histogram_quantile). | Preconfigured by the client.
-| Aggregation | Ad-hoc with [Prometheus expressions](/docs/prometheus/latest/querying/functions/#histogram_quantile). | In general [not aggregatable](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-cant-average.html).
-
-Note the importance of the last item in the table. Let us return to
-the SLO of serving 95% of requests within 300ms. This time, you do not
-want to display the percentage of requests served within 300ms, but
-instead the 95th percentile, i.e. the request duration within which
-you have served 95% of requests. To do that, you can either configure
-a summary with a 0.95-quantile and (for example) a 5-minute decay
-time, or you configure a histogram with a few buckets around the 300ms
-mark, e.g. `{le="0.1"}`, `{le="0.2"}`, `{le="0.3"}`, and
-`{le="0.45"}`. If your service runs replicated with a number of
-instances, you will collect request durations from every single one of
-them, and then you want to aggregate everything into an overall 95th
-percentile. However, aggregating the precomputed quantiles from a
-summary rarely makes sense. In this particular case, averaging the
-quantiles yields statistically nonsensical values.
+Histograms are further divided into native and classic histograms. The
+following table lists some implications of the different approaches.
+
+|   | Native Histogram | Classic Histogram | Summary
+|---|------------------|-------------------|---------
+| Required configuration during instrumentation | Pick a desired resolution and maybe a strategy to limit the bucket count. | Pick buckets suitable for the expected range of observed values and the desired queries. | Pick desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later.
+| Instrumentation cost | Observations are cheap as they only need to increment counters. | Observations are cheap as they only need to increment counters. | Observations are relatively expensive due to the streaming quantile calculation.
+| Query performance | The server has to calculate quantiles from complex histogram samples. You can use [recording rules](/docs/prometheus/latest/configuration/recording_rules/#recording-rules) should the ad-hoc calculation take too long (e.g. in a large dashboard). | The server has to calculate quantiles from a large number of bucket series. You can use [recording rules](/docs/prometheus/latest/configuration/recording_rules/#recording-rules) should the ad-hoc calculation take too long (e.g. in a large dashboard). | Fast (no quantile calculations on the server, and aggregations are impossible anyway, see below).
+| Number of time series per histogram/summary | One (with a composite sample type). | `_sum`, `_count`, one per configured bucket. | `_sum`, `_count`, one per configured quantile.
+| Quantile error (see below for details) | Limited by the configured resolution. | Error is limited by the width of the bucket the quantile is located in. | Configurable, generally very low.
+| Specification of φ-quantile and sliding time-window | Ad-hoc with [PromQL expression](/docs/prometheus/latest/querying/functions/#histogram_quantile). | Ad-hoc with [PromQL expression](/docs/prometheus/latest/querying/functions/#histogram_quantile). | Preconfigured during instrumentation.
+| Aggregation | Ad-hoc with [PromQL expression](/docs/prometheus/latest/querying/functions/#histogram_quantile), buckets are always compatible. | Ad-hoc with [PromQL expression](/docs/prometheus/latest/querying/functions/#histogram_quantile), provided there are no changes in bucket boundaries. | [Not aggregatable](http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-you-cant-average.html).
+
+As mentioned above, classic histograms can be ingested by the Prometheus server
+as a special form of native histograms, called NHCBs (Native Histograms with
+Custom Bucket boundaries). Therefore, they share some implications with classic
+histograms and some with the usual native histograms. On the instrumentation
+side, they behave exactly like classic histograms. (In fact, they are identical
+to classic histograms, as NHCBs are only created on the server side when a
+classic histogram is ingested as an NHCB.) The query performance and number of
+time series are the same as for the usual native histograms, but the quantile
+error is the same as with a corresponding classic histogram. NHCBs treat a
+change of the bucket layout a bit more gracefully than classic histograms, but
+it is still a problematic situation (which is at least flagged as such by an
+annotation).
+
+Note the importance of the last item in the table. Let us return to the SLO of
+serving 95% of requests within 300ms. This time, you do not want to display the
+percentage of requests served within 300ms, but instead the 95th percentile,
+i.e. the request duration within which you have served 95% of requests. To do
+that, you can either configure a summary with a 0.95-quantile and (for example)
+a 5-minute decay time, or you configure a native histogram with a decent
+resolution (for example, with the Go instrumentation library, you could use a
+value of 1.1 for the `NativeHistogramBucketFactor`), or you configure a classic
+histogram with a few buckets around the 300ms mark, e.g. `{le="0.1"}`,
+`{le="0.2"}`, `{le="0.3"}`, and `{le="0.45"}`. If your service runs replicated
+with a number of instances, you will collect request durations from every
+single one of them, and then you want to aggregate everything into an overall
+95th percentile. However, aggregating the precomputed quantiles from a summary
+rarely makes sense. In this particular case, averaging the quantiles yields
+statistically nonsensical values.
 
     avg(http_request_duration_seconds{quantile="0.95"}) // BAD!
 
@@ -140,98 +392,148 @@ Using histograms, the aggregation is perfectly possible with the
 [`histogram_quantile()`
 function](/docs/prometheus/latest/querying/functions/#histogram_quantile).
 
-    histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) // GOOD.
+Native histogram version (including NHCB):
+
+    histogram_quantile(0.95, sum(rate(http_request_duration_seconds[5m]))) // GOOD.
+
+Classic histogram version:
+
+    histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) // GOOD.
 
 Furthermore, should your SLO change and you now want to plot the 90th
-percentile, or you want to take into account the last 10 minutes
-instead of the last 5 minutes, you only have to adjust the expression
-above and you do not need to reconfigure the clients.
+percentile, or you want to take into account the last 10 minutes instead of the
+last 5 minutes, you only have to adjust the expressions above and you do not
+need to reconfigure the instrumentation of the monitored programs.
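+
+As a sketch of such an adjustment (same metric name as above, with only the
+quantile and the range changed), the 90th percentile over the last 10 minutes
+would look like this for native histograms (including NHCB):
+
+    histogram_quantile(0.9, sum(rate(http_request_duration_seconds[10m])))
+
+And for classic histograms:
+
+    histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m])))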
 
-## Errors of quantile estimation
+### Errors of quantile estimation
 
-Quantiles, whether calculated client-side or server-side, are
-estimated. It is important to understand the errors of that
+Quantiles, whether calculated by the instrumented binary or on the Prometheus
+server, are estimated. It is important to understand the errors of that
 estimation.
 
-Continuing the histogram example from above, imagine your usual
-request durations are almost all very close to 220ms, or in other
-words, if you could plot the "true" histogram, you would see a very
-sharp spike at 220ms. In the Prometheus histogram metric as configured
-above, almost all observations, and therefore also the 95th percentile,
-will fall into the bucket labeled `{le="0.3"}`, i.e. the bucket from
-200ms to 300ms. The histogram implementation guarantees that the true
-95th percentile is somewhere between 200ms and 300ms. To return a
-single value (rather than an interval), it applies linear
-interpolation, which yields 295ms in this case. The calculated
-quantile gives you the impression that you are close to breaching the
-SLO, but in reality, the 95th percentile is a tiny bit above 220ms,
-a quite comfortable distance to your SLO.
+Continuing the histogram example from above, imagine your usual request
+durations are almost all very close to 220ms, or in other words, in a histogram
+with very high resolution, you would see a very sharp spike at 220ms, and the
+“true” 95th percentile is also close to 220ms.
+
+With the `NativeHistogramBucketFactor` of 1.1 (following the Go instrumentation
+example), the bucket this spike would fall into has a lower boundary of
+approximately 0.210 and an upper boundary of approximately 0.229. (This
+document deliberately avoids explaining the details of how these boundaries are
+calculated. See the aforementioned [spec](/docs/specs/native_histograms/) for
+details.) To keep things simple, let's assume that indeed _all_ requests fall
+into this bucket. The interpolation logic of `histogram_quantile` will then
+estimate the 95th percentile to be 228ms (again glossing over the details of
+the calculation here). However, given the bucket boundaries above, the true
+value could be anywhere between 210ms and 229ms, depending on the actual
+distribution of requests within the bucket. So this is a fairly accurate
+estimation, even in the worst case (the true value could be 210ms rather than
+220ms vs. the estimated value of 228ms).
+
+Now let's apply the same to the classic histogram configured as described
+above. All observations, and therefore also the 95th percentile, will fall into
+the bucket labeled `{le="0.3"}`, i.e. the bucket from 200ms to 300ms. The
+interpolation would estimate 295ms in this case, with the guarantee that the
+true value is between 200ms and 300ms. Not only is the error margin much
+larger, but the estimated value of 295ms is also much farther away from the true
+value of 220ms than in the case of the native histogram, where the estimation
+was 228ms. Given that the SLO is at 300ms for the 95th percentile, the classic
+histogram gives you the impression that you are very close to breaching it, but
+in reality you are still doing quite well.
 
 Next step in our thought experiment: A change in backend routing
 adds a fixed amount of 100ms to all request durations. Now the request
-duration has its sharp spike at 320ms and almost all observations will
-fall into the bucket from 300ms to 450ms. The 95th percentile is
-calculated to be 442.5ms, although the correct value is close to
-320ms. While you are only a tiny bit outside of your SLO, the
-calculated 95th quantile looks much worse.
-
-A summary would have had no problem calculating the correct percentile
-value in both cases, at least if it uses an appropriate algorithm on
-the client side (like the [one used by the Go
-client](http://dimacs.rutgers.edu/~graham/pubs/slides/bquant-long.pdf)).
-Unfortunately, you cannot use a summary if you need to aggregate the
+duration has its sharp spike at 320ms.
+
+The relevant bucket of the native histogram ranges from 297ms to 324ms (again
+just stating numbers here without telling you how they are calculated), with
+the interpolated estimation for the 95th percentile being 323ms. That's an
+almost perfect guess.
+
+The classic histogram, however, will see almost all observations in the bucket
+from 300ms to 450ms. The 95th percentile is estimated to be 443ms, far away
+from the correct value close to 320ms. While you are only a tiny bit outside of
+your SLO, the estimated 95th percentile looks much worse.
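+
+For the classic histogram, both estimates can be reproduced by hand, assuming
+(as a simplification) that all observations fall into a single bucket and that
+`histogram_quantile` interpolates linearly between the bucket boundaries:
+
+    200ms + 0.95 * (300ms - 200ms) = 295ms
+    300ms + 0.95 * (450ms - 300ms) = 442.5ms (rounded to 443ms above)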
+
+A summary would have had no problem calculating the correct percentile value
+very accurately in both cases, at least if it uses an appropriate algorithm
+(like the [one used by the Go instrumentation
+library](http://dimacs.rutgers.edu/~graham/pubs/slides/bquant-long.pdf) – this
+algorithm will yield very accurate results for narrow distributions as in our
+example). Unfortunately, you cannot use a summary if you need to aggregate the
 observations from a number of instances.
 
-Luckily, due to your appropriate choice of bucket boundaries, even in
-this contrived example of very sharp spikes in the distribution of
-observed values, the histogram was able to identify correctly if you
-were within or outside of your SLO. Also, the closer the actual value
-of the quantile is to our SLO (or in other words, the value we are
-actually most interested in), the more accurate the calculated value
+Luckily, due to your appropriate choice of bucket boundaries for the classic
+histogram, in this contrived example of very sharp spikes in the distribution
+of observed values, the classic histogram was able to identify correctly if you
+were within or outside of your SLO (although it was bad at telling you how far
+away you were from breaching or keeping the SLO). However, the closer the
+actual value of the quantile is to the SLO (or in other words, the value you
+are actually most interested in), the more accurate the calculated value
 becomes.
 
-Let us now modify the experiment once more. In the new setup, the
-distributions of request durations has a spike at 150ms, but it is not
-quite as sharp as before and only comprises 90% of the
-observations. 10% of the observations are evenly spread out in a long
-tail between 150ms and 450ms. With that distribution, the 95th
-percentile happens to be exactly at our SLO of 300ms. With the
-histogram, the calculated value is accurate, as the value of the 95th
-percentile happens to coincide with one of the bucket boundaries. Even
-slightly different values would still be accurate as the (contrived)
-even distribution within the relevant buckets is exactly what the
-linear interpolation within a bucket assumes.
-
-The error of the quantile reported by a summary gets more interesting
-now. The error of the quantile in a summary is configured in the
-dimension of φ. In our case we might have configured 0.95±0.01,
-i.e. the calculated value will be between the 94th and 96th
-percentile. The 94th quantile with the distribution described above is
-270ms, the 96th quantile is 330ms. The calculated value of the 95th
-percentile reported by the summary can be anywhere in the interval
-between 270ms and 330ms, which unfortunately is all the difference
-between clearly within the SLO vs. clearly outside the SLO.
+Let us now modify the experiment once more. In the new setup, the distribution
+of request durations has a spike at 150ms, but it is not quite as sharp as
+before and only comprises 90% of the observations. 10% of the observations are
+evenly spread out in a long tail between 150ms and 450ms. With that
+distribution, the 95th percentile happens to be exactly at our SLO of 300ms.
+With the classic histogram, the calculated value would be accurate in this
+(contrived) case, as the value of the 95th percentile happens to coincide with
+one of the configured bucket boundaries. Even slightly different values would
+still be accurate as the even distribution within the relevant buckets is
+exactly what the interpolation algorithm for classic histograms assumes.
+
+The error of the quantile reported by a summary gets more interesting here. In
+the case of the Go instrumentation library, the error of the quantile in a
+summary is configured in the dimension of φ. In our case we might have
+configured 0.95±0.01, i.e. the calculated value will be between the 94th and
+96th percentile. The 94th percentile with the distribution described above is
+270ms, the 96th percentile is 330ms. The calculated value of the 95th percentile
+reported by the summary can be anywhere in the interval between 270ms and
+330ms, which unfortunately is all the difference between clearly within the SLO
+vs. clearly outside the SLO.
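+
+These numbers follow directly from the assumed distribution: the tail holds
+10% of the observations, spread evenly over the 300ms between 150ms and 450ms,
+so each percentage point of φ above 0.90 corresponds to 30ms:
+
+    0.94-quantile: 150ms + (0.94 - 0.90) / 0.10 * 300ms = 270ms
+    0.95-quantile: 150ms + (0.95 - 0.90) / 0.10 * 300ms = 300ms
+    0.96-quantile: 150ms + (0.96 - 0.90) / 0.10 * 300ms = 330ms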
 
 The bottom line is: If you use a summary, you control the error in the
-dimension of φ. If you use a histogram, you control the error in the
-dimension of the observed value (via choosing the appropriate bucket
-layout). With a broad distribution, small changes in φ result in
-large deviations in the observed value. With a sharp distribution, a
-small interval of observed values covers a large interval of φ.
-
-Two rules of thumb:
-
-  1. If you need to aggregate, choose histograms.
-
-  2. Otherwise, choose a histogram if you have an idea of the range
-     and distribution of values that will be observed. Choose a
-     summary if you need an accurate quantile, no matter what the
-     range and distribution of the values is.
-
-
-## What can I do if my client library does not support the metric type I need?
+dimension of φ. If you use a histogram, you control the error in the dimension
+of the observed value, via choosing the appropriate bucket layout in case of
+the classic histogram (tough) or via choosing a bucket resolution in case of a
+native histogram (easy). With a broad distribution, small changes in φ result
+in large deviations in the observed value. With a sharp distribution, a small
+interval of observed values covers a large interval of φ.
+
+The rules of thumb are the following:
+
+  1. If you have access to native histograms, use them with a resolution that
+     matches your accuracy requirements. This combines the required accuracy
+     with the ability to aggregate and to change parameters (percentile,
+     sliding window) ad hoc via the PromQL expression.
+  2. If you cannot use native histograms, but you need aggregations, you have
+     to use classic histograms, which requires you to set appropriate bucket
+     boundaries, covering the correct range of values and finding the right
+     trade-off between the cost for the buckets and the required accuracy.
+  3. Only if aggregation isn't needed can you start thinking about summaries.
+     Their main advantage is that they give you very accurate quantile estimation
+     (in the dimension of φ) at relatively low overall cost. However, the
+     additional requirement to pick the desired quantiles and sliding window at
+     instrumentation time is another severe drawback of summaries.
+
+## Visualization
+
+While the pre-calculated quantiles of a summary can be visualized like any other
+time series of floats, visualizing a histogram is more complex. The Prometheus
+UI shows a graphical representation of a single histogram sample in the _Table_
+view. However, in the _Graph_ view, it simply plots each component series of a
+classic histogram or – in case of a native histogram – only the sum of
+observations.
+
+A very useful visualization of a histogram over time is a heatmap. The
+Prometheus UI does not support heatmaps yet (see [tracking
+issue](https://github.com/prometheus/prometheus/issues/15346)). However,
+popular dashboarding tools like
+[Perses](https://perses.dev/plugins/docs/heatmapchart/) or
+[Grafana](https://grafana.com/docs/grafana/latest/visualizations/panels-visualizations/visualizations/heatmap/)
+are able to render heatmaps based on Prometheus histograms. The resolution of
+classic histograms is usually not high enough to create compelling heatmaps,
+but the higher resolution reachable with native histograms results in very
+detailed heatmaps.
 
-Implement it! [Code contributions are welcome](/community/). In general, we
-expect histograms to be more urgently needed than summaries. Histograms are
-also easier to implement in a client library, so we recommend to implement
-histograms first, if in doubt.