Skip to content

Conversation

@eeldaly
Copy link
Contributor

@eeldaly eeldaly commented Jan 14, 2026

What this PR does:
Add an active api tracker to print api requests in a querier on OOMKill / crash

Example log:

"These API calls didn't finish in prometheus' last run:"apis="[
{\"Method\":\"GET\",
\"Path\":\"/prometheus/api/v1/series\",
\"User-Agent\":\"PostmanRuntime/7.49.1\",
\"X-Request-ID\":\"\",
\"X-Scope-OrgID\":\"12345\",
\"end\":\"\",
\"limit\":\"100\",
\"matches\"\"metric_0metric_0metric_0m,metric_1metric_1metric_1m,metric_2metric_2metric_2m,metric_3metric_3metric\",
\"number-of-matches\":5,
\"start\":\"\",
\"timestamp_sec\":1768349758
}]"

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
"github.com/edsrzf/mmap-go"
)

type APITracker struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this tracker generic? Move to a better package maybe in utils. I feel it can be used in query frontend and more components as well. Maybe a generic name can be RequestsTracker. Make sure when inserting entries we can insert arbitrary entries?

entryMap["timestamp_sec"] = time.Now().Unix()
entryMap["Path"] = r.URL.Path
entryMap["Method"] = r.Method
entryMap["User-Agent"] = r.Header.Get("User-Agent")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

entryMap["Method"] = r.Method
entryMap["User-Agent"] = r.Header.Get("User-Agent")
entryMap["X-Scope-OrgID"] = r.Header.Get("X-Scope-OrgID")
entryMap["X-Request-ID"] = r.Header.Get("X-Request-ID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't hardcode the request ID. It is extracted from the context

entryMap["Path"] = r.URL.Path
entryMap["Method"] = r.Method
entryMap["User-Agent"] = r.Header.Get("User-Agent")
entryMap["X-Scope-OrgID"] = r.Header.Get("X-Scope-OrgID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract user ID from context. There is helper function to do it

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Copy link
Contributor

@justinjung04 justinjung04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

eeldaly and others added 2 commits January 23, 2026 14:23
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
@eeldaly
Copy link
Contributor Author

eeldaly commented Jan 28, 2026

One issue that still needs to be addressed is that the request id is not yet in the context at the time of it being added to the tracker. Getting request id from context currently returns an empty string

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
@eeldaly eeldaly requested a review from yeya24 January 30, 2026 21:44
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
CHANGELOG.md Outdated
* [ENHANCEMENT] Ingester: Instrument Ingester CPU profile with userID for read APIs. #7184
* [ENHANCEMENT] Ingester: Add fetch timeout for Ingester expanded postings cache. #7185
* [ENHANCEMENT] Ingester: Add feature flag to collect metrics of how expensive an unoptimized regex matcher is and new limits to protect Ingester query path against expensive unoptimized regex matchers. #7194 #7210
* [ENHANCEMENT] Querier: Add logging for api calls in querier during an OOMKill. #7216
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enhance the changelog? How about:

[ENHANCEMENT] Querier: Add active API requests tracker logging to help with OOMKill troubleshooting

func (w *RequestWrapper) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
ctx := r.Context()
if requestmeta.RequestIdFromContext(ctx) == "" {
reqId := r.Header.Get("X-Request-ID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this request ID header coming from? I don't think we use this header. And we shouldn't hardcode it here.

if requestmeta.RequestIdFromContext(ctx) == "" {
reqId := r.Header.Get("X-Request-ID")
if reqId == "" {
reqId = uuid.NewString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get why we are generating a new request ID here in this middleware. We should make sure that this wrapper runs after the request ID wrapper, which should ensure that we get request ID from the context. If there is no request ID then we don't log it.

entryMap["start"] = r.URL.Query().Get("start")
entryMap["end"] = r.URL.Query().Get("end")
entryMap["step"] = r.URL.Query().Get("step")
entryMap["lookback_delta"] = r.URL.Query().Get("lookback_delta")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support this parameter today in range query?


func (e *RangedQueryExtractor) Extract(r *http.Request) []byte {
entryMap := generateCommonMap(r)
entryMap["limit"] = r.URL.Query().Get("limit")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support this parameter today in range query?

entryMap := generateCommonMap(r)
entryMap["limit"] = r.URL.Query().Get("limit")
entryMap["time"] = r.URL.Query().Get("time")
entryMap["lookback_delta"] = r.URL.Query().Get("lookback_delta")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support limit and lookback_delta today in instant query?

entryMap["Path"] = r.URL.Path
entryMap["Method"] = r.Method
entryMap["X-Scope-OrgID"], _ = users.TenantID(ctx)
entryMap["X-Request-ID"] = requestmeta.RequestIdFromContext(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we unify the entry names? Why some use uppercase some with lower case.
Some with - some with _.

Let's use more straightforward names for X-Scope-OrgID and X-Request-ID. We should just call them TenantID or UserID and RequestID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The - vs _ and naming is to match prometheus/header names. I think keeping the names the same as their header names we set in prometheus is better than renaming / shrinking

}

func generateJSONEntry(entryMap map[string]interface{}) []byte {
jsonEntry, err := json.Marshal(entryMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure that the generated byte entries we always log queries and matchers at the end? If they are too long other parameters might be truncated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were always trimming the query/matchers to fit the rest of the space using trimStringByBytes()


type RangedQueryExtractor struct{}

func generateCommonMap(r *http.Request) map[string]interface{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add more headers including user agent, dashboard ID and panel ID

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
…caped characters

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
}
fieldValueEncoded = fieldValueEncoded[1 : len(fieldValueEncoded)-1]
fieldValueEncodedTrimmed := trimStringByBytes(fieldValueEncoded, size)
fieldValueEncodedTrimmed = "\"" + removeHalfCutEscapeChar(fieldValueEncodedTrimmed) + "\""

Check failure

Code scanning / CodeQL

Potentially unsafe quoting Critical

If this
JSON value
contains a double quote, it could break out of the enclosing quotes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants