[PROF-12646] Update OTel process context support with latest version of reference library by ivoanjo · Pull Request #414 · DataDog/java-profiler

ivoanjo · 2026-03-11T16:20:05Z

What does this PR do?

This PR updates the vendored version of the OTel process context support with the latest code from
open-telemetry/sig-profiling#79 (which will soon be merged upstream).

Motivation:

This latest version of the library fixes a number of small issues, but more importantly includes support for the thread context configuration keys which we need for #347 (so we will be able to shed a bunch of code once we rebase on top of this one).

Additional Notes:

From the Java API consumer side, not a lot changes with this version.

How to test the change?

Validate tests still pass!

For Datadog employees:

If this PR touches code that signs or publishes builds or packages, or handles
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
This PR doesn't touch any of that.
JIRA: PROF-12646

…of reference library **What does this PR do?** This PR updates the vendored version of the OTel process context support with the latest code from open-telemetry/sig-profiling#79 (which will soon be merged upstream). **Motivation:** This latest version of the library fixes a number of small issues, but more importantly includes support for the thread context configuration keys which we need for #347 (so we will be able to shed a bunch of code once we rebase on top of this one). **Additional Notes:** From the Java API consumer side, not a lot changes with this version. **How to test the change?** Validate tests still pass!

ivoanjo · 2026-03-11T16:20:55Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

The changes from this file are a bit big but all comes from upstream -- same sha1sum as the PR I linked in the description :)

dd-octo-sts · 2026-03-11T16:29:41Z

Scan-Build Report

User:	runner@runnervm0kj6c
Working Directory:	/home/runner/work/java-profiler/java-profiler/ddprof-lib/src/test/make
Command Line:	make -j4 all
Clang Version:	Ubuntu clang version 18.1.3 (1ubuntu1)
Date:	Wed Mar 11 17:51:58 2026

Bug Summary

Memory error
Bug Type	Quantity	Display?
All Bugs	1
Memory leak	1

Reports

Bug Group	Bug Type ▾	File	Function/Method	Line	Path Length
Memory error	Memory leak	otel_process_ctx.cpp	otel_process_ctx_decode_payload	693	43

dd-octo-sts · 2026-03-11T17:07:28Z

CI Test Results

Run: #23251054896 | Commit: c800bbf | Duration: 11m 0s (longest job)

✅ All 32 test jobs passed

Status Overview

JDK	glibc-aarch64/debug	glibc-amd64/debug	musl-aarch64/debug	musl-amd64/debug
8	-	✅	-	-
8-ibm	-	✅	-	-
8-j9	✅	✅	-	-
8-librca	-	-	✅	✅
8-orcl	-	✅	-	-
11	-	✅	-	-
11-j9	✅	✅	-	-
11-librca	-	-	✅	✅
17	✅	✅	-	-
17-graal	✅	✅	-	-
17-j9	✅	✅	-	-
17-librca	-	-	✅	✅
21	✅	✅	-	-
21-graal	✅	✅	-	-
21-librca	-	-	✅	✅
25	✅	✅	-	-
25-graal	✅	✅	-	-
25-librca	-	-	✅	✅

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0

Updated: 2026-03-18 15:10:49 UTC

ivoanjo · 2026-03-11T17:25:49Z

Execution failed for task ':ddprof-lib:compileDebug'.
> Failed to compile otel_process_ctx.cpp: exit code 1
  /home/runner/work/java-profiler/java-profiler/ddprof-lib/src/main/cpp/otel_process_ctx.cpp:148:71: error: use of undeclared identifier 'MFD_NOEXEC_SEAL'
    int fd = memfd_create("OTEL_CTX", MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_NOEXEC_SEAL);

Ooops I guess too old libc; I'll add a fallback #define for it.

I'll make sure to propagate this to the upstream version as well, so we don't need to keep maintaining this separately.

TL;DR if there were repeated keys when parsing the data in the testing code we could leak their values. This matches the latest version upstream in open-telemetry/sig-profiling#79 .

jbachorik

I have a few nits and some comments where I am looking for clarification before approving.

jbachorik · 2026-03-17T14:11:39Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

+  uint64_t otel_process_monotonic_published_at_ns; // Timestamp from when the context was published in nanoseconds from CLOCK_BOOTTIME. 0 during updates.
+     char *otel_process_payload;             // Always non-null, points to the storage for the data; expected to be a protobuf map of string key/value pairs, null-terminated


Nit: This formatting looks funny

Fixed in d3fd5b3

jbachorik · 2026-03-17T14:21:32Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

+
+  if (monotonic_published_at_ns == published_state.mapping->otel_process_monotonic_published_at_ns) {
+    // Advance published_at_ns to allow readers to detect the update
+    monotonic_published_at_ns++;


Is the clock value used just as a monotonic counter seeded by the system time?

In practice, yes, although readers can use it to get a timestamp of how long ago the context was published (because they can also get the same clock and compare timestamps)

jbachorik · 2026-03-17T14:27:51Z

ddprof-lib/src/main/cpp/javaApi.cpp

+    .deployment_environment_name = env_str.c_str(),
+    .service_instance_id = runtime_id_str.c_str(),
+    .service_name = service_str.c_str(),
+    .service_version = version_str.c_str(),
+    .telemetry_sdk_language = "java",
+    .telemetry_sdk_version = tracer_version_str.c_str(),
+    .telemetry_sdk_name = "dd-trace-java",
+    .resource_attributes = host_name_attrs,
+    .extra_attributes = NULL,
+    .thread_ctx_config = NULL


Is ProcessContextText testing also the newly added stuff?

At this point, the only new bits are the extra_attributes and the thread_ctx_config but these are not yet exposed to the Java code so there's nothing to test yet.

Once we wire those two to be set from Java (if needed/ever), we'll need to update the test.

(I can do that now if you prefer, but since it's not clear when we'll use them I was doing a YAGNI thing :D)

jbachorik · 2026-03-17T14:37:44Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

+    // Try to create mapping from memfd
+    if (ftruncate(fd, mapping_size) == -1) {
+      otel_process_ctx_drop_current();
+      return (otel_process_ctx_result) {.success = false, .error_message = "Failed to truncate memfd (" __FILE__ ":" ADD_QUOTES(__LINE__) ")"};


Do you need to close(fd) here to avoid leaking it?

Nicely spotted! Fixed in ccf7ed4

jbachorik · 2026-03-17T14:42:45Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

 #include <stdlib.h>
 #include <string.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>


This one does not exist on macos - probably the easiest would be to ifdef the include. Alternative would be to follow the pattern in this repo and have otel_process_ctx_linux.cpp and otel_process_ctx_macos.cpp with the later one being virtually empty.

Arghh yeah. I've tweaked it in b8c7d1e to not even include the headers with the no-op build. Because we have testing on the other repo with both kinds of builds, this should make sure that we don't accidentally break the no-op build again.

The macos file is indeed an option, but I'm somewhat trying to avoid it since I'm trying to keep it easier to embed the library by having just the one .h/.c. Once more files are in play I think it complicates the lives of library consumers...

jbachorik · 2026-03-17T14:49:06Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

-  *ptr++ = total_pairs & 0xFF;        // low byte of count
+  // ProcessContext.resource (field 1)
+  write_protobuf_tag(&ptr, 1);
+  write_protobuf_varint(&ptr, resource_size);


resource_size should probably checked to fit in UINT14_MAX

Yes, very correctly spotted out! I did a full pass to see if we missed a few more and found another. Fixed in 19fe5dc

jbachorik · 2026-03-17T14:57:35Z

ddprof-lib/src/main/cpp/otel_process_ctx.cpp

+  free(published_state.payload); // This was still pointing to the old payload
+  published_state.payload = payload;


This looks like we might race here - the reader might have obtained the payload before monotonic_published_at_ns has been zeroed.
And at this point it might be still reading from it?

Yes -- this is acknowledged in the spec and is somewhat intentional (to avoid heavier synchronization).

Quoting https://github.com/open-telemetry/opentelemetry-specification/pull/4719/changes#diff-c02a091b4a954164a90867c19771116b26d717419c16b715bce8b56d80ba49edR161 :

Readers SHOULD gracefully handle missing, incomplete, or invalid mappings. If a process does not publish context or if decoding fails, readers SHOULD fall back to default resource detection mechanisms.

We recommend the use of APIs such as process_vm_readv or equivalent to read data, as they allow gracefully handling of issues inherent to racy cross-process memory access.

So the intention is that the reader will be on the look out for the read possibly failing or being incorrect -- only after making sure it has a consistent read (monotonic_published_at_ns was the same before/after the read) does it know it can use the data it got.

(In general the eBPF Profiler is already taking care of these kinds of sharp edges -- there's a bunch of other situations e.g. for unwinding where it's racing the app and can see intermediate/inconsistent data)

In practice I'm not sure `ftruncate` could fail unless the kernel is already on fire but no reason not to fix the leak anyway.

We were accidentally importing at least one Linux-specific header, which made NOOP not work. To make sure NOOP keeps working, let's move everything it doesn't need inside the ifdef -- this way it'll be very obvious what's being compiled for NOOP and what's not.

@jbachorik

Really well-spotted by @jbachorik .

ivoanjo requested a review from a team as a code owner March 11, 2026 16:20

ivoanjo commented Mar 11, 2026

View reviewed changes

ivoanjo added 4 commits March 11, 2026 17:28

Add fallback for missing flag

f484ff7

I'll make sure to propagate this to the upstream version as well, so we don't need to keep maintaining this separately.

Pull in fix for scan-build memory leak

71eb63e

TL;DR if there were repeated keys when parsing the data in the testing code we could leak their values. This matches the latest version upstream in open-telemetry/sig-profiling#79 .

Merge branch 'main' into ivoanjo/update-process-context

ddb85f3

Merge branch 'main' into ivoanjo/update-process-context

fc90337

jbachorik reviewed Mar 17, 2026

View reviewed changes

ivoanjo added 4 commits March 18, 2026 14:55

Fix leaking fd on ftruncate failure

ccf7ed4

In practice I'm not sure `ftruncate` could fail unless the kernel is already on fire but no reason not to fix the leak anyway.

Undo weird formatting

d3fd5b3

Ensure sizes stay within limits of embedded encoder

19fe5dc

Really well-spotted by @jbachorik .

ivoanjo mentioned this pull request Mar 18, 2026

Improve validation checking/error handling process context C/C++ impl open-telemetry/sig-profiling#88

Open

		uint64_t otel_process_monotonic_published_at_ns; // Timestamp from when the context was published in nanoseconds from CLOCK_BOOTTIME. 0 during updates.
		char *otel_process_payload; // Always non-null, points to the storage for the data; expected to be a protobuf map of string key/value pairs, null-terminated

		free(published_state.payload); // This was still pointing to the old payload
		published_state.payload = payload;

Conversation

ivoanjo commented Mar 11, 2026 • edited by atlassian bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dd-octo-sts bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Scan-Build Report

Bug Summary

Reports

Uh oh!

dd-octo-sts bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Test Results

Status Overview

Uh oh!

ivoanjo commented Mar 11, 2026

Uh oh!

jbachorik left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ivoanjo commented Mar 11, 2026 •

edited by atlassian bot

Loading

dd-octo-sts bot commented Mar 11, 2026 •

edited

Loading

dd-octo-sts bot commented Mar 11, 2026 •

edited

Loading