Allow the global allocator to use thread-local storage and std::thread::current()#144465
Allow the global allocator to use thread-local storage and std::thread::current()#144465bors merged 12 commits intorust-lang:mainfrom
Conversation
|
I think this merits some libs-api discussion (unless that's already happened? I didn't quickly find it). Landing this IMO implies at least an implicit guarantee (even if not necessarily stable without actual docs) that we don't introduce global allocator usage in thread local code. I think we should have some discussion and either commit to that or not. And we should discuss where we draw the line (e.g., is other runtime-like code in std supposed to do this? For example, environment variables or other bits that need allocation early, potentially even before main starts -- is that OK to call into the allocator for?) OTOH, if we can make a weaker guarantee here while still serving the purpose, that may also be good to look at? For example, can we guarantee that a pattern like dhat's for ignoring re-entrant alloc/dealloc calls is safe? i.e., that we don't need allocations for non-Drop thread locals? Or if we can only do that for a limited set, perhaps std could define a public thread local for allocators to use? |
Yes, this should probably be properly discussed.
You make a good point here about environment variables. It is very common to allow configuration and debugging of allocators through them. I'd like to at least guarantee that
I think it's good to brainstorm a bit what would be needed for a global allocator, although I don't think an allocator needs all that much from
I don't think there's much difference in constraint for the stdlib between guaranteeing this for all
Not without putting undue burden on the allocator. |
|
I did some investigation into some of the problems of making Details
The above seems feasible to me, although I also had another thought: I don't know if if !alloc_is_init() {
init_alloc();
// Now the allocator works and may be re-entrant.
spawn_background_threads();
}So I'm perfectly happy if we don't make While investigating the above I also found the following additional blocker for making
To answer my own question: |
|
I've added a commit to use a spinlock for I also investigated environment variables and... it's hopeless without a new API. The current API returns owned |
|
We talked about this in today's @rust-lang/libs-api meeting. We were supportive of doing this, and we agreed that using TLS in an allocator seems desirable. We also decided there's no point in doing this if we don't make it a guarantee of support. Thus, labeling this libs-api. @rfcbot merge Also, before merging this we'd like to see the documentation for TLS updated to make this guarantee. |
|
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
|
Separately, it would also be useful for the global allocator documentation to provide an explicit safelist of other things you're allowed to use. |
|
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Done.
Where? On the |
|
So with this, for the first time, a program with a At the very least, this should be documented with that attribute. |
|
Much like Ralf, I am quite surprised at the "allocator fork" occurring here. Now, I would like to note that there is precedent for this. One of the early pains I encountered in replacing the system allocator with a custom allocator was that the loader will use the system allocator for loading the code (and constants, etc...) regardless. Thus, to an extent, there are already allocations bypassing the global allocator. Still, up until now, this was a well-defined set. Loaded sections of library/binary would be allocated with the system allocator, all further allocations would be allocated with the global allocator. The divide is clear. And one can (try to) write their own loader if they wish to change the statu quo. This changes removes control from the hands of the (allocator) developer. It may be fine. It may be the beginning of a slippery slope. For the point at hand, is there any reason that the thread-local destructors could not be registered in an intrusive linked list instead? That is, each thread-local variable requiring a destructor would be accompanied by a thread-local: struct Node {
pointer: *mut u8,
destructor: unsafe extern "C" fn(*mut u8),
next: *mut Node,
}And destruction would simply consist of popping the first destructor of the linked-list and executing it in a loop. (This doesn't solve all problems identified in this PR, but it may solve the most salient one) |
Well, yes and no. I think technically yes, this is the first time specifically However, Note that this doesn't affect |
On some platforms we can only install a key (that is, a pointer) into thread-local storage, which requires boxing of the thread-local variable. We must allocate on these platforms. See here: rust/library/std/src/sys/thread_local/os.rs Line 101 in 8410440 We also require allocation for
Well, considering there are platforms which hard require allocation for thread-locals, and every platform requiring allocation for |
|
@RalfJung @matthieu-m Sorry, forgot to tag you earlier, awaiting your response. |
|
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 1eb0657 (parent) -> 467250d (this PR) Test differencesShow 192 test diffsStage 1
Stage 2
Additionally, 190 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 467250ddb25a93315a8dee02dd6cc1e398be46ff --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (467250d): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -1.0%, secondary 4.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -2.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 472.886s -> 473.687s (0.17%) |
|
I realize that it's somewhat late (just saw this in the release notes), but I agree with Ralf that it's kind of surprising to switch stdlib functionality to using While, on native systems, it might be the case that there were already intermittend Given that Rust now gives a guarantee that global allocators can use TLS without modification, I doubt there is much that can be done about this at this point. But I do wonder whether there wouldn't have been a way to support TLS in allocator impls without giving up on the existing global allocator guarantees. For example, a thread local API that takes an explicit allocator akin to Footnotes
|
I'm not sure I agree that "my platform's What is the precise platform that is broken? Perhaps we can patch |
It's not that
My point was not at all that "my platform's System is broken" is the primary use case. That's why I only put it into a footnote. The point was just that a previously stated guarantee ("route all default allocation requests to a custom object") that I could imagine other people depended on as well was softened. I know it's hard to make any kind of change without the potential for breakage and sometimes one needs to be practical. What concerns me here though is that the feature "override all std allocations", which previously existed, is now gone and we only have "override most std allocations" (that's at least the effect on Wasm, even if malloc was already used in some places before on native targets).
In practice, my workaround was not broken, as in, it is currently deployed in production and working. Patching a dependency to avoid |
I can't speak for the libs team, but if we do a patch release reverting the guarantees that
However, if this means the stabilization of All of the above is of course assuming the libs team actually considers it problematic we use |
|
I've renominated this for libs-api to discuss possibly reverting this change. Also cc @Amanieu if you have any thoughts. |
|
One option might be to add something like this to GlobalAllocator. That way allocators that can deal with re-entrancy from std just fine (e.g., ones using C only, or just counting bytes allocated in a global static) could be defined with false and continue to be used by std. Name should obviously be bikeshed but it feels like a fairly reasonable tradeoff between allowing a general override and working out of the box for people who don't want to think about it. (A more complicated variant, and probably slower to stabilize, would be a separate global allocator item for std internals, but I'm not convinced that's better). Allocators are already unsafe impl, so we can require the value returned is always the same pretty easily. |
|
@Mark-Simulacrum I don't really understand your proposed solution. Who would call |
|
std's code that this PR changed to call System directly would instead condition on the global_allocator's return from requires_std_reentrancy and either call System (the default) or call the global allocator (if it opts-in). |
|
We discussed this in today's libs-api meeting. We're not inclined to revert/backport at this time (since this has reached stable and there may be code depending on the new guarantees already). The proposed next step is to open a libs-api ACP that proposes the API I suggested above to allow GlobalAlloc impls to opt-in to allowing the re-entrancy. @laurmaedje Do you think that's sufficient for your case? i.e., the override you previously had wasn't using the APIs std guaranteed here to not call into GlobalAlloc? We also noticed you mentioned "the docs previously stated that all allocations are routed through the global allocator", can you link those docs? We'd like to make sure they're updated (and presuming the ACP would get accepted, point at the strategy for opting in). |
|
Cut the ACP: rust-lang/libs-team#743 |
|
Thanks for discussing this among the libs team!
Yes, for my use case, this would absolutely suffice. While I think a TLS design akin to
I was referring to the global allocator section in the previous docs:
Of course, the "default" is doing some heavy lifting, so it's not 100% clear cut. The docs have already been updated to include the footnote so I don't think anything further needs to change here. |
Fixes #115209.
Currently the thread-local storage implementation uses the
Globalallocator if it needs to allocate memory in some places. This effectively means the global allocator can not use thread-local variables. This is a shame as an allocator is precisely one of the locations where you'd really want to use thread-locals. We also see that this lead to hacks such as #116402, where we detect re-entrance and abort.So I've made the places where I could find allocation happening in the TLS implementation use the
Systemallocator instead. I also applied this change to the storage allocated for aThreadhandle so that it may be used care-free in the global allocator as well, for e.g. registering it to a central place or parking primitives.r? @joboet