module: add clearCache for CJS and ESM#61767
Conversation
|
Review requested:
|
90303e6 to
1d0accc
Compare
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
|
I’m relatively +1 on having this in Node.js, but I recall having a a lot of discussions about this @GeoffreyBooth and @nodejs/loaders teams about this, and it would massively break the spec, expectations, and invariants regarding ESM. (Note, this is what people have been asking us to add for a long time). My personal objection to this API is that it would inadvertently leak memory at every turn, so while this sounds good in theory, in practice it would significantly backfire in long-running scenarios. An option could be to expose it only behind a flag, putting the user in charge of choosing this behavior. Every single scenario where I saw HMR in Node.js ends up in memory leaks. This is the reason why I had so much interest and hopes for ShadowRealm. |
benjamingr
left a comment
There was a problem hiding this comment.
I am still +1 on the feature from a user usability point of view. Code lgtm.
We're giving users a tool, it may be seen as a footgun by some but hopefully libraries that use the API correctly and warn users about incorrect usage emerge. |
|
@mcollina Thanks for the feedback. I agree the ESM semantics concerns are real. This API doesn’t change the core ESM invariants (single instance per URL); it only removes Node's internal cache entries to allow explicit reloads in opt‑in workflows. Even with that, existing references (namespaces, listeners, closures) can keep old graphs alive, so this is still potentially leaky unless the app does explicit disposal. I’ll make sure the docs call out the risks and the fact that this only clears Node’s internal caches, and I’d like loader team input on the final shape of the API. This commit should address some of your concerns. b3bd79a
Thanks for the review @benjamingr. Would you mind re-reviewing again so I can trigger CI? |
|
Thanks a lot for this ❤️ |
|
Rather than violating ESM invariants, can't node just provide a function that imports a url? i.e. While the given example of: const url = new URL('./mod.mjs', import.meta.url);
await import(url.href);
clearCache(url);
await import(url.href); // re-executes the moduleis indeed not spec compliant, it's perfectly legal to have something like: import { clearCache, importModule } from "node:module";
await importModule(someUrl);
clearCache();
await importModule(someUrl); // reexecute |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #61767 +/- ##
==========================================
- Coverage 89.72% 89.71% -0.02%
==========================================
Files 676 677 +1
Lines 206065 206553 +488
Branches 39508 39597 +89
==========================================
+ Hits 184897 185301 +404
- Misses 13315 13383 +68
- Partials 7853 7869 +16
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
While I am +1 to the idea in general, I am afraid the current API may bring more problem than it solves...see the comments.
(Granted it isn't really a problem unique to this specific design, I think the issue is more that this is not a very well solved problem so far, I don't really know what it should look like, though I think I might be able to point out what it should not look like to avoid adding/re-introducing leaks/use-after-frees that user land workarounds can already manage)
|
I was the one requesting this while sitting next to yagiz today. We take advantage of Module Federation which allows us to distribute code at runtime. However, when parts of the distributed system are updated, it gets stuck in module cache. I've had some workarounds, like attempting to purge require cache - however when it comes to esm, it's a difficult problem. Since we do this distribution primarily in production, and there can be thousands of updates a day, I block esm from being supported because it'll leak memory - which was fine for several years but becoming more problematic in modern tooling. On lambda we cannot just exit a process and bring a new one up without triggering a empty deploy, which has generally been a perf hit to cold start a new lambda vs try and "reset" the module cache for primitive hot reload. Now, I know this might be controversial, or not recommended - but the reality is that many large companies use federation, most fortune 50 companies use it heavily. All of them are relying on userland cobbling I've created. If there is a solution, it would be greatly appreciated by all of my users. I believe this would also be very useful in general for tooling like rspack etc where we have universal dev serves. If invalidation of specific modules causes complexity, I'd be more than happy with a nuclear option like resetModuleCache() which just clears everything entirely. Would be a little slower, but nothing is slower than killing a process and bringing up a new one. "Soft Restart" node without killing it. Don't have much opinion on spec compliance etc, can go through NAPI as well if that would avoid any spec concerns or pushback. |
|
Chiming in to say that re-loading a module is very helpful in tests. We can do this with the fabulous CJS paradigm, but ESM does not have a viable equivalent and it should. |
|
I think there are still quite a few places that need updates/tests - I tried my best to find them, but there are some dusty corners in the module loader that I have never poked at, you might want to take a heap snapshot or write more tests with
|
|
I think I addressed all of your concerns @joyeecheung. Let me know if I missed anything! |
Just pinging @guybedford to speak on the spec concerns. I think we should wait for him or someone similarly knowledgeable about the spec to comment before landing. In general I'm +1 on the feature, assuming it can be safely implemented. My (dim) recollection was that the last time we considered it, it was impossible to modify an ES module after it had been loaded into V8. Has that changed in recent years? How do you handle cases like |
guybedford
left a comment
There was a problem hiding this comment.
This seems very reasonable to me!
In terms of spec compatibility, formally hot reloading does break the host hook invariant, although the reality with loaders already does of course, and perhaps there are wording changes here that might accommodate this better. The resolve cache clearing has some interesting spec integration aspects, since the resolve cache is on the module records themselves in ECMA-262, and also in HTML there's a global resolved set of (specifier, parent) pairs. But the model implemented here seems like it would mostly be consistent with both of those constructs if we ever wanted to consider something standard.
I'd like to add these topics to the agenda at the next TC39 modules coming up next Thursday 26. The meeting is on the public calendar if anyone here interested would like to join. There are no concerns from my side, but I think it's a very interesting spec discussion to have nonetheless.
I'd like to join. Have link to calendar? |
|
Same thing |
21bc1e3 to
fb387a4
Compare
|
I do not have the whole picture, but is we are going to discuss this at the TC39 modules meeting here is my perspective from a spec point of view. A spec-compliant way of implementing hot reload of a module in a graph would be to re-execute the root of the graph, re-executing all the modules in the path(s) from the root to the updated module, and re-using the previous evaluations for modules in the other branches. This includes both edges created by static imports and by dynamic imports. For example, consider a graph like this one, with A being the root: If you want to reload F, you'd need to reload/rerun A/B/E/F (let's call them A'/B'/E'/F'), while re-using the existing D/G/H. Once that's done, all the old copies of the modules that have been re-executed can be garbage collected. There is no module that has already run At this point there is nobody that can, for example, observe that
The spec is not incompatible with hot reloading, but it makes sure that you cannot swap the dependencies under an already evaluated module. By doing so it avoids problems like #61767 (comment). I guess concretely, this would be implemented by counting the number of alive importers of a module, and only allowing to clear the cache from one with no importers alive. In the example above, after the reloading happens:
|
|
Given the "spec compliant" post above, I think an example of what I think should be possible is warranted. Consider the case where the thing being tested hooks into module loading. A test suite for such a thing may traditionally look like: test.beforeEach((ctx) => {
ctx.namespace = {}
ctx.namespace.mod = require('something')
})
test.afterEach(() => {
clearCache('something')
})
test('foo is added', (t) => {
const { mod } = t.namespace
t.assert.equal(mod.foo, 'foo')
})
test('bar is added', (t) => {
const { mod } = t.namespace
t.assert.equal(mod.bar, 'bar')
})The key to that example is that the module is completely re-required between each subtest so that the module patching happens prior to each subtest. As a user, I do not care how it happens, or what the spec folks want in order for it to happen, I just want the same concept to be 1:1 viable with ESM. |
| were removed from each cache. | ||
|
|
||
| Clears the CommonJS `require` cache and the ESM module cache for a module. This enables | ||
| reload patterns similar to deleting from `require.cache` in CommonJS, and is useful for HMR. |
There was a problem hiding this comment.
This may need to spell out a bit more concretely how HMR is supposed to use this API without leaking more. From #61767 (comment) there are a couple of things that need to be taken care of:
- The solution must clear the cache for the module that has changed
- The solution must also force a "re-linking" of the module graph so that dependent of the changed module see the update (the mechanics of this may involve more cache clearing as it bubbles up).
2 has a subtle difference in CommonJS v.s. ESM dependents. If the changed module has CommonJS dependents, the solution may need to track and clear the cache of all the dependent modules and re-evaluate from the root to see the update. For ESM dependents it's easier because evaluation is separate from linking, so the solution can simply clear the cache for the changed module and the root, and re-evaluate from the root to pick up the new branch while reusing all the existing branches (so unchanged modules won't get evaluated again). This won't violate the spec as explained by Nicolo because it's technically disposing the old graph and then linking and evaluating a new graph again that just reuse some existing module records, not mutating the existing link of an old, evaluated graph.
Although, it would be much simpler if whatever that uses it enforces that the module that gets HMR can only ever be the root itself and no bottom-up dependency HMR is supported. But I am not sure if that's a limitation that all existing HMR users can enforce.
There was a problem hiding this comment.
Would it be simpler if instead of a “clear cache” API, we just provided a “replace module” API? Where the new module must have the same exports as the old one, so that the linking wouldn’t need to be redone?
There was a problem hiding this comment.
I think regardless of whether the new module has the same exports or not, linking must be redone. Links exist between different module records (not Node.js ModuleJobs or ModuleWraps, but v8::Module), and we can't swap out what's inside a v8::Module - but can only compile a new one and relink.
doc/api/module.md
Outdated
|
|
||
| > Stability: 1.1 - Active development | ||
|
|
||
| * `specifier` {string|URL} The module specifier or URL to resolve. The resolved URL/filename |
There was a problem hiding this comment.
What does this do when you call module.clearCache('./x.js') from say path/to/y.js but do not pass in parentURL? I think the current API hints that it will resolve to path/to/x.js but in reality this actually resolves to cwd/x.js? If that's intentional this needs a test - although I think resolving relative to cwd might be rather surprising.
It might be better to just enforce that either it's a full URL, or if it's not, parentURL is mandatory. Or alternatively - never takes specifiers, always take full URLs. Users can get to the URL via import.meta.resolve or require.resolve or with more rudimentary methods like URL/path.join in the examples below. Then they can cache the URL themselves as needed. And the API is only in charge of clearing the load cache entry corresponding to that URL. That'll also simplify the implementation a lot, and it no longer needs to create a fake parent for CJS resolution.
doc/api/module.md
Outdated
| const url = new URL('./mod.mjs', import.meta.url); | ||
| await import(url.href); | ||
|
|
||
| clearCache(url); |
There was a problem hiding this comment.
This still needs examples showing what clearing for non-absolute specifiers would do.
| * @param {Record<string, string>} importAttributes | ||
| * @returns {boolean} true if any entries were deleted. | ||
| */ | ||
| deleteResolveCacheEntry(specifier, parentURL, importAttributes) { |
There was a problem hiding this comment.
Do we still need this and other things that clear the resolve cache?
doc/api/module.md
Outdated
| may remain. | ||
| When a `file:` URL is resolved, cached module jobs for the same file path are cleared even if | ||
| they differ by search or hash. | ||
| If the same file is loaded via multiple specifiers (for example `require('./x')` alongside |
There was a problem hiding this comment.
I think this also needs to explain that if both import('x', { with: { type: 'json' } }) and import('x', { with: { type: 'javascript' } }) have happened, the implementation clears the cache for both (I am not fully sure this is possible with type, but it can be possible when customization supports other import attributes)
|
I checked out the existing HMR solutions a bit and I think this API may be enough: /**
* @param {string|URL} specifier // This is what would've been passed into import(specifier) or require(specifier)
* @param {{
* parentURL: string | URL, // Mandatory, because parent identity is part of the resolution cache key
* importAttributes?: Record<string, string>, // Optional, only meaningful when resolver is "import"
* resolver: "import" | "require", // Specifies how resolution should be performed
* caches: "resolution" | "module" | "all", // resolution: only clear resolution cache; module: clear cache for the module everywhere in Node.js (not counting JS level references)
* }} options
*/
function clearCache(specifier, options) {}clearing resolution cache is still useful for HMR solutions that do cache busting URLs - which I think may actually be the more spec-compliant way to implement it. The spec violation technically doesn't come from evaluation, but from module mapping specified by HostLoadImportedModule:
The cache clearing API makes it possible for the same referrer + module request to get different module records in return, but it does not mean this must be violating the spec by nature, it just delegates the responsibility of correctness to whoever that uses this API, similar to how V8 delegates this to Node.js. One way to ensure this is correctly implemented is to use a cache busting referrer (i.e. A minimal example of using this (ignoring some complexities from e.g. fs) can be let app, rev = 0;
const reload = async () => {
const prev = rev ? `./app.mjs?hmr=${rev}` : null;
await app?.dispose?.(); // clear side effects
if (prev) {
module.clearCache(prev, {
parentURL: import.meta.url,
resolver: "import",
caches: "all",
});
}
app = await import(`./app.mjs?hmr=${++rev}`);
};
await reload();
http.createServer((req, res) => app.handle(req, res)).listen(3000);
fs.watch(new URL(import.meta.resolve('./app.mjs')), reload); |
|
@joyeecheung In that example, if |
|
@ScriptedAlchemy @Nsttt the meeting is today at 4pm UTC. If nobody sent you the link yet, feel free to email me at the email on my GitHub profile and I'll send you the meeting URL. |
This is a minimal example, but if dependencies need to be supported, the HMR solution can just append hmr parameter to all the dependencies via a loader hook that track the graph through context.parentURL and manage the lifecycle of them, which IIUC is what they already do for CJS anyway (because |
d4fb1b4 to
7e9a7c5
Compare
|
I think one way to test this more robustly (i.e. V8 can actually garbage collect it) might be something like this: // Flags: --expose-internals
const { internalBinding } = require('internal/test/binding');
const { ModuleWrap } = internalBinding('module_wrap');
const { queryObjects } require('node:v8'); // Let's run the test in CJS to reduce the noise from queryObject
let app, rev = 0;
const reload = async () => {
const prev = rev ? `./app.mjs?hmr=${rev}` : null;
if (prev) {
module.clearCache(prev, {
parentURL: import.meta.url,
resolver: "import",
caches: "all",
});
}
app = await import(`./app.mjs?hmr=${++rev}`);
};
(async() {
await reload(); // first load
await reload(); // second
const result = queryObjects(ModuleWrap, { format: 'summary' });
// Validate that result no longer includes module with a wrap whose .url includes `app.mjs?hmr=0`
})();(Or use checkIfCollectableByCounting with ModuleWrap) |
|
@joyeecheung Pushed a new test according to your recommendations. |

Introduce Module.clearCache() to invalidate CommonJS and ESM module caches with optional resolution context, enabling HMR-like reloads. Document the API and add tests/fixtures to cover cache invalidation behavior.