
refactor(cache): implement registry isolation in cache directory#2406

Open
justDemo-hjw wants to merge 1 commit into utooland:next from justDemo-hjw:next


@justDemo-hjw

Fixes #2281

Summary

Currently, all packages are stored in ~/.cache/nm/ without isolation by registry. When switching between registries (e.g., registry.npmjs.org and registry.npmmirror.com), packages cached from one registry can be mixed up with packages from another, which creates integrity risks.

This PR adds registry-level isolation to the cache directory by including the registry host in the cache path. The cache layout changes from ~/.cache/nm/package-name/version/ to ~/.cache/nm/registry-host/package-name/version/.

Changes:

  • Added registry_to_dir_name() to convert a registry URL into a directory name
  • Modified get_cache_dir() to return a registry-specific path
  • Updated the clean command to use get_cache_dir() instead of a hardcoded path

All existing callers of get_cache_dir() pick up the new structure automatically, so no call sites need to change.
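A minimal sketch of the new layout follows. It assumes registry_to_dir_name() takes a &str and returns a String, matching the logic in the code excerpt quoted in the review; package_cache_path() and the base directory are hypothetical, since the real get_cache_dir() reads the base path and current registry from configuration.

```rust
use std::path::{Path, PathBuf};

/// Strips the http(s):// scheme and trailing slashes from a registry URL,
/// following the reviewed excerpt (signature assumed).
fn registry_to_dir_name(registry_url: &str) -> String {
    let url = registry_url
        .strip_prefix("https://")
        .or_else(|| registry_url.strip_prefix("http://"))
        .unwrap_or(registry_url);
    url.trim_end_matches('/').to_string()
}

/// Hypothetical helper illustrating the new layout:
/// <base>/<registry-host>/<package-name>/<version>/
fn package_cache_path(base: &Path, registry_url: &str, name: &str, version: &str) -> PathBuf {
    base.join(registry_to_dir_name(registry_url))
        .join(name)
        .join(version)
}

fn main() {
    let base = Path::new("/home/user/.cache/nm");
    for registry in ["https://registry.npmmirror.com/", "https://registry.npmjs.org/"] {
        // Same package and version, but a separate directory per registry.
        let path = package_cache_path(base, registry, "lodash", "4.17.21");
        println!("{}", path.display());
    }
}
```

Note that this version keeps the colon from a port (localhost:4873 stays localhost:4873), which is the Windows issue raised in the review.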

Test Plan

  1. Unit tests: Added tests for registry_to_dir_name() covering various registry URL formats, and verified that get_cache_dir() includes the registry dimension.

  2. Manual testing:

    • Install packages with the default registry (registry.npmmirror.com) and verify that the cache path is ~/.cache/nm/registry.npmmirror.com/package-name/version/
    • Switch to registry.npmjs.org, install packages, and verify that the cache path is ~/.cache/nm/registry.npmjs.org/package-name/version/
    • Run utoo clean and verify that only the current registry's cache is cleaned
    • Verify that packages from different registries are stored in separate directories
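The scoped clean behavior checked in the manual test plan can be sketched with the standard library alone. This assumes the clean command deletes the <base>/<registry-host>/ subtree for the current registry; clean_registry_cache() and the temporary base directory are hypothetical stand-ins, not utoo's actual implementation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a registry-scoped `utoo clean`: removes only
/// <base>/<registry_dir>, leaving other registries' caches untouched.
fn clean_registry_cache(base: &Path, registry_dir: &str) -> io::Result<()> {
    let dir = base.join(registry_dir);
    if dir.exists() {
        fs::remove_dir_all(&dir)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Throwaway base directory standing in for ~/.cache/nm.
    let base = std::env::temp_dir().join("nm-isolation-demo");
    let registries = ["registry.npmmirror.com", "registry.npmjs.org"];

    // One cached package per registry: <base>/<registry-host>/<package-name>/<version>/
    for reg in registries {
        fs::create_dir_all(base.join(reg).join("lodash").join("4.17.21"))?;
    }

    // With the mirror as the current registry, only its subtree is removed.
    clean_registry_cache(&base, "registry.npmmirror.com")?;
    for reg in registries {
        println!("{reg}: cached = {}", base.join(reg).exists());
    }

    fs::remove_dir_all(&base)?;
    Ok(())
}
```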

@gemini-code-assist
Contributor

Summary of Changes

Hello @justDemo-hjw, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the cache management system to introduce registry-level isolation for stored packages. By embedding the registry host into the cache directory path, it resolves potential conflicts and integrity issues that arise when using multiple package registries. This enhancement improves the reliability and predictability of the caching mechanism, ensuring that packages from different sources are kept distinct.

Highlights

  • Cache Directory Isolation: Implemented registry isolation in the cache directory structure to prevent package confusion and integrity risks when switching between different registries.
  • New Cache Path Structure: The cache path now includes the registry host, changing from "~/.cache/nm/package-name/version/" to "~/.cache/nm/registry-host/package-name/version/".
  • registry_to_dir_name() Function: Introduced a new utility function "registry_to_dir_name()" to convert registry URLs into a clean directory name format by stripping protocols and trailing slashes.
  • get_cache_dir() Update: Modified "get_cache_dir()" to dynamically construct the cache path, incorporating the current registry's host for isolation.
  • clean Command Adaptation: The "clean" command now utilizes the updated "get_cache_dir()", ensuring that it only cleans the cache relevant to the currently configured registry.
  • Backward Compatibility: The changes are designed to be fully backward compatible, with existing code automatically adapting to the new cache structure.


gemini-code-assist bot left a comment


Code Review

This pull request introduces registry isolation in the cache directory, which is a valuable enhancement for users working with multiple package registries. The implementation is well-structured, and the changes are logical. I've identified a potential cross-platform compatibility issue in how registry URLs are converted to directory names, particularly when a port is present. Additionally, I've suggested an improvement to a test case to make it more robust in verifying the new functionality. Overall, these are great changes that improve the tool's flexibility.

.strip_prefix("https://")
.or_else(|| registry_url.strip_prefix("http://"))
.unwrap_or(registry_url);
url.trim_end_matches('/').to_string()

Priority: high

The current implementation of registry_to_dir_name doesn't sanitize characters that are invalid in directory names on some operating systems. For example, if a registry URL contains a port (e.g., http://localhost:4873), the resulting directory name localhost:4873 is invalid on Windows because of the colon (:). For cross-platform compatibility, such characters should be replaced.

You could also strengthen test_registry_to_dir_name by adding a test case for a URL with a port, assuming the fix is applied:

assert_eq!(
    registry_to_dir_name("http://localhost:4873"),
    "localhost_4873"
);
Suggested change
- url.trim_end_matches('/').to_string()
+ url.trim_end_matches('/').replace(':', "_")
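The suggested change only replaces ':'. A broader sketch, assuming the goal is a single portable directory component, replaces every character Windows rejects in file names, plus '/', so that a registry with a path prefix (the https://npm.example.com/api/npm/ example is hypothetical) does not produce nested directories. The RESERVED list is an assumption, not the PR's code, and the sketch still ignores Windows reserved device names such as CON and names ending in a dot or space. Two distinct registries could also map to the same name (localhost:4873 and localhost_4873), which seems unlikely in practice.

```rust
/// Characters rejected in Windows file names, plus '/' so the result is always
/// a single path component. This list is an assumption, not the PR's code.
const RESERVED: &[char] = &['<', '>', ':', '"', '/', '\\', '|', '?', '*'];

/// Strips the scheme and trailing slashes, then replaces reserved characters with '_'.
fn registry_to_dir_name(registry_url: &str) -> String {
    let url = registry_url
        .strip_prefix("https://")
        .or_else(|| registry_url.strip_prefix("http://"))
        .unwrap_or(registry_url);
    // &[char] is a str pattern: any listed character matches.
    url.trim_end_matches('/').replace(RESERVED, "_")
}

fn main() {
    for url in [
        "https://registry.npmjs.org/",
        "http://localhost:4873",
        "https://npm.example.com/api/npm/",
    ] {
        println!("{url} -> {}", registry_to_dir_name(url));
    }
}
```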

Comment on lines +313 to +318
let cache_dir = get_cache_dir();
let cache_dir_str = cache_dir.to_string_lossy();

// Should contain registry directory name
// The exact structure depends on current registry setting
assert!(cache_dir_str.contains("nm") || cache_dir_str.contains("cache"));

Priority: medium

This test's assertion is a bit too lenient. assert!(cache_dir_str.contains("nm") || cache_dir_str.contains("cache")) would likely pass even with the old implementation, as the path already contained .cache/nm. A more robust test would be to verify that the generated cache directory path actually ends with the directory-sanitized registry name. This would more accurately confirm that registry isolation is working as intended.

Suggested change
- let cache_dir = get_cache_dir();
- let cache_dir_str = cache_dir.to_string_lossy();
- // Should contain registry directory name
- // The exact structure depends on current registry setting
- assert!(cache_dir_str.contains("nm") || cache_dir_str.contains("cache"));
+ let cache_dir = get_cache_dir();
+ let registry = super::config::get_registry();
+ let registry_dir_name = registry_to_dir_name(&registry);
+ assert!(cache_dir.ends_with(registry_dir_name));
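The suggested test still reads the globally configured registry, so what it checks depends on the machine it runs on. One way to make the check deterministic, sketched here under the assumption that get_cache_dir() could delegate to a pure helper, is to pass the base directory and registry explicitly. cache_dir_for() is hypothetical, and registry_to_dir_name() here includes the colon replacement from the first review comment.

```rust
use std::path::{Path, PathBuf};

/// Strips the scheme and trailing slashes and replaces ':' (as suggested in the review).
fn registry_to_dir_name(registry_url: &str) -> String {
    let url = registry_url
        .strip_prefix("https://")
        .or_else(|| registry_url.strip_prefix("http://"))
        .unwrap_or(registry_url);
    url.trim_end_matches('/').replace(':', "_")
}

/// Hypothetical pure helper that get_cache_dir() could delegate to. Taking the
/// base directory and registry as arguments keeps the test independent of
/// whatever registry is configured where it runs.
fn cache_dir_for(base: &Path, registry_url: &str) -> PathBuf {
    base.join(registry_to_dir_name(registry_url))
}

#[cfg(test)]
mod cache_dir_tests {
    use super::*;

    #[test]
    fn cache_dir_is_isolated_per_registry() {
        let base = Path::new("cache-root");
        let npmjs = cache_dir_for(base, "https://registry.npmjs.org/");
        let mirror = cache_dir_for(base, "https://registry.npmmirror.com/");
        // Path::ends_with compares whole components, not string suffixes.
        assert!(npmjs.ends_with("registry.npmjs.org"));
        assert!(!npmjs.ends_with("npmjs.org"));
        assert!(mirror.ends_with("registry.npmmirror.com"));
        assert_ne!(npmjs, mirror);
    }
}

fn main() {
    let dir = cache_dir_for(Path::new("cache-root"), "http://localhost:4873");
    println!("{}", dir.display());
}
```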

@justDemo-hjw
Author

#2281

@github-actions

📊 Performance Benchmark Report (with-antd)

Utoopack Performance Report

Report ID: utoopack_performance_report_20260317_063651
Generated: 2026-03-17 06:36:51
Trace File: trace_antd.json (0.6GB, 1.66M spans)
Test Project: examples/with-antd


Executive Summary

Metric | Value | Assessment
Total Wall Time | 8,466.0 ms | Baseline
Total Thread Work (de-duped) | 27,692.7 ms | Non-overlapping busy time
Effective Parallelism | 3.3x | thread_work / wall_time
Working Threads | 5 | Threads with actual spans
Thread Utilization | 65.4% | 🆗 Average
Total Spans | 1,659,360 | All B/E + X events
Meaningful Spans (>= 10us) | 562,724 | 33.9% of total
Tracing Noise (< 10us) | 1,096,636 | 66.1% of total
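The derived figures in this summary can be reproduced from the raw values. Effective parallelism uses the formula given in the table (thread_work / wall_time); the utilization formula, parallelism divided by working threads, is inferred because it reproduces 65.4%, and is not stated by the report.

```rust
/// Effective parallelism as defined in the report: thread_work / wall_time.
fn effective_parallelism(thread_work_ms: f64, wall_ms: f64) -> f64 {
    thread_work_ms / wall_ms
}

/// Assumed definition (inferred, not stated in the report): average
/// parallelism spread over the threads that recorded spans.
fn thread_utilization(parallelism: f64, working_threads: u32) -> f64 {
    parallelism / f64::from(working_threads)
}

/// Percentage of `part` in `total`.
fn share_percent(part: u64, total: u64) -> f64 {
    100.0 * part as f64 / total as f64
}

fn main() {
    // Values from the Executive Summary table.
    let parallelism = effective_parallelism(27_692.7, 8_466.0);
    let utilization = thread_utilization(parallelism, 5);
    let total_spans: u64 = 1_659_360;
    let meaningful = total_spans - 1_096_636; // spans of at least 10 us

    println!("effective parallelism: {parallelism:.1}x");
    println!("thread utilization: {:.1}%", 100.0 * utilization);
    println!("meaningful spans: {meaningful} ({:.1}%)", share_percent(meaningful, total_spans));
}
```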

Build Phase Timeline

Shows when each build phase is active and how much CPU it consumes.
Self-Time is the time spent exclusively in that phase (excluding children).

Phase | Spans | Inclusive (ms) | Self-Time (ms) | Wall Range (ms)
Resolve | 127,863 | 3,419.0 | 2,751.7 | 5,085.4
Parse | 10,219 | 1,158.6 | 1,093.5 | 8,311.7
Analyze | 328,968 | 17,316.1 | 12,783.9 | 7,936.2
Chunk | 26,932 | 2,041.7 | 1,868.6 | 2,312.2
Codegen | 59,232 | 4,550.8 | 3,365.1 | 2,031.2
Emit | 28 | 61.8 | 30.8 | 11.9
Other | 9,482 | 1,476.4 | 1,341.9 | 8,466.0

Workload Distribution by Diagnostic Tier

Category | Spans | Inclusive (ms) | % Work | Self-Time (ms) | % Self
P0: Scheduling & Resolution | 464,706 | 22,065.2 | 79.7% | 16,740.0 | 60.4%
P1: I/O & Heavy Tasks | 2,910 | 136.4 | 0.5% | 105.4 | 0.4%
P2: Architecture (Locks/Memory) | 0 | 0.0 | 0.0% | 0.0 | 0.0%
P3: Asset Pipeline | 94,996 | 7,768.4 | 28.1% | 6,344.5 | 22.9%
P4: Bridge/Interop | 0 | 0.0 | 0.0% | 0.0 | 0.0%
Other | 112 | 54.5 | 0.2% | 45.7 | 0.2%

Top 20 Tasks by Self-Time

Self-time is the exclusive duration: time spent in the task itself, not in sub-tasks.
This is the most accurate indicator of where CPU cycles are actually spent.

Self (ms) | Inclusive (ms) | Count | Avg Self (us) | P95 Self (ms) | Max Self (ms) | % Work | Task Name | Top Caller
7,275.5 | 8,215.4 | 206,325 | 35.3 | 0.1 | 8.4 | 26.3% | module | write all entrypoints to disk (1%)
2,835.1 | 3,710.0 | 36,661 | 77.3 | 0.2 | 234.8 | 10.2% | analyze ecmascript module | process module (77%)
1,677.1 | 2,862.8 | 24,712 | 67.9 | 0.2 | 61.5 | 6.1% | code generation | chunking (7%)
1,530.7 | 1,649.4 | 60,378 | 25.4 | 0.0 | 6.7 | 5.5% | internal resolving | resolving (30%)
1,395.3 | 4,080.7 | 69,811 | 20.0 | 0.0 | 15.9 | 5.0% | process module | module (15%)
1,395.1 | 1,395.1 | 32,584 | 42.8 | 0.1 | 11.4 | 5.0% | precompute code generation | code generation (31%)
1,268.4 | 1,391.7 | 8,745 | 145.0 | 0.0 | 269.8 | 4.6% | write all entrypoints to disk | None (0%)
1,213.0 | 1,761.6 | 66,776 | 18.2 | 0.0 | 8.0 | 4.4% | resolving | module (30%)
1,180.7 | 1,322.7 | 14,887 | 79.3 | 0.2 | 51.7 | 4.3% | chunking | write all entrypoints to disk (0%)
1,129.7 | 1,129.7 | 13,489 | 83.8 | 0.3 | 118.1 | 4.1% | compute async module info | chunking (0%)
1,026.9 | 1,092.0 | 8,046 | 127.6 | 0.5 | 35.6 | 3.7% | parse ecmascript | analyze ecmascript module (26%)
655.0 | 655.7 | 11,818 | 55.4 | 0.1 | 42.5 | 2.4% | compute async chunks | write all entrypoints to disk (0%)
292.9 | 292.9 | 1,936 | 151.3 | 0.4 | 17.4 | 1.1% | generate source map | code generation (96%)
83.8 | 83.8 | 786 | 106.6 | 0.0 | 22.6 | 0.3% | compute binding usage info | write all entrypoints to disk (0%)
66.5 | 66.5 | 2,167 | 30.7 | 0.0 | 4.0 | 0.2% | read file | parse ecmascript (91%)
51.2 | 51.2 | 1,873 | 27.4 | 0.0 | 7.7 | 0.2% | collect mergeable modules | compute merged modules (0%)
32.9 | 63.3 | 227 | 145.0 | 0.3 | 18.0 | 0.1% | make production chunks | chunking (2%)
30.6 | 30.6 | 13 | 2350.1 | 10.2 | 11.1 | 0.1% | write file | apply effects (100%)
27.8 | 30.3 | 625 | 44.5 | 0.1 | 2.7 | 0.1% | async reference | write all entrypoints to disk (1%)
17.9 | 17.9 | 4 | 4463.7 | 11.6 | 13.0 | 0.1% | blocking | map chunk groups (25%)
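Self-time, as used throughout this report, can be sketched from a tree of spans: a span's inclusive duration minus the inclusive durations of its direct children. The Span struct and the toy trace are hypothetical (only the task names come from the report), and the sketch assumes properly nested spans on a single thread; the report's de-duplication of work across threads is not modeled.

```rust
/// A span from a single thread's trace; `parent` indexes into the same slice.
/// This struct is a hypothetical simplification of a trace event.
struct Span {
    name: &'static str,
    start_us: u64,
    end_us: u64,
    parent: Option<usize>,
}

impl Span {
    fn inclusive_us(&self) -> u64 {
        self.end_us - self.start_us
    }
}

/// Self-time = inclusive duration minus the inclusive durations of direct
/// children. Assumes children of one span do not overlap each other;
/// saturating_sub avoids an underflow panic if that assumption is violated.
fn self_times(spans: &[Span]) -> Vec<u64> {
    let mut self_us: Vec<u64> = spans.iter().map(Span::inclusive_us).collect();
    for span in spans {
        if let Some(p) = span.parent {
            self_us[p] = self_us[p].saturating_sub(span.inclusive_us());
        }
    }
    self_us
}

fn main() {
    // Toy trace using task names from the report.
    let spans = [
        Span { name: "process module", start_us: 0, end_us: 100, parent: None },
        Span { name: "analyze ecmascript module", start_us: 10, end_us: 80, parent: Some(0) },
        Span { name: "parse ecmascript", start_us: 20, end_us: 50, parent: Some(1) },
    ];
    for (span, self_us) in spans.iter().zip(self_times(&spans)) {
        println!("{:<26} inclusive {:>3} us, self {:>3} us", span.name, span.inclusive_us(), self_us);
    }
}
```

In this toy trace the self-times (30, 40, 30 us) add up to the root's 100 us inclusive time, which is why self-time rather than inclusive time shows where cycles are actually spent.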

Critical Path Analysis

The longest sequential dependency chains that determine wall-clock time.
Focus on reducing the depth of these chains to improve parallelism.

Rank | Self-Time (ms) | Depth | Path
1 | 234.9 | 2 | process module → analyze ecmascript module
2 | 88.0 | 2 | process module → analyze ecmascript module
3 | 78.9 | 2 | code generation → generate source map
4 | 42.0 | 2 | code generation → generate source map
5 | 35.6 | 2 | analyze ecmascript module → parse ecmascript

Batching Candidates

High-volume tasks dominated by a single parent. Batching them in that parent
can drastically reduce scheduler overhead.

Task Name | Count | Top Caller (Attribution) | Avg Self | P95 Self | Total Self
analyze ecmascript module | 36,661 | process module (77%) | 77.3 us | 0.16 ms | 2,835.1 ms

Duration Distribution

Range | Count | Percentage
<10us | 1,096,636 | 66.1%
10us-100us | 535,670 | 32.3%
100us-1ms | 22,646 | 1.4%
1ms-10ms | 4,308 | 0.3%
10ms-100ms | 94 | 0.0%
>100ms | 6 | 0.0%
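The duration buckets can be reproduced with a simple histogram. The bucket edges come from the table; treating each lower edge as inclusive (so a 10 us span lands in 10us-100us) is an assumption consistent with "Meaningful Spans (>= 10us)". The table's counts sum to the 1,659,360 total spans.

```rust
/// Upper bucket edges in microseconds: <10us, 10us-100us, 100us-1ms,
/// 1ms-10ms, 10ms-100ms, and everything at or above 100ms.
const EDGES_US: [u64; 5] = [10, 100, 1_000, 10_000, 100_000];

/// Counts from the Duration Distribution table, in the same bucket order.
const REPORT_COUNTS: [u64; 6] = [1_096_636, 535_670, 22_646, 4_308, 94, 6];

/// Index of the bucket a duration falls into (lower edges inclusive).
fn bucket_index(duration_us: u64) -> usize {
    EDGES_US
        .iter()
        .position(|&edge| duration_us < edge)
        .unwrap_or(EDGES_US.len())
}

/// Counts how many durations fall into each bucket.
fn histogram(durations_us: &[u64]) -> [u64; 6] {
    let mut counts = [0u64; 6];
    for &d in durations_us {
        counts[bucket_index(d)] += 1;
    }
    counts
}

fn main() {
    // Recompute the Percentage column from the report's counts.
    let total: u64 = REPORT_COUNTS.iter().sum();
    for count in REPORT_COUNTS {
        println!("{count:>9} {:>5.1}%", 100.0 * count as f64 / total as f64);
    }
    println!("{:?}", histogram(&[3, 9, 10, 99, 100, 5_000, 250_000]));
}
```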

Action Items

  1. [P0] Focus on tasks with the highest Self-Time — these are where CPU cycles are actually spent.
  2. [P0] Use Batching Candidates to identify callers that should use try_join or reduce #[turbo_tasks::function] granularity.
  3. [P1] Check Build Phase Timeline for phases whose wall range is disproportionate to their self-time, which indicates serialization.
  4. [P1] Inspect P95 Self (ms) for heavy monolith tasks. Focus on long-tail outliers, not averages.
  5. [P1] Review Critical Paths — reducing the longest chain depth directly improves wall-clock time.
  6. [P2] If Thread Utilization < 60%, investigate scheduling gaps (lock contention or deep dependency chains).

Report generated by Utoopack Performance Analysis Agent
