Skip to content

[Registry] Cache shared manifest fetches during generation#952

Open
hssuryawanshib23-afk wants to merge 6 commits intomeshery:masterfrom
hssuryawanshib23-afk:fix/aso-manifest-dedup
Open

[Registry] Cache shared manifest fetches during generation#952
hssuryawanshib23-afk wants to merge 6 commits intomeshery:masterfrom
hssuryawanshib23-afk:fix/aso-manifest-dedup

Conversation

@hssuryawanshib23-afk
Copy link
Copy Markdown

@hssuryawanshib23-afk hssuryawanshib23-afk commented Mar 28, 2026

Summary

Cache package fetches during registry generation when multiple models share the same registrant and source URL.

Changes

  • add a per-generation package fetcher keyed by registrant and source URL
  • use singleflight so concurrent models sharing the same source only fetch once
  • reuse the fetched package across model generation instead of calling GetPackage() for every model
  • add regression tests for repeated, concurrent, and cross-registrant fetch behavior

Why

ASO model generation currently fetches the same shared manifest once per model even though all models point to the same source URL. This change deduplicates those fetches without changing generated output.

Verification

  • go test ./registry/... -count=1

Related

Signed-off-by: Harsh Sunil Suryawanshi <hssuryawanshi_b23@el.vjti.ac.in>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a packageFetcher utility to optimize package retrieval by implementing caching and request deduplication using singleflight. It refactors the model generation logic to use this fetcher and adds comprehensive unit tests to verify caching behavior and concurrent request handling. A review comment identifies a redundant cache check within the singleflight.Do block that can be removed to simplify the implementation.

hssuryawanshib23-afk and others added 2 commits March 28, 2026 20:30
Signed-off-by: Harsh Sunil Suryawanshi <hssuryawanshi_b23@el.vjti.ac.in>
@leecalcote
Copy link
Copy Markdown
Member

Code review

The caching approach using singleflight + sync.Map is well-structured, and the test coverage for the deduplication and concurrency behaviors is thorough.

Found 1 issue:

  1. modelName is excluded from the cache key, but both generator backends use it to determine which package to fetch

The cache key is built from only registrant and sourceURL:

func (pf *packageFetcher) getPackage(registrant, sourceURL, modelName string) (models.Package, error) {
cacheKey := fmt.Sprintf("%s\x00%s", utils.ReplaceSpacesAndConvertToLowercase(registrant), sourceURL)
if cachedPkg, ok := pf.cache.Load(cacheKey); ok {

However, modelName is passed to newGenerator and both backends use it materially:

  • ArtifactHub calls GetAhPackagesWithName(ahpm.PackageName) — it searches ArtifactHub by name, so different model names at the same source URL return different packages.
  • GitHub sets Name: packageName on the returned GitHubPackage, which propagates into generated components.

The result: when multiple models share the same (registrant, sourceURL) but have different names, every model after the first silently receives the cached package that was fetched using the first model's name. For ArtifactHub registrants in particular, this means the wrong package data is used for generation.

Fix: include modelName (normalized, for consistency with registrant) in the cache key, e.g. fmt.Sprintf("%s\x00%s\x00%s", registrant, sourceURL, modelName). This trades some cache efficiency for correctness, though in practice most sharing happens across models that differ only in name at the same source URL — meaning the cache would rarely be hit anyway.

@hssuryawanshib23-afk
Copy link
Copy Markdown
Author

Thanks, @leecalcote. Updated this to address the modelName correctness issue.

I split the behavior by backend:

  • ArtifactHub now includes modelName in the cache key, since package resolution there is name-
    sensitive
  • GitHub still deduplicates by shared registrant + sourceURL, but returns a per-model copy with
    the requested name so generated component metadata stays correct

So this keeps the intended optimization for shared GitHub manifests like ASO, while avoiding
incorrect package reuse for cases where the model name materially affects package selection.

I also updated the tests to cover both behaviors and re-verified locally with:

  • go test ./registry/... -count=1

@leecalcote
Copy link
Copy Markdown
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants