@aignas aignas commented Feb 1, 2026

This lets us cache the data we store from the SimpleAPI between runs.

Summary:

  • Refactor the existing code to use an await helper to handle parallelism
    better when we need to wait for the download of the Simple API contents and
    then do some processing.
  • Filter the return values to only the packages that are needed for the files
    instead of everything that we find on the index.
  • This in turn allows us to store the values as facts in the lock file.
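The first two summary points can be sketched in plain Python. This is a hypothetical illustration, not the PR's code: `FakePending`, `start_download`, `await_all`, `keep_requested`, and the data shapes are all assumptions. In Bazel the non-blocking handle would come from `ctx.download(..., block = False)`, whose result exposes `wait()`. Filtering by version is likewise an assumption about what "needed" means here.

```python
# Hypothetical sketch: start every download first, then wait and post-process.
# FakePending stands in for the handle returned by a non-blocking download.

class FakePending:
    def __init__(self, payload):
        self._payload = payload

    def wait(self):
        # A real handle would block here until the download finishes.
        return self._payload


def start_download(index, pkg):
    """Pretend to start a non-blocking fetch of one package's index page."""
    return FakePending(index[pkg])


def await_all(pending, process):
    """Wait for each pending download, then run `process` on its result."""
    return {pkg: process(pkg, handle.wait()) for pkg, handle in pending.items()}


def keep_requested(requested_versions):
    """Build a processor that drops index entries for unrequested versions."""
    def process(pkg, entries):
        wanted = requested_versions[pkg]
        return [e for e in entries if e["version"] in wanted]
    return process


# Assumed data: what the index offers versus what the lock files ask for.
INDEX = {
    "foo": [
        {"filename": "foo-1.0.tar.gz", "version": "1.0"},
        {"filename": "foo-2.0.tar.gz", "version": "2.0"},
    ],
    "bar": [{"filename": "bar-0.1-py3-none-any.whl", "version": "0.1"}],
}
REQUESTED = {"foo": {"2.0"}, "bar": {"0.1"}}

# Kick off all downloads before waiting on any of them.
pending = {pkg: start_download(INDEX, pkg) for pkg in REQUESTED}
filtered = await_all(pending, keep_requested(REQUESTED))
```

Because only the requested entries survive, the result is small enough to be written out as facts without dragging the whole index into the lock file.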

TODO:

  • Add an attribute flag to disable facts by default. Enable it using the
    pip.default API for the root module only.
  • Add unit tests for facts code.
  • Receive feedback from people actually trying it.

Fixes #2731

@gemini-code-assist

Summary of Changes

Hello @aignas, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the efficiency and reproducibility of PyPI package resolution within Bazel by implementing a caching mechanism for SimpleAPI metadata. By storing essential 'facts' about distributions (such as filenames, hashes, and yanked status) directly in the MODULE.bazel.lock file, the system can avoid repeated network requests, leading to faster and more consistent builds. The changes also include improvements to asynchronous operations and more precise filtering of package data, ensuring that only necessary information is stored and processed.

Highlights

  • Fact Storage for PyPI SimpleAPI: Introduced a mechanism to store 'facts' (distribution filenames, hashes, and yanked status) fetched from the PyPI SimpleAPI into the MODULE.bazel.lock file, enabling caching between runs and reducing redundant network requests.
  • Parallelism and Asynchronous Operations: Refactored the SimpleAPI download process to leverage an _await helper, improving handling of parallel and asynchronous downloads and subsequent processing of package metadata.
  • Targeted Package Data Retrieval: Modified the SimpleAPI interaction to filter and retrieve only the specific package versions required, rather than fetching all available information, which optimizes data storage in facts and reduces lock file size.
  • Enhanced Version Parsing: Updated the pkg_version function to more robustly extract package versions from filenames, especially for source distributions, by requiring the distribution name for accurate parsing.
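To see why the distribution name helps with version parsing, consider a legacy sdist such as `foo-bar-1.2.3.tar.gz`: splitting on `-` alone cannot tell where the name ends. The sketch below is a minimal illustration under stated assumptions, not the PR's `pkg_version`. The function name `pkg_version_sketch`, the extension list, and the simplification that each separator in the name is a single character are all hypothetical.

```python
import re

# Assumed set of sdist archive extensions for this sketch.
_SDIST_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".zip")


def _normalize(name):
    """Lowercase and collapse runs of '-', '_' and '.' into one underscore."""
    return re.sub(r"[-_.]+", "_", name).lower()


def pkg_version_sketch(filename, distribution):
    """Extract the version from a wheel or sdist filename."""
    if filename.endswith(".whl"):
        # Wheel names escape dashes in the project name, so the version is
        # always the second dash-separated field.
        return filename[: -len(".whl")].split("-")[1]

    for ext in _SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError("unsupported archive: " + filename)

    # Use the known distribution name to locate the name/version boundary.
    n = len(distribution)
    head, sep, version = stem[:n], stem[n:n + 1], stem[n + 1:]
    if _normalize(head) != _normalize(distribution) or sep != "-" or not version:
        raise ValueError("{} does not match {}".format(filename, distribution))
    return version
```

For example, `pkg_version_sketch("foo-bar-1.2.3.tar.gz", "foo-bar")` yields `"1.2.3"`, whereas a naive split on the first dash would report `bar-1.2.3` as the version.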

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a caching mechanism for SimpleAPI lookups using Bazel's facts feature, which is a significant improvement for performance on subsequent runs. The implementation involves a substantial refactoring of the download and parsing logic to be more modular and to better support asynchronous operations through a new _await helper. The changes are well-structured, particularly in how facts are managed to be compact for storage in the lockfile. My feedback focuses on improving documentation and code clarity in a few areas.

"""

- def parse_simpleapi_html(*, url, content):
+ def parse_simpleapi_html(*, url, content, distribution = None, return_absolute = True):
medium

There are a few TODOs in the docstrings for new and modified functions in this file that should be addressed:

  • parse_simpleapi_html (lines 24, 26): The distribution and return_absolute parameters need descriptions.
  • pkg_version (lines 129, 130): The filename and distribution parameters need descriptions.

Please fill these in to improve the documentation.


- def _read_index_result(ctx, result, output, url, cache, cache_key):
-     if not result.success:
+ def _read_simpleapi(ctx, index_url, pkg, attr, cache, get_auth = None, return_absolute = True, **download_kwargs):
medium

There are a few TODOs in the docstrings for new functions in this file that should be addressed:

  • _read_simpleapi (line 282): The return_absolute parameter needs a description.
  • _read_simpleapi_with_facts (lines 342-343): The index_url and distribution parameters need descriptions.

Please fill these in to improve the documentation.

sha256s_by_version = sha256s_by_version,
)

def _facts(ctx, store_facts, facts_version = _FACT_VERSION):
medium

The parameter name store_facts in this function is a bit confusing as it shadows the boolean flag with the same name in the simpleapi_download function. Here it refers to the dictionary that will be populated with facts. Consider renaming it to something like facts_dict or facts_to_populate to improve clarity and avoid ambiguity.
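A minimal sketch of the suggested naming, written in plain Python. Everything here is hypothetical: `load_known_facts`, `simpleapi_download_sketch`, the `"v1"` value, and the `fact_version` / `dist_hashes` keys are assumptions made for illustration and do not reflect the PR's actual `_facts` body or lock file schema.

```python
_FACT_VERSION = "v1"  # assumed value; the PR's actual constant is not shown


def load_known_facts(stored, facts_to_populate, facts_version=_FACT_VERSION):
    """Copy previously stored facts into `facts_to_populate`.

    Returns True when the stored facts were usable, False otherwise.
    """
    if stored.get("fact_version") != facts_version:
        # Facts written under a different schema version are ignored.
        return False
    facts_to_populate.update(stored.get("dist_hashes", {}))
    return True


def simpleapi_download_sketch(stored, store_facts=True):
    """`store_facts` stays a boolean flag; the dict gets its own name."""
    facts_to_populate = {}
    if store_facts:
        load_known_facts(stored, facts_to_populate)
    return facts_to_populate
```

With distinct names, a reader can tell at each call site whether a value is the on/off switch or the dictionary being filled.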

Development

Successfully merging this pull request may close these issues.

Write pip extension metadata to the MODULE.bazel.lock file

1 participant