Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly updates the AgentJet project's foundational dependencies, including the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a major upgrade to the verl backbone, refactoring core training logic, configuration, and dependencies. Key changes include adopting new Ajet specific workers, overhauling the weight synchronization mechanism to use a CheckpointManager, and introducing a more complex rollout correction algorithm. While these changes seem to align with a newer version of the underlying framework, there are several areas that need attention regarding code robustness, maintainability, and removal of debug artifacts before merging.
| try: | ||
| reward_result = reward_fn(data, return_dict=True) | ||
| reward_tensor = reward_result["reward_tensor"] | ||
| reward_extra_infos_dict = reward_result.get("reward_extra_info", {}) | ||
| except Exception as e: | ||
| print(f"Error in reward_fn: {e}") | ||
| reward_tensor = reward_fn(data) | ||
| reward_extra_infos_dict = {} | ||
|
|
There was a problem hiding this comment.
The use of a broad except Exception is risky as it can mask unexpected and critical errors during reward computation. This makes debugging difficult and can lead to silent failures. It's better to catch specific exceptions that you expect the legacy reward_fn to raise, or to check the function's signature/return type before calling it.
ajet/backbone/verl/dp_actor.py
Outdated
| # make sure we are in training mode | ||
| self.actor_module.train() | ||
|
|
||
| from ajet import bp; bp("UPP") |
| for token_id, logprob, decoded_string in zip(token_array, logprob_array, decoded_string_array) # type: ignore | ||
| ], |
There was a problem hiding this comment.
Using zip on three separate lists (token_array, logprob_array, decoded_string_array) without a length check is unsafe. If the lists have different lengths for any reason, zip will silently truncate the data to the length of the shortest list, which can lead to subtle bugs and data loss. Please add an assertion to ensure all lists have the same length before iterating.
| else: | ||
| raise NotImplementedError | ||
| assert hasattr(self.async_rollout_manager, "agent_loop_workers") | ||
| assert len(self.async_rollout_manager.agent_loop_workers) == 1, "Please set `num_workers = 1` in `ajet/default_config/verl/verl_default.yaml`" |
There was a problem hiding this comment.
The assertion message is brittle as it hardcodes a specific file path. If the configuration file is moved or renamed, this message will become misleading. It's better to provide a more general and helpful message that guides the user to the correct configuration parameter.
| assert len(self.async_rollout_manager.agent_loop_workers) == 1, "Please set `num_workers = 1` in `ajet/default_config/verl/verl_default.yaml`" | |
| assert len(self.async_rollout_manager.agent_loop_workers) == 1, "Only a single agent loop worker is supported. Please check your configuration for `ajet.rollout.agent.num_workers` and set it to 1." |
ajet/backbone/verl/dp_actor.py
Outdated
| loss = policy_loss * loss_scale_factor | ||
| loss.backward() | ||
|
|
||
| print(f'*** Current tensor shape, input_ids {input_ids.shape}, response {response_mask.shape}, loss {loss}, loss_scale_factor {loss_scale_factor}') |
| # # run gc in a thread-safe way | ||
| # if gc_lock.acquire(blocking=False): | ||
| # try: | ||
| # gc.collect() | ||
| # finally: | ||
| # gc_lock.release() |
There was a problem hiding this comment.
This block of code for garbage collection is commented out without any explanation. If it's no longer necessary, it should be removed to improve code clarity. If it's disabled temporarily, a comment explaining the reason (e.g., performance issues, handled elsewhere) would be very helpful for future maintenance.
| requires-python = ">=3.10,<3.13" | ||
| dependencies = [ | ||
| "agentscope==1.0.8", | ||
| "agentscope==1.0.7", |
There was a problem hiding this comment.
d0dd93f to
60acbdf
Compare
No description provided.