Conversation
📝 Walkthrough

This PR refactors the ESM-2 TE model hierarchy.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Export as export.py
    participant Convert as convert.py
    participant HFModel as HF ESM2 Model
    participant TEModel as TE ESM2 Model
    participant StateMgmt as State Management
    User->>Export: export_hf_checkpoint(tag)
    Export->>HFModel: Load HF checkpoint
    HFModel-->>Export: model + config
    Export->>Convert: convert_esm_hf_to_te(model_hf)
    Convert->>TEModel: Initialize TE model (empty)
    Convert->>StateMgmt: extract state dicts
    StateMgmt->>StateMgmt: Apply QKV packing transforms
    StateMgmt->>StateMgmt: Apply embedding padding transforms
    StateMgmt->>StateMgmt: Apply FC/bias transforms
    StateMgmt->>TEModel: Load transformed state
    TEModel-->>Convert: Converted model
    Convert-->>Export: TE model ready
    Export->>Export: Save checkpoint + config
    Export->>Export: Smoke test load in bf16
    Export-->>User: Exported checkpoint path
```
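The QKV packing step in the export flow above can be sketched in isolation. This is a hedged illustration assuming the fused-attention layout wants the Q, K, and V projection weights concatenated along the output dimension; `pack_qkv` and the concatenation order are assumptions, not the actual `convert.py` transform:

```python
import numpy as np


def pack_qkv(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Fuse separate Q/K/V projection weights (each [hidden, hidden]) into
    one packed [3 * hidden, hidden] matrix, the layout fused-attention
    layers typically expect. Concatenation order is an assumption here."""
    return np.concatenate([q, k, v], axis=0)


hidden = 4
q = np.full((hidden, hidden), 1.0)
k = np.full((hidden, hidden), 2.0)
v = np.full((hidden, hidden), 3.0)
packed = pack_qkv(q, k, v)
assert packed.shape == (3 * hidden, hidden)
```

The real converter would apply this per encoder layer while renaming keys to the TE state-dict layout.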
```mermaid
sequenceDiagram
    participant Client as vLLM Client
    participant VLLM as vLLM Engine
    participant TEModel as TE ESM2 Model
    participant Embeddings as Embeddings Layer
    participant Encoder as TE Encoder
    Client->>VLLM: Create LLM instance (pooling mode)
    VLLM->>TEModel: Load checkpoint
    VLLM->>TEModel: Initialize model
    Client->>VLLM: embed(sequences)
    VLLM->>TEModel: forward(input_ids, attention_mask)
    TEModel->>Embeddings: Get token embeddings
    Embeddings-->>TEModel: Embedded tokens
    TEModel->>Encoder: Apply TE encoder layers
    Encoder-->>TEModel: Encoded hidden states
    TEModel->>TEModel: Extract last token + L2 normalize
    TEModel-->>VLLM: Pooled embeddings
    VLLM-->>Client: Normalized embeddings
```
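The pooling step in this diagram ("Extract last token + L2 normalize") can be approximated in plain NumPy. `pool_last_token` is illustrative only, not the model's actual forward implementation:

```python
import numpy as np


def pool_last_token(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick each sequence's last non-padded token and L2-normalize it.
    hidden_states: [batch, seq, dim]; attention_mask: [batch, seq] of 0/1."""
    last_idx = attention_mask.sum(axis=1) - 1  # index of the last real token
    pooled = hidden_states[np.arange(hidden_states.shape[0]), last_idx]
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)


h = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]])
emb = pool_last_token(h, mask)
assert emb.shape == (2, 4)
assert np.allclose(np.linalg.norm(emb, axis=1), 1.0)
```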
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ 3 passed checks
@gagank1: Are you working on a readme?
/ok to test

@gagank1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test

@gagank1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test c34c09b
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 5
♻️ Duplicate comments (3)

bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same `_tied_weights_keys` dict type concern. See comment on `modeling_esm_te.py`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` around lines 405 - 407, The class variable _tied_weights_keys uses the modern built-in generic dict[...] typing which may be incompatible with the rest of the codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is imported) or use typing.Mapping if immutability is desired, mirroring the fix applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str, str]] with the same key/value entries retained.

bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same `_tied_weights_keys` dict type concern as in `modeling_esm_te.py`. See comment on the canonical file — HF expects `list[str]`, not `dict[str, str]`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py` around lines 405 - 407, The _tied_weights_keys typed as ClassVar[dict[str, str]] should be changed to ClassVar[list[str]] to match HF expectations (same fix as in modeling_esm_te.py); replace the dict literal with a list of the relevant parameter names (e.g. ["lm_head.decoder.weight", "model.embeddings.word_embeddings.weight"]) and update any usages that assume dict semantics to use the list order or explicit pairing where needed.

bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same `_tied_weights_keys` dict type concern. See comment on `modeling_esm_te.py`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py` around lines 405 - 407, _tied_weights_keys is annotated as a mutable dict ClassVar which repeats the same typing concern as in modeling_esm_te.py; change the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]] from typing) and, to avoid accidental mutation, assign a read-only view (e.g., types.MappingProxyType({"lm_head.decoder.weight": "model.embeddings.word_embeddings.weight"})); update the import list to include typing.Mapping and types if not present and mirror the same pattern used/fixed in modeling_esm_te.py.
🧹 Nitpick comments (4)

bionemo-recipes/vllm/launch.sh (1)

50-50: `exec $DOCKER_CMD` is unquoted — word splitting will break paths with spaces. If `PROJECT_ROOT` contains spaces (e.g., `/home/user/my projects/...`), the `-v` argument will be incorrectly split into multiple tokens. Use a Bash array to avoid this:

🔧 Proposed fix (array-based approach)

Replace the string-based `DOCKER_CMD` with an array throughout the script:

```diff
-DOCKER_CMD="docker run -itd ..."
+DOCKER_CMD=("docker" "run" "-itd" "--gpus" "all" "--network" "host" "--ipc=host" "-e" "HF_TOKEN" "--rm" "--name" "${CONTAINER}_dev")

 if [ "$MOUNT_DIR" = true ]; then
   PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-  DOCKER_CMD="$DOCKER_CMD -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
+  DOCKER_CMD+=("-v" "${PROJECT_ROOT}:/workspace/bionemo-framework")
 fi

-DOCKER_CMD="$DOCKER_CMD $CONTAINER /bin/bash"
+DOCKER_CMD+=("$CONTAINER" "/bin/bash")

-exec $DOCKER_CMD
+exec "${DOCKER_CMD[@]}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/launch.sh` at line 50, The exec $DOCKER_CMD call uses an unquoted string which allows word-splitting and breaks mount paths with spaces (e.g., PROJECT_ROOT used in -v). Convert the string DOCKER_CMD into a Bash array (e.g., DOCKER_CMD=(docker run ...)) and update all places that build/append to DOCKER_CMD so they push elements into the array, then replace exec $DOCKER_CMD with exec "${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT value) is preserved; update any helper code that concatenates DOCKER_CMD to use array operations instead.

bionemo-recipes/vllm/Dockerfile (2)

2-3: Base image is hosted on an internal NVIDIA GitLab registry — not pullable outside NVIDIA. `gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel` requires internal network/credentials access. The commented-out `nvcr.io/nvidia/vllm:26.01-py3` alternative on line 1 is the publicly accessible equivalent. Once an NGC release with vLLM ≥ 0.14 becomes available, switching to the public image will make this recipe usable by external contributors without additional setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/Dockerfile` around lines 2 - 3, The Dockerfile's FROM line currently uses an internal image tag "gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is not pullable externally; replace that base image with the public equivalent "nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the FROM instruction so external contributors can build without internal credentials, and retain a short comment noting the vLLM>=0.14 requirement and to switch back when an official public image with the needed version is available.

30-30: Pin the `transformer_engine` version for reproducible builds. `pip install --no-build-isolation transformer_engine[pytorch]` with no version specifier will install whichever version is latest at build time. TE releases frequently and has had breaking API changes between major versions (e.g., 1.x → 2.x). A silent version bump can break the integration without any change to this file.

🔧 Proposed fix

```diff
-RUN pip install --no-build-isolation transformer_engine[pytorch]
+RUN pip install --no-build-isolation "transformer_engine[pytorch]==<verified_version>"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/Dockerfile` at line 30, The Dockerfile currently installs transformer_engine without a version pin (the RUN pip install --no-build-isolation transformer_engine[pytorch] line); change that to install a specific, tested TE release by updating that RUN to include an exact version specifier (for example: RUN pip install --no-build-isolation transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and won’t break on upstream major/minor bumps; optionally add a short comment noting the chosen compatible version.

bionemo-recipes/vllm/test_esm2_golden_values.py (1)

46-63: `sys.path.insert` + `os.chdir` is fragile for test infrastructure. `sys.path.insert(0, ...)` at module level (line 46) and `os.chdir` inside `fresh_export` make this script sensitive to working directory and import order. This is acceptable for a standalone validation script run manually inside a container, but consider adding a note that this is not designed to run as part of a pytest suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 46 - 63, The test module mutates import paths and working directory (sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile for pytest; update the file to document this by adding a clear module-level comment or docstring near sys.path.insert and a brief note on fresh_export explaining it intentionally changes cwd for export_hf_checkpoint and that the script is not intended to be run under pytest/parallel test runners (referencing sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively guard execution with a main-check so pytest won't import/run it implicitly.
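A small sketch of the two mitigations the comment names: restoring the working directory after use, and guarding execution with a main-check. `pushd` is a hypothetical helper name, not something in `test_esm2_golden_values.py`:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def pushd(path):
    """Temporarily change the working directory and always restore it,
    rather than a bare os.chdir that leaks state across tests."""
    prev = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev)


if __name__ == "__main__":  # main-guard so pytest collection never runs this body
    with tempfile.TemporaryDirectory() as tmp:
        start = os.getcwd()
        with pushd(tmp):
            assert os.path.samefile(os.getcwd(), tmp)
        assert os.getcwd() == start
```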
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 1-36: Add a non-root user and switch to it in the Dockerfile so
the container does not run as root; create a user/group (e.g., bionemo), chown
the application directory (/workspace/bionemo) and any cache/venv paths to that
user, and add a final USER bionemo line. Locate the Dockerfile sections around
WORKDIR /workspace/bionemo and COPY . . and insert user creation
(useradd/addgroup or groupadd) and chown before switching context, then add USER
bionemo at the end so subsequent runtime processes run unprivileged.
In `@bionemo-recipes/vllm/launch.sh`:
- Around line 36-40: The script hardcodes "--name vllm_dev" in DOCKER_CMD which
prevents multiple instances and misrepresents the positional $CONTAINER
argument; update the usage comment to clarify that the positional argument is
the image name (or image_name) and change the DOCKER_CMD assignments in the
HEADLESS branch to use a container name derived from $CONTAINER (for example
--name "$CONTAINER" or --name "${CONTAINER}_dev") instead of "vllm_dev" so the
image argument is also used as the container name and avoids name collisions.
In `@bionemo-recipes/vllm/README.md`:
- Around line 21-23: Change the fenced code block that currently uses the wrong
language tag; locate the block containing the shell command "python
test_esm2_golden_values.py" in the README and replace the opening fence language
identifier from ```python to ```bash so the command is treated as a shell
snippet and highlighted correctly.
- Line 17: The README's "or use `launch.sh`" is misleading because `launch.sh
--mount_dir` mounts the repo to `/workspace/bionemo-framework` while the manual
`docker run` mounts to `/workspace/bionemo`, causing `python
test_esm2_golden_values.py` to fail when run from WORKDIR; fix by either (A)
update `launch.sh` to mount the project root into `/workspace/bionemo` instead
of `/workspace/bionemo-framework` (adjust mount target and any downstream path
references in launch.sh), or (B) update README.md to explicitly document that
`launch.sh --mount_dir` mounts at `/workspace/bionemo-framework` and instruct
users to cd into the correct subdirectory (where `test_esm2_golden_values.py`
lives) before running the test; mention the affected files `launch.sh`,
`README.md`, and `test_esm2_golden_values.py` so reviewers can locate the
changes.
In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 147-210: The script currently only prints comparisons and never
fails; add assertions that enforce the golden-value tolerances using RTOL and
ATOL: for each pair in pairs (refer to the pairs list and variables a, b),
assert np.allclose(a, b, rtol=RTOL, atol=ATOL) (or equivalently assert
(np.abs(a-b) <= ATOL + RTOL * np.abs(b)).all()) and fail the test if not, and
also assert cosine_sim(a, b) exceeds a sensible threshold or that exact is True
when ATOL/RTOL are zero; add per-sequence assertions inside the per-sequence
loop to ensure each sequence max-diff <= ATOL + RTOL * max(|b_i|) so the test
fails on unacceptable drift.
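A minimal sketch of the tolerance check this comment asks for. `RTOL`, `ATOL`, and the cosine threshold are placeholder values, not the recipe's real tolerances:

```python
import numpy as np

RTOL, ATOL = 1e-3, 1e-5  # placeholder tolerances, not the recipe's real values


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def assert_close(a: np.ndarray, b: np.ndarray, min_cos: float = 0.999) -> None:
    """Fail loudly on drift instead of only printing a comparison."""
    assert np.allclose(a, b, rtol=RTOL, atol=ATOL), (
        f"max abs diff {np.abs(a - b).max():.3e} exceeds ATOL + RTOL * |b|"
    )
    assert cosine_sim(a, b) >= min_cos, "cosine similarity below threshold"


a = np.linspace(0.0, 1.0, 16)
b = a + 1e-6  # drift well inside the placeholder tolerances
assert_close(a, b)
```

A per-sequence loop would call `assert_close` on each pair so one drifting sequence cannot hide behind a batch-level average.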
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- bionemo-recipes/models/esm2/convert.py
- bionemo-recipes/models/esm2/export.py
- bionemo-recipes/models/esm2/modeling_esm_te.py
- bionemo-recipes/models/esm2/tests/test_cp_bshd.py
- bionemo-recipes/models/esm2/tests/test_cp_thd.py
- bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
- bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
- bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
- bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
- bionemo-recipes/recipes/esm2_native_te/train_ddp.py
- bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
- bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
- bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
- bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/vllm/Dockerfile
- bionemo-recipes/vllm/README.md
- bionemo-recipes/vllm/launch.sh
- bionemo-recipes/vllm/test_esm2_golden_values.py
/ok to test 60043cd

/ok to test c2d2351
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1473   +/-   ##
=======================================
  Coverage   76.45%   76.45%
=======================================
  Files         102      102
  Lines        7952     7952
=======================================
  Hits         6080     6080
  Misses       1872     1872
```
/ok to test b6a3bcf
Add a vLLM-based inference recipe for ESM2 models using TransformerEngine, including model conversion utilities, export scripts, and tests. Also updates NVEsmModel and related classes to support configurable add_pooling_layer via config and use self.model as the base_model_prefix for HuggingFace compatibility. Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
0a57a07 to f2bfbce (Compare)
/ok to test f2bfbce
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
/ok to test a9a6f75

@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 12
🧹 Nitpick comments (8)
.pre-commit-config.yaml (1)
32-58: Consider DRY: use YAML anchors to avoid duplicating the exclude pattern. The identical exclude pattern is duplicated between the `ruff` and `ruff-format` hooks. YAML anchors can reduce this duplication and ensure both hooks stay in sync.

♻️ Proposed refactor using YAML anchors

```diff
   - id: ruff
     # 1. Attempt to automatically fix any lint issues.
     args: ["--fix"]
     # Exclude check_copied_files destinations; they are verbatim copies
     # of source files and must not be reformatted independently.
-    exclude: |
+    exclude: &ruff_exclude |
       (?x)^(
         bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv\.py|
         ...
       )$
   - id: ruff-format
-    exclude: |
-      (?x)^(
-        bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv\.py|
-        ...
-      )$
+    exclude: *ruff_exclude
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.pre-commit-config.yaml around lines 32 - 58, The exclude regex is duplicated for the ruff and ruff-format hooks; create a single YAML anchor (e.g., &ruff_exclude) for that multiline exclude pattern and replace both exclude blocks with an alias (e.g., *ruff_exclude) so the pattern is defined once and referenced by the ruff and ruff-format hook entries in .pre-commit-config.yaml.

bionemo-recipes/recipes/esm2_native_te/tests/test_train.py (1)
296-309: Consider deduplicating FP8 log assertions into a shared helper. This block is effectively duplicated from the DDP FP8 stats test; extracting a helper would reduce maintenance cost for future log-path changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_native_te/tests/test_train.py` around lines 296 - 309, Extract the repeated FP8 log assertions into a shared helper (e.g., assert_fp8_logs(fp8_log_dir: Path, rank: int = 0)) and replace the duplicated block in test_train.py with a call to that helper; the helper should verify fp8_log_dir exists, the "rank_<rank>" directory, the "nvdlfw_inspect_logs" and "nvdlfw_inspect_statistics_logs" subdirectories, and that the two files named like "nvdlfw_inspect_globalrank-<rank>.log" (metadata_log and stats_log) both exist and are non-empty. Ensure the helper uses the same Path names (fp8_log_dir, metadata_log, stats_log) so callers are straightforward to update and put the helper in a shared test utilities module (or conftest) so other tests (the DDP FP8 stats test) can import and call it.

bionemo-recipes/recipes/vllm_inference/.ci_build.sh (1)
2-2: Add error handling for the `cd` command. If the `esm2` directory doesn't exist, the script will continue in the wrong directory and fail unexpectedly. This was also flagged by static analysis (SC2164).

Proposed fix

```diff
 #!/bin/bash -x
-cd esm2
+cd esm2 || exit 1
 PIP_CONSTRAINT= pip install -r requirements.txt
 ./install_vllm.sh
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/.ci_build.sh` at line 2, The script .ci_build.sh currently runs the command `cd esm2` without checking its result; update the script to handle failure of the `cd esm2` step (e.g., test that the directory exists or check the exit status) and abort with a clear error message and non-zero exit code if the chdir fails so the build doesn't continue in the wrong directory.

bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile (1)
1-24: Consider adding a non-root user for security. Trivy flagged that the image runs as root (DS-0002). While this may be acceptable for development containers, consider adding a non-root user for production deployments to follow security best practices.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile` around lines 1 - 24, Add a non-root user and switch to it before the final WORKDIR to avoid running the image as root: create a dedicated user/group in the Dockerfile (e.g., via RUN useradd/groupadd or adduser), chown the application directory (/workspace/bionemo) and any cache dirs used during build to that user, and set USER to that account before the final WORKDIR /workspace/bionemo. Ensure the conditional INSTALL_VLLM block still runs as root (keep it before creating/switching user) and only switch to the non-root user afterward so runtime processes no longer run as root.

bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh (1)
10-14: Add error handling for `cd` commands. If `/workspace` doesn't exist or `cd vllm` fails after cloning, the script will continue executing in the wrong directory, potentially causing unexpected behavior.

Proposed fix

```diff
-cd /workspace
+cd /workspace || { echo "ERROR: /workspace does not exist"; exit 1; }

 if [ ! -d vllm ]; then
   git clone --branch v0.15.1 --depth 1 https://github.com/vllm-project/vllm.git
 fi

-cd vllm
+cd vllm || { echo "ERROR: Failed to enter vllm directory"; exit 1; }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh` around lines 10 - 14, The script currently runs `cd /workspace` and `cd vllm` without checking for errors; update the install_vllm.sh flow to fail fast on directory errors by verifying `cd /workspace` succeeds (create /workspace with `mkdir -p` or exit with an error message if it cannot be accessed) and after cloning verify `cd vllm` returned successfully (if not, print a clear error and exit non‑zero). Apply these checks around the existing `cd /workspace` and `cd vllm` commands (or enable `set -euo pipefail` at the top) so the script doesn’t continue in the wrong directory.

bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py (3)
174-184: Mutating `self.config` may cause unintended side effects. Directly assigning to `self.config.layer_precision` modifies the config object, which could affect other components sharing the same config instance. Consider storing this as an instance attribute instead.

Suggested fix

```diff
         self.config = config
         self._fp8_recipe: transformer_engine.common.recipe.Recipe | None = fp8_recipe
         self._fp4_recipe: transformer_engine.common.recipe.Recipe | None = fp4_recipe
+        self._layer_precision = config.layer_precision

-        if self.config.layer_precision is None:
+        if self._layer_precision is None:
             if fp8_recipe is not None and fp4_recipe is not None:
                 raise RuntimeError("Both FP8 and FP4 recipes provided, but no layer precision provided.")
             if fp8_recipe is not None:
                 warnings.warn("No layer precision provided, using FP8 recipe for all layers.", UserWarning)
-                self.config.layer_precision = ["fp8"] * self.config.num_hidden_layers
+                self._layer_precision = ["fp8"] * self.config.num_hidden_layers
             elif fp4_recipe is not None:
                 raise RuntimeError(
                     "FP4 recipe provided but no layer_precision configured. "
                     "Set layer_precision explicitly when using FP4."
                 )

-        if self.config.layer_precision is not None and "fp4" in self.config.layer_precision and fp4_recipe is None:
+        if self._layer_precision is not None and "fp4" in self._layer_precision and fp4_recipe is None:
             raise RuntimeError("layer_precision contains 'fp4' entries but no fp4_recipe was provided.")
```

Then update `get_autocast_context` to use `self._layer_precision` instead of `self.config.layer_precision`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` around lines 174 - 184, Don't mutate the shared config by assigning to self.config.layer_precision; instead set an instance attribute (e.g., self._layer_precision) based on the existing logic (use ["fp8"] * self.config.num_hidden_layers when fp8_recipe is provided, raise for both recipes or for fp4 without explicit layer_precision), and then update get_autocast_context to read from self._layer_precision instead of self.config.layer_precision so other components sharing the config are not affected.
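The instance-attribute pattern this comment suggests can be sketched generically; `PrecisionPolicy`, `Cfg`, and their fields are illustrative names, not the actual classes:

```python
class Cfg:
    """Stand-in for the HF config object shared across components."""
    num_hidden_layers = 3
    layer_precision = None


class PrecisionPolicy:
    """Keep the derived precision list on the instance instead of writing
    it back to the shared config (all names here are illustrative)."""

    def __init__(self, config, fp8_recipe=None):
        self.config = config
        # Derive locally; never mutate the shared config object.
        if getattr(config, "layer_precision", None) is not None:
            self._layer_precision = list(config.layer_precision)
        elif fp8_recipe is not None:
            self._layer_precision = ["fp8"] * config.num_hidden_layers
        else:
            self._layer_precision = None


cfg = Cfg()
p = PrecisionPolicy(cfg, fp8_recipe=object())
assert p._layer_precision == ["fp8", "fp8", "fp8"]
assert cfg.layer_precision is None  # shared config left untouched
```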
696-696: Extract duplicated magic number to a class constant. `mask_ratio_train = 0.15 * 0.8` is defined identically in both `_apply_token_dropout_bshd` and `_apply_token_dropout_thd`. Consider defining this as a class constant to follow DRY principles.

Suggested fix

```diff
 class NVEsmEmbeddings(nn.Module):
     """Modified version of EsmEmbeddings to support THD inputs."""

+    # Mask ratio used during ESM model training (15% masking * 80% replacement)
+    _MASK_RATIO_TRAIN = 0.15 * 0.8
+
     def __init__(self, config):
```

Then replace both occurrences:

```diff
-        mask_ratio_train = 0.15 * 0.8  # Hardcoded as the ratio used in all ESM model training runs
+        mask_ratio_train = self._MASK_RATIO_TRAIN
```

Also applies to: 717-717

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` at line 696, The duplicated magic number (0.15 * 0.8) used for mask_ratio_train in _apply_token_dropout_bshd and _apply_token_dropout_thd should be extracted to a class-level constant (e.g., MASK_RATIO_TRAIN) on the model class (the class that defines _apply_token_dropout_bshd/_apply_token_dropout_thd) and both functions should reference that constant instead of recalculating the value inline; add the constant near other class constants, replace the inline expressions in _apply_token_dropout_bshd and _apply_token_dropout_thd with the new class constant, and ensure any imports/uses within the class access it as self.MASK_RATIO_TRAIN or ClassName.MASK_RATIO_TRAIN as appropriate.
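The constant-hoisting suggestion looks like this in miniature; `TokenDropout` and its methods are illustrative stand-ins for the real embedding class, and the scaling formula mirrors ESM's token-dropout correction:

```python
class TokenDropout:
    """Sketch of hoisting the duplicated magic number into a class constant;
    TokenDropout is an illustrative stand-in, not the real NVEsmEmbeddings."""

    # 15% of tokens masked, 80% of those replaced with <mask> during ESM training
    MASK_RATIO_TRAIN = 0.15 * 0.8

    def scale_bshd(self, observed_mask_ratio: float) -> float:
        # Both layouts now share one definition instead of re-deriving 0.15 * 0.8.
        return (1.0 - self.MASK_RATIO_TRAIN) / (1.0 - observed_mask_ratio)

    def scale_thd(self, observed_mask_ratio: float) -> float:
        return (1.0 - self.MASK_RATIO_TRAIN) / (1.0 - observed_mask_ratio)


td = TokenDropout()
assert abs(TokenDropout.MASK_RATIO_TRAIN - 0.12) < 1e-9
assert td.scale_bshd(0.12) == td.scale_thd(0.12)
```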
80-80: Confusing default value for `padded_vocab_size`. The default of `64` contradicts the docstring which says "If not provided, defaults to vocab_size." This will cause unexpected behavior for models with larger vocabularies. Consider using `None` as the default to match the documented behavior.

Suggested fix

```diff
-    padded_vocab_size: Optional[int] = 64,
+    padded_vocab_size: Optional[int] = None,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` at line 80, The parameter padded_vocab_size currently defaults to 64; change its signature to padded_vocab_size: Optional[int] = None (and update any type hints) and ensure the constructor/initialization logic in the class/function (where padded_vocab_size is used) treats None by setting padded_vocab_size = vocab_size so behavior matches the docstring; also update any downstream uses that assume an int to handle the None-to-vocab_size assignment.
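The None-as-sentinel fix can be shown in a tiny standalone form; `resolve_padded_vocab` is a hypothetical helper, not the model's API:

```python
from typing import Optional


def resolve_padded_vocab(vocab_size: int, padded_vocab_size: Optional[int] = None) -> int:
    """None-as-sentinel pattern the comment suggests: default to vocab_size
    when the caller does not request padding, instead of a surprising 64."""
    return vocab_size if padded_vocab_size is None else padded_vocab_size


assert resolve_padded_vocab(33) == 33       # matches the documented default
assert resolve_padded_vocab(33, 64) == 64   # explicit padding still honored
```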
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.pre-commit-config.yaml:
- Around line 44-45: The regex used to exclude files under
bionemo-recipes/models/(llama3|mixtral)/tests/common/ only matches the directory
itself, not files inside it; update the exclusion patterns in
.pre-commit-config.yaml for the ruff and ruff-format hooks (the entries
containing the string "bionemo-recipes/models/(llama3|mixtral)/tests/common/")
to append ".*" so they match files within that directory (e.g., change the
pattern to end with "/tests/common/.*").
In `@bionemo-recipes/models/esm2/modeling_esm_te.py`:
- Around line 414-416: The current fallback for add_pooling_layer in the model
init uses getattr(config, "add_pooling_layer", True) which conflicts with
NVEsmConfig's declared default False and the docstring; change the getattr
default to False so the fallback is getattr(config, "add_pooling_layer", False)
(keep the outer check if add_pooling_layer is None), and update the
function/class docstring where it states the default (currently "defaults to
True") to reflect the False default; reference symbols: add_pooling_layer,
config, NVEsmConfig, and the initializer in modeling_esm_te.py.
In `@bionemo-recipes/recipes/vllm_inference/.ci_test_env.sh`:
- Around line 1-2: The script lacks a shebang and blindly runs the cd command;
add a proper shebang (e.g., #!/usr/bin/env bash) at the top and guard the cd by
checking that the target directory exists and is a directory before changing
into it (for example test -d "esm2" || exit 1, or use if [ -d "esm2" ]; then cd
"esm2"; else echo "esm2 not found" >&2; exit 1; fi) so the script fails fast
instead of continuing in the wrong directory; update the file around the
existing cd esm2 invocation accordingly.
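A sketch of the guarded pattern, demonstrated in a scratch directory so it is self-contained; in the real script the guard simply wraps the existing `cd esm2` invocation:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch setup for the demonstration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p esm2

# Fail fast if the expected directory is missing instead of continuing
# in the wrong working directory.
if [ -d "esm2" ]; then
  cd "esm2"
else
  echo "esm2 not found" >&2
  exit 1
fi
echo "now in $(basename "$PWD")"
```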
In `@bionemo-recipes/recipes/vllm_inference/esm2/convert.py`:
- Around line 110-129: The call to output_model.post_init() is redundant because
state.apply_transforms(...) already performs weight tying (see
state.apply_transforms) and related init work; remove the
output_model.post_init() call after apply_transforms, or if you only want to
ensure tied weights explicitly, replace it with a direct call to
output_model.tie_weights() instead of post_init; update the code around the
EsmForMaskedLM construction and the output_model variable accordingly.
In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile`:
- Around line 16-21: The Dockerfile runs the vLLM build steps (git clone vllm,
python use_existing_torch.py, uv pip install -r requirements/build.txt --system,
uv pip install --no-build-isolation -e . --system, pip install --upgrade
"transformers[torch]") but does not pass TORCH_CUDA_ARCH_LIST and MAX_JOBS like
install_vllm.sh; fix by exporting or prefixing TORCH_CUDA_ARCH_LIST and MAX_JOBS
with appropriate values (matching install_vllm.sh) for the uv pip install -r
requirements/build.txt --system and uv pip install --no-build-isolation -e .
--system commands (and the final pip install if needed) so the build targets the
correct CUDA arch and uses the desired parallelism.
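A hypothetical Dockerfile fragment showing where the variables would go; the actual arch list and job count are not shown in this review and should be copied from `install_vllm.sh` so both builds stay in sync:

```dockerfile
# Placeholders: take the real values from install_vllm.sh.
ENV TORCH_CUDA_ARCH_LIST="<arch list from install_vllm.sh>" \
    MAX_JOBS="<job count from install_vllm.sh>"
RUN uv pip install -r requirements/build.txt --system && \
    uv pip install --no-build-isolation -e . --system
```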
In `@bionemo-recipes/recipes/vllm_inference/esm2/export.py`:
- Around line 70-121: The export_hf_checkpoint function loads local assets using
bare relative paths which break when the caller's CWD is different; change all
asset references in export_hf_checkpoint
(AutoTokenizer.from_pretrained("esm_fast_tokenizer"),
open("model_readme.template"), shutil.copy("modeling_esm_te.py", ...),
shutil.copy("LICENSE", ...)) to resolve them relative to the recipe file by
computing root = Path(__file__).resolve().parent and then using root /
"esm_fast_tokenizer", root / "model_readme.template", root /
"modeling_esm_te.py", and root / "LICENSE" so the tokenizer, template, and
copied files are always loaded from the recipe package rather than the current
working directory or the Hub.
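The resolution step can be sketched as follows; the asset filenames mirror the review comment:

```python
from pathlib import Path

# Resolve recipe assets relative to this module file instead of the
# caller's current working directory.
root = Path(__file__).resolve().parent

tokenizer_dir = root / "esm_fast_tokenizer"
template_path = root / "model_readme.template"
license_path = root / "LICENSE"

# e.g. AutoTokenizer.from_pretrained(tokenizer_dir) and open(template_path)
# would then work regardless of where the script is invoked from.
print(tokenizer_dir.is_absolute())  # True
```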
In `@bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template`:
- Around line 91-94: The "Runtime Engine(s):" section currently lists only
"Hugging Face Transformers" but this is a vllm_inference template and should
explicitly list vLLM (and include Transformers only if both are supported);
update the "Runtime Engine(s):" block in the vllm_inference
model_readme.template to include "vLLM" and, if the recipe supports both, list
"vLLM, Hugging Face Transformers" (apply the same change at the second
occurrence referenced in the comment).
- Around line 18-20: The description for "ESM-2" in model_readme.template
overstates its capability as directly performing protein structure prediction;
update the wording to say ESM-2 is a protein language/embedding model trained
with a masked LM objective that produces sequence embeddings useful for
downstream tasks (including structure-related analyses) rather than claiming it
predicts 3D structures directly; change the paragraph referencing "ESM-2" and
the similar sentence at the other occurrence (line ~44) to this softened
phrasing so the README correctly presents the model as an embedding/language
model for downstream use.
In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py`:
- Around line 140-143: The assertion checking padded_vocab_size vs vocab_size
should be converted to an explicit runtime check: replace the assert in the
block that references self.padded_vocab_size and self.vocab_size with an
if-condition that raises a ValueError when padded_vocab_size is less than
vocab_size, using the same descriptive message (e.g., "padded_vocab_size (...)
must be greater than or equal to vocab_size (...)"); update the check in the
method or initializer where these attributes are set (the code block containing
self.padded_vocab_size and self.vocab_size) so the validation cannot be bypassed
by Python optimizations.
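A minimal sketch of the conversion; unlike an `assert`, the explicit `ValueError` survives `python -O` (the standalone function name here is illustrative — the real check lives in the initializer):

```python
def validate_vocab_sizes(padded_vocab_size: int, vocab_size: int) -> None:
    """Raise instead of asserting so the check cannot be optimized away."""
    if padded_vocab_size < vocab_size:
        raise ValueError(
            f"padded_vocab_size ({padded_vocab_size}) must be greater than "
            f"or equal to vocab_size ({vocab_size})"
        )


validate_vocab_sizes(64, 33)  # passes silently
try:
    validate_vocab_sizes(16, 33)
except ValueError as err:
    print(err)
```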
In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py`:
- Around line 1-20: Add a top-level Google-style module docstring to conftest.py
describing the module's purpose (e.g., pytest configuration and test path setup)
to satisfy pydocstyle; place a triple-quoted docstring at the very top of the
file that briefly summarizes what conftest.py does (adjusts sys.path and
provides test fixtures/config) and include any relevant notes or authorship per
project conventions.
- Line 20: Replace the sys.path.append call in conftest.py with
sys.path.insert(0, Path(__file__).parent.parent.as_posix()) so the local esm2
package is prioritized during tests; this ensures the import in test_vllm.py
(from export import export_hf_checkpoint) resolves to the local source rather
than an installed package and you can locate the call by searching for
sys.path.append and Path(__file__).parent.parent.as_posix() in conftest.py.
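The suggested one-line change, shown in context:

```python
import sys
from pathlib import Path

# insert(0, ...) prepends the recipe directory so the local esm2 sources
# shadow any installed package of the same name during test collection;
# append() would only be consulted after every installed package.
sys.path.insert(0, Path(__file__).parent.parent.as_posix())
```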
In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py`:
- Around line 62-131: Add a module-level skip marker so all tests are skipped
when CUDA isn't available: detect CUDA availability (torch.cuda.is_available())
and set pytestmark = pytest.mark.skipif(not torch.cuda.is_available(),
reason="requires CUDA") at the top of the module; this prevents session-scoped
fixtures hf_exported_embeddings and hf_reference_embeddings from calling
_hf_embed (which .to("cuda")s models/tensors) on CPU-only runners and avoids
failures during fixture setup.
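A sketch of the module-level marker; the guarded import is only so the snippet runs where `torch` is absent — the real `conftest.py`/test module can import `torch` directly:

```python
import pytest

try:
    import torch
    _has_cuda = torch.cuda.is_available()
except ImportError:
    _has_cuda = False

# pytest applies a module-level `pytestmark` to every test in the file,
# so session-scoped CUDA fixtures never execute on CPU-only runners.
pytestmark = pytest.mark.skipif(not _has_cuda, reason="requires CUDA")
```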
---
Nitpick comments:
In @.pre-commit-config.yaml:
- Around line 32-58: The exclude regex is duplicated for the ruff and
ruff-format hooks; create a single YAML anchor (e.g., &ruff_exclude) for that
multiline exclude pattern and replace both exclude blocks with an alias (e.g.,
*ruff_exclude) so the pattern is defined once and referenced by the ruff and
ruff-format hook entries in .pre-commit-config.yaml.
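One way to do this without adding a non-schema top-level key is to define the anchor on the first `exclude:` and alias it on the second (the `rev` below is a placeholder):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: <pinned rev>
    hooks:
      - id: ruff
        exclude: &ruff_exclude |
          (?x)^(
              bionemo-recipes/models/(llama3|mixtral)/tests/common/.*
          )$
      - id: ruff-format
        exclude: *ruff_exclude
```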
In `@bionemo-recipes/recipes/esm2_native_te/tests/test_train.py`:
- Around line 296-309: Extract the repeated FP8 log assertions into a shared
helper (e.g., assert_fp8_logs(fp8_log_dir: Path, rank: int = 0)) and replace the
duplicated block in test_train.py with a call to that helper; the helper should
verify fp8_log_dir exists, the "rank_<rank>" directory, the
"nvdlfw_inspect_logs" and "nvdlfw_inspect_statistics_logs" subdirectories, and
that the two files named like "nvdlfw_inspect_globalrank-<rank>.log"
(metadata_log and stats_log) both exist and are non-empty. Ensure the helper
uses the same Path names (fp8_log_dir, metadata_log, stats_log) so callers are
straightforward to update and put the helper in a shared test utilities module
(or conftest) so other tests (the DDP FP8 stats test) can import and call it.
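A sketch of the hypothetical helper, following the directory layout and names from the review comment:

```python
from pathlib import Path


def assert_fp8_logs(fp8_log_dir: Path, rank: int = 0) -> None:
    """Verify the FP8 debug log layout produced for a given rank."""
    assert fp8_log_dir.exists(), f"missing {fp8_log_dir}"
    rank_dir = fp8_log_dir / f"rank_{rank}"
    assert rank_dir.is_dir(), f"missing {rank_dir}"

    log_name = f"nvdlfw_inspect_globalrank-{rank}.log"
    metadata_log = rank_dir / "nvdlfw_inspect_logs" / log_name
    stats_log = rank_dir / "nvdlfw_inspect_statistics_logs" / log_name
    for log in (metadata_log, stats_log):
        assert log.is_file(), f"missing {log}"
        assert log.stat().st_size > 0, f"empty {log}"
```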
In `@bionemo-recipes/recipes/vllm_inference/.ci_build.sh`:
- Line 2: The script .ci_build.sh currently runs the command `cd esm2` without
checking its result; update the script to handle failure of the `cd esm2` step
(e.g., test that the directory exists or check the exit status) and abort with a
clear error message and non-zero exit code if the chdir fails so the build
doesn't continue in the wrong directory.
In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile`:
- Around line 1-24: Add a non-root user and switch to it before the final
WORKDIR to avoid running the image as root: create a dedicated user/group in the
Dockerfile (e.g., via RUN useradd/groupadd or adduser), chown the application
directory (/workspace/bionemo) and any cache dirs used during build to that
user, and set USER to that account before the final WORKDIR /workspace/bionemo.
Ensure the conditional INSTALL_VLLM block still runs as root (keep it before
creating/switching user) and only switch to the non-root user afterward so
runtime processes no longer run as root.
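A hypothetical fragment for the end of the Dockerfile; the user name and UID/GID are placeholders, and the root-only INSTALL_VLLM block stays above this point:

```dockerfile
# Create a dedicated user, hand over the app directory, then switch.
RUN groupadd --gid 1000 bionemo && \
    useradd --uid 1000 --gid bionemo --create-home bionemo && \
    chown -R bionemo:bionemo /workspace/bionemo
USER bionemo
WORKDIR /workspace/bionemo
```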
In `@bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh`:
- Around line 10-14: The script currently runs `cd /workspace` and `cd vllm`
without checking for errors; update the install_vllm.sh flow to fail fast on
directory errors by verifying `cd /workspace` succeeds (create /workspace with
`mkdir -p` or exit with an error message if it cannot be accessed) and after
cloning verify `cd vllm` returned successfully (if not, print a clear error and
exit non‑zero). Apply these checks around the existing `cd /workspace` and `cd
vllm` commands (or enable `set -euo pipefail` at the top) so the script doesn’t
continue in the wrong directory.
In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py`:
- Around line 174-184: Don't mutate the shared config by assigning to
self.config.layer_precision; instead set an instance attribute (e.g.,
self._layer_precision) based on the existing logic (use ["fp8"] *
self.config.num_hidden_layers when fp8_recipe is provided, raise for both
recipes or for fp4 without explicit layer_precision), and then update
get_autocast_context to read from self._layer_precision instead of
self.config.layer_precision so other components sharing the config are not
affected.
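A minimal sketch of resolving the precision onto the instance so the shared config object is never mutated; the class and config below are illustrative stand-ins for the real model:

```python
class _Encoder:
    def __init__(self, config, fp8_recipe=None, fp4_recipe=None):
        self.config = config
        if fp8_recipe is not None and fp4_recipe is not None:
            raise ValueError("pass at most one of fp8_recipe / fp4_recipe")
        if fp8_recipe is not None:
            self._layer_precision = ["fp8"] * config.num_hidden_layers
        elif fp4_recipe is not None and config.layer_precision is None:
            raise ValueError("fp4 requires an explicit layer_precision")
        else:
            self._layer_precision = config.layer_precision

    def get_autocast_context(self):
        # Reads the instance attribute, not self.config.layer_precision.
        return self._layer_precision


class _Cfg:
    num_hidden_layers = 2
    layer_precision = None


cfg = _Cfg()
enc = _Encoder(cfg, fp8_recipe=object())
print(enc.get_autocast_context())  # ['fp8', 'fp8']
print(cfg.layer_precision)         # None: shared config left untouched
```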
- Line 696: The duplicated magic number (0.15 * 0.8) used for mask_ratio_train
in _apply_token_dropout_bshd and _apply_token_dropout_thd should be extracted to
a class-level constant (e.g., MASK_RATIO_TRAIN) on the model class (the class
that defines _apply_token_dropout_bshd/_apply_token_dropout_thd) and both
functions should reference that constant instead of recalculating the value
inline; add the constant near other class constants, replace the inline
expressions in _apply_token_dropout_bshd and _apply_token_dropout_thd with the
new class constant, and ensure any imports/uses within the class access it as
self.MASK_RATIO_TRAIN or ClassName.MASK_RATIO_TRAIN as appropriate.
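A sketch of hoisting the duplicated value into a class-level constant; the method bodies here are illustrative, since the real methods rescale embedding tensors:

```python
class _TokenDropout:
    # 15% of tokens are masked during training; 80% of those become <mask>.
    MASK_RATIO_TRAIN = 0.15 * 0.8

    def _apply_token_dropout_bshd(self, scale: float) -> float:
        return scale / (1 - self.MASK_RATIO_TRAIN)

    def _apply_token_dropout_thd(self, scale: float) -> float:
        return scale / (1 - self.MASK_RATIO_TRAIN)
```

Both methods now read the same constant, so a future change to the ratio happens in exactly one place.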
- Line 80: The parameter padded_vocab_size currently defaults to 64; change its
signature to padded_vocab_size: Optional[int] = None (and update any type hints)
and ensure the constructor/initialization logic in the class/function (where
padded_vocab_size is used) treats None by setting padded_vocab_size = vocab_size
so behavior matches the docstring; also update any downstream uses that assume
an int to handle the None-to-vocab_size assignment.
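The None-to-vocab_size resolution can be sketched as a standalone helper (the function name is illustrative; in the real code this logic sits in the constructor):

```python
from typing import Optional


def resolve_padded_vocab_size(
    vocab_size: int, padded_vocab_size: Optional[int] = None
) -> int:
    """Treat None as 'no extra padding', per the docstring."""
    if padded_vocab_size is None:
        padded_vocab_size = vocab_size
    if padded_vocab_size < vocab_size:
        raise ValueError("padded_vocab_size must be >= vocab_size")
    return padded_vocab_size


print(resolve_padded_vocab_size(33))      # 33
print(resolve_padded_vocab_size(33, 64))  # 64
```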
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e2413d6e-00ae-495d-aa4b-c6737be37029
📒 Files selected for processing (41)
.pre-commit-config.yaml
.secrets.baseline
bionemo-recipes/models/esm2/README.md
bionemo-recipes/models/esm2/convert.py
bionemo-recipes/models/esm2/export.py
bionemo-recipes/models/esm2/modeling_esm_te.py
bionemo-recipes/models/esm2/tests/test_cp_bshd.py
bionemo-recipes/models/esm2/tests/test_cp_thd.py
bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
bionemo-recipes/recipes/esm2_native_te/fp4_debugging_stats.yaml
bionemo-recipes/recipes/esm2_native_te/modeling_esm_te.py
bionemo-recipes/recipes/esm2_native_te/quantization.py
bionemo-recipes/recipes/esm2_native_te/tests/test_quantization.py
bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
bionemo-recipes/recipes/esm2_native_te/tests/test_train.py
bionemo-recipes/recipes/esm2_native_te/train_ddp.py
bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
bionemo-recipes/recipes/vllm_inference/.ci_build.sh
bionemo-recipes/recipes/vllm_inference/.ci_test_env.sh
bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile
bionemo-recipes/recipes/vllm_inference/esm2/LICENSE
bionemo-recipes/recipes/vllm_inference/esm2/README.md
bionemo-recipes/recipes/vllm_inference/esm2/convert.py
bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/special_tokens_map.json
bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/tokenizer.json
bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/tokenizer_config.json
bionemo-recipes/recipes/vllm_inference/esm2/export.py
bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh
bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template
bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py
bionemo-recipes/recipes/vllm_inference/esm2/requirements.txt
bionemo-recipes/recipes/vllm_inference/esm2/state.py
bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py
bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py
ci/scripts/check_copied_files.py
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
/ok to test dcaa6c7
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
/ok to test 13046cb
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
# Conflicts:
#   .secrets.baseline

/ok to test 7209d43
Description
This PR makes the ESM2 model compatible with vLLM. The primary issues were a naming incompatibility (vLLM expects the `model.` prefix while ESM2 uses `esm.`) and `NVEsmModel` defaulting to `add_pooling_layer=True` when loading the checkpoint, even though the checkpoint is exported without pooler weights.

Usage

Run `python test_esm2_golden_values.py` from inside the container; instructions to build and run it are provided.

Type of changes
Triggering Code Rabbit AI Review
To trigger a code review from code rabbit, comment on a pull request with one of these commands:
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores