ESM2 changes to work with vLLM#1473

gagank1 wants to merge 7 commits into main from gkaushik/esm2-vllm

Conversation

@gagank1
Collaborator

@gagank1 gagank1 commented Feb 18, 2026

Description

This PR makes the ESM2 model compatible with vLLM. The primary issues were a naming incompatibility (vLLM expects a model. prefix while ESM2 uses esm.) and that NVEsmModel defaults to add_pooling_layer=True when loading the checkpoint, even though it is exported without pooler weights.
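The prefix mismatch described above can be illustrated with a small, hypothetical key-remapping helper (the function name and signature are illustrative, not the PR's actual utilities):

```python
def remap_prefix(state_dict: dict, old: str = "esm.", new: str = "model.") -> dict:
    """Rename checkpoint keys so weights saved under one base-model prefix
    load into a module tree that uses a different prefix."""
    return {
        (new + key[len(old):]) if key.startswith(old) else key: value
        for key, value in state_dict.items()
    }
```

With this sketch, a key like "esm.encoder.layer.0.weight" would be renamed to "model.encoder.layer.0.weight" while keys outside the prefix are left untouched.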

Usage

Run python test_esm2_golden_values.py from inside the container; instructions to build and run the container are provided.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of its review commands. See https://docs.coderabbit.ai/reference/review-commands for the full list of commands.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Summary by CodeRabbit

  • New Features

    • Added vLLM inference recipe for ESM2 models with containerized deployment support.
    • Added configurable pooling layer option to ESM2 model initialization.
  • Bug Fixes

    • Improved state dictionary filtering and checkpoint export to prevent shape mismatches in downstream applications.
  • Documentation

    • Added vLLM inference README with installation and usage examples.
    • Added Apache License 2.0 for vLLM recipe components.
  • Chores

    • Updated pre-commit configuration with additional file exclusions.
    • Enhanced FP8 statistics logging verification in tests.

@gagank1 gagank1 self-assigned this Feb 18, 2026
@copy-pr-bot

copy-pr-bot bot commented Feb 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3be83f97-4be6-47db-a046-23a8ab2e5e19

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


This PR refactors the ESM-2 TE model hierarchy from esm to model namespace, adds a configurable pooling layer option to NVEsmConfig, introduces comprehensive state transformation utilities for checkpoint conversion, and adds a complete vLLM inference recipe with bidirectional HF-to-TE conversion support.

Changes

  • Model Refactoring — bionemo-recipes/models/esm2/modeling_esm_te.py, bionemo-recipes/recipes/esm2_native_te/modeling_esm_te.py, bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py, bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
    Updated base_model_prefix from "esm" to "model", added an add_pooling_layer config option (default False), changed internal self.esm references to self.model, updated weight-tying keys to "model.embeddings.word_embeddings.weight", and extended state_dict filtering to exclude .inv_freq keys.
  • Convert & Export Utilities — bionemo-recipes/models/esm2/convert.py, bionemo-recipes/models/esm2/export.py, bionemo-recipes/recipes/vllm_inference/esm2/convert.py, bionemo-recipes/recipes/vllm_inference/esm2/export.py
    Updated mapping keys from esm.* to model.* for encoder layers and embeddings; adjusted QKV packing/unpacking and embedding padding transforms to reference the new paths; added a padded_vocab_size=None parameter; created new vLLM export utilities with HF checkpoint export and model metadata handling.
  • State Transformation Module — bionemo-recipes/recipes/vllm_inference/esm2/state.py
    Introduced comprehensive state-dict transformation utilities, including a TransformCTX context holder, a StateDictTransform class, a decorator-based state_transform factory, and TransformFns with QKV/FC/padding operations to enable flexible bidirectional model conversions.
  • Test Updates — bionemo-recipes/models/esm2/tests/test_*.py, bionemo-recipes/recipes/esm2_native_te/tests/test_*.py
    Updated references from model.esm.encoder to model.model.encoder across nondistributed and distributed test paths; updated state_dict key comparisons to align with the new namespace; expanded FP8 statistics logging verification.
  • vLLM Recipe Infrastructure — bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile, bionemo-recipes/recipes/vllm_inference/esm2/.ci_build.sh, bionemo-recipes/recipes/vllm_inference/esm2/.ci_test_env.sh, bionemo-recipes/recipes/vllm_inference/esm2/requirements.txt, bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh
    Added Docker build configuration with conditional vLLM installation, CI build/test environment scripts, and build requirements including transformer_engine, accelerate, datasets, and PEFT.
  • vLLM Recipe Configs & Tokenizer — bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/..., bionemo-recipes/recipes/vllm_inference/esm2/tokenizer_config.json, bionemo-recipes/models/esm2/esm_fast_tokenizer/...
    Added the complete ESM-2 fast tokenizer vocabulary mapping, special tokens configuration (<cls>, <pad>, <eos>, <unk>, <mask>), and a tokenizer_config with model_max_length and input_names.
  • vLLM Recipe Documentation & Models — bionemo-recipes/recipes/vllm_inference/esm2/README.md, bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template, bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py
    Added a comprehensive README documenting vLLM inference setup, a model-card template with deployment info and hardware requirements, and a TE-optimized ESM implementation with config validation, rotary embeddings, FP8/FP4 autocast support, and THD/BSHD input handling.
  • vLLM Recipe Tests & License — bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py, bionemo-recipes/recipes/vllm_inference/esm2/LICENSE, bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py
    Added vLLM golden tests comparing embeddings across vLLM, HF exported, and HF reference checkpoints; included the Apache 2.0 license; added test configuration.
  • Training Script Updates — bionemo-recipes/recipes/esm2_native_te/train_ddp.py, bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py, bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py, bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py, bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
    Introduced backward-compatible base model attribute detection using conditional logic (model.model if hasattr(model, "model") else model.esm) to handle both wrapped and direct model structures.
  • Configuration & Metadata — .pre-commit-config.yaml, .secrets.baseline, bionemo-recipes/models/esm2/README.md, bionemo-recipes/recipes/esm2_native_te/fp4_debugging_stats.yaml, ci/scripts/check_copied_files.py
    Updated pre-commit exclude patterns for generated/copied files; updated secrets baseline line numbers; added a vLLM recipe documentation link; updated YAML layer selection patterns from model.esm to model.model; added file synchronization mappings for vLLM recipe components.
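The backward-compatible attribute lookup used in the training scripts can be sketched as a small helper (a sketch of the hasattr pattern quoted above; the helper name is illustrative, not the recipes' exact code):

```python
from types import SimpleNamespace


def get_base_model(model):
    # New checkpoints expose the encoder as `model.model`; older ones as `model.esm`.
    return model.model if hasattr(model, "model") else model.esm
```

This lets a single code path handle checkpoints produced before and after the esm-to-model rename.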

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Export as export.py
    participant Convert as convert.py
    participant HFModel as HF ESM2 Model
    participant TEModel as TE ESM2 Model
    participant StateMgmt as State Management

    User->>Export: export_hf_checkpoint(tag)
    Export->>HFModel: Load HF checkpoint
    HFModel-->>Export: model + config
    Export->>Convert: convert_esm_hf_to_te(model_hf)
    Convert->>TEModel: Initialize TE model (empty)
    Convert->>StateMgmt: extract state dicts
    StateMgmt->>StateMgmt: Apply QKV packing transforms
    StateMgmt->>StateMgmt: Apply embedding padding transforms
    StateMgmt->>StateMgmt: Apply FC/bias transforms
    StateMgmt->>TEModel: Load transformed state
    TEModel-->>Convert: Converted model
    Convert-->>Export: TE model ready
    Export->>Export: Save checkpoint + config
    Export->>Export: Smoke test load in bf16
    Export-->>User: Exported checkpoint path
sequenceDiagram
    participant Client as vLLM Client
    participant VLLM as vLLM Engine
    participant TEModel as TE ESM2 Model
    participant Embeddings as Embeddings Layer
    participant Encoder as TE Encoder

    Client->>VLLM: Create LLM instance (pooling mode)
    VLLM->>TEModel: Load checkpoint
    VLLM->>TEModel: Initialize model
    Client->>VLLM: embed(sequences)
    VLLM->>TEModel: forward(input_ids, attention_mask)
    TEModel->>Embeddings: Get token embeddings
    Embeddings-->>TEModel: Embedded tokens
    TEModel->>Encoder: Apply TE encoder layers
    Encoder-->>TEModel: Encoded hidden states
    TEModel->>TEModel: Extract last token + L2 normalize
    TEModel-->>VLLM: Pooled embeddings
    VLLM-->>Client: Normalized embeddings
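The pooling step in the diagram ("Extract last token + L2 normalize") can be sketched in NumPy (a simplified stand-in for the recipe's actual pooler; the shapes and the lengths argument are assumptions):

```python
import numpy as np


def pool_last_token_l2(hidden: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """hidden: (batch, seq, dim); lengths: number of real (non-pad) tokens per row."""
    # Pick each sequence's last non-padding token as its embedding.
    last = hidden[np.arange(hidden.shape[0]), lengths - 1]  # (batch, dim)
    # L2-normalize so every returned embedding has unit length.
    norms = np.linalg.norm(last, axis=-1, keepdims=True)
    return last / np.clip(norms, 1e-12, None)
```

The clip guards against a zero-norm row; the real implementation may handle padding and normalization differently.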

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • jstjohn
  • yzhang123
  • dorotat-nv
  • trvachov
  • broland-hat
  • polinabinder1

Poem

🐰 From esm to model the namespace springs,
New pools of config and vLLM's wings,
State transforms dance with QKV delight,
Inference runs swift through transformers' height,
Checkpoints convert from HF to TE's light! 🚀

🚥 Pre-merge checks — ✅ 3 passed
  • Title check — Passed: The title 'ESM2 changes to work with vLLM' is clear and specific, directly summarizing the main objective of making ESM2 compatible with vLLM.
  • Description check — Passed: The PR description includes a detailed explanation of the changes, addresses the two primary issues (naming incompatibility and pooling layer defaults), provides usage instructions, and specifies the change type as a new feature.
  • Docstring Coverage — Passed: Docstring coverage is 92.81%, above the required 80.00% threshold.

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can customize the tone of review comments and chat replies by configuring the tone_instructions setting, for example Act like a strict teacher or Act like a pirate.

@broland-hat
Collaborator

@gagank1 : Are you working on a readme?

@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

/ok to test

@gagank1, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gagank1 gagank1 marked this pull request as ready for review February 23, 2026 17:06
@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

/ok to test

@gagank1, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

/ok to test c34c09b

@gagank1
Collaborator Author

gagank1 commented Feb 23, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@gagank1 gagank1 requested a review from broland-hat February 23, 2026 17:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (3)
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern.

See comment on modeling_esm_te.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` around
lines 405 - 407, The class variable _tied_weights_keys uses the modern built-in
generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.
bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern as in modeling_esm_te.py.

See comment on the canonical file — HF expects list[str], not dict[str, str].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`
around lines 405 - 407, The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.
bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern.

See comment on modeling_esm_te.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`
around lines 405 - 407, _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.
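The review comments above disagree on the exact annotation (list vs. mapping). As one illustration, the list[str] variant could look like this minimal sketch (the class name is hypothetical and only the annotation pattern is shown):

```python
from typing import ClassVar


class TiedModelSketch:
    # The classic HF convention: a list of parameter names whose weights are
    # tied to the input embeddings, rather than a dict mapping source to target.
    _tied_weights_keys: ClassVar[list[str]] = [
        "lm_head.decoder.weight",
    ]
```

Whichever form is chosen, it should match what the installed transformers version's tie-weights machinery expects.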
🧹 Nitpick comments (4)
bionemo-recipes/vllm/launch.sh (1)

50-50: exec $DOCKER_CMD is unquoted — word splitting will break paths with spaces.

If PROJECT_ROOT contains spaces (e.g., /home/user/my projects/...), the -v argument will be incorrectly split into multiple tokens. Use a Bash array to avoid this:

🔧 Proposed fix (array-based approach)

Replace the string-based DOCKER_CMD with an array throughout the script:

-DOCKER_CMD="docker run -itd ..."
+DOCKER_CMD=("docker" "run" "-itd" "--gpus" "all" "--network" "host" "--ipc=host" "-e" "HF_TOKEN" "--rm" "--name" "${CONTAINER}_dev")
 
 if [ "$MOUNT_DIR" = true ]; then
     PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-    DOCKER_CMD="$DOCKER_CMD -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
+    DOCKER_CMD+=("-v" "${PROJECT_ROOT}:/workspace/bionemo-framework")
 fi
 
-DOCKER_CMD="$DOCKER_CMD $CONTAINER /bin/bash"
+DOCKER_CMD+=("$CONTAINER" "/bin/bash")
 
-exec $DOCKER_CMD
+exec "${DOCKER_CMD[@]}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/launch.sh` at line 50, The exec $DOCKER_CMD call uses an
unquoted string which allows word-splitting and breaks mount paths with spaces
(e.g., PROJECT_ROOT used in -v). Convert the string DOCKER_CMD into a Bash array
(e.g., DOCKER_CMD=(docker run ...)) and update all places that build/append to
DOCKER_CMD so they push elements into the array, then replace exec $DOCKER_CMD
with exec "${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT
value) is preserved; update any helper code that concatenates DOCKER_CMD to use
array operations instead.
bionemo-recipes/vllm/Dockerfile (2)

2-3: Base image is hosted on an internal NVIDIA GitLab registry — not pullable outside NVIDIA.

gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel requires internal network/credentials access. The commented-out nvcr.io/nvidia/vllm:26.01-py3 alternative on line 1 is the publicly accessible equivalent. Once an NGC release with vLLM ≥ 0.14 becomes available, switching to the public image will make this recipe usable by external contributors without additional setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` around lines 2 - 3, The Dockerfile's FROM
line currently uses an internal image tag
"gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is not
pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.
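The build-arg approach mentioned above could look like the following fragment (the image tags are the ones quoted in the comment; treat the default as an assumption to verify before relying on it):

```dockerfile
# Default to the public image; internal builds can override with
#   docker build --build-arg BASE_IMAGE=<internal-registry-tag> .
# Note: the recipe requires vLLM >= 0.14.
ARG BASE_IMAGE=nvcr.io/nvidia/vllm:26.01-py3
FROM ${BASE_IMAGE}
```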

30-30: Pin the transformer_engine version for reproducible builds.

pip install --no-build-isolation transformer_engine[pytorch] with no version specifier will install whichever version is latest at build time. TE releases frequently and has had breaking API changes between major versions (e.g., 1.x → 2.x). A silent version bump can break the integration without any change to this file.

🔧 Proposed fix
-RUN pip install --no-build-isolation transformer_engine[pytorch]
+RUN pip install --no-build-isolation "transformer_engine[pytorch]==<verified_version>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` at line 30, The Dockerfile currently
installs transformer_engine without a version pin (the RUN pip install
--no-build-isolation transformer_engine[pytorch] line); change that to install a
specific, tested TE release by updating that RUN to include an exact version
specifier (for example: RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.
bionemo-recipes/vllm/test_esm2_golden_values.py (1)

46-63: sys.path.insert + os.chdir is fragile for test infrastructure.

sys.path.insert(0, ...) at module level (line 46) and os.chdir inside fresh_export make this script sensitive to working directory and import order. This is acceptable for a standalone validation script run manually inside a container, but consider adding a note that this is not designed to run as part of a pytest suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 46 - 63, The
test module mutates import paths and working directory (sys.path.insert(0, ...),
os.chdir(...) inside fresh_export) which is fragile for pytest; update the file
to document this by adding a clear module-level comment or docstring near
sys.path.insert and a brief note on fresh_export explaining it intentionally
changes cwd for export_hf_checkpoint and that the script is not intended to be
run under pytest/parallel test runners (referencing sys.path.insert,
ESM2_MODEL_DIR, and fresh_export by name), or alternatively guard execution with
a main-check so pytest won't import/run it implicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 1-36: Add a non-root user and switch to it in the Dockerfile so
the container does not run as root; create a user/group (e.g., bionemo), chown
the application directory (/workspace/bionemo) and any cache/venv paths to that
user, and add a final USER bionemo line. Locate the Dockerfile sections around
WORKDIR /workspace/bionemo and COPY . . and insert user creation
(useradd/addgroup or groupadd) and chown before switching context, then add USER
bionemo at the end so subsequent runtime processes run unprivileged.

In `@bionemo-recipes/vllm/launch.sh`:
- Around line 36-40: The script hardcodes "--name vllm_dev" in DOCKER_CMD which
prevents multiple instances and misrepresents the positional $CONTAINER
argument; update the usage comment to clarify that the positional argument is
the image name (or image_name) and change the DOCKER_CMD assignments in the
HEADLESS branch to use a container name derived from $CONTAINER (for example
--name "$CONTAINER" or --name "${CONTAINER}_dev") instead of "vllm_dev" so the
image argument is also used as the container name and avoids name collisions.

In `@bionemo-recipes/vllm/README.md`:
- Around line 21-23: Change the fenced code block that currently uses the wrong
language tag; locate the block containing the shell command "python
test_esm2_golden_values.py" in the README and replace the opening fence language
identifier from ```python to ```bash so the command is treated as a shell
snippet and highlighted correctly.
- Line 17: The README's "or use `launch.sh`" is misleading because `launch.sh
--mount_dir` mounts the repo to `/workspace/bionemo-framework` while the manual
`docker run` mounts to `/workspace/bionemo`, causing `python
test_esm2_golden_values.py` to fail when run from WORKDIR; fix by either (A)
update `launch.sh` to mount the project root into `/workspace/bionemo` instead
of `/workspace/bionemo-framework` (adjust mount target and any downstream path
references in launch.sh), or (B) update README.md to explicitly document that
`launch.sh --mount_dir` mounts at `/workspace/bionemo-framework` and instruct
users to cd into the correct subdirectory (where `test_esm2_golden_values.py`
lives) before running the test; mention the affected files `launch.sh`,
`README.md`, and `test_esm2_golden_values.py` so reviewers can locate the
changes.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 147-210: The script currently only prints comparisons and never
fails; add assertions that enforce the golden-value tolerances using RTOL and
ATOL: for each pair in pairs (refer to the pairs list and variables a, b),
assert np.allclose(a, b, rtol=RTOL, atol=ATOL) (or equivalently assert
(np.abs(a-b) <= ATOL + RTOL * np.abs(b)).all()) and fail the test if not, and
also assert cosine_sim(a, b) exceeds a sensible threshold or that exact is True
when ATOL/RTOL are zero; add per-sequence assertions inside the per-sequence
loop to ensure each sequence max-diff <= ATOL + RTOL * max(|b_i|) so the test
fails on unacceptable drift.
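The tolerance check requested above can be sketched with NumPy (the RTOL/ATOL values are placeholders; the script's real tolerances should be substituted):

```python
import numpy as np

RTOL, ATOL = 1e-3, 1e-5  # placeholder tolerances


def assert_close(a: np.ndarray, b: np.ndarray, rtol: float = RTOL, atol: float = ATOL) -> None:
    # Elementwise bound matching np.allclose semantics: |a - b| <= atol + rtol * |b|
    diff = np.abs(a - b)
    bound = atol + rtol * np.abs(b)
    assert (diff <= bound).all(), f"max diff {diff.max():.3e} exceeds tolerance"


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Calling assert_close on each golden/candidate pair turns the printed comparison into a hard failure when drift exceeds tolerance.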

---

Duplicate comments:
In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.

In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.

In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The class variable _tied_weights_keys uses the modern
built-in generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.

---

Nitpick comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 2-3: The Dockerfile's FROM line currently uses an internal image
tag "gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is
not pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.
- Line 30: The Dockerfile currently installs transformer_engine without a
version pin (the RUN pip install --no-build-isolation
transformer_engine[pytorch] line); change that to install a specific, tested TE
release by updating that RUN to include an exact version specifier (for example:
RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.

In `@bionemo-recipes/vllm/launch.sh`:
- Line 50: The exec $DOCKER_CMD call uses an unquoted string which allows
word-splitting and breaks mount paths with spaces (e.g., PROJECT_ROOT used in
-v). Convert the string DOCKER_CMD into a Bash array (e.g., DOCKER_CMD=(docker
run ...)) and update all places that build/append to DOCKER_CMD so they push
elements into the array, then replace exec $DOCKER_CMD with exec
"${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT value) is
preserved; update any helper code that concatenates DOCKER_CMD to use array
operations instead.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 46-63: The test module mutates import paths and working directory
(sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile
for pytest; update the file to document this by adding a clear module-level
comment or docstring near sys.path.insert and a brief note on fresh_export
explaining it intentionally changes cwd for export_hf_checkpoint and that the
script is not intended to be run under pytest/parallel test runners (referencing
sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively
guard execution with a main-check so pytest won't import/run it implicitly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73cd13d and c34c09b.

📒 Files selected for processing (20)
  • bionemo-recipes/models/esm2/convert.py
  • bionemo-recipes/models/esm2/export.py
  • bionemo-recipes/models/esm2/modeling_esm_te.py
  • bionemo-recipes/models/esm2/tests/test_cp_bshd.py
  • bionemo-recipes/models/esm2/tests/test_cp_thd.py
  • bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
  • bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
  • bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
  • bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
  • bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/vllm/Dockerfile
  • bionemo-recipes/vllm/README.md
  • bionemo-recipes/vllm/launch.sh
  • bionemo-recipes/vllm/test_esm2_golden_values.py

@gagank1 gagank1 requested a review from yzhang123 as a code owner March 9, 2026 20:43
@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test 60043cd

@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test c2d2351

@codecov

codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.45%. Comparing base (470e10d) to head (b6a3bcf).
⚠️ Report is 5 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1473   +/-   ##
=======================================
  Coverage   76.45%   76.45%           
=======================================
  Files         102      102           
  Lines        7952     7952           
=======================================
  Hits         6080     6080           
  Misses       1872     1872           

@gagank1
Collaborator Author

gagank1 commented Mar 13, 2026

/ok to test b6a3bcf

Add a vLLM-based inference recipe for ESM2 models using TransformerEngine,
including model conversion utilities, export scripts, and tests.

Also updates NVEsmModel and related classes to support configurable
add_pooling_layer via config and use self.model as the base_model_prefix
for HuggingFace compatibility.

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1 gagank1 force-pushed the gkaushik/esm2-vllm branch from 0a57a07 to f2bfbce Compare March 14, 2026 01:19
@gagank1
Collaborator Author

gagank1 commented Mar 14, 2026

/ok to test f2bfbce

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 14, 2026

/ok to test a9a6f75

@pstjohn
Collaborator

pstjohn commented Mar 15, 2026

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Mar 15, 2026

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (8)
.pre-commit-config.yaml (1)

32-58: Consider DRY: use YAML anchors to avoid duplicating the exclude pattern.

The identical exclude pattern is duplicated between ruff and ruff-format hooks. YAML anchors can reduce this duplication and ensure both hooks stay in sync.

♻️ Proposed refactor using YAML anchors
       - id: ruff
         # 1. Attempt to automatically fix any lint issues.
         args: ["--fix"]
         # Exclude check_copied_files destinations; they are verbatim copies
         # of source files and must not be reformatted independently.
-        exclude: |
+        exclude: &ruff_exclude |
           (?x)^(
             bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv\.py|
             ...
           )$
       - id: ruff-format
-        exclude: |
-          (?x)^(
-            bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv\.py|
-            ...
-          )$
+        exclude: *ruff_exclude
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml around lines 32 - 58, The exclude regex is
duplicated for the ruff and ruff-format hooks; create a single YAML anchor
(e.g., &ruff_exclude) for that multiline exclude pattern and replace both
exclude blocks with an alias (e.g., *ruff_exclude) so the pattern is defined
once and referenced by the ruff and ruff-format hook entries in
.pre-commit-config.yaml.
bionemo-recipes/recipes/esm2_native_te/tests/test_train.py (1)

296-309: Consider deduplicating FP8 log assertions into a shared helper.

This block is effectively duplicated from the DDP FP8 stats test; extracting a helper would reduce maintenance cost for future log-path changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_native_te/tests/test_train.py` around lines 296
- 309, Extract the repeated FP8 log assertions into a shared helper (e.g.,
assert_fp8_logs(fp8_log_dir: Path, rank: int = 0)) and replace the duplicated
block in test_train.py with a call to that helper; the helper should verify
fp8_log_dir exists, the "rank_<rank>" directory, the "nvdlfw_inspect_logs" and
"nvdlfw_inspect_statistics_logs" subdirectories, and that the two files named
like "nvdlfw_inspect_globalrank-<rank>.log" (metadata_log and stats_log) both
exist and are non-empty. Ensure the helper uses the same Path names
(fp8_log_dir, metadata_log, stats_log) so callers are straightforward to update
and put the helper in a shared test utilities module (or conftest) so other
tests (the DDP FP8 stats test) can import and call it.
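A minimal sketch of such a helper, assuming the log layout described in the prompt above (the name `assert_fp8_logs` and the `nvdlfw_inspect_*` directory names come from that suggestion, not from the current test code):

```python
from pathlib import Path


def assert_fp8_logs(fp8_log_dir: Path, rank: int = 0) -> None:
    """Assert nvdlfw-inspect wrote non-empty FP8 metadata and statistics logs for `rank`."""
    assert fp8_log_dir.exists(), f"missing log dir: {fp8_log_dir}"
    metadata_log = fp8_log_dir / "nvdlfw_inspect_logs" / f"nvdlfw_inspect_globalrank-{rank}.log"
    stats_log = fp8_log_dir / "nvdlfw_inspect_statistics_logs" / f"nvdlfw_inspect_globalrank-{rank}.log"
    for log in (metadata_log, stats_log):
        assert log.exists(), f"missing log file: {log}"
        assert log.stat().st_size > 0, f"empty log file: {log}"
```

Both the DDP and single-process FP8 tests could then call `assert_fp8_logs(fp8_log_dir)` instead of repeating the assertions inline.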
bionemo-recipes/recipes/vllm_inference/.ci_build.sh (1)

2-2: Add error handling for cd command.

If the esm2 directory doesn't exist, the script will continue in the wrong directory and fail unexpectedly. This was also flagged by static analysis (SC2164).

Proposed fix
 #!/bin/bash -x
-cd esm2
+cd esm2 || exit 1
 PIP_CONSTRAINT= pip install -r requirements.txt
 ./install_vllm.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/vllm_inference/.ci_build.sh` at line 2, The script
.ci_build.sh currently runs the command `cd esm2` without checking its result;
update the script to handle failure of the `cd esm2` step (e.g., test that the
directory exists or check the exit status) and abort with a clear error message
and non-zero exit code if the chdir fails so the build doesn't continue in the
wrong directory.
bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile (1)

1-24: Consider adding a non-root user for security.

Trivy flagged that the image runs as root (DS-0002). While this may be acceptable for development containers, consider adding a non-root user for production deployments to follow security best practices.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile` around lines 1 - 24,
Add a non-root user and switch to it before the final WORKDIR to avoid running
the image as root: create a dedicated user/group in the Dockerfile (e.g., via
RUN useradd/groupadd or adduser), chown the application directory
(/workspace/bionemo) and any cache dirs used during build to that user, and set
USER to that account before the final WORKDIR /workspace/bionemo. Ensure the
conditional INSTALL_VLLM block still runs as root (keep it before
creating/switching user) and only switch to the non-root user afterward so
runtime processes no longer run as root.
bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh (1)

10-14: Add error handling for cd commands.

If /workspace doesn't exist or cd vllm fails after cloning, the script will continue executing in the wrong directory, potentially causing unexpected behavior.

Proposed fix
-cd /workspace
+cd /workspace || { echo "ERROR: /workspace does not exist"; exit 1; }
 if [ ! -d vllm ]; then
     git clone --branch v0.15.1 --depth 1 https://github.com/vllm-project/vllm.git
 fi
-cd vllm
+cd vllm || { echo "ERROR: Failed to enter vllm directory"; exit 1; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh` around lines 10
- 14, The script currently runs `cd /workspace` and `cd vllm` without checking
for errors; update the install_vllm.sh flow to fail fast on directory errors by
verifying `cd /workspace` succeeds (create /workspace with `mkdir -p` or exit
with an error message if it cannot be accessed) and after cloning verify `cd
vllm` returned successfully (if not, print a clear error and exit non-zero).
Apply these checks around the existing `cd /workspace` and `cd vllm` commands
(or enable `set -euo pipefail` at the top) so the script doesn’t continue in the
wrong directory.
bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py (3)

174-184: Mutating self.config may cause unintended side effects.

Directly assigning to self.config.layer_precision modifies the config object, which could affect other components sharing the same config instance. Consider storing this as an instance attribute instead.

Suggested fix
         self.config = config
         self._fp8_recipe: transformer_engine.common.recipe.Recipe | None = fp8_recipe
         self._fp4_recipe: transformer_engine.common.recipe.Recipe | None = fp4_recipe
+        self._layer_precision = config.layer_precision
 
-        if self.config.layer_precision is None:
+        if self._layer_precision is None:
             if fp8_recipe is not None and fp4_recipe is not None:
                 raise RuntimeError("Both FP8 and FP4 recipes provided, but no layer precision provided.")
             if fp8_recipe is not None:
                 warnings.warn("No layer precision provided, using FP8 recipe for all layers.", UserWarning)
-                self.config.layer_precision = ["fp8"] * self.config.num_hidden_layers
+                self._layer_precision = ["fp8"] * self.config.num_hidden_layers
             elif fp4_recipe is not None:
                 raise RuntimeError(
                     "FP4 recipe provided but no layer_precision configured. "
                     "Set layer_precision explicitly when using FP4."
                 )
 
-        if self.config.layer_precision is not None and "fp4" in self.config.layer_precision and fp4_recipe is None:
+        if self._layer_precision is not None and "fp4" in self._layer_precision and fp4_recipe is None:
             raise RuntimeError("layer_precision contains 'fp4' entries but no fp4_recipe was provided.")

Then update get_autocast_context to use self._layer_precision instead of self.config.layer_precision.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` around lines
174 - 184, Don't mutate the shared config by assigning to
self.config.layer_precision; instead set an instance attribute (e.g.,
self._layer_precision) based on the existing logic (use ["fp8"] *
self.config.num_hidden_layers when fp8_recipe is provided, raise for both
recipes or for fp4 without explicit layer_precision), and then update
get_autocast_context to read from self._layer_precision instead of
self.config.layer_precision so other components sharing the config are not
affected.
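Sketched with stand-in classes (`_Config` and `_Encoder` are illustrative, not the real `NVEsmConfig`/model classes), the non-mutating pattern looks like:

```python
import warnings


class _Config:
    """Stand-in for the HF config object; only the fields this sketch touches."""

    def __init__(self, num_hidden_layers, layer_precision=None):
        self.num_hidden_layers = num_hidden_layers
        self.layer_precision = layer_precision


class _Encoder:
    def __init__(self, config, fp8_recipe=None):
        self.config = config
        # Resolve the precision list onto the instance; never write back
        # to the shared config object.
        self._layer_precision = config.layer_precision
        if self._layer_precision is None and fp8_recipe is not None:
            warnings.warn("No layer precision provided, using FP8 recipe for all layers.", UserWarning)
            self._layer_precision = ["fp8"] * config.num_hidden_layers


config = _Config(num_hidden_layers=2)
encoder = _Encoder(config, fp8_recipe=object())
assert config.layer_precision is None  # shared config untouched
assert encoder._layer_precision == ["fp8", "fp8"]
```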

696-696: Extract duplicated magic number to a class constant.

mask_ratio_train = 0.15 * 0.8 is defined identically in both _apply_token_dropout_bshd and _apply_token_dropout_thd. Consider defining this as a class constant to follow DRY principles.

Suggested fix
 class NVEsmEmbeddings(nn.Module):
     """Modified version of EsmEmbeddings to support THD inputs."""
 
+    # Mask ratio used during ESM model training (15% masking * 80% replacement)
+    _MASK_RATIO_TRAIN = 0.15 * 0.8
+
     def __init__(self, config):

Then replace both occurrences:

-        mask_ratio_train = 0.15 * 0.8  # Hardcoded as the ratio used in all ESM model training runs
+        mask_ratio_train = self._MASK_RATIO_TRAIN

Also applies to: 717-717

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` at line 696,
The duplicated magic number (0.15 * 0.8) used for mask_ratio_train in
_apply_token_dropout_bshd and _apply_token_dropout_thd should be extracted to a
class-level constant (e.g., MASK_RATIO_TRAIN) on the model class (the class that
defines _apply_token_dropout_bshd/_apply_token_dropout_thd) and both functions
should reference that constant instead of recalculating the value inline; add
the constant near other class constants, replace the inline expressions in
_apply_token_dropout_bshd and _apply_token_dropout_thd with the new class
constant, and ensure any imports/uses within the class access it as
self.MASK_RATIO_TRAIN or ClassName.MASK_RATIO_TRAIN as appropriate.

80-80: Confusing default value for padded_vocab_size.

The default of 64 contradicts the docstring which says "If not provided, defaults to vocab_size." This will cause unexpected behavior for models with larger vocabularies. Consider using None as the default to match the documented behavior.

Suggested fix
-        padded_vocab_size: Optional[int] = 64,
+        padded_vocab_size: Optional[int] = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` at line 80,
The parameter padded_vocab_size currently defaults to 64; change its signature
to padded_vocab_size: Optional[int] = None (and update any type hints) and
ensure the constructor/initialization logic in the class/function (where
padded_vocab_size is used) treats None by setting padded_vocab_size = vocab_size
so behavior matches the docstring; also update any downstream uses that assume
an int to handle the None-to-vocab_size assignment.
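The `None`-default pattern the comment describes, as a hypothetical standalone helper:

```python
from typing import Optional


def resolve_padded_vocab_size(vocab_size: int, padded_vocab_size: Optional[int] = None) -> int:
    """None means 'default to vocab_size', matching the documented behavior."""
    if padded_vocab_size is None:
        return vocab_size
    return padded_vocab_size
```

With this, a 33-token vocabulary round-trips unchanged unless a padded size (e.g. 64, for hardware-friendly shapes) is passed explicitly.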
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.pre-commit-config.yaml:
- Around line 44-45: The regex used to exclude files under
bionemo-recipes/models/(llama3|mixtral)/tests/common/ only matches the directory
itself, not files inside it; update the exclusion patterns in
.pre-commit-config.yaml for the ruff and ruff-format hooks (the entries
containing the string "bionemo-recipes/models/(llama3|mixtral)/tests/common/")
to append ".*" so they match files within that directory (e.g., change the
pattern to end with "/tests/common/.*").

In `@bionemo-recipes/models/esm2/modeling_esm_te.py`:
- Around line 414-416: The current fallback for add_pooling_layer in the model
init uses getattr(config, "add_pooling_layer", True) which conflicts with
NVEsmConfig's declared default False and the docstring; change the getattr
default to False so the fallback is getattr(config, "add_pooling_layer", False)
(keep the outer check if add_pooling_layer is None), and update the
function/class docstring where it states the default (currently "defaults to
True") to reflect the False default; reference symbols: add_pooling_layer,
config, NVEsmConfig, and the initializer in modeling_esm_te.py.

In `@bionemo-recipes/recipes/vllm_inference/.ci_test_env.sh`:
- Around line 1-2: The script lacks a shebang and blindly runs the cd command;
add a proper shebang (e.g., #!/usr/bin/env bash) at the top and guard the cd by
checking that the target directory exists and is a directory before changing
into it (for example test -d "esm2" || exit 1, or use if [ -d "esm2" ]; then cd
"esm2"; else echo "esm2 not found" >&2; exit 1; fi) so the script fails fast
instead of continuing in the wrong directory; update the file around the
existing cd esm2 invocation accordingly.

In `@bionemo-recipes/recipes/vllm_inference/esm2/convert.py`:
- Around line 110-129: The call to output_model.post_init() is redundant because
state.apply_transforms(...) already performs weight tying (see
state.apply_transforms) and related init work; remove the
output_model.post_init() call after apply_transforms, or if you only want to
ensure tied weights explicitly, replace it with a direct call to
output_model.tie_weights() instead of post_init; update the code around the
EsmForMaskedLM construction and the output_model variable accordingly.

In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile`:
- Around line 16-21: The Dockerfile runs the vLLM build steps (git clone vllm,
python use_existing_torch.py, uv pip install -r requirements/build.txt --system,
uv pip install --no-build-isolation -e . --system, pip install --upgrade
"transformers[torch]") but does not pass TORCH_CUDA_ARCH_LIST and MAX_JOBS like
install_vllm.sh; fix by exporting or prefixing TORCH_CUDA_ARCH_LIST and MAX_JOBS
with appropriate values (matching install_vllm.sh) for the uv pip install -r
requirements/build.txt --system and uv pip install --no-build-isolation -e .
--system commands (and the final pip install if needed) so the build targets the
correct CUDA arch and uses the desired parallelism.

In `@bionemo-recipes/recipes/vllm_inference/esm2/export.py`:
- Around line 70-121: The export_hf_checkpoint function loads local assets using
bare relative paths which break when the caller's CWD is different; change all
asset references in export_hf_checkpoint
(AutoTokenizer.from_pretrained("esm_fast_tokenizer"),
open("model_readme.template"), shutil.copy("modeling_esm_te.py", ...),
shutil.copy("LICENSE", ...)) to resolve them relative to the recipe file by
computing root = Path(__file__).resolve().parent and then using root /
"esm_fast_tokenizer", root / "model_readme.template", root /
"modeling_esm_te.py", and root / "LICENSE" so the tokenizer, template, and
copied files are always loaded from the recipe package rather than the current
working directory or the Hub.
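The file-relative pattern suggested above, shown in isolation (the asset names are taken from the comment):

```python
from pathlib import Path

# Anchor asset lookups to this module's directory, not the caller's CWD.
root = Path(__file__).resolve().parent

tokenizer_dir = root / "esm_fast_tokenizer"
readme_template = root / "model_readme.template"
model_source = root / "modeling_esm_te.py"
license_file = root / "LICENSE"
```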

In `@bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template`:
- Around line 91-94: The "Runtime Engine(s):" section currently lists only
"Hugging Face Transformers" but this is a vllm_inference template and should
explicitly list vLLM (and include Transformers only if both are supported);
update the "Runtime Engine(s):" block in the vllm_inference
model_readme.template to include "vLLM" and, if the recipe supports both, list
"vLLM, Hugging Face Transformers" (apply the same change at the second
occurrence referenced in the comment).
- Around line 18-20: The description for "ESM-2" in model_readme.template
overstates its capability as directly performing protein structure prediction;
update the wording to say ESM-2 is a protein language/embedding model trained
with a masked LM objective that produces sequence embeddings useful for
downstream tasks (including structure-related analyses) rather than claiming it
predicts 3D structures directly; change the paragraph referencing "ESM-2" and
the similar sentence at the other occurrence (line ~44) to this softened
phrasing so the README correctly presents the model as an embedding/language
model for downstream use.

In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py`:
- Around line 140-143: The assertion checking padded_vocab_size vs vocab_size
should be converted to an explicit runtime check: replace the assert in the
block that references self.padded_vocab_size and self.vocab_size with an
if-condition that raises a ValueError when padded_vocab_size is less than
vocab_size, using the same descriptive message (e.g., "padded_vocab_size (...)
must be greater than or equal to vocab_size (...)"); update the check in the
method or initializer where these attributes are set (the code block containing
self.padded_vocab_size and self.vocab_size) so the validation cannot be bypassed
by Python optimizations.
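The reason an explicit raise is preferred: `assert` statements are stripped entirely when Python runs with `-O`, so the check would silently vanish. A minimal version of the suggested guard (the helper name is hypothetical):

```python
def validate_vocab_sizes(padded_vocab_size: int, vocab_size: int) -> None:
    """Raise instead of assert so the check survives `python -O`."""
    if padded_vocab_size < vocab_size:
        raise ValueError(
            f"padded_vocab_size ({padded_vocab_size}) must be greater than or equal to "
            f"vocab_size ({vocab_size})"
        )
```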

In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py`:
- Around line 1-20: Add a top-level Google-style module docstring to conftest.py
describing the module's purpose (e.g., pytest configuration and test path setup)
to satisfy pydocstyle; place a triple-quoted docstring at the very top of the
file that briefly summarizes what conftest.py does (adjusts sys.path and
provides test fixtures/config) and include any relevant notes or authorship per
project conventions.
- Line 20: Replace the sys.path.append call in conftest.py with
sys.path.insert(0, Path(__file__).parent.parent.as_posix()) so the local esm2
package is prioritized during tests; this ensures the import in test_vllm.py
(from export import export_hf_checkpoint) resolves to the local source rather
than an installed package and you can locate the call by searching for
sys.path.append and Path(__file__).parent.parent.as_posix() in conftest.py.
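Why `insert(0, ...)` rather than `append`: the entry at position 0 is searched first, so the local module shadows any same-named installed package. A sketch of the conftest line (the parent-of-parent layout is taken from the comment above):

```python
import sys
from pathlib import Path

# Search the recipe directory before site-packages so the local `export`
# module wins over any installed distribution with the same name.
sys.path.insert(0, Path(__file__).parent.parent.as_posix())
```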

In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py`:
- Around line 62-131: Add a module-level skip marker so all tests are skipped
when CUDA isn't available: detect CUDA availability (torch.cuda.is_available())
and set pytestmark = pytest.mark.skipif(not torch.cuda.is_available(),
reason="requires CUDA") at the top of the module; this prevents session-scoped
fixtures hf_exported_embeddings and hf_reference_embeddings from calling
_hf_embed (which .to("cuda")s models/tensors) on CPU-only runners and avoids
failures during fixture setup.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e2413d6e-00ae-495d-aa4b-c6737be37029

📥 Commits

Reviewing files that changed from the base of the PR and between 0d58b73 and a9a6f75.

📒 Files selected for processing (41)
  • .pre-commit-config.yaml
  • .secrets.baseline
  • bionemo-recipes/models/esm2/README.md
  • bionemo-recipes/models/esm2/convert.py
  • bionemo-recipes/models/esm2/export.py
  • bionemo-recipes/models/esm2/modeling_esm_te.py
  • bionemo-recipes/models/esm2/tests/test_cp_bshd.py
  • bionemo-recipes/models/esm2/tests/test_cp_thd.py
  • bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
  • bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
  • bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
  • bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/fp4_debugging_stats.yaml
  • bionemo-recipes/recipes/esm2_native_te/modeling_esm_te.py
  • bionemo-recipes/recipes/esm2_native_te/quantization.py
  • bionemo-recipes/recipes/esm2_native_te/tests/test_quantization.py
  • bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
  • bionemo-recipes/recipes/esm2_native_te/tests/test_train.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
  • bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/vllm_inference/.ci_build.sh
  • bionemo-recipes/recipes/vllm_inference/.ci_test_env.sh
  • bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile
  • bionemo-recipes/recipes/vllm_inference/esm2/LICENSE
  • bionemo-recipes/recipes/vllm_inference/esm2/README.md
  • bionemo-recipes/recipes/vllm_inference/esm2/convert.py
  • bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/special_tokens_map.json
  • bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/tokenizer.json
  • bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/tokenizer_config.json
  • bionemo-recipes/recipes/vllm_inference/esm2/export.py
  • bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh
  • bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template
  • bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py
  • bionemo-recipes/recipes/vllm_inference/esm2/requirements.txt
  • bionemo-recipes/recipes/vllm_inference/esm2/state.py
  • bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py
  • bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py
  • ci/scripts/check_copied_files.py

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 16, 2026

/ok to test dcaa6c7

gagank1 added 3 commits March 16, 2026 15:35
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 16, 2026

/ok to test 13046cb

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>

# Conflicts:
#	.secrets.baseline
@gagank1 gagank1 enabled auto-merge March 20, 2026 00:02
@gagank1
Collaborator Author

gagank1 commented Mar 20, 2026

/ok to test 7209d43

Collaborator

@jomitchellnv jomitchellnv left a comment


LGTM

@gagank1 gagank1 added this pull request to the merge queue Mar 20, 2026
Any commits made after this event will not be merged.