
[STG-1232] Speed up PR GitHub Actions checks, add code coverage, fix e2e bb tests not running in bb mode, remove tsup, generate unified re-used ESM and CJS builds with sourcemaps and types, switch CI to new Evals CLI #1632

Merged
pirate merged 120 commits into main from esm-build on Feb 18, 2026


@pirate (Member) commented Jan 28, 2026

why

  • unit tests, e2e integration tests, and evals were taking 10-15 minutes per PR
  • finding the specific failing eval or test was tedious and required reading tons of log output
  • we weren't leveraging pnpm or Turbo caches, or GitHub Actions' native Chromium install
  • tests were flaky because CDP would randomly fail to connect when the GitHub Actions runner was overloaded
  • e2e bb tests were not actually running against Browserbase; CI was just running e2e:local twice and reporting the second run as bb
  • tsup has been deprecated for a while and should be replaced by esbuild

what changed

  • parallelized all CI tests and broke out evals and tests into individual matrix jobs
  • created a unified node+pnpm+turbo setup action that all other actions call
  • verified that vanilla Chromium can launch inside every e2e local job before running any of our own code, to catch resource-contention flakiness
  • add proper ESM and CJS builds with sourcemaps and types for aligned coverage
  • remove tsup dependency
  • switch to new evals cli
  • add coverage reporting
  • fix some flaky tests so everything passes reliably
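The ESM and CJS builds are meant to be consumed through Node's conditional `exports`: an `import` resolves to the ESM output and a `require()` to the CJS output. Here is a minimal sketch of that resolution step, assuming the `dist/esm` and `dist/cjs` layout described in the summaries below; the exact file names and the `resolveExport` helper are hypothetical, and the resolver is a simplified subset of Node's algorithm (no subpath patterns, arrays, or `null` targets).

```ts
// Hypothetical dual ESM/CJS "exports" entry modeled on the layout described in
// this PR (ESM in dist/esm, CJS in dist/cjs, types from ESM). File names are assumed.
type ConditionalTarget = { [condition: string]: string | ConditionalTarget };

const exportsMap: Record<string, ConditionalTarget> = {
  ".": {
    types: "./dist/esm/index.d.ts",
    import: "./dist/esm/index.js",
    require: "./dist/cjs/index.js",
  },
};

// Simplified resolver: walk the conditions in key order and take the first key
// that is active ("default" always matches), recursing into nested objects.
function resolveExport(
  subpath: string,
  activeConditions: ReadonlySet<string>,
): string | undefined {
  const walk = (target: string | ConditionalTarget): string | undefined => {
    if (typeof target === "string") return target;
    for (const [condition, next] of Object.entries(target)) {
      if (condition === "default" || activeConditions.has(condition)) {
        const resolved = walk(next);
        if (resolved !== undefined) return resolved;
      }
    }
    return undefined;
  };
  const entry = exportsMap[subpath];
  return entry === undefined ? undefined : walk(entry);
}

// An ESM `import` activates "import"; a CommonJS `require()` activates "require".
console.log(resolveExport(".", new Set(["node", "import"]))); // ./dist/esm/index.js
console.log(resolveExport(".", new Set(["node", "require"]))); // ./dist/cjs/index.js
```

The `types` key is read by TypeScript rather than Node, which is why it conventionally comes first; the resolver above passes over it because no `types` condition is active.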

Summary by cubic

Speeds up CI and makes tests stable by running Vitest/Playwright/Evals on the built ESM dist with cached artifacts, verified runner Chromium, weighted Browserbase routing, and merged coverage. Publishes a unified ESM+CJS SDK with sourcemaps/types and SEA binaries from ESM; fixes Browserbase e2e, switches to the new Evals CLI, and adds flaky‑test reporting, addressing Linear STG‑1232.

  • New Features

    • CI: split pnpm/Turbo caches; composite actions (setup/caches, Chromium launch check, region selection, CTRF/V8 uploads); reuse artifacts; use Actions Chromium via CHROME_PATH with --no-sandbox; unique CTRF reports; V8 coverage normalization/merge with cancellation safety; push triggers enabled.
    • Tests: run from dist/esm; env reporter; true Browserbase runs with weighted/randomized regions; hard timeouts for LLM and session create/CDP connect; faster multi‑click dispatch; higher timeouts and targeted skips; cleanup helpers; fixes for init‑script, wait‑for‑selector, and screenshot reliability.
    • Build/SDK/SEA: export map (ESM import + CJS require) with sourcemaps/types; SEA binaries built from ESM with sourcemaps and plain logs in CI; removed tsup and dotenv; AISdk/CustomOpenAI moved to lib and exported; getAISDKLanguageModel added; replay metrics now include method/parameters/result/timestamps/cost.
  • Migration

    • Set STAGEHAND_BROWSER_TARGET=local|browserbase, STAGEHAND_ENV (replaces TEST_ENV), CHROME_PATH; tune LLM_MAX_MS, BROWSERBASE_SESSION_CREATE_MAX_MS, BROWSERBASE_CDP_CONNECT_MAX_MS; optionally set BROWSERBASE_REGION_DISTRIBUTION.
    • Remove dotenv; set env vars directly and import @browserbasehq/stagehand via the export map (avoid dist paths).
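`BROWSERBASE_REGION_DISTRIBUTION` controls the weighted, randomized region routing mentioned above, but this page does not show its syntax. The sketch below assumes a hypothetical `region=weight,...` format; `parseRegionDistribution` and `pickRegion` are illustrative names, not functions from the repo, and the region names are only examples. The picker draws one number in `[0, total)` and walks the cumulative weights.

```ts
// Hypothetical format: "us-west-2=50,us-east-1=30,eu-central-1=20".
// The real BROWSERBASE_REGION_DISTRIBUTION syntax may differ.
type RegionWeight = { region: string; weight: number };

function parseRegionDistribution(raw: string): RegionWeight[] {
  return raw
    .split(",")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0)
    .map((pair) => {
      const [region, weightText] = pair.split("=");
      const weight = Number(weightText);
      if (!region || !Number.isFinite(weight) || weight < 0) {
        throw new Error(`Invalid region weight: "${pair}"`);
      }
      return { region: region.trim(), weight };
    });
}

// Weighted random pick: draw r in [0, total) and subtract weights until r
// falls inside one region's slice. Zero-weight regions are never chosen.
function pickRegion(
  weights: RegionWeight[],
  random: () => number = Math.random,
): string {
  const total = weights.reduce((sum, w) => sum + w.weight, 0);
  if (total <= 0) throw new Error("Region weights must sum to a positive number");
  let r = random() * total;
  let fallback = weights[0].region;
  for (const { region, weight } of weights) {
    if (weight <= 0) continue;
    if (r < weight) return region;
    r -= weight;
    fallback = region; // last positive-weight region, guards float drift
  }
  return fallback;
}
```

Injecting `random` keeps the choice deterministic in unit tests: `pickRegion(weights, () => 0)` always returns the first region with a positive weight.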

Written for commit 5c76fec. Summary will update on new commits.


Summary by cubic

Ships an ESM‑first SDK with a CJS entry, unified sourcemaps, and types. CI runs Vitest/Playwright and the new Evals CLI against the built ESM dist using the runner’s Chromium (launch‑verified), with region‑weighted Browserbase routing, normalized V8 coverage, unique CTRF artifacts, and more stable tests (timeouts, Browserbase skips, understudy click/API fixes).

Speed up CI by running Vitest and Playwright tests in parallel matrices, add V8/CTRF coverage, and stabilize Browserbase/LLM flows with hard timeouts and better cleanup. Server integration tests now use the runner’s Chromium for reliability and are folded into the main CI.
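Checking that the runner's Chromium starts at all, before any Stagehand code runs, separates resource-contention failures from genuine test failures. The contents of the `verify-chromium-launch` action are not shown on this page, so the following is only a minimal TypeScript sketch of the idea, assuming `CHROME_PATH` points at a Chromium binary; `canLaunchChromium` is a hypothetical helper.

```ts
import { spawnSync } from "node:child_process";

// Hypothetical pre-flight check: launch Chromium headlessly, have it print the
// DOM of about:blank, and treat a clean exit with HTML output as "launchable".
function canLaunchChromium(chromePath: string, timeoutMs = 30_000): boolean {
  const result = spawnSync(
    chromePath,
    ["--headless=new", "--no-sandbox", "--disable-gpu", "--dump-dom", "about:blank"],
    { encoding: "utf8", timeout: timeoutMs },
  );
  // result.error is set when the binary is missing (ENOENT) or the timeout fired.
  if (result.error || result.status !== 0) return false;
  return result.stdout.includes("<html");
}
```

A CI step could call `canLaunchChromium(process.env.CHROME_PATH ?? "")` and fail the job immediately when it returns `false`, instead of letting the first test surface a CDP connection error.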

  • Refactors

    • Parallelized core unit tests via discover-core-tests/core-unit-tests (Vitest matrix, fail-fast).
    • Parallelized E2E via discover-e2e-tests (Playwright local + Browserbase) gated on core changes; weighted Browserbase region selection; verify Chromium/CDP before tests; local uses runner Chromium via CHROME_PATH and --no-sandbox; Playwright can emit JUnit when CTRF_JUNIT_PATH is set.
    • Unified Lint & Build with Turbo caching and shared artifacts; tests/evals download and reuse; prepare script skips build in CI; SEA builds support sourcemaps.
    • Enforced configurable timeouts and parallelism via env (LLM_MAX_MS, BROWSERBASE_CDP_CONNECT_MAX_MS, BROWSERBASE_SESSION_CREATE_MAX_MS, LOCAL_SESSION_LIMIT_PER_E2E_TEST, BROWSERBASE_SESSION_LIMIT_PER_E2E_TEST, EVAL_*); added LLM and Browserbase hard timeouts; improved click event dispatch to reduce remote flakiness.
    • Explicitly close Browserbase sessions in tests and evals to free capacity.
    • Added composite actions: setup-node-pnpm-turbo, verify-chromium-launch, select-browserbase-region, publish-ctrf-report (JUnit→CTRF and evals), upload-v8-coverage.
    • Renamed TEST_ENV to STAGEHAND_ENV across E2E; added an env reporter for CI logs; removed the standalone stagehand-server tests workflow and ran server tests in the unified CI with Node’s test runner and GitHub Actions Chromium.
  • New Features

    • Build/Exports: ESM in dist/esm and CJS in dist/cjs via export map; types from ESM; SEA binaries built from dist with sourcemaps; server SEA job targets CJS; new build/test/coverage scripts; Node loader maps package imports to dist/esm; ESM‑safe pathing (fileURLToPath); ESLint/Prettier tweaks; dotenv removed.
    • CI: Composite actions (setup-node-pnpm-turbo, verify-chromium-launch, select-browserbase-region, upload-ctrf-report, upload-v8-coverage); uses runner Chromium via CHROME_PATH with --no-sandbox and launch checks; weighted Browserbase region selection; unique CTRF artifacts; V8 coverage normalized/merged with debug logs and graceful cancellation; Playwright/Vitest discover tests in dist/esm; push/PR triggers enabled.
    • Tests & API: Vitest/Playwright run on dist/esm; new env reporter and closeV3 helper; tighter multi‑click CDP dispatch; higher timeouts and targeted skips (e.g., console on Browserbase) reduce flakiness; local launches honor CHROME_PATH; replay metrics now include parameters/result/timestamps/cost with parser fixes; AISdk/CustomOpenAI clients moved to lib/v3/external_clients and exported; env‑driven hard timeouts for LLM ops and Browserbase session create/CDP connect; understudy API usage fixes.
  • Migration

    • Set STAGEHAND_BROWSER_TARGET=local|browserbase and CHROME_PATH; tune LLM_MAX_MS, BROWSERBASE_SESSION_CREATE_MAX_MS, and BROWSERBASE_CDP_CONNECT_MAX_MS; optionally set BROWSERBASE_REGION_DISTRIBUTION.
    • dotenv support is removed; set env vars explicitly in dev and CI.
    • Import @browserbasehq/stagehand via the export map; do not reference dist paths or prior tsup outputs.
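The hard timeouts configured through `LLM_MAX_MS`, `BROWSERBASE_SESSION_CREATE_MAX_MS`, and `BROWSERBASE_CDP_CONNECT_MAX_MS` keep a hung call from stalling a whole job. A minimal sketch of the usual `Promise.race` pattern, assuming those variables hold millisecond values; `envMs`, `withHardTimeout`, and `HardTimeoutError` are hypothetical names, not Stagehand exports.

```ts
// Hypothetical helper: read a millisecond budget (e.g. LLM_MAX_MS) from the
// environment, falling back when it is unset, empty, or not a positive number.
function envMs(name: string, fallbackMs: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

class HardTimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} exceeded hard timeout of ${ms}ms`);
    this.name = "HardTimeoutError";
  }
}

// Race the operation against a timer. Whichever settles first wins, and the
// timer is always cleared so it cannot keep the process alive afterwards.
async function withHardTimeout<T>(
  label: string,
  ms: number,
  op: () => Promise<T>,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new HardTimeoutError(label, ms)), ms);
  });
  try {
    return await Promise.race([op(), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

The race only stops waiting: the underlying operation keeps running, so real code should also abort it (for example via an `AbortSignal`) or close the session it was creating, in line with the explicit Browserbase session cleanup listed above.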

Written for commit 8119145. Summary will update on new commits.

Comment thread packages/core/scripts/build-cjs.ts
Contributor

@cubic-dev-ai (bot) left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/evals/scripts/build-esm.ts">

<violation number="1" location="packages/evals/scripts/build-esm.ts:35">
P2: ~80 lines of specifier-rewriting logic (`resolveRuntimeSpecifier`, `rewriteFileRuntimeSpecifiers`, `rewriteDistRuntimeSpecifiers`, and associated constants) are exact duplicates of `packages/core/scripts/build-esm.ts`. Consider extracting them into a shared build utility (e.g., alongside `test-utils` which is already shared between packages) to avoid divergence and reduce maintenance burden.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Contributor

@cubic-dev-ai (bot) commented Feb 18, 2026


P2: ~80 lines of specifier-rewriting logic (resolveRuntimeSpecifier, rewriteFileRuntimeSpecifiers, rewriteDistRuntimeSpecifiers, and associated constants) are exact duplicates of packages/core/scripts/build-esm.ts. Consider extracting them into a shared build utility (e.g., alongside test-utils which is already shared between packages) to avoid divergence and reduce maintenance burden.


<file context>
@@ -22,6 +32,83 @@ const run = (args: string[]) => {
   }
 };
 
+const resolveRuntimeSpecifier = (
+  importerPath: string,
+  specifier: string,
</file context>

Member Author

No, we're getting rid of all of it in a fast-follow PR that manually adds `.js` to all our imports.
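Some background on the reply: Node's ESM loader does not guess file extensions for relative imports, and file-by-file transpilers typically leave specifiers exactly as written, so an extensionless `./utils` import must either be rewritten at build time (what the duplicated helpers in `build-esm.ts` do) or written as `./utils.js` in the source. A minimal sketch of a build-time rewrite; `withJsExtension` and `rewriteImports` are hypothetical names, this is not the code from `build-esm.ts`, and it deliberately ignores dynamic `import()`, `require()`, and directory imports such as `./lib` meaning `./lib/index.js`.

```ts
// Hypothetical illustration only; not the repo's resolveRuntimeSpecifier.
const KNOWN_EXTENSIONS = [".js", ".mjs", ".cjs", ".json"];

// Append ".js" to relative specifiers that have no recognized extension.
function withJsExtension(specifier: string): string {
  const isRelative = specifier.startsWith("./") || specifier.startsWith("../");
  if (!isRelative) return specifier; // bare package imports resolve via node_modules
  if (KNOWN_EXTENSIONS.some((ext) => specifier.endsWith(ext))) return specifier;
  return `${specifier}.js`;
}

// Rewrite static `from "..."` clauses and side-effect `import "..."` statements.
function rewriteImports(source: string): string {
  return source.replace(
    /(\bfrom\s+|\bimport\s+)(["'])([^"']+)\2/g,
    (_match: string, prefix: string, quote: string, spec: string) =>
      `${prefix}${quote}${withJsExtension(spec)}${quote}`,
  );
}
```

Writing the `.js` extensions directly in the TypeScript source, as the reply proposes, removes the need for this pass entirely, since the compiled output then already contains valid ESM specifiers.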

Contributor

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

Comment thread packages/core/package.json
@pirate requested a review from @monadoid on February 18, 2026 01:41
@pirate merged commit afbd08b into main on Feb 18, 2026
148 of 149 checks passed
miguelg719 pushed a commit that referenced this pull request Feb 24, 2026
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/stagehand@3.1.0

### Minor Changes

- [#1681](#1681)
[`e3db9aa`](e3db9aa)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add cookie management
APIs: `context.addCookies()`, `context.clearCookies()`, &
`context.cookies()`

- [#1672](#1672)
[`b65756e`](b65756e)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add boolean
keepAlive parameter to allow for configuring whether the browser should
be closed when stagehand.close() is called.

- [#1708](#1708)
[`176d420`](176d420)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add
context.setExtraHTTPHeaders()

- [#1611](#1611)
[`8a3c066`](8a3c066)
Thanks [@monadoid](https://github.com/monadoid)! - Using `mode` enum
instead of old `cua` boolean in openapi spec

### Patch Changes

- [#1683](#1683)
[`7584f3e`](7584f3e)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
include shadow DOM in .count() & .nth() & support xpath predicates

- [#1644](#1644)
[`1e1c9c1`](1e1c9c1)
Thanks [@monadoid](https://github.com/monadoid)! - Fix unhandled CDP
detaches by returning the original sendCDP promise

- [#1729](#1729)
[`6bef890`](6bef890)
Thanks [@shrey150](https://github.com/shrey150)! - fix: support Claude
4.6 (Opus and Sonnet) in CUA mode by using the correct
`computer_20251124` tool version and `computer-use-2025-11-24` beta
header

- [#1647](#1647)
[`ffd4b33`](ffd4b33)
Thanks [@tkattkat](https://github.com/tkattkat)! - Fix [Agent] - Address
bug causing issues with continuing a conversation from past messages in
dom mode

- [#1614](#1614)
[`677bff5`](677bff5)
Thanks [@miguelg719](https://github.com/miguelg719)! - Enforce
`<number>-<number>` regex validation on act/observe for `elementId`

- [#1580](#1580)
[`65ff464`](65ff464)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add unified variables
support across act and agent with a single VariableValue type

- [#1666](#1666)
[`101bcf2`](101bcf2)
Thanks [@Kylejeong2](https://github.com/Kylejeong2)! - add support for
codex models

- [#1728](#1728)
[`0a94301`](0a94301)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - handle
potential race condition on `.close()` when using the Stagehand API

- [#1664](#1664)
[`b27c04d`](b27c04d)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fixes issue
with context.addInitScript() where scripts were not being applied to out
of process iframes (OOPIFs), and popup pages with same process iframes
(SPIFs)

- [#1632](#1632)
[`afbd08b`](afbd08b)
Thanks [@pirate](https://github.com/pirate)! - Remove automatic `.env`
loading via `dotenv`.

  If your app relies on `.env` files, install `dotenv` and load it
  explicitly in your code:

  ```ts
  import dotenv from "dotenv";
  dotenv.config({ path: ".env" });
  ```

- [#1624](#1624)
[`0e8d569`](0e8d569)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix issue
where screenshot masks were not being applied to dialog elements

- [#1596](#1596)
[`ff0f979`](ff0f979)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update usage/metrics
handling in agent

- [#1631](#1631)
[`2d89d2b`](2d89d2b)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add right and
middle click support to act and observe

- [#1697](#1697)
[`aac9a19`](aac9a19)
Thanks [@shrey150](https://github.com/shrey150)! - fix: support
`<frame>` elements in XPath frame boundary detection so `act()` works on
legacy `<frameset>` pages

- [#1692](#1692)
[`06de50f`](06de50f)
Thanks [@shrey150](https://github.com/shrey150)! - fix: skip piercer
injection for chrome-extension:// and other non-HTML targets

- [#1613](#1613)
[`aa4d981`](aa4d981)
Thanks [@miguelg719](https://github.com/miguelg719)! -
SupportedUnderstudyAction Enum validation for 'method' on act/observe
inference

- [#1652](#1652)
[`18b1e3b`](18b1e3b)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for
gemini 3 flash and pro in hybrid/cua agent

- [#1706](#1706)
[`957d82b`](957d82b)
Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - Add GLM to
prompt-based JSON fallback for models without native structured output
support

- [#1633](#1633)
[`22e371a`](22e371a)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add warning when
incorrect models are used with the agent's hybrid mode

- [#1673](#1673)
[`d29b91f`](d29b91f)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add multi-region
support for Stagehand API with region-specific endpoints

- [#1695](#1695)
[`7b4f817`](7b4f817)
Thanks [@tkattkat](https://github.com/tkattkat)! - Fix: zod bug when
pinning zod to v3 and using structured output in agent

- [#1609](#1609)
[`3f9ca4d`](3f9ca4d)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add
SupportedUnderstudyActions to observe system prompt

- [#1581](#1581)
[`49ead1e`](49ead1e)
Thanks [@sameelarif](https://github.com/sameelarif)! - **Server-side
caching is now available.**

  When running `env: "BROWSERBASE"`, Stagehand automatically caches
  `act()`, `extract()`, and `observe()` results server-side; repeated
  calls with the same inputs return instantly without consuming LLM
  tokens.

  Caching is enabled by default and can be disabled via
  `serverCache: false` on the Stagehand instance or per individual call.
  Check out the
  [Browserbase blog](https://www.browserbase.com/blog/stagehand-caching)
  for more details.

- [#1642](#1642)
[`3673369`](3673369)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix issue
where scripts added via context.addInitScript() were not being injected
into new pages that were opened via popups (e.g., clicking a link that
opens a new page) and/or calling context.newPage(url)

- [#1735](#1735)
[`c465e87`](c465e87)
Thanks [@monadoid](https://github.com/monadoid)! - Supports request
header authentication with connectToMCPServer

- [#1705](#1705)
[`ae533e4`](ae533e4)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - include
error cause in UnderstudyCommandException

- [#1636](#1636)
[`ea33052`](ea33052)
Thanks [@miguelg719](https://github.com/miguelg719)! - Include
executionModel on the AgentConfigSchema

- [#1679](#1679)
[`5764ede`](5764ede)
Thanks [@shrey150](https://github.com/shrey150)! - fix issue where
locator.count() was not working with xpaths that have attribute
predicates

- [#1646](#1646)
[`f09b184`](f09b184)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add user-agent to
CDP connections

- [#1637](#1637)
[`a7d29de`](a7d29de)
Thanks [@miguelg719](https://github.com/miguelg719)! - Improve error and
warning message for legacy model format

- [#1685](#1685)
[`d334399`](d334399)
Thanks [@tkattkat](https://github.com/tkattkat)! - Bump ai sdk & google
provider version

- [#1662](#1662)
[`44416da`](44416da)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix issue
where locator.fill() was not working on elements that require direct
value setting

- [#1612](#1612)
[`bdd8b4e`](bdd8b4e)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix issue
where screenshot mask was only being applied to the first element that
the locator resolved to. Masks now apply to all matching elements.
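The `<number>-<number>` rule for `elementId` (#1614 above) is easy to mirror on the client, for example to filter candidate actions before calling `act()`. A minimal sketch, assuming the rule means two runs of decimal digits joined by a single hyphen (such as `0-42`); `isValidElementId`, `filterValidIds`, and the `{ elementId }` candidate shape are hypothetical, and the changelog does not say what each half of the ID encodes.

```ts
// Hypothetical client-side mirror of the "<number>-<number>" elementId rule:
// two runs of decimal digits joined by exactly one hyphen, nothing else.
const ELEMENT_ID_PATTERN = /^\d+-\d+$/;

function isValidElementId(elementId: string): boolean {
  return ELEMENT_ID_PATTERN.test(elementId);
}

// Keep only candidates whose IDs match the expected shape.
function filterValidIds<T extends { elementId: string }>(candidates: T[]): T[] {
  return candidates.filter((c) => isValidElementId(c.elementId));
}

console.log(isValidElementId("0-42")); // true
console.log(isValidElementId("42")); // false
```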

## @browserbasehq/stagehand-server@3.6.0

### Minor Changes

- [#1611](#1611)
[`8a3c066`](8a3c066)
Thanks [@monadoid](https://github.com/monadoid)! - Using `mode` enum
instead of old `cua` boolean in openapi spec

### Patch Changes

- [#1604](#1604)
[`4753078`](4753078)
Thanks [@miguelg719](https://github.com/miguelg719)! - Enable bedrock

- [#1636](#1636)
[`ea33052`](ea33052)
Thanks [@miguelg719](https://github.com/miguelg719)! - Include
executionModel on the AgentConfigSchema

- [#1602](#1602)
[`22a0502`](22a0502)
Thanks [@miguelg719](https://github.com/miguelg719)! - Include vertex as
a supported provider

- Updated dependencies
\[[`7584f3e`](7584f3e),
[`1e1c9c1`](1e1c9c1),
[`6bef890`](6bef890),
[`ffd4b33`](ffd4b33),
[`677bff5`](677bff5),
[`65ff464`](65ff464),
[`101bcf2`](101bcf2),
[`0a94301`](0a94301),
[`b27c04d`](b27c04d),
[`afbd08b`](afbd08b),
[`e3db9aa`](e3db9aa),
[`0e8d569`](0e8d569),
[`ff0f979`](ff0f979),
[`2d89d2b`](2d89d2b),
[`aac9a19`](aac9a19),
[`06de50f`](06de50f),
[`aa4d981`](aa4d981),
[`18b1e3b`](18b1e3b),
[`957d82b`](957d82b),
[`b65756e`](b65756e),
[`22e371a`](22e371a),
[`d29b91f`](d29b91f),
[`7b4f817`](7b4f817),
[`176d420`](176d420),
[`3f9ca4d`](3f9ca4d),
[`8a3c066`](8a3c066),
[`49ead1e`](49ead1e),
[`3673369`](3673369),
[`c465e87`](c465e87),
[`ae533e4`](ae533e4),
[`ea33052`](ea33052),
[`5764ede`](5764ede),
[`f09b184`](f09b184),
[`a7d29de`](a7d29de),
[`d334399`](d334399),
[`44416da`](44416da),
[`bdd8b4e`](bdd8b4e)]:
    -   @browserbasehq/stagehand@3.1.0

## @browserbasehq/stagehand-evals@1.1.8

### Patch Changes

- Updated dependencies
\[[`7584f3e`](7584f3e),
[`1e1c9c1`](1e1c9c1),
[`6bef890`](6bef890),
[`ffd4b33`](ffd4b33),
[`677bff5`](677bff5),
[`65ff464`](65ff464),
[`101bcf2`](101bcf2),
[`0a94301`](0a94301),
[`b27c04d`](b27c04d),
[`afbd08b`](afbd08b),
[`e3db9aa`](e3db9aa),
[`0e8d569`](0e8d569),
[`ff0f979`](ff0f979),
[`2d89d2b`](2d89d2b),
[`aac9a19`](aac9a19),
[`06de50f`](06de50f),
[`aa4d981`](aa4d981),
[`18b1e3b`](18b1e3b),
[`957d82b`](957d82b),
[`b65756e`](b65756e),
[`22e371a`](22e371a),
[`d29b91f`](d29b91f),
[`7b4f817`](7b4f817),
[`176d420`](176d420),
[`3f9ca4d`](3f9ca4d),
[`8a3c066`](8a3c066),
[`49ead1e`](49ead1e),
[`3673369`](3673369),
[`c465e87`](c465e87),
[`ae533e4`](ae533e4),
[`ea33052`](ea33052),
[`5764ede`](5764ede),
[`f09b184`](f09b184),
[`a7d29de`](a7d29de),
[`d334399`](d334399),
[`44416da`](44416da),
[`bdd8b4e`](bdd8b4e)]:
    -   @browserbasehq/stagehand@3.1.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>