diff --git a/.changeset/add-bedrock-provider-schema.md b/.changeset/add-bedrock-provider-schema.md new file mode 100644 index 000000000..775df01da --- /dev/null +++ b/.changeset/add-bedrock-provider-schema.md @@ -0,0 +1,6 @@ +--- +"@browserbasehq/stagehand": patch +"@browserbasehq/stagehand-server-v3": patch +--- + +Add bedrock to the provider enum in model configuration schemas and regenerate OpenAPI spec. diff --git a/.changeset/cool-doors-open.md b/.changeset/cool-doors-open.md new file mode 100644 index 000000000..18350f076 --- /dev/null +++ b/.changeset/cool-doors-open.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/stagehand": minor +--- + +feat: add `cdpHeaders` option to `localBrowserLaunchOptions` for passing custom HTTP headers when connecting to an existing browser via CDP URL diff --git a/.changeset/fix-cli-env-mode.md b/.changeset/fix-cli-env-mode.md new file mode 100644 index 000000000..7763987d2 --- /dev/null +++ b/.changeset/fix-cli-env-mode.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/browse-cli": patch +--- + +Fix `browse env` showing stale mode after `browse env remote` diff --git a/.changeset/flat-mice-cheer.md b/.changeset/flat-mice-cheer.md new file mode 100644 index 000000000..53e42c3a1 --- /dev/null +++ b/.changeset/flat-mice-cheer.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/stagehand": patch +--- + +Expose `headers` in `GoogleVertexProviderSettings` so model configs can pass custom provider headers (for example `X-Goog-Priority`) without TypeScript errors. 
diff --git a/.changeset/open-source-browse-cli.md b/.changeset/open-source-browse-cli.md new file mode 100644 index 000000000..9c225011e --- /dev/null +++ b/.changeset/open-source-browse-cli.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/browse-cli": minor +--- + +Initial release of browse CLI - browser automation for AI agents diff --git a/.changeset/optional-project-id.md b/.changeset/optional-project-id.md new file mode 100644 index 000000000..74919ca30 --- /dev/null +++ b/.changeset/optional-project-id.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/stagehand": patch +--- + +Make projectId optional for Browserbase sessions — only BROWSERBASE_API_KEY is required diff --git a/.changeset/floppy-experts-wash.md b/.changeset/public-results-mate.md similarity index 50% rename from .changeset/floppy-experts-wash.md rename to .changeset/public-results-mate.md index 79a30b391..fb85f5394 100644 --- a/.changeset/floppy-experts-wash.md +++ b/.changeset/public-results-mate.md @@ -2,4 +2,4 @@ "@browserbasehq/stagehand": patch --- -remove unnecessary log +Add configurable timeout to tools in agent diff --git a/.changeset/real-cameras-grin.md b/.changeset/real-cameras-grin.md new file mode 100644 index 000000000..22a326e00 --- /dev/null +++ b/.changeset/real-cameras-grin.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/stagehand": patch +--- + +When connecting to a browser session that has zero open tabs, Stagehand now automatically creates an initial `about:blank` tab so the connection can continue. 
diff --git a/.changeset/solid-rice-admire.md b/.changeset/solid-rice-admire.md deleted file mode 100644 index 2f0291d02..000000000 --- a/.changeset/solid-rice-admire.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@browserbasehq/stagehand": minor ---- - -Added Gemini 2.5 Flash to Google supported models diff --git a/.changeset/some-parrots-wave.md b/.changeset/some-parrots-wave.md new file mode 100644 index 000000000..1cd697972 --- /dev/null +++ b/.changeset/some-parrots-wave.md @@ -0,0 +1,5 @@ +--- +"@browserbasehq/stagehand": patch +--- + +fix issue where handlePossibleNavigation was producing unnecessary error logs on clicks that trigger page close diff --git a/.changeset/fifty-cats-sell.md b/.changeset/twelve-breads-yawn.md similarity index 50% rename from .changeset/fifty-cats-sell.md rename to .changeset/twelve-breads-yawn.md index dfc981460..b89270c3c 100644 --- a/.changeset/fifty-cats-sell.md +++ b/.changeset/twelve-breads-yawn.md @@ -2,4 +2,4 @@ "@browserbasehq/stagehand": minor --- -extract links +add new page.setExtraHTTPHeaders() method diff --git a/.changeset/vast-vans-crash.md b/.changeset/vast-vans-crash.md deleted file mode 100644 index 3fdc06f83..000000000 --- a/.changeset/vast-vans-crash.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@browserbasehq/stagehand": patch ---- - -Fixes a redundant unnecessary log diff --git a/.cursorrules b/.cursorrules index fe68bca7b..fee9b45c2 100644 --- a/.cursorrules +++ b/.cursorrules @@ -1,140 +1,263 @@ # Stagehand Project -This is a project that uses Stagehand, which amplifies Playwright with `act`, `extract`, and `observe` added to the Page class. +This is a project that uses Stagehand V3, a browser automation framework with AI-powered `act`, `extract`, `observe`, and `agent` methods. -`Stagehand` is a class that provides config, a `StagehandPage` object via `stagehand.page`, and a `StagehandContext` object via `stagehand.context`. +The main class can be imported as `Stagehand` from `@browserbasehq/stagehand`. 
-`Page` is a class that extends the Playwright `Page` class and adds `act`, `extract`, and `observe` methods. -`Context` is a class that extends the Playwright `BrowserContext` class. +**Key Classes:** -Use the following rules to write code for this project. +- `Stagehand`: Main orchestrator class providing `act`, `extract`, `observe`, and `agent` methods +- `context`: A `V3Context` object that manages browser contexts and pages +- `page`: Individual page objects accessed via `stagehand.context.pages()[i]` or created with `stagehand.context.newPage()` -- To take an action on the page like "click the sign in button", use Stagehand `act` like this: +## Initialize ```typescript -await page.act("Click the sign in button"); +import { Stagehand } from "@browserbasehq/stagehand"; + +const stagehand = new Stagehand({ + env: "LOCAL", // or "BROWSERBASE" + verbose: 2, // 0, 1, or 2 + model: "openai/gpt-4.1-mini", // or any supported model +}); + +await stagehand.init(); + +// Access the browser context and pages +const page = stagehand.context.pages()[0]; +const context = stagehand.context; + +// Create new pages if needed +const page2 = await stagehand.context.newPage(); ``` -- To plan an instruction before taking an action, use Stagehand `observe` to get the action to execute. +## Act + +Actions are called on the `stagehand` instance (not the page). 
Use atomic, specific instructions: ```typescript -const [action] = await page.observe("Click the sign in button"); +// Act on the current active page +await stagehand.act("click the sign in button"); + +// Act on a specific page (when you need to target a page that isn't currently active) +await stagehand.act("click the sign in button", { page: page2 }); ``` -- The result of `observe` is an array of `ObserveResult` objects that can directly be used as params for `act` like this: +**Important:** Act instructions should be atomic and specific: - ```typescript - const [action] = await page.observe("Click the sign in button"); - await page.act(action); - ``` +- ✅ Good: "Click the sign in button" or "Type 'hello' into the search input" +- ❌ Bad: "Order me pizza" or "Type in the search bar and hit enter" (multi-step) -- When writing code that needs to extract data from the page, use Stagehand `extract`. Explicitly pass the following params by default: +### Observe + Act Pattern (Recommended) + +Cache the results of `observe` to avoid unexpected DOM changes: + +```typescript +const instruction = "Click the sign in button"; + +// Get candidate actions +const actions = await stagehand.observe(instruction); + +// Execute the first action +await stagehand.act(actions[0]); +``` + +To target a specific page: ```typescript -const { someValue } = await page.extract({ - instruction: the instruction to execute, - schema: z.object({ - someValue: z.string(), - }), // The schema to extract +const actions = await stagehand.observe("select blue as the favorite color", { + page: page2, }); +await stagehand.act(actions[0], { page: page2 }); ``` -## Initialize +## Extract + +Extract data from pages using natural language instructions. The `extract` method is called on the `stagehand` instance. 
+ +### Basic Extraction (with schema) ```typescript -import { Stagehand } from "@browserbasehq/stagehand"; -import StagehandConfig from "./stagehand.config"; +import { z } from "zod"; + +// Extract with explicit schema +const data = await stagehand.extract( + "extract all apartment listings with prices and addresses", + z.object({ + listings: z.array( + z.object({ + price: z.string(), + address: z.string(), + }), + ), + }), +); -const stagehand = new Stagehand(StagehandConfig); -await stagehand.init(); +console.log(data.listings); +``` + +### Simple Extraction (without schema) + +```typescript +// Extract returns a default object with 'extraction' field +const result = await stagehand.extract("extract the sign in button text"); + +console.log(result); +// Output: { extraction: "Sign in" } -const page = stagehand.page; // Playwright Page with act, extract, and observe methods -const context = stagehand.context; // Playwright BrowserContext +// Or destructure directly +const { extraction } = await stagehand.extract( + "extract the sign in button text", +); +console.log(extraction); // "Sign in" ``` -## Act +### Targeted Extraction -You can cache the results of `observe` and use them as params for `act` like this: +Extract data from a specific element using a selector: ```typescript -const instruction = "Click the sign in button"; -const cachedAction = await getCache(instruction); +const reason = await stagehand.extract( + "extract the reason why script injection fails", + z.string(), + { selector: "/html/body/div[2]/div[3]/iframe/html/body/p[2]" }, +); +``` -if (cachedAction) { - await page.act(cachedAction); -} else { - try { - const results = await page.observe(instruction); - await setCache(instruction, results); - await page.act(results[0]); - } catch (error) { - await page.act(instruction); // If the action is not cached, execute the instruction directly - } -} +### URL Extraction + +When extracting links or URLs, use `z.string().url()`: + +```typescript +const 
{ links } = await stagehand.extract( + "extract all navigation links", + z.object({ + links: z.array(z.string().url()), + }), +); ``` -Be sure to cache the results of `observe` and use them as params for `act` to avoid unexpected DOM changes. Using `act` without caching will result in more unpredictable behavior. +### Extracting from a Specific Page -Act `action` should be as atomic and specific as possible, i.e. "Click the sign in button" or "Type 'hello' into the search input". -AVOID actions that are more than one step, i.e. "Order me pizza" or "Type in the search bar and hit enter". +```typescript +// Extract from a specific page (when you need to target a page that isn't currently active) +const data = await stagehand.extract( + "extract the placeholder text on the name field", + { page: page2 }, +); +``` -## Extract +## Observe -If you are writing code that needs to extract data from the page, use Stagehand `extract`. +Plan actions before executing them. Returns an array of candidate actions: ```typescript -const signInButtonText = await page.extract("extract the sign in button text"); +// Get candidate actions on the current active page +const [action] = await stagehand.observe("Click the sign in button"); + +// Execute the action +await stagehand.act(action); ``` -You can also pass in params like an output schema in Zod, and a flag to use text extraction: +Observing on a specific page: ```typescript -const data = await page.extract({ - instruction: "extract the sign in button text", - schema: z.object({ - text: z.string(), - }), +// Target a specific page (when you need to target a page that isn't currently active) +const actions = await stagehand.observe("find the next page button", { + page: page2, }); +await stagehand.act(actions[0], { page: page2 }); ``` -`schema` is a Zod schema that describes the data you want to extract. 
To extract an array, make sure to pass in a single object that contains the array, as follows: +## Agent + +Use the `agent` method to autonomously execute complex, multi-step tasks. + +### Basic Agent Usage ```typescript -const data = await page.extract({ - instruction: "extract the text inside all buttons", - schema: z.object({ - text: z.array(z.string()), - }), - useTextExtract: true, // Set true for larger-scale extractions (multiple paragraphs), or set false for small extractions (name, birthday, etc) +const page = stagehand.context.pages()[0]; +await page.goto("https://www.google.com"); + +const agent = stagehand.agent({ + model: "google/gemini-2.0-flash", + executionModel: "google/gemini-2.0-flash", +}); + +const result = await agent.execute({ + instruction: "Search for the stock price of NVDA", + maxSteps: 20, }); + +console.log(result.message); ``` -## Agent +### Computer Use Agent (CUA) -Use the `agent` method to automonously execute larger tasks like "Get the stock price of NVDA" +For more advanced scenarios using computer-use models: ```typescript -// Navigate to a website -await stagehand.page.goto("https://www.google.com"); +const agent = stagehand.agent({ + mode: "cua", // Enable Computer Use Agent mode + model: "anthropic/claude-sonnet-4-20250514", + // or "google/gemini-2.5-computer-use-preview-10-2025" + systemPrompt: `You are a helpful assistant that can use a web browser. 
+ Do not ask follow up questions, the user will trust your judgement.`, +}); + +await agent.execute({ + instruction: "Apply for a library card at the San Francisco Public Library", + maxSteps: 30, +}); +``` +### Agent with Custom Model Configuration + +```typescript const agent = stagehand.agent({ - // You can use either OpenAI or Anthropic - provider: "openai", - // The model to use (claude-3-7-sonnet-20250219 or claude-3-5-sonnet-20240620 for Anthropic) - model: "computer-use-preview", - - // Customize the system prompt - instructions: `You are a helpful assistant that can use a web browser. - Do not ask follow up questions, the user will trust your judgement.`, - - // Customize the API key - options: { - apiKey: process.env.OPENAI_API_KEY, + model: { + modelName: "google/gemini-2.5-computer-use-preview-10-2025", + apiKey: process.env.GEMINI_API_KEY, }, + systemPrompt: `You are a helpful assistant.`, }); +``` -// Execute the agent -await agent.execute( - "Apply for a library card at the San Francisco Public Library" -); +### Agent with Integrations (MCP/External Tools) + +```typescript +const agent = stagehand.agent({ + integrations: [`https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`], + systemPrompt: `You have access to the Exa search tool.`, +}); +``` + +## Advanced Features + +### DeepLocator (XPath Targeting) + +Target specific elements across shadow DOM and iframes: + +```typescript +await page + .deepLocator("/html/body/div[2]/div[3]/iframe/html/body/p") + .highlight({ + durationMs: 5000, + contentColor: { r: 255, g: 0, b: 0 }, + }); +``` + +### Multi-Page Workflows + +```typescript +const page1 = stagehand.context.pages()[0]; +await page1.goto("https://example.com"); + +const page2 = await stagehand.context.newPage(); +await page2.goto("https://example2.com"); + +// Act/extract/observe operate on the current active page by default +// Pass { page } option to target a specific page +await stagehand.act("click button", { page: page1 }); +await 
stagehand.extract("get title", { page: page2 }); ``` diff --git a/.env.example b/.env.example index f7b468d6f..eb08f00ee 100644 --- a/.env.example +++ b/.env.example @@ -6,7 +6,7 @@ BRAINTRUST_API_KEY="" ANTHROPIC_API_KEY="" HEADLESS=false ENABLE_CACHING=false -EVAL_MODELS="gpt-4o,claude-3-5-sonnet-latest" -EXPERIMENTAL_EVAL_MODELS="gpt-4o,claude-3-5-sonnet-latest,o1-mini,o1-preview" +EVAL_MODELS="gpt-4o,claude-sonnet-4-6" +EXPERIMENTAL_EVAL_MODELS="gpt-4o,claude-sonnet-4-6,o1-mini,o1-preview" EVAL_CATEGORIES="observe,act,combination,extract,experimental" -STAGEHAND_API_URL="http://localhost:80" +AGENT_EVAL_MAX_STEPS=50 \ No newline at end of file diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 000000000..7cc4db16c --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,76 @@ +--- +name: Bug report +about: Detailed descriptions help us resolve faster +title: '' +labels: '' +assignees: '' + +--- + +**Before submitting an issue, please:** + +- [ ] Check the [documentation](https://docs.stagehand.dev/) for relevant information +- [ ] Search existing [issues](https://github.com/browserbase/stagehand/issues) to avoid duplicates + +## Environment Information + +Please provide the following information to help us reproduce and resolve your issue: + +**Stagehand:** + +- Language/SDK: [TypeScript, Python, MCP…] +- Stagehand version: [e.g., 1.0.0] + +**AI Provider:** + +- Provider: [e.g., OpenAI, Anthropic, Azure OpenAI] +- Model: [e.g., gpt-4o, claude-sonnet-4-6] + +## Issue Description + +``` +[Describe the current behavior here] + +``` + +### Steps to Reproduce + +1. +2. +3. 
+ +### Minimal Reproduction Code + +```tsx +// Your minimal reproduction code here +import { Stagehand } from '@browserbase/stagehand'; + +const stagehand = new Stagehand({ + // IMPORTANT: include your stagehand config +}); + +// Steps that reproduce the issue + +``` + +### Error Messages / Log trace + +``` +[Paste error messages/logs here] + +``` + +### Screenshots / Videos + +``` +[Attach screenshots or videos here] + +``` + +### Related Issues + +Are there any related issues or PRs? + +- Related to: #[issue number] +- Duplicate of: #[issue number] +- Blocks: #[issue number] diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 000000000..75889eb82 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,23 @@ +--- +name: Feature request +about: Suggest an idea for this project +title: '' +labels: '' +assignees: '' + +--- + +**Is your feature request related to a problem? Please describe.** +A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] + +**Describe the solution you'd like** +A clear and concise description of what you want to happen. + +**Describe alternatives you've considered** +A clear and concise description of any alternative solutions or features you've considered. + +**Are you willing to contribute to implementing this feature or fix?** + +- [ ] Yes, I can submit a PR +- [ ] Yes, but I need guidance +- [ ] No, I cannot contribute at this time diff --git a/.github/actions/select-browserbase-region/action.yml b/.github/actions/select-browserbase-region/action.yml new file mode 100644 index 000000000..e5ccbd13e --- /dev/null +++ b/.github/actions/select-browserbase-region/action.yml @@ -0,0 +1,67 @@ +name: Select Browserbase region +description: Select a Browserbase region based on a weighted distribution. +inputs: + distribution: + description: Comma-separated region=weight list (e.g. us-west-2=40,us-east-1=20). 
+ required: true +outputs: + region: + description: Selected region. + value: ${{ steps.select.outputs.region }} +runs: + using: composite + steps: + - id: select + shell: bash + run: | + dist="${{ inputs.distribution }}" + if [ -z "$dist" ]; then + echo "BROWSERBASE_REGION_DISTRIBUTION is empty" + exit 1 + fi + IFS=',' read -r -a entries <<< "$dist" + total=0 + regions=() + weights=() + for entry in "${entries[@]}"; do + region="${entry%%=*}" + weight="${entry#*=}" + region="$(printf '%s' "$region" | tr -d '[:space:]')" + weight="$(printf '%s' "$weight" | tr -d '[:space:]')" + if [ -z "$region" ] || [ -z "$weight" ]; then + echo "Invalid region distribution entry: $entry" + exit 1 + fi + if ! [[ "$region" =~ ^[A-Za-z0-9-]+$ ]]; then + echo "Invalid region value: $region" + exit 1 + fi + if ! [[ "$weight" =~ ^[0-9]+$ ]]; then + echo "Invalid weight for region $region: $weight" + exit 1 + fi + regions+=("$region") + weights+=("$weight") + total=$((total + weight)) + done + if [ "$total" -le 0 ]; then + echo "Invalid total weight: $total" + exit 1 + fi + roll=$((RANDOM % total)) + cumulative=0 + chosen="" + for i in "${!regions[@]}"; do + cumulative=$((cumulative + weights[i])) + if [ "$roll" -lt "$cumulative" ]; then + chosen="${regions[i]}" + break + fi + done + if [ -z "$chosen" ]; then + echo "Failed to choose Browserbase region" + exit 1 + fi + echo "Selected Browserbase region: $chosen" + echo "region=$chosen" >> "$GITHUB_OUTPUT" + echo "BROWSERBASE_REGION=$chosen" >> "$GITHUB_ENV" diff --git a/.github/actions/setup-node-pnpm-turbo/action.yml b/.github/actions/setup-node-pnpm-turbo/action.yml new file mode 100644 index 000000000..e20cc9766 --- /dev/null +++ b/.github/actions/setup-node-pnpm-turbo/action.yml @@ -0,0 +1,56 @@ +name: Setup Node, pnpm, and Turbo cache +description: Configure pnpm and Node.js with caching, restore Turbo cache, and install dependencies. +inputs: + node-version: + description: Node.js version to use. 
+ required: false + default: "20.x" + use-prebuilt-artifacts: + description: Whether to download pre-built package from build artifacts. + required: false + default: "true" + restore-turbo-cache: + description: Whether to restore the local .turbo cache. + required: false + default: "true" + +runs: + using: composite + steps: + - uses: pnpm/action-setup@v4 + + - name: Set up Node.js + uses: actions/setup-node@v6 + with: + node-version: ${{ inputs.node-version }} + cache: 'pnpm' + cache-dependency-path: '**/pnpm-lock.yaml' + + - name: Restore Turbo cache + if: ${{ inputs.restore-turbo-cache == 'true' }} + uses: actions/cache/restore@v4 + with: + path: .turbo + key: ${{ runner.os }}-turbo-${{ hashFiles('pnpm-lock.yaml', 'pnpm-workspace.yaml', 'package.json', 'turbo.json') }}-${{ github.sha }} + restore-keys: | + ${{ runner.os }}-turbo-${{ hashFiles('pnpm-lock.yaml', 'pnpm-workspace.yaml', 'package.json', 'turbo.json') }}- + + - name: Install dependencies + shell: bash + run: pnpm install --frozen-lockfile --prefer-offline + + - name: Download build artifacts + if: ${{ inputs.use-prebuilt-artifacts == 'true' }} + uses: actions/download-artifact@v4 + with: + name: build-artifacts + path: . + merge-multiple: true + + - name: Prepare test output directories + shell: bash + run: | + mkdir -p "${GITHUB_WORKSPACE}/ctrf" + if [ -n "${NODE_V8_COVERAGE:-}" ]; then + mkdir -p "$NODE_V8_COVERAGE" + fi diff --git a/.github/actions/upload-ctrf-report/action.yml b/.github/actions/upload-ctrf-report/action.yml new file mode 100644 index 000000000..40ac37175 --- /dev/null +++ b/.github/actions/upload-ctrf-report/action.yml @@ -0,0 +1,34 @@ +name: Upload CTRF report +description: Upload CTRF report artifact. +inputs: + name: + description: Report path (used as artifact name when sanitized). + required: true + path: + description: Optional explicit path (defaults to name). 
+ required: false + default: "" + +runs: + using: composite + steps: + - name: Normalize inputs + id: normalize + shell: bash + run: | + name="${{ inputs.name }}" + echo "name=${name//\//-}" >> "$GITHUB_OUTPUT" + if [ -n "${{ inputs.path }}" ]; then + echo "path=${{ inputs.path }}" >> "$GITHUB_OUTPUT" + else + echo "path=${{ inputs.name }}" >> "$GITHUB_OUTPUT" + fi + + - name: Upload CTRF report artifact + uses: actions/upload-artifact@v4 + with: + name: ${{ steps.normalize.outputs.name }} + # package.json anchors uploaded paths to the repository root. + path: | + package.json + ${{ steps.normalize.outputs.path }} diff --git a/.github/actions/upload-v8-coverage/action.yml b/.github/actions/upload-v8-coverage/action.yml new file mode 100644 index 000000000..11f8257dd --- /dev/null +++ b/.github/actions/upload-v8-coverage/action.yml @@ -0,0 +1,34 @@ +name: Upload V8 coverage +description: Upload V8 coverage artifacts. +inputs: + name: + description: Artifact name. + required: true + path: + description: Coverage path to upload (defaults to name). + required: false + default: "" + +runs: + using: composite + steps: + - name: Normalize artifact name + id: normalize + shell: bash + run: | + name="${{ inputs.name }}" + echo "name=${name//\//-}" >> "$GITHUB_OUTPUT" + if [ -n "${{ inputs.path }}" ]; then + echo "path=${{ inputs.path }}" >> "$GITHUB_OUTPUT" + else + echo "path=${{ inputs.name }}" >> "$GITHUB_OUTPUT" + fi + + - name: Upload coverage artifact + uses: actions/upload-artifact@v4 + with: + name: ${{ steps.normalize.outputs.name }} + # package.json anchors uploaded paths to the repository root. 
+ path: | + package.json + ${{ steps.normalize.outputs.path }} diff --git a/.github/actions/verify-chromium-launch/action.yml b/.github/actions/verify-chromium-launch/action.yml new file mode 100644 index 000000000..e8be8fdcb --- /dev/null +++ b/.github/actions/verify-chromium-launch/action.yml @@ -0,0 +1,223 @@ +name: Verify Chromium launch +description: Validate that Chromium can start, connect to CDP, and read the page title. +inputs: + chrome-path: + description: Path to Chromium/Chrome binary. + required: false + default: "/usr/bin/chromium" + max-attempts: + description: Number of launch attempts before failing. + required: false + default: "3" + timeout-ms: + description: Milliseconds to wait for DevTools and CDP per attempt. + required: false + default: "30000" +runs: + using: composite + steps: + - shell: bash + run: | + set -euo pipefail + max_attempts="${{ inputs.max-attempts }}" + attempt=1 + while [ "$attempt" -le "$max_attempts" ]; do + if [ -n "${{ inputs.chrome-path }}" ]; then + pkill -f "${{ inputs.chrome-path }}" >/dev/null 2>&1 || true + fi + if node - <<'NODE' + const { spawn } = require("node:child_process"); + const workspace = process.env.GITHUB_WORKSPACE; + if (workspace) { + process.chdir(workspace); + } + + const chrome = "${{ inputs.chrome-path }}"; + + const timeoutMs = Number("${{ inputs.timeout-ms }}"); + const wsPrefix = "DevTools listening on "; + const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)); + + let proc; + let wsUrl; + + const waitForWsUrl = async () => { + const deadline = Date.now() + timeoutMs; + while (!wsUrl) { + if (Date.now() > deadline) { + throw new Error( + `❌ Chromium did not expose CDP WS URL within timeout (${timeoutMs}ms)`, + ); + } + await sleep(250); + } + return wsUrl; + }; + + const cleanup = () => { + if (proc && !proc.killed) { + proc.kill("SIGKILL"); + } + }; + + (async () => { + try { + const startTime = Date.now(); + const args = [ + '--ash-no-nudges', + '--block-new-web-contents', 
+ '--deny-permission-prompts', + '--disable-breakpad', + '--disable-client-side-phishing-detection', + '--disable-component-update', + '--disable-components=AcceptCHFrame,OptimizationHints,ProcessPerSiteUpToMainFrameThreshold,InterestFeedContentSuggestions,CalculateNativeWinOcclusion,BackForwardCache,HeavyAdPrivacyMitigations,LazyFrameLoading,ImprovedCookieControls,PrivacySandboxSettings4,AutofillServerCommunication,CertificateTransparencyComponentUpdater,DestroyProfileOnBrowserClose,CrashReporting,OverscrollHistoryNavigation,InfiniteSessionRestore', + '--disable-datasaver-prompt', + '--disable-default-apps', + '--disable-desktop-notifications', + '--disable-domain-reliability', + '--disable-external-intent-requests', + '--disable-hang-monitor', + '--disable-infobars', + '--disable-notifications', + '--disable-popup-blocking', + '--disable-print-preview', + '--disable-prompt-on-repost', + '--disable-search-engine-choice-screen', + '--disable-session-crashed-bubble', + '--disable-speech-api', + '--disable-speech-synthesis-api', + '--hide-crash-restore-bubble', + '--metrics-recording-only', + '--no-default-browser-check', + '--no-first-run', + '--no-pings', + '--noerrdialogs', + '--safebrowsing-disable-auto-update', + '--silent-debugger-extension-api', + '--simulate-outdated-no-au="Tue, 31 Dec 2099 23:59:59 GMT"', + '--suppress-message-center-popups', + "--disable-background-networking", + "--disable-default-apps", + "--disable-dev-shm-usage", + "--disable-extensions", + "--disable-notifications", + "--disable-setuid-sandbox", + "--disable-site-isolation-trials", + "--disable-sync", + "--disable-web-security", + "--headless=new", + "--no-default-browser-check", + "--no-first-run", + "--no-sandbox", + "--no-zygote", + "--password-store=basic", + "--remote-debugging-port=0", + "--test-type=gpu", + "--use-mock-keychain", + "about:blank", + ]; + proc = spawn(chrome, args, { stdio: ["ignore", "pipe", "pipe"] }); + const lineBuffers = { stdout: "", stderr: "" }; + const 
onData = (stream) => (data) => { + const text = data.toString(); + if (stream === "stderr") { + process.stderr.write(text); + } else { + process.stdout.write(text); + } + lineBuffers[stream] += text; + const lines = lineBuffers[stream].split(/\r?\n/); + lineBuffers[stream] = lines.pop() ?? ""; + for (const line of lines) { + const idx = line.indexOf(wsPrefix); + if (idx === -1) continue; + const rest = line.slice(idx + wsPrefix.length).trim(); + const candidate = rest.split(/\s+/)[0]; + if ( + candidate.startsWith("ws://") || + candidate.startsWith("wss://") + ) { + wsUrl = candidate; + } + } + }; + proc.stdout.on("data", onData("stdout")); + proc.stderr.on("data", onData("stderr")); + + const url = await waitForWsUrl(); + const wsFoundMs = Date.now() - startTime; + const wsFoundSec = (wsFoundMs / 1000).toFixed(2); + const connectStart = Date.now(); + const path = require("node:path"); + const workspaceRoot = process.env.GITHUB_WORKSPACE || process.cwd(); + const playwrightPath = path.join( + workspaceRoot, + "packages/core/node_modules/playwright", + ); + console.log( + `✅ CDP Url found after ${wsFoundSec}s, connecting with playwright...`, + ); + const { chromium } = require(playwrightPath); + const browser = await chromium.connectOverCDP(url, { + timeout: timeoutMs, + }); + const context = browser.contexts()[0]; + if (!context) { + throw new Error("❌ No browser context available after CDP connect"); + } + const page = context.pages()[0]; + if (!page) { + throw new Error("❌ No page available after CDP connect"); + } + const remainingMs = timeoutMs - (Date.now() - connectStart); + if (remainingMs <= 0) { + throw new Error( + `❌ CDP connect + verify timed out after ${timeoutMs}ms`, + ); + } + const sum = await Promise.race([ + page.evaluate("1 + 1"), + new Promise((_, reject) => + setTimeout( + () => + reject( + new Error( + `❌ CDP connect + verify timed out after ${timeoutMs}ms`, + ), + ), + remainingMs, + ), + ), + ]); + if (sum !== 2) { + throw new Error(`❌ 
Unexpected eval result: ${sum}`); + } + const totalMs = Date.now() - startTime; + const connectMs = Date.now() - connectStart; + const totalSec = (totalMs / 1000).toFixed(2); + const connectSec = (connectMs / 1000).toFixed(2); + console.log( + `✅ Chromium launched in ${wsFoundSec}s and CDP connected in ${connectSec}s (total: ${totalSec}s)`, + ); + await browser.close(); + cleanup(); + process.exit(0); + } catch (err) { + cleanup(); + console.error(err instanceof Error ? err.message : String(err)); + process.exit(1); + } + })(); + NODE + then + if [ "$attempt" -gt 1 ]; then + echo "⚠️ Chromium launch succeeded after ${attempt} attempts; GitHub Actions runner may be constrained." + fi + exit 0 + fi + echo "⚠️ Chromium launch attempt ${attempt} failed." + attempt=$((attempt + 1)) + sleep 2 + done + echo "❌ Failed to launch Chromium before running Stagehand; GitHub Actions runner is likely overloaded." + exit 1 diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 66fdc2dd8..a12848b1a 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,4 +1,4 @@ -name: Evals +name: Tests on: pull_request: @@ -7,659 +7,821 @@ on: - synchronize - labeled - unlabeled + paths-ignore: + - "packages/docs/**" + +permissions: + contents: read + actions: write env: - EVAL_MODELS: "gpt-4o,gpt-4o-mini,claude-3-5-sonnet-latest" - EVAL_CATEGORIES: "observe,act,combination,extract,text_extract,targeted_extract" + LLM_MAX_MS: "15000" + EVAL_MODELS: "openai/gpt-4.1,google/gemini-2.0-flash,anthropic/claude-haiku-4-5" + EVAL_AGENT_MODELS: "computer-use-preview-2025-03-11,claude-sonnet-4-6" + EVAL_CATEGORIES: "observe,act,combination,extract,targeted_extract,agent" + EVAL_MAX_CONCURRENCY: 25 + EVAL_TRIAL_COUNT: 3 + LOCAL_SESSION_LIMIT_PER_E2E_TEST: 2 + BROWSERBASE_SESSION_LIMIT_PER_E2E_TEST: 3 + BROWSERBASE_REGION_DISTRIBUTION: "us-west-2=30,us-east-1=30,eu-central-1=20,ap-southeast-1=20" # percentage of load for each region when running e2e tests against prod + 
CHROME_PATH: /usr/bin/chromium # GitHub Actions runners ship with stable Chromium by default + BROWSERBASE_CDP_CONNECT_MAX_MS: "10000" + BROWSERBASE_SESSION_CREATE_MAX_MS: "60000" + PUPPETEER_SKIP_DOWNLOAD: "1" + PLAYWRIGHT_SKIP_DOWNLOAD: "1" + TURBO_TELEMETRY_DISABLED: "1" concurrency: - group: ${{ github.ref }} + group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true jobs: + determine-changes: + runs-on: ubuntu-latest + outputs: + core: ${{ steps.filter.outputs.core }} + cli: ${{ steps.filter.outputs.cli }} + evals: ${{ steps.filter.outputs.evals }} + server: ${{ steps.filter.outputs.server }} + docs-only: ${{ steps.filter.outputs.docs-only }} + steps: + - name: Check out repository code + uses: actions/checkout@v4 + + - name: Log GitHub API rate limit + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: | + headers_file=$(mktemp) + body_file=$(mktemp) + curl -sSL \ + -D "$headers_file" \ + -o "$body_file" \ + -H "Accept: application/vnd.github+json" \ + -H "X-GitHub-Api-Version: 2022-11-28" \ + -H "Authorization: Bearer $GITHUB_TOKEN" \ + https://api.github.com/rate_limit + cat "$headers_file" + echo "" + cat "$body_file" + remaining=$(jq -r '.rate.remaining' "$body_file") + if [ "$remaining" -eq 0 ]; then + reset_epoch=$(jq -r '.rate.reset' "$body_file") + reset_utc=$(date -u -d "@$reset_epoch" +"%Y-%m-%d %H:%M:%S") + reset_pacific=$(TZ=America/Los_Angeles date -d "@$reset_epoch" +"%Y-%m-%d %H:%M:%S %Z") + echo "GitHub API rate limited until: ${reset_pacific} (${reset_utc} UTC)" >> "$GITHUB_STEP_SUMMARY" + echo "GitHub API rate limit exhausted." 
+ exit 1 + fi + + - uses: dorny/paths-filter@v3 + id: filter + with: + filters: | + core: + - '.github/workflows/ci.yml' + - 'packages/core/**' + - 'package.json' + - 'pnpm-lock.yaml' + - 'turbo.json' + cli: + - 'packages/cli/**' + - 'packages/core/**' + - 'package.json' + - 'pnpm-lock.yaml' + evals: + - 'packages/evals/**' + - 'package.json' + - 'pnpm-lock.yaml' + server: + - 'packages/server-v3/**' + - 'packages/server-v4/**' + - 'packages/core/**' + - 'package.json' + - 'pnpm-lock.yaml' + - 'pnpm-workspace.yaml' + - '.github/workflows/ci.yml' + docs-only: + - '**/*.md' + - 'examples/**' + - '!packages/**/*.md' + determine-evals: + needs: [determine-changes] runs-on: ubuntu-latest outputs: - run-combination: ${{ steps.check-labels.outputs.run-combination }} - run-extract: ${{ steps.check-labels.outputs.run-extract }} - run-act: ${{ steps.check-labels.outputs.run-act }} - run-observe: ${{ steps.check-labels.outputs.run-observe }} - run-text-extract: ${{ steps.check-labels.outputs.run-text-extract }} - run-targeted-extract: ${{ steps.check-labels.outputs.run-targeted-extract }} + skip-all-evals: ${{ steps.check-labels.outputs.skip-all-evals }} + eval-categories: ${{ steps.check-labels.outputs.eval-categories }} steps: - id: check-labels run: | - # Default to running all tests on main branch - if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then - echo "Running all tests for main branch" - echo "run-combination=true" >> $GITHUB_OUTPUT - echo "run-extract=true" >> $GITHUB_OUTPUT - echo "run-act=true" >> $GITHUB_OUTPUT - echo "run-observe=true" >> $GITHUB_OUTPUT - echo "run-text-extract=true" >> $GITHUB_OUTPUT - echo "run-targeted-extract=true" >> $GITHUB_OUTPUT + categories=() + declare -A seen + add_category() { + local category="$1" + if [[ -z "${seen[$category]:-}" ]]; then + categories+=("$category") + seen["$category"]=1 + fi + } + + emit_categories() { + local json="[" + for category in "${categories[@]}"; do + json+="\"${category}\"," + done + 
json="${json%,}" + json+="]" + echo "eval-categories=$json" >> $GITHUB_OUTPUT + } + + # Check if skip-evals label is present + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'skip-evals') }}" == "true" ]]; then + echo "skip-evals label found - skipping all evals" + echo "skip-all-evals=true" >> $GITHUB_OUTPUT + emit_categories exit 0 fi - # Check for specific labels - echo "run-combination=${{ contains(github.event.pull_request.labels.*.name, 'combination') }}" >> $GITHUB_OUTPUT - echo "run-extract=${{ contains(github.event.pull_request.labels.*.name, 'extract') }}" >> $GITHUB_OUTPUT - echo "run-act=${{ contains(github.event.pull_request.labels.*.name, 'act') }}" >> $GITHUB_OUTPUT - echo "run-observe=${{ contains(github.event.pull_request.labels.*.name, 'observe') }}" >> $GITHUB_OUTPUT - echo "run-text-extract=${{ contains(github.event.pull_request.labels.*.name, 'text-extract') }}" >> $GITHUB_OUTPUT - echo "run-targeted-extract=${{ contains(github.event.pull_request.labels.*.name, 'targeted-extract') }}" >> $GITHUB_OUTPUT + # Skip evals if only docs/examples changed + if [[ "${{ needs.determine-changes.outputs.docs-only }}" == "true" && "${{ needs.determine-changes.outputs.core }}" == "false" && "${{ needs.determine-changes.outputs.evals }}" == "false" ]]; then + echo "Only docs/examples changed - skipping evals" + echo "skip-all-evals=true" >> $GITHUB_OUTPUT + emit_categories + exit 0 + fi + # Check for skip-regression-evals label + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'skip-regression-evals') }}" == "true" ]]; then + echo "skip-regression-evals label found - regression evals will be skipped" + else + echo "Regression evals will run by default" + add_category "regression" + fi + + # Check for specific labels + echo "skip-all-evals=false" >> $GITHUB_OUTPUT + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'combination') }}" == "true" ]]; then + add_category "combination" + fi + if [[ "${{ 
contains(github.event.pull_request.labels.*.name, 'extract') }}" == "true" ]]; then + add_category "extract" + fi + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'act') }}" == "true" ]]; then + add_category "act" + fi + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'observe') }}" == "true" ]]; then + add_category "observe" + fi + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'targeted-extract') }}" == "true" ]]; then + add_category "targeted_extract" + fi + if [[ "${{ contains(github.event.pull_request.labels.*.name, 'agent') }}" == "true" ]]; then + add_category "agent" + fi + emit_categories + run-lint: + name: Lint runs-on: ubuntu-latest + needs: [run-build] steps: - name: Check out repository code uses: actions/checkout@v4 - - name: Set up Node.js - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" - - - name: Install dependencies - run: | - rm -rf node_modules - rm -f package-lock.json - npm install + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" + node-version: 20.x - name: Run Lint - run: npm run lint + run: pnpm exec turbo run lint - run-build: + cancel-after-lint-failure: + name: Cancel after lint failure runs-on: ubuntu-latest + needs: [run-lint] + if: ${{ always() && needs.run-lint.result == 'failure' }} + continue-on-error: true steps: - - name: Check out repository code - uses: actions/checkout@v4 - - - name: Set up Node.js - uses: actions/setup-node@v4 - with: - node-version: "20" - - - name: Install dependencies + - name: Cancel workflow run + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Run Build - run: npm run build + curl -sSfL -X POST \ + -H "Authorization: Bearer ${GITHUB_TOKEN}" \ + -H "Accept: application/vnd.github+json" \ + -H "X-GitHub-Api-Version: 2022-11-28" \ + 
"https://api.github.com/repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/cancel" - run-e2e-tests: - needs: [run-lint, run-build] + run-build: + name: Build runs-on: ubuntu-latest - timeout-minutes: 50 - env: - HEADLESS: true steps: - name: Check out repository code uses: actions/checkout@v4 - - name: Set up Node.js - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" - - - name: Install dependencies - run: | - rm -rf node_modules - rm -f package-lock.json - npm install + use-prebuilt-artifacts: "false" + node-version: 20.x - - name: Install Playwright browsers - run: npm exec playwright install --with-deps - - - name: Build Stagehand - run: npm run build + - name: Run Build + run: pnpm exec turbo run build - - name: Run E2E Tests (Deterministic Playwright) - run: npm run e2e + - name: Save Turbo cache + if: always() + uses: actions/cache/save@v4 + with: + path: .turbo + key: ${{ runner.os }}-turbo-${{ hashFiles('pnpm-lock.yaml', 'pnpm-workspace.yaml', 'package.json', 'turbo.json') }}-${{ github.sha }} - run-e2e-local-tests: - needs: [run-lint, run-build] + - name: Upload build artifacts + uses: actions/upload-artifact@v4 + with: + name: build-artifacts + include-hidden-files: true + # package.json is included to anchor artifact paths at repo root. 
+ path: | + package.json + packages/core/dist/** + packages/core/lib/version.ts + packages/core/lib/dom/build/** + packages/core/lib/v3/dom/build/** + packages/cli/dist/** + packages/evals/dist/** + packages/server-v3/dist/** + packages/server-v3/openapi.v3.yaml + packages/server-v4/dist/** + packages/server-v4/openapi.v4.yaml + retention-days: 1 + + discover-core-tests: runs-on: ubuntu-latest - timeout-minutes: 50 - env: - HEADLESS: true + needs: [determine-changes] + if: needs.determine-changes.outputs.core == 'true' + outputs: + core-tests: ${{ steps.set-matrix.outputs.core-tests }} + has-core-tests: ${{ steps.set-matrix.outputs.has-core-tests }} + steps: - - name: Check out repository code - uses: actions/checkout@v4 + - uses: actions/checkout@v4 + with: + fetch-depth: 1 - - name: Set up Node.js - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" + use-prebuilt-artifacts: "false" + restore-turbo-cache: "false" - - name: Install dependencies + - name: Discover core test files + id: set-matrix run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Install Playwright browsers - run: npm exec playwright install --with-deps + core_json=$(pnpm --filter @browserbasehq/stagehand --silent run test:core -- --list) + echo "core-tests=$core_json" >> $GITHUB_OUTPUT - - name: Build Stagehand - run: npm run build + if [ "$core_json" = "[]" ]; then + echo "has-core-tests=false" >> $GITHUB_OUTPUT + else + echo "has-core-tests=true" >> $GITHUB_OUTPUT + fi - - name: Run local E2E Tests (Deterministic Playwright) - run: npm run e2e:local + echo "Found core tests: $core_json" - run-e2e-bb-tests: - needs: [run-lint, run-build] + core-unit-tests: + name: core/${{ matrix.test.name }} runs-on: ubuntu-latest - timeout-minutes: 50 - if: > - github.event_name == 'push' || - (github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository) + needs: [run-build, 
discover-core-tests] + if: needs.discover-core-tests.outputs.has-core-tests == 'true' env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} - BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} - HEADLESS: true + STAGEHAND_BROWSER_TARGET: local + STAGEHAND_SERVER_TARGET: local + + strategy: + fail-fast: false + max-parallel: 100 + matrix: + test: ${{ fromJson(needs.discover-core-tests.outputs.core-tests) }} + steps: - - name: Check out repository code - uses: actions/checkout@v4 + - uses: actions/checkout@v4 + with: + fetch-depth: 1 - - name: Set up Node.js - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" - - name: Install dependencies + - name: Run Vitest - ${{ matrix.test.name }} run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Install Playwright browsers - run: npm exec playwright install --with-deps + pnpm exec turbo run test:core --only --filter=@browserbasehq/stagehand -- "${{ matrix.test.path }}" - - name: Build Stagehand - run: npm run build + - uses: ./.github/actions/upload-ctrf-report + if: always() + with: + name: ctrf/core-unit/${{ matrix.test.name }}.json - - name: Run E2E Tests (browserbase) - run: npm run e2e:bb + - uses: ./.github/actions/upload-v8-coverage + if: always() + with: + name: coverage/core-unit/${{ matrix.test.name }} - run-regression-evals: - needs: - [run-e2e-bb-tests, run-e2e-tests, run-e2e-local-tests, determine-evals] + discover-server-tests: runs-on: ubuntu-latest - timeout-minutes: 9 + needs: [determine-changes] + if: needs.determine-changes.outputs.server == 'true' outputs: - regression_score: ${{ steps.set-regression-score.outputs.regression_score }} - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} 
- BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} - BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} - BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} - HEADLESS: true - EVAL_ENV: browserbase + integration-tests: ${{ steps.set-matrix.outputs.integration-tests }} + has-integration-tests: ${{ steps.set-matrix.outputs.has-integration-tests }} + steps: - - name: Check out repository code - uses: actions/checkout@v4 + - uses: actions/checkout@v4 + with: + fetch-depth: 1 - - name: Set up Node.js - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" + use-prebuilt-artifacts: "false" + restore-turbo-cache: "false" - - name: Install dependencies + - name: Discover server test files + id: set-matrix run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Build Stagehand - run: npm run build - - - name: Install Playwright browsers - run: npm exec playwright install --with-deps + int_json=$(pnpm --filter @browserbasehq/stagehand-server-v3 --silent run test:server -- --list integration) + echo "integration-tests=$int_json" >> $GITHUB_OUTPUT - - name: Run Regression Evals - run: npm run evals category regression trials=2 concurrency=20 env=BROWSERBASE - - - name: Log Regression Evals Performance - run: | - experimentName=$(jq -r '.experimentName' eval-summary.json) - echo "View results at https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentName}" - if [ -f eval-summary.json ]; then - regression_score=$(jq '.categories.regression' eval-summary.json) - echo "Regression category score: $regression_score%" - if (( $(echo "$regression_score < 90" | bc -l) )); then - echo "Regression category score is below 90%. Failing CI." - exit 1 - fi + if [ "$int_json" = "[]" ]; then + echo "has-integration-tests=false" >> $GITHUB_OUTPUT else - echo "Eval summary not found for regression category. Failing CI." 
- exit 1 + echo "has-integration-tests=true" >> $GITHUB_OUTPUT fi - run-combination-evals: - needs: [run-regression-evals, determine-evals] + echo "Found server integration tests: $int_json" + + build-server-sea: + name: Build SEA binary (tests, v3) + uses: ./.github/workflows/stagehand-server-v3-sea-build.yml + needs: [run-build] + with: + matrix: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v3-linux-x64","include_sourcemaps":false}, + {"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v3-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v3-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v3-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v3-win32-x64.exe","include_sourcemaps":false}, + {"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v3-win32-arm64.exe","include_sourcemaps":false}, + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v3-linux-x64-sourcemap","include_sourcemaps":true} + ] + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" + node-version: "20.x" + upload-only-binary: stagehand-server-v3-linux-x64-sourcemap + + server-integration-tests: + name: server/v3/integration/${{ matrix.test.name }} runs-on: ubuntu-latest - timeout-minutes: 40 + needs: [build-server-sea, discover-server-tests, run-build] + if: needs.discover-server-tests.outputs.has-integration-tests == 'true' + + strategy: + fail-fast: false + matrix: + test: ${{ fromJson(needs.discover-server-tests.outputs.integration-tests) }} + env: + BB_ENV: local + STAGEHAND_BASE_URL: http://stagehand-api.localhost:3107 + STAGEHAND_BROWSER_TARGET: local + STAGEHAND_SERVER_TARGET: sea 
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} + GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} + # Used only for testing /start with env: BROWSERBASE remote browser BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} - HEADLESS: true - EVAL_ENV: browserbase + steps: - - name: Check out repository code - uses: actions/checkout@v4 + - uses: actions/checkout@v4 + with: + fetch-depth: 1 - - name: Check for 'combination' label - id: label-check - run: | - if [ "${{ needs.determine-evals.outputs.run-combination }}" != "true" ]; then - echo "has_label=false" >> $GITHUB_OUTPUT - echo "No label for COMBINATION. Exiting with success." - else - echo "has_label=true" >> $GITHUB_OUTPUT - fi + - uses: ./.github/actions/setup-node-pnpm-turbo + with: + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" - - name: Set up Node.js - if: needs.determine-evals.outputs.run-combination == 'true' - uses: actions/setup-node@v4 + - name: Download SEA binary + uses: actions/download-artifact@v4 with: - node-version: "20" + name: stagehand-server-v3-linux-x64-sourcemap + path: . 
- - name: Install dependencies - if: needs.determine-evals.outputs.run-combination == 'true' + - name: Ensure SEA binary is present and executable + shell: bash run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Build Stagehand - if: needs.determine-evals.outputs.run-combination == 'true' - run: npm run build + set -euo pipefail + test -f packages/server-v3/dist/sea/stagehand-server-v3-linux-x64-sourcemap + chmod +x packages/server-v3/dist/sea/stagehand-server-v3-linux-x64-sourcemap - - name: Install Playwright browsers - if: needs.determine-evals.outputs.run-combination == 'true' - run: npm exec playwright install --with-deps + - name: Run server integration test - ${{ matrix.test.name }} + env: + SEA_BINARY_NAME: stagehand-server-v3-linux-x64-sourcemap + run: | + pnpm exec turbo run test:server --only --filter=@browserbasehq/stagehand-server-v3 -- "${{ matrix.test.path }}" - - name: Run Combination Evals - if: needs.determine-evals.outputs.run-combination == 'true' - run: npm run evals category combination + - uses: ./.github/actions/upload-ctrf-report + if: always() + with: + name: ctrf/server-v3-integration/${{ matrix.test.name }}.json - - name: Log Combination Evals Performance - if: needs.determine-evals.outputs.run-combination == 'true' - run: | - experimentName=$(jq -r '.experimentName' eval-summary.json) - echo "View results at https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentName}" - if [ -f eval-summary.json ]; then - combination_score=$(jq '.categories.combination' eval-summary.json) - echo "Combination category score: $combination_score%" - exit 0 - else - echo "Eval summary not found for combination category. Failing CI." 
- exit 1 - fi + - uses: ./.github/actions/upload-v8-coverage + if: always() + with: + name: coverage/server-v3-integration/${{ matrix.test.name }} - run-act-evals: - needs: [run-combination-evals, determine-evals] + discover-e2e-tests: runs-on: ubuntu-latest - timeout-minutes: 25 - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} - BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} - BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} - HEADLESS: true - EVAL_ENV: browserbase - steps: - - name: Check out repository code - uses: actions/checkout@v4 + needs: [determine-changes] + if: needs.determine-changes.outputs.core == 'true' + outputs: + e2e-tests: ${{ steps.set-matrix.outputs.e2e-tests }} + has-e2e-tests: ${{ steps.set-matrix.outputs.has-e2e-tests }} - - name: Check for 'act' label - id: label-check - run: | - if [ "${{ needs.determine-evals.outputs.run-act }}" != "true" ]; then - echo "has_label=false" >> $GITHUB_OUTPUT - echo "No label for ACT. Exiting with success." 
- else - echo "has_label=true" >> $GITHUB_OUTPUT - fi + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 1 - - name: Set up Node.js - if: needs.determine-evals.outputs.run-act == 'true' - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" + use-prebuilt-artifacts: "false" + restore-turbo-cache: "false" - - name: Install dependencies - if: needs.determine-evals.outputs.run-act == 'true' + - name: Discover e2e test files + id: set-matrix run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Build Stagehand - if: needs.determine-evals.outputs.run-act == 'true' - run: npm run build + e2e_json=$(pnpm --filter @browserbasehq/stagehand --silent run test:e2e -- --list) + echo "e2e-tests=$e2e_json" >> $GITHUB_OUTPUT - - name: Install Playwright browsers - if: needs.determine-evals.outputs.run-act == 'true' - run: npm exec playwright install --with-deps - - - name: Run Act Evals - if: needs.determine-evals.outputs.run-act == 'true' - run: npm run evals category act - - - name: Log Act Evals Performance - if: needs.determine-evals.outputs.run-act == 'true' - run: | - experimentName=$(jq -r '.experimentName' eval-summary.json) - echo "View results at https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentName}" - if [ -f eval-summary.json ]; then - act_score=$(jq '.categories.act' eval-summary.json) - echo "Act category score: $act_score%" - if (( $(echo "$act_score < 80" | bc -l) )); then - echo "Act category score is below 80%. Failing CI." - exit 1 - fi + if [ "$e2e_json" = "[]" ]; then + echo "has-e2e-tests=false" >> $GITHUB_OUTPUT else - echo "Eval summary not found for act category. Failing CI." 
- exit 1 + echo "has-e2e-tests=true" >> $GITHUB_OUTPUT fi - run-extract-evals: - needs: [run-act-evals, determine-evals] + echo "Found e2e tests: $e2e_json" + + run-e2e-local-tests: + name: e2e/local/${{ matrix.test.name }} + needs: [run-build, discover-e2e-tests] runs-on: ubuntu-latest timeout-minutes: 50 + if: > + needs.discover-e2e-tests.outputs.has-e2e-tests == 'true' && + github.event.pull_request.head.repo.full_name == github.repository env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} + GOOGLE_GENERATIVE_AI_API_KEY: ${{ secrets.GOOGLE_GENERATIVE_AI_API_KEY }} BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} HEADLESS: true - EVAL_ENV: browserbase + STAGEHAND_BROWSER_TARGET: local + STAGEHAND_SERVER_TARGET: local + strategy: + fail-fast: false + max-parallel: 20 + matrix: + test: ${{ fromJson(needs.discover-e2e-tests.outputs.e2e-tests) }} steps: - name: Check out repository code uses: actions/checkout@v4 - - name: Check for 'extract' label - id: label-check + - uses: ./.github/actions/setup-node-pnpm-turbo + with: + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" + + - uses: ./.github/actions/verify-chromium-launch + + - name: Run local E2E Tests - ${{ matrix.test.name }} run: | - if [ "${{ needs.determine-evals.outputs.run-extract }}" != "true" ]; then - echo "has_label=false" >> $GITHUB_OUTPUT - echo "No label for EXTRACT. Exiting with success." 
- else - echo "has_label=true" >> $GITHUB_OUTPUT - fi + pnpm exec turbo run test:e2e --only --filter=@browserbasehq/stagehand -- "${{ matrix.test.path }}" - - name: Set up Node.js - if: needs.determine-evals.outputs.run-extract == 'true' - uses: actions/setup-node@v4 + - uses: ./.github/actions/upload-ctrf-report + if: always() with: - node-version: "20" + name: ctrf/e2e-local/${{ matrix.test.name }}.json - - name: Install dependencies - if: needs.determine-evals.outputs.run-extract == 'true' - run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Build Stagehand - if: needs.determine-evals.outputs.run-extract == 'true' - run: npm run build - - - name: Install Playwright browsers - if: needs.determine-evals.outputs.run-extract == 'true' - run: npm exec playwright install --with-deps - - # 1. Run extract category with domExtract - - name: Run Extract Evals (domExtract) - if: needs.determine-evals.outputs.run-extract == 'true' - run: npm run evals category extract -- --extract-method=domExtract - - - name: Save Extract Dom Results - if: needs.determine-evals.outputs.run-extract == 'true' - run: mv eval-summary.json eval-summary-extract-dom.json - - # 2. Then run extract category with textExtract - - name: Run Extract Evals (textExtract) - if: needs.determine-evals.outputs.run-extract == 'true' - run: npm run evals category extract -- --extract-method=textExtract - - - name: Save Extract Text Results - if: needs.determine-evals.outputs.run-extract == 'true' - run: mv eval-summary.json eval-summary-extract-text.json - - # 3. 
Log and Compare Extract Evals Performance - - name: Log and Compare Extract Evals Performance - if: needs.determine-evals.outputs.run-extract == 'true' - run: | - experimentNameDom=$(jq -r '.experimentName' eval-summary-extract-dom.json) - dom_score=$(jq '.categories.extract' eval-summary-extract-dom.json) - echo "DomExtract Extract category score: $dom_score%" - echo "View domExtract results: https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentNameDom}" - - experimentNameText=$(jq -r '.experimentName' eval-summary-extract-text.json) - text_score=$(jq '.categories.extract' eval-summary-extract-text.json) - echo "TextExtract Extract category score: $text_score%" - echo "View textExtract results: https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentNameText}" - - # If domExtract <80% fail CI - if (( $(echo "$dom_score < 80" | bc -l) )); then - echo "DomExtract extract category score is below 80%. Failing CI." - exit 1 - fi + - uses: ./.github/actions/upload-v8-coverage + if: always() + with: + name: coverage/e2e-local/${{ matrix.test.name }} - run-text-extract-evals: - needs: [run-extract-evals, determine-evals] + run-e2e-bb-tests: + name: e2e/bb/${{ matrix.test.name }} + needs: [run-build, discover-e2e-tests] runs-on: ubuntu-latest - timeout-minutes: 120 + timeout-minutes: 50 + if: > + needs.discover-e2e-tests.outputs.has-e2e-tests == 'true' && + github.event.pull_request.head.repo.full_name == github.repository env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} + GOOGLE_GENERATIVE_AI_API_KEY: ${{ secrets.GOOGLE_GENERATIVE_AI_API_KEY }} BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} HEADLESS: true - EVAL_ENV: browserbase + STAGEHAND_BROWSER_TARGET: browserbase + STAGEHAND_SERVER_TARGET: local + strategy: + fail-fast: false + max-parallel: 
100 + matrix: + test: ${{ fromJson(needs.discover-e2e-tests.outputs.e2e-tests) }} steps: - name: Check out repository code uses: actions/checkout@v4 - - name: Check for 'text-extract' label - id: label-check - run: | - if [ "${{ needs.determine-evals.outputs.run-text-extract }}" != "true" ]; then - echo "has_label=false" >> $GITHUB_OUTPUT - echo "No label for TEXT-EXTRACT. Exiting with success." - else - echo "has_label=true" >> $GITHUB_OUTPUT - fi + - uses: ./.github/actions/setup-node-pnpm-turbo + with: + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" - - name: Set up Node.js - if: needs.determine-evals.outputs.run-text-extract == 'true' - uses: actions/setup-node@v4 + - name: Select Browserbase region + uses: ./.github/actions/select-browserbase-region with: - node-version: "20" + distribution: ${{ env.BROWSERBASE_REGION_DISTRIBUTION }} - - name: Install dependencies - if: needs.determine-evals.outputs.run-text-extract == 'true' + - name: Run E2E Tests (browserbase) - ${{ matrix.test.name }} run: | - rm -rf node_modules - rm -f package-lock.json - npm install + pnpm exec turbo run test:e2e --only --filter=@browserbasehq/stagehand -- "${{ matrix.test.path }}" - - name: Install Playwright browsers - if: needs.determine-evals.outputs.run-text-extract == 'true' - run: npm exec playwright install --with-deps - - - name: Build Stagehand - if: needs.determine-evals.outputs.run-text-extract == 'true' - run: npm run build - - - name: Run text_extract Evals (textExtract) - if: needs.determine-evals.outputs.run-text-extract == 'true' - run: npm run evals category text_extract -- --extract-method=textExtract - - - name: Save text_extract Results - if: needs.determine-evals.outputs.run-text-extract == 'true' - run: mv eval-summary.json eval-summary-text_extract-text.json - - - name: Log text_extract Evals Performance - if: needs.determine-evals.outputs.run-text-extract == 'true' - run: | - experimentNameText=$(jq -r '.experimentName' 
eval-summary-text_extract-text.json) - text_score=$(jq '.categories.text_extract' eval-summary-text_extract-text.json) - echo "TextExtract text_extract category score: $text_score%" - echo "View textExtract results: https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentNameText}" - - # If text_score <80% fail CI - if (( $(echo "$text_score < 80" | bc -l) )); then - echo "textExtract text_extract category score is below 80%. Failing CI." - exit 1 - fi + - uses: ./.github/actions/upload-ctrf-report + if: always() + with: + name: ctrf/e2e-bb/${{ matrix.test.name }}.json - run-observe-evals: - needs: [run-text-extract-evals, determine-evals] + - uses: ./.github/actions/upload-v8-coverage + if: always() + with: + name: coverage/e2e-bb/${{ matrix.test.name }} + + run-evals: + name: evals/${{ matrix.category }} + needs: [run-build, determine-evals, run-e2e-bb-tests] + if: >- + ${{ + always() && + needs.run-build.result == 'success' && + needs.determine-evals.result == 'success' && + needs.run-e2e-bb-tests.result != 'failure' && + needs.run-e2e-bb-tests.result != 'cancelled' && + needs.determine-evals.outputs.skip-all-evals != 'true' && + needs.determine-evals.outputs.eval-categories != '[]' + }} runs-on: ubuntu-latest - timeout-minutes: 60 + timeout-minutes: 90 + strategy: + fail-fast: false + matrix: + category: ${{ fromJson(needs.determine-evals.outputs.eval-categories) }} env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + GOOGLE_GENERATIVE_AI_API_KEY: ${{ secrets.GOOGLE_GENERATIVE_AI_API_KEY }} BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} - HEADLESS: true - EVAL_ENV: browserbase + STAGEHAND_BROWSER_TARGET: browserbase + STAGEHAND_SERVER_TARGET: local steps: - name: Check out repository code uses: actions/checkout@v4 - - name: Check for 'observe' label - id: 
label-check - run: | - if [ "${{ needs.determine-evals.outputs.run-observe }}" != "true" ]; then - echo "has_label=false" >> $GITHUB_OUTPUT - echo "No label for OBSERVE. Exiting with success." - else - echo "has_label=true" >> $GITHUB_OUTPUT - fi + - uses: ./.github/actions/setup-node-pnpm-turbo + with: + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" - - name: Set up Node.js - if: needs.determine-evals.outputs.run-observe == 'true' - uses: actions/setup-node@v4 + - name: Select Browserbase region + uses: ./.github/actions/select-browserbase-region with: - node-version: "20" + distribution: ${{ env.BROWSERBASE_REGION_DISTRIBUTION }} - - name: Install dependencies - if: needs.determine-evals.outputs.run-observe == 'true' + - name: Run Evals - ${{ matrix.category }} + id: run-evals + env: + NODE_V8_COVERAGE: coverage/evals/${{ matrix.category }} run: | - rm -rf node_modules - rm -f package-lock.json - npm install - - - name: Install Playwright browsers - if: needs.determine-evals.outputs.run-observe == 'true' - run: npm exec playwright install --with-deps - - - name: Build Stagehand - if: needs.determine-evals.outputs.run-observe == 'true' - run: npm run build + log_file="$(mktemp)" + set +e + pnpm exec turbo run test:evals --only --filter=@browserbasehq/stagehand-evals -- "${{ matrix.category }}" -t "${EVAL_TRIAL_COUNT}" -c "${EVAL_MAX_CONCURRENCY}" 2>&1 | tee "$log_file" + eval_status=${PIPESTATUS[0]} + set -e + + summary_block="$( + awk ' + /^=========================SUMMARY=========================$/ { capture=1 } + capture { print } + /^Evaluation summary written to / { capture=0 } + ' "$log_file" + )" + + if [ -n "$summary_block" ]; then + { + echo "summary_text<> "$GITHUB_OUTPUT" + fi - - name: Run Observe Evals - if: needs.determine-evals.outputs.run-observe == 'true' - run: npm run evals category observe + exit "$eval_status" - - name: Log Observe Evals Performance - if: needs.determine-evals.outputs.run-observe == 'true' + - name: Log Evals 
Performance - ${{ matrix.category }} + env: + EVAL_STDOUT_SUMMARY: ${{ steps.run-evals.outputs.summary_text }} run: | + if [ -n "${EVAL_STDOUT_SUMMARY:-}" ]; then + echo "### Evals Summary (${{ matrix.category }})" >> "$GITHUB_STEP_SUMMARY" + echo '```' >> "$GITHUB_STEP_SUMMARY" + printf '%s\n' "$EVAL_STDOUT_SUMMARY" >> "$GITHUB_STEP_SUMMARY" + echo '```' >> "$GITHUB_STEP_SUMMARY" + fi experimentName=$(jq -r '.experimentName' eval-summary.json) echo "View results at https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentName}" if [ -f eval-summary.json ]; then - observe_score=$(jq '.categories.observe' eval-summary.json) - echo "Observe category score: $observe_score%" - if (( $(echo "$observe_score < 80" | bc -l) )); then - echo "Observe category score is below 80%. Failing CI." + category_score=$(jq ".categories[\"${{ matrix.category }}\"]" eval-summary.json) + echo "${{ matrix.category }} category score: $category_score%" + if (( $(echo "$category_score < 80" | bc -l) )); then + echo "${{ matrix.category }} category score is below 80%. Failing CI." exit 1 fi else - echo "Eval summary not found for observe category. Failing CI." + echo "Eval summary not found for ${{ matrix.category }} category. Failing CI." 
exit 1 fi - run-targeted-extract-evals: - needs: [run-observe-evals, determine-evals] + - uses: ./.github/actions/upload-ctrf-report + if: always() + with: + name: ctrf/evals/${{ matrix.category }}.json + + - uses: ./.github/actions/upload-v8-coverage + if: always() + with: + name: coverage/evals/${{ matrix.category }} + + merge-coverage: + name: Code Coverage Report runs-on: ubuntu-latest - timeout-minutes: 60 - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} - BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }} - BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }} - BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }} - HEADLESS: true - EVAL_ENV: browserbase + needs: + - core-unit-tests + - run-e2e-local-tests + - run-e2e-bb-tests + - run-evals + - server-integration-tests + # if: always() + if: false steps: - - name: Check out repository code - uses: actions/checkout@v4 - - - name: Check for 'targeted-extract' label - id: label-check - run: | - if [ "${{ needs.determine-evals.outputs.run-targeted-extract }}" != "true" ]; then - echo "has_label=false" >> $GITHUB_OUTPUT - echo "No label for TARGETED-EXTRACT. Exiting with success." - else - echo "has_label=true" >> $GITHUB_OUTPUT - fi + - uses: actions/checkout@v4 + with: + fetch-depth: 1 - - name: Set up Node.js - if: needs.determine-evals.outputs.run-targeted-extract == 'true' - uses: actions/setup-node@v4 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: "20" + use-prebuilt-artifacts: "true" + restore-turbo-cache: "false" - - name: Install dependencies - if: needs.determine-evals.outputs.run-targeted-extract == 'true' - run: | - rm -rf node_modules - rm -f package-lock.json - npm install + - name: Download V8 coverage artifacts + uses: actions/download-artifact@v4 + continue-on-error: true + with: + pattern: coverage-* + path: . 
+ merge-multiple: true - - name: Install Playwright browsers - if: needs.determine-evals.outputs.run-targeted-extract == 'true' - run: npm exec playwright install --with-deps + - name: Download CTRF artifacts + uses: actions/download-artifact@v4 + continue-on-error: true + with: + pattern: ctrf-* + path: . + merge-multiple: true - - name: Build Stagehand - if: needs.determine-evals.outputs.run-targeted-extract == 'true' - run: npm run build + - name: Generate merged coverage report + run: | + pnpm run coverage:merge - - name: Run targeted extract Evals - if: needs.determine-evals.outputs.run-targeted-extract == 'true' - run: npm run evals category targeted_extract -- --extract-method=textExtract + - name: Upload merged coverage report + if: always() + id: upload-coverage-artifact + uses: actions/upload-artifact@v4 + with: + name: coverage-merged + # package.json is included to anchor artifact paths at repo root. + path: | + package.json + coverage/merged + + - name: Add coverage summary to job summary + if: always() + shell: bash + run: | + echo "### Code Coverage" >> "$GITHUB_STEP_SUMMARY" + echo "" >> "$GITHUB_STEP_SUMMARY" + if [ -f coverage/merged/coverage-summary.txt ]; then + echo '```' >> "$GITHUB_STEP_SUMMARY" + cat coverage/merged/coverage-summary.txt >> "$GITHUB_STEP_SUMMARY" + echo '```' >> "$GITHUB_STEP_SUMMARY" + else + echo "Coverage summary not available." 
>> "$GITHUB_STEP_SUMMARY" + fi + if [ -n "${{ steps.upload-coverage-artifact.outputs.artifact-url }}" ]; then + echo "" >> "$GITHUB_STEP_SUMMARY" + echo "[Download full HTML coverage report](${{ steps.upload-coverage-artifact.outputs.artifact-url }})" >> "$GITHUB_STEP_SUMMARY" + fi - - name: Log targeted extract Evals Performance - if: needs.determine-evals.outputs.run-targeted-extract == 'true' + - name: Publish merged CTRF report + if: always() + uses: ctrf-io/github-test-reporter@v1 + with: + report-path: './ctrf/**/*.json' + summary: true + summary-report: false + summary-delta-report: true + test-report: false + failed-report: false + insights-report: true + flaky-rate-report: true + fail-rate-report: true + slowest-report: true + previous-results-report: true + fetch-previous-results: true + baseline: 1 + previous-results-max: 1 + max-workflow-runs-to-check: 5 + max-previous-runs-to-fetch: 1 + upload-artifact: true + artifact-name: ctrf-report-merged + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + + - name: Compute coverage status metrics + if: always() + id: coverage-status + shell: bash run: | - experimentName=$(jq -r '.experimentName' eval-summary.json) - echo "View results at https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentName}" - if [ -f eval-summary.json ]; then - targeted_extract_score=$(jq '.categories.targeted_extract' eval-summary.json) - echo "Targeted extract category score: $targeted_extract_score%" - if (( $(echo "$targeted_extract_score < 80" | bc -l) )); then - echo "Targeted extract score is below 80%. Failing CI." 
- exit 1 - fi + set -euo pipefail + shopt -s globstar nullglob + tests_failed=0 + ctrf_files=(ctrf/**/*.json) + if [ "${#ctrf_files[@]}" -gt 0 ]; then + tests_failed=$(jq -s '[.[].results.summary.failed // 0] | add' "${ctrf_files[@]}") + fi + total_coverage=0 + if [ -f coverage/merged/coverage-summary.txt ]; then + total_coverage=$(awk '/^Lines/ {gsub(/%/,"",$3); print $3}' coverage/merged/coverage-summary.txt) + fi + echo "tests_failed=${tests_failed}" >> "$GITHUB_OUTPUT" + echo "total_coverage=${total_coverage}" >> "$GITHUB_OUTPUT" + + - name: Set coverage status + if: always() + continue-on-error: true + shell: bash + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + RUN_ID: ${{ github.run_id }} + PULL_NUMBER: ${{ github.event.pull_request.number }} + TESTS_FAILED: ${{ steps.coverage-status.outputs.tests_failed }} + TOTAL_COVERAGE: ${{ steps.coverage-status.outputs.total_coverage }} + run: | + set -euo pipefail + repo="${GITHUB_REPOSITORY}" + sha="${GITHUB_SHA}" + tests_failed="${TESTS_FAILED:-0}" + total_coverage="${TOTAL_COVERAGE:-0}" + state="success" + if [ -n "${PULL_NUMBER:-}" ]; then + target_url="https://github.com/${repo}/pull/${PULL_NUMBER}/checks?check_run_id=${RUN_ID}" else - echo "Eval summary not found for targeted_extract category. Failing CI." - exit 1 + target_url="https://github.com/${repo}/actions/runs/${RUN_ID}" fi + description="non-blocking report: ${tests_failed} tests failed. 
${total_coverage}% coverage" + payload=$(jq -n \ + --arg state "$state" \ + --arg target_url "$target_url" \ + --arg description "$description" \ + --arg context "Measured coverage" \ + '{state: $state, target_url: $target_url, description: $description, context: $context}') + curl -sSfL -X POST \ + -H "Authorization: Bearer ${GITHUB_TOKEN}" \ + -H "Accept: application/vnd.github+json" \ + -H "X-GitHub-Api-Version: 2022-11-28" \ + "https://api.github.com/repos/${repo}/statuses/${sha}" \ + -d "$payload" diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml new file mode 100644 index 000000000..d51e7c4c5 --- /dev/null +++ b/.github/workflows/claude.yml @@ -0,0 +1,50 @@ +name: Claude Code + +on: + issue_comment: + types: [created] + pull_request_review_comment: + types: [created] + issues: + types: [opened, assigned] + pull_request_review: + types: [submitted] + +jobs: + claude: + if: | + (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) || + (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) || + (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) || + (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude'))) + runs-on: ubuntu-latest + permissions: + contents: write + pull-requests: write + issues: write + id-token: write + actions: read # Required for Claude to read CI results on PRs + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + + - name: Run Claude Code + id: claude + uses: anthropics/claude-code-action@v1 + with: + anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} + + # This is an optional setting that allows Claude to read CI results on PRs + additional_permissions: | + actions: read + + # Optional: Give a custom prompt to Claude. 
If this is not specified, Claude will perform the instructions specified in the comment that tagged it. + # prompt: 'Update the pull request description to include a summary of changes.' + + # Optional: Add claude_args to customize behavior and configuration + # See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md + # or https://code.claude.com/docs/en/cli-reference for available options + # claude_args: '--allowed-tools Bash(gh pr:*)' + diff --git a/.github/workflows/feature-parity.yml b/.github/workflows/feature-parity.yml new file mode 100644 index 000000000..9a40db33e --- /dev/null +++ b/.github/workflows/feature-parity.yml @@ -0,0 +1,147 @@ +name: Feature Parity + +on: + pull_request: + types: + - opened + - synchronize + - labeled + - unlabeled + paths-ignore: + - "packages/docs/**" + +jobs: + check-parity-label: + runs-on: ubuntu-latest + if: github.event.action == 'labeled' && github.event.label.name == 'parity' + permissions: + contents: read + pull-requests: write + issues: write + steps: + - name: Check out repository code + uses: actions/checkout@v4 + + - name: Check user permissions + uses: actions/github-script@v7 + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + script: | + const { data: permission } = await github.rest.repos.getCollaboratorPermissionLevel({ + owner: context.repo.owner, + repo: context.repo.repo, + username: context.actor + }); + + const hasWriteAccess = ['admin', 'write'].includes(permission.permission); + + if (!hasWriteAccess) { + // Remove the parity label if user doesn't have write access + await github.rest.issues.removeLabel({ + owner: context.repo.owner, + repo: context.repo.repo, + issue_number: context.issue.number, + name: 'parity' + }); + + // Add a comment explaining why the label was removed + await github.rest.issues.createComment({ + owner: context.repo.owner, + repo: context.repo.repo, + issue_number: context.issue.number, + body: `❌ **Parity Label Removed**\n\n@${context.actor}, you do 
not have sufficient permissions to add the 'parity' label. Only users with write access can trigger feature parity issues.\n\nIf you believe this feature should be implemented in the Python SDK, please ask a maintainer to add the label.` + }); + + throw new Error(`User ${context.actor} does not have write access to add parity label`); + } + + console.log(`User ${context.actor} has ${permission.permission} access - proceeding with parity workflow`); + + - name: Generate GitHub App token + id: generate-token + uses: actions/create-github-app-token@v1 + with: + app-id: ${{ secrets.PARITY_APP_ID }} + private-key: ${{ secrets.PARITY_APP_PRIVATE_KEY }} + owner: browserbase + repositories: stagehand + + - name: Create issue in Python SDK repository + uses: actions/github-script@v7 + with: + github-token: ${{ steps.generate-token.outputs.token }} + script: | + const { data: pullRequest } = await github.rest.pulls.get({ + owner: context.repo.owner, + repo: context.repo.repo, + pull_number: context.issue.number, + }); + + // Get PR comments for additional context + const { data: comments } = await github.rest.issues.listComments({ + owner: context.repo.owner, + repo: context.repo.repo, + issue_number: context.issue.number, + }); + + // Format comments for the issue description + let commentsSection = ''; + if (comments.length > 0) { + commentsSection = '\n\n## Recent Comments\n\n'; + comments.slice(-3).forEach(comment => { + commentsSection += `**@${comment.user.login}** commented:\n`; + commentsSection += `${comment.body.substring(0, 500)}${comment.body.length > 500 ? '...' 
: ''}\n\n`; + }); + } + + // Get list of changed files for context + const { data: files } = await github.rest.pulls.listFiles({ + owner: context.repo.owner, + repo: context.repo.repo, + pull_number: context.issue.number, + }); + + const changedFiles = files.map(file => `- \`${file.filename}\``).join('\n'); + + const issueTitle = `[Feature Parity] ${pullRequest.title}`; + const issueBody = `## Feature Parity Request + + This issue was automatically created from a pull request in the TypeScript Stagehand repository that was labeled with 'parity'. + + ### Original PR Details + - **PR**: #${context.issue.number} - ${pullRequest.title} + - **Author**: @${pullRequest.user.login} + - **Link**: ${pullRequest.html_url} + + ### Description + ${pullRequest.body || 'No description provided.'} + + ### Changed Files + ${changedFiles} + + ${commentsSection} + + ### Action Required + Please review the changes in the original PR and implement equivalent functionality in the Python SDK if applicable. + + --- + *This issue was automatically generated by the Feature Parity workflow.*`; + + // Create the issue in the Python repository + const { data: issue } = await github.rest.issues.create({ + owner: 'browserbase', + repo: 'stagehand-python', + title: issueTitle, + body: issueBody, + labels: ['parity'] + }); + + console.log(`Created issue: ${issue.html_url}`); + + // Add a comment to the original PR confirming the issue was created + await github.rest.issues.createComment({ + owner: context.repo.owner, + repo: context.repo.repo, + issue_number: context.issue.number, + body: `🔄 **Feature Parity Issue Created**\n\nAn issue has been automatically created in the Python SDK repository to track parity implementation:\n${issue.html_url}` + }); diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 00975f83a..f3347e72c 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -8,6 +8,7 @@ on: permissions: contents: write pull-requests: write 
+ id-token: write concurrency: ${{ github.workflow }}-${{ github.ref }} @@ -17,38 +18,38 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout Repo - uses: actions/checkout@v3 + uses: actions/checkout@v6 + with: + fetch-depth: 0 + + - uses: ./.github/actions/setup-node-pnpm-turbo + with: + use-prebuilt-artifacts: "false" - - name: Setup Node.js 20.x - uses: actions/setup-node@v3 + - name: Configure npm registry for Trusted Publishing + uses: actions/setup-node@v6 with: node-version: 20.x registry-url: "https://registry.npmjs.org" - - name: Install dependencies - run: | - rm -rf node_modules - rm -f package-lock.json - npm install + - name: Update npm for Trusted Publishing + run: npm install -g npm@latest - - name: Build - run: npm run build + - name: Run Lint & Build + run: pnpm exec turbo run lint && pnpm exec turbo run build - name: Create Release Pull Request or Publish to npm id: changesets uses: changesets/action@v1 with: - publish: npm run release + publish: pnpm run release env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }} - name: Publish Canary if: github.ref == 'refs/heads/main' run: | - npm config set //registry.npmjs.org/:_authToken=${NODE_AUTH_TOKEN} git checkout main - npm run release-canary + pnpm run release-canary env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }} diff --git a/.github/workflows/stagehand-server-v3-release.yml b/.github/workflows/stagehand-server-v3-release.yml new file mode 100644 index 000000000..6f13b774d --- /dev/null +++ b/.github/workflows/stagehand-server-v3-release.yml @@ -0,0 +1,186 @@ +name: Release stagehand/server-v3 + +on: + push: + branches: + - main + paths: + - .changeset/** + workflow_dispatch: + +permissions: + contents: write + +concurrency: ${{ github.workflow }}-${{ github.ref }} + +env: + OAS_PATH: packages/server-v3/openapi.v3.yaml + +jobs: + detect: + name: Detect server-v3 release (changesets) + runs-on: ubuntu-latest + 
outputs: + release: ${{ steps.meta.outputs.release }} + version: ${{ steps.meta.outputs.version }} + tag: ${{ steps.meta.outputs.tag }} + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + fetch-tags: true + + - uses: ./.github/actions/setup-node-pnpm-turbo + env: + PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" + with: + use-prebuilt-artifacts: "false" + + - name: Determine release metadata + id: meta + shell: bash + run: | + set -euo pipefail + + latest_tag="$(git tag -l 'stagehand-server-v3/v*' --sort=-v:refname | head -n 1 || true)" + rm -f changeset-status.json + if [ -n "${latest_tag}" ]; then + pnpm changeset status --since "${latest_tag}" --output changeset-status.json + else + pnpm changeset status --output changeset-status.json + fi + + node <<'NODE' + const fs = require('fs'); + + const status = JSON.parse(fs.readFileSync('changeset-status.json', 'utf8')); + const changesets = Array.isArray(status.changesets) ? status.changesets : []; + const releases = Array.isArray(status.releases) ? status.releases : []; + + const shouldRelease = changesets.some((cs) => + (cs.releases || []).some((r) => r?.name === '@browserbasehq/stagehand-server-v3') + ); + + const serverRelease = releases.find((r) => r?.name === '@browserbasehq/stagehand-server-v3'); + if (shouldRelease && !serverRelease?.newVersion) { + throw new Error( + 'Expected @browserbasehq/stagehand-server-v3 to have a computed newVersion in changeset-status.json.' + ); + } + + const release = shouldRelease ? 'true' : 'false'; + const version = shouldRelease ? 
serverRelease.newVersion : ''; + const tag = `stagehand-server-v3/v${version}`; + + const out = process.env.GITHUB_OUTPUT; + fs.appendFileSync(out, `release=${release}\n`); + fs.appendFileSync(out, `version=${version}\n`); + fs.appendFileSync(out, `tag=${tag}\n`); + NODE + + - name: Create stagehand/server-v3 tag + if: steps.meta.outputs.release == 'true' + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + shell: bash + run: | + set -euo pipefail + + TAG="${{ steps.meta.outputs.tag }}" + VERSION="${{ steps.meta.outputs.version }}" + TARGET_SHA="${{ github.sha }}" + + git config user.name "github-actions[bot]" + git config user.email "41898282+github-actions[bot]@users.noreply.github.com" + + # Try to fetch the tag if it exists on remote; ignore failure for new tags + git fetch --force origin "refs/tags/${TAG}:refs/tags/${TAG}" 2>/dev/null || true + if git rev-parse -q --verify "refs/tags/${TAG}" >/dev/null; then + echo "Tag already exists: ${TAG}" + exit 0 + fi + + git tag -a "${TAG}" "${TARGET_SHA}" -m "stagehand/server-v3 v${VERSION}" + git push origin "${TAG}" + + build_binaries: + name: Build SEA binaries + needs: detect + if: needs.detect.outputs.release == 'true' + uses: ./.github/workflows/stagehand-server-v3-sea-build.yml + with: + matrix: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v3-linux-x64","include_sourcemaps":false}, + {"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v3-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v3-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v3-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v3-win32-x64.exe","include_sourcemaps":false}, + 
{"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v3-win32-arm64.exe","include_sourcemaps":false} + ] + + release: + name: Publish GitHub Release + needs: [detect, build_binaries] + if: needs.detect.outputs.release == 'true' + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + fetch-tags: false + + - name: Prepare release assets directory + run: mkdir -p release-assets + + - name: Prepare stagehand/server-v3 release assets + run: | + set -euo pipefail + cp "${{ env.OAS_PATH }}" "release-assets/openapi.v3.stagehand-server-v3-${{ needs.detect.outputs.version }}.yaml" + + - name: Download SEA binary artifacts + uses: actions/download-artifact@v4 + with: + pattern: stagehand-server-v3-* + path: . + merge-multiple: true + + - name: Collect SEA binaries + shell: bash + run: | + set -euo pipefail + shopt -s nullglob + for f in packages/server-v3/dist/sea/stagehand-server-v3-*; do + cp "$f" release-assets/ + done + + - name: Create checksums + shell: bash + run: | + set -euo pipefail + cd release-assets + # Only checksum binaries (exclude openapi yaml). Avoid failing if no matches. 
+ shopt -s nullglob + files=(stagehand-server-v3-*) + bins=() + for f in "${files[@]}"; do + [[ "$f" == *openapi* ]] && continue + [[ -f "$f" ]] && bins+=("$f") + done + : > checksums.sha256 + if [ "${#bins[@]}" -gt 0 ]; then + shasum -a 256 "${bins[@]}" > checksums.sha256 + fi + + - name: Publish stagehand/server-v3 GitHub release + uses: softprops/action-gh-release@v2 + with: + tag_name: ${{ needs.detect.outputs.tag }} + name: stagehand/server-v3 v${{ needs.detect.outputs.version }} + generate_release_notes: true + files: | + release-assets/openapi.v3.stagehand-server-v3-${{ needs.detect.outputs.version }}.yaml + release-assets/stagehand-server-v3-* + release-assets/checksums.sha256 diff --git a/.github/workflows/stagehand-server-v3-sea-build.yml b/.github/workflows/stagehand-server-v3-sea-build.yml new file mode 100644 index 000000000..39ad9fe1a --- /dev/null +++ b/.github/workflows/stagehand-server-v3-sea-build.yml @@ -0,0 +1,180 @@ +name: Stagehand Server v3 SEA Build + +on: + workflow_call: + inputs: + matrix: + description: "JSON matrix include list for SEA binaries." 
+ required: false + type: string + default: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v3-linux-x64","include_sourcemaps":false}, + {"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v3-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v3-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v3-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v3-win32-x64.exe","include_sourcemaps":false}, + {"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v3-win32-arm64.exe","include_sourcemaps":false} + ] + use-prebuilt-artifacts: + description: "Whether to download pre-built package artifacts." + required: false + type: string + default: "false" + restore-turbo-cache: + description: "Whether to restore local .turbo cache." + required: false + type: string + default: "true" + node-version: + description: "Node.js version for setup." + required: false + type: string + default: "20.x" + upload-only-binary: + description: "Upload only this binary (empty => upload all)." + required: false + type: string + default: "" + workflow_dispatch: + inputs: + matrix: + description: "JSON matrix include list for SEA binaries." 
+ required: false + default: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v3-linux-x64","include_sourcemaps":false}, + {"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v3-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v3-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v3-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v3-win32-x64.exe","include_sourcemaps":false}, + {"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v3-win32-arm64.exe","include_sourcemaps":false} + ] + use-prebuilt-artifacts: + description: "Whether to download pre-built package artifacts." + required: false + type: string + default: "false" + restore-turbo-cache: + description: "Whether to restore local .turbo cache." + required: false + type: string + default: "true" + node-version: + description: "Node.js version for setup." + required: false + type: string + default: "20.x" + upload-only-binary: + description: "Upload only this binary (empty => upload all)." 
+ required: false + type: string + default: "" + +jobs: + build_binaries: + name: Build SEA binaries (${{ matrix.binary_name }}) + runs-on: ${{ matrix.os }} + strategy: + fail-fast: false + matrix: + include: ${{ fromJson(inputs.matrix) }} + + steps: + - name: Checkout repository + uses: actions/checkout@v6 + with: + fetch-depth: 1 + fetch-tags: false + + - uses: ./.github/actions/setup-node-pnpm-turbo + env: + PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" + PLAYWRIGHT_SKIP_DOWNLOAD: "1" + PUPPETEER_SKIP_DOWNLOAD: "1" + with: + use-prebuilt-artifacts: ${{ inputs.use-prebuilt-artifacts }} + restore-turbo-cache: ${{ inputs.restore-turbo-cache }} + node-version: ${{ inputs.node-version }} + + - name: Build SEA binary (ESM) + env: + SEA_TARGET_PLATFORM: ${{ matrix.platform }} + SEA_TARGET_ARCH: ${{ matrix.arch }} + SEA_BINARY_NAME: ${{ matrix.binary_name }} + SEA_INCLUDE_SOURCEMAPS: ${{ matrix.include_sourcemaps && '1' || '0' }} + run: pnpm exec turbo run build:sea:esm --filter=@browserbasehq/stagehand-server-v3 + + - name: Verify SEA binary exists + shell: bash + run: | + test -f "packages/server-v3/dist/sea/${{ matrix.binary_name }}" + + - name: Verify SEA binary launches cleanly + shell: bash + env: + RUNNER_ARCH: ${{ runner.arch }} + run: | + set -euo pipefail + + binary="packages/server-v3/dist/sea/${{ matrix.binary_name }}" + matrix_arch="${{ matrix.arch }}" + runner_arch="$(echo "${RUNNER_ARCH}" | tr '[:upper:]' '[:lower:]')" + + if [[ "${matrix_arch}" != "${runner_arch}" ]]; then + echo "Runner arch (${runner_arch}) does not match matrix arch (${matrix_arch})." + echo "Launch verification must run on same-arch runners." 
+ exit 1 + fi + + if [[ "${{ matrix.platform }}" != "win32" ]]; then + chmod +x "${binary}" + fi + + port="$((30000 + RANDOM % 10000))" + log_file="$(mktemp)" + launched="false" + + cleanup() { + if [[ -n "${pid:-}" ]] && kill -0 "${pid}" 2>/dev/null; then + kill "${pid}" 2>/dev/null || true + wait "${pid}" 2>/dev/null || true + fi + } + trap cleanup EXIT + + PORT="${port}" "${binary}" >"${log_file}" 2>&1 & + pid=$! + + for _ in {1..30}; do + if ! kill -0 "${pid}" 2>/dev/null; then + wait "${pid}" 2>/dev/null || true + echo "SEA binary exited before becoming healthy." + cat "${log_file}" + exit 1 + fi + + if curl --silent --show-error --fail "http://127.0.0.1:${port}/healthz" >/dev/null; then + launched="true" + break + fi + + sleep 1 + done + + if [[ "${launched}" != "true" ]]; then + echo "SEA binary did not become healthy within 30 seconds." + cat "${log_file}" + exit 1 + fi + + - name: Upload artifact + uses: actions/upload-artifact@v4 + if: ${{ inputs.upload-only-binary == '' || matrix.binary_name == inputs.upload-only-binary }} + with: + name: ${{ matrix.binary_name }} + # package.json is included to anchor artifact paths at repo root. 
+ path: | + package.json + packages/server-v3/dist/sea/${{ matrix.binary_name }} + retention-days: 7 diff --git a/.github/workflows/stagehand-server-v4-release.yml b/.github/workflows/stagehand-server-v4-release.yml new file mode 100644 index 000000000..d2a978662 --- /dev/null +++ b/.github/workflows/stagehand-server-v4-release.yml @@ -0,0 +1,186 @@ +name: Release stagehand/server-v4 + +on: + push: + branches: + - main + paths: + - .changeset/** + workflow_dispatch: + +permissions: + contents: write + +concurrency: ${{ github.workflow }}-${{ github.ref }} + +env: + OAS_PATH: packages/server-v4/openapi.v4.yaml + +jobs: + detect: + name: Detect server-v4 release (changesets) + runs-on: ubuntu-latest + outputs: + release: ${{ steps.meta.outputs.release }} + version: ${{ steps.meta.outputs.version }} + tag: ${{ steps.meta.outputs.tag }} + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + fetch-tags: true + + - uses: ./.github/actions/setup-node-pnpm-turbo + env: + PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" + with: + use-prebuilt-artifacts: "false" + + - name: Determine release metadata + id: meta + shell: bash + run: | + set -euo pipefail + + latest_tag="$(git tag -l 'stagehand-server-v4/v*' --sort=-v:refname | head -n 1 || true)" + rm -f changeset-status.json + if [ -n "${latest_tag}" ]; then + pnpm changeset status --since "${latest_tag}" --output changeset-status.json + else + pnpm changeset status --output changeset-status.json + fi + + node <<'NODE' + const fs = require('fs'); + + const status = JSON.parse(fs.readFileSync('changeset-status.json', 'utf8')); + const changesets = Array.isArray(status.changesets) ? status.changesets : []; + const releases = Array.isArray(status.releases) ? 
status.releases : []; + + const shouldRelease = changesets.some((cs) => + (cs.releases || []).some((r) => r?.name === '@browserbasehq/stagehand-server-v4') + ); + + const serverRelease = releases.find((r) => r?.name === '@browserbasehq/stagehand-server-v4'); + if (shouldRelease && !serverRelease?.newVersion) { + throw new Error( + 'Expected @browserbasehq/stagehand-server-v4 to have a computed newVersion in changeset-status.json.' + ); + } + + const release = shouldRelease ? 'true' : 'false'; + const version = shouldRelease ? serverRelease.newVersion : ''; + const tag = `stagehand-server-v4/v${version}`; + + const out = process.env.GITHUB_OUTPUT; + fs.appendFileSync(out, `release=${release}\n`); + fs.appendFileSync(out, `version=${version}\n`); + fs.appendFileSync(out, `tag=${tag}\n`); + NODE + + - name: Create stagehand/server-v4 tag + if: steps.meta.outputs.release == 'true' + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + shell: bash + run: | + set -euo pipefail + + TAG="${{ steps.meta.outputs.tag }}" + VERSION="${{ steps.meta.outputs.version }}" + TARGET_SHA="${{ github.sha }}" + + git config user.name "github-actions[bot]" + git config user.email "41898282+github-actions[bot]@users.noreply.github.com" + + # Try to fetch the tag if it exists on remote; ignore failure for new tags + git fetch --force origin "refs/tags/${TAG}:refs/tags/${TAG}" 2>/dev/null || true + if git rev-parse -q --verify "refs/tags/${TAG}" >/dev/null; then + echo "Tag already exists: ${TAG}" + exit 0 + fi + + git tag -a "${TAG}" "${TARGET_SHA}" -m "stagehand/server-v4 v${VERSION}" + git push origin "${TAG}" + + build_binaries: + name: Build SEA binaries + needs: detect + if: needs.detect.outputs.release == 'true' + uses: ./.github/workflows/stagehand-server-v4-sea-build.yml + with: + matrix: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v4-linux-x64","include_sourcemaps":false}, + 
{"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v4-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v4-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v4-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v4-win32-x64.exe","include_sourcemaps":false}, + {"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v4-win32-arm64.exe","include_sourcemaps":false} + ] + + release: + name: Publish GitHub Release + needs: [detect, build_binaries] + if: needs.detect.outputs.release == 'true' + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 1 + fetch-tags: false + + - name: Prepare release assets directory + run: mkdir -p release-assets + + - name: Prepare stagehand/server-v4 release assets + run: | + set -euo pipefail + cp "${{ env.OAS_PATH }}" "release-assets/openapi.v4.stagehand-server-v4-${{ needs.detect.outputs.version }}.yaml" + + - name: Download SEA binary artifacts + uses: actions/download-artifact@v4 + with: + pattern: stagehand-server-v4-* + path: . + merge-multiple: true + + - name: Collect SEA binaries + shell: bash + run: | + set -euo pipefail + shopt -s nullglob + for f in packages/server-v4/dist/sea/stagehand-server-v4-*; do + cp "$f" release-assets/ + done + + - name: Create checksums + shell: bash + run: | + set -euo pipefail + cd release-assets + # Only checksum binaries (exclude openapi yaml). Avoid failing if no matches. 
+ shopt -s nullglob + files=(stagehand-server-v4-*) + bins=() + for f in "${files[@]}"; do + [[ "$f" == *openapi* ]] && continue + [[ -f "$f" ]] && bins+=("$f") + done + : > checksums.sha256 + if [ "${#bins[@]}" -gt 0 ]; then + shasum -a 256 "${bins[@]}" > checksums.sha256 + fi + + - name: Publish stagehand/server-v4 GitHub release + uses: softprops/action-gh-release@v2 + with: + tag_name: ${{ needs.detect.outputs.tag }} + name: stagehand/server-v4 v${{ needs.detect.outputs.version }} + generate_release_notes: true + files: | + release-assets/openapi.v4.stagehand-server-v4-${{ needs.detect.outputs.version }}.yaml + release-assets/stagehand-server-v4-* + release-assets/checksums.sha256 diff --git a/.github/workflows/stagehand-server-v4-sea-build.yml b/.github/workflows/stagehand-server-v4-sea-build.yml new file mode 100644 index 000000000..3ad278dc5 --- /dev/null +++ b/.github/workflows/stagehand-server-v4-sea-build.yml @@ -0,0 +1,180 @@ +name: Stagehand Server v4 SEA Build + +on: + workflow_call: + inputs: + matrix: + description: "JSON matrix include list for SEA binaries." 
+ required: false + type: string + default: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v4-linux-x64","include_sourcemaps":false}, + {"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v4-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v4-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v4-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v4-win32-x64.exe","include_sourcemaps":false}, + {"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v4-win32-arm64.exe","include_sourcemaps":false} + ] + use-prebuilt-artifacts: + description: "Whether to download pre-built package artifacts." + required: false + type: string + default: "false" + restore-turbo-cache: + description: "Whether to restore local .turbo cache." + required: false + type: string + default: "true" + node-version: + description: "Node.js version for setup." + required: false + type: string + default: "20.x" + upload-only-binary: + description: "Upload only this binary (empty => upload all)." + required: false + type: string + default: "" + workflow_dispatch: + inputs: + matrix: + description: "JSON matrix include list for SEA binaries." 
+ required: false + default: | + [ + {"os":"ubuntu-latest","platform":"linux","arch":"x64","binary_name":"stagehand-server-v4-linux-x64","include_sourcemaps":false}, + {"os":"ubuntu-24.04-arm","platform":"linux","arch":"arm64","binary_name":"stagehand-server-v4-linux-arm64","include_sourcemaps":false}, + {"os":"macos-15","platform":"darwin","arch":"arm64","binary_name":"stagehand-server-v4-darwin-arm64","include_sourcemaps":false}, + {"os":"macos-15-intel","platform":"darwin","arch":"x64","binary_name":"stagehand-server-v4-darwin-x64","include_sourcemaps":false}, + {"os":"windows-latest","platform":"win32","arch":"x64","binary_name":"stagehand-server-v4-win32-x64.exe","include_sourcemaps":false}, + {"os":"windows-11-arm","platform":"win32","arch":"arm64","binary_name":"stagehand-server-v4-win32-arm64.exe","include_sourcemaps":false} + ] + use-prebuilt-artifacts: + description: "Whether to download pre-built package artifacts." + required: false + type: string + default: "false" + restore-turbo-cache: + description: "Whether to restore local .turbo cache." + required: false + type: string + default: "true" + node-version: + description: "Node.js version for setup." + required: false + type: string + default: "20.x" + upload-only-binary: + description: "Upload only this binary (empty => upload all)." 
+ required: false + type: string + default: "" + +jobs: + build_binaries: + name: Build SEA binaries (${{ matrix.binary_name }}) + runs-on: ${{ matrix.os }} + strategy: + fail-fast: false + matrix: + include: ${{ fromJson(inputs.matrix) }} + + steps: + - name: Checkout repository + uses: actions/checkout@v6 + with: + fetch-depth: 1 + fetch-tags: false + + - uses: ./.github/actions/setup-node-pnpm-turbo + env: + PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" + PLAYWRIGHT_SKIP_DOWNLOAD: "1" + PUPPETEER_SKIP_DOWNLOAD: "1" + with: + use-prebuilt-artifacts: ${{ inputs.use-prebuilt-artifacts }} + restore-turbo-cache: ${{ inputs.restore-turbo-cache }} + node-version: ${{ inputs.node-version }} + + - name: Build SEA binary (ESM) + env: + SEA_TARGET_PLATFORM: ${{ matrix.platform }} + SEA_TARGET_ARCH: ${{ matrix.arch }} + SEA_BINARY_NAME: ${{ matrix.binary_name }} + SEA_INCLUDE_SOURCEMAPS: ${{ matrix.include_sourcemaps && '1' || '0' }} + run: pnpm exec turbo run build:sea:esm --filter=@browserbasehq/stagehand-server-v4 + + - name: Verify SEA binary exists + shell: bash + run: | + test -f "packages/server-v4/dist/sea/${{ matrix.binary_name }}" + + - name: Verify SEA binary launches cleanly + shell: bash + env: + RUNNER_ARCH: ${{ runner.arch }} + run: | + set -euo pipefail + + binary="packages/server-v4/dist/sea/${{ matrix.binary_name }}" + matrix_arch="${{ matrix.arch }}" + runner_arch="$(echo "${RUNNER_ARCH}" | tr '[:upper:]' '[:lower:]')" + + if [[ "${matrix_arch}" != "${runner_arch}" ]]; then + echo "Runner arch (${runner_arch}) does not match matrix arch (${matrix_arch})." + echo "Launch verification must run on same-arch runners." 
+ exit 1 + fi + + if [[ "${{ matrix.platform }}" != "win32" ]]; then + chmod +x "${binary}" + fi + + port="$((30000 + RANDOM % 10000))" + log_file="$(mktemp)" + launched="false" + + cleanup() { + if [[ -n "${pid:-}" ]] && kill -0 "${pid}" 2>/dev/null; then + kill "${pid}" 2>/dev/null || true + wait "${pid}" 2>/dev/null || true + fi + } + trap cleanup EXIT + + PORT="${port}" "${binary}" >"${log_file}" 2>&1 & + pid=$! + + for _ in {1..30}; do + if ! kill -0 "${pid}" 2>/dev/null; then + wait "${pid}" 2>/dev/null || true + echo "SEA binary exited before becoming healthy." + cat "${log_file}" + exit 1 + fi + + if curl --silent --show-error --fail "http://127.0.0.1:${port}/healthz" >/dev/null; then + launched="true" + break + fi + + sleep 1 + done + + if [[ "${launched}" != "true" ]]; then + echo "SEA binary did not become healthy within 30 seconds." + cat "${log_file}" + exit 1 + fi + + - name: Upload artifact + uses: actions/upload-artifact@v4 + if: ${{ inputs.upload-only-binary == '' || matrix.binary_name == inputs.upload-only-binary }} + with: + name: ${{ matrix.binary_name }} + # package.json is included to anchor artifact paths at repo root. 
+ path: | + package.json + packages/server-v4/dist/sea/${{ matrix.binary_name }} + retention-days: 7 diff --git a/.github/workflows/stainless.yml b/.github/workflows/stainless.yml new file mode 100644 index 000000000..0f8d69be8 --- /dev/null +++ b/.github/workflows/stainless.yml @@ -0,0 +1,60 @@ +name: Build SDKs for pull request + +on: + pull_request: + types: + - opened + - synchronize + - reopened + - closed + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number }} + cancel-in-progress: true + +env: + STAINLESS_ORG: ${{ vars.STAINLESS_ORG }} + STAINLESS_PROJECT: ${{ vars.STAINLESS_PROJECT }} + OAS_PATH: packages/server-v3/openapi.v3.yaml + +jobs: + preview: + if: github.event.action != 'closed' + runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: write + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 2 + + - name: Run preview builds + uses: stainless-api/upload-openapi-spec-action/preview@v1 + with: + stainless_api_key: ${{ secrets.STAINLESS_API_KEY }} + org: ${{ env.STAINLESS_ORG }} + project: ${{ env.STAINLESS_PROJECT }} + oas_path: ${{ env.OAS_PATH }} + config_path: stainless.yml + + merge: + if: github.event.action == 'closed' && github.event.pull_request.merged == true && github.event.pull_request.base.ref == 'main' + runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: write + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 2 + - name: Run merge build + uses: stainless-api/upload-openapi-spec-action/merge@v1 + with: + stainless_api_key: ${{ secrets.STAINLESS_API_KEY }} + org: ${{ env.STAINLESS_ORG }} + project: ${{ env.STAINLESS_PROJECT }} + oas_path: ${{ env.OAS_PATH }} + config_path: stainless.yml diff --git a/.gitignore b/.gitignore index e5ea06bbe..2f22d2ad3 100644 --- a/.gitignore +++ b/.gitignore @@ -9,12 +9,23 @@ screenshot.png .env downloads/ dist/ -evals/**/public -lib/dom/build/ -evals/public 
+.browserbase/ +packages/evals/**/public +packages/core/lib/dom/build/ +packages/core/lib/v3/dom/build/ +packages/evals/public *.tgz evals/playground.ts tmp/ eval-summary.json -pnpm-lock.yaml +package-lock.json evals/deterministic/tests/BrowserContext/tmp-test.har +packages/core/lib/version.ts +packages/core/test-results/ +/examples/inference_summary +/inference_summary +.turbo +.idea +coverage/ +ctrf/ +.stagehand-sea/ diff --git a/.prettierignore b/.prettierignore index 9581fb07d..c4cf4cc5d 100644 --- a/.prettierignore +++ b/.prettierignore @@ -1,3 +1,21 @@ pnpm-lock.yaml README.md -**/*.json \ No newline at end of file +**/*.json +docs/ +.github/ +dist/ +node_modules/ +lib/dom/build/ +lib/v3/dom/build/ +packages/core/dist/ +packages/core/lib/dom/build/ +packages/core/lib/v3/dom/build/ +packages/evals/dist/ +packages/docs/ +*.min.js +.browserbase/ +.browserbase/** +**/.browserbase/ +**/.browserbase/** +stainless.yml +openapi.*.yaml diff --git a/CHANGELOG.md b/CHANGELOG.md index 2b34a6abc..b37ce4d87 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,222 @@ # @browserbasehq/stagehand +## 3.0.0 + +### Major Changes + +- Removes internal Playwright dependency +- A generous 20-40% speed increase across `act`, `extract`, & `observe` calls +- Compatibility with Playwright, Puppeteer, and Patchright +- Automatic action caching (agent, stagehand.act). 
Go from CUA → deterministic scripts w/o inference +- A suite of non AI primitives: + - `page` + - `locator` (built in closed mode shadow root traversal, with xpaths & css selectors) + - `frameLocator` + - `deepLocator` (crosses iframes & shadow roots) +- bun compatibility +- Simplified extract schemas +- CSS selector support (id-based support coming soon) +- Targeted extract and observe across iframes & shadow roots +- More intuitive type names (observeResult is now action, act accepts an instruction string instead of an action string, solidified ModelConfiguration) + +Check the [migration guide](https://docs.stagehand.dev/v3/migrations/v2) for more information + +## 2.5.0 + +### Minor Changes + +- [#981](https://github.com/browserbase/stagehand/pull/981) [`8244ab2`](https://github.com/browserbase/stagehand/commit/8244ab247cd679962685ae2f7c54e874ce1fa614) Thanks [@sameelarif](https://github.com/sameelarif)! - Added support for `stagehand.agent` to interact with MCP servers as well as custom tools to be passed in. For more information, reference the [MCP integrations documentation](https://docs.stagehand.dev/best-practices/mcp-integrations) + +### Patch Changes + +- [#959](https://github.com/browserbase/stagehand/pull/959) [`09b5e1e`](https://github.com/browserbase/stagehand/commit/09b5e1e9c23c845903686db6665cc968ac34efbb) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - add webvoyager evals + +- [#1049](https://github.com/browserbase/stagehand/pull/1049) [`e3734b9`](https://github.com/browserbase/stagehand/commit/e3734b9c98352d5f0a4eca49791b0bbf2130ab41) Thanks [@miguelg719](https://github.com/miguelg719)! - Support local MCP server connections + +- [#1025](https://github.com/browserbase/stagehand/pull/1025) [`be85b19`](https://github.com/browserbase/stagehand/commit/be85b19679a826f19702e00f0aae72fce1118ec8) Thanks [@tkattkat](https://github.com/tkattkat)! 
- add support for custom baseUrl within openai provider + +- [#1040](https://github.com/browserbase/stagehand/pull/1040) [`88d1565`](https://github.com/browserbase/stagehand/commit/88d1565c65bb65a104fea2d5f5e862bbbda69677) Thanks [@miguelg719](https://github.com/miguelg719)! - Allow OpenAI CUA to take in an optional baseURL + +- [#1046](https://github.com/browserbase/stagehand/pull/1046) [`ab5d6ed`](https://github.com/browserbase/stagehand/commit/ab5d6ede19aabc059badc4247f1cb2c6c9e71bae) Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for gpt-5 in operator agent + +## 2.4.4 + +### Patch Changes + +- [#1012](https://github.com/browserbase/stagehand/pull/1012) [`9e8c173`](https://github.com/browserbase/stagehand/commit/9e8c17374fdc8fbe7f26e6cf802c36bd14f11039) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix disabling api validation whenever a customLLM client is provided + +## 2.4.3 + +### Patch Changes + +- [#951](https://github.com/browserbase/stagehand/pull/951) [`f45afdc`](https://github.com/browserbase/stagehand/commit/f45afdccc8680650755fee66ffbeac32b41e075d) Thanks [@miguelg719](https://github.com/miguelg719)! - Patch GPT-5 new api format + +- [#954](https://github.com/browserbase/stagehand/pull/954) [`261bba4`](https://github.com/browserbase/stagehand/commit/261bba43fa79ac3af95328e673ef3e9fced3279b) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add support for shadow DOMs (open & closed mode) when experimental: true + +- [#944](https://github.com/browserbase/stagehand/pull/944) [`8de7bd8`](https://github.com/browserbase/stagehand/commit/8de7bd8635c2051cd8025e365c6c8aa83d81c7e7) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - Bump zod version compatibility and add pathing spec + +- [#919](https://github.com/browserbase/stagehand/pull/919) [`3d80421`](https://github.com/browserbase/stagehand/commit/3d804210a106a6828c7fa50f8b765b10afd4cc6a) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! 
- enable scrolling inside of iframes + +- [#963](https://github.com/browserbase/stagehand/pull/963) [`0ead63d`](https://github.com/browserbase/stagehand/commit/0ead63d6526f6c286362b74b6407c8bebc900e69) Thanks [@tkattkat](https://github.com/tkattkat)! - Properly handle images in evaluator + clean up response parsing logic + +- [#961](https://github.com/browserbase/stagehand/pull/961) [`8422828`](https://github.com/browserbase/stagehand/commit/8422828c4cd5fd5ebcf348cfbdb40c768bb76dd9) Thanks [@tkattkat](https://github.com/tkattkat)! - Add more evals for stagehand agent + +- [#946](https://github.com/browserbase/stagehand/pull/946) [`b769206`](https://github.com/browserbase/stagehand/commit/b7692060f98a2f49aeeefb90d8789ed034b08ec2) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: unable to act on/get content from some same process iframes + +- [#962](https://github.com/browserbase/stagehand/pull/962) [`72d2683`](https://github.com/browserbase/stagehand/commit/72d2683202af7e578d98367893964b33e0828de5) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - handle namespaced elements in xpath build step + +## 2.4.2 + +### Patch Changes + +- [#865](https://github.com/browserbase/stagehand/pull/865) [`6b4e6e3`](https://github.com/browserbase/stagehand/commit/6b4e6e3f31d5496cf15728e9018eddeb04839542) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - improve type safety for trimTrailingTextNode + +- [#897](https://github.com/browserbase/stagehand/pull/897) [`e77d018`](https://github.com/browserbase/stagehand/commit/e77d0188683ebf596dfb78dfafbbca1dc32993f0) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix selfHeal to remember initially received arguments + +- [#920](https://github.com/browserbase/stagehand/pull/920) [`c20adb9`](https://github.com/browserbase/stagehand/commit/c20adb95539fed8c56a4aa413262a9c65a8e6474) Thanks [@seanmcguire12](https://github.com/seanmcguire12)!
- fix: tab handling on API + +- [#882](https://github.com/browserbase/stagehand/pull/882) [`b86df93`](https://github.com/browserbase/stagehand/commit/b86df93b9136aae96292121a29c25f3d74d84bf7) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - remove elements that don't have xpaths from observe response + +- [#905](https://github.com/browserbase/stagehand/pull/905) [`023c2c2`](https://github.com/browserbase/stagehand/commit/023c2c273b46d3792d7e5d3c902089487b16b531) Thanks [@tkattkat](https://github.com/tkattkat)! - Delete old images from anthropic cua client + +- [#925](https://github.com/browserbase/stagehand/pull/925) [`8c28647`](https://github.com/browserbase/stagehand/commit/8c2864755ecd05c8f7de235d4198deec0dd5f78e) Thanks [@miguelg719](https://github.com/miguelg719)! - Remove \_refreshPageFromApi() + +- [#887](https://github.com/browserbase/stagehand/pull/887) [`87e09c6`](https://github.com/browserbase/stagehand/commit/87e09c618940f364ec8af00455a19a17ec63cbd3) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: allow xpaths with prepended 'xpath=' for targeted extract + +- [#864](https://github.com/browserbase/stagehand/pull/864) [`a611115`](https://github.com/browserbase/stagehand/commit/a61111525d70b450bdfc43f112380f44899c9e97) Thanks [@miguelg719](https://github.com/miguelg719)! - Temporarily patch custom clients serialization error on api + +- [#881](https://github.com/browserbase/stagehand/pull/881) [`69913fe`](https://github.com/browserbase/stagehand/commit/69913fe1dfb8201ae2aeffa5f049fb46ab02cbc2) Thanks [@miguelg719](https://github.com/miguelg719)! - Pass sdk version number to API for debugging + +- [#913](https://github.com/browserbase/stagehand/pull/913) [`b1b83a1`](https://github.com/browserbase/stagehand/commit/b1b83a1d334fe76e5f5f9dd32dc92c16b7d40ce6) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! 
- move iframe out of 'experimental' + +- [#891](https://github.com/browserbase/stagehand/pull/891) [`be8497c`](https://github.com/browserbase/stagehand/commit/be8497cb6b142cc893cea9692b8c47bd19514c60) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: nested iframe xpath bug + +- [#883](https://github.com/browserbase/stagehand/pull/883) [`98704c9`](https://github.com/browserbase/stagehand/commit/98704c9ed225ca25bbde4bb3dc286936e9c54471) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add timeout for JS click + +- [#907](https://github.com/browserbase/stagehand/pull/907) [`04978bd`](https://github.com/browserbase/stagehand/commit/04978bdd30d2edcbc69eb9fd91358a16975ea2eb) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - store mapping of CDP frame ID -> page + +## 2.4.1 + +### Patch Changes + +- [#856](https://github.com/browserbase/stagehand/pull/856) [`8a43c5a`](https://github.com/browserbase/stagehand/commit/8a43c5a86d4da40cfaedd9cf2e42186928bdf946) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - set download behaviour by default + +- [#857](https://github.com/browserbase/stagehand/pull/857) [`890ffcc`](https://github.com/browserbase/stagehand/commit/890ffccac5e0a60ade64a46eb550c981ffb3e84a) Thanks [@miguelg719](https://github.com/miguelg719)! - return "not-supported" for elements inside the shadow-dom + +- [#844](https://github.com/browserbase/stagehand/pull/844) [`64c1072`](https://github.com/browserbase/stagehand/commit/64c10727bda50470483a3eb175c02842db0923a1) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - don't automatically close tabs + +- [#860](https://github.com/browserbase/stagehand/pull/860) [`b077d3f`](https://github.com/browserbase/stagehand/commit/b077d3f48a97f47a71ccc79ae39b41e7f07f9c04) Thanks [@miguelg719](https://github.com/miguelg719)! 
- Set default schema on extract options with no schema + +- [#842](https://github.com/browserbase/stagehand/pull/842) [`8bcb5d7`](https://github.com/browserbase/stagehand/commit/8bcb5d77debf6bf7601fd5c090efd7fde75c5d5e) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - improved handling for OS level dropdowns + +- [#846](https://github.com/browserbase/stagehand/pull/846) [`7bf10c5`](https://github.com/browserbase/stagehand/commit/7bf10c55b267078fe847c1d7f7a60d604f9c7c94) Thanks [@miguelg719](https://github.com/miguelg719)! - Filter attaching to target worker / shared_worker + +## 2.4.0 + +### Minor Changes + +- [#819](https://github.com/browserbase/stagehand/pull/819) [`6a18c1e`](https://github.com/browserbase/stagehand/commit/6a18c1ee1e46d55c6e90c4d5572e17ed8daa140c) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - try playwright click and fall back to JS click event + +### Patch Changes + +- [#826](https://github.com/browserbase/stagehand/pull/826) [`124e0d3`](https://github.com/browserbase/stagehand/commit/124e0d3bb54ddb6738ede6d7aa99a945ef1cacd1) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix issue where we are unable to take actions on text nodes + +- [#818](https://github.com/browserbase/stagehand/pull/818) [`1660751`](https://github.com/browserbase/stagehand/commit/1660751cd14cb5b27d44f8167216afb8d1c3c45c) Thanks [@miguelg719](https://github.com/miguelg719)! - Added CUA support for Claude 4 models + +- [#821](https://github.com/browserbase/stagehand/pull/821) [`cadac9d`](https://github.com/browserbase/stagehand/commit/cadac9da09123d12e5d496a0e8b12660964c1b33) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - use playwright instead of playwright test + +- [#832](https://github.com/browserbase/stagehand/pull/832) [`759da55`](https://github.com/browserbase/stagehand/commit/759da55775eb2df81d56ae18c0f386fd9b02a9f0) Thanks [@miguelg719](https://github.com/miguelg719)! 
- Fix \_refreshPageFromAPI to use parametrized apiKey + +- [#810](https://github.com/browserbase/stagehand/pull/810) [`a175a51`](https://github.com/browserbase/stagehand/commit/a175a519b8c14300db6f1ed30709e113d18e99db) Thanks [@miguelg719](https://github.com/miguelg719)! - Update logos + +- [#822](https://github.com/browserbase/stagehand/pull/822) [`8527a80`](https://github.com/browserbase/stagehand/commit/8527a80522c3eedb9516a6caa1a0e4e4be981a3d) Thanks [@miguelg719](https://github.com/miguelg719)! - Add model with date tag for OpenAI CUA + +- [#833](https://github.com/browserbase/stagehand/pull/833) [`55fca2f`](https://github.com/browserbase/stagehand/commit/55fca2f7da63cc0ef6e27b45a33f63c666cdce7e) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - adjust stagehandLogger.warn() level to be 1 instead of 0 + +## 2.3.1 + +### Patch Changes + +- [#796](https://github.com/browserbase/stagehand/pull/796) [`12a99b3`](https://github.com/browserbase/stagehand/commit/12a99b398d8a4c3eea3ca69a3cf793faaaf4aea3) Thanks [@miguelg719](https://github.com/miguelg719)! - Added an experimental flag to enable the newest and most experimental features + +- [#807](https://github.com/browserbase/stagehand/pull/807) [`2451797`](https://github.com/browserbase/stagehand/commit/2451797f64c0efa4a72fd70265110003c8d0a6cd) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - include version number in StagehandDefaultError message + +- [#803](https://github.com/browserbase/stagehand/pull/803) [`1d631a5`](https://github.com/browserbase/stagehand/commit/1d631a57a197390f672b718ae5199991ab27cfb1) Thanks [@miguelg719](https://github.com/miguelg719)! - Enable session affinity for cache optimization + +- [#804](https://github.com/browserbase/stagehand/pull/804) [`9c398bb`](https://github.com/browserbase/stagehand/commit/9c398bb9ec2d10bdb53ad5aa7e3b58cce24fdb2b) Thanks [@seanmcguire12](https://github.com/seanmcguire12)!
- update operatorResponseSchema based on new openai spec + +- [#786](https://github.com/browserbase/stagehand/pull/786) [`c19ad7f`](https://github.com/browserbase/stagehand/commit/c19ad7f1e082e91fdeaa9c2ef63767a5a2b3a195) Thanks [@miguelg719](https://github.com/miguelg719)! - Handle reroute to account for rollout + +## 2.3.0 + +### Minor Changes + +- [#737](https://github.com/browserbase/stagehand/pull/737) [`6ef6073`](https://github.com/browserbase/stagehand/commit/6ef60730cab0ad9025f44b6eeb2c83751d1dcd35) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - deprecate useTextExtract and remove functionality + +### Patch Changes + +- [#741](https://github.com/browserbase/stagehand/pull/741) [`5680d25`](https://github.com/browserbase/stagehand/commit/5680d2509352c383ad502c9f4fabde01fa638833) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - use safeparse for zod validation + +- [#783](https://github.com/browserbase/stagehand/pull/783) [`4de92a8`](https://github.com/browserbase/stagehand/commit/4de92a8af461fc95063faf39feee1d49259f58ba) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix the readme logo link + +## 2.2.1 + +### Patch Changes + +- [#721](https://github.com/browserbase/stagehand/pull/721) [`be8652e`](https://github.com/browserbase/stagehand/commit/be8652e770b57fdb3299fa0b2efa4eb0e816434e) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix stagehand.close() functionality to include calling browser.close() + +- [#724](https://github.com/browserbase/stagehand/pull/724) [`6b413b7`](https://github.com/browserbase/stagehand/commit/6b413b7ad00b13ca0bd53ee2e7393023821408b6) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - rm refine step in extract + +- [#712](https://github.com/browserbase/stagehand/pull/712) [`7eafbd9`](https://github.com/browserbase/stagehand/commit/7eafbd9b1a73b37effa444929767df7c592caf02) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! 
- deprecated `onlyVisible` param and remove its functionality + +- [#725](https://github.com/browserbase/stagehand/pull/725) [`1b50aa6`](https://github.com/browserbase/stagehand/commit/1b50aa61cf0a429dd6cb2760a08f7f698a50454b) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - dont overwrite .describe() when user defines a zod schema with z.string().url().describe() + +- [#717](https://github.com/browserbase/stagehand/pull/717) [`f2b7f1f`](https://github.com/browserbase/stagehand/commit/f2b7f1f284eef1f96753319b66c7d0b273a6f8cd) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - don't publish uncompiled ts to npm + +- [#719](https://github.com/browserbase/stagehand/pull/719) [`c8d672f`](https://github.com/browserbase/stagehand/commit/c8d672f7c410c256defbc2e87ead99239837aa28) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix `Invalid schema for response_format` error when extracting links + +- [#722](https://github.com/browserbase/stagehand/pull/722) [`bebf204`](https://github.com/browserbase/stagehand/commit/bebf2044502333c694743078c5b0c9deae11fb79) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - replace NBSP with regular space & remove special characters from dom+a11y tree + +- [#714](https://github.com/browserbase/stagehand/pull/714) [`37d6810`](https://github.com/browserbase/stagehand/commit/37d6810a704773d0383a86f98f5f17c7d5b21975) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix the native AI SDK client implementation to optionally take in an API key + +## 2.2.0 + +### Minor Changes + +- [#655](https://github.com/browserbase/stagehand/pull/655) [`8814af9`](https://github.com/browserbase/stagehand/commit/8814af9ece99fddc3dd9fb32671d0513a3a00c67) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! 
- extract links + +- [#675](https://github.com/browserbase/stagehand/pull/675) [`35c55eb`](https://github.com/browserbase/stagehand/commit/35c55ebf6c2867801a0a6f6988a883c8cb90cf9a) Thanks [@tkattkat](https://github.com/tkattkat)! - Added Gemini 2.5 Flash to Google supported models + +- [#668](https://github.com/browserbase/stagehand/pull/668) [`5c6d2cf`](https://github.com/browserbase/stagehand/commit/5c6d2cf89c9fbf198485506ed9ed75e07aec5cd4) Thanks [@miguelg719](https://github.com/miguelg719)! - Added a new class - Stagehand Evaluator - that wraps around a Stagehand object to determine whether a task is successful or not. Currently used for agent evals + +### Patch Changes + +- [#706](https://github.com/browserbase/stagehand/pull/706) [`18ac6fb`](https://github.com/browserbase/stagehand/commit/18ac6fba30f45b7557cecb890f4e84c75de8383c) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - remove unused fillInVariables fn + +- [#692](https://github.com/browserbase/stagehand/pull/692) [`6b95248`](https://github.com/browserbase/stagehand/commit/6b95248d6e02e5304ce4dd60499e31fc42af57eb) Thanks [@miguelg719](https://github.com/miguelg719)! - Updated the list of OpenAI models (4.1, o3...) + +- [#688](https://github.com/browserbase/stagehand/pull/688) [`7d81b3c`](https://github.com/browserbase/stagehand/commit/7d81b3c951c1f3dfc46845aefcc26ff175299bca) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - wrap page.evaluate to make sure we have injected browser side scripts before calling them + +- [#664](https://github.com/browserbase/stagehand/pull/664) [`b5ca00a`](https://github.com/browserbase/stagehand/commit/b5ca00a25ad0c33a5f4d3198e1bc59edb9956e7c) Thanks [@miguelg719](https://github.com/miguelg719)! - remove unnecessary log + +- [#683](https://github.com/browserbase/stagehand/pull/683) [`8f0f97b`](https://github.com/browserbase/stagehand/commit/8f0f97bc491e23ff0078c802aaf509fd04173c37) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! 
- use JavaScript click instead of playwright + +- [#705](https://github.com/browserbase/stagehand/pull/705) [`346ef5d`](https://github.com/browserbase/stagehand/commit/346ef5d0132dc1418dac18d26640a8df0435af57) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed removing a hanging observation map that is no longer used + +- [#698](https://github.com/browserbase/stagehand/pull/698) [`c145bc1`](https://github.com/browserbase/stagehand/commit/c145bc1d90ffd0d71c412de3af1c26c121e0b101) Thanks [@sameelarif](https://github.com/sameelarif)! - Fixing LLM client support to natively integrate with AI SDK + +- [#687](https://github.com/browserbase/stagehand/pull/687) [`edd6d3f`](https://github.com/browserbase/stagehand/commit/edd6d3feb47aac9f312a5edad78bf850ae1541db) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed the schema input for Gemini's response model + +- [#678](https://github.com/browserbase/stagehand/pull/678) [`5ec43d8`](https://github.com/browserbase/stagehand/commit/5ec43d8b9568c0f86b3e24bd83d1826c837656ed) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow form filling when form is not top-most element + +- [#694](https://github.com/browserbase/stagehand/pull/694) [`b8cc164`](https://github.com/browserbase/stagehand/commit/b8cc16405b712064a54c8cd591750368a47f35ea) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add telemetry for cua agents to stagehand.metrics + +- [#699](https://github.com/browserbase/stagehand/pull/699) [`d9f4243`](https://github.com/browserbase/stagehand/commit/d9f4243f6a8c8d4f3003ad6589f7eb4da6d23d0f) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - rm deprecated primitives from stagehand object + +- [#710](https://github.com/browserbase/stagehand/pull/710) [`9f4ab76`](https://github.com/browserbase/stagehand/commit/9f4ab76a0c1f0c2171290765c48c3bcea5b50e0f) Thanks [@seanmcguire12](https://github.com/seanmcguire12)!
- support targeted extract for domExtract + +- [#677](https://github.com/browserbase/stagehand/pull/677) [`bc5a731`](https://github.com/browserbase/stagehand/commit/bc5a731241f7f4c5040dd672d8e3787555766421) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixes a redundant unnecessary log + ## 2.1.0 ### Minor Changes diff --git a/README.md b/README.md index 788d6c073..56dda5f6d 100644 --- a/README.md +++ b/README.md @@ -1,30 +1,29 @@ -
-