feat: Automate eval tests setup #3828

Open
re-pixel wants to merge 4 commits into superplanehq:main from re-pixel:feat/eval-automation

@re-pixel
Collaborator

What changed

Automates local agent eval setup: a bootstrap script talks to the running SuperPlane app (owner setup or password login), ensures an eval canvas and service account, then prints or merges EVAL_ORG_ID, EVAL_CANVAS_ID, SUPERPLANE_API_TOKEN, and SUPERPLANE_BASE_URL into agent/.env. Adds make agent.evals.bootstrap, httpx for the script, and updates agent/.env.example / agent/README.md.
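
The "merge into agent/.env" step can be pictured with a small sketch. This is not the code from agent/scripts/bootstrap_eval_env.py; the function name `merge_env` and its behavior (replace existing keys in place, append missing ones, leave unrelated lines and comments alone) are assumptions made for illustration.

```python
from pathlib import Path


def merge_env(path: Path, updates: dict[str, str]) -> None:
    """Set KEY=value pairs in a .env file, preserving unrelated lines.

    Hypothetical helper: the real bootstrap script may merge differently.
    """
    lines = path.read_text().splitlines() if path.exists() else []
    seen: set[str] = set()
    merged: list[str] = []
    for line in lines:
        key, sep, _ = line.partition("=")
        key = key.strip()
        is_assignment = bool(sep) and not line.lstrip().startswith("#")
        if is_assignment and key in updates:
            # Rewrite the first occurrence, drop later duplicates of the key.
            if key not in seen:
                merged.append(f"{key}={updates[key]}")
                seen.add(key)
        else:
            merged.append(line)
    # Append keys that were not already present in the file.
    merged.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    path.write_text("\n".join(merged) + "\n")
```

Calling `merge_env(Path("agent/.env"), {"EVAL_ORG_ID": "<org-id>", "EVAL_CANVAS_ID": "<canvas-id>"})` with placeholder values would update those two keys and keep every other line of the file as it was.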

How

  • New agent/scripts/bootstrap_eval_env.py (cookie session + public HTTP API).
  • The Makefile target runs it in the agent container as the host user, with UV_CACHE_DIR and UV_PROJECT_ENVIRONMENT pointed under /tmp, so uv sync does not fight a root-owned, bind-mounted .env / .venv.

Related Issues

#3666

@superplanehq-integration

👋 Commands for maintainers:

  • /sp start - Start an ephemeral machine (takes ~30s)
  • /sp stop - Stop a running machine (auto-executed on pr close)

@re-pixel re-pixel requested a review from shiroyasha March 30, 2026 15:21
Comment thread agent/scripts/bootstrap_eval_env.py
@re-pixel re-pixel force-pushed the feat/eval-automation branch from 2e0f53d to e55ee5b Compare March 30, 2026 15:36
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread db/structure.sql
 CREATE TABLE public.casbin_rule (
     id integer NOT NULL,
-    ptype character varying(100) NOT NULL,
+    ptype character varying(100),
Contributor

Casbin ptype NOT NULL constraint accidentally removed

Medium Severity

The casbin_rule.ptype column changed from NOT NULL to nullable, and a new unique index idx_casbin_rule was added on all-nullable columns. These changes are unrelated to eval test automation and are likely a side effect of the gorm-adapter's AutoMigrate being captured in a schema dump. The original migration explicitly defined ptype VARCHAR(100) NOT NULL, so removing NOT NULL on ptype weakens constraints on the authorization table. Additionally, the new unique index on (ptype, v0, v1, v2, v3, v4, v5) is largely ineffective because PostgreSQL treats NULLs as distinct in B-tree unique indexes by default: since v0 through v5 are commonly NULL, duplicate Casbin rules won't be prevented in most cases.
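
The NULL behavior described above can be reproduced in a minimal sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL, as a stand-in: SQLite also treats NULLs as distinct in unique indexes, which is the same default PostgreSQL applies. The table is a trimmed, hypothetical version of casbin_rule with fewer columns, and the helper name `second_insert_accepted` is made up for this example.

```python
import sqlite3


def second_insert_accepted(row: tuple) -> bool:
    """Insert the same rule twice; return True if the duplicate is accepted."""
    conn = sqlite3.connect(":memory:")
    try:
        # Trimmed stand-in for casbin_rule: every indexed column is nullable.
        conn.execute(
            "CREATE TABLE casbin_rule ("
            "id INTEGER PRIMARY KEY, ptype TEXT, v0 TEXT, v1 TEXT, v2 TEXT)"
        )
        conn.execute(
            "CREATE UNIQUE INDEX idx_casbin_rule "
            "ON casbin_rule (ptype, v0, v1, v2)"
        )
        insert = "INSERT INTO casbin_rule (ptype, v0, v1, v2) VALUES (?, ?, ?, ?)"
        conn.execute(insert, row)
        try:
            conn.execute(insert, row)
            return True
        except sqlite3.IntegrityError:
            # The unique index rejected the duplicate.
            return False
    finally:
        conn.close()
```

With a NULL in any indexed column, for example `second_insert_accepted(("p", "alice", "data1", None))`, the duplicate row is accepted; with all columns filled, for example `("p", "alice", "data1", "read")`, the second insert is rejected. On PostgreSQL 15 and later, declaring the index with NULLS NOT DISTINCT is one way to change this default, though whether that suits the adapter here is not established by the review.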

Additional Locations (1)
Fix in Cursor Fix in Web

@shiroyasha
Collaborator

shiroyasha commented Mar 30, 2026

@re-pixel to bootstrap, I should just run make agent.evals.bootstrap, or I have to do something else as well?

Comment thread agent/README.md Outdated
Comment thread Makefile Outdated
@re-pixel
Collaborator Author

> @re-pixel to bootstrap, I should just run make agent.evals.bootstrap, or I have to do something else as well?

You have to have the app running, run make agent.evals.bootstrap, paste the variables it prints into .env, and you are set to run evals. This is now stated more clearly in the README.
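
After pasting, a quick check that all four variables named in the PR description are present can catch a missed line. The following is a minimal sketch, not part of the PR; the function name `missing_eval_vars` is hypothetical, and it only inspects .env text for non-empty assignments.

```python
REQUIRED_EVAL_VARS = (
    "EVAL_ORG_ID",
    "EVAL_CANVAS_ID",
    "SUPERPLANE_API_TOKEN",
    "SUPERPLANE_BASE_URL",
)


def missing_eval_vars(env_text: str) -> list[str]:
    """Return the required variable names that have no non-empty value."""
    present = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if value.strip():
            present.add(key.strip())
    return [name for name in REQUIRED_EVAL_VARS if name not in present]
```

Passing it the contents of agent/.env returns an empty list once every variable has been pasted in.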

Signed-off-by: re-pixel <relja.brdar@gmail.com>
@re-pixel re-pixel force-pushed the feat/eval-automation branch from 228efd7 to 714f2a5 Compare March 30, 2026 17:32