
Add AI visual regression tester via GitHub Agentic Workflows#811

Merged
supervoidcoder merged 4 commits into main from copilot/add-ai-visual-regression-tester
Feb 26, 2026

Copilot AI (Contributor) commented Feb 26, 2026

  • Created .github/workflows/ai-visual-regression.md agentic workflow
  • Uses Playwright MCP tool (not CLI) for browser navigation and screenshots
  • Added "sleep *" and "seq *" to bash allowlist so the health-check polling loop can run
  • Added discussions: false to safe-outputs.add-comment to drop unnecessary discussions: write permission (least privilege)
  • Compiled to .github/workflows/ai-visual-regression.lock.yml (via gh aw compile)
Original prompt

This section details the original issue you should resolve

<issue_title>AI visual regression tester</issue_title>
<issue_description>clearly the stupid visual reggression tester that is supposed to test new react components and what not is absolutely vibe coded garbage so lets vibe code a less garbagy workflow using AI. in the regular static workflow the logic for traversing the GUI to get the point where it can screenshot the added component/button/ etc. is completely broken. half, if not all of the images that are returned in the comment are always broken. but now github has introduced agentic workflows which allow us to use AI to do workflows. these have an mcp server and when used with vision models they can theoretically see screenshots and know more than theregular static workflow. it can also click through the buttons more intelligenly and look at the diff to see what has been added. However, these cost premium requests for me, so:

  • They should only run on PRs made by members of the org
  • It should not run on draft PRs
  • It should see if the actual UI code even changed at all before going to the agent job
  • Should not run if another one is already running
</issue_description>

<agent_instructions>make sure to read this discussion (and my comments/questions on it): https://github.com/orgs/community/discussions/186451

also make sure to read this https://github.com/github/gh-aw/blob/main/create.md
</agent_instructions>

Comments on the Issue (you are @copilot in this section)
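The cost-control rules in the issue reduce to a small decision function. The merged workflow enforces them through gh-aw's generated gating (org roles, non-draft, concurrency); the sketch below only illustrates the decision logic in plain JavaScript. `shouldRunAgent`, `isUiFile`, and `UI_PATTERNS` are hypothetical names, and treating an `author_association` of `OWNER` or `MEMBER` as "member of the org" is an assumption. The "already running" rule is usually a workflow-level `concurrency:` group and is not modeled here.

```javascript
// Hypothetical gate mirroring the issue's cost-control rules.
// Concurrency is intentionally not modeled (use a `concurrency:` group).

// author_association values assumed to mean "member of the org".
const ORG_ROLES = new Set(["OWNER", "MEMBER"]);

// ASSUMPTION: what counts as "UI code"; adjust to the repo's layout.
const UI_PATTERNS = [/^src\/.*\.(jsx|tsx|css|scss)$/, /^public\//];

function isUiFile(path) {
  return UI_PATTERNS.some((re) => re.test(path));
}

/**
 * Decide whether the premium-request-consuming agent job should run.
 * @param {{author_association: string, draft: boolean}} pr
 * @param {string[]} changedFiles paths relative to the repo root
 * @returns {{run: boolean, reason: string}}
 */
function shouldRunAgent(pr, changedFiles) {
  if (!ORG_ROLES.has(pr.author_association)) {
    return { run: false, reason: "author is not an org member" };
  }
  if (pr.draft) {
    return { run: false, reason: "PR is a draft" };
  }
  if (!changedFiles.some(isUiFile)) {
    return { run: false, reason: "no UI files changed" };
  }
  return { run: true, reason: "UI change from org member" };
}
```

Ordering the checks from cheapest to most expensive means the changed-file list only has to be consulted for non-draft PRs by org members.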



Co-authored-by: supervoidcoder <88671013+supervoidcoder@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add AI visual regression tester for React components" to "Add AI visual regression tester via GitHub Agentic Workflows" on Feb 26, 2026
supervoidcoder (Member) commented:

@copilot are you stupid why do you think i sent you that discussion link ... sighh look at this thread:

[supervoidcoder](https://github.com/supervoidcoder)
[2 weeks ago](https://github.com/orgs/community/discussions/186451#discussioncomment-15807315)
I wish it could use a free model so it wouldn’t use up a whole premium request (I’m not sure if it does, but I see it uses Claude sonnet 4.5 which usually takes up 1x request, so correct me if I’m wrong) for repetitive small tasks. For example the CI Doctor, we run tests on every commit. Would the only solution be to just make tests run weekly?

Also, I wish Copilot agent could run on Windows runners too. This way stuff like the CLI test tool could run on my projects like [win-witr](https://github.com/supervoidcoder/win-witr) and other CLI tools, which are very often Windows-only since I know Windows best.

5 replies
@[pelikhan](https://github.com/pelikhan)
[4 days ago](https://github.com/orgs/community/discussions/186451#discussioncomment-15890139)
You can use any model available in your subscription by using the engine.model front matter field.

@[pelikhan](https://github.com/pelikhan)
[4 days ago](https://github.com/orgs/community/discussions/186451#discussioncomment-15890147)
You can run your tests on a windows runner in a job, store the artifacts, then analyze them in the agent job.

Custom jobs can be added under the jobs with the usual actions syntax.

@[supervoidcoder](https://github.com/supervoidcoder)
[2 days ago](https://github.com/orgs/community/discussions/186451#discussioncomment-15915159)
Wait, @pelikhan, I've got another cool idea. I heard that it has Playwright on the MCP, right? Well, if I use a vision model, like Gemini 3 (which should be allowed since I saw it in the VS Code model picker with my GitHub Copilot Pro subscription), then it can SEE my web app, right? That would be really cool. Now I can get it to say if my PR broke a button or mangled the UI. I've been using regular GitHub Actions with pixel match, and it keeps yelling at me half the time due to a 0.01% change which happens to be a stray pixel in either screenshot 😆

@[eaftan](https://github.com/eaftan)
[14 hours ago](https://github.com/orgs/community/discussions/186451#discussioncomment-15928633)
Yeah, you can definitely do that, and we do similar things ourselves!

https://github.github.com/gh-aw/reference/tools/#playwright-tool-playwright
https://github.com/github/gh-aw/blob/main/.github/workflows/unbloat-docs.md

@[supervoidcoder](https://github.com/supervoidcoder)
[14 hours ago](https://github.com/orgs/community/discussions/186451#discussioncomment-15928815)
> Yeah, you can definitely do that, and we do similar things ourselves!
>
> https://github.github.com/gh-aw/reference/tools/#playwright-tool-playwright
> https://github.com/github/gh-aw/blob/main/.github/workflows/unbloat-docs.md

Yay, thanks!

So look at unbloat-docs.md. You will see that AGENTIC WORKFLOWS ALREADY COME WITH A PLAYWRIGHT TOOL AND IT SHOULDN'T USE THE CLI.
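The pixel-match complaint in the thread (failing on a 0.01% change caused by a stray pixel) is what happens when a visual check gates on an exact or near-exact pixel diff. A minimal sketch of a more tolerant numeric gate, assuming same-size RGBA buffers: `diffRatio` and `isRegression` are hypothetical names, the default tolerances are illustrative, and this is a simplified per-channel check, not pixelmatch's perceptual color metric.

```javascript
// Fraction of pixels whose color differs by more than `channelTolerance`
// in any RGBA channel. Inputs are flat RGBA arrays of equal length.
function diffRatio(a, b, channelTolerance = 0) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("images must be same-size RGBA buffers");
  }
  const pixels = a.length / 4;
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > channelTolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return pixels === 0 ? 0 : changed / pixels;
}

// A tolerant gate needs both per-channel slack and a minimum changed-area
// ratio before reporting a regression. Defaults here are illustrative only.
function isRegression(a, b, { channelTolerance = 8, maxRatio = 0.001 } = {}) {
  return diffRatio(a, b, channelTolerance) > maxRatio;
}
```

With `channelTolerance: 0` and a tiny `maxRatio`, a single nudged channel flips the result, which is the flakiness described in the thread; the agentic workflow instead asks a vision model to judge the screenshots.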

Co-authored-by: supervoidcoder <88671013+supervoidcoder@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Feb 26, 2026

> @copilot are you stupid why do you think i sent you that discussion link ... sighh look at this thread:
>
> ...

Fixed in d4e42cf. The workflow now uses the Playwright MCP tool directly — the agent calls Playwright to navigate URLs and take screenshots (saved automatically to /tmp/gh-aw/mcp-logs/playwright/). Bash is only used for npm build and npx http-server to serve the app. No npx playwright CLI anywhere.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • github.github.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v4 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/github-script/git/ref/tags/v8
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha (http block)
  • https://api.github.com/repos/github/gh-aw/git/ref/tags/v0.50.4
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v0.50.4 --jq .object.sha (http block)

If you need me to access, download, or install something from one of these locations, you can either:

supervoidcoder (Member) left a comment:

lgtm

@supervoidcoder supervoidcoder marked this pull request as ready for review February 26, 2026 16:35
Copilot AI review requested due to automatic review settings February 26, 2026 16:35
github-actions bot (Contributor) commented Feb 26, 2026

🤖 Auto-Component Test Results

🎉 No new UI components detected!

This PR doesn't add any new buttons, menus, or clickable components.


Triggered by Copilot • Commit d4e42cf

github-actions bot (Contributor) commented:
📝 Mega PR Test Skipped

Hi @Copilot! The comprehensive Mega PR test was skipped because this PR only changes documentation or configuration files (.md, .txt, .yml, .yaml).

To force the Mega PR test to run, comment /test pr on this PR.

github-actions bot (Contributor) commented:
🎬 Visual Regression Testing - Skipped

Hi @Copilot! VRT was skipped because this PR only changes documentation or configuration files (.md, .txt, .yml).

To force VRT to run, comment /vrt on this PR.

Copilot AI (Contributor) left a comment:

Pull request overview

Adds a new GitHub Agentic Workflow to perform AI-assisted visual regression checks on UI-related pull requests, replacing the existing brittle static visual regression approach.

Changes:

  • Introduces a human-editable gh-aw workflow source that builds the app, drives UI navigation via Playwright MCP, and reports screenshot comparisons.
  • Adds the compiled .lock.yml workflow that implements gating (org roles, non-draft, concurrency) and runs the Copilot-based agent with safe outputs and cache-memory baseline persistence.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • .github/workflows/ai-visual-regression.md: Defines the agentic workflow frontmatter (gating, tool allowlist, cache-memory) and the agent prompt (diff analysis → build → serve → screenshot → baseline compare → report).
  • .github/workflows/ai-visual-regression.lock.yml: Compiled GitHub Actions workflow generated by gh aw compile, including activation/pre-activation gating, agent execution, safe-outputs comment posting, and cache-memory persistence.
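The "baseline compare" step in the prompt needs to know, for each screenshot, whether a baseline already exists in cache-memory. In the merged workflow this is described in the agent prompt rather than coded; the helper below is a hypothetical sketch of that bookkeeping, assuming baselines are matched by file name (`planBaselineComparison` is an illustrative name).

```javascript
// Hypothetical helper: given screenshot names captured on this run and
// names already stored as baselines (e.g. in the cache-memory directory),
// decide which to compare, which become new baselines, and which vanished.
function planBaselineComparison(currentShots, baselineShots) {
  const baseline = new Set(baselineShots);
  const current = new Set(currentShots);
  const plan = { compare: [], newBaseline: [], missing: [] };
  for (const name of current) {
    (baseline.has(name) ? plan.compare : plan.newBaseline).push(name);
  }
  for (const name of baseline) {
    if (!current.has(name)) plan.missing.push(name);
  }
  // Sort each bucket so the report lists screenshots in a stable order.
  for (const key of Object.keys(plan)) plan[key].sort();
  return plan;
}
```

Reporting `missing` separately lets a PR that removes a screen surface that fact instead of silently dropping the old baseline.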

```yaml
- "ls *"
- "echo *"
- "kill *"
- "curl *"
```
Copilot AI commented Feb 26, 2026:

The prompt’s health-check loop uses seq and sleep, but those commands aren’t allowed in the tools.bash allowlist. As-is, the agent will likely be blocked when following Step 3. Either add "sleep *" (and "seq *", if keeping this exact loop) to tools.bash, or change the polling example to avoid those commands (e.g., brace expansion plus a Playwright/Node-based wait).

Suggested change
```diff
- - "curl *"
+ - "curl *"
+ - "sleep *"
+ - "seq *"
```

Copilot uses AI. Check for mistakes.
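The review also offered an alternative to allowlisting `seq` and `sleep`: a Node-based wait. A rough sketch of that alternative follows, assuming Node 18+ for the global `fetch` in the commented example probe and an arbitrary port; `waitUntilReady` is a hypothetical name, and the merged workflow kept the bash loop instead.

```javascript
// Poll `probe` until it reports ready, sleeping between attempts.
// `probe` is any async () => boolean; in the workflow it would check the
// locally served app. Returns the number of attempts it took.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilReady(probe, { attempts = 30, delayMs = 1000, sleep = defaultSleep } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await probe()) return i;
    } catch {
      // Connection refused and similar: server not up yet, keep polling.
    }
    if (i < attempts) await sleep(delayMs);
  }
  throw new Error(`server not ready after ${attempts} attempts`);
}

// Example probe (URL and port are assumptions):
// const probe = async () => (await fetch("http://localhost:8080/")).ok;
// await waitUntilReady(probe);
```

Injecting `sleep` keeps the function testable without real delays.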
```yaml
safe-outputs:
  add-comment:
    max: 1
    hide-older-comments: true
```
Copilot AI commented Feb 26, 2026:

The safe-outputs add-comment tool causes the compiled workflow to request discussions: write permission, which appears unnecessary for a PR-only reporter. Consider setting safe-outputs.add-comment.discussions: false (per the tool’s documented option) so the workflow runs with least privilege while still being able to comment on PRs/issues.

Suggested change
```diff
-    hide-older-comments: true
+    hide-older-comments: true
+    discussions: false
```

```javascript
const allowedExtensions = [".png",".json"];
const result = validateMemoryFiles('/tmp/gh-aw/cache-memory', 'cache', allowedExtensions);
if (!result.valid) {
  core.setFailed(`File type validation failed: Found $${result.invalidFiles.length} file(s) with invalid extensions. Only .png, .json are allowed.`);
}
```
Copilot AI commented Feb 26, 2026:

Minor bug in the failure message template: the string interpolation uses $${result.invalidFiles.length} which will render an extra $ (e.g., Found $3 file(s) ...). This should be ${result.invalidFiles.length} like the similar step later in the file, otherwise the error output is confusing when file type validation fails.

Suggested change
```diff
- core.setFailed(`File type validation failed: Found $${result.invalidFiles.length} file(s) with invalid extensions. Only .png, .json are allowed.`);
+ core.setFailed(`File type validation failed: Found ${result.invalidFiles.length} file(s) with invalid extensions. Only .png, .json are allowed.`);
```

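The interpolation bug flagged in that comment is easy to reproduce: in a JavaScript template literal, `${expr}` interpolates, while `$${expr}` emits a literal `$` followed by the value. The sketch below is a hypothetical stand-in for the lockfile's extension check that produces the corrected message; `findInvalidFiles` and `failureMessage` are illustrative names, not gh-aw's `validateMemoryFiles`.

```javascript
// Hypothetical stand-in for the cache-memory extension check:
// return the paths whose extension is not in the allowlist.
function findInvalidFiles(paths, allowedExtensions) {
  const allowed = new Set(allowedExtensions.map((e) => e.toLowerCase()));
  return paths.filter((p) => {
    const base = p.slice(p.lastIndexOf("/") + 1);
    const dot = base.lastIndexOf(".");
    // Dotfiles like ".hidden" and extensionless names get an empty extension.
    const ext = dot > 0 ? base.slice(dot).toLowerCase() : "";
    return !allowed.has(ext);
  });
}

function failureMessage(invalidFiles, allowedExtensions) {
  // `${...}` interpolates the count; `$${...}` would print "$1" instead of "1".
  return `File type validation failed: Found ${invalidFiles.length} file(s) with invalid extensions. Only ${allowedExtensions.join(", ")} are allowed.`;
}
```

For example, `failureMessage(["notes.txt"], [".png", ".json"])` reads "Found 1 file(s)", whereas the `$${...}` form would read "Found $1 file(s)".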
A member replied:

you're not supposed to edit the compiled yml

supervoidcoder (Member) commented:

@copilot apply changes based on the comments in this thread

(don't edit the compiled file directly) don't forget to recompile

…ermission

Co-authored-by: supervoidcoder <88671013+supervoidcoder@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Feb 26, 2026

> @copilot apply changes based on the comments in this thread
>
> (don't edit the compiled file directly) don't forget t...

Done in 7e2cdcb:

  • Added "sleep *" and "seq *" to the bash allowlist so the health-check polling loop works
  • Added discussions: false to safe-outputs.add-comment to drop the unnecessary discussions: write permission
  • Recompiled the lockfile

Skipped the $${...} fix in the lockfile per the comment thread — that's generated code.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/actions/checkout/git/ref/tags/v4
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v4 --jq .object.sha (http block)
  • https://api.github.com/repos/actions/github-script/git/ref/tags/v8
    • Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq .object.sha (http block)
  • https://api.github.com/repos/github/gh-aw/git/ref/tags/v0.50.5
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v0.50.5 --jq .object.sha (http block)

If you need me to access, download, or install something from one of these locations, you can either:


Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

AI visual regression tester

3 participants