Skip to content
This repository was archived by the owner on Oct 28, 2025. It is now read-only.

feat: remote and local scanning tools#189

Merged
brandonspark merged 17 commits intomainfrom
brandon/two-ways-to-scan
Sep 15, 2025
Merged

feat: remote and local scanning tools#189
brandonspark merged 17 commits intomainfrom
brandon/two-ways-to-scan

Conversation

@brandonspark
Copy link
Collaborator

@brandonspark brandonspark commented Sep 11, 2025

What:

This PR makes it so there are separate tools for remote and local scanning.

Why:

The type signatures for both tools are different--in the remote case, we want the agent to send over the contents of the files (at least for now, before we set up the middleware solution). In the local case, we only want the agent to send the file paths--this will help with latency, since in the local case the server can simply read off the file contents from the filesystem.

We could have the agent itself determine if it is in a hosted environment, and based off this adjust the arguments it gives, but it's generally not a good idea to put more work on the agent. It's easier if we just ask for different things in both cases, hence why we have made two tools.

How:

This PR makes it so that only one tool exists at a time, semgrep_scan_remote in the hosted case, and semgrep_scan in the local case. Note that the former is only useful so long as we still have mcp.semgrep.ai.

Test plan:

I verified that when connecting to a local server without the env var, I get semgrep_scan. When I set SEMGREP_IS_HOSTED, I get only semgrep_scan_remote.

@brandonspark brandonspark changed the title bifurcate into two scanning tools feat: remote and local scanning tools Sep 11, 2025
Copy link
Collaborator Author

brandonspark commented Sep 11, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

Base automatically changed from brandon/delete-security-check to main September 11, 2025 21:16
@brandonspark brandonspark force-pushed the brandon/two-ways-to-scan branch from 7681be7 to bcd566a Compare September 12, 2025 17:48
@brandonspark brandonspark marked this pull request as ready for review September 12, 2025 17:57
@brandonspark brandonspark force-pushed the brandon/two-ways-to-scan branch from c0ca947 to 53568f2 Compare September 12, 2025 18:35
@brandonspark brandonspark force-pushed the brandon/two-ways-to-scan branch from 67cba73 to a31f7ad Compare September 12, 2025 18:43
@brandonspark brandonspark force-pushed the brandon/two-ways-to-scan branch from da05b04 to e0dc143 Compare September 12, 2025 18:52
@mcp.tool()
@with_tool_span()
async def semgrep_scan(
async def semgrep_scan_core(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we keep @with_tool_span() here?

Copy link
Collaborator Author

@brandonspark brandonspark Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, no. I only had with_tool_span on _rpc and _cli because I wanted to differentiate the two paths. This is the "common path", and both of its callers (semgrep_scan and semgrep_scan_remote) have tool spans on them already, meaning I don't think this one is particular informative.

It looks like:

semgrep_scan semgrep_scan_remote
\ /
\ /
semgrep_scan_core
/
/
semgrep_scan_rpc semgrep_scan_cli

god damn it it doesn't format right in graphite

@liukatkat
Copy link
Collaborator

i keep on running into this issue of the agent calling the tool correctly but somehow it says it is passing in the config parameter:
image

i tried printing out the config and it looks like an empty string? this has never happened before this change.

Copy link
Collaborator Author

ah I know what this is, I accidentally moved validate_config earlier on in the logic than it should be.

Copy link
Collaborator Author

latest commit should fix that

@liukatkat
Copy link
Collaborator

wait which commit i am not seeing a new commit come in

@brandonspark
Copy link
Collaborator Author

wait which commit i am not seeing a new commit come in

missed it, I have a local branch named differently and I invoked the wrong command. there now

Copy link
Collaborator

@liukatkat liukatkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes look good to me! i also tested remote scanning and local scanning (with and without rpc) and they all worked! the only thing that i think might be worth taking a look at is the failing test. i made a comment in test_local_scan.py!

content = json.loads(results.content[0].text) # type: ignore
assert isinstance(content, dict)
assert len(content["paths"]["scanned"]) == 1
assert content["paths"]["scanned"][0].startswith("hello_world")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i ran this test and i failed with the following error:

FAILED tests/integration/test_local_scan.py::test_local_scan - AttributeError("'dict' object has no attribute 'startswith'") [single exception in Excep...

i printed out content["paths"]["scanned"] and it looks like this [{'value': '/var/folders/v1/c709tgfj1rl5f7vchd_gqwmh0000gn/T/hello_worldgwb69cub.py7052-e0c40a.py'}], so maybe we need to extract the value out first and also look for hello_world in the string instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't seem to see the same: I have that content is

contents {'version': '1.135.0', 'results': [], 'errors': [], 'paths': {'scanned': ['hello_world_iyj_wbm.py']}, 'skipped_rules': []}

when working locally. How are you running this code? I am just doing pipenv run pytest -vv -k "test_local_scan"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i did uv run pytest, what's the difference between the two? also, i tried the command you ran and it passes now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They shouldn't really be different--the two commands just dictate whether the virtualenv should be managed by pipenv or uv. I was able to get it to pass locally with uv run pytest also.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just retried uv run pytest and it passes now. but i also went back to check when i got the error and i ran the exact same command on this branch. i don't know why this is happening. but i think it should be fine to merge!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is similar to how when i did uv run mcp -v ... it didn't reflect your changes, and only worked when i did uv run python ...?

@brandonspark brandonspark merged commit 2f39e8e into main Sep 15, 2025
13 checks passed
@brandonspark brandonspark deleted the brandon/two-ways-to-scan branch September 15, 2025 18:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants