feat: remote and local scanning tools #189
@mcp.tool()
@with_tool_span()
async def semgrep_scan(
async def semgrep_scan_core(
should we keep @with_tool_span() here?
In my opinion, no. I only had with_tool_span on _rpc and _cli because I wanted to differentiate the two paths. This is the "common path", and both of its callers (semgrep_scan and semgrep_scan_remote) have tool spans on them already, so I don't think this one is particularly informative.
It looks like:
semgrep_scan    semgrep_scan_remote
         \        /
          \      /
      semgrep_scan_core
          /      \
         /        \
semgrep_scan_rpc  semgrep_scan_cli
god damn it it doesn't format right in graphite
ah I know what this is, I accidentally moved
latest commit should fix that
wait which commit i am not seeing a new commit come in
missed it, I have a local branch named differently and I invoked the wrong command. there now
content = json.loads(results.content[0].text)  # type: ignore
assert isinstance(content, dict)
assert len(content["paths"]["scanned"]) == 1
assert content["paths"]["scanned"][0].startswith("hello_world")
i ran this test and it failed with the following error:
FAILED tests/integration/test_local_scan.py::test_local_scan - AttributeError("'dict' object has no attribute 'startswith'") [single exception in Excep...
i printed out content["paths"]["scanned"] and it looks like this [{'value': '/var/folders/v1/c709tgfj1rl5f7vchd_gqwmh0000gn/T/hello_worldgwb69cub.py7052-e0c40a.py'}], so maybe we need to extract the value out first and also look for hello_world in the string instead?
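One way to make the assertion tolerant of both shapes seen in this thread is sketched below. The helper names scanned_path and is_hello_world are hypothetical and not part of this PR; the only assumption about the output is that an entry is either a plain string or a dict carrying the path under "value", as in the two printouts here.

```python
import os
from typing import Any


def scanned_path(entry: Any) -> str:
    """Return the path string from one `paths.scanned` entry.

    Handles the two shapes reported in this thread: a bare string,
    or a dict with the path stored under the "value" key.
    """
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict) and isinstance(entry.get("value"), str):
        return entry["value"]
    raise TypeError(f"unexpected scanned entry: {entry!r}")


def is_hello_world(entry: Any) -> bool:
    # Check the file name only, so an absolute temp-dir prefix doesn't matter.
    return "hello_world" in os.path.basename(scanned_path(entry))
```

In the test, the last assertion could then read `assert is_hello_world(content["paths"]["scanned"][0])`. This only papers over the difference in output shape; it doesn't explain why the two runs disagreed.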
I don't seem to see the same: I have that content is
contents {'version': '1.135.0', 'results': [], 'errors': [], 'paths': {'scanned': ['hello_world_iyj_wbm.py']}, 'skipped_rules': []}
when working locally. How are you running this code? I am just doing pipenv run pytest -vv -k "test_local_scan"
oh i did uv run pytest, what's the difference between the two? also, i tried the command you ran and it passes now
They shouldn't really be different--the two commands just dictate whether the virtualenv should be managed by pipenv or uv. I was able to get it to pass locally with uv run pytest also.
i just retried uv run pytest and it passes now. but i also went back to check when i got the error and i ran the exact same command on this branch. i don't know why this is happening. but i think it should be fine to merge!
maybe it is similar to how when i did uv run mcp -v ... it didn't reflect your changes, and only worked when i did uv run python ...?


What:
This PR makes it so there are separate tools for remote and local scanning.
Why:
The type signatures for both tools are different--in the remote case, we want the agent to send over the contents of the files (at least for now, before we set up the middleware solution). In the local case, we only want the agent to send the file paths--this will help with latency, since in the local case the server can simply read off the file contents from the filesystem.
We could have the agent itself determine if it is in a hosted environment, and based off this adjust the arguments it gives, but it's generally not a good idea to put more work on the agent. It's easier if we just ask for different things in both cases, hence why we have made two tools.
How:
This PR makes it so that only one tool exists at a time: semgrep_scan_remote in the hosted case, and semgrep_scan in the local case. Note that the former is only useful so long as we still have mcp.semgrep.ai.
Test plan:
I verified that when connecting to a local server without the env var, I get semgrep_scan. When I set SEMGREP_IS_HOSTED, I get only semgrep_scan_remote.
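For readers skimming the description, the selection logic can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: select_tools, the parameter names (paths, code_files), the returned dict shapes, and the rule that any non-empty SEMGREP_IS_HOSTED value counts as hosted are all assumptions, and the real server presumably registers the chosen function through @mcp.tool() instead of returning a dict.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional

Tool = Callable[..., dict]


def semgrep_scan(paths: List[str]) -> dict:
    """Local variant: the agent sends only file paths."""
    # Placeholder body; a real server would read the files from disk here.
    return {"mode": "local", "targets": list(paths)}


def semgrep_scan_remote(code_files: List[Dict[str, str]]) -> dict:
    """Hosted variant: the agent sends the file contents themselves."""
    # Placeholder body; each item is assumed to carry "filename" and "content".
    return {"mode": "remote", "targets": [f["filename"] for f in code_files]}


def select_tools(env: Optional[Mapping[str, str]] = None) -> Dict[str, Tool]:
    """Pick exactly one scan tool based on the environment."""
    env = os.environ if env is None else env
    # Assumption: any non-empty value of SEMGREP_IS_HOSTED means "hosted".
    if env.get("SEMGREP_IS_HOSTED"):
        return {"semgrep_scan_remote": semgrep_scan_remote}
    return {"semgrep_scan": semgrep_scan}
```

Calling select_tools({}) yields only semgrep_scan, and select_tools({"SEMGREP_IS_HOSTED": "1"}) yields only semgrep_scan_remote, which mirrors the test plan above.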